Working with very large text documents

2013-10-18 Thread Armin.Wegner
Hi, what are you doing with very large text documents in a UIMA pipeline, for example 9 GB in size? A. I expect that you split the large file before putting it into the pipeline. Or do you use a multiplier in the pipeline to split it? Anyway, where do you split the input file? You can not

Re: Working with very large text documents

2013-10-18 Thread Richard Eckart de Castilho
On 18.10.2013, at 10:06, armin.weg...@bka.bund.de wrote: Hi, what are you doing with very large text documents in a UIMA pipeline, for example 9 GB in size? In that order of magnitude, I'd probably try to get a computer with more memory ;) A. I expect that you split the large file

Re: Working with very large text documents

2013-10-18 Thread Jens Grivolla
On 10/18/2013 10:06 AM, Armin Wegner wrote: What are you doing with very large text documents in a UIMA pipeline, for example 9 GB in size? Just out of curiosity, how can you possibly have 9 GB of text that represents one document? From a quick look at Project Gutenberg it seems that a full

AW: Working with very large text documents

2013-10-18 Thread Armin.Wegner
Hi Jens, It's a log file. Cheers, Armin -Original Message- From: Jens Grivolla [mailto:j+...@grivolla.net] Sent: Friday, 18 October 2013 11:05 To: user@uima.apache.org Subject: Re: Working with very large text documents On 10/18/2013 10:06 AM, Armin Wegner wrote: What

Re: Working with very large text documents

2013-10-18 Thread Richard Eckart de Castilho
. Armin -Original Message- From: Richard Eckart de Castilho [mailto:r...@apache.org] Sent: Friday, 18 October 2013 10:43 To: user@uima.apache.org Subject: Re: Working with very large text documents On 18.10.2013, at 10:06, armin.weg...@bka.bund.de wrote: Hi

Re: AW: Working with very large text documents

2013-10-18 Thread Jens Grivolla
Wegner wrote: Hi Jens, It's a log file. Cheers, Armin -Original Message- From: Jens Grivolla [mailto:j+...@grivolla.net] Sent: Friday, 18 October 2013 11:05 To: user@uima.apache.org Subject: Re: Working with very large text documents On 10/18/2013 10:06 AM, Armin Wegner

Re: Working with very large text documents

2013-10-18 Thread Richard Eckart de Castilho
Grivolla [mailto:j+...@grivolla.net] Sent: Friday, 18 October 2013 11:05 To: user@uima.apache.org Subject: Re: Working with very large text documents On 10/18/2013 10:06 AM, Armin Wegner wrote: What are you doing with very large text documents in a UIMA pipeline, for example 9 GB

AW: Working with very large text documents

2013-10-18 Thread Armin.Wegner
...@apache.org] Sent: Friday, 18 October 2013 12:32 To: user@uima.apache.org Subject: Re: Working with very large text documents Hi Armin, that's a good point. It's also an issue with UIMA then, because the begin/end offsets are likewise int values. If it is a log file, couldn't you split
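
The suggestion in this message (split a log file because offsets are int values) could look roughly like the sketch below. It is not code from the thread and it does not use any UIMA API; the class name LogChunker, the method forEachChunk, and the maxChars parameter are hypothetical names chosen for illustration. It reads the input line by line and hands over chunks made of whole lines, so each chunk stays well inside the int range and could then be turned into a separate document by whatever reader or multiplier the pipeline uses.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.function.Consumer;

// Hypothetical helper, not part of UIMA: splits line-oriented text into chunks.
final class LogChunker {

    /**
     * Reads lines from {@code in} and passes chunks of whole lines to {@code sink}.
     * A chunk holds at most {@code maxChars} characters, unless a single line is
     * longer than that, in which case that line is emitted as its own chunk.
     */
    static void forEachChunk(BufferedReader in, int maxChars, Consumer<String> sink)
            throws IOException {
        StringBuilder chunk = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            // readLine() strips the line terminator, so one '\n' is added back (+1 char).
            // The sum is computed as long to avoid int overflow on very long lines.
            long sizeWithLine = (long) chunk.length() + line.length() + 1;
            if (chunk.length() > 0 && sizeWithLine > maxChars) {
                sink.accept(chunk.toString());
                chunk.setLength(0);
            }
            chunk.append(line).append('\n');
        }
        // Emit whatever is left after the last line.
        if (chunk.length() > 0) {
            sink.accept(chunk.toString());
        }
    }
}
```

A caller would wrap the log file in a BufferedReader and pass a sink that creates one document per chunk. Because only one chunk is held in memory at a time, the 9 GB file never has to be loaded as a whole. Note that line terminators are normalized to '\n', so offsets inside a chunk may differ from byte positions in the original file.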

Re: Working with very large text documents

2013-10-18 Thread Thomas Ginter
, Armin -Original Message- From: Richard Eckart de Castilho [mailto:r...@apache.org] Sent: Friday, 18 October 2013 12:32 To: user@uima.apache.org Subject: Re: Working with very large text documents Hi Armin, that's a good point. It's also an issue with UIMA

Re: AW: Working with very large text documents

2013-10-18 Thread Thilo Goetz
, Armin -Original Message- From: Jens Grivolla [mailto:j+...@grivolla.net] Sent: Friday, 18 October 2013 11:05 To: user@uima.apache.org Subject: Re: Working with very large text documents On 10/18/2013 10:06 AM, Armin Wegner wrote: What are you doing with very large text