Hi Jens,
It's a log file.
Cheers,
Armin
-----Original Message-----
From: Jens Grivolla [mailto:j+...@grivolla.net]
Sent: Friday, October 18, 2013 11:05
To: user@uima.apache.org
Subject: Re: Working with very large text documents
On 10/18/2013 10:06 AM, Armin Wegner wrote:
Ok, but then log files are usually very easy to split since they
normally consist of independent lines. So you could just have one
document per day or whatever gets it down to a reasonable size, without
the risk of breaking grammatical or semantic relationships.
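Something along these lines would do a per-day split (untested; it assumes each line starts with an ISO date like "2013-10-18", and LogSplitter is just an illustration):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Splits one big log file into one output file per day, keyed on an
// ISO date prefix such as "2013-10-18 10:06:12 ...".
public class LogSplitter {
    public static void main(String[] args) throws IOException {
        Path input = Paths.get(args[0]);
        Path outDir = Paths.get(args[1]);
        Files.createDirectories(outDir);

        Map<String, BufferedWriter> writers = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(input)) {
            String line;
            String currentDay = "undated";
            while ((line = reader.readLine()) != null) {
                // Lines without a date prefix (e.g. stack traces) stay
                // with the previous day, so related lines are kept together.
                if (line.matches("\\d{4}-\\d{2}-\\d{2}.*")) {
                    currentDay = line.substring(0, 10);
                }
                BufferedWriter out = writers.computeIfAbsent(currentDay, day -> {
                    try {
                        return Files.newBufferedWriter(outDir.resolve(day + ".log"));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                out.write(line);
                out.newLine();
            }
        } finally {
            for (BufferedWriter out : writers.values()) {
                out.close();
            }
        }
    }
}

One writer is kept open per day, so the big file only has to be read once.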
On 10/18/2013 12:25 PM, Armin Wegner wrote:
Dear Jens, dear Richard,
Looks like I have to use a log-file-specific pipeline. The problem was that I
did not know that before the process crashed. It would be nice to have a
general approach.
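One crude safeguard would be to check the input size up front and route oversized files to a splitting step before they ever reach the pipeline. A rough sketch (SizeGuard and the 50 MB threshold are made up, not UIMA API):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Pre-flight check: decide before reading whether a document is too big
// to load into a single CAS, instead of finding out when the JVM dies.
public class SizeGuard {
    // Illustrative threshold; tune it to what one CAS can hold safely.
    private static final long MAX_DOCUMENT_BYTES = 50L * 1024 * 1024;

    static boolean needsSplitting(Path document) throws IOException {
        return Files.size(document) > MAX_DOCUMENT_BYTES;
    }

    public static void main(String[] args) throws IOException {
        Path doc = Paths.get(args[0]);
        System.out.println(doc + (needsSplitting(doc)
                ? " -> split first" : " -> process whole"));
    }
}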
Thanks,
Armin
-----Original Message-----
From: Richard Eckart de Castilho
Don't you have a Hadoop cluster you can use? Hadoop would handle the
file splitting for you, and if your UIMA analysis is well-behaved, you
can deploy it as an M/R job, processing one record at a time.
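A map-only sketch of that idea (untested; assumes the Hadoop 2 mapreduce API, and LogLineJob/analyze() are placeholders, not your actual pipeline):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Map-only job: the default TextInputFormat hands the mapper one log
// line per record, so Hadoop does the file splitting for you.
public class LogLineJob {

    public static class LineMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Placeholder per-line analysis; a real job would run the
            // UIMA pipeline on the line here.
            String result = analyze(line.toString());
            context.write(new Text(result), NullWritable.get());
        }

        private String analyze(String line) {
            return line.trim(); // stand-in for the actual analysis
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "log-line-analysis");
        job.setJarByClass(LogLineJob.class);
        job.setMapperClass(LineMapper.class);
        job.setNumReduceTasks(0); // map-only, mapper output goes straight out
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Since map() is called once per line, even a multi-gigabyte log never has to fit in memory as a whole.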
--Thilo
On 10/18/2013 12:25 PM, armin.weg...@bka.bund.de wrote:
Hi Jens,
It's a log file.
Cheers,
Armin