Re: TIKA - how to read chunks at a time from a very large file?

Nick Burch Thu, 28 Aug 2014 12:13:40 -0700

On Thu, 28 Aug 2014, ruby wrote:

> Since the files contain over 5GB of data, the content string here will end up
> holding too much data in memory. I want to avoid this and read a chunk at a
> time.

You'll probably need your own custom ContentHandler, which detects when there's too much data, and flushes it / starts a new file / etc.
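A minimal sketch of the "start a new file" variant, assuming only the JDK's SAX classes (Tika's parsers report extracted text through the standard org.xml.sax.ContentHandler callbacks). The class name RollingFileTextHandler, the chunk-NNNNN.txt file names and the maxChars limit are hypothetical. The limit counts UTF-16 chars rather than bytes, and it is soft: a new file is only started between characters() callbacks, so a file can overshoot by one callback's worth of text, but a surrogate pair is never split across files.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: streams extracted text into chunk-00000.txt,
// chunk-00001.txt, ... in `dir`, starting a new file once the current one
// has reached `maxChars` characters (a soft limit, see characters()).
class RollingFileTextHandler extends DefaultHandler {
    private final Path dir;
    private final int maxChars;
    private final List<Path> files = new ArrayList<>();
    private Writer out;   // writer for the current chunk file, null before any text arrives
    private long written; // characters written to the current file

    RollingFileTextHandler(Path dir, int maxChars) {
        if (maxChars <= 0) throw new IllegalArgumentException("maxChars must be positive");
        this.dir = dir;
        this.maxChars = maxChars;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (length == 0) return;
        try {
            // Roll over only between callbacks, and never when this callback
            // starts with the second half of a surrogate pair.
            if (out == null || (written >= maxChars && !Character.isLowSurrogate(ch[start]))) {
                rollOver();
            }
            out.write(ch, start, length);
            written += length;
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        // Treat ignorable whitespace (e.g. newlines between elements) as text too.
        characters(ch, start, length);
    }

    @Override
    public void endDocument() throws SAXException {
        try {
            if (out != null) out.close(); // flush and close the last chunk file
            out = null;
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    List<Path> getFiles() {
        return files;
    }

    // Close the current file (if any) and open the next one in the sequence.
    private void rollOver() throws IOException {
        if (out != null) out.close();
        Path next = dir.resolve(String.format("chunk-%05d.txt", files.size()));
        out = Files.newBufferedWriter(next, StandardCharsets.UTF_8);
        files.add(next);
        written = 0;
    }
}
```

Only the current file's writer is held open, so memory use stays roughly constant no matter how large the source document is.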


There's an example of how to do this in the tika-examples package, look at
parseToPlainTextChunks from ContentHandlerExample:
https://svn.apache.org/repos/asf/tika/trunk/tika-example/src/main/java/org/apache/tika/example/ContentHandlerExample.java

Basically though, you'll want to extend from DefaultContentHandler (which takes care of most of the basics for you), then write your own logic to handle outputting / flushing / chunking as per your needs.
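A minimal in-memory sketch of the chunking approach, handing each chunk to a callback instead of accumulating one big string. The post names DefaultContentHandler; this sketch extends the JDK's org.xml.sax.helpers.DefaultHandler, which supplies no-op implementations of the ContentHandler callbacks so only the relevant ones need overriding. The class name ChunkingTextHandler and its maxChars and sink parameters are hypothetical, and chunk sizes are counted in UTF-16 chars.

```java
import java.util.function.Consumer;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: buffers at most about maxChars characters of
// extracted text and hands each full chunk to `sink`, so the whole
// document never has to sit in memory at once.
class ChunkingTextHandler extends DefaultHandler {
    private final int maxChars;
    private final Consumer<String> sink;
    private final StringBuilder buf;

    ChunkingTextHandler(int maxChars, Consumer<String> sink) {
        if (maxChars <= 0) throw new IllegalArgumentException("maxChars must be positive");
        this.maxChars = maxChars;
        this.sink = sink;
        this.buf = new StringBuilder(maxChars + 1);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            buf.append(ch[i]);
            // Flush once the chunk is full, but wait one more char if we just
            // appended the first half of a surrogate pair.
            if (buf.length() >= maxChars && !Character.isHighSurrogate(ch[i])) {
                emit();
            }
        }
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        // Treat ignorable whitespace (e.g. newlines between elements) as text too.
        characters(ch, start, length);
    }

    @Override
    public void endDocument() {
        emit(); // hand over whatever is left in the buffer
    }

    private void emit() {
        if (buf.length() > 0) {
            sink.accept(buf.toString());
            buf.setLength(0);
        }
    }
}
```

To use it with Tika, the handler would be passed to something like `new AutoDetectParser().parse(stream, handler, new Metadata(), new ParseContext())`; wrapping it as `new BodyContentHandler(handler)` should restrict it to the document's body text. Because chunks are cut at a fixed character count they can end mid-word; flushing in endElement (for example at the end of each paragraph) is an alternative if that matters.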


Nick

