Processing docx documents.

Ian Boston Tue, 09 Feb 2010 01:46:37 -0800

Hi,
I know from other uses of Lucene [1] that parsing docx-type documents (with 
POI) can cause excessive memory usage.

Does anyone know whether the docx text extractors currently used in Sling are 
problematic in this area?
It looks like Tika made some inroads here in its 0.4 release, and it may have 
made more since.

IIRC, parsing Excel spreadsheets in the OOXML (.xlsx) format can cause OOM errors.

Thanks
Ian

1. http://jira.sakaiproject.org/browse/SAK-16808
