Hi,

I know from other uses of Lucene [1] that parsing docx-type documents (with Apache POI) can cause excessive memory usage.
Does anyone know whether the docx text extractors currently used in Sling are problematic in this area? It looks like Tika made some inroads here in the 0.4 release, and it may have made more since. IIRC, parsing Excel spreadsheets in OOXML format (xlsx, the spreadsheet counterpart of docx) can cause OOM errors.

Thanks,
Ian

[1] http://jira.sakaiproject.org/browse/SAK-16808
