[ https://issues.apache.org/jira/browse/CONNECTORS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615520#comment-14615520 ]
Karl Wright commented on CONNECTORS-1219:
-----------------------------------------

bq. This OOM could be resolved by tika write limit.

I don't think so, because it occurs after the LuceneDocument structure has already been built. It occurs on the client.addOrReplace() line:

{code}
LuceneDocument inputDoc = buildDocument(documentURI, document);
client.addOrReplace(documentURI, inputDoc);
{code}

This is likely because Lucene needs some multiple of the maximum document size in order to compress field values. But as long as overall memory consumption is limited by some user-controllable means, it's still OK, and the file size limit should do that.

> Lucene Output Connector
> -----------------------
>
>                 Key: CONNECTORS-1219
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1219
>             Project: ManifoldCF
>          Issue Type: New Feature
>            Reporter: Shinichiro Abe
>            Assignee: Shinichiro Abe
>         Attachments: CONNECTORS-1219-v0.1patch.patch, CONNECTORS-1219-v0.2.patch
>
>
> An output connector for a local Lucene index directly, not via a remote search engine. It would be nice if we could use Lucene's various APIs on the index directly, even though we could do the same thing with a Solr or Elasticsearch index. I assume we can do something with classification, categorization, and tagging, using e.g. the lucene-classification package.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
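A minimal sketch of the sizing argument above: if Lucene transiently needs "some multiple" of the largest document's size while compressing stored field values, then capping document size caps peak indexing memory. The class name, the `OVERHEAD_MULTIPLIER` value, and both method names are illustrative assumptions, not ManifoldCF or Lucene API; the comment only says the multiple exists, not what it is.

```java
// Hypothetical helper illustrating why a per-document size limit bounds
// overall memory use during Lucene indexing. Not real ManifoldCF/Lucene API.
public final class IndexMemoryBudget {

    // Assumed compression/buffering overhead factor; the actual multiple
    // Lucene needs is unspecified in the discussion above.
    private static final long OVERHEAD_MULTIPLIER = 4;

    /** Rough upper bound on transient memory needed to index one document. */
    public static long estimatePeakBytes(long maxDocumentBytes) {
        return OVERHEAD_MULTIPLIER * maxDocumentBytes;
    }

    /** True when a document-size cap keeps estimated peak usage inside a heap budget. */
    public static boolean fitsBudget(long maxDocumentBytes, long heapBudgetBytes) {
        return estimatePeakBytes(maxDocumentBytes) <= heapBudgetBytes;
    }

    public static void main(String[] args) {
        long tenMb = 10L * 1024 * 1024;
        long heap = 256L * 1024 * 1024;
        // A 10 MB document cap fits comfortably in a 256 MB budget under
        // the assumed 4x overhead; a 100 MB cap would not.
        System.out.println(fitsBudget(tenMb, heap));
        System.out.println(fitsBudget(100L * 1024 * 1024, heap));
    }
}
```

Under this model, a user-facing file size limit (as suggested above) is exactly the knob that keeps `estimatePeakBytes` inside the JVM heap, which is why the Tika write limit alone cannot help once the full LuceneDocument is already in memory.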