[
https://issues.apache.org/jira/browse/SOLR-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128405#comment-13128405
]
Yonik Seeley commented on SOLR-2842:
------------------------------------
With all the libraries, configuration, and everything else one would need in
this client, it starts looking very much like a Solr server again! I can even
imagine once one has this fat client, that one would want to be able to accept
requests from others to get the same processing. It almost seems preferable to
just use Solr instances as these special Tika processors.
It might make sense when setting up a cluster to have a bank of Solr servers
dedicated just to rich document processing, which would then forward each
processed document to the correct shard (assuming the new SolrCloud stuff).
Custom code could somehow live in that bank of indexers to avoid an extra copy
of large binary documents, or outside clients could use stream.url to make Solr
stream the large file directly from the source.
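
For illustration only, the stream.url idea could look roughly like the sketch
below from a client's point of view: the client sends Solr a small request that
names the file's location, and Solr fetches the bytes itself. The handler path
/update/extract, the host names, and the helper class are assumptions for the
example and not part of the comment above; remote streaming also has to be
enabled in solrconfig.xml for stream.url to be honored.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Solr or SolrJ.
class StreamUrlSketch {

    // Builds a request URL that asks Solr to pull the document from fileUrl
    // itself, so the client never uploads the large binary.
    // Assumes the extracting handler is mapped at /update/extract.
    static String buildExtractUrl(String solrBase, String fileUrl, String docId) {
        return solrBase + "/update/extract"
                + "?literal.id=" + encode(docId)
                + "&stream.url=" + encode(fileUrl);
    }

    // Form-encodes a query parameter value as UTF-8.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Example values are placeholders.
        System.out.println(buildExtractUrl(
                "http://localhost:8983/solr",
                "http://files.example.com/big report.pdf",
                "doc-1"));
    }
}
```

The resulting URL can be requested with any HTTP client; only the short URL
travels from the client to Solr, while the file itself moves once, from its
source to the Solr node doing the extraction.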
> Re-factor UpdateChain and UpdateProcessor interfaces
> ----------------------------------------------------
>
> Key: SOLR-2842
> URL: https://issues.apache.org/jira/browse/SOLR-2842
> Project: Solr
> Issue Type: Improvement
> Components: update
> Reporter: Jan Høydahl
>
> The UpdateChain's main task is to send SolrInputDocuments through a chain of
> UpdateRequestProcessors in order to transform them in some way and then
> (typically) index them.
> This generic "pipeline" concept would also be useful on the client side
> (SolrJ), so that we could choose to do parts or all of the processing on the
> client. The most prominent use case is extracting text (Tika) from large
> binary documents, residing on local storage on the client(s). Streaming
> hundreds of MB over to Solr for processing is not efficient. See SOLR-1526.
> We're already implementing Tika as an UpdateProcessor in SOLR-1763, and what
> would be more natural than reusing this - and any other processor - on the
> client side?
> However, for this to be possible, some interfaces need to change slightly.