[ 
https://issues.apache.org/jira/browse/SOLR-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128474#comment-13128474
 ] 

Jan Høydahl commented on SOLR-2842:
-----------------------------------

bq. But the update processor should have access to SolrCore. I don't think this 
is something we want to drop. You do want access to the Request object and the 
SolrCore, as you have now.
The UpdateRequestProcessorChain currently depends on SolrCore for getting its 
config from solrconfig.xml, so we'd first need to separate the updateChain 
config from solrconfig, e.g. through SOLR-2841 or similar. Although giving 
Processors "full access" is flexible, it doesn't necessarily give the best APIs. 
Most processors will only need access to the input document and request params. 
In addition, I think schema access for validating input and a resource loader 
for loading their own config from file are good candidates for what to provide 
to Processors. The resource loader on the client side could resolve resources 
locally, or even through the ZK loader.
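
A minimal sketch of what such a narrow API could look like, written without 
any Solr dependencies. Every name here (ProcessorContext, SchemaView, 
ResourceLoader, DocumentProcessor, LanguageTagProcessor, localContext) is 
hypothetical and chosen for illustration only; none of it is existing Solr or 
SolrJ API, and a plain Map stands in for SolrInputDocument.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ProcessorContextSketch {

    /** Hypothetical: read-only schema view, enough for validating input. */
    interface SchemaView {
        boolean hasField(String name);
    }

    /** Hypothetical: lets a processor load its own config by resource name. */
    interface ResourceLoader {
        InputStream openResource(String name) throws IOException;
    }

    /** Hypothetical narrow context: params, schema and resources, no SolrCore. */
    interface ProcessorContext {
        Map<String, String> requestParams();
        SchemaView schema();
        ResourceLoader resourceLoader();
    }

    /** Hypothetical processor contract; the Map stands in for SolrInputDocument. */
    interface DocumentProcessor {
        void process(Map<String, Object> doc, ProcessorContext ctx) throws IOException;
    }

    /** Example processor: sets "language" from the "lang" param or a config file. */
    static class LanguageTagProcessor implements DocumentProcessor {
        @Override
        public void process(Map<String, Object> doc, ProcessorContext ctx) throws IOException {
            if (!ctx.schema().hasField("language")) {
                throw new IllegalStateException("schema has no 'language' field");
            }
            String lang = ctx.requestParams().get("lang");
            if (lang == null) {
                // Fall back to the processor's own config, resolved by the loader.
                try (InputStream in = ctx.resourceLoader().openResource("default-language.txt")) {
                    lang = new String(in.readAllBytes(), StandardCharsets.UTF_8).trim();
                }
            }
            doc.put("language", lang);
        }
    }

    /** A client-side context backed by in-memory data instead of a SolrCore. */
    static ProcessorContext localContext(Map<String, String> params,
                                         Set<String> schemaFields,
                                         Map<String, String> resources) {
        return new ProcessorContext() {
            @Override public Map<String, String> requestParams() { return params; }
            @Override public SchemaView schema() { return schemaFields::contains; }
            @Override public ResourceLoader resourceLoader() {
                return name -> {
                    String body = resources.get(name);
                    if (body == null) {
                        throw new IOException("resource not found: " + name);
                    }
                    return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
                };
            }
        };
    }

    public static void main(String[] args) throws IOException {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "1");
        ProcessorContext ctx = localContext(
                Map.of(),                                  // no "lang" request param
                Set.of("id", "language"),
                Map.of("default-language.txt", "en\n"));
        new LanguageTagProcessor().process(doc, ctx);
        System.out.println(doc.get("language")); // prints "en"
    }
}
```

The same processor could then run on the server with a context backed by the 
real schema and resource loader, or on the client with a local one as above.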

The remaining 5% of processors that really need SolrCore (such as 
RunUpdateProcessor) should implement SolrCoreAware, and UpdateChain should 
statically check for these and throw an exception if an attempt is made to load 
any of them in a context where SolrCore is null.
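
A rough sketch of that load-time check, again free of Solr dependencies. Core, 
CoreAware, Processor, Chain, TrimProcessor and RunUpdateLikeProcessor are 
hypothetical stand-ins (CoreAware mirrors the inform(SolrCore) callback of 
Solr's SolrCoreAware); the real chain would look different.

```java
import java.util.List;

public class CoreAwareCheckSketch {

    /** Hypothetical stand-in for SolrCore. */
    static final class Core { }

    /** Hypothetical stand-in for an update processor. */
    interface Processor {
        String name();
    }

    /** Hypothetical stand-in for SolrCoreAware. */
    interface CoreAware {
        void inform(Core core);
    }

    /** Chain that validates its processors once, when it is built. */
    static final class Chain {
        private final List<Processor> processors;

        /** @param core may be null, e.g. when the chain runs on the client side */
        Chain(List<Processor> processors, Core core) {
            for (Processor p : processors) {
                if (p instanceof CoreAware) {
                    if (core == null) {
                        throw new IllegalStateException(
                                "processor '" + p.name() + "' needs a core, but none is available");
                    }
                    ((CoreAware) p).inform(core);
                }
            }
            this.processors = List.copyOf(processors);
        }

        int size() {
            return processors.size();
        }
    }

    /** A processor that only needs the document, so it loads anywhere. */
    static final class TrimProcessor implements Processor {
        @Override public String name() { return "trim"; }
    }

    /** A processor that, like RunUpdateProcessor, cannot work without a core. */
    static final class RunUpdateLikeProcessor implements Processor, CoreAware {
        Core core;
        @Override public String name() { return "run-update"; }
        @Override public void inform(Core core) { this.core = core; }
    }

    public static void main(String[] args) {
        // Client side: no core, core-free processors load fine.
        Chain clientChain = new Chain(List.of(new TrimProcessor()), null);
        System.out.println(clientChain.size()); // prints 1

        // Client side: a core-aware processor is rejected at load time.
        try {
            new Chain(List.of(new TrimProcessor(), new RunUpdateLikeProcessor()), null);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing when the chain is built, rather than when the first document arrives, 
means a misconfigured client-side chain is caught before any processing starts.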
                
> Re-factor UpdateChain and UpdateProcessor interfaces
> ----------------------------------------------------
>
>                 Key: SOLR-2842
>                 URL: https://issues.apache.org/jira/browse/SOLR-2842
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Jan Høydahl
>
> The UpdateChain's main task is to send SolrInputDocuments through a chain of 
> UpdateRequestProcessors in order to transform them in some way and then 
> (typically) index them.
> This generic "pipeline" concept would also be useful on the client side 
> (SolrJ), so that we could choose to do parts or all of the processing on the 
> client. The most prominent use case is extracting text (Tika) from large 
> binary documents, residing on local storage on the client(s). Streaming 
> hundreds of MB over to Solr for processing is not efficient. See SOLR-1526.
> We're already implementing Tika as an UpdateProcessor in SOLR-1763, and what 
> would be more natural than reusing this - and any other processor - on the 
> client side?
> However, for this to be possible, some interfaces need to change slightly.
