[ 
https://issues.apache.org/jira/browse/SOLR-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129229#comment-13129229
 ] 

Ryan McKinley commented on SOLR-2842:
-------------------------------------

I don't have a real proposal... just thinking about generally reusable pipeline 
code.

bq. Do you suggest to let UpdateProcessor base class implement this interface?

No. Since most domain-specific UpdateProcessors (tika, langid, geonames, etc.) 
can be boiled down to this, I don't think they need access to the whole 
UpdateProcessor -- only sometimes do they need access to 
SolrCore/Schema/ResourceLoader etc.  With minimal dependencies, moving them 
around would be easy.

I was thinking we could have a general TransformingUpdateProcessor that could 
take a list of transformers (or something), rather than having all the 
dependencies
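
To make the idea concrete, here is a rough sketch of what such a transformer 
list could look like. Everything in it is hypothetical: DocumentTransformer, 
TransformingProcessor and addField are made-up names, not existing Solr or 
SolrJ classes, and a plain Map stands in for SolrInputDocument so the example 
has no Solr dependencies at all.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; none of these types exist in Solr. */
public class TransformerPipelineSketch {

    /** Assumed minimal contract: take a document, return the transformed document. */
    public interface DocumentTransformer {
        Map<String, Object> transform(Map<String, Object> doc);
    }

    /** Stand-in for the "TransformingUpdateProcessor" idea: applies transformers in order. */
    public static final class TransformingProcessor {
        private final List<DocumentTransformer> transformers;

        public TransformingProcessor(List<DocumentTransformer> transformers) {
            // Defensive copy so later changes to the caller's list do not affect us.
            this.transformers = new ArrayList<>(transformers);
        }

        public Map<String, Object> process(Map<String, Object> doc) {
            Map<String, Object> current = doc;
            for (DocumentTransformer t : transformers) {
                current = t.transform(current);
            }
            return current;
        }
    }

    /** Toy transformer that returns a copy of the document with one extra field. */
    public static DocumentTransformer addField(String name, Object value) {
        return doc -> {
            Map<String, Object> copy = new LinkedHashMap<>(doc);
            copy.put(name, value);
            return copy;
        };
    }

    public static void main(String[] args) {
        List<DocumentTransformer> chain = new ArrayList<>();
        chain.add(addField("language", "en"));   // pretend language detection
        chain.add(addField("country", "NO"));    // pretend geo lookup

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");

        System.out.println(new TransformingProcessor(chain).process(doc));
    }
}
```

Because the transformers only see the document, the same list could in 
principle be run on the client or wrapped by a server-side processor; a 
transformer that really needs SolrCore or the schema would need some extra 
hook that this sketch does not cover.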

bq. But you still need to construct and initialize the processors even if they 
are wrapped in the interface, thus my suggestion for a client side version of 
the factory.

I'm not convinced that a client side framework is necessary if the interfaces 
were easy enough to deal with directly.  I can see where a DSL would be cool, 
but having a client side NamedListInitializedPlugin seems like a can of worms.


                
> Re-factor UpdateChain and UpdateProcessor interfaces
> ----------------------------------------------------
>
>                 Key: SOLR-2842
>                 URL: https://issues.apache.org/jira/browse/SOLR-2842
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Jan Høydahl
>
> The UpdateChain's main task is to send SolrInputDocuments through a chain of 
> UpdateRequestProcessors in order to transform them in some way and then 
> (typically) index them.
> This generic "pipeline" concept would also be useful on the client side 
> (SolrJ), so that we could choose to do parts or all of the processing on the 
> client. The most prominent use case is extracting text (Tika) from large 
> binary documents, residing on local storage on the client(s). Streaming 
> hundreds of MB over to Solr for processing is not efficient. See SOLR-1526.
> We're already implementing Tika as an UpdateProcessor in SOLR-1763, and what 
> would be more natural than reusing this - and any other processor - on the 
> client side?
> However, for this to be possible, some interfaces need to change slightly.
