[
https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145584#comment-13145584
]
Martijn van Groningen commented on SOLR-1499:
---------------------------------------------
Yes, we can just point to the db or rss core that is also included in the
example. After looking into the code, I have some concerns: when the
SolrEntityProcessor is configured with threads > 1, it seems to me that the
code will fail.
Basically, the SolrQuery that keeps track of the offset is a field of the
rowIterator, and the rowIterator is a field of SolrEntityProcessor (actually
of its superclass). When more than one thread operates on the
SolrEntityProcessor, each thread can overwrite the offset of another thread.
It seems we need some locking.
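
To illustrate the kind of locking this would need, here is a minimal sketch.
It is not the DIH code: PageCursor, claimNextStart and CursorDemo are
hypothetical names, and a plain int stands in for the offset kept in the
SolrQuery. The assumption is that each thread must read and advance the shared
offset as one atomic step, so that no two threads fetch the same page.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo driver: several threads share one cursor, the way
// several DIH threads would share one row iterator.
class CursorDemo {
    // Runs the given number of threads against one shared cursor and
    // returns the set of distinct offsets they claimed.
    static Set<Integer> claimOffsets(int threads, int pagesPerThread, int rows)
            throws InterruptedException {
        PageCursor cursor = new PageCursor(rows);
        Set<Integer> claimed = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < pagesPerThread; i++) {
                    claimed.add(cursor.claimNextStart());
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return claimed;
    }

    public static void main(String[] args) throws InterruptedException {
        Set<Integer> offsets = claimOffsets(4, 1000, 10);
        // If no offset was handed out twice, this equals threads * pages.
        System.out.println("distinct offsets: " + offsets.size());
    }
}

// Hypothetical stand-in for the paging state held in the SolrQuery.
class PageCursor {
    private final int rows;   // page size, like the rows attribute
    private int start = 0;    // offset of the next page to fetch

    PageCursor(int rows) {
        this.rows = rows;
    }

    // Reading and advancing the offset happen under one lock, so two
    // threads can never be handed the same page.
    synchronized int claimNextStart() {
        int claimed = start;
        start += rows;
        return claimed;
    }
}
```

Without the synchronized keyword, two threads could read the same value of
start before either one advances it, which is the overwrite described above.
Giving each thread its own query object would avoid the shared field
altogether, but the threads would then need some other way to divide the
pages among themselves.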
> SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via
> SolrJ
> ---------------------------------------------------------------------------------
>
> Key: SOLR-1499
> URL: https://issues.apache.org/jira/browse/SOLR-1499
> Project: Solr
> Issue Type: New Feature
> Components: contrib - DataImportHandler
> Reporter: Lance Norskog
> Fix For: 3.5, 4.0
>
> Attachments: SOLR-1499.core.rev1182017.patch, SOLR-1499.patch,
> SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch,
> SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch,
> SOLR-1499.tests.rev1182017.patch
>
>
> The SolrEntityProcessor queries an external Solr instance. The Solr documents
> returned are unpacked and emitted as DIH fields.
> The SolrEntityProcessor uses the following attributes:
> * solr='http://localhost:8983/solr/sms'
> ** This gives the URL of the target Solr instance.
> *** Note: the connection to the target Solr uses the binary SolrJ format.
> * query='Jefferson&sort=id+asc'
> ** This gives the base query string used with Solr. It can include any
> standard Solr request parameter. This attribute is processed under the
> variable resolution rules and can be driven in an inner stage of the indexing
> pipeline.
> * rows='10'
> ** This gives the number of rows to fetch per request.
> ** The SolrEntityProcessor always fetches every document that matches the
> request.
> * fields='id,tag'
> ** This selects the fields to be returned from the Solr request.
> ** These must also be declared as <field> elements.
> ** As with all fields, template processors can be used to alter the contents
> to be passed downwards.
> * timeout='30'
> ** This limits the query to 30 seconds. This can be used as a fail-safe to
> prevent the indexing session from freezing up. By default the timeout is 5
> minutes.
> Limitations:
> * Solr errors are not handled correctly.
> * Loop control constructs have not been tested.
> * Multi-valued returned fields have not been tested.
> The unit tests give examples of how to use it as the root entity and an inner
> entity.