[ 
https://issues.apache.org/jira/browse/NUTCH-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488744#comment-13488744
 ] 

Markus Jelsma commented on NUTCH-1480:
--------------------------------------

I think you mean the just-linked issue NUTCH-1377? I just happen to be working 
on that issue. Using the CloudSolrServer will send the docs to the correct shard 
already. The way it works now is sending millions of records to a single node 
which then distributes them again, a waste of IO. That issue will work with this 
issue so I may want to push them in together. It's just a matter of returning 
the correct SolrServer instance and working around the HttpClient issues.

Agreed on deduplication. It would also be hard for this issue to work with 
dedup because not all indices may be identical.
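
For illustration, a rough sketch of the idea in this issue: split the 
configured Solr URL on commas, create one target per URL, and send every 
document to all of them. This is not the attached patch. The DocSink 
interface and all class and method names below are hypothetical stand-ins 
so the example runs without SolrJ; a real implementation would presumably 
return SolrServer instances instead.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiServerSketch {

    /** Hypothetical stand-in for a Solr client; this is NOT the SolrJ API. */
    interface DocSink {
        void add(String doc);
    }

    /** Keeps documents in memory so the fan-out can be inspected. */
    static class RecordingSink implements DocSink {
        final String url;
        final List<String> docs = new ArrayList<>();

        RecordingSink(String url) {
            this.url = url;
        }

        @Override
        public void add(String doc) {
            docs.add(doc);
        }
    }

    /** Splits a comma-delimited URL list, trimming whitespace and skipping empty entries. */
    static List<String> parseUrls(String value) {
        List<String> urls = new ArrayList<>();
        if (value == null) {
            return urls;
        }
        for (String part : value.split(",")) {
            String url = part.trim();
            if (!url.isEmpty()) {
                urls.add(url);
            }
        }
        return urls;
    }

    /** Creates one sink per configured URL, in the order given. */
    static List<RecordingSink> createSinks(String value) {
        List<RecordingSink> sinks = new ArrayList<>();
        for (String url : parseUrls(value)) {
            sinks.add(new RecordingSink(url));
        }
        return sinks;
    }

    /** Sends the same document to every sink. */
    static void write(List<? extends DocSink> sinks, String doc) {
        for (DocSink sink : sinks) {
            sink.add(doc);
        }
    }

    public static void main(String[] args) {
        // Example URLs are made up for this sketch.
        List<RecordingSink> sinks =
            createSinks("http://solr1:8983/solr, http://solr2:8983/solr");
        write(sinks, "doc-1");
        write(sinks, "doc-2");
        for (RecordingSink sink : sinks) {
            System.out.println(sink.url + " received " + sink.docs.size() + " docs");
        }
    }
}
```

Every target receives every document, which is the point when no 
replication is available. It is a different approach from the shard 
routing that CloudSolrServer does.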
                
> SolrIndexer to write to multiple servers.
> -----------------------------------------
>
>                 Key: NUTCH-1480
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1480
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.6
>
>         Attachments: NUTCH-1480-1.6.1.patch
>
>
> SolrUtils should return an array of SolrServers and read the SolrUrl as a 
> comma-delimited list of URLs using Configuration.getString(). SolrWriter 
> should be able to handle this list of SolrServers.
> This is useful if you want to send documents to multiple servers if no 
> replication is available or if you want to send documents to multiple NOCs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
