[jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers.

ASF GitHub Bot (JIRA) Fri, 24 Nov 2017 10:17:31 -0800

    [ 
https://issues.apache.org/jira/browse/NUTCH-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16265473#comment-16265473
 ]


ASF GitHub Bot commented on NUTCH-1480:
---------------------------------------

r0ann3l commented on issue #218: fix for NUTCH-1480 contributed by r0ann3l
URL: https://github.com/apache/nutch/pull/218#issuecomment-346834693
 
 
   Hi @sebastian-nagel, thank you very much for your comments! I agree with 
your suggestions and have included the changes you proposed from your fork.
   
   About indexer-dummy, I also tried to make it work, but it was not possible. 
In theory, you can build as many instances of `IndexWriters` as you want and 
you will always get the same instance, because it is fetched from the cache. 
The first issue I found was that `ObjectCache` uses the `Configuration` object 
itself as the key, and this object is not the same in each call. As a result, 
two instances of `IndexWriters` end up writing to the same file, as you 
say. So I replaced the key of `ObjectCache` with the UUID of the 
`Configuration` object.
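
   To illustrate why the key matters, here is a minimal, self-contained 
sketch. It does not use the real Nutch or Hadoop classes: `Conf`, `Writers`, 
the `conf.uuid` property name, and both lookup helpers are hypothetical 
stand-ins. It assumes only that the configuration object compares by identity 
and that a copy keeps the same UUID.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class CacheKeySketch {

  /** Hypothetical stand-in for a configuration object; it does not override
   *  equals/hashCode, so two instances are equal only if they are the same object. */
  static final class Conf {
    private final Map<String, String> props = new HashMap<>();

    Conf() {
      // Assumption: every fresh configuration is tagged with a UUID property.
      props.put("conf.uuid", UUID.randomUUID().toString());
    }

    /** Copy constructor: a distinct object carrying the same properties (and UUID). */
    Conf(Conf other) {
      props.putAll(other.props);
    }

    String get(String key) {
      return props.get(key);
    }
  }

  /** Hypothetical stand-in for the cached writers object. */
  static final class Writers {}

  // Cache keyed by the configuration object itself (identity-based lookup).
  static final Map<Conf, Writers> BY_OBJECT = new HashMap<>();
  // Cache keyed by the configuration's UUID string (value-based lookup).
  static final Map<String, Writers> BY_UUID = new HashMap<>();

  static Writers getByObject(Conf conf) {
    return BY_OBJECT.computeIfAbsent(conf, c -> new Writers());
  }

  static Writers getByUuid(Conf conf) {
    return BY_UUID.computeIfAbsent(conf.get("conf.uuid"), id -> new Writers());
  }

  public static void main(String[] args) {
    Conf original = new Conf();
    Conf copy = new Conf(original); // same settings, different object

    // Keyed by object: the copy misses the cache, so a second instance is built.
    System.out.println(getByObject(original) == getByObject(copy)); // false
    // Keyed by UUID: both lookups resolve to the same cached instance.
    System.out.println(getByUuid(original) == getByUuid(copy));     // true
  }
}
```

   With the object as key, each copy of the configuration produces its own 
cache entry; with the UUID as key, all copies share one entry.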
   
   Now we have only one instance of `IndexWriters`, but there is another 
problem: when you try to commit the writers in 
`IndexingJob.index(IndexingJob.java:151)`, they have already been closed from 
`IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:44)`. Therefore, 
I moved the `commit()` call from `IndexingJob` to `IndexerOutputFormat`, just 
before the `close()` method is called.
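
   The ordering problem can be sketched as follows. This is only an 
illustration under assumptions: `Writer` and `closeRecordWriter` are 
hypothetical names, not the actual Nutch classes, and the sketch assumes that 
a commit on a closed writer fails.

```java
public class CommitBeforeCloseSketch {

  /** Hypothetical writer: commit() is only legal while the writer is open. */
  static final class Writer implements AutoCloseable {
    private boolean open = true;
    private boolean committed = false;

    void commit() {
      if (!open) {
        throw new IllegalStateException("writer already closed");
      }
      committed = true;
    }

    @Override
    public void close() {
      open = false;
    }

    boolean isCommitted() {
      return committed;
    }
  }

  /** Hypothetical close hook of the output format: commit first, then close. */
  static void closeRecordWriter(Writer writer) {
    writer.commit();
    writer.close();
  }

  public static void main(String[] args) {
    Writer writer = new Writer();
    closeRecordWriter(writer);
    System.out.println(writer.isCommitted()); // true

    // Committing from the job after the output format has closed the writer
    // is too late, which is the failure described above.
    try {
      writer.commit();
    } catch (IllegalStateException e) {
      System.out.println("late commit rejected: " + e.getMessage());
    }
  }
}
```

   Keeping `commit()` next to `close()` in one place means the caller cannot 
get the order wrong.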
   
   I also moved the indexers' description from `IndexingJob` to 
`IndexerOutputFormat`, to avoid building the `IndexWriters` instance twice.
   
   Thanks

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> SolrIndexer to write to multiple servers.
> -----------------------------------------
>
>                 Key: NUTCH-1480
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1480
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>         Attachments: NUTCH-1480-1.6.1.patch, 
> adding-support-for-sharding-indexer-for-solr.patch
>
>
> SolrUtils should return an array of SolrServers and read the SolrUrl as a 
> comma-delimited list of URLs using Configuration.getString(). SolrWriter 
> should be able to handle this list of SolrServers.
> This is useful if you want to send documents to multiple servers if no 
> replication is available or if you want to send documents to multiple NOCs.
> edit:
> This does not replace NUTCH-1377 but complements it. With NUTCH-1377 this 
> issue allows you to index to multiple SolrCloud clusters at the same time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
