[ https://issues.apache.org/jira/browse/NUTCH-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714290#action_12714290 ]
Dmitry Lihachev commented on NUTCH-739:
---------------------------------------

I think that optimizing Solr is not a Hadoop job. It does not need parallelization.

> SolrDeleteDuplications too slow when using hadoop
> -------------------------------------------------
>
> Key: NUTCH-739
> URL: https://issues.apache.org/jira/browse/NUTCH-739
> Project: Nutch
> Issue Type: Bug
> Components: indexer
> Affects Versions: 1.0.0
> Environment: hadoop cluster with 3 nodes
> Map Task Capacity: 6
> Reduce Task Capacity: 6
> Indexer: one instance of solr server (on one of the slave nodes)
> Reporter: Dmitry Lihachev
> Fix For: 1.1
>
> Attachments: NUTCH-739_remove_optimize_on_solr_dedup.patch
>
>
> In my environment I always get many warnings like this during the dedup step:
> {noformat}
> Task attempt_200905270022_0212_r_000003_0 failed to report status for 600 seconds. Killing!
> {noformat}
> Solr logs:
> {noformat}
> INFO: [] webapp=/solr path=/update params={wt=javabin&waitFlush=true&optimize=true&waitSearcher=true&maxSegments=1&version=2.2} status=0 QTime=173741
> May 27, 2009 10:29:27 AM org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: {optimize=} 0 173599
> May 27, 2009 10:29:27 AM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/update params={wt=javabin&waitFlush=true&optimize=true&waitSearcher=true&maxSegments=1&version=2.2} status=0 QTime=173599
> May 27, 2009 10:29:27 AM org.apache.solr.search.SolrIndexSearcher close
> INFO: Closing searc...@2ad9ac58 main
> May 27, 2009 10:29:27 AM org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean getMBeanInfo
> WARNING: Could not getStatistics on info bean org.apache.solr.search.SolrIndexSearcher
> org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed
> ....
> {noformat}
> So I think the problem is the code on line 301 of SolrDeleteDuplications ( solr.optimize() ).
> Because there are several job tasks, each of them tries to optimize the Solr index before closing.
> The simplest way to avoid this bug is to remove this line and send an "<optimize/>" message directly to the Solr server after the dedup step.
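
The workaround described in the issue (no optimize call inside the job tasks, and a single "<optimize/>" message sent once the dedup step has finished) could look roughly like the sketch below. It is written in Python rather than Nutch's Java and is not taken from the attached patch. It assumes the Solr XML update handler is reachable at the base URL plus /update, as the path=/update entries in the log suggest. The function names, the base URL, and the timeout value are hypothetical.

```python
import urllib.request


def build_optimize_request(solr_url):
    """Build (but do not send) a POST that carries a single <optimize/> message.

    solr_url is the assumed base URL of the Solr webapp,
    for example http://localhost:8983/solr (hypothetical).
    """
    update_url = solr_url.rstrip("/") + "/update"
    return urllib.request.Request(
        update_url,
        data=b"<optimize/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def optimize_once(solr_url, timeout=3600):
    """Send the optimize message once, after the dedup job has completed.

    The generous timeout is an assumption: the optimize calls in the log
    above each took roughly 173 seconds.
    """
    request = build_optimize_request(solr_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling optimize_once("http://localhost:8983/solr") from whatever script drives the crawl, right after the dedup job returns, would trigger the optimize exactly once from outside Hadoop, so no reduce task has to sit silent past the 600-second status timeout while Solr merges segments.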