SolrDeleteDuplications too slow when using hadoop -------------------------------------------------
Key: NUTCH-739
URL: https://issues.apache.org/jira/browse/NUTCH-739
Project: Nutch
Issue Type: Bug
Components: indexer
Affects Versions: 1.0.0
Environment: hadoop cluster with 3 nodes; Map Task Capacity: 6; Reduce Task Capacity: 6; Indexer: one instance of Solr server (on one of the slave nodes)
Reporter: Dmitry Lihachev
Fix For: 1.1

In my environment the dedup step always produces many warnings like this:

{noformat}
Task attempt_200905270022_0212_r_000003_0 failed to report status for 600 seconds. Killing!
{noformat}

Solr logs:

{noformat}
INFO: [] webapp=/solr path=/update params={wt=javabin&waitFlush=true&optimize=true&waitSearcher=true&maxSegments=1&version=2.2} status=0 QTime=173741
May 27, 2009 10:29:27 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {optimize=} 0 173599
May 27, 2009 10:29:27 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={wt=javabin&waitFlush=true&optimize=true&waitSearcher=true&maxSegments=1&version=2.2} status=0 QTime=173599
May 27, 2009 10:29:27 AM org.apache.solr.search.SolrIndexSearcher close
INFO: Closing searc...@2ad9ac58 main
May 27, 2009 10:29:27 AM org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean getMBeanInfo
WARNING: Could not getStatistics on info bean org.apache.solr.search.SolrIndexSearcher
org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed
....
{noformat}

So I think the problem is in the piece of code on line 301 of SolrDeleteDuplications (solr.optimize()): because the job runs several reduce tasks, each of them tries to optimize the Solr index before closing. The simplest way to avoid this bug is to remove that line and send an "<optimize/>" message directly to the Solr server once, after the job finishes.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
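A minimal sketch of the suggested workaround, assuming a Solr 1.x instance reachable at solr-host:8983 (hypothetical host/port, not taken from the report): post a single `<optimize/>` to Solr's XML update handler after the dedup job completes, instead of letting every reduce task call solr.optimize() in close().

```shell
#!/bin/sh
# Hypothetical host/port -- substitute the address of your Solr instance.
SOLR_URL="http://solr-host:8983/solr/update"

# One optimize for the whole index, issued once by the job driver,
# not once per reduce task. This is the standard XML update message
# that Solr's /update handler accepts.
curl "$SOLR_URL" \
  -H "Content-Type: text/xml" \
  --data-binary "<optimize/>"
```

Since optimize rewrites the index into a single segment, it can take minutes on a large index (QTime=173741 ms in the logs above); issuing it exactly once avoids both the redundant work and the 600-second task timeouts.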