[ https://issues.apache.org/jira/browse/NUTCH-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
VictorHu updated NUTCH-2205:
----------------------------
    Affects Version/s: 2.3
          Environment: CentOS 6.5, JDK 1.7.0_75, Tomcat 8.0.9, Hadoop 2.5.2, ZooKeeper 3.4.6, HBase 0.98.8, Solr 4.8.1, Nutch 2.3.1
        Fix Version/s: 2.4
          Description:
When the number of Solr docs exceeds 9000, the Nutch solrdedup job fails. This is the log:

http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2
16/01/25 17:02:38 INFO solr.SolrDeleteDuplicates: SolrDeleteDuplicates: starting...
16/01/25 17:02:38 INFO solr.SolrDeleteDuplicates: SolrDeleteDuplicates: Solr url: http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2
16/01/25 17:02:39 INFO client.RMProxy: Connecting to ResourceManager at master.Itble/10.192.1.100:8032
16/01/25 17:02:43 INFO mapreduce.JobSubmitter: number of splits:1
16/01/25 17:02:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1453104806095_0162
16/01/25 17:02:44 INFO impl.YarnClientImpl: Submitted application application_1453104806095_0162
16/01/25 17:02:44 INFO mapreduce.Job: The url to track the job: http://master.Itble:8088/proxy/application_1453104806095_0162/
16/01/25 17:02:44 INFO mapreduce.Job: Running job: job_1453104806095_0162
16/01/25 17:02:54 INFO mapreduce.Job: Job job_1453104806095_0162 running in uber mode : false
16/01/25 17:02:54 INFO mapreduce.Job: map 0% reduce 0%
16/01/25 17:03:02 INFO mapreduce.Job: Task Id : attempt_1453104806095_0162_m_000000_0, Status : FAILED
Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:[http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2, http://10.192.1.101:8080/solr/myEnterpriseCollection_shard1_replica2, http://10.192.1.103:8080/solr/myEnterpriseCollection_shard2_replica1]
    at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:554)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.createRecordReader(SolrDeleteDuplicates.java:291)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:492)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:735)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
16/01/25 17:03:12 INFO mapreduce.Job: Task Id : attempt_1453104806095_0162_m_000000_1, Status : FAILED
Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:[http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2, http://10.192.1.101:8080/solr/myEnterpriseCollection_shard1_replica2, http://10.192.1.103:8080/solr/myEnterpriseCollection_shard2_replica1, http://10.192.1.102:8080/solr/myEnterpriseCollection_shard1_replica1]
    at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:554)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.createRecordReader(SolrDeleteDuplicates.java:291)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:492)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:735)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
16/01/25 17:03:22 INFO mapreduce.Job: Task Id : attempt_1453104806095_0162_m_000000_2, Status : FAILED
Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:[http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2, http://10.192.1.103:8080/solr/myEnterpriseCollection_shard2_replica1, http://10.192.1.102:8080/solr/myEnterpriseCollection_shard1_replica1]
    at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:554)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.createRecordReader(SolrDeleteDuplicates.java:291)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:492)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:735)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
16/01/25 17:03:31 INFO mapreduce.Job: map 100% reduce 100%
16/01/25 17:03:31 INFO mapreduce.Job: Job job_1453104806095_0162 failed with state FAILED due to: Task failed task_1453104806095_0162_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/01/25 17:03:31 INFO mapreduce.Job: Counters: 8
    Job Counters
        Failed map tasks=4
        Launched map tasks=4
        Other local map tasks=4
        Total time spent by all maps in occupied slots (ms)=30150
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=30150
        Total vcore-seconds taken by all map tasks=30150
        Total megabyte-seconds taken by all map tasks=46310400

              Summary: Nutch solrdedup error in solrcloud for larger docs  (was: Nutch solrdedup error in solrcloud for doc)

> Nutch solrdedup error in solrcloud for larger docs
> ---------------------------------------------------
>
>                 Key: NUTCH-2205
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2205
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 2.3
>         Environment: CentOS 6.5, JDK 1.7.0_75, Tomcat 8.0.9, Hadoop 2.5.2, ZooKeeper 3.4.6, HBase 0.98.8, Solr 4.8.1, Nutch 2.3.1
>            Reporter: VictorHu
>             Fix For: 2.4

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)