Problem while updating crawldb from segments directory
Hi, I posted the same query a few weeks back; sorry for asking it again. I am having a problem updating the crawldb from the crawled segments. When I run the updatedb command it takes too long, and after 1200 seconds the crawldb update fails, so I am unable to do incremental crawling. Is this a bug? Please respond. Thanks in advance.
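For context, the update step in question is the standard updatedb call, and a failure after a fixed 1200 seconds looks more like the MapReduce task timeout than a Nutch bug. A minimal sketch of the command and the timeout knob, assuming the "crawled" directory layout used elsewhere in this digest and Hadoop 0.19/0.20-era property names (verify against your version):

    # Update the crawldb from all fetched segments:
    bin/nutch updatedb crawled/crawldb -dir crawled/segments

    # If tasks are killed for not reporting status, the usual knob is
    # mapred.task.timeout (milliseconds) in conf/hadoop-site.xml, inside
    # the existing <configuration> element:
    #
    #   <property>
    #     <name>mapred.task.timeout</name>
    #     <value>1800000</value>   <!-- raise from the 600000 default -->
    #   </property>
    #
    grep -A 2 mapred.task.timeout conf/hadoop-site.xml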
Problem at the end of fetching
I am trying to crawl a seed list of 5,000 URLs. It was working fine, but at the end of fetching at depth 1 the process failed with the messages below. Can anyone suggest what the problem may be?

attempt_201003311259_0003_m_03_2: fetching http://www.law.louisville.edu/news-events/admissions/feed
attempt_201003311259_0003_m_03_2: -activeThreads=100, spinWaiting=0, fetchQueues.totalSize=4993
attempt_201003311259_0003_m_03_2: fetch of http://dualdegree.seas.wustl.edu/ failed with: java.net.UnknownHostException: dualdegree.seas.wustl.edu
attempt_201003311259_0003_m_03_2: fetching http://www.couchsurfing.org/ambassador.html
attempt_201003311259_0003_m_03_2: fetching http://www.niu.edu/northerntoday/contact.shtml
attempt_201003311259_0003_m_03_2: fetching http://www.spertusshop.org/gifts-for-weddings-c-126.html
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:969)
        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1003)
runbot: fetch 20100331130047 at depth 1 failed.
runbot: Deleting segment 20100331130047.
--- Beginning crawl at depth 2 of 3 ---
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawled/segments/20100331165040
Generator: filtering: true
Generator: topN: 80
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawled/segments/20100331165040
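For anyone hitting the same thing: the UnknownHostException above only fails that single URL; the "Job failed!" thrown from JobClient.runJob means entire fetch map tasks died, and the real cause is usually in the per-task logs rather than this summary output. A hedged sketch of how to look closer, with the segment name taken from the log above and an illustrative -threads value (runbot normally deletes a failed segment, so this applies to a run where the segment still exists):

    # Re-run the fetch by hand to surface the underlying task error:
    bin/nutch fetch crawled/segments/20100331130047 -threads 50

    # In local mode the cause lands in logs/hadoop.log; on a cluster,
    # check the per-attempt userlogs on each tasktracker:
    ls logs/userlogs/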
current leaseholder is trying to recreate file.
Does anyone have insight into the following messages?

attempt_201003301923_0007_m_00_0: -activeThreads=100, spinWaiting=0, fetchQueues.totalSize=4998
attempt_201003301923_0007_m_00_0: -activeThreads=100, spinWaiting=0, fetchQueues.totalSize=4998
attempt_201003301923_0007_m_00_0: Aborting with 100 hung threads.
Task attempt_201003301923_0007_m_04_0 failed to report status for 1865 seconds. Killing!
Task attempt_201003301923_0007_r_00_0 failed to report status for 1243 seconds. Killing!
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /user/nutch/crawled/segments/20100330193414/crawl_fetch/part-2/index for DFSClient_attempt_201003301923_0007_r_02_1 on client xxx.xxx.xxx.xxx because current leaseholder is trying to recreate file.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1055)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:998)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:301)
        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
        at org.apache.hadoop.ipc.Client.call(Client.java:697)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy1.create(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy1.create(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2585)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:454)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:190)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:487)
        at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.<init>(SequenceFile.java:1198)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:401)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:306)
        at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:160)
        at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:134)
        at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:92)
        at org.apache.nutch.fetcher.FetcherOutputFormat.getRecordWriter(FetcherOutputFormat.java:66)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:404)
        at org.apache.hadoop.mapred.Child.main(Child.java:158)
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /user/nutch/crawled/segments/20100330193414/crawl_fetch/part-2/index for DFSClient_attempt_201003301923_0007_r_02_2 on client xxx.xxx.xxx.xxx because current leaseholder is trying to recreate file.
        [same stack trace as above, truncated in the original message]
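For what it's worth: attempts _1 and _2 both trying to create the same crawl_fetch/part-2/index file is the signature of a retried or speculative reduce attempt racing the first, hung attempt for the HDFS lease on that file. A workaround often suggested for this is turning speculative execution off; a minimal sketch, assuming Hadoop 0.19/0.20-era property names (check the defaults shipped with your version):

    # conf/hadoop-site.xml, inside the existing <configuration> element:
    #
    #   <property>
    #     <name>mapred.map.tasks.speculative.execution</name>
    #     <value>false</value>
    #   </property>
    #   <property>
    #     <name>mapred.reduce.tasks.speculative.execution</name>
    #     <value>false</value>
    #   </property>
    #
    grep -B 1 -A 2 speculative conf/hadoop-site.xml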
Problem with writing index
I was trying a crawl with 200 seeds. In previous runs it created the index without any problem, but now when I start the crawl it shows the following exception at depth 2:

attempt_201003301923_0007_m_00_0: Aborting with 100 hung threads.
Task attempt_201003301923_0007_m_04_0 failed to report status for 1865 seconds. Killing!
Task attempt_201003301923_0007_r_00_0 failed to report status for 1243 seconds. Killing!
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /user/nutch/crawled/segments/20100330193414/crawl_fetch/part-2/index for DFSClient_attempt_201003301923_0007_r_02_1 on client 192.168.101.155 because current leaseholder is trying to recreate file.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1055)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:998)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:301)
        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
        at org.apache.hadoop.ipc.Client.call(Client.java:697)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy1.create(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy1.create(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2585)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:454)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:190)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:487)
        at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.<init>(SequenceFile.java:1198)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:401)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:306)
        at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:160)
        at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:134)
        at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:92)
        at org.apache.nutch.fetcher.FetcherOutputFormat.getRecordWriter(FetcherOutputFormat.java:66)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:404)
        at org.apache.hadoop.mapred.Child.main(Child.java:158)
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /user/nutch/crawled/segments/20100330193414/crawl_fetch/part-2/index for DFSClient_attempt_201003301923_0007_r_02_2 on client 192.168.101.155 because current leaseholder is trying to recreate file.
        [same stack trace as above, truncated in the original message]
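This is the same AlreadyBeingCreatedException as in the previous message, so the speculative-execution workaround sketched there applies here as well. One extra note: the failed attempts can leave a half-written crawl_fetch directory behind, so the segment may need to be regenerated. A hedged recovery sketch, with the segment path taken from the trace, an illustrative -topN value, and -rmr being the pre-0.21 HDFS delete syntax:

    # Drop the broken segment, then generate and fetch a fresh one:
    bin/hadoop fs -rmr /user/nutch/crawled/segments/20100330193414
    bin/nutch generate crawled/crawldb crawled/segments -topN 200
    # fetch the segment path that generate just printed:
    bin/nutch fetch crawled/segments/<new-segment>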
Problem when using updatedb
Hi, I am having a problem updating the crawldb from the crawled segments. When I run the updatedb command it takes too long, and after 1200 seconds the crawldb update fails, so I am unable to do incremental crawling. Is this a bug? Please respond. Thanks in advance.
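If it helps narrow this down: running the update one segment at a time, instead of over the whole directory, shows whether a single segment is the one that hangs past the timeout. A sketch, assuming the "crawled" layout used earlier in this digest:

    # Update segment by segment so a hang can be pinned to one segment:
    for seg in crawled/segments/*; do
      echo "updating from $seg"
      bin/nutch updatedb crawled/crawldb "$seg"
    done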