Problem while updating crawldb from segments directory

2010-04-27 Thread hareesh

Hi,
I posted the same query a few weeks back; sorry for asking it again.
I am having a problem while updating the crawldb from the crawled
segments. When I try to run the command it takes too long; after 1200
seconds the crawldb update fails, so I am unable to do
incremental crawling. Is it a bug? Please respond. Thanks in advance.
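A note on the symptom above: a job that dies after a fixed number of seconds often means a map/reduce task was killed for not reporting status. If that is what is happening here (an assumption, not a confirmed diagnosis), one workaround is to raise the task timeout in mapred-site.xml:

```xml
<!-- mapred-site.xml: raise the task timeout (in milliseconds).
     Hypothetical workaround: this only helps if the updatedb job is
     being killed for failing to report status, not if it is genuinely
     stuck. The Hadoop default is 600000 (10 minutes). -->
<property>
  <name>mapred.task.timeout</name>
  <value>3600000</value>
</property>
```

After changing this, the JobTracker and TaskTrackers need to be restarted for the new value to take effect.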
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-while-updating-crawldb-from-segments-directory-tp758754p758754.html
Sent from the Nutch - User mailing list archive at Nabble.com.


Problem at the end of fetching

2010-03-31 Thread hareesh

I am trying to crawl a seed list of 5000 URLs. It was working fine, but at the end
of fetching at depth 1 the process failed with the messages below. Can
anyone suggest what the problem may be?

attempt_201003311259_0003_m_03_2: fetching
http://www.law.louisville.edu/news-events/admissions/feed
attempt_201003311259_0003_m_03_2: -activeThreads=100, spinWaiting=0,
fetchQueues.totalSize=4993
attempt_201003311259_0003_m_03_2: fetch of
http://dualdegree.seas.wustl.edu/ failed with:
java.net.UnknownHostException: dualdegree.seas.wustl.edu
attempt_201003311259_0003_m_03_2: fetching
http://www.couchsurfing.org/ambassador.html
attempt_201003311259_0003_m_03_2: fetching
http://www.niu.edu/northerntoday/contact.shtml
attempt_201003311259_0003_m_03_2: fetching
http://www.spertusshop.org/gifts-for-weddings-c-126.html
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:969)
at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1003)
runbot: fetch 20100331130047 at depth 1 failed.
runbot: Deleting segment 20100331130047.
--- Beginning crawl at depth 2 of 3 ---
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawled/segments/20100331165040
Generator: filtering: true
Generator: topN: 80
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawled/segments/20100331165040

-- 
View this message in context: 
http://n3.nabble.com/Problem-at-the-end-of-fetching-tp688124p688124.html


current leaseholder is trying to recreate file.

2010-03-30 Thread hareesh

Does anyone have insight into the following messages?

attempt_201003301923_0007_m_00_0: -activeThreads=100, spinWaiting=0,
fetchQueues.totalSize=4998
attempt_201003301923_0007_m_00_0: -activeThreads=100, spinWaiting=0,
fetchQueues.totalSize=4998
attempt_201003301923_0007_m_00_0: Aborting with 100 hung threads.
Task attempt_201003301923_0007_m_04_0 failed to report status for 1865
seconds. Killing!
Task attempt_201003301923_0007_r_00_0 failed to report status for 1243
seconds. Killing!
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to
create file
/user/nutch/crawled/segments/20100330193414/crawl_fetch/part-2/index for
DFSClient_attempt_201003301923_0007_r_02_1 on client xxx.xxx.xxx.xxx
because current leaseholder is trying to recreate file.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1055)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:998)
at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:301)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

at org.apache.hadoop.ipc.Client.call(Client.java:697)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at $Proxy1.create(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2585)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:454)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:190)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:487)
at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.<init>(SequenceFile.java:1198)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:401)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:306)
at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:160)
at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:134)
at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:92)
at org.apache.nutch.fetcher.FetcherOutputFormat.getRecordWriter(FetcherOutputFormat.java:66)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:404)
at org.apache.hadoop.mapred.Child.main(Child.java:158)

org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to
create file
/user/nutch/crawled/segments/20100330193414/crawl_fetch/part-2/index for
DFSClient_attempt_201003301923_0007_r_02_2 on client xxx.xxx.xxx.xxx
because current leaseholder is trying to recreate file.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1055)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:998)
at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:301)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

at org.apache.hadoop.ipc.Client.call(Client.java:697)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at $Proxy1.create(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2585)
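One pattern that can produce an AlreadyBeingCreatedException like the one above (a hypothesis, not a confirmed diagnosis) is speculative execution: a second attempt of the same reduce task tries to create the same segment output file on HDFS while the first attempt still holds the lease. If that is the cause, a sketch of disabling speculative reduce attempts in mapred-site.xml would be:

```xml
<!-- mapred-site.xml: disable speculative execution for reduces.
     Hypothetical workaround: prevents two attempts of the same reduce
     from racing to create the same HDFS output file, at the cost of
     losing speculative retries for slow reducers. -->
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```

Note that the retried attempts here (`_r_000002_1`, `_r_000002_2`) may also simply be re-executions after the earlier hung-thread kills, in which case the root cause is whatever made the first attempt stop reporting status.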

Problem with writing index

2010-03-30 Thread hareesh

I was trying a crawl with 200 seeds. In previous runs it created the
index without any problem, but now when I start the crawl it shows the
following exception at depth 2:

attempt_201003301923_0007_m_00_0: Aborting with 100 hung threads.
Task attempt_201003301923_0007_m_04_0 failed to report status for 1865
seconds. Killing!
Task attempt_201003301923_0007_r_00_0 failed to report status for 1243
seconds. Killing!
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to
create file
/user/nutch/crawled/segments/20100330193414/crawl_fetch/part-2/index for
DFSClient_attempt_201003301923_0007_r_02_1 on client 192.168.101.155
because current leaseholder is trying to recreate file.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1055)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:998)
at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:301)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

at org.apache.hadoop.ipc.Client.call(Client.java:697)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at $Proxy1.create(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2585)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:454)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:190)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:487)
at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.<init>(SequenceFile.java:1198)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:401)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:306)
at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:160)
at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:134)
at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:92)
at org.apache.nutch.fetcher.FetcherOutputFormat.getRecordWriter(FetcherOutputFormat.java:66)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:404)
at org.apache.hadoop.mapred.Child.main(Child.java:158)

org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to
create file
/user/nutch/crawled/segments/20100330193414/crawl_fetch/part-2/index for
DFSClient_attempt_201003301923_0007_r_02_2 on client 192.168.101.155
because current leaseholder is trying to recreate file.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1055)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:998)
at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:301)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

at org.apache.hadoop.ipc.Client.call(Client.java:697)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at $Proxy1.create(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2585)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:454)

Problem when using updatedb

2010-03-30 Thread hareesh

Hi,

I am having a problem while updating the crawldb from the crawled
segments. When I try to run the command it takes too long; after 1200
seconds the crawldb update fails, so I am unable to do
incremental crawling. Is it a bug? Please respond. Thanks in advance.
-- 
View this message in context: 
http://n3.nabble.com/Problem-when-using-updatedb-tp685806p685806.html