Your patch seems to have done the trick with the segment reader. Here is the output. I have removed most of the page content, as it would otherwise turn this email
into a massive pile of junk in some email clients.
[command output]
Version: 2
url: http://acadisc.com/
base: http://acadisc.com/
contentType: text/html
metadata: Content-Length=9292 Connection=close ETag="3969a30-244c-462eae09"
nutch.segment.name=20070607150908 nutch.crawl.score=1.0 Date=Fri, 08 Jun 2007
22:05:45 GMT Accept-Ranges=bytes Server=Apache Content-Type=text/html
Last-Modified=Wed, 25 Apr 2007 01:25:29 GMT
Content:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
[html source content - removed]
</HTML>
Crawl Fetch::
Version: 5
Status: 33 (fetch_success)
Fetch time: Fri Jun 08 18:11:25 EDT 2007
Modified time: Wed Dec 31 19:00:00 EST 1969
Retries since fetch: 0
Retry interval: 30.0 seconds (3.4722223E-4 days)
Score: 1.0
Signature: c079280b4afb4347372982d5a034d51b
Metadata: _ngt_:1181243348572 _pst_:success(1), lastModified=0
----- Original Message ----
From: Doğacan Güney <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Monday, June 18, 2007 2:01:14 AM
Subject: Re: Indexing problems in nutch-nightly
On 6/18/07, Sean Dean <[EMAIL PROTECTED]> wrote:
> There was no result because it never completes; the process just
> hangs with zero processor utilization. There was nothing in the logs to show
> you, but I took a stack trace before killing the process completely, and here
> it is:
>
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (diablo-1.5.0_07-b01 mixed
> mode):
> "Low Memory Detector" daemon prio=5 tid=0x00000000006cfc00 nid=0x6d5800
> runnable [0x0000000000000000..0x0000000000000000]
> "CompilerThread1" daemon prio=9 tid=0x00000000006c9c00 nid=0x6cf800 waiting
> on condition [0x0000000000000000..0x00007fffff1f4320]
> "CompilerThread0" daemon prio=9 tid=0x00000000006c3c00 nid=0x6c9800 waiting
> on condition [0x0000000000000000..0x00007fffff2f5400]
> "AdapterThread" daemon prio=9 tid=0x00000000006bac00 nid=0x6c3800 waiting on
> condition [0x0000000000000000..0x0000000000000000]
> "Signal Dispatcher" daemon prio=9 tid=0x00000000006a7c00 nid=0x6ba800 waiting
> on condition [0x0000000000000000..0x0000000000000000]
> "Finalizer" daemon prio=8 tid=0x00000000006a7000 nid=0x6a7800 in
> Object.wait() [0x00007fffff5f9000..0x00007fffff5f9910]
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000008b7860ad0> (a
> java.lang.ref.ReferenceQueue$Lock)
> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
> - locked <0x00000008b7860ad0> (a java.lang.ref.ReferenceQueue$Lock)
> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
> at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
> "Reference Handler" daemon prio=10 tid=0x000000000062b800 nid=0x62bc00 in
> Object.wait() [0x00007fffff6fa000..0x00007fffff6fac90]
> "main" prio=5 tid=0x0000000000516800 nid=0x516000 waiting on condition
> [0x00007fffffffc000..0x00007fffffffd2f0]
> at java.lang.Thread.sleep(Native Method)
> at org.apache.nutch.segment.SegmentReader.get(SegmentReader.java:348)
> at org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:590)
> "VM Thread" prio=9 tid=0x000000000065f200 nid=0x62b400 runnable
> "GC task thread#0 (ParallelGC)" prio=5 tid=0x0000000000527c00 nid=0x5af400
> runnable
> "GC task thread#1 (ParallelGC)" prio=5 tid=0x00000000005b5200 nid=0x5bd000
> runnable
> "VM Periodic Task Thread" prio=9 tid=0x0000000000527800 nid=0x6dc800 waiting
> on condition
>
>
>
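In the dump above, `main` is parked in `Thread.sleep` inside `SegmentReader.get` while every other thread belongs to the JVM itself (GC, compiler, finalizer and so on). That is what a sleep-polling wait looks like when the condition it is waiting for can no longer become true, for example because a worker thread died without signalling. The Java sketch below is a hypothetical illustration of that general pattern and of one common guard: decrementing the completion counter in a `finally` block so a failing worker still signals. It is not the actual Nutch code; `WorkerWaitSketch` and `runAll` are invented names, and whether the linked patch takes this approach is not confirmed here. It also uses Java 8 lambdas, unlike the Java 1.5 VM in the dump.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not Nutch's SegmentReader implementation.
public class WorkerWaitSketch {

    /**
     * Starts one thread per task and waits for all of them to finish.
     * Returns the number of tasks that threw a RuntimeException.
     */
    static int runAll(List<Runnable> tasks) throws InterruptedException {
        final AtomicInteger remaining = new AtomicInteger(tasks.size());
        final AtomicInteger failures = new AtomicInteger(0);
        List<Thread> threads = new ArrayList<>();

        for (Runnable task : tasks) {
            Thread t = new Thread(() -> {
                try {
                    task.run();
                } catch (RuntimeException e) {
                    failures.incrementAndGet();
                } finally {
                    // If this decrement were skipped when task.run() throws,
                    // the polling loop below would sleep forever with no
                    // worker threads left alive.
                    remaining.decrementAndGet();
                }
            });
            threads.add(t);
            t.start();
        }

        // Sleep-polling wait of the kind seen in the thread dump.
        while (remaining.get() > 0) {
            Thread.sleep(10);
        }
        // Join to make sure every worker has fully terminated.
        for (Thread t : threads) {
            t.join();
        }
        return failures.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<Runnable> tasks = new ArrayList<>();
        tasks.add(() -> { });
        tasks.add(() -> { throw new IllegalStateException("simulated reader failure"); });
        tasks.add(() -> { });
        int failed = runAll(tasks);
        System.out.println("finished, failed tasks: " + failed);
    }
}
```

Running `main` should print `finished, failed tasks: 1` and return instead of hanging, because the failing task still decrements `remaining` on its way out.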
Ah, non-debuggable problems... so much fun :)
Anyway, it seems you are running into the problem described here:
http://www.nabble.com/bug-in-SegmentReader-tf3788992.html
I have put up a "patchified" version here:
http://www.ceng.metu.edu.tr/~e1345172/segment_reader_hang.patch
Can you retry with this patch?
Thanks!
--
Doğacan Güney
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general