On 6/18/07, Sean Dean <[EMAIL PROTECTED]> wrote:
> Your patch seemed to do the trick with the segment reader. Here was the 
> output, I have removed most of the page content as it would otherwise turn 
> this email into a massive pile of junk in some email clients.
>
> [command output]
>
> Version: 2
> url: http://acadisc.com/
> base: http://acadisc.com/
> contentType: text/html
> metadata: Content-Length=9292 Connection=close ETag="3969a30-244c-462eae09" 
> nutch.segment.name=20070607150908 nutch.crawl.score=1.0 Date=Fri, 08 Jun 2007 
> 22:05:45 GMT Accept-Ranges=bytes Server=Apache Content-Type=text/html 
> Last-Modified=Wed, 25 Apr 2007 01:25:29 GMT
> Content:
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
> <HTML>
>
>
> [html source content - removed]
>
> </HTML>
>
> Crawl Fetch::
> Version: 5
> Status: 33 (fetch_success)
> Fetch time: Fri Jun 08 18:11:25 EDT 2007
> Modified time: Wed Dec 31 19:00:00 EST 1969
> Retries since fetch: 0
> Retry interval: 30.0 seconds (3.4722223E-4 days)
> Score: 1.0
> Signature: c079280b4afb4347372982d5a034d51b
> Metadata: _ngt_:1181243348572 _pst_:success(1), lastModified=0

I believe I have finally tracked down the bug. It is caused by a stupid
piece of my own code from NUTCH-443. You are running your fetcher in
parsing mode, right? If you run parse as a separate job, everything
should work fine. I know I have been asking a lot, but could you
verify that? Meanwhile, I will try to create a patch for Fetcher and
Fetcher2.
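
To run parse as a separate job, something along these lines should
work. This is only a sketch: it assumes parsing in your setup is turned
on through the stock fetcher.parse property and that you override
properties in conf/nutch-site.xml. The fragment goes inside the
existing <configuration> element.

```xml
<!-- Assumed override in conf/nutch-site.xml: fetch only, do not parse -->
<property>
  <name>fetcher.parse</name>
  <value>false</value>
</property>
```

Once the fetch finishes, "bin/nutch parse <segment>" parses that
segment as its own job, where <segment> is the path to the segment
directory you just fetched.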

>
>
> ----- Original Message ----
> From: Doğacan Güney <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Monday, June 18, 2007 2:01:14 AM
> Subject: Re: Indexing problems in nutch-nightly
>
>
> On 6/18/07, Sean Dean <[EMAIL PROTECTED]> wrote:
> > There was no result due to the fact it does not complete and the process 
> > just hangs with zero processor utilization. There was nothing in the logs 
> > to show you, but I took a stack trace before killing the process completely 
> > and here it is;
> >
> > Full thread dump Java HotSpot(TM) 64-Bit Server VM (diablo-1.5.0_07-b01 
> > mixed mode):
> > "Low Memory Detector" daemon prio=5 tid=0x00000000006cfc00 nid=0x6d5800 
> > runnable [0x0000000000000000..0x0000000000000000]
> > "CompilerThread1" daemon prio=9 tid=0x00000000006c9c00 nid=0x6cf800 waiting 
> > on condition [0x0000000000000000..0x00007fffff1f4320]
> > "CompilerThread0" daemon prio=9 tid=0x00000000006c3c00 nid=0x6c9800 waiting 
> > on condition [0x0000000000000000..0x00007fffff2f5400]
> > "AdapterThread" daemon prio=9 tid=0x00000000006bac00 nid=0x6c3800 waiting 
> > on condition [0x0000000000000000..0x0000000000000000]
> > "Signal Dispatcher" daemon prio=9 tid=0x00000000006a7c00 nid=0x6ba800 
> > waiting on condition [0x0000000000000000..0x0000000000000000]
> > "Finalizer" daemon prio=8 tid=0x00000000006a7000 nid=0x6a7800 in 
> > Object.wait() [0x00007fffff5f9000..0x00007fffff5f9910]
> >         at java.lang.Object.wait(Native Method)
> >         - waiting on <0x00000008b7860ad0> (a 
> > java.lang.ref.ReferenceQueue$Lock)
> >         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
> >         - locked <0x00000008b7860ad0> (a java.lang.ref.ReferenceQueue$Lock)
> >         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
> >         at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
> > "Reference Handler" daemon prio=10 tid=0x000000000062b800 nid=0x62bc00 in 
> > Object.wait() [0x00007fffff6fa000..0x00007fffff6fac90]
> > "main" prio=5 tid=0x0000000000516800 nid=0x516000 waiting on condition 
> > [0x00007fffffffc000..0x00007fffffffd2f0]
> >         at java.lang.Thread.sleep(Native Method)
> >         at 
> > org.apache.nutch.segment.SegmentReader.get(SegmentReader.java:348)
> >         at 
> > org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:590)
> > "VM Thread" prio=9 tid=0x000000000065f200 nid=0x62b400 runnable
> > "GC task thread#0 (ParallelGC)" prio=5 tid=0x0000000000527c00 nid=0x5af400 
> > runnable
> > "GC task thread#1 (ParallelGC)" prio=5 tid=0x00000000005b5200 nid=0x5bd000 
> > runnable
> > "VM Periodic Task Thread" prio=9 tid=0x0000000000527800 nid=0x6dc800 
> > waiting on condition
> >
> >
> >
>
> Ah, non-debuggable problems.... so much fun:)
>
> Anyway, it seems you are running into the problem described here:
> http://www.nabble.com/bug-in-SegmentReader-tf3788992.html
>
> I have put up a "patchified" version here:
> http://www.ceng.metu.edu.tr/~e1345172/segment_reader_hang.patch
>
> Can you retry with this patch?
>
> Thanks!
>
> --
> Doğacan Güney


-- 
Doğacan Güney
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general