On 6/18/07, Sean Dean <[EMAIL PROTECTED]> wrote:
> Your patch seemed to do the trick with the segment reader. Here was the
> output, I have removed most of the page content as it would otherwise turn
> this email into a massive pile of junk in some email clients.
>
> [command output]
>
> Version: 2
> url: http://acadisc.com/
> base: http://acadisc.com/
> contentType: text/html
> metadata: Content-Length=9292 Connection=close ETag="3969a30-244c-462eae09"
> nutch.segment.name=20070607150908 nutch.crawl.score=1.0 Date=Fri, 08 Jun 2007
> 22:05:45 GMT Accept-Ranges=bytes Server=Apache Content-Type=text/html
> Last-Modified=Wed, 25 Apr 2007 01:25:29 GMT
> Content:
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
> <HTML>
>
>
> [html source content - removed]
>
> </HTML>
>
> Crawl Fetch::
> Version: 5
> Status: 33 (fetch_success)
> Fetch time: Fri Jun 08 18:11:25 EDT 2007
> Modified time: Wed Dec 31 19:00:00 EST 1969
> Retries since fetch: 0
> Retry interval: 30.0 seconds (3.4722223E-4 days)
> Score: 1.0
> Signature: c079280b4afb4347372982d5a034d51b
> Metadata: _ngt_:1181243348572 _pst_:success(1), lastModified=0

I believe I finally tracked down the bug. It is related to a stupid piece of
my code from NUTCH-443. You are running your fetcher in parsing mode, right?
If you run parse as a separate job, everything should work fine. I know I
have been asking a lot, but could you verify that? Meanwhile, I will try to
create a patch for Fetcher and Fetcher2.

>
>
> ----- Original Message ----
> From: Doğacan Güney <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Monday, June 18, 2007 2:01:14 AM
> Subject: Re: Indexing problems in nutch-nightly
>
>
> On 6/18/07, Sean Dean <[EMAIL PROTECTED]> wrote:
> > There was no result due to the fact it does not complete and the process
> > just hangs with zero processor utilization. There was nothing in the logs
> > to show you, but I took a stack trace before killing the process completely
> > and here it is;
> >
> > Full thread dump Java HotSpot(TM) 64-Bit Server VM (diablo-1.5.0_07-b01
> > mixed mode):
> > "Low Memory Detector" daemon prio=5 tid=0x00000000006cfc00 nid=0x6d5800
> > runnable [0x0000000000000000..0x0000000000000000]
> > "CompilerThread1" daemon prio=9 tid=0x00000000006c9c00 nid=0x6cf800 waiting
> > on condition [0x0000000000000000..0x00007fffff1f4320]
> > "CompilerThread0" daemon prio=9 tid=0x00000000006c3c00 nid=0x6c9800 waiting
> > on condition [0x0000000000000000..0x00007fffff2f5400]
> > "AdapterThread" daemon prio=9 tid=0x00000000006bac00 nid=0x6c3800 waiting
> > on condition [0x0000000000000000..0x0000000000000000]
> > "Signal Dispatcher" daemon prio=9 tid=0x00000000006a7c00 nid=0x6ba800
> > waiting on condition [0x0000000000000000..0x0000000000000000]
> > "Finalizer" daemon prio=8 tid=0x00000000006a7000 nid=0x6a7800 in
> > Object.wait() [0x00007fffff5f9000..0x00007fffff5f9910]
> >     at java.lang.Object.wait(Native Method)
> >     - waiting on <0x00000008b7860ad0> (a
> >     java.lang.ref.ReferenceQueue$Lock)
> >     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
> >     - locked <0x00000008b7860ad0> (a
> >     java.lang.ref.ReferenceQueue$Lock)
> >     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
> >     at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
> > "Reference Handler" daemon prio=10 tid=0x000000000062b800 nid=0x62bc00 in
> > Object.wait() [0x00007fffff6fa000..0x00007fffff6fac90]
> > "main" prio=5 tid=0x0000000000516800 nid=0x516000 waiting on condition
> > [0x00007fffffffc000..0x00007fffffffd2f0]
> >     at java.lang.Thread.sleep(Native Method)
> >     at
> >     org.apache.nutch.segment.SegmentReader.get(SegmentReader.java:348)
> >     at
> >     org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:590)
> > "VM Thread" prio=9 tid=0x000000000065f200 nid=0x62b400 runnable
> > "GC task thread#0 (ParallelGC)" prio=5 tid=0x0000000000527c00 nid=0x5af400
> > runnable
> > "GC task thread#1 (ParallelGC)" prio=5 tid=0x00000000005b5200 nid=0x5bd000
> > runnable
> > "VM Periodic Task Thread" prio=9 tid=0x0000000000527800 nid=0x6dc800
> > waiting on condition
> >
> >
> >
>
> Ah, non-debuggable problems.... so much fun:)
>
> Anyway, it seems you are running into the problem described here:
> http://www.nabble.com/bug-in-SegmentReader-tf3788992.html
>
> I have put up a "patchified" version here:
> http://www.ceng.metu.edu.tr/~e1345172/segment_reader_hang.patch
>
> Can you retry with this patch?
>
> Thanks!
>
> --
> Doğacan Güney

--
Doğacan Güney
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
