There was no result because the command never completes; the process just
hangs with zero CPU utilization. There was nothing in the logs to show you,
but I took a thread dump before killing the process, and here it is:
Full thread dump Java HotSpot(TM) 64-Bit Server VM (diablo-1.5.0_07-b01 mixed mode):

"Low Memory Detector" daemon prio=5 tid=0x00000000006cfc00 nid=0x6d5800 runnable [0x0000000000000000..0x0000000000000000]

"CompilerThread1" daemon prio=9 tid=0x00000000006c9c00 nid=0x6cf800 waiting on condition [0x0000000000000000..0x00007fffff1f4320]

"CompilerThread0" daemon prio=9 tid=0x00000000006c3c00 nid=0x6c9800 waiting on condition [0x0000000000000000..0x00007fffff2f5400]

"AdapterThread" daemon prio=9 tid=0x00000000006bac00 nid=0x6c3800 waiting on condition [0x0000000000000000..0x0000000000000000]

"Signal Dispatcher" daemon prio=9 tid=0x00000000006a7c00 nid=0x6ba800 waiting on condition [0x0000000000000000..0x0000000000000000]

"Finalizer" daemon prio=8 tid=0x00000000006a7000 nid=0x6a7800 in Object.wait() [0x00007fffff5f9000..0x00007fffff5f9910]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000008b7860ad0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
        - locked <0x00000008b7860ad0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x000000000062b800 nid=0x62bc00 in Object.wait() [0x00007fffff6fa000..0x00007fffff6fac90]

"main" prio=5 tid=0x0000000000516800 nid=0x516000 waiting on condition [0x00007fffffffc000..0x00007fffffffd2f0]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.nutch.segment.SegmentReader.get(SegmentReader.java:348)
        at org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:590)

"VM Thread" prio=9 tid=0x000000000065f200 nid=0x62b400 runnable

"GC task thread#0 (ParallelGC)" prio=5 tid=0x0000000000527c00 nid=0x5af400 runnable

"GC task thread#1 (ParallelGC)" prio=5 tid=0x00000000005b5200 nid=0x5bd000 runnable

"VM Periodic Task Thread" prio=9 tid=0x0000000000527800 nid=0x6dc800 waiting on condition

----- Original Message ----
From: Doğacan Güney <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Sunday, June 17, 2007 8:28:39 AM
Subject: Re: Indexing problems in nutch-nightly
On 6/17/07, Sean Dean <[EMAIL PROTECTED]> wrote:
> After the change to Indexer.java here is the more verbose log of the error
> that seems to be happening during the indexing phase;
>
> 2007-06-15 21:53:06,098 INFO indexer.Indexer - url=http://acadisc.com/, parseData=Version: 5
> Status: failed(2,200): sun.io.MalformedInputException: Missing byte-order mark
> Title:
> Outlinks: 0
> Content Metadata:
> Parse Metadata:
> 2007-06-15 21:53:06,101 WARN mapred.LocalJobRunner - job_73pqhd
> java.lang.NullPointerException: value cannot be null
> at org.apache.lucene.document.Field.<init>(Field.java:188)
> at org.apache.lucene.document.Field.<init>(Field.java:164)
> at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:200)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:313)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)
> 2007-06-15 21:53:06,845 FATAL indexer.Indexer - Indexer: java.io.IOException: Job failed!
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
> at org.apache.nutch.indexer.Indexer.index(Indexer.java:280)
> at org.apache.nutch.indexer.Indexer.run(Indexer.java:302)
> at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
> at org.apache.nutch.indexer.Indexer.main(Indexer.java:285)
>
>
Can you also run

readseg -get <segment> "http://acadisc.com/" -nogenerate -noparsetext -noparsedata

and send the result?
--
Doğacan Güney