You gave me an idea about content length. So I went and checked, and of
course my http.content.limit was set to -1, hence the hung threads. I
turned it back to the default value and it works! A minor nuisance with
this approach, though, is that PDFs get chopped off and the PDF parser
warns about incomplete content; the fetcher then keeps retrying the same
PDF document. Still, that's a minor issue compared to the fatal error
before, so thanks!
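For anyone hitting the same thing: the setting is overridden in conf/nutch-site.xml. A sketch of a middle-ground value — large enough that most PDFs fit, but finite so huge files can't hang a thread. The 10 MB figure is purely illustrative, not a recommendation from this thread:

```xml
<!-- conf/nutch-site.xml sketch (overrides nutch-default.xml) -->
<property>
  <name>http.content.limit</name>
  <!-- Max bytes fetched per document. -1 means unlimited (which caused
       the hung threads above); the shipped default is 64 KB, which
       truncates PDFs. 10485760 (10 MB) is an illustrative compromise. -->
  <value>10485760</value>
</property>
```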

On 6/20/07, Sunnyvale Fl <[EMAIL PROTECTED]> wrote:

Thanks - however, I am using the regular http protocol plugin. Every
time, it gave a few fatal NPEs following the hung threads and then died.
:( Not sure what to do about it!

On 6/20/07, charlie w <[EMAIL PROTECTED]> wrote:
>
> I've seen this issue when the fetcher would attempt to retrieve huge
> files (>100MB). The incredible amount of time required to fetch files
> this large caused the individual fetcher thread to appear hung to the
> thread monitoring the fetcher status.
>
> In my particular case, the only reason the fetcher was retrieving such
> a huge file is that the content length limit is broken for the
> http-client protocol plugin (it works in the regular http protocol
> plugin). See NUTCH-481.
>
>
> On 6/19/07, Sunnyvale Fl <[EMAIL PROTECTED]> wrote:
> >
> > Hi all,
> >
> > I upgraded from Nutch 0.8.1 to 0.9 and thought the hung thread
> > issues were fixed in this version (NUTCH-344), but I am still seeing
> > the same problem. I am only running 10 threads for a focused crawl,
> > and have the default fetcher.max.crawl.delay of 30. Every time it
> > hangs, it is followed by an NPE - any clue? Thanks!
> >
> > 2007-06-19 14:40:37,670 WARN  fetcher.Fetcher - Aborting with 3 hung threads.
> > 2007-06-19 15:01:46,518 FATAL fetcher.Fetcher - java.lang.NullPointerException
> > 2007-06-19 15:01:46,545 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
> > 2007-06-19 15:01:46,546 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
> > 2007-06-19 15:01:46,546 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
> > 2007-06-19 15:01:46,546 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
> > 2007-06-19 15:01:46,546 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
> > 2007-06-19 15:01:46,546 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
> > 2007-06-19 15:01:46,546 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:115)
> > 2007-06-19 15:01:46,546 FATAL fetcher.Fetcher - fetcher caught: java.lang.NullPointerException
> >
>
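Regarding charlie's point that the content limit works in the regular http plugin but not in http-client (NUTCH-481): which protocol plugin is active is controlled by plugin.includes. A sketch, with the value pattern modeled on a stock nutch-default.xml — the rest of the plugin list is illustrative and should match whatever your crawl actually needs:

```xml
<!-- conf/nutch-site.xml sketch: use protocol-http rather than
     protocol-httpclient so http.content.limit is honored (NUTCH-481).
     Plugin list beyond protocol-http is illustrative. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|pdf)|index-basic|query-(basic|site|url)</value>
</property>
```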


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
