[jira] [Updated] (NUTCH-1182) fetcher should track and shut down hung threads

2014-04-24 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-1182:
---

Attachment: NUTCH-1182-2x.patch

Patch for 2.x.

> fetcher should track and shut down hung threads
> ---
>
> Key: NUTCH-1182
> URL: https://issues.apache.org/jira/browse/NUTCH-1182
> Project: Nutch
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 1.3, 1.4
> Environment: Linux, local job runner
>Reporter: Sebastian Nagel
>Priority: Minor
> Fix For: 2.4, 1.9
>
> Attachments: NUTCH-1182-2x.patch, NUTCH-1182-trunk-v1.patch
>
>
> While crawling a slow server hosting a couple of very large PDF documents 
> (30 MB), after some time and a bulk of successfully fetched documents the 
> fetcher stops with the message ??Aborting with 10 hung threads??.
> From then on every cycle ends with hung threads and almost no documents are 
> fetched successfully. In addition, strange Hadoop errors are logged:
> {noformat}
>fetch of http://.../xyz.pdf failed with: java.lang.NullPointerException
> at java.lang.System.arraycopy(Native Method)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1108)
> ...
> {noformat}
> or
> {noformat}
>Exception in thread "QueueFeeder" java.lang.NullPointerException
>  at 
> org.apache.hadoop.fs.BufferedFSInputStream.getPos(BufferedFSInputStream.java:48)
>  at 
> org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:41)
>  at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:214)
> {noformat}
> I've run the debugger and found:
> # After the hung threads are reported, the fetcher stops, but the threads are 
> still alive and continue fetching a document. As a consequence, this will
> #* further limit the already small bandwidth of the network/server
> #* make the thread try to write the content via {{output.collect()}} once the 
> document is fetched, which must fail because the fetcher map job has already 
> finished and the associated temporary mapred directory has been deleted. The 
> error message may get mixed with the progress output of the next fetch cycle, 
> causing additional confusion.
> # The documents/URLs causing the hung threads are never reported or stored. 
> That is, they are hard to track down, and they will cause a hung thread again 
> and again.
> The problem is reproducible when fetching bigger documents and setting 
> {{mapred.task.timeout}} to a low value (this will reliably cause hung 
> threads).
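
The reproduction recipe above boils down to a single Hadoop property. Assuming the usual Nutch configuration layout, a deliberately low timeout could be set in conf/nutch-site.xml roughly like this (the value below is only an illustrative low value, not a recommendation):

```xml
<!-- Illustrative only: a deliberately low task timeout (in milliseconds)
     to provoke the "hung threads" abort described above. -->
<property>
  <name>mapred.task.timeout</name>
  <value>10000</value>
</property>
```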



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (NUTCH-1182) fetcher should track and shut down hung threads

2014-04-05 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-1182:
---

Attachment: NUTCH-1182-trunk-v1.patch

From time to time this problem is reported by users 
([2013|http://mail-archives.apache.org/mod_mbox/nutch-user/201304.mbox/%3ccajvbnigoqjl2hbuhv0gdbcjea2xzxhabqrsbpjaqtmfldkw...@mail.gmail.com%3E], 
[2012a|http://stackoverflow.com/questions/10331440/nutch-fetcher-aborting-with-n-hung-threads], 
[2012b|http://stackoverflow.com/questions/12181249/nutch-crawl-fails-when-run-as-a-background-process-on-linux], 
[2011|http://lucene.472066.n3.nabble.com/Nutch-1-2-fetcher-aborting-with-N-hung-threads-td2411724.html]).
Shutting down hung threads is hard to implement (cf. NUTCH-1387), but logging 
the URLs which cause threads to hang would definitely help in many situations. 
Patch attached.
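
The logging idea from the comment above can be sketched roughly as follows. This is a minimal sketch, not the attached patch: all class and method names here are made up for illustration, and it only shows the bookkeeping that would let the main loop report the URLs of hung threads before aborting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch only (not Nutch's actual Fetcher code): each fetcher thread
 * registers the URL it is currently working on, so that the URLs of
 * hung threads can be logged before the fetcher aborts.
 */
public class HungThreadTracker {

    // Which URL each thread is currently fetching, and since when.
    private final Map<Thread, String> currentUrl = new ConcurrentHashMap<>();
    private final Map<Thread, Long> startedAt = new ConcurrentHashMap<>();

    /** Called by a fetcher thread just before it starts a fetch. */
    public void startFetching(String url) {
        Thread t = Thread.currentThread();
        currentUrl.put(t, url);
        startedAt.put(t, System.currentTimeMillis());
    }

    /** Called by a fetcher thread when the fetch has finished. */
    public void doneFetching() {
        Thread t = Thread.currentThread();
        currentUrl.remove(t);
        startedAt.remove(t);
    }

    /**
     * Called by the main loop before aborting: returns the URLs of all
     * threads that have been busy longer than timeoutMs, so they can be
     * logged instead of silently causing hung threads again next cycle.
     */
    public List<String> hungUrls(long timeoutMs) {
        long now = System.currentTimeMillis();
        List<String> hung = new ArrayList<>();
        for (Map.Entry<Thread, Long> e : startedAt.entrySet()) {
            if (now - e.getValue() > timeoutMs) {
                String url = currentUrl.get(e.getKey());
                if (url != null) {
                    hung.add(url);
                }
            }
        }
        return hung;
    }
}
```

With such bookkeeping in place, the abort message could list the offending URLs, which is exactly what makes the problem trackable across cycles.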






[jira] [Updated] (NUTCH-1182) fetcher should track and shut down hung threads

2014-04-05 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-1182:
---

Fix Version/s: 1.9






[jira] [Updated] (NUTCH-1182) fetcher should track and shut down hung threads

2013-01-12 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1182:
---

Fix Version/s: 2.2
               1.7


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira