[jira] Commented: (NUTCH-344) Fetcher threads blocked on synchronized block in cleanExpiredServerBlocks

2006-08-10 Thread Jason Calabrese (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-344?page=comments#action_12427238 ] 

Jason Calabrese commented on NUTCH-344:
---

This issue is still marked as resolved; it needs to be re-opened so that the 
patch will be committed to SVN.



> Fetcher threads blocked on synchronized block in cleanExpiredServerBlocks
> -
>
> Key: NUTCH-344
> URL: http://issues.apache.org/jira/browse/NUTCH-344
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 0.8, 0.9.0, 0.8.1
> Environment: All
> Reporter: Greg Kim
> Fix For: 0.9.0, 0.8.1
>
> Attachments: cleanExpiredServerBlocks.patch, HttpBase.patch
>
>
> The recent change to the following code in HttpBase.java tends to block 
> fetcher threads while one thread busy-waits:
>   private static void cleanExpiredServerBlocks() {
>     synchronized (BLOCKED_ADDR_TO_TIME) {
>       while (!BLOCKED_ADDR_QUEUE.isEmpty()) {   // <= LINE 3
>         String host = (String) BLOCKED_ADDR_QUEUE.getLast();
>         long time = ((Long) BLOCKED_ADDR_TO_TIME.get(host)).longValue();
>         if (time <= System.currentTimeMillis()) {
>           BLOCKED_ADDR_TO_TIME.remove(host);
>           BLOCKED_ADDR_QUEUE.removeLast();
>         }
>       }
>     }
>   }
> LINE 3: As long as there are *any* entries in the BLOCKED_ADDR_QUEUE, the 
> thread that first enters this block busy-waits until it becomes empty while 
> all other threads block on the synchronized block. This leads to extremely 
> poor fetcher performance.
> Since the checkin to respect crawlDelay in robots.txt, we are no longer 
> guaranteed that the BLOCKED_ADDR_QUEUE is a FIFO list. The simple fix is 
> to iterate over the queue once rather than busy-waiting.
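
For readers following along, here is a rough sketch of what "iterate the queue 
once" could look like. It is not the attached patch. The class name 
ServerBlocks, the block() and blockedCount() helpers, the use of instance 
fields instead of the static fields in HttpBase, and passing the current time 
as a parameter are all assumptions made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Map;

// Hypothetical stand-in for the blocking state kept in HttpBase.
final class ServerBlocks {
    // Same names as in the quoted code, but instance fields here (assumption).
    private final Map<String, Long> BLOCKED_ADDR_TO_TIME = new HashMap<>();
    private final LinkedList<String> BLOCKED_ADDR_QUEUE = new LinkedList<>();

    // Hypothetical helper: record that host is blocked until expiresAtMillis.
    void block(String host, long expiresAtMillis) {
        synchronized (BLOCKED_ADDR_TO_TIME) {
            // Queue the host only if it was not already present in the map,
            // so the map and the queue always hold the same set of hosts.
            if (BLOCKED_ADDR_TO_TIME.put(host, expiresAtMillis) == null) {
                BLOCKED_ADDR_QUEUE.addFirst(host);
            }
        }
    }

    // Walk the queue exactly once, dropping expired entries, then return.
    // Entries that have not expired are skipped, not waited on.
    int cleanExpiredServerBlocks(long nowMillis) {
        int removed = 0;
        synchronized (BLOCKED_ADDR_TO_TIME) {
            Iterator<String> it = BLOCKED_ADDR_QUEUE.iterator();
            while (it.hasNext()) {
                String host = it.next();
                long time = BLOCKED_ADDR_TO_TIME.get(host);
                if (time <= nowMillis) {
                    BLOCKED_ADDR_TO_TIME.remove(host);
                    it.remove(); // safe removal during iteration
                    removed++;
                }
            }
        }
        return removed;
    }

    // Hypothetical helper: number of hosts currently blocked.
    int blockedCount() {
        synchronized (BLOCKED_ADDR_TO_TIME) {
            return BLOCKED_ADDR_QUEUE.size();
        }
    }
}
```

Because the loop ends when the iterator is exhausted, the lock is held for one 
pass over the queue no matter how many unexpired entries remain, and the 
result does not depend on the queue being ordered by expiry time.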

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (NUTCH-344) Fetcher threads blocked on synchronized block in cleanExpiredServerBlocks

2006-08-09 Thread Greg Kim (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-344?page=comments#action_12427100 ] 

Greg Kim commented on NUTCH-344:


Had the correct version in my workspace; botched the copy over to the vendor 
trunk. doh!   Thanks Jason for catching it!

Jacob, your problem should be resolved with the one-line patch that Jason 
provided. 






[jira] Commented: (NUTCH-344) Fetcher threads blocked on synchronized block in cleanExpiredServerBlocks

2006-08-09 Thread Jacob Brunson (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-344?page=comments#action_12427096 ] 

Jacob Brunson commented on NUTCH-344:
-

I'm having problems with the patch committed in revision #429779.  I used to 
have the "fetch aborted with X hung threads" problem.  After updating to this 
revision, fetching goes fine for a while, but then I get this error on just 
about every page fetch attempt:
2006-08-09 23:27:28,548 INFO  fetcher.Fetcher - fetching http://www.xmission.com/~nelsonb/resources.htm
2006-08-09 23:27:28,549 ERROR http.Http - java.lang.NullPointerException
2006-08-09 23:27:28,549 ERROR http.Http - at org.apache.nutch.protocol.http.api.HttpBase.cleanExpiredServerBlocks(HttpBase.java:382)
2006-08-09 23:27:28,549 ERROR http.Http - at org.apache.nutch.protocol.http.api.HttpBase.blockAddr(HttpBase.java:323)
2006-08-09 23:27:28,549 ERROR http.Http - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:144)
2006-08-09 23:27:28,549 INFO  fetcher.Fetcher - fetch of http://www.xmission.com/~nelsonb/resources.htm failed with: java.lang.NullPointerException
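
The trace does not say which reference was null, and the committed revision is 
not shown in this thread. One candidate in code of this shape is the map 
lookup: Map.get returns null for a host that has no entry, and calling 
longValue() on that null throws. The sketch below only illustrates that 
mechanism; the class and method names are made up for the example, and 
treating a missing entry as already expired is an assumption, not the 
committed fix.

```java
import java.util.Map;

// Hypothetical demo class; not part of Nutch.
final class MissingEntryDemo {
    // Same lookup shape as the quoted code: throws NullPointerException
    // when the host has no entry in the map.
    static long expiryUnchecked(Map<String, Long> times, String host) {
        return times.get(host).longValue();
    }

    // Guarded variant (assumption): a host without an entry is reported
    // as expired at time 0, so a cleanup pass would simply drop it.
    static long expiryOrZero(Map<String, Long> times, String host) {
        Long time = times.get(host);
        return time == null ? 0L : time.longValue();
    }
}
```

A lookup like this can only fail if the queue holds a host that the map does 
not, so keeping the two structures in step under the same lock avoids the 
problem at its source.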


