[jira] Commented: (NUTCH-207) Bandwidth target for fetcher rather than a thread count

2008-12-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653412#action_12653412
 ] 

Todd Lipcon commented on NUTCH-207:
---

Are both fetcher and fetcher2 supposed to be supported for the foreseeable 
future? Or could I implement this for just one of them and hold off on 
integrating it until the other is removed?

> Bandwidth target for fetcher rather than a thread count
> ---
>
> Key: NUTCH-207
> URL: https://issues.apache.org/jira/browse/NUTCH-207
> Project: Nutch
>  Issue Type: New Feature
>  Components: fetcher
>Affects Versions: 0.8
>Reporter: Rod Taylor
> Attachments: ratelimit.patch
>
>
> Increases or decreases the number of threads from the starting value 
> (fetcher.threads.fetch) up to a maximum (fetcher.threads.maximum) to achieve 
> a target bandwidth (fetcher.threads.bandwidth).
> It seems to stay within 10% of the target bandwidth even when it 
> encounters a large number of errors or a run of large pages.
> To achieve more accurate tracking, Nutch should keep track of protocol 
> overhead as well as the volume of page data downloaded.
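The patch itself (ratelimit.patch) is not reproduced in this thread, so the following is only a minimal Java sketch of the kind of feedback loop the description implies, not the patch's actual algorithm. Assumptions, all hypothetical: the target is expressed in bytes per second, the controller is called once per fixed measurement interval, throughput is assumed to scale roughly linearly with thread count (so the pool is resized proportionally), the pool never drops below one thread, and a dead band equal to the 10% figure above prevents constant resizing. The class name and method names are illustrative only.

```java
/**
 * Hypothetical sketch of a bandwidth-driven fetcher thread-count controller.
 * Not the code from ratelimit.patch; names and units are assumptions.
 */
final class BandwidthThreadController {
    private final int maxThreads;            // plays the role of fetcher.threads.maximum
    private final double targetBytesPerSec;  // plays the role of fetcher.threads.bandwidth (unit assumed)
    private final double tolerance;          // dead band, e.g. 0.10 for "within 10%"
    private int threads;                     // starts at fetcher.threads.fetch

    BandwidthThreadController(int startThreads, int maxThreads,
                              double targetBytesPerSec, double tolerance) {
        if (startThreads < 1 || maxThreads < startThreads) {
            throw new IllegalArgumentException("need 1 <= startThreads <= maxThreads");
        }
        this.threads = startThreads;
        this.maxThreads = maxThreads;
        this.targetBytesPerSec = targetBytesPerSec;
        this.tolerance = tolerance;
    }

    /**
     * Called once per measurement interval with the bytes transferred in it
     * (page data, plus protocol overhead if the caller can measure it).
     * Returns the thread count to use for the next interval.
     */
    int update(long bytesTransferred, double intervalSeconds) {
        if (intervalSeconds <= 0.0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        double observed = bytesTransferred / intervalSeconds;
        double low = targetBytesPerSec * (1.0 - tolerance);
        double high = targetBytesPerSec * (1.0 + tolerance);
        if (observed >= low && observed <= high) {
            return threads; // close enough to the target: leave the pool alone
        }
        long proposed;
        if (observed <= 0.0) {
            proposed = 2L * threads; // nothing measured (e.g. all errors): grow
        } else {
            // Assumed linear model: scale the pool by target / observed.
            proposed = Math.round(threads * (targetBytesPerSec / observed));
        }
        // Clamp to [1, maxThreads]; the floor of 1 is an assumption.
        threads = (int) Math.max(1L, Math.min((long) maxThreads, proposed));
        return threads;
    }

    int threads() {
        return threads;
    }
}
```

For example, starting at 10 threads with a maximum of 100 and a target of 1,000,000 bytes/s, an interval measuring 500,000 bytes/s would double the pool to 20. A real fetcher would smooth the measurements over several intervals, since a single large page can make one interval look far over target.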

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (NUTCH-207) Bandwidth target for fetcher rather than a thread count

2008-12-04 Thread Dennis Kubes (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653404#action_12653404
 ] 

Dennis Kubes commented on NUTCH-207:


I think this would be an interesting addition. It would need to be ported 
to fetcher2 as well as fetcher. If you want to take on the task of porting it, 
that would be great. If you have any questions, feel free to ask.




[jira] Commented: (NUTCH-207) Bandwidth target for fetcher rather than a thread count

2008-12-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653368#action_12653368
 ] 

Todd Lipcon commented on NUTCH-207:
---

Any word on this JIRA? This would be a very useful feature for me: we are 
bandwidth-constrained in the sense that we could easily pull a couple hundred 
Mbit/s but don't want to go over our 95th-percentile commit. I imagine others 
are in a similar situation.

Tweaking the number of fetcher threads gets us in the ballpark, but a feature 
like this would be far superior, since crawls often start off pulling more 
than our commit and then slow to 60% of it later on.

If it's a matter of porting the patch to the current code, I can take that 
on.
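For readers unfamiliar with a "95th-percentile commit": a common billing convention (details vary by provider) samples the link rate every five minutes, sorts the samples for the month, discards the top 5%, and bills on the highest remaining sample. The minimal Java sketch below assumes that convention; the class and method names are hypothetical. It shows why an early burst above the commit matters: bursts are free only while they fill no more than 5% of the samples.

```java
import java.util.Arrays;

/** Hypothetical helper: billable rate under an assumed 95th-percentile scheme. */
final class Percentile95 {
    /**
     * samples: one rate per sampling interval (e.g. Mbit/s every 5 minutes).
     * Sorts a copy, drops the top 5% (rounded down), returns the highest remaining.
     */
    static double billable(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);                 // ascending
        int discard = sorted.length / 20;    // floor(5% of n), integer math avoids rounding surprises
        return sorted[sorted.length - 1 - discard];
    }
}
```

With 100 samples at 100 Mbit/s, up to five of them can spike to 300 Mbit/s and the bill stays at 100; a sixth spike raises the billable rate to 300. A fetcher that holds a bandwidth target, rather than a fixed thread count, keeps the early part of a crawl from producing that many over-commit samples.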






[jira] Commented: (NUTCH-207) Bandwidth target for fetcher rather than a thread count

2006-02-07 Thread Rod Taylor (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-207?page=comments#action_12365462 ] 

Rod Taylor commented on NUTCH-207:
--

Code was by Radu Mateescu with additional kibitzing by myself.

> Bandwidth target for fetcher rather than a thread count
> ---
>
>  Key: NUTCH-207
>  URL: http://issues.apache.org/jira/browse/NUTCH-207
>  Project: Nutch
> Type: New Feature
>   Components: fetcher
> Versions: 0.8-dev
> Reporter: Rod Taylor
>  Attachments: ratelimit.patch
>
> Increases or decreases the number of threads from the starting value 
> (fetcher.threads.fetch) up to a maximum (fetcher.threads.maximum) to achieve 
> a target bandwidth (fetcher.threads.bandwidth).
> It seems to stay within 10% of the target bandwidth even when it 
> encounters a large number of errors or a run of large pages.
> To achieve more accurate tracking, Nutch should keep track of protocol 
> overhead as well as the volume of page data downloaded.
