Re: [Nutch-general] Is fetcher.throttle.bandwidth known to work?

2007-06-06 Thread Matthias Jaekle
Hello Enzo,

We never developed a patch for this issue.

I believe back in 2004, in Nutch version 0.4, there was another fetcher 
module, which was replaced in version 0.5.

That fetcher was able to throttle bandwidth, but it was also very buggy.

So the wiki description would be obsolete.

I am not familiar with all the changes since version 0.7.
So it might be good if somebody could update the wiki.

If you are interested in seeing how this option was implemented, maybe you 
can find the old version in CVS.

Regards,

Matthias




Enzo Michelangeli wrote:
  Hi Matthias,
 
  I'm writing to you about the Nutch config file option
  fetcher.throttle.bandwidth, referenced by you at
  http://wiki.apache.org/nutch/FetchOptions . According to Andrzej
  Bialecki in the thread
  http://www.nabble.com/Is--fetcher.throttle.bandwidth-known-to-work--t3861057.html ,
  that refers to a private patch that is not part of Nutch's mainline code
  base. Is that patch available from you for submission to the Nutch team?
 
  Thanks,
 
  Enzo
 
 


Enzo Michelangeli wrote:
 - Original Message - From: Andrzej Bialecki [EMAIL PROTECTED]
 Sent: Tuesday, June 05, 2007 4:56 PM
 
 [...]
 You can achieve a somewhat similar effect by controlling the number of 
 fetcher threads. I realize this is not as accurate as a specific 
 control mechanism, but so far it was sufficient for most users.

 If this feature is important to you, please provide a patch that 
 implements it, and we'll consider it for inclusion.
 
 I think that for the time being I'll just channel the traffic through a 
 Squid proxy, and use its delay pools feature to throttle the bandwidth 
 (and also its DNS caching, which, as I mentioned a few days ago, I also 
 need...). For Nutch, it might make sense to find the original patch. 
 I'll try to get in touch with Matthias Jaekle, who authored that wiki 
 page where fetcher.throttle.bandwidth was referenced.
 
 Thanks anyway,
 
 Enzo
 
 
 

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general


Re: [Nutch-general] Is fetcher.throttle.bandwidth known to work?

2007-06-05 Thread Enzo Michelangeli
- Original Message - 
From: Andrzej Bialecki [EMAIL PROTECTED]
Sent: Monday, June 04, 2007 2:05 PM

 Er... I saw it mentioned at http://wiki.apache.org/nutch/FetchOptions , 
 so I thought it was for real...

 Sorry, this page is wrong and should be corrected - some of the options 
 listed there were either part of an older version of Fetcher (and have been 
 replaced) or part of a private patch (as was the case with throttling).

Don't you think that throttling would be a valuable feature to retain? Is 
there anything to prevent saturation of the link to the Internet, either in 
release 0.9 or in the current nightly build code?

Enzo




Re: [Nutch-general] Is fetcher.throttle.bandwidth known to work?

2007-06-05 Thread Andrzej Bialecki
Enzo Michelangeli wrote:
 - Original Message - From: Andrzej Bialecki [EMAIL PROTECTED]
 Sent: Monday, June 04, 2007 2:05 PM
 
 Er... I saw it mentioned at http://wiki.apache.org/nutch/FetchOptions 
 , so I thought it was for real...

 Sorry, this page is wrong and should be corrected - some of the 
 options listed there were either part of an older version of Fetcher 
 (and have been replaced) or part of a private patch (as was the case 
 with throttling).
 
 Don't you think that throttling would be a valuable feature to retain? 
 Is there anything to prevent saturation of the link to the Internet, 
 either in release 0.9 or in the current nightly build code?


You can achieve a somewhat similar effect by controlling the number of 
fetcher threads. I realize this is not as accurate as a specific control 
mechanism, but so far it was sufficient for most users.

If this feature is important to you, please provide a patch that 
implements it, and we'll consider it for inclusion.


-- 
Best regards,
Andrzej Bialecki 
  ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
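
[Archive note] The "specific control mechanism" mentioned above was never part of mainline Nutch, and the thread does not show how the private patch worked. Purely as an illustration, a fetcher-wide bandwidth cap could be built on a token bucket shared by all fetcher threads. Everything in this sketch (the class name BandwidthThrottle, its methods, and any rate values) is hypothetical and is not Nutch API.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical token-bucket throttle; a sketch, not code from Nutch or the patch. */
final class BandwidthThrottle {
    private final long bytesPerSecond; // sustained rate shared by all callers
    private final long capacity;       // largest burst allowed, in bytes
    private final LongSupplier clock;  // nanosecond clock, injectable for testing
    private double tokens;             // bytes currently available (negative = debt)
    private long lastRefill;           // clock reading at the last refill

    BandwidthThrottle(long bytesPerSecond, long capacity, LongSupplier clock) {
        this.bytesPerSecond = bytesPerSecond;
        this.capacity = capacity;
        this.clock = clock;
        this.tokens = capacity;
        this.lastRefill = clock.getAsLong();
    }

    /** Reserves the given bytes and returns how many nanoseconds to wait first. */
    synchronized long reserve(int bytes) {
        long now = clock.getAsLong();
        // Refill in proportion to elapsed time, never above the burst capacity.
        double refill = (now - lastRefill) / 1e9 * bytesPerSecond;
        tokens = Math.min(capacity, tokens + refill);
        lastRefill = now;
        tokens -= bytes;
        if (tokens >= 0) {
            return 0L;
        }
        // The bucket is in debt: wait until the debt would be refilled.
        return (long) Math.ceil(-tokens / bytesPerSecond * 1e9);
    }

    /** Blocking variant: sleeps for as long as reserve() asks. */
    void acquire(int bytes) throws InterruptedException {
        long waitNanos = reserve(bytes);
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
    }
}
```

If every fetcher thread called acquire(n) on one shared instance (created with System::nanoTime as the clock) before reading n bytes, the cap would apply to the total bandwidth rather than to each thread.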




Re: [Nutch-general] Is fetcher.throttle.bandwidth known to work?

2007-06-05 Thread Enzo Michelangeli
- Original Message - 
From: Andrzej Bialecki [EMAIL PROTECTED]
Sent: Tuesday, June 05, 2007 4:56 PM

[...]
 You can achieve a somewhat similar effect by controlling the number of 
 fetcher threads. I realize this is not as accurate as a specific control 
 mechanism, but so far it was sufficient for most users.

 If this feature is important to you, please provide a patch that 
 implements it, and we'll consider it for inclusion.

I think that for the time being I'll just channel the traffic through a 
Squid proxy, and use its delay pools feature to throttle the bandwidth 
(and also its DNS caching, which, as I mentioned a few days ago, I also 
need...). For Nutch, it might make sense to find the original patch. I'll 
try to get in touch with Matthias Jaekle, who authored that wiki page where 
fetcher.throttle.bandwidth was referenced.

Thanks anyway,

Enzo
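
[Archive note] The proxy approach described above relies on the crawler sending its HTTP requests through Squid, with the throttling itself done in Squid's configuration (not shown in this thread). As a minimal sketch of the client side only, this is how plain JDK code can route a request through an HTTP proxy. The host, the port (3128 is Squid's default), and the class name ProxiedFetch are assumptions for illustration; this is not how Nutch itself is configured.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

/** Hypothetical helper showing JDK-level proxy routing; not Nutch code. */
final class ProxiedFetch {
    /** Describes an HTTP proxy such as a local Squid instance. */
    static Proxy httpProxy(String host, int port) {
        // createUnresolved defers the DNS lookup until a connection is made.
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    /** Prepares (but does not yet open) a connection that will go via the proxy. */
    static HttpURLConnection open(String url, Proxy proxy) throws IOException {
        return (HttpURLConnection) URI.create(url).toURL().openConnection(proxy);
    }
}
```

A caller would build the proxy once, for example with httpProxy("localhost", 3128), and pass it to open() for each URL, so that all traffic passes through the proxy and is subject to whatever limits it enforces.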
 




Re: [Nutch-general] Is fetcher.throttle.bandwidth known to work?

2007-06-04 Thread Andrzej Bialecki
Enzo Michelangeli wrote:
 - Original Message - From: Andrzej Bialecki [EMAIL PROTECTED]
 Sent: Monday, June 04, 2007 1:31 AM
 
 Enzo Michelangeli wrote:
 In my case (with Nutch 0.8), it seems not: I set it to 500, and the
 fetcher still saturates the 1.5 Mbit/s link... Is it supposed to work
 for the total bandwidth, or for each thread?

 There's nothing in the current code base to support this, nor is there
 a config property with that name ... Is this perhaps a part of your local
 code base?
 
 Er... I saw it mentioned at http://wiki.apache.org/nutch/FetchOptions ,
 so I thought it was for real...

Sorry, this page is wrong and should be corrected - some of the options 
listed there were either part of an older version of Fetcher (and have 
been replaced) or part of a private patch (as was the case with 
throttling).


-- 
Best regards,
Andrzej Bialecki 
  ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




[Nutch-general] Is fetcher.throttle.bandwidth known to work?

2007-06-03 Thread Enzo Michelangeli
In my case (with Nutch 0.8), it seems not: I set it to 500, and the fetcher 
still saturates the 1.5 Mbit/s link... Is it supposed to work for the total 
bandwidth, or for each thread?

Enzo




Re: [Nutch-general] Is fetcher.throttle.bandwidth known to work?

2007-06-03 Thread Andrzej Bialecki
Enzo Michelangeli wrote:
 In my case (with Nutch 0.8), it seems not: I set it to 500, and the 
 fetcher still saturates the 1.5 Mbit/s link... Is it supposed to work 
 for the total bandwidth, or for each thread?

There's nothing in the current code base to support this, nor is there 
a config property with that name ... Is this perhaps a part of your 
local code base?



-- 
Best regards,
Andrzej Bialecki 
  ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




Re: [Nutch-general] Is fetcher.throttle.bandwidth known to work?

2007-06-03 Thread Enzo Michelangeli
- Original Message - 
From: Andrzej Bialecki [EMAIL PROTECTED]
Sent: Monday, June 04, 2007 1:31 AM

 Enzo Michelangeli wrote:
 In my case (with Nutch 0.8), it seems not: I set it to 500, and the
 fetcher still saturates the 1.5 Mbit/s link... Is it supposed to work for
 the total bandwidth, or for each thread?

 There's nothing in the current code base to support this, nor is there
 a config property with that name ... Is this perhaps a part of your local
 code base?

Er... I saw it mentioned at http://wiki.apache.org/nutch/FetchOptions , so I
thought it was for real...

Enzo

