Re: [squid-users] reverse proxy filtering?
> On Sat, Apr 18, 2009 at 10:24 PM, Amos Jeffries squ...@treenet.co.nz wrote:
>> Jeff Sadowski wrote:
>>> On Sat, Apr 18, 2009 at 5:18 PM, Amos Jeffries squ...@treenet.co.nz wrote:
>>>> Jeff Sadowski wrote:
>>>>> I'm new to trying to use squid as a reverse proxy. I would like to
>>>>> filter out certain pages and, if possible, certain words. I
>>>>> installed perl so that I can use it to rebuild pages, if that is
>>>>> possible? My squid.conf looks like so:
>>>>>
>>>>> start
>>>>> acl all src all
>>>>> http_port 80 accel defaultsite=outside.com
>>>>> cache_peer inside parent 80 0 no-query originserver name=myAccel
>>>>> acl our_sites dstdomain outside.com
>>>>
>>>> aha, aha, ..
>>>>
>>>>> http_access allow all
>>>>
>>>> eeek!!
>>>
>>> I want everyone on the outside to see the inside server minus one or
>>> two pages. Is that not what I set up?
>>
>> By lucky chance of some background defaults only, and assuming that
>> the web server is highly secure on its own. If you have a small set
>> of sites, such as those listed in our_sites, then it's best to be
>> certain and use that ACL for the allow as well:
>>
>>   http_access allow our_sites
>>   http_access deny all
>>
>> ... same on the cache_peer_access below.
>>
>>>>> cache_peer_access myAccel allow all
>>>>> end
>>>
>>> How would I add it so that, for example, http://inside/protect.html
>>> is blocked?
>>
>> http://wiki.squid-cache.org/SquidFaq/SquidAcl
>>
>>> So I want redirector_access? Is there an example line of this in a
>>> file? I tried using
>>>   url_rewrite_program c:\perl\bin\perl.exe c:\replace.pl
>>> but I guess that requires more to use it? An acl? Should
>>> "acl all src all" be "acl all redirect all"?
>>
>> No to all three. The line you mention trying is all that's needed.
>>
>>   url_rewrite_access allow all
>>
>> ... but the above should be the default when a url_rewrite_program is
>> set.
>
> So how do you tell it to use the url_rewrite_program with the inside
> site? Or does it use the script on all pages passing through the
> proxy? Is this only a rewrite on the requested url from the web
> browser? Ahh, that might answer some of my questions before. I never
> tried clicking on it after implementing the rewrite script. I was only
> hovering over the url and seeing that it was still the same.
>
>> What is making you think it's not working? And what do the logs say
>> about it? Also, what is the c:/replace.pl code?
>
> === start
> #!c:\perl\bin\perl.exe
> $| = 1;
> $replace = '<a href="http://inside/login.html">.*?</a>';
> $with = 'no login';
> while ($INPUT = <STDIN>) {
>     $INPUT =~ s/$replace/$with/gi;
>     print $INPUT;
> }
> === end
>
> I think I see the problem now. I guess I am looking for something else
> besides url_rewrite, maybe a full text replacement :-/
>
>>> and is it possible to filter/replace certain words on the site, like
>>> replacing Albuquerque with "Duke City" for example, on all pages?
>>
>> No. no. no. Welcome to copyright violation hell.
>
> This was an example. I have full permission to do the real
> translations. I am told to remove certain links/buttons to login
> pages, thus I replace <a href="inside">button</a> with "". Currently I
> have a pathetic perl script that doesn't support cookies and is going
> through each set of previous pages to bring up the content. I was
> hoping squid would greatly simplify this. I was using WWW::Mechanize.
> I know this isn't the best way, but they just need a fast and dirty
> way.

Ah, okay. Well the only ways squid has for doing content alteration are
far too much as well for that use (coding up an ICAP server and
processing rules, or a full eCAP adaptor plugin).

IMO you need to kick the webapp developers to make their app do the
removal under the right conditions. It would solve many more problems
than having different copies of a page available with identical
identifiers.
Amos
--
Please be using
  Current Stable Squid 2.7.STABLE6 or 3.0.STABLE14
  Current Beta Squid 3.1.0.7
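Pulling Amos's corrections together, the tightened config for this
reverse proxy would look roughly like the sketch below. The
protect_page ACL name and its regex are illustrative, not from the
thread:

  # reverse proxy for the one public site
  http_port 80 accel defaultsite=outside.com
  cache_peer inside parent 80 0 no-query originserver name=myAccel
  acl our_sites dstdomain outside.com

  # block the protected page before anything else is allowed
  acl protect_page url_regex -i ^http://outside\.com/protect\.html$
  http_access deny protect_page

  # be explicit instead of relying on background defaults
  http_access allow our_sites
  http_access deny all
  cache_peer_access myAccel allow our_sites
  cache_peer_access myAccel deny all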
Re: [squid-users] reverse proxy filtering?
On Sun, Apr 19, 2009 at 12:29 AM, Jeff Sadowski jeff.sadow...@gmail.com wrote:
> On Sat, Apr 18, 2009 at 10:24 PM, Amos Jeffries squ...@treenet.co.nz wrote:
> [... earlier exchange snipped; see the first message above ...]
>> Ah, okay. Well the only ways squid has for doing content alteration
>> are far too much as well for that use (coding up an ICAP server and
>> processing rules, or a full eCAP adaptor plugin).

One more thing for the night: if squid is written in C, I think I can
easily modify it to do what I want. The problem then becomes compiling
it for Windows. Can I just use cygwin?

I'm thinking I can have an external program run on the page before
handing it off to the web client. No?

>> IMO you need to kick the webapp developers to make their app do the
>> removal under the right conditions. It would solve many more problems
>> than having different copies of a page available with identical
>> identifiers.
>>
>> Amos
Re: [squid-users] reverse proxy filtering?
Jeff Sadowski wrote:
> [... earlier exchange snipped; see the first message above ...]
>
> So how do you tell it to use the url_rewrite_program with the inside
> site? Or does it use the script on all pages passing through the
> proxy?

It changes the request as passed on to the web server in-transit. So
the client still sees what they clicked on, but gets content from the
other site. It does not affect links or such in the page content.

> Is this only a rewrite on the requested url from the web browser? Ahh,
> that might answer some of my questions before. I never tried clicking
> on it after implementing the rewrite script. I was only hovering over
> the url and seeing that it was still the same.
>
>> What is making you think it's not working? And what do the logs say
>> about it?

If you only checked the page's links, they may not change. The logs
should show where the client went to and the IP/name of the server
fetched from, which would be the name of the redirected server.

>> Also, what is the c:/replace.pl code?
>
> === start
> #!c:\perl\bin\perl.exe
> $| = 1;
> $replace = '<a href="http://inside/login.html">.*?</a>';
> $with = 'no login';
> while ($INPUT = <STDIN>) {
>     $INPUT =~ s/$replace/$with/gi;
>     print $INPUT;
> }
> === end
>
> I think I see the problem now. I guess I am looking for something else
> besides url_rewrite, maybe a full text replacement :-/

That's what your code wants, not what I pointed you to using. You know,
I'm thinking you could get away without altering those pages at all,
and just block external clients from visiting those URLs.

> [... remainder of earlier thread snipped ...]

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE6 or 3.0.STABLE14
  Current Beta Squid 3.1.0.7
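For contrast with the replace.pl filter above: a url_rewrite_program
helper never sees page content, only request URLs, one per line, and it
prints back the URL to use. A minimal sketch for the Squid 2.7/3.0-era
helper protocol; the redirect target here is hypothetical:

  #!/usr/bin/perl
  # Reads "URL client_ip/fqdn ident method ..." lines from squid on
  # stdin; prints the rewritten URL, or an empty line for "no change".
  use strict;
  use warnings;
  $| = 1;    # unbuffered output is required for squid helpers

  while (my $line = <STDIN>) {
      my ($url) = split ' ', $line;
      if ($url =~ m{^http://inside/protect\.html}) {
          # send the client to a harmless page instead (assumed target)
          print "http://outside.com/denied.html\n";
      } else {
          print "\n";    # leave the request URL untouched
      }
  }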
Re: [squid-users] reverse proxy filtering?
On Sat, Apr 18, 2009 at 10:24 PM, Amos Jeffries squ...@treenet.co.nz wrote:
> [... earlier exchange snipped; see the first message above ...]
>
> IMO you need to kick the webapp developers to make their app do the
> removal under the right conditions. It would solve many more problems
> than having different copies of a page available with identical
> identifiers.

To ease your mind about what it is for: I am helping a library to set
up a way to display available books to the outside. The internal
website allows you to log in and check out books, which is what they
want blocked from the outside. They do not want to modify the web
developers' code to fit their special needs, since it is a program
commonly used by libraries. They just want me to stop people from
logging in and checking out books, and it doesn't need to be absolute,
just difficult. People should only be allowed to check books out from
inside.

The people asking me to help don't get to choose the software either;
those decisions are made higher up. Don't you love government :-D
[squid-users] Tproxy + wccp + tcp_outgoing_address
Hi All,

I have configured two squid servers in tproxy+wccp mode and it's
working fine. I am using squid 2.7 (cttproxy) and a gre tunnel.
Browsing is very slow compared to normal tproxy+bridge mode. I assume
the problem is that both incoming and outgoing traffic pass via eth0
(Gigabit Ethernet). I have an idea to use the eth1 interface and change
tcp_outgoing_address from the eth0 ip to the eth1 ip. Is it possible?
Or is there any other way to avoid this bottleneck?

Thanks in advance.

Regards
Vivek
Re: [squid-users] reverse proxy filtering?
On Sun, Apr 19, 2009 at 1:18 AM, Jeff Sadowski jeff.sadow...@gmail.com wrote:
> [... earlier exchange snipped ...]
>
> To ease your mind about what it is for: I am helping a library to set
> up a way to display available books to the outside. The internal
> website allows you to log in and check out books, which is what they
> want blocked from the outside. [...] They just want me to stop people
> from logging in and checking out books, and it doesn't need to be
> absolute, just difficult.

I think maybe this project better fits my needs :-D

  http://www.privoxy.org

(I don't need to cache, although it would have been nice.) Ugh, I would
have thought squid would have had text replacement. I'll see if privoxy
works as well at replicating pages and using cookies. Hopefully it will
work better than my own home brew written in perl using apache.
[squid-users] Re: Tproxy + wccp + tcp_outgoing_address
On Sun 2009-04-19 at 03:52 -0400, Vivek wrote:
> I have configured two squid servers in tproxy+wccp mode and it's
> working fine. I am using squid 2.7 (cttproxy) and a gre tunnel.
> Browsing is very slow compared to normal tproxy+bridge mode. I assume
> the problem is that both incoming and outgoing traffic pass via eth0
> (Gigabit Ethernet).

I kind of doubt you have more than 900Mbps of traffic.

> I have an idea to use the eth1 interface and change
> tcp_outgoing_address from the eth0 ip to the eth1 ip.

Won't help. The problem is something else.

> Is it possible?

Of course, but it's not as simple as tcp_outgoing_address.

> Or is there any other way to avoid this bottleneck?

First step is to identify the cause of the bottleneck.

1. How is the performance if you configure the browser to use the proxy?
2. Have you verified cabling, switch negotiation, etc.?

Regards
Henrik
Re: [squid-users] reverse proxy filtering?
Hi,

On Sun, 19 Apr 2009, Jeff Sadowski wrote:
> I am helping a library to set up a way to display available books to
> the outside. The internal website allows you to log in and check out
> books, which is what they want blocked from the outside. They do not
> want to modify the web developers' code to fit their special needs,
> since it is a program commonly used by libraries. They just want me to
> stop people from logging in and checking out books, and it doesn't
> need to be absolute, just difficult. People should only be allowed to
> check books out from inside.

I presume the login is required to do any task. It might be simplest to
just block access to any URLs which process a check-out, and any other
disallowed tasks. You could give a custom error page which says this
task is not allowed for external users. I suppose it's better for users
not to be shown buttons which they can't use, but this would be simple
to implement, would perform well, and wouldn't require modifying HTML.

Some people do modify content indirectly using squid's url_rewrite,
including this amusing one:

  http://www.ex-parrot.com/~pete/upside-down-ternet.html

which involves running a webserver on squid. The perl script downloads
the page to squid's web directory, translates it, and rewrites the url
to the localhost location of the translated page. It's a bit of a hack,
but it would probably work.

Gavin
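Gavin's block-the-URLs suggestion needs only a few squid.conf lines. A
minimal sketch, assuming hypothetical /login and /checkout paths and a
custom error page installed in squid's errors directory (a src ACL
could additionally exempt internal clients):

  acl no_external url_regex -i /login /checkout
  deny_info ERR_NO_EXTERNAL_LOGIN no_external
  http_access deny no_external
  http_access allow our_sites
  http_access deny all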
[squid-users] Re: Tproxy + wccp + tcp_outgoing_address
Henrik,

Thanks for your reply. I will check all the things you mentioned and
get back to you if I need to. Thanks again for your reply.

Regards
Vivek

-----Original Message-----
From: Henrik Nordstrom hen...@henriknordstrom.net
To: Vivek vivek...@aol.in
Cc: squid-users@squid-cache.org
Sent: Sun, 19 Apr 2009 1:42 pm
Subject: Re: Tproxy + wccp + tcp_outgoing_address

[... quoted reply snipped; see Henrik's message above ...]
Re: [squid-users] Allow Single IP to bypass Squid Proxy
Would you be so kind as to give me an example of how to do this? I am
very new to this. Thank you very much.

Amos Jeffries-2 wrote:
> robp2175 wrote:
>> I want one IP on my network to bypass the squid proxy. How do I go
>> about accomplishing this? Any help is greatly appreciated.
>> Transparent proxy with dansguardian and wpad.
>
> Then configure your firewall to omit that IP from the interception.
> Configure your PAC script to send that machine DIRECT.
>
> Amos
Re: [squid-users] reverse proxy filtering?
Gavin McCullagh wrote:
> [...]
> I presume the login is required to do any task. It might be simplest
> to just block access to any URLs which process a check-out, and any
> other disallowed tasks. You could give a custom error page which says
> this task is not allowed for external users. I suppose it's better
> for users not to be shown buttons which they can't use, but this
> would be simple to implement, would perform well, and wouldn't
> require modifying HTML.
>
> Some people do modify content indirectly using squid's url_rewrite,
> including this amusing one:
>
>   http://www.ex-parrot.com/~pete/upside-down-ternet.html
>
> which involves running a webserver on squid. [...] It's a bit of a
> hack, but it would probably work.

Yes, that's another approach.

Though the more I hear about the problem, the more I think your best
solution would be to forget changing the HTML and simply block access
to the pages and form processors that are not publicly allowed.

This is how most of the world does it, with no problems or
complications either. Usually the software at the back-end can be the
one saying "access denied", but a Squid access rule works just as well.

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE6 or 3.0.STABLE14
  Current Beta Squid 3.1.0.7
Re: [squid-users] Allow Single IP to bypass Squid Proxy
robp2175 wrote:
> Would you be so kind as to give me an example of how to do this? I am
> very new to this. Thank you very much.

Can't say what to set your firewall to. Take a gander at
http://wiki.squid-cache.org/ConfigExamples/Intercept for several types
of transparent setup and the related firewall settings.

As for the PAC file, it looks something like this:

  function FindProxyForURL(url, host) {
    if (myIpAddress() == "192.168.0.1")
      return "DIRECT";
    return "PROXY fu";
  }

> Amos Jeffries-2 wrote:
>> robp2175 wrote:
>>> I want one IP on my network to bypass the squid proxy. How do I go
>>> about accomplishing this? Any help is greatly appreciated.
>>
>> Then configure your firewall to omit that IP from the interception.
>> Configure your PAC script to send that machine DIRECT.

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE6 or 3.0.STABLE14
  Current Beta Squid 3.1.0.7
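On the firewall side, with a typical iptables interception setup the
bypass is an exemption rule inserted ahead of the redirect. A sketch
under assumed addresses, interface, and port, not from the thread:

  # let the bypass machine out untouched
  iptables -t nat -I PREROUTING -s 192.168.0.1 -p tcp --dport 80 -j ACCEPT
  # intercept everyone else to squid on 3128
  iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 3128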
Re: [squid-users] reverse proxy filtering?
On Sun, Apr 19, 2009 at 3:09 AM, Gavin McCullagh gavin.mccull...@gcd.ie wrote:
> I presume the login is required to do any task.

Actually no, you can browse books without logging in.

> It might be simplest to just block access to any URLs which process a
> check-out, and any other disallowed tasks. You could give a custom
> error page which says this task is not allowed for external users.
> [...]
> Some people do modify content indirectly using squid's url_rewrite,
> including this amusing one:
>
>   http://www.ex-parrot.com/~pete/upside-down-ternet.html
>
> [...] It's a bit of a hack, but it would probably work.

Cool, thanks, but I'm seriously looking at using privoxy, and maybe
even privoxy and squid together, because it appears privoxy on its own
makes a terrible reverse proxy and would leave my proxy box open for
others to download illegal content. So my current plan is to run
privoxy on some random port and point the reverse proxy at that port,
and voilà: both inline editing via privoxy with a simple search/replace
string, and no sites other than the one specified for the reverse proxy
via squid.
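That squid-in-front-of-privoxy chain might look something like the
following in squid.conf. This is only a sketch, assuming privoxy's
default port 8118 and the site names from earlier in the thread:

  # squid faces the world and serves only the one site
  http_port 80 accel defaultsite=outside.com
  acl our_sites dstdomain outside.com
  http_access allow our_sites
  http_access deny all

  # every request is handed to privoxy, never fetched directly
  cache_peer 127.0.0.1 parent 8118 0 no-query default name=filter
  cache_peer_access filter allow our_sites
  never_direct allow all

Privoxy would then do the search/replace filtering and forward the
request on to the internal server.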
Re: [squid-users] reverse proxy filtering?
Hi,

On Sun, 19 Apr 2009, Jeff Sadowski wrote:
> Actually no, you can browse books without logging in.

Why not just prevent logins then, by having squid block the login
processing page with a custom error page stating "no logins from
outside"?

> Cool, thanks, but I'm seriously looking at using privoxy, and maybe
> even privoxy and squid together [...] and no sites other than the one
> specified for the reverse proxy via squid.

Your call of course, but it seems like you're over-complicating life.
The more links you have in the chain (squid, privoxy, ...) and the more
complex your setup, the more things can go wrong over the lifetime of
the system. For sure, modifying the page content will be slower, but if
you don't have lots of users that may not matter.

Another thing to bear in mind is that upgrades to the web-based system
may well break either setup -- the URLs might change, so your URL
blocking might fail, or the page content might change, breaking your
regular expressions.

In principle, a system which only _allowed_ certain URLs and blocked
all others would be more robust than blocking certain URLs, failing
closed rather than open.

Gavin
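Gavin's fail-closed idea translates to allowing a whitelist of URL
patterns and denying the rest. A sketch with hypothetical public paths:

  # only the read-only catalogue pages are reachable from outside
  acl public_pages url_regex -i ^http://outside\.com/(search|browse|catalog)
  http_access allow public_pages
  # anything not whitelisted (logins, check-outs, future URLs) fails closed
  http_access deny all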
[squid-users] is there a squid cache rank value available for statistics?
Hi,

I'm wondering about ways to measure the optimum size for a cache, in
terms of the value you gain from each GB of cache space. If you've got
a 400GB cache and 99% of your hits come from the first 350GB, there's
probably no point looking for a larger cache. If only 80% come from the
first 350GB, then a bigger cache might well be useful. I realise there
are rules of thumb for cache size, but it would be interesting to be
able to analyse a particular squid installation.

Squid obviously removes objects from its cache based on the chosen
cache_replacement_policy. It appears from the comments in squid.conf
that in the case of the LRU policy, this is implemented as a list,
presumably a queue of pointers to objects in the cache. Objects which
come to the head of the queue are presumably next for removal. I guess
if an object in the cache gets used, it goes back to the tail of the
queue. I suppose this process must involve linearly traversing the
queue to find the object and remove it, which is presumably why
heap-based policies are available.

I wonder if it would be feasible to calculate a "cache rank", which
indicates the position an object was at within the queue at the time of
the hit. So, perhaps 0 means at the tail of the queue, 1 means at the
head. If this could be reported alongside each hit in the access.log,
one could draw stats on the amount of hits served by each portion of
the queue and therefore determine the value of expanding or contracting
your cache.

In the case of simple LRU, if the queue must be traversed to find each
element and requeue it (perhaps this isn't the case?), I suppose one
could count the position in the queue and divide by the total length.
With a heap, things are more complex. I guess you could give an
indication of the depth in the heap, but there would be so many objects
on the lowest levels, I don't suppose this would be a great guide. Is
there some better value available, such as the key used in the heap
maybe?

Or perhaps the whole idea is flawed somehow? Comments, criticisms,
explanations, rebukes all welcome.

Gavin
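As a toy illustration of the proposed rank (this is standalone Perl,
not Squid code): keep the LRU queue as an array, and on each hit report
the object's relative position before requeueing it at the tail.

  #!/usr/bin/perl
  use strict;
  use warnings;

  my @lru = ('a' .. 'j');   # index 0 = head (next for removal)

  # Rank is 1 at the head, 0 at the tail, per the proposal above.
  sub hit {
      my ($key) = @_;
      for my $i (0 .. $#lru) {
          next unless $lru[$i] eq $key;
          my $rank = $#lru ? ($#lru - $i) / $#lru : 0;
          splice(@lru, $i, 1);   # unlink from current position ...
          push @lru, $key;       # ... and requeue at the tail
          return $rank;
      }
      return undef;              # cache miss
  }

  printf "rank of 'c': %.2f\n", hit('c');   # near the head: high rank
  printf "rank of 'c': %.2f\n", hit('c');   # now at the tail: 0.00

The linear scan in hit() is exactly the traversal cost under
discussion; a real implementation would find the node via a hash and
unlink it from a doubly-linked list in O(1), but would then need a
counter or position estimate to report the rank cheaply.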
Re: [squid-users] is there a squid cache rank value available for statistics?
On Sun, 19 Apr 2009, Gavin McCullagh wrote:
> In the case of simple LRU, if the queue must be traversed to find each
> element and requeue it (perhaps this isn't the case?),

On reflection, I presume this is not the case. I imagine the struct in
RAM for each cache object must include pointers to prev and next. That
makes me wonder how heap LRU improves matters. I guess I need to go and
read the paper referenced in the notes :-)

Gavin
Re: [squid-users] is there a squid cache rank value available for statistics?
Gavin McCullagh wrote:
> Hi,
>
> I'm wondering about ways to measure the optimum size for a cache, in
> terms of the value you gain from each GB of cache space. If you've
> got a 400GB cache and 99% of your hits come from the first 350GB,
> there's probably no point looking for a larger cache. If only 80%
> come from the first 350GB, then a bigger cache might well be useful.

Squid suffers from a little bit of an anachronism in the way it stores
objects. The classic ufs systems essentially use round-robin and a hash
to determine the storage location for each object separately. This
works wonders on ensuring no clashes, but is not so good for retrieval
optimization.

Adrian Chadd has done a lot of study and some work in this area,
particularly for Squid-2.6/2.7. His paper for the FreeBSD conference is
a good read on how disk storage relates to Squid:

http://www.squid-cache.org/~adrian/talks/20081007%20-%20NYCBSDCON%20-%20Disk%20IO.pdf

> I realise there are rules of thumb for cache size, but it would be
> interesting to be able to analyse a particular squid installation.

Feel free. We would be interested in any improvements you can come up
with.

> Squid obviously removes objects from its cache based on the chosen
> cache_replacement_policy. It appears from the comments in squid.conf
> that in the case of the LRU policy, this is implemented as a list,
> presumably a queue of pointers to objects in the cache. Objects which
> come to the head of the queue are presumably next for removal. I
> guess if an object in the cache gets used, it goes back to the tail
> of the queue. I suppose this process must involve linearly traversing
> the queue to find the object and remove it, which is presumably why
> heap-based policies are available.

IIRC there is a doubly-linked list with a tail pointer for LRU.

> I wonder if it would be feasible to calculate a "cache rank", which
> indicates the position an object was at within the queue at the time
> of the hit. [...] In the case of simple LRU, if the queue must be
> traversed to find each element and requeue it (perhaps this isn't the
> case?), I suppose one could count the position in the queue and
> divide by the total length.

Yes, you get the same big problems with that in LRU as with displaying
all objects in the cache (>1 million is not an uncommon cache size) and
regex purges.

> With a heap, things are more complex. I guess you could give an
> indication of the depth in the heap, but there would be so many
> objects on the lowest levels, I don't suppose this would be a great
> guide. Is there some better value available, such as the key used in
> the heap maybe?

There is fileno or hashed value rather than URL. You still have the
same issues of traversal though.

> Or perhaps the whole idea is flawed somehow? Comments, criticisms,
> explanations, rebukes all welcome.

If you want to investigate, I'll gently nudge you towards Squid-3,
where the rest of the development is going on and improvements have the
best chance of survival. For further discussion you may want to bring
this up in squid-dev where the developers hang out.

Amos
[squid-users] squid 3.0STABLE14: Detected DEAD Parent [proxy]
Since v3.0 doesn't have IPv6 support, I set up the Polipo proxy
(http://www.pps.jussieu.fr/~jch/software/polipo/) as a 6to4 relay in
front of squid, by adding

  cache_peer 127.0.0.1 parent 8123 3130

to squid.conf. I was able to browse multiple IPv6 sites such as
ipv6.google.com with 3.0.STABLE14-20090416 as the proxy in Firefox, for
quite a while, before squid's cache.log spat out:

2009/04/19 20:10:20| Ready to serve requests.
2009/04/19 20:12:09| WARNING: Probable misconfigured neighbor at 127.0.0.1
2009/04/19 20:12:09| WARNING: 143 of the last 150 ICP replies are DENIED
2009/04/19 20:12:09| WARNING: No replies will be sent for the next 3600 seconds
2009/04/19 20:12:31| Detected DEAD Parent: 127.0.0.1
2009/04/19 20:12:52| ipcacheParse: No Address records in response to 'www.ipv6.sixxs.net'

After building 3.0.STABLE14-20090419, I found the above issue surfaced
much faster than with STABLE14-20090416. Since 20090416 is no longer
downloadable, I built STABLE14 dated 20090411. It lasted about as long
as 20090416, but spat out slightly different logs:

2009/04/19 20:57:36| Ready to serve requests.
2009/04/19 20:59:01| ipcacheParse: No Address records in response to 'ipv6.google.com'
2009/04/19 21:00:01| ipcacheParse: No Address records in response to 'ipv6.google.com'
2009/04/19 21:00:30| ipcacheParse: No Address records in response to 'www.ipv6.sixxs.net'
2009/04/19 21:00:39| 95%% of replies from '127.0.0.1' are UDP_DENIED

Could this issue be mitigated from the squid end, so it keeps talking
to Polipo (which appears to remain functional)?
[squid-users] Re: squid 3.0STABLE14: Detected DEAD Parent [proxy]
I've made one change in squid.conf that seems to cure most of the
problem with STABLE14-20090411:

  cache_peer 127.0.0.1 parent 8123 0 no-query no-digest no-netdb-exchange default

Interestingly, I was searching about "tunnelReadServer: FD 70: read
failure: (0) Unknown error: 0" when I found the above options on a
Russian forum.

However, it appears URLs with a question mark (e.g.
http://www.ipv6.sixxs.net/forum/?msg=general or
http://ipv6.google.com/advanced_search?hl=en) still cause the same DNS
lookup failure: squid either spits out "ipcacheParse: No Address
records in response to 'www.ipv6.sixxs.net'" in cache.log, or the
browser gets "Unable to determine IP address from host name
www.ipv6.sixxs.net", but rarely both. Is this because of

  hierarchy_stoplist cgi-bin ?

? If so, how do I work around it, given my setup?

-------- Original Message --------
[... original message quoted above, snipped ...]
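If hierarchy_stoplist is indeed the cause, a plausible workaround (not
confirmed in this thread) is to stop query-string URLs from being
forced direct, so they also go through the Polipo parent:

  # remove/comment any "hierarchy_stoplist cgi-bin ?" line, then:
  nonhierarchical_direct off
  never_direct allow all

never_direct forces all requests through the configured parent, which
matches this setup since squid has no IPv6 route of its own.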
[squid-users] Headers control in Squid 3.0
Hi.

Before Squid 3.0 I could change the Proxy-Authenticate header with this
pair of directives:

  header_access Proxy-Authenticate deny browserFirefox osLinux
  header_replace Proxy-Authenticate Negotiate

That is because the first authentication method offered is NTLM, for
IE. After upgrading to Squid 3.0, the header_access directive forked
into request_header_access and reply_header_access, which in my case
means changing header_access to reply_header_access. BUT! Now the
header_replace directive works only with request_header_access and
doesn't change the Proxy-Authenticate header. How can I resolve this
problem without downgrading to Squid 2.7.6? Or maybe bypass this
another way?

Oleg.