RE: [squid-users] Cannot get conent from msnbc that have # in UR

2008-11-11 Thread Nicole

On 11-Nov-08 My Secret NSA Wiretap Overheard Nicole Saying  :
 
 
  Hello all
 
  I have started to receive complaints from people trying to get videos from
 msnbc.com that use a # character in the URL.
 
 Such as:
 
 http://www.msnbc.msn.com/id/22425001/vp/27657223#27657223
 http://www.msnbc.msn.com/id/22425001/vp/27652443#27652443
 
 
 The access log shows that it is removing the pound sign and everything after.
 
 7 TCP_MISS:DIRECT
 9.2.2.7 - - [11/Nov/2008:09:59:30 -0800] GET
 http://www.msnbc.msn.com/id/22425001/vp/27657223 HTTP/1.1 200 477
 TCP_MISS:DIRECT
 9.2.2.7 - - [11/Nov/2008:10:00:18 -0800] GET
 http://www.msnbc.msn.com/id/22425001/vp/27652443 HTTP/1.1 200 477
 TCP_MISS:DIRECT
 
 
  I cannot see anything in my config that would make it strip the pound
 character.
 
 
  Any assistance greatly appreciated.
 
 

 Additional information I forgot to include:
 This seems to be true for both Squid 2.6 and 2.7.STABLE5.


 cache.log: 
 2008/11/11 16:33:28| Oversized chunk header on port 59375, url
http://www.msnbc.msn.com/id/3036677

 
 This seems to be true in every browser I test. With the proxy enabled, the
 page will not load. With the proxy disabled (in the browser), the URL loads.



   Thanks


  Nicole


 
 
 
 
 
 
 --
  |\ __ /|   (`\
  | o_o  |__  ) )   
 //  \\ 
   -  [EMAIL PROTECTED]  -  Powered by FreeBSD  -
 --
  The term daemons is a Judeo-Christian pejorative.
  Such processes will now be known as spiritual guides
   - Politically Correct UNIX Page







Re: [squid-users] Cannot get conent from msnbc that have # in UR

2008-11-11 Thread Amos Jeffries

Nicole wrote:

On 11-Nov-08 My Secret NSA Wiretap Overheard Nicole Saying  :


 Hello all

 I have started to receive complaints from people trying to get videos from
msnbc.com that use a # character in the URL.

Such as:

http://www.msnbc.msn.com/id/22425001/vp/27657223#27657223
http://www.msnbc.msn.com/id/22425001/vp/27652443#27652443


The access log shows that it is removing the pound sign and everything after.

7 TCP_MISS:DIRECT
9.2.2.7 - - [11/Nov/2008:09:59:30 -0800] GET
http://www.msnbc.msn.com/id/22425001/vp/27657223 HTTP/1.1 200 477
TCP_MISS:DIRECT
9.2.2.7 - - [11/Nov/2008:10:00:18 -0800] GET
http://www.msnbc.msn.com/id/22425001/vp/27652443 HTTP/1.1 200 477
TCP_MISS:DIRECT


 I cannot see anything in my config that would make it strip the pound
character.


 Any assistance greatly appreciated.




 Additional information I forgot to include:
 This seems to be true for both Squid 2.6 and 2.7.STABLE5.


 cache.log: 
 2008/11/11 16:33:28| Oversized chunk header on port 59375, url
http://www.msnbc.msn.com/id/3036677

 
 This seems to be true in every browser I test. With the proxy enabled, the
page will not load. With the proxy disabled (in the browser), the URL loads.



Ah. Bingo.
This is a combination of two problems:
 1) the msnbc stream software is sending a chunked-encoded response to
Squid when it should not be.
 2) the hack in Squid-2 to cope with that bad behavior has a limit
on the header size it can handle.
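
To illustrate what a chunk header is and how a size limit on it can reject a response, here is a minimal Python sketch of chunked decoding. It is not Squid's code: decode_chunked, OversizedChunkHeader and the 32-byte MAX_CHUNK_HEADER are all hypothetical names and values chosen for the example, and the sketch ignores chunk extensions and trailers.

```python
MAX_CHUNK_HEADER = 32  # hypothetical limit for illustration, not Squid's real value


class OversizedChunkHeader(Exception):
    """Raised when a chunk-size line is longer than the allowed limit."""


def decode_chunked(body: bytes, max_header: int = MAX_CHUNK_HEADER) -> bytes:
    """Decode an HTTP/1.1 chunked body (extensions and trailers are ignored)."""
    out = bytearray()
    pos = 0
    while True:
        # Each chunk starts with a header line: hex size, optional ";ext", CRLF.
        eol = body.find(b"\r\n", pos)
        if eol == -1:
            raise ValueError("truncated chunk header")
        header = body[pos:eol]
        if len(header) > max_header:
            raise OversizedChunkHeader(f"{len(header)} bytes > {max_header}")
        size = int(header.split(b";", 1)[0].strip(), 16)
        pos = eol + 2
        if size == 0:
            # A zero-size chunk marks the end of the body.
            return bytes(out)
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


if __name__ == "__main__":
    print(decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))
```

A chunk header longer than the limit (for example one padded with a long extension) raises OversizedChunkHeader here, which is roughly the situation the cache.log line appears to describe.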


You might have to use the Accept-Encoding hack on them:

 # Fix broken sites by removing Accept-Encoding header
 acl broken dstdomain ...
 header_access Accept-Encoding deny broken
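
As an aside on the # itself: the access.log entries quoted above are most likely normal. The part of a URL after # is a fragment, which the browser keeps to itself and does not send in the request (RFC 3986, section 3.5), so Squid would never see it. A minimal Python sketch of that split, where split_fragment is a hypothetical helper name used only for this example:

```python
from urllib.parse import urldefrag


def split_fragment(url: str) -> tuple[str, str]:
    """Return (URL as sent in the request, fragment kept by the browser)."""
    request_url, fragment = urldefrag(url)
    return request_url, fragment


if __name__ == "__main__":
    sent, kept = split_fragment(
        "http://www.msnbc.msn.com/id/22425001/vp/27657223#27657223"
    )
    print("sent to proxy:  ", sent)
    print("kept by browser:", kept)
```

The first value matches the URL shown in the access.log lines, which suggests the missing # is not the cause of the failure.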

PS. An upgrade to the 3.1 beta might also be an option for you.

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE5 or 3.0.STABLE10
  Current Beta Squid 3.1.0.2