[squid-users] Squid with PHP & Apache

2013-11-25 Thread Ghassan Gharabli
 Hi,

I have built a PHP script to cache HTTP 1.x 206 Partial Content (like
"WindowsUpdates") and to allow seeking through Youtube and many other websites.

I am willing to move from PHP to C++ hopefully after a while.

The script is almost finished, but I have several questions. I have no
idea if I should always grab the HTTP response headers and send them
back to the browsers.

1) Does Squid still grab the HTTP response headers even if the object
is already in cache, or does Squid already have a cached copy of the
HTTP response headers? If Squid caches HTTP response headers, then how
do you deal with HTTP code 302 if the object is already cached? I am
asking because I have seen that many websites use the same extensions,
such as .FLV, together with a Location header.

2) Do you also use mime.conf to send the Content-Type to the browser
in the case of FTP/HTTP, or only FTP?

3) Does Squid compare the length of the local cached copy with the
remote file when you already have the object, or do you use
refresh_pattern?

4) What happens if the user modifies a refresh_pattern to cache an
object, for example .xml, which does not have a [Content-Length]
header? Do you still save it, or would you search for the ignore-headers
used to force caching the object? And what happens when the cached copy
expires: do you still refresh the copy even if there is no
Content-Length header?

I am really confused about this issue, because I always fetch the
header list from the internet and send it back to the browser (using
PHP and Apache), even if the object is in cache.

Your help and answers will be much appreciated.

Thank you

Ghassan


Re: [squid-users] Squid with PHP & Apache

2013-11-25 Thread Amos Jeffries
On 26/11/2013 10:13 a.m., Ghassan Gharabli wrote:
>  Hi,
> 
> I have built a PHP script to cache HTTP 1.x 206 Partial Content (like
> "WindowsUpdates") and to allow seeking through Youtube and many other websites.
> 

Ah. So you have written your own HTTP caching proxy in PHP. Well done.
Did you read RFC 2616 several times? Your script is expected to obey
all the MUST conditions and clauses in there discussing "proxy" or "cache".



NOTE: the easy way to do this is to upgrade your Squid to the current
series and use ACLs on the range_offset_limit directive. That way Squid
will convert Range requests to normal fetch requests and cache the
object before sending the requested pieces of it back to the client.
http://www.squid-cache.org/Doc/config/range_offset_limit/
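
For example (an illustrative sketch only; the ACL pattern is made up, and
the list should stay narrow since -1 makes Squid download whole objects):

# Fetch the whole object for Windows Update downloads so later
# Range requests can be answered from cache.
acl fullfetch dstdomain .windowsupdate.com
range_offset_limit -1 fullfetch
# All other traffic keeps the default limit of 0 (ranges relayed as-is).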


> I am willing to move from PHP to C++ hopefully after a while.
> 
> The script is almost finished, but I have several questions. I have no
> idea if I should always grab the HTTP response headers and send them
> back to the browsers.

The response headers you get when receiving the object are metadata
describing that object AND the transaction used to fetch it AND the
network conditions/pathway used to fetch it. The cache's job is to store
those along with the object itself and deliver only the relevant headers
when delivering a HIT.
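
As a rough PHP sketch of that split (the file layout and names here are
invented, not how Squid itself stores things):

<?php
// Sketch: a cache entry is the payload plus the origin response
// headers that describe it, stored side by side.
function store_response($key, array $headers, $body) {
    file_put_contents("cache/$key.head", implode("\r\n", $headers));
    file_put_contents("cache/$key.body", $body);
}

// On a HIT, replay the stored headers instead of contacting the origin.
function serve_hit($key) {
    foreach (file("cache/$key.head", FILE_IGNORE_NEW_LINES) as $h) {
        header($h);
    }
    readfile("cache/$key.body");
}
?>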

> 
> 1) Does Squid still grab the HTTP response headers even if the object
> is already in cache, or does Squid already have a cached copy of the
> HTTP response headers? If Squid caches HTTP response headers, then how
> do you deal with HTTP code 302 if the object is already cached? I am
> asking because I have seen that many websites use the same extensions,
> such as .FLV, together with a Location header.

Yes. All proxies on the path are expected to relay the end-to-end
headers, drop the hop-by-hop headers, and MUST update/generate the
feature negotiation and state information headers to match their
capabilities in each direction.
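
In PHP terms, the hop-by-hop set from RFC 2616 section 13.5.1 is small
enough to hard-code (a sketch; a full implementation must also drop any
header named in the Connection header):

<?php
// Sketch: remove the RFC 2616 hop-by-hop headers before relaying.
function strip_hop_by_hop(array $headers) {
    $hop = array('connection', 'keep-alive', 'proxy-authenticate',
                 'proxy-authorization', 'te', 'trailers',
                 'transfer-encoding', 'upgrade');
    return array_filter($headers, function ($line) use ($hop) {
        $name = strtolower(trim(strstr($line, ':', true)));
        return !in_array($name, $hop);
    });
}
?>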


> 
> 2) Do you also use mime.conf to send the Content-Type to the browser
> in the case of FTP/HTTP, or only FTP?

Only FTP and Gopher *if* Squid is translating from the native FTP/Gopher
connection to HTTP. HTTP and protocols relayed using HTTP message format
are expected to supply the correct header.

> 
> 3) Does Squid compare the length of the local cached copy with the
> remote file when you already have the object, or do you use
> refresh_pattern?

Content-Length is a declaration of how many payload bytes are following
the response headers. It has no relation to the server's object except in
the special case where the entire object is being delivered as payload
without any encoding.


> 
> 4) What happens if the user modifies a refresh_pattern to cache an
> object, for example .xml, which does not have a [Content-Length]
> header? Do you still save it, or would you search for the ignore-headers
> used to force caching the object? And what happens when the cached copy
> expires: do you still refresh the copy even if there is no
> Content-Length header?

refresh_pattern does not cause caching of any objects. What it does is
tell Squid how long an object is valid for before it needs to be
revalidated or replaced. In some situations this can affect the caching
decision; in most it only affects expiry.
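
For example (values are illustrative only):

# Gives .xml responses lacking expiry information a freshness lifetime
# of 60-1440 minutes. It does NOT force uncacheable responses into cache.
refresh_pattern -i \.xml$ 60 20% 1440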


Objects without content-length are handled differently by HTTP/1.0 and
HTTP/1.1 software.

When either end of the connection is advertising HTTP/1.0 the sending
software is expected to terminate the TCP connection on completion of
the payload block.

When both ends advertise HTTP/1.1 the sending software is expected to
use Transfer-Encoding:chunked in order to keep the connection alive
unless the client sent Connection:close.
 Doing the HTTP/1.0 behaviour is also acceptable if both ends are
HTTP/1.1, but it causes a performance loss due to the churn and setup
costs of TCP.
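
A PHP-level sketch of the difference (note that Apache normally applies
the chunking itself; this is only to show the wire format):

<?php
// Sketch: relaying a body of unknown length from stream $src.
function relay_unknown_length($src) {
    if ($_SERVER['SERVER_PROTOCOL'] === 'HTTP/1.1') {
        header('Transfer-Encoding: chunked');   // connection stays reusable
        while (!feof($src)) {
            $chunk = fread($src, 8192);
            if ($chunk === false || $chunk === '') break;
            printf("%x\r\n%s\r\n", strlen($chunk), $chunk);
        }
        echo "0\r\n\r\n";            // zero-length chunk marks the end
    } else {
        header('Connection: close'); // HTTP/1.0: closing marks the end
        fpassthru($src);
    }
}
?>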




> 
> I am really confused about this issue, because I always fetch the
> header list from the internet and send it back to the browser (using
> PHP and Apache), even if the object is in cache.

I am really confused about what you are describing here. You should only
get a headers list from the upstream server if you have contacted one.


You say the script is sending to the browser. This is not true at the
HTTP transaction level. The script sends to Apache, Apache sends to
whichever software requested from it.

In what order have you chained the Browser, Apache and Squid?

  Browser -> Squid -> Apache -> Script -> Origin server
or,
  Browser -> Apache -> Script -> Squid -> Origin server


Amos


Re: [squid-users] Squid with PHP & Apache

2013-11-26 Thread Ghassan Gharabli
On Tue, Nov 26, 2013 at 5:30 AM, Amos Jeffries  wrote:
> On 26/11/2013 10:13 a.m., Ghassan Gharabli wrote:
>>  Hi,
>>
>> I have built a PHP script to cache HTTP 1.X 206 Partial Content like
>> "WindowsUpdates" & Allow seeking through Youtube & many websites .
>>
>
> Ah. So you have written your own HTTP caching proxy in PHP. Well done.
> Did you read RFC 2616 several times? your script is expected to to obey
> all the MUST conditions and clauses in there discussing "proxy" or "cache".
>

Yes, I have read it and I will read it again, but the reason I am
building such a script is that internet here in Lebanon is really
expensive and scarce.

As you know, Youtube sends dynamic chunks for each video. For
example, if you watch a video on Youtube more than 10 times, then
Squid fills up the cache with more than 90 chunks per video; that is
why allowing seeking at any position of the video using my script
would save me the headache.

>
>
> NOTE: the easy way to do this is to upgrade your Squid to the current
> series and use ACLs on the range_offset_limit directive. That way Squid
> will convert Range requests to normal fetch requests and cache the
> object before sending the requested pieces of it back to the client.
> http://www.squid-cache.org/Doc/config/range_offset_limit/
>
>

I have successfully supported HTTP/206 when the object is cached, and my
target is to enable Range headers, as I can see that iPhones or Google
Chrome check whether the server sends an Accept-Ranges: bytes header and
then request bytes=x-y or multiple ranges like bytes=x-y,x-y.
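
The core of my single-range handling looks roughly like this (a simplified
sketch; multi-range replies would need multipart/byteranges):

<?php
// Sketch: answer "Range: bytes=x-y" from a fully cached file.
function serve_range($path, $rangeHeader) {
    $size = filesize($path);
    if (!preg_match('/bytes=(\d*)-(\d*)/', $rangeHeader, $m)) {
        header('HTTP/1.1 416 Requested Range Not Satisfiable');
        return;
    }
    $start = ($m[1] === '') ? $size - (int)$m[2] : (int)$m[1];  // suffix range
    $end   = ($m[1] !== '' && $m[2] !== '') ? (int)$m[2] : $size - 1;
    header('HTTP/1.1 206 Partial Content');
    header('Accept-Ranges: bytes');
    header("Content-Range: bytes $start-$end/$size");
    header('Content-Length: ' . ($end - $start + 1));
    $fp = fopen($path, 'rb');
    fseek($fp, $start);
    echo fread($fp, $end - $start + 1);
    fclose($fp);
}
?>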

>> I am willing to move from PHP to C++ hopefully after a while.
>>
>> The script is almost finished, but I have several questions. I have no
>> idea if I should always grab the HTTP response headers and send them
>> back to the browsers.
>
> The response headers you get when receiving the object are metadata
> describing that object AND the transaction used to fetch it AND the
> network conditions/pathway used to fetch it. The cache's job is to store
> those along with the object itself and deliver only the relevant headers
> when delivering a HIT.
>
>>
>> 1) Does Squid still grab the HTTP response headers even if the object
>> is already in cache, or does Squid already have a cached copy of the
>> HTTP response headers? If Squid caches HTTP response headers, then how
>> do you deal with HTTP code 302 if the object is already cached? I am
>> asking because I have seen that many websites use the same extensions,
>> such as .FLV, together with a Location header.
>
> Yes. All proxies on the path are expected to relay the end-to-end
> headers, drop the hop-by-hop headers, and MUST update/generate the
> feature negotiation and state information headers to match their
> capabilities in each direction.
>
>

Do you mean by "Yes" that the HTTP response headers are grabbed even if
the object is already in cache, so network latency is always added in
both MISS and HIT situations? I have tested Squid and I have noticed
that reading HIT objects from Squid takes about 0.x ms, so I believe
objects are always served offline until expiry occurs. Right?

Until now I have been using $http_response_header, as it is the fastest
method by far, but I still have an issue with latency: for each request
the function takes about 0.30 s, which is really high, even though my
network latency is 100~150 ms. That is why I thought that I could
grab the HTTP response headers the first time and store them, so if the
URI is called a second time I would send the cached headers instead of
grabbing them again, to eliminate the network latency. But I still have
an issue: how am I going to know if the website sends HTTP/302 (because
some websites send HTTP/302 for the same requested file name) if I am
not grabbing the headers again in a HIT situation, just to improve the
latency? A second issue is saving the headers of CDNs.



>>
>> 2) Do you also use mime.conf to send the Content-Type to the browser
>> in the case of FTP/HTTP, or only FTP?
>
> Only FTP and Gopher *if* Squid is translating from the native FTP/Gopher
> connection to HTTP. HTTP and protocols relayed using HTTP message format
> are expected to supply the correct header.
>
>>
>> 3) Does Squid compare the length of the local cached copy with the
>> remote file when you already have the object, or do you use
>> refresh_pattern?
>
> Content-Length is a declaration of how many payload bytes are following
> the response headers. It has no relation to the server's object except in
> the special case where the entire object is being delivered as payload
> without any encoding.
>
>

I am only caching objects that have a "Content-Length" header with a
size greater than 0, but I have noticed that there are some files, like
XML, CSS and JS, which I believe I should save too. Do you think I must
follow the If-Modified-Since header to see if there is a fresh copy?


>>
>> 4) What happens if the user modifies a refresh_pattern to cache an
>> object, for example .xml, which does not have a [Content-Length] header?

Re: [squid-users] Squid with PHP & Apache

2013-11-26 Thread Eliezer Croitoru

Hey Ghassan,

Moving from PHP to C++ is a nice idea.
I do not know the size of the cache or its limits, but here are a couple
of things to consider while implementing the cache:

* clients latency
* server overload
* total cost
* efficiency of the cache

Bandwidth can cost lots of money in some cases, and some are willing to
pay for it.
Youtube by itself is a beast, since the number of visits per video might
not be worth all the effort that is being invested in only one video
file\chunk.


Specifically on Youtube you need to grab the response headers and in
some cases even filter a couple of them.
If you are caching and you are 99.5% sure that this "chunk" or "file" is
OK as it is, then as an object the headers can be considered a side
effect, but in some cases they are important.
A compromise between serving response headers from a file and "from
source" is: if the headers "file" or container is deleted, fetch new
ones, and if the expiration headers are "out-of-date", fetch a new
headers\object.


The main issue with 302 is the concept behind it.
I have seen that in the past 302 was used in order to give the upstream
proxy\CDN node enough time to fetch more data, but in some cases it was
an honest redirection towards the best origin server.


In a case where you know a site uses 302 responses, handle them per
site rather than in a global way.


The Content-Type is used from the origin server headers, since this is
probably what the client application expects.
On a web server the Content-Type can be decided by the file extension,
but this is not how Squid handles HTTP requests at all.


Squid's algorithms are pretty simple, considering the basic "shape" of
the object from the headers.


It is indeed an overhead to fetch a couple of headers from the web, and
there are some cases in which it can be avoided, but a re-validation of
the integrity of the object\file is kind of important.


Back to the beginning of the email:
if you do "know" that the object as it is now will not be changed, for
example as the owner of the web service, you can even serve the client
"stale" content.


There is no force in the world that stops you from doing that.

I can say that, for example for Youtube, I was thinking about using
another approach which would "rank" videos and consider removing
videos that were only watched once or twice in two weeks (depending on
the size of the storage and the load).


If you do have a strong server that can run PHP, you can try taking
Squid with StoreID for a spin; it can help you use only Squid for
Youtube video caching.


The only thing you will need to take care of is the 302 responses, with
an ICAP service for example.


I do know how tempting it is to use PHP, and in many cases it can be
better for a network to use another solution than Squid alone.


I do not know if you have seen this article:
http://wiki.squid-cache.org/ConfigExamples/DynamicContent/Coordinator

The article shows a couple of aspects of Youtube caching.

There was some PHP code at:
http://code.google.com/p/yt-cache/

which I saw a long time ago (2011-12).

StoreID is in the 3.4 branch of Squid and is still at the beta stage:
http://wiki.squid-cache.org/Features/StoreID

The StoreID code by itself is very well tested and I am using it on a
daily basis, without even once having to restart\reload my local server
for a very long time.
I have not yet received reports in my email about a very big (clustered)
production environment.


The basic idea of StoreID is to take the existing internals of
Squid and to "unleash" them in a way that they can be exploited\used by
an external helper.


StoreID is not here to replace PHP or any other method that might
fit a network; it is there to let the admin see the power of Squid
caching even in this "dead-end" case which requires acrobatics.


You can just try it in a small testing environment and see if
it fits you.


One of the benefits that Apache+PHP has is the "threading", which allows
one service such as Apache to utilize as much horsepower as the machine
has as "metal".
Since Squid is already there, the whole internal traffic between
Apache and Squid can be "spared" by using StoreID.


Note that fetching *only* the headers from the origin server can still
help you decide whether you want to fetch the whole object from it.
Fetching a full header set, which will not exceed 1KB, is worthwhile
even for a 200KB file in many cases.
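
For example (a sketch using curl; the cached length would come from your
own store):

<?php
// Sketch: a HEAD request costs well under 1KB and tells us whether
// refetching a (say) 200KB object is worth it.
function origin_length($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // headers only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    $len = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
    curl_close($ch);
    return $len;  // -1 when the origin declared no Content-Length
}

// Usage idea: refetch only when the declared length differs:
// if (origin_length($url) != filesize($cached)) { /* refetch */ }
?>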


I have tried not to miss anything, but I do not want to write a whole
scroll about it yet, so if there is more interest I will add more later.


Regards,
Eliezer

On 25/11/13 23:13, Ghassan Gharabli wrote:

  Hi,

I have built a PHP script to cache HTTP 1.x 206 Partial Content (like
"WindowsUpdates") and to allow seeking through Youtube and many other websites.

I am willing to move from PHP to C++ hopefully after a while.

The script is almost finished, but I have several questions. I have no
idea if I should always grab the HTTP response headers and send them
back to the browsers.

Re: [squid-users] Squid with PHP & Apache

2013-11-27 Thread Amos Jeffries
On 27/11/2013 5:30 p.m., Ghassan Gharabli wrote:
> On Tue, Nov 26, 2013 at 5:30 AM, Amos Jeffries wrote:
>> On 26/11/2013 10:13 a.m., Ghassan Gharabli wrote:
>>>  Hi,
>>>
>>> I have built a PHP script to cache HTTP 1.x 206 Partial Content (like
>>> "WindowsUpdates") and to allow seeking through Youtube and many other websites.
>>>
>>
>> Ah. So you have written your own HTTP caching proxy in PHP. Well done.
>> Did you read RFC 2616 several times? Your script is expected to obey
>> all the MUST conditions and clauses in there discussing "proxy" or "cache".
>>
> 
> Yes, I have read it and I will read it again, but the reason I am
> building such a script is that internet here in Lebanon is really
> expensive and scarce.
> 
> As you know, Youtube sends dynamic chunks for each video. For
> example, if you watch a video on Youtube more than 10 times, then
> Squid fills up the cache with more than 90 chunks per video; that is
> why allowing seeking at any position of the video using my script
> would save me the headache.
> 

Youtube is a special case. They do not strictly use Range requests for
the video seeking. If you are getting that, lucky you.
They are also multiplexing videos via multiple URLs.


>>
>> NOTE: the easy way to do this is to upgrade your Squid to the current
>> series and use ACLs on the range_offset_limit directive. That way Squid
>> will convert Range requests to normal fetch requests and cache the
>> object before sending the requested pieces of it back to the client.
>> http://www.squid-cache.org/Doc/config/range_offset_limit/
>>
>>
> 
> I have successfully supported HTTP/206 when the object is cached, and my
> target is to enable Range headers, as I can see that iPhones or Google
> Chrome check whether the server sends an Accept-Ranges: bytes header and
> then request bytes=x-y or multiple ranges like bytes=x-y,x-y.
> 

Yes, that is how Range requests and responses work.

What I meant was that Squid already contains a feature to selectively
cause the entire object to be cached so that it can generate the 206
responses for clients.


>>> I am willing to move from PHP to C++ hopefully after a while.
>>>
>>> The script is almost finished, but I have several questions. I have no
>>> idea if I should always grab the HTTP response headers and send them
>>> back to the browsers.
>>
>> The response headers you get when receiving the object are metadata
>> describing that object AND the transaction used to fetch it AND the
>> network conditions/pathway used to fetch it. The cache's job is to store
>> those along with the object itself and deliver only the relevant headers
>> when delivering a HIT.
>>
>>>
>>> 1) Does Squid still grab the HTTP response headers even if the object
>>> is already in cache, or does Squid already have a cached copy of the
>>> HTTP response headers? If Squid caches HTTP response headers, then how
>>> do you deal with HTTP code 302 if the object is already cached? I am
>>> asking because I have seen that many websites use the same extensions,
>>> such as .FLV, together with a Location header.
>>
>> Yes. All proxies on the path are expected to relay the end-to-end
>> headers, drop the hop-by-hop headers, and MUST update/generate the
>> feature negotiation and state information headers to match their
>> capabilities in each direction.
>>
>>
> 
>> Do you mean by "Yes" that the HTTP response headers are grabbed even if
>> the object is already in cache, so network latency is always added in
>> both MISS and HIT situations?

No. I mean the headers received along with the object need to be stored
with it and sent on HITs.
I see many people thinking they can just store the object by itself, the
same as a web server stores it. But that way loses the vital header
information.

> I have tested Squid and I
> have noticed that reading HIT objects from Squid takes about 0.x ms,
> so I believe objects are always served offline until expiry occurs. Right?
> 
> Until now I have been using $http_response_header, as it is the fastest
> method by far, but I still have an issue with latency: for each request
> the function takes about 0.30 s, which is really high, even though my
> network latency is 100~150 ms. That is why I thought that I could
> grab the HTTP response headers the first time and store them, so if the
> URI is called a second time I would send the cached headers instead of
> grabbing them again

This is the way you MUST do it, to retain Last-Modified, Age, Date, ETag
and other critical headers. Network latency reduction is just a useful
side effect.
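
Those stored headers are also what make cheap revalidation possible
later. A sketch of the conditional request they enable (the helper shape
here is invented):

<?php
// Sketch: revalidate a cached copy using its stored validators.
// A 304 reply means the copy is still good; a 200 carries a fresh body.
function revalidate($url, $lastModified, $etag) {
    $cond = array();
    if ($lastModified) $cond[] = "If-Modified-Since: $lastModified";
    if ($etag)         $cond[] = "If-None-Match: $etag";
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $cond);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return ($code == 304) ? null : $body;  // null: keep the cached copy
}
?>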

> , to eliminate
> the network latency. But I still have an issue: how am I going to
> know if the website sends HTTP/302 (because some websites send
> HTTP/302 for the same requested file name) if I am not grabbing the
> headers again in a HIT situation, just to improve the latency? A second
> issue is saving the headers of CDNs.


In HTTP the 302 response is an "object" to be cached the same as a 200
when it contains Cache-Control headers permitting it.
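
A sketch of what that means for a script like yours (storage format
invented): cache the redirect as its own entry with an expiry, and replay
it on a HIT without asking the origin again.

<?php
// Sketch: a 302 is cached like any object, with its own lifetime.
function store_redirect($key, $location, $ttl) {
    $entry = array('location' => $location, 'expires' => time() + $ttl);
    file_put_contents("cache/$key.redir", serialize($entry));
}

function serve_redirect($key) {
    $file = "cache/$key.redir";
    if (!is_file($file)) return false;
    $entry = unserialize(file_get_contents($file));
    if ($entry['expires'] < time()) return false;  // stale: revalidate
    header('HTTP/1.1 302 Found');
    header('Location: ' . $entry['location']);
    return true;
}
?>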

Re: [squid-users] Squid with PHP & Apache

2013-11-28 Thread Ghassan Gharabli
On Wed, Nov 27, 2013 at 7:44 AM, Eliezer Croitoru  wrote:
> Hey Ghassan,
>
> Moving from PHP to C++ is a nice idea.
> I do not know the size of the cache or its limits, but here are a couple
> of things to consider while implementing the cache:
> * clients latency
> * server overload
> * total cost
> * efficiency of the cache
>
> Bandwidth can cost lots of money in some cases, and some are willing to
> pay for it.
> Youtube by itself is a beast, since the number of visits per video might
> not be worth all the effort that is being invested in only one video
> file\chunk.
>
> Specifically on Youtube you need to grab the response headers and in
> some cases even filter a couple of them.
> If you are caching and you are 99.5% sure that this "chunk" or "file" is
> OK as it is, then as an object the headers can be considered a side
> effect, but in some cases they are important.
> A compromise between serving response headers from a file and "from
> source" is: if the headers "file" or container is deleted, fetch new
> ones, and if the expiration headers are "out-of-date", fetch a new
> headers\object.
>

Actually, this is how I do it...

Thanks to Amos again; now I am able to save the response headers with the
object in the case of a static URL.

Youtube sends dynamic chunks whether the HTML5 player or the Flash player
is used. A first function checks if the mime argument is set, and then I
send different headers to the browser to enable HTML5 playback (that is
what I was investigating, and it is working as I wanted), which should be
something like these headers:

header("Access-Control-Allow-Origin: http://www.youtube.com";);
header("Access-Control-Allow-Credentials: true");
header("Timing-Allow-Origin: http://www.youtube.com";);

And for normal playback, such as the Flash player, I send normal
headers. As a test I am able to disable HTML5 by removing the
user-agent, thus forcing Youtube to always use the Flash player. The
HTML5 player doesn't like latency.

Whatever chunk size Youtube asks for, the script seeks and sends a chunk
from our locally saved video. Note that I don't save chunks; I only
serve/stream chunks, which I am very happy with, because if I were going
to save chunks then I wouldn't always be hitting the cache, and I am
very sure that some videos aren't cacheable.

Regarding other sites with FLV or MP4 videos: I also allow seeking in
any video, even if the video isn't fully loaded. I already tried to
cache videos using Squid and a Perl rewriter script, but if you try to
seek to any position of the video that has not been loaded yet, the
video starts again from the beginning.

I only follow arguments like Filename.FLV?Start= (start offset) and others.



> The main issue with 302 is the concept behind it.
> I have seen that in the past 302 was used in order to give the upstream
> proxy\CDN node enough time to fetch more data, but in some cases it was
> an honest redirection towards the best origin server.
>
> In a case where you know a site uses 302 responses, handle them per
> site rather than in a global way.
>
> The Content-Type is used from the origin server headers, since this is
> probably what the client application expects.
> On a web server the Content-Type can be decided by the file extension,
> but this is not how Squid handles HTTP requests at all.
>
> Squid's algorithms are pretty simple, considering the basic "shape" of
> the object from the headers.
>
> It is indeed an overhead to fetch a couple of headers from the web, and
> there are some cases in which it can be avoided, but a re-validation of
> the integrity of the object\file is kind of important.
>
> Back to the beginning of the email:
> if you do "know" that the object as it is now will not be changed, for
> example as the owner of the web service, you can even serve the client
> "stale" content.
>
> There is no force in the world that stops you from doing that.
>

Yes, you are right. I have optimized the script to have better
latency: it was between 0.20 s ~ 0.30 s before saving the response
headers, and after saving the response headers the execution time came
down to 0 seconds. I am going to look at the values in milliseconds and
optimize again. But this has progressed quite well.

I am only saving static response headers, as I was wondering how Squid
deals with dynamic URLs. If the URL changes every time you refresh
the page, then we would save headers each time, which is not
such a good idea, so I made a function to detect CDN content using
regexes, and based on the CDN content I then save the object file and
headers using the same method (see the sketch below).
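
Roughly like this (a simplified sketch; the patterns here are only
examples, not the real list):

<?php
// Sketch: reduce rotating CDN URLs to one stable cache key,
// similar in spirit to what StoreID does.
function cache_key($url) {
    // e.g. http://r4---sn-xyz.googlevideo.com/videoplayback?id=ABC&range=...
    if (preg_match('#^https?://[^/]+\.googlevideo\.com/videoplayback.*[?&]id=([^&]+)#',
                   $url, $m)) {
        return 'youtube/' . $m[1];
    }
    // Default: host plus path, ignoring the query string.
    $p = parse_url($url);
    return $p['host'] . $p['path'];
}
?>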

> I can say that, for example for Youtube, I was thinking about using
> another approach which would "rank" videos and consider removing
> videos that were only watched once or twice in two weeks (depending on
> the size of the storage and the load).
>
> If you do have a strong server that can run PHP, you can try taking
> Squid with StoreID for a spin; it can help you use only Squid for
> Youtube video caching.

Re: [squid-users] Squid with PHP & Apache

2013-11-28 Thread Ghassan Gharabli
On Wed, Nov 27, 2013 at 1:28 PM, Amos Jeffries  wrote:
> On 27/11/2013 5:30 p.m., Ghassan Gharabli wrote:
>> On Tue, Nov 26, 2013 at 5:30 AM, Amos Jeffries wrote:
>>> On 26/11/2013 10:13 a.m., Ghassan Gharabli wrote:
  Hi,

 I have built a PHP script to cache HTTP 1.x 206 Partial Content (like
 "WindowsUpdates") and to allow seeking through Youtube and many other websites.

>>>
>>> Ah. So you have written your own HTTP caching proxy in PHP. Well done.
>>> Did you read RFC 2616 several times? Your script is expected to obey
>>> all the MUST conditions and clauses in there discussing "proxy" or "cache".
>>>
>>
>> Yes, I have read it and I will read it again, but the reason I am
>> building such a script is that internet here in Lebanon is really
>> expensive and scarce.
>>
>> As you know, Youtube sends dynamic chunks for each video. For
>> example, if you watch a video on Youtube more than 10 times, then
>> Squid fills up the cache with more than 90 chunks per video; that is
>> why allowing seeking at any position of the video using my script
>> would save me the headache.
>>
>
> Youtube is a special case. They do not strictly use Range requests for
> the video seeking. If you are getting that, lucky you.
> They are also multiplexing videos via multiple URLs.
>

Hi Amos,

The Youtube application mostly uses Range requests on iPhone or
Android, but it uses the range argument on browsers if and only if the
browser has Flash player 11 installed. Youtube sends the full length if
Flash player 10 is installed. That was my investigation regarding Youtube.

>
>>>
>>> NOTE: the easy way to do this is to upgrade your Squid to the current
>>> series and use ACLs on the range_offset_limit directive. That way Squid
>>> will convert Range requests to normal fetch requests and cache the
>>> object before sending the requested pieces of it back to the client.
>>> http://www.squid-cache.org/Doc/config/range_offset_limit/
>>>
>>>
>>
>> I have successfully supported HTTP/206 when the object is cached, and my
>> target is to enable Range headers, as I can see that iPhones or Google
>> Chrome check whether the server sends an Accept-Ranges: bytes header and
>> then request bytes=x-y or multiple ranges like bytes=x-y,x-y.
>>
>
> Yes, that is how Range requests and responses work.
>
> What I meant was that Squid already contains a feature to selectively
> cause the entire object to be cached so that it can generate the 206
> responses for clients.
>
>
 I am willing to move from PHP to C++ hopefully after a while.

 The script is almost finished, but I have several questions. I have no
 idea if I should always grab the HTTP response headers and send them
 back to the browsers.
>>>
>>> The response headers you get when receiving the object are metadata
>>> describing that object AND the transaction used to fetch it AND the
>>> network conditions/pathway used to fetch it. The cache's job is to store
>>> those along with the object itself and deliver only the relevant headers
>>> when delivering a HIT.
>>>

 1) Does Squid still grab the HTTP response headers even if the object
 is already in cache, or does Squid already have a cached copy of the
 HTTP response headers? If Squid caches HTTP response headers, then how
 do you deal with HTTP code 302 if the object is already cached? I am
 asking because I have seen that many websites use the same extensions,
 such as .FLV, together with a Location header.
>>>
>>> Yes. All proxies on the path are expected to relay the end-to-end
>>> headers, drop the hop-by-hop headers, and MUST update/generate the
>>> feature negotiation and state information headers to match their
>>> capabilities in each direction.
>>>
>>>
>>
>> Do you mean by "Yes" that the HTTP response headers are grabbed even if
>> the object is already in cache, so network latency is always added in
>> both MISS and HIT situations?
>
> No. I mean the headers received along with the object need to be stored
> with it and sent on HITs.
> I see many people thinking they can just store the object by itself, the
> same as a web server stores it. But that way loses the vital header
> information.
>

Do you mean that within one call you store the object and the headers in
the same file, and then extract the header information when it is
called? I only call the website once and retrieve the headers/objects,
then store the object as is, and then store the headers in the same
directory next to the object.

I am also categorizing the cache, as if the URL were:
http://www.example.com/1/2/3/filename.ext

For example, I generate directories based on the URL (after CDN detection), such as:

Cache-Folder --> www.example.com --> 1 --> 2 --> 3 --> filename.ext

Do you agree that this is a good idea? (A sketch of the mapping follows.)
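
In PHP that mapping is straightforward (a sketch; the sanitisation here
is minimal):

<?php
// Sketch: map a URL to the directory layout described above,
// with crude protection against ".." path tricks.
function cache_path($base, $url) {
    $p = parse_url($url);
    $path = str_replace('..', '', $p['path']);
    $dir = $base . '/' . $p['host'] . dirname($path);
    if (!is_dir($dir)) mkdir($dir, 0755, true);
    return $dir . '/' . basename($path);
}
// cache_path('Cache-Folder', 'http://www.example.com/1/2/3/filename.ext')
//   => 'Cache-Folder/www.example.com/1/2/3/filename.ext'
?>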

Do you use the swap.state file as an index, or as the file that records
where objects/headers are located in the cache?

>> I have tested Squid and I
>> have noticed that reading HIT objects from Squid takes about 0.x ms,
>> so I believe objects are always served offline until expiry occurs. Right?