Re: Store_url_rewrite for squid 3+
On 9/24/2012 11:46 AM, Amos Jeffries wrote:
> Haven't heard anything about this in a while. How is this project going?

Well, it's the holidays here now, so I took a short break to breathe. One holiday has just ended and two or three more are on the way, which leaves little time. There is also some family stuff coming up, so I hope to sit down with this some more this week.

I made some progress while refactoring and re-reviewing the code. I found one problem in the handling of the helper: it leaves the helper without any place (STDOUT) to write to, so the helper runs into trouble and crashes.

I am now working through the points at which each original_url is requested, making sure each one is the right point to replace it with store_url. I managed to predict which keys Squid should look up (storeget) and create at a couple of places in the code, but I am still missing something: although I managed to change the digest, I get a lot of question marks in store.log, which makes me think that at some point the wrong variable is being referenced.

I also had a bit of time to look over the Link headers and metalink files, since I remember something about them from the past.

Eliezer
Re: Store_url_rewrite for squid 3+
On 10.09.2012 04:56, Eliezer Croitoru wrote:
> On 09/06/2012 03:58 AM, Amos Jeffries wrote:
>> I don't think there is anything which needs new cache_cf.cc code. The parsing side of things is identical for url_rewrite_*. The different defaults and locations are all coded in cf.data.pre
> ...
> yes indeed, but the actual effect comes from the code in cache_cf.cc. An example is:
>
>     if (Config.Program.redirect) {
>         if (Config.redirectChildren.n_max < 1) {
>             Config.redirectChildren.n_max = 0;
>             wordlistDestroy(&Config.Program.redirect);
>         }
>     }
>
> It's specific to the redirect program. I have tried to use the helper, but it seems that without the cache_cf.cc code it will assume the default number of helpers is 0/20 and will not start any, which is weird...

Ah. Okay, I had forgotten that hack. Yes, it is needed for store-url as well. It covers the case where a program name is configured but there are no children to be started, and the reconfigure case where a program used to be configured but has since been removed from the config.

0/20 with none started straight away is the default. One will be started on the first request through the proxy that needs the helper.

Amos
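A minimal sketch of how the same guard might be applied to a store-url helper. It is self-contained and does not use Squid's real types: `SquidConfigSketch`, `storeUrl`, `storeUrlChildren`, and `normalizeHelperConfig` are hypothetical names invented for illustration, and a `std::vector` stands in for Squid's wordlist.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for Squid's configuration structures.
struct HelperChildConfig {
    int n_max = 20;     // maximum number of helper children
    int n_startup = 0;  // children started immediately (0 = start on demand)
};

struct ProgramConfig {
    std::vector<std::string> redirect;  // url_rewrite helper command line
    std::vector<std::string> storeUrl;  // hypothetical store-url helper command line
};

struct SquidConfigSketch {
    ProgramConfig Program;
    HelperChildConfig redirectChildren;
    HelperChildConfig storeUrlChildren;  // hypothetical
};

// If a program is configured but no children may be started, forget the
// program so that later code does not try to use a helper that cannot run.
static void disableHelperIfNoChildren(std::vector<std::string> &program,
                                      HelperChildConfig &children)
{
    if (!program.empty() && children.n_max < 1) {
        children.n_max = 0;
        program.clear();  // plays the role of wordlistDestroy()
    }
}

// Apply the same guard to both helpers after the configuration is parsed.
void normalizeHelperConfig(SquidConfigSketch &cfg)
{
    disableHelperIfNoChildren(cfg.Program.redirect, cfg.redirectChildren);
    disableHelperIfNoChildren(cfg.Program.storeUrl, cfg.storeUrlChildren);
}
```

Factoring the check into one function is only a design choice for the sketch; the quoted cache_cf.cc code performs it inline for the redirect program.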
Re: Store_url_rewrite for squid 3+
On 09/06/2012 03:58 AM, Amos Jeffries wrote:
> I don't think there is anything which needs new cache_cf.cc code. The parsing side of things is identical for url_rewrite_*. The different defaults and locations are all coded in cf.data.pre
...
Yes indeed, but the actual effect comes from the code in cache_cf.cc. An example is:

    if (Config.Program.redirect) {
        if (Config.redirectChildren.n_max < 1) {
            Config.redirectChildren.n_max = 0;
            wordlistDestroy(&Config.Program.redirect);
        }
    }

It's specific to the redirect program. I have tried to use the helper, but it seems that without the cache_cf.cc code it will assume the default number of helpers is 0/20 and will not start any, which is weird...

Eliezer
Re: Store_url_rewrite for squid 3+
On 9/09/2012 4:19 a.m., Alex Rousskov wrote:
> On 09/07/2012 09:13 PM, Amos Jeffries wrote:
>> Also, any revalidation requests done later must be done on the original request URL. Not the stored URL nor the potentially different current client request URL.
> This sounds like a very important point that could justify storing the original request URL -- exactly the kind of information I was asking for, thank you!
>
> Why do we have to use the original request URL for revalidation instead of the current one? We use current, not original request headers (we do not store the original ones), right? Is it better to combine current headers with the original URL than it is to use the current URL with current headers?

Revalidation requires very precise variant targeting to ensure that updated headers received from the revalidation do not corrupt the cached object copy. Regardless of what people may think, the YouTube URLs and other sites being de-duplicated with store-url *are* actually pointing at different files on different servers, with potentially different hashes or encoding details. This is particularly true where the HD and standard-definition variants of a video are store-url mapped to the same cache object. The URL and ETag are both critical details to preserve here, along with anything else that is used for specific Squid->upstream identification of the resource being revalidated.

> The store URL rewriting feature essentially assumes that any request URL that maps to URL X is equivalent and, hence, any response to any request URL that maps to URL X is equivalent. Why not use that assumption when revalidating? If we receive a 304, we can keep using the stored content. If we receive new response content, should not we assume that the stored content [under the original URL] is stale as well?

"Assumes" is the right word. They are equivalent only in the proxy administrator's thoughts, which may be wrong or right. We have to let them be wrong sometimes and cause display problems for clients, but we should not let them cause local cache corruption through revalidation updating cached objects' metadata from incorrect variant sources.

> Again, I am not trying to say that using original URL for revalidation is wrong -- I am just trying to understand what the design constraints are.

We could simply re-fetch and store a new copy based on the new client request details. Revalidation is an optimization, but it requires correct identification of the particular resource and variant we have in cache. That goes for anything in cache; store-url is just tricky in that the client-side request can't give us accurate details for the server side.

> Thank you, Alex.
> P.S. The above still does not justify storing the rewritten URL(s), of course.

No. I think those are only useful for key purposes and can be discarded once the object in cache is located for a HIT, or stored for a MISS.

Amos
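The constraint Amos describes can be sketched as follows, assuming a cache entry that remembers the URL it was actually fetched from alongside its validator. All type and function names here (`CachedEntrySketch`, `RevalidationRequest`, `buildRevalidation`) are hypothetical and are not Squid APIs; the sketch only illustrates which URL goes where.

```cpp
#include <string>

// Hypothetical view of what must be kept for a cached object.
struct CachedEntrySketch {
    std::string storeUrl;     // rewritten URL; used only to compute the cache key
    std::string originalUrl;  // URL the cached response was actually fetched from
    std::string etag;         // validator returned by that particular server
};

// Hypothetical description of a conditional request sent upstream.
struct RevalidationRequest {
    std::string url;          // request target
    std::string ifNoneMatch;  // value for the If-None-Match header
};

// Revalidate against the exact resource that produced the cached copy.
// The current client URL is deliberately ignored: it may map to the same
// store URL while pointing at a different file on a different server.
RevalidationRequest buildRevalidation(const CachedEntrySketch &entry,
                                      const std::string &currentClientUrl)
{
    (void)currentClientUrl;  // intentionally unused
    return RevalidationRequest{entry.originalUrl, entry.etag};
}
```

Under this sketch the store URL never leaves the cache lookup path, which matches the remark that it is only useful for key purposes.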
Re: Store_url_rewrite for squid 3+
On 09/08/2012 01:36 AM, Alex Rousskov wrote:
> On 09/07/2012 01:32 PM, Eliezer Croitoru wrote:
>> On 09/07/2012 05:10 PM, Alex Rousskov wrote:
>>> I am not sure what "option" you are referring to in the above. The Store::get(key) API I have described is not optional -- it is the primary way of detecting a hit.
>> +1 to use the api. are we talking about the "Store::Root().get" ?
> That is where it starts, but other Store kids implement the get() method as well IIRC. From the caller perspective, you should also look at storeGetPublic*() and similar functions.

Basic review done already.

>>> The URL rewriting must happen before or during Store key calculation.
>> The problem is that if I am rewriting the url, unlike url_rewrite\redirect the connection should be initiated based on the original url and not the rewritten one.
> Sure, but that is not a problem you need to worry about much. Since you are not going to alter the original URL, the connection initiation code will not be affected by your changes (except for some minor adjustments if some data members get renamed or Store::get() API-related calls get adjusted as discussed below).

Got my answers during the review.

>> What I do not know yet is what and where and based on what the request is done.
> For this project, you do not need to know much about connection initiation code. You need to make a copy of the original URL, rewrite that copy using the new helper, and put it in a new HttpRequest data member (for example). This should probably happen (or be initiated from) client_side.cc or client_side_request.cc.

Well, after the review it seems there is no need to copy the request; using the char * storeurl is enough. That makes more sense, since you never touch any other part of the object while you are getting it.

> You then need to make sure that every time Store calculates a store key, the new member (if any) is used instead of the original URL.

This is the original reason I wanted to find out about the initiation code, but I managed to find what I need for now.

> Since the current code is using the original request URL where you want the rewritten/store URL to be used, I recommend _renaming_ all members that keep or extract the original/request URL (e.g., MemObject::url). This will force you to go through the relevant callers and make sure that the right URL is used, depending on the caller context.

Exactly what I was getting into.

> Once you are done, we can review whether the vast majority of changes in the diff are a simple renaming of "url" to "requestUrl". If yes, you can revert that renaming change. If there are many cases where old "url" became "storeUrl" and many cases where it became "requestUrl" then the change should stay. Please note that the above is just a sketch. I am not suggesting any specific new names, and I am using existing class names only as an example.

Will look into it more. As far as I can tell, most of the current uses of MemObject::url are in debug calls.

>> so we need to store the original url and the store_url at least for the period of time the request is being served.
> Yes, we need to keep both URLs around during the transaction lifetime. When I was talking about storing a URL, I was only talking about Store (i.e., storing something in the cache).

One reason I can think of is to make something like mgr:store_url_option possible, which could report data about in-cache objects, such as their URLs. In most cases access.log is what will be used, but if there is a need for it, that is what it would be for. By the way, the purge tool uses the URL part of the metadata to find URLs and purge them in a ufs cache_dir. Since I stored objects in my cache with the rewritten URL as the URL metadata, I could look up cache objects using purge. It lets you find out exactly which object URLs are in the cache at this moment and, in my case, the rewritten ones too.

>> (answer about storing or not the store_url in meta)
> That is a separate question. I cannot answer it, unfortunately. I do not know why Squid2 stored either of those URLs. I know Squid3 does not store the original/request URL.

It does in a ufs cache_dir, as a matter of fact (see the quotation from a cache file below).

>> url_rewrite api gets the http requests as the background to any action so it cannot be used for store_url since any change to the request will lead to the change we are trying to not do exactly.
>>
>> mem_object contains:
>> {...
>> char *url;
>> HttpRequest *request;
>> ...}
>> and the url is being used to make the hash but I still dont know what is being taken from the mem_object to do the request.
> HttpStateData::buildRequestPrefix() builds request headers. It may use mem_object::url. I do not think you should change that code (apart from renaming and Store::Get() API changes discussed elsewhere). You should only be concerned with calls to storeGetPublic*() and such. Make sure the callers do not use mem_object::url without checking whether there is a "rewritten" st
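The idea that the store key should be computed from the rewritten URL when one exists, while the original URL stays untouched for the upstream connection, might look roughly like this. Everything here is an assumption made for illustration: `RequestSketch`, `urlForStoreKey`, and `storeKeySketch` are hypothetical names, and a 64-bit FNV-1a hash stands in for whatever digest Squid really uses for its keys.

```cpp
#include <cstdint>
#include <string>

// Hypothetical request view: both URLs live for the whole transaction.
struct RequestSketch {
    std::string method;
    std::string url;       // original request URL; used to contact the server
    std::string storeUrl;  // rewritten URL from the helper; empty if none
};

// Single place that decides which URL feeds the store key, so callers
// never have to pick between the two themselves.
const std::string &urlForStoreKey(const RequestSketch &req)
{
    return req.storeUrl.empty() ? req.url : req.storeUrl;
}

// Stand-in key function: FNV-1a over "method SP url".
std::uint64_t storeKeySketch(const RequestSketch &req)
{
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    auto mix = [&h](const std::string &s) {
        for (unsigned char c : s) {
            h ^= c;
            h *= 1099511628211ULL;  // FNV-1a 64-bit prime
        }
    };
    mix(req.method);
    mix(" ");
    mix(urlForStoreKey(req));
    return h;
}
```

With this shape, two requests for different mirror URLs that the helper maps to the same store URL produce the same key, while requests without a store URL are keyed on their own URL as before.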
Re: Store_url_rewrite for squid 3+
On 09/07/2012 09:13 PM, Amos Jeffries wrote:
> Also, any revalidation requests done later must be done on the original request URL. Not the stored URL nor the potentially different current client request URL.

This sounds like a very important point that could justify storing the original request URL -- exactly the kind of information I was asking for, thank you!

Why do we have to use the original request URL for revalidation instead of the current one? We use current, not original request headers (we do not store the original ones), right? Is it better to combine current headers with the original URL than it is to use the current URL with current headers?

The store URL rewriting feature essentially assumes that any request URL that maps to URL X is equivalent and, hence, any response to any request URL that maps to URL X is equivalent. Why not use that assumption when revalidating? If we receive a 304, we can keep using the stored content. If we receive new response content, should not we assume that the stored content [under the original URL] is stale as well?

Again, I am not trying to say that using original URL for revalidation is wrong -- I am just trying to understand what the design constraints are.

Thank you,

Alex.
P.S. The above still does not justify storing the rewritten URL(s), of course.
Re: Store_url_rewrite for squid 3+, on what branch to start working?
On 6/09/2012 11:00 p.m., Eliezer Croitoru wrote:
> I had a *small* issue with my storage so I'm forced to start almost from scratch. I previously worked on the 3.2.1 latest stable sources and I am wondering where to start now. Start on head? Stable?
>
> Eliezer

3.HEAD please.

Amos
Re: Store_url_rewrite for squid 3+
On 8/09/2012 7:38 a.m., Eliezer Croitoru wrote:
> On 09/07/2012 05:10 PM, Alex Rousskov wrote:
>> I am not sure what "option" you are referring to in the above. The Store::get(key) API I have described is not optional -- it is the primary way of detecting a hit.
> +1 to use the api. are we talking about the "Store::Root().get" ? I was just getting into it now and it seems to confused me a bit. and I found my answer for couple things about purge and other stuff on the way. as I see if we want to make it work we need to take another logic then then in 2.7 (obviously). I will check it and make sure I can make it stick and not break anything in *store* on the way.
>> The URL rewriting must happen before or during Store key calculation.
> The problem is that if I am rewriting the url, unlike url_rewrite\redirect the connection should be initiated based on the original url and not the rewritten one. What I do not know yet is what and where and based on what the request is done. if it's based on the original request or the url. so we need to store the original url and the store_url at least for the period of time the request is being served.(answer about storing or not the store_url in meta)

Correct. Also, any revalidation requests done later must be done on the original request URL, not the stored URL nor the potentially different current client request URL. Thus we need to store the object with the store-url as its key and preserve the original for server-side use.

> url_rewrite api gets the http requests as the background to any action so it cannot be used for store_url since any change to the request will lead to the change we are trying to not do exactly.
>
> mem_object contains:
> {...
> char *url;
> HttpRequest *request;
> ...}
> and the url is being used to make the hash but I still dont know what is being taken from the mem_object to do the request. when I will know what exact object and exact point in code is being used to fetch the request I think I can be more clever on a way to implement my idea. I had the end of the redirector but I lost it somewhere in couple too much K*lines.
>> STORE_META_STOREURL is unused in Squid3 AFAICT.
> Ok, but it still there and can be used with just 2-3 lines of code. I have tested it and the data is written if needed but is not in use when reading from store.
>> Why do we need to store it?
> I do not know why we need to store it and i think that the only things that should be saved is the full request full reply and some data\meta the being will be used by the replacement policy.
>
> since I dont know how everything works and about all the api in the code after reviewing the 2.7 code related to store_url it's seems like the approach wasn't that bad using the meta data. since couple hours ago I noticed that it was done in order to make things possible and not really to think in a more wide angle about the effect of it.
>
> the main problem I think the STORE_META_STOREURL was maybe used for is if and while rebuilding the cache_dir data to be able not pass the url to the store_url helper. from this point I think it's costs space but when squid rebuilds the cache_dir I still dont now if it's good to pass url into the helper.

It will be costly to process maybe several million objects through the helper on startup/restart.

> And for the question that pops out: if a cache_dir swap.data get's corrupted, what the fate of the cache in the squid3.head ? (I have never had such a bad luck to be able feel what it's like).

Amos
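To make the trade-off concrete, here is a sketch of keeping the store URL as a type-length-value record in per-object metadata, so that a rebuild could read it back instead of sending every object through the helper. The byte layout (1-byte type, 4-byte big-endian length) and the type code `kMetaStoreUrlSketch` are assumptions chosen for the example; they are not Squid's actual swap metadata format or the real value of STORE_META_STOREURL.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical type code for a store-URL record.
const std::uint8_t kMetaStoreUrlSketch = 0x0A;

// Append one record: 1-byte type, 4-byte big-endian length, then the value.
void appendTlv(std::vector<std::uint8_t> &out, std::uint8_t type,
               const std::string &value)
{
    const std::uint32_t len = static_cast<std::uint32_t>(value.size());
    out.push_back(type);
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>((len >> shift) & 0xFF));
    out.insert(out.end(), value.begin(), value.end());
}

// Scan the records for the first one of the given type.
// Returns false if it is absent or the buffer is truncated.
bool findTlv(const std::vector<std::uint8_t> &buf, std::uint8_t type,
             std::string &value)
{
    std::size_t pos = 0;
    while (pos + 5 <= buf.size()) {
        const std::uint8_t t = buf[pos];
        std::uint32_t len = 0;
        for (int i = 1; i <= 4; ++i)
            len = (len << 8) | buf[pos + i];
        pos += 5;
        if (len > buf.size() - pos)
            return false;  // record claims more bytes than are present
        if (t == type) {
            value.assign(reinterpret_cast<const char *>(buf.data() + pos), len);
            return true;
        }
        pos += len;  // skip records of other types
    }
    return false;
}
```

The cost is the extra bytes per object that Eliezer mentions; the benefit, under these assumptions, is that a rebuild can recover the key URL by reading the record rather than by calling the helper once per object.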
Re: Store_url_rewrite for squid 3+
On 09/07/2012 01:32 PM, Eliezer Croitoru wrote:
> On 09/07/2012 05:10 PM, Alex Rousskov wrote:
>> I am not sure what "option" you are referring to in the above. The Store::get(key) API I have described is not optional -- it is the primary way of detecting a hit.
>
> +1 to use the api.
> are we talking about the "Store::Root().get" ?

That is where it starts, but other Store kids implement the get() method as well IIRC. From the caller perspective, you should also look at storeGetPublic*() and similar functions.

>> The URL rewriting must happen before or during Store key calculation.
>
> The problem is that if I am rewriting the url, unlike url_rewrite\redirect the connection should be initiated based on the original url and not the rewritten one.

Sure, but that is not a problem you need to worry about much. Since you are not going to alter the original URL, the connection initiation code will not be affected by your changes (except for some minor adjustments if some data members get renamed or Store::get() API-related calls get adjusted as discussed below).

> What I do not know yet is what and where and based on what the request is done.

For this project, you do not need to know much about connection initiation code. You need to make a copy of the original URL, rewrite that copy using the new helper, and put it in a new HttpRequest data member (for example). This should probably happen (or be initiated from) client_side.cc or client_side_request.cc.

You then need to make sure that every time Store calculates a store key, the new member (if any) is used instead of the original URL.

Since the current code is using the original request URL where you want the rewritten/store URL to be used, I recommend _renaming_ all members that keep or extract the original/request URL (e.g., MemObject::url). This will force you to go through the relevant callers and make sure that the right URL is used, depending on the caller context.

Once you are done, we can review whether the vast majority of changes in the diff are a simple renaming of "url" to "requestUrl". If yes, you can revert that renaming change. If there are many cases where old "url" became "storeUrl" and many cases where it became "requestUrl" then the change should stay.

Please note that the above is just a sketch. I am not suggesting any specific new names, and I am using existing class names only as an example.

> so we need to store the original url and the store_url at least for the period of time the request is being served.

Yes, we need to keep both URLs around during the transaction lifetime. When I was talking about storing a URL, I was only talking about Store (i.e., storing something in the cache).

> (answer about storing or not the store_url in meta)

That is a separate question. I cannot answer it, unfortunately. I do not know why Squid2 stored either of those URLs. I know Squid3 does not store the original/request URL.

> url_rewrite api gets the http requests as the background to any action so it cannot be used for store_url since any change to the request will lead to the change we are trying to not do exactly.
>
> mem_object contains:
> {...
> char *url;
> HttpRequest *request;
> ...}
> and the url is being used to make the hash but I still dont know what is being taken from the mem_object to do the request.

HttpStateData::buildRequestPrefix() builds request headers. It may use mem_object::url. I do not think you should change that code (apart from renaming and Store::Get() API changes discussed elsewhere).

You should only be concerned with calls to storeGetPublic*() and such. Make sure the callers do not use mem_object::url without checking whether there is a "rewritten" store URL as well. You will probably want to change profile of most of those calls so that the caller does not need to worry about picking the right URL to use (the decision would happen inside storeGetPublic() and friends).

>> Why do we need to store it?
> I do not know why we need to store it and i think that the only things that should be saved is the full request full reply and some data\meta the being will be used by the replacement policy.
>
> since I dont know how everything works and about all the api in the code after reviewing the 2.7 code related to store_url it's seems like the approach wasn't that bad using the meta data.

How is that metadata used in Squid2? What is it used for? Nobody is suggesting that Squid2 does something wrong here, of course. We just need to understand _what_ it does so that we can either mimic what it does in Squid3 or do something better.

> since couple hours ago I noticed that it was done in order to make things possible and not really to think in a more wide angle about the effect of it.

What things does it make possible? I hope we will not have to commit some code without understanding its effects.

> the main problem I think the STORE_META_STOREURL was maybe us
Re: Store_url_rewrite for squid 3+
On 09/07/2012 05:10 PM, Alex Rousskov wrote:
> I am not sure what "option" you are referring to in the above. The Store::get(key) API I have described is not optional -- it is the primary way of detecting a hit.

+1 to using the API. Are we talking about "Store::Root().get"? I was just getting into it now and it confused me a bit, though I found answers to a couple of things about purge and other stuff on the way. As I see it, if we want to make this work we need different logic than in 2.7 (obviously). I will check it and make sure I can make it stick without breaking anything in *store* on the way.

> The URL rewriting must happen before or during Store key calculation.

The problem is that if I am rewriting the URL, then unlike url_rewrite\redirect, the connection should be initiated based on the original URL and not the rewritten one. What I do not know yet is where the request is made and what it is based on: the original request or the url. So we need to keep the original URL and the store_url at least for the period of time the request is being served. (That is an answer about whether or not to store the store_url in meta.)

The url_rewrite API gets the HTTP request as the background to any action, so it cannot be used for store_url, since any change to the request would cause exactly the change we are trying to avoid.

mem_object contains:

    {...
    char *url;
    HttpRequest *request;
    ...}

and the url is used to make the hash, but I still don't know what is taken from the mem_object to make the request. Once I know exactly which object and which point in the code are used to fetch the request, I think I can be smarter about how to implement my idea. I had the end of the redirector but lost it somewhere in a few thousand lines too many.

> STORE_META_STOREURL is unused in Squid3 AFAICT.

OK, but it is still there and can be used with just 2-3 lines of code. I have tested it: the data is written if needed, but is not used when reading from the store.

> Why do we need to store it?

I do not know why we need to store it. I think the only things that should be saved are the full request, the full reply, and some data/meta that will be used by the replacement policy.

Since I don't know how everything works, or all of the APIs in the code, after reviewing the 2.7 code related to store_url it seems the metadata approach wasn't that bad. A couple of hours ago I noticed that it was done in order to make things possible, without really thinking from a wider angle about its effects.

I think the main thing STORE_META_STOREURL may have been used for is to avoid passing the URL to the store_url helper while rebuilding the cache_dir data. From that point of view it costs space, but I still don't know whether it is good to pass URLs into the helper when Squid rebuilds the cache_dir.

And the question that pops up: if a cache_dir swap.data gets corrupted, what is the fate of the cache in squid3.head? (I have never had such bad luck, so I don't know what it's like.)

Eliezer
Re: Store_url_rewrite for squid 3+
On 09/07/2012 06:20 AM, Eliezer Croitoru wrote:
> On 09/06/2012 09:04 PM, Alex Rousskov wrote:
>> The biggest question for me is why Squid2 code was storing multiple URLs with the cached object (if it was).
>> Why cannot Store just work with the [rewritten] URL given to it and ignore the fact that some [store] URLs originated from some other [real] URLs?
> I found the answer in the documentation at: 03_major_componenets:
> "The Storage Manager is the glue between client and server sides. Every object saved in the cache is allocated a StoreEntry structure. While the object is being accessed, it also has a MemObject structure."

I do not understand how the above quote answers my question. Moreover, the above text is stale -- Squid3 (Rock Store and shared memory caches specifically) does not allocate StoreEntry for every object saved in the cache.

Furthermore, as far as I can tell, current Squid3 code does not store the request URL at all (please correct me if I am wrong). This may imply that there is no [pressing] need to store the rewritten URL either (or at least we should have a clear understanding of why it needs to be stored, and storing it may be viewed as a separate project/improvement).

> so I think the duplication was to preserve this structure and prevent major api changes.
> as far I get into the depth of the code I see how it's reasonable to make this decision if you compare the loss and benefits .
> loosing a few single(sometimes it's not) bytes of GB is cheap compared to depth development time.

I do not know what benefits you are talking about. Why do we need to store the rewritten URL? In other words, what will break if we do not store the rewritten URL? Again, I am not saying that storing URLs is wrong -- I just want to understand why we need to do that.

One possible use is consistency checks. If we store the request URL, the hit serving code can double check that the current request URL is the same as the stored request URL. However, those checks do not explain why we need to store two URLs (rewritten and original) and they should be viewed as a separate improvement/project outside of your work scope.

>> Store can get a list of cached objects by iterating through store_table and other store indexes. In general, you should not assume that it is possible to get a list of all cached URLs in any efficient/practical fashion because not all in-RAM indexes store URLs. It is only possible to get an answer to the following question:
>>
>> * Is a response with cache key K likely to be in Squid cache now?
>>
>> Where cache key is a hash computed over the request method, request URI, and other properties.
> this is what I remember.
> so for the new development there should be an option to do that but also do the rewriting of the url before checking that or more practically to change the cache key calculations if there is a store_url present for the request.

I am not sure what "option" you are referring to in the above. The Store::get(key) API I have described is not optional -- it is the primary way of detecting a hit. The URL rewriting must happen before or during Store key calculation.

> I do have an approach that I want to check and Its' based on finding all\specific mem_object operations and object creation in the code.
> now I'm struggling to juggle find and mark the points.
>
> answering the double url thing
> In 2.7 the feature used the storeswap_meta to add the storeurl string and it's simplified things.

Yes, if we need to store URL(s), we should use the StoreMeta API. Why do we need to store URL(s)?

> I noticed "STORE_META_STOREURL" still in the TLV headers (probably to support older cache object structure versions) so i will try to use it.. for some testing purposes.

STORE_META_STOREURL is unused in Squid3 AFAICT.

> I have tested and added basic tests to make sure that the storeurl is being written and used and it's not hearts the any current cache objects or stuff.
> any thoughts?

Why do we need to store it?

Thank you,

Alex.
Re: Store_url_rewrite for squid 3+
On 09/06/2012 09:04 PM, Alex Rousskov wrote:
> On 09/05/2012 06:58 PM, Amos Jeffries wrote:
> FWIW, I have not reviewed the store_url_rewrite code in Squid2 so I cannot answer the questions related to how it was done. I can suggest ways of doing this in Squid3, but since somebody already investigated all the alternatives, it would be better to hear the summary of the Squid2 implementation (as it relates to Store) before diving into Squid3 development.

The Squid2 approach makes more sense to me by the minute.

> The biggest question for me is why Squid2 code was storing multiple URLs with the cached object (if it was). Why cannot Store just work with the [rewritten] URL given to it and ignore the fact that some [store] URLs originated from some other [real] URLs?

I found the answer in the documentation at 03_major_componenets: "The Storage Manager is the glue between client and server sides. Every object saved in the cache is allocated a StoreEntry structure. While the object is being accessed, it also has a MemObject structure." So I think the duplication was there to preserve this structure and prevent major API changes. The deeper I get into the code, the more reasonable this decision looks once you compare the losses and benefits. Losing a few bytes (sometimes more than a few) out of gigabytes is cheap compared to in-depth development time.

Some calculations: a YouTube URL is about 600 ASCII characters. It used to be one object per video, but now it's about 1.7MB per chunk. So a loss of about 600 bytes of space (am I right?) compares to a gain of 20 million bytes? Well, at 1.7MB per chunk the ratio is different, but we are still talking about a loss of about 96Kb for a video file.

> Are we trying to support going from a store_url_rewrite config back to regular config without losing some of the cached objects?

Since we are talking about an attempt to solve de-duplication of static objects, I think that's not our case. Even for more dynamic content the mapping is always many->1 to achieve better caching, and it makes no sense to roll back to a dynamic URL acquired per unique IP+COOKIE+TIME+other stuff. It would make sense if there were a plan for rebuilding the cached objects and the storeurl in the store, but that seems too intensive a task to treat as a reasonable regular use case. If someone has this kind of case, they should migrate from a generic caching proxy to a proxy customized for the specific task. (It still doesn't seem like a reasonable request for anything, unless you want to be the CIA\KGB\FBI and collect data.)

>> ? question how cachemgr gets the list of urls in memory?
> You might be confusing "cache manager" (the thing that responds to "squidclient mgr:info" requests) with Store. Also, you should not think in terms of memory (RAM) because some objects are only cached on disk. It is best to think of Store as a collection of stored objects, ignoring their particular location (memory or disk) to the extent possible.

No, no, I was talking about mgr:info. It's not directly related, but I had a question about it in the past and will leave that for some time in the future. Related to store_url, though: something prevented mgr:what-ever-gives-data-on-cached-objects from showing the store_url objects. After understanding how it was coded, it's pretty obvious how and why this happened, and it can be prevented in new development by structuring the data correctly.

> Store can get a list of cached objects by iterating through store_table and other store indexes. In general, you should not assume that it is possible to get a list of all cached URLs in any efficient/practical fashion because not all in-RAM indexes store URLs. It is only possible to get an answer to the following question:
> * Is a response with cache key K likely to be in Squid cache now?
> Where cache key is a hash computed over the request method, request URI, and other properties.

This is what I remember. So the new development should keep an option to do that, but also rewrite the URL before the check or, more practically, change the cache key calculation if a store_url is present for the request.

> The Store is too big and complex of an API to accurately describe in an email IMO. I would be happy to answer specific questions about the stuff I know, but you may have to research how things work as there is no comprehensive documentation yet.

I just need the major points of the process mentioned before, but it will take some time until I have reviewed more code, so I won't take any firm stand on the subject yet. I do have an approach that I want to check, and it's based on finding all\specific mem_object operations and object creation in the code. Right now I'm struggling to find and mark those points.

Answering the double URL thing: in 2.7 the feature used the storeswap_meta to add the storeurl string, and that simplified things. It's a TLV struct so you are safe in to no
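The TLV (type, length, value) layout mentioned above can be illustrated with a small sketch. This is not Squid's on-disk swap metadata format: the one-byte type, the four-byte big-endian length, and the type codes `META_URL` and `META_STOREURL` are assumptions chosen for illustration. It only shows the general TLV property that a reader can skip a record type it does not know by using the length field.

```python
import struct

# Hypothetical type codes, chosen for illustration only.
META_URL = 1
META_STOREURL = 2

HEADER = "!BI"  # 1-byte type, 4-byte big-endian length (assumed layout)


def encode_tlv(records):
    """Pack (type, value-bytes) pairs as type, length, value."""
    out = bytearray()
    for rec_type, value in records:
        out += struct.pack(HEADER, rec_type, len(value))
        out += value
    return bytes(out)


def decode_tlv(blob, known_types):
    """Return {type: value} for known types; skip unknown ones by length."""
    found = {}
    offset = 0
    while offset < len(blob):
        rec_type, length = struct.unpack_from(HEADER, blob, offset)
        offset += struct.calcsize(HEADER)
        value = blob[offset:offset + length]
        offset += length
        if rec_type in known_types:
            found[rec_type] = value
    return found


blob = encode_tlv([
    (META_URL, b"http://v7.example.com/videoplayback?id=abc&ip=1.2.3.4"),
    (META_STOREURL, b"http://video.example.internal/id=abc"),
])

# A reader that only knows META_URL still parses the blob cleanly.
old_reader = decode_tlv(blob, {META_URL})
new_reader = decode_tlv(blob, {META_URL, META_STOREURL})
```

Here `old_reader` contains only the original URL record, while `new_reader` also sees the store URL record.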
Re: Store_url_rewrite for squid 3+
On 09/05/2012 06:58 PM, Amos Jeffries wrote: > On 06.09.2012 11:58, Eliezer Croitoru wrote: > We can pause there for the infrastructure to look fine before moving on > to the store details. I've been waiting on assistance from Henrik or > Alex on that for a while. They are the ones who know the answers to your > questions below AFAIK. FWIW, I have not reviewed the store_url_rewrite code in Squid2 so I cannot answer the questions related to how it was done. I can suggest ways of doing this in Squid3, but since somebody already investigated all the alternatives, it would be better to hear the summary of the Squid2 implementation (as it relates to Store) before diving into Squid3 development. The biggest question for me is why Squid2 code was storing multiple URLs with the cached object (if it was). Why cannot Store just work with the [rewritten] URL given to it and ignore the fact that some [store] URLs originated from some other [real] URLs? Are we trying to support going from a store_url_rewrite config back to regular config without losing some of the cached objects? >> 2. Research the workflow of storing objects in memory and store and >> introduce psudo for a new workflow of storing objects to avoid bad >> effects on cache objects usage in any form that can be. >> - I do know that squid uses some hash look-up and I have seen in the >> things about it. >> - as far I understood from the code: >> client_request builds the request of the http object. >> creates a mem-object and on the way creates a checksum. >> a transfer from of the mem-object to a "store" happens. >> if a store rebuild happens it takes all of the data from the file in >> the store. >> >> ? question how cachemgr gets the list of urls in memory? You might be confusing "cache manager" (the thing that responds to "squidclient mgr:info" requests) with Store. Also, you should not think in terms of memory (RAM) because some objects are only cached on disk. 
It is best to think of Store as a collection of stored objects, ignoring their particular location (memory or disk) to the extent possible. Store can get a list of cached objects by iterating through store_table and other store indexes. In general, you should not assume that it is possible to get a list of all cached URLs in any efficient/practical fashion because not all in-RAM indexes store URLs. It is only possible to get an answer to the following question: * Is a response with cache key K likely to be in Squid cache now? Where cache key is a hash computed over the request method, request URI, and other properties. >> I will look at it later but if someone have solid knowledge on how >> the store routing was or implemented before i'm rushing into the code >> every piece of info will help me when looking into it. The Store is too big and complex of an API to accurately describe in an email IMO. I would be happy to answer specific questions about the stuff I know, but you may have to research how things work as there is no comprehensive documentation yet. Another complication is that such fundamental Squid2 Store feature as store_table needs to be removed but it has not been completely removed from Squid3 yet, so there is some [older] code that relies on it and some [newer] code that tries hard to stay away from it, all while doing the same kind of operations. Finally, the whole Store class hierarchy is ugly to a fault. It needs to be split into more independent classes instead of everybody and the kitchen sink inheriting from Store, hiding the intended boundaries among "store manager", "memory storage manager", "disk storage manager", "cache_dir manager", etc. Good luck, Alex.
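The idea of a cache key as a hash over the request method and URI, and of swapping in a store URL before hashing, can be sketched as follows. This is an illustration under stated assumptions, not Squid's key function: hashing `method + URL` with MD5, and the names `cache_key` and `store_url`, are hypothetical, and the "other properties" mentioned above are left out.

```python
import hashlib


def cache_key(method, url, store_url=None):
    """Hash the method with store_url when given, else with the real URL."""
    effective = store_url if store_url is not None else url
    return hashlib.md5(f"{method} {effective}".encode("utf-8")).hexdigest()


# Two hypothetical mirror URLs for the same static object.
url_a = "http://mirror1.example.com/files/pkg.tar.gz?token=111"
url_b = "http://mirror2.example.com/files/pkg.tar.gz?token=222"
canonical = "http://mirror.example.internal/files/pkg.tar.gz"

key_a_plain = cache_key("GET", url_a)
key_b_plain = cache_key("GET", url_b)
key_a_store = cache_key("GET", url_a, store_url=canonical)
key_b_store = cache_key("GET", url_b, store_url=canonical)
```

Without a store URL the two mirrors produce different keys; with it, both requests map to one key, so the question "is a response with cache key K in the cache?" gets the same answer for both.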
Re: Store_url_rewrite for squid 3+, on what branch to start working?
I had a *small* issue with my storage, so I'm forced to start almost from scratch. I previously worked on the latest stable 3.2.1 sources and I am wondering where to start now: on head, or on stable? Eliezer
Re: Store_url_rewrite for squid 3+
On 9/6/2012 4:37 AM, Amos Jeffries wrote:
> On 06.09.2012 13:23, Eliezer Croitoru wrote:
>> OK, it seems we are getting somewhere. I know how to apply a patch from the command line, but what is the proper command to produce a patch file to be applied later? Will look into it.
> If you are using diff:
>   diff -u orig_code/ new_code/ >output_file.patch
> If you are using bzr:
>   bzr diff >output_file.patch
> Amos

Thanks. I hoped to sleep but I'm not tired :( Anyway, I will post a couple of patches later with the basics. I'm now merging my working version into the redirect.cc file and moving on.

-- 
Eliezer Croitoru https://www1.ngtech.co.il IT consulting for Nonprofit organizations eliezer ngtech.co.il
Re: Store_url_rewrite for squid 3+
On 06.09.2012 13:23, Eliezer Croitoru wrote:
> OK, it seems we are getting somewhere. I know how to apply a patch from the command line, but what is the proper command to produce a patch file to be applied later? Will look into it.

If you are using diff:
  diff -u orig_code/ new_code/ >output_file.patch

If you are using bzr:
  bzr diff >output_file.patch

Amos
Re: Store_url_rewrite for squid 3+
OK, it seems we are getting somewhere. I know how to apply a patch from the command line, but what is the proper command to produce a patch file to be applied later? Will look into it.

Eliezer

On 9/6/2012 3:58 AM, Amos Jeffries wrote:
> On 06.09.2012 11:58, Eliezer Croitoru wrote:
>> On 9/5/2012 9:56 AM, Eliezer Croitoru wrote:
>>> any leads?
>> Well, there is nice progress. I reviewed the 2.7 store_url_rewrite and divided the task into smaller, more detailed tasks.
> FTR: squid-2.7 ports are exempted as suitable in most cases for back-porting to stable despite our "no new features" policy. I am happy for this to be done as a series of patches instead of a singular change. It can be assembled in trunk and back-ported as a singular later.
>> 1. Research the url_rewrite interface code and introduce a modified version of url_rewrite as url_store_rewrite_program.
>> - This task is kind of done by now (it compiles and runs on 3.2.1), and I want to get some ideas on naming conventions so the code fits the project's amazing code looks.
>> List of changed files and code:
>> create a mimic of redirect.cc in ./store_rewrite.cc and change
> No need. What I meant earlier about src/redirect.cc being usable is that most of the code is an exact duplicate. You should only need to:
> * write a new start() function, i.e. storeurlRewriteStart()
> * add new storeurl_rewriters global
> * adapt redirectRegisterWithCacheManager() to register a new "storeurl_rewrite" report (when that part is written)
> * adapt redirectInit() to setup both url_rewrite and storeurl-rewrite helpers.
> * adapt redirectShutdown() to cleanup both url_rewrite and storeurl-rewrite helpers.

fair

> The new storeurlRewriteStart() is to be used by store code for this re-writing; it sets up the redirect interface using the new store helpers and whatever callback the changed URL is to be sent to. The entire rest of the code for helper management is identical.
>> all the methods and variables to fit store_rewrite. Strip all the url_rewrite data-manipulating actions. Change the debugging info. (After the store-related planning tasks, get back here to redo.)
>> ./structs.h: adding the proper variables for helper naming, bypass (on\off), acl_access namespace, child configs. The ??_rewrites_host of url_rewrite doesn't belong in the store_rewrite process at all.
>> ./cache_cf.cc: state the default configuration for the helper
> What do you mean by this? doConfigure() post-configuration checking? I don't think there is anything which needs new cache_cf.cc code. The parsing side of things is identical for url_rewrite_*. The different defaults and locations are all coded in cf.data.pre ...

OK, so later we will see how to optimize the conf file. But there are a couple of arguments there that are crucial for compilation and for parsing the config file.

>> ./cf.data.pre: stating all config directives for the helper (copy and modify from url_rewrite_program)
> Okay. However, if merging the stages to trunk separately this will need to be the final step, since it makes the directives publicly visible. We want to make these changes and doc/release-notes/release-3.*.sgml at the same time when it is suitable for public use. Also, remove the storeurl_* entries at the top marking them as "obsolete"/unavailable type.
>> ./ClientRequestContext.h: adding int for state, adding bool for done
>> ./client_side_request.h: stating the start method as squidexternal something
> Just "extern" will do. We are killing the SQUIDCEXTERN mess.
>> ./client_side_request.cc: adding calls and callouts
>> ./protos.h: stating the start, init and shutdown methods.
> We are in the process of killing protos.h. Please create a store_urlrewrite.h header for these definitions instead.

will be done

> However, see the comments above about redirect.cc. There should not need to be any new files created for this. Which removes the protos.h, main.cc and Makefile.am changes.
>> ./main.cc: calling init and shutdown methods at start/reconfigure etc.
>> ./Makefile.am && ./Makefile.in: adding the source ?.cc file into the commands
> Other than the notes above. Okay. If you have a patch for that please submit for audit :-) We can pause there for the infrastructure to look fine before moving on to the store details. I've been waiting on assistance from Henrik or Alex on that for a while. They are the ones who know the answers to your questions below AFAIK.
>> 2. Research the workflow of storing objects in memory and store, and introduce pseudocode for a new workflow of storing objects that avoids bad effects on cache object usage in any form.
>> - I do know that squid uses some hash look-up and I have seen some things about it.
>> - As far as I understood from the code:
>>   client_request builds the request of the http object.
>>   creates a mem-object and on the way creates a checksum.
>>   a transfer of the mem-object to a "store" happens.
>>   if a store rebuild happens, it takes all of the data from the file in the store.
>> ? question how cachemgr gets the list of urls in memory
Re: Store_url_rewrite for squid 3+
On 06.09.2012 11:58, Eliezer Croitoru wrote:
> On 9/5/2012 9:56 AM, Eliezer Croitoru wrote:
>> any leads?
> Well, there is nice progress. I reviewed the 2.7 store_url_rewrite and divided the task into smaller, more detailed tasks.

FTR: squid-2.7 ports are exempted as suitable in most cases for back-porting to stable despite our "no new features" policy. I am happy for this to be done as a series of patches instead of a singular change. It can be assembled in trunk and back-ported as a singular later.

> 1. Research the url_rewrite interface code and introduce a modified version of url_rewrite as url_store_rewrite_program.
> - This task is kind of done by now (it compiles and runs on 3.2.1), and I want to get some ideas on naming conventions so the code fits the project's amazing code looks.
> List of changed files and code:
> create a mimic of redirect.cc in ./store_rewrite.cc and change

No need. What I meant earlier about src/redirect.cc being usable is that most of the code is an exact duplicate. You should only need to:
* write a new start() function, i.e. storeurlRewriteStart()
* add new storeurl_rewriters global
* adapt redirectRegisterWithCacheManager() to register a new "storeurl_rewrite" report (when that part is written)
* adapt redirectInit() to setup both url_rewrite and storeurl-rewrite helpers.
* adapt redirectShutdown() to cleanup both url_rewrite and storeurl-rewrite helpers.

The new storeurlRewriteStart() is to be used by store code for this re-writing; it sets up the redirect interface using the new store helpers and whatever callback the changed URL is to be sent to. The entire rest of the code for helper management is identical.

> all the methods and variables to fit store_rewrite. Strip all the url_rewrite data-manipulating actions. Change the debugging info. (After the store-related planning tasks, get back here to redo.)
> ./structs.h: adding the proper variables for helper naming, bypass (on\off), acl_access namespace, child configs. The ??_rewrites_host of url_rewrite doesn't belong in the store_rewrite process at all.
> ./cache_cf.cc: state the default configuration for the helper

What do you mean by this? doConfigure() post-configuration checking? I don't think there is anything which needs new cache_cf.cc code. The parsing side of things is identical for url_rewrite_*. The different defaults and locations are all coded in cf.data.pre ...

> ./cf.data.pre: stating all config directives for the helper (copy and modify from url_rewrite_program)

Okay. However, if merging the stages to trunk separately this will need to be the final step, since it makes the directives publicly visible. We want to make these changes and doc/release-notes/release-3.*.sgml at the same time when it is suitable for public use. Also, remove the storeurl_* entries at the top marking them as "obsolete"/unavailable type.

> ./ClientRequestContext.h: adding int for state, adding bool for done
> ./client_side_request.h: stating the start method as squidexternal something

Just "extern" will do. We are killing the SQUIDCEXTERN mess.

> ./client_side_request.cc: adding calls and callouts
> ./protos.h: stating the start, init and shutdown methods.

We are in the process of killing protos.h. Please create a store_urlrewrite.h header for these definitions instead. However, see the comments above about redirect.cc. There should not need to be any new files created for this. Which removes the protos.h, main.cc and Makefile.am changes.

> ./main.cc: calling init and shutdown methods at start/reconfigure etc.
> ./Makefile.am && ./Makefile.in: adding the source ?.cc file into the commands

Other than the notes above. Okay. If you have a patch for that please submit for audit :-) We can pause there for the infrastructure to look fine before moving on to the store details. I've been waiting on assistance from Henrik or Alex on that for a while. They are the ones who know the answers to your questions below AFAIK.

> 2. Research the workflow of storing objects in memory and store, and introduce pseudocode for a new workflow of storing objects that avoids bad effects on cache object usage in any form.
> - I do know that squid uses some hash look-up and I have seen some things about it.
> - As far as I understood from the code:
>   client_request builds the request of the http object.
>   creates a mem-object and on the way creates a checksum.
>   a transfer of the mem-object to a "store" happens.
>   if a store rebuild happens, it takes all of the data from the file in the store.
> ? question how cachemgr gets the list of urls in memory?
> So probable points of failure:
>   using the wrong URL to fetch the object.
>   wrong arguments for the checksum.
>   storing with wrong arguments\URL, leading to a faulty rebuild.
> I do remember that when I looked at an old stored store_url_rewrite cache file a long time ago, there were two URLs in the file, which leads me (it's a bit foggy) to think that the stored file was the mem
Re: Store_url_rewrite for squid 3+
On 9/5/2012 9:56 AM, Eliezer Croitoru wrote:
> any leads?

Well, there is nice progress. I reviewed the 2.7 store_url_rewrite and divided the task into smaller, more detailed tasks.

1. Research the url_rewrite interface code and introduce a modified version of url_rewrite as url_store_rewrite_program.
- This task is kind of done by now (it compiles and runs on 3.2.1), and I want to get some ideas on naming conventions so the code fits the project's amazing code looks.

List of changed files and code:
create a mimic of redirect.cc in ./store_rewrite.cc and change all the methods and variables to fit store_rewrite. Strip all the url_rewrite data-manipulating actions. Change the debugging info. (After the store-related planning tasks, get back here to redo.)
./structs.h: adding the proper variables for helper naming, bypass (on\off), acl_access namespace, child configs. The ??_rewrites_host of url_rewrite doesn't belong in the store_rewrite process at all.
./cache_cf.cc: state the default configuration for the helper
./cf.data.pre: stating all config directives for the helper (copy and modify from url_rewrite_program)
./ClientRequestContext.h: adding int for state, adding bool for done
./client_side_request.h: stating the start method as squidexternal something
./client_side_request.cc: adding calls and callouts
./protos.h: stating the start, init and shutdown methods.
./main.cc: calling init and shutdown methods at start/reconfigure etc.
./Makefile.am && ./Makefile.in: adding the source ?.cc file into the commands

2. Research the workflow of storing objects in memory and store, and introduce pseudocode for a new workflow of storing objects that avoids bad effects on cache object usage in any form.
- I do know that squid uses some hash look-up and I have seen some things about it.
- As far as I understood from the code:
  client_request builds the request of the http object.
  creates a mem-object and on the way creates a checksum.
  a transfer of the mem-object to a "store" happens.
  if a store rebuild happens, it takes all of the data from the file in the store.

? question how cachemgr gets the list of urls in memory?

So probable points of failure:
  using the wrong URL to fetch the object.
  wrong arguments for the checksum.
  storing with wrong arguments\URL, leading to a faulty rebuild.

I do remember that when I looked at an old stored store_url_rewrite cache file a long time ago, there were two URLs in the file, which leads me (it's a bit foggy) to think that the stored file was the memobject cache rather than a set of arguments such as refresh-time-related info, method, url, request, response.

I will look at it later, but if someone has solid knowledge of how the store routing was or is implemented, every piece of info will help me before I rush into the code.

Eliezer

-- 
Eliezer Croitoru https://www1.ngtech.co.il IT consulting for Nonprofit organizations eliezer ngtech.co.il
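The first failure point listed above, using the wrong URL to fetch the object, can be shown with a toy store. This is a sketch under stated assumptions: `ToyStore`, its dict index, the MD5-based `key_for`, and the query-stripping `rewrite` function are hypothetical stand-ins, not Squid's Store API.

```python
import hashlib


def key_for(method, url):
    """Toy cache key: MD5 over method and URL (an assumption, not Squid's code)."""
    return hashlib.md5(f"{method} {url}".encode("utf-8")).hexdigest()


def rewrite(url):
    """Hypothetical store-URL rewrite: drop the query string."""
    return url.split("?", 1)[0]


class ToyStore:
    """Dict-backed stand-in for a cache index keyed by cache key."""

    def __init__(self):
        self.index = {}

    def put(self, method, url, body):
        # Objects are stored under the key of the rewritten URL.
        self.index[key_for(method, rewrite(url))] = body

    def get_consistent(self, method, url):
        # Lookup applies the same rewrite that storage did.
        return self.index.get(key_for(method, rewrite(url)))

    def get_inconsistent(self, method, url):
        # Bug: lookup hashes the original URL, so the keys never match.
        return self.index.get(key_for(method, url))


store = ToyStore()
store.put("GET", "http://cdn.example.com/clip.flv?session=aaa", b"video-bytes")

hit = store.get_consistent("GET", "http://cdn.example.com/clip.flv?session=bbb")
miss = store.get_inconsistent("GET", "http://cdn.example.com/clip.flv?session=bbb")
```

The point of the sketch is only that the store path and the lookup path must derive the key from the same URL; any place that still hashes the original URL turns a would-be HIT into a MISS.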
Re: Store_url_rewrite for squid 3+
On 9/5/2012 8:00 AM, Amos Jeffries wrote:
> On 5/09/2012 4:10 p.m., Eliezer Croitoru wrote:
>> I'm reading some code (will take a while) to maybe get a functional store_url_rewrite for squid3+. Actually I was thinking about it a lot and the process should be very simple: use stdin+stdout, like the url_rewrite interface, for a start.
> Yes. The redirect.cc interface can be used. The caller code runs *_access checks and provides a callback function which uses the resulting URL to do things to store keys.
>> I think this can be done pretty fast if someone knows the current code. It's a matter of replicating url_rewrite, changing all the directive names from url_rewrite to store_url_rewrite, and letting someone write the functions. How is this as a starter to maybe make it work?
> Exactly what I was planning. Just a lack of store code knowledge getting in the way.
>> It should be done before the request is made, and I think before the url_rewrite helper.
> At the point where cache key is being decided I think. Which is just before HIT lookup. After adaptations like url_rewrite.
>> The next step after that is to take the "to be cached" entry and change the key of the stored object, as it will be stored in the cache, and not like the old store_url_rewrite that was saved in an old+new URL format. How about just starting with the basics: putting the whole redirector in as-is for store_url_rewrite? How hard can it be?
> Want to give it a try?

Well, I would like to, but it seems I'm lacking knowledge of C\C++ and of the squid structure. I do recognize that there is a language there with some structure, since I have written Java and Ruby. (I just wrote a small proxy for a specific protocol that peers two incoming TCP connections after verification of identity, and it seems to rock.)

I was thinking of starting by creating a (fake) store_url_rewrite that will debug and all but will not make any changes except logging. If I\we can complete this stage, it will benefit anyone who wants to integrate some new interface into squid.

So I started playing with the sources, but since squid is a big puzzle for me I don't know where I should put, or what I should look for in, the pieces of code related to redirect.cc. What I did as a starter is take redirect.cc (3.2.1) and change all the method names I could find\think of to a "srewriter" syntax, such as redirectRegisterWithCacheManager(); to srewriteRegisterWithCacheManager(); (file and diff attached).

After that I tried to find anything related to redirect.cc in the makefiles and saw it in ./src/Makefile.am and ./src/Makefile.in, so I suppose I should add srewrite.cc in every place that lists redirect.cc (but I don't know...). I haven't made any changes to srewrite.cc yet that would make it "harmless" and "fake".

Now the hard part for me is: how do I find every use of redirect.cc in the rest of squid and make sure to add the needed code for srewrite.cc? I can parse srewrite.cc, but since I don't know any of the code structure I wouldn't know what to extract and look for. I noticed kinkie did some pretty things while parsing the code.

I took a peek at the old store_url_rewrite and it seems to be divided into "client_side_storeurl_rewrite.c" and "store_rewrite.c", and has some struct in structs.h for children, concurrency and command. In 3.2.1 it seems to be organized better, in only a couple of files such as redirect.cc and others.

any leads?
Eliezer

-- 
Eliezer Croitoru https://www1.ngtech.co.il IT consulting for Nonprofit organizations eliezer ngtech.co.il

#diff redirect.cc srewriter.cc
5c5
<  * DEBUG: section 61 Redirector
---
>  * DEBUG: section 61 Srewriter
55c55
< /// url maximum lengh + extra informations passed to redirector
---
> /// url maximum lengh + extra informations passed to srewriter
66c66
< } redirectStateData;
---
> } srewriterStateData;
68,71c68,71
< static HLPCB redirectHandleReply;
< static void redirectStateFree(redirectStateData * r);
< static helper *redirectors = NULL;
< static OBJH redirectStats;
---
> static HLPCB srewriteHandleReply;
> static void srewriterStateFree(srewriterStateData * r);
> static helper *srewriters = NULL;
> static OBJH srewriterStats;
73c73
< CBDATA_TYPE(redirectStateData);
---
> CBDATA_TYPE(srewriterStateData);
76c76
< redirectHandleReply(void *data, char *reply)
---
> srewriteHandleReply(void *data, char *reply)
78c78
<     redirectStateData *r = static_cast<redirectStateData *>(data);
---
>     srewriterStateData *r = static_cast<srewriterStateData *>(data);
81c81
<     debugs(61, 5, "redirectHandleRead: {" << (reply && *reply != '\0' ? reply : "") << "}");
---
>     debugs(61, 5, "srewriteHandleRead: {" << (reply && *reply != '\0' ? reply : "") << "}");
94c94
<     redirectStateFree(r);
---
>     srewriterStateFree(r);
98c98
< redirectStateFree(redirectStateData * r)
---
> srewriterStateFree(srewriterStateData * r)
105c105
< redirectStats(StoreEntry * sentry)
---
> srewriterStats(StoreEntry * sentry)
107,108c107,108
<     if (redirectors == NUL
Re: Store_url_rewrite for squid 3+
On 5/09/2012 4:10 p.m., Eliezer Croitoru wrote:
> I'm reading some code (will take a while) to maybe get a functional store_url_rewrite for squid3+. Actually I was thinking about it a lot and the process should be very simple: use stdin+stdout, like the url_rewrite interface, for a start.

Yes. The redirect.cc interface can be used. The caller code runs *_access checks and provides a callback function which uses the resulting URL to do things to store keys.

> I think this can be done pretty fast if someone knows the current code. It's a matter of replicating url_rewrite, changing all the directive names from url_rewrite to store_url_rewrite, and letting someone write the functions. How is this as a starter to maybe make it work?

Exactly what I was planning. Just a lack of store code knowledge getting in the way.

> It should be done before the request is made, and I think before the url_rewrite helper.

At the point where cache key is being decided I think. Which is just before HIT lookup. After adaptations like url_rewrite.

> The next step after that is to take the "to be cached" entry and change the key of the stored object, as it will be stored in the cache, and not like the old store_url_rewrite that was saved in an old+new URL format. How about just starting with the basics: putting the whole redirector in as-is for store_url_rewrite? How hard can it be?

Want to give it a try?

Amos
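The stdin+stdout helper idea discussed above can be sketched from the helper's side. This is a minimal illustration, not a tested Squid helper: it assumes one request per line with the URL as the first whitespace-separated field and one reply line per request, with an empty reply meaning "leave the URL unchanged", in the style of url_rewrite helpers. The `MIRROR` pattern, the `cdn.example.internal` store URL, and the function names are hypothetical.

```python
import io
import re

# Hypothetical pattern: numbered mirror hosts serving the same static file.
MIRROR = re.compile(r"^http://cdn\d+\.example\.com/(?P<path>[^?\s]+)")


def store_url_for(url):
    """Return a canonical store URL, or "" to leave the URL unchanged."""
    match = MIRROR.match(url)
    if match is None:
        return ""
    return "http://cdn.example.internal/" + match.group("path")


def serve(requests, replies):
    """Answer one reply line per request line, flushing after each."""
    for line in requests:
        fields = line.split()
        url = fields[0] if fields else ""
        replies.write(store_url_for(url) + "\n")
        replies.flush()


# Demonstration with in-memory streams; a real helper would call
# serve(sys.stdin, sys.stdout) instead.
requests = io.StringIO(
    "http://cdn3.example.com/img/logo.png?sid=42 192.0.2.7/- - GET\n"
    "http://www.example.org/index.html 192.0.2.7/- - GET\n"
)
replies = io.StringIO()
serve(requests, replies)
output_lines = replies.getvalue().splitlines()
```

Flushing after every reply matters for a helper of this kind, since the proxy waits for each answer before it can continue with that request.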