Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?
Hi Rusty,

 > I didn't implement this because I wanted the server to be able to cache
 > the reply easily (ie. new story goes up on /., everyone sends old hash,
 > reply gets served from accelerator).

I don't think that caching these will work as well as you might expect. In your example of slashdot, it gives a different reply every time, even for the same user. Try two wgets of slashdot.org and run a diff between the results.

It would work for static pages, but with static pages you don't really need delta-encoding, as you'll get a good hit rate with the normal cache tag mechanisms that browsers and proxies already use.

Cheers, Tridge
Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?
On Wednesday 01 April 2009 15:52:22 Martin Langhoff wrote:
 > On Wed, Apr 1, 2009 at 12:48 AM, Rusty Russell <ru...@rustcorp.com.au> wrote:
 > > Well, 'strong' here is relative. In order to keep the checksum length
 > > finite and hence encode more blocks, we only use a portion of the bits;
 > > it's a tradeoff. And so an overall checksum is important, just to
 > > verify that the final result is correct.
 >
 > Hmmm, if we need an overall checksum...
 >
 > - The server cannot stream data to the client, because it has to wait
 > until it has all of it. Even if our current implementation doesn't have
 > this, having a protocol that allows streaming is high on my list.

Yes, we need to chunk, because we can't hand the data on to the client until we've verified it, at least in a serious implementation.

 > - Aren't we back to the 2-hashes-will-get-us-sued square?

Nope, that's two hashes *per-block* IIRC.

 > Frankly, a hash collision that has the same content length and occurs
 > over the same syntax format (html/xml) is so rare as to be... well, not
 > really something I would expect :-)

Tridge said 16 bits, but actually it's 48 bits per block (32-bit adler + 16-bit strong). Since we're going to error out on the fail case, I'll switch the code to do 64-bit checksums (not right now, but soon: what we have is good enough for testing).

Thanks, Rusty.
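For concreteness, the per-block checksum Rusty describes (a cheap 32-bit rolling sum plus 16 bits of a stronger hash, 48 bits in all) might be sketched as below. This is illustrative only -- it is not the actual CCAN crcsync code, and FNV-1a here is just a stand-in for a real strong hash:

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Adler-32: the cheap part that can be "rolled" along the data. */
    static uint32_t adler32(const uint8_t *p, size_t n)
    {
        uint32_t a = 1, b = 0;
        for (size_t i = 0; i < n; i++) {
            a = (a + p[i]) % 65521;
            b = (b + a) % 65521;
        }
        return (b << 16) | a;
    }

    /* Stand-in "strong" hash (FNV-1a); only 16 bits of it are kept. */
    static uint32_t fnv1a(const uint8_t *p, size_t n)
    {
        uint32_t h = 2166136261u;
        for (size_t i = 0; i < n; i++)
            h = (h ^ p[i]) * 16777619u;
        return h;
    }

    /* 48 significant bits per block: 32 rolling + 16 truncated strong. */
    static uint64_t block_checksum(const uint8_t *block, size_t len)
    {
        return ((uint64_t)adler32(block, len) << 16)
               | (fnv1a(block, len) & 0xffff);
    }

    int main(void)
    {
        const uint8_t data[] = "example block contents";
        printf("%012llx\n",
               (unsigned long long)block_checksum(data, sizeof(data) - 1));
        return 0;
    }

The truncation is exactly the tradeoff Rusty names: fewer bits per block means more block hashes fit in the request header, at the cost of needing the whole-file checksum as a final safety net.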
Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?
On Wednesday 01 April 2009 11:11:23 tri...@samba.org wrote:
 > The per-block rolling hash should also be randomly seeded as Martin
 > mentioned. That way if the user does ask for the page again then the
 > hashing will be different. You need to send that seed along with the
 > request.

Hi Tridge,

I didn't implement this because I wanted the server to be able to cache the reply easily (ie. new story goes up on /., everyone sends old hash, reply gets served from accelerator). But then I assumed a re-get on fail.

Rusty.
Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?
On Wed, Apr 1, 2009 at 8:29 AM, Rusty Russell <ru...@rustcorp.com.au> wrote:
 > Yes, we need to chunk, because we can't hand the data on to the client
 > until we've verified it, at least in a serious implementation.

Hmmm. If I understand you right, the concern is that the rolling hash matches in locations that aren't a true match. Can we do anything that is still efficient and retains the ability to stream? Maybe the client can send 2 hashes in the header, same block size but seeded differently? Or is the problem with the delta blocks we send?... (doesn't seem likely to prevent a streaming implementation, but maybe I'm missing something)

 > Since we're going to error out on the fail case, I'll switch the code
 > to do 64-bit checksums (not right now, but soon: what we have is good
 > enough for testing).

Does 2 hashes make the error condition so unlikely that we can assume it won't happen normally?

Also - delivery of HTTP payloads is not guaranteed. As Tridge said, non-cacheable GETs may be non-idempotent, but they sometimes fail to complete for any of many reasons, and the user has a big fat Refresh button right there in the web browser. IOW, a blind, unchecked delete-last-user action in a webapp is a bug in the webapp. It is ok to fail, as long as the retry will use a different seed...

cheers, m

ps: cc'd the http-crcsync list, which is more appropriate...
Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?
If we go for the 'fail in case of mismatch' approach, we can keep streaming. We simply have to make sure that we close the connection before we have streamed the last block if we discover a global checksum mismatch. And that is just a matter of putting the global checksum validation before the 'stream last block back to client' step in the decoder logic.

Alex

--
Alex WULMS
Lead Developer/Systems Engineer
Tel: +32 2 655 3931
Information Systems - SWIFT.COM Development
S.W.I.F.T. SCRL
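In decoder terms, Alex's ordering looks something like the sketch below. It is a toy model, not the real Apache module (which works on bucket brigades); all names are made up and the hash is a stand-in:

    #include <stdio.h>
    #include <stdint.h>

    struct block { const char *data; size_t len; };

    /* Stand-in for the whole-file strong hash (64-bit FNV-1a). */
    static uint64_t update_hash(uint64_t h, const char *p, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            h = (h ^ (uint8_t)p[i]) * 1099511628211ULL;
        return h;
    }

    /* Stream every block but the last, then validate the global checksum
     * before emitting the last block: a mismatch truncates the response,
     * so the client sees a hard failure rather than corrupt data. */
    static int decode_and_stream(const struct block *blocks, size_t nblocks,
                                 uint64_t expected, FILE *client)
    {
        uint64_t h = 14695981039346656037ULL;
        for (size_t i = 0; i + 1 < nblocks; i++) {
            h = update_hash(h, blocks[i].data, blocks[i].len);
            fwrite(blocks[i].data, 1, blocks[i].len, client);
        }
        h = update_hash(h, blocks[nblocks - 1].data, blocks[nblocks - 1].len);
        if (h != expected)
            return -1;  /* the real module would close the connection here */
        fwrite(blocks[nblocks - 1].data, 1, blocks[nblocks - 1].len, client);
        return 0;
    }

    int main(void)
    {
        struct block b[] = { { "Hello, ", 7 }, { "world\n", 6 } };
        uint64_t h = 14695981039346656037ULL;
        h = update_hash(h, b[0].data, b[0].len);
        h = update_hash(h, b[1].data, b[1].len);
        return decode_and_stream(b, 2, h, stdout) ? 1 : 0;
    }

The design choice is that everything except the final block streams with no extra latency; only the last write is held hostage to the checksum.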
Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?
So a quick question: what sort of http transfers is chunking most often used for?

I believe we will get poor results with the method for most types of binary data, which tend to be the larger files. In the web context these will generally either have not changed at all (in which case traditional caching will help) or have changed completely, in which case the hashing is just overhead. Happy to be corrected on this point.

Actually, while we are on this thought: do we want to add the strong hash to the request headers so the upstream server can reply with "use the cached version"? This would allow the server side to correct for sites that don't use correct cache headers (e.g. static images with no cache information). An example of what this could look like follows below.

One alternative to the fail-on-error is to hold a copy on the server end for a short period so we can retransmit unencoded, but this is probably unacceptable overhead on the server side, especially if we can't manage to maintain a TCP session for the retry.

Are there any headers sent with each http chunk? We could always put our strong hash across these, assuming that chunking is defined at source and not repartitioned by caches and proxies in between.

Toby
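Sketched as a hypothetical exchange -- the header name and the reuse of 304 are invented for illustration; nothing like this is standardised:

    GET /img/logo.png HTTP/1.1
    Host: example.org
    X-Whole-File-Hash: sha1=2jmj7l5rSw0yVb/vlWAYkK/YBwk=

    HTTP/1.1 304 Not Modified
    X-Whole-File-Hash: sha1=2jmj7l5rSw0yVb/vlWAYkK/YBwk=

The upstream server recognises the client's hash of its cached copy and answers with a tiny "use what you have" response instead of the full body, even when the origin site sent no cache headers at all.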
Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?
On Wed, Apr 1, 2009 at 8:17 PM, Toby Collett <t...@plan9.net.nz> wrote:
 > So a quick question: what sort of http transfers is chunking most often
 > used for? I believe we will get poor results with the method for most
 > types of binary data, which tend to be the larger files. In the web
 > context these will generally either have not changed at all (in which
 > case traditional caching will help) or have changed completely, in which
 > case the hashing is just overhead. Happy to be corrected on this point.

I agree. We can apply this method exclusively to text/* and */xml mimetypes.

And something I forgot in the earlier pro-streaming notes: the memory model of apache doesn't really release memory back to the kernel -- it keeps it in the per-process memory pool. This means that if we have unbounded memory allocations (such as buffering whole requests), then our memory usage will be terrible. It's a bit less horrid with worker threads, but in general apache modules strive to maintain fixed-size buffers. So it's something to keep in mind :-)

cheers, m
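The fixed-buffer point, in schematic form: process the payload through a bounded window instead of accumulating it. A generic sketch, using plain stdio rather than the real APR bucket-brigade machinery:

    #include <stdio.h>

    #define WINDOW 8192  /* bounded, regardless of response size */

    /* Copy input to output through a fixed-size buffer; per-request
     * memory stays constant no matter how large the payload is. */
    static void pump(FILE *in, FILE *out)
    {
        char buf[WINDOW];
        size_t n;
        while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
            fwrite(buf, 1, n, out);
    }

    int main(void)
    {
        pump(stdin, stdout);
        return 0;
    }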
Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?
On Thu, 2009-04-02 at 07:17 +1300, Toby Collett wrote:
 > So a quick question: what sort of http transfers is chunking most often
 > used for?

Dynamically generated content is the scenario for chunked transfers; since you don't know the length a priori, some other method of indicating the message length is necessary.

- Jim

--
Jim Gettys <j...@freedesktop.org>
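For reference, a chunked response looks like this on the wire: each chunk is its length in hex on a line of its own, then the bytes, and a zero-length chunk terminates the body, so no Content-Length header is ever needed:

    HTTP/1.1 200 OK
    Content-Type: text/plain
    Transfer-Encoding: chunked

    5
    Hello
    7
    , world
    0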
Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?
One thing we need to do is think about the headers carefully, as this is the aspect of the project we could promote as a web standard. There is a large amount of flexibility we could put into this, but as Rusty has said, if there is a way someone can implement a protocol wrong, they will. So we need to keep it as simple as possible.

At the moment we append the block size and the hashes for the blocks to the request. The response has a content encoding set, and will need a strong hash added. The number of blocks is fixed at 20 for the moment, with a hash size of 30 bits, which felt like a nice balance between overhead and performance. This keeps our header at around the 128-byte mark once you have base64-encoded the hashes (we don't pad the base64 encoding, so 30 bits encode to 5 bytes).

The other aspect we need to standardise is the encoding of the response. Again, at the moment this is a very simplistic binary encoding. The response is encoded in sections, each beginning with either an 'L' to indicate a literal section or a 'B' to indicate a matched block (actually we could make one the default and save a few bytes here). A literal section then has a 4-byte int in network byte order for the size of the literal section, followed by the data. A block section has a single byte indicating the block number.

There is no error checking in the encoding itself; this is assumed to be taken care of in other layers, and we throw in a strong hash on the whole file to make sure the result is correct. There is a risk that if we get a corruption of the literal length bytes we could try to read a very large amount of data; not sure if this is acceptable.

Toby

2009/3/31 Gervase Markham <g...@mozilla.org>:
 > On 25/03/09 18:20, Toby Collett wrote:
 > > Not a GSoC project, just a project (crcsync is the name at the moment).
 > > Initial target is a double proxy server, one at each end of the slow
 > > link, with dreams of web standards and browser integration following.
 > > Seems to me that both projects need the same upstream server extension
 > > to be able to send the deltas down. Current state of the apache modules
 > > is that all the major pieces are in place, but not a lot of testing and
 > > no optimisation has been carried out yet.
 >
 > OK. So maybe the browser integration for this, or at least the groundwork
 > for it, is what our SoC project should be. Particularly if you have
 > Apache modules that work already.
 >
 > See https://wiki.mozilla.org/Community:SummerOfCode09:WebPagesOverRsync
 > for where we are at the moment. We are getting incredible amounts of
 > interest in this project - more than all the others combined. It seems
 > like an idea whose time has come.
 >
 > Gerv
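A minimal decoder for that 'L'/'B' framing might look like the following. This is a sketch only, with invented names, no handling of a short final block, and a simple bounds check standing in for real robustness against the corrupted-length risk Toby mentions:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <arpa/inet.h>  /* ntohl/htonl */

    /* cached: the client's stale copy, split into fixed-size blocks. */
    static int decode_response(const uint8_t *resp, size_t resp_len,
                               const uint8_t *cached, size_t block_size,
                               size_t nblocks, FILE *out)
    {
        size_t i = 0;
        while (i < resp_len) {
            uint8_t tag = resp[i++];
            if (tag == 'L') {                       /* literal section */
                uint32_t len;
                if (i + 4 > resp_len) return -1;
                memcpy(&len, resp + i, 4);
                len = ntohl(len);                   /* 4-byte network order */
                i += 4;
                if (len > resp_len - i) return -1;  /* guards the huge-read risk */
                fwrite(resp + i, 1, len, out);
                i += len;
            } else if (tag == 'B') {                /* matched block */
                if (i >= resp_len) return -1;
                uint8_t blockno = resp[i++];        /* single-byte block number */
                if (blockno >= nblocks) return -1;
                fwrite(cached + (size_t)blockno * block_size, 1, block_size, out);
            } else {
                return -1;                          /* unknown tag */
            }
        }
        return 0;
    }

    int main(void)
    {
        const uint8_t cached[] = "AAAABBBB";        /* two 4-byte blocks */
        uint8_t resp[16];
        uint32_t n = htonl(3);
        size_t i = 0;
        resp[i++] = 'B'; resp[i++] = 1;             /* reuse cached block 1 */
        resp[i++] = 'L'; memcpy(resp + i, &n, 4); i += 4;
        memcpy(resp + i, "new", 3); i += 3;         /* then 3 literal bytes */
        return decode_response(resp, i, cached, 4, 2, stdout) ? 1 : 0;
    }

Note the length check before the literal copy: that is exactly where a corrupted length field would otherwise turn into an attempt to read far past the end of the response.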
Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?
On Tue, Mar 31, 2009 at 8:32 PM, Toby Collett <t...@plan9.net.nz> wrote:
 > We are only using 30-bit hashes, so even if it was a perfect hash it is
 > possible you could get a collision. Having said that, our collision
 > space is only the single web request, so that should reduce the chances
 > of error.

IIRC, if rsync thinks there was a collision on the weak hash, it rolls again through the file with the weak hash and a different seed. Maybe we could include a differently seeded fingerprint? Is that what you were thinking?

cheers, m
Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?
We are only using 30-bit hashes, so even if it was a perfect hash it is possible you could get a collision. Having said that, our collision space is only the single web request, so that should reduce the chances of error.

Toby

2009/4/1 Martin Langhoff <martin.langh...@gmail.com>:
 > On Mon, Mar 30, 2009 at 8:26 PM, Toby Collett <t...@plan9.net.nz> wrote:
 > > There is no error checking in the encoding itself; this is assumed to
 > > be taken care of in other layers, and we throw in a strong hash on the
 > > whole file to make sure this is correct.
 >
 > Is that right? I thought what Rusty was saying re crcsync is that crc
 > is strong, even when rolling?
 >
 > cheers, m
Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?
The plan was to include something like a sha1 hash of the original file in the response headers. Then once the file has been decoded you can check to make sure it matches. If not, you can resend the request without the block hash header and get the file the old-fashioned way.

Toby

2009/4/1 Martin Langhoff <martin.langh...@gmail.com>:
 > On Tue, Mar 31, 2009 at 8:32 PM, Toby Collett <t...@plan9.net.nz> wrote:
 > > We are only using 30-bit hashes, so even if it was a perfect hash it
 > > is possible you could get a collision. Having said that, our collision
 > > space is only the single web request, so that should reduce the
 > > chances of error.
 >
 > IIRC, if rsync thinks there was a collision on the weak hash, it rolls
 > again through the file with the weak hash and a different seed. Maybe
 > we could include a differently seeded fingerprint? Is that what you
 > were thinking?
 >
 > cheers, m
Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?
Hi Toby,

 > The plan was to include something like a sha1 hash of the original file
 > in the response headers. Then once the file has been decoded you can
 > check to make sure it matches. If not, you can resend the request
 > without the block hash header and get the file the old-fashioned way.

Re-sending http requests can be dangerous. The request might have triggered an action like "delete the last person from the list"; when you resend, it could delete two users rather than one. Remember that one of the aims of this work is to allow caching of dynamic requests, so you can't just assume the pages are marked as cacheable (which usually implies that a 2nd request won't do any harm).

Certainly including a strong whole-page hash is a good idea, but if the strong hash doesn't match, then I think you need to return an error, just as if you got a network outage.

The per-block rolling hash should also be randomly seeded, as Martin mentioned. That way, if the user does ask for the page again, the hashing will be different. You need to send that seed along with the request.

In practice hashing errors will be extremely rare. It is extremely rare for rsync to need a 2nd pass, and it uses a much weaker rolling hash (I think I used 16 bits by default for the per-block hashes). The ability to do multiple passes is what allows rsync to get away with such a small hash, but I remember that when I was testing the multiple-pass code I needed to weaken it even more to get any reasonable chance of a 2nd pass so I could be sure the code worked.

Cheers, Tridge
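Seeding the per-block hash can be as simple as folding a per-request random value into the checksum's initial state; a retry with a fresh seed then cannot repeat the same unlucky collision. An illustrative sketch only (Adler-style for brevity; the actual crcsync code uses CRCs, where the seed would instead set the initial register value):

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Adler-style block hash whose starting state depends on the seed
     * the client sent along with the request. */
    static uint32_t seeded_block_hash(const uint8_t *block, size_t len,
                                      uint32_t seed)
    {
        uint32_t a = 1 + (seed & 0xffff);
        uint32_t b = (seed >> 16) & 0xffff;
        for (size_t i = 0; i < len; i++) {
            a = (a + block[i]) % 65521;
            b = (b + a) % 65521;
        }
        return (b << 16) | a;
    }

    int main(void)
    {
        const uint8_t page[] = "<html>...</html>";
        /* Same data, different seeds -> different hashes. */
        printf("%08x %08x\n",
               seeded_block_hash(page, sizeof(page) - 1, 0x1234abcd),
               seeded_block_hash(page, sizeof(page) - 1, 0x5678ef01));
        return 0;
    }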
Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?
On Tuesday 31 March 2009 23:29:23 Martin Langhoff wrote:
 > On Mon, Mar 30, 2009 at 8:26 PM, Toby Collett <t...@plan9.net.nz> wrote:
 > > There is no error checking in the encoding itself; this is assumed to
 > > be taken care of in other layers, and we throw in a strong hash on the
 > > whole file to make sure this is correct.
 >
 > Is that right? I thought what Rusty was saying re crcsync is that crc
 > is strong, even when rolling?

Well, 'strong' here is relative. In order to keep the checksum length finite and hence encode more blocks, we only use a portion of the bits; it's a tradeoff. And so an overall checksum is important, just to verify that the final result is correct.

Cheers, Rusty.
Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?
On Wed, Apr 1, 2009 at 12:48 AM, Rusty Russell <ru...@rustcorp.com.au> wrote:
 > Well, 'strong' here is relative. In order to keep the checksum length
 > finite and hence encode more blocks, we only use a portion of the bits;
 > it's a tradeoff. And so an overall checksum is important, just to
 > verify that the final result is correct.

Hmmm, if we need an overall checksum...

- The server cannot stream data to the client, because it has to wait until it has all of it. Even if our current implementation doesn't have this, having a protocol that allows streaming is high on my list.

- Aren't we back to the 2-hashes-will-get-us-sued square?

Frankly, a hash collision that has the same content length and occurs over the same syntax format (html/xml) is so rare as to be... well, not really something I would expect :-)

cheers, m
Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?
Not a GSoC project, just a project (crcsync is the name at the moment). Initial target is a double proxy server, one at each end of the slow link, with dreams of web standards and browser integration following.

Seems to me that both projects need the same upstream server extension to be able to send the deltas down.

Current state of the apache modules is that all the major pieces are in place, but not a lot of testing and no optimisation has been carried out yet.

Toby

2009/3/26 Gervase Markham <g...@mozilla.org>:
 > On 23/03/09 11:19, Martin Langhoff wrote:
 > > Fantastic! I assume the rsync-http people now know of the vastly
 > > superior karma of crcsync over the 2-hash method of rsync.
 >
 > Er, not really. After a lunchtime conversation with tridge at LCA where
 > he told me about his original project, I just thought it would be cool
 > and put it up on our SoC list. So I know very little about what's
 > possible.
 >
 > > If the Apache mods and Mozilla speak the same protocol, then machines
 > > behind bandwidth-constrained links will be in much better shape. I can
 > > see 3G-internet providers pushing this too.
 >
 > Clearly, it's worth making sure everyone's on the same page. I see this
 > as a killer app for Firefox on low-bandwidth links; we'll have every
 > smalltown and developing-world ISP which still has dial-up customers
 > telling their customers "use Firefox to make your Internet faster".
 > They'd install the compression server on their web proxy, and voila.
 >
 > Have I understood correctly? Is Martin coordinating a GSoC project to do
 > an apache extension for delta-compression-over-HTTP?
 >
 > Gerv
Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?
On Mon, Mar 23, 2009 at 7:29 AM, Rusty Russell <ru...@rustcorp.com.au> wrote:
 > Tridge just cc'd me on a GSOC rsync-http mozilla project; given that
 > Martin is coordinating an apache proxy plugin, I thought I'd send a big
 > inclusive mail to make sure we all know about each other!
 >
 > My involvement: a crcsync module in CCAN which can be used as a
 > (simplified) librsync.

Fantastic! I assume the rsync-http people now know of the vastly superior karma of crcsync over the 2-hash method of rsync.

If the Apache mods and Mozilla speak the same protocol, then machines behind bandwidth-constrained links will be in much better shape. I can see 3G-internet providers pushing this too.

Also cc'ing Jim Gettys -- our long-held hope is that the resulting extension to the http protocol is something that can be folded into a future http spec.

Pushing buttons to create http-crcs...@lists.laptop.org to serve as a coordination point.

cheers, martin
[Server-devel] Apache proxy CRCsync mozilla gsoc project?
Hi,

Tridge just cc'd me on a GSOC rsync-http mozilla project; given that Martin is coordinating an apache proxy plugin, I thought I'd send a big inclusive mail to make sure we all know about each other!

My involvement: a crcsync module in CCAN which can be used as a (simplified) librsync.

Cheers! Rusty.