Re: replication handler - compression
It could also be that the C version is a lot more efficient than the Java version, so it could take longer regardless. I could not find a benchmark on that, but C is usually better for bit twiddling.

wunder

On 10/30/08 10:36 PM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:

> So it could be better than the factor of 2, but also take longer. :)
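For anyone who wants a number for the Java side, a minimal timing sketch (the file path and buffer size are arbitrary assumptions; this is not Solr code):

import java.io.*;
import java.util.zip.GZIPOutputStream;

public class GzipBench {
    public static void main(String[] args) throws IOException {
        File in = new File(args[0]);  // e.g. a segment file from index-copy
        long start = System.currentTimeMillis();
        try (InputStream is = new BufferedInputStream(new FileInputStream(in));
             OutputStream os = new GZIPOutputStream(new BufferedOutputStream(
                 new FileOutputStream(in.getPath() + ".gz")))) {
            byte[] buf = new byte[64 * 1024];
            for (int n; (n = is.read(buf)) != -1; ) {
                os.write(buf, 0, n);  // compress while copying
            }
        }
        System.out.println("gzipped " + in.length() + " bytes in "
            + (System.currentTimeMillis() - start) + " ms");
    }
}

Comparing its wall-clock time against the C gzip on the same file answers the question for a given platform.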
Re: replication handler - compression
man gzip:

  -# --fast --best
      Regulate the speed of compression using the specified digit #,
      where -1 or --fast indicates the fastest compression method
      (less compression) and -9 or --best indicates the slowest
      compression method (best compression). The default compression
      level is -6 (that is, biased towards high compression at
      expense of speed).

So it could be better than the factor of 2, but also take longer. :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message ----
> From: Walter Underwood <[EMAIL PROTECTED]>
> Sent: Thursday, October 30, 2008 11:52:47 AM
>
> About a factor of 2 on a small, optimized index. Gzipping took 20 seconds,
> so it isn't free.
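The same dial exists in Java, although GZIPOutputStream hides it; a tiny subclass (a sketch, not anything in Solr) exposes the level through the protected Deflater:

import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.GZIPOutputStream;

/** GZIPOutputStream with a selectable compression level. */
public class LeveledGZIPOutputStream extends GZIPOutputStream {
    public LeveledGZIPOutputStream(OutputStream out, int level) throws IOException {
        super(out);
        // 'def' is the protected Deflater inherited from DeflaterOutputStream
        def.setLevel(level);
    }
}

// usage, equivalent to gzip -1:
//   OutputStream out = new LeveledGZIPOutputStream(rawOut, Deflater.BEST_SPEED);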
Re: replication handler - compression
: Yeah. I'm just not sure how much benefit in terms of data transfer this
: will save. Has anyone tested this to see if this is even worth it?

one man's trash is another man's treasure ... if you're replicating snapshots very frequently within a single datacenter, speed is critical and bandwidth is free -- if you're replicating once a day from one data center to another over a very expensive, very small pipe, spending some time+cpu to compress may be worth it.

either way: it should be almost trivial to implement if people want to supply a patch, and with a simple new requestDispatcher config option, easy to disable completely on the server for people who might have clients sending "Accept-Encoding: gzip" willy-nilly

-Hoss
Re: replication handler - compression
CPU was at 100%; it was not IO bound. --wunder

On 10/30/08 8:58 AM, "christophe" <[EMAIL PROTECTED]> wrote:

> Gzipping on disk requires quite some I/O. I guess that on-the-fly zipping
> should be faster.
Re: replication handler - compression
Gzipping on disk requires quite some I/O. I guess that on-the-fly zipping should be faster.

C.

Walter Underwood wrote:
> About a factor of 2 on a small, optimized index. Gzipping took 20 seconds,
> so it isn't free.
Re: replication handler - compression
About a factor of 2 on a small, optimized index. Gzipping took 20 seconds, so it isn't free.

$ cd index-copy
$ du -sk
134336  .
$ gzip *
$ du -sk
62084   .

wunder

On 10/30/08 8:20 AM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:

> Yeah. I'm just not sure how much benefit in terms of data transfer this will
> save. Has anyone tested this to see if this is even worth it?
Re: replication handler - compression
Yeah. I'm just not sure how much benefit in terms of data transfer this will save. Has anyone tested this to see if this is even worth it?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message ----
> From: Erik Hatcher <[EMAIL PROTECTED]>
> Sent: Thursday, October 30, 2008 9:54:28 AM
>
> +1 - the GzipServletFilter is the way to go.
Re: replication handler - compression
+1 - the GzipServletFilter is the way to go.

Regarding request handlers reading HTTP headers, yeah... this will improve, for sure.

Erik

On Oct 30, 2008, at 12:18 AM, Chris Hostetter wrote:

> we could easily put it in the SolrDispatchFilter, or even in a new
> ServletFilter ...
Re: replication handler - compression
: You are partially right. Instead of the HTTP header, we use a request
: parameter. (RequestHandlers cannot read HTTP headers). If the param is

hmmm, i'm with walter: we shouldn't invent new mechanisms for clients to request compression over HTTP from servers.

replication is both special enough and important enough that if we had to add special support to make that information available to the handler on the master we could.

but frankly i don't think that's necessary: the logic to turn on compression if the client requests it using "Accept-Encoding: gzip" is generic enough that there is no reason for it to be in a handler. we could easily put it in the SolrDispatchFilter, or even in a new ServletFilter (i'm guessing i've seen about 74 different implementations of a GzipServletFilter in the wild that could be used as is).

then we'd have double wins: compression for replication, and compression of all responses generated by Solr if the client requests it.

-Hoss
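For reference, a bare-bones version of the kind of filter described above -- a sketch against the servlet 2.5 API, not one of those 74 field implementations; it only handles getOutputStream() (not getWriter()) and does no buffering or Content-Length bookkeeping:

import java.io.IOException;
import java.util.zip.GZIPOutputStream;
import javax.servlet.*;
import javax.servlet.http.*;

public class GzipServletFilter implements Filter {
    public void init(FilterConfig cfg) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;
        String accept = request.getHeader("Accept-Encoding");
        if (accept == null || accept.indexOf("gzip") == -1) {
            chain.doFilter(req, res);  // client did not ask for compression
            return;
        }
        response.setHeader("Content-Encoding", "gzip");
        final GZIPOutputStream gz = new GZIPOutputStream(response.getOutputStream());
        final ServletOutputStream out = new ServletOutputStream() {
            public void write(int b) throws IOException { gz.write(b); }
            public void write(byte[] b, int off, int len) throws IOException {
                gz.write(b, off, len);
            }
        };
        HttpServletResponse wrapped = new HttpServletResponseWrapper(response) {
            public ServletOutputStream getOutputStream() { return out; }
        };
        chain.doFilter(req, wrapped);
        gz.finish();  // write the gzip trailer
    }
}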
Re: replication handler - compression
Hoss,

You are partially right. Instead of the HTTP header, we use a request parameter. (RequestHandlers cannot read HTTP headers.) If the param is present, it wraps the response in a zip output stream.

It is configured in the slave because every slave may not want compression. Slaves which are near can skip it.

On Thu, Oct 30, 2008 at 3:54 AM, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> My understanding of Noble's comment (and i could be wrong, i'm reading
> between the lines) is that if you specify the new setting he's suggesting
> when initializing the replication handler on the slave, then the slave
> should start using an "Accept-Encoding: gzip" header when querying the
> master ...

--Noble Paul
Re: replication handler - compression
My understanding of Noble's comment (and i could be wrong, i'm reading between the lines) is that if you specify the new setting he's suggesting when initializing the replication handler on the slave, then the slave should start using an "Accept-Encoding: gzip" header when querying the master, and that when receiving this header, the master will start wrapping the response in a "Content-Encoding: gzip" stream.

(I'm making this assumption based on his note about this being a new slave config option, with no mention of any new options on the master.)

: You propose to do compressed transfers over HTTP ignoring the standard
: support for compressed transfers in HTTP. Programming that with a
: library doesn't make it "standard".

-Hoss
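That slave-side handshake needs only stock JDK pieces; a sketch (the master URL, command parameters, and file name are placeholders, not the actual replication API):

import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

public class ReplicationFetch {
    public static void main(String[] args) throws IOException {
        // hypothetical master URL; the real handler's path and params may differ
        URL url = new URL("http://master:8983/solr/replication?command=filecontent&file=_0.cfs");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");  // ask for compression
        InputStream in = conn.getInputStream();
        if ("gzip".equals(conn.getContentEncoding())) {
            in = new GZIPInputStream(in);                    // master honored the header
        }
        OutputStream out = new BufferedOutputStream(new FileOutputStream("_0.cfs"));
        byte[] buf = new byte[64 * 1024];
        for (int n; (n = in.read(buf)) != -1; ) {
            out.write(buf, 0, n);
        }
        out.close();
        in.close();
    }
}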
Re: replication handler - compression
You propose to do compressed transfers over HTTP ignoring the standard support for compressed transfers in HTTP. Programming that with a library doesn't make it "standard".

In Ultraseek, we implemented index synchronization over HTTP with compression. It wasn't that hard.

I doubt that compression will make a huge difference; Lucene uses reasonable compression in the indexes already.

wunder

On 10/29/08 10:35 AM, "Noble Paul നോബിള് नोब्ळ्" <[EMAIL PROTECTED]> wrote:

> we are not doing anything non-standard; GZIPInputStream/GZIPOutputStream
> are standard. But asking users to set up an extra apache is not fair if
> we can manage it with say 5 lines of code
Re: replication handler - compression
we are not doing anything non-standard; GZIPInputStream/GZIPOutputStream are standard. But asking users to set up an extra Apache is not fair if we can manage it with, say, 5 lines of code.

On Wed, Oct 29, 2008 at 7:44 PM, Walter Underwood <[EMAIL PROTECTED]> wrote:
> Why invent something when compression is standard in HTTP? --wunder

--Noble Paul
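The "5 lines" estimate is about right; a self-contained sketch of both ends of the pipe (in-memory here, purely to illustrate the API):

import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipPipeDemo {
    public static void main(String[] args) throws IOException {
        byte[] original = "some index bytes".getBytes("UTF-8");

        // sending end: wrap the output side of the pipe
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        OutputStream send = new GZIPOutputStream(wire);
        send.write(original);
        send.close();

        // receiving end: wrap the input side of the pipe
        InputStream recv = new GZIPInputStream(new ByteArrayInputStream(wire.toByteArray()));
        ByteArrayOutputStream restored = new ByteArrayOutputStream();
        for (int b; (b = recv.read()) != -1; ) {
            restored.write(b);
        }
        System.out.println(new String(restored.toByteArray(), "UTF-8"));  // "some index bytes"
    }
}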
Re: replication handler - compression
Why invent something when compression is standard in HTTP? --wunder

On 10/29/08 4:35 AM, "Noble Paul നോബിള് नोब्ळ्" <[EMAIL PROTECTED]> wrote:

> open a JIRA issue. we will use a gzip on both ends of the pipe. On the slave
> side you can set an extra option to compress and send data from the server
> --Noble
Re: replication handler - compression
Do keep in mind that compression is a CPU intensive process, so it is a trade-off between CPU utilization and network bandwidth. I have seen cases where compressing the data before a network transfer ended up being slower than without compression, because the cost of compression and decompression was more than the gain in network transfer.

Bill

On Wed, Oct 29, 2008 at 7:35 AM, Noble Paul നോബിള് नोब्ळ् <[EMAIL PROTECTED]> wrote:
> open a JIRA issue. we will use a gzip on both ends of the pipe. On the
> slave side you can set an extra option to compress and send data from
> the server
> --Noble
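Back-of-the-envelope numbers (illustrative only, ignoring protocol overhead) show where the break-even sits: Walter's 134,336 KB index is roughly 1.1 Gbit, so a 2 Mbit/s WAN link needs about 550 s raw; the gzipped 62,084 KB is about 0.5 Gbit, or roughly 250 s, so even after the 20 s gzip cost compression saves close to five minutes. On a gigabit LAN the raw copy takes on the order of a second, and the same 20 s of gzip is a pure loss.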
Re: replication handler - compression
open a JIRA issue. we will use a gzip on both ends of the pipe. On the slave side you can set an extra option to compress and send data from the server
--Noble

On Wed, Oct 29, 2008 at 3:06 PM, Simon Collins <[EMAIL PROTECTED]> wrote:
> I have now optimized the index - down to 325mb, it compresses down to 20mb.
>
> I think the new replication thing in solr is great, but if it could compress
> the files it's sending, it would be an awful lot more useful when
> replicating, as we are, between sites.
Re: replication handler - compression
Hi,

Is the new replication feature based on HTTP requests between sites? If yes, then I guess it might be possible to configure an HTTP server with mod_deflate so the data is compressed on the fly.

C.

Simon Collins wrote:
> I have now optimized the index - down to 325mb, it compresses down to 20mb.
RE: replication handler - compression
I have now optimized the index - down to 325mb, it compresses down to 20mb.

I think the new replication thing in solr is great, but if it could compress the files it's sending, it would be an awful lot more useful when replicating, as we are, between sites.

Simon Collins
Systems Analyst
shoe-shop.com ltd
www.shoe-shop.com

-Original Message-
From: Noble Paul നോബിള് नोब्ळ्
Sent: 29 October 2008 03:29
Subject: Re: replication handler - compression

The new replication feature does not use any unix commands, it is pure java. On the fly compression is hard but possible. ...
Re: replication handler - compression
The new replication feature does not use any Unix commands; it is pure Java. On-the-fly compression is hard but possible.

I wish to repeat the question: did you optimize the index? A 10:1 compression ratio is not usually observed in an optimized index. Our own experiments showed compression of around 10:6 for optimized indexes.

--Noble

On Wed, Oct 29, 2008 at 3:41 AM, Lance Norskog <[EMAIL PROTECTED]> wrote:
> Aha! The hint to the actual problem: "When compressed with winzip". You are
> running Solr on Windows.
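"On the fly" need not mean writing a .gz file first; a sketch of compressing straight from an index file into an already-open output stream (the class and method names are made up for illustration, not the actual handler code):

import java.io.*;
import java.util.zip.GZIPOutputStream;

public class StreamingGzip {
    /** Compress 'file' directly into 'rawOut' (e.g. an HTTP response stream),
     *  never writing a .gz file to disk. */
    public static void send(File file, OutputStream rawOut) throws IOException {
        InputStream in = new BufferedInputStream(new FileInputStream(file));
        GZIPOutputStream out = new GZIPOutputStream(rawOut);
        try {
            byte[] buf = new byte[64 * 1024];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            out.finish();  // write the gzip trailer but leave rawOut open
        } finally {
            in.close();
        }
    }
}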
RE: replication handler - compression
Aha! The hint to the actual problem: "When compressed with winzip". You are running Solr on Windows.

Snapshots don't work on Windows: they depend on a Unix file system feature (hard links). You may be copying the entire index. Not just that, it could be inconsistent. This is a fine topic for a "best practices for Windows" wiki page.

The 'scp' program is what you want. It has an option to compress on the fly without saving anything to disk. 'rsync' in particular has features to only copy what is not already at the target. The Putty suite 'pscp' program also has the compression feature.

Lance

-Original Message-
From: Noble Paul നോബിള് नोब्ळ्
Sent: Monday, October 27, 2008 9:36 PM
Subject: Re: replication handler - compression

I mean compressing and transferring. If the optimized index itself has a very high compression ratio then it is worth exploring the option of compressing and transferring. ...
Re: replication handler - compression
> It is useful only if your bandwidth is very low.
> Otherwise the cost of copying/compressing/decompressing can take up
> more time than we save.

I mean compressing and transferring. If the optimized index itself has a very high compression ratio, then it is worth exploring the option of compressing and transferring. And do not assume that all the files in the index directory are transferred during replication. It only transfers the files which are used by the current commit point and the ones which are absent on the slave.

On Tue, Oct 28, 2008 at 2:49 AM, Simon Collins <[EMAIL PROTECTED]> wrote:
> Is there an option on the replication handler to compress the files?

--Noble Paul
Re: replication handler - compression
Are you sure you optimized the index? It is useful only if your bandwidth is very low. Otherwise the cost of copying/compressing/decompressing can take up more time than we save.

On Tue, Oct 28, 2008 at 2:49 AM, Simon Collins <[EMAIL PROTECTED]> wrote:
> Is there an option on the replication handler to compress the files?
>
> I'm trying to replicate off site, and seem to have accumulated about
> 1.4gb. When compressed with winzip of all things i can get this down to
> about 10% of the size.
>
> Is compression in the pipeline / can it be if not!
>
> simon

--Noble Paul