Re: replication handler - compression

2008-10-30 Thread Walter Underwood
It could also be that the C version is a lot more efficient than
the Java version, so it could take longer regardless. I could not
find a benchmark on that, but C is usually better for bit twiddling.

wunder

On 10/30/08 10:36 PM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:

> man gzip:
> 
> -# --fast --best
>        Regulate the speed of compression using the specified digit #, where
>        -1 or --fast indicates the fastest compression method (less
>        compression) and -9 or --best indicates the slowest compression
>        method (best compression). The default compression level is -6
>        (that is, biased towards high compression at expense of speed).
> 
>  
> So it could be better than the factor of 2, but also take longer. :)
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



Re: replication handler - compression

2008-10-30 Thread Otis Gospodnetic
man gzip:

   -# --fast --best
          Regulate the speed of compression using the specified digit #, where
          -1 or --fast indicates the fastest compression method (less
          compression) and -9 or --best indicates the slowest compression
          method (best compression). The default compression level is -6
          (that is, biased towards high compression at expense of speed).

 
So it could be better than the factor of 2, but also take longer. :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
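For a concrete feel of the -1/-9 trade-off, here is a small self-contained Java sketch using java.util.zip.Deflater, which exposes the same zlib levels gzip uses. Class name and sample data are illustrative; exact sizes and timings depend on the data and machine.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Compare gzip's -1 (BEST_SPEED) and -9 (BEST_COMPRESSION) levels via
// java.util.zip.Deflater, which exposes the same zlib levels.
public class CompressionLevels {

    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Highly repetitive sample data; real index files compress far less.
        byte[] data = new byte[1 << 20];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) "solr replication ".charAt(i % 17);
        }
        long t0 = System.nanoTime();
        int fast = compress(data, Deflater.BEST_SPEED).length;       // gzip -1
        long t1 = System.nanoTime();
        int best = compress(data, Deflater.BEST_COMPRESSION).length; // gzip -9
        long t2 = System.nanoTime();
        System.out.printf("level 1: %d bytes, %d us%n", fast, (t1 - t0) / 1000);
        System.out.printf("level 9: %d bytes, %d us%n", best, (t2 - t1) / 1000);
    }
}
```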



- Original Message 
> From: Walter Underwood <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, October 30, 2008 11:52:47 AM
> Subject: Re: replication handler - compression
> 
> About a factor of 2 on a small, optimized index. Gzipping took 20 seconds,
> so it isn't free.
> 
> $ cd index-copy
> $ du -sk
> 134336  .
> $ gzip *
> $ du -sk
> 62084   .
> 
> wunder



Re: replication handler - compression

2008-10-30 Thread Chris Hostetter

: Yeah.  I'm just not sure how much benefit in terms of data transfer this 
: will save.  Has anyone tested this to see if this is even worth it?

one man's trash is another man's treasure ... if you're replicating 
snapshots very frequently within a single datacenter, speed is critical 
and bandwidth is free -- if you're replicating once a day from one data 
center to another over a very expensive, very small pipe, spending some 
time+CPU to compress may be worth it.

either way: it should be almost trivial to implement if people want to 
supply a patch, and with a simple new requestDispatcher config option, 
easy to disable completely on the server for people who might have 
clients sending "Accept-Encoding: gzip" willy-nilly


-Hoss



Re: replication handler - compression

2008-10-30 Thread Walter Underwood
CPU was at 100%; it was not I/O bound. --wunder

On 10/30/08 8:58 AM, "christophe" <[EMAIL PROTECTED]> wrote:

> Gzipping on disk requires quite some I/O. I guess that on-the-fly zipping
> should be faster.
> 
> C.



Re: replication handler - compression

2008-10-30 Thread christophe
Gzipping on disk requires quite some I/O. I guess that on-the-fly zipping 
should be faster.


C.

Walter Underwood wrote:

About a factor of 2 on a small, optimized index. Gzipping took 20 seconds,
so it isn't free.

$ cd index-copy
$ du -sk
134336  .
$ gzip *
$ du -sk
62084   .

wunder



Re: replication handler - compression

2008-10-30 Thread Walter Underwood
About a factor of 2 on a small, optimized index. Gzipping took 20 seconds,
so it isn't free.

$ cd index-copy
$ du -sk
134336  .
$ gzip *
$ du -sk
62084   .

wunder

On 10/30/08 8:20 AM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:

> Yeah.  I'm just not sure how much benefit in terms of data transfer this will
> save.  Has anyone tested this to see if this is even worth it?



Re: replication handler - compression

2008-10-30 Thread Otis Gospodnetic
Yeah.  I'm just not sure how much benefit in terms of data transfer this will 
save.  Has anyone tested this to see if this is even worth it?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Erik Hatcher <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, October 30, 2008 9:54:28 AM
> Subject: Re: replication handler - compression
> 
> +1 - the GzipServletFilter is the way to go.
> 
> Regarding request handlers reading HTTP headers, yeah,... this will improve, 
> for 
> sure.
> 
> Erik


Re: replication handler - compression

2008-10-30 Thread Erik Hatcher

+1 - the GzipServletFilter is the way to go.

Regarding request handlers reading HTTP headers, yeah,... this will  
improve, for sure.


Erik





Re: replication handler - compression

2008-10-29 Thread Chris Hostetter

: You are partially right. Instead of the HTTP header, we use a request
: parameter. (RequestHandlers cannot read HTTP headers). If the param is

hmmm, I'm with Walter: we shouldn't invent new mechanisms for 
clients to request compression over HTTP from servers.

replication is both special enough and important enough that if we had to 
add special support to make that information available to the handler on 
the master we could.

but frankly I don't think that's necessary: the logic to turn on 
compression if the client requests it using "Accept-Encoding: gzip" is 
generic enough that there is no reason for it to be in a handler.  we 
could easily put it in the SolrDispatchFilter, or even in a new 
ServletFilter (I'm guessing I've seen about 74 different implementations of 
a GzipServletFilter in the wild that could be used as is).

then we'd have double wins: compression for replication, and compression 
of all responses generated by Solr if the client requests it.

-Hoss
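The negotiation described above boils down to very little code. The sketch below shows the core decision in plain JDK terms; the class and method names are made up, and a real GzipServletFilter would read the header from HttpServletRequest, wrap the HttpServletResponse's output stream, and set the Content-Encoding header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Core of the Accept-Encoding negotiation a GzipServletFilter performs,
// stripped of the servlet-API plumbing. Names are illustrative only.
public class GzipNegotiation {

    static boolean clientAcceptsGzip(String acceptEncoding) {
        return acceptEncoding != null
                && acceptEncoding.toLowerCase().contains("gzip");
    }

    // Returns the stream the response should be written to. A real filter
    // would also set "Content-Encoding: gzip" on the response.
    static OutputStream negotiate(String acceptEncoding, OutputStream raw)
            throws IOException {
        return clientAcceptsGzip(acceptEncoding)
                ? new GZIPOutputStream(raw)
                : raw;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        OutputStream out = negotiate("gzip, deflate", wire);
        out.write("index segment bytes...".getBytes("UTF-8"));
        out.close();

        // The client side just wraps the other end in a GZIPInputStream.
        GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(wire.toByteArray()));
        byte[] buf = new byte[64];
        int n, total = 0;
        while ((n = in.read(buf, total, buf.length - total)) > 0) {
            total += n;
        }
        System.out.println(new String(buf, 0, total, "UTF-8"));
    }
}
```

Because the check keys off the standard Accept-Encoding header, the same filter transparently benefits every Solr response, not just replication.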



Re: replication handler - compression

2008-10-29 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hoss,
You are partially right. Instead of the HTTP header, we use a request
parameter. (RequestHandlers cannot read HTTP headers.) If the param is
present, it wraps the response in a gzip output stream. It is configured
on the slave because not every slave may want compression; slaves
which are nearby can skip it.





-- 
--Noble Paul


Re: replication handler - compression

2008-10-29 Thread Chris Hostetter

My understanding of Noble's comment (and I could be wrong, I'm reading 
between the lines) is that if you specify the new setting he's suggesting 
when initializing the replication handler on the slave, then the slave 
should start using an "Accept-Encoding: gzip" header when querying the 
master, and that when receiving this header, the master will start 
wrapping the response in a "Content-Encoding: gzip" response.

(I'm making this assumption based on his note about this being a new slave 
config option, with no mention of any new options on the master)

: You propose to do compressed transfers over HTTP ignoring the standard
: support for compressed transfers in HTTP. Programming that with a
: library doesn't make it "standard".

: >> open a JIRA issue. we
: > will use a gzip on both ends of the pipe . On
: > the slave
: >> side you can
: > say
: > true
: > as an extra option to compress and
: >> send
: > data from server
: > --Noble


-Hoss



Re: replication handler - compression

2008-10-29 Thread Walter Underwood
You propose to do compressed transfers over HTTP ignoring the standard
support for compressed transfers in HTTP. Programming that with a
library doesn't make it "standard".

In Ultraseek, we implemented index synchronization over HTTP with
compression. It wasn't that hard.

I doubt that compression will make a huge difference; Lucene already uses
reasonable compression in its indexes.

wunder





Re: replication handler - compression

2008-10-29 Thread Noble Paul നോബിള്‍ नोब्ळ्
we are not doing anything non-standard.
GZIPInputStream/GZIPOutputStream are standard (java.util.zip). But asking
users to set up an extra Apache instance is not fair if we can manage it
with, say, 5 lines of code

On Wed, Oct 29, 2008 at 7:44 PM, Walter Underwood
<[EMAIL PROTECTED]> wrote:
> Why invent something when compression is standard in HTTP? --wunder
>
> On 10/29/08 4:35 AM, "Noble Paul നോബിള്‍ नोब्ळ्" <[EMAIL PROTECTED]>
> wrote:
>
>> open a JIRA issue. we will use a gzip on both ends of the pipe. On
>> the slave side you can say
>> true
>> as an extra option to compress and send data from server
>> --Noble
>
>



-- 
--Noble Paul


Re: replication handler - compression

2008-10-29 Thread Walter Underwood
Why invent something when compression is standard in HTTP? --wunder

On 10/29/08 4:35 AM, "Noble Paul നോബിള്‍ नोब्ळ्" <[EMAIL PROTECTED]>
wrote:

> open a JIRA issue. we will use a gzip on both ends of the pipe. On
> the slave side you can say
> true
> as an extra option to compress and send data from server
> --Noble



Re: replication handler - compression

2008-10-29 Thread Bill Au
Do keep in mind that compression is a CPU-intensive process, so it is a trade-off
between CPU utilization and network bandwidth.  I have seen cases where
compressing the data before a network transfer ended up being slower than
without compression, because the cost of compression and decompression was
more than the gain in network transfer.

Bill

On Wed, Oct 29, 2008 at 7:35 AM, Noble Paul നോബിള്‍ नोब्ळ् <
[EMAIL PROTECTED]> wrote:

> open a JIRA issue. we will use a gzip on both ends of the pipe . On
> the slave side you can say
> true
> as an extra option to compress and send data from server
> --Noble

Re: replication handler - compression

2008-10-29 Thread Noble Paul നോബിള്‍ नोब्ळ्
open a JIRA issue. we will use a gzip on both ends of the pipe. On
the slave side you can say
true
as an extra option to compress and send data from server
--Noble




On Wed, Oct 29, 2008 at 3:06 PM, Simon Collins
<[EMAIL PROTECTED]> wrote:
> I have now optimized the index - down to 325mb, it compresses down to 20mb.
>
> I think the new replication thing in solr is great, but if it could compress 
> the files it's sending, it would be an awful lot more useful when 
> replicating, as we are, between sites.
>
>
>
> 
>
> Simon Collins
> Systems Analyst
>
> Telephone: 01904 606 867
> Fax Number: 01904 528 791
>
> shoe-shop.com ltd
> Catherine House
> Northminster Business Park
> Upper Poppleton, YORK
> YO26 6QU
> www.shoe-shop.com
> 
>
> This message (and any associated files) is intended only for the use of the 
> individual or entity to which it is addressed and may contain information 
> that is confidential, subject to copyright or constitutes a trade secret. If 
> you are not the intended recipient you are hereby notified that any 
> dissemination, copying or distribution of this message, or files associated 
> with this message, is strictly prohibited. If you have received this message 
> in error, please notify us immediately by replying to the message and 
> deleting it from your computer. Messages sent to and from us may be monitored.
>
> Internet communications cannot be guaranteed to be secure or error-free as 
> information could be intercepted, corrupted, lost, destroyed, arrive late or 
> incomplete, or contain viruses. Therefore, we do not accept responsibility 
> for any errors or omissions that are present in this message, or any 
> attachment, that have arisen as a result of e-mail transmission. If 
> verification is required, please request a hard-copy version. Any views or 
> opinions presented are solely those of the author and do not necessarily 
> represent those of the company. (PAVD001)
> Shoe-shop.com Limited is a company registered in England and Wales with 
> company number 03817232. Vat Registration GB 734 256 241. Registered Office 
> Catherine House, Northminster Business Park, Upper Poppleton, YORK, YO26 6QU.
>
>
> -Original Message-
>
> From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
> Sent: 29 October 2008 03:29
> To: solr-user@lucene.apache.org
> Subject: Re: replication handler - compression
>
> The new replication feature does not use any unix commands , it is
> pure java.  On the fly compression is hard but possible.
> I wish to repeat the question. Did you optimize the index? Because a
> 10:1 compression is not usually observed in an optimized index. Our
> own experiments showed compression of around 10:6 for optimized
> indexes.
>
> --Noble
>
> On Wed, Oct 29, 2008 at 3:41 AM, Lance Norskog <[EMAIL PROTECTED]> wrote:
>> Aha! The hint to the actual problem: "When compressed with winzip". You are 
>> running Solr on Windows.
>>
>> Snapshots don't work on Windows: they depend on a Unix file system feature. 
>> You may be copying the entire index. Not just that, it could be inconsistent.
>> This is a fine topic for a "best practices for Windows" wiki page.
>>
>> The 'scp' program is what you want. It has an option to compress on the fly 
>> without saving anything to disk. 'Rcopy' in particular has features to only 
>> copy what is not already at the target.  The Putty suite 'pscp' program also 
>> has the compression feature.
>>
>> Lance
>>
>> -Original Message-
>> From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
>> Sent: Monday, October 27, 2008 9:36 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: replication handler - compression
>>
>>> It is useful only if your bandwidth is very low.
>>> Otherwise the cost of copying/compressing/decompressing can take up
>>> more time than we save.
>>
>> I mean compressing and transferring. If the optimized index itself has a 
>> very high compression ratio then it is worth exploring the option of 
>> compressing and transferring. And do not assume that all the files in the 
>> index directory are transferred during replication. It only transfers the 
>> files which are used by the current commit point and the ones which are 
>> absent in the slave.
>>
>>
>>>
>>>
>>>
>>> On Tue, Oct 28, 2008 at 2:49 AM, Simon Collins
>>> <[EMAIL PROTECTED]> wrote:
>>>> Is there an optio

Re: replication handler - compression

2008-10-29 Thread christophe

Hi,

Is the new replication feature based on HTTP requests between sites?
If yes, then I guess it might be possible to configure an HTTP server 
with mod_deflate so the data is compressed on the fly.


C.
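For reference, a hypothetical httpd.conf fragment along those lines, assuming an Apache 2.x reverse proxy in front of Solr; the module paths, Solr port, and context path are assumptions, and this is an untested sketch:

```apache
# Front Solr with Apache and compress responses on the fly.
LoadModule deflate_module modules/mod_deflate.so
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so

<Location /solr>
    ProxyPass http://localhost:8983/solr
    # mod_deflate honors the client's "Accept-Encoding: gzip" header.
    SetOutputFilter DEFLATE
</Location>
```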

Simon Collins wrote:

I have now optimized the index - down to 325mb, it compresses down to 20mb.

I think the new replication thing in solr is great, but if it could compress 
the files it's sending, it would be an awful lot more useful when replicating, 
as we are, between sites.





Simon Collins
Systems Analyst

Telephone: 01904 606 867
Fax Number: 01904 528 791

shoe-shop.com ltd
Catherine House
Northminster Business Park
Upper Poppleton, YORK
YO26 6QU
www.shoe-shop.com


This message (and any associated files) is intended only for the use of the individual or entity to which it is addressed and may contain information that is confidential, subject to copyright or constitutes a trade secret. If you are not the intended recipient you are hereby notified that any dissemination, copying or distribution of this message, or files associated with this message, is strictly prohibited. If you have received this message in error, please notify us immediately by replying to the message and deleting it from your computer. Messages sent to and from us may be monitored. 

Internet communications cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. Therefore, we do not accept responsibility for any errors or omissions that are present in this message, or any attachment, that have arisen as a result of e-mail transmission. If verification is required, please request a hard-copy version. Any views or opinions presented are solely those of the author and do not necessarily represent those of the company. (PAVD001) 
Shoe-shop.com Limited is a company registered in England and Wales with company number 03817232. Vat Registration GB 734 256 241. Registered Office Catherine House, Northminster Business Park, Upper Poppleton, YORK, YO26 6QU.



-Original Message-

From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED] 
Sent: 29 October 2008 03:29

To: solr-user@lucene.apache.org
Subject: Re: replication handler - compression

The new replication feature does not use any Unix commands; it is
pure Java. On-the-fly compression is hard but possible.
I wish to repeat the question. Did you optimize the index? Because a
10:1 compression is not usually observed in an optimized index. Our
own experiments showed compression of around 10:6 for optimized
indexes.

--Noble

On Wed, Oct 29, 2008 at 3:41 AM, Lance Norskog <[EMAIL PROTECTED]> wrote:
  

Aha! The hint to the actual problem: "When compressed with winzip". You are 
running Solr on Windows.

Snapshots don't work on Windows: they depend on a Unix file system feature 
(hard links). You may be copying the entire index; worse, the copy could be 
inconsistent. This is a fine topic for a "best practices for Windows" wiki page.

The 'scp' program is what you want. It has an option to compress on the fly 
without saving anything to disk. 'rsync' in particular has features to only 
copy what is not already at the target. The PuTTY suite's 'pscp' program also 
has the compression feature.

Lance

-Original Message-
From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
Sent: Monday, October 27, 2008 9:36 PM
To: solr-user@lucene.apache.org
Subject: Re: replication handler - compression



It is useful only if your bandwidth is very low.
Otherwise the cost of copying/compressing/decompressing can take
more time than it saves.

I mean compressing and transferring. If the optimized index itself has a very 
high compression ratio, then it is worth exploring the option of compressing 
and transferring. And do not assume that all the files in the index directory 
are transferred during replication. It only transfers the files which are used 
by the current commit point and those which are absent on the slave.





On Tue, Oct 28, 2008 at 2:49 AM, Simon Collins
<[EMAIL PROTECTED]> wrote:
  

Is there an option on the replication handler to compress the files?



I'm trying to replicate off site, and seem to have accumulated about
1.4gb. When compressed with winzip of all things I can get this down
to about 10% of the size.



Is compression in the pipeline / can it be if not!



simon



This message has been scanned for malware by SurfControl plc.
www.surfcontrol.com




--
--Noble Paul

  


--
--Noble Paul







  


RE: replication handler - compression

2008-10-29 Thread Simon Collins
I have now optimized the index - down to 325mb; it compresses down to 20mb.

I think the new replication thing in Solr is great, but if it could compress 
the files it's sending, it would be an awful lot more useful when replicating, 
as we are, between sites.





Simon Collins
Systems Analyst



-Original Message-

From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED] 
Sent: 29 October 2008 03:29
To: solr-user@lucene.apache.org
Subject: Re: replication handler - compression

The new replication feature does not use any Unix commands; it is
pure Java. On-the-fly compression is hard but possible.
I wish to repeat the question. Did you optimize the index? Because a
10:1 compression is not usually observed in an optimized index. Our
own experiments showed compression of around 10:6 for optimized
indexes.

--Noble

On Wed, Oct 29, 2008 at 3:41 AM, Lance Norskog <[EMAIL PROTECTED]> wrote:
> Aha! The hint to the actual problem: "When compressed with winzip". You are 
> running Solr on Windows.
>
> Snapshots don't work on Windows: they depend on a Unix file system feature 
> (hard links). You may be copying the entire index; worse, the copy could be 
> inconsistent. This is a fine topic for a "best practices for Windows" wiki page.
>
> The 'scp' program is what you want. It has an option to compress on the fly 
> without saving anything to disk. 'rsync' in particular has features to only 
> copy what is not already at the target. The PuTTY suite's 'pscp' program also 
> has the compression feature.
>
> Lance
>
> -Original Message-
> From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
> Sent: Monday, October 27, 2008 9:36 PM
> To: solr-user@lucene.apache.org
> Subject: Re: replication handler - compression
>
>> It is useful only if your bandwidth is very low.
>> Otherwise the cost of copying/compressing/decompressing can take
>> more time than it saves.
>
> I mean compressing and transferring. If the optimized index itself has a very 
> high compression ratio, then it is worth exploring the option of compressing 
> and transferring. And do not assume that all the files in the index directory 
> are transferred during replication. It only transfers the files which are used 
> by the current commit point and those which are absent on the slave.
>
>
>>
>>
>>
>> On Tue, Oct 28, 2008 at 2:49 AM, Simon Collins
>> <[EMAIL PROTECTED]> wrote:
>>> Is there an option on the replication handler to compress the files?
>>>
>>>
>>>
>>> I'm trying to replicate off site, and seem to have accumulated about
>>> 1.4gb. When compressed with winzip of all things I can get this down
>>> to about 10% of the size.
>>>
>>>
>>>
>>> Is compression in the pipeline / can it be if not!
>>>
>>>
>>>
>>> simon
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> --Noble Paul
>>
>
>
>
> --
> --Noble Paul
>
>



-- 
--Noble Paul


Re: replication handler - compression

2008-10-28 Thread Noble Paul നോബിള്‍ नोब्ळ्
The new replication feature does not use any Unix commands; it is
pure Java. On-the-fly compression is hard but possible.
I wish to repeat the question. Did you optimize the index? Because a
10:1 compression is not usually observed in an optimized index. Our
own experiments showed compression of around 10:6 for optimized
indexes.
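
On-the-fly compression in pure Java is at least easy to sketch: java.util.zip can wrap the transfer stream so nothing extra is staged on disk. Below is a minimal, hypothetical round-trip helper; this is not Solr's actual replication code, and the class and method names are invented:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: stream-level gzip with no temp files. */
public class StreamGzip {

    /** Compress a buffer through a streaming GZIPOutputStream. */
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (OutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(data); // bytes are compressed as they pass through the wrapper
        }
        return buf.toByteArray();
    }

    /** Decompress by wrapping the input side the same way. */
    public static byte[] gunzip(byte[] gz) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] chunk = new byte[8192];
            for (int n; (n = in.read(chunk)) != -1; ) {
                buf.write(chunk, 0, n);
            }
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Repetitive data (like an index full of similar documents) compresses well.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) sb.append("highly repetitive index data ");
        byte[] original = sb.toString().getBytes("UTF-8");

        byte[] packed = gzip(original);
        if (!Arrays.equals(gunzip(packed), original)) {
            throw new AssertionError("round trip lost data");
        }
        System.out.println(original.length + " -> " + packed.length + " bytes");
    }
}
```

In a real handler the GZIPOutputStream would wrap the HTTP response stream rather than a byte buffer; the trade-off raised above (CPU time versus bandwidth) is unchanged.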

--Noble

On Wed, Oct 29, 2008 at 3:41 AM, Lance Norskog <[EMAIL PROTECTED]> wrote:
> Aha! The hint to the actual problem: "When compressed with winzip". You are 
> running Solr on Windows.
>
> Snapshots don't work on Windows: they depend on a Unix file system feature 
> (hard links). You may be copying the entire index; worse, the copy could be 
> inconsistent. This is a fine topic for a "best practices for Windows" wiki page.
>
> The 'scp' program is what you want. It has an option to compress on the fly 
> without saving anything to disk. 'rsync' in particular has features to only 
> copy what is not already at the target. The PuTTY suite's 'pscp' program also 
> has the compression feature.
>
> Lance
>
> -Original Message-
> From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
> Sent: Monday, October 27, 2008 9:36 PM
> To: solr-user@lucene.apache.org
> Subject: Re: replication handler - compression
>
>> It is useful only if your bandwidth is very low.
>> Otherwise the cost of copying/compressing/decompressing can take
>> more time than it saves.
>
> I mean compressing and transferring. If the optimized index itself has a very 
> high compression ratio, then it is worth exploring the option of compressing 
> and transferring. And do not assume that all the files in the index directory 
> are transferred during replication. It only transfers the files which are used 
> by the current commit point and those which are absent on the slave.
>
>
>>
>>
>>
>> On Tue, Oct 28, 2008 at 2:49 AM, Simon Collins
>> <[EMAIL PROTECTED]> wrote:
>>> Is there an option on the replication handler to compress the files?
>>>
>>>
>>>
>>> I'm trying to replicate off site, and seem to have accumulated about
>>> 1.4gb. When compressed with winzip of all things I can get this down
>>> to about 10% of the size.
>>>
>>>
>>>
>>> Is compression in the pipeline / can it be if not!
>>>
>>>
>>>
>>> simon
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> --Noble Paul
>>
>
>
>
> --
> --Noble Paul
>
>



-- 
--Noble Paul


RE: replication handler - compression

2008-10-28 Thread Lance Norskog
Aha! The hint to the actual problem: "When compressed with winzip". You are 
running Solr on Windows.

Snapshots don't work on Windows: they depend on a Unix file system feature 
(hard links). You may be copying the entire index; worse, the copy could be 
inconsistent. This is a fine topic for a "best practices for Windows" wiki page.

The 'scp' program is what you want. It has an option to compress on the fly 
without saving anything to disk. 'rsync' in particular has features to only 
copy what is not already at the target. The PuTTY suite's 'pscp' program also 
has the compression feature.

Lance

-Original Message-
From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 27, 2008 9:36 PM
To: solr-user@lucene.apache.org
Subject: Re: replication handler - compression

> It is useful only if your bandwidth is very low.
> Otherwise the cost of copying/compressing/decompressing can take
> more time than it saves.

I mean compressing and transferring. If the optimized index itself has a very 
high compression ratio, then it is worth exploring the option of compressing 
and transferring. And do not assume that all the files in the index directory 
are transferred during replication. It only transfers the files which are used 
by the current commit point and those which are absent on the slave.


>
>
>
> On Tue, Oct 28, 2008 at 2:49 AM, Simon Collins 
> <[EMAIL PROTECTED]> wrote:
>> Is there an option on the replication handler to compress the files?
>>
>>
>>
>> I'm trying to replicate off site, and seem to have accumulated about 
>> 1.4gb. When compressed with winzip of all things I can get this down 
>> to about 10% of the size.
>>
>>
>>
>> Is compression in the pipeline / can it be if not!
>>
>>
>>
>> simon
>>
>>
>>
>>
>
>
>
> --
> --Noble Paul
>



--
--Noble Paul



Re: replication handler - compression

2008-10-27 Thread Noble Paul നോബിള്‍ नोब्ळ्
> It is useful only if your bandwidth is very low.
> Otherwise the cost of copying/compressing/decompressing can take
> more time than it saves.

I mean compressing and transferring. If the optimized index itself has
a very high compression ratio, then it is worth exploring the option
of compressing and transferring. And do not assume that all the files
in the index directory are transferred during replication. It only
transfers the files which are used by the current commit point and
those which are absent on the slave.
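
The selection rule described here is essentially a set difference. A toy model makes it concrete; the class and file names are invented for illustration and this is not the actual replication code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the rule above: fetch only commit-point files missing on the slave. */
public class ReplicationDelta {

    public static List<String> filesToFetch(Set<String> commitPointFiles,
                                            Set<String> slaveFiles) {
        Set<String> needed = new LinkedHashSet<>(commitPointFiles);
        needed.removeAll(slaveFiles); // anything already on the slave is skipped
        return new ArrayList<>(needed);
    }

    public static void main(String[] args) {
        // The commit point references three files; the slave already holds one of them.
        Set<String> commitPoint =
            new LinkedHashSet<>(Arrays.asList("_1.fdt", "_1.fdx", "segments_2"));
        Set<String> slave = new HashSet<>(Arrays.asList("_1.fdt", "segments_1"));
        System.out.println(filesToFetch(commitPoint, slave)); // prints [_1.fdx, segments_2]
    }
}
```

So only the new segment data travels; files the slave already has, and files no longer referenced by the current commit point, are never sent.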


>
>
>
> On Tue, Oct 28, 2008 at 2:49 AM, Simon Collins
> <[EMAIL PROTECTED]> wrote:
>> Is there an option on the replication handler to compress the files?
>>
>>
>>
>> I'm trying to replicate off site, and seem to have accumulated about
>> 1.4gb. When compressed with winzip of all things I can get this down to
>> about 10% of the size.
>>
>>
>>
>> Is compression in the pipeline / can it be if not!
>>
>>
>>
>> simon
>>
>>
>>
>>
>
>
>
> --
> --Noble Paul
>



-- 
--Noble Paul


Re: replication handler - compression

2008-10-27 Thread Noble Paul നോബിള്‍ नोब्ळ्
Are you sure you optimized the index?
It is useful only if your bandwidth is very low.
Otherwise the cost of copying/compressing/decompressing can take
more time than it saves.



On Tue, Oct 28, 2008 at 2:49 AM, Simon Collins
<[EMAIL PROTECTED]> wrote:
> Is there an option on the replication handler to compress the files?
>
>
>
> I'm trying to replicate off site, and seem to have accumulated about
> 1.4gb. When compressed with winzip of all things I can get this down to
> about 10% of the size.
>
>
>
> Is compression in the pipeline / can it be if not!
>
>
>
> simon
>
>
>
>



-- 
--Noble Paul