Re: field compression in solr 3.6

2012-05-24 Thread Shawn Heisey

On 5/23/2012 2:48 PM, pramila_tha...@ontla.ola.org wrote:

Hi Everyone,

Solr 3.6 does not seem to be honoring field compression.

When we merge indexes, the resulting index is very large.

Is there any other way to handle this and keep the compression functionality?


Compression support was removed from Solr.  I am not clear on the 
reasons, but there was probably a good one.  The wiki says it happened 
in 1.4.1.


http://wiki.apache.org/solr/SchemaXml#Data_Types

There seems to be a patch to put compression back in, implemented in a 
different way that is not compatible with fields compressed in the old 
way.  The patch has not been committed to any Solr version.


https://issues.apache.org/jira/browse/SOLR-752

Thanks,
Shawn



Re: Field compression

2011-04-18 Thread Smiley, David W.
It's probably not accurate to say that a lot of sites were *relying* on that 
feature. It's an optimization.

Getting a working patch that applies to trunk is on my TODO list for the next 
couple of months.
https://issues.apache.org/jira/browse/SOLR-752
"Watch" the issue to see when I get to it.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

On Apr 15, 2011, at 12:56 PM, Charlie Jackson wrote:

> I know I'm late to the party, but I recently learned that field compression 
> was removed as of Solr 1.4.1. I think a lot of sites were relying on that 
> feature, so I'm curious what people are doing now that it's gone. 
> Specifically, what are people doing to efficiently store *and highlight* 
> large fulltext fields? I can think of ways to store the text efficiently 
> (compress it myself), or highlight it (leave it uncompressed), but not both 
> at the same time.
> 
> Also, is anyone working on anything to restore compression to Solr? I 
> understand it was removed because Lucene removed support for it, but I was 
> hoping to upgrade my site to 3.1 soon and we rely on that feature.
> 
> - Charlie
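
For anyone weighing the "compress it myself" route Charlie describes, here is a
minimal sketch using java.util.zip (class and method names are illustrative,
not part of any Solr API). Note that it solves the storage half but not the
highlighting half of his question: Solr's highlighter needs the plain stored
text, so a do-it-yourself compressed field cannot be highlighted directly.

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    public class FieldGzip {

        // Compress a field value before sending it to Solr as a binary field.
        static byte[] compress(String text) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            GZIPOutputStream gz = new GZIPOutputStream(bos);
            gz.write(text.getBytes("UTF-8"));
            gz.close(); // close() writes the gzip trailer
            return bos.toByteArray();
        }

        // Decompress the stored bytes after fetching the document.
        static String decompress(byte[] stored) throws IOException {
            GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(stored));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toString("UTF-8");
        }
    }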









Re: Field Compression

2009-06-09 Thread Michael Ludwig

Fer-Bj wrote:

for all the documents we have a field called "small_body", which is a
60-character max text field where we store the "abstract" for each
article.

We need to display this small_body, which we want to compress, every time.


If this works like compressing individual files, the overhead for just
60 characters (which may be no more than 60 bytes) may mean that any
attempt at compression results in inflation.

On the other hand, if lower-level units (pages) are compressed (as
opposed to individual fields), then I don't know what sense a
configurable compression threshold might make.

Maybe one of the pros can clarify.
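
That inflation effect is easy to check empirically. A tiny sketch (gzip stands
in here for whatever codec is used; Lucene's old field compression used
java.util.zip deflate, but the per-value framing overhead is the issue either
way):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.GZIPOutputStream;

    public class TinyFieldOverhead {
        public static void main(String[] args) throws IOException {
            // A roughly 60-character abstract, like small_body.
            String text = "Council debates the new transit budget this week";
            byte[] plain = text.getBytes("UTF-8");

            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            GZIPOutputStream gz = new GZIPOutputStream(bos);
            gz.write(plain);
            gz.close();

            // The ~18 bytes of gzip header/trailer plus deflate framing
            // typically make the output larger than a 60-byte input.
            System.out.println("plain: " + plain.length
                    + " bytes, compressed: " + bos.size() + " bytes");
        }
    }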


Last question: what's the best way to determine the compress threshold?


One fairly obvious way would be to index the same set of documents
twice, with compression and then without, and then to compare the index
size on disk. If you don't save, say, five or ten percent (YMMV), it
might not be worth the effort.
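
A minimal sketch of that comparison (the two directory paths are placeholders
for the parallel test indexes):

    import java.io.File;

    public class IndexSizeDiff {
        // A Lucene index directory is flat, so summing the top-level files is enough.
        static long bytes(File dir) {
            long total = 0;
            File[] files = dir.listFiles();
            if (files == null) {
                throw new IllegalArgumentException("not a directory: " + dir);
            }
            for (File f : files) {
                if (f.isFile()) {
                    total += f.length();
                }
            }
            return total;
        }

        public static void main(String[] args) {
            long with = bytes(new File("data-compressed/index"));
            long without = bytes(new File("data-plain/index"));
            System.out.println("saved " + (100.0 * (without - with) / without) + "%");
        }
    }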

Michael Ludwig


Re: Field Compression

2009-06-04 Thread Fer-Bj

Here is what we have:

for all the documents we have a field called "small_body", which is a
60-character max text field where we store the "abstract" for each article.

We have about 8,000,000 documents indexed, and usually we display this
small_body on our "listing pages". 

For each listing page we load 50 documents at a time; that is to say, we
need to display this small_body, which we want to compress, every time.

I'll probably compress this field and run a one-week test to see
the outcome, rolling it back if necessary.

Last question: what's the best way to determine the compress threshold?

Grant Ingersoll-6 wrote:
> 
> 
> On Jun 4, 2009, at 6:42 AM, Erick Erickson wrote:
> 
>>
>> It *will* cause performance issues if you load that field for a large
>> number of documents on a particular search. I know Lucene itself
>> has lazy field loading that helps in this case, but I don't know how
>> to persuade SOLR to use it (it may even lazy-load automatically).
>> But this is separate from searching...
> 
> Lazy loading is an option configured in solrconfig.xml.
> 
> 
> 
> 




Re: Field Compression

2009-06-04 Thread Grant Ingersoll


On Jun 4, 2009, at 6:42 AM, Erick Erickson wrote:



It *will* cause performance issues if you load that field for a large
number of documents on a particular search. I know Lucene itself
has lazy field loading that helps in this case, but I don't know how
to persuade SOLR to use it (it may even lazy-load automatically).
But this is separate from searching...


Lazy loading is an option configured in solrconfig.xml.
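
(For reference: the setting is the enableLazyFieldLoading flag in the <query>
section of solrconfig.xml. With <enableLazyFieldLoading>true</enableLazyFieldLoading>,
Solr materializes only the fields a request asks for and fetches the rest on
demand.)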




Re: Field Compression

2009-06-04 Thread Erick Erickson
Warning: this is from a Lucene perspective.
I don't think it matters. I'm pretty sure that COMPRESS only applies to
*storing* the data, not to putting the tokens in the index
(the latter is what's searched)...

It *will* cause performance issues if you load that field for a large
number of documents on a particular search. I know Lucene itself
has lazy field loading that helps in this case, but I don't know how
to persuade SOLR to use it (it may even lazy-load automatically).
But this is separate from searching...
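
A sketch of that separation in the Lucene 2.x API of the day
(Field.Store.COMPRESS was removed in Lucene 3.0):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class CompressedBodyDoc {
        static Document makeDoc(String body) {
            Document doc = new Document();
            // Store.COMPRESS affects only the stored copy of the value.
            // The tokens written to the inverted index (Index.ANALYZED) are
            // produced from the plain text, so searching is unaffected; the
            // cost shows up when the stored value is loaded and inflated.
            doc.add(new Field("body", body, Field.Store.COMPRESS, Field.Index.ANALYZED));
            return doc;
        }
    }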

Best
er...@nottoomuchhelpbutimtrying.

On Thu, Jun 4, 2009 at 4:07 AM, Fer-Bj  wrote:

>
> Is it correct to assume that using field compression will cause performance
> issues if we decide to allow search over this field?
>
> ie:
>
> [schema.xml field definitions garbled in the archive]
>
> if I decide to add "compressed=true" to the BODY field... and I allow
> search on body... would that be a problem?
> At the same time: what if I add "compressed=true" but I never search on this
> field?
>
>
> Stu Hood-3 wrote:
> >
> > I just finished watching this talk about a column-store RDBMS, which has a
> > long section on column compression. Specifically, it talks about the gains
> > from compressing similar data together, and how lazily decompressing data
> > only when it must be processed is great for memory/CPU cache usage.
> >
> > http://youtube.com/watch?v=yrLd-3lnZ58
> >
> > While interesting, it's not relevant to Lucene's stored field storage. On
> > the other hand, it did get me thinking about stored field compression and
> > lazy field loading.
> >
> > Can anyone give me some pointers about compressThreshold values that would
> > be worth experimenting with? Our stored fields are often between 20 and
> > 300 characters, and we're willing to spend more time indexing if it will
> > make searching less IO bound.
> >
> > Thanks,
> >
> > Stu Hood
> > Architecture Software Developer
> > Mailtrust, a Rackspace Company
> >
> >
> >
>
>
>


Re: Field Compression

2009-06-04 Thread Fer-Bj

Is it correct to assume that using field compression will cause performance
issues if we decide to allow search over this field?

ie:

[schema.xml field definitions garbled in the archive]

if I decide to add "compressed=true" to the BODY field... and I allow
search on body... would that be a problem?
At the same time: what if I add "compressed=true" but I never search on this
field?
  

Stu Hood-3 wrote:
> 
> I just finished watching this talk about a column-store RDBMS, which has a
> long section on column compression. Specifically, it talks about the gains
> from compressing similar data together, and how lazily decompressing data
> only when it must be processed is great for memory/CPU cache usage.
> 
> http://youtube.com/watch?v=yrLd-3lnZ58
> 
> While interesting, it's not relevant to Lucene's stored field storage. On
> the other hand, it did get me thinking about stored field compression and
> lazy field loading.
> 
> Can anyone give me some pointers about compressThreshold values that would
> be worth experimenting with? Our stored fields are often between 20 and
> 300 characters, and we're willing to spend more time indexing if it will
> make searching less IO bound.
> 
> Thanks,
> 
> Stu Hood
> Architecture Software Developer
> Mailtrust, a Rackspace Company
> 
> 
> 




Re: Field Compression

2008-02-03 Thread Mike Klaas


On 3-Feb-08, at 1:34 PM, Stu Hood wrote:

I just finished watching this talk about a column-store RDBMS,  
which has a long section on column compression. Specifically, it  
talks about the gains from compressing similar data together, and  
how lazily decompressing data only when it must be processed is  
great for memory/CPU cache usage.


http://youtube.com/watch?v=yrLd-3lnZ58

While interesting, it's not relevant to Lucene's stored field  
storage. On the other hand, it did get me thinking about stored  
field compression and lazy field loading.


Can anyone give me some pointers about compressThreshold values  
that would be worth experimenting with? Our stored fields are often  
between 20 and 300 characters, and we're willing to spend more time  
indexing if it will make searching less IO bound.


Field compression can save you space and converts the field into a  
binary field, which is lazy-loaded more efficiently than a string  
field.  As for the threshold, I use 200 on a multi-kilobyte field,  
but this doesn't mean that it isn't effective on smaller fields.   
Experimentation on small indices followed by calculating the avg.  
stored bytes/doc is usually fruitful.
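
A minimal sketch of that bookkeeping, assuming (as in Lucene's file format)
that stored-field data lives in the index's .fdt files, with .fdx holding only
the pointers into them; the path and doc count are placeholders:

    import java.io.File;

    public class StoredBytesPerDoc {
        public static void main(String[] args) {
            File indexDir = new File(args[0]);       // e.g. solr/data/index
            long numDocs = Long.parseLong(args[1]);  // e.g. numDocs from the stats page

            long fdtBytes = 0;
            File[] files = indexDir.listFiles();
            if (files == null) {
                throw new IllegalArgumentException("not a directory: " + indexDir);
            }
            for (File f : files) {
                // .fdt segment files hold the stored field data.
                if (f.getName().endsWith(".fdt")) {
                    fdtBytes += f.length();
                }
            }
            System.out.println("avg stored bytes/doc = " + (fdtBytes / (double) numDocs));
        }
    }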


Of course, the best way to improve performance in this regard is to  
store the less-frequently-used fields in a parallel Solr index.  This  
only works if the largest fields are the rarely-used ones, though  
(like retrieving the doc contents to create a summary).


-Mike