Re: field compression in solr 3.6
On 5/23/2012 2:48 PM, pramila_tha...@ontla.ola.org wrote:
> Hi Everyone,
> solr 3.6 does not seem to be honoring field compression. While merging the indexes, the index size is very big.
> Is there any other way to handle this and keep the compression functionality?

Compression support was removed from Solr. I am not clear on the reasons, but there was probably a good one. The wiki says it happened in 1.4.1.

http://wiki.apache.org/solr/SchemaXml#Data_Types

There seems to be a patch to put compression back in, implemented in a different way that is not compatible with fields compressed the old way. The patch has not been committed to any Solr version.

https://issues.apache.org/jira/browse/SOLR-752

Thanks,
Shawn
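For context, the declaration being ignored here looked roughly like the sketch below (pre-1.4.1 syntax as described on the wiki page above; the field name is hypothetical). On 1.4.1 and later the compressed attribute is silently dropped, so stored values stay uncompressed:

    <!-- pre-1.4.1 only: "compressed" is ignored by Solr 1.4.1 and later -->
    <field name="body" type="text" indexed="true" stored="true" compressed="true"/>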
Re: Field compression
It's probably not accurate to say that a lot of sites were *relying* on that feature; it's an optimization. Getting a working patch applying to trunk is on my TODO list for the next couple of months.

https://issues.apache.org/jira/browse/SOLR-752

"Watch" the issue to see when I get to it.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

On Apr 15, 2011, at 12:56 PM, Charlie Jackson wrote:

> I know I'm late to the party, but I recently learned that field compression
> was removed as of Solr 1.4.1. I think a lot of sites were relying on that
> feature, so I'm curious what people are doing now that it's gone.
> Specifically, what are people doing to efficiently store *and highlight*
> large fulltext fields? I can think of ways to store the text efficiently
> (compress it myself), or highlight it (leave it uncompressed), but not both
> at the same time.
>
> Also, is anyone working on anything to restore compression to Solr? I
> understand it was removed because Lucene removed support for it, but I was
> hoping to upgrade my site to 3.1 soon and we rely on that feature.
>
> - Charlie
Re: Field Compression
Fer-Bj wrote:
> for all the documents we have a field called "small_body", which is a
> 60-character max text field where we store the "abstract" for each article.
> we need to display this small_body, which we want to compress, every time.

If this works like compressing individual files, the overhead for just 60 characters (which may be no more than 60 bytes) may mean that any attempt at compression results in inflation. On the other hand, if lower-level units (pages) are compressed, as opposed to individual fields, then I don't know what sense a configurable compression threshold would make. Maybe one of the pros can clarify.

> Last question: what's the best way to determine the compress threshold?

One fairly obvious way would be to index the same set of documents twice, with compression and then without, and compare the index size on disk. If you don't save, say, five or ten percent (YMMV), it might not be worth the effort.

Michael Ludwig
Re: Field Compression
Here is what we have: for all the documents we have a field called "small_body", which is a 60-character max text field where we store the "abstract" for each article.

We have about 8,000,000 documents indexed, and we usually display this small_body on our "listing pages". Each listing page loads 50 documents at a time; that is to say, the small_body field we want to compress has to be displayed every time.

I'll probably enable compression for this field, run a one-week test to see the outcome, and roll it back eventually if needed.

Last question: what's the best way to determine the compress threshold?

Grant Ingersoll-6 wrote:
>
> On Jun 4, 2009, at 6:42 AM, Erick Erickson wrote:
>
>> It *will* cause performance issues if you load that field for a large
>> number of documents on a particular search. I know Lucene itself
>> has lazy field loading that helps in this case, but I don't know how
>> to persuade SOLR to use it (it may even lazy-load automatically).
>> But this is separate from searching...
>
> Lazy loading is an option configured in solrconfig.xml
>

--
View this message in context: http://www.nabble.com/Field-Compression-tp15258669p23879859.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Field Compression
On Jun 4, 2009, at 6:42 AM, Erick Erickson wrote:
> It *will* cause performance issues if you load that field for a large
> number of documents on a particular search. I know Lucene itself
> has lazy field loading that helps in this case, but I don't know how
> to persuade SOLR to use it (it may even lazy-load automatically).
> But this is separate from searching...

Lazy loading is an option configured in solrconfig.xml
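For readers looking for the exact knob: the setting lives in the <query> section of solrconfig.xml, roughly as sketched below (the surrounding elements of a real config are omitted):

    <query>
      <!-- if true, stored fields that are not requested are loaded lazily -->
      <enableLazyFieldLoading>true</enableLazyFieldLoading>
    </query>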
Re: Field Compression
Warning: this is from a Lucene perspective, but I don't think that matters. I'm pretty sure that COMPRESS only applies to *storing* the data, not to putting the tokens in the index (the latter is what's searched)...

It *will* cause performance issues if you load that field for a large number of documents on a particular search. I know Lucene itself has lazy field loading that helps in this case, but I don't know how to persuade SOLR to use it (it may even lazy-load automatically). But this is separate from searching...

Best
er...@nottoomuchhelpbutimtrying

On Thu, Jun 4, 2009 at 4:07 AM, Fer-Bj wrote:
>
> Is it correct to assume that using field compression will cause performance
> issues if we decide to allow search over this field?
>
> ie:
>
>   <field ... required="true" />
>   <field ... omitNorms="true"/>
>   <field ... stored="true"/>
>   <field ... omitNorms="true"/>
>
> if I decide to add "compressed=true" to the BODY field... and I allow
> search on body... would that be a problem?
> At the same time: what if I add compressed=true, but I never search on this
> field?
>
> Stu Hood-3 wrote:
> >
> > I just finished watching this talk about a column-store RDBMS, which has a
> > long section on column compression. Specifically, it talks about the gains
> > from compressing similar data together, and how lazily decompressing data
> > only when it must be processed is great for memory/CPU cache usage.
> >
> > http://youtube.com/watch?v=yrLd-3lnZ58
> >
> > While interesting, it's not relevant to Lucene's stored field storage. On
> > the other hand, it did get me thinking about stored field compression and
> > lazy field loading.
> >
> > Can anyone give me some pointers about compressThreshold values that would
> > be worth experimenting with? Our stored fields are often between 20 and
> > 300 characters, and we're willing to spend more time indexing if it will
> > make searching less IO bound.
> >
> > Thanks,
> >
> > Stu Hood
> > Architecture Software Developer
> > Mailtrust, a Rackspace Company
>
> --
> View this message in context:
> http://www.nabble.com/Field-Compression-tp15258669p23865558.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Field Compression
Is it correct to assume that using field compression will cause performance issues if we decide to allow search over this field?

ie: if I decide to add "compressed=true" to the BODY field... and I allow search on body... would that be a problem?

At the same time: what if I add compressed=true, but I never search on this field?

Stu Hood-3 wrote:
>
> I just finished watching this talk about a column-store RDBMS, which has a
> long section on column compression. Specifically, it talks about the gains
> from compressing similar data together, and how lazily decompressing data
> only when it must be processed is great for memory/CPU cache usage.
>
> http://youtube.com/watch?v=yrLd-3lnZ58
>
> While interesting, it's not relevant to Lucene's stored field storage. On
> the other hand, it did get me thinking about stored field compression and
> lazy field loading.
>
> Can anyone give me some pointers about compressThreshold values that would
> be worth experimenting with? Our stored fields are often between 20 and
> 300 characters, and we're willing to spend more time indexing if it will
> make searching less IO bound.
>
> Thanks,
>
> Stu Hood
> Architecture Software Developer
> Mailtrust, a Rackspace Company

--
View this message in context: http://www.nabble.com/Field-Compression-tp15258669p23865558.html
Sent from the Solr - User mailing list archive at Nabble.com.
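A sketch of the kind of field definitions being asked about (pre-1.4.1 syntax; the names and types are hypothetical, since the original snippet was lost in the archive). The compressed attribute affects only the stored copy of the value, so searchability depends only on indexed:

    <field name="title" type="string" indexed="true"  stored="true" required="true"/>
    <!-- searchable: the indexed tokens are not compressed, only the stored value is -->
    <field name="body"  type="text"   indexed="true"  stored="true" compressed="true"/>
    <!-- stored-only: compression is unrelated to whether you ever search the field -->
    <field name="blurb" type="text"   indexed="false" stored="true" compressed="true"/>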
Re: Field Compression
On 3-Feb-08, at 1:34 PM, Stu Hood wrote:
> I just finished watching this talk about a column-store RDBMS, which has a
> long section on column compression. Specifically, it talks about the gains
> from compressing similar data together, and how lazily decompressing data
> only when it must be processed is great for memory/CPU cache usage.
>
> http://youtube.com/watch?v=yrLd-3lnZ58
>
> While interesting, it's not relevant to Lucene's stored field storage. On
> the other hand, it did get me thinking about stored field compression and
> lazy field loading.
>
> Can anyone give me some pointers about compressThreshold values that would
> be worth experimenting with? Our stored fields are often between 20 and
> 300 characters, and we're willing to spend more time indexing if it will
> make searching less IO bound.

Field compression can save you space and converts the field into a binary field, which is lazy-loaded more efficiently than a string field.

As for the threshold, I use 200 on a multi-kilobyte field, but that doesn't mean it isn't effective on smaller fields. Experimentation on small indices, followed by calculating the average stored bytes per document, is usually fruitful.

Of course, the best way to improve performance in this regard is to store the less-frequently-used fields in a parallel Solr index. This only works if the largest fields are the rarely-used ones, though (like retrieving the doc contents to create a summary).

-Mike
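As best I recall, the threshold itself was configured on the field type rather than on the field, so a setup like the one Mike describes would look roughly like this (a sketch; the type and field names are made up, and the threshold is in characters of the stored value):

    <!-- values shorter than compressThreshold characters are stored uncompressed -->
    <fieldType name="compressedText" class="solr.TextField" compressThreshold="200"/>

    <field name="contents" type="compressedText" indexed="true" stored="true" compressed="true"/>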