Compression did a lot of good, but there is one set of use cases where we see 
a significant speed loss when using the defaults. 

There is the lovely concept of doc values (DV), which is perfect for smallish 
single fields, and there are stored fields, which are perfect for clunky texts. 
But there is a gap for applications that need 10-20 smallish fields and 
retrieve many documents for post-processing (e.g. think clustering-like 
applications). 

Stored fields became slow for this case due to the compression penalty 
(default codec), and DVs require a seek per field. 

The user has a few options to tweak:
1. Write their own codec with smaller chunks to reduce the compression penalty 
(maybe Adrien comes up with more crazy speed-ups in compression, like the 
static dictionaries he mentioned) :)
2. Pack the fields into a structure, store it as a DV byte array, and be happy 
with parsing it back and forth
3. Use the old non-compressing codec

The second would probably work nicely (I do not know if DVs are intended for 
that?), but it requires the user to serialise many fields into a single 
byte[]. That kind of defeats the concept of Lucene fields, as the user has to 
pack the fields into the document themselves. It would be a kind of "stored 
fields for a collection of small-sized fields over DVs". 
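
To make the second option a bit more concrete, here is a rough sketch of the 
packing and parsing part only, using nothing but the JDK. The class name 
PackedFields and the length-prefixed layout are my own assumptions for 
illustration, not anything Lucene provides; the resulting byte[] would then be 
handed to a binary doc values field, one per document. 

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: packs several small string fields into one byte[]
// so that they can be read back with a single lookup per document.
final class PackedFields {
    private PackedFields() {}

    // Assumed layout: [field count][len1][bytes1][len2][bytes2]...
    static byte[] pack(String... values) {
        byte[][] encoded = new byte[values.length][];
        int size = Integer.BYTES; // room for the field count
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            size += Integer.BYTES + encoded[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(values.length);
        for (byte[] e : encoded) {
            buf.putInt(e.length);
            buf.put(e);
        }
        return buf.array();
    }

    // Reverses pack(): reads the count, then each length-prefixed value.
    static String[] unpack(byte[] packed) {
        ByteBuffer buf = ByteBuffer.wrap(packed);
        String[] values = new String[buf.getInt()];
        for (int i = 0; i < values.length; i++) {
            byte[] e = new byte[buf.getInt()];
            buf.get(e);
            values[i] = new String(e, StandardCharsets.UTF_8);
        }
        return values;
    }
}
```

The field order is implicit here, so the application has to remember which 
position holds which field. That is exactly the bookkeeping that plain Lucene 
fields would otherwise do for the user. 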

Thinking aloud, as I am not really happy to hear that Lucene cannot retrieve 
more than 20-ish documents in meaningful time. It did in the past and is more 
than able to do it today, maybe for the moment not in the most comfortable 
way :)


  


On Jul 7, 2013, at 6:53 PM, Chris Zhang <zhangjcm...@gmail.com> wrote:

> thanks Jack,
> yes, i should evaluate lucene by query performance.
> 
> 
> On Mon, Jul 8, 2013 at 12:45 AM, Jack Krupansky 
> <j...@basetechnology.com> wrote:
> 
>> To be clear, Lucene and Solr are "search" engines, NOT "storage" engines.
>> Has someone claimed otherwise to you?
>> 
>> What is your query performance in 4.x vs. 3.x? That's the true, proper
>> measure of Lucene and Solr performance.
>> 
>> -- Jack Krupansky
>> 
>> -----Original Message----- From: Chris Zhang
>> Sent: Sunday, July 07, 2013 12:26 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: Please Help solve problem of bad read performance in lucene
>> 4.2.1
>> 
>> 
>> Thanks Adrien,
>> In my project, almost all hit docs are supposed to be fetched for every
>> query, that's why I am upset by the poor reading performance. Maybe I
>> should keep the field values that need to be stored in a high-performance
>> storage engine instead.
>> In the above test case, reading all docs takes about 78 sec in Lucene 3.0,
>> a reading speed of approximately 10MB/s, but 700+ sec in Lucene 4.2.1,
>> which indicates a reading speed of less than 1MB/s. So I think the
>> committers of Lucene should pay attention to this.
>> 
>> 
>> On Sun, Jul 7, 2013 at 10:23 PM, Adrien Grand <jpou...@gmail.com> wrote:
>> 
>>> Indeed, Lucene 4.1+ may be a bit slower for indices that completely
>>> fit in your file-system cache. On the other hand, you should see
>>> better performance with indices which are larger than the amount of
>>> physical memory of your machine. Your reading benchmark only measures
>>> IndexReader.get(int) which should only be used to display summary
>>> results (that is, only called 10 or 20 times per displayed page). Most
>>> of time, the bottleneck is rather searching which can be made more
>>> efficient on small indices by switching to an in-memory postings
>>> format.
>>> 
>>> --
>>> Adrien
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>> 
>>> 
>>> 
>> 

