[ https://issues.apache.org/jira/browse/LUCENE-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830641#action_12830641 ]

John Wang commented on LUCENE-2252:
-----------------------------------

bq. I still think 4 bytes/doc is too much (its too much wasted ram for 
virtually no gain)

That depends on the application. On modern machines (at least on the machines 
we are using, e.g. a MacBook Pro) we can afford it :) I am not sure I agree 
with "virtually no gain" if you look at the numbers I posted. IMHO, the gain is 
significant.

I hate to get into a subjective argument on this though.

bq. I dont understand why you need something like a custom segment file to do 
this, why cant you just simply use Directory to load this particular file into 
memory for your use case?

Having a custom segment allows me to avoid getting into this subjective 
argument about what is too much memory or how big the gain is, since that just 
depends on my application, right?

Furthermore, for the question at hand, even the Directory implementation Uwe 
suggested is not optimal. For my use case, the cost of the seek/read for the 
count in the data file is very wasteful. Also, for getting the position, I can 
just do a random access into an array, compared to an in-memory 
seek/read/parse.
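To illustrate the difference, here is a minimal sketch (not Lucene code; the file layout and class name are hypothetical) of loading a fixed-width offset index fully into a long[] once, so that each per-document lookup becomes a plain array access instead of a seek/read/parse:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class OffsetCache {
    // One sequential pass at load time: read 8 bytes per doc into an array.
    // After this, looking up a doc's offset is offsets[docId] -- no I/O, no parsing.
    static long[] loadOffsets(DataInputStream in, int numDocs) throws IOException {
        long[] offsets = new long[numDocs];
        for (int i = 0; i < numDocs; i++) {
            offsets[i] = in.readLong();
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        // Simulate an index file for 4 docs with known offsets.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        long[] written = {0L, 120L, 340L, 900L};
        for (long o : written) {
            out.writeLong(o);
        }

        long[] offsets = loadOffsets(
            new DataInputStream(new ByteArrayInputStream(buf.toByteArray())),
            written.length);

        // Per-document lookup is now a single array access.
        System.out.println(offsets[2]);   // prints 340
    }
}
```

With 4-byte entries instead of 8, this is where the "4 bytes/doc" figure comes from: 5M docs would cost about 20MB of heap for the whole table.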

The very simple store mechanism we have written outside of Lucene has a gain of 
>85x (yes, 8500%) over Lucene stored fields. We would, however, like to take 
advantage of some of the good stuff already in Lucene, e.g. the merge 
mechanism (which is very nicely done), delete handling, etc.


> stored field retrieve slow
> --------------------------
>
>                 Key: LUCENE-2252
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2252
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 3.0
>            Reporter: John Wang
>
> IndexReader.document() on a stored field is rather slow. I did a simple 
> multi-threaded test and profiled it:
> 40+% of the time is spent getting the offset from the index file
> 30+% of the time is spent reading the count (e.g. the number of fields to load)
> Although I ran it on my laptop where the disk isn't that great, there still 
> seems to be much room for improvement, e.g. loading the field index file into 
> memory (for a 5M doc index, the extra memory footprint is 20MB, peanuts 
> compared to the other stuff being loaded)
> A related note, are there plans to have custom segments as part of flexible 
> indexing feature?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
