[jira] [Commented] (LUCENE-6322) IndexSearcher.doc(int docID, Set<String> fieldsToLoad) is slower in Lucene 4.9 when compared to Lucene 2.9

Adrien Grand (JIRA) Mon, 02 Mar 2015 08:24:04 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343325#comment-14343325
 ]


Adrien Grand commented on LUCENE-6322:
--------------------------------------

Agreed, it would be nice to skip over compressed blocks when they are not 
needed instead of decompressing them and then discarding the decompressed 
bytes. I was just looking at the implementation, and it seems that to make 
this work we would need to store the compressed length of each block and 
implement skipBytes on the anonymous DataInput created in 
CompressingStoredFieldsReader.document.
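
To make the idea concrete, here is a rough, self-contained sketch of skipping 
by stored compressed length. It does not use Lucene and is not the stored 
fields file format: the class BlockSkipSketch, its methods, and the 
[compressedLength][rawLength][bytes] layout are all made up for illustration, 
and java.util.zip stands in for the real compression codec.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical illustration only; this is not Lucene's stored fields format. */
public class BlockSkipSketch {

    /** Counts bytes actually inflated, to show how much work a read performed. */
    static long inflatedBytes = 0;

    /** Writes a block count, then each block as [int compressedLength][int rawLength][bytes]. */
    static byte[] writeBlocks(byte[][] blocks) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(blocks.length);
        for (byte[] raw : blocks) {
            Deflater deflater = new Deflater();
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                compressed.write(buf, 0, n);
            }
            deflater.end();
            out.writeInt(compressed.size()); // the length that makes skipping possible
            out.writeInt(raw.length);
            compressed.writeTo(out);
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Returns block number 'wanted', skipping the others without inflating them. */
    static byte[] readBlock(byte[] data, int wanted) throws IOException, DataFormatException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int count = in.readInt();
        if (wanted < 0 || wanted >= count) {
            throw new IllegalArgumentException("no such block: " + wanted);
        }
        for (int i = 0; i < count; i++) {
            int compressedLength = in.readInt();
            int rawLength = in.readInt();
            if (i != wanted) {
                // Unneeded block: advance past the compressed bytes, no decompression.
                int remaining = compressedLength;
                while (remaining > 0) {
                    int skipped = in.skipBytes(remaining);
                    if (skipped <= 0) throw new EOFException("truncated block " + i);
                    remaining -= skipped;
                }
                continue;
            }
            byte[] compressed = new byte[compressedLength];
            in.readFully(compressed);
            Inflater inflater = new Inflater();
            inflater.setInput(compressed);
            byte[] raw = new byte[rawLength];
            int off = 0;
            while (off < rawLength) {
                int n = inflater.inflate(raw, off, rawLength - off);
                if (n == 0) break; // no progress: input exhausted or corrupt
                off += n;
            }
            inflater.end();
            if (off != rawLength) throw new IOException("corrupt block " + i);
            inflatedBytes += off;
            return raw;
        }
        throw new IllegalStateException("unreachable");
    }

    public static void main(String[] args) throws Exception {
        // One large block we do not want, followed by a small one we do.
        byte[][] blocks = { new byte[1 << 20], "title".getBytes(StandardCharsets.UTF_8) };
        byte[] data = writeBlocks(blocks);
        byte[] got = readBlock(data, 1);
        System.out.println(new String(got, StandardCharsets.UTF_8)
            + " (inflated bytes: " + inflatedBytes + ")");
    }
}
```

In this toy layout the 1MB block is passed over with skipBytes and only the 
small block is inflated. Whether the same approach fits the real reader 
depends on how blocks are laid out on disk, which this sketch does not model.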

> IndexSearcher.doc(int docID, Set<String> fieldsToLoad) is slower in Lucene 
> 4.9 when compared to Lucene 2.9
> --------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-6322
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6322
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/codecs
>    Affects Versions: 4.9
>         Environment: Windows, JDK 7/8
>            Reporter: Sekhar
>             Fix For: 4.10.x
>
>
> We use the IndexSearcher.doc(int docID, Set<String> fieldsToLoad) method to 
> load a document with only selected stored fields. When the fields we leave 
> out include a few that hold more than 500KB of data, this call is slower in 
> Lucene 4.9 than in Lucene 2.9.
> I debugged this method in Lucene 4.9 and found that 
> CompressingStoredFieldsReader#visitDocument(int docID, StoredFieldVisitor 
> visitor) spends extra time loading file content and decompressing it in 
> 16KB chunks, even for fields that are being skipped. The degradation is 
> noticeable when a document's field size exceeds 1MB and the method is 
> called in a loop over more than 1000 such documents.
> Lucene 2.9 had no compression, so skipping a field was just a file seek 
> that moved the pointer to the next stored field. For example, see how 
> Lucene3xStoredFieldsReader#skipField() skips a field in the Lucene 2.9 
> format; it is much faster than the Lucene 4.9 code path.
> CompressingStoredFieldsReader should be able to find a field's compressed 
> length in the file and seek to the next pointer, instead of loading the 
> content from the file and decompressing it in 16KB chunks just to skip the 
> field.


