[jira] Commented: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]

Yonik Seeley (JIRA) Sat, 19 Jun 2010 06:52:51 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880483#action_12880483
 ]


Yonik Seeley commented on LUCENE-2380:
--------------------------------------

It was really tricky performance testing this.

If I started solr and tested one type of faceting exclusively, the performance 
impact of going through the new FieldCache interfaces (PackedInts for ord 
lookup) was relatively minimal.

However, I had a simple script that tested the different variants (the 4 in the 
table above)... and using that resulted in the bigger slowdowns.

The script would do the following:
{code}
1) test 100 iterations of facet.method=fc on the 100,000 term field
2) test 10 iterations of facet.method=fcs on the 100,000 term field
3) test 100 iterations of facet.method=fc on the 100 term field
4) test 10 iterations of facet.method=fcs on the 100 term field
{code}

I would run the script a few times, making sure the numbers stabilized and were 
repeatable.

Testing #1 alone resulted in trunk slowing down ~ 4%
Testing #1 along with any single other test: same small slowdown of ~4%
Running the complete script: slowdown of 33-38% for #1 (as well as others)
When running the complete script, the first run of Test #1 was always the 
best... as if the JVM correctly specialized it, but then discarded it later, 
never to return.

So: you can't always depend on the JVM being able to inline stuff for you, and 
it seems very hard to determine when it can.
This obviously has implications for the lucene benchmarker too.


> Add FieldCache.getTermBytes, to load term data as byte[]
> --------------------------------------------------------
>
>                 Key: LUCENE-2380
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2380
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-2380.patch, LUCENE-2380.patch, LUCENE-2380.patch, 
> LUCENE-2380.patch, LUCENE-2380_direct_arr_access.patch, 
> LUCENE-2380_enum.patch, LUCENE-2380_enum.patch
>
>
> With flex, a term is now an opaque byte[] (typically, utf8 encoded unicode 
> string, but not necessarily), so we need to push this up the search stack.
> FieldCache now has getStrings and getStringIndex; we need corresponding 
> methods to load terms as native byte[], since in general they may not be 
> representable as String.  This should be quite a bit more RAM efficient too, 
> for US ascii content since each character would then use 1 byte not 2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]

Reply via email to