[jira] Commented: (LUCENE-2554) preflex codec doesn't order terms correctly

2010-07-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891364#action_12891364
 ] 

Robert Muir commented on LUCENE-2554:
-

The perf issues here really come from our contrived tests. It's good to use 
_TestUtil.randomUnicodeString, but it gives you the impression that there is 
something wrong with this dance, and there really isn't.

I added _TestUtil.randomRealisticUnicodeString in r966878; you can swap it 
into some of these slow tests and see that the contrived input is definitely 
the problem.
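
To illustrate the contrast between contrived and realistic input, here is a 
minimal sketch. It is not the _TestUtil implementation: the class and method 
names are hypothetical, and it assumes "realistic" simply means that all 
characters of a string come from one script range (Cyrillic here), while the 
contrived string draws uniformly from every code point.

```java
import java.util.Random;

public class RandomStringSketch {
    /** Hypothetical helper: draws code points uniformly from [start, end], skipping surrogates. */
    static String randomInRange(Random r, int length, int start, int end) {
        StringBuilder sb = new StringBuilder();
        int added = 0;
        while (added < length) {
            int cp = start + r.nextInt(end - start + 1);
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                continue; // a lone surrogate is not a valid character
            }
            sb.appendCodePoint(cp);
            added++;
        }
        return sb.toString();
    }

    /** Counts code points above U+FFFF, i.e. those stored as surrogate pairs in UTF-16. */
    static long countSupplementary(String s) {
        return s.codePoints().filter(Character::isSupplementaryCodePoint).count();
    }

    public static void main(String[] args) {
        Random r = new Random(42);
        // Assumption: "contrived" means any code point at all.
        String contrived = randomInRange(r, 1000, 0, Character.MAX_CODE_POINT);
        // Assumption: "realistic" means one script range (Cyrillic, U+0400..U+04FF).
        String realistic = randomInRange(r, 1000, 0x0400, 0x04FF);
        System.out.println("supplementary in contrived: " + countSupplementary(contrived));
        System.out.println("supplementary in realistic: " + countSupplementary(realistic));
    }
}
```

The Cyrillic string never contains a supplementary character, so a reader 
that remaps surrogates has nothing to do for it. The uniform string consists 
mostly of supplementary characters, because most code points lie above U+FFFF.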


> preflex codec doesn't order terms correctly
> ---
>
> Key: LUCENE-2554
> URL: https://issues.apache.org/jira/browse/LUCENE-2554
> Project: Lucene - Java
>  Issue Type: Test
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2554.patch
>
>
> The surrogate dance in the preflex codec (which must dynamically remap terms 
> from UTF16 order to unicode code point order) is buggy.
> To better test it, I want to add a test-only codec, preflexrw, that is able 
> to write indices in the pre-flex format.  Then we should also fix tests to 
> randomly pick codecs (including preflexrw) so we better test all of our 
> codecs.
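
The ordering mismatch that the quoted description refers to can be shown 
with plain Java strings. This is only a sketch of the two sort orders, not 
the preflex codec's remapping code; the class name and both comparators are 
made up for illustration.

```java
import java.util.Comparator;

public class TermOrderSketch {
    /** UTF-16 order: compares 16-bit code units, which is what String.compareTo does. */
    static final Comparator<String> UTF16_ORDER = String::compareTo;

    /** Unicode code point order: compares whole code points, decoding surrogate pairs. */
    static final Comparator<String> CODE_POINT_ORDER = (a, b) -> {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other; the shorter one sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    };

    public static void main(String[] args) {
        String bmp = "\uFB01";                 // U+FB01, a BMP character above the surrogate range
        String supplementary = "\uD835\uDD38"; // U+1D538, stored as a surrogate pair

        // The lead surrogate 0xD835 is less than 0xFB01, so UTF-16 order puts U+1D538 first.
        System.out.println(UTF16_ORDER.compare(supplementary, bmp) < 0);      // prints true
        // 0x1D538 is greater than 0xFB01, so code point order puts U+1D538 last.
        System.out.println(CODE_POINT_ORDER.compare(supplementary, bmp) > 0); // prints true
    }
}
```

The two orders disagree only when a supplementary character is compared with 
a BMP character in the range U+E000..U+FFFF, which is why terms must be 
remapped around the surrogate range when moving from one order to the other.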

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2554) preflex codec doesn't order terms correctly

2010-07-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891585#action_12891585
 ] 

Robert Muir commented on LUCENE-2554:
-

bq. We still need a clean way to randomly swap in the preflexrw codec

I don't think we should do this in RandomIndexWriter as we do now; instead we 
should pull this logic out of there and move it to ant/LuceneTestCase.

I would prefer that we supply a variable to ant (e.g. -Dtest.codec=) and have 
LuceneTestCase[J4] set the codec to that value.
We could also allow a value of "random" here to do what RIW does today.
I think this would make it easier to run the entire test suite with different 
codecs.

Also, some test cases might not be suitable for all codecs, so we need to add 
annotations or some other way to special-case these tests.
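
A rough sketch of how the property-driven selection could look follows. 
Everything except the test.codec property name and the "random" value is an 
assumption: the class, the method, the "standard" codec name, and the choice 
to fall back to "random" when the property is unset are all hypothetical, 
and this is not existing LuceneTestCase code.

```java
import java.util.List;
import java.util.Random;

public class TestCodecChooser {
    /**
     * Hypothetical helper: returns the codec name a test run should use.
     * "random" (or an unset value) picks one of the available codecs at random.
     */
    static String chooseCodec(String requested, List<String> available, Random random) {
        if (requested == null || requested.isEmpty() || requested.equals("random")) {
            return available.get(random.nextInt(available.size()));
        }
        if (!available.contains(requested)) {
            throw new IllegalArgumentException("unknown codec: " + requested);
        }
        return requested;
    }

    public static void main(String[] args) {
        // Assumed codec names; only "preflexrw" is mentioned in this issue.
        List<String> codecs = List.of("standard", "preflexrw");
        // Reads -Dtest.codec=...; defaulting to "random" is an assumption.
        String requested = System.getProperty("test.codec", "random");
        System.out.println("using codec: " + chooseCodec(requested, codecs, new Random()));
    }
}
```

Run with -Dtest.codec=preflexrw, this would select that codec for every 
test. The build would also have to pass the property on to the JVM that runs 
the tests.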


> preflex codec doesn't order terms correctly
> ---
>
> Key: LUCENE-2554
> URL: https://issues.apache.org/jira/browse/LUCENE-2554
> Project: Lucene - Java
>  Issue Type: Test
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2554.patch, LUCENE-2554.patch, LUCENE-2554.patch
>
>
> The surrogate dance in the preflex codec (which must dynamically remap terms 
> from UTF16 order to unicode code point order) is buggy.
> To better test it, I want to add a test-only codec, preflexrw, that is able 
> to write indices in the pre-flex format.  Then we should also fix tests to 
> randomly pick codecs (including preflexrw) so we better test all of our 
> codecs.
