[jira] [Commented] (LUCENE-3264) crank up faceting module tests

2011-06-30 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057668#comment-13057668
 ] 

Shai Erera commented on LUCENE-3264:


Patch looks very good. All tests pass for me (I've applied on trunk only).

Few things I've noticed:
* Previously the tests took 1m20s to run, now they take 2m55s. I guess it's 
because previously we only created RAMDirs, while now newDirectory picks FSDir 
from time to time (10%?).

* FacetTestUtils.close*() can be removed and calls replaced by 
IOUtils.closeSafely. This is not critical, just remove redundant code.

* You added a TODO to CategoryListIteratorTest about the test failing if 
TieredMP is used. In general TieredMP is not good for the taxonomy index, which 
relies on Lucene doc IDs, and therefore segments must be merged in-order. LTW 
uses LMP specifically because of that. I will look into the test to understand 
why would it care about doc IDs, since it doesn't using the taxonomy index at 
all.

* There are few places with code like: assertTrue(Would like to test this with 
deletions!,indexReader.hasDeletions()), and assertTrue(Would like to test 
this with deletions!,indexReader.numDeletedDocs()  0) which you removed. Any 
reason?

* You added a TODO to TestScoredDocIDsUtils (about reader is read-only) -- 
you're right, the comment can be deleted.

While I reviewed, I was thinking that RandomIndexWriter is used to replace the 
IndexWriter for content indexing. While this is good, this does not cover the 
'taxonomy' indexing. So I wonder if we should have under facet/test/o.a.l.utils 
a RandomTaxonomyWriter which opens RIW internally?

This is very impressive progress Robert, thanks for doing it !

I am +1 to commit, after we resolve the tiny issues I raised above. We can add 
RandomTaxonomyWriter as a follow-on commit.

 crank up faceting module tests
 --

 Key: LUCENE-3264
 URL: https://issues.apache.org/jira/browse/LUCENE-3264
 Project: Lucene - Java
  Issue Type: Test
  Components: modules/facet
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3264.patch


 The faceting module has a large set of good tests.
 lets switch them over to use all of our test infra (randomindexwriter, random 
 iwconfig, mockanalyzer, newDirectory, ...)
 I don't want to address multipliers and atLeast() etc on this issue, I think 
 we should follow up with that on a separate issue, that also looks at speed 
 and making sure the nightly build is exhaustive.
 for now, lets just get the coverage in, it will be good to do before any 
 refactoring.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3264) crank up faceting module tests

2011-06-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057792#comment-13057792
 ] 

Robert Muir commented on LUCENE-3264:
-

{quote}
Previously the tests took 1m20s to run, now they take 2m55s. I guess it's 
because previously we only created RAMDirs, while now newDirectory picks FSDir 
from time to time (10%?).
{quote}

I don't think its from FSDir, this is now very very rarely picked. Anyway, as 
said in the issue summary, for a number of reasons, I don't want to address 
this on this issue, I want to address the coverage first.

{quote}
FacetTestUtils.close*() can be removed and calls replaced by 
IOUtils.closeSafely. This is not critical, just remove redundant code.
{quote}

ah, you are right. let's change this.

{quote}
You added a TODO to CategoryListIteratorTest about the test failing if TieredMP 
is used. In general TieredMP is not good for the taxonomy index, which relies 
on Lucene doc IDs, and therefore segments must be merged in-order. LTW uses LMP 
specifically because of that. I will look into the test to understand why would 
it care about doc IDs, since it doesn't using the taxonomy index at all.
{quote}

Right, as you said this is for the main index, not the taxonomy index. So I 
think the test just relies upon lucene doc ids, but I didnt want to just change 
the test without saying why.

{quote}
There are few places with code like: assertTrue(Would like to test this with 
deletions!,indexReader.hasDeletions()), and assertTrue(Would like to test 
this with deletions!,indexReader.numDeletedDocs()  0) which you removed. Any 
reason?
{quote}

Mostly to prevent the tests from failing. RandomIndexWriter randomly optimizes 
some times, so occasionally there are no deletions. I think this is fine 
(actually better) as far as coverage... then the deleted docs is occasionally 
null, etc.

{quote}
You added a TODO to TestScoredDocIDsUtils (about reader is read-only) – you're 
right, the comment can be deleted.
{quote}

OK, I'll nuke this.

{quote}
We can add RandomTaxonomyWriter as a follow-on commit.
{quote}

Yes, lets do this separate.


 crank up faceting module tests
 --

 Key: LUCENE-3264
 URL: https://issues.apache.org/jira/browse/LUCENE-3264
 Project: Lucene - Java
  Issue Type: Test
  Components: modules/facet
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3264.patch


 The faceting module has a large set of good tests.
 lets switch them over to use all of our test infra (randomindexwriter, random 
 iwconfig, mockanalyzer, newDirectory, ...)
 I don't want to address multipliers and atLeast() etc on this issue, I think 
 we should follow up with that on a separate issue, that also looks at speed 
 and making sure the nightly build is exhaustive.
 for now, lets just get the coverage in, it will be good to do before any 
 refactoring.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3264) crank up faceting module tests

2011-06-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057901#comment-13057901
 ] 

Robert Muir commented on LUCENE-3264:
-

{quote}
I don't understand. I thought that you said so regarding introducing atLeast 
and iterations, and I'm ok with that. I was just asking, since all you've done 
is move to use newDir, newIWC and RandomIW, how come the tests running time got 
that much longer? If it's not FSDir, do you have any idea what can cause that? 
Will RandomIW stall indexing randomly, or maybe it's newIWC which chooses to 
flush more often?
{quote}

I think the slowdown is basically linear (the tests run 2x or 3x as slow). Let 
me explain some of the reasons why you have this slowdown over just normal 
indexing without using randomiw/mockdirectorywrapper/etc:
# we call checkIndex on every directory we create after its closed. I think 
this is the right thing to do always... it does slow down the tests a bit.
# we do sometimes get crappy indexing params, crazy merge params, ridiculous 
IndexReader/Writer params (e.g. termIndexInterval=1). I think sometimes these 
non-optimal params slow things down.
# occasionally we do things like randomly fully or partially optimize, yield(), 
etc.

So while Lucene's defaults are pretty good, we are testing a bunch of 
non-default parameters and doing a bunch of other crazy things... so these slow 
down the tests!

That being said, I'm working on the speed issue at least a little here, because 
I really want to get this test improvements in,  although I really didn't want 
to work on this here (I think 1 minute extra *temporarily* to the build is no 
big deal for the additional coverage).


 crank up faceting module tests
 --

 Key: LUCENE-3264
 URL: https://issues.apache.org/jira/browse/LUCENE-3264
 Project: Lucene - Java
  Issue Type: Test
  Components: modules/facet
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3264.patch


 The faceting module has a large set of good tests.
 lets switch them over to use all of our test infra (randomindexwriter, random 
 iwconfig, mockanalyzer, newDirectory, ...)
 I don't want to address multipliers and atLeast() etc on this issue, I think 
 we should follow up with that on a separate issue, that also looks at speed 
 and making sure the nightly build is exhaustive.
 for now, lets just get the coverage in, it will be good to do before any 
 refactoring.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3264) crank up faceting module tests

2011-06-30 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057906#comment-13057906
 ] 

Shai Erera commented on LUCENE-3264:


Thanks Robert. This makes sense to me.

bq. although I really didn't want to work on this here (I think 1 minute extra 
temporarily to the build is no big deal for the additional coverage)

I apologize if that caused you to do that work here. I really only wanted to 
understand. By all means, commit the changes. The explanation makes sense and 
I'm ok with it. We can speed up things later.

 crank up faceting module tests
 --

 Key: LUCENE-3264
 URL: https://issues.apache.org/jira/browse/LUCENE-3264
 Project: Lucene - Java
  Issue Type: Test
  Components: modules/facet
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3264.patch, LUCENE-3264.patch


 The faceting module has a large set of good tests.
 lets switch them over to use all of our test infra (randomindexwriter, random 
 iwconfig, mockanalyzer, newDirectory, ...)
 I don't want to address multipliers and atLeast() etc on this issue, I think 
 we should follow up with that on a separate issue, that also looks at speed 
 and making sure the nightly build is exhaustive.
 for now, lets just get the coverage in, it will be good to do before any 
 refactoring.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org