[jira] [Commented] (LUCENE-3262) Facet benchmarking

Doron Cohen (Commented) (JIRA) Thu, 06 Oct 2011 04:12:55 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121844#comment-13121844
 ]


Doron Cohen commented on LUCENE-3262:
-------------------------------------

Thanks for reviewing Shai!

* facets.alg: I agree, and added that specific facet source setting. With also 
modified the algorithm to have several rounds, some with facets, some without, 
so the effect of adding facets on performance is measured.

* PFD.readers.incRef() - once the task (e.g. OpenReaderTask) opened the reader 
that task is gone and the reader really is maintained by the PFD, therefore IMO 
it is important that the PFD maintains its ref count as it does today. I agree 
that more documentation for this can help and added something in this spirit to 
both setReader()'s:

{code}
  /**
   * Set the taxonomy reader. Takes ownership of that taxonomy reader, that is,
   * internally performs taxoReader.incRef(). 
   * @param indexReader The indexReader to set.
   */
  public synchronized void setTaxonomyReader(TaxonomyReader taxoReader) throws 
IOException {

  }

  /**
   * Set the index reader. Takes ownership of that index reader, that is,
   * internally performs indexReader.incRef(). 
   * @param indexReader The indexReader to set.
   */
  public synchronized void setIndexReader(IndexReader indexReader) throws 
IOException {
  }
{code}

* ItemSource and config props "source.xyz" - I am afraid there might be 
algorithms out there that will silently stop working, as the props names are 
not safe types or something like that. So I would rather to not alter these 
property names. Instead, following your comment, I renamed *ItemSource* to 
*ContentItemsSource* which is still extended by both ContentSource and 
FacetSource. So it is not that bad now that the property names start with 
"content." (these properties apply to all content-item-sources - that is both 
doc-content and facets). 

* ItemSource.resetInputs @SuppressWarnings("unused") - this is for the 
IOException which is not really thrown by that code (though it might be thrown 
by subclasses). Perhaps just a misconfiguration of my IDE? (I assume you have 
that warning disabled or stg?)

* PerfRunData Class.forName on String rather than class.getName() - this is in 
fact the code pattern for all reflection actions in PerfRunData, so I just 
repeated for the new code whatever was there for the existing code. However, in 
fact, I thought about the same thing when adding this code. I think the 
difference is that if config contains a specification of another class, the 
default class would not even be loaded when using strings, but it will be 
loaded (though not used) when using Class.getName(). It is not a big 
performance issue, but it does seem "cleaner" to not load classes that are not 
going to be used. In any case, if we would like to change that, should revisit 
all the .forName(String) of default impls in the benchmark (and I think there 
are a few) and it seems to not belong to this issue.

Thanks for the review, working on a new patch - there are several copy/paste 
errors in the code where a CloseTaxonomyReader by mistake sets the PFD 
IndexReader to null...
                
> Facet benchmarking
> ------------------
>
>                 Key: LUCENE-3262
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3262
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/benchmark, modules/facet
>            Reporter: Shai Erera
>            Assignee: Doron Cohen
>         Attachments: CorpusGenerator.java, LUCENE-3262.patch, 
> TestPerformanceHack.java
>
>
> A spin off from LUCENE-3079. We should define few benchmarks for faceting 
> scenarios, so we can evaluate the new faceting module as well as any 
> improvement we'd like to consider in the future (such as cutting over to 
> docvalues, implement FST-based caches etc.).
> Toke attached a preliminary test case to LUCENE-3079, so I'll attach it here 
> as a starting point.
> We've also done some preliminary job for extending Benchmark for faceting, so 
> I'll attach it here as well.
> We should perhaps create a Wiki page where we clearly describe the benchmark 
> scenarios, then include results of 'default settings' and 'optimized 
> settings', or something like that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-3262) Facet benchmarking

Reply via email to