[ 
https://issues.apache.org/jira/browse/LUCENE-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5735:
---------------------------------
    Attachment: LUCENE-5735.patch

Here's the patch; the results are verified with randomized testing.  The 
faceting code makes no date-specific assumptions, so it should work once a 
non-date NumberRangePrefixTree subclass comes into existence.

I added a method to NumberRangePrefixTreeStrategy with this signature:
{code:java}
public Facets calcFacets(IndexReaderContext context, final Bits acceptDocs,
    Shape facetRange, int level)
{code}

acceptDocs is a filter of documents to count (null to count all docs).  
facetRange limits the values counted to the provided range (e.g. a span from 
a start date to an end date); you can get one easily via the 
toRangeShape(start,end) method.  'level' is the bottom-most prefix tree level 
to be counted, for example the level corresponding to a 'day'.  There are 
multiple ways of determining the level, namely by lookup given a Calendar 
field or by taking it from an existing Calendar instance.
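
As a rough illustration of these parameter semantics (not code from the 
patch), here is a brute-force counter in plain Java.  Everything in it is a 
hypothetical stand-in: BruteForceDayFacets and countByDay are made-up names, a 
java.util.BitSet plays the role of acceptDocs, a start/end pair plays the role 
of facetRange, and the level is fixed at 'day'.  It only handles documents 
with a single date, not indexed ranges.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.BitSet;

/** Hypothetical brute-force counter; it only mirrors the parameter semantics, not the patch. */
final class BruteForceDayFacets {

  /**
   * Counts documents per day within [start, end], both inclusive (assumes start <= end).
   * docDates[i] is the single date of document i.
   * acceptDocs == null means every document is counted.
   */
  static int[] countByDay(LocalDate[] docDates, BitSet acceptDocs,
                          LocalDate start, LocalDate end) {
    int days = (int) ChronoUnit.DAYS.between(start, end) + 1;
    int[] counts = new int[days];
    for (int doc = 0; doc < docDates.length; doc++) {
      if (acceptDocs != null && !acceptDocs.get(doc)) {
        continue; // filtered out by acceptDocs
      }
      LocalDate d = docDates[doc];
      if (d.isBefore(start) || d.isAfter(end)) {
        continue; // outside the facet range
      }
      counts[(int) ChronoUnit.DAYS.between(start, d)]++;
    }
    return counts;
  }
}
```

A check of this kind is also the sort of thing a randomized test could compare 
faceting results against, though the test in the patch may be structured 
differently.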

The response structure, 'Facets', looks like this:
{code:java}
public static class Facets {
    //TODO consider a variable-level structure -- more general purpose.

    public Facets(int detailLevel) {
      this.detailLevel = detailLevel;
    }

    /** The bottom-most detail-level counted, as requested. */
    public final int detailLevel;

    /**
     * The count of documents with ranges that completely spanned the parents of the detail level. In more technical
     * terms, this is the count of leaf cells 2 up and higher from the bottom. Usually you only care about counts at
     * detailLevel, and so you will add this number to all other counts below, including to omitted/implied children
     * counts of 0. If there are no indexed ranges (just instances, i.e. fully specified dates) then this value will
     * always be 0.
     */
    public int topLeaves;

    /** Holds all the {@link FacetParentVal} instances in order of the key. This is sparse; there won't be an
     * instance if its count and children are all 0. The keys are {@link LevelledValue} shapes, which can be
     * converted back to the original Object (e.g. a Calendar) via {@link #toObject(LevelledValue)}. */
    public final SortedMap<LevelledValue,FacetParentVal> parents = new TreeMap<>();

    /** Holds a block of detailLevel counts aggregated to their parent level. */
    public static class FacetParentVal {

      /** The count of ranges that span all of the childCounts.  In more technical terms, this is the number of leaf
       * cells found at this parent.  Treat this like {@link Facets#topLeaves}. */
      public int parentLeaves;// (parent leaf count) to be added to all descendants (children)

      /** The length of {@link #childCounts}. If childCounts is not null then this is childCounts.length, otherwise it
       * says how long it would have been if it weren't null. */
      public int childCountsLen;

      /** The detail level counts. It will be null if there are none, and thus they are assumed 0. Most apps, when
       * presenting the information, will add {@link Facets#topLeaves} and {@link #parentLeaves} to each count. */
      public int[] childCounts;//null if there are none.
      //assert childCountsLen == childCounts.length
    }
  }
{code}
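
To make the "add topLeaves and parentLeaves to each count" advice concrete, 
here is a minimal sketch of how an app might compute the numbers it displays.  
It uses hypothetical stand-ins rather than the patch's classes: 
FacetCountsSketch and ParentVal are made-up names that only mirror the fields 
above, String keys replace LevelledValue, and each parent has just 3 children 
for brevity.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical stand-in mirroring the fields of Facets; not the patch's class. */
final class FacetCountsSketch {

  /** Stand-in for Facets.FacetParentVal. */
  static final class ParentVal {
    final int parentLeaves;
    final int childCountsLen;
    final int[] childCounts; // null means every child count is 0

    ParentVal(int parentLeaves, int childCountsLen, int[] childCounts) {
      this.parentLeaves = parentLeaves;
      this.childCountsLen = childCountsLen;
      this.childCounts = childCounts;
    }
  }

  /** Returns, for one parent, each child's own count plus parentLeaves plus topLeaves. */
  static int[] effectiveCounts(int topLeaves, ParentVal parent) {
    int[] result = new int[parent.childCountsLen];
    for (int i = 0; i < result.length; i++) {
      int own = parent.childCounts == null ? 0 : parent.childCounts[i];
      result[i] = own + parent.parentLeaves + topLeaves;
    }
    return result;
  }

  public static void main(String[] args) {
    int topLeaves = 1;
    SortedMap<String, ParentVal> parents = new TreeMap<>();
    parents.put("2014-05", new ParentVal(2, 3, new int[] {0, 4, 1}));
    parents.put("2014-06", new ParentVal(1, 3, null)); // no detail-level counts
    for (Map.Entry<String, ParentVal> e : parents.entrySet()) {
      System.out.println(e.getKey() + " -> "
          + Arrays.toString(effectiveCounts(topLeaves, e.getValue())));
    }
  }
}
```

Parents that are absent from the sparse map would, by the same reasoning, 
show topLeaves for every child; the sketch does not cover that case.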

I've got a toString() on it with concise output that I found nice to look at 
during debugging.

The patch also makes some small changes to related classes; these are mostly 
little refactorings and/or making previously private things visible to the 
facets code.  I'd like to do more refactoring/renaming around there to make 
this number/date spatial API a little more friendly. As it stands, I'm sure 
it's a bit awkward for newcomers. 

> Faceting for DateRangePrefixTree
> --------------------------------
>
>                 Key: LUCENE-5735
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5735
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/spatial
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: LUCENE-5735.patch
>
>
> The newly added DateRangePrefixTree (DRPT) encodes terms in a fashion 
> amenable to faceting by meaningful time buckets. The motivation for this 
> feature is to efficiently populate a calendar bar chart or 
> [heat-map|http://bl.ocks.org/mbostock/4063318]. It's not hard if you have 
> date instances, as many do, but it's challenging for date ranges.
> Internally this is going to iterate over the terms using seek/next with 
> TermsEnum as appropriate.  It should be quite efficient; it won't need any 
> special caches. I should be able to re-use SPT traversal code in 
> AbstractVisitingPrefixTreeFilter.  If this goes especially well, the 
> underlying implementation will be re-usable for geospatial heat-map faceting.


