[
https://issues.apache.org/jira/browse/LUCENE-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Smiley updated LUCENE-5735:
---------------------------------
Attachment: LUCENE-5735.patch
Here's the patch, with randomized testing verification of results. The
faceting code doesn't make any date assumptions, and so it should work once a
non-date NumberRangePrefixTree subclass comes into existence.
I added a method to NumberRangePrefixTreeStrategy with this signature:
{code:java}
public Facets calcFacets(IndexReaderContext context, final Bits acceptDocs,
Shape facetRange, int level)
{code}
acceptDocs is a filter of documents to count (null to count all docs).
facetRange limits the values counted to the provided range (e.g. a start date
and end date span); get one easily via the toRangeShape(start,end) method.
'level' is the bottom-most prefix tree level to be counted, for example the
level corresponding to a 'day'. There are multiple ways of determining the
level, namely by lookup given a Calendar field or by taking it from an
existing Calendar instance.
The response structure, 'Facets', looks like this:
{code:java}
public static class Facets {
  //TODO consider a variable-level structure -- more general purpose.

  public Facets(int detailLevel) {
    this.detailLevel = detailLevel;
  }

  /** The bottom-most detail-level counted, as requested. */
  public final int detailLevel;

  /**
   * The count of documents with ranges that completely spanned the parents
   * of the detail level. In more technical terms, this is the count of leaf
   * cells 2 up and higher from the bottom. Usually you only care about
   * counts at detailLevel, and so you will add this number to all other
   * counts below, including to omitted/implied children counts of 0. If
   * there are no indexed ranges (just instances, i.e. fully specified dates)
   * then this value will always be 0.
   */
  public int topLeaves;

  /**
   * Holds all the {@link FacetParentVal} instances in order of the key.
   * This is sparse; there won't be an instance if its count and children
   * are all 0. The keys are {@link LevelledValue} shapes, which can be
   * converted back to the original Object (i.e. a Calendar) via
   * {@link #toObject(LevelledValue)}.
   */
  public final SortedMap<LevelledValue,FacetParentVal> parents =
      new TreeMap<>();

  /** Holds a block of detailLevel counts aggregated to their parent level. */
  public static class FacetParentVal {

    /**
     * The count of ranges that span all of the childCounts. In more
     * technical terms, this is the number of leaf cells found at this
     * parent. Treat this like {@link Facets#topLeaves}.
     */
    // (parent leaf count) to be added to all descendants (children)
    public int parentLeaves;

    /**
     * The length of {@link #childCounts}. If childCounts is not null then
     * this is childCounts.length, otherwise it says how long it would have
     * been if it weren't null.
     */
    public int childCountsLen;

    /**
     * The detail level counts. It will be null if there are none, and thus
     * they are assumed 0. Most apps, when presenting the information, will
     * add {@link Facets#topLeaves} and {@link #parentLeaves} to each count.
     */
    public int[] childCounts;//null if there are none.
    //assert childCountsLen == childCounts.length
  }
}
{code}
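To illustrate how a consumer might combine these counts for display, here is a minimal, self-contained sketch. It is not code from the patch: FacetTotalsSketch, ParentVal, totalsFor and the String keys (standing in for LevelledValue) are hypothetical names used only for illustration, and the handling of a parent that is absent from the sparse map is an assumption based on the javadoc above.

```java
import java.util.Arrays;
import java.util.SortedMap;
import java.util.TreeMap;

/** Simplified stand-in for the Facets structure; String keys replace LevelledValue. */
final class FacetTotalsSketch {

  /** Mirrors the field names of FacetParentVal (hypothetical simplified copy). */
  static final class ParentVal {
    int parentLeaves;    // ranges that span every child of this parent
    int childCountsLen;  // number of children, even when childCounts is null
    int[] childCounts;   // null means every child count is 0
  }

  /** Mirrors Facets: a top-level leaf count plus a sparse, sorted map of parents. */
  static final class Facets {
    int topLeaves;
    final SortedMap<String, ParentVal> parents = new TreeMap<>();
  }

  /**
   * Returns the displayable count for each child of the given parent:
   * topLeaves + parentLeaves + the child's own count (0 if childCounts is null).
   * If the parent is absent from the sparse map, every child gets topLeaves only;
   * the caller must then say how many children there are (assumption).
   */
  static int[] totalsFor(Facets facets, String parentKey, int childCountIfAbsent) {
    ParentVal parent = facets.parents.get(parentKey);
    if (parent == null) {
      int[] totals = new int[childCountIfAbsent];
      Arrays.fill(totals, facets.topLeaves);
      return totals;
    }
    int[] totals = new int[parent.childCountsLen];
    for (int i = 0; i < totals.length; i++) {
      int own = (parent.childCounts == null) ? 0 : parent.childCounts[i];
      totals[i] = facets.topLeaves + parent.parentLeaves + own;
    }
    return totals;
  }

  public static void main(String[] args) {
    Facets facets = new Facets();
    facets.topLeaves = 1;                  // e.g. one range covering the whole year

    ParentVal may = new ParentVal();
    may.parentLeaves = 2;                  // two ranges covering all of May
    may.childCountsLen = 3;                // shortened to 3 "days" for readability
    may.childCounts = new int[] {0, 4, 1};
    facets.parents.put("2014-05", may);

    System.out.println(Arrays.toString(totalsFor(facets, "2014-05", 3))); // [3, 7, 4]
    System.out.println(Arrays.toString(totalsFor(facets, "2014-06", 3))); // [1, 1, 1]
  }
}
```

The point of the sketch is only the arithmetic: counts recorded at higher levels apply to every bucket beneath them, so a sparse or null entry still yields a non-zero bar when a range spans the whole parent.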
I've got a toString() on it with concise output that I found nice to look at
during debugging.
The patch also has some small changes to related classes, mostly little
refactorings and/or making visible things that the facets code needs that
were previously private. I'd like to do more refactoring/renaming around
there to make this number/date spatial API a little more friendly; as it
stands, I'm sure it'll be a bit awkward to newcomers.
> Faceting for DateRangePrefixTree
> --------------------------------
>
> Key: LUCENE-5735
> URL: https://issues.apache.org/jira/browse/LUCENE-5735
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/spatial
> Reporter: David Smiley
> Assignee: David Smiley
> Attachments: LUCENE-5735.patch
>
>
> The newly added DateRangePrefixTree (DRPT) encodes terms in a fashion
> amenable to faceting by meaningful time buckets. The motivation for this
> feature is to efficiently populate a calendar bar chart or
> [heat-map|http://bl.ocks.org/mbostock/4063318]. It's not hard if you have
> date instances, as many do, but it's challenging for date ranges.
> Internally this is going to iterate over the terms using seek/next with
> TermsEnum as appropriate. It should be quite efficient; it won't need any
> special caches. I should be able to re-use SPT traversal code in
> AbstractVisitingPrefixTreeFilter. If this goes especially well, the
> underlying implementation will be re-usable for geospatial heat-map faceting.