[ 
https://issues.apache.org/jira/browse/SOLR-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-7005:
-------------------------------
    Attachment: SOLR-7005_heatmap.patch
                SOLR-7005_heatmap.patch

This latest patch integrates with FacetComponent to become a new type of 
faceting; it's no longer a separate SearchComponent.  It also implements 
distributed/sharded support.  The new params were added to FacetParams.

Like interval faceting, the code is kept out of FacetComponent & SimpleFacets, 
but even more so (even the distributed logic), with only one-liner hooks where 
needed, and only a touch more than that in SimpleFacets.  The whole situation 
could be a lot more elegant with a major refactor of how the faceting code 
is organized overall, but I have no time for that these days.

There are some nocommits:
* I need to add a randomized round-trip test of the PNG encoding.
* The {{ints}} format is particularly easy to consume, and you can even eyeball 
it to get a sense of the data.  As such, I want to orient the numbers to go 
right-to-left then top-down, and maybe rename it to "ints2d".  But it isn't 
particularly compact/efficient, and I already know I want a separate format, 
tentatively called "skipList", that would list the counts with negative numbers 
signifying how many zeroes to insert.  You would then have to know the 
column/row order to read it, which would of course be documented.  This format 
would be compact and great for sparse data or small heatmaps.  But then would 
we really need "ints2d"?
* I'm confused about FacetComponent.distributedProcess() line ~215 (removal of 
faceting types when distribFieldFacetRefinements != null).  
[~hossman_luc...@fucit.org] Which faceting types should be removed here?  Why 
is it just facet.field and facet.query; should the others be removed too?
* The facet.heatmap.bbox param actually supports not just the rect-query syntax 
but WKT as well, which can be convenient but which I expect to be atypical.  
Should 'wkt' be a separate, mutually exclusive param?  Ugh; param fatigue.  Or 
could it be renamed to 'region' or 'shape' or 'geom'?  This reminds me: it 
could be useful for the underlying Lucene heatmap facet counter to not insist 
on a rectangular region; why not specify any polygon and get counts just for 
those grid squares?  That would make 'bbox' even less appropriate.
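
To make the "skipList" idea above concrete, here is a minimal sketch in Python 
(not code from the patch).  It assumes the grid is flattened row by row before 
encoding; the actual column/row order is still undecided, and the function 
names are hypothetical.

```python
from typing import Iterable, List


def encode_skip_list(counts: Iterable[int]) -> List[int]:
    """Replace each run of n zeroes with the single value -n."""
    out: List[int] = []
    zero_run = 0
    for c in counts:
        if c < 0:
            raise ValueError("counts must be non-negative")
        if c == 0:
            zero_run += 1
            continue
        if zero_run:
            out.append(-zero_run)  # flush the pending run of zeroes
            zero_run = 0
        out.append(c)
    if zero_run:
        out.append(-zero_run)  # trailing zeroes
    return out


def decode_skip_list(encoded: Iterable[int]) -> List[int]:
    """Expand negative values back into runs of zeroes."""
    out: List[int] = []
    for v in encoded:
        if v == 0:
            raise ValueError("a literal 0 never appears in the encoded form")
        if v < 0:
            out.extend([0] * -v)
        else:
            out.append(v)
    return out


def to_grid(flat: List[int], columns: int) -> List[List[int]]:
    """Rebuild rows from the flat list (ASSUMPTION: row-major order)."""
    if columns <= 0 or len(flat) % columns:
        raise ValueError("flat length must be a multiple of columns")
    return [flat[i:i + columns] for i in range(0, len(flat), columns)]


if __name__ == "__main__":
    grid = [[0, 0, 2], [1, 0, 0]]
    flat = [c for row in grid for c in row]
    encoded = encode_skip_list(flat)
    print(encoded)  # [-2, 2, 1, -2]
    assert to_grid(decode_skip_list(encoded), columns=3) == grid
```

The reader needs the column count (and the agreed ordering) to rebuild rows, 
which is the documentation burden mentioned above.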

> facet.heatmap for spatial heatmap faceting on RPT
> -------------------------------------------------
>
>                 Key: SOLR-7005
>                 URL: https://issues.apache.org/jira/browse/SOLR-7005
>             Project: Solr
>          Issue Type: New Feature
>          Components: spatial
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 5.1
>
>         Attachments: SOLR-7005_heatmap.patch, SOLR-7005_heatmap.patch, 
> SOLR-7005_heatmap.patch, heatmap_512x256.png, heatmap_64x32.png
>
>
> This is a new feature that uses the new spatial Heatmap / 2D PrefixTree cell 
> counter in Lucene spatial (LUCENE-6191).  This is a form of faceting, and 
> as such I think it should live in the "facet" parameter namespace.  Here's 
> what the parameters are:
> * facet=true
> * facet.heatmap=fieldname
> * facet.heatmap.bbox=\["-180 -90" TO "180 90"]
> * facet.heatmap.gridLevel=6
> * facet.heatmap.distErrPct=0.10
> Like other faceting features, the fieldName can have local-params to exclude 
> filter queries or specify an output key.
> The bbox is optional; you get the whole world or you can specify a box or 
> actually any shape that WKT supports (you get the bounding box of whatever 
> you put).
> Ultimately, this feature needs to know the grid level, which together with 
> the input shape will yield a certain number of cells.  You can specify 
> gridLevel exactly, or omit it and instead provide distErrPct, which is 
> computed as it is for the RPT field type as seen in the schema.  0.10 yielded 
> ~4k cells but it'll vary.  There's also a facet.heatmap.maxCells safety net 
> defaulting to 100k.  Exceed this and you get an error.
> The output is (JSON):
> {noformat}
> {gridLevel=6,columns=64,rows=64,minX=-180.0,maxX=180.0,minY=-90.0,maxY=90.0,counts=[[0,
>  0, 2, 1, ....],[1, 1, 3, 2, ...],...]}
> {noformat}
> counts is null if all would be 0.  Perhaps individual row arrays should 
> likewise be null... I welcome feedback.
> I'm toying with an output format option in which you can specify a base-64'ed 
> grayscale PNG.
> Obviously this should support sharded / distributed environments.
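
As a rough client-side illustration of the parameters and output described in 
the issue, here is a small Python sketch.  The field name "geo", the q/rows 
values, and the helper names are hypothetical.  Treating row 0 as the top 
(maxY) row is an assumption, since the orientation is still open, and 
rejecting gridLevel together with distErrPct is this helper's own choice, not 
documented Solr behavior.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode


def heatmap_query(field: str,
                  bbox: Optional[str] = None,
                  grid_level: Optional[int] = None,
                  dist_err_pct: Optional[float] = None) -> str:
    """Build a query string using the facet.heatmap params from the issue."""
    if grid_level is not None and dist_err_pct is not None:
        # Helper's own policy: the issue describes these as alternatives.
        raise ValueError("give gridLevel or distErrPct, not both")
    params = [("q", "*:*"), ("rows", "0"),
              ("facet", "true"), ("facet.heatmap", field)]
    if bbox is not None:
        params.append(("facet.heatmap.bbox", bbox))
    if grid_level is not None:
        params.append(("facet.heatmap.gridLevel", str(grid_level)))
    if dist_err_pct is not None:
        params.append(("facet.heatmap.distErrPct", str(dist_err_pct)))
    return urlencode(params)


def dense_counts(heatmap: dict) -> List[List[int]]:
    """Return counts as a full grid; the issue says counts is null if all 0."""
    if heatmap.get("counts") is None:
        return [[0] * heatmap["columns"] for _ in range(heatmap["rows"])]
    return heatmap["counts"]


def cell_bounds(heatmap: dict, col: int, row: int,
                top_down: bool = True) -> Tuple[float, float, float, float]:
    """Return (minX, minY, maxX, maxY) of one cell.

    ASSUMPTION: with top_down=True, row 0 is the northernmost row.
    """
    width = (heatmap["maxX"] - heatmap["minX"]) / heatmap["columns"]
    height = (heatmap["maxY"] - heatmap["minY"]) / heatmap["rows"]
    x0 = heatmap["minX"] + col * width
    if top_down:
        y1 = heatmap["maxY"] - row * height
        y0 = y1 - height
    else:
        y0 = heatmap["minY"] + row * height
        y1 = y0 + height
    return (x0, y0, x0 + width, y1)


if __name__ == "__main__":
    print(heatmap_query("geo", bbox='["-180 -90" TO "180 90"]', grid_level=6))
    example = {"gridLevel": 6, "columns": 64, "rows": 64,
               "minX": -180.0, "maxX": 180.0, "minY": -90.0, "maxY": 90.0,
               "counts": None}
    print(cell_bounds(example, 0, 0))
```

The cell size falls straight out of the response: (maxX - minX) / columns by 
(maxY - minY) / rows, so a client needs nothing beyond the returned fields and 
the documented row order.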


