[ https://issues.apache.org/jira/browse/LUCENE-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183827#comment-15183827 ]

Michael McCandless commented on LUCENE-7056:
--------------------------------------------

bq. But it probably makes sense to evaluate using a better global test set, and 
not an artificial one.

+1 for real data, not synthetic data.  I think when one tests on synthetic 
data, one draws synthetic conclusions.

The London UK test I've been running (all sources in luceneutil) is 2.5% of ALL 
the world's OpenStreetMap (PlanetOSM) points, plus 100% of the points contained 
within the bbox around London, UK, so it really is a global test, just with a 
sampling of the world's points.  It's ~61M total points.
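To illustrate the sampling rule described above, here is a minimal hypothetical sketch. It is not the luceneutil code. The London bounding-box coordinates, the class and method names, and the seeded java.util.Random are illustrative assumptions. Only the 2.5% rate and the "keep everything inside the bbox" rule come from the description above.

```java
import java.util.Random;

/** Hypothetical sketch: keep every point inside a London bbox, plus ~2.5% of all other points. */
final class LondonSampler {
    // Assumed, approximate bounding box around London (illustrative only, not the luceneutil values).
    static final double MIN_LAT = 51.28, MAX_LAT = 51.69;
    static final double MIN_LON = -0.51, MAX_LON = 0.33;
    // Fraction of the rest of the world's points to keep.
    static final double SAMPLE_RATE = 0.025;

    private final Random random;

    LondonSampler(long seed) {
        this.random = new Random(seed);
    }

    /** True if (lat, lon) falls inside the assumed London bounding box. */
    static boolean inBox(double lat, double lon) {
        return lat >= MIN_LAT && lat <= MAX_LAT && lon >= MIN_LON && lon <= MAX_LON;
    }

    /** True if the point should go into the test corpus. */
    boolean keep(double lat, double lon) {
        // Short-circuit: points inside the bbox are always kept; others are kept with probability SAMPLE_RATE.
        return inBox(lat, lon) || random.nextDouble() < SAMPLE_RATE;
    }

    public static void main(String[] args) {
        LondonSampler sampler = new LondonSampler(42L);
        Random coords = new Random(7L);
        int total = 1_000_000;
        int kept = 0;
        for (int i = 0; i < total; i++) {
            // Uniformly random lat/lon stands in for real PlanetOSM points in this sketch.
            double lat = -90 + 180 * coords.nextDouble();
            double lon = -180 + 360 * coords.nextDouble();
            if (sampler.keep(lat, lon)) {
                kept++;
            }
        }
        System.out.printf("kept %d of %d points (%.2f%%)%n", kept, total, 100.0 * kept / total);
    }
}
```

Checking the bbox first means every London point is kept regardless of the random draw, so the result is the global 2.5% sample plus all of London, without double-counting London points that the random sample would also have picked.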

But I agree that 2.5% sampling is not realistic ... I'll download the latest 
OSM export (48 GB .bz2!) and try to index all points, and test on larger shapes 
across the globe.

Does anyone have any pointers for other large realistic geospatial corpora?  
Geonames is smallish (~8.6M docs in my snapshot, though that snapshot is ~2 
years old now).

bq. I've spent a bit of time working on adding Geo benchmarks to luceneutil. I 
think we can make more informed decisions about what to add/remove and which 
approach to use once we have decent nightly benchmarks to characterize 
performance.

+1 to get these benchmarks folded into our nightly benchmark (so we can 
catch/measure regressions/improvements over time), and to make it easier for 
anyone to run them.

> Spatial3d/Geo3d should have zero runtime dependencies
> -----------------------------------------------------
>
>                 Key: LUCENE-7056
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7056
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spatial3d
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 6.0
>
>         Attachments: LUCENE_7056__split_spatial3d_package.patch, 
> LUCENE_7056__split_spatial3d_package.patch
>
>
> This is a proposal for the "spatial3d" module to be purely about the 
> shape/geometry implementations it has.  In Lucene 5 that's actually all it 
> has.  In Lucene 6, at the moment, its ~76 files include 2 classes that I 
> think should go elsewhere: Geo3DPoint and PointInGeo3DShapeQuery.  
> Specifically, lucene-spatial-extras (which doesn't quite exist yet, so 
> lucene-spatial for now) would be a suitable home, given those classes' 
> dependency on Lucene.   _Eventually_ I see this module migrating elsewhere, 
> be it on its own or as part of something else more spatial-ish.  Even if 
> that never comes to pass, non-Lucene users who want to use this module for 
> its geometry annoyingly have to exclude the Lucene dependencies that are 
> there only because this module also contains these two classes.
> In a comment I'll suggest some specifics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
