[
https://issues.apache.org/jira/browse/LUCENE-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394781#comment-15394781
]
Michael McCandless commented on LUCENE-7396:
--------------------------------------------
These results are awesome! I tested {{IndexAndSearchOpenStreetMaps1D}} and saw
good indexing gains.

However, I also tested with the NYC taxi data
(http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml). Each of these
docs has ~20 points, and unfortunately writing points is a bit (10-20%) slower
there, according to the IndexWriter (IW) log. I wonder whether something about
the data distribution affects the performance change here?

Also, I don't think we should pre-budget into IW's buffer for the int[] ords we
allocate. That is really a transient thing, only allocated (and then freed, or
at least reclaimable by GC) for the one field currently writing its points, so
I think it's fair not to count it against IW's buffer. IW is allowed to
allocate heap beyond its RAM buffer for transient things like this.
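
To make the "transient int[] ords" point concrete, here is a minimal sketch of the idea, not the code in the patch. It assumes the buffered 1D points for one field sit in parallel arrays of doc IDs and values. The names {{OrdSortSketch}}, {{PointConsumer}}, {{sortedOrds}} and {{streamSorted}} are made up for illustration and are not Lucene APIs. The stream-based sort boxes the ords for brevity; a real implementation would more likely sort the int[] in place.

```java
import java.util.Comparator;
import java.util.stream.IntStream;

// Hypothetical sketch, not Lucene code: every name below is invented for illustration.
final class OrdSortSketch {

  /** Receives the buffered points one at a time, in ascending value order. */
  interface PointConsumer {
    void accept(int docID, long value);
  }

  /**
   * Returns the ords (indexes into values) ordered by ascending value.
   * The sort is stable, so equal values keep their insertion order.
   */
  static int[] sortedOrds(long[] values) {
    return IntStream.range(0, values.length)
        .boxed()
        .sorted(Comparator.comparingLong((Integer ord) -> values[ord]))
        .mapToInt(Integer::intValue)
        .toArray();
  }

  /** Streams the buffered points to the consumer in sorted order without moving them. */
  static void streamSorted(int[] docIDs, long[] values, PointConsumer consumer) {
    // Transient allocation: only needed while this one field is written,
    // and reclaimable by GC as soon as this method returns.
    int[] ords = sortedOrds(values);
    for (int ord : ords) {
      consumer.accept(docIDs[ord], values[ord]);
    }
  }

  public static void main(String[] args) {
    int[] docIDs = {0, 0, 1, 2};
    long[] values = {30L, 10L, 20L, 10L};
    streamSorted(docIDs, values,
        (docID, value) -> System.out.println("doc=" + docID + " value=" + value));
  }
}
```

The ords array costs 4 bytes per buffered point and lives only for the duration of {{streamSorted}}, which is why I'd treat it as transient heap rather than something to reserve in IW's RAM buffer.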
> Speed up flush of 1-dimension points
> ------------------------------------
>
> Key: LUCENE-7396
> URL: https://issues.apache.org/jira/browse/LUCENE-7396
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-7396.patch, LUCENE-7396.patch
>
>
> 1D points already have an optimized merge implementation which works when
> points come in order. So maybe we could make IndexWriter's PointValuesWriter
> sort before feeding the PointsFormat and somehow propagate the information to
> the PointsFormat?
> The benefit is that flushing could directly stream points to disk with little
> memory usage.