[jira] [Commented] (LUCENE-7396) Speed up flush of 1-dimension points

Michael McCandless (JIRA) Tue, 26 Jul 2016 16:45:41 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394781#comment-15394781
 ]


Michael McCandless commented on LUCENE-7396:
--------------------------------------------

These results are awesome! I tested {{IndexAndSearchOpenStreetMaps1D}} and saw 
good indexing gains.

However, I also tested with the NYC taxi data 
(http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml). Each of these 
docs has ~20 points, and unfortunately writing points (as reported in the IW log) 
is 10-20% slower. I wonder if something about the data distribution affects the 
performance change here?

Also, I don't think we should pre-budget the int[] ords we allocate against IW's 
RAM buffer. That array is transient: it is only allocated (and then freed, or at 
least made reclaimable by GC) for the one field currently writing its points, so 
I think it's fair not to count it against IW's buffer. IW is allowed to allocate 
heap beyond its RAM buffer for transient things like this.
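To make the "transient int[] ords" concrete, here is a minimal Java sketch, assuming each 1D value is stored as a fixed-width, sortable byte slice in one packed buffer. This is a simplification and not Lucene's actual PointValuesWriter code; the class and method names (`SortedPointOrds`, `packSortable`, `sortOrds`) are hypothetical. The packed values are never moved: only an int[] of ordinals (4 bytes per point) is produced, and it can be dropped as soon as the field's points are written.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

public class SortedPointOrds {

    // Hypothetical helper: encodes ints as big-endian bytes with the sign bit flipped,
    // so that unsigned byte comparison matches signed int order.
    static byte[] packSortable(int[] values) {
        byte[] packed = new byte[values.length * Integer.BYTES];
        for (int i = 0; i < values.length; i++) {
            int sortable = values[i] ^ 0x80000000;
            int off = i * Integer.BYTES;
            packed[off] = (byte) (sortable >>> 24);
            packed[off + 1] = (byte) (sortable >>> 16);
            packed[off + 2] = (byte) (sortable >>> 8);
            packed[off + 3] = (byte) sortable;
        }
        return packed;
    }

    // Returns point ordinals in ascending value order. The packed buffer is left
    // untouched; the result is a transient int[] costing 4 bytes per point.
    static int[] sortOrds(byte[] packed, int bytesPerDim, int numPoints) {
        Comparator<Integer> byValue = (a, b) -> Arrays.compareUnsigned(
                packed, a * bytesPerDim, (a + 1) * bytesPerDim,
                packed, b * bytesPerDim, (b + 1) * bytesPerDim);
        // Boxed sort for brevity; a production version would sort the int[] in place.
        return IntStream.range(0, numPoints)
                .boxed()
                .sorted(byValue)
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        int[] values = {5, -3, 42, 0};
        byte[] packed = packSortable(values);
        int[] ords = sortOrds(packed, Integer.BYTES, values.length);
        for (int ord : ords) {
            System.out.println(values[ord]); // visits points in sorted order
        }
        System.out.println("transient ords bytes: " + (long) ords.length * Integer.BYTES);
    }
}
```

Because the ords array only lives while one field is being flushed, its size scales with that field's point count rather than with the whole RAM buffer, which is the argument for not pre-budgeting it.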

> Speed up flush of 1-dimension points
> ------------------------------------
>
>                 Key: LUCENE-7396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7396.patch, LUCENE-7396.patch
>
>
> 1D points already have an optimized merge implementation which works when 
> points come in order. So maybe we could make IndexWriter's PointValuesWriter 
> sort before feeding the PointsFormat and somehow propagate the fact that the 
> points are sorted to the PointsFormat?
> The benefit is that flushing could directly stream points to disk with little 
> memory usage.
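The streaming benefit described above can be illustrated with a toy Java sketch: once points arrive sorted, a writer only needs to buffer one leaf block at a time, and it can record each leaf's minimum value for an index as it goes. This is a simplified, hypothetical format (the `SortedLeafStreamer` name, the leaf layout, and int-only values are all assumptions), not Lucene's BKDWriter or any real PointsFormat.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Toy 1D writer: accepts (value, docID) pairs already sorted by value and streams
// fixed-size leaf blocks to the output, buffering at most one leaf in memory.
public class SortedLeafStreamer {
    private final DataOutputStream out;
    private final int[] leafValues;
    private final int[] leafDocs;
    private final List<Integer> leafMinValues = new ArrayList<>();
    private int leafCount = 0;
    private boolean hasLast = false;
    private int lastValue;

    public SortedLeafStreamer(DataOutputStream out, int maxPointsPerLeaf) {
        this.out = out;
        this.leafValues = new int[maxPointsPerLeaf];
        this.leafDocs = new int[maxPointsPerLeaf];
    }

    public void add(int value, int docID) {
        // The whole approach relies on sorted input, so reject anything out of order.
        if (hasLast && value < lastValue) {
            throw new IllegalArgumentException("points must arrive in sorted order");
        }
        hasLast = true;
        lastValue = value;
        leafValues[leafCount] = value;
        leafDocs[leafCount] = docID;
        leafCount++;
        if (leafCount == leafValues.length) {
            flushLeaf();
        }
    }

    // Writes the buffered leaf as: count, then (docID, value) pairs; remembers its min value.
    private void flushLeaf() {
        try {
            leafMinValues.add(leafValues[0]);
            out.writeInt(leafCount);
            for (int i = 0; i < leafCount; i++) {
                out.writeInt(leafDocs[i]);
                out.writeInt(leafValues[i]);
            }
            leafCount = 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Flushes the last partial leaf and returns each leaf's minimum value, which an
    // index over the leaves could use to decide which blocks a range query must visit.
    public List<Integer> finish() {
        if (leafCount > 0) {
            flushLeaf();
        }
        try {
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return leafMinValues;
    }
}
```

Memory here stays bounded by `maxPointsPerLeaf` no matter how many points the field has; the cost moves to sorting the points before they are fed to the writer.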



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
