[jira] [Commented] (LUCENE-7371) BKDReader could compress values better

2016-07-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382455#comment-15382455
 ] 

Michael McCandless commented on LUCENE-7371:


Oh sorry, I upgraded the Linux kernel from 4.4 -> 4.6.4 on 7/17!  I'll add an 
annotation.

> BKDReader could compress values better
> --------------------------------------
>
> Key: LUCENE-7371
> URL: https://issues.apache.org/jira/browse/LUCENE-7371
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Fix For: master (7.0), 6.2
>
> Attachments: LUCENE-7371.patch, LUCENE-7371.patch, LUCENE-7371.patch
>
>
> For compressing values, BKDReader only relies on shared prefixes in a block. 
> We could probably easily do better. For instance there are only 256 possible 
> values for the first byte of the dimension that the values are sorted by, yet 
> we use a block size of 1024. So by using something simple like run-length 
> compression we could save 6 bits per value on average.
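
The run-length idea in the description can be sketched roughly as follows. 
This is a minimal illustration, not the code from the attached patches: the 
class name LeadingByteRunLength, the (byte, runLength) pair layout and the 
255 cap on run length are all assumptions made for the example. It only 
encodes the first byte of each value and assumes the block is already sorted 
by that byte.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch; not Lucene's BKDWriter implementation. */
class LeadingByteRunLength {

  /**
   * Run-length encodes the first byte of every value in a block.
   * The values are assumed to be sorted by that byte, so equal bytes
   * are adjacent. The output is a sequence of (byte, runLength) pairs,
   * with runLength stored in one unsigned byte (1..255).
   */
  static byte[] encode(byte[][] sortedValues) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int i = 0;
    while (i < sortedValues.length) {
      byte lead = sortedValues[i][0];
      int run = 1;
      // Extend the run while the leading byte repeats, up to the 255 cap.
      while (i + run < sortedValues.length
          && run < 255
          && sortedValues[i + run][0] == lead) {
        run++;
      }
      out.write(lead); // the shared leading byte
      out.write(run);  // how many consecutive values share it
      i += run;
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    // A block of 1024 two-byte values whose first byte takes 8 distinct
    // values, 128 values each (made-up data for illustration).
    byte[][] block = new byte[1024][];
    for (int i = 0; i < block.length; i++) {
      block[i] = new byte[] {(byte) (i / 128), (byte) i};
    }
    byte[] encoded = encode(block);
    System.out.println("plain leading bytes: " + block.length
        + ", run-length encoded bytes: " + encoded.length);
  }
}
```

With this made-up block, the 1024 leading bytes shrink to 8 pairs, i.e. 16 
bytes. Real savings depend on how many distinct leading bytes a block holds.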



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7371) BKDReader could compress values better

2016-07-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382452#comment-15382452
 ] 

Robert Muir commented on LUCENE-7371:
-

I think [~mikemccand] may have upgraded his operating system.




[jira] [Commented] (LUCENE-7371) BKDReader could compress values better

2016-07-18 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382404#comment-15382404
 ] 

Adrien Grand commented on LUCENE-7371:
--

The benchmarks are reporting interesting changes: some seem to perform slightly 
faster now, like IntNRQ 
(http://people.apache.org/~mikemccand/lucenebench/IntNRQ.html) or the geo3d 
distance filter 
(http://people.apache.org/~mikemccand/geobench.html#search-distance), but 
others seem to perform a bit slower, like the 10-gon filter 
(http://people.apache.org/~mikemccand/geobench.html#search-poly_10) or the 10 
nearest points 
(http://people.apache.org/~mikemccand/geobench.html#search-nearest_10). I think 
the fact that it is not consistently slower or faster is due to the 
distribution of points in the blocks that need to be read (the more unique 
leading bytes, the more expensive the read). Given that the slowdown is not 
general to all benchmarks and that the size reduction is significant, I don't 
think this should be reverted, but let me know if you think otherwise. (For the 
record, many benchmarks look slower on July 17th, but I don't think this is 
related to this change; for instance, even phrases got slower: 
http://people.apache.org/~mikemccand/lucenebench/Phrase.html)
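
To make the "more unique leading bytes, more expensive read" point concrete, 
here is a small decoding sketch. It is not the BKDReader code: the class name 
LeadingByteRunLengthReader and the (byte, runLength) pair layout are 
assumptions for illustration. The outer loop runs once per run, so a block 
with many distinct leading bytes pays the per-run overhead more often.

```java
import java.util.Arrays;

/** Hypothetical sketch; not Lucene's BKDReader implementation. */
class LeadingByteRunLengthReader {

  /**
   * Expands (byte, runLength) pairs into one leading byte per value.
   * Returns the number of runs visited, which is the number of
   * iterations of the outer loop and a rough proxy for decode overhead.
   * The caller must size leadingBytesOut to the total value count.
   */
  static int decode(byte[] encoded, byte[] leadingBytesOut) {
    int runs = 0;
    int pos = 0;
    for (int i = 0; i < encoded.length; i += 2) {
      byte lead = encoded[i];
      int runLength = encoded[i + 1] & 0xFF; // stored as an unsigned byte
      Arrays.fill(leadingBytesOut, pos, pos + runLength, lead);
      pos += runLength;
      runs++;
    }
    return runs;
  }

  public static void main(String[] args) {
    // Two runs: three values starting with 7, then two starting with 9.
    byte[] encoded = {7, 3, 9, 2};
    byte[] leading = new byte[5];
    int runs = decode(encoded, leading);
    System.out.println(runs + " runs -> " + Arrays.toString(leading));
  }
}
```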




[jira] [Commented] (LUCENE-7371) BKDReader could compress values better

2016-07-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373139#comment-15373139
 ] 

ASF subversion and git services commented on LUCENE-7371:
-

Commit 1a6df249f91ca9f4dab792c48f5965f3388f1776 in lucene-solr's branch 
refs/heads/branch_6x from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1a6df24 ]

LUCENE-7371: Fix CHANGES entry.





[jira] [Commented] (LUCENE-7371) BKDReader could compress values better

2016-07-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373140#comment-15373140
 ] 

ASF subversion and git services commented on LUCENE-7371:
-

Commit b54d46722b36f107edd59a8d843b93f5727a9058 in lucene-solr's branch 
refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b54d467 ]

LUCENE-7371: Fix CHANGES entry.





[jira] [Commented] (LUCENE-7371) BKDReader could compress values better

2016-07-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373137#comment-15373137
 ] 

ASF subversion and git services commented on LUCENE-7371:
-

Commit 1f446872aa9346c22643d0fb753ec42942b5a4d2 in lucene-solr's branch 
refs/heads/branch_6x from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1f44687 ]

LUCENE-7371: Better compression of values in Lucene60PointsFormat.





[jira] [Commented] (LUCENE-7371) BKDReader could compress values better

2016-07-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373129#comment-15373129
 ] 

ASF subversion and git services commented on LUCENE-7371:
-

Commit 866398bea67607bcd54331a48736e6bdb94a703d in lucene-solr's branch 
refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=866398b ]

LUCENE-7371: Better compression of values in Lucene60PointsFormat.





[jira] [Commented] (LUCENE-7371) BKDReader could compress values better

2016-07-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372981#comment-15372981
 ] 

Michael McCandless commented on LUCENE-7371:


+1, great!




[jira] [Commented] (LUCENE-7371) BKDReader could compress values better

2016-07-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372803#comment-15372803
 ] 

Michael McCandless commented on LUCENE-7371:


This is a nice optimization!  Patch looks good!

The {{BKDWriter}} change to pick which dimension to apply the run-length coding 
to is best effort, right?  Because you could have a dim with fewer unique 
leading suffix bytes but a larger delta between first and last values?  But it 
would take quite a bit more work at indexing time to figure that out ... maybe 
add a comment explaining this tradeoff?  It seems likely the "min delta" 
approach should work well in practice, but have you tried the 
slow-but-correct approach to verify?

Also, I noticed {{TestBackwardsCompatibility}} seems not to test points!  I'll 
go fix that ...
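
The tradeoff raised above can be illustrated with a small sketch that 
compares the two ways of picking a dimension. Everything here is hypothetical 
and simplified: the class name RunLengthDimChooser, the method names and the 
input layout (one int in 0..255 per value and dimension, standing for the 
first byte after the shared prefix) are assumptions, and "delta" is taken as 
max minus min rather than literally last minus first. It is not the logic of 
the patch.

```java
import java.util.BitSet;

/** Hypothetical sketch; not the dimension selection used by BKDWriter. */
class RunLengthDimChooser {

  /**
   * Cheap heuristic: pick the dimension whose bytes span the smallest
   * range (max - min). bytes[dim][i] is the first non-shared byte (0..255)
   * of value i in dimension dim; each dimension must be non-empty.
   */
  static int pickByMinDelta(int[][] bytes) {
    int best = 0;
    int bestDelta = Integer.MAX_VALUE;
    for (int dim = 0; dim < bytes.length; dim++) {
      int min = 255;
      int max = 0;
      for (int b : bytes[dim]) {
        min = Math.min(min, b);
        max = Math.max(max, b);
      }
      if (max - min < bestDelta) {
        bestDelta = max - min;
        best = dim;
      }
    }
    return best;
  }

  /**
   * Slower but exact: pick the dimension with the fewest distinct bytes,
   * which is what bounds the number of runs once the block is sorted.
   */
  static int pickByUniqueCount(int[][] bytes) {
    int best = 0;
    int bestCount = Integer.MAX_VALUE;
    for (int dim = 0; dim < bytes.length; dim++) {
      BitSet seen = new BitSet(256);
      for (int b : bytes[dim]) {
        seen.set(b);
      }
      if (seen.cardinality() < bestCount) {
        bestCount = seen.cardinality();
        best = dim;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Dim 0: small delta (3) but 4 distinct bytes.
    // Dim 1: large delta (200) but only 2 distinct bytes.
    int[][] bytes = {
      {0, 1, 2, 3, 0, 1, 2, 3},
      {0, 200, 0, 200, 0, 200, 0, 200}
    };
    System.out.println("min delta picks dim " + pickByMinDelta(bytes));
    System.out.println("unique count picks dim " + pickByUniqueCount(bytes));
  }
}
```

On this made-up input the two strategies disagree: the min-delta heuristic 
picks dim 0 while the exact count picks dim 1, which is the case described 
in the comment above.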
