[ https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549975#comment-13549975 ]
Michael McCandless commented on LUCENE-4620:
--------------------------------------------
Trunk:
{noformat}
[java] Estimating ~100000000 Integers compression time by
[java] Encoding/decoding facets' ID payload of docID = 3630 (unsorted, length of: 2430) 41152 times.
[java]
[java] Encoder                                                  Bits/Int  Encode Time     Encode Time          Decode Time     Decode Time
[java]                                                                    [milliseconds]  [microsecond / int]  [milliseconds]  [microsecond / int]
[java] -------------------------------------------------------------------------------------------------------------------------------------------
[java] VInt8                                                    18.4955   4430            44.3003              1162            11.6201
[java] Sorting (Unique (VInt8))                                 18.4955   4344            43.4403              1105            11.0501
[java] Sorting (Unique (DGap (VInt8)))                          8.5597    4481            44.8103              842             8.4201
[java] Sorting (Unique (DGap (EightFlags (VInt8))))             4.9679    4636            46.3603              1021            10.2101
[java] Sorting (Unique (DGap (FourFlags (VInt8))))              4.8198    4515            45.1503              1001            10.0101
[java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt8)))))  4.5794    4904            49.0403              1056            10.5601
[java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt8)))))  4.5794    4751            47.5103              1035            10.3501
[java]
[java]
[java] Estimating ~100000000 Integers compression time by
[java] Encoding/decoding facets' ID payload of docID = 9910 (unsorted, length of: 1489) 67159 times.
[java]
[java] Encoder                                                  Bits/Int  Encode Time     Encode Time          Decode Time     Decode Time
[java]                                                                    [milliseconds]  [microsecond / int]  [milliseconds]  [microsecond / int]
[java] -------------------------------------------------------------------------------------------------------------------------------------------
[java] VInt8                                                    18.2673   1241            12.4100              1128            11.2800
[java] Sorting (Unique (VInt8))                                 18.2673   3488            34.8801              924             9.2400
[java] Sorting (Unique (DGap (VInt8)))                          8.9456    3061            30.6101              660             6.6000
[java] Sorting (Unique (DGap (EightFlags (VInt8))))             5.7542    3693            36.9301              1026            10.2600
[java] Sorting (Unique (DGap (FourFlags (VInt8))))              5.5447    3462            34.6201              811             8.1100
[java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt8)))))  5.3566    3846            38.4601              1018            10.1800
[java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt8)))))  5.3996    3879            38.7901              1025            10.2500
[java]
[java]
[java] Estimating ~100000000 Integers compression time by
[java] Encoding/decoding facets' ID payload of docID = 10000 (unsorted, length of: 18) 5555555 times.
[java]
[java] Encoder                                                  Bits/Int  Encode Time     Encode Time          Decode Time     Decode Time
[java]                                                                    [milliseconds]  [microsecond / int]  [milliseconds]  [microsecond / int]
[java] -------------------------------------------------------------------------------------------------------------------------------------------
[java] VInt8                                                    20.8889   1179            11.7900              1114            11.1400
[java] Sorting (Unique (VInt8))                                 20.8889   2251            22.5100              1171            11.7100
[java] Sorting (Unique (DGap (VInt8)))                          12.0000   2174            21.7400              848             8.4800
[java] Sorting (Unique (DGap (EightFlags (VInt8))))             10.2222   2372            23.7200              1092            10.9200
[java] Sorting (Unique (DGap (FourFlags (VInt8))))              10.2222   2355            23.5500              1062            10.6200
[java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt8)))))  9.7778    2414            24.1400              1085            10.8500
[java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt8)))))  10.2222   2492            24.9200              1130            11.3000
[java]
[java]
[java] Estimating ~100000000 Integers compression time by
[java] Encoding/decoding facets' ID payload of docID = 501871 (unsorted, length of: 957) 104493 times.
[java]
[java] Encoder                                                  Bits/Int  Encode Time     Encode Time          Decode Time     Decode Time
[java]                                                                    [milliseconds]  [microsecond / int]  [milliseconds]  [microsecond / int]
[java] -------------------------------------------------------------------------------------------------------------------------------------------
[java] VInt8                                                    16.5768   998             9.9800               896             8.9600
[java] Sorting (Unique (VInt8))                                 16.5768   2542            25.4201              864             8.6400
[java] Sorting (Unique (DGap (VInt8)))                          8.4848    2468            24.6800              646             6.4600
[java] Sorting (Unique (DGap (EightFlags (VInt8))))             4.4138    2526            25.2601              768             7.6800
[java] Sorting (Unique (DGap (FourFlags (VInt8))))              4.1797    2406            24.0600              696             6.9600
[java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt8)))))  3.8955    2541            25.4101              802             8.0200
[java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt8)))))  3.8871    2537            25.3701              770             7.7000
[java]
{noformat}
Patch:
{noformat}
[java] Estimating ~100000000 Integers compression time by
[java] Encoding/decoding facets' ID payload of docID = 3630 (unsorted, length of: 2430) 41152 times.
[java]
[java] Encoder                                                  Bits/Int  Encode Time     Encode Time          Decode Time     Decode Time
[java]                                                                    [milliseconds]  [microsecond / int]  [milliseconds]  [microsecond / int]
[java] -------------------------------------------------------------------------------------------------------------------------------------------
[java] VInt8                                                    18.4955   594             5.9400               419             4.1900
[java] Sorting (Unique (VInt8))                                 18.4955   3147            31.4702              579             5.7900
[java] Sorting (Unique (DGap (VInt8)))                          8.5597    3167            31.6702              278             2.7800
[java] Sorting (Unique (DGap (EightFlags (VInt))))              4.9679    3624            36.2402              401             4.0100
[java] Sorting (Unique (DGap (FourFlags (VInt))))               4.8198    3534            35.3402              379             3.7900
[java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt)))))   4.5794    3954            39.5403              580             5.8000
[java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt)))))   4.5794    3947            39.4703              595             5.9500
[java]
[java]
[java] Estimating ~100000000 Integers compression time by
[java] Encoding/decoding facets' ID payload of docID = 9910 (unsorted, length of: 1489) 67159 times.
[java]
[java] Encoder                                                  Bits/Int  Encode Time     Encode Time          Decode Time     Decode Time
[java]                                                                    [milliseconds]  [microsecond / int]  [milliseconds]  [microsecond / int]
[java] -------------------------------------------------------------------------------------------------------------------------------------------
[java] VInt8                                                    18.2673   592             5.9200               441             4.4100
[java] Sorting (Unique (VInt8))                                 18.2673   2002            20.0200              443             4.4300
[java] Sorting (Unique (DGap (VInt8)))                          8.9456    2077            20.7701              301             3.0100
[java] Sorting (Unique (DGap (EightFlags (VInt))))              5.7542    2646            26.4601              419             4.1900
[java] Sorting (Unique (DGap (FourFlags (VInt))))               5.5447    2505            25.0501              375             3.7500
[java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt)))))   5.3566    2984            29.8401              625             6.2500
[java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt)))))   5.3996    2997            29.9701              616             6.1600
[java]
[java]
[java] Estimating ~100000000 Integers compression time by
[java] Encoding/decoding facets' ID payload of docID = 10000 (unsorted, length of: 18) 5555555 times.
[java]
[java] Encoder                                                  Bits/Int  Encode Time     Encode Time          Decode Time     Decode Time
[java]                                                                    [milliseconds]  [microsecond / int]  [milliseconds]  [microsecond / int]
[java] -------------------------------------------------------------------------------------------------------------------------------------------
[java] VInt8                                                    20.8889   585             5.8500               585             5.8500
[java] Sorting (Unique (VInt8))                                 20.8889   1127            11.2700              588             5.8800
[java] Sorting (Unique (DGap (VInt8)))                          12.0000   1156            11.5600              477             4.7700
[java] Sorting (Unique (DGap (EightFlags (VInt))))              10.2222   1346            13.4600              657             6.5700
[java] Sorting (Unique (DGap (FourFlags (VInt))))               10.2222   1385            13.8500              573             5.7300
[java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt)))))   9.7778    1565            15.6500              845             8.4500
[java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt)))))   10.2222   1662            16.6200              891             8.9100
[java]
[java]
[java] Estimating ~100000000 Integers compression time by
[java] Encoding/decoding facets' ID payload of docID = 501871 (unsorted, length of: 957) 104493 times.
[java]
[java] Encoder                                                  Bits/Int  Encode Time     Encode Time          Decode Time     Decode Time
[java]                                                                    [milliseconds]  [microsecond / int]  [milliseconds]  [microsecond / int]
[java] -------------------------------------------------------------------------------------------------------------------------------------------
[java] VInt8                                                    16.5768   446             4.4600               439             4.3900
[java] Sorting (Unique (VInt8))                                 16.5768   1429            14.2900              420             4.2000
[java] Sorting (Unique (DGap (VInt8)))                          8.4848    1390            13.9000              298             2.9800
[java] Sorting (Unique (DGap (EightFlags (VInt))))              4.4138    1457            14.5700              387             3.8700
[java] Sorting (Unique (DGap (FourFlags (VInt))))               4.1797    1529            15.2900              368             3.6800
[java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt)))))   3.8955    1829            18.2900              530             5.3000
[java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt)))))   3.8871    1842            18.4200              528             5.2800
[java]
{noformat}
Looks like ~2-3X faster... good!
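
For anyone checking the numbers, here is a small sketch of the arithmetic behind the per-int columns and the speedup estimate. The class and method names are made up for illustration and are not part of the benchmark. One hedged observation: dividing the millisecond totals by (payload length * repetitions) reproduces the per-int columns only if they are read as nanoseconds per int (for example, 1162 ms over 2430 * 41152 ints is about 11.62 ns), so the "[microsecond / int]" label may be off by a factor of 1000.

```java
public class BenchmarkMath {

    /** Total ints processed: payload length times number of repetitions. */
    static long totalInts(int payloadLength, int repetitions) {
        return (long) payloadLength * repetitions;
    }

    /** Time per int in nanoseconds, given the total wall time in milliseconds. */
    static double nanosPerInt(long millis, long totalInts) {
        return millis * 1_000_000.0 / totalInts;
    }

    /** How many times faster the patch is than trunk for the same measurement. */
    static double speedup(long trunkMillis, long patchMillis) {
        return (double) trunkMillis / patchMillis;
    }

    public static void main(String[] args) {
        // docID = 3630: payload of 2430 ints, repeated 41152 times
        long ints = totalInts(2430, 41152);
        System.out.println("total ints: " + ints);
        System.out.println("trunk VInt8 decode, ns/int: " + nanosPerInt(1162, ints));
        System.out.println("patch VInt8 decode, ns/int: " + nanosPerInt(419, ints));
        System.out.println("VInt8 decode speedup: " + speedup(1162, 419));
        System.out.println("DGap decode speedup: " + speedup(842, 278));
    }
}
```

By this measure the VInt8 decode on docID = 3630 comes out at roughly 2.8X (1162 ms vs 419 ms) and the DGap decode at roughly 3.0X (842 ms vs 278 ms).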
> Explore IntEncoder/Decoder bulk API
> -----------------------------------
>
> Key: LUCENE-4620
> URL: https://issues.apache.org/jira/browse/LUCENE-4620
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Shai Erera
> Attachments: LUCENE-4620.patch, LUCENE-4620.patch, LUCENE-4620.patch
>
>
> Today, IntEncoder/Decoder offer a streaming API, where you can encode(int)
> and decode(int). Originally, we believed that this layer can be useful for
> other scenarios, but in practice it's used only for writing/reading the
> category ordinals from payload/DV.
> Therefore, Mike and I would like to explore a bulk API, something like
> encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder
> can still be streaming (as we don't know in advance how many ints will be
> written), dunno. Will figure this out as we go.
> One thing to check is whether the bulk API can work w/ e.g. facet
> associations, which can write arbitrary byte[], and so maybe decoding to an
> IntsRef won't make sense. This too we'll figure out as we go. I don't rule
> out that associations will use a different bulk API.
> At the end of the day, the requirement is for someone to be able to configure
> how ordinals are written (i.e. different encoding schemes: VInt, PackedInts
> etc.) and later read, with as little overhead as possible.
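
To make the proposed bulk shape concrete, here is a minimal sketch, assuming non-negative ordinals and using plain int[] / byte[] in place of IntsRef / BytesRef. The class name and method signatures are hypothetical, not the API in the attached patch, and the byte layout (low-order 7 bits first) is just one common variable-length format, not necessarily what the facet module's VInt8 encoder writes. It mirrors the Sorting (Unique (DGap (VInt8))) chain from the benchmark: sort, drop duplicates, store gaps, then write each gap in as few bytes as possible.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class BulkDGapVIntSketch {

    /** Encodes all values in one call. Assumes every value is non-negative. */
    static byte[] encode(int[] values) {
        // Sorting + Unique: gaps between consecutive values become small and positive.
        int[] sorted = Arrays.stream(values).sorted().distinct().toArray();
        ByteArrayOutputStream out = new ByteArrayOutputStream(sorted.length * 2);
        int prev = 0;
        for (int v : sorted) {
            int gap = v - prev; // DGap: store the difference from the previous value
            prev = v;
            // VInt: 7 payload bits per byte, high bit set means "more bytes follow".
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
        }
        return out.toByteArray();
    }

    /** Decodes the whole buffer in one call, returning the sorted, unique values. */
    static int[] decode(byte[] bytes) {
        int[] result = new int[bytes.length]; // each value takes at least one byte
        int count = 0;
        int prev = 0;
        int pos = 0;
        while (pos < bytes.length) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += gap; // undo the DGap step
            result[count++] = prev;
        }
        return Arrays.copyOf(result, count);
    }

    public static void main(String[] args) {
        byte[] encoded = encode(new int[] {5, 3, 3, 300});
        System.out.println(encoded.length + " bytes -> " + Arrays.toString(decode(encoded)));
    }
}
```

The point of the bulk shape is that the per-value work sits in a tight loop inside a single call, rather than paying a method call per int as with a streaming encode(int) / decode(int) pair. Note that the round trip returns the sorted, de-duplicated values, not the original input order.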