[jira] [Updated] (HBASE-26566) Optimize determine E step in OrderedBytes

Yutong Xiao (Jira) Tue, 14 Dec 2021 22:43:06 -0800


     [ 
https://issues.apache.org/jira/browse/HBASE-26566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Yutong Xiao updated HBASE-26566:
--------------------------------
    Description: 
Found current estimate E step in OrderedBytes is to search step by step in 
loops. We can directly calculate the length to move the point and let the time 
complexity become to O(1).
>From the comments of the method encodeNumericLarge:
{panel:title=encodeNumericLarge}
Encode the large magnitude floating point number *val* using the key encoding. 
The caller guarantees that *val* will be
finite and abs({*}val{*}) >= 1.0. 
A floating point value is encoded as an integer exponent *E* and a mantissa 
{*}M{*}. The original value is equal to {*}(M * 100^E){*}. *E* is set to the 
smallest value possible without making *M* greater than or equal to 1.0.
Each centimal digit of the mantissa is stored in a byte. If the value of the 
centimal digit is *X* (hence X>=0 and X<=99) then the byte value will be 
*2*X+1* for every byte of the mantissa, except for the last byte which will be 
{*}2*X+0{*}. The mantissa must be the minimum number of bytes necessary to 
represent the value; trailing *X==0* digits are omitted. This means that the 
mantissa will never contain a byte with the
 value 0x00.
The function could be divided into 3 steps:
 # Confirm the sign (Negative or not)
 # Normalise the abs({*}val{*}), transformed to *(M * 100^E)*
 # Encode *M* by peeling off centimal digit.
 # Encoding *X*{panel}
At the step 2, we normalise abs(*val*) and determine the exponent *E*. 
The current implementation is :
{code:java}
    while (abs.compareTo(E32) >= 0 && e <= 350) { abs = abs.movePointLeft(32); 
e +=16; }
    while (abs.compareTo(E8) >= 0 && e <= 350) { abs = abs.movePointLeft(8); 
e+= 4; }
    while (abs.compareTo(BigDecimal.ONE) >= 0 && e <= 350) { abs = 
abs.movePointLeft(2); e++; }
{code}
We start with e == 0 and move the point to left with 2e each time. The time 
complexity is O(\n).
>From the loop above we can see that for large decimal, we can just move the 
>point to left until abs < BigDecimal.ONE.
This can be directly calculated and we can also directly calculated the e (just 
the half of the steps).
In this case, we only need move the point once.

  was:
Found current estimate E step in OrderedBytes is to search step by step in 
loops. We can directly calculate the length to move the point and let the time 
complexity become to O(1).
Current implementation in encodeNumericLarge:
{code:java}
    while (abs.compareTo(E32) >= 0 && e <= 350) { abs = abs.movePointLeft(32); 
e +=16; }
    while (abs.compareTo(E8) >= 0 && e <= 350) { abs = abs.movePointLeft(8); 
e+= 4; }
    while (abs.compareTo(BigDecimal.ONE) >= 0 && e <= 350) { abs = 
abs.movePointLeft(2); e++; }
{code}

We start with e == 0 and move the point to left with 2e each time. The time 
complexity is O(\n).
>From the loop above we can see that for large decimal, we can just move the 
>point to left until abs < BigDecimal.ONE.
This can be directly calculated and we can also directly calculated the e (just 
the half of the steps).
In this case, we only need move the point once.


> Optimize determine E step in OrderedBytes
> -----------------------------------------
>
>                 Key: HBASE-26566
>                 URL: https://issues.apache.org/jira/browse/HBASE-26566
>             Project: HBase
>          Issue Type: Task
>          Components: Performance
>            Reporter: Yutong Xiao
>            Assignee: Yutong Xiao
>            Priority: Major
>
> Found current estimate E step in OrderedBytes is to search step by step in 
> loops. We can directly calculate the length to move the point and let the 
> time complexity become to O(1).
> From the comments of the method encodeNumericLarge:
> {panel:title=encodeNumericLarge}
> Encode the large magnitude floating point number *val* using the key 
> encoding. The caller guarantees that *val* will be
> finite and abs({*}val{*}) >= 1.0. 
> A floating point value is encoded as an integer exponent *E* and a mantissa 
> {*}M{*}. The original value is equal to {*}(M * 100^E){*}. *E* is set to the 
> smallest value possible without making *M* greater than or equal to 1.0.
> Each centimal digit of the mantissa is stored in a byte. If the value of the 
> centimal digit is *X* (hence X>=0 and X<=99) then the byte value will be 
> *2*X+1* for every byte of the mantissa, except for the last byte which will 
> be {*}2*X+0{*}. The mantissa must be the minimum number of bytes necessary to 
> represent the value; trailing *X==0* digits are omitted. This means that the 
> mantissa will never contain a byte with the
>  value 0x00.
> The function could be divided into 3 steps:
>  # Confirm the sign (Negative or not)
>  # Normalise the abs({*}val{*}), transformed to *(M * 100^E)*
>  # Encode *M* by peeling off centimal digit.
>  # Encoding *X*{panel}
> At the step 2, we normalise abs(*val*) and determine the exponent *E*. 
> The current implementation is :
> {code:java}
>     while (abs.compareTo(E32) >= 0 && e <= 350) { abs = 
> abs.movePointLeft(32); e +=16; }
>     while (abs.compareTo(E8) >= 0 && e <= 350) { abs = abs.movePointLeft(8); 
> e+= 4; }
>     while (abs.compareTo(BigDecimal.ONE) >= 0 && e <= 350) { abs = 
> abs.movePointLeft(2); e++; }
> {code}
> We start with e == 0 and move the point to left with 2e each time. The time 
> complexity is O(\n).
> From the loop above we can see that for large decimal, we can just move the 
> point to left until abs < BigDecimal.ONE.
> This can be directly calculated and we can also directly calculated the e 
> (just the half of the steps).
> In this case, we only need move the point once.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (HBASE-26566) Optimize determine E step in OrderedBytes

Reply via email to