[ https://issues.apache.org/jira/browse/HBASE-26566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yutong Xiao updated HBASE-26566: -------------------------------- Description: Found current estimate E step in OrderedBytes is to search step by step in loops. We can directly calculate the length to move the point and let the time complexity become to O(1). >From the comments of the method encodeNumericLarge: {panel:title=encodeNumericLarge} Encode the large magnitude floating point number *val* using the key encoding. The caller guarantees that *val* will be finite and abs({*}val{*}) >= 1.0. A floating point value is encoded as an integer exponent *E* and a mantissa {*}M{*}. The original value is equal to {*}(M * 100^E){*}. *E* is set to the smallest value possible without making *M* greater than or equal to 1.0. Each centimal digit of the mantissa is stored in a byte. If the value of the centimal digit is *X* (hence X>=0 and X<=99) then the byte value will be 2*X+1 for every byte of the mantissa, except for the last byte which will be 2*X+0. The mantissa must be the minimum number of bytes necessary to represent the value; trailing *X==0* digits are omitted. This means that the mantissa will never contain a byte with the value 0x00. The function could be divided into 3 steps: # Confirm the sign (Negative or not) # Normalise the abs({*}val{*}), transformed to *(M * 100^E)* # Encode *M* by peeling off centimal digit and *X*{panel} At the step 2, we normalise abs(*val*) and determine the exponent *E*. The current implementation is : {code:java} while (abs.compareTo(E32) >= 0 && e <= 350) { abs = abs.movePointLeft(32); e +=16; } while (abs.compareTo(E8) >= 0 && e <= 350) { abs = abs.movePointLeft(8); e+= 4; } while (abs.compareTo(BigDecimal.ONE) >= 0 && e <= 350) { abs = abs.movePointLeft(2); e++; } {code} Which just move the point of abs(*val*), (which is in form yyyy.zzzz, where y & z are digits, e.g. 1000.0001) from right to left step by step, until abs(val) < 1. We can find that e is the half of point moved length. So that we could save the above three loops and calculate the moved length and e directly. My implementation: {code:java} // This is the number of digits of the Integer part of abs(val) when val > 10 int integerDigits = abs.precision() - abs.scale(); // If length is odd, from the last loop above, we actually moved one more step forward. int lengthToMoveLeft = integerDigits % 2 == 0 ? integerDigits : integerDigits + 1; e = lengthToMoveLeft / 2; // The edge condition. if (e > 350) { e = 351; lengthToMoveLeft = 702; } abs = abs.movePointLeft(lengthToMoveLeft); {code} In this case, we only move the point once. The time complexity is O(1). Same idea to the method of encodeNumericSmall. was: Found current estimate E step in OrderedBytes is to search step by step in loops. We can directly calculate the length to move the point and let the time complexity become to O(1). >From the comments of the method encodeNumericLarge: {panel:title=encodeNumericLarge} Encode the large magnitude floating point number *val* using the key encoding. The caller guarantees that *val* will be finite and abs({*}val{*}) >= 1.0. A floating point value is encoded as an integer exponent *E* and a mantissa {*}M{*}. The original value is equal to {*}(M * 100^E){*}. *E* is set to the smallest value possible without making *M* greater than or equal to 1.0. Each centimal digit of the mantissa is stored in a byte. If the value of the centimal digit is *X* (hence X>=0 and X<=99) then the byte value will be 2*X+1 for every byte of the mantissa, except for the last byte which will be 2*X+0. The mantissa must be the minimum number of bytes necessary to represent the value; trailing *X==0* digits are omitted. This means that the mantissa will never contain a byte with the value 0x00. The function could be divided into 3 steps: # Confirm the sign (Negative or not) # Normalise the abs({*}val{*}), transformed to *(M * 100^E)* # Encode *M* by peeling off centimal digit. # Encoding *X*{panel} At the step 2, we normalise abs(*val*) and determine the exponent *E*. The current implementation is : {code:java} while (abs.compareTo(E32) >= 0 && e <= 350) { abs = abs.movePointLeft(32); e +=16; } while (abs.compareTo(E8) >= 0 && e <= 350) { abs = abs.movePointLeft(8); e+= 4; } while (abs.compareTo(BigDecimal.ONE) >= 0 && e <= 350) { abs = abs.movePointLeft(2); e++; } {code} Which just move the point of abs(*val*), (which is in form yyyy.zzzz, where y & z are digits, e.g. 1000.0001) from right to left step by step, until abs(val) < 1. We can find that e is the half of point moved length. So that we could save the above three loops and calculate the moved length and e directly. My implementation: {code:java} // This is the number of digits of the Integer part of abs(val) when val > 10 int integerDigits = abs.precision() - abs.scale(); // If length is odd, from the last loop above, we actually moved one more step forward. int lengthToMoveLeft = integerDigits % 2 == 0 ? integerDigits : integerDigits + 1; e = lengthToMoveLeft / 2; // The edge condition. if (e > 350) { e = 351; lengthToMoveLeft = 702; } abs = abs.movePointLeft(lengthToMoveLeft); {code} In this case, we only move the point once. The time complexity is O(1). Same idea to the method of encodeNumericSmall. > Optimize determine E step in OrderedBytes > ----------------------------------------- > > Key: HBASE-26566 > URL: https://issues.apache.org/jira/browse/HBASE-26566 > Project: HBase > Issue Type: Task > Components: Performance > Reporter: Yutong Xiao > Assignee: Yutong Xiao > Priority: Major > > Found current estimate E step in OrderedBytes is to search step by step in > loops. We can directly calculate the length to move the point and let the > time complexity become to O(1). > From the comments of the method encodeNumericLarge: > {panel:title=encodeNumericLarge} > Encode the large magnitude floating point number *val* using the key > encoding. The caller guarantees that *val* will be > finite and abs({*}val{*}) >= 1.0. > A floating point value is encoded as an integer exponent *E* and a mantissa > {*}M{*}. The original value is equal to {*}(M * 100^E){*}. *E* is set to the > smallest value possible without making *M* greater than or equal to 1.0. > Each centimal digit of the mantissa is stored in a byte. If the value of the > centimal digit is *X* (hence X>=0 and X<=99) then the byte value will be > 2*X+1 for every byte of the mantissa, except for the last byte which will be > 2*X+0. The mantissa must be the minimum number of bytes necessary to > represent the value; trailing *X==0* digits are omitted. This means that the > mantissa will never contain a byte with the > value 0x00. > The function could be divided into 3 steps: > # Confirm the sign (Negative or not) > # Normalise the abs({*}val{*}), transformed to *(M * 100^E)* > # Encode *M* by peeling off centimal digit and *X*{panel} > At the step 2, we normalise abs(*val*) and determine the exponent *E*. > The current implementation is : > {code:java} > while (abs.compareTo(E32) >= 0 && e <= 350) { abs = > abs.movePointLeft(32); e +=16; } > while (abs.compareTo(E8) >= 0 && e <= 350) { abs = abs.movePointLeft(8); > e+= 4; } > while (abs.compareTo(BigDecimal.ONE) >= 0 && e <= 350) { abs = > abs.movePointLeft(2); e++; } > {code} > Which just move the point of abs(*val*), (which is in form yyyy.zzzz, where y > & z are digits, e.g. 1000.0001) from right to left step by step, until > abs(val) < 1. We can find that e is the half of point moved length. So that > we could save the above three loops and calculate the moved length and e > directly. > My implementation: > {code:java} > // This is the number of digits of the Integer part of abs(val) when val > > 10 > int integerDigits = abs.precision() - abs.scale(); > // If length is odd, from the last loop above, we actually moved one more > step forward. > int lengthToMoveLeft = integerDigits % 2 == 0 ? integerDigits : > integerDigits + 1; > e = lengthToMoveLeft / 2; > // The edge condition. > if (e > 350) { > e = 351; > lengthToMoveLeft = 702; > } > abs = abs.movePointLeft(lengthToMoveLeft); > {code} > In this case, we only move the point once. The time complexity is O(1). > Same idea to the method of encodeNumericSmall. -- This message was sent by Atlassian Jira (v8.20.1#820001)