[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode
[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760361#comment-15760361 ]

XIE FAN commented on KYLIN-1832:

I will attach a benchmark test result later.

> HyperLogLog speed is too slow in encode and decode
> --
>
> Key: KYLIN-1832
> URL: https://issues.apache.org/jira/browse/KYLIN-1832
> Project: Kylin
> Issue Type: Improvement
> Components: Metadata
> Affects Versions: v1.3.0, v1.5.2
> Reporter: fengYu
> Assignee: XIE FAN
> Fix For: v1.6.1
>
> Attachments: 0001-KYLIN-1832-HyperLogLog-performance-optimization.patch, HyperLogLogPlusCounter.java
>
>
> We have a cube with more than ten distinct count measures, and we use hll15 to store
> the values. We found that HyperLogLogPlusCounter is too slow; three methods are
> called frequently: merge/writeRegisters/readRegisters.
> I found that kylin-1.5.x adds a parameter 'singleBucket' to store the single
> bucket when only one is in use, which optimizes the base cuboid.
> However, it slows down the other steps of cuboid building. I have modified
> the code to speed up these three operations.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
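To make the cost of the three hot methods concrete, here is a minimal Java sketch of a dense HyperLogLog register array at precision 15 ("hll15", so m = 2^15 = 32768 registers). It assumes one byte per register and a 64-bit hash per input value; the class DenseRegisters and its methods are hypothetical stand-ins that only mirror the roles of merge/writeRegisters/readRegisters, not Kylin's actual HyperLogLogPlusCounter. In this dense form, merge scans all m registers and the serialized size is always m bytes, however few registers are set, which is what gets expensive when a cube has many distinct count measures.

```java
import java.nio.ByteBuffer;

/**
 * Minimal dense HyperLogLog register array, for illustration only.
 * Hypothetical class; it is NOT Kylin's HyperLogLogPlusCounter.
 */
class DenseRegisters {
    private final int p;            // precision, e.g. 15 for "hll15"
    private final byte[] registers; // m = 2^p registers, one byte each (assumption)

    DenseRegisters(int p) {
        this.p = p;
        this.registers = new byte[1 << p];
    }

    /** Top p bits of the hash pick the register; the rank is 1 + leading zeros of the rest. */
    void add(long hash) {
        int index = (int) (hash >>> (64 - p));
        long rest = hash << p;
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - p + 1);
        if (rank > registers[index]) {
            registers[index] = (byte) rank;
        }
    }

    byte registerAt(int index) {
        return registers[index];
    }

    /** Element-wise max: always touches all m registers, even if most are zero. */
    void merge(DenseRegisters other) {
        if (other.registers.length != registers.length) {
            throw new IllegalArgumentException("precision mismatch");
        }
        for (int i = 0; i < registers.length; i++) {
            if (other.registers[i] > registers[i]) {
                registers[i] = other.registers[i];
            }
        }
    }

    /** Dense serialization: always exactly m bytes. */
    void writeRegisters(ByteBuffer out) {
        out.put(registers);
    }

    /** Reads back the m bytes written by writeRegisters. */
    void readRegisters(ByteBuffer in) {
        in.get(registers);
    }

    public static void main(String[] args) {
        DenseRegisters a = new DenseRegisters(15);
        a.add(0x9E3779B97F4A7C15L); // an arbitrary 64-bit hash value
        ByteBuffer buf = ByteBuffer.allocate(1 << 15);
        a.writeRegisters(buf);
        System.out.println(buf.position()); // 32768, although only one register is set
    }
}
```

Running main prints 32768 even though a single register holds a value; that fixed O(m) cost per call is what the optimizations discussed in this thread try to avoid.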
[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode
[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15737544#comment-15737544 ]

hongbin ma commented on KYLIN-1832:

Cool, thanks [~xiefan46]! Looking forward to seeing the benchmark :)
[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode
[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15729170#comment-15729170 ]

XIE FAN commented on KYLIN-1832:

Paper link: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/40671.pdf
A Java implementation of this paper: https://github.com/addthis/stream-lib
[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode
[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15729040#comment-15729040 ]

XIE FAN commented on KYLIN-1832:

I will try to optimize this problem using the method described in Google's paper "HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm", and add a sparse HyperLogLog counter for low-cardinality columns to reduce memory usage and time complexity.
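The sparse-counter plan keeps only the registers that are actually set while cardinality is low, and switches to a full array later. The Java sketch below is a simplified version of that idea under stated assumptions: it stores non-zero registers in a TreeMap until a hypothetical sparseLimit is exceeded, then converts to a dense byte array. It omits the paper's higher-precision sparse encoding and its compression, so it illustrates the memory/time trade-off rather than HyperLogLog++ itself. SparseDenseRegisters, sparseLimit, registerAt and the serialized layout (a flag byte, a 4-byte count, then a 2-byte index and a 1-byte rank per entry) are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.TreeMap;

/**
 * Simplified sparse-then-dense register store, for illustration only.
 * Inspired by the sparse representation in "HyperLogLog in Practice", but it omits the
 * paper's higher-precision sparse encoding and compression. All names are hypothetical.
 */
class SparseDenseRegisters {
    private final int m;           // number of registers, 2^p
    private final int sparseLimit; // hypothetical switch-over point
    private TreeMap<Integer, Byte> sparse = new TreeMap<>(); // non-zero registers only
    private byte[] dense;          // allocated only after switching

    SparseDenseRegisters(int p, int sparseLimit) {
        if (p > 16) throw new IllegalArgumentException("sketch stores indices in 2 bytes");
        this.m = 1 << p;
        this.sparseLimit = sparseLimit;
    }

    boolean isSparse() {
        return dense == null;
    }

    byte registerAt(int index) {
        return isSparse() ? sparse.getOrDefault(index, (byte) 0) : dense[index];
    }

    /** Keeps the max rank per register; converts to dense once the map is too large. */
    void update(int index, byte rank) {
        if (rank <= 0) return;
        if (isSparse()) {
            sparse.merge(index, rank, (a, b) -> (byte) Math.max(a, b));
            if (sparse.size() > sparseLimit) toDense();
        } else if (rank > dense[index]) {
            dense[index] = rank;
        }
    }

    private void toDense() {
        dense = new byte[m];
        for (Map.Entry<Integer, Byte> e : sparse.entrySet()) {
            dense[e.getKey()] = e.getValue();
        }
        sparse = null; // the map is no longer needed
    }

    /** While the other side is sparse, merging costs O(k) for its k non-zero registers. */
    void merge(SparseDenseRegisters other) {
        if (other.m != m) throw new IllegalArgumentException("precision mismatch");
        if (other.isSparse()) {
            for (Map.Entry<Integer, Byte> e : other.sparse.entrySet()) {
                update(e.getKey(), e.getValue());
            }
        } else {
            for (int i = 0; i < m; i++) update(i, other.dense[i]);
        }
    }

    /** Sparse: flag, count, then (2-byte index, 1-byte rank) pairs. Dense: flag, then m bytes. */
    void writeRegisters(ByteBuffer out) {
        out.put((byte) (isSparse() ? 1 : 0));
        if (isSparse()) {
            out.putInt(sparse.size());
            for (Map.Entry<Integer, Byte> e : sparse.entrySet()) {
                out.putShort(e.getKey().shortValue()); // read back with getShort() & 0xFFFF
                out.put(e.getValue());
            }
        } else {
            out.put(dense);
        }
    }

    public static void main(String[] args) {
        SparseDenseRegisters c = new SparseDenseRegisters(15, 1024);
        c.update(7, (byte) 3);
        c.update(7, (byte) 5); // same register: keeps the max
        c.update(100, (byte) 2);
        ByteBuffer buf = ByteBuffer.allocate(1 + (1 << 15));
        c.writeRegisters(buf);
        System.out.println(c.isSparse() + " " + buf.position()); // true 11
    }
}
```

For a low-cardinality column k stays small, so merge and writeRegisters cost O(k) rather than O(m); where to put sparseLimit is a tuning choice this sketch does not answer.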
[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode
[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352646#comment-15352646 ]

fengYu commented on KYLIN-1832:

The max size of biggerIndexSet is half of m (the number of registers), and I will test it later.
[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode
[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352639#comment-15352639 ]

liyang commented on KYLIN-1832:

I see the improvement mainly comes from the added {{biggerIndexSet}} and {{isOverThreshold}}. However, I'm a bit concerned about the memory footprint introduced by {{biggerIndexSet}}.
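The mechanism discussed around biggerIndexSet can be sketched as follows. This is a hypothetical reconstruction in Java, not the attached patch: IndexedRegisters, registerAt and overThreshold() are invented for illustration, and only the names biggerIndexSet and isOverThreshold and the m / 2 cap come from the comments. Every update that turns a zero register non-zero records its index, so merge can visit only those k indices instead of all m; once more than m / 2 registers are non-zero, the set is dropped and merge falls back to a full scan. The sketch also makes the memory concern visible: the set lives alongside the full m-byte array, so for hll15 it may hold up to 16384 boxed Integer entries in addition to the 32768 register bytes.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical reconstruction: a dense register array plus the set of non-zero indices,
 * kept only while it has at most m / 2 entries. Not the KYLIN-1832 patch.
 */
class IndexedRegisters {
    private final byte[] registers;                         // always allocated: m bytes
    private Set<Integer> biggerIndexSet = new HashSet<>();  // extra memory on top of the array
    private boolean isOverThreshold = false;

    IndexedRegisters(int p) {
        registers = new byte[1 << p];
    }

    boolean overThreshold() {
        return isOverThreshold;
    }

    byte registerAt(int index) {
        return registers[index];
    }

    void update(int index, byte rank) {
        if (rank <= registers[index]) return; // no change
        boolean newlyNonZero = registers[index] == 0;
        registers[index] = rank;
        if (isOverThreshold || !newlyNonZero) return;
        if (biggerIndexSet.size() < registers.length / 2) {
            biggerIndexSet.add(index); // the set never exceeds m / 2 entries
        } else {
            isOverThreshold = true;    // too many non-zero registers: fall back to full scans
            biggerIndexSet = null;     // and release the set's memory
        }
    }

    /** O(k) over the other side's tracked indices while under the threshold, O(m) afterwards. */
    void merge(IndexedRegisters other) {
        if (other.registers.length != registers.length) {
            throw new IllegalArgumentException("precision mismatch");
        }
        if (!other.isOverThreshold) {
            for (int i : other.biggerIndexSet) update(i, other.registers[i]);
        } else {
            for (int i = 0; i < registers.length; i++) update(i, other.registers[i]);
        }
    }

    public static void main(String[] args) {
        IndexedRegisters a = new IndexedRegisters(4); // m = 16, set capped at 8
        for (int i = 0; i < 8; i++) a.update(i, (byte) 1);
        System.out.println(a.overThreshold()); // false
        a.update(8, (byte) 1);
        System.out.println(a.overThreshold()); // true
    }
}
```

A boxed HashSet keeps the sketch short but is the most memory-hungry choice; a primitive int array or a bitmap of m bits would reduce that overhead.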
[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode
[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352603#comment-15352603 ]

liyang commented on KYLIN-1832:

Also, it's not obvious why 'singleBucket' would slow down merge/write/read. Could you elaborate, or share any test results?
[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode
[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352592#comment-15352592 ]

fengYu commented on KYLIN-1832:

I will upload patches for 1.x and 2.x, but replacing the whole file is also OK if you can accept this kind of implementation.
[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode
[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352585#comment-15352585 ]

liyang commented on KYLIN-1832:

[~feng_xiao_yu], could you provide a patch instead of a whole file?