[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode

2016-12-18 Thread XIE FAN (JIRA)

[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760361#comment-15760361 ]

XIE FAN commented on KYLIN-1832:


I will attach the benchmark test results later.

> HyperLogLog speed is too slow in encode and decode
> --
>
> Key: KYLIN-1832
> URL: https://issues.apache.org/jira/browse/KYLIN-1832
> Project: Kylin
> Issue Type: Improvement
> Components: Metadata
> Affects Versions: v1.3.0, v1.5.2
> Reporter: fengYu
> Assignee: XIE FAN
> Fix For: v1.6.1
>
> Attachments: 
> 0001-KYLIN-1832-HyperLogLog-performance-optimization.patch, 
> HyperLogLogPlusCounter.java
>
>
> We have a cube with more than ten distinct-count measures, stored with hll15, 
> and we found HyperLogLogPlusCounter too slow. Three methods are called 
> frequently: merge/writeRegisters/readRegisters.
> I found that kylin-1.5.x added a parameter 'singleBucket' to store a single 
> bucket, which optimizes the base cuboid.
> However, it slows down the other steps of cuboid building. I have modified 
> the code to speed up these three operations.
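To show why these three methods dominate, here is a minimal sketch of a dense register store. The class name DenseHllRegisters and the one-byte-per-register layout are assumptions for illustration, not Kylin's actual HyperLogLogPlusCounter. The point is only that each call walks all m = 2^p registers (32,768 if hll15 means precision 15), however few of them are non-zero. Even an encoding that writes only the non-zero registers still has to scan every slot to find them, unless their positions are tracked separately.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical dense HLL register store, for illustration only (not Kylin's
 * HyperLogLogPlusCounter). One byte per register, m = 2^precision registers.
 */
final class DenseHllRegisters {
    final byte[] registers;

    DenseHllRegisters(int precision) {
        registers = new byte[1 << precision];
    }

    // merge: element-wise max over all m registers, O(m) on every call.
    void merge(DenseHllRegisters other) {
        if (other.registers.length != registers.length) {
            throw new IllegalArgumentException("precision mismatch");
        }
        for (int i = 0; i < registers.length; i++) {
            if (other.registers[i] > registers[i]) {
                registers[i] = other.registers[i];
            }
        }
    }

    // writeRegisters: emits all m bytes, including the zero registers.
    void writeRegisters(ByteBuffer out) {
        out.put(registers);
    }

    // readRegisters: reads exactly m bytes back into the array.
    void readRegisters(ByteBuffer in) {
        in.get(registers);
    }
}
```

With more than ten distinct-count measures per row, a fully dense encoding at p = 15 would push more than 10 × 32 KiB = 320 KiB of registers per row through these loops.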



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode

2016-12-10 Thread hongbin ma (JIRA)

[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15737544#comment-15737544 ]

hongbin ma commented on KYLIN-1832:
---

Cool, thanks [~xiefan46], looking forward to seeing the benchmark :)



[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode

2016-12-07 Thread XIE FAN (JIRA)

[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15729170#comment-15729170 ]

XIE FAN commented on KYLIN-1832:


Paper link: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/40671.pdf

A Java implementation of this paper: https://github.com/addthis/stream-lib



[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode

2016-12-07 Thread XIE FAN (JIRA)

[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15729040#comment-15729040 ]

XIE FAN commented on KYLIN-1832:


I will try to address this with the method described in Google's paper 
“HyperLogLog in Practice: Algorithmic Engineering of a State of The Art 
Cardinality Estimation Algorithm”, adding a sparse HyperLogLog counter for 
low-cardinality columns to reduce memory usage and time complexity.
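A much-simplified illustration of the sparse idea: keep only the non-zero registers as index-to-value pairs, and switch to a dense array once too many accumulate. This is not the paper's actual sparse encoding, which additionally uses a higher sparse precision and a compressed sorted list; the class SparseHllRegisters and the caller-supplied sparseLimit are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sparse-then-dense register store, loosely following the sparse
 * idea in "HyperLogLog in Practice". Not Kylin's code and not the paper's encoding.
 */
final class SparseHllRegisters {
    private final int m;           // number of registers, 2^precision
    private final int sparseLimit; // illustrative switch-over point
    private TreeMap<Integer, Byte> sparse = new TreeMap<>(); // non-zero registers only
    private byte[] dense;          // null while still sparse

    SparseHllRegisters(int precision, int sparseLimit) {
        this.m = 1 << precision;
        this.sparseLimit = sparseLimit;
    }

    boolean isSparse() {
        return dense == null;
    }

    int get(int index) {
        if (dense != null) {
            return dense[index];
        }
        Byte v = sparse.get(index);
        return v == null ? 0 : v.intValue();
    }

    // HLL keeps the maximum value seen for each register.
    void update(int index, byte value) {
        if (dense != null) {
            if (value > dense[index]) {
                dense[index] = value;
            }
            return;
        }
        sparse.merge(index, value, (a, b) -> (byte) Math.max(a, b));
        if (sparse.size() > sparseLimit) {
            toDense();
        }
    }

    // A sparse `other` is merged by visiting only its stored entries, not all m slots.
    void merge(SparseHllRegisters other) {
        if (other.m != m) {
            throw new IllegalArgumentException("precision mismatch");
        }
        if (other.dense == null) {
            for (Map.Entry<Integer, Byte> e : other.sparse.entrySet()) {
                update(e.getKey(), e.getValue());
            }
        } else {
            for (int i = 0; i < m; i++) {
                if (other.dense[i] != 0) {
                    update(i, other.dense[i]);
                }
            }
        }
    }

    private void toDense() {
        dense = new byte[m];
        for (Map.Entry<Integer, Byte> e : sparse.entrySet()) {
            dense[e.getKey()] = e.getValue();
        }
        sparse = null; // the map is no longer needed
    }
}
```

The switch-over point is the main tuning knob: each TreeMap entry costs far more memory than a one-byte register, so the limit has to stay small relative to m for the sparse form to save memory as well as time.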



[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode

2016-06-28 Thread fengYu (JIRA)

[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352646#comment-15352646 ]

fengYu commented on KYLIN-1832:
---

The maximum size of biggerIndexSet is half of m; I will test it later.
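For scale, a back-of-the-envelope bound per counter, assuming hll15 means precision p = 15 with one byte per register, and that biggerIndexSet stores register indices as its name suggests (the patch itself is not reproduced in this thread):

```latex
\begin{aligned}
m &= 2^{15} = 32768 \text{ registers}, \quad \text{register array} = 32768 \text{ B} = 32 \text{ KiB} \\
|\texttt{biggerIndexSet}| &\le m/2 = 16384 \text{ indices} \\
\text{as a primitive } \texttt{int[]}: &\quad 16384 \times 4 \text{ B} = 65536 \text{ B} = 64 \text{ KiB} \\
\text{as a bitmap over all } m: &\quad 32768 / 8 \text{ B} = 4096 \text{ B} = 4 \text{ KiB}
\end{aligned}
```

A boxed java.util.HashSet<Integer> would cost considerably more than the int[] figure, since each entry carries an Integer object and a hash-table node. A bitmap such as java.util.BitSet caps the overhead at m/8 bytes, but iterating it brings back a scan, of m/64 words rather than m bytes. The chosen data structure therefore largely decides how much the memory concern raised in this thread matters.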




[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode

2016-06-28 Thread liyang (JIRA)

[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352639#comment-15352639 ]

liyang commented on KYLIN-1832:
---

I see the improvement mainly comes from the added {{biggerIndexSet}} and 
{{isOverThreshold}}. However, I'm a bit concerned about the memory footprint 
introduced by {{biggerIndexSet}}.
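A hypothetical sketch of how an index set plus an over-threshold flag can make these operations cheaper. The patch itself is not quoted in this thread, so IndexedHllRegisters, the int[] index list, the m/2 cut-off (matching fengYu's statement that biggerIndexSet holds at most half of m), and the byte layout written by writeRegisters are all assumptions. While fewer than m/2 registers are non-zero, merge and writeRegisters visit only the recorded indices; past that point the counter drops the list and falls back to full-array scans.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/**
 * Hypothetical sketch, not the actual KYLIN-1832 patch: a dense register array
 * plus a list of non-zero indices that is kept only while the counter is small.
 */
final class IndexedHllRegisters {
    final byte[] registers;
    private final int threshold;           // assumed cut-off: m / 2
    private int[] nonZero = new int[16];   // indices of non-zero registers
    private int nonZeroCount = 0;
    private boolean overThreshold = false; // once true, fall back to full scans

    IndexedHllRegisters(int precision) {
        if (precision < 4 || precision > 16) {
            throw new IllegalArgumentException("precision must be in [4, 16]");
        }
        registers = new byte[1 << precision];
        threshold = registers.length / 2;
    }

    boolean isOverThreshold() { return overThreshold; }

    // Raise a register to `value` (HLL keeps the max) and record newly non-zero slots.
    void update(int index, byte value) {
        if (value <= registers[index]) return;
        if (registers[index] == 0 && !overThreshold) {
            if (nonZeroCount == threshold) {
                overThreshold = true; // too many to be worth tracking; drop the list
                nonZero = null;
            } else {
                if (nonZeroCount == nonZero.length) {
                    nonZero = Arrays.copyOf(nonZero, Math.min(nonZero.length * 2, threshold));
                }
                nonZero[nonZeroCount++] = index;
            }
        }
        registers[index] = value;
    }

    // While `other` is under threshold, only its non-zero registers are visited.
    void merge(IndexedHllRegisters other) {
        if (other.registers.length != registers.length) {
            throw new IllegalArgumentException("precision mismatch");
        }
        if (other.overThreshold) {
            for (int i = 0; i < other.registers.length; i++) {
                if (other.registers[i] != 0) update(i, other.registers[i]);
            }
        } else {
            for (int k = 0; k < other.nonZeroCount; k++) {
                int i = other.nonZero[k];
                update(i, other.registers[i]);
            }
        }
    }

    // Scheme 0: count + (2-byte index, 1-byte value) pairs; scheme 1: all m bytes.
    void writeRegisters(ByteBuffer out) {
        if (overThreshold || 3 * nonZeroCount + 4 >= registers.length) {
            out.put((byte) 1);
            out.put(registers);
        } else {
            out.put((byte) 0);
            out.putInt(nonZeroCount);
            for (int k = 0; k < nonZeroCount; k++) {
                int i = nonZero[k];
                out.putShort((short) i); // p <= 16, so the index fits in 16 bits
                out.put(registers[i]);
            }
        }
    }

    // Inverse of writeRegisters, applied to an empty counter of the same precision.
    void readRegisters(ByteBuffer in) {
        if (in.get() == 1) {
            for (int i = 0; i < registers.length; i++) {
                byte v = in.get();
                if (v != 0) update(i, v);
            }
        } else {
            int count = in.getInt();
            for (int k = 0; k < count; k++) {
                int i = in.getShort() & 0xFFFF;
                update(i, in.get());
            }
        }
    }
}
```

Because each pair costs 3 bytes, the break-even point for serialization is roughly m/3 non-zero registers, lower than the m/2 cut-off used for merge, so the sketch picks the encoding by size. For p = 15 the index list itself can still reach 16,384 ints (64 KiB) before the cut-off, which is the kind of footprint in question here.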



[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode

2016-06-28 Thread liyang (JIRA)

[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352603#comment-15352603 ]

liyang commented on KYLIN-1832:
---

Also, it's not obvious why 'singleBucket' would slow down merge/write/read. 
Could you elaborate, or share any test results?



[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode

2016-06-28 Thread fengYu (JIRA)

[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352592#comment-15352592 ]

fengYu commented on KYLIN-1832:
---

I will upload patches for 1.x and 2.x, but replacing the whole file also works 
if you can accept this kind of implementation.



[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode

2016-06-28 Thread liyang (JIRA)

[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352585#comment-15352585 ]

liyang commented on KYLIN-1832:
---

[~feng_xiao_yu], could you provide a patch instead of a whole file?
