[jira] [Comment Edited] (SPARK-8997) Improve LocalPrefixSpan performance

2015-07-12 Thread Feynman Liang (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623292#comment-14623292 ]

Feynman Liang edited comment on SPARK-8997 at 7/12/15 11:43 PM:


Why PrimitiveKeyOpenHashMap if keys will be Array[Int] (and later 
Array[Array[Item]]), which are not primitive and will not benefit from 
@specialized annotations?
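
To illustrate the concern, here is a minimal sketch using only the Scala standard library (it is not the LocalPrefixSpan code, and the names ArrayKeyDemo and countPrefixes are hypothetical). @specialized only generates variants for primitive type parameters such as Int or Long, so an Array[Int] key is stored as an ordinary object reference. On top of that, arrays compare by reference, so they would need a structural wrapper (here a Vector) before they can serve as hash keys at all.

```scala
import scala.collection.mutable

object ArrayKeyDemo {
  // Hypothetical helper: count how often each prefix occurs.
  // Keys are converted to Vector so that equals/hashCode are structural.
  def countPrefixes(prefixes: Seq[Array[Int]]): Map[Seq[Int], Long] = {
    val counts = mutable.HashMap.empty[Seq[Int], Long]
    prefixes.foreach { p =>
      val key: Seq[Int] = p.toVector
      counts(key) = counts.getOrElse(key, 0L) + 1L
    }
    counts.toMap
  }

  def main(args: Array[String]): Unit = {
    val a = Array(1, 2)
    val b = Array(1, 2)
    println(a == b)             // reference comparison on arrays
    println(a.sameElements(b))  // element-wise comparison
    println(countPrefixes(Seq(a, b, Array(3))))
  }
}
```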

I'm also not clear on what is meant by item 3: aren't List and Array both 
eager? Did you mean to use a Stream (lazy) or an ArrayBuffer (in-place update)? 
Which part of the code exactly are you referring to?
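
For reference, the eager/lazy distinction can be sketched as follows. This is an illustration only, with hypothetical names (EagerVsLazyDemo and its two methods); it uses Iterator as the lazy collection because it behaves the same across Scala versions, and it assumes n >= 1. A List evaluates every mapped element up front, whereas an Iterator only evaluates what the consumer pulls.

```scala
object EagerVsLazyDemo {
  // Maps over a List and reads only the first result.
  // Returns how many elements were actually computed.
  def evaluatedEagerly(n: Int): Int = {
    var evaluated = 0
    val results = List.range(0, n).map { x => evaluated += 1; x * 2 }
    val first = results.head
    require(first == 0)
    evaluated
  }

  // Same computation over an Iterator: map is deferred until next() is called.
  def evaluatedLazily(n: Int): Int = {
    var evaluated = 0
    val results = Iterator.range(0, n).map { x => evaluated += 1; x * 2 }
    val first = results.next()
    require(first == 0)
    evaluated
  }

  def main(args: Array[String]): Unit = {
    println(s"List evaluated ${evaluatedEagerly(5)} elements")
    println(s"Iterator evaluated ${evaluatedLazily(5)} elements")
  }
}
```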


was (Author: fliang):
Why PrimitiveKeyOpenHashMap if keys will be Array[Int] (and later 
Array[Array[Item]]), which are not primitive and will not benefit from 
@specialized annotations?

I'm also not clear on what is meant by 3; aren't list and array both eager (did 
you mean to use a Stream)? Which part of the code exactly are you referring to?

 Improve LocalPrefixSpan performance
 ---

                Key: SPARK-8997
                URL: https://issues.apache.org/jira/browse/SPARK-8997
            Project: Spark
         Issue Type: Improvement
         Components: MLlib
   Affects Versions: 1.5.0
           Reporter: Xiangrui Meng
           Assignee: Feynman Liang
  Original Estimate: 24h
 Remaining Estimate: 24h

 We can improve the performance as follows:
 1. run should output an Iterator instead of an Array.
 2. Local count shouldn't use groupBy, which creates too many arrays. We can 
 use PrimitiveKeyOpenHashMap instead.
 3. We can use a list to avoid materializing frequent sequences.
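
 A rough sketch of points 1 and 2, under stated assumptions: this is not the LocalPrefixSpan implementation, the names LocalCountDemo, countWithGroupBy, countWithMap and frequentItems are hypothetical, and scala.collection.mutable.HashMap stands in for PrimitiveKeyOpenHashMap (which, as far as I know, is internal to Spark; unlike it, the standard HashMap still boxes Int keys). The idea is to count in a single pass instead of building one array per distinct item, and to hand results back as an Iterator.

```scala
import scala.collection.mutable

object LocalCountDemo {
  // groupBy-based count: builds one intermediate array per distinct item.
  def countWithGroupBy(items: Array[Int]): Map[Int, Long] =
    items.groupBy(identity).map { case (k, v) => (k, v.length.toLong) }

  // Single pass over the input, updating counts in a mutable hash map.
  def countWithMap(items: Array[Int]): Map[Int, Long] = {
    val counts = mutable.HashMap.empty[Int, Long]
    var i = 0
    while (i < items.length) {
      val item = items(i)
      counts(item) = counts.getOrElse(item, 0L) + 1L
      i += 1
    }
    counts.toMap
  }

  // Returns the frequent items as an Iterator rather than an Array, so the
  // caller decides whether and when to materialize the filtered results.
  def frequentItems(items: Array[Int], minCount: Long): Iterator[(Int, Long)] =
    countWithMap(items).iterator.filter { case (_, c) => c >= minCount }

  def main(args: Array[String]): Unit = {
    val items = Array(1, 2, 1, 3, 1, 2)
    println(countWithGroupBy(items) == countWithMap(items))
    frequentItems(items, 2L).foreach(println)
  }
}
```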



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-8997) Improve LocalPrefixSpan performance

2015-07-11 Thread Feynman Liang (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623292#comment-14623292 ]

Feynman Liang edited comment on SPARK-8997 at 7/11/15 7:40 AM:
---

Why PrimitiveKeyOpenHashMap if keys will be Array[Int] (and later 
Array[Array[Item]]), which are not primitive and will not benefit from 
@specialized annotations?

I'm also not clear on what is meant by 3; aren't list and array both eager (did 
you mean to use a Stream)? Which part of the code exactly are you referring to?


was (Author: fliang):
Why PrimitiveKeyOpenHashMap if keys will be Array[Int] (and later 
Array[Array[Item]]), which are not primitive and will not benefit from 
@specialized annotations?



