[jira] [Commented] (SPARK-7316) Add step capability to RDD sliding window
[ https://issues.apache.org/jira/browse/SPARK-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652407#comment-14652407 ]

Joseph K. Bradley commented on SPARK-7316:

Retargeting for 1.6

Add step capability to RDD sliding window

Key: SPARK-7316
URL: https://issues.apache.org/jira/browse/SPARK-7316
Project: Spark
Issue Type: Improvement
Components: MLlib
Affects Versions: 1.3.0
Reporter: Alexander Ulanov
Assignee: Alexander Ulanov
Original Estimate: 24h
Remaining Estimate: 24h

RDDFunctions in MLlib contains a sliding window implementation with a fixed step of 1. Users should be able to define the step. This capability should be implemented. Although one can generate sliding windows with step 1 and then keep only every Nth window, this can take much more time and disk space, depending on the step size. For example, if the window size is 1000, step 1 generates about a thousand times as much data as the initial dataset. That makes no sense if only every Nth window is needed: with a step of N, the generated data would be only about 1000/N times the size of the initial dataset.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
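The trade-off described in the issue can be illustrated with a minimal sketch on plain Python lists. This is not Spark's RDDFunctions API: the function name `sliding` and its `step` parameter are hypothetical and used only for illustration. The sketch compares generating every window and then filtering against stepping directly.

```python
def sliding(items, window, step=1):
    """Yield full windows of length `window`, advancing `step` elements each time.

    Hypothetical helper for illustration only; not the MLlib implementation.
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    items = list(items)
    # Only full windows are emitted; a trailing partial window is dropped.
    for start in range(0, len(items) - window + 1, step):
        yield items[start:start + window]


data = list(range(10))

# Approach 1: materialize all step-1 windows, then keep every 3rd one.
all_windows = list(sliding(data, 4))
step1_then_filter = [w for i, w in enumerate(all_windows) if i % 3 == 0]

# Approach 2: step directly, so the discarded windows are never built.
direct = list(sliding(data, 4, step=3))
```

Both approaches yield the same windows, but the first one builds every step-1 window before discarding most of them, which is the overhead the issue aims to avoid.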
[jira] [Commented] (SPARK-7316) Add step capability to RDD sliding window
[ https://issues.apache.org/jira/browse/SPARK-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531296#comment-14531296 ]

Joseph K. Bradley commented on SPARK-7316:

Definitely makes sense for time series data

Add step capability to RDD sliding window

Key: SPARK-7316
URL: https://issues.apache.org/jira/browse/SPARK-7316
Project: Spark
Issue Type: Improvement
Components: MLlib
Affects Versions: 1.3.0
Reporter: Alexander Ulanov
Fix For: 1.4.0
Original Estimate: 24h
Remaining Estimate: 24h
[jira] [Commented] (SPARK-7316) Add step capability to RDD sliding window
[ https://issues.apache.org/jira/browse/SPARK-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531151#comment-14531151 ]

Joseph K. Bradley commented on SPARK-7316:

I've spoken with [~mengxr], and this feature may need to slip to 1.5. Sorry! Also, what are major use cases for this?
[jira] [Commented] (SPARK-7316) Add step capability to RDD sliding window
[ https://issues.apache.org/jira/browse/SPARK-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531229#comment-14531229 ]

Alexander Ulanov commented on SPARK-7316:

I would say that the major use case is practical considerations :) In my case it is time series analysis of sensor data. It does not make sense to analyze time windows with step 1 because it is a high-frequency sensor (1024 Hz). Also, even if we wanted to do it, the size of the resulting data gets enormous. For example, I have 2B data points (542 hours), which is 23 GB of binary data. If I apply a sliding window of size 1024 with step 1, it will result in 1024 * 23 GB = 23.5 TB of data, which I am not able to process with Spark currently (honestly speaking, my disk space is only 10 TB). If you store the data in HDFS, then it will be tripled, i.e. about 70 TB.
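The size estimate in the comment above can be checked with a short sketch. The helper name `windowed_size_gb` is hypothetical, and the model is an approximation that ignores boundary effects: each element is assumed to be copied into roughly window/step windows. The replication factor of 3 reflects the "tripled" figure from the comment.

```python
def windowed_size_gb(input_gb, window, step):
    """Approximate output size in GB of a sliding window over the input.

    Assumes each element lands in about window/step windows and ignores
    boundary effects. Hypothetical helper, for illustration only.
    """
    return input_gb * window / step


INPUT_GB = 23          # raw sensor data size from the comment
WINDOW = 1024          # one second of samples at 1024 Hz
HDFS_REPLICATION = 3   # the "tripled" storage mentioned in the comment

step1_gb = windowed_size_gb(INPUT_GB, WINDOW, step=1)
step1_on_hdfs_gb = step1_gb * HDFS_REPLICATION
non_overlapping_gb = windowed_size_gb(INPUT_GB, WINDOW, step=WINDOW)

print(f"step 1:          {step1_gb / 1000:.1f} TB")
print(f"step 1 on HDFS:  {step1_on_hdfs_gb / 1000:.1f} TB")
print(f"step = window:   {non_overlapping_gb:.1f} GB")
```

Under these assumptions, step 1 gives 23552 GB (about 23.5 TB) before replication, while a step equal to the window size leaves the data at its original 23 GB.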
[jira] [Commented] (SPARK-7316) Add step capability to RDD sliding window
[ https://issues.apache.org/jira/browse/SPARK-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524309#comment-14524309 ]

Apache Spark commented on SPARK-7316:

User 'avulanov' has created a pull request for this issue: https://github.com/apache/spark/pull/5855