[ https://issues.apache.org/jira/browse/SPARK-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531296#comment-14531296 ]
Joseph K. Bradley commented on SPARK-7316:
------------------------------------------

Definitely makes sense for time series data.

> Add step capability to RDD sliding window
> -----------------------------------------
>
>                 Key: SPARK-7316
>                 URL: https://issues.apache.org/jira/browse/SPARK-7316
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Alexander Ulanov
>             Fix For: 1.4.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> RDDFunctions in MLlib contains a sliding window implementation with a fixed
> step of 1. Users should be able to define the step, and this capability
> should be implemented.
> Although one can generate sliding windows with step 1 and then keep only
> every Nth window, that can take much more time and disk space, depending on
> the step size. For example, with a window size of 1000, step 1 generates
> roughly a thousand times more data than the initial dataset. That is wasteful
> if you need only every Nth window: with a step of N, the generated data would
> be about N times smaller (roughly 1000/N times the initial dataset).
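The trade-off described in the issue can be illustrated with a minimal sketch on a plain Python list. This is not Spark code and not the MLlib implementation: the `sliding` helper, its parameter names, and the choice to drop a trailing partial window are all assumptions made for illustration. It only shows that stepping directly yields the same windows as generating every window and then keeping every Nth one, while materializing far fewer of them.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def sliding(data: Sequence[T], window_size: int, step: int = 1) -> Iterator[List[T]]:
    """Yield full windows of `window_size` items, advancing `step` items each time.

    Hypothetical helper for illustration only; trailing partial windows are
    dropped (an assumption, not a statement about Spark's behavior).
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    # The last valid start index is len(data) - window_size.
    for start in range(0, len(data) - window_size + 1, step):
        yield list(data[start:start + window_size])


data = list(range(10))
window_size, step = 4, 3

# Approach 1: generate every window (step 1), then keep every `step`-th one.
all_windows = list(sliding(data, window_size))
filtered = all_windows[::step]

# Approach 2: advance by `step` directly, never building the skipped windows.
direct = list(sliding(data, window_size, step))

# Both approaches select the same windows; only the intermediate cost differs.
same_result = filtered == direct
```

In this toy case the first approach builds `len(all_windows)` windows to keep `len(direct)` of them; the gap widens as the step grows, which is the cost the issue is pointing at.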