[ 
https://issues.apache.org/jira/browse/SPARK-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Ulanov updated SPARK-7316:
------------------------------------
    Description: 
RDDFunctions in MLlib contains a sliding window implementation with a fixed step of 1. Users 
should be able to define the step. This capability should be implemented.

Although one can generate sliding windows with step 1 and then keep every Nth 
window, that approach can take much more time and disk space depending on the step size. 
For example, with a window of 1000, the step-1 approach generates roughly a 
thousand times more data than the initial dataset. That is wasteful if only 
every Nth window is needed: generating windows directly with step N would 
produce only 1000/N times the initial dataset instead. 
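As a rough sketch of the requested semantics (plain Scala only, no Spark; `slidingWithStep` is a hypothetical name, not the actual RDDFunctions API): Scala's own collections already accept a step parameter in `Iterator.sliding(size, step)`, which the RDD version could mirror.

```scala
// Minimal sketch of sliding windows with a configurable step, assuming the
// semantics described above. Not the actual MLlib implementation.
object SlidingStepSketch {
  def slidingWithStep[T](data: Seq[T], window: Int, step: Int): Seq[Seq[T]] =
    data.iterator
      .sliding(window, step)     // advance by `step` instead of 1
      .withPartial(false)        // drop a trailing incomplete window
      .map(_.toSeq)
      .toSeq

  def main(args: Array[String]): Unit = {
    val xs = 1 to 10
    // With step 3, only the windows starting at indices 0, 3, 6 are produced,
    // instead of all 8 step-1 windows.
    println(slidingWithStep(xs, 3, 3))
    // List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 9))
  }
}
```

This illustrates the space argument above: step 1 materializes every window, while step N materializes only one window in N.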



  was:RDDFunctions in MLlib contains sliding window implementation with step 1. 
User should be able to define step. This capability should be implemented.


> Add step capability to RDD sliding window
> -----------------------------------------
>
>                 Key: SPARK-7316
>                 URL: https://issues.apache.org/jira/browse/SPARK-7316
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Alexander Ulanov
>             Fix For: 1.4.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> RDDFunctions in MLlib contains a sliding window implementation with a fixed step of 1. 
> Users should be able to define the step. This capability should be implemented.
> Although one can generate sliding windows with step 1 and then keep every 
> Nth window, that approach can take much more time and disk space depending on the step 
> size. For example, with a window of 1000, the step-1 approach generates roughly 
> a thousand times more data than the initial dataset. That is wasteful if only 
> every Nth window is needed: generating windows directly with step N would produce 
> only 1000/N times the initial dataset instead. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
