[jira] [Commented] (SPARK-7316) Add step capability to RDD sliding window

2015-08-03 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652407#comment-14652407
 ] 

Joseph K. Bradley commented on SPARK-7316:
--

Retargeting for 1.6

 Add step capability to RDD sliding window
 -

 Key: SPARK-7316
 URL: https://issues.apache.org/jira/browse/SPARK-7316
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.3.0
Reporter: Alexander Ulanov
Assignee: Alexander Ulanov
   Original Estimate: 24h
  Remaining Estimate: 24h

 RDDFunctions in MLlib contains a sliding window implementation with a fixed 
 step of 1. Users should be able to define the step, and this capability 
 should be implemented.
 Although one can generate sliding windows with step 1 and then keep only 
 every Nth window, this can take much more time and disk space, depending on 
 the step size. For example, with a window of 1000, step 1 generates about a 
 thousand times more data than the initial dataset. That makes no sense if 
 only every Nth window is needed: with step N, the generated data is N times 
 smaller, i.e. about 1000/N times the size of the initial dataset. 
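
 The requested behaviour can be sketched in plain Python. This is only an 
 illustration, not the MLlib API: the function name `sliding` and its `step` 
 parameter are hypothetical stand-ins, and the data is held in a local list 
 rather than an RDD. It shows that stepping directly yields the same windows 
 as generating every window and then keeping every Nth one, without 
 materializing the discarded windows.

```python
def sliding(items, window_size, step=1):
    """Yield full windows of `window_size` items, advancing `step` items each time.

    Hypothetical helper for illustration; trailing partial windows are dropped.
    """
    if window_size < 1 or step < 1:
        raise ValueError("window_size and step must be positive")
    items = list(items)
    # The last valid start index is len(items) - window_size.
    for start in range(0, len(items) - window_size + 1, step):
        yield items[start:start + window_size]


data = list(range(10))

# Step 1: every window is generated (7 windows of 4 elements each).
all_windows = list(sliding(data, 4))

# Workaround from the issue text: generate everything, keep every 3rd window.
filtered = all_windows[::3]

# Requested feature: generate only every 3rd window in the first place.
stepped = list(sliding(data, 4, step=3))
```

 Here `stepped` and `filtered` hold the same three windows, but the stepped 
 version never builds the four windows that the workaround throws away.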



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7316) Add step capability to RDD sliding window

2015-05-06 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531296#comment-14531296
 ] 

Joseph K. Bradley commented on SPARK-7316:
--

This definitely makes sense for time series data.

 Add step capability to RDD sliding window
 -

 Key: SPARK-7316
 URL: https://issues.apache.org/jira/browse/SPARK-7316
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.3.0
Reporter: Alexander Ulanov
 Fix For: 1.4.0

   Original Estimate: 24h
  Remaining Estimate: 24h







[jira] [Commented] (SPARK-7316) Add step capability to RDD sliding window

2015-05-06 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531151#comment-14531151
 ] 

Joseph K. Bradley commented on SPARK-7316:
--

I've spoken with [~mengxr], and this feature may need to slip to 1.5. Sorry!
Also, what are the major use cases for this?







[jira] [Commented] (SPARK-7316) Add step capability to RDD sliding window

2015-05-06 Thread Alexander Ulanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531229#comment-14531229
 ] 

Alexander Ulanov commented on SPARK-7316:
-

I would say that the major use case is practical considerations :)

In my case it is time series analysis of sensor data. It does not make sense to 
analyze time windows with step 1 because it is a high-frequency sensor (1024 Hz). 
Also, even if we wanted to do it, the size of the resulting data gets enormous. 
For example, I have 2B data points (542 hours), which is 23GB of binary data. If I 
apply a sliding window with size 1024 and step 1, it will result in 
1024*23GB=23.5TB of data, which I am not able to process with Spark currently 
(honestly speaking, my disk space is only 10TB). If you store the data in HDFS, then 
it will be tripled, i.e. about 70TB. 
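
The arithmetic above can be checked with a short sketch. It rests on a few 
assumptions that are not stated in the comment: every input element ends up 
in roughly window_size/step windows (edge effects ignored), 1 TB is counted 
as 1000 GB, and "tripled" refers to an HDFS replication factor of 3. The 
helper name `windowed_size_gb` is made up for this example.

```python
def windowed_size_gb(input_gb, window_size, step=1):
    """Approximate output size of a sliding window over `input_gb` of data.

    Assumption: each element is copied into about window_size / step windows,
    so the output is the input size scaled by that factor.
    """
    return input_gb * window_size / step


INPUT_GB = 23          # size of the raw sensor data, from the comment
WINDOW = 1024          # window size, from the comment
HDFS_REPLICATION = 3   # assumed replication factor behind "tripled"

step_1_gb = windowed_size_gb(INPUT_GB, WINDOW)             # step 1
step_1_hdfs_gb = step_1_gb * HDFS_REPLICATION               # stored in HDFS
non_overlapping_gb = windowed_size_gb(INPUT_GB, WINDOW, step=WINDOW)

print(f"step 1:          {step_1_gb / 1000:.2f} TB")
print(f"step 1 in HDFS:  {step_1_hdfs_gb / 1000:.2f} TB")
print(f"step = window:   {non_overlapping_gb:.0f} GB")
```

With a step equal to the window size, the output stays at the size of the 
input, which is why a configurable step matters for data of this scale.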








[jira] [Commented] (SPARK-7316) Add step capability to RDD sliding window

2015-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524309#comment-14524309
 ] 

Apache Spark commented on SPARK-7316:
-

User 'avulanov' has created a pull request for this issue:
https://github.com/apache/spark/pull/5855



