RJ Nowling created SPARK-4727:
---------------------------------

             Summary: Add "dimensional" RDDs (time series, spatial)
                 Key: SPARK-4727
                 URL: https://issues.apache.org/jira/browse/SPARK-4727
             Project: Spark
          Issue Type: Brainstorming
          Components: Spark Core
    Affects Versions: 1.1.0
            Reporter: RJ Nowling


Certain types of data (time series, spatial) can benefit from specialized 
RDDs. I'd like to open a discussion about this.

For example, time series data should be ordered by time and would benefit from 
operations like:
* Subsampling (taking every n data points)
* Signal processing (correlations, FFTs, filtering)
* Windowing functions
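
As a rough illustration of two of the operations above (subsampling and 
windowing), here is a minimal sketch in plain Python. It is not Spark code and 
not a proposed API: the names subsample, sliding_windows and moving_average, 
and the (timestamp, value) sample data, are hypothetical and only show what 
the operations would compute once the data is ordered by time.

```python
from collections import deque
from itertools import islice


def subsample(points, n):
    """Keep every n-th point of a time-ordered sequence, starting with the first."""
    return list(islice(points, 0, None, n))


def sliding_windows(points, size):
    """Yield each run of `size` consecutive points as a tuple."""
    window = deque(maxlen=size)
    for p in points:
        window.append(p)
        if len(window) == size:
            yield tuple(window)


def moving_average(values, size):
    """A simple windowing function: the mean of each sliding window."""
    return [sum(w) / size for w in sliding_windows(values, size)]


# Hypothetical (timestamp, value) samples, deliberately out of order.
samples = [(3, 4.0), (0, 1.0), (2, 3.0), (1, 2.0), (4, 5.0), (5, 6.0)]

# Both operations only make sense once the data is ordered by time.
ordered = sorted(samples)
values = [v for _, v in ordered]

every_other = subsample(ordered, 2)
smoothed = moving_average(values, 3)
```

In a distributed setting, the windowing step is the harder one, since windows 
near a partition boundary need points from the neighboring partition.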

Spatial data benefits from ordering and partitioning along a 2D or 3D grid. 
For example, path finding algorithms can be optimized by only comparing points 
within a set distance, which can be computed more efficiently by partitioning 
data into a grid.
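
To make the grid idea concrete, here is a minimal 2D sketch in plain Python. 
Again, this is not Spark code: build_grid, neighbors_within and the sample 
points are hypothetical names and data. It assumes the cell size is at least 
as large as the search radius, so that every point within the radius lies in 
the same cell or one of the eight adjacent cells.

```python
from collections import defaultdict
from itertools import product
from math import floor, hypot


def cell_of(point, cell_size):
    """Map a 2D point to the integer coordinates of its grid cell."""
    return (floor(point[0] / cell_size), floor(point[1] / cell_size))


def build_grid(points, cell_size):
    """Partition points into square cells of side `cell_size`."""
    grid = defaultdict(list)
    for p in points:
        grid[cell_of(p, cell_size)].append(p)
    return grid


def neighbors_within(grid, point, radius, cell_size):
    """Find points within `radius` of `point`, checking only nearby cells.

    Assumes cell_size >= radius; otherwise the 3x3 block of cells around
    `point` would not cover every candidate.
    """
    cx, cy = cell_of(point, cell_size)
    found = []
    for dx, dy in product((-1, 0, 1), repeat=2):
        for other in grid.get((cx + dx, cy + dy), ()):
            dist = hypot(other[0] - point[0], other[1] - point[1])
            if other != point and dist <= radius:
                found.append(other)
    return sorted(found)


# Hypothetical sample points.
points = [(0.5, 0.5), (1.2, 0.4), (0.9, 1.1), (5.0, 5.0), (2.6, 0.5)]
grid = build_grid(points, cell_size=1.0)
near = neighbors_within(grid, (0.5, 0.5), radius=1.0, cell_size=1.0)
```

Only the points in the surrounding cells are compared, rather than every 
pair of points in the data set.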

Although the operations on time series and spatial data may differ, both kinds 
of data have ordered dimensions, so the implementations may overlap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org