Hi Jonathan, you might be interested in https://issues.apache.org/jira/browse/SPARK-3655 (not yet available) and https://github.com/tresata/spark-sorted (not part of spark, but it is available right now). Hopefully thats what you are looking for. To the best of my knowledge that covers what is available now / what is being worked on.
Imran On Wed, Mar 11, 2015 at 4:38 PM, Jonathan Coveney <jcove...@gmail.com> wrote: > Hello all, > > I am wondering if spark already has support for optimizations on sorted > data and/or if such support could be added (I am comfortable dropping to a > lower level if necessary to implement this, but I'm not sure if it is > possible at all). > > Context: we have a number of data sets which are essentially already > sorted on a key. With our current systems, we can take advantage of this to > do a lot of analysis in a very efficient fashion...merges and joins, for > example, can be done very efficiently, as can folds on a secondary key and > so on. > > I was wondering if spark would be a fit for implementing these sorts of > optimizations? Obviously it is sort of a niche case, but would this be > achievable? Any pointers on where I should look? >