Hi, I saw a replay of a talk about what's coming in Spark 2.0 and the performance improvements…
I am curious about indexing of data sets. In HBase/MapRDB you can create ordered sets of indexes through inverted tables. You can then take the intersection of the indexes to find the result set of rows (or intersect/null if you have left outer joins…).

AFAIK, there was a project on an IndexedRDD, but I'm not sure how far that has gone.

I realize that some of the improvements are based on using hashed joins, which would make indexing a bit harder… or am I missing something?

Thx
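To make the index intersection concrete, here is a rough sketch of what I mean, in plain Python rather than HBase or Spark. The sample rows, the column names, and the helper build_inverted_index are all made up for illustration; this is not an existing API in either system.

```python
from collections import defaultdict

def build_inverted_index(rows, column):
    """Map each value of `column` to the set of row keys that hold it."""
    index = defaultdict(set)
    for row_key, row in rows.items():
        index[row[column]].add(row_key)
    return index

# Hypothetical base table: row key -> column values.
rows = {
    "r1": {"city": "Chicago", "status": "active"},
    "r2": {"city": "Chicago", "status": "closed"},
    "r3": {"city": "Boston", "status": "active"},
}

# One inverted table per indexed column.
city_index = build_inverted_index(rows, "city")
status_index = build_inverted_index(rows, "status")

# Intersect the two index lookups to get the candidate row keys,
# then fetch only those rows from the base table.
matches = city_index["Chicago"] & status_index["active"]
result = [rows[key] for key in sorted(matches)]
```

The point is that only the row keys in the intersection are ever read from the base table, so the filter never scans the full data set.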