Re: creating a distributed index

2015-11-06 Thread swetha kasireddy
Hi Ankur, I have the following questions on IndexedRDD. 1. Does IndexedRDD support String keys? According to the current documentation, it appears to support only Long. 2. Is IndexedRDD efficient when joined with another RDD? Basically, my use case is that I need to create an

Re: creating a distributed index

2015-07-15 Thread Ankur Dave
The latest version of IndexedRDD supports any key type with a defined serializer https://github.com/amplab/spark-indexedrdd/blob/master/src/main/scala/edu/berkeley/cs/amplab/spark/indexedrdd/KeySerializer.scala, including Strings. It's not released yet, but you can use it from the master branch if

Re: creating a distributed index

2015-07-15 Thread Jem Tucker
as key/value pairs. Thanks, Swetha

Re: creating a distributed index

2015-07-15 Thread Jem Tucker
This is very interesting. Do you know if this version will be backwards compatible with older versions of Spark (1.2.0)? Thanks, Jem On Wed, Jul 15, 2015 at 10:04 AM Ankur Dave ankurd...@gmail.com wrote: The latest version of IndexedRDD supports any key type with a defined serializer

Re: creating a distributed index

2015-07-15 Thread Burak Yavuz
this in Spark Streaming to do lookups/updates/deletes in RDDs using keys by storing them as key/value pairs. Thanks, Swetha

Re: creating a distributed index

2015-07-14 Thread swetha

Re: creating a distributed index

2014-08-04 Thread Philip Ogren
After playing around with mapPartition I think this does exactly what I want. I can pass in a function to mapPartition that looks like this: def f1(iter: Iterator[String]): Iterator[MyIndex] = { val idx: MyIndex = new MyIndex() while (iter.hasNext) {
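The snippet above is cut off in the archive, so the following is only a sketch of how such a per-partition function might be completed. `MyIndex` is named in the message but its definition is not shown; the class below, with its `add` and `lookup` methods, is a hypothetical stand-in, and the loop body and return value are assumptions rather than the author's code.

```scala
import scala.collection.mutable

object PartitionIndexSketch {
  // Hypothetical stand-in for the MyIndex class named in the message:
  // records, for each string, the positions at which it was added.
  class MyIndex {
    private val postings = mutable.Map.empty[String, mutable.ArrayBuffer[Int]]
    private var count = 0

    def add(s: String): Unit = {
      postings.getOrElseUpdate(s, mutable.ArrayBuffer.empty[Int]) += count
      count += 1
    }

    // Positions of s within this index, or an empty list if absent.
    def lookup(s: String): Seq[Int] =
      postings.get(s).map(_.toList).getOrElse(Nil)

    def size: Int = count
  }

  // Consumes all elements of one partition and returns an iterator
  // holding exactly one index for that partition.
  def f1(iter: Iterator[String]): Iterator[MyIndex] = {
    val idx = new MyIndex()
    while (iter.hasNext) {
      idx.add(iter.next())
    }
    Iterator(idx)
  }
}
```

With Spark on the classpath, this would presumably be applied as `rdd.mapPartitions(PartitionIndexSketch.f1)` (the RDD method is spelled `mapPartitions`), giving an RDD with one `MyIndex` per partition. A real index class would likely also need to be serializable if the result is cached in serialized form or moved between nodes.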

Re: creating a distributed index

2014-08-01 Thread andy petrella
Hey, There is some work that started on IndexedRDD (on master, I think). Meanwhile, checking what has been done in GraphX regarding the vertex index in partitions could be worthwhile, I guess. Hth, Andy On 1 August 2014 at 22:50, Philip Ogren philip.og...@oracle.com wrote: Suppose I want to take my large

Re: creating a distributed index

2014-08-01 Thread Ankur Dave
At 2014-08-01 14:50:22 -0600, Philip Ogren philip.og...@oracle.com wrote: It seems that I could do this with mapPartition so that each element in a partition gets added to an index for that partition. [...] Would it then be possible to take a string and query each partition's index with it?
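The reply is truncated here, so the following is not Ankur's answer, only a rough illustration of what "query each partition's index" could look like. It uses plain Scala collections in place of an RDD, and the `PartIndex` type, `buildIndex`, and `query` are all hypothetical names introduced for the sketch.

```scala
object QueryPartitionsSketch {
  // Hypothetical per-partition index: each string maps to the offsets
  // at which it occurs within that partition.
  type PartIndex = Map[String, Seq[Int]]

  // Build one index from the elements of a single partition.
  def buildIndex(part: Seq[String]): PartIndex =
    part.zipWithIndex
      .groupBy(_._1)
      .map { case (k, hits) => k -> hits.map(_._2) }

  // Ask every partition's index for the query string and tag each hit
  // with the number of the partition it came from.
  def query(indexes: Seq[PartIndex], q: String): Seq[(Int, Int)] =
    indexes.zipWithIndex.flatMap { case (idx, p) =>
      idx.getOrElse(q, Seq.empty[Int]).map(offset => (p, offset))
    }
}
```

On an actual RDD of per-partition indexes, the same idea would presumably be a `flatMap` over the indexes followed by `collect()`, which visits every partition for each query. That is the cost an index partitioned by key, as in IndexedRDD, is meant to avoid.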