Re: How do you get the partitioner for an RDD in Java?
Thanks Imran. That's exactly what I needed to know.

Darin.

From: Imran Rashid iras...@cloudera.com
To: Darin McBeath ddmcbe...@yahoo.com
Cc: User user@spark.apache.org
Sent: Tuesday, February 17, 2015 8:35 PM
Subject: Re: How do you get the partitioner for an RDD in Java?

A JavaRDD is just a wrapper around a normal RDD defined in Scala, which is stored in the rdd field. You can access everything that way. The JavaRDD wrappers just provide some interfaces that are a bit easier to work with in Java.

If this isn't convincing yet, here's me demonstrating it inside the spark-shell (yes, it's Scala, but I'm using the Java API):

    scala> val jsc = new JavaSparkContext(sc)
    jsc: org.apache.spark.api.java.JavaSparkContext = org.apache.spark.api.java.JavaSparkContext@7d365529

    scala> val data = jsc.parallelize(java.util.Arrays.asList(Array("a", "b", "c")))
    data: org.apache.spark.api.java.JavaRDD[Array[String]] = ParallelCollectionRDD[0] at parallelize at <console>:15

    scala> data.rdd.partitioner
    res0: Option[org.apache.spark.Partitioner] = None

On Tue, Feb 17, 2015 at 3:44 PM, Darin McBeath ddmcbe...@yahoo.com.invalid wrote:

> In an 'early release' of the Learning Spark book, there is the following reference:
>
> "In Scala and Java, you can determine how an RDD is partitioned using its partitioner property (or partitioner() method in Java)"
>
> However, I don't see the mentioned 'partitioner()' method in Spark 1.2 or a way of getting this information.
>
> I'm curious if anyone has any suggestions for how I might go about finding how an RDD is partitioned in a Java program.
>
> Thanks.
>
> Darin.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: How do you get the partitioner for an RDD in Java?
A JavaRDD is just a wrapper around a normal RDD defined in Scala, which is stored in the rdd field. You can access everything that way. The JavaRDD wrappers just provide some interfaces that are a bit easier to work with in Java.

If this isn't convincing yet, here's me demonstrating it inside the spark-shell (yes, it's Scala, but I'm using the Java API):

    scala> val jsc = new JavaSparkContext(sc)
    jsc: org.apache.spark.api.java.JavaSparkContext = org.apache.spark.api.java.JavaSparkContext@7d365529

    scala> val data = jsc.parallelize(java.util.Arrays.asList(Array("a", "b", "c")))
    data: org.apache.spark.api.java.JavaRDD[Array[String]] = ParallelCollectionRDD[0] at parallelize at <console>:15

    scala> data.rdd.partitioner
    res0: Option[org.apache.spark.Partitioner] = None

On Tue, Feb 17, 2015 at 3:44 PM, Darin McBeath ddmcbe...@yahoo.com.invalid wrote:

> In an 'early release' of the Learning Spark book, there is the following reference:
>
> "In Scala and Java, you can determine how an RDD is partitioned using its partitioner property (or partitioner() method in Java)"
>
> However, I don't see the mentioned 'partitioner()' method in Spark 1.2 or a way of getting this information.
>
> I'm curious if anyone has any suggestions for how I might go about finding how an RDD is partitioned in a Java program.
>
> Thanks.
>
> Darin.
RE: How do you get the partitioner for an RDD in Java?
Where did you look?

BTW, it is defined in the RDD class as a val:

    val partitioner: Option[Partitioner]

Mohammed

-----Original Message-----
From: Darin McBeath [mailto:ddmcbe...@yahoo.com.INVALID]
Sent: Tuesday, February 17, 2015 1:45 PM
To: User
Subject: How do you get the partitioner for an RDD in Java?

In an 'early release' of the Learning Spark book, there is the following reference:

"In Scala and Java, you can determine how an RDD is partitioned using its partitioner property (or partitioner() method in Java)"

However, I don't see the mentioned 'partitioner()' method in Spark 1.2 or a way of getting this information.

I'm curious if anyone has any suggestions for how I might go about finding how an RDD is partitioned in a Java program.

Thanks.

Darin.
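
For anyone landing on this thread from a Java program rather than the spark-shell, here is a minimal sketch of the same lookup. It is only an illustration under some assumptions: Spark and the Scala library are on the classpath, the job runs with a local master, and the class name PartitionerLookup, the helper describe, and the sample data are made up for this example. It reaches the underlying Scala RDD through rdd() and reads its partitioner val, which Java sees as a method returning a scala.Option.

```java
import java.util.Arrays;

import org.apache.spark.HashPartitioner;
import org.apache.spark.Partitioner;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Option;
import scala.Tuple2;

// Hypothetical example class; not part of Spark or of the original thread.
public class PartitionerLookup {

    // Turn the Scala Option into a readable string (helper invented for this sketch).
    static String describe(Option<Partitioner> partitioner) {
        if (partitioner.isDefined()) {
            Partitioner p = partitioner.get();
            return p.getClass().getSimpleName() + " with " + p.numPartitions() + " partitions";
        }
        return "no partitioner";
    }

    public static void main(String[] args) {
        // Assumption: a local master is good enough for trying this out.
        SparkConf conf = new SparkConf().setAppName("PartitionerLookup").setMaster("local[2]");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        try {
            JavaPairRDD<String, Integer> pairs = jsc.parallelizePairs(Arrays.asList(
                    new Tuple2<String, Integer>("a", 1),
                    new Tuple2<String, Integer>("b", 2),
                    new Tuple2<String, Integer>("c", 3)));

            // rdd() exposes the wrapped Scala RDD; its partitioner val is a method call in Java.
            Option<Partitioner> before = pairs.rdd().partitioner();
            System.out.println("before partitionBy: " + describe(before));

            // Repartition by key so that the resulting RDD carries a partitioner.
            JavaPairRDD<String, Integer> hashed = pairs.partitionBy(new HashPartitioner(4));
            Option<Partitioner> after = hashed.rdd().partitioner();
            System.out.println("after partitionBy: " + describe(after));
        } finally {
            jsc.stop();
        }
    }
}
```

I'd expect the first line to report no partitioner, matching the None in the spark-shell session, and the second to report the HashPartitioner, but treat that as something to confirm against your own Spark version rather than a guarantee.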