Re: Concurrent execution of actions within a driver

2015-10-26 Thread Rishitesh Mishra
> rdd1.collect  // Action 1
> rdd2.collect  // Action 2
> }
>
> Does Spark run Action 1 & 2 in parallel (some kind of a pass through the
> driver code and then start the execution)?
>
> If not, then is using threads safe for independent actions/RDDs?

-- Regards, Rishitesh Mishra, SnappyData (http://www.snappydata.io/) https://in.linkedin.com/in/rishiteshmishra
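
A minimal sketch of the threaded approach, assuming an existing SparkContext `sc`; the RDDs and timeouts are illustrative only:

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration._

    val rdd1 = sc.parallelize(1 to 1000)
    val rdd2 = sc.parallelize(1 to 1000).map(_ * 2)

    // Each collect() is submitted from its own thread, so the scheduler can
    // run the two jobs concurrently (SparkContext is thread-safe for job submission).
    val f1 = Future { rdd1.collect() }   // Action 1
    val f2 = Future { rdd2.collect() }   // Action 2

    val res1 = Await.result(f1, 10.minutes)
    val res2 = Await.result(f2, 10.minutes)

Without the threads, the driver blocks on each action in turn, so the two collects simply run one after the other.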

Re: Node affinity for Kafka-Direct Stream

2015-10-14 Thread Rishitesh Mishra
> the receiver dies and needs to be restarted somewhere else.
>
> As I understand, the direct-kafka streaming model just computes offsets
> and relays the work to a KafkaRDD. How is the execution locality compared
> to the receiver-based approach?
>
> thanks, Gerard.
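
For reference, a minimal sketch of the direct (receiver-less) stream being discussed, using the Spark 1.x spark-streaming-kafka API; the broker, topic and StreamingContext `ssc` are placeholders:

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val topics = Set("events")

    // No long-running receiver to pin to a node: the driver computes an offset
    // range per batch and each KafkaRDD partition reads its range directly from Kafka.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)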

Re: Spark DataFrame GroupBy into List

2015-10-13 Thread Rishitesh Mishra
...reduceByKey(reduceF)
rdd3.foreach(r => println(r))

You can always reconvert the obtained RDD, after the transformation and reduce, back to a DataFrame.

Regards, Rishitesh Mishra, SnappyData (http://www.snappydata.io/) https://www.linkedin.com/profile/view?id=AAIAAAIFdkMB_v-nolCrFH6_pKf9oH6tZD8Qlgo=nav_r
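
A minimal sketch of that round trip, assuming an existing SQLContext and a DataFrame `df` with an illustrative (key: String, value: Int) schema:

    import sqlContext.implicits._

    // Collapse each key's values into a list on the RDD side.
    val rdd = df.rdd.map(r => (r.getString(0), List(r.getInt(1))))
    val reduceF = (a: List[Int], b: List[Int]) => a ++ b
    val rdd3 = rdd.reduceByKey(reduceF)
    rdd3.foreach(r => println(r))

    // Reconvert to a DataFrame once the grouping is done.
    val grouped = rdd3.toDF("key", "values")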

Re: JdbcRDD Constructor

2015-09-23 Thread Rishitesh Mishra
Which version of Spark are you using? I can get correct results using JdbcRDD. In fact, there is a test suite precisely for this (JdbcRDDSuite). I changed it according to your input and got correct results from this test suite. On Wed, Sep 23, 2015 at 11:00 AM, satish chandra j
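
For comparison, the constructor shape exercised by JdbcRDDSuite looks roughly like this; the in-memory H2 URL, query and bounds below are placeholders, and the query must carry exactly two "?" bind parameters for the partition bounds:

    import java.sql.{DriverManager, ResultSet}
    import org.apache.spark.rdd.JdbcRDD

    val rdd = new JdbcRDD(
      sc,
      () => DriverManager.getConnection("jdbc:h2:mem:testdb"),
      "SELECT ID, DATA FROM FOO WHERE ID >= ? AND ID <= ?",
      1,    // lower bound, bound to the first ?
      100,  // upper bound, bound to the second ?
      3,    // number of partitions
      (r: ResultSet) => r.getInt(1))

    println(rdd.count())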

Re: JdbcRDD Constructor

2015-09-23 Thread Rishitesh Mishra
got it by using JdbcRDDSuite
>
> Regards,
> Satish Chandra
>
> On Wed, Sep 23, 2015 at 4:02 PM, Rishitesh Mishra <rishi80.mis...@gmail.com> wrote:
>
>> Which version of Spark are you using? I can get correct results using
>> JdbcRDD. In fact there is a test sui

Re: in joins, does one side stream?

2015-09-20 Thread Rishitesh Mishra
all the rows having the same join
> key in order to perform the join.
>
> On Sat, Sep 19, 2015 at 12:55 PM, Rishitesh Mishra <rishi80.mis...@gmail.com> wrote:
>
>> Hi Reynold,
>> Can you please elaborate on this? I thought RDD also opens only an
>> iter

Re: in joins, does one side stream?

2015-09-19 Thread Rishitesh Mishra
Hi Reynold, can you please elaborate on this? I thought RDD also opens only an iterator. Does it get materialized for joins? Rishi
On Saturday, September 19, 2015, Reynold Xin wrote:
> Yes for RDD -- both are materialized. No for DataFrame/SQL - one side
> streams.
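
A minimal sketch contrasting the two APIs under discussion; the data and column names are illustrative, and it assumes an existing SparkContext `sc` and SQLContext `sqlContext` (Spark 1.4+ for the column-name join):

    import sqlContext.implicits._

    // RDD join: both sides are materialized per key before the joined pairs are produced.
    val left  = sc.parallelize(Seq((1, "a"), (2, "b")))
    val right = sc.parallelize(Seq((1, "x"), (3, "y")))
    val rddJoined = left.join(right)   // RDD[(Int, (String, String))]

    // DataFrame join: the SQL planner picks the physical join, and for
    // hash/sort-merge joins one side can be streamed through.
    val leftDF  = left.toDF("id", "l")
    val rightDF = right.toDF("id", "r")
    val dfJoined = leftDF.join(rightDF, "id")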

Re: RDD from partitions

2015-08-28 Thread Rishitesh Mishra
Hi Jem, A simple way to get this is to use MapPartitionsRDD. Please see the code below. For this you need to know the partition numbers of your parent RDD that you want to exclude. One drawback here is that the new RDD will also invoke a similar number of tasks as the parent RDD, as both RDDs have the same
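
A minimal sketch of the idea via mapPartitionsWithIndex (which yields a MapPartitionsRDD under the hood); the parent RDD and the excluded partition indices are illustrative:

    val parent = sc.parallelize(1 to 100, 8)
    val excluded = Set(2, 5)   // partition indices to drop

    // Excluded partitions return an empty iterator, so their tasks finish
    // immediately, but the new RDD keeps the parent's partition count.
    val filtered = parent.mapPartitionsWithIndex { (idx, iter) =>
      if (excluded.contains(idx)) Iterator.empty else iter
    }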

Re: Spark driver locality

2015-08-28 Thread Rishitesh Mishra
to worker node to read data from a remote Hadoop cluster? I am more interested in knowing how the MapR NFS layer is accessed in parallel. - Swapnil On Thu, Aug 27, 2015 at 2:53 PM, Rishitesh Mishra rishi80.mis...@gmail.com wrote: Hi Swapnil, Let me try to answer some of the questions. Answers inline

Re: Spark driver locality

2015-08-27 Thread Rishitesh Mishra
Hi Swapnil, Let me try to answer some of the questions. Answers inline. Hope it helps. On Thursday, August 27, 2015, Swapnil Shinde swapnilushi...@gmail.com wrote: Hello, I am new to the Spark world and started exploring it recently in standalone mode. It would be great if I could get clarifications on

Re: Spark streaming multi-tasking during I/O

2015-08-21 Thread Rishitesh Mishra
Hi Sateesh, It is interesting to know how you determined that the DStream runs on a single core. Did you mean receivers? Coming back to your question, could you not start the disk I/O in a separate thread, so that the scheduler can go ahead and assign other tasks? On 21 Aug 2015 16:06, Sateesh
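
A hedged sketch of the separate-thread suggestion, assuming an existing DStream[String] named `dstream`; the output path is a placeholder for whatever blocking disk I/O is being done:

    import java.io.{File, PrintWriter}

    dstream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val buffered = records.toList
        // Hand the blocking disk write to its own thread so this task can
        // finish and the scheduler is free to assign other work to the core.
        new Thread(new Runnable {
          override def run(): Unit = {
            val out = new PrintWriter(new File(s"/tmp/part-${System.nanoTime}"))
            try buffered.foreach(out.println) finally out.close()
          }
        }).start()
      }
    }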

Re: How to list all dataframes and RDDs available in current session?

2015-08-20 Thread Rishitesh Mishra
I am not sure if you can view all RDDs in a session. Tables are maintained in a catalogue, hence that is easier. However, you can see the DAG representation, which lists all the RDDs in a job, in the Spark UI. On 20 Aug 2015 22:34, Dhaval Patel dhaval1...@gmail.com wrote: Apologies I
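
The table side of this can be queried directly; a minimal sketch against a Spark 1.x SQLContext, with an illustrative temp-table name:

    // Assumes an existing SQLContext `sqlContext` and a DataFrame `df`.
    df.registerTempTable("my_table")

    // Tables live in the catalogue, so they can be listed:
    sqlContext.tableNames().foreach(println)
    sqlContext.tables().show()

    // RDDs have no such registry; the closest view is a job's DAG in the Spark UI.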

Subscribe

2015-08-17 Thread Rishitesh Mishra