> rdd1.collect // Action 1
> rdd2.collect // Action 2
>
> }
>
> Does Spark run Action 1 and Action 2 in parallel (some kind of pass
> through the driver code before starting the execution)?
>
> If not, is it safe to use threads for independent actions/RDDs?
>
>
>
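The pattern asked about — independent actions issued from separate threads — can be sketched with plain Scala Futures. The driver otherwise runs actions sequentially, since each `collect` blocks until its job completes; SparkContext is thread-safe for job submission, so concurrent submission from threads is supported. The action bodies below are local stand-ins (no SparkContext here); with a real SparkContext they would be `rdd1.collect()` and `rdd2.collect()`:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Stand-ins for the two actions; with a real SparkContext (which is
// thread-safe for job submission) these bodies would be
// rdd1.collect() and rdd2.collect().
def action1(): Seq[Int] = (1 to 5).map(_ * 2)
def action2(): Seq[Int] = (1 to 5).map(_ + 10)

// Each Future submits its work from its own thread, so the two
// actions can be scheduled concurrently instead of back to back.
val f1 = Future(action1())
val f2 = Future(action2())
val (res1, res2) = Await.result(f1.zip(f2), 30.seconds)
```

Within one application the fair scheduler (`spark.scheduler.mode=FAIR`) then decides how the concurrently submitted jobs share executors.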
--
Regards,
Rishitesh Mishra,
SnappyData . (http://www.snappydata.io/)
https://in.linkedin.com/in/rishiteshmishra
> the receiver dies and needs to be restarted somewhere else.
>
> As I understand, the direct-kafka streaming model just computes offsets
> and relays the work to a KafkaRDD. How does the execution locality compare
> to the receiver-based approach?
>
> thanks, Gerard.
>
duceByKey(reduceF)
rdd3.foreach(r => println(r))
You can always convert the RDD obtained after the transformation and
reduce back to a DataFrame.
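The round trip described above (RDD → `reduceByKey` → back to a DataFrame) can be sketched as follows. The `reduceByKey` step is simulated on a plain Scala collection since no SparkContext is available here; the Spark-specific calls are shown in comments, where `pairsRdd` and `spark` are assumed names:

```scala
// reduceByKey semantics sketched on a plain Scala collection; on a
// real RDD the same combine function runs inside each partition and
// then once more across partitions.
val pairs = Seq(("a", 1), ("b", 2), ("a", 3))
val reduceF: (Int, Int) => Int = _ + _
val rdd3Local: Map[String, Int] =
  pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).reduce(reduceF) }

// With Spark the equivalent steps would be (sketch, not runnable here):
//   val rdd3 = pairsRdd.reduceByKey(reduceF)
//   import spark.implicits._
//   val df = rdd3.toDF("key", "value")   // reconvert to a DataFrame
```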
Regards,
Rishitesh Mishra,
SnappyData . (http://www.snappydata.io/)
https://www.linkedin.com/profile/view?id=AAIAAAIFdkMB_v-nolCrFH6_pKf9oH6tZD8Qlgo=nav_r
Which version of Spark are you using? I can get correct results using
JdbcRDD. In fact there is a test suite precisely for this (JdbcRDDSuite).
I changed it according to your input and got correct results from this
test suite.
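For context, JdbcRDD partitions the query by splitting the inclusive `[lowerBound, upperBound]` key range into `numPartitions` sub-ranges and binding each pair to the two `?` placeholders in the SQL. The helper below is a hypothetical re-derivation of that splitting idea for illustration, not the actual spark-core code:

```scala
// Hypothetical helper mirroring how JdbcRDD splits the inclusive
// [lowerBound, upperBound] key range into numPartitions sub-ranges;
// each (start, end) pair is bound to the two '?' placeholders of the
// partitioning query, e.g. "... WHERE id >= ? AND id <= ?".
def jdbcBounds(lower: Long, upper: Long, numPartitions: Int): Seq[(Long, Long)] = {
  val length = 1 + upper - lower
  (0 until numPartitions).map { i =>
    val start = lower + (i * length) / numPartitions
    val end   = lower + ((i + 1) * length) / numPartitions - 1
    (start, end)
  }
}
```

A query that returns wrong or duplicated rows with JdbcRDD is often one whose `WHERE` clause does not use both placeholders, so every partition fetches the full result set.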
On Wed, Sep 23, 2015 at 11:00 AM, satish chandra j
> got it by using JdbcRDDSuite
>
> Regards,
> Satish Chandra
>
> On Wed, Sep 23, 2015 at 4:02 PM, Rishitesh Mishra <
> rishi80.mis...@gmail.com> wrote:
>
>> Which version of Spark are you using? I can get correct results using
>> JdbcRDD. In fact there is a test suite precisely for this (JdbcRDDSuite).
all the rows having the same join
> key in order to perform the join.
>
>
>
> On Sat, Sep 19, 2015 at 12:55 PM, Rishitesh Mishra <
> rishi80.mis...@gmail.com> wrote:
>
>> Hi Reynold,
>> Can you please elaborate on this. I thought RDD also opens only an
>> iterator. Does it get materialized for joins?
Hi Reynold,
Can you please elaborate on this. I thought RDD also opens only an
iterator. Does it get materialized for joins?
Rishi
On Saturday, September 19, 2015, Reynold Xin wrote:
> Yes for RDD -- both are materialized. No for DataFrame/SQL - one side
> streams.
>
>
>
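Reynold's point above can be illustrated with a local hash-join sketch: only the build side is materialized (into a map), while the probe side streams through it one record at a time, as a SQL/DataFrame hash join does. An RDD `join`, by contrast, groups the values for each key on both sides. All names here are illustrative:

```scala
// Build side materialized into a map; probe side streamed row by row.
val buildSide = Map(1 -> "a", 2 -> "b")                  // materialized
val probeSide = Iterator((1, "x"), (2, "y"), (3, "z"))   // streamed
val joined = probeSide.flatMap { case (k, v) =>
  buildSide.get(k).map(b => (k, (b, v)))                 // inner join
}.toList
```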
Hi Jem,
A simple way to get this is MapPartitionsRDD (via mapPartitionsWithIndex).
Please see the code below. For this you need to know the partition numbers
of the parent RDD that you want to exclude. One drawback here is that the
new RDD will invoke a similar number of tasks as the parent RDD, since
both RDDs have the same number of partitions.
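The approach above can be sketched locally (no SparkContext here): partitions are modeled as indexed groups, and excluded indices yield empty iterators, which is why the child RDD still runs as many tasks as the parent. The equivalent Spark call is shown in a comment:

```scala
// Partitions modeled as indexed groups; excluded indices yield empty
// iterators, so the child keeps the parent's partition count (and
// task count) but those tasks emit nothing. With Spark this would be:
//   rdd.mapPartitionsWithIndex { (i, it) =>
//     if (exclude(i)) Iterator.empty else it
//   }
val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2), Seq(3, 4), Seq(5, 6))
val exclude = Set(1)                    // hypothetical indices to drop
val kept = partitions.zipWithIndex.flatMap { case (part, i) =>
  if (exclude(i)) Iterator.empty else part.iterator
}
```

A `coalesce` afterwards would shrink the task count if the empty partitions matter.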
to worker node to read data from the remote Hadoop cluster? I am more
interested to know how the MapR NFS layer is accessed in parallel.
-
Swapnil
On Thu, Aug 27, 2015 at 2:53 PM, Rishitesh Mishra
rishi80.mis...@gmail.com wrote:
Hi Swapnil,
Let me try to answer some of the questions. Answers inline. Hope it helps.
On Thursday, August 27, 2015, Swapnil Shinde swapnilushi...@gmail.com
wrote:
Hello
I am new to the Spark world and recently started exploring it in
standalone mode. It would be great if I could get clarifications on
Hi Sateesh,
It is interesting to know how you determined that the DStream runs on a
single core. Did you mean receivers?
Coming back to your question: could you not start the disk IO in a
separate thread, so that the scheduler can go ahead and assign other
tasks?
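The suggestion above — running the blocking disk IO on a separate thread so the calling thread stays free — can be sketched with a dedicated executor; `slowRead` is a hypothetical stand-in for the real IO:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Dedicated pool for blocking IO, kept separate from the default
// (CPU-sized) execution context so compute threads are not starved.
val ioPool = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(1))
def slowRead(): String = { Thread.sleep(50); "payload" }   // stand-in IO

val pending = Future(slowRead())(ioPool)   // IO runs on the pool
val other = 21 * 2                         // caller keeps working
val data = Await.result(pending, 5.seconds)
ioPool.shutdown()
```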
On 21 Aug 2015 16:06, Sateesh
I am not sure if you can view all RDDs in a session. Tables are
maintained in a catalog, hence that is easier. However, you can see the
DAG representation, which lists all the RDDs in a job, in the Spark UI.
On 20 Aug 2015 22:34, Dhaval Patel dhaval1...@gmail.com wrote:
Apologies
I