Re: Do transformation functions on RDD invoke a Job [sc.runJob]?

2016-04-25 Thread Praveen Devarao
…d...@spark.apache.org>, user <user@spark.apache.org>
Date: 25/04/2016 10:59 pm
Subject: Re: Do transformation functions on RDD invoke a Job [sc.runJob]?
Spark SQL's query planner has always delayed building the RDD, so has never needed to eagerly calculate the range boundaries (…
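The point above is that a query planner can defer building the RDD, and with it the computation of range boundaries. A toy model helps show the difference. This is a minimal sketch, assuming a hypothetical `LazyPlanSketch` object with a `ToyDataset` class and a `scans` counter. None of these are Spark SQL APIs. Real Spark SQL still samples the data for range partitioning, but only when the query executes. This toy simply derives the bounds from the rows it already has at that point.

```scala
// Hypothetical toy model, not Spark SQL's planner: a query is a deferred
// function, and sorting is just one more deferred step.
object LazyPlanSketch {
  // Counts passes over the source rows.
  var scans = 0

  final class ToyDataset[T](plan: () => Seq[T]) {
    // Transformations only extend the deferred plan; no data is read.
    def map[U](f: T => U): ToyDataset[U] = new ToyDataset(() => plan().map(f))

    // The range bounds are computed inside the plan, so declaring the sort
    // touches no data; the bounds are built only when execute() runs.
    def orderBy(partitions: Int)(implicit ord: Ordering[T]): ToyDataset[T] =
      new ToyDataset(() => {
        val rows = plan()
        val sortedKeys = rows.sorted // stand-in for Spark's sampling step
        val bounds: Seq[T] =
          if (sortedKeys.isEmpty) Seq.empty
          else (1 until partitions).map(i => sortedKeys(i * sortedKeys.size / partitions))
        // Bucket rows by range, sort each bucket, and concatenate in bucket order.
        rows.groupBy(r => bounds.count(b => ord.gt(r, b)))
          .toSeq.sortBy(_._1)
          .flatMap(_._2.sorted)
      })

    // The only place where work actually happens.
    def execute(): Seq[T] = plan()
  }

  // A source that records each time it is scanned.
  def source[T](rows: Seq[T]): ToyDataset[T] =
    new ToyDataset(() => { scans += 1; rows })
}
```

Building `source(...).map(...).orderBy(...)` leaves `scans` at 0; only `execute()` reads the source, once.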

Re: Do transformation functions on RDD invoke a Job [sc.runJob]?

2016-04-25 Thread Michael Armbrust
…will try again"
> From: Reynold Xin <r...@databricks.com>
> To: Praveen Devarao/India/IBM@IBMIN
> Cc: "d...@spark.apache.org" <d...@spark.apache.org>, user <user@spark.apache.org>
> Date: 25/04/2016 11:26 am
> Subject: …

Re: Do transformation functions on RDD invoke a Job [sc.runJob]?

2016-04-25 Thread Praveen Devarao
From: Reynold Xin <r...@databricks.com>
To: Praveen Devarao/India/IBM@IBMIN
Cc: "d...@spark.apache.org" <d...@spark.apache.org>, user <user@spark.apache.org>
Date: 25/04/2016 11:26 am
Subject: Re: Do transformation functions on RDD invoke a Job …

Do transformation functions on RDD invoke a Job [sc.runJob]?

2016-04-25 Thread Praveen Devarao
Hi, I have a streaming program with the block as below [ref: https://github.com/agsachin/streamingBenchmark/blob/master/spark-benchmarks/src/main/scala/TwitterStreaming.scala]:
    val lines = messages.map(_._2)
    val hashTags = lines.flatMap(status => status.split(" "…

Re: Do transformation functions on RDD invoke a Job [sc.runJob]?

2016-04-24 Thread Reynold Xin
Usually no, but sortByKey does, because the range boundaries must be computed before the sorted RDD can be constructed. It is a long-standing problem that's unfortunately very difficult to solve without breaking the RDD API. In DataFrame/Dataset we don't have this issue, though.
On Sun, Apr 24, 2016 at 10:54 …
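Why sortByKey is the exception: in Spark it builds a `RangePartitioner`, which has to look at the data to pick split points before the new RDD exists, so a job runs at transformation time. The toy model below illustrates that ordering. It is a minimal sketch, assuming a hypothetical `EagerSortSketch` object with a `ToyRDD` class, a `jobsRun` counter, and toy `rangeBounds`/`sortByKey` functions. These are illustrations, not Spark's code, and a full sort of the keys stands in for Spark's sampling.

```scala
// Hypothetical toy model, not Spark's implementation: a "job" is counted
// whenever an operation has to scan the data.
object EagerSortSketch {
  var jobsRun = 0

  final class ToyRDD[T](compute: () => Seq[T]) {
    // Evaluates the lineage without counting; used inside deferred steps.
    def run(): Seq[T] = compute()

    // Action: scanning the data counts as one job.
    def collect(): Seq[T] = { jobsRun += 1; run() }

    // Transformation: only records the function, scans nothing.
    def map[U](f: T => U): ToyRDD[U] = new ToyRDD(() => run().map(f))
  }

  // Split points for range partitioning (a full sort stands in for sampling).
  def rangeBounds[K](keys: Seq[K], partitions: Int)(implicit ord: Ordering[K]): Seq[K] = {
    val sorted = keys.sorted
    if (sorted.isEmpty) Seq.empty
    else (1 until partitions).map(i => sorted(i * sorted.size / partitions))
  }

  def sortByKey[K, V](rdd: ToyRDD[(K, V)], partitions: Int)(implicit ord: Ordering[K]): ToyRDD[(K, V)] = {
    // Eager step: the bounds must exist before the sorted RDD can be built,
    // so this scans the parent now, at transformation time.
    val bounds = rangeBounds(rdd.collect().map(_._1), partitions)
    new ToyRDD(() =>
      rdd.run()
        .groupBy { case (k, _) => bounds.count(b => ord.gt(k, b)) } // range bucket
        .toSeq.sortBy(_._1)
        .flatMap { case (_, bucket) => bucket.sortBy(_._1) }
    )
  }

  def main(args: Array[String]): Unit = {
    val data = new ToyRDD(() => Seq(3 -> "c", 1 -> "a", 2 -> "b", 5 -> "e", 4 -> "d"))
    val upper = data.map { case (k, v) => (k, v.toUpperCase) }
    println(s"after map: $jobsRun job(s)")
    val sorted = sortByKey(upper, partitions = 2)
    println(s"after sortByKey: $jobsRun job(s)")
    println(sorted.collect())
    println(s"after collect: $jobsRun job(s)")
  }
}
```

Running `main` reports 0 jobs after `map`, 1 after `sortByKey` (before any action), and 2 after `collect`.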