Cc: "d...@spark.apache.org" <d...@spark.apache.org>, user <user@spark.apache.org>
Date: 25/04/2016 10:59 pm
Subject: Re: Do transformation functions on RDD invoke a Job [sc.runJob]?
Spark SQL's query planner has always delayed building the RDD, so it has
never needed to eagerly calculate the range boundaries.
From: Reynold Xin <r...@databricks.com>
To: Praveen Devarao/India/IBM@IBMIN
Cc: "d...@spark.apache.org" <d...@spark.apache.org>, user
<user@spark.apache.org>
Date: 25/04/2016 11:26 am
Subject: Re: Do transformation functions on RDD invoke a Job [sc.runJob]?
Hi,
I have a streaming program with the block below [ref:
https://github.com/agsachin/streamingBenchmark/blob/master/spark-benchmarks/src/main/scala/TwitterStreaming.scala
]:

val lines = messages.map(_._2)
val hashTags = lines.flatMap(status => status.split(" "))
Usually no - but sortByKey does, because the range boundaries need to be
computed before the sorted RDD can even be constructed. It is a
long-standing problem that's unfortunately very difficult to solve without
breaking the RDD API.
In DataFrame/Dataset we don't have this issue though.
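To make the sortByKey point concrete, here is a minimal sketch in plain Scala (no Spark; the object and method names are my own, not Spark's API) of what a range partitioner has to do: take a sample of the keys, sort it, and pick boundary keys. Spark's RangePartitioner gathers that sample by running a job over the parent RDD, which is the eager step being discussed.

```scala
object RangeBoundsSketch {
  // Pick (numPartitions - 1) boundary keys from a sample, so that keys
  // can later be routed to roughly equal-sized sorted partitions.
  // Spark does something similar, but must run a job to collect the sample.
  def rangeBounds(sample: Seq[Int], numPartitions: Int): Seq[Int] = {
    val sorted = sample.sorted
    (1 until numPartitions).map { i =>
      sorted((i * sorted.length) / numPartitions)
    }
  }

  // Route a key to a partition by scanning the boundary keys.
  def partitionFor(key: Int, bounds: Seq[Int]): Int =
    bounds.indexWhere(key < _) match {
      case -1 => bounds.length // past the last boundary
      case i  => i
    }

  def main(args: Array[String]): Unit = {
    val sample = Seq(42, 7, 99, 3, 58, 21, 76, 12)
    val bounds = rangeBounds(sample, numPartitions = 4)
    println(bounds)                   // the chosen boundary keys
    println(partitionFor(50, bounds)) // which partition key 50 lands in
  }
}
```

The key observation: `rangeBounds` cannot be computed without looking at the data, which for an RDD means running a job before the sorted RDD exists.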
On Sun, Apr 24, 2016 at 10:54
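The "usually no" part - that transformations like map and flatMap only describe a computation and run nothing - can be illustrated without Spark using a plain Scala lazy view (a sketch of the lazy-evaluation idea, not Spark's actual machinery; `run` is my own helper):

```scala
object LazyTransformSketch {
  // Returns (count before forcing, count after forcing) to show that
  // flatMap on a view records work without executing it, much like an
  // RDD transformation; only forcing (the "action") runs the function.
  def run(): (Int, Int) = {
    var applied = 0
    val words = Seq("a b", "c d").view.flatMap { s =>
      applied += 1
      s.split(" ")
    }
    val before = applied  // flatMap has not executed yet
    words.toList          // forcing the view plays the role of an action
    (before, applied)
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```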