+1 to Ayan's answer; I think this is a common distributed anti-pattern that
trips us all up at some point or another.
You definitely want to (in most cases) yield and create a new
RDD/DataFrame/Dataset and then perform your save operation on that.
On 28 May 2017 at 21:09, ayan guha wrote:
Hi
You can modify your parse function to yield/emit the output records instead
of inserting them. That way, you can essentially call .toDF to convert the
outcome to a DataFrame and then use the driver's Cassandra connection to save
to Cassandra (the data will still be in the executors, but now the connector
itself will handle the writes).
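A rough sketch of that pattern in PySpark (the `parse` function, the dump path, and the keyspace/table names here are all hypothetical; it assumes the DataStax spark-cassandra-connector package is on the classpath):

```python
def parse(line):
    """Runs on the executors. Yields records instead of writing them,
    so no SparkContext or Cassandra session is needed inside it."""
    parts = line.split(",")
    if len(parts) == 2:
        yield (parts[0].strip(), parts[1].strip())
    # malformed lines are silently skipped

def load_dump(spark, dump_path):
    """Driver-side: build a DataFrame from the parsed records, then
    declare the write. The connector still performs the actual
    Cassandra writes in parallel on the executors."""
    df = (spark.sparkContext
          .textFile(dump_path)
          .flatMap(parse)               # executors emit records
          .toDF(["key", "value"]))
    (df.write
       .format("org.apache.spark.sql.cassandra")
       .options(keyspace="ks", table="tbl")   # hypothetical names
       .mode("append")
       .save())
```

The key point is that `parse` is a pure function over lines: the only things serialized to the executors are the function and the data, never a session or context.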
You would need to use *native* Cassandra APIs in each Executor -
not org.apache.spark.sql.cassandra.CassandraSQLContext
- including creating a separate Cassandra connection on each Executor.
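That alternative can be sketched like this (the table name and record shape are hypothetical; the connection factory stands in for something like the DataStax Python driver's `Cluster([...]).connect(...)`). The essential point is that the connection is opened *inside* the partition function, on the executor, and is never serialized from the driver:

```python
def save_partition(records, open_session):
    """Runs once per partition, on an executor. `open_session` is called
    there to create a native Cassandra session, e.g.
        lambda: Cluster(["cassandra-host"]).connect("ks")
    so no connection object ever crosses the driver/executor boundary."""
    session = open_session()
    try:
        for key, value in records:
            session.execute(
                "INSERT INTO tbl (key, value) VALUES (%s, %s)",  # hypothetical table
                (key, value))
    finally:
        session.shutdown()

# Driver side: ship only the function, not a session.
# rdd.foreachPartition(lambda part: save_partition(part, make_session))
```

Opening one connection per partition (rather than per record) keeps the connection overhead bounded; for long jobs a shared connection pool per executor JVM/process is a common refinement.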
2017-05-28 15:47 GMT-07:00 Abdulfattah Safa :
So I can't run SQL queries in Executors?
On Sun, May 28, 2017 at 11:00 PM Mark Hamstra wrote:
You can't do that. SparkContext and SparkSession can exist only on the
Driver.
On Sun, May 28, 2017 at 6:56 AM, Abdulfattah Safa wrote:
How can I use SparkContext (to create Spark Session or Cassandra Sessions)
in executors?
If I pass it as a parameter to foreach or foreachPartition, then it will
have a null value.
Shall I create a new SparkContext in each executor?
Here is what I'm trying to do:
Read a dump directory with