Hello folks,
Recently I have noticed unexpectedly big network traffic between Driver Program
and Worker node.
During debugging I have figured out that it is caused by following block of
code
—— Java ——— —
DataFrame etpvRecords = context.sql(" SOME SQL query here");
Mapper m = new
I noticed that toJavaRDD causes a computation on the DataFrame, so is it
considered an action, even though logically it's a transformation?
On Nov 4, 2015 6:51 PM, "Aliaksei Tsyvunchyk"
wrote:
> Hello folks,
>
> Recently I have noticed unexpectedly big network traffic
Hello Romi,
Do you mean that in my particular case I’m causing computation on dataFrame or
it is regular behavior of DataFrame.toJavaRDD ?
If it’s regular behavior, do you know which approach could be used to perform
make/reduce on dataFrame without causing it to load all data to driver
In my program I move between RDD and DataFrame several times.
I know that the entire data of the DF doesn't go into the driver because it
wouldn't fit there.
But calling toJavaRDD does cause computation.
Check the number of partitions you have on the DF and RDD...
On Nov 4, 2015 7:54 PM,
Hi Romi,
Thank for pointing me. I quite new in Spark and not sure how it can help when
I’ll check number of partitions in DF and RDD, so if you can give me some
explanation it would be really helpful. Link to documentation will also help.
> On Nov 4, 2015, at 1:05 PM, Romi Kuntsman