Re: spark dataset.cache is not thread safe

2019-07-22 Thread Amit Sharma
Please update me if anyone knows how to handle it.

On Sun, Jul 21, 2019 at 7:18 PM Amit Sharma wrote:
> Hi, I wrote code in a Future block that reads data from a dataset and caches
> it for use later in the code. I faced an issue that data.cached() data
> will be replaced by concurrent

Re: Spark 2.3 Dataframe Grouby operation throws IllegalArgumentException on Large dataset

2019-07-22 Thread Bobby Evans
You are missing a lot of the stack trace that could explain the exception. All it shows is that an exception happened while writing out the ORC file, not what the underlying exception is. There should be at least one more "Caused by" under the one you included.

Thanks,
Bobby

On Mon, Jul 22, 2019
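For anyone hitting the same thing: one way to make sure every nested "Caused by" ends up in the logs is to walk the cause chain explicitly. This is only a minimal sketch in plain Scala. The object name `CauseChain`, the helper `causes`, and the two exceptions in `main` are hypothetical, made up for illustration, and are not taken from the original report.

```scala
import scala.annotation.tailrec

object CauseChain {
  /** Returns the throwable followed by each nested cause, outermost first. */
  def causes(t: Throwable): List[Throwable] = {
    @tailrec
    def loop(cur: Throwable, acc: List[Throwable]): List[Throwable] =
      // Stop at the end of the chain, or if a cause points back to one already seen.
      if (cur == null || acc.exists(_ eq cur)) acc.reverse
      else loop(cur.getCause, cur :: acc)
    loop(t, Nil)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical exceptions standing in for a wrapped write failure.
    val root = new IllegalArgumentException("hypothetical root cause")
    val wrapped = new RuntimeException("hypothetical write failure", root)
    causes(wrapped).foreach(c => println(s"${c.getClass.getName}: ${c.getMessage}"))
  }
}
```

The last element of the returned list is the innermost cause, which is usually the one worth posting along with the rest of the trace.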

Spark 2.3 Dataframe Grouby operation throws IllegalArgumentException on Large dataset

2019-07-22 Thread Balakumar iyer S
Hi,

I am trying to perform a group by followed by an aggregate collect_set operation on a two-column dataset with schema (LeftData int, RightData int).

Code snippet:

val wind_2 = dframe.groupBy("LeftData").agg(collect_set(array("RightData")))
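For reference, the snippet above can be reproduced on a tiny in-memory dataset roughly as follows. This is a sketch under assumptions, not the poster's actual job: the object name, the helper `collectSetByLeft`, the local master setting, and the sample rows are all hypothetical. Only the column names and the `groupBy`/`collect_set(array(...))` expression come from the message.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, collect_set}

object GroupByCollectSetSketch {
  /** Same aggregation as in the message: one set of single-element arrays per LeftData value. */
  def collectSetByLeft(dframe: DataFrame): DataFrame =
    dframe.groupBy("LeftData").agg(collect_set(array("RightData")))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimenting; the real job runs on a cluster.
    val spark = SparkSession.builder()
      .appName("groupby-collect-set-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows matching the schema (LeftData int, RightData int).
    val dframe = Seq((1, 10), (1, 11), (1, 10), (2, 20)).toDF("LeftData", "RightData")

    val wind_2 = collectSetByLeft(dframe)
    wind_2.show(truncate = false)

    spark.stop()
  }
}
```

Note that wrapping the column in `array(...)` makes the result an array of single-element arrays. If a flat set of ints per key is what is wanted, `collect_set("RightData")` may be enough.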