Cached data not showing up in Storage tab

2018-10-16 Thread Venkat Dabri
When I cache a DataFrame, the data never shows up in the Storage tab. The Storage tab is always blank. I have tried it in Zeppelin as well as in spark-shell:

    scala> val classCount = spark.read.parquet("s3:// /classCount")
    scala> classCount.persist
    scala> classCount.count

Nothing shows up in the ...
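One way to check from code whether the DataFrame is really cached, without relying on the UI, is sketched below. It assumes a local SparkSession and uses `spark.range` as a stand-in for the parquet data. `CacheCheck`, `cacheAndCount` and `printCachedRdds` are hypothetical names. The Spark calls themselves (`Dataset.persist`, `Dataset.storageLevel`, available since Spark 2.1, and the DeveloperApi `SparkContext.getRDDStorageInfo`) are real.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.storage.StorageLevel

object CacheCheck {
  // Marks the DataFrame for caching, runs an action to materialize it, and
  // returns the row count together with the storage level Spark reports for it.
  def cacheAndCount(df: DataFrame): (Long, StorageLevel) = {
    df.persist()            // lazy: nothing is cached until an action runs
    val rows = df.count()   // action: computes and stores the cached partitions
    (rows, df.storageLevel) // StorageLevel.NONE would mean the plan is not cached
  }

  // Lists the RDDs the driver currently reports as cached (a DeveloperApi,
  // so its shape may change between Spark versions).
  def printCachedRdds(spark: SparkSession): Unit =
    spark.sparkContext.getRDDStorageInfo.foreach { info =>
      println(s"${info.name}: ${info.numCachedPartitions}/${info.numPartitions} partitions, " +
        s"${info.memSize} bytes in memory")
    }
}
```

In spark-shell you would pass the DataFrame read from the (elided) S3 path to `cacheAndCount` and then call `printCachedRdds(spark)`. If the storage level is not `NONE` and cached partitions are listed while the Storage tab stays empty, the problem is more likely on the UI side than in the caching itself.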

Re: Spark seems to think that a particular broadcast variable is large in size

2018-10-16 Thread Venkat Dabri
The same problem is mentioned here:
https://forums.databricks.com/questions/117/why-is-my-rdd-not-showing-up-in-the-storage-tab-of.html
https://stackoverflow.com/questions/44792213/blank-storage-tab-in-spark-history-server

On Tue, Oct 16, 2018 at 8:06 AM Venkat Dabri wrote:
> I d...

Re: Spark seems to think that a particular broadcast variable is large in size

2018-10-16 Thread Venkat Dabri
> On Mon, Oct 15, 2018 at 11:53 AM Venkat Dabri wrote:
>> I am trying to do a broadcast join on two tables. The size of the
>> smaller table will vary based upon the parameters but the size of the
>> larger table is close to 2TB. What I have noticed is that if I d...

Spark seems to think that a particular broadcast variable is large in size

2018-10-15 Thread Venkat Dabri
I am trying to do a broadcast join on two tables. The size of the smaller table varies with the parameters, but the larger table is close to 2 TB. What I have noticed is that if I don't set spark.sql.autoBroadcastJoinThreshold to 10G, some of these operations do a ...
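Instead of raising spark.sql.autoBroadcastJoinThreshold globally, the small side can be marked for broadcast explicitly with the `broadcast` function from `org.apache.spark.sql.functions`. The threshold is compared against Spark's size estimate for the plan, which may differ from the table's real size, while the hint asks the planner to broadcast that side regardless of the estimate. The sketch below is an illustration under those assumptions; `BroadcastJoinSketch` and its methods are hypothetical names, and the join key is a placeholder.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.broadcast

object BroadcastJoinSketch {
  // Turns off size-based automatic broadcasting for this session, so that
  // only explicitly hinted joins are broadcast.
  def disableAutoBroadcast(spark: SparkSession): Unit =
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

  // Joins the large table with the small one on `key`, marking the small side
  // for broadcast. The hint asks the planner to broadcast `small` even when its
  // size estimate is above spark.sql.autoBroadcastJoinThreshold.
  def joinWithBroadcast(large: DataFrame, small: DataFrame, key: String): DataFrame =
    large.join(broadcast(small), Seq(key))
}
```

Calling `joined.explain()` on the result should show a BroadcastHashJoin in the physical plan. Keep in mind that, as far as I know, Spark's broadcast exchange still rejects a table larger than 8 GB whatever the threshold or hint, so a 10G threshold does not make every small side broadcastable.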

Re: java.lang.UnsupportedOperationException: No Encoder found for Set[String]

2018-08-16 Thread Venkat Dabri
We are using Spark 2.2.0. Is it possible to bring the ExpressionEncoder and related classes from 2.3.0 into my code base and use them? The changes to ExpressionEncoder between 2.2.0 and 2.3.0 look small, but many other classes underneath it might have changed. On Thu, Aug ...
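Since 2.3.0's ExpressionEncoder handles Set[String] and 2.2.0's does not, one way to stay on 2.2.0 without copying internal classes is to avoid needing a Set encoder at all. The sketch below shows two common workarounds, assuming you can change the record type: store the set as a `Seq[String]` column and convert at the edges, or encode the whole `Set` with `Encoders.kryo`. `Tagged` and `SetEncoderWorkaround` are hypothetical names.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

// Hypothetical record type: the field is a Seq[String] instead of a Set[String],
// because the built-in encoders in Spark 2.2 cannot derive one for Set.
case class Tagged(id: Long, tags: Seq[String])

object SetEncoderWorkaround {
  // Option 1: convert Set -> Seq on the way in; call .toSet on the way out.
  // The column stays a normal array<string> that Spark SQL can query.
  def toDataset(spark: SparkSession, rows: Seq[(Long, Set[String])]): Dataset[Tagged] = {
    import spark.implicits._
    rows.map { case (id, tags) => Tagged(id, tags.toSeq) }.toDS()
  }

  // Option 2: a Kryo encoder for the Set itself. It is stored as a single
  // binary column named "value", so Spark SQL cannot filter on its contents.
  def setDataset(spark: SparkSession, sets: Seq[Set[String]]): Dataset[Set[String]] =
    spark.createDataset(sets)(Encoders.kryo[Set[String]])
}
```

Option 1 keeps the data queryable and columnar; option 2 keeps the Scala type intact but turns the value into an opaque blob.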