Re: [Spark SQL & Core]: RDD to Dataset with 1500 columns via createDataFrame() throws a "grows beyond 64 KB" exception

2017-03-19 Thread Eyal Zituny
Hi Eyal, You can also try calling repartition(1) before calling the "monotonically_increasing_id" function. It will probably cause some performance degradation (depending on the size of the files), but it might not be as bad as the other workaround. After adding the ID you can repartition again in
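A minimal sketch of this workaround, under my own assumptions about the input DataFrame, path, and column names (none of these appear in the thread). Collapsing to a single partition makes monotonically_increasing_id() produce consecutive values; afterwards the data can be repartitioned to restore parallelism:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.monotonically_increasing_id

val spark = SparkSession.builder().appName("add-row-id").getOrCreate()

// Hypothetical input source; the real file has 1500+ columns.
val df = spark.read.option("header", "true").csv("/path/to/bigtable.csv")

// Collapse to a single partition so monotonically_increasing_id() yields
// consecutive values (otherwise each partition gets its own large offset).
val withId = df
  .repartition(1)
  .withColumn("record_id", monotonically_increasing_id())

// Restore parallelism after the id column has been added.
val result = withId.repartition(200) // 200 is an arbitrary example value
```

The single-partition step is the source of the performance cost mentioned above, since all data is funneled through one task before being spread out again.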

Re: [Spark SQL & Core]: RDD to Dataset with 1500 columns via createDataFrame() throws a "grows beyond 64 KB" exception

2017-03-18 Thread Kazuaki Ishizaki
Hi, here is the latest status of code generation. When we use the master branch, which will become Spark 2.2, the following exception occurs. The latest version fixed the 64KB errors in this case; however, we hit another JVM limit, the number of constant pool entries. Caused by:

[Spark SQL & Core]: RDD to Dataset with 1500 columns via createDataFrame() throws a "grows beyond 64 KB" exception

2017-03-18 Thread elevy
Hello all, I am using the Spark 2.1.0 release and am trying to load a BigTable CSV file with more than 1500 columns into our system. Our flow for doing that is: • First, read the data as an RDD • Generate a continuous record id using zipWithIndex() (this operation exists only in the RDD API,
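A hedged sketch of the flow described above, under my own assumptions about the file path, delimiter, and schema (the original message is truncated, so the exact code is not shown in the archive). It reads the CSV as an RDD, assigns a record id with zipWithIndex(), and converts the result to a DataFrame with createDataFrame(); with roughly 1500 columns, the code generated for this DataFrame is what the thread reports as exceeding the 64 KB method limit:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

val spark = SparkSession.builder().appName("rdd-to-dataset").getOrCreate()

// Hypothetical path and comma delimiter; the real file has 1500+ columns.
val lines = spark.sparkContext.textFile("/path/to/bigtable.csv")
val header = lines.first()
val columnNames = header.split(",")

// Assign a continuous record id with zipWithIndex (RDD API only),
// then fold the index into each row.
val rowsWithId = lines
  .filter(_ != header)
  .zipWithIndex()
  .map { case (line, idx) => Row.fromSeq(idx +: line.split(",").toSeq) }

// Build a schema: a LongType id column followed by the 1500+ string columns.
val schema = StructType(
  StructField("record_id", LongType, nullable = false) +:
    columnNames.map(name => StructField(name, StringType, nullable = true))
)

// With ~1500 columns, queries over this DataFrame are where the
// "grows beyond 64 KB" code-generation error is reported in the thread.
val df = spark.createDataFrame(rowsWithId, schema)
```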