Hi Eyal,
You can also try calling repartition(1) before calling the
"monotonically_increasing_id" function. It will probably cause some
performance degradation (depending on the size of the files), but it
might not be as bad as the other workaround. After adding the ID,
you can repartition again.
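For illustration, a minimal sketch of that workaround in Scala (the input
path, column name, and final partition count are assumptions, not from
this thread):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.monotonically_increasing_id

    val spark = SparkSession.builder().appName("id-workaround").getOrCreate()

    // Hypothetical input; any DataFrame works the same way.
    val df = spark.read.option("header", "true").csv("/path/to/input.csv")

    val withId = df
      .repartition(1)  // one partition => contiguous IDs 0, 1, 2, ...
      .withColumn("record_id", monotonically_increasing_id())
      .repartition(200)  // hypothetical: restore parallelism afterwards

With a single partition the generated IDs come out contiguous rather than
sparse per-partition ranges; the final repartition just restores
parallelism for downstream work.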
Hi
Here is the latest status for code generation.
When we use master (which will become Spark 2.2), the following exception
occurs. The latest version fixed the 64KB errors in this case; however, we
hit another JVM limit: the number of constant pool entries.
Caused by:
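For anyone who wants to reproduce this class of failure, a wide select
along these lines is usually enough to push the generated code past the
JVM limits (the column count here is a made-up example, not the actual
schema from this thread):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.lit

    val spark = SparkSession.builder().appName("wide-schema-repro").getOrCreate()

    // Build a single-row DataFrame widened to a few thousand columns.
    val columnCount = 3000
    val wide = spark.range(1).select((1 to columnCount).map(i => lit(i).as(s"c$i")): _*)

    // Forcing codegen across all columns can exceed the 64KB method-size
    // limit or the 65535-entry constant pool limit in the generated class.
    wide.show(1)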
Hello all,
I am using the Spark 2.1.0 release.
I am trying to load a BigTable CSV file with more than 1500 columns into our
system.
Our flow for doing that is:
• First, read the data as an RDD
• Generate a sequential record ID using zipWithIndex()
(this operation exists only in the RDD API, not in the
DataFrame/Dataset API; see the sketch after this list)
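A rough sketch of that step, assuming a comma-delimited file at a
hypothetical path and string-typed columns (the split below is naive; a
real CSV parser would handle quoting):

    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

    val spark = SparkSession.builder().appName("zipWithIndex-ids").getOrCreate()

    // Read the raw lines as an RDD (hypothetical path).
    val lines = spark.sparkContext.textFile("/path/to/bigtable.csv")

    // zipWithIndex() assigns a contiguous 0-based index to every record.
    val indexed = lines.zipWithIndex().map { case (line, idx) =>
      Row.fromSeq(idx +: line.split(",", -1).toSeq)  // naive split, no quote handling
    }

    // Schema: the generated ID plus one string column per CSV field
    // (column names assumed).
    val columnCount = 1500
    val schema = StructType(
      StructField("record_id", LongType, nullable = false) +:
      (1 to columnCount).map(i => StructField(s"c$i", StringType, nullable = true))
    )

    val df = spark.createDataFrame(indexed, schema)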