Hi all,
I have 10M (ID, BINARY) records, and the size of each BINARY is 5MB on
average.
In my daily application, I need to select the ~10K BINARY values that match
an ID list.
How should I store the data so that this filtering is faster?
I'm using DataFrames in Spark 2.0.0 and so far I've tried row-based storage.
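
To make the question concrete, here is a minimal sketch of the daily lookup
I have in mind (the Parquet storage, the paths, and the column names are just
placeholder assumptions, not what I actually run):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().appName("id-lookup").getOrCreate()
import spark.implicits._

// the 10M (ID, BINARY) records, stored once (placeholder path)
val data = spark.read.parquet("/data/records")   // columns: ID, BINARY

// the daily list of ~10K IDs
val ids = Seq(1L, 2L, 3L).toDF("ID")

// broadcast the small ID table so the join does not shuffle the 5MB blobs;
// only matching rows' BINARY payloads are materialized
val wanted = data.join(broadcast(ids), "ID")
wanted.write.parquet("/data/daily_subset")       // placeholder output path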
Hi all,
I'm trying to append a column to a DataFrame df.
I understand that the new column must be created by
1) using literals,
2) transforming an existing column of df, or
3) generating it from a UDF over df,
as in the sketch below.
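
For reference, here is how I understand those three options (the column
names "value" and "name" are made up for illustration):

import org.apache.spark.sql.functions.{col, lit, udf}

// 1) a literal value
val withFlag   = df.withColumn("flag", lit(true))

// 2) a transformation of an existing column (assumes a numeric column "value")
val withDouble = df.withColumn("doubled", col("value") * 2)

// 3) a UDF over existing columns (assumes a string column "name")
val toUpper    = udf((s: String) => s.toUpperCase)
val withUpper  = df.withColumn("upper", toUpper(col("name")))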
In my case, though, the column to be appended has to be created by processing
each row, roughly like this:
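
(The compute function below is a placeholder for my real per-row logic; the
new column name "derived" and its StringType are also just examples.)

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField}

// placeholder for the real per-row computation
def compute(row: Row): String = row.mkString("|")

// extend the schema with the new column, append the computed value to each
// row, and rebuild a DataFrame from the resulting RDD[Row]
val newSchema = df.schema.add(StructField("derived", StringType))
val newRows   = df.rdd.map(row => Row.fromSeq(row.toSeq :+ compute(row)))
val result    = spark.createDataFrame(newRows, newSchema)

Is dropping to the RDD API like this the right approach, or is there a
DataFrame-native way to append a row-derived column?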