Hi
In Python, or in Spark generally, you can simply read the files and select
the column you need. I am assuming you are reading each file into a
separate DataFrame and joining them. Instead, you can read all the files
into a single DataFrame and select the one column.
On Wed, Feb 9, 2022 at 2:55 AM
I need to create a single table by selecting one column from each of
thousands of files. The columns are all of the same type and have the same
number of rows and the same row names. I am currently using join, and I get
an OOM error on a mega-mem cluster with 2.8 TB of memory.
Does Spark have something like cbind()? “Take a sequence of