Hi, I want to store binary data (such as images) in a Hive table, but the binary column may be much larger than the other columns in each row, and I'm worried about query performance. One approach I can think of is to keep the binary data apart from the other columns by creating two Hive tables, running two separate Spark queries, and joining the results later.
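To make the two-table idea concrete, here is a rough sketch of what I have in mind. The table and column names (image_meta, image_blob, id, name, payload) are made up for illustration, and the helper functions are hypothetical; only spark.sql() is a real Spark call, and it expects an existing SparkSession with Hive support.

```python
# Sketch only: table and column names below are assumptions for illustration.
META_TABLE = "image_meta"   # small columns that are queried often
BLOB_TABLE = "image_blob"   # large binary column, keyed by the same id


def ddl_statements():
    """Return the CREATE TABLE statements for the two Hive tables."""
    return [
        f"CREATE TABLE IF NOT EXISTS {META_TABLE} "
        f"(id BIGINT, name STRING) STORED AS PARQUET",
        f"CREATE TABLE IF NOT EXISTS {BLOB_TABLE} "
        f"(id BIGINT, payload BINARY) STORED AS PARQUET",
    ]


def join_query(ids):
    """Build a query that fetches the binary payload only for the given ids."""
    if not ids:
        raise ValueError("ids must not be empty")
    # Cast to int so that only numeric literals end up in the SQL text.
    id_list = ", ".join(str(int(i)) for i in ids)
    return (
        f"SELECT m.id, m.name, b.payload "
        f"FROM {META_TABLE} m JOIN {BLOB_TABLE} b ON m.id = b.id "
        f"WHERE m.id IN ({id_list})"
    )


def run(spark, ids):
    """Create both tables if needed, then return the joined DataFrame.

    `spark` is assumed to be a SparkSession created with Hive support.
    """
    for stmt in ddl_statements():
        spark.sql(stmt)
    return spark.sql(join_query(ids))
```

With this layout, queries that only need the small columns would touch image_meta alone, and the join against image_blob would run only for the rows whose binary data is actually required.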
Later, I found that Parquet supports splitting columns into different files, as described here: https://parquet.apache.org/documentation/latest/ Does Spark SQL already support that? If so, how do I use it?

Weide