Hello,

I'm trying to write an RDD[T], where T is a protobuf message, to Parquet in Scala. I'm wondering what the best option for this is, and I'd be interested in your thoughts. So far, I see two possibilities:

- Use the PairRDD method *saveAsNewAPIHadoopFile*; I guess I need to call *ParquetOutputFormat.setWriteSupportClass* and *ProtoParquetOutputFormat.setProtobufClass* beforehand (a rough sketch follows below). But in that case, I'm not sure I have much control over how the files are partitioned into different folders on the file system.
- Or convert the RDD to a DataFrame and then use *write.parquet*; in that case I have more control, since I can rely on *partitionBy* to arrange the files in different folders. But I'm not sure Spark has a built-in way to convert an RDD of protobuf messages to a DataFrame? I would need to rely on this: https://github.com/saurfang/sparksql-protobuf (see the second sketch below).
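For the first option, here is roughly what I have in mind. This is an untested sketch: *MyMessage* is a placeholder for a generated protobuf class, the output path is made up, and the package names may differ depending on the parquet-mr version (older releases use *parquet.** instead of *org.apache.parquet.**):

```scala
import org.apache.hadoop.mapreduce.Job
import org.apache.parquet.hadoop.ParquetOutputFormat
import org.apache.parquet.proto.{ProtoParquetOutputFormat, ProtoWriteSupport}

// MyMessage stands in for the generated protobuf message class.
val job = Job.getInstance(sc.hadoopConfiguration)
ParquetOutputFormat.setWriteSupportClass(job, classOf[ProtoWriteSupport[MyMessage]])
ProtoParquetOutputFormat.setProtobufClass(job, classOf[MyMessage])

rdd
  .map(msg => (null: Void, msg))             // ParquetOutputFormat ignores the key
  .saveAsNewAPIHadoopFile(
    "hdfs:///output/path",                   // everything lands under this one directory
    classOf[Void],
    classOf[MyMessage],
    classOf[ParquetOutputFormat[MyMessage]],
    job.getConfiguration)
```

As far as I can tell, this produces one flat directory of part files, which is why I'm worried about the lack of control over folder layout.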
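For the second option, if there is no built-in conversion, the fallback I'd like to avoid is hand-rolling the schema and row mapping myself. A sketch of what that would look like, assuming a hypothetical message with *getId* and *getCountry* accessors:

```scala
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types._

// Hand-rolled schema for a hypothetical two-field message; a real message
// would need one StructField and one getter call per field.
val schema = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("country", StringType, nullable = true)))

val sqlContext = new SQLContext(sc)
val df = sqlContext.createDataFrame(
  rdd.map(msg => Row(msg.getId, msg.getCountry)),
  schema)

// partitionBy creates one sub-folder per distinct country value.
df.write.partitionBy("country").parquet("hdfs:///output/path")
```

Doing this by hand for every message type is tedious and error-prone, which is why a built-in or library-provided conversion (like sparksql-protobuf) would be preferable.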
What do you think?

Kind regards,
David