Which format is best for SparkSQL ad hoc queries and general data 
storage?

There are lots of specialized cases, but the general workload is accessing some, 
but not all, of the available columns over a reasonable subset of the data.

I am leaning towards Parquet, as it has great support in Spark.

I also have to consider that any file on HDFS may be accessed by other tools such 
as Hive, Impala, and HAWQ.

Suggestions?
—
airis.DATA
Timothy Spann, Senior Solutions Architect
C: 609-250-5894
http://airisdata.com/
http://meetup.com/nj-datascience

