Which file format is best for Spark SQL ad hoc queries and general data storage?
There are lots of specialized cases, but the typical query reads some, but not all, of the available columns over a reasonable subset of the rows. I am leaning towards Parquet, as it has great support in Spark. I also have to consider that any file on HDFS may be accessed from other tools such as Hive, Impala, and HAWQ. Suggestions?

airis.DATA
Timothy Spann, Senior Solutions Architect
C: 609-250-5894
http://airisdata.com/
http://meetup.com/nj-datascience
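
P.S. To make the access pattern concrete, here is a rough toy sketch in plain Python of why a columnar layout looks attractive for "a few columns, some of the rows" queries. It is only an illustration under stated assumptions: the table, column names, and both scan functions are hypothetical, and it does not model Parquet's real encoding, compression, or row-group statistics. It only counts how many stored values each layout would have to touch.

```python
# Toy model only: counts stored values touched by a scan under a
# row-oriented vs. a column-oriented layout. Not how Parquet is implemented.

# Hypothetical sample table: 4 rows, 5 columns.
COLUMNS = ["id", "name", "city", "amount", "ts"]
ROWS = [
    (1, "ann", "NYC", 10.0, "2016-01-01"),
    (2, "bob", "SF", 25.5, "2016-01-02"),
    (3, "cat", "NYC", 7.25, "2016-01-03"),
    (4, "dan", "LA", 40.0, "2016-01-04"),
]

# The same data pivoted into one list per column.
COLUMN_DATA = {c: [row[i] for row in ROWS] for i, c in enumerate(COLUMNS)}


def scan_row_layout(rows, columns, wanted, filter_col, filter_value):
    """Row layout: each whole record is read, then filtered and projected."""
    wanted_idx = [columns.index(c) for c in wanted]
    filter_idx = columns.index(filter_col)
    touched = 0
    result = []
    for row in rows:
        touched += len(row)  # every field of the record is read
        if row[filter_idx] == filter_value:
            result.append(tuple(row[i] for i in wanted_idx))
    return result, touched


def scan_column_layout(column_data, wanted, filter_col, filter_value):
    """Column layout: only the projected and filter columns are read."""
    needed = list(dict.fromkeys(list(wanted) + [filter_col]))  # de-duplicated
    touched = sum(len(column_data[c]) for c in needed)
    keep = [i for i, v in enumerate(column_data[filter_col]) if v == filter_value]
    result = [tuple(column_data[c][i] for c in wanted) for i in keep]
    return result, touched


# Roughly: SELECT id, amount FROM t WHERE city = 'NYC'
row_result, row_touched = scan_row_layout(ROWS, COLUMNS, ["id", "amount"], "city", "NYC")
col_result, col_touched = scan_column_layout(COLUMN_DATA, ["id", "amount"], "city", "NYC")

print(row_result, row_touched)
print(col_result, col_touched)
```

Both scans return the same rows, but in this toy model the columnar scan touches 12 stored values against 20 for the row scan, and the gap widens as the table gets wider. Whether that translates into real gains across Spark, Hive, Impala, and HAWQ is exactly what I am asking about.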