Re: Best Practice for Enum in Spark SQL

2017-05-12 Thread Anastasios Zouzias
Hi Mike, FYI: If you are using Spark 2.x, you might have issues with encoders if you use a case class with an Enumeration-type field (see https://issues.apache.org/jira/browse/SPARK-17248). For (1) and (2), I would guess Int would be better (space-wise), but I am not familiar with Parquet's internals.
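
To make the two options concrete, here is a minimal sketch in plain Python (standard library only, no Spark involved) of converting an enum to a plain string or a plain integer before it is written, and back again after it is read. The `VehicleType` class, its numeric codes, and the helper function names are all hypothetical and chosen only for illustration; the thread does not specify any of them.

```python
from enum import Enum


class VehicleType(Enum):
    # Hypothetical numeric codes; the thread does not define any.
    CAR = 0
    SUV = 1
    WAGON = 2


def to_string(v: VehicleType) -> str:
    """Encode as the member name, which stays readable in any SQL tool."""
    return v.name


def to_code(v: VehicleType) -> int:
    """Encode as the integer code, which needs a lookup to interpret."""
    return v.value


def from_string(s: str) -> VehicleType:
    """Decode a stored name; raises KeyError for an unknown name."""
    return VehicleType[s]


def from_code(c: int) -> VehicleType:
    """Decode a stored code; raises ValueError for an unknown code."""
    return VehicleType(c)


# Convert at the boundary so that only plain values reach the storage layer.
rows = [VehicleType.CAR, VehicleType.SUV, VehicleType.WAGON]
as_strings = [to_string(v) for v in rows]
as_codes = [to_code(v) for v in rows]
```

Either way, the stored column is an ordinary string or integer, so the record type itself carries no enum field. The trade-off is that the integer form only makes sense to readers that share the same code table, while the string form is self-describing.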

Best Practice for Enum in Spark SQL

2017-05-11 Thread Mike Wheeler
Hi Spark Users, I want to store an Enum type (such as Vehicle Type: Car, SUV, Wagon) in my data. My storage format will be Parquet and I need to access the data from Spark-shell, Spark SQL CLI, and Hive. My questions: 1) Should I store my Enum type as String or store it as numeric encoding (aka