Hi Mike,

FYI: If you are using Spark 2.x, you might have issues with encoders if you use a case class with an Enumeration-typed field; see https://issues.apache.org/jira/browse/SPARK-17248
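
One possible way around the encoder issue is to keep the Enumeration out of the case class that Spark sees and convert at the boundary. The following is only a minimal sketch in plain Scala (no Spark dependency), not something taken from the JIRA ticket. All names in it (VehicleType, VehicleRow, Vehicle, VehicleCodec) are hypothetical and chosen to match the Car/SUV/Wagon example in the question.

```scala
// Hypothetical enum matching the example in the question.
object VehicleType extends Enumeration {
  val Car, SUV, Wagon = Value
}

// Storage-facing row: the enum is held as a plain String, so the case class
// contains only field types that Spark's built-in encoders support.
final case class VehicleRow(id: Long, vehicleType: String)

// Domain-facing class that keeps the Enumeration type for application code.
final case class Vehicle(id: Long, vehicleType: VehicleType.Value)

object VehicleCodec {
  // Enumeration values print as their declared names ("Car", "SUV", "Wagon").
  def toRow(v: Vehicle): VehicleRow =
    VehicleRow(v.id, v.vehicleType.toString)

  // withName throws NoSuchElementException if the label is not a known value.
  def fromRow(r: VehicleRow): Vehicle =
    Vehicle(r.id, VehicleType.withName(r.vehicleType))
}
```

Under this assumption, only VehicleRow would be used as the Dataset element type; swapping the String field for an Int (via `vehicleType.id` and `VehicleType(id)`) would follow the same pattern for the numeric encoding.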

For (1), (2), I would guess Int would be better (space-wise), but I am not familiar with parquet's internals.

Best,
Anastasios

On Fri, May 12, 2017 at 5:07 AM, Mike Wheeler <rotationsymmetr...@gmail.com> wrote:

> Hi Spark Users,
>
> I want to store Enum type (such as Vehicle Type: Car, SUV, Wagon) in my
> data. My storage format will be parquet and I need to access the data from
> Spark-shell, Spark SQL CLI, and hive. My questions:
>
> 1) Should I store my Enum type as String or store it as numeric encoding
> (aka 1=Car, 2=SUV, 3=Wagon)?
>
> 2) If I choose String, any penalty in hard drive space or memory?
>
> Thank you!
>
> Mike
>

--
Anastasios Zouzias <a...@zurich.ibm.com>