Hi Mike,

FYI: If you are using Spark 2.x, you might run into encoder issues if you
use a case class with an Enumeration-typed field; see
https://issues.apache.org/jira/browse/SPARK-17248

For (1) and (2), I would guess Int would be better (space-wise), but I am
not familiar with parquet's internals.
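
One way to sidestep the encoder problem might be to keep the Enumeration out
of the case class and store only its name or its id. Below is a minimal
sketch under that assumption. All names (VehicleType, VehicleRowString,
VehicleRowInt, VehicleCodec) are made up for illustration, and it is plain
Scala with no Spark calls, so it only shows the conversion, not the write to
parquet.

```scala
// Hypothetical names throughout; none of them come from Spark or this thread.
object VehicleType extends Enumeration {
  // Explicit ids so the numeric encoding matches 1=Car, 2=SUV, 3=Wagon.
  val Car   = Value(1, "Car")
  val SUV   = Value(2, "SUV")
  val Wagon = Value(3, "Wagon")
}

// Row types that carry only primitive fields instead of an Enumeration value.
final case class VehicleRowString(id: Long, vehicleType: String)
final case class VehicleRowInt(id: Long, vehicleType: Int)

object VehicleCodec {
  // Option 1: store the enum as its name.
  def toStringRow(id: Long, v: VehicleType.Value): VehicleRowString =
    VehicleRowString(id, v.toString)

  // Option 2: store the enum as its numeric id.
  def toIntRow(id: Long, v: VehicleType.Value): VehicleRowInt =
    VehicleRowInt(id, v.id)

  // Convert back when reading; both throw NoSuchElementException on unknown input.
  def fromString(s: String): VehicleType.Value = VehicleType.withName(s)
  def fromInt(i: Int): VehicleType.Value = VehicleType(i)
}
```

With the String variant the stored values stay readable from Spark SQL and
hive without a lookup table; with the Int variant you would need to keep the
id-to-name mapping somewhere those tools can reach it.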

Best,
Anastasios

On Fri, May 12, 2017 at 5:07 AM, Mike Wheeler <rotationsymmetr...@gmail.com>
wrote:

> Hi Spark Users,
>
> I want to store Enum type (such as Vehicle Type: Car, SUV, Wagon)  in my
> data. My storage format will be parquet and I need to access the data from
> Spark-shell, Spark SQL CLI, and hive. My questions:
>
> 1) Should I store my Enum type as String or store it as numeric encoding
> (aka 1=Car, 2=SUV, 3=Wagon)?
>
> 2) If I choose String, any penalty in hard drive space or memory?
>
> Thank you!
>
> Mike
>



-- 
-- Anastasios Zouzias
<a...@zurich.ibm.com>
