Performance of using map column in schema

Ryan LeCompte Sun, 11 Oct 2009 04:20:42 -0700

Hello all,

I was wondering whether there is any performance penalty in using a
map<string,string> column in a Hive schema to represent a line of an Apache
log. New parameters are frequently added to our Apache log lines, and it
would be nice not to have to explicitly define a new typed column in the
Hive table schema each time. If we could specify a single column of
map<string,string> that held all of the key=value parameter pairs of the
log line, then we could write ad-hoc queries that reference whichever log
parameters we wanted. However, it seems that Hive wants typed columns for
each parameter to perform well. Any thoughts?
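
To make the idea concrete, here is a rough Python sketch of the data model
only, not of Hive itself. It assumes each log line carries its parameters as
key=value pairs joined by "&"; the separators, the sample lines, and the
names parse_params and lookup are all hypothetical. The point is that a new
parameter such as "ref" can be looked up without any schema change, and rows
that lack it simply yield no value.

```python
from typing import Dict, Optional


def parse_params(line: str, pair_sep: str = "&", kv_sep: str = "=") -> Dict[str, str]:
    """Split 'k1=v1&k2=v2' into a dict, like one map<string,string> cell.

    Tokens that have no key/value separator, or an empty key, are skipped.
    """
    params: Dict[str, str] = {}
    for token in line.strip().split(pair_sep):
        key, sep, value = token.partition(kv_sep)
        if sep and key:
            params[key] = value
    return params


def lookup(params: Dict[str, str], key: str) -> Optional[str]:
    """Return the value for key, or None when the row has no such parameter."""
    return params.get(key)


# Hypothetical sample rows: the second one carries a newly added parameter.
rows = [
    parse_params("uid=42&page=/home"),
    parse_params("uid=7&page=/cart&ref=email"),
]

# Ad-hoc "query" over a parameter that only some rows contain.
refs = [lookup(row, "ref") for row in rows]
```

This says nothing about the performance question, which depends on how Hive
deserializes and scans the map column; it only shows the flexibility that
motivates the single-column layout.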


Thanks,
Ryan
