Hi,
Is Hive capable of indexing the data and storing it in a way optimized for
querying (as a columnar database does: bitmap indexes, compression, etc.)?
I need decent response times for queries (up to a few seconds) over huge
amounts of analytical data. Is that achievable with an appropriate number of
machines in the cluster? I saw that table serialization/deserialization is
pluggable. Is that the way to make the storage more efficient? Is there any
existing implementation (either finished or in progress) targeted at this?
Or do you have any hints on what I should look at among the things currently
available in Hive/Hadoop?
Thanks,
Martin