I'd honestly like to see Hive remain a partitioned flat-file store. I
don't think indexing what's inside the files is all that useful
in most situations where you'd use Hive. I also think this kind of
store is just the right fit for Hadoop and large-scale analytics.
I don't want to see Hive go toward HBase or Katta. What is
the long-term vision for Hive?
Josh
On Dec 14, 2008, at 1:06 PM, Joydeep Sen Sarma wrote:
We have done some preliminary work with indexing – but that’s not
the focus right now and no code is available in the open source
trunk for this purpose. I think it’s fair to say that Hive is not
optimized for online processing right now (and we are still quite a
way off from columnar storage).
From: Martin Matula [mailto:[email protected]]
Sent: Sunday, December 14, 2008 6:54 AM
To: [email protected]
Subject: OLAP with Hive
Hi,
Is Hive capable of indexing the data and storing it in a way
optimized for querying (like a columnar database - bitmap indexes,
compression, etc.)?
I need to be able to get decent response times for queries (up to a
few seconds) over huge amounts of analytical data. Is that
achievable (with an appropriate number of machines in a cluster)? I saw
that the serialization/deserialization of tables is pluggable. Is that
the way to make the storage more efficient? Any existing
implementation (either ready or in progress) that would be targeted
at this? Or any hints on what I may want to take a look at among the
things that are currently available in Hive/Hadoop?
Thanks,
Martin