On Mon, Jul 3, 2017 at 10:17 AM, Dain Sundstrom <d...@iq80.com> wrote:

> +1
>
> I work on Presto and I think this is the right direction for our users. We
> have several users running Presto without Hive, and anything we can do to
> help simplify the Metastore experience would be a big help.
>
> When I read proposals like this, one thing I like to see is a vision
> (scope) for the project. In this case, I'd like to understand if the plan
> is to limit the scope of the system to what Hive can support. For example,
> the system will clearly support schemas (databases) with tables and views
> as defined by Hive, but will there be support for additional types like a
> Presto view, which is incompatible with Hive views due to the language
> differences? Currently, in Presto we create a Hive view to reserve a spot
> in the "tables namespace", and then we put our view data in the table
> properties. I would like to formalize this kind of system, so if a Hive
> user queries a Presto view, they get a proper error message. I have similar
> concerns about data types, compression, and data organization (e.g.,
> different bucketing strategies).
>

We tried to lay out the scope in the wiki page [1]. Details will need to be worked out by the new project, but I'll give you my view on it. I don't see the value of breaking this out of Hive if it isn't willing to take non-Hive features. If it's still Hive-only in its focus, why pay the cost of having separate projects? So, as long as Presto-style views don't break Hive-style views or make the system horribly complicated, and someone is willing to add them, +1.

A related area that we will need to work out is the metastore's connection to the Hive physical layout. Today, when a user says "create table", the metastore creates a directory in HDFS. This ties the metastore to a Hive-style data layout. How should that be handled going forward?

We could assert that having a standard data layout is good, and that all users of this metadata system should use this layout.
We could make the physical operations pluggable, providing the Hive-style operations as an option but allowing users to bring others. We could completely remove the physical operations, leave them all in Hive, and say that any system using this should do its own physical operations. I don't like the last option because it makes it hard to share data across tools, but I can think of pro and con arguments for the first two.

> Another aspect of this is what the vision is for the specification of the
> Metastore. Is the vision to have a very open, end-user-extensible design
> (e.g., just a name and a bag of properties), or is the vision to have a
> project-specified common set of properties with "rules" for proper
> extension?
>

Again, just my opinion, but I would say the latter. The utility of a name and a bag of properties turns out to be pretty limited, and it is pretty easy to implement if that's all you want. The current metastore can do a lot more than that.

>
> I would also be very interested in documentation for the Metastore APIs
> (and can help). We currently reverse-engineer proper metastore interaction
> by reading the Hive code and writing a lot of experimental programs, and I
> would really just like to know the "right way". Also, we end up missing
> out on new features in the Metastore due to the work required to understand
> how they work.
>

+1 to better documentation regardless of where the metastore code lives.

Alan.

[1] https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal