[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms
[ https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994987#comment-12994987 ] Anja Gruenheid commented on HIVE-1940: -- I tried figuring out datanucleus and the creation of the initial metastore model, but I don't quite understand it: When I create the metastore in MySQL, I generate the jars by running ant model-jar in the hive/metastore folder. When I then run hive, metastore tables are generated according the command that I use (eg show tables) in MySQL. I referenced org.datanucleus.store.rdbms.SchemaTool instead of jpox before generating the jar file, but it didn't change anything. Basically, there has to be an overview of all metastore tables that can possibly be invoked. My question is: where? Thanks a lot for your help! > Query Optimization Using Column Metadata and Histograms > --- > > Key: HIVE-1940 > URL: https://issues.apache.org/jira/browse/HIVE-1940 > Project: Hive > Issue Type: New Feature > Components: Metastore, Query Processor >Reporter: Anja Gruenheid > Attachments: HiveMetaStore.pdf > > > The current basis for cost-based query optimization in Hive is information > gathered on tables and partitions. To make further improvements in query > optimization possible, the next step is to develop and implement > possibilities to gather information on columns as discussed in issue HIVE-33. > After that, an implementation of histograms is a possible option to use and > collect run-time statistics. Next to the actual implementation of these > features, it is also necessary to develop a consistent storage model for the > MetaStore. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1940) Query Optimization Using Column Metadata and Histograms
[ https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anja Gruenheid updated HIVE-1940: - Attachment: HiveMetaStore.pdf Hive MetaStore Model - 02/05/2011 > Query Optimization Using Column Metadata and Histograms > --- > > Key: HIVE-1940 > URL: https://issues.apache.org/jira/browse/HIVE-1940 > Project: Hive > Issue Type: New Feature > Components: Metastore, Query Processor >Reporter: Anja Gruenheid > Attachments: HiveMetaStore.pdf > > > The current basis for cost-based query optimization in Hive is information > gathered on tables and partitions. To make further improvements in query > optimization possible, the next step is to develop and implement > possibilities to gather information on columns as discussed in issue HIVE-33. > After that, an implementation of histograms is a possible option to use and > collect run-time statistics. Next to the actual implementation of these > features, it is also necessary to develop a consistent storage model for the > MetaStore. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms
[ https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991022#comment-12991022 ] Anja Gruenheid commented on HIVE-1940: -- Here is the metastore model that I generated with MySQL Workbench: http://home.in.tum.de/~gruenhei/HiveMetaStore.pdf Comparing this model to the one displayed in the index wiki, I noticed the two tables PARTITIONS and PARTITION_KEY_VALS are missing in my model. Do you have any idea how I can create them? I tried adding partitions on tables, but that just created entries in table PARTITION_KEYS. > Query Optimization Using Column Metadata and Histograms > --- > > Key: HIVE-1940 > URL: https://issues.apache.org/jira/browse/HIVE-1940 > Project: Hive > Issue Type: New Feature > Components: Metastore, Query Processor >Reporter: Anja Gruenheid > > The current basis for cost-based query optimization in Hive is information > gathered on tables and partitions. To make further improvements in query > optimization possible, the next step is to develop and implement > possibilities to gather information on columns as discussed in issue HIVE-33. > After that, an implementation of histograms is a possible option to use and > collect run-time statistics. Next to the actual implementation of these > features, it is also necessary to develop a consistent storage model for the > MetaStore. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms
[ https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990910#comment-12990910 ] Anja Gruenheid commented on HIVE-1940: -- I found out that the IDXS metastore tables are generated when I create an index for the first time. > Query Optimization Using Column Metadata and Histograms > --- > > Key: HIVE-1940 > URL: https://issues.apache.org/jira/browse/HIVE-1940 > Project: Hive > Issue Type: New Feature > Components: Metastore, Query Processor >Reporter: Anja Gruenheid > > The current basis for cost-based query optimization in Hive is information > gathered on tables and partitions. To make further improvements in query > optimization possible, the next step is to develop and implement > possibilities to gather information on columns as discussed in issue HIVE-33. > After that, an implementation of histograms is a possible option to use and > collect run-time statistics. Next to the actual implementation of these > features, it is also necessary to develop a consistent storage model for the > MetaStore. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms
[ https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990868#comment-12990868 ] Anja Gruenheid commented on HIVE-1940: -- I created the metastore as you suggested, but I'm missing a couple of tables like IDXS. I used MySQL as local database and adjusted the parameters accordingly. When I create tables, I can see them in the metastore via MySQL, so it definitely is working. > Query Optimization Using Column Metadata and Histograms > --- > > Key: HIVE-1940 > URL: https://issues.apache.org/jira/browse/HIVE-1940 > Project: Hive > Issue Type: New Feature > Components: Metastore, Query Processor >Reporter: Anja Gruenheid > > The current basis for cost-based query optimization in Hive is information > gathered on tables and partitions. To make further improvements in query > optimization possible, the next step is to develop and implement > possibilities to gather information on columns as discussed in issue HIVE-33. > After that, an implementation of histograms is a possible option to use and > collect run-time statistics. Next to the actual implementation of these > features, it is also necessary to develop a consistent storage model for the > MetaStore. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms
[ https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990331#comment-12990331 ] Anja Gruenheid commented on HIVE-1940: -- I have set up the last stable version, but as far as I understood, some features have been added during the current iteration, which also have had impact on the design of the MetaStore. Is there an up-to-date overview of the MetaStore somewhere or should I retrace the updates that have been made since the last release? If I can collect all the data that I need, I'll create the model. > Query Optimization Using Column Metadata and Histograms > --- > > Key: HIVE-1940 > URL: https://issues.apache.org/jira/browse/HIVE-1940 > Project: Hive > Issue Type: New Feature > Components: Metastore, Query Processor >Reporter: Anja Gruenheid > > The current basis for cost-based query optimization in Hive is information > gathered on tables and partitions. To make further improvements in query > optimization possible, the next step is to develop and implement > possibilities to gather information on columns as discussed in issue HIVE-33. > After that, an implementation of histograms is a possible option to use and > collect run-time statistics. Next to the actual implementation of these > features, it is also necessary to develop a consistent storage model for the > MetaStore. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms
[ https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989494#comment-12989494 ] Anja Gruenheid commented on HIVE-1940: -- As first step, I would like to take a closer look at collecting meta data on the column level. In issue HIVE-33, five different statistics are described (# distinct values, # null values, 3 min values, 3 max values, avg size of column) that have been proposed as column meta data. As reference, I would take the implementation of the table/partition meta data collection. As far as I can tell, deriving histograms is a little bit more complex than obtaining column information, which is why I want to start out with that. Is there an up-to-date MetaStore DDL script or an E/R model? > Query Optimization Using Column Metadata and Histograms > --- > > Key: HIVE-1940 > URL: https://issues.apache.org/jira/browse/HIVE-1940 > Project: Hive > Issue Type: New Feature > Components: Metastore, Query Processor >Reporter: Anja Gruenheid > > The current basis for cost-based query optimization in Hive is information > gathered on tables and partitions. To make further improvements in query > optimization possible, the next step is to develop and implement > possibilities to gather information on columns as discussed in issue HIVE-33. > After that, an implementation of histograms is a possible option to use and > collect run-time statistics. Next to the actual implementation of these > features, it is also necessary to develop a consistent storage model for the > MetaStore. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (HIVE-1940) Query Optimization Using Column Metadata and Histograms
Query Optimization Using Column Metadata and Histograms --- Key: HIVE-1940 URL: https://issues.apache.org/jira/browse/HIVE-1940 Project: Hive Issue Type: New Feature Components: Metastore, Query Processor Reporter: Anja Gruenheid The current basis for cost-based query optimization in Hive is information gathered on tables and partitions. To make further improvements in query optimization possible, the next step is to develop and implement possibilities to gather information on columns as discussed in issue HIVE-33. After that, an implementation of histograms is a possible option to use and collect run-time statistics. Next to the actual implementation of these features, it is also necessary to develop a consistent storage model for the MetaStore. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira