[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms

2011-02-05 Thread Anja Gruenheid (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12991022#comment-12991022
 ] 

Anja Gruenheid commented on HIVE-1940:
--

Here is the metastore model that I generated with MySQL Workbench:
http://home.in.tum.de/~gruenhei/HiveMetaStore.pdf
Comparing this model to the one displayed in the index wiki, I noticed that the
two tables PARTITIONS and PARTITION_KEY_VALS are missing from my model. Do you
have any idea how I can create them? I tried adding partitions to tables, but
that only created entries in the PARTITION_KEYS table.

 Query Optimization Using Column Metadata and Histograms
 ---

 Key: HIVE-1940
 URL: https://issues.apache.org/jira/browse/HIVE-1940
 Project: Hive
  Issue Type: New Feature
  Components: Metastore, Query Processor
Reporter: Anja Gruenheid

 The current basis for cost-based query optimization in Hive is information
 gathered on tables and partitions. To make further improvements in query
 optimization possible, the next step is to develop and implement ways to
 gather information on columns, as discussed in issue HIVE-33. After that,
 implementing histograms is a possible option for collecting and using
 run-time statistics. In addition to the actual implementation of these
 features, it is also necessary to develop a consistent storage model for the
 MetaStore.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms

2011-02-04 Thread Anja Gruenheid (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12990868#comment-12990868
 ] 

Anja Gruenheid commented on HIVE-1940:
--

I created the metastore as you suggested, but a couple of tables, such as
IDXS, are missing. I used MySQL as the local database and adjusted the
parameters accordingly.
When I create tables, I can see them in the metastore via MySQL, so it is
definitely working.





[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms

2011-02-04 Thread Anja Gruenheid (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12990910#comment-12990910
 ] 

Anja Gruenheid commented on HIVE-1940:
--

I found out that the IDXS metastore tables are generated when I create an index 
for the first time.





[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms

2011-02-03 Thread Anja Gruenheid (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12990331#comment-12990331
 ] 

Anja Gruenheid commented on HIVE-1940:
--

I have set up the last stable version, but as far as I understand, some
features have been added during the current iteration that also affect the
design of the MetaStore. Is there an up-to-date overview of the MetaStore
somewhere, or should I retrace the updates that have been made since the last
release?

If I can collect all the data that I need, I'll create the model.





[jira] Created: (HIVE-1940) Query Optimization Using Column Metadata and Histograms

2011-02-01 Thread Anja Gruenheid (JIRA)
Query Optimization Using Column Metadata and Histograms
---

 Key: HIVE-1940
 URL: https://issues.apache.org/jira/browse/HIVE-1940
 Project: Hive
  Issue Type: New Feature
  Components: Metastore, Query Processor
Reporter: Anja Gruenheid


The current basis for cost-based query optimization in Hive is information
gathered on tables and partitions. To make further improvements in query
optimization possible, the next step is to develop and implement ways to
gather information on columns, as discussed in issue HIVE-33. After that,
implementing histograms is a possible option for collecting and using run-time
statistics. In addition to the actual implementation of these features, it is
also necessary to develop a consistent storage model for the MetaStore.





[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms

2011-02-01 Thread Anja Gruenheid (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12989494#comment-12989494
 ] 

Anja Gruenheid commented on HIVE-1940:
--

As a first step, I would like to take a closer look at collecting metadata at
the column level. In issue HIVE-33, five different statistics are described (#
distinct values, # null values, 3 min values, 3 max values, avg size of column)
that have been proposed as column metadata. As a reference, I would use the
implementation of the table/partition metadata collection.
As far as I can tell, deriving histograms is somewhat more complex than
obtaining column information, which is why I want to start with the column
information.
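
To make the five statistics concrete, here is a minimal sketch in Python of how
they could be computed for a single in-memory column. It is not Hive code and
not the HIVE-33 design: the function name column_stats, the result keys, and
the choice to measure "size" as the length of the value's string form are all
assumptions made for illustration, as is reading "3 min values" as the three
smallest distinct values.

```python
from typing import Any, Dict, Optional, Sequence


def column_stats(values: Sequence[Optional[Any]], k: int = 3) -> Dict[str, Any]:
    """Compute five per-column statistics for one column of values.

    None stands in for SQL NULL. All names here are hypothetical.
    """
    non_null = [v for v in values if v is not None]
    # Sorted distinct values give both the distinct count and the extremes.
    distinct = sorted(set(non_null))
    # Assumption: "size" is the length of the value's string representation.
    sizes = [len(str(v)) for v in non_null]
    return {
        "num_distinct": len(distinct),
        "num_nulls": len(values) - len(non_null),
        "min_values": distinct[:k],
        # Largest k distinct values, largest first.
        "max_values": distinct[max(len(distinct) - k, 0):][::-1],
        "avg_size": sum(sizes) / len(sizes) if sizes else 0.0,
    }


if __name__ == "__main__":
    print(column_stats([5, 1, None, 3, 3, 9, None, 7]))
```

A real implementation would have to compute these values in a distributed
fashion during a table scan, where the exact distinct count in particular is
expensive; the sketch only shows what each statistic means.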

Is there an up-to-date MetaStore DDL script or an E/R model?
