[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms

2011-02-15 Thread Anja Gruenheid (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994987#comment-12994987
 ] 

Anja Gruenheid commented on HIVE-1940:
--

I tried to figure out DataNucleus and how the initial metastore model is 
created, but I don't quite understand it:

When I create the metastore in MySQL, I generate the jars by running ant 
model-jar in the hive/metastore folder. When I then run Hive, the metastore 
tables are generated in MySQL according to the command that I use (e.g. show 
tables). I referenced org.datanucleus.store.rdbms.SchemaTool instead of jpox 
before generating the jar file, but it didn't change anything. Basically, there 
has to be an overview of all metastore tables that can possibly be created. My 
question is: where?

Thanks a lot for your help!
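
One way to get such an overview, assuming the table mapping is declared in a 
JDO metadata XML file that DataNucleus reads, is to list every table 
attribute in that file. The sketch below is only an illustration: the 
SAMPLE_JDO excerpt, the package name, and the helper mapped_tables are 
hypothetical and are not copied from the Hive source tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt in the style of a JDO mapping file. The class, field,
# and package names are illustrative assumptions, not copied from Hive.
SAMPLE_JDO = """
<jdo>
  <package name="org.example.model">
    <class name="MTable" table="TBLS">
      <field name="partitionKeys" table="PARTITION_KEYS"/>
    </class>
    <class name="MIndex" table="IDXS"/>
  </package>
</jdo>
"""


def mapped_tables(jdo_xml: str) -> list:
    """Return the sorted, de-duplicated table names found in a JDO mapping."""
    root = ET.fromstring(jdo_xml)
    # Any element carrying a table attribute names an RDBMS table, whether
    # the element is a class or a collection-valued field.
    return sorted({el.get("table") for el in root.iter() if el.get("table")})


print(mapped_tables(SAMPLE_JDO))
```

Reading the mapping directly would list tables that only appear in MySQL 
after the first statement that touches them, which matches the lazy table 
creation described above.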

> Query Optimization Using Column Metadata and Histograms
> ---
>
> Key: HIVE-1940
> URL: https://issues.apache.org/jira/browse/HIVE-1940
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Anja Gruenheid
> Attachments: HiveMetaStore.pdf
>
>
> The current basis for cost-based query optimization in Hive is information 
> gathered on tables and partitions. To make further improvements in query 
> optimization possible, the next step is to develop and implement 
> possibilities to gather information on columns as discussed in issue HIVE-33. 
> After that, an implementation of histograms is a possible option to use and 
> collect run-time statistics. Next to the actual implementation of these 
> features, it is also necessary to develop a consistent storage model for the 
> MetaStore.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HIVE-1940) Query Optimization Using Column Metadata and Histograms

2011-02-15 Thread Anja Gruenheid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anja Gruenheid updated HIVE-1940:
-

Attachment: HiveMetaStore.pdf

Hive MetaStore Model - 02/05/2011





[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms

2011-02-05 Thread Anja Gruenheid (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991022#comment-12991022
 ] 

Anja Gruenheid commented on HIVE-1940:
--

Here is the metastore model that I generated with MySQL Workbench: 
http://home.in.tum.de/~gruenhei/HiveMetaStore.pdf
Comparing this model to the one displayed in the index wiki, I noticed that 
the two tables PARTITIONS and PARTITION_KEY_VALS are missing from my model. Do 
you have any idea how I can create them? I tried adding partitions to tables, 
but that just created entries in the table PARTITION_KEYS.





[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms

2011-02-04 Thread Anja Gruenheid (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990910#comment-12990910
 ] 

Anja Gruenheid commented on HIVE-1940:
--

I found out that the IDXS metastore tables are generated when I create an index 
for the first time.





[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms

2011-02-04 Thread Anja Gruenheid (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990868#comment-12990868
 ] 

Anja Gruenheid commented on HIVE-1940:
--

I created the metastore as you suggested, but I'm missing a couple of tables 
like IDXS. I used MySQL as the local database and adjusted the parameters 
accordingly.
When I create tables, I can see them in the metastore via MySQL, so it is 
definitely working.





[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms

2011-02-03 Thread Anja Gruenheid (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990331#comment-12990331
 ] 

Anja Gruenheid commented on HIVE-1940:
--

I have set up the latest stable version, but as far as I understand, some 
features have been added during the current iteration that have also had an 
impact on the design of the MetaStore. Is there an up-to-date overview of the 
MetaStore somewhere, or should I retrace the updates that have been made since 
the last release?

If I can collect all the data that I need, I'll create the model.





[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms

2011-02-01 Thread Anja Gruenheid (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989494#comment-12989494
 ] 

Anja Gruenheid commented on HIVE-1940:
--

As a first step, I would like to take a closer look at collecting metadata at 
the column level. Issue HIVE-33 describes five statistics that have been 
proposed as column metadata: number of distinct values, number of null values, 
the 3 minimum values, the 3 maximum values, and average column size. As a 
reference, I would use the implementation of the table/partition metadata 
collection.
As far as I can tell, deriving histograms is somewhat more complex than 
obtaining column information, which is why I want to start with the column 
statistics.

Is there an up-to-date MetaStore DDL script or an E/R model?
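
To make the five statistics from HIVE-33 concrete, here is a minimal sketch 
that computes them for a single in-memory column of strings. It is an 
illustration under stated assumptions, not the Hive implementation: the 
function name column_stats is hypothetical, "size" is taken to mean string 
length, and the 3 min/max values are taken over distinct non-null values.

```python
import heapq
from typing import Optional, Sequence


def column_stats(values: Sequence[Optional[str]], k: int = 3) -> dict:
    """Compute the five column statistics for one column of string values."""
    non_null = [v for v in values if v is not None]
    distinct = set(non_null)
    return {
        "distinct": len(distinct),
        "nulls": len(values) - len(non_null),
        # Assumption: min/max are the k smallest/largest distinct values.
        "min_values": heapq.nsmallest(k, distinct),
        "max_values": heapq.nlargest(k, distinct),
        # Assumption: size is the character length of each non-null value.
        "avg_size": (
            sum(len(v) for v in non_null) / len(non_null) if non_null else 0.0
        ),
    }


print(column_stats(["b", "a", None, "ccc", "a"]))
```

All five values can be gathered in one pass over the data, which is why they 
look like a simpler starting point than histograms.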





[jira] Created: (HIVE-1940) Query Optimization Using Column Metadata and Histograms

2011-02-01 Thread Anja Gruenheid (JIRA)
Query Optimization Using Column Metadata and Histograms
---

 Key: HIVE-1940
 URL: https://issues.apache.org/jira/browse/HIVE-1940
 Project: Hive
  Issue Type: New Feature
  Components: Metastore, Query Processor
Reporter: Anja Gruenheid


The current basis for cost-based query optimization in Hive is information 
gathered on tables and partitions. To make further improvements in query 
optimization possible, the next step is to develop and implement possibilities 
to gather information on columns as discussed in issue HIVE-33. After that, an 
implementation of histograms is a possible option to use and collect run-time 
statistics. Next to the actual implementation of these features, it is also 
necessary to develop a consistent storage model for the MetaStore.
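
As an illustration of what a column histogram could look like, the sketch 
below builds an equi-width histogram over numeric values. The issue does not 
say which histogram type is intended, so equi-width is an assumption chosen 
for brevity, and the function name equi_width_histogram is hypothetical.

```python
def equi_width_histogram(values, buckets):
    """Return (lowest value, bucket width, per-bucket counts)."""
    lo, hi = min(values), max(values)
    # Fall back to width 1.0 when all values are equal, to avoid dividing
    # by zero below.
    width = (hi - lo) / buckets or 1.0
    counts = [0] * buckets
    for v in values:
        # The maximum value would land one past the end, so clamp it into
        # the last bucket.
        idx = min(int((v - lo) / width), buckets - 1)
        counts[idx] += 1
    return lo, width, counts


print(equi_width_histogram([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3))
```

The bucket counts are what would have to be persisted per column, which is 
one reason the storage model for the MetaStore needs to be settled first.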
