[jira] Commented: (HIVE-1361) table/partition level statistics

2010-09-24 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914728#action_12914728
 ] 

He Yongqiang commented on HIVE-1361:


+1 running tests.

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
> HIVE-1361.3.patch, HIVE-1361.4.java_only.patch, HIVE-1361.4.patch, 
> HIVE-1361.5.java_only.patch, HIVE-1361.5.patch, HIVE-1361.java_only.patch, 
> HIVE-1361.patch, stats0.patch
>
>
> At the first step, we gather table-level stats for non-partitioned table and 
> partition-level stats for partitioned table. Future work could extend the 
> table level stats to partitioned table as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1361) table/partition level statistics

2010-09-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913783#action_12913783
 ] 

Namit Jain commented on HIVE-1361:
--

Ning, the latest patch contains the output of svn stat

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
> HIVE-1361.3.patch, HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch



[jira] Commented: (HIVE-1361) table/partition level statistics

2010-09-17 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910851#action_12910851
 ] 

HBase Review Board commented on HIVE-1361:
--

Message from: "namit jain" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/862/#review1264
---



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java


This code seems useless



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java


How are you accounting for speculative execution?

Can 2 tasks insert the entry?
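
One possible way to keep two speculative attempts from double-counting, sketched under assumptions that are not taken from the patch: key each published value by the task ID (the attempt ID with its trailing attempt number stripped) and overwrite rather than add. The class and method names are hypothetical, and the sketch assumes every attempt of a task reports the same row count.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a stats store; not the patch's publisher. */
final class SpeculativeSafePublisher {
    // One entry per task, no matter how many attempts of that task publish.
    private final Map<String, Long> rowsByTask = new ConcurrentHashMap<>();

    /** Strips the trailing attempt number, e.g. "..._m_000003_1" becomes "..._m_000003". */
    static String taskKey(String attemptId) {
        int idx = attemptId.lastIndexOf('_');
        return idx < 0 ? attemptId : attemptId.substring(0, idx);
    }

    /** Overwrites the value for the task, so a second attempt replaces the first. */
    void publish(String attemptId, long numRows) {
        rowsByTask.put(taskKey(attemptId), numRows);
    }

    /** Sums one value per task. */
    long totalRows() {
        long sum = 0L;
        for (long v : rowsByTask.values()) {
            sum += v;
        }
        return sum;
    }
}
```

With this keying, two attempts of the same task that both publish 100 rows contribute 100 to the total, not 200.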



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java


It might be a good idea to make it easy to add new stats. Right now, you
will need to fix code in multiple places.

Instead of hard-coding nRowsInTable, it would be good to keep an array of
the stats we are publishing in a central place.
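
A minimal sketch of what such a central list could look like. The class name and the stat names ("numRows", "totalSize", "numFiles") are illustrative assumptions, not identifiers from the patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical central registry of the stats being published. */
final class StatsRegistry {
    // Adding a new stat means adding one name here; callers iterate over the list.
    static final List<String> SUPPORTED_STATS =
        Collections.unmodifiableList(Arrays.asList("numRows", "totalSize", "numFiles"));

    /** Returns a counter map with one zero-initialized entry per registered stat. */
    static Map<String, Long> newCounters() {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String name : SUPPORTED_STATS) {
            counters.put(name, 0L);
        }
        return counters;
    }

    /** Adds delta to a registered stat and rejects names that are not registered. */
    static void add(Map<String, Long> counters, String name, long delta) {
        if (!SUPPORTED_STATS.contains(name)) {
            throw new IllegalArgumentException("Unknown stat: " + name);
        }
        counters.merge(name, delta, Long::sum);
    }
}
```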



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java


This (addOutputs()) should be done at 
compile time



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java


Most of these parameters need not be instance variables; define them in a
new function instead.



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java


Can you add publishStats in Utilities and let TableScan and FileSink share
it?



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java


I am assuming these red blocks mean TABs



trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java


??



trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java


Do we need to lock the row? Use a SELECT FOR UPDATE instead of a plain
SELECT.


- namit





> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch



[jira] Commented: (HIVE-1361) table/partition level statistics

2010-09-17 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910813#action_12910813
 ] 

HBase Review Board commented on HIVE-1361:
--

Message from: "John Sichi" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/862/#review1262
---



trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java


Hive conf additions should be accompanied by new entries in 
conf/hive-default.xml for documentation purposes.



trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java


Using e.toString() alone here may lose some of the diagnostics.

LOG.error has an overload which takes a Throwable parameter; use that to 
make sure that all the diagnostics (e.g. nested throwables) are logged.
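
To illustrate the difference, here is a self-contained sketch using java.util.logging, chosen only so the example runs without extra jars. The commons-logging call referred to above is `LOG.error(Object, Throwable)`. The class name and messages are made up.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical demo class; not code from the patch. */
final class LogThrowableDemo {
    private static final Logger LOG = Logger.getLogger(LogThrowableDemo.class.getName());

    /** Records only the top-level toString(); the stack trace and nested causes are lost. */
    static void logMessageOnly(Exception e) {
        LOG.log(Level.SEVERE, "Stats aggregation failed: " + e.toString());
    }

    /** Passes the Throwable itself, so handlers can print the trace and nested causes. */
    static void logWithThrowable(Exception e) {
        LOG.log(Level.SEVERE, "Stats aggregation failed", e);
    }
}
```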



trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java


As a performance followup, we probably want to use delete(List) for 
batching.




trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java


See perf comment above.  Also, this scan+delete code could be shared to 
avoid duplication.




trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java


See comments in HBaseStatsAggregator regarding diagnostics.




trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java


Another perf note:  for batch update, we can use setAutoFlush(false) and 
then flushCommits in closeConnection.



trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsSetupConstants.java


Probably need a followup to make this configurable.



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java


What is this code for?



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java


Isn't this going to throw an NPE if aggregateStats returns null after 
handling an error?



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java


s/retur/return/



trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java


typo:  MapRedTaks



trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsAggregator.java


Some more overview (or just link to updated wiki doc) would be good here 
since the methods below reference things like temporary stats and aggregation 
without really explaining them.

Also:  I think having the publisher/aggregator implementations catch errors 
themselves is confusing.  It would be cleaner to let them propagate the 
exceptions, and instead catch+suppress+warn in the calling code (under control 
of a strictness config param).
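
A rough sketch of that split, with hypothetical names throughout (the `Aggregator` interface, `aggregateOrWarn`, and the `strict` flag are stand-ins, not the patch's API): the implementation simply throws, and the caller decides whether to fail or to warn and continue.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical calling-side error policy; not code from the patch. */
final class StatsErrorPolicy {
    private static final Logger LOG = Logger.getLogger(StatsErrorPolicy.class.getName());

    /** Assumed contract: implementations propagate errors instead of swallowing them. */
    interface Aggregator {
        long aggregate(String key) throws Exception;
    }

    /**
     * In strict mode the failure propagates to the caller. Otherwise it is logged
     * as a warning and null is returned to signal that no stats are available.
     */
    static Long aggregateOrWarn(Aggregator agg, String key, boolean strict) throws Exception {
        try {
            return agg.aggregate(key);
        } catch (Exception e) {
            if (strict) {
                throw e;
            }
            LOG.log(Level.WARNING, "Stats aggregation failed for " + key + "; continuing", e);
            return null;
        }
    }
}
```

Callers of the non-strict path still have to check for null before using the result.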




trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsAggregator.java


@param, @return?



trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsAggregator.java


Use correct Javadoc @param syntax, and add @return.



trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsAggregator.java


Use @param, @return



trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsFactory.java


For other plugin-loading code, we use JavaUtils.getClassLoader().  Should 
probably do the same here?




trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsFactory.java


Don't use printStackTrace; log the exception instead.



trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsPublisher.java


See comments on StatsAggregator regarding Javadoc.  Also, 
s/statics/statistics/



trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java


I don't think this warrants four exclamation marks.



trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java


Is it worth using a prepared statement here?  

Also, depending on the transaction isolation level, concurrent update 
attempts could result

[jira] Commented: (HIVE-1361) table/partition level statistics

2010-09-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910470#action_12910470
 ] 

Namit Jain commented on HIVE-1361:
--

Will take a look 

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch



[jira] Commented: (HIVE-1361) table/partition level statistics

2010-09-16 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910380#action_12910380
 ] 

HBase Review Board commented on HIVE-1361:
--

Message from: "Carl Steinbach" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/862/
---

Review request for Hive Developers.


Summary
---

HIVE-1361


This addresses bug HIVE-1361.
http://issues.apache.org/jira/browse/HIVE-1361


Diffs
-

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 997199
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java PRE-CREATION
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java PRE-CREATION
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsSetupConstants.java PRE-CREATION
  trunk/ql/src/gen-javabean/org/apache/hadoop/hive/ql/plan/api/StageType.java 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/JobCloseFeedBack.java 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Stat.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QB.java 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/StatsWork.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 997199
  trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsAggregator.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsFactory.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsPublisher.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsSetupConst.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsSetupConstants.java PRE-CREATION

Diff: http://review.cloudera.org/r/862/diff


Testing
---


Thanks,

Carl




> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Attachments: HIVE-1361.java_only.patch, HIVE-13

[jira] Commented: (HIVE-1361) table/partition level statistics

2010-09-16 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910350#action_12910350
 ] 

John Sichi commented on HIVE-1361:
--

Yay for Java-only patch :)

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Attachments: HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch



[jira] Commented: (HIVE-1361) table/partition level statistics

2010-08-09 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896634#action_12896634
 ] 

Ning Zhang commented on HIVE-1361:
--

Ahmed has put up the design doc on the wiki: 
http://wiki.apache.org/hadoop/Hive/StatsDev.

Ahmed is also finalizing the patch for review. 

There are some minor changes from the original requirement: currently the stats 
gathered are # of rows, total size in bytes, # of files, and # of partitions 
(for tables). The patch does not include the min/max/avg of row/file sizes, 
because the raw sizes (serialized and compressed) differ from the sizes we see 
during stats gathering (deserialized and decompressed). There are also no 
strong use cases for them currently, so we'll exclude them from this patch. 
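
For illustration only, the reduced set of stats could be modeled roughly as below. The class and field names are hypothetical, and deriving table-level totals by summing partition-level values is an assumption made for this sketch, not something the patch is stated to do.

```java
import java.util.List;

/** Hypothetical model of the reduced stats set; not the patch's data structures. */
final class StatsRollup {
    /** Partition-level stats: # of rows, total size in bytes, # of files. */
    static final class PartitionStats {
        final long numRows;
        final long totalSize;
        final long numFiles;

        PartitionStats(long numRows, long totalSize, long numFiles) {
            this.numRows = numRows;
            this.totalSize = totalSize;
            this.numFiles = numFiles;
        }
    }

    /** Table-level stats: the same three values plus # of partitions. */
    static final class TableStats {
        final long numRows;
        final long totalSize;
        final long numFiles;
        final int numPartitions;

        TableStats(long numRows, long totalSize, long numFiles, int numPartitions) {
            this.numRows = numRows;
            this.totalSize = totalSize;
            this.numFiles = numFiles;
            this.numPartitions = numPartitions;
        }
    }

    /** Sums the per-partition values and counts the partitions. */
    static TableStats rollUp(List<PartitionStats> partitions) {
        long rows = 0L;
        long size = 0L;
        long files = 0L;
        for (PartitionStats p : partitions) {
            rows += p.numRows;
            size += p.totalSize;
            files += p.numFiles;
        }
        return new TableStats(rows, size, files, partitions.size());
    }
}
```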

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly



[jira] Commented: (HIVE-1361) table/partition level statistics

2010-06-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881448#action_12881448
 ] 

Ning Zhang commented on HIVE-1361:
--

Some comments from internal design review:
 - The ANALYZE TABLE command should be integrated with the data replication 
hook. When an existing table/partition is analyzed, a new WriteEntity should be 
generated to make metadata replication work. 
 - Investigate JDO on top of HBase integration. If JDO works on HBase, we could 
just use JDO to update column stats as well. 
 - ANALYZE TABLE partition () should support 
"dynamic-partition-style" partition specification. This means that if there are 
2 partition columns ds, hr, we can do analyze table partition(ds = 
'2010-06-01', hr) to analyze all hr sub-partitions under ds='2010-06-01'. 
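
The matching rule in the last item can be sketched as follows. This is only an illustration of the semantics: the class and method names are hypothetical, and representing an unbound column (such as hr above) as a null value is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical matcher for a partial partition spec; not code from the patch. */
final class PartialPartitionSpec {
    /**
     * Returns the partitions that match the spec. A null value in the spec means
     * the column is named but not bound, so every value of that column matches.
     */
    static List<Map<String, String>> match(Map<String, String> spec,
                                           List<Map<String, String>> partitions) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> part : partitions) {
            boolean matches = true;
            for (Map.Entry<String, String> e : spec.entrySet()) {
                String wanted = e.getValue();
                if (wanted != null && !wanted.equals(part.get(e.getKey()))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                result.add(part);
            }
        }
        return result;
    }

    /** Convenience builder for a (ds, hr) partition or spec; hr may be null. */
    static Map<String, String> of(String ds, String hr) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("ds", ds);
        m.put("hr", hr);
        return m;
    }
}
```

Under these assumptions, the spec (ds='2010-06-01', hr) selects every hr partition under that ds and nothing from other days.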



> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly



[jira] Commented: (HIVE-1361) table/partition level statistics

2010-05-21 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870264#action_12870264
 ] 

Ning Zhang commented on HIVE-1361:
--

All these stats should be able to be collected automatically at insert time. 
Since loading doesn't scan the data, we cannot gather stats from the load 
command. 

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.6.0



[jira] Commented: (HIVE-1361) table/partition level statistics

2010-05-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870240#action_12870240
 ] 

Namit Jain commented on HIVE-1361:
--

Can we further break them down into the stats which will be collected 
automatically at insert/load time vs. the stats which will be collected when 
the user explicitly analyzes the table?

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.6.0