[jira] [Commented] (HIVE-8955) alter partition should check for hive.stats.autogather in hiveConf

2014-12-11 Thread Pankit Thapar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243037#comment-14243037
 ] 

Pankit Thapar commented on HIVE-8955:
-

[~ashutoshc], do you have any insight on this?


 alter partition should check for hive.stats.autogather in hiveConf
 

 Key: HIVE-8955
 URL: https://issues.apache.org/jira/browse/HIVE-8955
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.13.1
Reporter: Pankit Thapar
 Fix For: 0.15.0


 When the alter partition code path is triggered, it should check the flag 
 hive.stats.autogather and update the stats only if it is true; otherwise it 
 should skip the update. This is already done in the append_partition code flow. 
 Is there any specific reason that alter_partition does not respect this conf 
 variable?
 //code snippet : HiveMetastore.java 
 private Partition append_partition_common(RawStore ms, String dbName, String tableName,
     List<String> part_vals, EnvironmentContext envContext) throws InvalidObjectException,
     AlreadyExistsException, MetaException {
   ...
   if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) &&
       !MetaStoreUtils.isView(tbl)) {
     MetaStoreUtils.updatePartitionStatsFast(part, wh, madeDir);
   }
   ...
 }
 The above code snippet checks for the variable, but the same check is absent in 
 //code snippet : HiveAlterHandler.java 
 public Partition alterPartition(final RawStore msdb, Warehouse wh, final String dbname,
     final String name, final List<String> part_vals, final Partition new_part)
     throws InvalidOperationException, InvalidObjectException, AlreadyExistsException,
     MetaException {
   ...
 }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8955) alter partition should check for hive.stats.autogather in hiveConf

2014-11-24 Thread Pankit Thapar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223568#comment-14223568
 ] 

Pankit Thapar commented on HIVE-8955:
-

[~szehon], can you please confirm whether this is a bug or intentional?




[jira] [Commented] (HIVE-8955) alter partition should check for hive.stats.autogather in hiveConf

2014-11-24 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223674#comment-14223674
 ] 

Szehon Ho commented on HIVE-8955:
-

Hi Pankit, I took a look. It seems that for the alter table/partition case, a 
different flag is being checked than hive.stats.autogather, which is described 
as "A flag to gather statistics automatically during the INSERT OVERWRITE command."

The stats do seem to be correctly updated, though, 
[here|https://github.com/apache/hive/blob/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L215]
 and 
[here|https://github.com/apache/hive/blob/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L461]
 as per my limited understanding.



[jira] [Commented] (HIVE-8955) alter partition should check for hive.stats.autogather in hiveConf

2014-11-24 Thread Pankit Thapar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223712#comment-14223712
 ] 

Pankit Thapar commented on HIVE-8955:
-

Thanks for the quick look, [~szehon]. As far as I can tell, it's the same flag, 
hive.stats.autogather:

// Statistics
HIVESTATSAUTOGATHER("hive.stats.autogather", true,
    "A flag to gather statistics automatically during the INSERT OVERWRITE command."),


But this flag is not used in the code flow for alter_partition.
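
For readers unfamiliar with how such a conf var is read, here is a minimal, self-contained sketch. The ConfVar enum and the getBoolVar helper below are hypothetical stand-ins written for this example, not Hive's actual HiveConf classes; only the key name, the default value, and the description are taken from the snippet above.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfVarSketch {
    // Hypothetical stand-in for a conf-var enum; NOT Hive's real HiveConf.ConfVars.
    enum ConfVar {
        HIVESTATSAUTOGATHER("hive.stats.autogather", true,
            "A flag to gather statistics automatically during the INSERT OVERWRITE command.");

        final String key;
        final boolean defaultValue;
        final String description;

        ConfVar(String key, boolean defaultValue, String description) {
            this.key = key;
            this.defaultValue = defaultValue;
            this.description = description;
        }
    }

    // Returns the configured value, falling back to the default when the key is unset.
    static boolean getBoolVar(Map<String, String> conf, ConfVar var) {
        String raw = conf.get(var.key);
        return raw == null ? var.defaultValue : Boolean.parseBoolean(raw);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Nothing set yet, so the default from the enum entry applies.
        System.out.println(getBoolVar(conf, ConfVar.HIVESTATSAUTOGATHER));
        // Explicit override, comparable to passing --hiveconf hive.stats.autogather=false.
        conf.put("hive.stats.autogather", "false");
        System.out.println(getBoolVar(conf, ConfVar.HIVESTATSAUTOGATHER));
    }
}
```

The point of the sketch is only that the flag defaults to true, so any code path that never reads it behaves as if it were always on.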



[jira] [Commented] (HIVE-8955) alter partition should check for hive.stats.autogather in hiveConf

2014-11-24 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223745#comment-14223745
 ] 

Szehon Ho commented on HIVE-8955:
-

I meant that the stats do seem to be updated 
[here|https://github.com/apache/hive/blob/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L461]
 during the 'alter partition' code path.

It's not checking the flag, as you mentioned, but it doesn't look like a bug to 
me, as the flag is about 'insert overwrite' and not about alter table. 
That said, my knowledge is limited, as I'm not the original author of this 
flag/code.



[jira] [Commented] (HIVE-8955) alter partition should check for hive.stats.autogather in hiveConf

2014-11-24 Thread Pankit Thapar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223778#comment-14223778
 ] 

Pankit Thapar commented on HIVE-8955:
-

Yes, you are correct that the stats are updated in insert overwrite, but insert 
overwrite might itself call append_partition or alter_partition. The append 
case respects the flag, but the alter partition case does not.
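
To make the asymmetry concrete, here is a toy model of the two code paths. It is a simplified sketch, not the actual HiveMetaStore or HiveAlterHandler code: the Partition class and all method names are hypothetical, and the "as reported" alter path simply assumes the stats update is unconditional, which is the behavior described in this issue.

```java
public class StatsGuardSketch {
    // Toy partition record: only tracks whether a stats update was performed.
    static final class Partition {
        boolean statsUpdated = false;
    }

    // Mirrors the guard quoted from append_partition_common:
    // update only when autogather is on and the table is not a view.
    static boolean shouldUpdateStats(boolean autogather, boolean isView) {
        return autogather && !isView;
    }

    // Append path: respects the flag.
    static Partition appendPartition(boolean autogather, boolean isView) {
        Partition p = new Partition();
        if (shouldUpdateStats(autogather, isView)) {
            p.statsUpdated = true;
        }
        return p;
    }

    // Alter path as reported in this issue (assumption): the flag is ignored.
    static Partition alterPartitionAsReported(boolean autogather, boolean isView) {
        Partition p = new Partition();
        p.statsUpdated = true;
        return p;
    }

    // Alter path with the proposed guard applied.
    static Partition alterPartitionGuarded(boolean autogather, boolean isView) {
        Partition p = new Partition();
        if (shouldUpdateStats(autogather, isView)) {
            p.statsUpdated = true;
        }
        return p;
    }

    public static void main(String[] args) {
        boolean autogather = false; // comparable to hive.stats.autogather=false
        System.out.println("append:           " + appendPartition(autogather, false).statsUpdated);
        System.out.println("alter (reported): " + alterPartitionAsReported(autogather, false).statsUpdated);
        System.out.println("alter (guarded):  " + alterPartitionGuarded(autogather, false).statsUpdated);
    }
}
```

With the flag off, only the reported alter path still marks the stats as updated; the guarded variant matches the append path.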



[jira] [Commented] (HIVE-8955) alter partition should check for hive.stats.autogather in hiveConf

2014-11-24 Thread Pankit Thapar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223782#comment-14223782
 ] 

Pankit Thapar commented on HIVE-8955:
-

[~ashutoshc], can you please confirm whether this is a bug or not?



[jira] [Commented] (HIVE-8955) alter partition should check for hive.stats.autogather in hiveConf

2014-11-24 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223788#comment-14223788
 ] 

Szehon Ho commented on HIVE-8955:
-

I see. Feel free to submit a patch if you have a test case to reproduce the issue. 
Maybe others with more knowledge of stats can chime in as well.



[jira] [Commented] (HIVE-8955) alter partition should check for hive.stats.autogather in hiveConf

2014-11-24 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223791#comment-14223791
 ] 

Ashutosh Chauhan commented on HIVE-8955:


Is there a specific query where you are not seeing the behavior you are 
expecting?



[jira] [Commented] (HIVE-8955) alter partition should check for hive.stats.autogather in hiveConf

2014-11-24 Thread Pankit Thapar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223797#comment-14223797
 ] 

Pankit Thapar commented on HIVE-8955:
-

If I insert overwrite into an already existing partition, I see that it does 
the stats update even when hive.stats.autogather is set to false.
For example: 
 [hadoop@ip-10-169-146-156 ~]$ hive --hiveconf hive.log.dir=. --hiveconf hive.stats.autogather=false 
hive> create table test(x string, y string, z string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
hive> LOAD DATA LOCAL INPATH './file.txt' OVERWRITE INTO TABLE test;
hive> create table test_part(a string) PARTITIONED BY (x string, y string) LOCATION 'my table location';
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> INSERT OVERWRITE TABLE test_part PARTITION (x,y) select x,y,z from test;

I see the stats being updated for the last query.


