[jira] [Commented] (HIVE-8955) alter partition should check for hive.stats.autogather in hiveConf
[ https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243037#comment-14243037 ]

Pankit Thapar commented on HIVE-8955:
-------------------------------------

[~ashutoshc], do you have any insight on this?

alter partition should check for hive.stats.autogather in hiveConf
------------------------------------------------------------------

                Key: HIVE-8955
                URL: https://issues.apache.org/jira/browse/HIVE-8955
            Project: Hive
         Issue Type: Improvement
         Components: Metastore
   Affects Versions: 0.13.1
           Reporter: Pankit Thapar
            Fix For: 0.15.0

When the alter partition code path is triggered, it should check the hive.stats.autogather flag and update stats only if the flag is true; otherwise it should skip the update. The append_partition code flow already does this.
Is there any specific reason alter_partition does not respect this conf variable?

//code snippet : HiveMetaStore.java
    private Partition append_partition_common(RawStore ms, String dbName, String tableName,
        List<String> part_vals, EnvironmentContext envContext) throws InvalidObjectException,
        AlreadyExistsException, MetaException {
      ...
        if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) &&
            !MetaStoreUtils.isView(tbl)) {
          MetaStoreUtils.updatePartitionStatsFast(part, wh, madeDir);
        }
      ...
      ...
    }

The above code snippet checks the variable, but the same check is absent in:

//code snippet : HiveAlterHandler.java
    public Partition alterPartition(final RawStore msdb, Warehouse wh, final String dbname,
        final String name, final List<String> part_vals, final Partition new_part)
        throws InvalidOperationException, InvalidObjectException, AlreadyExistsException,
        MetaException {
      ...
    }

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
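To make the asymmetry described in the issue concrete, here is a minimal, self-contained sketch. It is not Hive code: StatsGuardSketch, ToyPartition and every method in it are hypothetical stand-ins, and the "unguarded" alter path models only the absence of the hive.stats.autogather check. The real alterPartition body is elided in the snippet above and applies its own conditions, which this toy does not try to reproduce.

```java
/**
 * Toy model of the two metastore paths discussed in HIVE-8955.
 * All names here are hypothetical stand-ins, not Hive classes or APIs.
 */
public class StatsGuardSketch {

    /** Hypothetical stand-in for a partition; records whether stats were touched. */
    public static final class ToyPartition {
        private boolean statsUpdated = false;

        public boolean statsUpdated() {
            return statsUpdated;
        }
    }

    /** Stand-in for the value of hive.stats.autogather. */
    private final boolean statsAutogather;

    public StatsGuardSketch(boolean statsAutogather) {
        this.statsAutogather = statsAutogather;
    }

    /** Stand-in for a fast stats update on a partition. */
    private static void updateStatsFast(ToyPartition part) {
        part.statsUpdated = true;
    }

    /** Mirrors the guard quoted from append_partition_common: flag on and not a view. */
    public ToyPartition appendPartition(boolean isView) {
        ToyPartition part = new ToyPartition();
        if (statsAutogather && !isView) {
            updateStatsFast(part);
        }
        return part;
    }

    /** Alter path as the issue describes it: the flag is never consulted. */
    public ToyPartition alterPartitionUnguarded(ToyPartition newPart) {
        updateStatsFast(newPart);
        return newPart;
    }

    /** Alter path with the same guard as the append path (the change the issue asks about). */
    public ToyPartition alterPartitionGuarded(ToyPartition newPart, boolean isView) {
        if (statsAutogather && !isView) {
            updateStatsFast(newPart);
        }
        return newPart;
    }

    public static void main(String[] args) {
        StatsGuardSketch flagOff = new StatsGuardSketch(false);
        System.out.println("append, flag off:           "
            + flagOff.appendPartition(false).statsUpdated());
        System.out.println("alter (unguarded), flag off: "
            + flagOff.alterPartitionUnguarded(new ToyPartition()).statsUpdated());
        System.out.println("alter (guarded), flag off:   "
            + flagOff.alterPartitionGuarded(new ToyPartition(), false).statsUpdated());
    }
}
```

With the flag off, only the unguarded alter path reports a stats update, which is the behavior the reporter is questioning. Whether the guard belongs in the alter path at all is exactly what the thread is trying to settle.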
[jira] [Commented] (HIVE-8955) alter partition should check for hive.stats.autogather in hiveConf
[ https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223568#comment-14223568 ]

Pankit Thapar commented on HIVE-8955:
-------------------------------------

[~szehon], can you please confirm whether this is a bug or intentional?
[jira] [Commented] (HIVE-8955) alter partition should check for hive.stats.autogather in hiveConf
[ https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223674#comment-14223674 ]

Szehon Ho commented on HIVE-8955:
---------------------------------

Hi Pankit, I took a look. It seems that for the alter table/partition case, a different flag is being checked than hive.stats.autogather, whose description says "A flag to gather statistics automatically during the INSERT OVERWRITE command."

The stats do seem to be updated correctly, though, [here|https://github.com/apache/hive/blob/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L215] and [here|https://github.com/apache/hive/blob/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L461], as per my limited understanding.
[jira] [Commented] (HIVE-8955) alter partition should check for hive.stats.autogather in hiveConf
[ https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223712#comment-14223712 ]

Pankit Thapar commented on HIVE-8955:
-------------------------------------

Thanks for the quick look, [~szehon].
As far as I can tell, it's the same flag, hive.stats.autogather:

    // Statistics
    HIVESTATSAUTOGATHER("hive.stats.autogather", true,
        "A flag to gather statistics automatically during the INSERT OVERWRITE command."),

But this flag is not used in the code flow for alter_partition.
[jira] [Commented] (HIVE-8955) alter partition should check for hive.stats.autogather in hiveConf
[ https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223745#comment-14223745 ]

Szehon Ho commented on HIVE-8955:
---------------------------------

I meant that the stats do seem to be updated [here|https://github.com/apache/hive/blob/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L461] during the 'alter partition' code path. It's not checking the flag, as you mentioned, but it doesn't look like a bug to me, since the flag is about 'insert overwrite' and not about alter table. My knowledge is limited, though, as I'm not the original author of this flag/code.
[jira] [Commented] (HIVE-8955) alter partition should check for hive.stats.autogather in hiveConf
[ https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223778#comment-14223778 ]

Pankit Thapar commented on HIVE-8955:
-------------------------------------

Yes, you are correct that the stats are updated in insert overwrite, but insert overwrite might itself call append_partition or alter_partition. The append path respects the flag; the alter partition path does not.
[jira] [Commented] (HIVE-8955) alter partition should check for hive.stats.autogather in hiveConf
[ https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223782#comment-14223782 ]

Pankit Thapar commented on HIVE-8955:
-------------------------------------

[~ashutoshc], can you please confirm whether this is a bug or not?
[jira] [Commented] (HIVE-8955) alter partition should check for hive.stats.autogather in hiveConf
[ https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223788#comment-14223788 ]

Szehon Ho commented on HIVE-8955:
---------------------------------

I see. Feel free to submit a patch if you have a test case to repro the issue. Maybe others with more knowledge of stats can chime in as well.
[jira] [Commented] (HIVE-8955) alter partition should check for hive.stats.autogather in hiveConf
[ https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223791#comment-14223791 ]

Ashutosh Chauhan commented on HIVE-8955:
----------------------------------------

Is there a specific query where you are not seeing the behavior you are expecting?
[jira] [Commented] (HIVE-8955) alter partition should check for hive.stats.autogather in hiveConf
[ https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223797#comment-14223797 ]

Pankit Thapar commented on HIVE-8955:
-------------------------------------

If I insert overwrite into an already existing partition, I see that it does the stats update even when hive.stats.autogather is set to false.
For example:

[hadoop@ip-10-169-146-156 ~]$ hive --hiveconf hive.log.dir=. --hiveconf hive.stats.autogather=false
hive> create table test(x string, y string, z string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> LOAD DATA LOCAL INPATH './file.txt' OVERWRITE INTO TABLE test;
hive> create table test_part(a string) PARTITIONED BY (x string, y string) LOCATION 'my table location';
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> INSERT OVERWRITE TABLE test_part PARTITION (x,y) select x,y,z from test;

I see the stats update for the last query.