[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17440502#comment-17440502 ]

Jean-Yves STEPHAN commented on HIVE-18743:
------------------------------------------

Hello. We use Hive for a Spark project, and our Spark job hangs in a branch of the code controlled by the DO_NOT_POPULATE_QUICK_STATS property. I'd like to try switching off this flag, it's currently passed as an "EnvironmentContext". Is this something I can control via an environment variable? or via a HiveConf (to set in hive-site.xml)?

> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-18743
>                 URL: https://issues.apache.org/jira/browse/HIVE-18743
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 1.1.0, 1.2.0, 2.0.2, 3.0.0
>            Reporter: Alexander Behm
>            Assignee: Alex Kolbasov
>            Priority: Major
>             Fix For: 3.1.0, 2.4.0, 3.0.0
>
>         Attachments: HIVE-18743.01-branch-2.patch, HIVE-18743.01.patch
>
>
> When hive.stats.autogather=true then the Metastore lists all files under the
> table directory to populate basic stats like file counts and sizes. This file
> listing operation can be very expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is
> intended to selectively prevent this stats collection. Unfortunately, this
> table property is checked *after* the expensive file listing operation, so
> the DO_NOT_UPDATE_STATS does not seem to work as intended.
> See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
> public static boolean updateTableStatsFast(Database db, Table tbl, Warehouse wh,
>     boolean madeDir, boolean forceRecompute,
>     EnvironmentContext environmentContext) throws MetaException {
>   if (tbl.getPartitionKeysSize() == 0) {
>     // Update stats only when unpartitioned
>     FileStatus[] fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, tbl);
>     // DO_NOT_UPDATE_STATS is checked inside the call below, after
>     // wh.getFileStatusesForUnpartitionedTable() has already been called
>     return updateTableStatsFast(tbl, fileStatuses, madeDir, forceRecompute, environmentContext);
>   } else {
>     return false;
>   }
> }
> {code}

--
This message was sent by Atlassian Jira (v8.20.1#820001)
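The ordering problem described in the issue can be illustrated with a small, self-contained sketch: consult the flag first and only then run the expensive file listing. This is not the committed patch and does not use Hive's classes. The class StatsGuardSketch, its methods, the use of plain maps for the EnvironmentContext properties and the table parameters, and the "true" value convention are all assumptions made for illustration; only the property name DO_NOT_UPDATE_STATS comes from the issue text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Illustrative sketch only: hypothetical stand-ins, not Hive's MetaStoreUtils. */
final class StatsGuardSketch {
    // Property name taken from the issue text; the value convention is assumed.
    static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";

    /** True when either property map asks to skip the stats update. */
    static boolean skipStats(Map<String, String> envProps, Map<String, String> tableParams) {
        return isSet(envProps) || isSet(tableParams);
    }

    private static boolean isSet(Map<String, String> props) {
        return props != null && "true".equalsIgnoreCase(props.get(DO_NOT_UPDATE_STATS));
    }

    /**
     * Checks the flag before invoking the (potentially expensive) listing.
     * Returns the number of files listed, or -1 when the listing was skipped.
     */
    static int updateStatsFast(Map<String, String> envProps,
                               Map<String, String> tableParams,
                               Supplier<String[]> listFiles) {
        if (skipStats(envProps, tableParams)) {
            return -1; // the listing supplier is never called
        }
        return listFiles.get().length;
    }

    public static void main(String[] args) {
        Map<String, String> flag = new HashMap<>();
        flag.put(DO_NOT_UPDATE_STATS, "true");
        // Hypothetical listing; a real one would hit the filesystem (for example S3).
        Supplier<String[]> lister = () -> new String[] {"part-0", "part-1"};

        System.out.println(updateStatsFast(flag, null, lister));
        System.out.println(updateStatsFast(null, new HashMap<>(), lister));
    }
}
```

Passing the listing as a Supplier keeps the expensive call lazy, so the guard can return before any filesystem access happens. Whether the real check should read the environment context, the table parameters, or both is left open here.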
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448509#comment-16448509 ]

Vihang Karajgaonkar commented on HIVE-18743:
--------------------------------------------

Merged to branch-3 as well, since marking this as fixed in 2.4.0 and 3.1.0 but not in 3.0.0 doesn't make sense.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448495#comment-16448495 ]

Vihang Karajgaonkar commented on HIVE-18743:
--------------------------------------------

Patch merged to branch-2 as well. Resolving this.
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447291#comment-16447291 ]

Alexander Kolbasov commented on HIVE-18743:
-------------------------------------------

[~vihangk1] I just got the results from branch-2 patch testing; the test failures seem to be unrelated.
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447146#comment-16447146 ]

Hive QA commented on HIVE-18743:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12920122/HIVE-18743.01-branch-2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10673 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=227)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_tableproperty_optimize] (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[table_nonprintable] (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_parquet_types] (batchId=155)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[merge_negative_5] (batchId=88)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_input_format_excludes] (batchId=117)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] (batchId=125)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/10402/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/10402/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-10402/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12920122 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446957#comment-16446957 ]

Vihang Karajgaonkar commented on HIVE-18743:
--------------------------------------------

Patch merged to master branch. Thanks for your contribution, [~akolb]. Is the branch-2 patch ready to be merged as well?
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446552#comment-16446552 ]

Alexander Kolbasov commented on HIVE-18743:
-------------------------------------------

Attached branch-2 patch as well.
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446536#comment-16446536 ]

Alexander Kolbasov commented on HIVE-18743:
-------------------------------------------

While porting the fix to branch-2 I noticed that {{alterTempTable()}} there updates stats while in branch-3 it doesn't. Does anyone know why this is the case?
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446488#comment-16446488 ]

Alexander Kolbasov commented on HIVE-18743:
-------------------------------------------

[~kgyrtkirk] Would you be able to commit the fix for me?
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446487#comment-16446487 ]

Alexander Kolbasov commented on HIVE-18743:
-------------------------------------------

[~kgyrtkirk] I keep the history of reviews in Review Board, but I will keep patches in Jira as well in the future if this is useful for others.
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445471#comment-16445471 ]

Zoltan Haindrich commented on HIVE-18743:
-----------------------------------------

[~akolb] there was an ACID-related ticket which landed just before I had seen the end of that ticket. Since it added a lot of ifs everywhere, I have to re-interpret a lot of things... so we are better off having at least this fix.

+1; I'm checking whether there are any related test failures.

Note: why are you removing the previous version of your patch? Please don't do that. I know it might look tidier, but the comments will lose their context, and by re-using patch#01 you may confuse a reviewer who has already seen your ticket and remembers that it had one patch.
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444691#comment-16444691 ]

Hive QA commented on HIVE-18743:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12919833/HIVE-18743.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 25 failed/errored test(s), 14280 tests executed
*Failed tests:*
{noformat}
TestMinimrCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=93)
	[infer_bucket_sort_num_buckets.q,infer_bucket_sort_reducers_power_two.q,parallel_orderby.q,bucket_num_reducers_acid.q,infer_bucket_sort_map_operators.q,infer_bucket_sort_merge.q,root_dir_external_table.q,infer_bucket_sort_dyn_part.q,udf_using.q,bucket_num_reducers_acid2.q]
TestNonCatCallsWithCatalog - did not produce a TEST-*.xml file (likely timed out) (batchId=217)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_smb] (batchId=92)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_0] (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[results_cache_invalidation2] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[tez_join_hash] (batchId=54)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[mm_all] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[default_constraint] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[results_cache_invalidation2] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_1] (batchId=171)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] (batchId=105)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[cluster_tasklog_retrieval] (batchId=98)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[mapreduce_stack_trace] (batchId=98)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[mapreduce_stack_trace_turnoff] (batchId=98)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[minimr_broken_pipe] (batchId=98)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query64] (batchId=253)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=225)
org.apache.hadoop.hive.ql.TestAcidOnTez.testAcidInsertWithRemoveUnion (batchId=228)
org.apache.hadoop.hive.ql.TestAcidOnTez.testCtasTezUnion (batchId=228)
org.apache.hadoop.hive.ql.TestAcidOnTez.testNonStandardConversion01 (batchId=228)
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 (batchId=232)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress (batchId=235)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel (batchId=235)
org.apache.hive.minikdc.TestJdbcWithMiniKdcCookie.testCookieNegative (batchId=254)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/10349/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/10349/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-10349/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 25 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12919833 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444587#comment-16444587 ]

Hive QA commented on HIVE-18743:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| 0 | findbugs | 0m 1s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 9m 0s | master passed |
| +1 | compile | 0m 47s | master passed |
| +1 | checkstyle | 0m 23s | master passed |
| +1 | javadoc | 1m 2s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 57s | the patch passed |
| +1 | compile | 0m 46s | the patch passed |
| +1 | javac | 0m 46s | the patch passed |
| -1 | checkstyle | 0m 22s | standalone-metastore: The patch generated 5 new + 522 unchanged - 12 fixed = 527 total (was 534) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 1m 5s | the patch passed |
|| || || || Other Tests ||
| +1 | asflicense | 0m 14s | The patch does not generate ASF License warnings. |
| | | 14m 58s | |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-10349/dev-support/hive-personality.sh |
| git revision | master / 046bc64 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-10349/yetus/diff-checkstyle-standalone-metastore.txt |
| modules | C: standalone-metastore U: standalone-metastore |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-10349/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436583#comment-16436583 ]

Alexander Kolbasov commented on HIVE-18743:

[~kgyrtkirk] What are your current plans for this? Do you plan to commit your changes in any of the releases?

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387694#comment-16387694 ]

Zoltan Haindrich commented on HIVE-18743:

I also started to suspect that it's not easy to check this at all...

I think this would be good for hive-2. Could you submit your patch for branch-2?

For hive-3: I think this issue should be re-checked on master after HIVE-17478 as well, since fixing this is not the goal of that change. If it is still broken, I think it will be more straightforward to fix it there.
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386406#comment-16386406 ]

Alexander Kolbasov commented on HIVE-18743:

[~kgyrtkirk] This means that this fix doesn't make sense in 3.0, since you are removing the code altogether. Do you plan to port your change to hive-2 as well?

As for the ptest: the point of the patch is to ensure that we do not access the file system when we don't need to. It doesn't change any externally visible behavior, so we can't really test it with ptest.
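The goal stated in this comment (no file system access when stats must not be updated) can be illustrated with a minimal sketch. It is not the attached patch: the class `StatsGuardSketch`, the `Supplier`-based lazy listing, and the simplified parameter handling are assumptions made for illustration. Only the `DO_NOT_UPDATE_STATS` property name comes from the thread; `numFiles` and `totalSize` are used here as illustrative stat keys.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for the metastore logic discussed in this thread.
final class StatsGuardSketch {
    static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";

    /**
     * Returns true if basic stats were written into tableParams.
     * The file listing is passed as a Supplier so that it only runs
     * after the cheap property check has allowed a stats update.
     */
    static boolean updateTableStatsFast(Map<String, String> tableParams,
                                        Supplier<long[]> listFileSizes) {
        // Cheap check first: no listing at all if the table opts out.
        if (Boolean.parseBoolean(tableParams.get(DO_NOT_UPDATE_STATS))) {
            return false;
        }
        // Only now pay for the listing (the expensive step on S3-like stores).
        long[] sizes = listFileSizes.get();
        long total = 0;
        for (long size : sizes) {
            total += size;
        }
        tableParams.put("numFiles", Long.toString(sizes.length));
        tableParams.put("totalSize", Long.toString(total));
        return true;
    }
}
```

Passing the listing as a `Supplier` is just one way to make the ordering explicit; checking the property before calling the warehouse directly would have the same effect.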
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385693#comment-16385693 ]

Zoltan Haindrich commented on HIVE-18743:

[~akolb] If the stats collection is removed from the metastore, that also means that the code you are testing will be gone as well, because it will no longer happen there...

I think the following command sequence could probably make this testable: create table; insert; desc the table; remove files from the table datadir with dfs commands; alter table; desc table (the stats should be the same).
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385415#comment-16385415 ]

Hive QA commented on HIVE-18743:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12912949/HIVE-18743.07.patch

SUCCESS: +1 due to 1 test(s) being added or modified.

ERROR: -1 due to 19 failed/errored test(s), 13062 tests executed

*Failed tests:*
{noformat}
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=93)
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385397#comment-16385397 ]

Hive QA commented on HIVE-18743:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| 0 | findbugs | 0m 1s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 7m 18s | master passed |
| +1 | compile | 0m 35s | master passed |
| +1 | checkstyle | 0m 20s | master passed |
| +1 | javadoc | 0m 45s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 44s | the patch passed |
| +1 | compile | 0m 35s | the patch passed |
| +1 | javac | 0m 35s | the patch passed |
| -1 | checkstyle | 0m 19s | standalone-metastore: The patch generated 5 new + 505 unchanged - 10 fixed = 510 total (was 515) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 0m 46s | the patch passed |
|| || || || Other Tests ||
| -1 | asflicense | 0m 14s | The patch generated 49 ASF License warnings. |
| | | 11m 54s | |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-9478/dev-support/hive-personality.sh |
| git revision | master / 05d4719 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-9478/yetus/diff-checkstyle-standalone-metastore.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-9478/yetus/patch-asflicense-problems.txt |
| modules | C: standalone-metastore U: standalone-metastore |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-9478/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385389#comment-16385389 ]

Hive QA commented on HIVE-18743:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12912949/HIVE-18743.07.patch

SUCCESS: +1 due to 1 test(s) being added or modified.

ERROR: -1 due to 21 failed/errored test(s), 13061 tests executed

*Failed tests:*
{noformat}
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=93)
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385374#comment-16385374 ]

Alexander Kolbasov commented on HIVE-18743:

[~kgyrtkirk] What is the value of a high-level qtest? The unit test lets me control the execution environment of the function exactly, and it gives me an opportunity to verify whether warehouse ops are called or not. What extra value would we get from a qtest that we don't get from a unit test?
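The kind of unit-level check described in this comment can be sketched with a hand-written fake that counts calls. This is not the test from the attached patch; the `FileLister` interface, the `CountingLister` fake, and the simplified `updateStats` method are hypothetical names introduced only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; a sketch of verifying "was the warehouse called?".
final class WarehouseCallCheckSketch {
    /** Minimal stand-in for the part of the warehouse that lists files. */
    interface FileLister {
        long[] listFileSizes(String tableName);
    }

    /** Fake that records how often the file system would have been touched. */
    static final class CountingLister implements FileLister {
        int calls = 0;

        @Override
        public long[] listFileSizes(String tableName) {
            calls++;
            return new long[] {128L, 256L};
        }
    }

    /** Simplified code under test: checks the property before listing files. */
    static boolean updateStats(String tableName, Map<String, String> params, FileLister lister) {
        if (Boolean.parseBoolean(params.get("DO_NOT_UPDATE_STATS"))) {
            return false;
        }
        params.put("numFiles", Integer.toString(lister.listFileSizes(tableName).length));
        return true;
    }

    /** Runs one update and returns how many listing calls the fake observed. */
    static int listingCallsFor(String doNotUpdateStats) {
        CountingLister lister = new CountingLister();
        Map<String, String> params = new HashMap<>();
        if (doNotUpdateStats != null) {
            params.put("DO_NOT_UPDATE_STATS", doNotUpdateStats);
        }
        updateStats("t1", params, lister);
        return lister.calls;
    }
}
```

In this sketch, `listingCallsFor("true")` observes no listing call, while `"false"` or a missing property leads to exactly one. A query-level test only sees table metadata and query output, so it would need some other observable effect to tell these cases apart.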
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385370#comment-16385370 ]

Alexander Kolbasov commented on HIVE-18743:

[~kgyrtkirk] So the assumption here is that the value can be not just a "true"/"false" string but an actual JSON object, in which case it is parsed and {{stats.basicStats = true}} just overwrites one property?
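The behavior asked about in this comment (parse the existing value, overwrite one property, keep the rest) can be sketched as follows. Everything here is an illustrative assumption rather than Hive's implementation: the `Acc` class, the toy regex-based parser, the serializer, and the JSON shape with `BASIC_STATS` and `COLUMN_STATS` keys. The sketch also assumes the updated value is written back into the parameter map.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only; a real implementation would use a proper JSON library.
final class StatsAccurateSketch {
    static final String COLUMN_STATS_ACCURATE = "COLUMN_STATS_ACCURATE";

    /** Parsed form of the property: one basic flag plus per-column flags. */
    static final class Acc {
        boolean basicStats;
        final Map<String, Boolean> columnStats = new TreeMap<>();
    }

    private static final Pattern BASIC =
        Pattern.compile("\"BASIC_STATS\"\\s*:\\s*\"(true|false)\"");
    private static final Pattern COLS =
        Pattern.compile("\"COLUMN_STATS\"\\s*:\\s*\\{([^}]*)\\}");
    private static final Pattern COL =
        Pattern.compile("\"([^\"]+)\"\\s*:\\s*\"(true|false)\"");

    /** Accepts null, a plain "true"/"false" string, or the assumed JSON shape. */
    static Acc parse(String raw) {
        Acc acc = new Acc();
        if (raw == null) {
            return acc;
        }
        String value = raw.trim();
        if (value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false")) {
            acc.basicStats = Boolean.parseBoolean(value);
            return acc;
        }
        Matcher basic = BASIC.matcher(value);
        if (basic.find()) {
            acc.basicStats = Boolean.parseBoolean(basic.group(1));
        }
        Matcher cols = COLS.matcher(value);
        if (cols.find()) {
            Matcher col = COL.matcher(cols.group(1));
            while (col.find()) {
                acc.columnStats.put(col.group(1), Boolean.parseBoolean(col.group(2)));
            }
        }
        return acc;
    }

    static String serialize(Acc acc) {
        StringBuilder sb = new StringBuilder("{\"BASIC_STATS\":\"")
            .append(acc.basicStats).append('"');
        if (!acc.columnStats.isEmpty()) {
            sb.append(",\"COLUMN_STATS\":{");
            String sep = "";
            for (Map.Entry<String, Boolean> e : acc.columnStats.entrySet()) {
                sb.append(sep).append('"').append(e.getKey())
                  .append("\":\"").append(e.getValue()).append('"');
                sep = ",";
            }
            sb.append('}');
        }
        return sb.append('}').toString();
    }

    /** Parse, overwrite only the basic flag, and write the result back. */
    static void markBasicStatsAccurate(Map<String, String> params) {
        Acc acc = parse(params.get(COLUMN_STATS_ACCURATE));
        acc.basicStats = true;
        params.put(COLUMN_STATS_ACCURATE, serialize(acc));
    }
}
```

Under these assumptions, parsing first is what keeps the per-column flags intact; building a fresh value with only the basic flag set would drop them.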
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385340#comment-16385340 ]

Hive QA commented on HIVE-18743:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 7m 28s | master passed |
| +1 | compile | 0m 36s | master passed |
| +1 | checkstyle | 0m 19s | master passed |
| +1 | javadoc | 0m 46s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 45s | the patch passed |
| +1 | compile | 0m 35s | the patch passed |
| +1 | javac | 0m 35s | the patch passed |
| -1 | checkstyle | 0m 20s | standalone-metastore: The patch generated 5 new + 505 unchanged - 10 fixed = 510 total (was 515) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 0m 44s | the patch passed |
|| || || || Other Tests ||
| -1 | asflicense | 0m 13s | The patch generated 49 ASF License warnings. |
| | | 11m 59s | |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-9477/dev-support/hive-personality.sh |
| git revision | master / 05d4719 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-9477/yetus/diff-checkstyle-standalone-metastore.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-9477/yetus/patch-asflicense-problems.txt |
| modules | C: standalone-metastore U: standalone-metastore |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-9477/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385296#comment-16385296 ]

Zoltan Haindrich commented on HIVE-18743:

I don't think so... you left out the other parts of that code:
https://github.com/apache/hive/blob/05d4719eefc56676a3e0e8f706e1c5e5e1f6b345/standalone-metastore/src/main/java/org/apache/hadoop/hive/common/StatsSetupConst.java#L232

[~akolb] Could you please add a high-level qtest? The test case from testmetastore will also be removed...
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385293#comment-16385293 ]

Alexander Kolbasov commented on HIVE-18743:
-------------------------------------------

I noticed a bit of odd code:
{code:java}
public static void setBasicStatsState(Map<String, String> params, String setting) {
  ...
  ColumnStatsAccurate stats = parseStatsAcc(params.get(COLUMN_STATS_ACCURATE));
  stats.basicStats = true;
}
{code}
So it parses the value of {{COLUMN_STATS_ACCURATE}} but then always ignores it and sets {{stats.basicStats}} to true anyway. Is this intentional? Can it be removed?

> Attachments: HIVE-18743.06.patch, HIVE-18743.07.patch
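One way to read the reply to this question is that the elided part of the method matters: if the parsed object carries more state than the one flag and is written back afterwards, then parsing first is what preserves that other state. The sketch below is a toy model of that idea only. The class {{StatsStateSketch}}, its text format, and the per-column flags are assumptions made for illustration and are not Hive's actual classes or serialization.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model only: none of these names or formats are Hive's real ones. */
class StatsStateSketch {

    /** Hypothetical stand-in for a parsed stats-accuracy value. */
    static final class StatsState {
        boolean basicStats;
        final Map<String, Boolean> columnStats = new TreeMap<>();
    }

    /** Parses the assumed toy format "basic=<bool>;cols=a,b". */
    static StatsState parse(String raw) {
        StatsState state = new StatsState();
        if (raw == null || raw.isEmpty()) {
            return state;
        }
        for (String part : raw.split(";")) {
            String[] kv = part.split("=", 2);
            if (kv.length < 2) {
                continue;
            }
            if (kv[0].equals("basic")) {
                state.basicStats = Boolean.parseBoolean(kv[1]);
            } else if (kv[0].equals("cols") && !kv[1].isEmpty()) {
                for (String col : kv[1].split(",")) {
                    state.columnStats.put(col, true);
                }
            }
        }
        return state;
    }

    /** Serializes the state back into the same toy format. */
    static String write(StatsState state) {
        return "basic=" + state.basicStats
            + ";cols=" + String.join(",", state.columnStats.keySet());
    }

    /** Parse, force the basic flag on, write back: column flags survive. */
    static String setBasicStatsTrue(String raw) {
        StatsState state = parse(raw);
        state.basicStats = true;
        return write(state);
    }

    public static void main(String[] args) {
        System.out.println(setBasicStatsTrue("basic=false;cols=a,b"));
    }
}
```

In this model the parse step is not redundant: dropping it would discard the column entries when the value is rewritten. Whether the same holds for the real method depends on the code behind the {{...}} in the snippet above.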
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385090#comment-16385090 ]

Hive QA commented on HIVE-18743:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12912918/HIVE-18743.05.patch

SUCCESS: +1 due to 1 test(s) being added or modified.

ERROR: -1 due to 20 failed/errored test(s), 13062 tests executed

*Failed tests:*
{noformat}
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=93)
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385060#comment-16385060 ]

Hive QA commented on HIVE-18743:
--------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 7m 7s | master passed |
| +1 | compile | 0m 37s | master passed |
| +1 | checkstyle | 0m 20s | master passed |
| +1 | javadoc | 0m 46s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 44s | the patch passed |
| +1 | compile | 0m 38s | the patch passed |
| +1 | javac | 0m 38s | the patch passed |
| -1 | checkstyle | 0m 19s | standalone-metastore: The patch generated 4 new + 512 unchanged - 3 fixed = 516 total (was 515) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 0m 42s | the patch passed |
|| || || || Other Tests ||
| -1 | asflicense | 0m 13s | The patch generated 49 ASF License warnings. |
| | | 11m 45s | |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-9475/dev-support/hive-personality.sh |
| git revision | master / 05d4719 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-9475/yetus/diff-checkstyle-standalone-metastore.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-9475/yetus/patch-asflicense-problems.txt |
| modules | C: standalone-metastore U: standalone-metastore |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-9475/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

> Attachments: HIVE-18743.05.patch
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385027#comment-16385027 ]

Alexander Kolbasov commented on HIVE-18743:
-------------------------------------------

[~kgyrtkirk] Added unit tests.

> Attachments: HIVE-18743.05.patch
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384540#comment-16384540 ]

Hive QA commented on HIVE-18743:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12912729/HIVE-18743.04.patch

ERROR: -1 due to no test(s) being added or modified.

ERROR: -1 due to 29 failed/errored test(s), 13430 tests executed

*Failed tests:*
{noformat}
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=94)
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384521#comment-16384521 ]

Hive QA commented on HIVE-18743:
--------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 7m 26s | master passed |
| +1 | compile | 0m 38s | master passed |
| +1 | checkstyle | 0m 20s | master passed |
| +1 | javadoc | 0m 47s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 44s | the patch passed |
| +1 | compile | 0m 35s | the patch passed |
| +1 | javac | 0m 35s | the patch passed |
| +1 | checkstyle | 0m 21s | standalone-metastore: The patch generated 0 new + 484 unchanged - 3 fixed = 484 total (was 487) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 0m 49s | the patch passed |
|| || || || Other Tests ||
| -1 | asflicense | 0m 13s | The patch generated 49 ASF License warnings. |
| | | 12m 8s | |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-9452/dev-support/hive-personality.sh |
| git revision | master / 1a3090f |
| Default Java | 1.8.0_111 |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-9452/yetus/patch-asflicense-problems.txt |
| modules | C: standalone-metastore U: standalone-metastore |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-9452/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

> Attachments: HIVE-18743.04.patch
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384416#comment-16384416 ]

Alexander Kolbasov commented on HIVE-18743:
-------------------------------------------

[~kgyrtkirk] If you are fixing HIVE-17478, is there any value in fixing this, or should we just wait for HIVE-17478? Do you plan to do the same for hive-2 as well? If not, should this be a hive-2-only fix? What do you think?

> Attachments: HIVE-18743.04.patch
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383467#comment-16383467 ]

Zoltan Haindrich commented on HIVE-18743:
-----------------------------------------

[~akolb] Could you please write a test for this? I've experimented with HIVE-17478, and I think this whole file-scanner logic will be gone from the metastore soon...

> Attachments: HIVE-18743.04.patch
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376789#comment-16376789 ]

Zoltan Haindrich commented on HIVE-18743:
-----------------------------------------

[~akolb]: I will try to get HIVE-17478 in before 3.0, because the problem of this logic living on the metastore side keeps coming back from multiple directions (S3, ACID, stats collection). I think the most important thing here is to add a good test case, so that we don't reintroduce this problem...

> Attachments: HIVE-18743.01.patch, HIVE-18743.02.patch, HIVE-18743.03.patch
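A regression test of the kind requested here could take roughly the following shape: a fake lister that counts calls, plus an assertion that no listing happens when the flag is set. This is a minimal sketch under stated assumptions, not the patch attached to this issue. {{StatsSkipSketch}}, {{FileLister}} and {{CountingLister}} are hypothetical names, the table parameters are modelled as a plain map, and {{numFiles}} / {{totalSize}} are used only as illustrative keys.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the ordering fix; none of these types are Hive's real classes. */
class StatsSkipSketch {
    static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";

    /** Hypothetical stand-in for the expensive directory listing. */
    interface FileLister {
        long[] listFileSizes(String tableLocation);
    }

    /** Test double that records how many listings were requested. */
    static final class CountingLister implements FileLister {
        int calls = 0;

        @Override
        public long[] listFileSizes(String tableLocation) {
            calls++;
            return new long[] {10L, 20L, 30L};
        }
    }

    /**
     * Returns true if stats were written into params. The flag is checked
     * before the listing, so a skipped update never touches the filesystem.
     */
    static boolean updateTableStatsFast(Map<String, String> params, String location,
                                        boolean partitioned, FileLister lister) {
        if (partitioned) {
            return false; // mirror the snippet in the report: unpartitioned only
        }
        if (Boolean.parseBoolean(params.get(DO_NOT_UPDATE_STATS))) {
            return false; // early exit, no file listing performed
        }
        long[] sizes = lister.listFileSizes(location);
        long total = 0;
        for (long size : sizes) {
            total += size;
        }
        params.put("numFiles", String.valueOf(sizes.length));
        params.put("totalSize", String.valueOf(total));
        return true;
    }

    public static void main(String[] args) {
        CountingLister lister = new CountingLister();
        Map<String, String> params = new HashMap<>();
        params.put(DO_NOT_UPDATE_STATS, "true");
        boolean updated = updateTableStatsFast(params, "s3a://bucket/table", false, lister);
        System.out.println("updated=" + updated + ", listings=" + lister.calls);
    }
}
```

Asserting on the call count, and not only on the return value, is the point of the design: the bug in the report returns the right answer but still pays for the listing, so a test that only checks the result would not catch a regression.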
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372642#comment-16372642 ]

Hive QA commented on HIVE-18743:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12911489/HIVE-18743.03.patch

ERROR: -1 due to no test(s) being added or modified.

ERROR: -1 due to 36 failed/errored test(s), 13011 tests executed

*Failed tests:*
{noformat}
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=93)
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372604#comment-16372604 ]

Hive QA commented on HIVE-18743:
--------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 7m 20s | master passed |
| +1 | compile | 0m 35s | master passed |
| +1 | checkstyle | 0m 20s | master passed |
| +1 | javadoc | 0m 46s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 42s | the patch passed |
| +1 | compile | 0m 35s | the patch passed |
| +1 | javac | 0m 35s | the patch passed |
| +1 | checkstyle | 0m 20s | standalone-metastore: The patch generated 0 new + 484 unchanged - 3 fixed = 484 total (was 487) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 0m 44s | the patch passed |
|| || || || Other Tests ||
| -1 | asflicense | 0m 13s | The patch generated 49 ASF License warnings. |
| | | 11m 53s | |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-9307/dev-support/hive-personality.sh |
| git revision | master / ec2378f |
| Default Java | 1.8.0_111 |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-9307/yetus/patch-asflicense-problems.txt |
| modules | C: standalone-metastore U: standalone-metastore |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-9307/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

> Attachments: HIVE-18743.01.patch, HIVE-18743.02.patch, HIVE-18743.03.patch
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372504#comment-16372504 ]

Alexander Kolbasov commented on HIVE-18743:
-------------------------------------------

I think it makes sense to separate these - removal looks to be a more involved problem. I don't have enough understanding of potential consequences.

> Attachments: HIVE-18743.01.patch, HIVE-18743.02.patch, HIVE-18743.03.patch
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372490#comment-16372490 ]

Zoltan Haindrich commented on HIVE-18743:
-----------------------------------------

If you think that removal could also be an option, you may use HIVE-17478 to experiment with that path as well :)

> Attachments: HIVE-18743.01.patch, HIVE-18743.02.patch, HIVE-18743.03.patch
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372480#comment-16372480 ] Hive QA commented on HIVE-18743: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12911489/HIVE-18743.03.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 38 failed/errored test(s), 12981 tests executed *Failed tests:* {noformat} TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=170) [vector_windowing_expressions.q,tez_union_group_by.q,vector_like_2.q,llap_acid.q,sqlmerge.q,tez_dynpart_hashjoin_1.q,schema_evol_orc_acid_part_update_llap_io.q,vector_windowing_gby.q,vectorized_timestamp.q,cbo_subq_exists.q,lateral_view.q,schema_evol_orc_vec_table_llap_io.q,optimize_nullscan.q,vectorization_decimal_date.q,schema_evol_orc_nonvec_table_llap_io.q,udaf_all_keyword.q,tez_self_join.q,vector_partitioned_date_time.q,acid_vectorization_original.q,tez_fsstat.q,stats11.q,vector_mapjoin_reduce.q,join_acid_non_acid.q,empty_join.q,vector_groupby_grouping_window.q,auto_join21.q,tez_input_counters.q,schema_evol_orc_nonvec_part_all_complex_llap_io.q,orc_ppd_timestamp.q,vector_decimal_1.q] TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=93)
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372482#comment-16372482 ]

Zoltan Haindrich commented on HIVE-18743:

Probably there was a time when this was more relevant, but it seems like this thing causes more problems than it fixes, and it just keeps coming back :) So it might be easier to address the original problem differently, if there is any.

I think the original intention was that something wanted to skip the stats update, but afaik Hive currently also sets this flag explicitly to prevent the metastore from interfering. So I guess that leaves mostly unintended code paths ending up triggering this feature :D
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372477#comment-16372477 ]

Alexander Kolbasov commented on HIVE-18743:

[~kgyrtkirk] I can't find anyone using {{NUM_FILES}}, but there are some consumers of {{TOTAL_SIZE}}. I don't know whether these can be removed without breaking something.
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372472#comment-16372472 ]

Alexander Kolbasov commented on HIVE-18743:

I don't know who is using these and what can break if this is removed. There was some purpose in putting this thing in, I guess.
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372465#comment-16372465 ]

Zoltan Haindrich commented on HIVE-18743:

Is there any reason you decided not to remove this thing?
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372446#comment-16372446 ]

Alexander Kolbasov commented on HIVE-18743:

The partition versions, {{updatePartitionStatsFast()}}, do not have this bug. They overuse overloading but otherwise seem OK, so I will not change {{updatePartitionStatsFast()}} as part of this fix.
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372434#comment-16372434 ] Hive QA commented on HIVE-18743: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 59s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} standalone-metastore: The patch generated 0 new + 484 unchanged - 3 fixed = 484 total (was 487) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 12s{color} | {color:red} The patch generated 49 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 11m 22s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-9304/dev-support/hive-personality.sh | | git revision | master / ec2378f | | Default Java | 1.8.0_111 | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-9304/yetus/patch-asflicense-problems.txt | | modules | C: standalone-metastore U: standalone-metastore | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-9304/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround > is buggy. > --- > > Key: HIVE-18743 > URL: https://issues.apache.org/jira/browse/HIVE-18743 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 1.2.0, 1.1.0, 2.0.2, 3.0.0 >Reporter: Alexander Behm >Assignee: Alexander Kolbasov >Priority: Major > Attachments: HIVE-18743.01.patch, HIVE-18743.02.patch, > HIVE-18743.03.patch > > > When hive.stats.autogather=true then the Metastore lists all files under the > table directory to populate basic stats like file counts and sizes. This file > listing operation can be very expensive particularly on filesystems like S3. > One way to address this issue is to reconfigure hive.stats.autogather=false. 
> *Here's the bug* > It is my understanding that the DO_NOT_UPDATE_STATS table property is > intended to selectively prevent this stats collection. Unfortunately, this > table property is checked *after* the expensive file listing operation, so > the DO_NOT_UPDATE_STATS does not seem to work as intended. See: > https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633 > Relevant code snippet: > {code} > public static boolean updateTableStatsFast(Database db, Table tbl, > Warehouse wh, > boolean madeDir, boolean > forceRecompute, EnvironmentContext environmentContext) throws
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372416#comment-16372416 ]

Alexander Kolbasov commented on HIVE-18743:

I think we need to add a property in the environment context which disables stats update. We can keep existing {{DO_NOT_UPDATE_STATS}} for compatibility with existing apps for a while, but switch to the new property for new uses.
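The proposal in the comment above (a dedicated environment-context property, with {{DO_NOT_UPDATE_STATS}} kept for compatibility) could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses plain maps instead of Hive's {{EnvironmentContext}} and {{Table}} types, and the class name {{SkipStatsDecision}}, the method {{shouldSkipStats}}, and the property name {{SKIP_STATS_UPDATE}} are hypothetical, not anything taken from the patches attached to this issue.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; plain maps stand in for the environment context and table parameters. */
public class SkipStatsDecision {
    /** Assumed name for the proposed environment-context property (not a real Hive constant). */
    static final String ENV_SKIP_STATS = "SKIP_STATS_UPDATE";
    /** Existing marker named in this issue, kept for compatibility. */
    static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";

    /** Returns true if either the new per-call property or the legacy marker asks to skip stats. */
    static boolean shouldSkipStats(Map<String, String> envProps, Map<String, String> tableParams) {
        // New-style, per-call signal: checked first and never persisted with the table.
        if (envProps != null && Boolean.parseBoolean(envProps.get(ENV_SKIP_STATS))) {
            return true;
        }
        // Legacy signal: read only, the table parameters are not modified here.
        return tableParams != null && Boolean.parseBoolean(tableParams.get(DO_NOT_UPDATE_STATS));
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        Map<String, String> tableParams = new HashMap<>();

        System.out.println(shouldSkipStats(env, tableParams));   // neither flag set

        env.put(ENV_SKIP_STATS, "true");
        System.out.println(shouldSkipStats(env, tableParams));   // new property set

        env.clear();
        tableParams.put(DO_NOT_UPDATE_STATS, "true");
        System.out.println(shouldSkipStats(env, tableParams));   // legacy marker set
    }
}
```

Checking the per-call property first keeps new callers independent of anything stored with the table, while existing applications that still set the legacy marker continue to work.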
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372412#comment-16372412 ]

Alexander Kolbasov commented on HIVE-18743:

Similar problems exist in {{updatePartitionStatsFast()}}. But there it isn't possible to disable with {{DO_NOT_UPDATE_STATS}} for some reason.
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372391#comment-16372391 ] Hive QA commented on HIVE-18743: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12911484/HIVE-18743.02.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 39 failed/errored test(s), 13011 tests executed *Failed tests:* {noformat} TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=93)
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372368#comment-16372368 ] Hive QA commented on HIVE-18743: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 54s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 19s{color} | {color:red} standalone-metastore: The patch generated 1 new + 484 unchanged - 3 fixed = 485 total (was 487) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 12s{color} | {color:red} The patch generated 49 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 11m 16s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-9302/dev-support/hive-personality.sh | | git revision | master / ec2378f | | Default Java | 1.8.0_111 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-9302/yetus/diff-checkstyle-standalone-metastore.txt | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-9302/yetus/patch-asflicense-problems.txt | | modules | C: standalone-metastore U: standalone-metastore | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-9302/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround > is buggy. > --- > > Key: HIVE-18743 > URL: https://issues.apache.org/jira/browse/HIVE-18743 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 1.2.0, 1.1.0, 2.0.2, 3.0.0 >Reporter: Alexander Behm >Assignee: Alexander Kolbasov >Priority: Major > Attachments: HIVE-18743.01.patch, HIVE-18743.02.patch > > > When hive.stats.autogather=true then the Metastore lists all files under the > table directory to populate basic stats like file counts and sizes. This file > listing operation can be very expensive particularly on filesystems like S3. 
> One way to address this issue is to reconfigure hive.stats.autogather=false. > *Here's the bug* > It is my understanding that the DO_NOT_UPDATE_STATS table property is > intended to selectively prevent this stats collection. Unfortunately, this > table property is checked *after* the expensive file listing operation, so > the DO_NOT_UPDATE_STATS does not seem to work as intended. See: > https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633 > Relevant code snippet: > {code} > public static boolean updateTableStatsFast(Database db, Table tbl, > Warehouse wh, > boolean
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372329#comment-16372329 ]

Alexander Kolbasov commented on HIVE-18743:

While looking at this code I discovered a few more interesting things.

1) There are many more conditions that should be true before the result of {{wh.getFileStatusesForUnpartitionedTable()}} is actually used, so all of them should be checked *before* we bother traversing the filesystem.
2) If someone sets {{DO_NOT_UPDATE_STATS}} as a persistent property, it will be removed which seems wrong - it should be passed via environment context.

> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
> ---
>
> Key: HIVE-18743
> URL: https://issues.apache.org/jira/browse/HIVE-18743
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 1.2.0, 1.1.0, 2.0.2, 3.0.0
> Reporter: Alexander Behm
> Assignee: Alexander Kolbasov
> Priority: Major
> Attachments: HIVE-18743.01.patch
>
> When hive.stats.autogather=true then the Metastore lists all files under the table directory to populate basic stats like file counts and sizes. This file listing operation can be very expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is intended to selectively prevent this stats collection. Unfortunately, this table property is checked *after* the expensive file listing operation, so the DO_NOT_UPDATE_STATS does not seem to work as intended. See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
> public static boolean updateTableStatsFast(Database db, Table tbl, Warehouse wh,
>     boolean madeDir, boolean forceRecompute, EnvironmentContext environmentContext) throws MetaException {
>   if (tbl.getPartitionKeysSize() == 0) {
>     // Update stats only when unpartitioned
>     FileStatus[] fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, tbl);
>     return updateTableStatsFast(tbl, fileStatuses, madeDir, forceRecompute, environmentContext); <--- DO_NOT_UPDATE_STATS is checked in here after wh.getFileStatusesForUnpartitionedTable() has already been called
>   } else {
>     return false;
>   }
> }
> {code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
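The ordering problem discussed in this thread (cheap conditions checked only after the expensive listing) can be illustrated in isolation. The following is a minimal, self-contained model and not the Hive code or the attached patch: {{StatsGate}} is a hypothetical class, a {{Supplier}} stands in for {{wh.getFileStatusesForUnpartitionedTable()}}, a plain map stands in for the table parameters, and the stats keys written at the end are illustrative. The only point it makes is that the listing is not invoked until every cheap precondition has passed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical model of the check ordering; none of these types are Hive APIs. */
public class StatsGate {
    static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";

    /**
     * Returns true if stats were written into params.
     * listFileSizes models the expensive filesystem listing and is only
     * called after the cheap checks below have passed.
     */
    static boolean updateTableStatsFast(int partitionKeyCount,
                                        Map<String, String> params,
                                        Supplier<List<Long>> listFileSizes) {
        if (partitionKeyCount != 0) {
            return false; // only unpartitioned tables are handled here
        }
        if (Boolean.parseBoolean(params.get(DO_NOT_UPDATE_STATS))) {
            return false; // caller asked to skip stats: never touch the filesystem
        }
        List<Long> sizes = listFileSizes.get(); // expensive step, e.g. a listing on S3
        long total = 0;
        for (long size : sizes) {
            total += size;
        }
        params.put("numFiles", String.valueOf(sizes.size())); // illustrative stats keys
        params.put("totalSize", String.valueOf(total));
        return true;
    }

    public static void main(String[] args) {
        int[] listings = {0}; // counts how often the fake listing ran
        Supplier<List<Long>> lister = () -> {
            listings[0]++;
            return Arrays.asList(10L, 32L);
        };

        Map<String, String> skipped = new HashMap<>();
        skipped.put(DO_NOT_UPDATE_STATS, "true");
        boolean first = updateTableStatsFast(0, skipped, lister);
        System.out.println("updated=" + first + " listings=" + listings[0]);

        Map<String, String> updated = new HashMap<>();
        boolean second = updateTableStatsFast(0, updated, lister);
        System.out.println("updated=" + second + " listings=" + listings[0]
            + " totalSize=" + updated.get("totalSize"));
    }
}
```

Passing the listing as a supplier makes the cost visible in the signature: a caller can see that the filesystem is consulted lazily, and a test can count invocations to confirm that a skipped update never lists files.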
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372234#comment-16372234 ]

Alexander Kolbasov commented on HIVE-18743:
-------------------------------------------

The code that checks for {{DO_NOT_UPDATE_STATS}}, as well as the property itself, was added as part of HIVE-10228, and it has the following comment:
{code}
// This string constant is used by AlterHandler to figure out that it should not attempt to
// update stats. It is set by any client-side task which wishes to signal that no stats
// update should take place, such as with replication.
public static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";
{code}
The actual check is rather strange:
{code}
if ((params!=null) && params.containsKey(StatsSetupConst.DO_NOT_UPDATE_STATS)){
  boolean doNotUpdateStats = Boolean.valueOf(params.get(StatsSetupConst.DO_NOT_UPDATE_STATS));
  params.remove(StatsSetupConst.DO_NOT_UPDATE_STATS);
  tbl.setParameters(params); // to make sure we remove this marker property
  if (doNotUpdateStats){
    return false;
  }
}
{code}
So after the check, {{DO_NOT_UPDATE_STATS}} is removed from the parameters for some reason.

[~ashutoshc] [~thejas] Can you comment on why the parameter is removed after the check, and why the check is performed after the file system operations are complete? To what extent does remote replication depend on the existing behavior?
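To make the side effect of the quoted check concrete, here is a small sketch using plain maps instead of Hive's table object. The class and method names ({{MarkerCheckSketch}}, {{consumeMarker}}, {{peekMarker}}) are hypothetical. {{consumeMarker}} mirrors the read-then-remove pattern from the snippet above, and {{peekMarker}} is one possible read-only alternative, not something taken from the Hive code base.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration with plain maps; not Hive's classes.
final class MarkerCheckSketch {
    static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";

    // Mirrors the quoted check: reads the marker, then removes it from the map.
    static boolean consumeMarker(Map<String, String> params) {
        if (params != null && params.containsKey(DO_NOT_UPDATE_STATS)) {
            boolean doNotUpdateStats = Boolean.parseBoolean(params.get(DO_NOT_UPDATE_STATS));
            params.remove(DO_NOT_UPDATE_STATS); // the marker does not survive the check
            return doNotUpdateStats;
        }
        return false;
    }

    // Read-only alternative (an assumption): the map is left untouched.
    static boolean peekMarker(Map<String, String> params) {
        return params != null && Boolean.parseBoolean(params.get(DO_NOT_UPDATE_STATS));
    }

    public static void main(String[] args) {
        Map<String, String> consumed = new HashMap<>();
        consumed.put(DO_NOT_UPDATE_STATS, "true");
        System.out.println(consumeMarker(consumed)); // true
        System.out.println(consumeMarker(consumed)); // false, the key is gone

        Map<String, String> peeked = new HashMap<>();
        peeked.put(DO_NOT_UPDATE_STATS, "true");
        System.out.println(peekMarker(peeked)); // true
        System.out.println(peekMarker(peeked)); // true, the key is still present
    }
}
```

The second {{consumeMarker}} call returns false because the first call removed the key, so a value set as a persistent table property would only take effect once.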
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370276#comment-16370276 ]

Alexander Behm commented on HIVE-18743:
---------------------------------------

Thank you, [~kgyrtkirk]. I agree completely. I'm very much in favor of getting rid of all non-obvious side effects of Metastore API calls. Stats collection is one of those side effects. As it is today, it is very hard to reason about what exactly the Metastore will do and how expensive API calls are.
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368883#comment-16368883 ]

Zoltan Haindrich commented on HIVE-18743:
-----------------------------------------

I feel that we should probably consider abandoning stats collection in the metastore entirely; it should be done only from the Hive task, which already sets DO_NOT_UPDATE_STATS. I think there is a ticket for this somewhere...
[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
[ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368119#comment-16368119 ]

Alexander Kolbasov commented on HIVE-18743:
-------------------------------------------

Will take a look at this.