[jira] [Commented] (HIVE-11785) Support escaping carriage return and new line for LazySimpleSerDe
[ https://issues.apache.org/jira/browse/HIVE-11785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899726#comment-16899726 ] Christian Sanelli commented on HIVE-11785: -- Could you supply your test file, /tmp/repo/test.parquet, please? Thank you.
> Support escaping carriage return and new line for LazySimpleSerDe
> -
>
> Key: HIVE-11785
> URL: https://issues.apache.org/jira/browse/HIVE-11785
> Project: Hive
> Issue Type: New Feature
> Components: Query Processor
> Affects Versions: 2.0.0
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Priority: Major
> Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11785.2.patch, HIVE-11785.3.patch, HIVE-11785.patch, test.parquet
>
>
> Create the table and perform the queries as follows. You will see different results when the setting changes.
> The expected result should be:
> {noformat}
> 1 newline
> here
> 2 carriage return
> 3 both
> here
> {noformat}
> {noformat}
> hive> create table repo (lvalue int, charstring string) stored as parquet;
> OK
> Time taken: 0.34 seconds
> hive> load data inpath '/tmp/repo/test.parquet' overwrite into table repo;
> Loading data to table default.repo
> chgrp: changing ownership of 'hdfs://nameservice1/user/hive/warehouse/repo/test.parquet': User does not belong to hive
> Table default.repo stats: [numFiles=1, numRows=0, totalSize=610, rawDataSize=0]
> OK
> Time taken: 0.732 seconds
> hive> set hive.fetch.task.conversion=more;
> hive> select * from repo;
> OK
> 1 newline
> here
> here carriage return
> 3 both
> here
> Time taken: 0.253 seconds, Fetched: 3 row(s)
> hive> set hive.fetch.task.conversion=none;
> hive> select * from repo;
> Query ID = root_20150909113535_e081db8b-ccd9-4c44-aad9-d990ffb8edf3
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1441752031022_0006, Tracking URL = http://host-10-17-81-63.coe.cloudera.com:8088/proxy/application_1441752031022_0006/
> Kill Command = /opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/hadoop/bin/hadoop job -kill job_1441752031022_0006
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
> 2015-09-09 11:35:54,127 Stage-1 map = 0%, reduce = 0%
> 2015-09-09 11:36:04,664 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.98 sec
> MapReduce Total cumulative CPU time: 2 seconds 980 msec
> Ended Job = job_1441752031022_0006
> MapReduce Jobs Launched:
> Stage-Stage-1: Map: 1 Cumulative CPU: 2.98 sec HDFS Read: 4251 HDFS Write: 51 SUCCESS
> Total MapReduce CPU Time Spent: 2 seconds 980 msec
> OK
> 1 newline
> NULL NULL
> 2 carriage return
> NULL NULL
> 3 both
> NULL NULL
> Time taken: 25.131 seconds, Fetched: 6 row(s)
> hive>
> {noformat}
-- This message was sent by Atlassian JIRA (v7.6.14#76016)
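The feature under discussion, escaping embedded line breaks so that a multi-line field value survives in a line-oriented text format, can be illustrated with a small self-contained sketch. This is illustrative only; the class and helper names below are hypothetical and do not mirror LazySimpleSerDe's actual code.

```java
public class NewlineEscape {

    // Escape the characters that would otherwise be read as record
    // delimiters: backslash first (so escapes stay unambiguous),
    // then newline and carriage return.
    static String escape(String field) {
        return field.replace("\\", "\\\\")
                    .replace("\n", "\\n")
                    .replace("\r", "\\r");
    }

    public static void main(String[] args) {
        // Each value now occupies exactly one physical output line.
        System.out.println(escape("newline\nhere"));
        System.out.println(escape("carriage\rreturn"));
        System.out.println(escape("both\r\nhere"));
    }
}
```

Without this kind of escaping, a reader that splits records on '\n' sees the second half of each value as a spurious extra row, which matches the corrupted output shown above when the query runs through the MapReduce path.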
[jira] [Comment Edited] (HIVE-22077) Inserting overwrite partitions clause does not clean directories while partitions' info is not stored in metadata
[ https://issues.apache.org/jira/browse/HIVE-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898685#comment-16898685 ] Hui An edited comment on HIVE-22077 at 8/5/19 1:46 AM: --- This issue is caused by the method loadPartitionInternal in Hive.java:
{code:java}
Path oldPartPath = (oldPart != null) ? oldPart.getDataLocation() : null;
Path newPartPath = null;
if (inheritLocation) {
  newPartPath = genPartPathFromTable(tbl, partSpec, tblDataLocationPath);
  if (oldPart != null) {
    /*
     * If we are moving the partition across filesystem boundaries
     * inherit from the table properties. Otherwise (same filesystem) use the
     * original partition location.
     *
     * See: HIVE-1707 and HIVE-2117 for background
     */
    FileSystem oldPartPathFS = oldPartPath.getFileSystem(getConf());
    FileSystem loadPathFS = loadPath.getFileSystem(getConf());
    if (FileUtils.equalsFileSystem(oldPartPathFS, loadPathFS)) {
      newPartPath = oldPartPath;
    }
  }
} else {
  newPartPath = oldPartPath == null ? genPartPathFromTable(tbl, partSpec, tblDataLocationPath) : oldPartPath;
}
{code}
Actually, oldPart being null does not mean oldPartPath does not exist in HDFS; it only means oldPartPath is set to null, and that null value is then passed to the following method, replaceFiles. I think we could just assign the newPartPath value to oldPartPath when oldPart is null, but might that cause other problems? Or should we check the partition directory before the MR work starts and throw an error to the end user if there are files under it?
> Inserting overwrite partitions clause does not clean directories while partitions' info is not stored in metadata
> -
>
> Key: HIVE-22077
> URL: https://issues.apache.org/jira/browse/HIVE-22077
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.1.1, 4.0.0, 2.3.4
> Reporter: Hui An
> Assignee: Hui An
> Priority: Major
>
> Inserting overwrite static partitions may not clean the related HDFS location if the partitions' info is not stored in metadata.
> Steps to reproduce this issue:
>
> 1. Create a managed table:
>
> {code:sql}
> CREATE TABLE `test`(
>   `id` string)
> PARTITIONED BY (
>   `dayno` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>   'hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1564731656')
> {code}
>
> 2.
Create the partition's directory and put some data under it:
>
> {code:java}
> hdfs dfs -mkdir hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
> hdfs dfs -put test.data hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
> {code}
>
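The fallback Hui An suggests, never handing replaceFiles a null "old" location when the metastore has no record of the partition, could be sketched like this. It is a minimal illustration, not the actual patch: java.nio.file.Path stands in for Hadoop's Path, and PartPathFallback/resolveOldPartPath are hypothetical names.

```java
import java.nio.file.Path;
import java.util.Optional;

public class PartPathFallback {

    // When the metastore has no partition record (oldPart == null), fall
    // back to the freshly generated partition path instead of returning
    // null, so the caller can still clean any files left on disk.
    static Path resolveOldPartPath(Optional<Path> oldPartLocation, Path generatedPartPath) {
        return oldPartLocation.orElse(generatedPartPath);
    }

    public static void main(String[] args) {
        Path generated = Path.of("warehouse", "test.db", "test", "dayno=20190802");
        // No metastore entry: use the generated location so it can be cleaned.
        System.out.println(resolveOldPartPath(Optional.empty(), generated));
        // Metastore entry exists: keep the recorded location.
        System.out.println(resolveOldPartPath(Optional.of(Path.of("other", "loc")), generated));
    }
}
```

Whether this is safe depends on the open question in the comment: if files exist under the generated path that the user did not intend to overwrite, failing early with a clear error may be the better behaviour.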
[jira] [Commented] (HIVE-22054) Avoid recursive listing to check if a directory is empty
[ https://issues.apache.org/jira/browse/HIVE-22054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899709#comment-16899709 ] Jason Dere commented on HIVE-22054: --- Thanks for the patch [~prabhas], and for your input on the FS side [~ste...@apache.org]
> Avoid recursive listing to check if a directory is empty
>
> Key: HIVE-22054
> URL: https://issues.apache.org/jira/browse/HIVE-22054
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 0.13.0, 1.2.0, 2.1.0, 3.1.1, 2.3.5
> Reporter: Prabhas Kumar Samanta
> Assignee: Prabhas Kumar Samanta
> Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22054.2.patch, HIVE-22054.patch
>
>
> During a drop partition on a managed table, we first delete the directory corresponding to the partition. After that we recursively delete the parent directory as well, if the parent directory becomes empty. To do this emptiness check, we call Warehouse::getContentSummary(), which in turn recursively checks all files and subdirectories. This is a costly operation when a directory has a lot of files or subdirectories, and the overhead is even more prominent for cloud-based file systems like S3. For an emptiness check, it is unnecessary too.
> This recursive listing was introduced as part of HIVE-5220. Code snippet for reference:
> {code:java}
> // Warehouse.java
> public boolean isEmpty(Path path) throws IOException, MetaException {
>   ContentSummary contents = getFs(path).getContentSummary(path);
>   if (contents != null && contents.getFileCount() == 0 && contents.getDirectoryCount() == 1) {
>     return true;
>   }
>   return false;
> }
>
> // HiveMetaStore.java
> private void deleteParentRecursive(Path parent, int depth, boolean mustPurge, boolean needRecycle)
>     throws IOException, MetaException {
>   if (depth > 0 && parent != null && wh.isWritable(parent)) {
>     if (wh.isDir(parent) && wh.isEmpty(parent)) {
>       wh.deleteDir(parent, true, mustPurge, needRecycle);
>     }
>     deleteParentRecursive(parent.getParent(), depth - 1, mustPurge, needRecycle);
>   }
> }
> // Note: FileSystem::getContentSummary() performs a recursive listing.{code}
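The non-recursive alternative can be sketched with java.nio.file as a stand-in for the Hadoop FileSystem API. The idea is the same as the patch's direction (list the directory and stop at the first entry instead of summarizing the whole subtree), but the class and method names here are illustrative, not Hive's actual code.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class EmptyDirCheck {

    // Non-recursive emptiness check: open the directory and stop after
    // the first entry, instead of computing a ContentSummary-style total
    // over every file and subdirectory underneath it.
    static boolean isEmptyDir(Path dir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            return !entries.iterator().hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("empty-check");
        System.out.println(isEmptyDir(dir));   // empty right after creation
        Files.createFile(dir.resolve("part-00000"));
        System.out.println(isEmptyDir(dir));   // no longer empty
    }
}
```

The cost of this check is bounded by a single directory listing (and it can return as soon as one child is seen), whereas a content summary scales with the size of the entire subtree, which is exactly the overhead described above for S3-like stores.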
[jira] [Updated] (HIVE-22054) Avoid recursive listing to check if a directory is empty
[ https://issues.apache.org/jira/browse/HIVE-22054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-22054: -- Resolution: Fixed Fix Version/s: 4.0.0 Status: Resolved (was: Patch Available) Committed to master
[jira] [Commented] (HIVE-22040) Drop partition throws exception with 'Failed to delete parent: File does not exist' when the partition's parent path does not exist
[ https://issues.apache.org/jira/browse/HIVE-22040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899706#comment-16899706 ] Jason Dere commented on HIVE-22040: --- FYI, the changes in HIVE-22054 will affect your patch, since it replaces isEmpty() with isEmptyDir(), which has a different implementation (getContentSummary() is replaced with listStatus()). But it could still use your changes to catch the FileNotFoundException.
> Drop partition throws exception with 'Failed to delete parent: File does not exist' when the partition's parent path does not exist
>
> Key: HIVE-22040
> URL: https://issues.apache.org/jira/browse/HIVE-22040
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Affects Versions: 3.0.0
> Reporter: xiepengjie
> Assignee: xiepengjie
> Priority: Major
> Attachments: HIVE-22040.01.patch, HIVE-22040.02.patch, HIVE-22040.patch
>
>
> I created a managed table with multiple partition columns. When I try to drop a partition whose parent path no longer exists, Hive throws 'Failed to delete parent: File does not exist'. The partition's metadata in MySQL has been deleted, but the exception is still thrown, and the statement fails when connecting to HiveServer2 over JDBC from Java. This problem also exists on the master branch; I think it is very unfriendly and we should fix it.
> Example:
> – First, create a managed table with multiple partition columns, and add a partition:
> {code:java}
> drop table if exists t1;
> create table t1 (c1 int) partitioned by (year string, month string, day string);
> alter table t1 add partition(year='2019', month='07', day='01');{code}
> – Second, delete the path of partition 'month=07':
> {code:java}
> hadoop fs -rm -r /user/hadoop/xiepengjietest.db/drop_partition/year=2019/month=07{code}
> – Third, when I try to drop the partition, the metastore throws the exception 'Failed to delete parent: File does not exist'.
> {code:java} > alter table t1 drop partition(year='2019', month='07', day='01'); > {code} > exception like this: > {code:java} > Error: Error while processing statement: FAILED: Execution Error, return code > 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Failed to delete parent: File > does not exist: > /user/hadoop/xiepengjietest.db/drop_partition/year=2019/month=07 > at > org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getContentSummaryInt(FSDirStatAndListingOp.java:493) > at > org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getContentSummary(FSDirStatAndListingOp.java:140) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getContentSummary(FSNamesystem.java:3995) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getContentSummary(NameNodeRpcServer.java:1202) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getContentSummary(ClientNamenodeProtocolServerSideTranslatorPB.java:883) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2115) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2111) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2111) > (state=08S01,code=1) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
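The tolerant behaviour the patch aims for (treating a parent directory that has already vanished as a no-op rather than surfacing the error to the client) can be sketched as follows. This is a hypothetical illustration: java.nio.file stands in for HDFS, NoSuchFileException plays the role of Hadoop's FileNotFoundException, and ParentCleanup/deleteParentIfEmpty are invented names, not Hive's actual code.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class ParentCleanup {

    // Delete a parent directory only if it is empty; if the parent is
    // already gone, there is simply nothing left to clean up, so we
    // return instead of propagating the missing-file error.
    static boolean deleteParentIfEmpty(Path parent) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(parent)) {
            if (entries.iterator().hasNext()) {
                return false; // parent still has content, leave it alone
            }
        } catch (NoSuchFileException e) {
            return false;     // parent already deleted: nothing to do
        }
        return Files.deleteIfExists(parent);
    }

    public static void main(String[] args) throws IOException {
        Path missing = Path.of("definitely-not-here", "year=2019", "month=07");
        // Prints false and does not throw, even though the path is gone.
        System.out.println(deleteParentIfEmpty(missing));
    }
}
```

With this shape, the drop-partition flow above would finish cleanly after the metadata delete instead of failing on the already-removed month=07 directory.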
[jira] [Commented] (HIVE-22054) Avoid recursive listing to check if a directory is empty
[ https://issues.apache.org/jira/browse/HIVE-22054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899705#comment-16899705 ] Jason Dere commented on HIVE-22054: --- +1
[jira] [Commented] (HIVE-22040) Drop partition throws exception with 'Failed to delete parent: File does not exist' when the partition's parent path does not exist
[ https://issues.apache.org/jira/browse/HIVE-22040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899704#comment-16899704 ] Jason Dere commented on HIVE-22040: --- Sorry for the late response. Your patch does not apply on the master branch, because this path in your patch
{noformat}
--- standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
{noformat}
is now the following path on the master branch:
{noformat}
--- standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
{noformat}
Are you trying to apply this patch and compile against the Hive master branch? I would suggest doing that for this patch.
[jira] [Commented] (HIVE-22081) Hivemetastore Performance: Compaction Initiator Thread overwhelmed if too many tables/partitions are eligible for compaction
[ https://issues.apache.org/jira/browse/HIVE-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899688#comment-16899688 ] Rajkumar Singh commented on HIVE-22081: --- {quote}Is this for cases where the automatic compaction was turned off for a while, and then someone turns that on later?{quote} Yes, that's right. Beyond that, starting with Hive 3, managed tables are ACID by default, so users who upgrade to Hive 3 will see many more managed ACID tables. Currently org.apache.hadoop.hive.ql.txn.compactor.Initiator#checkForCompaction does a lot of blocking HDFS operations, which is time-consuming. Per your suggestion, I will review what objects/results can be cached to make it more efficient, and will upload a new patch with the checkstyle warnings and the test failure addressed. Thanks
> Hivemetastore Performance: Compaction Initiator Thread overwhelmed if too many tables/partitions are eligible for compaction
> --
>
> Key: HIVE-22081
> URL: https://issues.apache.org/jira/browse/HIVE-22081
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Affects Versions: 3.1.1
> Reporter: Rajkumar Singh
> Assignee: Rajkumar Singh
> Priority: Major
> Attachments: HIVE-22081.patch
>
>
> If automatic compaction is turned on, the Initiator thread checks for potential tables/partitions that are eligible for compaction and runs some checks in a for loop before requesting compaction for the eligible ones. Though the Initiator thread is configured to run at a 5 minute interval by default, with many objects it keeps running, as these checks are IO-intensive and hog CPU.
> In the proposed changes, I am planning to:
> 1. Pass fewer objects to the for loop, by filtering out objects beforehand based on the condition we currently check inside the loop.
> 2. Make an async call using a Future to determine the compaction type (this is where we do the FileSystem calls).
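The Future-based shape proposed in step 2 might look roughly like this. It is a sketch only: CompactionType, checkForCompaction, and the fixed-size pool are simplified stand-ins for the Initiator's real logic, and the "check" here is a placeholder for the blocking HDFS inspection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InitiatorSketch {

    enum CompactionType { NONE, MINOR, MAJOR }

    // Placeholder for Initiator#checkForCompaction: in Hive this is where
    // the slow, blocking FileSystem calls happen for each candidate.
    static CompactionType checkForCompaction(String partition) {
        return partition.endsWith("0") ? CompactionType.MAJOR : CompactionType.MINOR;
    }

    // Submit each slow check to a pool, then collect the results in
    // submission order; the main loop no longer blocks per candidate.
    static List<CompactionType> checkAll(List<String> candidates, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<CompactionType>> futures = new ArrayList<>();
            for (String p : candidates) {
                Callable<CompactionType> task = () -> checkForCompaction(p);
                futures.add(pool.submit(task));
            }
            List<CompactionType> results = new ArrayList<>();
            for (Future<CompactionType> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(checkAll(List.of("p=10", "p=11", "p=20"), 4));
        // [MAJOR, MINOR, MAJOR]
    }
}
```

As Peter notes below the proposal, this parallelism helps drain a backlog but does not reduce the total IO; caching or pre-filtering the candidates (step 1) is what actually cuts the work.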
[jira] [Commented] (HIVE-22081) Hivemetastore Performance: Compaction Initiator Thread overwhelmed if too many tables/partitions are eligible for compaction
[ https://issues.apache.org/jira/browse/HIVE-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899672#comment-16899672 ] Peter Vary commented on HIVE-22081: --- [~Rajkumar Singh]: Is this for cases where the automatic compaction was turned off for a while, and then someone turns that on later? So we have a big number of tables because of the accumulation of changes before automatic compaction was turned on. In this case, splitting the jobs across multiple threads is really useful. On the other hand, if we have so many changes within 5 minutes that it takes more than 5 minutes to check whether compaction is needed, then we might want to consider some other way to calculate / cache the check results. Splitting the tasks across multiple threads could help, but it is still a CPU hog and IO-intensive. Also, please consider fixing the checkstyle warnings. Thanks, Peter
[jira] [Commented] (HIVE-21637) Synchronized metastore cache
[ https://issues.apache.org/jira/browse/HIVE-21637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899588#comment-16899588 ] Hive QA commented on HIVE-21637: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 35s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 49s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 39s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 4m 31s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 25s{color} | {color:blue} storage-api in master has 48 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 2m 32s{color} | {color:blue} standalone-metastore/metastore-common in master has 31 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 1m 15s{color} | {color:blue} standalone-metastore/metastore-server in master has 180 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 8s{color} | {color:blue} ql in master has 2250 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 29s{color} | {color:blue} beeline in master has 44 extant Findbugs warnings. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 27s{color} | {color:blue} hcatalog/server-extensions in master has 3 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 28s{color} | {color:blue} hcatalog/streaming in master has 11 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 27s{color} | {color:blue} streaming in master has 2 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 26s{color} | {color:blue} standalone-metastore/metastore-tools/metastore-benchmarks in master has 3 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 38s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 47s{color} | {color:blue} itests/util in master has 44 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 33s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 39s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 11s{color} | {color:red} storage-api: The patch generated 2 new + 15 unchanged - 0 fixed = 17 total (was 15) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s{color} | {color:red} standalone-metastore/metastore-common: The patch generated 9 new + 487 unchanged - 4 fixed = 496 total (was 491) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 39s{color} | {color:red} standalone-metastore/metastore-server: The patch generated 178 new + 1910 unchanged - 65 fixed = 2088 total (was 1975) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 5s{color} | {color:red} ql: The patch generated 64 new + 2295 unchanged - 32 fixed = 2359 total (was 2327) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 11s{color} | {color:red} standalone-metastore/metastore-tools/tools-common: The patch generated 5 new + 31 unchanged - 0 fixed = 36 total (was 31) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 11s{color} | {color:red} itests/hca
[jira] [Commented] (HIVE-21637) Synchronized metastore cache
[ https://issues.apache.org/jira/browse/HIVE-21637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899586#comment-16899586 ] Hive QA commented on HIVE-21637: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12976630/HIVE-21637.61.patch {color:green}SUCCESS:{color} +1 due to 124 test(s) being added or modified. {color:green}SUCCESS:{color} +1 due to 16717 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/18254/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/18254/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-18254/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. 
ATTACHMENT ID: 12976630 - PreCommit-HIVE-Build

> Synchronized metastore cache
>
> Key: HIVE-21637
> URL: https://issues.apache.org/jira/browse/HIVE-21637
> Project: Hive
> Issue Type: New Feature
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Priority: Major
> Attachments: HIVE-21637-1.patch, HIVE-21637.10.patch,
> HIVE-21637.11.patch, HIVE-21637.12.patch, HIVE-21637.13.patch,
> HIVE-21637.14.patch, HIVE-21637.15.patch, HIVE-21637.16.patch,
> HIVE-21637.17.patch, HIVE-21637.18.patch, HIVE-21637.19.patch,
> HIVE-21637.19.patch, HIVE-21637.2.patch, HIVE-21637.20.patch,
> HIVE-21637.21.patch, HIVE-21637.22.patch, HIVE-21637.23.patch,
> HIVE-21637.24.patch, HIVE-21637.25.patch, HIVE-21637.26.patch,
> HIVE-21637.27.patch, HIVE-21637.28.patch, HIVE-21637.29.patch,
> HIVE-21637.3.patch, HIVE-21637.30.patch, HIVE-21637.31.patch,
> HIVE-21637.32.patch, HIVE-21637.33.patch, HIVE-21637.34.patch,
> HIVE-21637.35.patch, HIVE-21637.36.patch, HIVE-21637.37.patch,
> HIVE-21637.38.patch, HIVE-21637.39.patch, HIVE-21637.4.patch,
> HIVE-21637.40.patch, HIVE-21637.41.patch, HIVE-21637.42.patch,
> HIVE-21637.43.patch, HIVE-21637.44.patch, HIVE-21637.45.patch,
> HIVE-21637.46.patch, HIVE-21637.47.patch, HIVE-21637.48.patch,
> HIVE-21637.49.patch, HIVE-21637.5.patch, HIVE-21637.50.patch,
> HIVE-21637.51.patch, HIVE-21637.52.patch, HIVE-21637.53.patch,
> HIVE-21637.54.patch, HIVE-21637.55.patch, HIVE-21637.56.patch,
> HIVE-21637.57.patch, HIVE-21637.58.patch, HIVE-21637.59.patch,
> HIVE-21637.6.patch, HIVE-21637.60.patch, HIVE-21637.61.patch,
> HIVE-21637.7.patch, HIVE-21637.8.patch, HIVE-21637.9.patch
>
> Currently, HMS has a cache implemented by CachedStore. The cache is
> asynchronous, and in an HMS HA setting we can only get eventual consistency. In
> this Jira, we try to make it synchronized.

-- This message was sent by Atlassian JIRA (v7.6.14#76016)
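The gap the issue description draws, between an asynchronously refreshed cache (eventual consistency) and a synchronized, write-through one, can be sketched in a toy example. This is not Hive's CachedStore; the `SyncCache` class and its method names are hypothetical, purely to illustrate why a write-through design never serves stale reads.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy illustration only: a write-through cache updates the backing
// store and the cache in one synchronized step, so a read can never
// observe a value older than the last completed write.
public class SyncCacheDemo {
    static class SyncCache {
        private final Map<String, String> backing = new ConcurrentHashMap<>();
        private final Map<String, String> cache = new ConcurrentHashMap<>();

        synchronized void put(String key, String value) {
            backing.put(key, value); // write through to the store...
            cache.put(key, value);   // ...and update the cache atomically with it
        }

        synchronized String get(String key) {
            // Cache is kept in lock-step with the store, so a hit is
            // always current; a miss falls back to the backing store.
            String v = cache.get(key);
            return v != null ? v : backing.get(key);
        }
    }

    public static void main(String[] args) {
        SyncCache c = new SyncCache();
        c.put("tbl1", "v1");
        c.put("tbl1", "v2"); // overwrite
        System.out.println(c.get("tbl1")); // prints "v2", never the stale "v1"
    }
}
```

An asynchronous design, by contrast, would apply the `backing.put` immediately but refresh `cache` later (e.g. from a notification log), leaving a window in which `get` returns the old value; that window is exactly the eventual consistency the issue aims to eliminate.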
[jira] [Updated] (HIVE-21637) Synchronized metastore cache
[ https://issues.apache.org/jira/browse/HIVE-21637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-21637:

Attachment: HIVE-21637.61.patch
[jira] [Commented] (HIVE-21637) Synchronized metastore cache
[ https://issues.apache.org/jira/browse/HIVE-21637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899577#comment-16899577 ] Hive QA commented on HIVE-21637:

(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
|| Prechecks ||
| +1 | @author | 0m 1s | The patch does not contain any @author tags. |
|| master Compile Tests ||
| 0 | mvndep | 1m 29s | Maven dependency ordering for branch |
| +1 | mvninstall | 7m 4s | master passed |
| +1 | compile | 6m 44s | master passed |
| +1 | checkstyle | 4m 25s | master passed |
| 0 | findbugs | 0m 24s | storage-api in master has 48 extant Findbugs warnings. |
| 0 | findbugs | 2m 30s | standalone-metastore/metastore-common in master has 31 extant Findbugs warnings. |
| 0 | findbugs | 1m 7s | standalone-metastore/metastore-server in master has 180 extant Findbugs warnings. |
| 0 | findbugs | 4m 7s | ql in master has 2250 extant Findbugs warnings. |
| 0 | findbugs | 0m 31s | beeline in master has 44 extant Findbugs warnings. |
| 0 | findbugs | 0m 29s | hcatalog/server-extensions in master has 3 extant Findbugs warnings. |
| 0 | findbugs | 0m 29s | hcatalog/streaming in master has 11 extant Findbugs warnings. |
| 0 | findbugs | 0m 26s | streaming in master has 2 extant Findbugs warnings. |
| 0 | findbugs | 0m 24s | standalone-metastore/metastore-tools/metastore-benchmarks in master has 3 extant Findbugs warnings. |
| 0 | findbugs | 0m 38s | itests/hive-unit in master has 2 extant Findbugs warnings. |
| 0 | findbugs | 0m 45s | itests/util in master has 44 extant Findbugs warnings. |
| +1 | javadoc | 5m 32s | master passed |
|| Patch Compile Tests ||
| 0 | mvndep | 0m 26s | Maven dependency ordering for patch |
| +1 | mvninstall | 7m 17s | the patch passed |
| +1 | compile | 6m 41s | the patch passed |
| +1 | javac | 6m 41s | the patch passed |
| -1 | checkstyle | 0m 11s | storage-api: The patch generated 2 new + 15 unchanged - 0 fixed = 17 total (was 15) |
| -1 | checkstyle | 0m 16s | standalone-metastore/metastore-common: The patch generated 9 new + 487 unchanged - 4 fixed = 496 total (was 491) |
| -1 | checkstyle | 0m 39s | standalone-metastore/metastore-server: The patch generated 178 new + 1910 unchanged - 65 fixed = 2088 total (was 1975) |
| -1 | checkstyle | 1m 3s | ql: The patch generated 64 new + 2295 unchanged - 32 fixed = 2359 total (was 2327) |
| -1 | checkstyle | 0m 11s | standalone-metastore/metastore-tools/tools-common: The patch generated 5 new + 31 unchanged - 0 fixed = 36 total (was 31) |
| -1 | checkstyle | 0m 12s | itests/hca