[jira] [Commented] (HIVE-11785) Support escaping carriage return and new line for LazySimpleSerDe

2019-08-04 Thread Christian Sanelli (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-11785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899726#comment-16899726
 ] 

Christian Sanelli commented on HIVE-11785:
--

Could you supply your test file, /tmp/repo/test.parquet, please? Thank you.

> Support escaping carriage return and new line for LazySimpleSerDe
> -
>
> Key: HIVE-11785
> URL: https://issues.apache.org/jira/browse/HIVE-11785
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11785.2.patch, HIVE-11785.3.patch, 
> HIVE-11785.patch, test.parquet
>
>
> Create the table and perform the queries as follows. You will see different 
> results when the setting changes. 
> The expected result should be:
> {noformat}
> 1 newline
> here
> 2 carriage return
> 3 both
> here
> {noformat}
> {noformat}
> hive> create table repo (lvalue int, charstring string) stored as parquet;
> OK
> Time taken: 0.34 seconds
> hive> load data inpath '/tmp/repo/test.parquet' overwrite into table repo;
> Loading data to table default.repo
> chgrp: changing ownership of 
> 'hdfs://nameservice1/user/hive/warehouse/repo/test.parquet': User does not 
> belong to hive
> Table default.repo stats: [numFiles=1, numRows=0, totalSize=610, 
> rawDataSize=0]
> OK
> Time taken: 0.732 seconds
> hive> set hive.fetch.task.conversion=more;
> hive> select * from repo;
> OK
> 1 newline
> here
> here  carriage return
> 3 both
> here
> Time taken: 0.253 seconds, Fetched: 3 row(s)
> hive> set hive.fetch.task.conversion=none;
> hive> select * from repo;
> Query ID = root_20150909113535_e081db8b-ccd9-4c44-aad9-d990ffb8edf3
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1441752031022_0006, Tracking URL = 
> http://host-10-17-81-63.coe.cloudera.com:8088/proxy/application_1441752031022_0006/
> Kill Command = 
> /opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/hadoop/bin/hadoop job  
> -kill job_1441752031022_0006
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
> 2015-09-09 11:35:54,127 Stage-1 map = 0%,  reduce = 0%
> 2015-09-09 11:36:04,664 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.98 
> sec
> MapReduce Total cumulative CPU time: 2 seconds 980 msec
> Ended Job = job_1441752031022_0006
> MapReduce Jobs Launched:
> Stage-Stage-1: Map: 1   Cumulative CPU: 2.98 sec   HDFS Read: 4251 HDFS 
> Write: 51 SUCCESS
> Total MapReduce CPU Time Spent: 2 seconds 980 msec
> OK
> 1 newline
> NULL  NULL
> 2 carriage return
> NULL  NULL
> 3 both
> NULL  NULL
> Time taken: 25.131 seconds, Fetched: 6 row(s)
> hive>
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (HIVE-22077) Inserting overwrite partitions clause does not clean directories while partitions' info is not stored in metadata

2019-08-04 Thread Hui An (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898685#comment-16898685
 ] 

Hui An edited comment on HIVE-22077 at 8/5/19 1:46 AM:
---

This issue is caused by the loadPartitionInternal method in Hive.java:
{code:java}
Path oldPartPath = (oldPart != null) ? oldPart.getDataLocation() : null;
Path newPartPath = null;

if (inheritLocation) {
  newPartPath = genPartPathFromTable(tbl, partSpec, tblDataLocationPath);

  if(oldPart != null) {
/*
 * If we are moving the partition across filesystem boundaries
 * inherit from the table properties. Otherwise (same filesystem) use the
 * original partition location.
 *
 * See: HIVE-1707 and HIVE-2117 for background
 */
FileSystem oldPartPathFS = oldPartPath.getFileSystem(getConf());
FileSystem loadPathFS = loadPath.getFileSystem(getConf());
if (FileUtils.equalsFileSystem(oldPartPathFS,loadPathFS)) {
  newPartPath = oldPartPath;
}
  }
} else {
  newPartPath = oldPartPath == null
? genPartPathFromTable(tbl, partSpec, tblDataLocationPath) : oldPartPath;
}
{code}
Actually, oldPart being null does not mean oldPartPath does not exist in HDFS; 
the code just sets oldPartPath to null and passes that null value on to the 
subsequent replaceFiles call.
I think we could simply assign newPartPath to oldPartPath when oldPart is null, 
but might that cause other problems? Or should we check the partition directory 
before the MR work starts and throw an error to the end user if there are files 
under it?
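A minimal sketch of the first option, reusing the names from the snippet above 
(the filesystem existence check is my assumption about how the guard could look, 
not a committed change):
{code:java}
// Hypothetical guard (illustrative only): when the partition is missing from the
// metastore (oldPart == null) but its default directory already exists on HDFS,
// treat that directory as the "old" location so replaceFiles() cleans it up.
Path newPartPath = genPartPathFromTable(tbl, partSpec, tblDataLocationPath);
Path oldPartPath = (oldPart != null) ? oldPart.getDataLocation() : null;
if (oldPartPath == null) {
  FileSystem fs = newPartPath.getFileSystem(getConf());
  if (fs.exists(newPartPath)) {
    // Stale files with no matching metastore metadata: point replaceFiles() at
    // them so INSERT OVERWRITE removes them instead of leaving them behind.
    oldPartPath = newPartPath;
  }
}
{code}
Whether this plays well with the cross-filesystem handling in the snippet above 
would still need to be checked.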


> Inserting overwrite partitions clause does not clean directories while 
> partitions' info is not stored in metadata
> -
>
> Key: HIVE-22077
> URL: https://issues.apache.org/jira/browse/HIVE-22077
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.1, 4.0.0, 2.3.4
>Reporter: Hui An
>Assignee: Hui An
>Priority: Major
>
> INSERT OVERWRITE on static partitions may not clean the related HDFS location 
> if the partitions' info is not stored in metadata.
> Steps to reproduce this issue:
> 
> 1. Create a managed table:
> 
> {code:sql}
>  CREATE TABLE `test`(   
>`id` string) 
>  PARTITIONED BY (   
>`dayno` string)  
>  ROW FORMAT SERDE   
>'org.apache.hadoop.hive.ql.io.orc.OrcSerde'  
>  STORED AS INPUTFORMAT  
>'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  
>  OUTPUTFORMAT   
>'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' 
>  LOCATION   
>'hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test' 
>  TBLPROPERTIES (
>'transient_lastDdlTime'='1564731656')   
> {code}
> 
> 2. Create the partition's directory and put some data under it:
> 
> {code:java}
> hdfs dfs -mkdir 
> hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
> hdfs dfs -put test.data 
> hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
> {code}
> 

[jira] [Commented] (HIVE-22054) Avoid recursive listing to check if a directory is empty

2019-08-04 Thread Jason Dere (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-22054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899709#comment-16899709
 ] 

Jason Dere commented on HIVE-22054:
---

Thanks for the patch [~prabhas], and for your input on the FS side 
[~ste...@apache.org]

> Avoid recursive listing to check if a directory is empty
> 
>
> Key: HIVE-22054
> URL: https://issues.apache.org/jira/browse/HIVE-22054
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.13.0, 1.2.0, 2.1.0, 3.1.1, 2.3.5
>Reporter: Prabhas Kumar Samanta
>Assignee: Prabhas Kumar Samanta
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22054.2.patch, HIVE-22054.patch
>
>
> During drop partition on a managed table, first we delete the directory 
> corresponding to the partition. After that we recursively delete the parent 
> directory as well if the parent directory becomes empty. To do this emptiness 
> check, we call Warehouse::getContentSummary(), which in turn recursively 
> checks all files and subdirectories. This is a costly operation when a 
> directory has a lot of files or subdirectories. This overhead is even more 
> prominent for cloud-based file systems like S3, and for an emptiness check it 
> is unnecessary.
> This recursive listing was introduced as part of HIVE-5220. Code snippet 
> for reference:
> {code:java}
> // Warehouse.java
> public boolean isEmpty(Path path) throws IOException, MetaException {
>   ContentSummary contents = getFs(path).getContentSummary(path);
>   if (contents != null && contents.getFileCount() == 0 && 
> contents.getDirectoryCount() == 1) {
> return true;
>   }
>   return false;
> }
> // HiveMetaStore.java
> private void deleteParentRecursive(Path parent, int depth, boolean mustPurge, 
> boolean needRecycle)
>   throws IOException, MetaException {
>   if (depth > 0 && parent != null && wh.isWritable(parent)) {
> if (wh.isDir(parent) && wh.isEmpty(parent)) {
>   wh.deleteDir(parent, true, mustPurge, needRecycle);
> }
> deleteParentRecursive(parent.getParent(), depth - 1, mustPurge, 
> needRecycle);
>   }
> }
> // Note: FileSystem::getContentSummary() performs a recursive listing.{code}
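For reference, a minimal sketch of the non-recursive check the patch moves to 
(illustrative only; the committed isEmptyDir() may differ, and getFs() is the 
Warehouse helper used in the quoted snippet above):
{code:java}
// Warehouse.java (hypothetical sketch): a single, non-recursive listing of the
// directory is enough to decide emptiness, avoiding the full-subtree walk done
// by FileSystem::getContentSummary().
public boolean isEmptyDir(Path path) throws IOException, MetaException {
  FileStatus[] children = getFs(path).listStatus(path);  // direct children only
  return children == null || children.length == 0;
}
{code}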



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (HIVE-22054) Avoid recursive listing to check if a directory is empty

2019-08-04 Thread Jason Dere (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-22054:
--
   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> Avoid recursive listing to check if a directory is empty
> 
>
> Key: HIVE-22054
> URL: https://issues.apache.org/jira/browse/HIVE-22054
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.13.0, 1.2.0, 2.1.0, 3.1.1, 2.3.5
>Reporter: Prabhas Kumar Samanta
>Assignee: Prabhas Kumar Samanta
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22054.2.patch, HIVE-22054.patch
>
>
> During drop partition on a managed table, first we delete the directory 
> corresponding to the partition. After that we recursively delete the parent 
> directory as well if the parent directory becomes empty. To do this emptiness 
> check, we call Warehouse::getContentSummary(), which in turn recursively 
> checks all files and subdirectories. This is a costly operation when a 
> directory has a lot of files or subdirectories. This overhead is even more 
> prominent for cloud-based file systems like S3, and for an emptiness check it 
> is unnecessary.
> This recursive listing was introduced as part of HIVE-5220. Code snippet 
> for reference:
> {code:java}
> // Warehouse.java
> public boolean isEmpty(Path path) throws IOException, MetaException {
>   ContentSummary contents = getFs(path).getContentSummary(path);
>   if (contents != null && contents.getFileCount() == 0 && 
> contents.getDirectoryCount() == 1) {
> return true;
>   }
>   return false;
> }
> // HiveMetaStore.java
> private void deleteParentRecursive(Path parent, int depth, boolean mustPurge, 
> boolean needRecycle)
>   throws IOException, MetaException {
>   if (depth > 0 && parent != null && wh.isWritable(parent)) {
> if (wh.isDir(parent) && wh.isEmpty(parent)) {
>   wh.deleteDir(parent, true, mustPurge, needRecycle);
> }
> deleteParentRecursive(parent.getParent(), depth - 1, mustPurge, 
> needRecycle);
>   }
> }
> // Note: FileSystem::getContentSummary() performs a recursive listing.{code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (HIVE-22040) Drop partition throws exception with 'Failed to delete parent: File does not exist' when the partition's parent path does not exists

2019-08-04 Thread Jason Dere (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-22040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899706#comment-16899706
 ] 

Jason Dere commented on HIVE-22040:
---

FYI, the changes in HIVE-22054 will affect your patch, since they replace 
isEmpty() with isEmptyDir(), which has a different implementation (it replaces 
getContentSummary() with listStatus()). But it could still use your changes to 
catch the FileNotFoundException.
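A minimal sketch of how the two changes could combine (illustrative only, not 
either committed patch; the wh.* helpers are those quoted in HIVE-22054):
{code:java}
// HiveMetaStore.java (hypothetical sketch): a parent directory that is already
// gone is treated as "nothing to delete" instead of failing the whole
// DROP PARTITION with 'Failed to delete parent: File does not exist'.
private void deleteParentRecursive(Path parent, int depth, boolean mustPurge,
    boolean needRecycle) throws IOException, MetaException {
  if (depth > 0 && parent != null && wh.isWritable(parent)) {
    try {
      if (wh.isDir(parent) && wh.isEmpty(parent)) {
        wh.deleteDir(parent, true, mustPurge, needRecycle);
      }
    } catch (FileNotFoundException e) {
      // The path was removed directly on HDFS (e.g. 'hdfs dfs -rm -r');
      // there is nothing left to clean up at this level.
    }
    deleteParentRecursive(parent.getParent(), depth - 1, mustPurge, needRecycle);
  }
}
{code}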

> Drop partition throws exception with 'Failed to delete parent: File does not 
> exist' when the partition's parent path does not exists
> 
>
> Key: HIVE-22040
> URL: https://issues.apache.org/jira/browse/HIVE-22040
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.0.0
>Reporter: xiepengjie
>Assignee: xiepengjie
>Priority: Major
> Attachments: HIVE-22040.01.patch, HIVE-22040.02.patch, 
> HIVE-22040.patch
>
>
> I created a managed table with multiple partition columns. When I try to drop 
> a partition whose parent path no longer exists, an exception with 'Failed to 
> delete parent: File does not exist' is thrown. The partition's metadata in 
> MySQL has been deleted, but the exception is still raised, so dropping the 
> partition over JDBC against HiveServer2 from Java fails. This problem also 
> exists on the master branch; I think it is very unfriendly and we should fix it.
> Example:
> – First, create a managed table with multiple partition columns, and add a partition:
> {code:java}
> drop table if exists t1;
> create table t1 (c1 int) partitioned by (year string, month string, day 
> string);
> alter table t1 add partition(year='2019', month='07', day='01');{code}
> – Second, delete the path of partition 'month=07':
> {code:java}
> hadoop fs -rm -r 
> /user/hadoop/xiepengjietest.db/drop_partition/year=2019/month=07{code}
> --  Third, when I try to drop the partition, the metastore throws an exception 
> with 'Failed to delete parent: File does not exist'.
> {code:java}
> alter table t1 drop partition(year='2019', month='07', day='01');
> {code}
> The exception looks like this:
> {code:java}
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Failed to delete parent: File 
> does not exist: 
> /user/hadoop/xiepengjietest.db/drop_partition/year=2019/month=07
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getContentSummaryInt(FSDirStatAndListingOp.java:493)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getContentSummary(FSDirStatAndListingOp.java:140)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getContentSummary(FSNamesystem.java:3995)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getContentSummary(NameNodeRpcServer.java:1202)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getContentSummary(ClientNamenodeProtocolServerSideTranslatorPB.java:883)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2115)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2111)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2111) 
> (state=08S01,code=1)
>  {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (HIVE-22054) Avoid recursive listing to check if a directory is empty

2019-08-04 Thread Jason Dere (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-22054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899705#comment-16899705
 ] 

Jason Dere commented on HIVE-22054:
---

+1

> Avoid recursive listing to check if a directory is empty
> 
>
> Key: HIVE-22054
> URL: https://issues.apache.org/jira/browse/HIVE-22054
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.13.0, 1.2.0, 2.1.0, 3.1.1, 2.3.5
>Reporter: Prabhas Kumar Samanta
>Assignee: Prabhas Kumar Samanta
>Priority: Major
> Attachments: HIVE-22054.2.patch, HIVE-22054.patch
>
>
> During drop partition on a managed table, first we delete the directory 
> corresponding to the partition. After that we recursively delete the parent 
> directory as well if the parent directory becomes empty. To do this emptiness 
> check, we call Warehouse::getContentSummary(), which in turn recursively 
> checks all files and subdirectories. This is a costly operation when a 
> directory has a lot of files or subdirectories. This overhead is even more 
> prominent for cloud-based file systems like S3, and for an emptiness check it 
> is unnecessary.
> This recursive listing was introduced as part of HIVE-5220. Code snippet 
> for reference:
> {code:java}
> // Warehouse.java
> public boolean isEmpty(Path path) throws IOException, MetaException {
>   ContentSummary contents = getFs(path).getContentSummary(path);
>   if (contents != null && contents.getFileCount() == 0 && 
> contents.getDirectoryCount() == 1) {
> return true;
>   }
>   return false;
> }
> // HiveMetaStore.java
> private void deleteParentRecursive(Path parent, int depth, boolean mustPurge, 
> boolean needRecycle)
>   throws IOException, MetaException {
>   if (depth > 0 && parent != null && wh.isWritable(parent)) {
> if (wh.isDir(parent) && wh.isEmpty(parent)) {
>   wh.deleteDir(parent, true, mustPurge, needRecycle);
> }
> deleteParentRecursive(parent.getParent(), depth - 1, mustPurge, 
> needRecycle);
>   }
> }
> // Note: FileSystem::getContentSummary() performs a recursive listing.{code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (HIVE-22040) Drop partition throws exception with 'Failed to delete parent: File does not exist' when the partition's parent path does not exists

2019-08-04 Thread Jason Dere (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-22040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899704#comment-16899704
 ] 

Jason Dere commented on HIVE-22040:
---

Sorry for the late response.

Your patch does not apply on the master branch because this path in your patch
{noformat}
--- 
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
{noformat}

is now the following path on the master branch:
{noformat}
--- 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
{noformat}

Are you trying to apply this patch and compile it against the Hive master 
branch? I would suggest doing that for this patch.



> Drop partition throws exception with 'Failed to delete parent: File does not 
> exist' when the partition's parent path does not exists
> 
>
> Key: HIVE-22040
> URL: https://issues.apache.org/jira/browse/HIVE-22040
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.0.0
>Reporter: xiepengjie
>Assignee: xiepengjie
>Priority: Major
> Attachments: HIVE-22040.01.patch, HIVE-22040.02.patch, 
> HIVE-22040.patch
>
>
> I created a managed table with multiple partition columns. When I try to drop 
> a partition whose parent path no longer exists, an exception with 'Failed to 
> delete parent: File does not exist' is thrown. The partition's metadata in 
> MySQL has been deleted, but the exception is still raised, so dropping the 
> partition over JDBC against HiveServer2 from Java fails. This problem also 
> exists on the master branch; I think it is very unfriendly and we should fix it.
> Example:
> – First, create a managed table with multiple partition columns, and add a partition:
> {code:java}
> drop table if exists t1;
> create table t1 (c1 int) partitioned by (year string, month string, day 
> string);
> alter table t1 add partition(year='2019', month='07', day='01');{code}
> – Second, delete the path of partition 'month=07':
> {code:java}
> hadoop fs -rm -r 
> /user/hadoop/xiepengjietest.db/drop_partition/year=2019/month=07{code}
> --  Third, when I try to drop the partition, the metastore throws an exception 
> with 'Failed to delete parent: File does not exist'.
> {code:java}
> alter table t1 drop partition(year='2019', month='07', day='01');
> {code}
> The exception looks like this:
> {code:java}
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Failed to delete parent: File 
> does not exist: 
> /user/hadoop/xiepengjietest.db/drop_partition/year=2019/month=07
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getContentSummaryInt(FSDirStatAndListingOp.java:493)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getContentSummary(FSDirStatAndListingOp.java:140)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getContentSummary(FSNamesystem.java:3995)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getContentSummary(NameNodeRpcServer.java:1202)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getContentSummary(ClientNamenodeProtocolServerSideTranslatorPB.java:883)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2115)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2111)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2111) 
> (state=08S01,code=1)
>  {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (HIVE-22081) Hivemetastore Performance: Compaction Initiator Thread overwhelmed if there are too many Table/partitions are eligible for compaction

2019-08-04 Thread Rajkumar Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899688#comment-16899688
 ] 

Rajkumar Singh commented on HIVE-22081:
---

{quote}Is this for cases where the automatic compaction was turned off for a 
while, and then someone turns that on later?{quote}
Yes, that's right. In addition, starting with Hive 3 managed tables are ACID by 
default, so users who upgrade to Hive 3 will see a much larger number of managed 
ACID tables.
Currently org.apache.hadoop.hive.ql.txn.compactor.Initiator#checkForCompaction 
does a lot of blocking HDFS operations, which is time-consuming; per your 
suggestion I will review what objects/results can be cached to make it more 
efficient. I will upload a new patch with the checkstyle warnings and test 
failures addressed. Thanks
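A rough sketch of point 2 from the description (illustrative only, not the 
attached patch; determineCompactionType() and requestCompaction() stand in for 
the existing Initiator logic, and the pool size is arbitrary):
{code:java}
// Hypothetical sketch: run the HDFS-heavy compaction-type checks on a worker
// pool so the Initiator loop is not blocked on each FileSystem call, then
// request compaction only for the candidates that actually need it.
void checkCandidatesAsync(List<CompactionInfo> candidates) throws Exception {
  ExecutorService pool = Executors.newFixedThreadPool(4);
  Map<CompactionInfo, Future<CompactionType>> pending = new LinkedHashMap<>();
  for (CompactionInfo ci : candidates) {               // pre-filtered per point 1
    pending.put(ci, pool.submit(() -> determineCompactionType(ci)));  // FS calls here
  }
  for (Map.Entry<CompactionInfo, Future<CompactionType>> e : pending.entrySet()) {
    CompactionType type = e.getValue().get();          // wait for the async result
    if (type != null) {
      requestCompaction(e.getKey(), type);
    }
  }
  pool.shutdown();
}
{code}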

> Hivemetastore Performance: Compaction Initiator Thread overwhelmed if there 
> are too many Table/partitions are eligible for compaction 
> --
>
> Key: HIVE-22081
> URL: https://issues.apache.org/jira/browse/HIVE-22081
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.1.1
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
> Attachments: HIVE-22081.patch
>
>
> If automatic compaction is turned on, the Initiator thread checks for 
> potential tables/partitions that are eligible for compaction and runs some 
> checks in a for loop before requesting compaction for the eligible ones. 
> Though the Initiator thread is configured to run at a 5 minute interval by 
> default, with many objects it keeps running, as these checks are IO intensive 
> and hog CPU.
> In the proposed changes, I am planning to:
> 1. Pass fewer objects to the for loop by filtering out objects based on the 
> condition we are checking within the loop.
> 2. Make an async call using a Future to determine the compaction type (this 
> is where we do the FileSystem calls).



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (HIVE-22081) Hivemetastore Performance: Compaction Initiator Thread overwhelmed if there are too many Table/partitions are eligible for compaction

2019-08-04 Thread Peter Vary (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899672#comment-16899672
 ] 

Peter Vary commented on HIVE-22081:
---

[~Rajkumar Singh]: Is this for cases where the automatic compaction was turned 
off for a while, and then someone turns that on later? That is, we have a big 
number of tables because changes accumulated before the automatic compaction 
was turned on. In that case splitting the jobs across multiple threads is 
really useful. On the other hand, if we have so many changes within 5 minutes 
that it takes more than 5 minutes to check whether compaction is needed, then 
we might want to consider some other way to calculate / cache the check 
results. Splitting the tasks across multiple threads could help, but it would 
still be a CPU hog and IO intensive.

Also, please consider fixing the checkstyle warnings.

Thanks,

Peter

> Hivemetastore Performance: Compaction Initiator Thread overwhelmed if there 
> are too many Table/partitions are eligible for compaction 
> --
>
> Key: HIVE-22081
> URL: https://issues.apache.org/jira/browse/HIVE-22081
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.1.1
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
> Attachments: HIVE-22081.patch
>
>
> If automatic compaction is turned on, the Initiator thread checks for 
> potential tables/partitions that are eligible for compaction and runs some 
> checks in a for loop before requesting compaction for the eligible ones. 
> Though the Initiator thread is configured to run at a 5 minute interval by 
> default, with many objects it keeps running, as these checks are IO intensive 
> and hog CPU.
> In the proposed changes, I am planning to:
> 1. Pass fewer objects to the for loop by filtering out objects based on the 
> condition we are checking within the loop.
> 2. Make an async call using a Future to determine the compaction type (this 
> is where we do the FileSystem calls).



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (HIVE-21637) Synchronized metastore cache

2019-08-04 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899588#comment-16899588
 ] 

Hive QA commented on HIVE-21637:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
35s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
39s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  4m 
31s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
25s{color} | {color:blue} storage-api in master has 48 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  2m 
32s{color} | {color:blue} standalone-metastore/metastore-common in master has 
31 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  1m 
15s{color} | {color:blue} standalone-metastore/metastore-server in master has 
180 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
8s{color} | {color:blue} ql in master has 2250 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
29s{color} | {color:blue} beeline in master has 44 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
27s{color} | {color:blue} hcatalog/server-extensions in master has 3 extant 
Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
28s{color} | {color:blue} hcatalog/streaming in master has 11 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
27s{color} | {color:blue} streaming in master has 2 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
26s{color} | {color:blue} 
standalone-metastore/metastore-tools/metastore-benchmarks in master has 3 
extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
38s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
47s{color} | {color:blue} itests/util in master has 44 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
33s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
39s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
11s{color} | {color:red} storage-api: The patch generated 2 new + 15 unchanged 
- 0 fixed = 17 total (was 15) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
16s{color} | {color:red} standalone-metastore/metastore-common: The patch 
generated 9 new + 487 unchanged - 4 fixed = 496 total (was 491) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
39s{color} | {color:red} standalone-metastore/metastore-server: The patch 
generated 178 new + 1910 unchanged - 65 fixed = 2088 total (was 1975) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m  
5s{color} | {color:red} ql: The patch generated 64 new + 2295 unchanged - 32 
fixed = 2359 total (was 2327) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
11s{color} | {color:red} standalone-metastore/metastore-tools/tools-common: The 
patch generated 5 new + 31 unchanged - 0 fixed = 36 total (was 31) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
11s{color} | {color:red} itests/hca

[jira] [Commented] (HIVE-21637) Synchronized metastore cache

2019-08-04 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899586#comment-16899586
 ] 

Hive QA commented on HIVE-21637:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12976630/HIVE-21637.61.patch

{color:green}SUCCESS:{color} +1 due to 124 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 16717 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/18254/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/18254/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-18254/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12976630 - PreCommit-HIVE-Build

> Synchronized metastore cache
> 
>
> Key: HIVE-21637
> URL: https://issues.apache.org/jira/browse/HIVE-21637
> Project: Hive
>  Issue Type: New Feature
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-21637-1.patch, HIVE-21637.10.patch, 
> HIVE-21637.11.patch, HIVE-21637.12.patch, HIVE-21637.13.patch, 
> HIVE-21637.14.patch, HIVE-21637.15.patch, HIVE-21637.16.patch, 
> HIVE-21637.17.patch, HIVE-21637.18.patch, HIVE-21637.19.patch, 
> HIVE-21637.19.patch, HIVE-21637.2.patch, HIVE-21637.20.patch, 
> HIVE-21637.21.patch, HIVE-21637.22.patch, HIVE-21637.23.patch, 
> HIVE-21637.24.patch, HIVE-21637.25.patch, HIVE-21637.26.patch, 
> HIVE-21637.27.patch, HIVE-21637.28.patch, HIVE-21637.29.patch, 
> HIVE-21637.3.patch, HIVE-21637.30.patch, HIVE-21637.31.patch, 
> HIVE-21637.32.patch, HIVE-21637.33.patch, HIVE-21637.34.patch, 
> HIVE-21637.35.patch, HIVE-21637.36.patch, HIVE-21637.37.patch, 
> HIVE-21637.38.patch, HIVE-21637.39.patch, HIVE-21637.4.patch, 
> HIVE-21637.40.patch, HIVE-21637.41.patch, HIVE-21637.42.patch, 
> HIVE-21637.43.patch, HIVE-21637.44.patch, HIVE-21637.45.patch, 
> HIVE-21637.46.patch, HIVE-21637.47.patch, HIVE-21637.48.patch, 
> HIVE-21637.49.patch, HIVE-21637.5.patch, HIVE-21637.50.patch, 
> HIVE-21637.51.patch, HIVE-21637.52.patch, HIVE-21637.53.patch, 
> HIVE-21637.54.patch, HIVE-21637.55.patch, HIVE-21637.56.patch, 
> HIVE-21637.57.patch, HIVE-21637.58.patch, HIVE-21637.59.patch, 
> HIVE-21637.6.patch, HIVE-21637.60.patch, HIVE-21637.61.patch, 
> HIVE-21637.7.patch, HIVE-21637.8.patch, HIVE-21637.9.patch
>
>
> Currently, HMS has a cache implemented by CachedStore. The cache is updated 
> asynchronously, and in an HMS HA setting we can only get eventual consistency. 
> In this Jira, we try to make it synchronized.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (HIVE-21637) Synchronized metastore cache

2019-08-04 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-21637:
--
Attachment: HIVE-21637.61.patch

> Synchronized metastore cache
> 
>
> Key: HIVE-21637
> URL: https://issues.apache.org/jira/browse/HIVE-21637
> Project: Hive
>  Issue Type: New Feature
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-21637-1.patch, HIVE-21637.10.patch, 
> HIVE-21637.11.patch, HIVE-21637.12.patch, HIVE-21637.13.patch, 
> HIVE-21637.14.patch, HIVE-21637.15.patch, HIVE-21637.16.patch, 
> HIVE-21637.17.patch, HIVE-21637.18.patch, HIVE-21637.19.patch, 
> HIVE-21637.19.patch, HIVE-21637.2.patch, HIVE-21637.20.patch, 
> HIVE-21637.21.patch, HIVE-21637.22.patch, HIVE-21637.23.patch, 
> HIVE-21637.24.patch, HIVE-21637.25.patch, HIVE-21637.26.patch, 
> HIVE-21637.27.patch, HIVE-21637.28.patch, HIVE-21637.29.patch, 
> HIVE-21637.3.patch, HIVE-21637.30.patch, HIVE-21637.31.patch, 
> HIVE-21637.32.patch, HIVE-21637.33.patch, HIVE-21637.34.patch, 
> HIVE-21637.35.patch, HIVE-21637.36.patch, HIVE-21637.37.patch, 
> HIVE-21637.38.patch, HIVE-21637.39.patch, HIVE-21637.4.patch, 
> HIVE-21637.40.patch, HIVE-21637.41.patch, HIVE-21637.42.patch, 
> HIVE-21637.43.patch, HIVE-21637.44.patch, HIVE-21637.45.patch, 
> HIVE-21637.46.patch, HIVE-21637.47.patch, HIVE-21637.48.patch, 
> HIVE-21637.49.patch, HIVE-21637.5.patch, HIVE-21637.50.patch, 
> HIVE-21637.51.patch, HIVE-21637.52.patch, HIVE-21637.53.patch, 
> HIVE-21637.54.patch, HIVE-21637.55.patch, HIVE-21637.56.patch, 
> HIVE-21637.57.patch, HIVE-21637.58.patch, HIVE-21637.59.patch, 
> HIVE-21637.6.patch, HIVE-21637.60.patch, HIVE-21637.61.patch, 
> HIVE-21637.7.patch, HIVE-21637.8.patch, HIVE-21637.9.patch
>
>
> Currently, HMS has a cache implemented by CachedStore. The cache is updated 
> asynchronously, and in an HMS HA setting we can only get eventual consistency. 
> In this Jira, we try to make it synchronized.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (HIVE-21637) Synchronized metastore cache

2019-08-04 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899577#comment-16899577
 ] 

Hive QA commented on HIVE-21637:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
29s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
 4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
44s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  4m 
25s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
24s{color} | {color:blue} storage-api in master has 48 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  2m 
30s{color} | {color:blue} standalone-metastore/metastore-common in master has 
31 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  1m  
7s{color} | {color:blue} standalone-metastore/metastore-server in master has 
180 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
7s{color} | {color:blue} ql in master has 2250 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
31s{color} | {color:blue} beeline in master has 44 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
29s{color} | {color:blue} hcatalog/server-extensions in master has 3 extant 
Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
29s{color} | {color:blue} hcatalog/streaming in master has 11 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
26s{color} | {color:blue} streaming in master has 2 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
24s{color} | {color:blue} 
standalone-metastore/metastore-tools/metastore-benchmarks in master has 3 
extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
38s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
45s{color} | {color:blue} itests/util in master has 44 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
32s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
41s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
11s{color} | {color:red} storage-api: The patch generated 2 new + 15 unchanged 
- 0 fixed = 17 total (was 15) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
16s{color} | {color:red} standalone-metastore/metastore-common: The patch 
generated 9 new + 487 unchanged - 4 fixed = 496 total (was 491) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
39s{color} | {color:red} standalone-metastore/metastore-server: The patch 
generated 178 new + 1910 unchanged - 65 fixed = 2088 total (was 1975) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m  
3s{color} | {color:red} ql: The patch generated 64 new + 2295 unchanged - 32 
fixed = 2359 total (was 2327) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
11s{color} | {color:red} standalone-metastore/metastore-tools/tools-common: The 
patch generated 5 new + 31 unchanged - 0 fixed = 36 total (was 31) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
12s{color} | {color:red} itests/hca