[jira] [Created] (HIVE-22901) Variable substitution can lead to OOM on circular references
Daniel Voros created HIVE-22901:
---
Summary: Variable substitution can lead to OOM on circular references
Key: HIVE-22901
URL: https://issues.apache.org/jira/browse/HIVE-22901
Project: Hive
Issue Type: Bug
Components: HiveServer2
Affects Versions: 3.1.2
Reporter: Daniel Voros
Assignee: Daniel Voros

{{SystemVariables#substitute()}} deals with circular references between variables by performing the substitution at most 40 times by default. If the substituted part is sufficiently large, though, the substitution can produce a string bigger than the heap within those 40 rounds.

Take the following test case, which fails with OOM on current master (the third round of execution would need a 10G heap, while running with only 2G):

{code}
@Test
public void testSubstitute() {
  String randomPart = RandomStringUtils.random(100_000);
  String reference = "${hiveconf:myTestVariable}";
  StringBuilder longStringWithReferences = new StringBuilder();
  for (int i = 0; i < 10; i++) {
    longStringWithReferences.append(randomPart).append(reference);
  }
  SystemVariables uut = new SystemVariables();
  HiveConf conf = new HiveConf();
  conf.set("myTestVariable", longStringWithReferences.toString());
  uut.substitute(conf, longStringWithReferences.toString(), 40);
}
{code}

Produces:

{code}
java.lang.OutOfMemoryError: Java heap space
  at java.util.Arrays.copyOf(Arrays.java:3332)
  at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
  at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
  at java.lang.StringBuilder.append(StringBuilder.java:136)
  at org.apache.hadoop.hive.conf.SystemVariables.substitute(SystemVariables.java:110)
  at org.apache.hadoop.hive.conf.SystemVariablesTest.testSubstitute(SystemVariablesTest.java:27)
{code}

We should check the size of the substituted query and bail out earlier.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
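The "bail out earlier" idea can be sketched as follows. This is an illustrative stand-alone rewrite, not the actual {{SystemVariables}} code; the class name, the {{MAX_EXPANDED_LENGTH}} limit, and the variable-lookup map are all assumptions for the sketch.

```java
import java.util.Map;

// Sketch: cap the SIZE of the expanded string, not just the number of
// substitution rounds, so a circular reference with large values fails
// fast instead of growing until OOM.
public class SubstitutionLimit {
    // Assumed limit; a real patch would likely make this configurable.
    static final int MAX_EXPANDED_LENGTH = 1_000_000;

    static String substitute(String input, Map<String, String> vars, int maxRounds) {
        String current = input;
        for (int round = 0; round < maxRounds; round++) {
            String next = current;
            for (Map.Entry<String, String> e : vars.entrySet()) {
                next = next.replace("${hiveconf:" + e.getKey() + "}", e.getValue());
            }
            if (next.equals(current)) {
                return next; // fixed point reached, nothing left to substitute
            }
            if (next.length() > MAX_EXPANDED_LENGTH) {
                // bail out early instead of growing until the heap is exhausted
                throw new IllegalStateException("Substituted value exceeds "
                        + MAX_EXPANDED_LENGTH + " characters; possible circular reference");
            }
            current = next;
        }
        return current;
    }
}
```

With this check, the test case above would fail with a clear exception on the first oversized round rather than an {{OutOfMemoryError}} several rounds later.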
[jira] [Created] (HIVE-22501) Stats reported multiple times during MR execution for UNION queries
Daniel Voros created HIVE-22501:
---
Summary: Stats reported multiple times during MR execution for UNION queries
Key: HIVE-22501
URL: https://issues.apache.org/jira/browse/HIVE-22501
Project: Hive
Issue Type: Bug
Reporter: Daniel Voros
Assignee: Daniel Voros

Take the following example:

{code}
set hive.execution.engine=mr;
create table tb(id string) stored as orc;
insert into tb values('1');
create table tb2 like tb stored as orc;
insert into tb2 select * from tb union all select * from tb;
{code}

The last insert results in 2 records in the table, but the {{TOTAL_TABLE_ROWS_WRITTEN}} statistic (and the number of affected rows on the console) is 4. We seem to traverse the operator graph multiple times, starting from every TS operator, and increment the counters every time we hit the FS operator. UNION-ing the table 3 times results in 9 TOTAL_TABLE_ROWS_WRITTEN.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-21724) Nested ARRAY and STRUCT inside MAP don't work with LazySimpleDeserializeRead
Daniel Voros created HIVE-21724:
---
Summary: Nested ARRAY and STRUCT inside MAP don't work with LazySimpleDeserializeRead
Key: HIVE-21724
URL: https://issues.apache.org/jira/browse/HIVE-21724
Project: Hive
Issue Type: Bug
Components: Serializers/Deserializers
Affects Versions: 3.1.1
Reporter: Daniel Voros
Assignee: Daniel Voros

The logic during vectorized execution that keeps track of how deep we are in the nested structure doesn't work for ARRAYs and STRUCTs embedded inside maps.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21034) Add option to schematool to drop Hive databases
Daniel Voros created HIVE-21034:
---
Summary: Add option to schematool to drop Hive databases
Key: HIVE-21034
URL: https://issues.apache.org/jira/browse/HIVE-21034
Project: Hive
Issue Type: Improvement
Reporter: Daniel Voros
Assignee: Daniel Voros

An option to remove all Hive managed data could be a useful addition to {{schematool}}. I propose to introduce a new flag {{-dropAllDatabases}} that would *drop all databases with CASCADE* to remove all data of managed tables.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20586) Beeline is asking for user/pass when invoked without -u
Daniel Voros created HIVE-20586:
---
Summary: Beeline is asking for user/pass when invoked without -u
Key: HIVE-20586
URL: https://issues.apache.org/jira/browse/HIVE-20586
Project: Hive
Issue Type: Bug
Components: Beeline
Affects Versions: 3.1.0, 3.0.0
Reporter: Daniel Voros
Assignee: Daniel Voros

Since HIVE-18963 it's possible to define a default connection URL in beeline-site.xml, so Beeline can be used without specifying the HS2 JDBC URL.

When invoked with no arguments, Beeline asks for a username/password on the command line. When running with {{-u}} and the exact same URL as in beeline-site.xml, it does not ask for a username/password. I think these two should behave exactly the same, given that the URL after {{-u}} is the same as in beeline-site.xml:

{code:java}
beeline -u URL
beeline
{code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20231) Backport HIVE-19981 to branch-3
Daniel Voros created HIVE-20231:
---
Summary: Backport HIVE-19981 to branch-3
Key: HIVE-20231
URL: https://issues.apache.org/jira/browse/HIVE-20231
Project: Hive
Issue Type: Bug
Reporter: Daniel Voros
Assignee: Daniel Voros

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20191) PreCommit patch application doesn't fail if patch is empty
Daniel Voros created HIVE-20191:
---
Summary: PreCommit patch application doesn't fail if patch is empty
Key: HIVE-20191
URL: https://issues.apache.org/jira/browse/HIVE-20191
Project: Hive
Issue Type: Bug
Components: Testing Infrastructure
Reporter: Daniel Voros
Assignee: Daniel Voros

I've created some backport tickets for branch-3 (e.g. HIVE-20181) and made the mistake of uploading the patch files with the wrong filename ({{.}} instead of {{-}} between version and branch). These get applied on master, where the changes are already present, since {{git apply}} with {{-3}} won't fail if the patch has already been applied. Tests are then run on master instead of failing.

I think the patch application should fail if the patch is empty, and the branch selection logic should probably fail too if the patch name is malformed.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20185) Backport HIVE-20111 to branch-3
Daniel Voros created HIVE-20185:
---
Summary: Backport HIVE-20111 to branch-3
Key: HIVE-20185
URL: https://issues.apache.org/jira/browse/HIVE-20185
Project: Hive
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Daniel Voros
Assignee: Daniel Voros
Attachments: HIVE-20185.1.branch-3.patch

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20184) Backport HIVE-20085 to branch-3
Daniel Voros created HIVE-20184:
---
Summary: Backport HIVE-20085 to branch-3
Key: HIVE-20184
URL: https://issues.apache.org/jira/browse/HIVE-20184
Project: Hive
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Daniel Voros
Assignee: Daniel Voros

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20182) Backport HIVE-20067 to branch-3
Daniel Voros created HIVE-20182:
---
Summary: Backport HIVE-20067 to branch-3
Key: HIVE-20182
URL: https://issues.apache.org/jira/browse/HIVE-20182
Project: Hive
Issue Type: Bug
Components: Standalone Metastore
Affects Versions: 3.0.0
Reporter: Daniel Voros
Assignee: Daniel Voros

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20181) Backport HIVE-20045 to branch-3
Daniel Voros created HIVE-20181:
---
Summary: Backport HIVE-20045 to branch-3
Key: HIVE-20181
URL: https://issues.apache.org/jira/browse/HIVE-20181
Project: Hive
Issue Type: Bug
Components: Configuration
Affects Versions: 3.0.0
Reporter: Daniel Voros
Assignee: Daniel Voros

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20180) Backport HIVE-19759 to branch-3
Daniel Voros created HIVE-20180:
---
Summary: Backport HIVE-19759 to branch-3
Key: HIVE-20180
URL: https://issues.apache.org/jira/browse/HIVE-20180
Project: Hive
Issue Type: Bug
Components: Test
Affects Versions: 3.0.0
Reporter: Daniel Voros
Assignee: Daniel Voros

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20066) hive.load.data.owner is compared to full principal
Daniel Voros created HIVE-20066:
---
Summary: hive.load.data.owner is compared to full principal
Key: HIVE-20066
URL: https://issues.apache.org/jira/browse/HIVE-20066
Project: Hive
Issue Type: Bug
Reporter: Daniel Voros
Assignee: Daniel Voros

HIVE-19928 compares the user running HS2 to the configured owner ({{hive.load.data.owner}}) to check whether we're able to move the file with LOAD DATA or need to copy it.

This check compares the full username (which may contain the full Kerberos principal) to hive.load.data.owner. We should compare to the short username ({{UGI.getShortUserName()}}) instead. That's used in a similar context [here|https://github.com/apache/hive/blob/f519db7eafacb4b4d2d9fe2a9e10e908d8077224/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L398].

cc [~djaiswal]

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
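To illustrate the difference, here is a minimal stand-alone sketch of the proposed comparison. In Hive the short name would come from Hadoop's {{UserGroupInformation.getShortUserName()}}; the {{shortUserName}} helper below is a simplified stand-in that just strips the host and realm components of a {{user/host@REALM}} principal, and the class and method names are hypothetical.

```java
// Sketch: compare hive.load.data.owner against the SHORT user name,
// not the full Kerberos principal.
public class OwnerCheck {
    // Simplified short-name rule: drop everything from the first '/'
    // (host part) or '@' (realm part) onward.
    static String shortUserName(String principal) {
        int end = principal.length();
        int slash = principal.indexOf('/');
        int at = principal.indexOf('@');
        if (slash >= 0) end = Math.min(end, slash);
        if (at >= 0) end = Math.min(end, at);
        return principal.substring(0, end);
    }

    static boolean isOwner(String fullUserName, String configuredOwner) {
        // "hive/host1.example.com@EXAMPLE.COM" should match an owner
        // setting of "hive"; comparing the full principal never would.
        return shortUserName(fullUserName).equals(configuredOwner);
    }
}
```

This is the behavior the ticket asks for: with the full-principal comparison, a kerberized HS2 user would never match a plain {{hive.load.data.owner=hive}} setting, so LOAD DATA would always fall back to copying.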
[jira] [Created] (HIVE-20022) Upgrade hadoop.version to 3.1.1
Daniel Voros created HIVE-20022:
---
Summary: Upgrade hadoop.version to 3.1.1
Key: HIVE-20022
URL: https://issues.apache.org/jira/browse/HIVE-20022
Project: Hive
Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Daniel Voros
Assignee: Daniel Voros

HIVE-19304 relies on YARN-7142 and YARN-8122, which will only be released in Hadoop 3.1.1. We should upgrade when possible.

cc [~gsaha]

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19979) Backport HIVE-19304 to branch-3
Daniel Voros created HIVE-19979:
---
Summary: Backport HIVE-19304 to branch-3
Key: HIVE-19979
URL: https://issues.apache.org/jira/browse/HIVE-19979
Project: Hive
Issue Type: Task
Reporter: Daniel Voros
Assignee: Daniel Voros

Needs HIVE-19978 (backport of HIVE-18037) to land first.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19978) Backport HIVE-18037 to branch-3
Daniel Voros created HIVE-19978:
---
Summary: Backport HIVE-18037 to branch-3
Key: HIVE-19978
URL: https://issues.apache.org/jira/browse/HIVE-19978
Project: Hive
Issue Type: Task
Reporter: Daniel Voros
Assignee: Daniel Voros

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19728) beeline with USE_BEELINE_FOR_HIVE_CLI fails when trying to set hive.aux.jars.path
Daniel Voros created HIVE-19728:
---
Summary: beeline with USE_BEELINE_FOR_HIVE_CLI fails when trying to set hive.aux.jars.path
Key: HIVE-19728
URL: https://issues.apache.org/jira/browse/HIVE-19728
Project: Hive
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Daniel Voros
Assignee: Daniel Voros

Since HIVE-19385 it's possible to redirect bin/hive to Beeline. This is not working as expected though, because in {{bin/hive}} we're setting {{hive.aux.jars.path}}. This leads to the following error:

{code}
$ USE_BEELINE_FOR_HIVE_CLI=true hive
...
Error: Could not open client transport for any of the Server URI's in ZooKeeper: Failed to open new session: java.lang.IllegalArgumentException: Cannot modify hive.aux.jars.path at runtime. It is not in list of params that are allowed to be modified at runtime (state=08S01,code=0)
Beeline version 3.0.0 by Apache Hive
beeline>
{code}

We already avoid setting {{hive.aux.jars.path}} when running the {{beeline}} service, but the USE_BEELINE_FOR_HIVE_CLI override happens after that. I'd suggest checking the value of USE_BEELINE_FOR_HIVE_CLI right after we've selected the service to run (cli/beeline/...) and overriding cli->beeline there.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18858) System properties in job configuration not resolved when submitting MR job
Daniel Voros created HIVE-18858:
---
Summary: System properties in job configuration not resolved when submitting MR job
Key: HIVE-18858
URL: https://issues.apache.org/jira/browse/HIVE-18858
Project: Hive
Issue Type: Bug
Affects Versions: 3.0.0
Environment: Hadoop 3.0.0
Reporter: Daniel Voros
Assignee: Daniel Voros

Since [this hadoop commit|https://github.com/apache/hadoop/commit/5eb7dbe9b31a45f57f2e1623aa1c9ce84a56c4d1], first released in 3.0.0, Configuration has a restricted mode that disables the resolution of system properties (which normally happens when retrieving a configuration option).

This leads to test failures when switching to Hadoop 3.0.0 (instead of 3.0.0-beta1), since we're relying on the [substitution of test.tmp.dir|https://github.com/apache/hive/blob/05d4719eefc56676a3e0e8f706e1c5e5e1f6b345/data/conf/hive-site.xml#L37] during the [maven build|https://github.com/apache/hive/blob/05d4719eefc56676a3e0e8f706e1c5e5e1f6b345/pom.xml#L83]. See test results on HIVE-18327.

When we're passing job configurations to Hadoop, I believe there's no way to disable the restricted mode, since we go through some Hadoop MR calls first, see here:

{code}
"HiveServer2-Background-Pool: Thread-105@9500" prio=5 tid=0x69 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
  at org.apache.hadoop.conf.Configuration.addResourceObject(Configuration.java:970)
  - locked <0x2fe6> (a org.apache.hadoop.mapred.JobConf)
  at org.apache.hadoop.conf.Configuration.addResource(Configuration.java:895)
  at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:476)
  at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:162)
  at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:788)
  at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:254)
  at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
  at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
  at java.security.AccessController.doPrivileged(AccessController.java:-1)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
  at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
  at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576)
  at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:571)
  at java.security.AccessController.doPrivileged(AccessController.java:-1)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
  at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:571)
  at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:562)
  at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:415)
  at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:149)
  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
  at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
  at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2314)
  at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1985)
  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1687)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1438)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1432)
  at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:248)
  at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:90)
  at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
  at java.security.AccessController.doPrivileged(AccessController.java:-1)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
  at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:353)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
{code}

I suggest resolving all variables before passing the configuration to Hadoop in ExecDriver.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
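The suggestion above, resolving every variable before the configuration reaches Hadoop, might look roughly like this sketch. It uses plain {{java.util.Properties}} as a stand-in for Hadoop's Configuration, the {{${...}}} resolution rule is deliberately simplified (single pass against system properties), and the class and method names are hypothetical.

```java
import java.util.Properties;

// Sketch: force substitution of every value up front, so that when the
// configuration is later parsed in Hadoop's restricted mode there are no
// unresolved ${...} references left.
public class ResolveBeforeSubmit {
    static Properties resolveAll(Properties conf) {
        Properties resolved = new Properties();
        for (String name : conf.stringPropertyNames()) {
            String value = conf.getProperty(name);
            // Naive resolution of ${prop} tokens against system properties;
            // unknown properties resolve to the empty string.
            int start;
            while ((start = value.indexOf("${")) >= 0) {
                int end = value.indexOf('}', start);
                if (end < 0) {
                    break; // unterminated reference, leave as-is
                }
                String key = value.substring(start + 2, end);
                String sub = System.getProperty(key, "");
                value = value.substring(0, start) + sub + value.substring(end + 1);
            }
            resolved.setProperty(name, value);
        }
        return resolved;
    }
}
```

In Hive itself the equivalent step would iterate the JobConf and write back each {{get()}}-resolved value before {{JobClient.submitJob}} is called.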
[jira] [Created] (HIVE-18784) TestJdbcWithMiniKdcSQLAuthBinary runs with HTTP transport mode instead of binary
Daniel Voros created HIVE-18784:
---
Summary: TestJdbcWithMiniKdcSQLAuthBinary runs with HTTP transport mode instead of binary
Key: HIVE-18784
URL: https://issues.apache.org/jira/browse/HIVE-18784
Project: Hive
Issue Type: Test
Affects Versions: 3.0.0
Reporter: Daniel Voros
Assignee: Daniel Voros

TestJdbcWithMiniKdcSQLAuthHttp should run HTTP and TestJdbcWithMiniKdcSQLAuthBinary should run binary, but currently they're both using HTTP.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18646) Update errata.txt for HIVE-18617
Daniel Voros created HIVE-18646:
---
Summary: Update errata.txt for HIVE-18617
Key: HIVE-18646
URL: https://issues.apache.org/jira/browse/HIVE-18646
Project: Hive
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Daniel Voros
Assignee: Daniel Voros

HIVE-18617 was committed as HIVE-18671.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18091) Failing tests of itests/qtest-spark and itests/hive-unit on branch-1
Daniel Voros created HIVE-18091:
---
Summary: Failing tests of itests/qtest-spark and itests/hive-unit on branch-1
Key: HIVE-18091
URL: https://issues.apache.org/jira/browse/HIVE-18091
Project: Hive
Issue Type: Bug
Components: Test, Testing Infrastructure
Reporter: Daniel Voros
Assignee: Daniel Voros

Seen this while looking at ptest results for HIVE-17947, but it is probably an older issue. Tests under itests/qtest-spark and itests/hive-unit fail when trying to execute the download-spark plugin with:

{code}
[INFO] Building Hive Integration - Unit Tests 1.3.0-SNAPSHOT
[INFO]
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ hive-it-unit ---
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (download-spark) @ hive-it-unit ---
[INFO] Executing tasks
main:
  [exec] + /bin/pwd
  [exec] + BASE_DIR=./target
  [exec] + HIVE_ROOT=./target/../../../
  [exec] + DOWNLOAD_DIR=./../thirdparty
  [exec] + mkdir -p ./../thirdparty
  [exec] /home/hiveptest/35.192.99.254-hiveptest-0/apache-github-branch-1-source/itests/hive-unit
  [exec] + download http://d3jw87u4immizc.cloudfront.net/spark-tarball/spark-1.5.0-bin-hadoop2-without-hive.tgz spark
  [exec] + url=http://d3jw87u4immizc.cloudfront.net/spark-tarball/spark-1.5.0-bin-hadoop2-without-hive.tgz
  [exec] + finalName=spark
  [exec] ++ basename http://d3jw87u4immizc.cloudfront.net/spark-tarball/spark-1.5.0-bin-hadoop2-without-hive.tgz
  [exec] + tarName=spark-1.5.0-bin-hadoop2-without-hive.tgz
  [exec] + rm -rf ./target/spark
  [exec] + [[ ! -f ./../thirdparty/spark-1.5.0-bin-hadoop2-without-hive.tgz ]]
  [exec] + tar -zxf ./../thirdparty/spark-1.5.0-bin-hadoop2-without-hive.tgz -C ./target
  [exec] + mv ./target/spark-1.5.0-bin-hadoop2-without-hive ./target/spark
  [exec] + cp -f ./target/../../..//data/conf/spark/log4j.properties ./target/spark/conf/
  [exec] + sed '/package /d' /data/hiveptest/working/apache-github-branch-1-source/itests/../contrib/src/java/org/apache/hadoop/hive/contrib/udf/example/UDFExampleAdd.java
  [exec] sed: can't read /data/hiveptest/working/apache-github-branch-1-source/itests/../contrib/src/java/org/apache/hadoop/hive/contrib/udf/example/UDFExampleAdd.java: No such file or directory
  [exec] + javac -cp /data/hiveptest/working/maven/org/apache/hive/hive-exec/1.3.0-SNAPSHOT/hive-exec-1.3.0-SNAPSHOT.jar /tmp/UDFExampleAdd.java -d /tmp
  [exec] + jar -cf /tmp/udfexampleadd-1.0.jar -C /tmp UDFExampleAdd.class
  [exec] /tmp/UDFExampleAdd.class : no such file or directory
[INFO] BUILD FAILURE
[INFO] Total time: 8.376s
[INFO] Finished at: Mon Nov 06 22:29:39 UTC 2017
[INFO] Final Memory: 18M/241M
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (download-spark) on project hive-it-unit: An Ant BuildException has occured: exec returned: 1
[ERROR] around Ant part .. @ 4:141 in /home/hiveptest/35.192.99.254-hiveptest-0/apache-github-branch-1-source/itests/hive-unit/target/antrun/build-main.xml
{code}

{{mvn antrun:run@download-spark}} passes when run locally, so I guess it might be an issue with the way we're executing ptests.

Full list of classes with the same error:

{code}
TestAcidOnTez TestAdminUser TestAuthorizationPreEventListener TestAuthzApiEmbedAuthorizerInEmbed TestAuthzApiEmbedAuthorizerInRemote TestBeeLineWithArgs TestCLIAuthzSessionContext TestClearDanglingScratchDir TestClientSideAuthorizationProvider TestCompactor TestCreateUdfEntities TestCustomAuthentication TestDBTokenStore TestDDLWithRemoteMetastoreSecondNamenode TestDynamicSerDe TestEmbeddedHiveMetaStore TestEmbeddedThriftBinaryCLIService TestFilterHooks TestFolderPermissions TestHS2AuthzContext TestHS2AuthzSessionContext TestHS2ClearDanglingScratchDir TestHS2ImpersonationWithRemoteMS TestHiveAuthorizerCheckInvocation TestHiveAuthorizerShowFilters TestHiveHistory TestHiveMetaStoreTxns TestHiveMetaStoreWithEnvironmentContext TestHiveMetaTool TestHiveServer2 TestHiveServer2SessionTimeout TestHiveSessionImpl TestHs2Hooks TestJdbcDriver2 TestJdbcMetadataApiAuth TestJdbcWithLocalClusterSpark TestJdbcWithMiniHS2 TestJdbcWithMiniMr TestJdbcWithSQLAuthUDFBlacklist TestJdbcWithSQLAuthorization TestLocationQueries TestMTQueries TestMarkPartition TestMarkPartitionRemote TestMetaStoreAuthorization TestMetaStoreConnectionUrlHook TestMetaStoreEndFunctionListener TestMetaStoreEventListener TestMetaStoreEventListenerOnlyOnCom
[jira] [Created] (HIVE-17947) Concurrent inserts might fail for ACID table since HIVE-17526 on branch-1
Daniel Voros created HIVE-17947:
---
Summary: Concurrent inserts might fail for ACID table since HIVE-17526 on branch-1
Key: HIVE-17947
URL: https://issues.apache.org/jira/browse/HIVE-17947
Project: Hive
Issue Type: Bug
Components: Transactions
Affects Versions: 1.3.0
Reporter: Daniel Voros
Assignee: Daniel Voros
Priority: Blocker

HIVE-17526 (only on branch-1) disabled conversion to ACID if there are *_copy_N files under the table, but the filesystem checks introduced there run for every insert, since the MoveTask at the end of the insert will eventually call alterTable.

The filename checking also recurses into staging directories created by other inserts. If those are removed while listing the files, it leads to the following exception and a failing insert:

{code}
java.io.FileNotFoundException: File hdfs://mycluster/apps/hive/warehouse/dvoros.db/concurrent_insert/.hive-staging_hive_2017-10-30_13-23-35_056_2844419018556002410-2/-ext-10001 does not exist.
  at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1081) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
  at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1059) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
  at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1004) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
  at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1000) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-2.7.3.2.6.3.0-235.jar:?]
  at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1018) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
  at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1735) ~[hadoop-common-2.7.3.2.6.3.0-235.jar:?]
  at org.apache.hadoop.fs.FileSystem$6.handleFileStat(FileSystem.java:1864) ~[hadoop-common-2.7.3.2.6.3.0-235.jar:?]
  at org.apache.hadoop.fs.FileSystem$6.hasNext(FileSystem.java:1841) ~[hadoop-common-2.7.3.2.6.3.0-235.jar:?]
  at org.apache.hadoop.hive.metastore.TransactionalValidationListener.containsCopyNFiles(TransactionalValidationListener.java:226) [hive-exec-2.1.0.2.6.3.0-235.jar:2.1.0.2.6.3.0-235]
  at org.apache.hadoop.hive.metastore.TransactionalValidationListener.handleAlterTableTransactionalProp(TransactionalValidationListener.java:104) [hive-exec-2.1.0.2.6.3.0-235.jar:2.1.0.2.6.3.0-235]
  at org.apache.hadoop.hive.metastore.TransactionalValidationListener.handle(TransactionalValidationListener.java:63) [hive-exec-2.1.0.2.6.3.0-235.jar:2.1.0.2.6.3.0-235]
  at org.apache.hadoop.hive.metastore.TransactionalValidationListener.onEvent(TransactionalValidationListener.java:55) [hive-exec-2.1.0.2.6.3.0-235.jar:2.1.0.2.6.3.0-235]
  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.firePreEvent(HiveMetaStore.java:2478) [hive-exec-2.1.0.2.6.3.0-235.jar:2.1.0.2.6.3.0-235]
  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_core(HiveMetaStore.java:4145) [hive-exec-2.1.0.2.6.3.0-235.jar:2.1.0.2.6.3.0-235]
  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_with_environment_context(HiveMetaStore.java:4117) [hive-exec-2.1.0.2.6.3.0-235.jar:2.1.0.2.6.3.0-235]
  at sun.reflect.GeneratedMethodAccessor107.invoke(Unknown Source) ~[?:?]
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_144]
  at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_144]
  at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148) [hive-exec-2.1.0.2.6.3.0-235.jar:2.1.0.2.6.3.0-235]
  at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) [hive-exec-2.1.0.2.6.3.0-235.jar:2.1.0.2.6.3.0-235]
  at com.sun.proxy.$Proxy32.alter_table_with_environment_context(Unknown Source) [?:?]
  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table_with_environmentContext(HiveMetaStoreClient.java:299) [hive-exec-2.1.0.2.6.3.0-235.jar:2.1.0.2.6.3.0-235]
  at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.alter_table_with_environmentContext(SessionHiveMetaStoreClient.java:325) [hive-exec-2.1.0.2.6.3.0-235.jar:2.1.0.2.6.3.0-235]
  at sun.reflect.GeneratedMethodAccessor87.invoke(Unknown Source) ~[?:?]
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_144]
  at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_144]
  at org.apache.hadoop.hi
[jira] [Created] (HIVE-17526) Disable conversion to ACID if table has _copy_N files on branch-1
Daniel Voros created HIVE-17526:
---
Summary: Disable conversion to ACID if table has _copy_N files on branch-1
Key: HIVE-17526
URL: https://issues.apache.org/jira/browse/HIVE-17526
Project: Hive
Issue Type: Bug
Reporter: Daniel Voros
Assignee: Daniel Voros
Fix For: 1.3.0

As discussed in HIVE-16177, non-ACID to ACID conversion can lead to data loss if the table has *_copy_N files.

The patch for HIVE-16177 is quite massive and would basically need a reimplementation to apply to branch-1, since the related code paths have diverged a lot. We could disable the conversion to ACID if there are *_copy_N files instead.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-15833) Add unit tests for org.json usage on branch-1
Daniel Voros created HIVE-15833:
---
Summary: Add unit tests for org.json usage on branch-1
Key: HIVE-15833
URL: https://issues.apache.org/jira/browse/HIVE-15833
Project: Hive
Issue Type: Sub-task
Reporter: Daniel Voros
Assignee: Daniel Voros

Before switching implementation, we should add some tests that capture the current behavior.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HIVE-15834) Add unit tests for org.json usage on master
Daniel Voros created HIVE-15834:
---
Summary: Add unit tests for org.json usage on master
Key: HIVE-15834
URL: https://issues.apache.org/jira/browse/HIVE-15834
Project: Hive
Issue Type: Sub-task
Reporter: Daniel Voros
Assignee: Daniel Voros

Before switching implementation, we should add some tests that capture the current behavior.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)