[jira] [Created] (HIVE-27225) Speedup build by skipping SBOM generation by default
Stamatis Zampetakis created HIVE-27225: -- Summary: Speedup build by skipping SBOM generation by default Key: HIVE-27225 URL: https://issues.apache.org/jira/browse/HIVE-27225 Project: Hive Issue Type: Improvement Components: Build Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis A full build of Hive locally in my environment takes ~15 minutes. {noformat} mvn clean install -DskipTests -Pitests [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 14:15 min {noformat} Profiling the build shows that we are spending roughly 30% of CPU in the org.cyclonedx.maven plugin, which is used to generate SBOM artifacts (HIVE-26912). The SBOM generation does not need to run in every single build and probably only needs to be active during the release build. To speed up everyday builds, I propose activating the cyclonedx plugin only in the dist (release) profile. After this change, the default build drops from 14 minutes to 8. {noformat} mvn clean install -DskipTests -Pitests [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 08:19 min {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27199) Read TIMESTAMP WITH LOCAL TIME ZONE columns from text files using custom formats
Stamatis Zampetakis created HIVE-27199: -- Summary: Read TIMESTAMP WITH LOCAL TIME ZONE columns from text files using custom formats Key: HIVE-27199 URL: https://issues.apache.org/jira/browse/HIVE-27199 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Timestamp values come in many flavors and formats and there is no single representation that can satisfy everyone, especially when such values are stored in plain text/csv files. HIVE-9298 added a special SERDE property, {{timestamp.formats}}, that allows users to provide custom timestamp patterns so that TIMESTAMP values coming from files are parsed correctly. However, when the column type is TIMESTAMP WITH LOCAL TIME ZONE (LTZ) it is not possible to use a custom pattern, thus when the built-in Hive parser does not match the expected format a NULL value is returned. Consider a text file, F1, with the following values: {noformat} 2016-05-03 12:26:34 2016-05-03T12:26:34 {noformat} and a table with a column declared as LTZ. {code:sql} CREATE TABLE ts_table (ts TIMESTAMP WITH LOCAL TIME ZONE); LOAD DATA LOCAL INPATH './F1' INTO TABLE ts_table; SELECT * FROM ts_table; 2016-05-03 12:26:34.0 US/Pacific NULL {code} In order to give more flexibility to users relying on the TIMESTAMP WITH LOCAL TIME ZONE datatype, and also to align the behavior with the TIMESTAMP type, this JIRA aims to reuse the {{timestamp.formats}} property for both TIMESTAMP types. The work here focuses exclusively on simple text files but the same could be done for other SerDes such as JSON. -- This message was sent by Atlassian Jira (v8.20.10#820010)
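The fallback behavior that {{timestamp.formats}} enables can be sketched as follows. This is a purely illustrative Python model of pattern-list parsing, not Hive's actual SerDe code, and the two pattern strings are assumptions chosen to match the rows of F1.

```python
from datetime import datetime

# Illustrative model (not Hive's implementation): try each user-supplied
# pattern in order and return None (Hive's NULL) when no pattern matches.
def parse_with_formats(value, formats):
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None

# Two custom patterns covering both rows of the example file F1.
PATTERNS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S"]
for row in ["2016-05-03 12:26:34", "2016-05-03T12:26:34"]:
    print(parse_with_formats(row, PATTERNS))
```

With a single built-in pattern only the first row would parse and the second would come back as NULL, which is the situation described above for the LTZ column.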
[jira] [Created] (HIVE-27162) Unify HiveUnixTimestampSqlOperator and HiveToUnixTimestampSqlOperator
Stamatis Zampetakis created HIVE-27162: -- Summary: Unify HiveUnixTimestampSqlOperator and HiveToUnixTimestampSqlOperator Key: HIVE-27162 URL: https://issues.apache.org/jira/browse/HIVE-27162 Project: Hive Issue Type: Task Components: CBO Reporter: Stamatis Zampetakis The two classes below both represent the {{unix_timestamp}} operator and have identical implementations. * https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveUnixTimestampSqlOperator.java * https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveToUnixTimestampSqlOperator.java There is probably a way to keep only one of them; having two ways of representing the same thing can cause various problems in query planning and also leads to code duplication. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27161) MetaException when executing CTAS query in Druid storage handler
Stamatis Zampetakis created HIVE-27161: -- Summary: MetaException when executing CTAS query in Druid storage handler Key: HIVE-27161 URL: https://issues.apache.org/jira/browse/HIVE-27161 Project: Hive Issue Type: Bug Components: Druid integration Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis Any kind of CTAS query targeting the Druid storage handler fails with the following exception: {noformat} org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:LOCATION may not be specified for Druid) at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1347) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1352) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:158) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:116) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:228) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) ~[hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) ~[hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) ~[hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425) ~[hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356) ~[hive-cli-4.0.0-SNAPSHOT.jar:?] 
at org.apache.hadoop.hive.ql.dataset.QTestDatasetHandler.initDataset(QTestDatasetHandler.java:86) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.dataset.QTestDatasetHandler.beforeTest(QTestDatasetHandler.java:190) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.qoption.QTestOptionDispatcher.beforeTest(QTestOptionDispatcher.java:79) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.QTestUtil.cliInit(QTestUtil.java:607) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:112) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:60) ~[test-classes/:?] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_261] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_261] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_261] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) ~[junit-4.13.2.jar:4.13.2] at org.junit.internal.run
[jira] [Created] (HIVE-27157) AssertionError when inferring return type for unix_timestamp function
Stamatis Zampetakis created HIVE-27157: -- Summary: AssertionError when inferring return type for unix_timestamp function Key: HIVE-27157 URL: https://issues.apache.org/jira/browse/HIVE-27157 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Any attempt to derive the return data type for the {{unix_timestamp}} function results in the following assertion error. {noformat} java.lang.AssertionError: typeName.allowsPrecScale(true, false): BIGINT at org.apache.calcite.sql.type.BasicSqlType.checkPrecScale(BasicSqlType.java:65) at org.apache.calcite.sql.type.BasicSqlType.<init>(BasicSqlType.java:81) at org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:67) at org.apache.calcite.sql.fun.SqlAbstractTimeFunction.inferReturnType(SqlAbstractTimeFunction.java:78) at org.apache.calcite.rex.RexBuilder.deriveReturnType(RexBuilder.java:278) {noformat} due to a faulty implementation of type inference for the respective operators: * [https://github.com/apache/hive/blob/52360151dc43904217e812efde1069d6225e9570/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveUnixTimestampSqlOperator.java] * [https://github.com/apache/hive/blob/52360151dc43904217e812efde1069d6225e9570/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveToUnixTimestampSqlOperator.java] Although at this stage in master it is not possible to reproduce the problem with an actual SQL query, the buggy implementation must be fixed, since slight changes in the code/CBO rules may lead to code paths relying on {{SqlOperator.inferReturnType}}. Note that in older versions of Hive it is possible to hit the AssertionError in various ways. 
For example in Hive 3.1.3 (and older), the error may come from [HiveRelDecorrelator|https://github.com/apache/hive/blob/4df4d75bf1e16fe0af75aad0b4179c34c07fc975/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java#L1933] in the presence of sub-queries. -- This message was sent by Atlassian Jira (v8.20.10#820010)
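The failing check can be paraphrased with a toy model. This Python sketch is a deliberate simplification (the type names and the dictionary are assumptions for illustration, not Calcite code) of why requesting BIGINT with an explicit precision trips the assertion.

```python
# Toy model of Calcite's precision/scale check (a simplification, not the
# actual BasicSqlType code): BIGINT accepts neither precision nor scale,
# so asking the type factory for BIGINT with a precision fails the check.
ALLOWS_PRECISION = {"DECIMAL": True, "VARCHAR": True, "TIMESTAMP": True, "BIGINT": False}

def create_sql_type(type_name, precision=None):
    if precision is not None and not ALLOWS_PRECISION.get(type_name, False):
        raise AssertionError(f"typeName.allowsPrecScale(true, false): {type_name}")
    return (type_name, precision)
```

The faulty operators effectively performed the equivalent of create_sql_type("BIGINT", precision) during return-type inference, whereas a correct implementation requests the type without a precision.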
[jira] [Created] (HIVE-27156) Wrong results when CAST timestamp literal with timezone to TIMESTAMP
Stamatis Zampetakis created HIVE-27156: -- Summary: Wrong results when CAST timestamp literal with timezone to TIMESTAMP Key: HIVE-27156 URL: https://issues.apache.org/jira/browse/HIVE-27156 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Casting a timestamp literal with an invalid timezone to the TIMESTAMP datatype results in a timestamp with the time part truncated to midnight (00:00:00). *Case I* {code:sql} select cast('2020-06-28 22:17:33.123456 Europe/Amsterd' as timestamp); {code} +Actual+ |2020-06-28 00:00:00| +Expected+ |NULL/ERROR/2020-06-28 22:17:33.123456| *Case II* {code:sql} select cast('2020-06-28 22:17:33.123456 Invalid/Zone' as timestamp); {code} +Actual+ |2020-06-28 00:00:00| +Expected+ |NULL/ERROR/2020-06-28 22:17:33.123456| The existing documentation does not specify what the output should be in the cases above: * https://cwiki.apache.org/confluence/display/hive/languagemanual+types#LanguageManualTypes-TimestampstimestampTimestamps * https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types *Case III* Another subtle but important case is the following, where the timestamp literal has a valid timezone but we are attempting a cast to a datatype that does not store the timezone. {code:sql} select cast('2020-06-28 22:17:33.123456 Europe/Amsterdam' as timestamp); {code} +Actual+ |2020-06-28 22:17:33.123456| The correctness of the last result is debatable, since one might expect a NULL or an ERROR. -- This message was sent by Atlassian Jira (v8.20.10#820010)
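The NULL/ERROR expectation for Cases I and II can be sketched with Python's standard zoneinfo module. This illustrates the arguably desirable semantics (reject an unknown zone instead of truncating to midnight); it is not Hive's parser, and the split-on-last-space grammar is an assumption made for the example.

```python
from datetime import datetime
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

def cast_to_timestamp(literal):
    # Assumed grammar for illustration: "<date> <time> <zone-id>",
    # split on the last space. Not Hive's actual literal syntax.
    ts_part, _, zone_part = literal.rpartition(" ")
    try:
        ZoneInfo(zone_part)  # validate the zone id
    except (ZoneInfoNotFoundError, ValueError):
        return None  # NULL rather than a silently truncated midnight value
    return datetime.strptime(ts_part, "%Y-%m-%d %H:%M:%S.%f")
```

Under these semantics both 'Europe/Amsterd' and 'Invalid/Zone' yield NULL instead of 2020-06-28 00:00:00.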
[jira] [Created] (HIVE-27131) Remove empty module shims/scheduler
Stamatis Zampetakis created HIVE-27131: -- Summary: Remove empty module shims/scheduler Key: HIVE-27131 URL: https://issues.apache.org/jira/browse/HIVE-27131 Project: Hive Issue Type: Task Components: Shims Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The module contains nothing more than a plain pom.xml file, which does not seem to do anything special apart from bundling together some optional dependencies. There is no source code, no tests, and no reason for the module to exist. At some point it used to contain a few classes but these were removed progressively (e.g., HIVE-22398), leaving behind an empty module. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27102) Upgrade Calcite to 1.33.0 and Avatica to 1.23.0
Stamatis Zampetakis created HIVE-27102: -- Summary: Upgrade Calcite to 1.33.0 and Avatica to 1.23.0 Key: HIVE-27102 URL: https://issues.apache.org/jira/browse/HIVE-27102 Project: Hive Issue Type: Improvement Components: CBO Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis New versions of Calcite and Avatica are available so we should upgrade to them. I had some WIP in HIVE-26610 for upgrading Calcite to 1.32.0 but, given that the work was not in a very advanced state, it is preferable to jump directly to 1.33.0. Avatica must be in line with Calcite so both need to be updated at the same time. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27100) Remove unused data/files from repo
Stamatis Zampetakis created HIVE-27100: -- Summary: Remove unused data/files from repo Key: HIVE-27100 URL: https://issues.apache.org/jira/browse/HIVE-27100 Project: Hive Issue Type: Task Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Some files under [https://github.com/apache/hive/tree/master/data/files] are not referenced anywhere else in the repo and can be removed. Removing them makes it easier to see what is actually tested. Other minor benefits: * faster checkout times; * smaller source/binary releases. The script that was used to find which files are not referenced can be found below: {code:bash} for f in `ls data/files`; do echo -n "$f "; grep -a -R "$f" --exclude-dir=".git" --exclude-dir=target --exclude=\*.q.out --exclude=\*.class --exclude=\*.jar | wc -l | grep " 0$"; done {code} +Output+ {noformat} cbo_t4.txt 0 cbo_t5.txt 0 cbo_t6.txt 0 compressed_4line_file1.csv.bz2 0 empty2.txt 0 filterCard.txt 0 fullouter_string_big_1a_old.txt 0 fullouter_string_small_1a_old.txt 0 futurama_episodes.avro 0 in9.txt 0 map_null_schema.avro 0 regex-path-2015-12-10_03.txt 0 regex-path-201512-10_03.txt 0 regex-path-2015121003.txt 0 sample.json 0 sample-queryplan-in-history.txt 0 sample-queryplan.txt 0 smbbucket_2.txt 0 smb_bucket_input.txt 0 SortDescCol1Col2.txt 0 SortDescCol2Col1.txt 0 sortdp.txt 0 srcsortbucket1outof4.txt 0 srcsortbucket2outof4.txt 0 srcsortbucket4outof4.txt 0 {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27080) Support project pushdown in JDBC storage handler even when filters are not pushed
Stamatis Zampetakis created HIVE-27080: -- Summary: Support project pushdown in JDBC storage handler even when filters are not pushed Key: HIVE-27080 URL: https://issues.apache.org/jira/browse/HIVE-27080 Project: Hive Issue Type: Improvement Components: CBO Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis {code:sql} CREATE EXTERNAL TABLE book ( id int, title varchar(20), author int ) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "POSTGRES", "hive.sql.jdbc.driver" = "org.postgresql.Driver", "hive.sql.jdbc.url" = "jdbc:postgresql://localhost:5432/qtestDB", "hive.sql.dbcp.username" = "qtestuser", "hive.sql.dbcp.password" = "qtestpassword", "hive.sql.table" = "book" ); {code} {code:sql} explain cbo select id from book where title = 'Les Miserables'; {code} {noformat} CBO PLAN: HiveJdbcConverter(convention=[JDBC.POSTGRES]) JdbcProject(id=[$0]) JdbcFilter(condition=[=($1, _UTF-16LE'Les Miserables')]) JdbcHiveTableScan(table=[[default, book]], table:alias=[book]) {noformat} +Good case:+ Only the id column is fetched from the underlying database (see JdbcProject) since it is necessary for the result. {code:sql} explain cbo select id from book where UPPER(title) = 'LES MISERABLES'; {code} {noformat} CBO PLAN: HiveProject(id=[$0]) HiveFilter(condition=[=(CAST(UPPER($1)):VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'LES MISERABLES')]) HiveProject(id=[$0], title=[$1], author=[$2]) HiveJdbcConverter(convention=[JDBC.POSTGRES]) JdbcHiveTableScan(table=[[default, book]], table:alias=[book]) {noformat} +Bad case:+ All table columns are fetched from the database although only id and title are necessary; id is the result so cannot be dropped and title is needed for HiveFilter since the UPPER operation was not pushed in the DBMS. The author column is not needed at all so the plan should have a JdbcProject with id, and title, on top of the JdbcHiveTableScan. 
Although it may not seem like a big deal, in some cases tables are quite wide (more than 100 columns) while queries rarely return all of their columns. Improving project pushdown to handle such cases can give a major performance boost. Pushing the filter with UPPER to the JDBC storage handler is also a relevant improvement, but it should be tracked under another ticket. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27061) Website deployment GitHub action should not trigger on pull requests
Stamatis Zampetakis created HIVE-27061: -- Summary: Website deployment GitHub action should not trigger on pull requests Key: HIVE-27061 URL: https://issues.apache.org/jira/browse/HIVE-27061 Project: Hive Issue Type: Bug Components: Website Reporter: Stamatis Zampetakis The Website deployment GitHub action configured here: [https://github.com/apache/hive-site/blob/a3132faf0f4a555434076cb8ad690ae2c2c8c371/.github/workflows/gh-pages.yml] should not trigger on pull requests. The issue can be seen here: https://github.com/apache/hive-site/actions/runs/4127993132/jobs/7131893178 where the action was launched for https://github.com/apache/hive-site/pull/1 -- This message was sent by Atlassian Jira (v8.20.10#820010)
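A hedged sketch of the kind of trigger change the issue calls for (the actual contents of the workflow file are not reproduced here, so the current trigger block and the branch name are assumptions): restrict the workflow to push events so that pull requests no longer launch a deployment.

```yaml
# Hypothetical trigger block for .github/workflows/gh-pages.yml
# (assumption: the file currently also triggers on pull_request).
on:
  push:
    branches: [main]
```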
[jira] [Created] (HIVE-26987) InvalidProtocolBufferException when reading column statistics from ORC files
Stamatis Zampetakis created HIVE-26987: -- Summary: InvalidProtocolBufferException when reading column statistics from ORC files Key: HIVE-26987 URL: https://issues.apache.org/jira/browse/HIVE-26987 Project: Hive Issue Type: Bug Components: HiveServer2, ORC Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis Attachments: data.csv.gz, orc_large_column_metadata.q Any attempt to read an ORC file (query an ORC table) having a metadata section with column statistics exceeding the hardcoded limit of 1GB ([https://github.com/apache/orc/blob/2ff9001ddef082eaa30e21cbb034f266e0721664/java/core/src/java/org/apache/orc/impl/InStream.java#L41]) leads to the following exception. {noformat} Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:162) at com.google.protobuf.CodedInputStream$StreamDecoder.readRawBytesSlowPathOneChunk(CodedInputStream.java:2940) at com.google.protobuf.CodedInputStream$StreamDecoder.readBytesSlowPath(CodedInputStream.java:3021) at com.google.protobuf.CodedInputStream$StreamDecoder.readBytes(CodedInputStream.java:2432) at org.apache.orc.OrcProto$StringStatistics.(OrcProto.java:1718) at org.apache.orc.OrcProto$StringStatistics.(OrcProto.java:1663) at org.apache.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1766) at org.apache.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1761) at com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2409) at org.apache.orc.OrcProto$ColumnStatistics.(OrcProto.java:6552) at org.apache.orc.OrcProto$ColumnStatistics.(OrcProto.java:6468) at org.apache.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:6678) at org.apache.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:6673) at 
com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2409) at org.apache.orc.OrcProto$StripeStatistics.(OrcProto.java:19586) at org.apache.orc.OrcProto$StripeStatistics.(OrcProto.java:19533) at org.apache.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:19622) at org.apache.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:19617) at com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2409) at org.apache.orc.OrcProto$Metadata.(OrcProto.java:20270) at org.apache.orc.OrcProto$Metadata.(OrcProto.java:20217) at org.apache.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:20306) at org.apache.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:20301) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:86) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:91) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:48) at org.apache.orc.OrcProto$Metadata.parseFrom(OrcProto.java:20438) at org.apache.orc.impl.ReaderImpl.deserializeStripeStats(ReaderImpl.java:1013) at org.apache.orc.impl.ReaderImpl.getVariantStripeStatistics(ReaderImpl.java:317) at org.apache.orc.impl.ReaderImpl.getStripeStatistics(ReaderImpl.java:1047) at org.apache.orc.impl.ReaderImpl.getStripeStatistics(ReaderImpl.java:1034) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:1679) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1557) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2900(OrcInputFormat.java:1342) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1529) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1526) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1526) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1342) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecu
[jira] [Created] (HIVE-26877) Parquet CTAS with JOIN on decimals with different precision/scale fail
Stamatis Zampetakis created HIVE-26877: -- Summary: Parquet CTAS with JOIN on decimals with different precision/scale fail Key: HIVE-26877 URL: https://issues.apache.org/jira/browse/HIVE-26877 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Attachments: ctas_parquet_join.q Creating a Parquet table using CREATE TABLE AS SELECT syntax (CTAS) leads to a runtime error when the SELECT statement joins columns with different precision/scale. Steps to reproduce: {code:sql} CREATE TABLE table_a (col_dec decimal(5,0)); CREATE TABLE table_b(col_dec decimal(38,10)); INSERT INTO table_a VALUES (1); INSERT INTO table_b VALUES (1.00); set hive.default.fileformat=parquet; create table target as select table_a.col_dec from table_a left outer join table_b on table_a.col_dec = table_b.col_dec; {code} Stacktrace: {noformat} 2022-12-20T07:02:52,237 INFO [2dfbd95a-7553-467b-b9d0-629100785502 Listener at 0.0.0.0/46609] reexec.ReExecuteLostAMQueryPlugin: Got exception message: Vertex failed, vertexName=Reducer 2, vertexId=vertex_1671548565336_0001_3_02, diagnostics=[Task failed, taskId=task_1671548565336_0001_3_02_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1671548565336_0001_3_02_00_0:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators: Fixed Binary size 16 does not match field type length 3 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) at java.security.AccessController.doPrivileged(Native Method) 
at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators: Fixed Binary size 16 does not match field type length 3 at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:379) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:310) ... 
15 more Caused by: java.lang.IllegalArgumentException: Fixed Binary size 16 does not match field type length 3 at org.apache.parquet.column.values.plain.FixedLenByteArrayPlainValuesWriter.writeBytes(FixedLenByteArrayPlainValuesWriter.java:56) at org.apache.parquet.column.impl.ColumnWriterBase.write(ColumnWriterBase.java:174) at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.addBinary(MessageColumnIO.java:476) at org.apache.parquet.io.RecordConsumerLoggingWrapper.addBinary(RecordConsumerLoggingWrapper.java:116) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$DecimalDataWriter.write(DataWritableWriter.java:571) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:76) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:35) at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128) at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
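The mismatched sizes in the error are explained by how Parquet stores decimals: the FIXED_LEN_BYTE_ARRAY width depends on the declared precision. A small Python sketch of the standard sizing rule (an assumption about the relevant Parquet convention, not Hive's writer code):

```python
import math

# Minimum FIXED_LEN_BYTE_ARRAY width for a decimal of the given precision:
# the smallest byte count whose signed integer range holds 10**precision - 1.
def parquet_decimal_bytes(precision):
    return math.ceil((precision * math.log2(10) + 1) / 8)

# decimal(5,0) needs 3 bytes while decimal(38,10) needs 16; writing a value
# typed as the wider decimal into a column declared with the narrower one
# matches the "Fixed Binary size 16 does not match field type length 3" error.
print(parquet_decimal_bytes(5), parquet_decimal_bytes(38))
```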
[jira] [Created] (HIVE-26849) Nightly build fails in master (build 1533 onwards)
Stamatis Zampetakis created HIVE-26849: -- Summary: Nightly build fails in master (build 1533 onwards) Key: HIVE-26849 URL: https://issues.apache.org/jira/browse/HIVE-26849 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Attachments: master_build1534_nodes_101_steps_439.txt The last builds on master fail when testing the nightly build: * http://ci.hive.apache.org/job/hive-precommit/job/master/1533/ * http://ci.hive.apache.org/job/hive-precommit/job/master/1534/ Full log attached in master_build1534_nodes_101_steps_439.txt Relevant extract: {noformat} [2022-12-14T14:50:48.606Z] [INFO] Hive Packaging 4.0.0-nightly-89bf37bb45-20221214_144325 FAILURE [ 3.734 s] [2022-12-14T14:50:48.607Z] [INFO] [2022-12-14T14:50:48.607Z] [INFO] BUILD FAILURE [2022-12-14T14:50:48.607Z] [INFO] [2022-12-14T14:50:48.607Z] [INFO] Total time: 06:49 min [2022-12-14T14:50:48.607Z] [INFO] Finished at: 2022-12-14T14:50:48Z [2022-12-14T14:50:48.607Z] [INFO] [2022-12-14T14:50:48.607Z] [WARNING] The requested profile "qsplits" could not be activated because it does not exist. [2022-12-14T14:50:48.607Z] [ERROR] Failed to execute goal on project hive-packaging: Could not resolve dependencies for project org.apache.hive:hive-packaging:pom:4.0.0-nightly-89bf37bb45-20221214_144325: The following artifacts could not be resolved: org.apache.hive.hcatalog:hive-webhcat:jar:4.0.0-nightly-89bf37bb45-20221214_144325, org.apache.hive.hcatalog:hive-webhcat-java-client:jar:4.0.0-nightly-89bf37bb45-20221214_144325: Could not find artifact org.apache.hive.hcatalog:hive-webhcat:jar:4.0.0-nightly-89bf37bb45-20221214_144325 in wonder (http://artifactory/artifactory/wonder) -> [Help 1] [2022-12-14T14:50:48.607Z] [ERROR] [2022-12-14T14:50:48.607Z] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [2022-12-14T14:50:48.607Z] [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
[2022-12-14T14:50:48.607Z] [ERROR] [2022-12-14T14:50:48.607Z] [ERROR] For more information about the errors and possible solutions, please read the following articles: [2022-12-14T14:50:48.607Z] [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException [2022-12-14T14:50:48.607Z] [ERROR] [2022-12-14T14:50:48.607Z] [ERROR] After correcting the problems, you can resume the build with the command [2022-12-14T14:50:48.607Z] [ERROR] mvn -rf :hive-packaging {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26818) Beeline module misses transitive dependencies due to shading
Stamatis Zampetakis created HIVE-26818: -- Summary: Beeline module misses transitive dependencies due to shading Key: HIVE-26818 URL: https://issues.apache.org/jira/browse/HIVE-26818 Project: Hive Issue Type: Bug Components: Beeline Reporter: Stamatis Zampetakis Due to shading, the dependency-reduced-pom.xml file is installed in the local maven repository (~/.m2/repository/org/apache/hive/hive-beeline/4.0.0-SNAPSHOT/) for beeline. The latter indicates that the module doesn't have any transitive dependencies. If we were publishing the shaded jar that would be true, but we publish the regular jar. At this point, modules which include hive-beeline as a Maven dependency are broken and problems such as HIVE-26812 may occur. I was under the impression that this also affects the 4.0.0-alpha-2 release (since it includes HIVE-25750) but strangely the published pom has all the dependencies: https://repo1.maven.org/maven2/org/apache/hive/hive-beeline/4.0.0-alpha-2/hive-beeline-4.0.0-alpha-2.pom -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26807) Investigate test running times before/after Zookeeper upgrade to 3.6.3
Stamatis Zampetakis created HIVE-26807: -- Summary: Investigate test running times before/after Zookeeper upgrade to 3.6.3 Key: HIVE-26807 URL: https://issues.apache.org/jira/browse/HIVE-26807 Project: Hive Issue Type: Task Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis During the investigation of the CI timing out (HIVE-2686) there were some concerns that the Zookeeper upgrade (HIVE-26763) caused some significant slowdown. The goal of this issue is to analyse the test results from the following builds: * [Build-1495|http://ci.hive.apache.org/job/hive-precommit/job/master/1495/], commit just before the Zookeeper upgrade; * [Build-1514|http://ci.hive.apache.org/job/hive-precommit/job/master/1514/], commit after the Zookeeper upgrade with skipped tests (HIVE-26796) and CI timeouts (HIVE-26806) fixed; and reason about the impact of the Zookeeper upgrade on test execution. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796
Stamatis Zampetakis created HIVE-26806: -- Summary: Precommit tests in CI are timing out after HIVE-26796 Key: HIVE-26806 URL: https://issues.apache.org/jira/browse/HIVE-26806 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis http://ci.hive.apache.org/job/hive-precommit/job/master/1506/ {noformat} Cancelling nested steps due to timeout 15:22:08 Sending interrupt signal to process 15:22:08 Killing processes 15:22:09 kill finished with exit code 0 15:22:19 Terminated 15:22:19 script returned exit code 143 [Pipeline] } [Pipeline] // withEnv [Pipeline] } 15:22:19 Deleting 1 temporary files [Pipeline] // configFileProvider [Pipeline] } [Pipeline] // stage [Pipeline] stage [Pipeline] { (PostProcess) [Pipeline] sh [Pipeline] sh [Pipeline] sh [Pipeline] junit 15:22:25 Recording test results 15:22:32 [Checks API] No suitable checks publisher found. [Pipeline] } [Pipeline] // stage [Pipeline] } [Pipeline] // container [Pipeline] } [Pipeline] // node [Pipeline] } [Pipeline] // timeout [Pipeline] } [Pipeline] // podTemplate [Pipeline] } 15:22:32 Failed in branch split-01 [Pipeline] // parallel [Pipeline] } [Pipeline] // stage [Pipeline] stage [Pipeline] { (Archive) [Pipeline] podTemplate [Pipeline] { [Pipeline] timeout 15:22:33 Timeout set to expire in 6 hr 0 min {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26796) All tests in hive-unit module are skipped silently
Stamatis Zampetakis created HIVE-26796: -- Summary: All tests in hive-unit module are skipped silently Key: HIVE-26796 URL: https://issues.apache.org/jira/browse/HIVE-26796 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis In current master (7207a62def246b3290f1ece529e65b79012a3578) the tests in the hive-unit module are not running. {noformat} $ cd itests/hive-unit && mvn test [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hive-it-unit --- [INFO] [INFO] --- [INFO] T E S T S [INFO] --- [INFO] [INFO] Results: [INFO] [INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0 [INFO] [INFO] [INFO] BUILD SUCCESS [INFO] {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26785) Remove explicit protobuf-java dependency from blobstore and minikdc modules
Stamatis Zampetakis created HIVE-26785: -- Summary: Remove explicit protobuf-java dependency from blobstore and minikdc modules Key: HIVE-26785 URL: https://issues.apache.org/jira/browse/HIVE-26785 Project: Hive Issue Type: Task Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis These modules do not directly need the protobuf dependency, so it is misleading to declare it explicitly. Moreover, they use a different protobuf version (3.3.0) than the rest of the project (3.21.4), which can lead to compatibility problems, inconsistent behavior in tests, and undesired transitive propagation to other modules. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26755) Wrong results after renaming Parquet column
Stamatis Zampetakis created HIVE-26755: -- Summary: Wrong results after renaming Parquet column Key: HIVE-26755 URL: https://issues.apache.org/jira/browse/HIVE-26755 Project: Hive Issue Type: Bug Components: HiveServer2, Parquet Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis Renaming the column of a Parquet table leads to wrong results when the query uses the renamed column. {code:sql} create table person (id int, fname string, lname string, age int) stored as parquet; insert into person values (1, 'Victor', 'Hugo', 23); insert into person values (2, 'Alex', 'Dumas', 38); insert into person values (3, 'Marco', 'Pollo', 25); select fname from person where age >=25; {code} ||Correct results|| |Alex| |Marco| {code:sql} alter table person change column age years_from_birth int; select fname from person where years_from_birth >=25; {code} After renaming the column the query above returns an empty result set. {code:sql} select years_from_birth from person; {code} ||Wrong results|| |NULL| |NULL| |NULL| After renaming the column the query returns the correct number of rows but all filled with nulls. The problem is reproducible on current master (commit ae0cabffeaf284a6d2ec13a6993c87770818fbb9). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26690) Redirect hive-site notifications to the appropriate mailing lists
Stamatis Zampetakis created HIVE-26690: -- Summary: Redirect hive-site notifications to the appropriate mailing lists Key: HIVE-26690 URL: https://issues.apache.org/jira/browse/HIVE-26690 Project: Hive Issue Type: Task Components: Website Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Currently various notifications from the [hive-site|https://github.com/apache/hive-site] repository, such as opening/reviewing/commenting pull requests, are sent to the [dev mailing list|https://lists.apache.org/list.html?dev@hive.apache.org] (e.g., [https://lists.apache.org/thread/xthvd9m148xkhshco772llckfc1qk0sf]). The respective notifications from the main [hive repository|https://github.com/apache/hive] are sent to the [gitbox mailing list|https://lists.apache.org/list.html?git...@hive.apache.org]. The goal of this ticket is to redirect the notifications for the hive-site repository from dev to the gitbox/commit mailing lists by modifying the [.asf.yaml file|https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-Notificationsettingsforrepositories]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26658) INT64 Parquet timestamps cannot be mapped to most Hive numeric types
Stamatis Zampetakis created HIVE-26658: -- Summary: INT64 Parquet timestamps cannot be mapped to most Hive numeric types Key: HIVE-26658 URL: https://issues.apache.org/jira/browse/HIVE-26658 Project: Hive Issue Type: Bug Components: Parquet, Serializers/Deserializers Affects Versions: 4.0.0-alpha-1 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis When attempting to read a Parquet file with a column of primitive type INT64 and logical type [TIMESTAMP|https://github.com/apache/parquet-format/blob/54e53e5d7794d383529dd30746378f19a12afd58/LogicalTypes.md?plain=1#L337], an error is raised when the Hive type is anything other than TIMESTAMP or BIGINT. Consider a Parquet file (e.g., ts_file.parquet) with the following schema: {code:json} { "name": "eventtime", "type": ["null", { "type": "long", "logicalType": "timestamp-millis" }], "default": null } {code} Mapping the column to any of the Hive numeric types TINYINT, SMALLINT, INT, FLOAT, DOUBLE, or DECIMAL and trying to run a SELECT gives back an error. The following snippet can be used to reproduce the problem. {code:sql} CREATE TABLE ts_table (eventtime INT) STORED AS PARQUET; LOAD DATA LOCAL INPATH 'ts_file.parquet' INTO TABLE ts_table; SELECT * FROM ts_table; {code} This is a regression caused by HIVE-21215. Although HIVE-21215 made it possible to read INT64 types as Hive TIMESTAMP, which was not possible before, at the same time it broke the mapping to every other Hive numeric type. The problem was addressed selectively for the BIGINT type very recently (HIVE-26612). The primary goal of this ticket is to restore backward compatibility since these use-cases were working before HIVE-21215. -- This message was sent by Atlassian Jira (v8.20.10#820010)
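For reference, the raw value stored for an INT64 timestamp-millis column is simply the offset from the Unix epoch in milliseconds, so the two mappings that still work are trivial conversions of that long. A minimal stand-alone sketch (plain Java, not the Hive/Parquet reader code; class and method names are illustrative only):

```java
import java.time.Instant;

public class Int64TimestampMillis {

    // An INT64 timestamp-millis value is the number of milliseconds
    // since the Unix epoch (1970-01-01T00:00:00Z).
    static long asBigint(long rawValue) {
        // BIGINT mapping: expose the raw epoch millis unchanged.
        return rawValue;
    }

    static Instant asTimestamp(long rawValue) {
        // TIMESTAMP mapping: interpret the raw value as an instant.
        return Instant.ofEpochMilli(rawValue);
    }

    public static void main(String[] args) {
        long raw = 1462278394000L; // 2016-05-03T12:26:34Z
        System.out.println(asBigint(raw));    // 1462278394000
        System.out.println(asTimestamp(raw)); // 2016-05-03T12:26:34Z
    }
}
```

Restoring the pre-HIVE-21215 behavior amounts to letting the narrower numeric types (TINYINT, SMALLINT, INT, ...) again be derived from the raw long, instead of rejecting them outright.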
[jira] [Created] (HIVE-26653) Wrong results when (map) joining multiple tables on partition column
Stamatis Zampetakis created HIVE-26653: -- Summary: Wrong results when (map) joining multiple tables on partition column Key: HIVE-26653 URL: https://issues.apache.org/jira/browse/HIVE-26653 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The result of the query must have exactly one row matching the date specified in the WHERE clause but the query returns nothing. {code:sql} CREATE TABLE table_a (`aid` string ) PARTITIONED BY (`p_dt` string) row format delimited fields terminated by ',' stored as textfile; LOAD DATA LOCAL INPATH '../../data/files/_tbla.csv' into TABLE table_a; CREATE TABLE table_b (`bid` string) PARTITIONED BY (`p_dt` string) row format delimited fields terminated by ',' stored as textfile; LOAD DATA LOCAL INPATH '../../data/files/_tblb.csv' into TABLE table_b; set hive.auto.convert.join=true; set hive.optimize.semijoin.conversion=false; SELECT a.p_dt FROM ((SELECT p_dt FROM table_b GROUP BY p_dt) a JOIN (SELECT p_dt FROM table_a GROUP BY p_dt) b ON a.p_dt = b.p_dt JOIN (SELECT p_dt FROM table_a GROUP BY p_dt) c ON a.p_dt = c.p_dt) WHERE a.p_dt = translate(cast(to_date(date_sub('2022-08-01', 1)) AS string), '-', ''); {code} +Expected result+ 20220731 +Actual result+ Empty To reproduce the problem the tables need to have some data. Values in the aid and bid columns are not important. For the p_dt column use one of the following values: 20220731, 20220630. I will attach some sample data with which the problem can be reproduced. The tables look like below. ||aid||p_dt|| |611|20220731| |239|20220630| |...|...| The problem can be reproduced via qtest in current master (commit [6b05d64ce8c7161415d97a7896ea50025322e30a|https://github.com/apache/hive/commit/6b05d64ce8c7161415d97a7896ea50025322e30a]) by running the TestMiniLlapLocalCliDriver. 
There is a specific query plan (will attach shortly) for which the problem shows up, so if the plan changes slightly the problem may not appear anymore; this is why we need to explicitly set hive.optimize.semijoin.conversion and hive.auto.convert.join to trigger the problem. -- This message was sent by Atlassian Jira (v8.20.10#820010)
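For clarity, the filter expression in the repro just computes the previous day's date in yyyyMMdd form, so the WHERE clause compares against the 20220731 partition key. A quick stand-alone check of what the expression evaluates to (plain Java mirroring the SQL, not Hive code; the helper name is made up):

```java
import java.time.LocalDate;

public class PrevDayKey {

    // Mirrors translate(cast(to_date(date_sub('2022-08-01', 1)) AS string), '-', ''):
    // subtract one day, render as yyyy-MM-dd, then strip the dashes.
    static String prevDayKey(String isoDate) {
        return LocalDate.parse(isoDate).minusDays(1).toString().replace("-", "");
    }

    public static void main(String[] args) {
        System.out.println(prevDayKey("2022-08-01")); // 20220731
    }
}
```

Since the sample data contains a row for p_dt = 20220731, the empty result set returned by the query is clearly wrong.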
[jira] [Created] (HIVE-26642) Replace HiveFilterMergeRule with Calcite's built-in implementation
Stamatis Zampetakis created HIVE-26642: -- Summary: Replace HiveFilterMergeRule with Calcite's built-in implementation Key: HIVE-26642 URL: https://issues.apache.org/jira/browse/HIVE-26642 Project: Hive Issue Type: Improvement Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The rule was copied from Calcite to address HIVE-23389 as a temporary workaround until the next Calcite upgrade. Now that Hive is on Calcite 1.25.0 (HIVE-23456) the in-house copy can be removed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26638) Replace in-house CBO reduce expressions rules with Calcite's built-in classes
Stamatis Zampetakis created HIVE-26638: -- Summary: Replace in-house CBO reduce expressions rules with Calcite's built-in classes Key: HIVE-26638 URL: https://issues.apache.org/jira/browse/HIVE-26638 Project: Hive Issue Type: Improvement Components: CBO Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The goal of this ticket is to remove the Hive-specific code in [HiveReduceExpressionsRule|https://github.com/apache/hive/blob/b48c1bf11c4f75ba2c894e4732a96813ddde1414/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveReduceExpressionsRule.java] and use exclusively the respective Calcite classes (i.e., [ReduceExpressionsRule|https://github.com/apache/calcite/blob/2c30a56158cdd351d35725006bc1f76bb6aac75b/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java]) to reduce maintenance overhead and facilitate code evolution. Currently the only difference between the in-house (HiveReduceExpressionsRule) and built-in (ReduceExpressionsRule) reduce expressions rules lies in the way we treat the {{Filter}} operator (i.e., FilterReduceExpressionsRule). There are four Hive-specific differences when comparing the in-house code with the respective part in Calcite 1.25.0. +Match nullability when reducing expressions+ When we reduce filters we always set {{matchNullability}} (last parameter) to false. {code:java} if (reduceExpressions(filter, expList, predicates, true, false)) { {code} This means that the original and reduced expression can have a slightly different type in terms of nullability; the original is nullable and the reduced is not nullable. When the value is true the type can be preserved by adding a "nullability" CAST, which is a cast to the same type that differs only in whether it is nullable or not. Hardcoding {{matchNullability}} to false was done as part of the upgrade to Calcite 1.15.0 (HIVE-18068) where the behavior of the rule became configurable (CALCITE-2041). 
+Remove nullability cast explicitly+ When the expression is reduced we try to remove the nullability cast, if there is one. {code:java} if (RexUtil.isNullabilityCast(filter.getCluster().getTypeFactory(), newConditionExp)) { newConditionExp = ((RexCall) newConditionExp).getOperands().get(0); } {code} The code was added as part of the upgrade to Calcite 1.10.0 (HIVE-13316). However, the code is redundant as of HIVE-18068; setting {{matchNullability}} to {{false}} no longer generates nullability casts during the reduction. +Avoid creating filters with condition of type NULL+ {code:java} if(newConditionExp.getType().getSqlTypeName() == SqlTypeName.NULL) { newConditionExp = call.builder().cast(newConditionExp, SqlTypeName.BOOLEAN); } {code} Hive tries to cast such expressions to BOOLEAN to avoid the weird (and possibly problematic) situation of having a condition with NULL type. In Calcite, there is specific code for detecting whether the new condition is the NULL literal (with NULL type) and, if that's the case, it turns the relation into an empty one. {code:java} } else if (newConditionExp instanceof RexLiteral || RexUtil.isNullLiteral(newConditionExp, true)) { call.transformTo(createEmptyRelOrEquivalent(call, filter)); {code} Because of this, the Hive-specific code is redundant if the Calcite rule is used. +Bail out when input to reduceNotNullableFilter is not a RexCall+ {code:java} if (!(rexCall.getOperands().get(0) instanceof RexCall)) { // If child is not a RexCall instance, we can bail out return; } {code} The code was added as part of the upgrade to Calcite 1.10.0 (HIVE-13316) but it does not add any functional value. 
The instanceof check is redundant since the code in reduceNotNullableFilter [is a noop|https://github.com/apache/hive/blob/6e8fc53fb68898d1a404435859cea5bbc79200a4/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveReduceExpressionsRule.java#L228] when the expression/call is not one of the following: IS_NULL, IS_UNKNOWN, IS_NOT_NULL, which are all rex calls. +Summary+ All of the Hive-specific changes mentioned previously can be safely replaced by appropriate uses of the Calcite APIs without affecting the behavior of CBO. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26627) Remove HiveRelBuilder.aggregateCall override and refactor callers to use existing public methods
Stamatis Zampetakis created HIVE-26627: -- Summary: Remove HiveRelBuilder.aggregateCall override and refactor callers to use existing public methods Key: HIVE-26627 URL: https://issues.apache.org/jira/browse/HIVE-26627 Project: Hive Issue Type: Task Components: CBO Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The HiveRelBuilder overrides [aggregateCall|https://github.com/apache/hive/blob/8c3567ea8e423b202cde370f4d3fb401bcc23e46/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelBuilder.java#L246] from its superclass simply to expose and use it in HiveRewriteToDataSketchesRules. However, there is no real need to override this method since we can achieve the same outcome by using existing methods in RelBuilder that are easier to use and understand. Furthermore, it is safer to depend on public APIs since they are generally more stable. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26626) Cut dependencies between HiveXxPullUpConstantsRule and HiveReduceExpressionsRule
Stamatis Zampetakis created HIVE-26626: -- Summary: Cut dependencies between HiveXxPullUpConstantsRule and HiveReduceExpressionsRule Key: HIVE-26626 URL: https://issues.apache.org/jira/browse/HIVE-26626 Project: Hive Issue Type: Task Components: CBO Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis HiveSortPullUpConstantsRule and HiveUnionPullUpConstantsRule call the [predicateConstants|https://github.com/apache/hive/blob/8c3567ea8e423b202cde370f4d3fb401bcc23e46/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSortPullUpConstantsRule.java#L128] method from HiveReduceExpressionsRule. The method in HiveReduceExpressionsRule is deprecated and creates unnecessary dependencies among the rules. It can be replaced by a direct call to RexUtil.predicateConstants; the two methods are functionally equivalent. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26609) Cleanup external table directories created in qtests after test run
Stamatis Zampetakis created HIVE-26609: -- Summary: Cleanup external table directories created in qtests after test run Key: HIVE-26609 URL: https://issues.apache.org/jira/browse/HIVE-26609 Project: Hive Issue Type: Improvement Components: Tests Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis In many qtests we create external tables by explicitly setting the location of the table. [https://github.com/apache/hive/blob/566f48d3d3fc740ef958bdf963e511e0853da402/ql/src/test/queries/clientnegative/authorization_uri_create_table_ext.q#L7] If the test does not explicitly remove the directory (as happens above) then it remains there and may cause flakiness and unrelated test failures if other tests happen to use the same directory. A recent case where this problem appeared (a directory conflict between tests) is logged under HIVE-26584. There the solution was to add explicit rm commands in the qfiles. A more general solution would be to handle the cleanup of such directories inside [QTestUtil.clearTablesCreatedDuringTests|https://github.com/apache/hive/blob/566f48d3d3fc740ef958bdf963e511e0853da402/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java#L342]. The idea is to get the location of an external table from the metastore and remove the respective directory if it is under a known "safe" directory such as {{${system:test.tmp.dir}}}. As discussed under HIVE-26584, it might be risky to forcefully delete any kind of directory coming from an external table, at the risk of corrupting the development environment. If we restrict the cleanup to known directories it should be fine though. -- This message was sent by Atlassian Jira (v8.20.10#820010)
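The "delete only under a known safe root" guard described above could be sketched roughly as follows (a hypothetical helper, not the actual QTestUtil code; the method name, paths, and safe-root handling are assumptions):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ExternalTableCleanup {

    // Returns true only if the table location resolves to a path strictly
    // inside the safe root (e.g. the test tmp dir), so cleanup never
    // touches arbitrary directories from the developer's machine.
    static boolean isSafeToDelete(String tableLocation, String safeRoot) {
        Path location = Paths.get(tableLocation).toAbsolutePath().normalize();
        Path root = Paths.get(safeRoot).toAbsolutePath().normalize();
        return !location.equals(root) && location.startsWith(root);
    }

    public static void main(String[] args) {
        System.out.println(isSafeToDelete("/tmp/hive-qtest/ext_tbl", "/tmp/hive-qtest"));   // true
        System.out.println(isSafeToDelete("/home/user/data", "/tmp/hive-qtest"));           // false
        // normalize() defuses ".." traversal in a configured table location
        System.out.println(isSafeToDelete("/tmp/hive-qtest/../../etc", "/tmp/hive-qtest")); // false
    }
}
```

Normalizing both paths before the prefix check matters: a table location containing ".." segments would otherwise appear to be under the safe root while actually pointing outside it.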
[jira] [Created] (HIVE-26557) AbstractMethodError when running TestWebHCatE2e
Stamatis Zampetakis created HIVE-26557: -- Summary: AbstractMethodError when running TestWebHCatE2e Key: HIVE-26557 URL: https://issues.apache.org/jira/browse/HIVE-26557 Project: Hive Issue Type: Sub-task Components: HCatalog, Tests Reporter: Stamatis Zampetakis {code:bash} mvn test -Dtest=TestWebHCatE2e {code} {noformat} java.lang.AbstractMethodError: javax.ws.rs.core.UriBuilder.uri(Ljava/lang/String;)Ljavax/ws/rs/core/UriBuilder; at javax.ws.rs.core.UriBuilder.fromUri(UriBuilder.java:119) ~[javax.ws.rs-api-2.0.1.jar:2.0.1] at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:669) ~[jersey-servlet-1.19.jar:1.19] at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ~[javax.servlet-api-3.1.0.jar:3.1.0] at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.apache.hive.hcatalog.templeton.Main$XFrameOptionsFilter.doFilter(Main.java:355) ~[classes/:?] at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:650) ~[hadoop-auth-3.3.1.jar:?] at org.apache.hadoop.security.authentication.server.ProxyUserAuthenticationFilter.doFilter(ProxyUserAuthenticationFilter.java:104) ~[hadoop-common-3.3.1.jar:?] at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592) ~[hadoop-auth-3.3.1.jar:?] at org.apache.hadoop.hdfs.web.AuthFilter.doFilter(AuthFilter.java:51) ~[hadoop-hdfs-3.3.1.jar:?] 
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.Server.handle(Server.java:516) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] 
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) ~[jetty-io-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) ~[jetty-io-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) ~[jetty-io-9.4.40.v2021
[jira] [Created] (HIVE-26549) WebHCat server fails to start due to authentication filter configuration
Stamatis Zampetakis created HIVE-26549: -- Summary: WebHCat server fails to start due to authentication filter configuration Key: HIVE-26549 URL: https://issues.apache.org/jira/browse/HIVE-26549 Project: Hive Issue Type: Sub-task Components: HCatalog, Test Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The TestWebHCatE2e test fails because the server cannot start. The exception is shown below: {noformat} 2022-09-20T02:10:15,186 ERROR [main] templeton.Main: Server failed to start: javax.servlet.ServletException: Authentication type must be specified: simple|kerberos| at org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:164) ~[hadoop-auth-3.3.1.jar:?] at org.apache.hadoop.security.authentication.server.ProxyUserAuthenticationFilter.init(ProxyUserAuthenticationFilter.java:57) ~[hadoop-common-3.3.1.jar:?] at org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:140) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.servlet.ServletHandler.lambda$initialize$0(ServletHandler.java:731) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) ~[?:1.8.0_261] at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742) ~[?:1.8.0_261] at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) ~[?:1.8.0_261] at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:755) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:379) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:911) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:288) 
~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73) ~[jetty-runner-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169) ~[jetty-runner-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:117) ~[jetty-runner-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:97) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73) ~[jetty-runner-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169) ~[jetty-runner-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.Server.start(Server.java:423) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:110) ~[jetty-runner-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:97) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.Server.doStart(Server.java:387) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73) ~[jetty-runner-9.4.40.v20210413.jar:9.4.40.v20210413] at org.apache.hive.hcatalog.templeton.Main.runServer(Main.java:255) ~[classes/:?] at org.apache.hive.hcatalog.templeton.Main.run(Main.java:147) ~[classes/:?] at org.apache.hive.hcatalog.templeton.TestWebHCatE2e.startHebHcatInMem(TestWebHCatE2e.java:94) ~[test-classes/:?] 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_261] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_261] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_261] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) ~[junit-4.13.jar:4.13] at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) ~[junit-4.13.jar:4.13] at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) ~[junit-4.13.jar:4.13] at org.junit.internal.runners.statements.RunBefores.invokeMeth
[jira] [Created] (HIVE-26461) Add CI build check for macOS
Stamatis Zampetakis created HIVE-26461: -- Summary: Add CI build check for macOS Key: HIVE-26461 URL: https://issues.apache.org/jira/browse/HIVE-26461 Project: Hive Issue Type: Test Components: Build Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Add CI builds for Hive on macOS to verify that the project compiles successfully on this platform and to ensure that future changes do not break it accidentally. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26458) Add explicit dependency to commons-dbcp2 in hive-exec module
Stamatis Zampetakis created HIVE-26458: -- Summary: Add explicit dependency to commons-dbcp2 in hive-exec module Key: HIVE-26458 URL: https://issues.apache.org/jira/browse/HIVE-26458 Project: Hive Issue Type: Task Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Hive CBO relies on Calcite so there is a direct dependency towards Calcite in the hive-exec module. In turn, Calcite needs the commons-dbcp2 dependency in order to compile and run properly: https://github.com/apache/calcite/blob/b9c2099ea92a575084b55a206efc5dd341c0df62/core/build.gradle.kts#L69 In particular, the dependency is necessary in order to use the JDBC adapter and some of its usages are shown below: * https://github.com/apache/calcite/blob/257c81b5cac35e29598a246463356fea7e0b0336/core/src/main/java/org/apache/calcite/adapter/jdbc/JdbcUtils.java#L29 * https://github.com/apache/calcite/blob/257c81b5cac35e29598a246463356fea7e0b0336/core/src/main/java/org/apache/calcite/adapter/jdbc/JdbcUtils.java#L262 However, due to the [shading of Calcite|https://github.com/apache/hive/blob/778c838317c952dcd273fd6c7a51491746a1d807/ql/pom.xml#L1075] inside the hive-exec module, all the transitive dependencies coming from Calcite must be defined explicitly; otherwise they will not make it to the classpath. At the moment this does not pose a problem in master since the {{commons-dbcp2}} dependency comes transitively from other modules. 
But in certain Hive branches with slightly different dependencies between modules we have seen failures like the one shown below: {noformat} java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: org/apache/commons/dbcp2/BasicDataSource at org.apache.calcite.adapter.jdbc.JdbcUtils$DataSourcePool.(JdbcUtils.java:213) at org.apache.calcite.adapter.jdbc.JdbcUtils$DataSourcePool.(JdbcUtils.java:210) at org.apache.calcite.adapter.jdbc.JdbcSchema.dataSource(JdbcSchema.java:207) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genTableLogicalPlan(CalcitePlanner.java:3331) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5324) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1815) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1750) at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915) at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.plan(CalcitePlanner.java:1411) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:588) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13071) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:472) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:312) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:201) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:650) at 
org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:596) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:590) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:231) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:421) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:352) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:867) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:178) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:173) at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccesso
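A sketch of the proposed fix, assuming the standard Maven Central coordinates for commons-dbcp2 (the missing version would normally be supplied by dependencyManagement in the parent pom):

```xml
<!-- Sketch only: declare commons-dbcp2 explicitly in the hive-exec (ql) pom so that
     the shaded jar's runtime classpath includes it even when no other module pulls it
     in transitively. The version is intentionally omitted here on the assumption that
     it is managed centrally in the parent pom. -->
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-dbcp2</artifactId>
</dependency>
```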
[jira] [Created] (HIVE-26441) Add DatabaseAccessor unit tests for all methods and supported DBMS
Stamatis Zampetakis created HIVE-26441: -- Summary: Add DatabaseAccessor unit tests for all methods and supported DBMS Key: HIVE-26441 URL: https://issues.apache.org/jira/browse/HIVE-26441 Project: Hive Issue Type: Test Components: JDBC storage handler Reporter: Stamatis Zampetakis The [DatabaseAccessor|https://github.com/apache/hive/blob/9909edee8dad841e15fc36df81a2316bcb381bc3/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/dao/DatabaseAccessor.java] interface provides various APIs and has multiple concrete implementations, one for each supported DBMS. There are a few end-to-end tests for the JDBC storage handler (see [relevant|https://github.com/search?q=repo%3Aapache%2Fhive+filename%3A*jdbc*.q+extension%3Aq+filename%3A*jdbc*&type=Code] qfiles) and also a few unit tests ([TestGenericJdbcDatabaseAccessor|https://github.com/apache/hive/blob/9909edee8dad841e15fc36df81a2316bcb381bc3/jdbc-handler/src/test/java/org/apache/hive/storage/jdbc/dao/TestGenericJdbcDatabaseAccessor.java]) but we do not have enough coverage. Ideally we should have unit tests for each method present in the top-level interface and for each supported DBMS. The goal of this JIRA is to add more unit tests, similar to what {{TestGenericJdbcDatabaseAccessor}} is doing, covering more methods, use-cases, and DBMS. The scope of this JIRA can get quite big, so it makes sense to create additional sub-tasks for addressing specific cases. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26440) Duplicate hive-standalone-metastore-server dependency in QFile module
Stamatis Zampetakis created HIVE-26440: -- Summary: Duplicate hive-standalone-metastore-server dependency in QFile module Key: HIVE-26440 URL: https://issues.apache.org/jira/browse/HIVE-26440 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The hive-standalone-metastore-server dependency is defined two times in the QFile module ([pom.xml|https://github.com/apache/hive/blob/9909edee8dad841e15fc36df81a2316bcb381bc3/itests/qtest/pom.xml#L67]) leading to the following warning. {noformat} [INFO] Scanning for projects... [WARNING] [WARNING] Some problems were encountered while building the effective model for org.apache.hive:hive-it-qfile:jar:4.0.0-alpha-2-SNAPSHOT [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must be unique: org.apache.hive:hive-standalone-metastore-server:jar:tests -> duplicate declaration of version (?) @ line 67, column 17 [WARNING] [WARNING] It is highly recommended to fix these problems because they threaten the stability of your build. [WARNING] [WARNING] For this reason, future Maven versions might no longer support building such malformed projects. [WARNING] [INFO] [INFO] ---< org.apache.hive:hive-it-qfile > [INFO] Building Hive Integration - QFile Tests 4.0.0-alpha-2-SNAPSHOT [INFO] [ jar ]- {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
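The fix is simply to drop one of the two declarations. A sketch of the single declaration that would remain, with the classifier taken from the warning above (the scope and the use of {{$\{project.version\}}} are assumptions about how the qtest pom is organized):

```xml
<!-- Sketch: keep exactly one declaration of the tests-classified metastore-server
     artifact in itests/qtest/pom.xml; scope/version shown here are assumptions. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-standalone-metastore-server</artifactId>
  <version>${project.version}</version>
  <classifier>tests</classifier>
  <scope>test</scope>
</dependency>
```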
[jira] [Created] (HIVE-26427) Unify JoinDeriveIsNotNullFilterRule with HiveJoinAddNotNullRule
Stamatis Zampetakis created HIVE-26427: -- Summary: Unify JoinDeriveIsNotNullFilterRule with HiveJoinAddNotNullRule Key: HIVE-26427 URL: https://issues.apache.org/jira/browse/HIVE-26427 Project: Hive Issue Type: Improvement Components: CBO Reporter: Stamatis Zampetakis [JoinDeriveIsNotNullFilterRule|https://github.com/apache/calcite/blob/9bdd26159110663c4a207e3e8c378d1c3d16e034/core/src/main/java/org/apache/calcite/rel/rules/JoinDeriveIsNotNullFilterRule.java] was introduced recently in Calcite as part of CALCITE-3890. The rule has similar goals to HiveJoinAddNotNullRule (which has existed in Hive since HIVE-9581), so ideally (and in order to avoid maintaining the code twice) we should use the one provided by Calcite if possible. At this stage the rules are not identical, so we cannot replace one with the other immediately, but hopefully we can work together with the Calcite community to reuse common parts so that both projects can benefit from each other. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26404) HMS memory leak when compaction cleaner fails to remove obsolete files
Stamatis Zampetakis created HIVE-26404: -- Summary: HMS memory leak when compaction cleaner fails to remove obsolete files Key: HIVE-26404 URL: https://issues.apache.org/jira/browse/HIVE-26404 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 4.0.0-alpha-1 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis While investigating an issue where HMS becomes unresponsive we noticed a lot of failed attempts from the compaction Cleaner thread to remove obsolete directories with exceptions similar to the one below. {noformat} 2022-06-16 05:48:24,819 ERROR org.apache.hadoop.hive.ql.txn.compactor.Cleaner: [Cleaner-executor-thread-0]: Caught exception when cleaning, unable to complete cleaning of id:4410976,dbname:my_database,tableName:my_table,partName:day=20220502,state:,type:MAJOR,enqueueTime:0,start:0,properties:null,runAs:some_user,tooManyAborts:false,hasOldAbort:false,highestWriteId:187502,errorMessage:null java.io.IOException: Not enough history available for (187502,x). 
Oldest available base: hdfs://nameservice1/warehouse/tablespace/managed/hive/my_database.db/my_table/day=20220502/base_0188687_v4297872 at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:1432) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.removeFiles(Cleaner.java:261) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.access$000(Cleaner.java:71) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner$1.run(Cleaner.java:203) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.clean(Cleaner.java:200) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.lambda$run$0(Cleaner.java:105) at org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil$ThrowingRunnable.lambda$unchecked$0(CompactorUtil.java:54) at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {noformat} In addition the logs contained a large number of long JVM pauses as shown below and the HMS (RSZ) memory kept increasing at rate of 90MB per hour. 
{noformat} 2022-06-16 16:17:17,805 WARN org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor: [org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor$Monitor@5b022296]: Detected pause in JVM or host machine (eg GC): pause of approximately 34346ms 2022-06-16 16:17:21,497 INFO org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor: [org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor$Monitor@5b022296]: Detected pause in JVM or host machine (eg GC): pause of approximately 1690ms 2022-06-16 16:17:57,696 WARN org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor: [org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor$Monitor@5b022296]: Detected pause in JVM or host machine (eg GC): pause of approximately 34697ms 2022-06-16 16:18:01,326 INFO org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor: [org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor$Monitor@5b022296]: Detected pause in JVM or host machine (eg GC): pause of approximately 1628ms 2022-06-16 16:18:37,280 WARN org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor: [org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor$Monitor@5b022296]: Detected pause in JVM or host machine (eg GC): pause of approximately 34453ms 2022-06-16 16:18:40,927 INFO org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor: [org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor$Monitor@5b022296]: Detected pause in JVM or host machine (eg GC): pause of approximately 1646ms 2022-06-16 16:19:16,929 WARN org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor: [org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor$Monitor@5b022296]: Detected pause in JVM or host machine (eg GC): pause of approximately 33997ms 2022-06-16 16:19:20,572 INFO org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor: [org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor$Monitor@5b022296]: Detected pause in JVM or host machine (eg GC): pause of approximately 1637ms 2022-06-16 16:20:01,643 WARN 
org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor: [org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor$Monitor@5b022296]: Detected pause in JVM or host machine (eg GC): pause of approximately 39329ms 2022-06-16 16:20:05,572 INFO org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor: [org.apache.hadoop.h
[jira] [Created] (HIVE-26389) ALTER TABLE CASCADE is slow for tables with many partitions
Stamatis Zampetakis created HIVE-26389: -- Summary: ALTER TABLE CASCADE is slow for tables with many partitions Key: HIVE-26389 URL: https://issues.apache.org/jira/browse/HIVE-26389 Project: Hive Issue Type: Improvement Components: Metastore, Query Planning Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Attachments: native_sql_queries.txt, per_partition_sql_queries.txt Consider the following simplified scenario with a table having two partitions. {code:sql} CREATE TABLE student (fname string, lname string) PARTITIONED BY (department string); INSERT INTO student VALUES ('Alex','Dumas', 'Computer Science'); INSERT INTO student VALUES ('Victor','Hugo', 'Physics'); {code} Altering a column of this table and propagating the changes to the partitions (using the CASCADE syntax) is slow. {code:sql} ALTER TABLE student CHANGE lname lastname STRING CASCADE; {code} The seemingly simple ALTER statement outlined above triggers roughly 136 SQL queries in the underlying DBMS of the metastore (see native_sql_queries.txt). We can observe that some of these queries are recurring and appear as many times as there are partitions in the table (see per_partition_sql_queries.txt). As the number of partitions grows, so does the number of queries, so reducing the number of queries sent per partition, or making them more efficient, will have a positive impact on performance. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26350) IndexOutOfBoundsException when generating splits for external JDBC table with partition columns
Stamatis Zampetakis created HIVE-26350: -- Summary: IndexOutOfBoundsException when generating splits for external JDBC table with partition columns Key: HIVE-26350 URL: https://issues.apache.org/jira/browse/HIVE-26350 Project: Hive Issue Type: Bug Components: CBO, JDBC storage handler Reporter: Stamatis Zampetakis Create the following table in some JDBC database (e.g., Postgres). {code:sql} CREATE TABLE country ( id int, name varchar(20) ); {code} Create the following tables in Hive ensuring that the external JDBC table has the {{hive.sql.partitionColumn}} table property set. {code:sql} CREATE TABLE city (id int); CREATE EXTERNAL TABLE country ( id int, name varchar(20) ) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "POSTGRES", "hive.sql.jdbc.driver" = "org.postgresql.Driver", "hive.sql.jdbc.url" = "jdbc:postgresql://localhost:5432/qtestDB", "hive.sql.dbcp.username" = "qtestuser", "hive.sql.dbcp.password" = "qtestpassword", "hive.sql.table" = "country", "hive.sql.partitionColumn" = "name", "hive.sql.numPartitions" = "2" ); {code} The query below fails with IndexOutOfBoundsException when the mapper scanning the JDBC table tries to generate the splits by exploiting the partitioning column. {code:sql} select country.id from country cross join city; {code} The full stack trace is given below. 
{noformat} java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.rangeCheck(ArrayList.java:659) ~[?:1.8.0_261] at java.util.ArrayList.get(ArrayList.java:435) ~[?:1.8.0_261] at org.apache.hive.storage.jdbc.JdbcInputFormat.getSplits(JdbcInputFormat.java:102) [hive-jdbc-handler-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:564) [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:858) [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:263) [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:281) [tez-dag-0.10.1.jar:0.10.1] at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:272) [tez-dag-0.10.1.jar:0.10.1] at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_261] at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_261] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) [hadoop-common-3.1.0.jar:?] at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:272) [tez-dag-0.10.1.jar:0.10.1] at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:256) [tez-dag-0.10.1.jar:0.10.1] at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) [guava-19.0.jar:?] at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) [guava-19.0.jar:?] 
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) [guava-19.0.jar:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_261] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_261] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261] {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
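The failure mode above ("Index: 1, Size: 1") can be illustrated with a small standalone sketch. This is hypothetical code, not Hive's actual split logic: it merely shows how splitting a partition-column value range can yield fewer intervals than the requested number of partitions, so that indexing the interval list by partition id overruns it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the bug class, not Hive's JdbcInputFormat code:
// a tiny value range produces fewer intervals than numPartitions, so indexing
// the list by the requested partition id throws IndexOutOfBoundsException.
public class SplitSketch {
    static List<int[]> computeIntervals(int lo, int hi, int requested) {
        List<int[]> intervals = new ArrayList<>();
        int width = Math.max(1, (hi - lo) / requested); // degenerates to 1 for tiny ranges
        for (int start = lo; start < hi; start += width) {
            intervals.add(new int[] {start, Math.min(hi, start + width)});
        }
        return intervals;
    }

    public static void main(String[] args) {
        int numPartitions = 2;
        // A range with a single distinct value yields only one interval.
        List<int[]> intervals = computeIntervals(0, 1, numPartitions);
        // intervals.get(1) would now throw IndexOutOfBoundsException (Size: 1).
        System.out.println(intervals.size() < numPartitions); // prints "true"
    }
}
```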
[jira] [Created] (HIVE-26349) TestOperatorCmp/TestReOptimization fail silently due to incompatible configuration
Stamatis Zampetakis created HIVE-26349: -- Summary: TestOperatorCmp/TestReOptimization fail silently due to incompatible configuration Key: HIVE-26349 URL: https://issues.apache.org/jira/browse/HIVE-26349 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Running TestOperatorCmp, TestReOptimization currently in master (https://github.com/apache/hive/commit/10e5381cb6a4215c0b25fe0cda0a26a084ba6a89) shows BUILD SUCCESS although the tests are actually failing when executing the {{@BeforeClass}} logic. Since the error appears inside {{@BeforeClass}} the failure remains unnoticed and the only indication that something is wrong is given by the INFO line below: {noformat} [INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0 {noformat} +Steps to reproduce:+ {code:bash} mvn test -Dtest=TestOperatorCmp mvn test -Dtest=TestReOptimization {code} {noformat} [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hive-exec --- [INFO] [INFO] --- [INFO] T E S T S [INFO] --- [INFO] Running org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp [INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.732 s - in org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp [INFO] [INFO] Results: [INFO] [INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0 [INFO] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 18.962 s [INFO] Finished at: 2022-06-22T12:49:54+02:00 [INFO] {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26343) TestWebHCatE2e causes surefire fork to exit and fails
Stamatis Zampetakis created HIVE-26343: -- Summary: TestWebHCatE2e causes surefire fork to exit and fails Key: HIVE-26343 URL: https://issues.apache.org/jira/browse/HIVE-26343 Project: Hive Issue Type: Bug Components: HCatalog, Testing Infrastructure Reporter: Stamatis Zampetakis Any attempt to run TestWebHCatE2e in current master ([https://github.com/apache/hive/commit/948f9fb56a00e981cd653146de44ae82307b4f2f]) causes the surefire fork to exit and the test fails. {noformat} cd hcatalog/webhcat/svr && mvn test -Dtest=TestWebHCatE2e [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M4:test (default-test) on project hive-webhcat: There are test failures. [ERROR] [ERROR] Please refer to /home/stamatis/Projects/Apache/hive/hcatalog/webhcat/svr/target/surefire-reports for the individual test results. [ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream. [ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called? [ERROR] Command was /bin/sh -c cd /home/stamatis/Projects/Apache/hive/hcatalog/webhcat/svr && /opt/jdks/jdk1.8.0_261/jre/bin/java -Xmx2048m -jar /home/stamatis/Projects/Apache/hive/hcatalog/webhcat/svr/target/surefire/surefirebooter4564605288390864592.jar /home/stamatis/Projects/Apache/hive/hcatalog/webhcat/svr/target/surefire 2022-06-20T16-29-05_858-jvmRun1 surefire4795088574293215609tmp surefire_01535173811171404671tmp [ERROR] Error occurred in starting fork, check output in log [ERROR] Process Exit Code: 1 [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called? 
[ERROR] Command was /bin/sh -c cd /home/stamatis/Projects/Apache/hive/hcatalog/webhcat/svr && /opt/jdks/jdk1.8.0_261/jre/bin/java -Xmx2048m -jar /home/stamatis/Projects/Apache/hive/hcatalog/webhcat/svr/target/surefire/surefirebooter4564605288390864592.jar /home/stamatis/Projects/Apache/hive/hcatalog/webhcat/svr/target/surefire 2022-06-20T16-29-05_858-jvmRun1 surefire4795088574293215609tmp surefire_01535173811171404671tmp [ERROR] Error occurred in starting fork, check output in log [ERROR] Process Exit Code: 1 [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:513) [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:460) [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:301) [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:249) [ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1217) [ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1063) [ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:889) [ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137) [ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:210) [ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:156) [ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:148) [ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117) [ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81) [ERROR] at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56) [ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128) [ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:305) [ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192) [ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105) [ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:957) [ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:289) [ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:193) [ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [ERROR] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [ERROR] at java.lang.reflect.Method.invoke(Method.java:498) [ERROR]
[jira] [Created] (HIVE-26332) Upgrade maven-surefire-plugin to 3.0.0-M7
Stamatis Zampetakis created HIVE-26332: -- Summary: Upgrade maven-surefire-plugin to 3.0.0-M7 Key: HIVE-26332 URL: https://issues.apache.org/jira/browse/HIVE-26332 Project: Hive Issue Type: Task Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Currently we use 3.0.0-M4, which was released in 2019. Since then there have been multiple bug fixes and improvements: [https://issues.apache.org/jira/issues/?jql=project%20%3D%20SUREFIRE%20AND%20(fixVersion%20%3D%203.0.0-M5%20OR%20fixVersion%20%3D%203.0.0-M6%20OR%20fixVersion%20%3D%203.0.0-M7)%20ORDER%20BY%20resolutiondate%20%20DESC%2C%20key] It is worth mentioning that the interaction with JUnit 5 is much more mature as well; this is one of the main reasons driving this upgrade. -- This message was sent by Atlassian Jira (v8.20.7#820007)
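The change itself is a one-line version bump; a sketch of the plugin declaration after the upgrade (where exactly the version lives in Hive's poms, e.g. as a property, is an assumption):

```xml
<!-- Sketch of the bumped plugin declaration; the surrounding pom structure
     and any version property indirection are assumptions. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>3.0.0-M7</version>
</plugin>
```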
[jira] [Created] (HIVE-26331) Use maven-surefire-plugin version consistently in standalone-metastore modules
Stamatis Zampetakis created HIVE-26331: -- Summary: Use maven-surefire-plugin version consistently in standalone-metastore modules Key: HIVE-26331 URL: https://issues.apache.org/jira/browse/HIVE-26331 Project: Hive Issue Type: Task Components: Standalone Metastore, Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Due to some problems in the pom.xml files inside the standalone-metastore modules we end up using different maven-surefire-plugin versions. Most of the modules use 3.0.0-M4, which is the expected one, while the {{hive-standalone-metastore-common}} uses the older 2.22.0 version. +Actual+ {noformat} [INFO] --- maven-surefire-plugin:2.22.0:test (default-test) @ hive-standalone-metastore-common --- [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hive-metastore --- [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hive-standalone-metastore-server --- [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ metastore-tools-common --- [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hive-metastore-benchmarks --- {noformat} The goal of this JIRA is to ensure we use the same version consistently in all modules. +Expected+ {noformat} [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hive-standalone-metastore-common --- [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hive-metastore --- [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hive-standalone-metastore-server --- [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ metastore-tools-common --- [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hive-metastore-benchmarks --- {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
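One idiomatic way to enforce a single version is to pin it once under pluginManagement so that all child modules inherit it. This is a sketch under the assumption that the standalone-metastore parent pom is the right place for it:

```xml
<!-- Sketch: pinning the surefire version in the standalone-metastore parent pom
     so that hive-standalone-metastore-common stops falling back to 2.22.0. -->
<pluginManagement>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <version>3.0.0-M4</version>
    </plugin>
  </plugins>
</pluginManagement>
```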
[jira] [Created] (HIVE-26312) Use default digest normalization strategy in CBO
Stamatis Zampetakis created HIVE-26312: -- Summary: Use default digest normalization strategy in CBO Key: HIVE-26312 URL: https://issues.apache.org/jira/browse/HIVE-26312 Project: Hive Issue Type: Task Components: CBO Affects Versions: 4.0.0-alpha-1 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis CALCITE-2450 introduced a way to improve planning time by normalizing some query expressions (RexNodes). The behavior can be enabled/disabled via the following system property: calcite.enable.rexnode.digest.normalize There was an attempt to disable the normalization explicitly in HIVE-23456 to avoid rendering the HiveFilterSortPredicates rule useless. However, the [way the normalization is disabled now|https://github.com/apache/hive/blob/f29cb2245c97102975ea0dd73783049eaa0947a0/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L549] depends on the way classes are loaded. If for some reason CalciteSystemProperty is loaded before hitting the respective line in Hive.java, setting the property will not have any effect. After HIVE-26238 the behavior of the rule does not depend on the value of the property, so there is nothing holding us back from enabling the normalization. At the moment there is no strong reason to enable or disable the normalization explicitly, so it is better to rely on the default value provided by Calcite to avoid running with a different normalization strategy when the class loading order changes. -- This message was sent by Atlassian Jira (v8.20.7#820007)
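The class-loading hazard can be seen in a self-contained sketch (hypothetical class and property names, standing in for CalciteSystemProperty): a static final field caches the system property at class-initialization time, so setting the property afterwards has no effect.

```java
public class PropertyOrderDemo {
    // Hypothetical stand-in for CalciteSystemProperty: the property is read once,
    // when this class is initialized, and cached in a static final field.
    static class Config {
        static final boolean NORMALIZE =
                Boolean.parseBoolean(System.getProperty("demo.normalize", "true"));
    }

    public static void main(String[] args) {
        // Touching Config here triggers class initialization and caches "true".
        System.out.println(Config.NORMALIZE);   // prints "true"
        // Setting the property afterwards is too late to change the cached value.
        System.setProperty("demo.normalize", "false");
        System.out.println(Config.NORMALIZE);   // still prints "true"
    }
}
```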
[jira] [Created] (HIVE-26310) Remove unused junit runners from test-utils module
Stamatis Zampetakis created HIVE-26310: -- Summary: Remove unused junit runners from test-utils module Key: HIVE-26310 URL: https://issues.apache.org/jira/browse/HIVE-26310 Project: Hive Issue Type: Task Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The two classes under https://github.com/apache/hive/tree/master/testutils/src/java/org/apache/hive/testutils/junit/runners, namely: * [ConcurrentTestRunner|https://github.com/apache/hive/blob/fe0f1a648b14cdf27edcf7a5d323cbd060104ebf/testutils/src/java/org/apache/hive/testutils/junit/runners/ConcurrentTestRunner.java] * [ConcurrentScheduler|https://github.com/apache/hive/blob/fe0f1a648b14cdf27edcf7a5d323cbd060104ebf/testutils/src/java/org/apache/hive/testutils/junit/runners/model/ConcurrentScheduler.java] were introduced a long time ago by HIVE-2935 to somewhat parallelize the execution of {{TestBeeLineDriver}}. However, since HIVE-1 (resolved 6 years ago) they have not been used by anyone and are unlikely to be used again in the future, since there are much more modern alternatives. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26309) Remove Log4jConfig junit extension in favor of LoggerContextSource
Stamatis Zampetakis created HIVE-26309: -- Summary: Remove Log4jConfig junit extension in favor of LoggerContextSource Key: HIVE-26309 URL: https://issues.apache.org/jira/browse/HIVE-26309 Project: Hive Issue Type: Task Components: Testing Infrastructure Affects Versions: 4.0.0-alpha-1 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The Log4JConfig JUnit extension was introduced by HIVE-24588 in order to facilitate running tests with a specific log4j2 configuration. However, there is a very similar and seemingly more powerful JUnit extension in the official LOG4J2 release/repo, i.e., [LoggerContextSource|https://github.com/apache/logging-log4j2/blob/eedc3cdb6be6744071f8ae6dcfb37b26b1fc0940/log4j-core/src/test/java/org/apache/logging/log4j/junit/LoggerContextSource.java]. The goal of this JIRA is to remove the code related to Log4jConfig from the Hive repo and replace its usages with LoggerContextSource. By doing this we reduce the maintenance overhead for the Hive community and reduce the dependencies on log4j-core. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26296) RuntimeException when executing EXPLAIN CBO JOINCOST on query with JDBC tables
Stamatis Zampetakis created HIVE-26296: -- Summary: RuntimeException when executing EXPLAIN CBO JOINCOST on query with JDBC tables Key: HIVE-26296 URL: https://issues.apache.org/jira/browse/HIVE-26296 Project: Hive Issue Type: Bug Components: CBO, HiveServer2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Consider a JDBC database with two tables _author_ and _book_. {code:sql} CREATE EXTERNAL TABLE author ( id int, fname varchar(20), lname varchar(20) ) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "MYSQL", "hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver", ... "hive.sql.table" = "author" ); CREATE EXTERNAL TABLE book ( id int, title varchar(100), author int ) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "MYSQL", "hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver", ... "hive.sql.table" = "book" ); {code} Executing an {{EXPLAIN CBO JOINCOST}} with a query joining two JDBC tables fails with {{RuntimeException}} while trying to compute the selectivity of the join. 
{code:sql} EXPLAIN CBO JOINCOST SELECT a.lname, b.title FROM author a JOIN book b ON a.id=b.author; {code} +Stacktrace+ {noformat} java.lang.RuntimeException: Unexpected Join type: org.apache.calcite.adapter.jdbc.JdbcRules$JdbcJoin at org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdSelectivity.computeInnerJoinSelectivity(HiveRelMdSelectivity.java:156) at org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdSelectivity.getSelectivity(HiveRelMdSelectivity.java:68) at GeneratedMetadataHandler_Selectivity.getSelectivity_$(Unknown Source) at GeneratedMetadataHandler_Selectivity.getSelectivity(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getSelectivity(RelMetadataQuery.java:426) at org.apache.calcite.rel.metadata.RelMdUtil.getJoinRowCount(RelMdUtil.java:736) at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:195) at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:212) at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:140) at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:212) at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:191) at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:212) at org.apache.calcite.rel.externalize.RelWriterImpl.explain_(RelWriterImpl.java:100) at org.apache.calcite.rel.externalize.RelWriterImpl.done(RelWriterImpl.java:144) at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246) at 
org.apache.calcite.plan.RelOptUtil.toString(RelOptUtil.java:2308) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:648) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12699) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:460) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:495) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:447) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:412) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:406) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255) at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:200) at org.apache.hadoop
[jira] [Created] (HIVE-26290) Remove useless calls to DateTimeFormatter#withZone without assignment
Stamatis Zampetakis created HIVE-26290: -- Summary: Remove useless calls to DateTimeFormatter#withZone without assignment Key: HIVE-26290 URL: https://issues.apache.org/jira/browse/HIVE-26290 Project: Hive Issue Type: Task Components: HiveServer2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis There are some places in the code that call {{DateTimeFormatter#withZone}} without assigning the result anywhere. This makes the call useless: the method does not modify the formatter instance but always creates a new one. -- This message was sent by Atlassian Jira (v8.20.7#820007)
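To make the pattern concrete, here is a minimal standalone sketch (not code from the Hive tree; class and variable names are made up) showing why a bare {{withZone}} call is a no-op:

```java
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class WithZoneDemo {
    public static void main(String[] args) {
        DateTimeFormatter fmt = DateTimeFormatter.ISO_LOCAL_DATE_TIME;

        // Useless: withZone returns a NEW formatter and leaves fmt untouched.
        fmt.withZone(ZoneId.of("UTC"));
        System.out.println(fmt.getZone());   // null: fmt still has no zone

        // Correct: keep the returned instance.
        DateTimeFormatter zoned = fmt.withZone(ZoneId.of("UTC"));
        System.out.println(zoned.getZone()); // UTC
    }
}
```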
[jira] [Created] (HIVE-26289) Remove useless try catch in DataWritableReadSupport#getWriterDateProleptic
Stamatis Zampetakis created HIVE-26289: -- Summary: Remove useless try catch in DataWritableReadSupport#getWriterDateProleptic Key: HIVE-26289 URL: https://issues.apache.org/jira/browse/HIVE-26289 Project: Hive Issue Type: Task Components: HiveServer2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis {code:java} try { if (value != null) { return Boolean.valueOf(value); } } catch (DateTimeException e) { throw new RuntimeException("Can't parse writer proleptic property stored in file metadata", e); } {code} {{Boolean.valueOf}} never throws a {{DateTimeException}}, so the try/catch block is completely useless. -- This message was sent by Atlassian Jira (v8.20.7#820007)
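For reference, {{Boolean.valueOf(String)}} is total: it returns {{true}} only for the case-insensitive string "true" and {{false}} for everything else, including null, so no exception can escape it. A standalone sketch (not Hive code):

```java
public class BooleanValueOfDemo {
    public static void main(String[] args) {
        // No input makes Boolean.valueOf throw; unparseable strings yield false.
        System.out.println(Boolean.valueOf("true"));        // true
        System.out.println(Boolean.valueOf("TrUe"));        // true (case-insensitive)
        System.out.println(Boolean.valueOf("garbage"));     // false
        System.out.println(Boolean.valueOf((String) null)); // false
    }
}
```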
[jira] [Created] (HIVE-26281) Missing statistics when requesting partition by names via HS2
Stamatis Zampetakis created HIVE-26281: -- Summary: Missing statistics when requesting partition by names via HS2 Key: HIVE-26281 URL: https://issues.apache.org/jira/browse/HIVE-26281 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The [Hive#getPartitionsByNames|https://github.com/apache/hive/blob/6626b5564ee206db5a656d2f611ed71f10a0ffc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4155] method can be used to obtain partition objects from the metastore by specifying their names and other options. {code:java} public List<Partition> getPartitionsByNames(Table tbl, List<String> partNames, boolean getColStats){code} However, the partition statistics are missing from the returned objects no matter the value of the {{getColStats}} parameter. The problem is [here|https://github.com/apache/hive/blob/6626b5564ee206db5a656d2f611ed71f10a0ffc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4174] and was caused by HIVE-24743. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26279) Drop unused requests from TestHiveMetaStoreClientApiArgumentsChecker
Stamatis Zampetakis created HIVE-26279: -- Summary: Drop unused requests from TestHiveMetaStoreClientApiArgumentsChecker Key: HIVE-26279 URL: https://issues.apache.org/jira/browse/HIVE-26279 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Some tests in TestHiveMetaStoreClientApiArgumentsChecker create requests without ever using them, so they are basically dead code that can be removed. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26278) Add unit tests for Hive#getPartitionsByNames using batching
Stamatis Zampetakis created HIVE-26278: -- Summary: Add unit tests for Hive#getPartitionsByNames using batching Key: HIVE-26278 URL: https://issues.apache.org/jira/browse/HIVE-26278 Project: Hive Issue Type: Task Components: HiveServer2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis [Hive#getPartitionsByNames|https://github.com/apache/hive/blob/6626b5564ee206db5a656d2f611ed71f10a0ffc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4155] supports decomposing requests in batches but there are no unit tests checking for the ValidWriteIdList when batching is used. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26270) Wrong timestamps when reading Hive 3.1.x Parquet files with vectorized reader
Stamatis Zampetakis created HIVE-26270: -- Summary: Wrong timestamps when reading Hive 3.1.x Parquet files with vectorized reader Key: HIVE-26270 URL: https://issues.apache.org/jira/browse/HIVE-26270 Project: Hive Issue Type: Bug Components: HiveServer2, Parquet Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Parquet files written in Hive 3.1.x onwards with timezone set to US/Pacific. {code:sql} CREATE TABLE employee (eid INT, birth timestamp) STORED AS PARQUET; INSERT INTO employee VALUES (1, '1880-01-01 00:00:00'), (2, '1884-01-01 00:00:00'), (3, '1990-01-01 00:00:00'); {code} Parquet files read with Hive 4.0.0-alpha-1 onwards. +Without vectorization+ results are correct. {code:sql} SELECT * FROM employee; {code} {noformat} 1 1880-01-01 00:00:00 2 1884-01-01 00:00:00 3 1990-01-01 00:00:00 {noformat} +With vectorization+ some timestamps are shifted. {code:sql} -- Disable fetch task conversion to force vectorization to kick in set hive.fetch.task.conversion=none; SELECT * FROM employee; {code} {noformat} 1 1879-12-31 23:52:58 2 1884-01-01 00:00:00 3 1990-01-01 00:00:00 {noformat} The problem is the same as the one reported under HIVE-24074. The data were written using the new Date/Time APIs (java.time) in Hive 3.1.3 and here they were read using the old APIs (java.sql). The difference with HIVE-24074 is that here the problem appears only for vectorized execution while the non-vectorized reader works fine, so there is an *inconsistency in the behavior* of vectorized and non-vectorized readers. The non-vectorized reader works fine because it automatically derives that it should use the new JDK APIs to read back the timestamp value. This is possible in this case because there is metadata in the file (i.e., the presence of {{writer.time.zone}}) from which it can infer that the timestamps were written using the new Date/Time APIs. The inconsistent behavior between the vectorized and non-vectorized reader is a regression caused by HIVE-25104. 
This JIRA is an attempt to re-align the behavior between vectorized and non-vectorized readers. Note that if the file metadata are empty, neither the vectorized nor the non-vectorized reader can determine which APIs to use for the conversion; in this case it is necessary for the user to set {{hive.parquet.timestamp.legacy.conversion.enabled}} explicitly to get back the correct results. -- This message was sent by Atlassian Jira (v8.20.7#820007)
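The 7 minute 2 second shift in the first row matches the local mean time (LMT) offset that java.time applies to US/Pacific for dates before standard time zones were adopted in 1883, whereas the legacy java.sql path uses the fixed -08:00 offset. A standalone sketch of that offset difference (not Hive code; the offset values come from the JDK's bundled tz database):

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class PacificLmtDemo {
    public static void main(String[] args) {
        // In 1880 US/Pacific was on local mean time: -07:52:58, not -08:00.
        ZoneOffset offset = ZoneId.of("US/Pacific").getRules()
                .getOffset(LocalDateTime.of(1880, 1, 1, 0, 0, 0));
        System.out.println(offset); // -07:52:58

        // The gap to the fixed -08:00 offset is 422 seconds = 7m02s,
        // exactly the shift seen in the vectorized output (23:52:58 vs 00:00:00).
        int gap = offset.getTotalSeconds() - ZoneOffset.ofHours(-8).getTotalSeconds();
        System.out.println(gap); // 422
    }
}
```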
[jira] [Created] (HIVE-26238) Decouple sort filter predicates optimization from digest normalization in CBO
Stamatis Zampetakis created HIVE-26238: -- Summary: Decouple sort filter predicates optimization from digest normalization in CBO Key: HIVE-26238 URL: https://issues.apache.org/jira/browse/HIVE-26238 Project: Hive Issue Type: Improvement Components: CBO Affects Versions: 4.0.0-alpha-1 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis HIVE-21857 introduced an optimization for ordering predicates inside a filter based on a cost function. After HIVE-23456, this optimization can run only if the digest normalization (introduced in CALCITE-2450) in CBO is disabled (via {{calcite.enable.rexnode.digest.normalize}}). The goal of this issue is to decouple the sort predicate optimization from digest normalization. After the changes here, the optimization shouldn't be affected by the value of the {{calcite.enable.rexnode.digest.normalize}} property. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26168) EXPLAIN DDL command output is not deterministic
Stamatis Zampetakis created HIVE-26168: -- Summary: EXPLAIN DDL command output is not deterministic Key: HIVE-26168 URL: https://issues.apache.org/jira/browse/HIVE-26168 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Stamatis Zampetakis The EXPLAIN DDL command (HIVE-24596) can be used to recreate the schema for a given query in order to debug planner issues. This is achieved by fetching information from the metastore and outputting a series of DDL commands. The output commands, though, may appear in a different order across runs since there is no mechanism enforcing an explicit order. Consider for instance the following scenario. {code:sql} CREATE TABLE customer ( `c_custkey` bigint, `c_name` string, `c_address` string ); INSERT INTO customer VALUES (1, 'Bob', '12 avenue Mansart'), (2, 'Alice', '24 avenue Mansart'); EXPLAIN DDL SELECT c_custkey FROM customer WHERE c_name = 'Bob'; {code} +Result 1+ {noformat} ALTER TABLE default.customer UPDATE STATISTICS SET('numRows'='2','rawDataSize'='48' ); ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_address SET('avgColLen'='17.0','maxColLen'='17','numNulls'='0','numDVs'='2' ); -- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_address BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwbec/QPAjtBF ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_custkey SET('lowValue'='1','highValue'='2','numNulls'='0','numDVs'='2' ); -- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_custkey BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwfO+SIOOofED ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_name SET('avgColLen'='4.0','maxColLen'='5','numNulls'='0','numDVs'='2' ); -- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_name BUT THEY ARE NOT SUPPORTED YET. 
THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAIChJLg1AGD1aCNBg== {noformat} +Result 2+ {noformat} ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_custkey SET('lowValue'='1','highValue'='2','numNulls'='0','numDVs'='2' ); -- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_custkey BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwfO+SIOOofED ALTER TABLE default.customer UPDATE STATISTICS SET('numRows'='2','rawDataSize'='48' ); ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_address SET('avgColLen'='17.0','maxColLen'='17','numNulls'='0','numDVs'='2' ); -- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_address BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwbec/QPAjtBF ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_name SET('avgColLen'='4.0','maxColLen'='5','numNulls'='0','numDVs'='2' ); -- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_name BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAIChJLg1AGD1aCNBg== {noformat} The two results are equivalent but the statements appear in a different order. This is not a big issue because the results remain correct, but it may lead to test flakiness so it might be worth addressing. -- This message was sent by Atlassian Jira (v8.20.7#820007)
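One straightforward way to get stable output, sketched below with plain strings (the statement list and the sorting step are illustrative, not the actual EXPLAIN DDL implementation), is to sort the generated statements before emitting them:

```java
import java.util.Arrays;
import java.util.List;

public class DeterministicDdlDemo {
    public static void main(String[] args) {
        // Hypothetical statements collected in whatever order the metastore returned them.
        List<String> statements = Arrays.asList(
            "ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_custkey SET(...);",
            "ALTER TABLE default.customer UPDATE STATISTICS SET(...);",
            "ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_address SET(...);");

        // Sorting lexicographically makes the emitted DDL independent of
        // metastore iteration order, which removes the test flakiness.
        statements.stream().sorted().forEach(System.out::println);
    }
}
```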
[jira] [Created] (HIVE-26166) Make website GDPR compliant
Stamatis Zampetakis created HIVE-26166: -- Summary: Make website GDPR compliant Key: HIVE-26166 URL: https://issues.apache.org/jira/browse/HIVE-26166 Project: Hive Issue Type: Task Components: Website Reporter: Stamatis Zampetakis Per the email that was sent out from privacy, we need to make the Hive website GDPR compliant. # The link to the privacy policy needs to be updated from [https://hive.apache.org/privacy_policy.html] to [https://privacy.apache.org/policies/privacy-policy-public.html] # The Google Analytics service must be removed -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26126) Allow capturing/validating SQL generated from HMS calls in qtests
Stamatis Zampetakis created HIVE-26126: -- Summary: Allow capturing/validating SQL generated from HMS calls in qtests Key: HIVE-26126 URL: https://issues.apache.org/jira/browse/HIVE-26126 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis During the compilation/execution of a Hive command there are usually calls to the HiveMetastore (HMS). Most of the time these calls need to connect to the underlying database backend in order to return the requested information, so they trigger the generation and execution of SQL queries. We have a lot of code in Hive which affects the generation and execution of these SQL queries; two vivid examples are the {{MetaStoreDirectSql}} and {{CachedStore}} classes. [MetaStoreDirectSql|https://github.com/apache/hive/blob/e8f3a6cdc22c6a4681af2ea5763c80a5b76e310b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java] is responsible for explicitly building SQL queries for performance reasons. [CachedStore|https://github.com/apache/hive/blob/e8f3a6cdc22c6a4681af2ea5763c80a5b76e310b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java] is responsible for caching certain requests to avoid going to the database on every call. Ensuring that the generated SQL is the expected one and/or that certain queries are hitting (or not hitting) the DB is valuable for catching regressions and evaluating the effectiveness of caches. The idea is that for each Hive command/query in some qtest there is an option to include in the output (.q.out) the list of SQL queries that were generated by HMS calls. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-26095) Add queryid in QueryLifeTimeHookContext
Stamatis Zampetakis created HIVE-26095: -- Summary: Add queryid in QueryLifeTimeHookContext Key: HIVE-26095 URL: https://issues.apache.org/jira/browse/HIVE-26095 Project: Hive Issue Type: New Feature Components: Hooks Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Fix For: 4.0.0-alpha-2 A [QueryLifeTimeHook|https://github.com/apache/hive/blob/6c0b86ef0cfc67c5acb3468408e1d46fa6ef8024/ql/src/java/org/apache/hadoop/hive/ql/hooks/QueryLifeTimeHook.java] is executed at various points in the life-cycle of a query, but it is not always possible to obtain the id of the query. The query id is inside the {{HookContext}} but the latter is not always available, notably during compilation. The query id is useful for many purposes as it is the only way to uniquely identify the query/command that is currently running. It is also the only way to match together events appearing in the before and after methods. The goal of this jira is to add the query id in [QueryLifeTimeHookContext|https://github.com/apache/hive/blob/6c0b86ef0cfc67c5acb3468408e1d46fa6ef8024/ql/src/java/org/apache/hadoop/hive/ql/hooks/QueryLifeTimeHookContext.java] and make it available during all life-cycle events. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-26022) Error: ORA-00904 when initializing metastore schema in Oracle
Stamatis Zampetakis created HIVE-26022: -- Summary: Error: ORA-00904 when initializing metastore schema in Oracle Key: HIVE-26022 URL: https://issues.apache.org/jira/browse/HIVE-26022 Project: Hive Issue Type: Bug Components: Standalone Metastore Reporter: Stamatis Zampetakis Fix For: 4.0.0-alpha-1 The Metastore schema tool fails to create the database schema when the underlying backend is Oracle. The initialization script fails while creating the "REPLICATION_METRICS" table: {noformat} 338/362 --Create table replication metrics 339/362 CREATE TABLE "REPLICATION_METRICS" ( "RM_SCHEDULED_EXECUTION_ID" number PRIMARY KEY, "RM_POLICY" varchar2(256) NOT NULL, "RM_DUMP_EXECUTION_ID" number NOT NULL, "RM_METADATA" varchar2(4000), "RM_PROGRESS" varchar2(4000), "RM_START_TIME" integer NOT NULL, "MESSAGE_FORMAT" VARCHAR(16) DEFAULT 'json-0.2', ); Error: ORA-00904: : invalid identifier (state=42000,code=904) {noformat} The trailing comma after the "MESSAGE_FORMAT" column definition is most likely what triggers the ORA-00904 error. The problem can be reproduced by running the {{ITestOracle}}. {noformat} mvn -pl standalone-metastore/metastore-server verify -DskipITests=false -Dit.test=ITestOracle -Dtest=nosuch {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-26021) Change integration tests under DBInstallBase to regular unit tests
Stamatis Zampetakis created HIVE-26021: -- Summary: Change integration tests under DBInstallBase to regular unit tests Key: HIVE-26021 URL: https://issues.apache.org/jira/browse/HIVE-26021 Project: Hive Issue Type: Improvement Components: Tests Reporter: Stamatis Zampetakis After HIVE-18588, some tests including those under the [DBInstallBase|https://github.com/apache/hive/blob/1139c4b14db82a9e2316196819b35cfb713f34b5/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/DbInstallBase.java] class have been marked as integration tests, mainly to keep the test duration low. Nowadays, Hive developers rarely run all tests locally, so separating integration tests from unit tests does not provide a clear benefit. The separation adds maintenance cost and makes their execution more difficult, scaring people away. The goal of this issue is to change the tests under {{DBInstallBase}} from "integration" tests back to regular unit tests and run them as part of the standard maven test phase without any fancy arguments. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-26020) Set dependency scope for json-path, commons-compiler and janino to runtime
Stamatis Zampetakis created HIVE-26020: -- Summary: Set dependency scope for json-path, commons-compiler and janino to runtime Key: HIVE-26020 URL: https://issues.apache.org/jira/browse/HIVE-26020 Project: Hive Issue Type: Improvement Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis These dependencies are necessary only when running Hive. They are not required during compilation since Hive does not depend on them directly but only transitively through Calcite. Changing the scope to runtime makes the intention clear and guards against accidental usage in Hive. -- This message was sent by Atlassian Jira (v8.20.1#820001)
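The change amounts to declaring the scope explicitly on the relevant dependency entries, along these lines (a sketch of a single entry; the version is assumed to be managed elsewhere in the pom):

```xml
<dependency>
  <groupId>com.jayway.jsonpath</groupId>
  <artifactId>json-path</artifactId>
  <!-- Needed on the runtime classpath only (pulled in transitively through
       Calcite); runtime scope prevents accidental compile-time usage. -->
  <scope>runtime</scope>
</dependency>
```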
[jira] [Created] (HIVE-26019) Upgrade com.jayway.jsonpath from 2.4.0 to 2.7.0
Stamatis Zampetakis created HIVE-26019: -- Summary: Upgrade com.jayway.jsonpath from 2.4.0 to 2.7.0 Key: HIVE-26019 URL: https://issues.apache.org/jira/browse/HIVE-26019 Project: Hive Issue Type: Task Components: CBO Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-26014) Remove redundant HushableRandomAccessFileAppender
Stamatis Zampetakis created HIVE-26014: -- Summary: Remove redundant HushableRandomAccessFileAppender Key: HIVE-26014 URL: https://issues.apache.org/jira/browse/HIVE-26014 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis [HushableRandomAccessFileAppender|https://github.com/apache/hive/blob/d3cd596aa15ebedd58f99628d43a03eb2f5f3909/ql/src/java/org/apache/hadoop/hive/ql/log/HushableRandomAccessFileAppender.java] was introduced by HIVE-17826 to avoid exceptions originating from attempts to write to a closed appender. After the changes in HIVE-24590, the life-cycle (opening/closing/deleting) of appenders is managed by the Log4j framework and not explicitly by Hive as it used to be before. With HIVE-24590 in place, it is no longer possible to hit the exception from HIVE-17826, because appenders are opened and closed when necessary. Due to the above, the {{HushableRandomAccessFileAppender}} is completely redundant and can be removed in favor of the {{RandomAccessFileAppender}} already provided by the Log4j framework. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-26005) Run selected qtest on different metastore backends
Stamatis Zampetakis created HIVE-26005: -- Summary: Run selected qtest on different metastore backends Key: HIVE-26005 URL: https://issues.apache.org/jira/browse/HIVE-26005 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Stamatis Zampetakis In various cases there are bugs which affect only certain types of metastore databases (e.g., HIVE-26000) and it would be nice to be able to specify, for each test or a group of tests, which metastore backend to use and have these tests consistently running in CI. After HIVE-21954, it is possible to run qtests on different metastores by setting the system property {{test.metastore.db}} or introducing a new [AbstractCliConfig|https://github.com/apache/hive/blob/fcd0a47c2e27defb04247ffca6da11734e3e25c3/itests/util/src/main/java/org/apache/hadoop/hive/cli/control/AbstractCliConfig.java] configuration with a new driver etc. The naive way of implementing this task would be to copy an existing configuration, change the metastore type, select the input files, and create a new driver (probably again a copy of {{CoreCliDriver}}). Other ideas would be to allow a driver to run with multiple configurations, or handle the selection of the metastore type via QT options (similar to what was done in HIVE-25594). -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25995) Build from source distribution archive fails
Stamatis Zampetakis created HIVE-25995: -- Summary: Build from source distribution archive fails Key: HIVE-25995 URL: https://issues.apache.org/jira/browse/HIVE-25995 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Stamatis Zampetakis The source distribution archive, apache-hive-4.0.0-SNAPSHOT-src.tar.gz, can be produced by running: {code:bash} mvn clean package -DskipTests -Pdist {code} The file is generated under: {noformat} packaging/target/apache-hive-4.0.0-SNAPSHOT-src.tar.gz {noformat} The source distribution archive/package [should|https://www.apache.org/legal/release-policy.html#source-packages] allow anyone who downloads it to build and test Hive. At the moment, on commit [b63dab11d229abac59a4ef5e141d8d9b28037c8b|https://github.com/apache/hive/commit/b63dab11d229abac59a4ef5e141d8d9b28037c8b], if someone produces the source package and extracts the contents of the archive, it is not possible to build Hive. Both {{mvn install}} and {{mvn package}} commands fail when they are executed inside the directory extracted from the archive. {noformat} mvn clean install -DskipTests mvn clean package -DskipTests {noformat} The error is shown below: {noformat} [INFO] Scanning for projects... 
[ERROR] [ERROR] Some problems were encountered while processing the POMs: [ERROR] Child module /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/parser of /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not exist @ [ERROR] Child module /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/udf of /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not exist @ [ERROR] Child module /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/standalone-metastore/pom.xml of /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not exist @ @ [ERROR] The build could not read 1 project -> [Help 1] [ERROR] [ERROR] The project org.apache.hive:hive:4.0.0-SNAPSHOT (/home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml) has 3 errors [ERROR] Child module /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/parser of /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not exist [ERROR] Child module /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/udf of /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not exist [ERROR] Child module /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/standalone-metastore/pom.xml of /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not exist [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25970) Missing messages in HS2 operation logs
Stamatis Zampetakis created HIVE-25970: -- Summary: Missing messages in HS2 operation logs Key: HIVE-25970 URL: https://issues.apache.org/jira/browse/HIVE-25970 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis After HIVE-22753 & HIVE-24590, with some unlucky timing of events, operation log messages can get lost and never appear in the appropriate files. The changes in HIVE-22753 prevent a {{HushableRandomAccessFileAppender}} from being created if the latter refers to a file that has been closed in the last second. Preventing the creation of the appender also means that the message which triggered the creation will be lost forever. In fact, any message (for the same query) that arrives within that one-second interval will be lost forever. Before HIVE-24590 the appender/file was closed only once (explicitly by HS2) and thus the problem may be very hard to notice in practice. However, with the arrival of HIVE-24590 appenders may close much more frequently (and not via HS2), making the issue rather easy to reproduce. It suffices to set the _hive.server2.operation.log.purgePolicy.timeToLive_ property very low and check the operation logs. The problem was discovered by investigating some intermittent failures in operation logging tests (e.g., TestOperationLoggingAPIWithTez). -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25965) SQLDataException when obtaining partitions from HMS via direct SQL over Derby
Stamatis Zampetakis created HIVE-25965: -- Summary: SQLDataException when obtaining partitions from HMS via direct SQL over Derby Key: HIVE-25965 URL: https://issues.apache.org/jira/browse/HIVE-25965 Project: Hive Issue Type: Bug Components: Metastore Reporter: Stamatis Zampetakis In certain cases fetching the partition information from the metastore using direct SQL fails with the stack trace below. {noformat} javax.jdo.JDODataStoreException: Error executing SQL query "select "PARTITIONS"."PART_ID" from "PARTITIONS" inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ? inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = ? inner join "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 where "DBS"."CTLG_NAME" = ? and (((case when "FILTER0"."PART_KEY_VAL" <> ? and "TBLS"."TBL_NAME" = ? and "DBS"."NAME" = ? and "DBS"."CTLG_NAME" = ? and "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 then cast("FILTER0"."PART_KEY_VAL" as decimal(21,0)) else null end) = ?))". at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:542) ~[datanucleus-api-jdo-5.2.4.jar:?] at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:456) ~[datanucleus-api-jdo-5.2.4.jar:?] at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:318) ~[datanucleus-api-jdo-5.2.4.jar:?] 
at org.apache.hadoop.hive.metastore.QueryWrapper.executeWithArray(QueryWrapper.java:137) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.executeWithArray(MetastoreDirectSqlUtils.java:69) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.executeWithArray(MetaStoreDirectSql.java:2156) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionIdsViaSqlFilter(MetaStoreDirectSql.java:894) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:663) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.ObjectStore$11.getSqlResult(ObjectStore.java:3962) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.ObjectStore$11.getSqlResult(ObjectStore.java:3953) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:4269) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3989) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_261] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_261] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_261] at 
java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261] at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at com.sun.proxy.$Proxy60.getPartitionsByExpr(Unknown Source) ~[?:?] at org.apache.hadoop.hive.metastore.HMSHandler.get_partitions_spec_by_expr(HMSHandler.java:7346) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_261] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_261] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_261] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261] at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at com.sun.proxy.$
[jira] [Created] (HIVE-25947) Compactor job queue cannot be set per table via compactor.mapred.job.queue.name
Stamatis Zampetakis created HIVE-25947: -- Summary: Compactor job queue cannot be set per table via compactor.mapred.job.queue.name Key: HIVE-25947 URL: https://issues.apache.org/jira/browse/HIVE-25947 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Before HIVE-20723 it was possible to schedule the compaction for each table on specific job queues by putting {{compactor.mapred.job.queue.name}} in the table properties. {code:sql} CREATE TABLE person (name STRING, age INT) STORED AS ORC TBLPROPERTIES( 'transactional'='true', 'compactor.mapred.job.queue.name'='root.user2'); ALTER TABLE person COMPACT 'major' WITH OVERWRITE TBLPROPERTIES('compactor.mapred.job.queue.name'='root.user2') {code} This is no longer possible (after HIVE-20723) and in order to achieve the same effect someone needs to use {{compactor.hive.compactor.job.queue}} instead. {code:sql} CREATE TABLE person (name STRING, age INT) STORED AS ORC TBLPROPERTIES( 'transactional'='true', 'compactor.hive.compactor.job.queue'='root.user2'); ALTER TABLE person COMPACT 'major' WITH OVERWRITE TBLPROPERTIES('compactor.hive.compactor.job.queue'='root.user2') {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25945) Upgrade H2 database version to 2.1.210
Stamatis Zampetakis created HIVE-25945: -- Summary: Upgrade H2 database version to 2.1.210 Key: HIVE-25945 URL: https://issues.apache.org/jira/browse/HIVE-25945 Project: Hive Issue Type: Task Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The 1.3.166 version, which is in use in Hive, suffers from the following security vulnerabilities: https://nvd.nist.gov/vuln/detail/CVE-2021-42392 https://nvd.nist.gov/vuln/detail/CVE-2022-23221 In the project, we use H2 only for testing purposes (inside the jdbc-handler module), so the H2 binaries are not present in the runtime classpath and these CVEs do not pose a problem for Hive or its users. Nevertheless, it would be good to upgrade to a more recent version to avoid Hive coming up in vulnerability scans due to this. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25939) Support filter pushdown in HBaseStorageHandler for simple expressions with boolean columns
Stamatis Zampetakis created HIVE-25939: -- Summary: Support filter pushdown in HBaseStorageHandler for simple expressions with boolean columns Key: HIVE-25939 URL: https://issues.apache.org/jira/browse/HIVE-25939 Project: Hive Issue Type: Improvement Reporter: Stamatis Zampetakis In current master (commit [4b7a948e45fd88372fef573be321cda40d189cc7|https://github.com/apache/hive/commit/4b7a948e45fd88372fef573be321cda40d189cc7]), the HBaseStorageHandler is able to push many simple comparison predicates into the underlying engine but fails to do so for some simple predicates with boolean columns. The goal of this issue is to support filter pushdown in HBaseStorageHandler for the following queries. {code:sql} CREATE TABLE hbase_table(row_key string, c1 boolean, c2 boolean) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( "hbase.columns.mapping" = ":key,cf:c1,cf:c2" ); explain select * from hbase_table where c1; explain select * from hbase_table where not c1; explain select * from hbase_table where c1 = true; explain select * from hbase_table where c1 = false; explain select * from hbase_table where c1 IS TRUE; explain select * from hbase_table where c1 IS FALSE; {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25936) ValidWriteIdList & table id are sometimes missing when requesting partitions by name via HS2
Stamatis Zampetakis created HIVE-25936: -- Summary: ValidWriteIdList & table id are sometimes missing when requesting partitions by name via HS2 Key: HIVE-25936 URL: https://issues.apache.org/jira/browse/HIVE-25936 Project: Hive Issue Type: Sub-task Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis According to HIVE-24743 the table id and {{ValidWriteIdList}} are important for keeping the HMS remote metadata cache consistent. Although HIVE-24743 attempted to pass the write id list and table id in every call to HMS, it did not do so completely. For those partitions not handled by the batch logic, the [metastore call|https://github.com/apache/hive/blob/4b7a948e45fd88372fef573be321cda40d189cc7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4161] in the {{Hive#getPartitionsByName}} method does not pass the table id and write id list. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25935) Cleanup IMetaStoreClient#getPartitionsByNames APIs
Stamatis Zampetakis created HIVE-25935: -- Summary: Cleanup IMetaStoreClient#getPartitionsByNames APIs Key: HIVE-25935 URL: https://issues.apache.org/jira/browse/HIVE-25935 Project: Hive Issue Type: Task Components: Metastore Reporter: Stamatis Zampetakis Currently the [IMetastoreClient|https://github.com/apache/hive/blob/4b7a948e45fd88372fef573be321cda40d189cc7/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java] interface has 8 variants of the {{getPartitionsByNames}} method. A quick pass over the concrete implementations shows that not all of them are useful/necessary, so a bit of cleanup is needed. Below are a few potential problems I observed: * Some of the APIs are not used anywhere in the project (neither by production nor by test code). * Some of the APIs are deprecated in some concrete implementations but not globally at the interface level, without an explanation why. * Some of the implementations simply throw without doing anything. * Many of the APIs are partially tested or not tested at all. HIVE-24743 and HIVE-25281 are related since they introduce/deprecate some of the aforementioned APIs. It would be good to review the aforementioned APIs and decide what needs to stay and what needs to go, as well as complete what is missing where relevant. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25856) Intermittent null ordering in plans of queries with GROUP BY and LIMIT
Stamatis Zampetakis created HIVE-25856: -- Summary: Intermittent null ordering in plans of queries with GROUP BY and LIMIT Key: HIVE-25856 URL: https://issues.apache.org/jira/browse/HIVE-25856 Project: Hive Issue Type: Bug Components: CBO Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis {code:sql} CREATE TABLE person (id INTEGER, country STRING); EXPLAIN CBO SELECT country, count(1) FROM person GROUP BY country LIMIT 5; {code} The {{EXPLAIN}} query produces a slightly different plan (ordering of nulls) from one execution to another. {noformat} CBO PLAN: HiveSortLimit(sort0=[$1], dir0=[ASC-nulls-first], fetch=[5]) HiveProject(country=[$0], $f1=[$1]) HiveAggregate(group=[{1}], agg#0=[count()]) HiveTableScan(table=[[default, person]], table:alias=[person]) {noformat} {noformat} CBO PLAN: HiveSortLimit(sort0=[$1], dir0=[ASC], fetch=[5]) HiveProject(country=[$0], $f1=[$1]) HiveAggregate(group=[{1}], agg#0=[count()]) HiveTableScan(table=[[default, person]], table:alias=[person]) {noformat} This is unlikely to cause wrong results because most aggregate functions (though not all) do not return nulls, so null ordering rarely matters, but it can lead to other problems such as: * intermittent CI failures * query/plan caching issues I bumped into this problem while investigating test failures in CI. The following query in [offset_limit_ppd_optimizer.q|https://github.com/apache/hive/blob/9cfdac44975bf38193de7449fc21b9536109daea/ql/src/test/queries/clientpositive/offset_limit_ppd_optimizer.q] returns a different plan when it runs individually than when it runs along with some other qtest files. {code:sql} explain select * from (select key, count(1) from src group by key order by key limit 10,20) subq join (select key, count(1) from src group by key limit 20,20) subq2 on subq.key=subq2.key limit 3,5; {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
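The practical effect of the two sort directions above can be sketched with plain Java null-aware comparators; this is an illustration of why null ordering changes row order, not Hive code, and the class name is made up for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class NullOrdering {
    /** Ascending with nulls first, mirroring dir0=[ASC-nulls-first]. */
    public static List<Long> ascNullsFirst(List<Long> values) {
        return values.stream()
                .sorted(Comparator.nullsFirst(Comparator.<Long>naturalOrder()))
                .collect(Collectors.toList());
    }

    /** Ascending with nulls last, one possible reading of plain dir0=[ASC]. */
    public static List<Long> ascNullsLast(List<Long> values) {
        return values.stream()
                .sorted(Comparator.nullsLast(Comparator.<Long>naturalOrder()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> counts = Arrays.asList(3L, null, 1L);
        System.out.println(ascNullsFirst(counts)); // [null, 1, 3]
        System.out.println(ascNullsLast(counts));  // [1, 3, null]
    }
}
```

With a LIMIT on top, the two orderings can surface different rows, which is why a nondeterministically chosen null direction shows up as plan (and potentially result) diffs.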
[jira] [Created] (HIVE-25832) Exclude Category-X JDBC drivers from binary distribution
Stamatis Zampetakis created HIVE-25832: -- Summary: Exclude Category-X JDBC drivers from binary distribution Key: HIVE-25832 URL: https://issues.apache.org/jira/browse/HIVE-25832 Project: Hive Issue Type: Task Components: distribution Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The binary distribution contains all the required elements to be able to run Hive in a cluster. It can be obtained by building from source using the following command: {code:java} mvn clean package -DskipTests -Pdist{code} The binary distribution is also published during a release along with the source code. In current master, commit 8572c1201e1d483eb03c7e413f4ff7f9b6f4a3d2, the binary distribution includes the following JDBC drivers: * derby-10.14.1.0.jar * postgresql-42.2.14.jar * ojdbc8-21.3.0.0.jar * mssql-jdbc-6.2.1.jre8.jar * mysql-connector-java-8.0.27.jar JDBC drivers are needed: * by schemaTool to initialize the database backend for the Metastore * by the metastore to communicate with the underlying database so if we want Hive to work out of the box we have to provide at least one. The Oracle (ojdbc8) and MySQL (mysql-connector-java) drivers must be removed because their licenses are not compatible with Apache License 2 (see [category x|https://www.apache.org/legal/resolved.html#category-x]). Previous Hive releases (e.g., 3.1.2) are not affected since they only contain: * derby-10.14.1.0.jar * postgresql-9.4.1208.jre7.jar The additional drivers that appear in the binary distribution are a side effect of HIVE-25701. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25816) Log CBO plan after rule application for debugging purposes
Stamatis Zampetakis created HIVE-25816: -- Summary: Log CBO plan after rule application for debugging purposes Key: HIVE-25816 URL: https://issues.apache.org/jira/browse/HIVE-25816 Project: Hive Issue Type: Task Components: CBO Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis In many cases, we want to identify which rule led to a certain transformation in the plan, or need to observe how the query plan evolves as rules are applied, in order to fix a bug or find the right place to introduce another optimization step. Currently there are some logs during the application of a rule triggered by the [HepPlanner|https://github.com/apache/calcite/blob/e04f3b08dcfb6910ff4df3810772c346b25ed424/core/src/main/java/org/apache/calcite/plan/AbstractRelOptPlanner.java#L367] and [VolcanoPlanner|https://github.com/apache/calcite/blob/e04f3b08dcfb6910ff4df3810772c346b25ed424/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoRuleCall.java#L126] but they more or less display only the top operator of the transformation and not the whole subtree. It would help if, instead of displaying only the top operator, we logged the equivalent of {{EXPLAIN CBO}} on the transformed sub-tree. The change is going to be introduced by default in Calcite soon (CALCITE-4704) but until we upgrade to that version it would help to have this functionality in Hive already. For more examples of the proposed change have a look at CALCITE-4704. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25718) ORDER BY query on external MSSQL table fails
Stamatis Zampetakis created HIVE-25718: -- Summary: ORDER BY query on external MSSQL table fails Key: HIVE-25718 URL: https://issues.apache.org/jira/browse/HIVE-25718 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Stamatis Zampetakis +Microsoft SQLServer+ {code:sql} CREATE TABLE country (id int, name varchar(20)); insert into country values (1, 'India'); insert into country values (2, 'Russia'); insert into country values (3, 'USA'); {code} +Hive+ {code:sql} CREATE EXTERNAL TABLE country (id int, name varchar(20)) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "MSSQL", "hive.sql.jdbc.driver" = "com.microsoft.sqlserver.jdbc.SQLServerDriver", "hive.sql.jdbc.url" = "jdbc:sqlserver://localhost:1433;", "hive.sql.dbcp.username" = "sa", "hive.sql.dbcp.password" = "Its-a-s3cret", "hive.sql.table" = "country"); SELECT * FROM country ORDER BY id; {code} The query fails with the following stacktrace: {noformat} com.microsoft.sqlserver.jdbc.SQLServerException: The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified. at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:258) ~[mssql-jdbc-6.2.1.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1535) ~[mssql-jdbc-6.2.1.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:467) ~[mssql-jdbc-6.2.1.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:409) ~[mssql-jdbc-6.2.1.jre8.jar:?] at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7151) ~[mssql-jdbc-6.2.1.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2478) ~[mssql-jdbc-6.2.1.jre8.jar:?] 
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:219) ~[mssql-jdbc-6.2.1.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:199) ~[mssql-jdbc-6.2.1.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(SQLServerPreparedStatement.java:331) ~[mssql-jdbc-6.2.1.jre8.jar:?] at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122) ~[commons-dbcp2-2.7.0.jar:2.7.0] at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122) ~[commons-dbcp2-2.7.0.jar:2.7.0] at org.apache.hive.storage.jdbc.dao.GenericJdbcDatabaseAccessor.getRecordIterator(GenericJdbcDatabaseAccessor.java:180) [hive-jdbc-handler-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.storage.jdbc.JdbcRecordReader.next(JdbcRecordReader.java:58) [hive-jdbc-handler-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.storage.jdbc.JdbcRecordReader.next(JdbcRecordReader.java:35) [hive-jdbc-handler-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:589) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:529) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.getFetchingTableResults(Driver.java:716) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:668) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:241) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:277) [hive-cli-4.0.0-SNAPSHOT.jar:?] 
at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) [hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) [hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) [hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) [hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:726) [hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:6
[jira] [Created] (HIVE-25717) INSERT INTO on external MariaDB/MySQL table fails silently
Stamatis Zampetakis created HIVE-25717: -- Summary: INSERT INTO on external MariaDB/MySQL table fails silently Key: HIVE-25717 URL: https://issues.apache.org/jira/browse/HIVE-25717 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 4.0.0 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis +MariaDB/MySQL+ {code:sql} CREATE TABLE country (id int, name varchar(20)); insert into country values (1, 'India'); insert into country values (2, 'Russia'); insert into country values (3, 'USA'); {code} +Hive+ {code:sql} CREATE EXTERNAL TABLE country (id int, name varchar(20)) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "MYSQL", "hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver", "hive.sql.jdbc.url" = "jdbc:mysql://localhost:3306/qtestDB", "hive.sql.dbcp.username" = "root", "hive.sql.dbcp.password" = "qtestpassword", "hive.sql.table" = "country" ); INSERT INTO country VALUES (8, 'Hungary'); SELECT * FROM country; {code} +Expected results+ ||ID||NAME|| |1| India| |2| Russia| |3| USA| |8| Hungary| +Actual results+ ||ID||NAME|| |1| India| |2| Russia| |3| USA| The {{INSERT INTO}} statement finishes without showing any kind of problem in the logs but the row is not inserted into the table. The test comes back green, although the following exception is printed to System.err (not to the logs). 
{noformat} java.sql.SQLException: Parameter metadata not available for the given statement at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129) at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97) at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:89) at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:63) at com.mysql.cj.jdbc.MysqlParameterMetadata.checkAvailable(MysqlParameterMetadata.java:86) at com.mysql.cj.jdbc.MysqlParameterMetadata.getParameterType(MysqlParameterMetadata.java:138) at org.apache.hive.storage.jdbc.DBRecordWritable.write(DBRecordWritable.java:67) at org.apache.hadoop.mapreduce.lib.db.DBOutputFormat$DBRecordWriter.write(DBOutputFormat.java:122) at org.apache.hive.storage.jdbc.JdbcRecordWriter.write(JdbcRecordWriter.java:47) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1160) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888) at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:133) at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:45) at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:110) at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFInline.process(GenericUDTFInline.java:64) at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888) at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:154) at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:311) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:277) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.ja
[jira] [Created] (HIVE-25705) Use dynamic host/port binding for dockerized databases in tests
Stamatis Zampetakis created HIVE-25705: -- Summary: Use dynamic host/port binding for dockerized databases in tests Key: HIVE-25705 URL: https://issues.apache.org/jira/browse/HIVE-25705 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Currently all dockerized databases (subclasses of [DatabaseRule|https://github.com/apache/hive/blob/6e02f6164385a370ee8014c795bee1fa423d7937/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/DatabaseRule.java], subclasses of [AbstractExternalDB.java|https://github.com/apache/hive/blob/6e02f6164385a370ee8014c795bee1fa423d7937/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java]) are mapped statically to a specific hostname (usually localhost) and port when the container is launched; the host/port values are hardcoded in the code. This may create problems when a certain port is already taken by another process, leading to errors like the one below: {noformat} Bind for 0.0.0.0:5432 failed: port is already allocated. {noformat} Similar problems can occur by assuming that every database will be accessible on localhost. This can lead to flakiness in CI and/or a poor developer experience when running tests backed by Docker. The goal of this case is to allow the containers/databases to bind dynamically to a random port at startup and expose the appropriate IP address & port to the tests relying on these databases. -- This message was sent by Atlassian Jira (v8.20.1#820001)
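One way to obtain a dynamic host port before launching a container is to ask the OS for a free ephemeral port; the class below is a minimal sketch (the class name and the `docker run` usage in the comment are illustrative, not existing Hive code).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

public class FreePortFinder {
    /** Ask the OS for a free ephemeral port by binding a socket to port 0
     *  and immediately releasing it. The container's fixed internal port
     *  (e.g. 5432 for Postgres) can then be mapped to the returned port,
     *  e.g. with `docker run -p <hostPort>:5432 postgres` (hypothetical usage). */
    public static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("docker run -p " + findFreePort() + ":5432 postgres");
    }
}
```

Alternatively, Docker itself can pick the random host port (`-p 5432` or `-P`) and the test can then discover the actual mapping with `docker port <container>`; that avoids the small race window between releasing the probe socket and the container binding the port.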
[jira] [Created] (HIVE-25701) Declare JDBC drivers as runtime & optional dependencies
Stamatis Zampetakis created HIVE-25701: -- Summary: Declare JDBC drivers as runtime & optional dependencies Key: HIVE-25701 URL: https://issues.apache.org/jira/browse/HIVE-25701 Project: Hive Issue Type: Task Components: Standalone Metastore, Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Currently, we are using the following JDBC drivers in various Hive modules: * MariaDB * MySQL * Oracle * Postgres * MSSQL * Derby MariaDB, MySQL, and Oracle licenses are not compatible with Apache License 2 ([Category-X |https://www.apache.org/legal/resolved.html#category-x]) and in the past we used various ways to circumvent licensing problems (see HIVE-23284). Now, some of them appear as test scope dependencies, which is OK-ish but may again lead to licensing problems in the near future. JDBC drivers are only needed at runtime so they could all be declared at runtime scope. Moreover, Hive does not require a specific JDBC driver in order to operate, so they are all optional. The goal of this issue is to declare every JDBC driver at runtime scope and mark it as optional ([ASF-optional|https://www.apache.org/legal/resolved.html#optional], [maven-optional|https://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html]). This has the following advantages: * Eliminates the risk of writing code that needs JDBC driver classes in order to compile, potentially violating AL2. * Unifies the declaration of JDBC drivers, making it easier to add/remove drivers if necessary. * Removes the need to use download-maven-plugin and other similar workarounds to avoid licensing problems. * Simplifies the execution of tests using these drivers since they are now added to the runtime classpath automatically by Maven. * Projects depending on Hive will not inherit any JDBC driver by default. -- This message was sent by Atlassian Jira (v8.20.1#820001)
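A driver declaration along the lines proposed here might look like the following pom.xml fragment (the Postgres driver version is the one appearing in the current distribution; the fragment is a sketch, not the actual Hive pom):

```xml
<dependency>
  <groupId>org.postgresql</groupId>
  <artifactId>postgresql</artifactId>
  <version>42.2.14</version>
  <!-- needed only when actually connecting to Postgres, never at compile time -->
  <scope>runtime</scope>
  <!-- not inherited transitively by projects depending on Hive -->
  <optional>true</optional>
</dependency>
```

With `runtime` scope the driver classes are absent from the compile classpath (so code cannot accidentally depend on them), while `optional` stops Maven from propagating the dependency to downstream consumers.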
[jira] [Created] (HIVE-25684) Many (~16K) skipped tests in TestGenericUDFInitializeOnCompareUDF
Stamatis Zampetakis created HIVE-25684: -- Summary: Many (~16K) skipped tests in TestGenericUDFInitializeOnCompareUDF Key: HIVE-25684 URL: https://issues.apache.org/jira/browse/HIVE-25684 Project: Hive Issue Type: Task Components: Testing Infrastructure Affects Versions: 4.0.0 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Attachments: skipped_tests.png TestGenericUDFInitializeOnCompareUDF is a parameterized test leading to 24K possible test combinations. From those only 7K are actually run and the rest (~16K) are skipped. {noformat} mvn test -Dtest=TestGenericUDFInitializeOnCompareUDF ... [WARNING] Tests run: 24300, Failures: 0, Errors: 0, Skipped: 16452, Time elapsed: 7.098 s - in org.apache.hadoop.hive.ql.udf.generic.TestGenericUDFInitializeOnCompareUDF [INFO] [INFO] Results: [INFO] [INFO] Tests run: 7848, Failures: 0, Errors: 0, Skipped: 0 {noformat} This generates a lot of noise in the Jenkins CI, where many tests appear as skipped, and it may make people believe there is a problem (a side effect of their changes). Moreover, we know in advance which tests are skipped and why, so instead of generating invalid parameter combinations we could simply remove those combinations altogether. -- This message was sent by Atlassian Jira (v8.20.1#820001)
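The idea of removing invalid combinations up front, rather than skipping them at runtime, can be sketched as a cross-product with a validity filter; the class name and the validity rule below are purely hypothetical, chosen only to make the filtering pattern concrete.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ValidCombinations {
    /** Cross-product of type names, keeping only pairs that pass the
     *  validity check. A parameterized test fed with this list never
     *  produces a skipped combination. */
    public static List<String[]> generate(List<String> types) {
        List<String[]> combos = new ArrayList<>();
        for (String left : types) {
            for (String right : types) {
                if (isValid(left, right)) {
                    combos.add(new String[] {left, right});
                }
            }
        }
        return combos;
    }

    // Toy rule for illustration: mixing BOOLEAN with any other type is invalid.
    private static boolean isValid(String left, String right) {
        return left.equals(right) || (!left.equals("BOOLEAN") && !right.equals("BOOLEAN"));
    }

    public static void main(String[] args) {
        // 3 x 3 = 9 raw pairs, 4 invalid BOOLEAN mixes filtered out -> 5 remain
        System.out.println(generate(Arrays.asList("INT", "STRING", "BOOLEAN")).size()); // 5
    }
}
```

Feeding a JUnit `@Parameters` method with such a pre-filtered list reports only real executions, so CI shows zero skipped tests.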
[jira] [Created] (HIVE-25681) Drop support for multi-threaded qtest execution via QTestRunnerUtils
Stamatis Zampetakis created HIVE-25681: -- Summary: Drop support for multi-threaded qtest execution via QTestRunnerUtils Key: HIVE-25681 URL: https://issues.apache.org/jira/browse/HIVE-25681 Project: Hive Issue Type: Task Components: Testing Infrastructure Affects Versions: 4.0.0 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis There is an option for running qtests concurrently via [QTestRunnerUtils#queryListRunnerMultiThreaded|https://github.com/apache/hive/blob/a72db99676ca6a79b414906ab78963a3e955ae69/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestRunnerUtils.java#L128] but it has not been in use for more than a year now. Moreover, with the move to Kubernetes-based containerized test execution (HIVE-22942) it is unlikely that we will run concurrent tests using these APIs anytime soon. The only consumer of this API at the moment is [TestMTQueries|https://github.com/apache/hive/blob/master/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestMTQueries.java] which is disabled and basically corresponds to the unit tests for these APIs. I propose to drop these APIs and the related test to facilitate code evolution and maintenance. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25676) Uncaught exception in QTestDatabaseHandler#afterTest causes unrelated test failures
Stamatis Zampetakis created HIVE-25676: -- Summary: Uncaught exception in QTestDatabaseHandler#afterTest causes unrelated test failures Key: HIVE-25676 URL: https://issues.apache.org/jira/browse/HIVE-25676 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis When for some reason we fail to clean up a database after running a test using the {{qt:database}} option, an exception is raised and propagates up the stack. Not catching it in [QTestDatabaseHandler#afterTest|https://github.com/apache/hive/blob/0616bcaa2436ccbf388b635bfea160b47849553c/itests/util/src/main/java/org/apache/hadoop/hive/ql/qoption/QTestDatabaseHandler.java#L124] disrupts subsequent cleanup actions, which are never executed, and leads to unrelated failures in subsequent tests. Moreover, the exception leaves {{QTestDatabaseHandler}} in an invalid state since the internal map holding the running databases is not updated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
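The general remedy for this class of bug is to catch failures per cleanup action so that one failure cannot abort the remaining ones. A minimal sketch of the pattern (class and method names are hypothetical, not the actual fix):

```java
import java.util.ArrayList;
import java.util.List;

public class SafeCleanup {
    /** Run every cleanup action even if some of them fail; failures are
     *  collected (to be logged by the caller) instead of propagating and
     *  aborting the cleanups that follow. */
    public static List<RuntimeException> runAll(List<Runnable> cleanups) {
        List<RuntimeException> failures = new ArrayList<>();
        for (Runnable cleanup : cleanups) {
            try {
                cleanup.run();
            } catch (RuntimeException e) {
                failures.add(e); // record and continue with the next action
            }
        }
        return failures;
    }
}
```

With this shape the handler can also update its internal bookkeeping (e.g. the map of running databases) unconditionally, avoiding the invalid-state problem described above.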
[jira] [Created] (HIVE-25675) Intermittent PSQLException when trying to connect to Postgres in tests
Stamatis Zampetakis created HIVE-25675: -- Summary: Intermittent PSQLException when trying to connect to Postgres in tests Key: HIVE-25675 URL: https://issues.apache.org/jira/browse/HIVE-25675 Project: Hive Issue Type: Bug Components: Testing Infrastructure Affects Versions: 4.0.0 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The following exception appears intermittently when running tests using dockerized Postgres. {noformat} Unexpected exception org.postgresql.util.PSQLException: FATAL: the database system is starting up at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:525) at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:146) at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:197) at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49) at org.postgresql.jdbc.PgConnection.(PgConnection.java:217) at org.postgresql.Driver.makeConnection(Driver.java:458) at org.postgresql.Driver.connect(Driver.java:260) at java.sql.DriverManager.getConnection(DriverManager.java:664) at java.sql.DriverManager.getConnection(DriverManager.java:247) at org.apache.hadoop.hive.ql.externalDB.AbstractExternalDB.execute(AbstractExternalDB.java:191) at org.apache.hadoop.hive.ql.qoption.QTestDatabaseHandler.beforeTest(QTestDatabaseHandler.java:116) at org.apache.hadoop.hive.ql.qoption.QTestOptionDispatcher.beforeTest(QTestOptionDispatcher.java:79) at org.apache.hadoop.hive.ql.QTestUtil.cliInit(QTestUtil.java:717) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:189) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104) at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) {noformat} As the exception indicates, when we try to connect to Postgres the database is not yet fully ready, despite the fact that the respective port is open, which leads to the exception above. -- This message was sent by Atlassian Jira (v8.3.4#803005)
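An open port not implying a ready database is usually handled with a retry loop around the first connection attempt. Below is a minimal, generic retry sketch (the class name is hypothetical; in the real test the supplied action would attempt a JDBC connection and return whether it succeeded):

```java
import java.util.function.Supplier;

public class ConnectRetry {
    /** Retry an action until it reports success or the attempts are
     *  exhausted, sleeping between attempts. Returns true on success. */
    public static boolean retry(Supplier<Boolean> action, int maxAttempts, long delayMillis) {
        for (int i = 0; i < maxAttempts; i++) {
            if (action.get()) {
                return true;
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return false;
    }
}
```

For Postgres specifically, the readiness probe could be a `DriverManager.getConnection(...)` call that swallows `PSQLException` and returns false until the server accepts logins.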
[jira] [Created] (HIVE-25668) Support database reuse when using qt:database option
Stamatis Zampetakis created HIVE-25668: -- Summary: Support database reuse when using qt:database option Key: HIVE-25668 URL: https://issues.apache.org/jira/browse/HIVE-25668 Project: Hive Issue Type: Task Components: Testing Infrastructure Reporter: Stamatis Zampetakis With HIVE-25594 it is possible to initialize and use various types of databases in tests. At the moment all the supported databases rely on docker containers which are initialized/destroyed on a per-test basis. This is good in terms of test isolation but it brings a certain performance overhead, slowing down tests. At the moment this is fine since the feature is not widely used, but it would be good to have a way to reuse a database across multiple qfiles. The developer could specify in the qfile if they want to reuse a container (when possible) by passing certain additional options. The declaration could look like below: {noformat} --!qt:database:type=mysql;script=q_test_country_table.sql;reuse=true{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
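A sketch of how such a semicolon-separated declaration could be parsed into options (the class and method names are hypothetical, not existing Hive code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class QOptionParser {
    /** Parse a declaration like "type=mysql;script=q_test_country_table.sql;reuse=true"
     *  into key/value pairs, preserving declaration order. */
    public static Map<String, String> parse(String declaration) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String pair : declaration.split(";")) {
            String[] kv = pair.split("=", 2); // split only on the first '='
            options.put(kv[0].trim(), kv.length > 1 ? kv[1].trim() : "");
        }
        return options;
    }
}
```

The handler could then key a cache of running containers on the subset of options that matter for reuse (e.g. `type` and `script`) and tear a container down only when `reuse` is absent or false.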
[jira] [Created] (HIVE-25667) Unify code managing JDBC databases in tests
Stamatis Zampetakis created HIVE-25667: -- Summary: Unify code managing JDBC databases in tests Key: HIVE-25667 URL: https://issues.apache.org/jira/browse/HIVE-25667 Project: Hive Issue Type: Task Components: Testing Infrastructure Affects Versions: 4.0.0 Reporter: Stamatis Zampetakis Currently there are two class hierarchies managing JDBC databases in tests, [DatabaseRule| https://github.com/apache/hive/blob/d35de014dd49fdcfe0aacb68e6c587beff6d1dea/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/DatabaseRule.java] and [AbstractExternalDB|https://github.com/apache/hive/blob/d35de014dd49fdcfe0aacb68e6c587beff6d1dea/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java]. There are many similarities between these hierarchies and certain parts are duplicated. The goal of this JIRA is to refactor the aforementioned hierarchies to reduce code duplication and improve extensibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25665) Checkstyle LGPL files must not be in the release sources/binaries
Stamatis Zampetakis created HIVE-25665: -- Summary: Checkstyle LGPL files must not be in the release sources/binaries Key: HIVE-25665 URL: https://issues.apache.org/jira/browse/HIVE-25665 Project: Hive Issue Type: Task Components: Build Infrastructure Affects Versions: 0.6.0 Reporter: Stamatis Zampetakis As discussed in the [dev list|https://lists.apache.org/thread/r13e3236aa72a070b3267ed95f7cb3b45d3c4783fd4ca35f5376b1a35@%3cdev.hive.apache.org%3e] LGPL files must not be present in the Apache released sources/binaries. The following files must not be present in the release: https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/checkstyle/checkstyle-noframes-sorted.xsl https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/storage-api/checkstyle/checkstyle-noframes-sorted.xsl https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/standalone-metastore/checkstyle/checkstyle-noframes-sorted.xsl There may be other checkstyle LGPL files in the repo. All these should either be removed entirely from the repository or selectively excluded from the release. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25655) Remove ElapsedTimeLoggingWrapper from tests
Stamatis Zampetakis created HIVE-25655: -- Summary: Remove ElapsedTimeLoggingWrapper from tests Key: HIVE-25655 URL: https://issues.apache.org/jira/browse/HIVE-25655 Project: Hive Issue Type: Task Components: Testing Infrastructure Affects Versions: 4.0.0 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The [ElapsedTimeLoggingWrapper|https://github.com/apache/hive/blob/f749ef2af27638914984c183bcfa213920f5cdd9/itests/util/src/main/java/org/apache/hadoop/hive/util/ElapsedTimeLoggingWrapper.java] introduced in HIVE-14625 is used by the [CoreCliDriver|#L68] to execute, measure, and display the time spent on some operations during the execution of {{@Before/@After}} methods. The benefit of logging the elapsed time for these methods is unclear. The time is usually rather short, especially compared to the actual time a query takes to run, so it is not information of much use. The enforced coding pattern for measuring and logging the time leads to boilerplate and makes the code harder to read and understand. {code:java} qt = new ElapsedTimeLoggingWrapper() { @Override public QTestUtil invokeInternal() throws Exception { return new QTestUtil( QTestArguments.QTestArgumentsBuilder.instance() .withOutDir(cliConfig.getResultsDir()) .withLogDir(cliConfig.getLogDir()) .withClusterType(miniMR) .withConfDir(hiveConfDir) .withInitScript(initScript) .withCleanupScript(cleanupScript) .withLlapIo(true) .withFsType(cliConfig.getFsType()) .build()); } }.invoke("QtestUtil instance created", LOG, true); {code} Moreover, the wrapper is not used consistently across drivers, making results less uniform. The goal of this issue is to remove {{ElapsedTimeLoggingWrapper}} and its usages to improve code readability and maintenance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25632) Remove unused code from ptest/ptest2
Stamatis Zampetakis created HIVE-25632: -- Summary: Remove unused code from ptest/ptest2 Key: HIVE-25632 URL: https://issues.apache.org/jira/browse/HIVE-25632 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The Ptest framework was deprecated when PTest2 was introduced, and the latter is no longer used since it was superseded by HIVE-22942. The code is more or less dead, and keeping it in the repo leads to maintenance overhead. People update files from time to time assuming that the code is still maintained, and occasionally this leads to broken builds since ptest2 is an actual Maven module. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25629) Drop support of multiple qfiles in QTestUtil, output and result processors
Stamatis Zampetakis created HIVE-25629: -- Summary: Drop support of multiple qfiles in QTestUtil, output and result processors Key: HIVE-25629 URL: https://issues.apache.org/jira/browse/HIVE-25629 Project: Hive Issue Type: Task Components: Testing Infrastructure Affects Versions: 4.0.0 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The current implementations of [QTestUtil|https://github.com/apache/hive/blob/afeb0f8413b1fd777611e890e53925119a5e39f1/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java], [QOutProcessor|https://github.com/apache/hive/blob/master/itests/util/src/main/java/org/apache/hadoop/hive/ql/QOutProcessor.java], and [QTestResultProcessor|https://github.com/apache/hive/blob/afeb0f8413b1fd777611e890e53925119a5e39f1/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestResultProcessor.java] have some methods and fields (maps) for managing multiple input files. However, *all* clients of this API, such as [CoreCliDriver|https://github.com/apache/hive/blob/afeb0f8413b1fd777611e890e53925119a5e39f1/itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CoreCliDriver.java], use these classes by processing one file per run. +Example+
{code:java}
public void runTest(String testName, String fname, String fpath) {
  ...
  qt.addFile(fpath);
  qt.cliInit(new File(fpath));
  ...
  try {
    qt.executeClient(fname);
  } catch (CommandProcessorException e) {
    qt.failedQuery(e.getCause(), e.getResponseCode(), fname, QTestUtil.DEBUG_HINT);
  }
  ...
}
{code}
Notice that {{qt.addFile}} keeps accumulating input files in memory (filename + content), while {{qt.executeClient}} (and other similar APIs) always operates on the last file added. Apart from wasting memory, the APIs for multiple files are harder to understand and extend. The goal of this JIRA is to simplify the aforementioned APIs by removing the unused/redundant parts associated with multiple files, improving code readability and reducing memory consumption. 
+Historical note+ Before HIVE-25625, the multiple input file functionality was used by {{TestCompareCliDriver}}, but it was still useless for all the other clients. With the removal of {{TestCompareCliDriver}} in HIVE-25625, keeping support for multiple files is completely redundant. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25625) Drop TestCompareCliDriver and related code from tests
Stamatis Zampetakis created HIVE-25625: -- Summary: Drop TestCompareCliDriver and related code from tests Key: HIVE-25625 URL: https://issues.apache.org/jira/browse/HIVE-25625 Project: Hive Issue Type: Task Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The driver was introduced back in 2015 (HIVE-6010), aiming to run queries with vectorization on/off and compare the results. However, it hasn't received much attention since then, and currently only two queries are run with this driver. The majority of tests aiming to ensure vectorization works correctly use the {{TestMiniLlapLocalCliDriver}} and run a query twice, switching the necessary properties on/off. Summing up, keeping [TestCompareCliDriver|https://github.com/apache/hive/blob/d521f149fade25f74e7ca28fa399103684a80580/itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestCompareCliDriver.java] in the repo leads to extra code maintenance cost without significant benefit. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25624) Drop DummyCliDriver and related code from tests
Stamatis Zampetakis created HIVE-25624: -- Summary: Drop DummyCliDriver and related code from tests Key: HIVE-25624 URL: https://issues.apache.org/jira/browse/HIVE-25624 Project: Hive Issue Type: Task Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The only thing this test code does is fail no matter the input file, potentially with a different message (see [CoreDummy.runTest|https://github.com/apache/hive/blob/d521f149fade25f74e7ca28fa399103684a80580/itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CoreDummy.java#L56]). It is very close to dead code, so keeping it in the repository only adds maintenance overhead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25618) Stack trace is difficult to find when qtest fails during setup/teardown
Stamatis Zampetakis created HIVE-25618: -- Summary: Stack trace is difficult to find when qtest fails during setup/teardown Key: HIVE-25618 URL: https://issues.apache.org/jira/browse/HIVE-25618 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis When a qtest fails while executing one of the setup/teardown methods of a CLI driver ([CliAdapter|https://github.com/apache/hive/blob/3e37ba473545a691f5f32c08fc4b62b49257cab4/itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliAdapter.java#L36] and its subclasses):
{code:java}
public abstract void beforeClass() throws Exception;
public abstract void setUp();
public abstract void tearDown();
public abstract void shutdown() throws Exception;
{code}
the original stack trace leading to the failure cannot be found easily. The Maven console shows a stack trace that does not correspond to the actual exception causing the problem and in most cases does not contain the original cause. The original stack trace is not displayed in the Maven console, and it is not in {{target/tmp/logs/hive.log}} either. At the moment it goes to {{target/surefire-reports/...-output.txt}}. The developer needs to search in 2-3 places and navigate back and forth in the code in order to find what went wrong. Ideally, the stack trace from the original exception should be printed directly in the Maven console. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25611) OOM when running MERGE query on wide transactional table with many buckets
Stamatis Zampetakis created HIVE-25611: -- Summary: OOM when running MERGE query on wide transactional table with many buckets Key: HIVE-25611 URL: https://issues.apache.org/jira/browse/HIVE-25611 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Attachments: merge_query_plan.txt, merge_wide_acid_bucketed_table.q, wide_table_100_char_cols.csv Running a {{MERGE}} statement over a wide transactional/ACID table with many buckets leads to {{OutOfMemoryError}} during the execution of the query. A step-by-step reproducer is attached to the case ( [^merge_wide_acid_bucketed_table.q] [^wide_table_100_char_cols.csv] ) but the main idea is outlined below.
{code:sql}
CREATE TABLE wide_table_orc (
  w_id_col int,
  w_char_col0 char(20),
  ...
  w_char_col99 char(20))
STORED AS ORC TBLPROPERTIES ('transactional'='true');
-- Load data into the table in a way that it gets bucketed
CREATE TABLE simple_table_txt (id int, name char(20)) STORED AS TEXTFILE;
-- Load data into simple_table_txt overlapping with the data in wide_table_orc
MERGE INTO wide_table_orc target USING simple_table_txt source ON (target.w_id_col = source.id)
WHEN MATCHED THEN UPDATE SET w_char_col0 = source.name
WHEN NOT MATCHED THEN INSERT (w_id_col, w_char_col1) VALUES (source.id, 'Actual value does not matter');
{code}
A sample stacktrace showing the memory pressure is given below: {noformat} java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.orc.OrcProto$RowIndexEntry$Builder.create(OrcProto.java:8962) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.OrcProto$RowIndexEntry$Builder.access$12100(OrcProto.java:8931) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.OrcProto$RowIndexEntry.newBuilder(OrcProto.java:8915) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.TreeWriterBase.<init>(TreeWriterBase.java:98) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.StringBaseTreeWriter.<init>(StringBaseTreeWriter.java:66) 
~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.CharTreeWriter.<init>(CharTreeWriter.java:40) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.TreeWriter$Factory.createSubtree(TreeWriter.java:163) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.TreeWriter$Factory.create(TreeWriter.java:133) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.StructTreeWriter.<init>(StructTreeWriter.java:41) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.TreeWriter$Factory.createSubtree(TreeWriter.java:181) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.TreeWriter$Factory.create(TreeWriter.java:133) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.StructTreeWriter.<init>(StructTreeWriter.java:41) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.TreeWriter$Factory.createSubtree(TreeWriter.java:181) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.TreeWriter$Factory.create(TreeWriter.java:133) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.WriterImpl.<init>(WriterImpl.java:216) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:95) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:396) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.initWriter(OrcRecordUpdater.java:615) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.addSimpleEvent(OrcRecordUpdater.java:442) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.addSplitUpdateEvent(OrcRecordUpdater.java:495) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.update(OrcRecordUpdater.java:519) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1200) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at 
org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:497) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apa
[jira] [Created] (HIVE-25594) Setup JDBC databases in tests via QT options
Stamatis Zampetakis created HIVE-25594: -- Summary: Setup JDBC databases in tests via QT options Key: HIVE-25594 URL: https://issues.apache.org/jira/browse/HIVE-25594 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The goal of this JIRA is to add a new QT option for setting up a JDBC DBMS and using it in qtests which need a JDBC endpoint up and running. It can be used in tests with external JDBC tables, connectors, etc. A sample file using the proposed option ({{qt:database}}) is shown below.
{code:sql}
--!qt:database:postgres:init_script_1234.sql:cleanup_script_1234.sql
CREATE EXTERNAL TABLE country (name varchar(80))
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
  "hive.sql.database.type" = "POSTGRES",
  "hive.sql.jdbc.driver" = "org.postgresql.Driver",
  "hive.sql.jdbc.url" = "jdbc:postgresql://localhost:5432/qtestDB",
  "hive.sql.dbcp.username" = "qtestuser",
  "hive.sql.dbcp.password" = "qtestpassword",
  "hive.sql.table" = "country");
EXPLAIN CBO SELECT COUNT(*) from country;
SELECT COUNT(*) from country;
{code}
This builds upon HIVE-25423 but proposes to use JDBC datasources without the need for a specific CLI driver. Furthermore, the proposed QT option syntax allows using customised init/cleanup scripts for the JDBC datasource per test. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25591) CREATE EXTERNAL TABLE fails for JDBC tables stored in non-default schema
Stamatis Zampetakis created HIVE-25591: -- Summary: CREATE EXTERNAL TABLE fails for JDBC tables stored in non-default schema Key: HIVE-25591 URL: https://issues.apache.org/jira/browse/HIVE-25591 Project: Hive Issue Type: Bug Components: Query Planning Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Consider the following use case where tables reside in some user-defined schema in some JDBC compliant database: +Postgres+ {code:sql} create schema world; create table if not exists world.country (name varchar(80) not null); insert into world.country (name) values ('India'); insert into world.country (name) values ('Russia'); insert into world.country (name) values ('USA'); {code} The following DDL statement in Hive fails: +Hive+ {code:sql} CREATE EXTERNAL TABLE country (name varchar(80)) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "POSTGRES", "hive.sql.jdbc.driver" = "org.postgresql.Driver", "hive.sql.jdbc.url" = "jdbc:postgresql://localhost:5432/test", "hive.sql.dbcp.username" = "user", "hive.sql.dbcp.password" = "pwd", "hive.sql.schema" = "world", "hive.sql.table" = "country"); {code} The exception is the following: {noformat} org.postgresql.util.PSQLException: ERROR: relation "country" does not exist Position: 15 at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2532) ~[postgresql-42.2.14.jar:42.2.14] at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2267) ~[postgresql-42.2.14.jar:42.2.14] at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:312) ~[postgresql-42.2.14.jar:42.2.14] at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:448) ~[postgresql-42.2.14.jar:42.2.14] at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:369) ~[postgresql-42.2.14.jar:42.2.14] at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:153) ~[postgresql-42.2.14.jar:42.2.14] at 
org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:103) ~[postgresql-42.2.14.jar:42.2.14] at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122) ~[commons-dbcp2-2.7.0.jar:2.7.0] at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122) ~[commons-dbcp2-2.7.0.jar:2.7.0] at org.apache.hive.storage.jdbc.dao.GenericJdbcDatabaseAccessor.getColumnNames(GenericJdbcDatabaseAccessor.java:83) [hive-jdbc-handler-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.storage.jdbc.JdbcSerDe.initialize(JdbcSerDe.java:98) [hive-jdbc-handler-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:95) [hive-metastore-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:78) [hive-metastore-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:342) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:324) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.metadata.Table.getColsInternal(Table.java:734) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:717) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableDesc.toTable(CreateTableDesc.java:933) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:59) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) 
[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:245) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:108) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] a
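The failure above suggests that the column-metadata query in {{GenericJdbcDatabaseAccessor.getColumnNames}} is built from the bare table name, ignoring {{hive.sql.schema}}. A minimal sketch of the kind of qualification that would avoid it (illustrative only; the helper below is hypothetical, not the actual handler code):

```java
public class QualifiedNameDemo {
    // Hypothetical helper: prefix the table with the configured schema, if any.
    static String qualifiedTableName(String schema, String table) {
        if (schema == null || schema.isEmpty()) {
            return table;
        }
        return schema + "." + table;
    }

    public static void main(String[] args) {
        // With "hive.sql.schema" = "world" the metadata query should target world.country.
        System.out.println("SELECT * FROM " + qualifiedTableName("world", "country") + " WHERE 1=0");
        // Without a schema the current behavior is preserved.
        System.out.println("SELECT * FROM " + qualifiedTableName(null, "country") + " WHERE 1=0");
    }
}
```

With the schema prepended, Postgres resolves {{world.country}} instead of looking up {{country}} in the default search path.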
[jira] [Created] (HIVE-25530) AssertionError when query involves multiple JDBC tables and views
Stamatis Zampetakis created HIVE-25530: -- Summary: AssertionError when query involves multiple JDBC tables and views Key: HIVE-25530 URL: https://issues.apache.org/jira/browse/HIVE-25530 Project: Hive Issue Type: Bug Components: CBO, HiveServer2 Affects Versions: 4.0.0 Reporter: Stamatis Zampetakis Assignee: Soumyakanti Das Fix For: 4.0.0 Attachments: engesc_6056.q An {{AssertionError}} is thrown during compilation when a query contains multiple external JDBC tables and there are available materialized views which can be used to answer the query. The problem can be reproduced by running the scenario in [^engesc_6056.q]. {code:bash} mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=engesc_6056.q -Dtest.output.overwrite {code} The stacktrace is shown below: {noformat} java.lang.AssertionError: Rule's description should be unique; existing rule=JdbcToEnumerableConverterRule(in:JDBC.DERBY,out:ENUMERABLE); new rule=JdbcToEnumerableConverterRule(in:JDBC.DERBY,out:ENUMERABLE) at org.apache.calcite.plan.AbstractRelOptPlanner.addRule(AbstractRelOptPlanner.java:158) at org.apache.calcite.plan.volcano.VolcanoPlanner.addRule(VolcanoPlanner.java:406) at org.apache.calcite.adapter.jdbc.JdbcConvention.register(JdbcConvention.java:66) at org.apache.calcite.plan.AbstractRelOptPlanner.registerClass(AbstractRelOptPlanner.java:233) at org.apache.hadoop.hive.ql.optimizer.calcite.cost.HiveVolcanoPlanner.registerClass(HiveVolcanoPlanner.java:90) at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1224) at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:589) at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:604) at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:84) at org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:268) at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1132) at 
org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:589) at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:604) at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:84) at org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:268) at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1132) at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:589) at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:604) at org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:148) at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:268) at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:283) at org.apache.hadoop.hive.ql.optimizer.calcite.rules.views.HiveMaterializedViewBoxing$HiveMaterializedViewUnboxingRule.onMatch(HiveMaterializedViewBoxing.java:210) at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:229) at org.apache.calcite.plan.volcano.IterativeRuleDriver.drive(IterativeRuleDriver.java:58) at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:510) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyMaterializedViewRewriting(CalcitePlanner.java:2027) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1717) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1589) at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914) at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1341) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:559) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12549) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:452) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:175) a
[jira] [Created] (HIVE-25316) Query with window function over external JDBC table and filter fails at runtime
Stamatis Zampetakis created HIVE-25316: -- Summary: Query with window function over external JDBC table and filter fails at runtime Key: HIVE-25316 URL: https://issues.apache.org/jira/browse/HIVE-25316 Project: Hive Issue Type: Bug Components: JDBC storage handler, Query Processor Affects Versions: 4.0.0 Reporter: Stamatis Zampetakis The following TPC-DS query fails at runtime when the table {{store_sales}} is an external JDBC table. {code:sql} SELECT ranking FROM (SELECT rank() OVER (PARTITION BY ss_store_sk ORDER BY sum(ss_net_profit)) AS ranking FROM store_sales GROUP BY ss_store_sk) tmp1 WHERE ranking <= 5 {code} The stacktrace below shows that problem occurs while trying to initialize the {{TopNKeyOperator}}. {noformat} 2021-07-08T09:04:37,444 ERROR [TezTR-270335_1_3_0_0_0] tez.TezProcessor: Failed initializeAndRunProcessor java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:351) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:310) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:277) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) [tez-runtime-internals-0.10.0.jar:0.10.0] at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75) [tez-runtime-internals-0.10.0.jar:0.10.0] at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62) [tez-runtime-internals-0.10.0.jar:0.10.0] at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_261] at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_261] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) [hadoop-common-3.1.0.jar:?] 
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62) [tez-runtime-internals-0.10.0.jar:0.10.0] at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38) [tez-runtime-internals-0.10.0.jar:0.10.0] at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) [tez-common-0.10.0.jar:0.10.0] at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118) [hive-llap-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_261] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_261] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_261] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261] Caused by: java.lang.RuntimeException: cannot find field _col0 from [0:ss_store_sk, 1:$f1] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:550) ~[hive-serde-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:153) ~[hive-serde-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:56) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TopNKeyOperator.initObjectInspectors(TopNKeyOperator.java:101) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TopNKeyOperator.initializeOp(TopNKeyOperator.java:82) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:549) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:503) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:369) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:506) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:314) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] ... 16 more {noformat} -- This message was sent by Atlassi
[jira] [Created] (HIVE-25296) Replace parquet-hadoop-bundle dependency with the actual parquet modules
Stamatis Zampetakis created HIVE-25296: -- Summary: Replace parquet-hadoop-bundle dependency with the actual parquet modules Key: HIVE-25296 URL: https://issues.apache.org/jira/browse/HIVE-25296 Project: Hive Issue Type: Improvement Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Fix For: 4.0.0 The parquet-hadoop-bundle is not a real dependency but a mere packaging of three parquet modules to create an uber jar. The Parquet community created this artificial module at the request of HIVE-5783, but the benefits, if any, are unclear. On the contrary, using the uber dependency has some drawbacks:
* Parquet source code cannot be attached easily in IDEs, which makes debugging sessions cumbersome.
* Finding concrete dependencies on Parquet is not possible just by inspecting the pom files.
* It adds extra maintenance cost for the Parquet community, which must perform additional verification steps during a release.
The goal of this JIRA is to replace the uber dependency with concrete dependencies to the respective modules:
* parquet-common
* parquet-column
* parquet-hadoop
-- This message was sent by Atlassian Jira (v8.3.4#803005)
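Concretely, the change replaces the single uber dependency with the three modules it bundles. A sketch of the pom edit (the parquet.version property name is assumed for illustration, not taken from the actual Hive pom):

```xml
<!-- Before: the uber jar -->
<!--
<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-hadoop-bundle</artifactId>
  <version>${parquet.version}</version>
</dependency>
-->
<!-- After: the concrete modules it bundles -->
<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-common</artifactId>
  <version>${parquet.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-column</artifactId>
  <version>${parquet.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-hadoop</artifactId>
  <version>${parquet.version}</version>
</dependency>
```

With the concrete artifacts in place, IDEs can attach the per-module source jars and the pom states the actual Parquet surface Hive depends on.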
[jira] [Created] (HIVE-25219) Backward incompatible timestamp serialization in Avro for certain timezones
Stamatis Zampetakis created HIVE-25219: -- Summary: Backward incompatible timestamp serialization in Avro for certain timezones Key: HIVE-25219 URL: https://issues.apache.org/jira/browse/HIVE-25219 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 3.1.0 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Fix For: 4.0.0 HIVE-12192 and HIVE-20007 changed the way that timestamp computations are performed and, to some extent, how timestamps are serialized and deserialized in files (Parquet, Avro). In versions that include HIVE-12192 or HIVE-20007 the serialization in Avro files is not backwards compatible. In other words, writing timestamps with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them with another (not including the previous issues) may lead to different results depending on the default timezone of the system. Consider the following scenario where the default system timezone is set to US/Pacific. At apache/master commit eedcd82bc2d61861a27205f925ba0ffab9b6bca8:
{code:sql}
CREATE EXTERNAL TABLE employee(eid INT, birth timestamp) STORED AS AVRO LOCATION '/tmp/hiveexttbl/employee';
INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
SELECT * FROM employee;
{code}
|1|1880-01-01 00:00:00|
|2|1884-01-01 00:00:00|
|3|1990-01-01 00:00:00|
At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356:
{code:sql}
CREATE EXTERNAL TABLE employee(eid INT, birth timestamp) STORED AS AVRO LOCATION '/tmp/hiveexttbl/employee';
SELECT * FROM employee;
{code}
|1|1879-12-31 23:52:58|
|2|1884-01-01 00:00:00|
|3|1990-01-01 00:00:00|
The timestamp for {{eid=1}} in branch-2.3 is different from the one in master. -- This message was sent by Atlassian Jira (v8.3.4#803005)
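The 7 min 2 s discrepancy for {{eid=1}} matches the difference between the zone's pre-1883 local-mean-time offset and its later standard offset, which only affects timestamps old enough to fall before the transition (hence {{eid=2}} and {{eid=3}} are unaffected). An illustrative check with {{java.time}} (not Hive code; {{America/Los_Angeles}} is assumed here as the canonical id behind US/Pacific):

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.zone.ZoneRules;

public class LmtOffsetDemo {
    public static void main(String[] args) {
        ZoneRules rules = ZoneId.of("America/Los_Angeles").getRules();

        // Before 1883-11-18 the zone used local mean time, UTC-07:52:58.
        int lmt = rules.getOffset(LocalDateTime.of(1880, 1, 1, 0, 0)).getTotalSeconds();
        // By 1990 the standard offset is UTC-08:00.
        int pst = rules.getOffset(LocalDateTime.of(1990, 1, 1, 0, 0)).getTotalSeconds();

        // The 422-second gap (7 min 2 s) is exactly the shift seen for eid=1:
        // 1880-01-01 00:00:00 read back as 1879-12-31 23:52:58.
        System.out.println("lmt=" + lmt + " pst=" + pst + " diffSeconds=" + (lmt - pst));
    }
}
```

One serialization path interprets the epoch value with the zone's historical rules while the other effectively applies the modern fixed offset, so the two disagree by exactly this gap.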
[jira] [Created] (HIVE-25129) Wrong results when timestamps stored in Avro/Parquet fall into the DST shift
Stamatis Zampetakis created HIVE-25129: -- Summary: Wrong results when timestamps stored in Avro/Parquet fall into the DST shift Key: HIVE-25129 URL: https://issues.apache.org/jira/browse/HIVE-25129 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 3.1.0 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Timestamp values falling into the daylight saving time shift of the system timezone cannot be retrieved as-is when they are stored in Parquet/Avro tables. The respective SELECT query shifts those timestamps by +1 hour, reflecting the DST shift. +Example+
{code:sql}
--! qt:timezone:US/Pacific
create table employee (eid int, birthdate timestamp) stored as parquet;
insert into employee values (0, '2019-03-10 02:00:00');
insert into employee values (1, '2020-03-08 02:00:00');
insert into employee values (2, '2021-03-14 02:00:00');
select eid, birthdate from employee order by eid;
{code}
+Actual results+
|0|2019-03-10 03:00:00|
|1|2020-03-08 03:00:00|
|2|2021-03-14 03:00:00|
+Expected results+
|0|2019-03-10 02:00:00|
|1|2020-03-08 02:00:00|
|2|2021-03-14 02:00:00|
Storing and retrieving values in columns using the [timestamp data type|https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types] (equivalent to the LocalDateTime Java API) should not alter in any way the value that the user sees. The results are correct for {{TEXTFILE}} and {{ORC}} tables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
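The inserted values all name local times inside the spring-forward gap (on those dates, 02:00-02:59 does not exist in US/Pacific), so any epoch-based round trip resolves them forward by one hour, which is exactly the shift in the actual results. An illustrative check with {{java.time}} (not Hive code; {{America/Los_Angeles}} is assumed here as the canonical id behind US/Pacific):

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class DstGapDemo {
    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("America/Los_Angeles");
        // 2019-03-10 02:00:00 local does not exist: clocks jump from 02:00 to 03:00.
        LocalDateTime stored = LocalDateTime.of(2019, 3, 10, 2, 0, 0);
        // Resolving the gapped time against the zone moves it forward by the gap length.
        ZonedDateTime resolved = ZonedDateTime.of(stored, zone);
        System.out.println("resolved=" + resolved.toLocalDateTime());
    }
}
```

A LocalDateTime-like timestamp type should never pass through such a zone resolution; the fact that text/ORC round-trip correctly shows the conversion is specific to the Parquet/Avro write or read path.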
[jira] [Created] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones
Stamatis Zampetakis created HIVE-25104: -- Summary: Backward incompatible timestamp serialization in Parquet for certain timezones Key: HIVE-25104 URL: https://issues.apache.org/jira/browse/HIVE-25104 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 3.1.2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis HIVE-12192 and HIVE-20007 changed the way that timestamp computations are performed and, to some extent, how timestamps are serialized and deserialized in files (Parquet, Avro, Orc). In versions that include HIVE-12192 or HIVE-20007 the serialization in Parquet files is not backwards compatible. In other words, writing timestamps with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them with another (not including the previous issues) may lead to different results depending on the default timezone of the system. Consider the following scenario where the default system timezone is set to US/Pacific. At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3:
{code:sql}
CREATE EXTERNAL TABLE employee(eid INT, birth timestamp) STORED AS PARQUET LOCATION '/tmp/hiveexttbl/employee';
INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
SELECT * FROM employee;
{code}
|1|1880-01-01 00:00:00|
|2|1884-01-01 00:00:00|
|3|1990-01-01 00:00:00|
At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356:
{code:sql}
CREATE EXTERNAL TABLE employee(eid INT, birth timestamp) STORED AS PARQUET LOCATION '/tmp/hiveexttbl/employee';
SELECT * FROM employee;
{code}
|1|1879-12-31 23:52:58|
|2|1884-01-01 00:00:00|
|3|1990-01-01 00:00:00|
The timestamp for {{eid=1}} in branch-2.3 is different from the one in master. -- This message was sent by Atlassian Jira (v8.3.4#803005)