[jira] [Created] (HIVE-27225) Speedup build by skipping SBOM generation by default
Stamatis Zampetakis created HIVE-27225: -- Summary: Speedup build by skipping SBOM generation by default Key: HIVE-27225 URL: https://issues.apache.org/jira/browse/HIVE-27225 Project: Hive Issue Type: Improvement Components: Build Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis A full build of Hive locally in my environment takes ~15 minutes. {noformat} mvn clean install -DskipTests -Pitests [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 14:15 min {noformat} Profiling the build shows that we are spending roughly 30% of CPU in the org.cyclonedx.maven plugin, which is used to generate SBOM artifacts (HIVE-26912). The SBOM generation does not need to run in every single build and probably only needs to be active during the release build. To speed up everyday builds, I propose activating the cyclonedx plugin only in the dist (release) profile. After this change, the default build drops from 14 minutes to 8. {noformat} mvn clean install -DskipTests -Pitests [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 08:19 min {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27199) Read TIMESTAMP WITH LOCAL TIME ZONE columns from text files using custom formats
Stamatis Zampetakis created HIVE-27199: -- Summary: Read TIMESTAMP WITH LOCAL TIME ZONE columns from text files using custom formats Key: HIVE-27199 URL: https://issues.apache.org/jira/browse/HIVE-27199 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Timestamp values come in many flavors and formats and there is no single representation that can satisfy everyone, especially when such values are stored in plain text/csv files. HIVE-9298 added a special SERDE property, {{timestamp.formats}}, that allows users to provide custom timestamp patterns so that TIMESTAMP values coming from files are parsed correctly. However, when the column type is TIMESTAMP WITH LOCAL TIME ZONE (LTZ) it is not possible to use a custom pattern, thus when the built-in Hive parser does not match the expected format a NULL value is returned. Consider a text file, F1, with the following values: {noformat} 2016-05-03 12:26:34 2016-05-03T12:26:34 {noformat} and a table with a column declared as LTZ. {code:sql} CREATE TABLE ts_table (ts TIMESTAMP WITH LOCAL TIME ZONE); LOAD DATA LOCAL INPATH './F1' INTO TABLE ts_table; SELECT * FROM ts_table; 2016-05-03 12:26:34.0 US/Pacific NULL {code} In order to give more flexibility to users relying on the TIMESTAMP WITH LOCAL TIME ZONE datatype, and also to align the behavior with the TIMESTAMP type, this JIRA aims to reuse the {{timestamp.formats}} property for both TIMESTAMP types. The work here focuses exclusively on simple text files but the same could be done for other SerDes such as JSON. -- This message was sent by Atlassian Jira (v8.20.10#820010)
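The fallback behavior that {{timestamp.formats}} enables can be sketched as follows. This is a purely illustrative Python model of pattern-list parsing, not Hive's actual SerDe code, and the two pattern strings are assumptions chosen to match the rows of F1.

```python
from datetime import datetime

# Illustrative model (not Hive's implementation): try each user-supplied
# pattern in order and return None (Hive's NULL) when no pattern matches.
def parse_with_formats(value, formats):
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None

# Two custom patterns covering both rows of the example file F1.
PATTERNS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S"]
for row in ["2016-05-03 12:26:34", "2016-05-03T12:26:34"]:
    print(parse_with_formats(row, PATTERNS))
```

With a single built-in pattern only the first row would parse and the second would come back as NULL, which is the situation described above for the LTZ column.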
[jira] [Created] (HIVE-27162) Unify HiveUnixTimestampSqlOperator and HiveToUnixTimestampSqlOperator
Stamatis Zampetakis created HIVE-27162: -- Summary: Unify HiveUnixTimestampSqlOperator and HiveToUnixTimestampSqlOperator Key: HIVE-27162 URL: https://issues.apache.org/jira/browse/HIVE-27162 Project: Hive Issue Type: Task Components: CBO Reporter: Stamatis Zampetakis The two classes below both represent the {{unix_timestamp}} operator and have identical implementations. * https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveUnixTimestampSqlOperator.java * https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveToUnixTimestampSqlOperator.java There is probably a way to keep only one of them; having two ways of representing the same thing can cause various problems in query planning and also leads to code duplication. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27161) MetaException when executing CTAS query in Druid storage handler
Stamatis Zampetakis created HIVE-27161: -- Summary: MetaException when executing CTAS query in Druid storage handler Key: HIVE-27161 URL: https://issues.apache.org/jira/browse/HIVE-27161 Project: Hive Issue Type: Bug Components: Druid integration Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis Any kind of CTAS query targeting the Druid storage handler fails with the following exception: {noformat} org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:LOCATION may not be specified for Druid) at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1347) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1352) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:158) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:116) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:228) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) ~[hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) ~[hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) ~[hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425) ~[hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356) ~[hive-cli-4.0.0-SNAPSHOT.jar:?] 
at org.apache.hadoop.hive.ql.dataset.QTestDatasetHandler.initDataset(QTestDatasetHandler.java:86) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.dataset.QTestDatasetHandler.beforeTest(QTestDatasetHandler.java:190) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.qoption.QTestOptionDispatcher.beforeTest(QTestOptionDispatcher.java:79) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.QTestUtil.cliInit(QTestUtil.java:607) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:112) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:60) ~[test-classes/:?] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_261] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_261] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_261] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) ~[junit-4.13.2.jar:4.13.2] at org.junit.internal.run
[jira] [Created] (HIVE-27157) AssertionError when inferring return type for unix_timestamp function
Stamatis Zampetakis created HIVE-27157: -- Summary: AssertionError when inferring return type for unix_timestamp function Key: HIVE-27157 URL: https://issues.apache.org/jira/browse/HIVE-27157 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Any attempt to derive the return data type for the {{unix_timestamp}} function results in the following assertion error. {noformat} java.lang.AssertionError: typeName.allowsPrecScale(true, false): BIGINT at org.apache.calcite.sql.type.BasicSqlType.checkPrecScale(BasicSqlType.java:65) at org.apache.calcite.sql.type.BasicSqlType.<init>(BasicSqlType.java:81) at org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:67) at org.apache.calcite.sql.fun.SqlAbstractTimeFunction.inferReturnType(SqlAbstractTimeFunction.java:78) at org.apache.calcite.rex.RexBuilder.deriveReturnType(RexBuilder.java:278) {noformat} due to a faulty implementation of type inference for the respective operators: * [https://github.com/apache/hive/blob/52360151dc43904217e812efde1069d6225e9570/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveUnixTimestampSqlOperator.java] * [https://github.com/apache/hive/blob/52360151dc43904217e812efde1069d6225e9570/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveToUnixTimestampSqlOperator.java] Although at this stage in master it is not possible to reproduce the problem with an actual SQL query, the buggy implementation must be fixed, since slight changes in the code/CBO rules may lead to code paths relying on {{SqlOperator.inferReturnType}}. Note that in older versions of Hive it is possible to hit the AssertionError in various ways. 
For example in Hive 3.1.3 (and older), the error may come from [HiveRelDecorrelator|https://github.com/apache/hive/blob/4df4d75bf1e16fe0af75aad0b4179c34c07fc975/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java#L1933] in the presence of sub-queries. -- This message was sent by Atlassian Jira (v8.20.10#820010)
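The failing check can be paraphrased with a toy model. This Python sketch is a deliberate simplification (the type names and the dictionary are assumptions for illustration, not Calcite code) of why requesting BIGINT with an explicit precision trips the assertion.

```python
# Toy model of Calcite's precision/scale check (a simplification, not the
# actual BasicSqlType code): BIGINT accepts neither precision nor scale,
# so asking the type factory for BIGINT with a precision fails the check.
ALLOWS_PRECISION = {"DECIMAL": True, "VARCHAR": True, "TIMESTAMP": True, "BIGINT": False}

def create_sql_type(type_name, precision=None):
    if precision is not None and not ALLOWS_PRECISION.get(type_name, False):
        raise AssertionError(f"typeName.allowsPrecScale(true, false): {type_name}")
    return (type_name, precision)
```

The faulty operators effectively performed the equivalent of create_sql_type("BIGINT", precision) during return-type inference, whereas a correct implementation requests the type without a precision.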
[jira] [Created] (HIVE-27156) Wrong results when CAST timestamp literal with timezone to TIMESTAMP
Stamatis Zampetakis created HIVE-27156: -- Summary: Wrong results when CAST timestamp literal with timezone to TIMESTAMP Key: HIVE-27156 URL: https://issues.apache.org/jira/browse/HIVE-27156 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Casting a timestamp literal with an invalid timezone to the TIMESTAMP datatype results in a timestamp with the time part truncated to midnight (00:00:00). *Case I* {code:sql} select cast('2020-06-28 22:17:33.123456 Europe/Amsterd' as timestamp); {code} +Actual+ |2020-06-28 00:00:00| +Expected+ |NULL/ERROR/2020-06-28 22:17:33.123456| *Case II* {code:sql} select cast('2020-06-28 22:17:33.123456 Invalid/Zone' as timestamp); {code} +Actual+ |2020-06-28 00:00:00| +Expected+ |NULL/ERROR/2020-06-28 22:17:33.123456| The existing documentation does not specify what the output should be in the cases above: * https://cwiki.apache.org/confluence/display/hive/languagemanual+types#LanguageManualTypes-TimestampstimestampTimestamps * https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types *Case III* Another subtle but important case is the following, where the timestamp literal has a valid timezone but we are attempting a cast to a datatype that does not store the timezone. {code:sql} select cast('2020-06-28 22:17:33.123456 Europe/Amsterdam' as timestamp); {code} +Actual+ |2020-06-28 22:17:33.123456| The correctness of the last result is debatable, since one might expect a NULL or an ERROR. -- This message was sent by Atlassian Jira (v8.20.10#820010)
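The NULL/ERROR expectation for Cases I and II can be sketched with Python's standard zoneinfo module. This illustrates the arguably desirable semantics (reject an unknown zone instead of truncating to midnight); it is not Hive's parser, and the split-on-last-space grammar is an assumption made for the example.

```python
from datetime import datetime
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

def cast_to_timestamp(literal):
    # Assumed grammar for illustration: "<date> <time> <zone-id>",
    # split on the last space. Not Hive's actual literal syntax.
    ts_part, _, zone_part = literal.rpartition(" ")
    try:
        ZoneInfo(zone_part)  # validate the zone id
    except (ZoneInfoNotFoundError, ValueError):
        return None  # NULL rather than a silently truncated midnight value
    return datetime.strptime(ts_part, "%Y-%m-%d %H:%M:%S.%f")
```

Under these semantics both 'Europe/Amsterd' and 'Invalid/Zone' yield NULL instead of 2020-06-28 00:00:00.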
[jira] [Created] (HIVE-27131) Remove empty module shims/scheduler
Stamatis Zampetakis created HIVE-27131: -- Summary: Remove empty module shims/scheduler Key: HIVE-27131 URL: https://issues.apache.org/jira/browse/HIVE-27131 Project: Hive Issue Type: Task Components: Shims Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The module contains nothing more than a plain pom.xml file, which does not seem to do anything special apart from bundling together some optional dependencies. There is no source code, no tests, and no reason for the module to exist. At some point it used to contain a few classes but these were removed progressively (e.g., HIVE-22398), leaving behind an empty module. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27102) Upgrade Calcite to 1.33.0 and Avatica to 1.23.0
Stamatis Zampetakis created HIVE-27102: -- Summary: Upgrade Calcite to 1.33.0 and Avatica to 1.23.0 Key: HIVE-27102 URL: https://issues.apache.org/jira/browse/HIVE-27102 Project: Hive Issue Type: Improvement Components: CBO Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis New versions of Calcite and Avatica are available so we should upgrade to them. I had some WIP in HIVE-26610 for upgrading Calcite to 1.32.0 but, given that the work was not in a very advanced state, it is preferable to jump directly to 1.33.0. Avatica must be in line with Calcite so both need to be updated at the same time. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27100) Remove unused data/files from repo
Stamatis Zampetakis created HIVE-27100: -- Summary: Remove unused data/files from repo Key: HIVE-27100 URL: https://issues.apache.org/jira/browse/HIVE-27100 Project: Hive Issue Type: Task Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Some files under [https://github.com/apache/hive/tree/master/data/files] are not referenced anywhere else in the repo and can be removed. Removing them makes it easier to see what is actually tested. Other minor benefits: * faster checkout times; * smaller source/binary releases. The script that was used to find which files are not referenced can be found below: {code:bash} for f in `ls data/files`; do echo -n "$f "; grep -a -R "$f" --exclude-dir=".git" --exclude-dir=target --exclude=\*.q.out --exclude=\*.class --exclude=\*.jar | wc -l | grep " 0$"; done {code} +Output+ {noformat} cbo_t4.txt 0 cbo_t5.txt 0 cbo_t6.txt 0 compressed_4line_file1.csv.bz2 0 empty2.txt 0 filterCard.txt 0 fullouter_string_big_1a_old.txt 0 fullouter_string_small_1a_old.txt 0 futurama_episodes.avro 0 in9.txt 0 map_null_schema.avro 0 regex-path-2015-12-10_03.txt 0 regex-path-201512-10_03.txt 0 regex-path-2015121003.txt 0 sample.json 0 sample-queryplan-in-history.txt 0 sample-queryplan.txt 0 smbbucket_2.txt 0 smb_bucket_input.txt 0 SortDescCol1Col2.txt 0 SortDescCol2Col1.txt 0 sortdp.txt 0 srcsortbucket1outof4.txt 0 srcsortbucket2outof4.txt 0 srcsortbucket4outof4.txt 0 {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27080) Support project pushdown in JDBC storage handler even when filters are not pushed
Stamatis Zampetakis created HIVE-27080: -- Summary: Support project pushdown in JDBC storage handler even when filters are not pushed Key: HIVE-27080 URL: https://issues.apache.org/jira/browse/HIVE-27080 Project: Hive Issue Type: Improvement Components: CBO Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis {code:sql} CREATE EXTERNAL TABLE book ( id int, title varchar(20), author int ) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "POSTGRES", "hive.sql.jdbc.driver" = "org.postgresql.Driver", "hive.sql.jdbc.url" = "jdbc:postgresql://localhost:5432/qtestDB", "hive.sql.dbcp.username" = "qtestuser", "hive.sql.dbcp.password" = "qtestpassword", "hive.sql.table" = "book" ); {code} {code:sql} explain cbo select id from book where title = 'Les Miserables'; {code} {noformat} CBO PLAN: HiveJdbcConverter(convention=[JDBC.POSTGRES]) JdbcProject(id=[$0]) JdbcFilter(condition=[=($1, _UTF-16LE'Les Miserables')]) JdbcHiveTableScan(table=[[default, book]], table:alias=[book]) {noformat} +Good case:+ Only the id column is fetched from the underlying database (see JdbcProject) since it is necessary for the result. {code:sql} explain cbo select id from book where UPPER(title) = 'LES MISERABLES'; {code} {noformat} CBO PLAN: HiveProject(id=[$0]) HiveFilter(condition=[=(CAST(UPPER($1)):VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'LES MISERABLES')]) HiveProject(id=[$0], title=[$1], author=[$2]) HiveJdbcConverter(convention=[JDBC.POSTGRES]) JdbcHiveTableScan(table=[[default, book]], table:alias=[book]) {noformat} +Bad case:+ All table columns are fetched from the database although only id and title are necessary; id is the result so cannot be dropped and title is needed for HiveFilter since the UPPER operation was not pushed in the DBMS. The author column is not needed at all so the plan should have a JdbcProject with id, and title, on top of the JdbcHiveTableScan. 
Although it may not seem like a big deal, in some cases tables are quite wide (more than 100 columns) while queries rarely return all of their columns. Improving project pushdown to handle such cases can give a major performance boost. Pushing the filter with UPPER to the JDBC storage handler is also a relevant improvement, but it should be tracked under another ticket. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27061) Website deployment GitHub action should not trigger on pull requests
Stamatis Zampetakis created HIVE-27061: -- Summary: Website deployment GitHub action should not trigger on pull requests Key: HIVE-27061 URL: https://issues.apache.org/jira/browse/HIVE-27061 Project: Hive Issue Type: Bug Components: Website Reporter: Stamatis Zampetakis The Website deployment GitHub action configured here: [https://github.com/apache/hive-site/blob/a3132faf0f4a555434076cb8ad690ae2c2c8c371/.github/workflows/gh-pages.yml] should not trigger on pull requests. The issue can be seen here: https://github.com/apache/hive-site/actions/runs/4127993132/jobs/7131893178 where the action was launched for https://github.com/apache/hive-site/pull/1 -- This message was sent by Atlassian Jira (v8.20.10#820010)
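A hedged sketch of the kind of trigger change the issue calls for (the actual contents of the workflow file are not reproduced here, so the current trigger block and the branch name are assumptions): restrict the workflow to push events so that pull requests no longer launch a deployment.

```yaml
# Hypothetical trigger block for .github/workflows/gh-pages.yml
# (assumption: the file currently also triggers on pull_request).
on:
  push:
    branches: [main]
```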
[jira] [Created] (HIVE-26987) InvalidProtocolBufferException when reading column statistics from ORC files
Stamatis Zampetakis created HIVE-26987: -- Summary: InvalidProtocolBufferException when reading column statistics from ORC files Key: HIVE-26987 URL: https://issues.apache.org/jira/browse/HIVE-26987 Project: Hive Issue Type: Bug Components: HiveServer2, ORC Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis Attachments: data.csv.gz, orc_large_column_metadata.q Any attempt to read an ORC file (query an ORC table) having a metadata section with column statistics exceeding the hardcoded limit of 1GB ([https://github.com/apache/orc/blob/2ff9001ddef082eaa30e21cbb034f266e0721664/java/core/src/java/org/apache/orc/impl/InStream.java#L41]) leads to the following exception. {noformat} Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:162) at com.google.protobuf.CodedInputStream$StreamDecoder.readRawBytesSlowPathOneChunk(CodedInputStream.java:2940) at com.google.protobuf.CodedInputStream$StreamDecoder.readBytesSlowPath(CodedInputStream.java:3021) at com.google.protobuf.CodedInputStream$StreamDecoder.readBytes(CodedInputStream.java:2432) at org.apache.orc.OrcProto$StringStatistics.(OrcProto.java:1718) at org.apache.orc.OrcProto$StringStatistics.(OrcProto.java:1663) at org.apache.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1766) at org.apache.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1761) at com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2409) at org.apache.orc.OrcProto$ColumnStatistics.(OrcProto.java:6552) at org.apache.orc.OrcProto$ColumnStatistics.(OrcProto.java:6468) at org.apache.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:6678) at org.apache.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:6673) at 
com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2409) at org.apache.orc.OrcProto$StripeStatistics.(OrcProto.java:19586) at org.apache.orc.OrcProto$StripeStatistics.(OrcProto.java:19533) at org.apache.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:19622) at org.apache.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:19617) at com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2409) at org.apache.orc.OrcProto$Metadata.(OrcProto.java:20270) at org.apache.orc.OrcProto$Metadata.(OrcProto.java:20217) at org.apache.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:20306) at org.apache.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:20301) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:86) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:91) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:48) at org.apache.orc.OrcProto$Metadata.parseFrom(OrcProto.java:20438) at org.apache.orc.impl.ReaderImpl.deserializeStripeStats(ReaderImpl.java:1013) at org.apache.orc.impl.ReaderImpl.getVariantStripeStatistics(ReaderImpl.java:317) at org.apache.orc.impl.ReaderImpl.getStripeStatistics(ReaderImpl.java:1047) at org.apache.orc.impl.ReaderImpl.getStripeStatistics(ReaderImpl.java:1034) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:1679) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1557) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2900(OrcInputFormat.java:1342) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1529) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1526) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1526) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1342) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecu
[jira] [Created] (HIVE-26877) Parquet CTAS with JOIN on decimals with different precision/scale fail
Stamatis Zampetakis created HIVE-26877: -- Summary: Parquet CTAS with JOIN on decimals with different precision/scale fail Key: HIVE-26877 URL: https://issues.apache.org/jira/browse/HIVE-26877 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Attachments: ctas_parquet_join.q Creating a Parquet table using CREATE TABLE AS SELECT syntax (CTAS) leads to a runtime error when the SELECT statement joins columns with different precision/scale. Steps to reproduce: {code:sql} CREATE TABLE table_a (col_dec decimal(5,0)); CREATE TABLE table_b(col_dec decimal(38,10)); INSERT INTO table_a VALUES (1); INSERT INTO table_b VALUES (1.00); set hive.default.fileformat=parquet; create table target as select table_a.col_dec from table_a left outer join table_b on table_a.col_dec = table_b.col_dec; {code} Stacktrace: {noformat} 2022-12-20T07:02:52,237 INFO [2dfbd95a-7553-467b-b9d0-629100785502 Listener at 0.0.0.0/46609] reexec.ReExecuteLostAMQueryPlugin: Got exception message: Vertex failed, vertexName=Reducer 2, vertexId=vertex_1671548565336_0001_3_02, diagnostics=[Task failed, taskId=task_1671548565336_0001_3_02_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1671548565336_0001_3_02_00_0:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators: Fixed Binary size 16 does not match field type length 3 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) at java.security.AccessController.doPrivileged(Native Method) 
at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators: Fixed Binary size 16 does not match field type length 3 at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:379) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:310) ... 
15 more Caused by: java.lang.IllegalArgumentException: Fixed Binary size 16 does not match field type length 3 at org.apache.parquet.column.values.plain.FixedLenByteArrayPlainValuesWriter.writeBytes(FixedLenByteArrayPlainValuesWriter.java:56) at org.apache.parquet.column.impl.ColumnWriterBase.write(ColumnWriterBase.java:174) at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.addBinary(MessageColumnIO.java:476) at org.apache.parquet.io.RecordConsumerLoggingWrapper.addBinary(RecordConsumerLoggingWrapper.java:116) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$DecimalDataWriter.write(DataWritableWriter.java:571) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:76) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:35) at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128) at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
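The mismatched sizes in the error are explained by how Parquet stores decimals: the FIXED_LEN_BYTE_ARRAY width depends on the declared precision. A small Python sketch of the standard sizing rule (an assumption about the relevant Parquet convention, not Hive's writer code):

```python
import math

# Minimum FIXED_LEN_BYTE_ARRAY width for a decimal of the given precision:
# the smallest byte count whose signed integer range holds 10**precision - 1.
def parquet_decimal_bytes(precision):
    return math.ceil((precision * math.log2(10) + 1) / 8)

# decimal(5,0) needs 3 bytes while decimal(38,10) needs 16; writing a value
# typed as the wider decimal into a column declared with the narrower one
# matches the "Fixed Binary size 16 does not match field type length 3" error.
print(parquet_decimal_bytes(5), parquet_decimal_bytes(38))
```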
[jira] [Created] (HIVE-26849) Nightly build fails in master (build 1533 onwards)
Stamatis Zampetakis created HIVE-26849: -- Summary: Nightly build fails in master (build 1533 onwards) Key: HIVE-26849 URL: https://issues.apache.org/jira/browse/HIVE-26849 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Attachments: master_build1534_nodes_101_steps_439.txt The last builds on master fail when testing the nightly build: * http://ci.hive.apache.org/job/hive-precommit/job/master/1533/ * http://ci.hive.apache.org/job/hive-precommit/job/master/1534/ Full log attached in master_build1534_nodes_101_steps_439.txt Relevant extract: {noformat} [2022-12-14T14:50:48.606Z] [INFO] Hive Packaging 4.0.0-nightly-89bf37bb45-20221214_144325 FAILURE [ 3.734 s] [2022-12-14T14:50:48.607Z] [INFO] [2022-12-14T14:50:48.607Z] [INFO] BUILD FAILURE [2022-12-14T14:50:48.607Z] [INFO] [2022-12-14T14:50:48.607Z] [INFO] Total time: 06:49 min [2022-12-14T14:50:48.607Z] [INFO] Finished at: 2022-12-14T14:50:48Z [2022-12-14T14:50:48.607Z] [INFO] [2022-12-14T14:50:48.607Z] [WARNING] The requested profile "qsplits" could not be activated because it does not exist. [2022-12-14T14:50:48.607Z] [ERROR] Failed to execute goal on project hive-packaging: Could not resolve dependencies for project org.apache.hive:hive-packaging:pom:4.0.0-nightly-89bf37bb45-20221214_144325: The following artifacts could not be resolved: org.apache.hive.hcatalog:hive-webhcat:jar:4.0.0-nightly-89bf37bb45-20221214_144325, org.apache.hive.hcatalog:hive-webhcat-java-client:jar:4.0.0-nightly-89bf37bb45-20221214_144325: Could not find artifact org.apache.hive.hcatalog:hive-webhcat:jar:4.0.0-nightly-89bf37bb45-20221214_144325 in wonder (http://artifactory/artifactory/wonder) -> [Help 1] [2022-12-14T14:50:48.607Z] [ERROR] [2022-12-14T14:50:48.607Z] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [2022-12-14T14:50:48.607Z] [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
[2022-12-14T14:50:48.607Z] [ERROR] [2022-12-14T14:50:48.607Z] [ERROR] For more information about the errors and possible solutions, please read the following articles: [2022-12-14T14:50:48.607Z] [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException [2022-12-14T14:50:48.607Z] [ERROR] [2022-12-14T14:50:48.607Z] [ERROR] After correcting the problems, you can resume the build with the command [2022-12-14T14:50:48.607Z] [ERROR] mvn -rf :hive-packaging {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26818) Beeline module misses transitive dependencies due to shading
Stamatis Zampetakis created HIVE-26818: -- Summary: Beeline module misses transitive dependencies due to shading Key: HIVE-26818 URL: https://issues.apache.org/jira/browse/HIVE-26818 Project: Hive Issue Type: Bug Components: Beeline Reporter: Stamatis Zampetakis Due to shading, the dependency-reduced-pom.xml file is installed in the local maven repository (~/.m2/repository/org/apache/hive/hive-beeline/4.0.0-SNAPSHOT/) for beeline. The latter indicates that the module doesn't have any transitive dependencies. If we were publishing the shaded jar that would be true, but we publish the regular jar. At this point, modules which include hive-beeline as a Maven dependency are broken and problems such as HIVE-26812 may occur. I was under the impression that this also affects the 4.0.0-alpha-2 release (since it includes HIVE-25750) but strangely the published pom has all the dependencies: https://repo1.maven.org/maven2/org/apache/hive/hive-beeline/4.0.0-alpha-2/hive-beeline-4.0.0-alpha-2.pom -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26807) Investigate test running times before/after Zookeeper upgrade to 3.6.3
Stamatis Zampetakis created HIVE-26807: -- Summary: Investigate test running times before/after Zookeeper upgrade to 3.6.3 Key: HIVE-26807 URL: https://issues.apache.org/jira/browse/HIVE-26807 Project: Hive Issue Type: Task Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis During the investigation of the CI timing out (HIVE-2686) there were some concerns that the Zookeeper upgrade (HIVE-26763) caused some significant slowdown. The goal of this issue is to analyse the test results from the following builds: * [Build-1495|http://ci.hive.apache.org/job/hive-precommit/job/master/1495/], commit just before the Zookeeper upgrade; * [Build-1514|http://ci.hive.apache.org/job/hive-precommit/job/master/1514/], commit after the Zookeeper upgrade with skipped tests (HIVE-26796) and CI timeouts (HIVE-26806) fixed; and reason about the impact of the Zookeeper upgrade on test execution. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796
Stamatis Zampetakis created HIVE-26806: -- Summary: Precommit tests in CI are timing out after HIVE-26796 Key: HIVE-26806 URL: https://issues.apache.org/jira/browse/HIVE-26806 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis http://ci.hive.apache.org/job/hive-precommit/job/master/1506/ {noformat} Cancelling nested steps due to timeout 15:22:08 Sending interrupt signal to process 15:22:08 Killing processes 15:22:09 kill finished with exit code 0 15:22:19 Terminated 15:22:19 script returned exit code 143 [Pipeline] } [Pipeline] // withEnv [Pipeline] } 15:22:19 Deleting 1 temporary files [Pipeline] // configFileProvider [Pipeline] } [Pipeline] // stage [Pipeline] stage [Pipeline] { (PostProcess) [Pipeline] sh [Pipeline] sh [Pipeline] sh [Pipeline] junit 15:22:25 Recording test results 15:22:32 [Checks API] No suitable checks publisher found. [Pipeline] } [Pipeline] // stage [Pipeline] } [Pipeline] // container [Pipeline] } [Pipeline] // node [Pipeline] } [Pipeline] // timeout [Pipeline] } [Pipeline] // podTemplate [Pipeline] } 15:22:32 Failed in branch split-01 [Pipeline] // parallel [Pipeline] } [Pipeline] // stage [Pipeline] stage [Pipeline] { (Archive) [Pipeline] podTemplate [Pipeline] { [Pipeline] timeout 15:22:33 Timeout set to expire in 6 hr 0 min {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26796) All tests in hive-unit module are skipped silently
Stamatis Zampetakis created HIVE-26796: -- Summary: All tests in hive-unit module are skipped silently Key: HIVE-26796 URL: https://issues.apache.org/jira/browse/HIVE-26796 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis In current master (7207a62def246b3290f1ece529e65b79012a3578) the tests in the hive-unit module are not running. {noformat} $ cd itests/hive-unit && mvn test [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hive-it-unit --- [INFO] [INFO] --- [INFO] T E S T S [INFO] --- [INFO] [INFO] Results: [INFO] [INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0 [INFO] [INFO] [INFO] BUILD SUCCESS [INFO] {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26785) Remove explicit protobuf-java dependency from blobstore and minikdc modules
Stamatis Zampetakis created HIVE-26785: -- Summary: Remove explicit protobuf-java dependency from blobstore and minikdc modules Key: HIVE-26785 URL: https://issues.apache.org/jira/browse/HIVE-26785 Project: Hive Issue Type: Task Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis These modules do not directly need the protobuf dependency, so it is misleading to declare it explicitly. Moreover, they use a different protobuf version (3.3.0) than the rest of the project (3.21.4), which can lead to compatibility problems, inconsistent behavior in tests, and undesired transitive propagation to other modules. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26755) Wrong results after renaming Parquet column
Stamatis Zampetakis created HIVE-26755: -- Summary: Wrong results after renaming Parquet column Key: HIVE-26755 URL: https://issues.apache.org/jira/browse/HIVE-26755 Project: Hive Issue Type: Bug Components: HiveServer2, Parquet Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis Renaming the column of a Parquet table leads to wrong results when the query uses the renamed column. {code:sql} create table person (id int, fname string, lname string, age int) stored as parquet; insert into person values (1, 'Victor', 'Hugo', 23); insert into person values (2, 'Alex', 'Dumas', 38); insert into person values (3, 'Marco', 'Pollo', 25); select fname from person where age >=25; {code} ||Correct results|| |Alex| |Marco| {code:sql} alter table person change column age years_from_birth int; select fname from person where years_from_birth >=25; {code} After renaming the column the query above returns an empty result set. {code:sql} select years_from_birth from person; {code} ||Wrong results|| |NULL| |NULL| |NULL| After renaming the column the query returns the correct number of rows but all filled with nulls. The problem is reproducible on current master (commit ae0cabffeaf284a6d2ec13a6993c87770818fbb9). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26690) Redirect hive-site notifications to the appropriate mailing lists
Stamatis Zampetakis created HIVE-26690: -- Summary: Redirect hive-site notifications to the appropriate mailing lists Key: HIVE-26690 URL: https://issues.apache.org/jira/browse/HIVE-26690 Project: Hive Issue Type: Task Components: Website Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Currently various notifications from the [hive-site|https://github.com/apache/hive-site] repository, such as opening/reviewing/commenting pull requests, are sent to the [dev mailing list|https://lists.apache.org/list.html?dev@hive.apache.org] (e.g., [https://lists.apache.org/thread/xthvd9m148xkhshco772llckfc1qk0sf]). The respective notifications from the main [hive repository|https://github.com/apache/hive] are sent to the [gitbox mailing list|https://lists.apache.org/list.html?git...@hive.apache.org]. The goal of this ticket is to redirect the notifications for the hive-site repository from dev to the gitbox/commit mailing lists by modifying the [.asf.yaml file|https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-Notificationsettingsforrepositories]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26658) INT64 Parquet timestamps cannot be mapped to most Hive numeric types
Stamatis Zampetakis created HIVE-26658: -- Summary: INT64 Parquet timestamps cannot be mapped to most Hive numeric types Key: HIVE-26658 URL: https://issues.apache.org/jira/browse/HIVE-26658 Project: Hive Issue Type: Bug Components: Parquet, Serializers/Deserializers Affects Versions: 4.0.0-alpha-1 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis When attempting to read a Parquet file with a column of primitive type INT64 and logical type [TIMESTAMP|https://github.com/apache/parquet-format/blob/54e53e5d7794d383529dd30746378f19a12afd58/LogicalTypes.md?plain=1#L337], an error is raised when the Hive type is anything other than TIMESTAMP or BIGINT. Consider a Parquet file (e.g., ts_file.parquet) with the following schema: {code:json} { "name": "eventtime", "type": ["null", { "type": "long", "logicalType": "timestamp-millis" }], "default": null } {code} Mapping the column to any of the Hive numeric types TINYINT, SMALLINT, INT, FLOAT, DOUBLE, or DECIMAL and trying to run a SELECT gives back an error. The following snippet can be used to reproduce the problem. {code:sql} CREATE TABLE ts_table (eventtime INT) STORED AS PARQUET; LOAD DATA LOCAL INPATH 'ts_file.parquet' INTO TABLE ts_table; SELECT * FROM ts_table; {code} This is a regression caused by HIVE-21215. Although HIVE-21215 made it possible to read INT64 types as Hive TIMESTAMP, which was not possible before, at the same time it broke the mapping to every other Hive numeric type. The problem was addressed selectively for the BIGINT type very recently (HIVE-26612). The primary goal of this ticket is to restore backward compatibility since these use-cases were working before HIVE-21215. -- This message was sent by Atlassian Jira (v8.20.10#820010)
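For reference, the raw value stored for an INT64 timestamp-millis column is simply the offset from the Unix epoch in milliseconds, so the two mappings that still work are trivial conversions of that long. A minimal stand-alone sketch (plain Java, not the Hive/Parquet reader code; class and method names are illustrative only):

```java
import java.time.Instant;

public class Int64TimestampMillis {

    // An INT64 timestamp-millis value is the number of milliseconds
    // since the Unix epoch (1970-01-01T00:00:00Z).
    static long asBigint(long rawValue) {
        // BIGINT mapping: expose the raw epoch millis unchanged.
        return rawValue;
    }

    static Instant asTimestamp(long rawValue) {
        // TIMESTAMP mapping: interpret the raw value as an instant.
        return Instant.ofEpochMilli(rawValue);
    }

    public static void main(String[] args) {
        long raw = 1462278394000L; // 2016-05-03T12:26:34Z
        System.out.println(asBigint(raw));    // 1462278394000
        System.out.println(asTimestamp(raw)); // 2016-05-03T12:26:34Z
    }
}
```

Restoring the pre-HIVE-21215 behavior amounts to letting the narrower numeric types (TINYINT, SMALLINT, INT, ...) again be derived from the raw long, instead of rejecting them outright.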
[jira] [Created] (HIVE-26653) Wrong results when (map) joining multiple tables on partition column
Stamatis Zampetakis created HIVE-26653: -- Summary: Wrong results when (map) joining multiple tables on partition column Key: HIVE-26653 URL: https://issues.apache.org/jira/browse/HIVE-26653 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The result of the query must have exactly one row matching the date specified in the WHERE clause but the query returns nothing. {code:sql} CREATE TABLE table_a (`aid` string ) PARTITIONED BY (`p_dt` string) row format delimited fields terminated by ',' stored as textfile; LOAD DATA LOCAL INPATH '../../data/files/_tbla.csv' into TABLE table_a; CREATE TABLE table_b (`bid` string) PARTITIONED BY (`p_dt` string) row format delimited fields terminated by ',' stored as textfile; LOAD DATA LOCAL INPATH '../../data/files/_tblb.csv' into TABLE table_b; set hive.auto.convert.join=true; set hive.optimize.semijoin.conversion=false; SELECT a.p_dt FROM ((SELECT p_dt FROM table_b GROUP BY p_dt) a JOIN (SELECT p_dt FROM table_a GROUP BY p_dt) b ON a.p_dt = b.p_dt JOIN (SELECT p_dt FROM table_a GROUP BY p_dt) c ON a.p_dt = c.p_dt) WHERE a.p_dt = translate(cast(to_date(date_sub('2022-08-01', 1)) AS string), '-', ''); {code} +Expected result+ 20220731 +Actual result+ Empty To reproduce the problem the tables need to have some data. Values in the aid and bid columns are not important. For the p_dt column use one of the following values: 20220731, 20220630. I will attach some sample data with which the problem can be reproduced. The tables look like below. ||aid||p_dt|| |611|20220731| |239|20220630| |...|...| The problem can be reproduced via qtest in current master (commit [6b05d64ce8c7161415d97a7896ea50025322e30a|https://github.com/apache/hive/commit/6b05d64ce8c7161415d97a7896ea50025322e30a]) by running the TestMiniLlapLocalCliDriver. 
There is a specific query plan (will attach shortly) for which the problem shows up, so if the plan changes slightly the problem may not appear anymore; this is why we need to explicitly set hive.optimize.semijoin.conversion and hive.auto.convert.join to trigger the problem. -- This message was sent by Atlassian Jira (v8.20.10#820010)
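For clarity, the filter expression in the repro just computes the previous day's date in yyyyMMdd form, so the WHERE clause compares against the 20220731 partition key. A quick stand-alone check of what the expression evaluates to (plain Java mirroring the SQL, not Hive code; the helper name is made up):

```java
import java.time.LocalDate;

public class PrevDayKey {

    // Mirrors translate(cast(to_date(date_sub('2022-08-01', 1)) AS string), '-', ''):
    // subtract one day, render as yyyy-MM-dd, then strip the dashes.
    static String prevDayKey(String isoDate) {
        return LocalDate.parse(isoDate).minusDays(1).toString().replace("-", "");
    }

    public static void main(String[] args) {
        System.out.println(prevDayKey("2022-08-01")); // 20220731
    }
}
```

Since the sample data contains a row for p_dt = 20220731, the empty result set returned by the query is clearly wrong.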
[jira] [Created] (HIVE-26642) Replace HiveFilterMergeRule with Calcite's built-in implementation
Stamatis Zampetakis created HIVE-26642: -- Summary: Replace HiveFilterMergeRule with Calcite's built-in implementation Key: HIVE-26642 URL: https://issues.apache.org/jira/browse/HIVE-26642 Project: Hive Issue Type: Improvement Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The rule was copied from Calcite to address HIVE-23389 as a temporary workaround until the next Calcite upgrade. Now that Hive is on Calcite 1.25.0 (HIVE-23456) the in-house copy can be removed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26638) Replace in-house CBO reduce expressions rules with Calcite's built-in classes
Stamatis Zampetakis created HIVE-26638: -- Summary: Replace in-house CBO reduce expressions rules with Calcite's built-in classes Key: HIVE-26638 URL: https://issues.apache.org/jira/browse/HIVE-26638 Project: Hive Issue Type: Improvement Components: CBO Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The goal of this ticket is to remove the Hive-specific code in [HiveReduceExpressionsRule|https://github.com/apache/hive/blob/b48c1bf11c4f75ba2c894e4732a96813ddde1414/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveReduceExpressionsRule.java] and use exclusively the respective Calcite classes (i.e., [ReduceExpressionsRule|https://github.com/apache/calcite/blob/2c30a56158cdd351d35725006bc1f76bb6aac75b/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java]) to reduce maintenance overhead and facilitate code evolution. Currently the only difference between the in-house (HiveReduceExpressionsRule) and built-in (ReduceExpressionsRule) reduce expressions rules lies in the way we treat the {{Filter}} operator (i.e., FilterReduceExpressionsRule). There are four Hive-specific differences when comparing the in-house code with the respective part in Calcite 1.25.0. +Match nullability when reducing expressions+ When we reduce filters we always set {{matchNullability}} (last parameter) to false. {code:java} if (reduceExpressions(filter, expList, predicates, true, false)) { {code} This means that the original and reduced expression can have a slightly different type in terms of nullability; the original is nullable and the reduced is not nullable. When the value is true the type can be preserved by adding a "nullability" CAST, which is a cast to the same type that differs only in whether it is nullable or not. Hardcoding {{matchNullability}} to false was done as part of the upgrade to Calcite 1.15.0 (HIVE-18068) where the behavior of the rule became configurable (CALCITE-2041). 
+Remove nullability cast explicitly+ When the expression is reduced we try to remove the nullability cast, if there is one. {code:java} if (RexUtil.isNullabilityCast(filter.getCluster().getTypeFactory(), newConditionExp)) { newConditionExp = ((RexCall) newConditionExp).getOperands().get(0); } {code} The code was added as part of the upgrade to Calcite 1.10.0 (HIVE-13316). However, the code is redundant as of HIVE-18068; setting {{matchNullability}} to {{false}} no longer generates nullability casts during the reduction. +Avoid creating filters with condition of type NULL+ {code:java} if(newConditionExp.getType().getSqlTypeName() == SqlTypeName.NULL) { newConditionExp = call.builder().cast(newConditionExp, SqlTypeName.BOOLEAN); } {code} Hive tries to cast such expressions to BOOLEAN to avoid the weird (and possibly problematic) situation of having a condition with NULL type. In Calcite, there is specific code for detecting whether the new condition is the NULL literal (with NULL type) and, if that's the case, it turns the relation into an empty one. {code:java} } else if (newConditionExp instanceof RexLiteral || RexUtil.isNullLiteral(newConditionExp, true)) { call.transformTo(createEmptyRelOrEquivalent(call, filter)); {code} Because of this, the Hive-specific code is redundant if the Calcite rule is used. +Bail out when input to reduceNotNullableFilter is not a RexCall+ {code:java} if (!(rexCall.getOperands().get(0) instanceof RexCall)) { // If child is not a RexCall instance, we can bail out return; } {code} The code was added as part of the upgrade to Calcite 1.10.0 (HIVE-13316) but it does not add any functional value. 
The instanceof check is redundant since the code in reduceNotNullableFilter [is a noop|https://github.com/apache/hive/blob/6e8fc53fb68898d1a404435859cea5bbc79200a4/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveReduceExpressionsRule.java#L228] when the expression/call is not one of the following: IS_NULL, IS_UNKNOWN, IS_NOT_NULL, which are all rex calls. +Summary+ All of the Hive-specific changes mentioned previously can be safely replaced by appropriate uses of the Calcite APIs without affecting the behavior of CBO. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26627) Remove HiveRelBuilder.aggregateCall override and refactor callers to use existing public methods
Stamatis Zampetakis created HIVE-26627: -- Summary: Remove HiveRelBuilder.aggregateCall override and refactor callers to use existing public methods Key: HIVE-26627 URL: https://issues.apache.org/jira/browse/HIVE-26627 Project: Hive Issue Type: Task Components: CBO Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The HiveRelBuilder overrides [aggregateCall|https://github.com/apache/hive/blob/8c3567ea8e423b202cde370f4d3fb401bcc23e46/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelBuilder.java#L246] from its superclass simply to expose and use it in HiveRewriteToDataSketchesRules. However, there is no real need to override this method since we can achieve the same outcome by using existing methods in RelBuilder that are easier to use and understand. Furthermore, it is safer to depend on public APIs since they are generally more stable. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26626) Cut dependencies between HiveXxPullUpConstantsRule and HiveReduceExpressionsRule
Stamatis Zampetakis created HIVE-26626: -- Summary: Cut dependencies between HiveXxPullUpConstantsRule and HiveReduceExpressionsRule Key: HIVE-26626 URL: https://issues.apache.org/jira/browse/HIVE-26626 Project: Hive Issue Type: Task Components: CBO Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis HiveSortPullUpConstantsRule and HiveUnionPullUpConstantsRule call the [predicateConstants|https://github.com/apache/hive/blob/8c3567ea8e423b202cde370f4d3fb401bcc23e46/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSortPullUpConstantsRule.java#L128] method from HiveReduceExpressionsRule. The method in HiveReduceExpressionsRule is deprecated and creates unnecessary dependencies among the rules. It can be replaced by a direct call to RexUtil.predicateConstants; the two methods are functionally equivalent. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26609) Cleanup external table directories created in qtests after test run
Stamatis Zampetakis created HIVE-26609: -- Summary: Cleanup external table directories created in qtests after test run Key: HIVE-26609 URL: https://issues.apache.org/jira/browse/HIVE-26609 Project: Hive Issue Type: Improvement Components: Tests Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis In many qtests we create external tables by explicitly setting the location of the table. [https://github.com/apache/hive/blob/566f48d3d3fc740ef958bdf963e511e0853da402/ql/src/test/queries/clientnegative/authorization_uri_create_table_ext.q#L7] If the test does not explicitly remove the directory (as happens above) then it remains there and may cause flakiness and unrelated test failures if other tests happen to use the same directory. A recent case where this problem appeared (a directory conflict between tests) is logged under HIVE-26584. There the solution was to add explicit rm commands in the qfiles. A more general solution would be to handle the cleanup of such directories inside [QTestUtil.clearTablesCreatedDuringTests|https://github.com/apache/hive/blob/566f48d3d3fc740ef958bdf963e511e0853da402/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java#L342]. The idea is to get the location of an external table from the metastore and remove the respective directory if it is under a known "safe" directory such as {{${system:test.tmp.dir}}}. As discussed under HIVE-26584, it might be risky to forcefully delete any kind of directory coming from an external table, at the risk of corrupting the development environment. If we restrict the cleanup to known directories it should be fine though. -- This message was sent by Atlassian Jira (v8.20.10#820010)
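The "delete only under a known safe root" guard described above could be sketched roughly as follows (a hypothetical helper, not the actual QTestUtil code; the method name, paths, and safe-root handling are assumptions):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ExternalTableCleanup {

    // Returns true only if the table location resolves to a path strictly
    // inside the safe root (e.g. the test tmp dir), so cleanup never
    // touches arbitrary directories from the developer's machine.
    static boolean isSafeToDelete(String tableLocation, String safeRoot) {
        Path location = Paths.get(tableLocation).toAbsolutePath().normalize();
        Path root = Paths.get(safeRoot).toAbsolutePath().normalize();
        return !location.equals(root) && location.startsWith(root);
    }

    public static void main(String[] args) {
        System.out.println(isSafeToDelete("/tmp/hive-qtest/ext_tbl", "/tmp/hive-qtest"));   // true
        System.out.println(isSafeToDelete("/home/user/data", "/tmp/hive-qtest"));           // false
        // normalize() defuses ".." traversal in a configured table location
        System.out.println(isSafeToDelete("/tmp/hive-qtest/../../etc", "/tmp/hive-qtest")); // false
    }
}
```

Normalizing both paths before the prefix check matters: a table location containing ".." segments would otherwise appear to be under the safe root while actually pointing outside it.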
[jira] [Created] (HIVE-26557) AbstractMethodError when running TestWebHCatE2e
Stamatis Zampetakis created HIVE-26557: -- Summary: AbstractMethodError when running TestWebHCatE2e Key: HIVE-26557 URL: https://issues.apache.org/jira/browse/HIVE-26557 Project: Hive Issue Type: Sub-task Components: HCatalog, Tests Reporter: Stamatis Zampetakis {code:bash} mvn test -Dtest=TestWebHCatE2e {code} {noformat} java.lang.AbstractMethodError: javax.ws.rs.core.UriBuilder.uri(Ljava/lang/String;)Ljavax/ws/rs/core/UriBuilder; at javax.ws.rs.core.UriBuilder.fromUri(UriBuilder.java:119) ~[javax.ws.rs-api-2.0.1.jar:2.0.1] at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:669) ~[jersey-servlet-1.19.jar:1.19] at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ~[javax.servlet-api-3.1.0.jar:3.1.0] at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.apache.hive.hcatalog.templeton.Main$XFrameOptionsFilter.doFilter(Main.java:355) ~[classes/:?] at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:650) ~[hadoop-auth-3.3.1.jar:?] at org.apache.hadoop.security.authentication.server.ProxyUserAuthenticationFilter.doFilter(ProxyUserAuthenticationFilter.java:104) ~[hadoop-common-3.3.1.jar:?] at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592) ~[hadoop-auth-3.3.1.jar:?] at org.apache.hadoop.hdfs.web.AuthFilter.doFilter(AuthFilter.java:51) ~[hadoop-hdfs-3.3.1.jar:?] 
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.Server.handle(Server.java:516) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] 
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) ~[jetty-io-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) ~[jetty-io-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) ~[jetty-io-9.4.40.v2021
[jira] [Created] (HIVE-26549) WebHCat server fails to start due to authentication filter configuration
Stamatis Zampetakis created HIVE-26549: -- Summary: WebHCat server fails to start due to authentication filter configuration Key: HIVE-26549 URL: https://issues.apache.org/jira/browse/HIVE-26549 Project: Hive Issue Type: Sub-task Components: HCatalog, Test Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The TestWebHCatE2e test fails because the server cannot start. The exception is shown below: {noformat} 2022-09-20T02:10:15,186 ERROR [main] templeton.Main: Server failed to start: javax.servlet.ServletException: Authentication type must be specified: simple|kerberos| at org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:164) ~[hadoop-auth-3.3.1.jar:?] at org.apache.hadoop.security.authentication.server.ProxyUserAuthenticationFilter.init(ProxyUserAuthenticationFilter.java:57) ~[hadoop-common-3.3.1.jar:?] at org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:140) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.servlet.ServletHandler.lambda$initialize$0(ServletHandler.java:731) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) ~[?:1.8.0_261] at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742) ~[?:1.8.0_261] at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) ~[?:1.8.0_261] at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:755) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:379) ~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:911) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:288) 
~[jetty-servlet-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73) ~[jetty-runner-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169) ~[jetty-runner-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:117) ~[jetty-runner-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:97) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73) ~[jetty-runner-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169) ~[jetty-runner-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.Server.start(Server.java:423) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:110) ~[jetty-runner-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:97) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.server.Server.doStart(Server.java:387) ~[jetty-server-9.4.40.v20210413.jar:9.4.40.v20210413] at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73) ~[jetty-runner-9.4.40.v20210413.jar:9.4.40.v20210413] at org.apache.hive.hcatalog.templeton.Main.runServer(Main.java:255) ~[classes/:?] at org.apache.hive.hcatalog.templeton.Main.run(Main.java:147) ~[classes/:?] at org.apache.hive.hcatalog.templeton.TestWebHCatE2e.startHebHcatInMem(TestWebHCatE2e.java:94) ~[test-classes/:?] 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_261] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_261] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_261] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) ~[junit-4.13.jar:4.13] at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) ~[junit-4.13.jar:4.13] at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) ~[junit-4.13.jar:4.13] at org.junit.internal.runners.statements.RunBefores.invokeMeth
[jira] [Created] (HIVE-26461) Add CI build check for macOS
Stamatis Zampetakis created HIVE-26461: -- Summary: Add CI build check for macOS Key: HIVE-26461 URL: https://issues.apache.org/jira/browse/HIVE-26461 Project: Hive Issue Type: Test Components: Build Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Add CI builds for Hive on macOS to verify that the project compiles successfully on this platform and to ensure that future changes do not break it accidentally. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26458) Add explicit dependency to commons-dbcp2 in hive-exec module
Stamatis Zampetakis created HIVE-26458: -- Summary: Add explicit dependency to commons-dbcp2 in hive-exec module Key: HIVE-26458 URL: https://issues.apache.org/jira/browse/HIVE-26458 Project: Hive Issue Type: Task Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Hive CBO relies on Calcite so there is a direct dependency towards Calcite in the hive-exec module. In turn, Calcite needs the commons-dbcp2 dependency in order to compile and run properly: https://github.com/apache/calcite/blob/b9c2099ea92a575084b55a206efc5dd341c0df62/core/build.gradle.kts#L69 In particular, the dependency is necessary in order to use the JDBC adapter and some of its usages are shown below: * https://github.com/apache/calcite/blob/257c81b5cac35e29598a246463356fea7e0b0336/core/src/main/java/org/apache/calcite/adapter/jdbc/JdbcUtils.java#L29 * https://github.com/apache/calcite/blob/257c81b5cac35e29598a246463356fea7e0b0336/core/src/main/java/org/apache/calcite/adapter/jdbc/JdbcUtils.java#L262 However, due to the [shading of Calcite|https://github.com/apache/hive/blob/778c838317c952dcd273fd6c7a51491746a1d807/ql/pom.xml#L1075] inside the hive-exec module, all the transitive dependencies coming from Calcite must be defined explicitly; otherwise they will not make it to the classpath. At the moment this does not pose a problem in master since the {{commons-dbcp2}} dependency comes transitively from other modules. 
But in certain Hive branches with slightly different dependencies between modules we have seen failures like the one shown below: {noformat} java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: org/apache/commons/dbcp2/BasicDataSource at org.apache.calcite.adapter.jdbc.JdbcUtils$DataSourcePool.(JdbcUtils.java:213) at org.apache.calcite.adapter.jdbc.JdbcUtils$DataSourcePool.(JdbcUtils.java:210) at org.apache.calcite.adapter.jdbc.JdbcSchema.dataSource(JdbcSchema.java:207) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genTableLogicalPlan(CalcitePlanner.java:3331) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5324) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1815) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1750) at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915) at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.plan(CalcitePlanner.java:1411) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:588) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13071) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:472) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:312) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:201) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:650) at 
org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:596) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:590) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:231) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:421) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:352) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:867) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:178) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:173) at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccesso
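A sketch of the proposed fix, assuming the standard Maven Central coordinates for commons-dbcp2 (the missing version would normally be supplied by dependencyManagement in the parent pom):

```xml
<!-- Sketch only: declare commons-dbcp2 explicitly in the hive-exec (ql) pom so that
     the shaded jar's runtime classpath includes it even when no other module pulls it
     in transitively. The version is intentionally omitted here on the assumption that
     it is managed centrally in the parent pom. -->
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-dbcp2</artifactId>
</dependency>
```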
[jira] [Created] (HIVE-26441) Add DatabaseAccessor unit tests for all methods and supported DBMS
Stamatis Zampetakis created HIVE-26441: -- Summary: Add DatabaseAccessor unit tests for all methods and supported DBMS Key: HIVE-26441 URL: https://issues.apache.org/jira/browse/HIVE-26441 Project: Hive Issue Type: Test Components: JDBC storage handler Reporter: Stamatis Zampetakis The [DatabaseAccessor|https://github.com/apache/hive/blob/9909edee8dad841e15fc36df81a2316bcb381bc3/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/dao/DatabaseAccessor.java] interface provides various APIs and has multiple concrete implementations, one for each supported DBMS. There are a few end-to-end tests for the JDBC storage handler (see [relevant|https://github.com/search?q=repo%3Aapache%2Fhive+filename%3A*jdbc*.q+extension%3Aq+filename%3A*jdbc*&type=Code] qfiles) and also a few unit tests ([TestGenericJdbcDatabaseAccessor|https://github.com/apache/hive/blob/9909edee8dad841e15fc36df81a2316bcb381bc3/jdbc-handler/src/test/java/org/apache/hive/storage/jdbc/dao/TestGenericJdbcDatabaseAccessor.java]) but we do not have enough coverage. Ideally we should have unit tests for each method present in the top-level interface and for each supported DBMS. The goal of this JIRA is to add more unit tests, similar to what {{TestGenericJdbcDatabaseAccessor}} is doing, covering more methods, use-cases, and DBMS. The scope of this JIRA can get quite big, so it makes sense to create additional sub-tasks for addressing specific cases. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26440) Duplicate hive-standalone-metastore-server dependency in QFile module
Stamatis Zampetakis created HIVE-26440: -- Summary: Duplicate hive-standalone-metastore-server dependency in QFile module Key: HIVE-26440 URL: https://issues.apache.org/jira/browse/HIVE-26440 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The hive-standalone-metastore-server dependency is defined two times in the QFile module ([pom.xml|https://github.com/apache/hive/blob/9909edee8dad841e15fc36df81a2316bcb381bc3/itests/qtest/pom.xml#L67]) leading to the following warning. {noformat} [INFO] Scanning for projects... [WARNING] [WARNING] Some problems were encountered while building the effective model for org.apache.hive:hive-it-qfile:jar:4.0.0-alpha-2-SNAPSHOT [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must be unique: org.apache.hive:hive-standalone-metastore-server:jar:tests -> duplicate declaration of version (?) @ line 67, column 17 [WARNING] [WARNING] It is highly recommended to fix these problems because they threaten the stability of your build. [WARNING] [WARNING] For this reason, future Maven versions might no longer support building such malformed projects. [WARNING] [INFO] [INFO] ---< org.apache.hive:hive-it-qfile > [INFO] Building Hive Integration - QFile Tests 4.0.0-alpha-2-SNAPSHOT [INFO] [ jar ]- {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
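The fix is simply to drop one of the two declarations. A sketch of the single declaration that would remain, with the classifier taken from the warning above (the scope and the use of {{$\{project.version\}}} are assumptions about how the qtest pom is organized):

```xml
<!-- Sketch: keep exactly one declaration of the tests-classified metastore-server
     artifact in itests/qtest/pom.xml; scope/version shown here are assumptions. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-standalone-metastore-server</artifactId>
  <version>${project.version}</version>
  <classifier>tests</classifier>
  <scope>test</scope>
</dependency>
```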
[jira] [Created] (HIVE-26427) Unify JoinDeriveIsNotNullFilterRule with HiveJoinAddNotNullRule
Stamatis Zampetakis created HIVE-26427: -- Summary: Unify JoinDeriveIsNotNullFilterRule with HiveJoinAddNotNullRule Key: HIVE-26427 URL: https://issues.apache.org/jira/browse/HIVE-26427 Project: Hive Issue Type: Improvement Components: CBO Reporter: Stamatis Zampetakis [JoinDeriveIsNotNullFilterRule|https://github.com/apache/calcite/blob/9bdd26159110663c4a207e3e8c378d1c3d16e034/core/src/main/java/org/apache/calcite/rel/rules/JoinDeriveIsNotNullFilterRule.java] was introduced recently in Calcite as part of CALCITE-3890. The rule has similar goals to HiveJoinAddNotNullRule (which has existed in Hive since HIVE-9581), so ideally (and in order to avoid maintaining the code twice) we should use the one provided by Calcite if possible. At this stage the rules are not identical, so we cannot replace one with the other immediately, but hopefully we can work together with the Calcite community to reuse common parts so that both projects can benefit from each other. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26404) HMS memory leak when compaction cleaner fails to remove obsolete files
Stamatis Zampetakis created HIVE-26404: -- Summary: HMS memory leak when compaction cleaner fails to remove obsolete files Key: HIVE-26404 URL: https://issues.apache.org/jira/browse/HIVE-26404 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 4.0.0-alpha-1 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis While investigating an issue where HMS becomes unresponsive we noticed a lot of failed attempts from the compaction Cleaner thread to remove obsolete directories with exceptions similar to the one below. {noformat} 2022-06-16 05:48:24,819 ERROR org.apache.hadoop.hive.ql.txn.compactor.Cleaner: [Cleaner-executor-thread-0]: Caught exception when cleaning, unable to complete cleaning of id:4410976,dbname:my_database,tableName:my_table,partName:day=20220502,state:,type:MAJOR,enqueueTime:0,start:0,properties:null,runAs:some_user,tooManyAborts:false,hasOldAbort:false,highestWriteId:187502,errorMessage:null java.io.IOException: Not enough history available for (187502,x). 
Oldest available base: hdfs://nameservice1/warehouse/tablespace/managed/hive/my_database.db/my_table/day=20220502/base_0188687_v4297872 at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:1432) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.removeFiles(Cleaner.java:261) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.access$000(Cleaner.java:71) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner$1.run(Cleaner.java:203) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.clean(Cleaner.java:200) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.lambda$run$0(Cleaner.java:105) at org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil$ThrowingRunnable.lambda$unchecked$0(CompactorUtil.java:54) at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {noformat} In addition the logs contained a large number of long JVM pauses as shown below and the HMS (RSZ) memory kept increasing at rate of 90MB per hour. 
{noformat} 2022-06-16 16:17:17,805 WARN org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor: [org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor$Monitor@5b022296]: Detected pause in JVM or host machine (eg GC): pause of approximately 34346ms 2022-06-16 16:17:21,497 INFO org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor: [org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor$Monitor@5b022296]: Detected pause in JVM or host machine (eg GC): pause of approximately 1690ms 2022-06-16 16:17:57,696 WARN org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor: [org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor$Monitor@5b022296]: Detected pause in JVM or host machine (eg GC): pause of approximately 34697ms 2022-06-16 16:18:01,326 INFO org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor: [org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor$Monitor@5b022296]: Detected pause in JVM or host machine (eg GC): pause of approximately 1628ms 2022-06-16 16:18:37,280 WARN org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor: [org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor$Monitor@5b022296]: Detected pause in JVM or host machine (eg GC): pause of approximately 34453ms 2022-06-16 16:18:40,927 INFO org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor: [org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor$Monitor@5b022296]: Detected pause in JVM or host machine (eg GC): pause of approximately 1646ms 2022-06-16 16:19:16,929 WARN org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor: [org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor$Monitor@5b022296]: Detected pause in JVM or host machine (eg GC): pause of approximately 33997ms 2022-06-16 16:19:20,572 INFO org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor: [org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor$Monitor@5b022296]: Detected pause in JVM or host machine (eg GC): pause of approximately 1637ms 2022-06-16 16:20:01,643 WARN 
org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor: [org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor$Monitor@5b022296]: Detected pause in JVM or host machine (eg GC): pause of approximately 39329ms 2022-06-16 16:20:05,572 INFO org.apache.hadoop.hive.metastore.metrics.JvmPauseMonitor: [org.apache.hadoop.h
[jira] [Created] (HIVE-26389) ALTER TABLE CASCADE is slow for tables with many partitions
Stamatis Zampetakis created HIVE-26389: -- Summary: ALTER TABLE CASCADE is slow for tables with many partitions Key: HIVE-26389 URL: https://issues.apache.org/jira/browse/HIVE-26389 Project: Hive Issue Type: Improvement Components: Metastore, Query Planning Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Attachments: native_sql_queries.txt, per_partition_sql_queries.txt Consider the following simplified scenario with a table having two partitions. {code:sql} CREATE TABLE student (fname string, lname string) PARTITIONED BY (department string); INSERT INTO student VALUES ('Alex','Dumas', 'Computer Science'); INSERT INTO student VALUES ('Victor','Hugo', 'Physics'); {code} Altering a column of this table and propagating the changes to the partitions (using the CASCADE syntax) is slow. {code:sql} ALTER TABLE student CHANGE lname lastname STRING CASCADE; {code} The seemingly simple ALTER statement outlined above triggers roughly 136 SQL queries in the underlying DBMS of the metastore (see native_sql_queries.txt). We can observe that some of these queries are recurring and appear as many times as there are partitions in the table (see per_partition_sql_queries.txt). As the number of partitions grows, so does the number of queries, so reducing the number of queries sent per partition, or making them more efficient, will have a positive impact on performance. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26350) IndexOutOfBoundsException when generating splits for external JDBC table with partition columns
Stamatis Zampetakis created HIVE-26350: -- Summary: IndexOutOfBoundsException when generating splits for external JDBC table with partition columns Key: HIVE-26350 URL: https://issues.apache.org/jira/browse/HIVE-26350 Project: Hive Issue Type: Bug Components: CBO, JDBC storage handler Reporter: Stamatis Zampetakis Create the following table in some JDBC database (e.g., Postgres). {code:sql} CREATE TABLE country ( id int, name varchar(20) ); {code} Create the following tables in Hive ensuring that the external JDBC table has the {{hive.sql.partitionColumn}} table property set. {code:sql} CREATE TABLE city (id int); CREATE EXTERNAL TABLE country ( id int, name varchar(20) ) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "POSTGRES", "hive.sql.jdbc.driver" = "org.postgresql.Driver", "hive.sql.jdbc.url" = "jdbc:postgresql://localhost:5432/qtestDB", "hive.sql.dbcp.username" = "qtestuser", "hive.sql.dbcp.password" = "qtestpassword", "hive.sql.table" = "country", "hive.sql.partitionColumn" = "name", "hive.sql.numPartitions" = "2" ); {code} The query below fails with IndexOutOfBoundsException when the mapper scanning the JDBC table tries to generate the splits by exploiting the partitioning column. {code:sql} select country.id from country cross join city; {code} The full stack trace is given below. 
{noformat} java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.rangeCheck(ArrayList.java:659) ~[?:1.8.0_261] at java.util.ArrayList.get(ArrayList.java:435) ~[?:1.8.0_261] at org.apache.hive.storage.jdbc.JdbcInputFormat.getSplits(JdbcInputFormat.java:102) [hive-jdbc-handler-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:564) [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:858) [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:263) [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:281) [tez-dag-0.10.1.jar:0.10.1] at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:272) [tez-dag-0.10.1.jar:0.10.1] at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_261] at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_261] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) [hadoop-common-3.1.0.jar:?] at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:272) [tez-dag-0.10.1.jar:0.10.1] at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:256) [tez-dag-0.10.1.jar:0.10.1] at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) [guava-19.0.jar:?] at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) [guava-19.0.jar:?] 
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) [guava-19.0.jar:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_261] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_261] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261] {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
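The failure mode above ("Index: 1, Size: 1") can be illustrated with a small standalone sketch. This is hypothetical code, not Hive's actual split logic: it merely shows how splitting a partition-column value range can yield fewer intervals than the requested number of partitions, so that indexing the interval list by partition id overruns it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the bug class, not Hive's JdbcInputFormat code:
// a tiny value range produces fewer intervals than numPartitions, so indexing
// the list by the requested partition id throws IndexOutOfBoundsException.
public class SplitSketch {
    static List<int[]> computeIntervals(int lo, int hi, int requested) {
        List<int[]> intervals = new ArrayList<>();
        int width = Math.max(1, (hi - lo) / requested); // degenerates to 1 for tiny ranges
        for (int start = lo; start < hi; start += width) {
            intervals.add(new int[] {start, Math.min(hi, start + width)});
        }
        return intervals;
    }

    public static void main(String[] args) {
        int numPartitions = 2;
        // A range with a single distinct value yields only one interval.
        List<int[]> intervals = computeIntervals(0, 1, numPartitions);
        // intervals.get(1) would now throw IndexOutOfBoundsException (Size: 1).
        System.out.println(intervals.size() < numPartitions); // prints "true"
    }
}
```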
[jira] [Created] (HIVE-26349) TestOperatorCmp/TestReOptimization fail silently due to incompatible configuration
Stamatis Zampetakis created HIVE-26349: -- Summary: TestOperatorCmp/TestReOptimization fail silently due to incompatible configuration Key: HIVE-26349 URL: https://issues.apache.org/jira/browse/HIVE-26349 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Running TestOperatorCmp, TestReOptimization currently in master (https://github.com/apache/hive/commit/10e5381cb6a4215c0b25fe0cda0a26a084ba6a89) shows BUILD SUCCESS although the tests are actually failing when executing the {{@BeforeClass}} logic. Since the error appears inside {{@BeforeClass}} the failure remains unnoticed and the only indication that something is wrong is given by the INFO line below: {noformat} [INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0 {noformat} +Steps to reproduce:+ {code:bash} mvn test -Dtest=TestOperatorCmp mvn test -Dtest=TestReOptimization {code} {noformat} [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hive-exec --- [INFO] [INFO] --- [INFO] T E S T S [INFO] --- [INFO] Running org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp [INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.732 s - in org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp [INFO] [INFO] Results: [INFO] [INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0 [INFO] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 18.962 s [INFO] Finished at: 2022-06-22T12:49:54+02:00 [INFO] {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26343) TestWebHCatE2e causes surefire fork to exit and fails
Stamatis Zampetakis created HIVE-26343: -- Summary: TestWebHCatE2e causes surefire fork to exit and fails Key: HIVE-26343 URL: https://issues.apache.org/jira/browse/HIVE-26343 Project: Hive Issue Type: Bug Components: HCatalog, Testing Infrastructure Reporter: Stamatis Zampetakis Any attempt to run TestWebHCatE2e in current master ([https://github.com/apache/hive/commit/948f9fb56a00e981cd653146de44ae82307b4f2f]) causes the surefire fork to exit and the test fails. {noformat} cd hcatalog/webhcat/svr && mvn test -Dtest=TestWebHCatE2e [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M4:test (default-test) on project hive-webhcat: There are test failures. [ERROR] [ERROR] Please refer to /home/stamatis/Projects/Apache/hive/hcatalog/webhcat/svr/target/surefire-reports for the individual test results. [ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream. [ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called? [ERROR] Command was /bin/sh -c cd /home/stamatis/Projects/Apache/hive/hcatalog/webhcat/svr && /opt/jdks/jdk1.8.0_261/jre/bin/java -Xmx2048m -jar /home/stamatis/Projects/Apache/hive/hcatalog/webhcat/svr/target/surefire/surefirebooter4564605288390864592.jar /home/stamatis/Projects/Apache/hive/hcatalog/webhcat/svr/target/surefire 2022-06-20T16-29-05_858-jvmRun1 surefire4795088574293215609tmp surefire_01535173811171404671tmp [ERROR] Error occurred in starting fork, check output in log [ERROR] Process Exit Code: 1 [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called? 
[ERROR] Command was /bin/sh -c cd /home/stamatis/Projects/Apache/hive/hcatalog/webhcat/svr && /opt/jdks/jdk1.8.0_261/jre/bin/java -Xmx2048m -jar /home/stamatis/Projects/Apache/hive/hcatalog/webhcat/svr/target/surefire/surefirebooter4564605288390864592.jar /home/stamatis/Projects/Apache/hive/hcatalog/webhcat/svr/target/surefire 2022-06-20T16-29-05_858-jvmRun1 surefire4795088574293215609tmp surefire_01535173811171404671tmp [ERROR] Error occurred in starting fork, check output in log [ERROR] Process Exit Code: 1 [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:513) [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:460) [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:301) [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:249) [ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1217) [ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1063) [ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:889) [ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137) [ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:210) [ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:156) [ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:148) [ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117) [ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81) [ERROR] at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56) [ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128) [ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:305) [ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192) [ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105) [ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:957) [ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:289) [ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:193) [ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [ERROR] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [ERROR] at java.lang.reflect.Method.invoke(Method.java:498) [ERROR]
[jira] [Created] (HIVE-26332) Upgrade maven-surefire-plugin to 3.0.0-M7
Stamatis Zampetakis created HIVE-26332: -- Summary: Upgrade maven-surefire-plugin to 3.0.0-M7 Key: HIVE-26332 URL: https://issues.apache.org/jira/browse/HIVE-26332 Project: Hive Issue Type: Task Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Currently we use 3.0.0-M4, which was released in 2019. Since then there have been multiple bug fixes and improvements: [https://issues.apache.org/jira/issues/?jql=project%20%3D%20SUREFIRE%20AND%20(fixVersion%20%3D%203.0.0-M5%20OR%20fixVersion%20%3D%203.0.0-M6%20OR%20fixVersion%20%3D%203.0.0-M7)%20ORDER%20BY%20resolutiondate%20%20DESC%2C%20key] It is worth mentioning that the interaction with JUnit 5 is much more mature as well; this is one of the main reasons driving this upgrade. -- This message was sent by Atlassian Jira (v8.20.7#820007)
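The change itself is a one-line version bump; a sketch of the plugin declaration after the upgrade (where exactly the version lives in Hive's poms, e.g. as a property, is an assumption):

```xml
<!-- Sketch of the bumped plugin declaration; the surrounding pom structure
     and any version property indirection are assumptions. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>3.0.0-M7</version>
</plugin>
```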
[jira] [Created] (HIVE-26331) Use maven-surefire-plugin version consistently in standalone-metastore modules
Stamatis Zampetakis created HIVE-26331: -- Summary: Use maven-surefire-plugin version consistently in standalone-metastore modules Key: HIVE-26331 URL: https://issues.apache.org/jira/browse/HIVE-26331 Project: Hive Issue Type: Task Components: Standalone Metastore, Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Due to some problems in the pom.xml files inside the standalone-metastore modules we end up using different maven-surefire-plugin versions. Most of the modules use 3.0.0-M4, which is the expected one, while the {{hive-standalone-metastore-common}} uses the older 2.22.0 version. +Actual+ {noformat} [INFO] --- maven-surefire-plugin:2.22.0:test (default-test) @ hive-standalone-metastore-common --- [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hive-metastore --- [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hive-standalone-metastore-server --- [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ metastore-tools-common --- [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hive-metastore-benchmarks --- {noformat} The goal of this JIRA is to ensure we use the same version consistently in all modules. +Expected+ {noformat} [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hive-standalone-metastore-common --- [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hive-metastore --- [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hive-standalone-metastore-server --- [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ metastore-tools-common --- [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hive-metastore-benchmarks --- {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
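One idiomatic way to enforce a single version is to pin it once under pluginManagement so that all child modules inherit it. This is a sketch under the assumption that the standalone-metastore parent pom is the right place for it:

```xml
<!-- Sketch: pinning the surefire version in the standalone-metastore parent pom
     so that hive-standalone-metastore-common stops falling back to 2.22.0. -->
<pluginManagement>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <version>3.0.0-M4</version>
    </plugin>
  </plugins>
</pluginManagement>
```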
[jira] [Created] (HIVE-26312) Use default digest normalization strategy in CBO
Stamatis Zampetakis created HIVE-26312: -- Summary: Use default digest normalization strategy in CBO Key: HIVE-26312 URL: https://issues.apache.org/jira/browse/HIVE-26312 Project: Hive Issue Type: Task Components: CBO Affects Versions: 4.0.0-alpha-1 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis CALCITE-2450 introduced a way to improve planning time by normalizing some query expressions (RexNodes). The behavior can be enabled/disabled via the following system property: calcite.enable.rexnode.digest.normalize There was an attempt to disable the normalization explicitly in HIVE-23456 to avoid rendering the HiveFilterSortPredicates rule useless. However, the [way the normalization is disabled now|https://github.com/apache/hive/blob/f29cb2245c97102975ea0dd73783049eaa0947a0/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L549] depends on the way classes are loaded. If for some reason CalciteSystemProperty is loaded before hitting the respective line in Hive.java, setting the property will not have any effect. After HIVE-26238 the behavior of the rule does not depend on the value of the property, so there is nothing holding us back from enabling the normalization. At the moment there is no strong reason to enable or disable the normalization explicitly, so it is better to rely on the default value provided by Calcite to avoid running with a different normalization strategy when the class loading order changes. -- This message was sent by Atlassian Jira (v8.20.7#820007)
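The class-loading hazard can be seen in a self-contained sketch (hypothetical class and property names, standing in for CalciteSystemProperty): a static final field caches the system property at class-initialization time, so setting the property afterwards has no effect.

```java
public class PropertyOrderDemo {
    // Hypothetical stand-in for CalciteSystemProperty: the property is read once,
    // when this class is initialized, and cached in a static final field.
    static class Config {
        static final boolean NORMALIZE =
                Boolean.parseBoolean(System.getProperty("demo.normalize", "true"));
    }

    public static void main(String[] args) {
        // Touching Config here triggers class initialization and caches "true".
        System.out.println(Config.NORMALIZE);   // prints "true"
        // Setting the property afterwards is too late to change the cached value.
        System.setProperty("demo.normalize", "false");
        System.out.println(Config.NORMALIZE);   // still prints "true"
    }
}
```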
[jira] [Created] (HIVE-26310) Remove unused junit runners from test-utils module
Stamatis Zampetakis created HIVE-26310: -- Summary: Remove unused junit runners from test-utils module Key: HIVE-26310 URL: https://issues.apache.org/jira/browse/HIVE-26310 Project: Hive Issue Type: Task Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The two classes under https://github.com/apache/hive/tree/master/testutils/src/java/org/apache/hive/testutils/junit/runners, namely: * [ConcurrentTestRunner|https://github.com/apache/hive/blob/fe0f1a648b14cdf27edcf7a5d323cbd060104ebf/testutils/src/java/org/apache/hive/testutils/junit/runners/ConcurrentTestRunner.java] * [ConcurrentScheduler|https://github.com/apache/hive/blob/fe0f1a648b14cdf27edcf7a5d323cbd060104ebf/testutils/src/java/org/apache/hive/testutils/junit/runners/model/ConcurrentScheduler.java] were introduced a long time ago by HIVE-2935 to somewhat parallelize the execution of {{TestBeeLineDriver}}. However, since HIVE-1 (resolved 6 years ago) they have not been used by anyone and are unlikely to be used again in the future, since there are much more modern alternatives. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26309) Remove Log4jConfig junit extension in favor of LoggerContextSource
Stamatis Zampetakis created HIVE-26309: -- Summary: Remove Log4jConfig junit extension in favor of LoggerContextSource Key: HIVE-26309 URL: https://issues.apache.org/jira/browse/HIVE-26309 Project: Hive Issue Type: Task Components: Testing Infrastructure Affects Versions: 4.0.0-alpha-1 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The Log4JConfig JUnit extension was introduced by HIVE-24588 in order to facilitate running tests with a specific log4j2 configuration. However, there is a very similar and seemingly more powerful JUnit extension in the official LOG4J2 release/repo, i.e., [LoggerContextSource|https://github.com/apache/logging-log4j2/blob/eedc3cdb6be6744071f8ae6dcfb37b26b1fc0940/log4j-core/src/test/java/org/apache/logging/log4j/junit/LoggerContextSource.java]. The goal of this JIRA is to remove the code related to Log4jConfig from the Hive repo and replace its usages with LoggerContextSource. By doing this we reduce the maintenance overhead for the Hive community and reduce the dependencies on log4j-core. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26296) RuntimeException when executing EXPLAIN CBO JOINCOST on query with JDBC tables
Stamatis Zampetakis created HIVE-26296: -- Summary: RuntimeException when executing EXPLAIN CBO JOINCOST on query with JDBC tables Key: HIVE-26296 URL: https://issues.apache.org/jira/browse/HIVE-26296 Project: Hive Issue Type: Bug Components: CBO, HiveServer2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Consider a JDBC database with two tables _author_ and _book_. {code:sql} CREATE EXTERNAL TABLE author ( id int, fname varchar(20), lname varchar(20) ) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "MYSQL", "hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver", ... "hive.sql.table" = "author" ); CREATE EXTERNAL TABLE book ( id int, title varchar(100), author int ) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "MYSQL", "hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver", ... "hive.sql.table" = "book" ); {code} Executing an {{EXPLAIN CBO JOINCOST}} with a query joining two JDBC tables fails with {{RuntimeException}} while trying to compute the selectivity of the join. 
{code:sql} EXPLAIN CBO JOINCOST SELECT a.lname, b.title FROM author a JOIN book b ON a.id=b.author; {code} +Stacktrace+ {noformat} java.lang.RuntimeException: Unexpected Join type: org.apache.calcite.adapter.jdbc.JdbcRules$JdbcJoin at org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdSelectivity.computeInnerJoinSelectivity(HiveRelMdSelectivity.java:156) at org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdSelectivity.getSelectivity(HiveRelMdSelectivity.java:68) at GeneratedMetadataHandler_Selectivity.getSelectivity_$(Unknown Source) at GeneratedMetadataHandler_Selectivity.getSelectivity(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getSelectivity(RelMetadataQuery.java:426) at org.apache.calcite.rel.metadata.RelMdUtil.getJoinRowCount(RelMdUtil.java:736) at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:195) at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:212) at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:140) at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:212) at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:191) at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:212) at org.apache.calcite.rel.externalize.RelWriterImpl.explain_(RelWriterImpl.java:100) at org.apache.calcite.rel.externalize.RelWriterImpl.done(RelWriterImpl.java:144) at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246) at 
org.apache.calcite.plan.RelOptUtil.toString(RelOptUtil.java:2308) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:648) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12699) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:460) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:495) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:447) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:412) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:406) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255) at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:200) at org.apache.hadoop
[jira] [Created] (HIVE-26290) Remove useless calls to DateTimeFormatter#withZone without assignment
Stamatis Zampetakis created HIVE-26290: -- Summary: Remove useless calls to DateTimeFormatter#withZone without assignment Key: HIVE-26290 URL: https://issues.apache.org/jira/browse/HIVE-26290 Project: Hive Issue Type: Task Components: HiveServer2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis There are some places in the code that call {{DateTimeFormatter#withZone}} without assigning the result anywhere. This makes the call useless: the method does not modify the formatter instance but always creates a new one. -- This message was sent by Atlassian Jira (v8.20.7#820007)
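To make the pattern concrete, here is a minimal standalone sketch (not code from the Hive tree; class and variable names are made up) showing why a bare {{withZone}} call is a no-op:

```java
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class WithZoneDemo {
    public static void main(String[] args) {
        DateTimeFormatter fmt = DateTimeFormatter.ISO_LOCAL_DATE_TIME;

        // Useless: withZone returns a NEW formatter and leaves fmt untouched.
        fmt.withZone(ZoneId.of("UTC"));
        System.out.println(fmt.getZone());   // null: fmt still has no zone

        // Correct: keep the returned instance.
        DateTimeFormatter zoned = fmt.withZone(ZoneId.of("UTC"));
        System.out.println(zoned.getZone()); // UTC
    }
}
```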
[jira] [Created] (HIVE-26289) Remove useless try catch in DataWritableReadSupport#getWriterDateProleptic
Stamatis Zampetakis created HIVE-26289: -- Summary: Remove useless try catch in DataWritableReadSupport#getWriterDateProleptic Key: HIVE-26289 URL: https://issues.apache.org/jira/browse/HIVE-26289 Project: Hive Issue Type: Task Components: HiveServer2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis {code:java} try { if (value != null) { return Boolean.valueOf(value); } } catch (DateTimeException e) { throw new RuntimeException("Can't parse writer proleptic property stored in file metadata", e); } {code} {{Boolean.valueOf}} never throws a {{DateTimeException}}, so the try/catch block is completely useless. -- This message was sent by Atlassian Jira (v8.20.7#820007)
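For reference, {{Boolean.valueOf(String)}} is total: it returns {{true}} only for the case-insensitive string "true" and {{false}} for everything else, including null, so no exception can escape it. A standalone sketch (not Hive code):

```java
public class BooleanValueOfDemo {
    public static void main(String[] args) {
        // No input makes Boolean.valueOf throw; unparseable strings yield false.
        System.out.println(Boolean.valueOf("true"));        // true
        System.out.println(Boolean.valueOf("TrUe"));        // true (case-insensitive)
        System.out.println(Boolean.valueOf("garbage"));     // false
        System.out.println(Boolean.valueOf((String) null)); // false
    }
}
```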
[jira] [Created] (HIVE-26281) Missing statistics when requesting partition by names via HS2
Stamatis Zampetakis created HIVE-26281: -- Summary: Missing statistics when requesting partition by names via HS2 Key: HIVE-26281 URL: https://issues.apache.org/jira/browse/HIVE-26281 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The [Hive#getPartitionsByNames|https://github.com/apache/hive/blob/6626b5564ee206db5a656d2f611ed71f10a0ffc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4155] method can be used to obtain partition objects from the metastore by specifying their names and other options. {code:java} public List<Partition> getPartitionsByNames(Table tbl, List<String> partNames, boolean getColStats){code} However, the partition statistics are missing from the returned objects no matter the value of the {{getColStats}} parameter. The problem is [here|https://github.com/apache/hive/blob/6626b5564ee206db5a656d2f611ed71f10a0ffc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4174] and was caused by HIVE-24743. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26279) Drop unused requests from TestHiveMetaStoreClientApiArgumentsChecker
Stamatis Zampetakis created HIVE-26279: -- Summary: Drop unused requests from TestHiveMetaStoreClientApiArgumentsChecker Key: HIVE-26279 URL: https://issues.apache.org/jira/browse/HIVE-26279 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Some tests in TestHiveMetaStoreClientApiArgumentsChecker create requests without ever using them, so they are basically dead code that can be removed. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26278) Add unit tests for Hive#getPartitionsByNames using batching
Stamatis Zampetakis created HIVE-26278: -- Summary: Add unit tests for Hive#getPartitionsByNames using batching Key: HIVE-26278 URL: https://issues.apache.org/jira/browse/HIVE-26278 Project: Hive Issue Type: Task Components: HiveServer2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis [Hive#getPartitionsByNames|https://github.com/apache/hive/blob/6626b5564ee206db5a656d2f611ed71f10a0ffc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4155] supports decomposing requests in batches but there are no unit tests checking for the ValidWriteIdList when batching is used. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26270) Wrong timestamps when reading Hive 3.1.x Parquet files with vectorized reader
Stamatis Zampetakis created HIVE-26270: -- Summary: Wrong timestamps when reading Hive 3.1.x Parquet files with vectorized reader Key: HIVE-26270 URL: https://issues.apache.org/jira/browse/HIVE-26270 Project: Hive Issue Type: Bug Components: HiveServer2, Parquet Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Parquet files written in Hive 3.1.x onwards with timezone set to US/Pacific. {code:sql} CREATE TABLE employee (eid INT, birth timestamp) STORED AS PARQUET; INSERT INTO employee VALUES (1, '1880-01-01 00:00:00'), (2, '1884-01-01 00:00:00'), (3, '1990-01-01 00:00:00'); {code} Parquet files read with Hive 4.0.0-alpha-1 onwards. +Without vectorization+ results are correct. {code:sql} SELECT * FROM employee; {code} {noformat} 1 1880-01-01 00:00:00 2 1884-01-01 00:00:00 3 1990-01-01 00:00:00 {noformat} +With vectorization+ some timestamps are shifted. {code:sql} -- Disable fetch task conversion to force vectorization to kick in set hive.fetch.task.conversion=none; SELECT * FROM employee; {code} {noformat} 1 1879-12-31 23:52:58 2 1884-01-01 00:00:00 3 1990-01-01 00:00:00 {noformat} The problem is the same as the one reported under HIVE-24074. The data were written using the new Date/Time APIs (java.time) in Hive 3.1.3 and here they were read using the old APIs (java.sql). The difference with HIVE-24074 is that here the problem appears only for vectorized execution while the non-vectorized reader works fine, so there is an *inconsistency in the behavior* of vectorized and non-vectorized readers. The non-vectorized reader works fine because it automatically derives that it should use the new JDK APIs to read back the timestamp value. This is possible in this case because there is metadata in the file (i.e., the presence of {{writer.time.zone}}) from which it can infer that the timestamps were written using the new Date/Time APIs. The inconsistent behavior between the vectorized and non-vectorized reader is a regression caused by HIVE-25104. 
This JIRA is an attempt to re-align the behavior between vectorized and non-vectorized readers. Note that if the file metadata are empty, neither the vectorized nor the non-vectorized reader can determine which APIs to use for the conversion; in this case it is necessary for the user to set {{hive.parquet.timestamp.legacy.conversion.enabled}} explicitly to get back the correct results. -- This message was sent by Atlassian Jira (v8.20.7#820007)
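The 7 minute 2 second shift in the first row matches the local mean time (LMT) offset that java.time applies to US/Pacific for dates before standard time zones were adopted in 1883, whereas the legacy java.sql path uses the fixed -08:00 offset. A standalone sketch of that offset difference (not Hive code; the offset values come from the JDK's bundled tz database):

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class PacificLmtDemo {
    public static void main(String[] args) {
        // In 1880 US/Pacific was on local mean time: -07:52:58, not -08:00.
        ZoneOffset offset = ZoneId.of("US/Pacific").getRules()
                .getOffset(LocalDateTime.of(1880, 1, 1, 0, 0, 0));
        System.out.println(offset); // -07:52:58

        // The gap to the fixed -08:00 offset is 422 seconds = 7m02s,
        // exactly the shift seen in the vectorized output (23:52:58 vs 00:00:00).
        int gap = offset.getTotalSeconds() - ZoneOffset.ofHours(-8).getTotalSeconds();
        System.out.println(gap); // 422
    }
}
```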
[jira] [Created] (HIVE-26238) Decouple sort filter predicates optimization from digest normalization in CBO
Stamatis Zampetakis created HIVE-26238: -- Summary: Decouple sort filter predicates optimization from digest normalization in CBO Key: HIVE-26238 URL: https://issues.apache.org/jira/browse/HIVE-26238 Project: Hive Issue Type: Improvement Components: CBO Affects Versions: 4.0.0-alpha-1 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis HIVE-21857 introduced an optimization for ordering predicates inside a filter based on a cost function. After HIVE-23456, this optimization can run only if the digest normalization (introduced in CALCITE-2450) in CBO is disabled (via {{calcite.enable.rexnode.digest.normalize}}). The goal of this issue is to decouple the sort predicate optimization from digest normalization. After the changes here, the optimization shouldn't be affected by the value of the {{calcite.enable.rexnode.digest.normalize}} property. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26168) EXPLAIN DDL command output is not deterministic
Stamatis Zampetakis created HIVE-26168: -- Summary: EXPLAIN DDL command output is not deterministic Key: HIVE-26168 URL: https://issues.apache.org/jira/browse/HIVE-26168 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Stamatis Zampetakis The EXPLAIN DDL command (HIVE-24596) can be used to recreate the schema for a given query in order to debug planner issues. This is achieved by fetching information from the metastore and outputting a series of DDL commands. The output commands, though, may appear in a different order across runs since there is no mechanism enforcing an explicit order. Consider for instance the following scenario. {code:sql} CREATE TABLE customer ( `c_custkey` bigint, `c_name` string, `c_address` string ); INSERT INTO customer VALUES (1, 'Bob', '12 avenue Mansart'), (2, 'Alice', '24 avenue Mansart'); EXPLAIN DDL SELECT c_custkey FROM customer WHERE c_name = 'Bob'; {code} +Result 1+ {noformat} ALTER TABLE default.customer UPDATE STATISTICS SET('numRows'='2','rawDataSize'='48' ); ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_address SET('avgColLen'='17.0','maxColLen'='17','numNulls'='0','numDVs'='2' ); -- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_address BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwbec/QPAjtBF ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_custkey SET('lowValue'='1','highValue'='2','numNulls'='0','numDVs'='2' ); -- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_custkey BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwfO+SIOOofED ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_name SET('avgColLen'='4.0','maxColLen'='5','numNulls'='0','numDVs'='2' ); -- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_name BUT THEY ARE NOT SUPPORTED YET. 
THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAIChJLg1AGD1aCNBg== {noformat} +Result 2+ {noformat} ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_custkey SET('lowValue'='1','highValue'='2','numNulls'='0','numDVs'='2' ); -- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_custkey BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwfO+SIOOofED ALTER TABLE default.customer UPDATE STATISTICS SET('numRows'='2','rawDataSize'='48' ); ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_address SET('avgColLen'='17.0','maxColLen'='17','numNulls'='0','numDVs'='2' ); -- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_address BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwbec/QPAjtBF ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_name SET('avgColLen'='4.0','maxColLen'='5','numNulls'='0','numDVs'='2' ); -- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_name BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAIChJLg1AGD1aCNBg== {noformat} The two results are equivalent but the statements appear in a different order. This is not a big issue because the results remain correct, but it may lead to test flakiness so it might be worth addressing. -- This message was sent by Atlassian Jira (v8.20.7#820007)
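One straightforward way to get stable output, sketched below with plain strings (the statement list and the sorting step are illustrative, not the actual EXPLAIN DDL implementation), is to sort the generated statements before emitting them:

```java
import java.util.Arrays;
import java.util.List;

public class DeterministicDdlDemo {
    public static void main(String[] args) {
        // Hypothetical statements collected in whatever order the metastore returned them.
        List<String> statements = Arrays.asList(
            "ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_custkey SET(...);",
            "ALTER TABLE default.customer UPDATE STATISTICS SET(...);",
            "ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_address SET(...);");

        // Sorting lexicographically makes the emitted DDL independent of
        // metastore iteration order, which removes the test flakiness.
        statements.stream().sorted().forEach(System.out::println);
    }
}
```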
[jira] [Created] (HIVE-26166) Make website GDPR compliant
Stamatis Zampetakis created HIVE-26166: -- Summary: Make website GDPR compliant Key: HIVE-26166 URL: https://issues.apache.org/jira/browse/HIVE-26166 Project: Hive Issue Type: Task Components: Website Reporter: Stamatis Zampetakis Per the email that was sent out from privacy, we need to make the Hive website GDPR compliant. # The link to the privacy policy needs to be updated from [https://hive.apache.org/privacy_policy.html] to [https://privacy.apache.org/policies/privacy-policy-public.html] # The Google Analytics service must be removed -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26126) Allow capturing/validating SQL generated from HMS calls in qtests
Stamatis Zampetakis created HIVE-26126: -- Summary: Allow capturing/validating SQL generated from HMS calls in qtests Key: HIVE-26126 URL: https://issues.apache.org/jira/browse/HIVE-26126 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis During the compilation/execution of a Hive command there are usually calls to the HiveMetastore (HMS). Most of the time these calls need to connect to the underlying database backend in order to return the requested information, so they trigger the generation and execution of SQL queries. We have a lot of code in Hive which affects the generation and execution of these SQL queries; two vivid examples are the {{MetaStoreDirectSql}} and {{CachedStore}} classes. [MetaStoreDirectSql|https://github.com/apache/hive/blob/e8f3a6cdc22c6a4681af2ea5763c80a5b76e310b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java] is responsible for explicitly building SQL queries for performance reasons. [CachedStore|https://github.com/apache/hive/blob/e8f3a6cdc22c6a4681af2ea5763c80a5b76e310b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java] is responsible for caching certain requests to avoid going to the database on every call. Ensuring that the generated SQL is the expected one and/or that certain queries are hitting (or not hitting) the DB is valuable for catching regressions and evaluating the effectiveness of caches. The idea is that for each Hive command/query in some qtest there is an option to include in the output (.q.out) the list of SQL queries that were generated by HMS calls. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-26095) Add queryid in QueryLifeTimeHookContext
Stamatis Zampetakis created HIVE-26095: -- Summary: Add queryid in QueryLifeTimeHookContext Key: HIVE-26095 URL: https://issues.apache.org/jira/browse/HIVE-26095 Project: Hive Issue Type: New Feature Components: Hooks Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Fix For: 4.0.0-alpha-2 A [QueryLifeTimeHook|https://github.com/apache/hive/blob/6c0b86ef0cfc67c5acb3468408e1d46fa6ef8024/ql/src/java/org/apache/hadoop/hive/ql/hooks/QueryLifeTimeHook.java] is executed at various points in the life-cycle of a query, but it is not always possible to obtain the id of the query. The query id is inside the {{HookContext}} but the latter is not always available, notably during compilation. The query id is useful for many purposes as it is the only way to uniquely identify the query/command that is currently running. It is also the only way to match together events appearing in the before and after methods. The goal of this jira is to add the query id in [QueryLifeTimeHookContext|https://github.com/apache/hive/blob/6c0b86ef0cfc67c5acb3468408e1d46fa6ef8024/ql/src/java/org/apache/hadoop/hive/ql/hooks/QueryLifeTimeHookContext.java] and make it available during all life-cycle events. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-26022) Error: ORA-00904 when initializing metastore schema in Oracle
Stamatis Zampetakis created HIVE-26022: -- Summary: Error: ORA-00904 when initializing metastore schema in Oracle Key: HIVE-26022 URL: https://issues.apache.org/jira/browse/HIVE-26022 Project: Hive Issue Type: Bug Components: Standalone Metastore Reporter: Stamatis Zampetakis Fix For: 4.0.0-alpha-1 The Metastore schema tool fails to create the database schema when the underlying backend is Oracle. The initialization script fails while creating the "REPLICATION_METRICS" table: {noformat} 338/362 --Create table replication metrics 339/362 CREATE TABLE "REPLICATION_METRICS" ( "RM_SCHEDULED_EXECUTION_ID" number PRIMARY KEY, "RM_POLICY" varchar2(256) NOT NULL, "RM_DUMP_EXECUTION_ID" number NOT NULL, "RM_METADATA" varchar2(4000), "RM_PROGRESS" varchar2(4000), "RM_START_TIME" integer NOT NULL, "MESSAGE_FORMAT" VARCHAR(16) DEFAULT 'json-0.2', ); Error: ORA-00904: : invalid identifier (state=42000,code=904) {noformat} The trailing comma after the "MESSAGE_FORMAT" column definition is most likely what triggers the ORA-00904 error. The problem can be reproduced by running the {{ITestOracle}}. {noformat} mvn -pl standalone-metastore/metastore-server verify -DskipITests=false -Dit.test=ITestOracle -Dtest=nosuch {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-26021) Change integration tests under DBInstallBase to regular unit tests
Stamatis Zampetakis created HIVE-26021: -- Summary: Change integration tests under DBInstallBase to regular unit tests Key: HIVE-26021 URL: https://issues.apache.org/jira/browse/HIVE-26021 Project: Hive Issue Type: Improvement Components: Tests Reporter: Stamatis Zampetakis After HIVE-18588, some tests including those under the [DBInstallBase|https://github.com/apache/hive/blob/1139c4b14db82a9e2316196819b35cfb713f34b5/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/DbInstallBase.java] class have been marked as integration tests, mainly to keep the test duration low. Nowadays, Hive developers rarely run all tests locally, so separating integration tests from unit tests does not provide a clear benefit. The separation adds maintenance cost and makes their execution more difficult, scaring people away. The goal of this issue is to change the tests under {{DBInstallBase}} from "integration" tests back to regular unit tests and run them as part of the standard maven test phase without any fancy arguments. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-26020) Set dependency scope for json-path, commons-compiler and janino to runtime
Stamatis Zampetakis created HIVE-26020: -- Summary: Set dependency scope for json-path, commons-compiler and janino to runtime Key: HIVE-26020 URL: https://issues.apache.org/jira/browse/HIVE-26020 Project: Hive Issue Type: Improvement Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis These dependencies are necessary only when running Hive. They are not required during compilation since Hive does not depend on them directly but only transitively through Calcite. Changing the scope to runtime makes the intention clear and guards against accidental usage in Hive. -- This message was sent by Atlassian Jira (v8.20.1#820001)
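The change amounts to declaring the scope explicitly on the relevant dependency entries, along these lines (a sketch of a single entry; the version is assumed to be managed elsewhere in the pom):

```xml
<dependency>
  <groupId>com.jayway.jsonpath</groupId>
  <artifactId>json-path</artifactId>
  <!-- Needed on the runtime classpath only (pulled in transitively through
       Calcite); runtime scope prevents accidental compile-time usage. -->
  <scope>runtime</scope>
</dependency>
```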
[jira] [Created] (HIVE-26019) Upgrade com.jayway.jsonpath from 2.4.0 to 2.7.0
Stamatis Zampetakis created HIVE-26019: -- Summary: Upgrade com.jayway.jsonpath from 2.4.0 to 2.7.0 Key: HIVE-26019 URL: https://issues.apache.org/jira/browse/HIVE-26019 Project: Hive Issue Type: Task Components: CBO Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-26014) Remove redundant HushableRandomAccessFileAppender
Stamatis Zampetakis created HIVE-26014: -- Summary: Remove redundant HushableRandomAccessFileAppender Key: HIVE-26014 URL: https://issues.apache.org/jira/browse/HIVE-26014 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis [HushableRandomAccessFileAppender|https://github.com/apache/hive/blob/d3cd596aa15ebedd58f99628d43a03eb2f5f3909/ql/src/java/org/apache/hadoop/hive/ql/log/HushableRandomAccessFileAppender.java] was introduced by HIVE-17826 to avoid exceptions originating from attempts to write to a closed appender. After the changes in HIVE-24590, the life-cycle (opening/closing/deleting) of appenders is managed by the Log4j framework and not explicitly by Hive as it used to be before. With HIVE-24590 in place, it is no longer possible to hit the exception from HIVE-17826, because appenders are opened and closed when necessary. Due to the above, the {{HushableRandomAccessFileAppender}} is completely redundant and can be removed in favor of the {{RandomAccessFileAppender}} already provided by the Log4j framework. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-26005) Run selected qtest on different metastore backends
Stamatis Zampetakis created HIVE-26005: -- Summary: Run selected qtest on different metastore backends Key: HIVE-26005 URL: https://issues.apache.org/jira/browse/HIVE-26005 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Stamatis Zampetakis In various cases there are bugs which affect only certain types of metastore databases (e.g., HIVE-26000) and it would be nice to be able to specify, for each test or a group of tests, which metastore backend to use and have these tests consistently running in CI. After HIVE-21954, it is possible to run qtests on different metastores by setting the system property {{test.metastore.db}} or introducing a new [AbstractCliConfig|https://github.com/apache/hive/blob/fcd0a47c2e27defb04247ffca6da11734e3e25c3/itests/util/src/main/java/org/apache/hadoop/hive/cli/control/AbstractCliConfig.java] configuration with a new driver etc. The naive way of implementing this task would be to copy an existing configuration, change the metastore type, select the input files, and create a new driver (probably again a copy of {{CoreCliDriver}}). Other ideas would be to allow a driver to run with multiple configurations, or handle the selection of the metastore type via QT options (similar to what was done in HIVE-25594). -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25995) Build from source distribution archive fails
Stamatis Zampetakis created HIVE-25995: -- Summary: Build from source distribution archive fails Key: HIVE-25995 URL: https://issues.apache.org/jira/browse/HIVE-25995 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Stamatis Zampetakis The source distribution archive, apache-hive-4.0.0-SNAPSHOT-src.tar.gz, can be produced by running: {code:bash} mvn clean package -DskipTests -Pdist {code} The file is generated under: {noformat} packaging/target/apache-hive-4.0.0-SNAPSHOT-src.tar.gz {noformat} The source distribution archive/package [should|https://www.apache.org/legal/release-policy.html#source-packages] allow anyone who downloads it to build and test Hive. At the moment, on commit [b63dab11d229abac59a4ef5e141d8d9b28037c8b|https://github.com/apache/hive/commit/b63dab11d229abac59a4ef5e141d8d9b28037c8b], if someone produces the source package and extracts the contents of the archive, it is not possible to build Hive. Both {{mvn install}} and {{mvn package}} commands fail when they are executed inside the directory extracted from the archive. {noformat} mvn clean install -DskipTests mvn clean package -DskipTests {noformat} The error is shown below: {noformat} [INFO] Scanning for projects... 
[ERROR] [ERROR] Some problems were encountered while processing the POMs: [ERROR] Child module /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/parser of /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not exist @ [ERROR] Child module /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/udf of /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not exist @ [ERROR] Child module /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/standalone-metastore/pom.xml of /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not exist @ @ [ERROR] The build could not read 1 project -> [Help 1] [ERROR] [ERROR] The project org.apache.hive:hive:4.0.0-SNAPSHOT (/home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml) has 3 errors [ERROR] Child module /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/parser of /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not exist [ERROR] Child module /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/udf of /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not exist [ERROR] Child module /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/standalone-metastore/pom.xml of /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not exist [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25970) Missing messages in HS2 operation logs
Stamatis Zampetakis created HIVE-25970: -- Summary: Missing messages in HS2 operation logs Key: HIVE-25970 URL: https://issues.apache.org/jira/browse/HIVE-25970 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis After HIVE-22753 & HIVE-24590, with some unlucky timing of events, operation log messages can get lost and never appear in the appropriate files. The changes in HIVE-22753 prevent a {{HushableRandomAccessFileAppender}} from being created if the latter refers to a file that has been closed in the last second. Preventing the creation of the appender also means that the message which triggered the creation will be lost forever. In fact, any message (for the same query) that arrives within that one-second interval will be lost forever. Before HIVE-24590 the appender/file was closed only once (explicitly by HS2) and thus the problem may be very hard to notice in practice. However, with the arrival of HIVE-24590 appenders may close much more frequently (and not via HS2), making the issue rather easy to reproduce. It suffices to set the _hive.server2.operation.log.purgePolicy.timeToLive_ property very low and check the operation logs. The problem was discovered by investigating some intermittent failures in operation logging tests (e.g., TestOperationLoggingAPIWithTez). -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25965) SQLDataException when obtaining partitions from HMS via direct SQL over Derby
Stamatis Zampetakis created HIVE-25965: -- Summary: SQLDataException when obtaining partitions from HMS via direct SQL over Derby Key: HIVE-25965 URL: https://issues.apache.org/jira/browse/HIVE-25965 Project: Hive Issue Type: Bug Components: Metastore Reporter: Stamatis Zampetakis In certain cases fetching the partition information from the metastore using direct SQL fails with the stack trace below. {noformat} javax.jdo.JDODataStoreException: Error executing SQL query "select "PARTITIONS"."PART_ID" from "PARTITIONS" inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ? inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = ? inner join "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 where "DBS"."CTLG_NAME" = ? and (((case when "FILTER0"."PART_KEY_VAL" <> ? and "TBLS"."TBL_NAME" = ? and "DBS"."NAME" = ? and "DBS"."CTLG_NAME" = ? and "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 then cast("FILTER0"."PART_KEY_VAL" as decimal(21,0)) else null end) = ?))". at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:542) ~[datanucleus-api-jdo-5.2.4.jar:?] at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:456) ~[datanucleus-api-jdo-5.2.4.jar:?] at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:318) ~[datanucleus-api-jdo-5.2.4.jar:?] 
at org.apache.hadoop.hive.metastore.QueryWrapper.executeWithArray(QueryWrapper.java:137) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.executeWithArray(MetastoreDirectSqlUtils.java:69) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.executeWithArray(MetaStoreDirectSql.java:2156) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionIdsViaSqlFilter(MetaStoreDirectSql.java:894) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:663) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.ObjectStore$11.getSqlResult(ObjectStore.java:3962) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.ObjectStore$11.getSqlResult(ObjectStore.java:3953) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:4269) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3989) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_261] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_261] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_261] at 
java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261] at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at com.sun.proxy.$Proxy60.getPartitionsByExpr(Unknown Source) ~[?:?] at org.apache.hadoop.hive.metastore.HMSHandler.get_partitions_spec_by_expr(HMSHandler.java:7346) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_261] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_261] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_261] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261] at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at com.sun.proxy.$
[jira] [Created] (HIVE-25947) Compactor job queue cannot be set per table via compactor.mapred.job.queue.name
Stamatis Zampetakis created HIVE-25947: -- Summary: Compactor job queue cannot be set per table via compactor.mapred.job.queue.name Key: HIVE-25947 URL: https://issues.apache.org/jira/browse/HIVE-25947 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Before HIVE-20723 it was possible to schedule the compaction for each table on specific job queues by putting {{compactor.mapred.job.queue.name}} in the table properties. {code:sql} CREATE TABLE person (name STRING, age INT) STORED AS ORC TBLPROPERTIES( 'transactional'='true', 'compactor.mapred.job.queue.name'='root.user2'); ALTER TABLE person COMPACT 'major' WITH OVERWRITE TBLPROPERTIES('compactor.mapred.job.queue.name'='root.user2') {code} This is no longer possible (after HIVE-20723) and in order to achieve the same effect someone needs to use {{compactor.hive.compactor.job.queue}} instead. {code:sql} CREATE TABLE person (name STRING, age INT) STORED AS ORC TBLPROPERTIES( 'transactional'='true', 'compactor.hive.compactor.job.queue'='root.user2'); ALTER TABLE person COMPACT 'major' WITH OVERWRITE TBLPROPERTIES('compactor.hive.compactor.job.queue'='root.user2') {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25945) Upgrade H2 database version to 2.1.210
Stamatis Zampetakis created HIVE-25945: -- Summary: Upgrade H2 database version to 2.1.210 Key: HIVE-25945 URL: https://issues.apache.org/jira/browse/HIVE-25945 Project: Hive Issue Type: Task Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The 1.3.166 version, which is in use in Hive, suffers from the following security vulnerabilities: https://nvd.nist.gov/vuln/detail/CVE-2021-42392 https://nvd.nist.gov/vuln/detail/CVE-2022-23221 In the project, we use H2 only for testing purposes (inside the jdbc-handler module), so the H2 binaries are not present in the runtime classpath and these CVEs do not pose a problem for Hive or its users. Nevertheless, it would be good to upgrade to a more recent version to avoid Hive coming up in vulnerability scans due to this. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25939) Support filter pushdown in HBaseStorageHandler for simple expressions with boolean columns
Stamatis Zampetakis created HIVE-25939: -- Summary: Support filter pushdown in HBaseStorageHandler for simple expressions with boolean columns Key: HIVE-25939 URL: https://issues.apache.org/jira/browse/HIVE-25939 Project: Hive Issue Type: Improvement Reporter: Stamatis Zampetakis In current master (commit [4b7a948e45fd88372fef573be321cda40d189cc7|https://github.com/apache/hive/commit/4b7a948e45fd88372fef573be321cda40d189cc7]), the HBaseStorageHandler is able to push many simple comparison predicates into the underlying engine but fails to do so for some simple predicates with boolean columns. The goal of this issue is to support filter pushdown in HBaseStorageHandler for the following queries. {code:sql} CREATE TABLE hbase_table(row_key string, c1 boolean, c2 boolean) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( "hbase.columns.mapping" = ":key,cf:c1,cf:c2" ); explain select * from hbase_table where c1; explain select * from hbase_table where not c1; explain select * from hbase_table where c1 = true; explain select * from hbase_table where c1 = false; explain select * from hbase_table where c1 IS TRUE; explain select * from hbase_table where c1 IS FALSE; {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25936) ValidWriteIdList & table id are sometimes missing when requesting partitions by name via HS2
Stamatis Zampetakis created HIVE-25936: -- Summary: ValidWriteIdList & table id are sometimes missing when requesting partitions by name via HS2 Key: HIVE-25936 URL: https://issues.apache.org/jira/browse/HIVE-25936 Project: Hive Issue Type: Sub-task Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis According to HIVE-24743 the table id and {{ValidWriteIdList}} are important for keeping the HMS remote metadata cache consistent. Although HIVE-24743 attempted to pass the write id list and table id in every call to HMS, it did not do so completely. For those partitions not handled by the batch logic, the [metastore call|https://github.com/apache/hive/blob/4b7a948e45fd88372fef573be321cda40d189cc7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4161] in the {{Hive#getPartitionsByName}} method does not pass the table id and write id list. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25935) Cleanup IMetaStoreClient#getPartitionsByNames APIs
Stamatis Zampetakis created HIVE-25935: -- Summary: Cleanup IMetaStoreClient#getPartitionsByNames APIs Key: HIVE-25935 URL: https://issues.apache.org/jira/browse/HIVE-25935 Project: Hive Issue Type: Task Components: Metastore Reporter: Stamatis Zampetakis Currently the [IMetastoreClient|https://github.com/apache/hive/blob/4b7a948e45fd88372fef573be321cda40d189cc7/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java] interface has 8 variants of the {{getPartitionsByNames}} method. A quick pass over the concrete implementations shows that not all of them are useful/necessary, so a bit of cleanup is needed. Below are a few potential problems I observed: * Some of the APIs are not used anywhere in the project (neither by production nor by test code). * Some of the APIs are deprecated in some concrete implementations but not globally at the interface level, without an explanation why. * Some of the implementations simply throw without doing anything. * Many of the APIs are partially tested or not tested at all. HIVE-24743 and HIVE-25281 are related since they introduce/deprecate some of the aforementioned APIs. It would be good to review the aforementioned APIs and decide what needs to stay and what needs to go, as well as complete what is missing where relevant. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25856) Intermittent null ordering in plans of queries with GROUP BY and LIMIT
Stamatis Zampetakis created HIVE-25856: -- Summary: Intermittent null ordering in plans of queries with GROUP BY and LIMIT Key: HIVE-25856 URL: https://issues.apache.org/jira/browse/HIVE-25856 Project: Hive Issue Type: Bug Components: CBO Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis {code:sql} CREATE TABLE person (id INTEGER, country STRING); EXPLAIN CBO SELECT country, count(1) FROM person GROUP BY country LIMIT 5; {code} The {{EXPLAIN}} query produces a slightly different plan (ordering of nulls) from one execution to another. {noformat} CBO PLAN: HiveSortLimit(sort0=[$1], dir0=[ASC-nulls-first], fetch=[5]) HiveProject(country=[$0], $f1=[$1]) HiveAggregate(group=[{1}], agg#0=[count()]) HiveTableScan(table=[[default, person]], table:alias=[person]) {noformat} {noformat} CBO PLAN: HiveSortLimit(sort0=[$1], dir0=[ASC], fetch=[5]) HiveProject(country=[$0], $f1=[$1]) HiveAggregate(group=[{1}], agg#0=[count()]) HiveTableScan(table=[[default, person]], table:alias=[person]) {noformat} This is unlikely to cause wrong results because most aggregate functions (though not all) do not return nulls, so null ordering rarely matters, but it can lead to other problems such as: * intermittent CI failures * query/plan caching issues I bumped into this problem while investigating test failures in CI. The following query in [offset_limit_ppd_optimizer.q|https://github.com/apache/hive/blob/9cfdac44975bf38193de7449fc21b9536109daea/ql/src/test/queries/clientpositive/offset_limit_ppd_optimizer.q] returns a different plan when it runs individually than when it runs along with some other qtest files. {code:sql} explain select * from (select key, count(1) from src group by key order by key limit 10,20) subq join (select key, count(1) from src group by key limit 20,20) subq2 on subq.key=subq2.key limit 3,5; {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
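The practical effect of the two sort directions above can be sketched with plain Java null-aware comparators; this is an illustration of why null ordering changes row order, not Hive code, and the class name is made up for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class NullOrdering {
    /** Ascending with nulls first, mirroring dir0=[ASC-nulls-first]. */
    public static List<Long> ascNullsFirst(List<Long> values) {
        return values.stream()
                .sorted(Comparator.nullsFirst(Comparator.<Long>naturalOrder()))
                .collect(Collectors.toList());
    }

    /** Ascending with nulls last, one possible reading of plain dir0=[ASC]. */
    public static List<Long> ascNullsLast(List<Long> values) {
        return values.stream()
                .sorted(Comparator.nullsLast(Comparator.<Long>naturalOrder()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> counts = Arrays.asList(3L, null, 1L);
        System.out.println(ascNullsFirst(counts)); // [null, 1, 3]
        System.out.println(ascNullsLast(counts));  // [1, 3, null]
    }
}
```

With a LIMIT on top, the two orderings can surface different rows, which is why a nondeterministically chosen null direction shows up as plan (and potentially result) diffs.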
[jira] [Created] (HIVE-25832) Exclude Category-X JDBC drivers from binary distribution
Stamatis Zampetakis created HIVE-25832: -- Summary: Exclude Category-X JDBC drivers from binary distribution Key: HIVE-25832 URL: https://issues.apache.org/jira/browse/HIVE-25832 Project: Hive Issue Type: Task Components: distribution Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The binary distribution contains all the required elements to be able to run Hive in a cluster. It can be obtained by building from source using the following command: {code:java} mvn clean package -DskipTests -Pdist{code} The binary distribution is also published during a release along with the source code. In current master, commit 8572c1201e1d483eb03c7e413f4ff7f9b6f4a3d2, the binary distribution includes the following JDBC drivers: * derby-10.14.1.0.jar * postgresql-42.2.14.jar * ojdbc8-21.3.0.0.jar * mssql-jdbc-6.2.1.jre8.jar * mysql-connector-java-8.0.27.jar JDBC drivers are needed: * by schemaTool to initialize the database backend for the Metastore * by the metastore to communicate with the underlying database so if we want Hive to work out of the box we have to provide at least one. The Oracle (ojdbc8) and MySQL (mysql-connector-java) drivers must be removed because their licenses are not compatible with Apache License 2 (see [category x|https://www.apache.org/legal/resolved.html#category-x]). Previous Hive releases (e.g., 3.1.2) are not affected since they only contain: * derby-10.14.1.0.jar * postgresql-9.4.1208.jre7.jar The additional drivers that appear in the binary distribution are a side effect of HIVE-25701. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25816) Log CBO plan after rule application for debugging purposes
Stamatis Zampetakis created HIVE-25816: -- Summary: Log CBO plan after rule application for debugging purposes Key: HIVE-25816 URL: https://issues.apache.org/jira/browse/HIVE-25816 Project: Hive Issue Type: Task Components: CBO Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis In many cases, we want to identify which rule led to a certain transformation in the plan, or need to observe how the query plan evolves as rules are applied, in order to fix a bug or find the right place to introduce another optimization step. Currently there are some logs during the application of a rule triggered by the [HepPlanner|https://github.com/apache/calcite/blob/e04f3b08dcfb6910ff4df3810772c346b25ed424/core/src/main/java/org/apache/calcite/plan/AbstractRelOptPlanner.java#L367] and [VolcanoPlanner|https://github.com/apache/calcite/blob/e04f3b08dcfb6910ff4df3810772c346b25ed424/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoRuleCall.java#L126] but they more or less display only the top operator of the transformation and not the whole subtree. It would help if, instead of displaying only the top operator, we logged the equivalent of {{EXPLAIN CBO}} on the transformed sub-tree. The change is going to be introduced by default in Calcite soon (CALCITE-4704) but until we upgrade to that version it would help to have this functionality in Hive already. For more examples of the proposed change have a look at CALCITE-4704. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25718) ORDER BY query on external MSSQL table fails
Stamatis Zampetakis created HIVE-25718: -- Summary: ORDER BY query on external MSSQL table fails Key: HIVE-25718 URL: https://issues.apache.org/jira/browse/HIVE-25718 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Stamatis Zampetakis +Microsoft SQLServer+ {code:sql} CREATE TABLE country (id int, name varchar(20)); insert into country values (1, 'India'); insert into country values (2, 'Russia'); insert into country values (3, 'USA'); {code} +Hive+ {code:sql} CREATE EXTERNAL TABLE country (id int, name varchar(20)) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "MSSQL", "hive.sql.jdbc.driver" = "com.microsoft.sqlserver.jdbc.SQLServerDriver", "hive.sql.jdbc.url" = "jdbc:sqlserver://localhost:1433;", "hive.sql.dbcp.username" = "sa", "hive.sql.dbcp.password" = "Its-a-s3cret", "hive.sql.table" = "country"); SELECT * FROM country ORDER BY id; {code} The query fails with the following stacktrace: {noformat} com.microsoft.sqlserver.jdbc.SQLServerException: The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified. at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:258) ~[mssql-jdbc-6.2.1.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1535) ~[mssql-jdbc-6.2.1.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:467) ~[mssql-jdbc-6.2.1.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:409) ~[mssql-jdbc-6.2.1.jre8.jar:?] at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7151) ~[mssql-jdbc-6.2.1.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2478) ~[mssql-jdbc-6.2.1.jre8.jar:?] 
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:219) ~[mssql-jdbc-6.2.1.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:199) ~[mssql-jdbc-6.2.1.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(SQLServerPreparedStatement.java:331) ~[mssql-jdbc-6.2.1.jre8.jar:?] at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122) ~[commons-dbcp2-2.7.0.jar:2.7.0] at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122) ~[commons-dbcp2-2.7.0.jar:2.7.0] at org.apache.hive.storage.jdbc.dao.GenericJdbcDatabaseAccessor.getRecordIterator(GenericJdbcDatabaseAccessor.java:180) [hive-jdbc-handler-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.storage.jdbc.JdbcRecordReader.next(JdbcRecordReader.java:58) [hive-jdbc-handler-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.storage.jdbc.JdbcRecordReader.next(JdbcRecordReader.java:35) [hive-jdbc-handler-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:589) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:529) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.getFetchingTableResults(Driver.java:716) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:668) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:241) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:277) [hive-cli-4.0.0-SNAPSHOT.jar:?] 
at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) [hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) [hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) [hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) [hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:726) [hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:6
[jira] [Created] (HIVE-25717) INSERT INTO on external MariaDB/MySQL table fails silently
Stamatis Zampetakis created HIVE-25717: -- Summary: INSERT INTO on external MariaDB/MySQL table fails silently Key: HIVE-25717 URL: https://issues.apache.org/jira/browse/HIVE-25717 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 4.0.0 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis +MariaDB/MySQL+ {code:sql} CREATE TABLE country (id int, name varchar(20)); insert into country values (1, 'India'); insert into country values (2, 'Russia'); insert into country values (3, 'USA'); {code} +Hive+ {code:sql} CREATE EXTERNAL TABLE country (id int, name varchar(20)) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "MYSQL", "hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver", "hive.sql.jdbc.url" = "jdbc:mysql://localhost:3306/qtestDB", "hive.sql.dbcp.username" = "root", "hive.sql.dbcp.password" = "qtestpassword", "hive.sql.table" = "country" ); INSERT INTO country VALUES (8, 'Hungary'); SELECT * FROM country; {code} +Expected results+ ||ID||NAME|| |1| India| |2| Russia| |3| USA| |8| Hungary| +Actual results+ ||ID||NAME|| |1| India| |2| Russia| |3| USA| The {{INSERT INTO}} statement finishes without showing any kind of problem in the logs but the row is not inserted into the table. The test comes back green, although the following exception is printed to System.err (not to the logs). 
{noformat} java.sql.SQLException: Parameter metadata not available for the given statement at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129) at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97) at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:89) at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:63) at com.mysql.cj.jdbc.MysqlParameterMetadata.checkAvailable(MysqlParameterMetadata.java:86) at com.mysql.cj.jdbc.MysqlParameterMetadata.getParameterType(MysqlParameterMetadata.java:138) at org.apache.hive.storage.jdbc.DBRecordWritable.write(DBRecordWritable.java:67) at org.apache.hadoop.mapreduce.lib.db.DBOutputFormat$DBRecordWriter.write(DBOutputFormat.java:122) at org.apache.hive.storage.jdbc.JdbcRecordWriter.write(JdbcRecordWriter.java:47) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1160) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888) at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:133) at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:45) at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:110) at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFInline.process(GenericUDTFInline.java:64) at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888) at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:154) at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:311) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:277) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.ja
[jira] [Created] (HIVE-25705) Use dynamic host/port binding for dockerized databases in tests
Stamatis Zampetakis created HIVE-25705: -- Summary: Use dynamic host/port binding for dockerized databases in tests Key: HIVE-25705 URL: https://issues.apache.org/jira/browse/HIVE-25705 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Currently all dockerized databases (subclasses of [DatabaseRule|https://github.com/apache/hive/blob/6e02f6164385a370ee8014c795bee1fa423d7937/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/DatabaseRule.java], subclasses of [AbstractExternalDB.java|https://github.com/apache/hive/blob/6e02f6164385a370ee8014c795bee1fa423d7937/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java]) are mapped statically to a specific hostname (usually localhost) and port when the container is launched; the host/port values are hardcoded in the code. This may create problems when a certain port is already taken by another process, leading to errors like the one below: {noformat} Bind for 0.0.0.0:5432 failed: port is already allocated. {noformat} Similar problems can occur by assuming that every database will be accessible on localhost. This can lead to flakiness in CI and/or a poor developer experience when running tests backed by Docker. The goal of this case is to allow the containers/databases to bind dynamically to a random port at startup and expose the appropriate IP address & port to the tests relying on these databases. -- This message was sent by Atlassian Jira (v8.20.1#820001)
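One way to obtain a dynamic host port before launching a container is to ask the OS for a free ephemeral port; the class below is a minimal sketch (the class name and the `docker run` usage in the comment are illustrative, not existing Hive code).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

public class FreePortFinder {
    /** Ask the OS for a free ephemeral port by binding a socket to port 0
     *  and immediately releasing it. The container's fixed internal port
     *  (e.g. 5432 for Postgres) can then be mapped to the returned port,
     *  e.g. with `docker run -p <hostPort>:5432 postgres` (hypothetical usage). */
    public static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("docker run -p " + findFreePort() + ":5432 postgres");
    }
}
```

Alternatively, Docker itself can pick the random host port (`-p 5432` or `-P`) and the test can then discover the actual mapping with `docker port <container>`; that avoids the small race window between releasing the probe socket and the container binding the port.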
[jira] [Created] (HIVE-25701) Declare JDBC drivers as runtime & optional dependencies
Stamatis Zampetakis created HIVE-25701: -- Summary: Declare JDBC drivers as runtime & optional dependencies Key: HIVE-25701 URL: https://issues.apache.org/jira/browse/HIVE-25701 Project: Hive Issue Type: Task Components: Standalone Metastore, Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Currently, we are using the following JDBC drivers in various Hive modules: * MariaDB * MySQL * Oracle * Postgres * MSSQL * Derby MariaDB, MySQL, and Oracle licenses are not compatible with Apache License 2 ([Category-X |https://www.apache.org/legal/resolved.html#category-x]) and in the past we used various ways to circumvent licensing problems (see HIVE-23284). Now, some of them appear as test scope dependencies, which is OK-ish but may again lead to licensing problems in the near future. JDBC drivers are only needed at runtime so they could all be declared at runtime scope. Moreover, Hive does not require a specific JDBC driver in order to operate, so they are all optional. The goal of this issue is to declare every JDBC driver at runtime scope and mark it as optional ([ASF-optional|https://www.apache.org/legal/resolved.html#optional], [maven-optional|https://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html]). This has the following advantages: * Eliminates the risk of writing code that needs JDBC driver classes in order to compile, potentially violating AL2. * Unifies the declaration of JDBC drivers, making it easier to add/remove drivers if necessary. * Removes the need to use download-maven-plugin and other similar workarounds to avoid licensing problems. * Simplifies the execution of tests using these drivers since they are now added to the runtime classpath automatically by Maven. * Projects depending on Hive will not inherit any JDBC driver by default. -- This message was sent by Atlassian Jira (v8.20.1#820001)
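A driver declaration along the lines proposed here might look like the following pom.xml fragment (the Postgres driver version is the one appearing in the current distribution; the fragment is a sketch, not the actual Hive pom):

```xml
<dependency>
  <groupId>org.postgresql</groupId>
  <artifactId>postgresql</artifactId>
  <version>42.2.14</version>
  <!-- needed only when actually connecting to Postgres, never at compile time -->
  <scope>runtime</scope>
  <!-- not inherited transitively by projects depending on Hive -->
  <optional>true</optional>
</dependency>
```

With `runtime` scope the driver classes are absent from the compile classpath (so code cannot accidentally depend on them), while `optional` stops Maven from propagating the dependency to downstream consumers.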
[jira] [Created] (HIVE-25684) Many (~16K) skipped tests in TestGenericUDFInitializeOnCompareUDF
Stamatis Zampetakis created HIVE-25684: -- Summary: Many (~16K) skipped tests in TestGenericUDFInitializeOnCompareUDF Key: HIVE-25684 URL: https://issues.apache.org/jira/browse/HIVE-25684 Project: Hive Issue Type: Task Components: Testing Infrastructure Affects Versions: 4.0.0 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Attachments: skipped_tests.png TestGenericUDFInitializeOnCompareUDF is a parameterized test leading to 24K possible test combinations. From those only 7K are actually run and the rest (~16K) are skipped. {noformat} mvn test -Dtest=TestGenericUDFInitializeOnCompareUDF ... [WARNING] Tests run: 24300, Failures: 0, Errors: 0, Skipped: 16452, Time elapsed: 7.098 s - in org.apache.hadoop.hive.ql.udf.generic.TestGenericUDFInitializeOnCompareUDF [INFO] [INFO] Results: [INFO] [INFO] Tests run: 7848, Failures: 0, Errors: 0, Skipped: 0 {noformat} This generates a lot of noise in the Jenkins CI, where many tests appear as skipped, and it may make people believe there is a problem (a side effect of their changes). Moreover, we know in advance which tests are skipped and why, so instead of generating invalid parameter combinations we could simply remove those combinations altogether. -- This message was sent by Atlassian Jira (v8.20.1#820001)
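The idea of removing invalid combinations up front, rather than skipping them at runtime, can be sketched as a cross-product with a validity filter; the class name and the validity rule below are purely hypothetical, chosen only to make the filtering pattern concrete.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ValidCombinations {
    /** Cross-product of type names, keeping only pairs that pass the
     *  validity check. A parameterized test fed with this list never
     *  produces a skipped combination. */
    public static List<String[]> generate(List<String> types) {
        List<String[]> combos = new ArrayList<>();
        for (String left : types) {
            for (String right : types) {
                if (isValid(left, right)) {
                    combos.add(new String[] {left, right});
                }
            }
        }
        return combos;
    }

    // Toy rule for illustration: mixing BOOLEAN with any other type is invalid.
    private static boolean isValid(String left, String right) {
        return left.equals(right) || (!left.equals("BOOLEAN") && !right.equals("BOOLEAN"));
    }

    public static void main(String[] args) {
        // 3 x 3 = 9 raw pairs, 4 invalid BOOLEAN mixes filtered out -> 5 remain
        System.out.println(generate(Arrays.asList("INT", "STRING", "BOOLEAN")).size()); // 5
    }
}
```

Feeding a JUnit `@Parameters` method with such a pre-filtered list reports only real executions, so CI shows zero skipped tests.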
[jira] [Created] (HIVE-25681) Drop support for multi-threaded qtest execution via QTestRunnerUtils
Stamatis Zampetakis created HIVE-25681: -- Summary: Drop support for multi-threaded qtest execution via QTestRunnerUtils Key: HIVE-25681 URL: https://issues.apache.org/jira/browse/HIVE-25681 Project: Hive Issue Type: Task Components: Testing Infrastructure Affects Versions: 4.0.0 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis There is an option for running qtests concurrently via [QTestRunnerUtils#queryListRunnerMultiThreaded|https://github.com/apache/hive/blob/a72db99676ca6a79b414906ab78963a3e955ae69/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestRunnerUtils.java#L128] but it has not been in use for more than a year now. Moreover, with the move to Kubernetes-based containerized test execution (HIVE-22942) it is unlikely that we will run concurrent tests using these APIs anytime soon. The only consumer of this API at the moment is [TestMTQueries|https://github.com/apache/hive/blob/master/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestMTQueries.java] which is disabled and basically corresponds to the unit tests for these APIs. I propose to drop these APIs and the related test to facilitate code evolution and maintenance. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25676) Uncaught exception in QTestDatabaseHandler#afterTest causes unrelated test failures
Stamatis Zampetakis created HIVE-25676: -- Summary: Uncaught exception in QTestDatabaseHandler#afterTest causes unrelated test failures Key: HIVE-25676 URL: https://issues.apache.org/jira/browse/HIVE-25676 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis When for some reason we fail to clean up a database after running a test using the {{qt:database}} option, an exception is raised and propagates up the stack. Not catching it in [QTestDatabaseHandler#afterTest|https://github.com/apache/hive/blob/0616bcaa2436ccbf388b635bfea160b47849553c/itests/util/src/main/java/org/apache/hadoop/hive/ql/qoption/QTestDatabaseHandler.java#L124] disrupts subsequent cleanup actions, which are never executed, and leads to unrelated failures in subsequent tests. Moreover, the exception leaves {{QTestDatabaseHandler}} in an invalid state since the internal map holding the running databases is not updated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
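The general remedy for this class of bug is to catch failures per cleanup action so that one failure cannot abort the remaining ones. A minimal sketch of the pattern (class and method names are hypothetical, not the actual fix):

```java
import java.util.ArrayList;
import java.util.List;

public class SafeCleanup {
    /** Run every cleanup action even if some of them fail; failures are
     *  collected (to be logged by the caller) instead of propagating and
     *  aborting the cleanups that follow. */
    public static List<RuntimeException> runAll(List<Runnable> cleanups) {
        List<RuntimeException> failures = new ArrayList<>();
        for (Runnable cleanup : cleanups) {
            try {
                cleanup.run();
            } catch (RuntimeException e) {
                failures.add(e); // record and continue with the next action
            }
        }
        return failures;
    }
}
```

With this shape the handler can also update its internal bookkeeping (e.g. the map of running databases) unconditionally, avoiding the invalid-state problem described above.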
[jira] [Created] (HIVE-25675) Intermittent PSQLException when trying to connect to Postgres in tests
Stamatis Zampetakis created HIVE-25675: -- Summary: Intermittent PSQLException when trying to connect to Postgres in tests Key: HIVE-25675 URL: https://issues.apache.org/jira/browse/HIVE-25675 Project: Hive Issue Type: Bug Components: Testing Infrastructure Affects Versions: 4.0.0 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The following exception appears intermittently when running tests using dockerized Postgres. {noformat} Unexpected exception org.postgresql.util.PSQLException: FATAL: the database system is starting up at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:525) at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:146) at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:197) at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49) at org.postgresql.jdbc.PgConnection.(PgConnection.java:217) at org.postgresql.Driver.makeConnection(Driver.java:458) at org.postgresql.Driver.connect(Driver.java:260) at java.sql.DriverManager.getConnection(DriverManager.java:664) at java.sql.DriverManager.getConnection(DriverManager.java:247) at org.apache.hadoop.hive.ql.externalDB.AbstractExternalDB.execute(AbstractExternalDB.java:191) at org.apache.hadoop.hive.ql.qoption.QTestDatabaseHandler.beforeTest(QTestDatabaseHandler.java:116) at org.apache.hadoop.hive.ql.qoption.QTestOptionDispatcher.beforeTest(QTestOptionDispatcher.java:79) at org.apache.hadoop.hive.ql.QTestUtil.cliInit(QTestUtil.java:717) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:189) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104) at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) {noformat} As the exception indicates, when we try to connect to Postgres the database is not yet fully ready, despite the fact that the respective port is open, which leads to the exception above. -- This message was sent by Atlassian Jira (v8.3.4#803005)
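An open port not implying a ready database is usually handled with a retry loop around the first connection attempt. Below is a minimal, generic retry sketch (the class name is hypothetical; in the real test the supplied action would attempt a JDBC connection and return whether it succeeded):

```java
import java.util.function.Supplier;

public class ConnectRetry {
    /** Retry an action until it reports success or the attempts are
     *  exhausted, sleeping between attempts. Returns true on success. */
    public static boolean retry(Supplier<Boolean> action, int maxAttempts, long delayMillis) {
        for (int i = 0; i < maxAttempts; i++) {
            if (action.get()) {
                return true;
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return false;
    }
}
```

For Postgres specifically, the readiness probe could be a `DriverManager.getConnection(...)` call that swallows `PSQLException` and returns false until the server accepts logins.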
[jira] [Created] (HIVE-25668) Support database reuse when using qt:database option
Stamatis Zampetakis created HIVE-25668: -- Summary: Support database reuse when using qt:database option Key: HIVE-25668 URL: https://issues.apache.org/jira/browse/HIVE-25668 Project: Hive Issue Type: Task Components: Testing Infrastructure Reporter: Stamatis Zampetakis With HIVE-25594 it is possible to initialize and use various types of databases in tests. At the moment all the supported databases rely on docker containers which are initialized/destroyed on a per-test basis. This is good in terms of test isolation but it brings a certain performance overhead, slowing down tests. At the moment this is fine since the feature is not widely used, but it would be good to have a way to reuse a database across multiple qfiles. The developer could specify in the qfile if they want to reuse a container (when possible) by passing certain additional options. The declaration could look like below: {noformat} --!qt:database:type=mysql;script=q_test_country_table.sql;reuse=true{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
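A sketch of how such a semicolon-separated declaration could be parsed into options (the class and method names are hypothetical, not existing Hive code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class QOptionParser {
    /** Parse a declaration like "type=mysql;script=q_test_country_table.sql;reuse=true"
     *  into key/value pairs, preserving declaration order. */
    public static Map<String, String> parse(String declaration) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String pair : declaration.split(";")) {
            String[] kv = pair.split("=", 2); // split only on the first '='
            options.put(kv[0].trim(), kv.length > 1 ? kv[1].trim() : "");
        }
        return options;
    }
}
```

The handler could then key a cache of running containers on the subset of options that matter for reuse (e.g. `type` and `script`) and tear a container down only when `reuse` is absent or false.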
[jira] [Created] (HIVE-25667) Unify code managing JDBC databases in tests
Stamatis Zampetakis created HIVE-25667: -- Summary: Unify code managing JDBC databases in tests Key: HIVE-25667 URL: https://issues.apache.org/jira/browse/HIVE-25667 Project: Hive Issue Type: Task Components: Testing Infrastructure Affects Versions: 4.0.0 Reporter: Stamatis Zampetakis Currently there are two class hierarchies managing JDBC databases in tests, [DatabaseRule| https://github.com/apache/hive/blob/d35de014dd49fdcfe0aacb68e6c587beff6d1dea/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/DatabaseRule.java] and [AbstractExternalDB|https://github.com/apache/hive/blob/d35de014dd49fdcfe0aacb68e6c587beff6d1dea/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java]. There are many similarities between these hierarchies and certain parts are duplicated. The goal of this JIRA is to refactor the aforementioned hierarchies to reduce code duplication and improve extensibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25665) Checkstyle LGPL files must not be in the release sources/binaries
Stamatis Zampetakis created HIVE-25665: -- Summary: Checkstyle LGPL files must not be in the release sources/binaries Key: HIVE-25665 URL: https://issues.apache.org/jira/browse/HIVE-25665 Project: Hive Issue Type: Task Components: Build Infrastructure Affects Versions: 0.6.0 Reporter: Stamatis Zampetakis As discussed in the [dev list|https://lists.apache.org/thread/r13e3236aa72a070b3267ed95f7cb3b45d3c4783fd4ca35f5376b1a35@%3cdev.hive.apache.org%3e] LGPL files must not be present in the Apache released sources/binaries. The following files must not be present in the release: https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/checkstyle/checkstyle-noframes-sorted.xsl https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/storage-api/checkstyle/checkstyle-noframes-sorted.xsl https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/standalone-metastore/checkstyle/checkstyle-noframes-sorted.xsl There may be other checkstyle LGPL files in the repo. All these should either be removed entirely from the repository or selectively excluded from the release. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25655) Remove ElapsedTimeLoggingWrapper from tests
Stamatis Zampetakis created HIVE-25655: -- Summary: Remove ElapsedTimeLoggingWrapper from tests Key: HIVE-25655 URL: https://issues.apache.org/jira/browse/HIVE-25655 Project: Hive Issue Type: Task Components: Testing Infrastructure Affects Versions: 4.0.0 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The [ElapsedTimeLoggingWrapper|https://github.com/apache/hive/blob/f749ef2af27638914984c183bcfa213920f5cdd9/itests/util/src/main/java/org/apache/hadoop/hive/util/ElapsedTimeLoggingWrapper.java] introduced in HIVE-14625 is used by the [CoreCliDriver|#L68] to execute, measure, and display the time spent on some operations during the execution of {{@Before/@After}} methods. The benefit of logging the elapsed time for these methods is unclear. The time is usually rather short, especially compared to the actual time a query takes to run, so it is not information of much use. The enforced coding pattern for measuring and logging the time leads to boilerplate and makes the code harder to read and understand. {code:java} qt = new ElapsedTimeLoggingWrapper() { @Override public QTestUtil invokeInternal() throws Exception { return new QTestUtil( QTestArguments.QTestArgumentsBuilder.instance() .withOutDir(cliConfig.getResultsDir()) .withLogDir(cliConfig.getLogDir()) .withClusterType(miniMR) .withConfDir(hiveConfDir) .withInitScript(initScript) .withCleanupScript(cleanupScript) .withLlapIo(true) .withFsType(cliConfig.getFsType()) .build()); } }.invoke("QtestUtil instance created", LOG, true); {code} Moreover, the wrapper is not used consistently across drivers, making results less uniform. The goal of this issue is to remove {{ElapsedTimeLoggingWrapper}} and its usages to improve code readability and maintenance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25632) Remove unused code from ptest/ptest2
Stamatis Zampetakis created HIVE-25632: -- Summary: Remove unused code from ptest/ptest2 Key: HIVE-25632 URL: https://issues.apache.org/jira/browse/HIVE-25632 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The Ptest framework was deprecated when PTest2 was introduced, and the latter is no longer used since it was superseded by HIVE-22942. The code is more or less dead, and keeping it in the repo leads to maintenance overhead. People update files from time to time assuming that the code is still maintained, and occasionally this leads to broken builds since ptest2 is an actual Maven module. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25629) Drop support of multiple qfiles in QTestUtil, output and result processors
Stamatis Zampetakis created HIVE-25629: -- Summary: Drop support of multiple qfiles in QTestUtil, output and result processors Key: HIVE-25629 URL: https://issues.apache.org/jira/browse/HIVE-25629 Project: Hive Issue Type: Task Components: Testing Infrastructure Affects Versions: 4.0.0 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The current implementations of [QTestUtil|https://github.com/apache/hive/blob/afeb0f8413b1fd777611e890e53925119a5e39f1/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java], [QOutProcessor|https://github.com/apache/hive/blob/master/itests/util/src/main/java/org/apache/hadoop/hive/ql/QOutProcessor.java], and [QTestResultProcessor|https://github.com/apache/hive/blob/afeb0f8413b1fd777611e890e53925119a5e39f1/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestResultProcessor.java] have some methods and fields (maps) for managing multiple input files. However, *all* clients of this API, such as [CoreCliDriver|https://github.com/apache/hive/blob/afeb0f8413b1fd777611e890e53925119a5e39f1/itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CoreCliDriver.java], use these classes by processing one file per run. +Example+
{code:java}
public void runTest(String testName, String fname, String fpath) {
  ...
  qt.addFile(fpath);
  qt.cliInit(new File(fpath));
  ...
  try {
    qt.executeClient(fname);
  } catch (CommandProcessorException e) {
    qt.failedQuery(e.getCause(), e.getResponseCode(), fname, QTestUtil.DEBUG_HINT);
  }
  ...
}
{code}
Notice that {{qt.addFile}} keeps accumulating input files in memory (filename + content), while {{qt.executeClient}} (and other similar APIs) always operates on the last file added. Apart from wasting memory, the APIs for multiple files are harder to understand and extend. The goal of this JIRA is to simplify the aforementioned APIs by removing the unused/redundant parts associated with multiple files, improving code readability and reducing memory consumption. 
+Historical note+ Before HIVE-25625, the multiple input file functionality was used by {{TestCompareCliDriver}}, but it was still useless for all the other clients. With the removal of {{TestCompareCliDriver}} in HIVE-25625, keeping support for multiple files is completely redundant. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25625) Drop TestCompareCliDriver and related code from tests
Stamatis Zampetakis created HIVE-25625: -- Summary: Drop TestCompareCliDriver and related code from tests Key: HIVE-25625 URL: https://issues.apache.org/jira/browse/HIVE-25625 Project: Hive Issue Type: Task Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The driver was introduced back in 2015 (HIVE-6010), aiming to run queries with vectorization on/off and compare the results. However, it hasn't received much attention since then, and currently only two queries are run with this driver. The majority of tests aiming to ensure vectorization works correctly use the {{TestMiniLlapLocalCliDriver}} and run a query twice, switching the necessary properties on/off. Summing up, keeping [TestCompareCliDriver|https://github.com/apache/hive/blob/d521f149fade25f74e7ca28fa399103684a80580/itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestCompareCliDriver.java] in the repo leads to extra code maintenance cost without significant benefit. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25624) Drop DummyCliDriver and related code from tests
Stamatis Zampetakis created HIVE-25624: -- Summary: Drop DummyCliDriver and related code from tests Key: HIVE-25624 URL: https://issues.apache.org/jira/browse/HIVE-25624 Project: Hive Issue Type: Task Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The only thing this test code does is fail no matter the input file, potentially with a different message (see [CoreDummy.runTest|https://github.com/apache/hive/blob/d521f149fade25f74e7ca28fa399103684a80580/itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CoreDummy.java#L56]). It is very close to dead code, so keeping it in the repository only adds maintenance overhead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25618) Stack trace is difficult to find when qtest fails during setup/teardown
Stamatis Zampetakis created HIVE-25618: -- Summary: Stack trace is difficult to find when qtest fails during setup/teardown Key: HIVE-25618 URL: https://issues.apache.org/jira/browse/HIVE-25618 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis When a qtest fails while executing one of the setup/teardown methods of a CLI driver ([CliAdapter|https://github.com/apache/hive/blob/3e37ba473545a691f5f32c08fc4b62b49257cab4/itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliAdapter.java#L36] and its subclasses):
{code:java}
public abstract void beforeClass() throws Exception;
public abstract void setUp();
public abstract void tearDown();
public abstract void shutdown() throws Exception;
{code}
the original stack trace leading to the failure cannot be found easily. The Maven console shows a stack trace that does not correspond to the actual exception causing the problem and in most cases does not contain the original cause. The original stack trace is not displayed in the Maven console, and it is not in {{target/tmp/logs/hive.log}} either. At the moment it goes to {{target/surefire-reports/...-output.txt}}. The developer needs to search in 2-3 places and navigate back and forth in the code in order to find what went wrong. Ideally, the stack trace from the original exception should be printed directly in the Maven console. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25611) OOM when running MERGE query on wide transactional table with many buckets
Stamatis Zampetakis created HIVE-25611: -- Summary: OOM when running MERGE query on wide transactional table with many buckets Key: HIVE-25611 URL: https://issues.apache.org/jira/browse/HIVE-25611 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Attachments: merge_query_plan.txt, merge_wide_acid_bucketed_table.q, wide_table_100_char_cols.csv Running a {{MERGE}} statement over a wide transactional/ACID table with many buckets leads to {{OutOfMemoryError}} during the execution of the query. A step-by-step reproducer is attached to the case ( [^merge_wide_acid_bucketed_table.q] [^wide_table_100_char_cols.csv] ) but the main idea is outlined below.
{code:sql}
CREATE TABLE wide_table_orc (
  w_id_col int,
  w_char_col0 char(20),
  ...
  w_char_col99 char(20))
STORED AS ORC TBLPROPERTIES ('transactional'='true');
-- Load data into the table in a way that it gets bucketed
CREATE TABLE simple_table_txt (id int, name char(20)) STORED AS TEXTFILE;
-- Load data into simple_table_txt overlapping with the data in wide_table_orc
MERGE INTO wide_table_orc target USING simple_table_txt source ON (target.w_id_col = source.id)
WHEN MATCHED THEN UPDATE SET w_char_col0 = source.name
WHEN NOT MATCHED THEN INSERT (w_id_col, w_char_col1) VALUES (source.id, 'Actual value does not matter');
{code}
A sample stacktrace showing the memory pressure is given below: {noformat} java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.orc.OrcProto$RowIndexEntry$Builder.create(OrcProto.java:8962) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.OrcProto$RowIndexEntry$Builder.access$12100(OrcProto.java:8931) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.OrcProto$RowIndexEntry.newBuilder(OrcProto.java:8915) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.TreeWriterBase.<init>(TreeWriterBase.java:98) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.StringBaseTreeWriter.<init>(StringBaseTreeWriter.java:66) 
~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.CharTreeWriter.<init>(CharTreeWriter.java:40) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.TreeWriter$Factory.createSubtree(TreeWriter.java:163) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.TreeWriter$Factory.create(TreeWriter.java:133) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.StructTreeWriter.<init>(StructTreeWriter.java:41) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.TreeWriter$Factory.createSubtree(TreeWriter.java:181) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.TreeWriter$Factory.create(TreeWriter.java:133) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.StructTreeWriter.<init>(StructTreeWriter.java:41) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.TreeWriter$Factory.createSubtree(TreeWriter.java:181) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.writer.TreeWriter$Factory.create(TreeWriter.java:133) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.orc.impl.WriterImpl.<init>(WriterImpl.java:216) ~[orc-core-1.6.9.jar:1.6.9] at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:95) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:396) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.initWriter(OrcRecordUpdater.java:615) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.addSimpleEvent(OrcRecordUpdater.java:442) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.addSplitUpdateEvent(OrcRecordUpdater.java:495) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.update(OrcRecordUpdater.java:519) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1200) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at 
org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:497) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apa
[jira] [Created] (HIVE-25594) Setup JDBC databases in tests via QT options
Stamatis Zampetakis created HIVE-25594: -- Summary: Setup JDBC databases in tests via QT options Key: HIVE-25594 URL: https://issues.apache.org/jira/browse/HIVE-25594 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The goal of this JIRA is to add a new QT option for setting up a JDBC DBMS and using it in qtests which need a JDBC endpoint up and running. It can be used in tests with external JDBC tables, connectors, etc. A sample file using the proposed option ({{qt:database}}) is shown below.
{code:sql}
--!qt:database:postgres:init_script_1234.sql:cleanup_script_1234.sql
CREATE EXTERNAL TABLE country (name varchar(80))
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
  "hive.sql.database.type" = "POSTGRES",
  "hive.sql.jdbc.driver" = "org.postgresql.Driver",
  "hive.sql.jdbc.url" = "jdbc:postgresql://localhost:5432/qtestDB",
  "hive.sql.dbcp.username" = "qtestuser",
  "hive.sql.dbcp.password" = "qtestpassword",
  "hive.sql.table" = "country");
EXPLAIN CBO SELECT COUNT(*) from country;
SELECT COUNT(*) from country;
{code}
This builds upon HIVE-25423 but proposes to use JDBC datasources without the need for a specific CLI driver. Furthermore, the proposed QT option syntax allows using customised init/cleanup scripts for the JDBC datasource per test. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25591) CREATE EXTERNAL TABLE fails for JDBC tables stored in non-default schema
Stamatis Zampetakis created HIVE-25591: -- Summary: CREATE EXTERNAL TABLE fails for JDBC tables stored in non-default schema Key: HIVE-25591 URL: https://issues.apache.org/jira/browse/HIVE-25591 Project: Hive Issue Type: Bug Components: Query Planning Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Consider the following use case where tables reside in some user-defined schema in some JDBC compliant database: +Postgres+ {code:sql} create schema world; create table if not exists world.country (name varchar(80) not null); insert into world.country (name) values ('India'); insert into world.country (name) values ('Russia'); insert into world.country (name) values ('USA'); {code} The following DDL statement in Hive fails: +Hive+ {code:sql} CREATE EXTERNAL TABLE country (name varchar(80)) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "POSTGRES", "hive.sql.jdbc.driver" = "org.postgresql.Driver", "hive.sql.jdbc.url" = "jdbc:postgresql://localhost:5432/test", "hive.sql.dbcp.username" = "user", "hive.sql.dbcp.password" = "pwd", "hive.sql.schema" = "world", "hive.sql.table" = "country"); {code} The exception is the following: {noformat} org.postgresql.util.PSQLException: ERROR: relation "country" does not exist Position: 15 at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2532) ~[postgresql-42.2.14.jar:42.2.14] at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2267) ~[postgresql-42.2.14.jar:42.2.14] at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:312) ~[postgresql-42.2.14.jar:42.2.14] at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:448) ~[postgresql-42.2.14.jar:42.2.14] at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:369) ~[postgresql-42.2.14.jar:42.2.14] at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:153) ~[postgresql-42.2.14.jar:42.2.14] at 
org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:103) ~[postgresql-42.2.14.jar:42.2.14] at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122) ~[commons-dbcp2-2.7.0.jar:2.7.0] at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122) ~[commons-dbcp2-2.7.0.jar:2.7.0] at org.apache.hive.storage.jdbc.dao.GenericJdbcDatabaseAccessor.getColumnNames(GenericJdbcDatabaseAccessor.java:83) [hive-jdbc-handler-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.storage.jdbc.JdbcSerDe.initialize(JdbcSerDe.java:98) [hive-jdbc-handler-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:95) [hive-metastore-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:78) [hive-metastore-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:342) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:324) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.metadata.Table.getColsInternal(Table.java:734) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:717) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableDesc.toTable(CreateTableDesc.java:933) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:59) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) 
[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:245) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:108) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] a
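The failure above suggests that the column-metadata query in {{GenericJdbcDatabaseAccessor.getColumnNames}} is built from the bare table name, ignoring {{hive.sql.schema}}. A minimal sketch of the kind of qualification that would avoid it (illustrative only; the helper below is hypothetical, not the actual handler code):

```java
public class QualifiedNameDemo {
    // Hypothetical helper: prefix the table with the configured schema, if any.
    static String qualifiedTableName(String schema, String table) {
        if (schema == null || schema.isEmpty()) {
            return table;
        }
        return schema + "." + table;
    }

    public static void main(String[] args) {
        // With "hive.sql.schema" = "world" the metadata query should target world.country.
        System.out.println("SELECT * FROM " + qualifiedTableName("world", "country") + " WHERE 1=0");
        // Without a schema the current behavior is preserved.
        System.out.println("SELECT * FROM " + qualifiedTableName(null, "country") + " WHERE 1=0");
    }
}
```

With the schema prepended, Postgres resolves {{world.country}} instead of looking up {{country}} in the default search path.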
[jira] [Created] (HIVE-25530) AssertionError when query involves multiple JDBC tables and views
Stamatis Zampetakis created HIVE-25530: -- Summary: AssertionError when query involves multiple JDBC tables and views Key: HIVE-25530 URL: https://issues.apache.org/jira/browse/HIVE-25530 Project: Hive Issue Type: Bug Components: CBO, HiveServer2 Affects Versions: 4.0.0 Reporter: Stamatis Zampetakis Assignee: Soumyakanti Das Fix For: 4.0.0 Attachments: engesc_6056.q An {{AssertionError}} is thrown during compilation when a query contains multiple external JDBC tables and there are available materialized views which can be used to answer the query. The problem can be reproduced by running the scenario in [^engesc_6056.q]. {code:bash} mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=engesc_6056.q -Dtest.output.overwrite {code} The stacktrace is shown below: {noformat} java.lang.AssertionError: Rule's description should be unique; existing rule=JdbcToEnumerableConverterRule(in:JDBC.DERBY,out:ENUMERABLE); new rule=JdbcToEnumerableConverterRule(in:JDBC.DERBY,out:ENUMERABLE) at org.apache.calcite.plan.AbstractRelOptPlanner.addRule(AbstractRelOptPlanner.java:158) at org.apache.calcite.plan.volcano.VolcanoPlanner.addRule(VolcanoPlanner.java:406) at org.apache.calcite.adapter.jdbc.JdbcConvention.register(JdbcConvention.java:66) at org.apache.calcite.plan.AbstractRelOptPlanner.registerClass(AbstractRelOptPlanner.java:233) at org.apache.hadoop.hive.ql.optimizer.calcite.cost.HiveVolcanoPlanner.registerClass(HiveVolcanoPlanner.java:90) at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1224) at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:589) at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:604) at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:84) at org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:268) at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1132) at 
org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:589) at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:604) at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:84) at org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:268) at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1132) at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:589) at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:604) at org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:148) at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:268) at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:283) at org.apache.hadoop.hive.ql.optimizer.calcite.rules.views.HiveMaterializedViewBoxing$HiveMaterializedViewUnboxingRule.onMatch(HiveMaterializedViewBoxing.java:210) at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:229) at org.apache.calcite.plan.volcano.IterativeRuleDriver.drive(IterativeRuleDriver.java:58) at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:510) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyMaterializedViewRewriting(CalcitePlanner.java:2027) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1717) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1589) at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914) at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1341) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:559) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12549) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:452) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:175) a
[jira] [Created] (HIVE-25316) Query with window function over external JDBC table and filter fails at runtime
Stamatis Zampetakis created HIVE-25316: -- Summary: Query with window function over external JDBC table and filter fails at runtime Key: HIVE-25316 URL: https://issues.apache.org/jira/browse/HIVE-25316 Project: Hive Issue Type: Bug Components: JDBC storage handler, Query Processor Affects Versions: 4.0.0 Reporter: Stamatis Zampetakis The following TPC-DS query fails at runtime when the table {{store_sales}} is an external JDBC table. {code:sql} SELECT ranking FROM (SELECT rank() OVER (PARTITION BY ss_store_sk ORDER BY sum(ss_net_profit)) AS ranking FROM store_sales GROUP BY ss_store_sk) tmp1 WHERE ranking <= 5 {code} The stacktrace below shows that problem occurs while trying to initialize the {{TopNKeyOperator}}. {noformat} 2021-07-08T09:04:37,444 ERROR [TezTR-270335_1_3_0_0_0] tez.TezProcessor: Failed initializeAndRunProcessor java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:351) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:310) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:277) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) [tez-runtime-internals-0.10.0.jar:0.10.0] at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75) [tez-runtime-internals-0.10.0.jar:0.10.0] at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62) [tez-runtime-internals-0.10.0.jar:0.10.0] at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_261] at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_261] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) [hadoop-common-3.1.0.jar:?] 
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62) [tez-runtime-internals-0.10.0.jar:0.10.0] at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38) [tez-runtime-internals-0.10.0.jar:0.10.0] at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) [tez-common-0.10.0.jar:0.10.0] at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118) [hive-llap-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_261] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_261] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_261] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261] Caused by: java.lang.RuntimeException: cannot find field _col0 from [0:ss_store_sk, 1:$f1] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:550) ~[hive-serde-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:153) ~[hive-serde-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:56) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TopNKeyOperator.initObjectInspectors(TopNKeyOperator.java:101) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TopNKeyOperator.initializeOp(TopNKeyOperator.java:82) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:549) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:503) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:369) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:506) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:314) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] ... 16 more {noformat} -- This message was sent by Atlassi
[jira] [Created] (HIVE-25296) Replace parquet-hadoop-bundle dependency with the actual parquet modules
Stamatis Zampetakis created HIVE-25296: -- Summary: Replace parquet-hadoop-bundle dependency with the actual parquet modules Key: HIVE-25296 URL: https://issues.apache.org/jira/browse/HIVE-25296 Project: Hive Issue Type: Improvement Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Fix For: 4.0.0 The parquet-hadoop-bundle is not a real dependency but a mere packaging of three parquet modules to create an uber jar. The Parquet community created this artificial module at the request of HIVE-5783, but the benefits, if any, are unclear. On the contrary, using the uber dependency has some drawbacks:
* Parquet source code cannot be attached easily in IDEs, which makes debugging sessions cumbersome.
* Finding concrete dependencies on Parquet is not possible just by inspecting the pom files.
* It adds extra maintenance cost for the Parquet community, which must perform additional verification steps during a release.
The goal of this JIRA is to replace the uber dependency with concrete dependencies to the respective modules:
* parquet-common
* parquet-column
* parquet-hadoop
-- This message was sent by Atlassian Jira (v8.3.4#803005)
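Concretely, the change replaces the single uber dependency with the three modules it bundles. A sketch of the pom edit (the parquet.version property name is assumed for illustration, not taken from the actual Hive pom):

```xml
<!-- Before: the uber jar -->
<!--
<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-hadoop-bundle</artifactId>
  <version>${parquet.version}</version>
</dependency>
-->
<!-- After: the concrete modules it bundles -->
<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-common</artifactId>
  <version>${parquet.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-column</artifactId>
  <version>${parquet.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-hadoop</artifactId>
  <version>${parquet.version}</version>
</dependency>
```

With the concrete artifacts in place, IDEs can attach the per-module source jars and the pom states the actual Parquet surface Hive depends on.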
[jira] [Created] (HIVE-25219) Backward incompatible timestamp serialization in Avro for certain timezones
Stamatis Zampetakis created HIVE-25219: -- Summary: Backward incompatible timestamp serialization in Avro for certain timezones Key: HIVE-25219 URL: https://issues.apache.org/jira/browse/HIVE-25219 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 3.1.0 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Fix For: 4.0.0 HIVE-12192 and HIVE-20007 changed the way that timestamp computations are performed and, to some extent, how timestamps are serialized and deserialized in files (Parquet, Avro). In versions that include HIVE-12192 or HIVE-20007 the serialization in Avro files is not backwards compatible. In other words, writing timestamps with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them with another (not including the previous issues) may lead to different results depending on the default timezone of the system. Consider the following scenario where the default system timezone is set to US/Pacific. At apache/master commit eedcd82bc2d61861a27205f925ba0ffab9b6bca8:
{code:sql}
CREATE EXTERNAL TABLE employee(eid INT, birth timestamp) STORED AS AVRO LOCATION '/tmp/hiveexttbl/employee';
INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
SELECT * FROM employee;
{code}
|1|1880-01-01 00:00:00|
|2|1884-01-01 00:00:00|
|3|1990-01-01 00:00:00|
At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356:
{code:sql}
CREATE EXTERNAL TABLE employee(eid INT, birth timestamp) STORED AS AVRO LOCATION '/tmp/hiveexttbl/employee';
SELECT * FROM employee;
{code}
|1|1879-12-31 23:52:58|
|2|1884-01-01 00:00:00|
|3|1990-01-01 00:00:00|
The timestamp for {{eid=1}} in branch-2.3 is different from the one in master. -- This message was sent by Atlassian Jira (v8.3.4#803005)
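The 7 min 2 s discrepancy for {{eid=1}} matches the difference between the zone's pre-1883 local-mean-time offset and its later standard offset, which only affects timestamps old enough to fall before the transition (hence {{eid=2}} and {{eid=3}} are unaffected). An illustrative check with {{java.time}} (not Hive code; {{America/Los_Angeles}} is assumed here as the canonical id behind US/Pacific):

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.zone.ZoneRules;

public class LmtOffsetDemo {
    public static void main(String[] args) {
        ZoneRules rules = ZoneId.of("America/Los_Angeles").getRules();

        // Before 1883-11-18 the zone used local mean time, UTC-07:52:58.
        int lmt = rules.getOffset(LocalDateTime.of(1880, 1, 1, 0, 0)).getTotalSeconds();
        // By 1990 the standard offset is UTC-08:00.
        int pst = rules.getOffset(LocalDateTime.of(1990, 1, 1, 0, 0)).getTotalSeconds();

        // The 422-second gap (7 min 2 s) is exactly the shift seen for eid=1:
        // 1880-01-01 00:00:00 read back as 1879-12-31 23:52:58.
        System.out.println("lmt=" + lmt + " pst=" + pst + " diffSeconds=" + (lmt - pst));
    }
}
```

One serialization path interprets the epoch value with the zone's historical rules while the other effectively applies the modern fixed offset, so the two disagree by exactly this gap.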
[jira] [Created] (HIVE-25129) Wrong results when timestamps stored in Avro/Parquet fall into the DST shift
Stamatis Zampetakis created HIVE-25129: -- Summary: Wrong results when timestamps stored in Avro/Parquet fall into the DST shift Key: HIVE-25129 URL: https://issues.apache.org/jira/browse/HIVE-25129 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 3.1.0 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Timestamp values falling into the daylight saving time shift of the system timezone cannot be retrieved as-is when they are stored in Parquet/Avro tables. The respective SELECT query shifts those timestamps by +1 hour, reflecting the DST shift. +Example+
{code:sql}
--! qt:timezone:US/Pacific
create table employee (eid int, birthdate timestamp) stored as parquet;
insert into employee values (0, '2019-03-10 02:00:00');
insert into employee values (1, '2020-03-08 02:00:00');
insert into employee values (2, '2021-03-14 02:00:00');
select eid, birthdate from employee order by eid;
{code}
+Actual results+
|0|2019-03-10 03:00:00|
|1|2020-03-08 03:00:00|
|2|2021-03-14 03:00:00|
+Expected results+
|0|2019-03-10 02:00:00|
|1|2020-03-08 02:00:00|
|2|2021-03-14 02:00:00|
Storing and retrieving values in columns using the [timestamp data type|https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types] (equivalent to the LocalDateTime Java API) should not alter in any way the value that the user sees. The results are correct for {{TEXTFILE}} and {{ORC}} tables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
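The inserted values all name local times inside the spring-forward gap (on those dates, 02:00-02:59 does not exist in US/Pacific), so any epoch-based round trip resolves them forward by one hour, which is exactly the shift in the actual results. An illustrative check with {{java.time}} (not Hive code; {{America/Los_Angeles}} is assumed here as the canonical id behind US/Pacific):

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class DstGapDemo {
    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("America/Los_Angeles");
        // 2019-03-10 02:00:00 local does not exist: clocks jump from 02:00 to 03:00.
        LocalDateTime stored = LocalDateTime.of(2019, 3, 10, 2, 0, 0);
        // Resolving the gapped time against the zone moves it forward by the gap length.
        ZonedDateTime resolved = ZonedDateTime.of(stored, zone);
        System.out.println("resolved=" + resolved.toLocalDateTime());
    }
}
```

A LocalDateTime-like timestamp type should never pass through such a zone resolution; the fact that text/ORC round-trip correctly shows the conversion is specific to the Parquet/Avro write or read path.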
[jira] [Created] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones
Stamatis Zampetakis created HIVE-25104: -- Summary: Backward incompatible timestamp serialization in Parquet for certain timezones Key: HIVE-25104 URL: https://issues.apache.org/jira/browse/HIVE-25104 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 3.1.2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis HIVE-12192 and HIVE-20007 changed the way that timestamp computations are performed and, to some extent, how timestamps are serialized and deserialized in files (Parquet, Avro, Orc). In versions that include HIVE-12192 or HIVE-20007 the serialization in Parquet files is not backwards compatible. In other words, writing timestamps with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them with another (not including the previous issues) may lead to different results depending on the default timezone of the system. Consider the following scenario where the default system timezone is set to US/Pacific. At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3:
{code:sql}
CREATE EXTERNAL TABLE employee(eid INT, birth timestamp) STORED AS PARQUET LOCATION '/tmp/hiveexttbl/employee';
INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
SELECT * FROM employee;
{code}
|1|1880-01-01 00:00:00|
|2|1884-01-01 00:00:00|
|3|1990-01-01 00:00:00|
At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356:
{code:sql}
CREATE EXTERNAL TABLE employee(eid INT, birth timestamp) STORED AS PARQUET LOCATION '/tmp/hiveexttbl/employee';
SELECT * FROM employee;
{code}
|1|1879-12-31 23:52:58|
|2|1884-01-01 00:00:00|
|3|1990-01-01 00:00:00|
The timestamp for {{eid=1}} in branch-2.3 is different from the one in master. -- This message was sent by Atlassian Jira (v8.3.4#803005)