[jira] [Created] (HIVE-26529) Fix VectorizedSupport support for DECIMAL_64 in HiveIcebergInputFormat
Rajesh Balamohan created HIVE-26529:
---
             Summary: Fix VectorizedSupport support for DECIMAL_64 in HiveIcebergInputFormat
                 Key: HIVE-26529
                 URL: https://issues.apache.org/jira/browse/HIVE-26529
             Project: Hive
          Issue Type: Improvement
          Components: HiveServer2
            Reporter: Rajesh Balamohan

To support vectorized reads in Parquet, DECIMAL_64 support in ORC was disabled in HiveIcebergInputFormat. This causes regressions in queries.

[https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergInputFormat.java#L182]

It would be good to restore DECIMAL_64 support in the Iceberg input format.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
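As background to the regression above (not part of the report): DECIMAL_64 is Hive's vectorized representation for decimals of precision ≤ 18, which stores each value as a scaled primitive long instead of a HiveDecimal object, so per-row arithmetic stays allocation-free. A minimal sketch of the idea, with illustrative values and a hypothetical helper:

{code:java}
public class Decimal64Demo {
    // Hypothetical helper: adds two decimals that share the same scale,
    // staying entirely in primitive longs (the essence of DECIMAL_64).
    static long addScaled(long a, long b) {
        return a + b;
    }

    public static void main(String[] args) {
        int scale = 2;
        long price = 12345; // represents 123.45 as a scaled long (123.45 * 10^scale)
        long tax = 100;     // represents 1.00
        long total = addScaled(price, tax); // 12445, i.e. 124.45 -- one long add per row
        System.out.println(total / Math.pow(10, scale));
    }
}
{code}

Falling back from this representation to object-based decimal readers is why disabling DECIMAL_64 regresses decimal-heavy queries.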
[jira] [Created] (HIVE-26528) TIMESTAMP stored via spark-shell DataFrame to Avro returns incorrect value when read using HiveCLI
xsys created HIVE-26528:
---
             Summary: TIMESTAMP stored via spark-shell DataFrame to Avro returns incorrect value when read using HiveCLI
                 Key: HIVE-26528
                 URL: https://issues.apache.org/jira/browse/HIVE-26528
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
    Affects Versions: 3.1.2
            Reporter: xsys

h2. Describe the bug

We are trying to store the TIMESTAMP {{"2022"}} in a table created via a Spark DataFrame. The table is created with the Avro file format. We encounter no errors while creating the table and inserting the aforementioned timestamp value. However, performing a SELECT query on the table through HiveCLI returns an incorrect value: "+53971-10-02 19:00:"

The root cause of this issue is that Spark's [AvroSerializer|https://github.com/apache/spark/blob/v3.2.1/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala#L171-L180] serializes timestamps using Avro's [TIMESTAMP_MICRO|https://github.com/apache/avro/blob/ee4725c64807549ec74e20e83d35cfc1fe8e90a8/lang/java/avro/src/main/java/org/apache/avro/LogicalTypes.java#L190] while Hive's [AvroDeserializer|https://github.com/apache/hive/blob/rel/release-3.1.2/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java#L320-L347] assumes timestamps to be Avro's [TIMESTAMP_MILLIS|#L189] during deserialization.

h2. Steps to reproduce

On Spark 3.2.1 (commit `4f25b3f712`), using `spark-shell` with the Avro package:
{code:java}
./bin/spark-shell --packages org.apache.spark:spark-avro_2.12:3.2.1{code}
Execute the following:
{code:java}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
val rdd = sc.parallelize(Seq(Row(Seq("2022").toDF("time").select(to_timestamp(col("time")).as("to_timestamp")).first().getAs[java.sql.Timestamp](0))))
val schema = new StructType().add(StructField("c1", TimestampType, true))
val df = spark.createDataFrame(rdd, schema)
df.show(false)
df.write.mode("overwrite").format("avro").saveAsTable("ws")
{code}
On [Hive 3.1.2|https://archive.apache.org/dist/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz], execute the following:
{noformat}
hive> select * from ws;
OK
+53971-10-02 19:00:{noformat}

h2. Expected behavior

We expect the output of the {{SELECT}} query to be {{"2022-01-01 00:00:00"}}. We tried other formats like Parquet and the outcome is consistent with this expectation. Moreover, the timestamp is interpreted correctly when the table is written to via DataFrame and read via spark-shell/spark-sql:

h3. Can be read correctly from spark-shell:
{code:java}
scala> spark.sql("select * from ws;").show(false)
+-------------------+
|c1                 |
+-------------------+
|2022-01-01 00:00:00|
+-------------------+{code}
h3. Can be read correctly from spark-sql:
{noformat}
spark-sql> select * from ws;
2022-01-01 00:00:00
Time taken: 0.063 seconds, Fetched 1 row(s){noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
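The micros-vs-millis mismatch described above can be reproduced with plain epoch arithmetic (illustrative Java, not Hive code; the exact garbled date also depends on the session timezone, which is why the report shows a slightly different instant):

{code:java}
import java.time.Instant;
import java.time.ZoneOffset;

public class MicrosAsMillis {
    public static void main(String[] args) {
        // 2022-01-01T00:00:00Z expressed in microseconds since the epoch,
        // which is what an Avro TIMESTAMP_MICROS long holds.
        long micros = 1_640_995_200_000_000L;

        // Misreading that long as milliseconds (as Hive 3.1.2's AvroDeserializer
        // does) inflates the instant by a factor of 1000.
        Instant misread = Instant.ofEpochMilli(micros);
        Instant correct = Instant.ofEpochMilli(micros / 1000);

        // The misread year lands tens of thousands of years in the future,
        // matching the "+53971-..." value in the report.
        System.out.println("misread year: " + misread.atOffset(ZoneOffset.UTC).getYear());
        System.out.println("correct year: " + correct.atOffset(ZoneOffset.UTC).getYear()); // 2022
    }
}
{code}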
[jira] [Created] (HIVE-26527) When a Hive partition field contains special characters, abnormal partition field values appear during small-file merging
sunyan created HIVE-26527:
-
             Summary: When a Hive partition field contains special characters, abnormal partition field values appear during small-file merging.
                 Key: HIVE-26527
                 URL: https://issues.apache.org/jira/browse/HIVE-26527
             Project: Hive
          Issue Type: Bug
            Reporter: sunyan

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (HIVE-26526) MSCK sync is not removing partitions with special characters
Naresh P R created HIVE-26526:
-
             Summary: MSCK sync is not removing partitions with special characters
                 Key: HIVE-26526
                 URL: https://issues.apache.org/jira/browse/HIVE-26526
             Project: Hive
          Issue Type: New Feature
            Reporter: Naresh P R

The PARTITIONS table stores the encoded string while PARTITION_KEY_VALS stores the original string.
{code:java}
hive=> select * from "PARTITION_KEY_VALS" where "PART_ID" IN (46753, 46754, 46755, 46756);
 PART_ID | PART_KEY_VAL | INTEGER_IDX
---------+--------------+-------------
   46753 | 2022-02-*    |           0
   46754 | 2011-03-01   |           0
   46755 | 2022-01-*    |           0
   46756 | 2010-01-01   |           0

hive=> select * from "PARTITIONS" where "TBL_ID" = 23567 ;
 PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME             | SD_ID | TBL_ID | WRITE_ID
---------+-------------+------------------+-----------------------+-------+--------+----------
   46753 |           0 |                0 | part_date=2022-02-%2A | 70195 |  23567 |        0
   46754 |           0 |                0 | part_date=2011-03-01  | 70196 |  23567 |        0
   46755 |           0 |                0 | part_date=2022-01-%2A | 70197 |  23567 |        0
   46756 |           0 |                0 | part_date=2010-01-01  | 70198 |  23567 |        0
(4 rows){code}
1) DirectSQL has a join condition on PARTITION_KEY_VALS.PART_KEY_VAL = "2022-02-%2A" here:
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L883
2) JDO has a filter condition on PARTITIONS.PART_NAME = "part_date=2022-02-%252A" (i.e., URL-encoded twice):
once from HS2
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java#L353
and a second time from HMS
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java#L365]

The above conditions return 0 partitions, so those partitions are not removed from HMS metadata.

Attaching a repro q file.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
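The double-encoding at the heart of this bug can be sketched with a toy percent-escaper (the escape set below is illustrative; Hive's actual logic lives in FileUtils.escapePathName and escapes a larger character set):

{code:java}
public class DoubleEscapeDemo {
    // Toy version of Hive's partition-name escaping: percent-encode a few
    // special characters. Note '%' itself must be escaped, which is exactly
    // what makes a second pass destructive.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '*' || c == '%' || c == '/' || c == ':') {
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String partVal = "2022-02-*";
        // What PARTITIONS.PART_NAME actually stores: escaped once.
        String stored = "part_date=" + escape(partVal);
        // What the filter ends up comparing against when HS2 and HMS each
        // escape once: escaped twice, so it can never match the stored name.
        String filter = "part_date=" + escape(escape(partVal));
        System.out.println(stored); // part_date=2022-02-%2A
        System.out.println(filter); // part_date=2022-02-%252A
    }
}
{code}

Because "part_date=2022-02-%252A" never equals "part_date=2022-02-%2A", the lookup returns zero partitions and MSCK sync leaves the stale metadata in place.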
[jira] [Created] (HIVE-26525) Update llap-server python scripts to be compatible with python 3
Simhadri Govindappa created HIVE-26525:
--
             Summary: Update llap-server python scripts to be compatible with python 3
                 Key: HIVE-26525
                 URL: https://issues.apache.org/jira/browse/HIVE-26525
             Project: Hive
          Issue Type: Task
            Reporter: Simhadri Govindappa
            Assignee: Simhadri Govindappa

llap-server/src/main/resources/package.py and llap-server/src/main/resources/argparse.py are not compatible with python 3.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (HIVE-26524) Use Calcite to remove sections of a query plan known to never produce rows
Krisztian Kasa created HIVE-26524:
-
             Summary: Use Calcite to remove sections of a query plan known to never produce rows
                 Key: HIVE-26524
                 URL: https://issues.apache.org/jira/browse/HIVE-26524
             Project: Hive
          Issue Type: Improvement
          Components: CBO
            Reporter: Krisztian Kasa
            Assignee: Krisztian Kasa

Calcite has a set of rules to remove sections of a query plan that are known to never produce any rows. In some cases the whole plan can be removed. Such plans are represented with a single {{Values}} operator with no tuples, e.g.:
{code}
select y + 1
from (select a1 y, b1 z from t1 where b1 > 10) q
WHERE 1=0
{code}
{code}
HiveValues(tuples=[[]])
{code}
In other cases, when the plan has outer joins or set operators, some branches can be replaced with empty values; moving forward, the join/set operator itself can then be removed:
{code}
select a2, b2 from t2 where 1=0
union
select a1, b1 from t1
{code}
{code}
HiveAggregate(group=[{0, 1}])
  HiveTableScan(table=[[default, t1]], table:alias=[t1])
{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
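The pruning idea can be sketched in miniature (this is a toy rewriter over toy nodes, not Calcite's API; Calcite's actual rules, e.g. in its PruneEmptyRules class, perform the analogous rewrites on RelNodes):

{code:java}
import java.util.ArrayList;
import java.util.List;

public class PruneEmptyDemo {
    // Toy relational operators.
    interface Rel {}
    record Scan(String table) implements Rel {}
    record Filter(Rel input, boolean alwaysFalse) implements Rel {}
    record Union(List<Rel> inputs) implements Rel {}
    record EmptyValues() implements Rel {} // a Values operator with no tuples

    static Rel prune(Rel rel) {
        if (rel instanceof Filter f) {
            Rel in = prune(f.input());
            // A filter that can never pass (e.g. WHERE 1=0) produces no rows,
            // and a filter over an empty input is also empty.
            return (f.alwaysFalse() || in instanceof EmptyValues)
                    ? new EmptyValues() : new Filter(in, false);
        }
        if (rel instanceof Union u) {
            List<Rel> kept = new ArrayList<>();
            for (Rel in : u.inputs()) {
                Rel p = prune(in);
                if (!(p instanceof EmptyValues)) kept.add(p); // drop empty branches
            }
            if (kept.isEmpty()) return new EmptyValues();
            return kept.size() == 1 ? kept.get(0) : new Union(kept);
        }
        return rel;
    }

    public static void main(String[] args) {
        // select ... from t2 where 1=0 UNION select ... from t1
        Rel plan = new Union(List.of(new Filter(new Scan("t2"), true), new Scan("t1")));
        System.out.println(prune(plan)); // only the t1 scan survives
    }
}
{code}

One simplification: a real UNION (distinct) must keep the deduplicating aggregate even after the empty branch is dropped, which is exactly the HiveAggregate over the t1 scan in the plan above.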
[jira] [Created] (HIVE-26523) Hive job stuck for a long time
Mayank Kunwar created HIVE-26523:
             Summary: Hive job stuck for a long time
                 Key: HIVE-26523
                 URL: https://issues.apache.org/jira/browse/HIVE-26523
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
    Affects Versions: 4.0.0-alpha-1
            Reporter: Mayank Kunwar
            Assignee: Mayank Kunwar

The default value of "hive.server2.tez.initialize.default.sessions" is true, due to which the query was stuck waiting to choose a session from the default queue pool, as the default queue pool size was set to 1.
{noformat}
2022-07-10 16:34:23,831 INFO org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager: [HiveServer2-Background-Pool: Thread-184167]: Choosing a session from the defaultQueuePool
2022-07-10 18:15:48,295 INFO org.apache.hadoop.hive.ql.exec.tez.TezTask: [HiveServer2-Background-Pool: Thread-184167]: Subscribed to counters: [] for queryId: hive_20220710163423_c3f3deed-7a41-4865-9ce6-756fc7e6fbb8
2022-07-10 18:15:48,295 INFO org.apache.hadoop.hive.ql.exec.tez.TezTask: [HiveServer2-Background-Pool: Thread-184167]: Session is already open{noformat}
A possible workaround is to increase the value of "hive.server2.tez.sessions.per.default.queue".

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
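For illustration, the workaround mentioned above would be a hive-site.xml entry along these lines (the value 4 is purely illustrative; size the pool to the cluster's capacity and expected query concurrency):

{code:xml}
<!-- hive-site.xml: allow more than one concurrent Tez session per default queue -->
<property>
  <name>hive.server2.tez.sessions.per.default.queue</name>
  <!-- the report describes a pool size of 1; raising it lets queries
       stop queueing behind a single long-running session -->
  <value>4</value>
</property>
{code}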