[jira] [Created] (HIVE-20382) Materialized views: Introduce heuristic to favour incremental rebuild
Jesus Camacho Rodriguez created HIVE-20382:
-----------------------------------------------

             Summary: Materialized views: Introduce heuristic to favour incremental rebuild
                 Key: HIVE-20382
                 URL: https://issues.apache.org/jira/browse/HIVE-20382
             Project: Hive
          Issue Type: Improvement
          Components: Materialized views
            Reporter: Jesus Camacho Rodriguez
            Assignee: Jesus Camacho Rodriguez


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
Re: [jira] [Created] (HIVE-20381) Vectorization: Reduce dedup of GroupBy + PTF turns off vectorization
Hi team,

I would request you to unsubscribe my account from these mailing lists.

Thanks,
Matheswaran.S

On Tue, Aug 14, 2018 at 6:00 AM Gopal V (JIRA) wrote:

> Gopal V created HIVE-20381:
> --------------------------------
>
>              Summary: Vectorization: Reduce dedup of GroupBy + PTF turns off vectorization
>                  Key: HIVE-20381
>                  URL: https://issues.apache.org/jira/browse/HIVE-20381
Review Request 68337: HIVE-20379
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68337/
-----------------------------------------------------------

Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-20379
    https://issues.apache.org/jira/browse/HIVE-20379


Repository: hive-git


Description
-------

HIVE-20379


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 361f150193a155d45eb64266f88eb88f0a881ad3
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 2ee562add907c2b57992df27ecbb4fd5e114cdba
  ql/src/test/queries/clientpositive/materialized_view_rewrite_part_2.q 505f7507bc9adb25e544dc422164858fb20fed0e
  ql/src/test/results/clientpositive/llap/materialized_view_partitioned.q.out b12df11a98e55c00c8b77e8292666373f3509364
  ql/src/test/results/clientpositive/llap/materialized_view_partitioned_3.q.out 726c660cf21d26cc5cc120d1397243958b49f834
  ql/src/test/results/clientpositive/llap/materialized_view_rewrite_part_1.q.out 492bb226fd03d51686e2040aed4766aef7150592
  ql/src/test/results/clientpositive/llap/materialized_view_rewrite_part_2.q.out e748ccb010fc755f9d6af82c26c8ccffc26b1a55


Diff: https://reviews.apache.org/r/68337/diff/1/


Testing
-------


Thanks,

Jesús Camacho Rodríguez
[jira] [Created] (HIVE-20381) Vectorization: Reduce dedup of GroupBy + PTF turns off vectorization
Gopal V created HIVE-20381:
--------------------------------

             Summary: Vectorization: Reduce dedup of GroupBy + PTF turns off vectorization
                 Key: HIVE-20381
                 URL: https://issues.apache.org/jira/browse/HIVE-20381
             Project: Hive
          Issue Type: Bug
          Components: Vectorization
    Affects Versions: 3.1.0, 4.0.0
            Reporter: Gopal V

One of the PTF reducers in Query51 is not vectorized, because reduce deduplication combines a group-by and a windowing shuffle.

{code}
| Reducer 8
|     Execution mode: llap
|     Reduce Vectorization:
|         enabled: true
|         enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
|         notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is supported
|         vectorized: false
{code}

With {{set hive.optimize.reducededuplication=false;}}, all PTF vertices are vectorized (after HIVE-20367).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
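A hedged sketch of the workaround described above. The table and query are placeholders chosen only to mimic the group-by + windowing shape of Query51; EXPLAIN VECTORIZATION is used purely to check whether the PTF reducer vectorizes once reduce deduplication is off.

{code}
-- Session-level workaround sketch: keep reduce-side vectorization on, but
-- disable reduce deduplication so the PTF stays directly under its own shuffle.
set hive.vectorized.execution.reduce.enabled=true;
set hive.optimize.reducededuplication=false;

-- Placeholder query with a group-by feeding a windowing shuffle.
EXPLAIN VECTORIZATION DETAIL
SELECT item_sk,
       d_date,
       SUM(sales) OVER (PARTITION BY item_sk ORDER BY d_date) AS cume_sales
FROM web_sales_by_day
GROUP BY item_sk, d_date, sales;
{code}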
[jira] [Created] (HIVE-20380) explore storing multiple CBs in a single cache buffer in LLAP cache
Sergey Shelukhin created HIVE-20380:
---------------------------------------

             Summary: explore storing multiple CBs in a single cache buffer in LLAP cache
                 Key: HIVE-20380
                 URL: https://issues.apache.org/jira/browse/HIVE-20380
             Project: Hive
          Issue Type: Bug
            Reporter: Sergey Shelukhin

Lately ORC CBs are becoming ridiculously small. First there's the 4Kb minimum allocation (instead of 256Kb); then, after we moved the metadata cache off-heap, the index streams, which are all tiny, take up a lot of CBs and waste space. Wasted space can require a larger cache and lead to cache OOMs on some workloads. Reducing min.alloc solves this problem, but then there's a lot of heap (and probably compute) overhead to track all these buffers. Arguably even the 4Kb min.alloc is too small.

We should store contiguous CBs in the same buffer; to start, we can do it for ROW_INDEX streams. That probably means reading all ROW_INDEX streams instead of doing projection when we see that they are too small. We need to investigate what the pattern is for ORC data blocks. One option is to increase min.alloc and then consolidate multiple 4-8Kb CBs, but only for the same stream. However, a larger min.alloc will result in wastage for really small streams, so we can also consolidate multiple streams (potentially across columns) if needed. This will result in some priority anomalies, but they are probably OK.

Another consideration is making tracking less object-oriented, in particular passing around integer indexes instead of objects and storing state in giant arrays somewhere (potentially with some optimizations for less common things), instead of every buffer getting its own object.

cc [~gopalv] [~prasanth_j]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (HIVE-20379) Rewriting with partitioned materialized views may reference wrong column
Jesus Camacho Rodriguez created HIVE-20379:
-----------------------------------------------

             Summary: Rewriting with partitioned materialized views may reference wrong column
                 Key: HIVE-20379
                 URL: https://issues.apache.org/jira/browse/HIVE-20379
             Project: Hive
          Issue Type: Bug
          Components: Materialized views
            Reporter: Jesus Camacho Rodriguez
            Assignee: Jesus Camacho Rodriguez


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (HIVE-20378) don't update stats during alter for txn table conversion
Sergey Shelukhin created HIVE-20378:
---------------------------------------

             Summary: don't update stats during alter for txn table conversion
                 Key: HIVE-20378
                 URL: https://issues.apache.org/jira/browse/HIVE-20378
             Project: Hive
          Issue Type: Bug
            Reporter: Sergey Shelukhin


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (HIVE-20377) Hive Kafka Storage Handler
slim bouguerra created HIVE-20377:
-------------------------------------

             Summary: Hive Kafka Storage Handler
                 Key: HIVE-20377
                 URL: https://issues.apache.org/jira/browse/HIVE-20377
             Project: Hive
          Issue Type: Bug
    Affects Versions: 4.0.0
            Reporter: slim bouguerra
            Assignee: slim bouguerra

h1. Goal
* Read streaming data from a Kafka queue as an external table.
* Allow streaming navigation by pushing down filters on the Kafka record partition id, offset and timestamp.
* Insert streaming data from Kafka into an actual Hive internal table, using a CTAS statement.

h1. Example
h2. Create the external table
{code}
CREATE EXTERNAL TABLE kafka_table
(`timestamp` timestamp, page string, `user` string, language string,
 added int, deleted int, flags string, comment string, namespace string)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES
("kafka.topic" = "wikipedia",
 "kafka.bootstrap.servers" = "brokeraddress:9092",
 "kafka.serde.class" = "org.apache.hadoop.hive.serde2.JsonSerDe");
{code}

h2. Kafka Metadata
In order to keep track of Kafka records, the storage handler will automatically add the Kafka row metadata, e.g. partition id, record offset and record timestamp.
{code}
DESCRIBE EXTENDED kafka_table

timestamp       timestamp   from deserializer
page            string      from deserializer
user            string      from deserializer
language        string      from deserializer
country         string      from deserializer
continent       string      from deserializer
namespace       string      from deserializer
newpage         boolean     from deserializer
unpatrolled     boolean     from deserializer
anonymous       boolean     from deserializer
robot           boolean     from deserializer
added           int         from deserializer
deleted         int         from deserializer
delta           bigint      from deserializer
__partition     int         from deserializer
__offset        bigint      from deserializer
__timestamp     bigint      from deserializer
{code}

h2. Filter push down
Newer Kafka consumers (0.11.0 and higher) allow seeking on the stream based on a given offset. The proposed storage handler will be able to leverage this API by pushing down filters over the metadata columns, namely __partition (int), __offset (long) and __timestamp (long).

For instance, a query like
{code}
select `__offset` from kafka_table
where (`__offset` < 10 and `__offset` > 3 and `__partition` = 0)
   or (`__partition` = 0 and `__offset` < 105 and `__offset` > 99)
   or (`__offset` = 109);
{code}
will result in a scan of partition 0 only, reading only the records between offsets 4 and 109.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
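A hedged sketch of the CTAS use case listed under Goals; the target table, the ORC format and the timestamp cutoff are illustrative assumptions, not part of the proposal. kafka_table is the external table defined above.

{code}
-- Snapshot the Kafka-backed external table into a managed Hive table,
-- keeping the Kafka metadata columns for traceability.
CREATE TABLE wikipedia_snapshot STORED AS ORC AS
SELECT `timestamp`, page, `user`, language, added, deleted,
       `__partition`, `__offset`
FROM kafka_table
WHERE `__timestamp` > 1534204800000;   -- illustrative epoch-millis cutoff
{code}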
[jira] [Created] (HIVE-20376) Timestamp Timezone parser doesn't handle ISO formats "2013-08-31T01:02:33Z"
slim bouguerra created HIVE-20376:
-------------------------------------

             Summary: Timestamp Timezone parser doesn't handle ISO formats "2013-08-31T01:02:33Z"
                 Key: HIVE-20376
                 URL: https://issues.apache.org/jira/browse/HIVE-20376
             Project: Hive
          Issue Type: Bug
            Reporter: slim bouguerra

It would be nice to extend the timezone utils parser to handle ISO formats such as "2013-08-31T01:02:33Z":

org.apache.hadoop.hive.common.type.TimestampTZUtil#parse(java.lang.String)

CC [~jcamachorodriguez] / [~ashutoshc]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
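A hedged illustration of the literal in question, on the assumption that the cast below goes through TimestampTZUtil#parse; the expected value is only what ISO-8601 semantics would suggest.

{code}
-- Currently not parsed; with ISO format support this should evaluate to
-- 2013-08-31 01:02:33.0 UTC.
SELECT CAST('2013-08-31T01:02:33Z' AS timestamp with local time zone);
{code}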
[jira] [Created] (HIVE-20375) Json SerDe ignoring the timestamp.formats property
slim bouguerra created HIVE-20375:
-------------------------------------

             Summary: Json SerDe ignoring the timestamp.formats property
                 Key: HIVE-20375
                 URL: https://issues.apache.org/jira/browse/HIVE-20375
             Project: Hive
          Issue Type: Bug
    Affects Versions: 4.0.0
            Reporter: slim bouguerra

The JsonSerDe is supposed to accept the "timestamp.formats" SerDe property to allow different timestamp formats; after a recent refactor this no longer works.

Looking at the code, the SerDe is not using the parser constructed with the added formats
https://github.com/apache/hive/blob/1105ef3974d8a324637d3d35881a739af3aeb382/serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonStructReader.java#L82
but instead goes through a Converter
https://github.com/apache/hive/blob/1105ef3974d8a324637d3d35881a739af3aeb382/serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonStructReader.java#L324

The converter used is org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter.TimestampConverter, which has no knowledge of user-supplied formats whatsoever; it relies on the static conversion in org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils#getTimestampFromString.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
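A hedged example of a table that would exercise the property; the table name and the format list are illustrative only.

{code}
-- timestamp.formats is meant to let the SerDe accept additional layouts,
-- e.g. an ISO-like pattern plus epoch milliseconds.
CREATE TABLE events_json (ts timestamp, msg string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
WITH SERDEPROPERTIES ("timestamp.formats" = "yyyy-MM-dd'T'HH:mm:ss,millis")
STORED AS TEXTFILE;
{code}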
Re: Programmatically determine version of Hive running on server
Can you achieve that with hive --version, using a simple script?

regards
Dev

On Mon, Aug 13, 2018 at 2:49 PM Bohdan Kazydub wrote:

> Hi all,
>
> is it possible to determine a version of Hive running on server using Hive
> (HiveMetaStoreClient etc.) 2.3.3?
>
> Best regards,
> Bohdan
>

--
Devopam Mittra
Life and Relations are not binary
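If the server side is recent enough, another option (a sketch, assuming the built-in version() UDF, which was added around Hive 2.1) is to ask HiveServer2 directly over JDBC rather than go through the metastore client:

{code}
-- Expected to return the version string (and build hash) of the Hive
-- instance that executes the query.
SELECT version();
{code}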
[jira] [Created] (HIVE-20374) Write Hive version information to Parquet footer
Zoltan Ivanfi created HIVE-20374:
------------------------------------

             Summary: Write Hive version information to Parquet footer
                 Key: HIVE-20374
                 URL: https://issues.apache.org/jira/browse/HIVE-20374
             Project: Hive
          Issue Type: Improvement
            Reporter: Zoltan Ivanfi

PARQUET-352 added support for the "writer.model.name" property in the Parquet metadata to identify the object model (application) that wrote the file.

The easiest way to write this property is by overriding getName() of org.apache.parquet.hadoop.api.WriteSupport. In Hive, this would mean adding getName() to the org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport class.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (HIVE-20373) Output of 'show compactions' displays double header
Laszlo Bodor created HIVE-20373:
-----------------------------------

             Summary: Output of 'show compactions' displays double header
                 Key: HIVE-20373
                 URL: https://issues.apache.org/jira/browse/HIVE-20373
             Project: Hive
          Issue Type: Bug
            Reporter: Laszlo Bodor

{code}
+-----------+-------------------+------------+--------+------------+---------------------------------------------------+----------------+---------------+-------------------------+
|  dbname   |      tabname      |  partname  |  type  |   state    |                     workerid                      |   starttime    |   duration    |       hadoopjobid       |
+-----------+-------------------+------------+--------+------------+---------------------------------------------------+----------------+---------------+-------------------------+
| Database  | Table             | Partition  | Type   | State      | Worker                                            | Start Time     | Duration(ms)  | HadoopJobId             |
| default   | student           | ---        | MAJOR  | working    | ctr-e138-1518143905142-435940-01-03.hwx.site-61   | 1534156696000  | ---           | job_1534152461533_0030  |
| default   | acid_partitioned  | bkt=1      | MAJOR  | initiated  | ---                                               | ---            | ---           | ---                     |
| default   | acid_partitioned  | bkt=2      | MAJOR  | initiated  | ---                                               | ---            | ---           | ---                     |
+-----------+-------------------+------------+--------+------------+---------------------------------------------------+----------------+---------------+-------------------------+
{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (HIVE-20372) WRTIE_SET spelling in TxnHandler
Laszlo Bodor created HIVE-20372:
-----------------------------------

             Summary: WRTIE_SET spelling in TxnHandler
                 Key: HIVE-20372
                 URL: https://issues.apache.org/jira/browse/HIVE-20372
             Project: Hive
          Issue Type: Bug
            Reporter: Laszlo Bodor


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[GitHub] hive pull request #416: HIVE-20371: Queries failing with Internal error proc...
GitHub user sankarh opened a pull request:

    https://github.com/apache/hive/pull/416

    HIVE-20371: Queries failing with Internal error processing add_write_notification_log

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sankarh/hive HIVE-20371

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/416.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #416

----
commit b97064246b2a503f40b7c5657c10d3278e1cad19
Author: Sankar Hariappan
Date:   2018-08-13T12:11:56Z

    HIVE-20371: Queries failing with Internal error processing add_write_notification_log

----
Programmatically determine version of Hive running on server
Hi all,

is it possible to determine the version of Hive running on a server using Hive
(HiveMetaStoreClient etc.) 2.3.3?

Best regards,
Bohdan
[jira] [Created] (HIVE-20371) Queries failing with Internal error processing add_write_notification_log
Sankar Hariappan created HIVE-20371:
---------------------------------------

             Summary: Queries failing with Internal error processing add_write_notification_log
                 Key: HIVE-20371
                 URL: https://issues.apache.org/jira/browse/HIVE-20371
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2, repl, Standalone Metastore
    Affects Versions: 4.0.0, 3.2.0
            Reporter: Sankar Hariappan
            Assignee: Sankar Hariappan

Queries failing with the following error:
{noformat}
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing add_write_notification_log
INFO : Completed executing command(queryId=hive_20180806072916_a9ae37a9-869f-4218-8357-a96ba713db69); Time taken: 878.604 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing add_write_notification_log (state=08S01,code=1)
{noformat}

From hiveserver log:
{noformat}
2018-08-06T07:59:33,656 ERROR [HiveServer2-Background-Pool: Thread-1551]: operation.Operation (:()) - Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing add_write_notification_log
        at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:335) ~[hive-service-3.1.0.3.0.1.0-59.jar:3.1.0.3.0.1.0-59]
        at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:226) ~[hive-service-3.1.0.3.0.1.0-59.jar:3.1.0.3.0.1.0-59]
        at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) ~[hive-service-3.1.0.3.0.1.0-59.jar:3.1.0.3.0.1.0-59]
        at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:316) ~[hive-service-3.1.0.3.0.1.0-59.jar:3.1.0.3.0.1.0-59]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_112]
        at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_112]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) ~[hadoop-common-3.1.0.3.0.1.0-59.jar:?]
        at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:329) ~[hive-service-3.1.0.3.0.1.0-59.jar:3.1.0.3.0.1.0-59]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_112]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_112]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_112]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_112]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_112]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_112]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: Internal error processing add_write_notification_log
        at org.apache.hadoop.hive.ql.metadata.Hive.addWriteNotificationLog(Hive.java:2879) ~[hive-exec-3.1.0.3.0.1.0-59.jar:3.1.0.3.0.1.0-59]
        at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:2035) ~[hive-exec-3.1.0.3.0.1.0-59.jar:3.1.0.3.0.1.0-59]
        at org.apache.hadoop.hive.ql.exec.MoveTask.handleStaticParts(MoveTask.java:477) ~[hive-exec-3.1.0.3.0.1.0-59.jar:3.1.0.3.0.1.0-59]
        at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:397) ~[hive-exec-3.1.0.3.0.1.0-59.jar:3.1.0.3.0.1.0-59]
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) ~[hive-exec-3.1.0.3.0.1.0-59.jar:3.1.0.3.0.1.0-59]
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-3.1.0.3.0.1.0-59.jar:3.1.0.3.0.1.0-59]
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2679) ~[hive-exec-3.1.0.3.0.1.0-59.jar:3.1.0.3.0.1.0-59]
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2350) ~[hive-exec-3.1.0.3.0.1.0-59.jar:3.1.0.3.0.1.0-59]
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2026) ~[hive-exec-3.1.0.3.0.1.0-59.jar:3.1.0.3.0.1.0-59]
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1724) ~[hive-exec-3.1.0.3.0.1.0-59.jar:3.1.0.3.0.1.0-59]
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1718) ~[hive-exec-3.1.0.3.0.1.0-59.jar:3.1.0.3.0.1.0-59]
        at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) ~[hive-exec-3.1.0.3.0.1.0-59.jar:3.1.0.3.0.1.0-59]
        at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224) ~[hive-service-3.1.0.3.0.1.0-59.jar:3.1.0.3.0.1.0-59]
        ... 13 more
Caused by:
[GitHub] hive pull request #415: desc table Command optimize
GitHub user Xoln opened a pull request:

    https://github.com/apache/hive/pull/415

    desc table Command optimize

    When `desc table {table}` is executed and {table} is a partitioned table with many partitions, it reads all the partitions from the metastore and loads very slowly. The desc table command does not show per-partition state, so there is no need to load this information.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Xoln/hive origin/desc-optimizer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/415.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #415

----
commit f1e1e870be361a30ed32dda0172f803be0217d02
Author: zhongliang
Date:   2018-08-13T08:26:10Z

    desc table Command optimize

----