[jira] [Created] (HIVE-23041) LLAP purge command can lead to resource leak
Slim Bouguerra created HIVE-23041: - Summary: LLAP purge command can lead to resource leak Key: HIVE-23041 URL: https://issues.apache.org/jira/browse/HIVE-23041 Project: Hive Issue Type: Bug Reporter: Slim Bouguerra As per the Java spec, https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html, an unused ExecutorService should be shut down to allow reclamation of its resources. Code like this is a serious resource leak if a user fires multiple commands: https://github.com/apache/hive/blob/7ae6756d40468d18b65423a0b5174b827dc42b60/ql/src/java/org/apache/hadoop/hive/ql/processors/LlapCacheResourceProcessor.java#L132 The other question this raises is how those tasks respond to interrupt or cancel at the thread level. [~prasanth_j], any idea what happens if one task hangs on IO? -- This message was sent by Atlassian Jira (v8.3.4#803005)
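A minimal sketch of the usual fix, with hypothetical names (this is not the actual LlapCacheResourceProcessor code): pair the executor with a finally block so repeated commands cannot leak threads.

{code:java}
import java.util.concurrent.*;

class PurgeCommandSketch {
  String run(Callable<String> task) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      return executor.submit(task).get(30, TimeUnit.SECONDS);
    } finally {
      executor.shutdownNow(); // releases the worker thread even on timeout or failure
    }
  }
}
{code}

shutdownNow() also interrupts the running task, which ties into the second question: the task only stops early if it reacts to interruption, and a task blocked on non-interruptible IO will keep the thread alive until the IO returns.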
[jira] [Created] (HIVE-22934) Hive server interactive log counters to error stream
Slim Bouguerra created HIVE-22934: - Summary: Hive server interactive log counters to error stream Key: HIVE-22934 URL: https://issues.apache.org/jira/browse/HIVE-22934 Project: Hive Issue Type: Bug Reporter: Slim Bouguerra Hive server is logging the console output to the system error stream. This needs to be fixed because, first, we do not roll the file, and second, writing to such a file is done sequentially and can lead to throttling/poor performance. {code} -rw-r--r-- 1 hive hadoop 9.5G Feb 26 17:22 hive-server2-interactive.err {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22760) Add Clock caching eviction based strategy
Slim Bouguerra created HIVE-22760: - Summary: Add Clock caching eviction based strategy Key: HIVE-22760 URL: https://issues.apache.org/jira/browse/HIVE-22760 Project: Hive Issue Type: New Feature Components: llap Reporter: Slim Bouguerra Assignee: Slim Bouguerra LRFU is the current default. The main issue with this strategy is that it has a very high memory overhead; in addition, most of the accounting has to happen under locks and thus can be a source of contention. Adding a simpler policy like Clock can help with both issues. -- This message was sent by Atlassian Jira (v8.3.4#803005)
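A minimal Clock sketch under assumptions (illustrative, not the Hive patch): one reference bit per cached buffer instead of LRFU's per-entry frequency bookkeeping, so a cache hit is a lock-free bit set and only the eviction sweep synchronizes.

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

class ClockPolicySketch {
  private final Object[] buffers;
  private final AtomicBoolean[] referenced;
  private int hand;

  ClockPolicySketch(Object[] buffers) {
    this.buffers = buffers;
    this.referenced = new AtomicBoolean[buffers.length];
    for (int i = 0; i < buffers.length; i++) {
      referenced[i] = new AtomicBoolean(false);
    }
  }

  void touch(int slot) {           // cache hit: no lock, just mark the slot
    referenced[slot].lazySet(true);
  }

  synchronized Object evict() {    // sweep the hand, giving marked slots a second chance
    while (true) {
      hand = (hand + 1) % buffers.length;
      if (!referenced[hand].getAndSet(false)) {
        return buffers[hand];
      }
    }
  }
}
{code}

The memory cost is one flag per entry, versus the heap node and timing counters LRFU keeps per entry.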
[jira] [Created] (HIVE-22755) Cleaner/Compaction can skip the read locks and use the min open txn id
Slim Bouguerra created HIVE-22755: - Summary: Cleaner/Compaction can skip the read locks and use the min open txn id Key: HIVE-22755 URL: https://issues.apache.org/jira/browse/HIVE-22755 Project: Hive Issue Type: Sub-task Reporter: Slim Bouguerra -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22754) Trim some extra HDFS find file name calls that can be deduced using current TX watermark
Slim Bouguerra created HIVE-22754: - Summary: Trim some extra HDFS find file name calls that can be deduced using current TX watermark Key: HIVE-22754 URL: https://issues.apache.org/jira/browse/HIVE-22754 Project: Hive Issue Type: Improvement Components: Transactions Reporter: Slim Bouguerra Assignee: Slim Bouguerra -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22743) Enable Fast LLAP-IO path for tables with schema evolution case appending columns.
Slim Bouguerra created HIVE-22743: - Summary: Enable Fast LLAP-IO path for tables with schema evolution case appending columns. Key: HIVE-22743 URL: https://issues.apache.org/jira/browse/HIVE-22743 Project: Hive Issue Type: Improvement Reporter: Slim Bouguerra Assignee: Slim Bouguerra -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22742) Skip Class loader forking, when needed.
Slim Bouguerra created HIVE-22742: - Summary: Skip Class loader forking, when needed. Key: HIVE-22742 URL: https://issues.apache.org/jira/browse/HIVE-22742 Project: Hive Issue Type: Improvement Reporter: Slim Bouguerra -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22629) AST Node Children can be quite expensive to build due to List resizing
Slim Bouguerra created HIVE-22629: - Summary: AST Node Children can be quite expensive to build due to List resizing Key: HIVE-22629 URL: https://issues.apache.org/jira/browse/HIVE-22629 Project: Hive Issue Type: Improvement Reporter: Slim Bouguerra Assignee: Slim Bouguerra As per the attached profile, the AST node can be a major source of CPU and memory churn due to ArrayList resizing and copying. In my opinion this can be amortized by providing the actual size up front. [~jcamachorodriguez] / [~vgarg] -- This message was sent by Atlassian Jira (v8.3.4#803005)
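A one-liner illustration of the idea (hypothetical helper, not the actual ASTNode code): presize the children list with the known child count so it never grows and copies.

{code:java}
import java.util.ArrayList;
import java.util.List;

class AstChildrenSketch {
  // a default ArrayList starts at capacity 10 and copies its backing array on each
  // growth; passing the known count up front performs exactly one allocation
  static List<Object> newChildren(int childCount) {
    return new ArrayList<>(childCount);
  }
}
{code}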
[jira] [Created] (HIVE-22492) Amortize lock contention due to LRFU accounting
Slim Bouguerra created HIVE-22492: - Summary: Amortize lock contention due to LRFU accounting Key: HIVE-22492 URL: https://issues.apache.org/jira/browse/HIVE-22492 Project: Hive Issue Type: Improvement Reporter: Slim Bouguerra Assignee: Slim Bouguerra The LRFU eviction policy can be a major source of contention under high load. This can be seen in the following profiles. To fix this, the idea is to use a batching wrapper to amortize the lock contention. The trick is a common way to amortize locking, as explained here: http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf -- This message was sent by Atlassian Jira (v8.3.4#803005)
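A sketch of BP-Wrapper-style batching under assumptions (illustrative names, not the Hive patch): each thread records hits in a small private buffer and takes the policy lock only once per full batch.

{code:java}
import java.util.ArrayDeque;

class BatchingPolicyWrapperSketch {
  private static final int BATCH_SIZE = 64;
  private final ThreadLocal<ArrayDeque<Object>> pending =
      ThreadLocal.withInitial(ArrayDeque::new);
  private final Object policyLock = new Object();

  void notifyHit(Object buffer) {
    ArrayDeque<Object> batch = pending.get();
    batch.addLast(buffer);
    if (batch.size() >= BATCH_SIZE) {      // one lock acquisition per BATCH_SIZE hits
      synchronized (policyLock) {
        Object b;
        while ((b = batch.pollFirst()) != null) {
          touchUnderLock(b);
        }
      }
    }
  }

  private void touchUnderLock(Object buffer) {
    // the existing LRFU accounting would run here, under the single policy lock
  }
}
{code}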
[jira] [Created] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none
Slim Bouguerra created HIVE-22476: - Summary: Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none Key: HIVE-22476 URL: https://issues.apache.org/jira/browse/HIVE-22476 Project: Hive Issue Type: Bug Reporter: Slim Bouguerra Assignee: Slim Bouguerra The actual issue stems from the different date parsers used by various parts of the engine. The fetch task uses udfdatediff via {code} org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the vectorized llap execution uses {code}VectorUDFDateDiffScalarCol{code}. This fix is meant to be minimally intrusive and will add more support to GenericUDFToDate by enhancing the parser. For the longer term it will be better to use one parser for all the operators. Thanks [~Rajkumar Singh] for the repro example {code} create external table testdatediff(datetimecol string) stored as orc; insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24'); select datetimecol from testdatediff where datediff(cast(current_timestamp as string), datetimecol)<183; set hive.fetch.task.conversion=none; select datetimecol from testdatediff where datediff(cast(current_timestamp as string), datetimecol)<183; {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22446) Make IO decoding quantiles counters a less contended resource.
Slim Bouguerra created HIVE-22446: - Summary: Make IO decoding quantiles counters a less contended resource. Key: HIVE-22446 URL: https://issues.apache.org/jira/browse/HIVE-22446 Project: Hive Issue Type: Improvement Components: llap Reporter: Slim Bouguerra Assignee: Slim Bouguerra Fix For: 4.0.0 Currently LLAP IO relies on Hadoop's lock-based quantiles data structure and updates the IO decoding sample on a per-batch basis using {code} org.apache.hadoop.hive.llap.metrics.LlapDaemonIOMetrics#addDecodeBatchTime {code} via {code} org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer#consumeData {code} This can be a source of thread contention. The goal of this ticket is to reduce the frequency of updates to avoid a major bottleneck. -- This message was sent by Atlassian Jira (v8.3.4#803005)
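One possible shape of the fix, sketched under assumptions (the ticket only says "reduce the frequency of updates"; the sampling rate and names here are hypothetical): record only a random fraction of the decode timings, so the contended structure is touched far less often while the quantile estimate stays representative.

{code:java}
import java.util.concurrent.ThreadLocalRandom;

class SampledIoTimerSketch {
  private static final int SAMPLE_ONE_IN = 16; // hypothetical rate; tune as needed

  void addDecodeBatchTime(long nanos) {
    if (ThreadLocalRandom.current().nextInt(SAMPLE_ONE_IN) == 0) {
      recordToSharedQuantiles(nanos);  // the lock-based Hadoop quantiles update
    }
  }

  private void recordToSharedQuantiles(long nanos) {
    // existing contended update would run here
  }
}
{code}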
[jira] [Created] (HIVE-22437) LLAP Metadata cache NPE on locking metadata.
Slim Bouguerra created HIVE-22437: - Summary: LLAP Metadata cache NPE on locking metadata. Key: HIVE-22437 URL: https://issues.apache.org/jira/browse/HIVE-22437 Project: Hive Issue Type: Bug Reporter: Slim Bouguerra Assignee: Slim Bouguerra {code} java.lang.NullPointerException at org.apache.hadoop.hive.llap.io.metadata.MetadataCache.unlockSingleBuffer(MetadataCache.java:464) at org.apache.hadoop.hive.llap.io.metadata.MetadataCache.lockBuffer(MetadataCache.java:409) at org.apache.hadoop.hive.llap.io.metadata.MetadataCache.lockOldVal(MetadataCache.java:314) at org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putInternal(MetadataCache.java:287) at org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:199) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22436) Add more logging to the test.
Slim Bouguerra created HIVE-22436: - Summary: Add more logging to the test. Key: HIVE-22436 URL: https://issues.apache.org/jira/browse/HIVE-22436 Project: Hive Issue Type: Sub-task Reporter: Slim Bouguerra Assignee: Slim Bouguerra -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22398) Remove Yarn queue management via ShimLoader.
Slim Bouguerra created HIVE-22398: - Summary: Remove Yarn queue management via ShimLoader. Key: HIVE-22398 URL: https://issues.apache.org/jira/browse/HIVE-22398 Project: Hive Issue Type: Task Reporter: Slim Bouguerra Assignee: Slim Bouguerra Legacy MR Hive used this shim loader to do fair scheduling via YARN queues' non-public APIs. This patch will remove this code since it is not used anymore, and the new [YARN-8967|https://issues.apache.org/jira/browse/YARN-8967] changes would break future version upgrades. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22168) remove excessive logging by llap cache.
slim bouguerra created HIVE-22168: - Summary: remove excessive logging by llap cache. Key: HIVE-22168 URL: https://issues.apache.org/jira/browse/HIVE-22168 Project: Hive Issue Type: Improvement Components: llap, Logging Reporter: slim bouguerra Assignee: slim bouguerra LLAP cache logging is very expensive when it comes to logging every request's buffer ranges. -- This message was sent by Atlassian Jira (v8.3.2#803003)
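The standard remedy, shown as a generic sketch (not the actual cache code): guard the expensive message so the range string is never built unless debug logging is on.

{code:java}
import java.util.Arrays;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class CacheLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(CacheLoggingSketch.class);

  void logRequestedRanges(long[] offsets) {
    if (LOG.isDebugEnabled()) {              // skip the Arrays.toString cost entirely
      LOG.debug("request buffer ranges: {}", Arrays.toString(offsets));
    }
  }
}
{code}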
[jira] [Created] (HIVE-22127) Query Routing logging appender is leaking resources of RandomAccessFileManager.
slim bouguerra created HIVE-22127: - Summary: Query Routing logging appender is leaking resources of RandomAccessFileManager. Key: HIVE-22127 URL: https://issues.apache.org/jira/browse/HIVE-22127 Project: Hive Issue Type: Bug Reporter: slim bouguerra Assignee: slim bouguerra The query routing appender registered by {code:java} org.apache.hadoop.hive.ql.log.LogDivertAppender#registerRoutingAppender {code} is leaking references to {code} org.apache.hadoop.hive.ql.log.HushableRandomAccessFileAppender {code} when the operation hooks are closed via {code} org.apache.hive.service.cli.operation.Operation#cleanupOperationLog {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (HIVE-22125) Move to Kafka 2.3 Clients
slim bouguerra created HIVE-22125: - Summary: Move to Kafka 2.3 Clients Key: HIVE-22125 URL: https://issues.apache.org/jira/browse/HIVE-22125 Project: Hive Issue Type: Improvement Reporter: slim bouguerra Assignee: slim bouguerra -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-22115) Prevent the creation of query-router logger in HS2 as per property
slim bouguerra created HIVE-22115: - Summary: Prevent the creation of query-router logger in HS2 as per property Key: HIVE-22115 URL: https://issues.apache.org/jira/browse/HIVE-22115 Project: Hive Issue Type: Improvement Reporter: slim bouguerra Assignee: slim bouguerra Avoid the creation and registration of the query-router logger if the user sets the HiveServer2 property {code} HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED {code} to false. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
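A minimal sketch of the guard (illustrative; the real registration lives in LogDivertAppender#registerRoutingAppender): check the config flag before building anything.

{code:java}
import org.apache.hadoop.hive.conf.HiveConf;

class RoutingAppenderSetupSketch {
  static void maybeRegister(HiveConf conf) {
    if (!conf.getBoolVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED)) {
      return; // operation logging disabled: never create the query-router logger
    }
    // ... existing appender creation and registration would run here ...
  }
}
{code}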
[jira] [Created] (HIVE-21989) Query must fail in case we have no explicit cast for cases like reader schema is int32 and file schema is int64.
slim bouguerra created HIVE-21989: - Summary: Query must fail in case we have no explicit cast for cases like reader schema is int32 and file schema is int64. Key: HIVE-21989 URL: https://issues.apache.org/jira/browse/HIVE-21989 Project: Hive Issue Type: Bug Reporter: slim bouguerra In some cases the table definition can differ from the ORC file schema. For example, if the ORC file has a column of int64 (bigint) while the table schema defines it as int32 (int), then the query must fail in the absence of an explicit cast from the user. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-21934) Materialized view on top of Druid not pushing everything
slim bouguerra created HIVE-21934: - Summary: Materialized view on top of Druid not pushing everything Key: HIVE-21934 URL: https://issues.apache.org/jira/browse/HIVE-21934 Project: Hive Issue Type: Improvement Reporter: slim bouguerra Assignee: Jesus Camacho Rodriguez The title is not very informative, but the examples hopefully are. This is the plan with the view {code} explain SELECT MONTH(`dates_n1`.`__time`) AS `mn___time_ok`, CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`, SUM(1) AS `sum_number_of_records_ok`, YEAR(`dates_n1`.`__time`) AS `yr___time_ok` FROM `mv_ssb_100_scale`.`lineorder_n0` `lineorder_n0` JOIN `mv_ssb_100_scale`.`dates_n1` `dates_n1` ON (`lineorder_n0`.`lo_orderdate` = `dates_n1`.`d_datekey`) JOIN `mv_ssb_100_scale`.`customer_n1` `customer_n1` ON (`lineorder_n0`.`lo_custkey` = `customer_n1`.`c_custkey`) JOIN `mv_ssb_100_scale`.`supplier_n0` `supplier_n0` ON (`lineorder_n0`.`lo_suppkey` = `supplier_n0`.`s_suppkey`) JOIN `mv_ssb_100_scale`.`ssb_part_n0` `ssb_part_n0` ON (`lineorder_n0`.`lo_partkey` = `ssb_part_n0`.`p_partkey`) GROUP BY MONTH(`dates_n1`.`__time`), CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT), YEAR(`dates_n1`.`__time`) INFO : Starting task [Stage-3:EXPLAIN] in serial mode INFO : Completed executing command(queryId=sbouguerra_20190627113101_1493ee87-0288-4e30-b53c-0ee729ce3977); Time taken: 0.005 seconds INFO : OK ++ | Explain | ++ | Plan optimized by CBO. | | | | Vertex dependency in root stage | | Reducer 2 <- Map 1 (SIMPLE_EDGE) | | | | Stage-0 | | Fetch Operator | | limit:-1 | | Stage-1 | | Reducer 2 vectorized, llap | | File Output Operator [FS_13] | | Select Operator [SEL_12] (rows=300018951 width=38) | | Output:["_col0","_col1","_col2","_col3"] | | Group By Operator [GBY_11] (rows=300018951 width=38) | | Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0, KEY._col1, KEY._col2 | | <-Map 1 [SIMPLE_EDGE] vectorized, llap | | SHUFFLE [RS_10] | | PartitionCols:_col0, _col1, _col2 | | Group By Operator [GBY_9] (rows=600037902 width=38) | | Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(1)"],keys:_col0, _col1, _col2 | | Select Operator [SEL_8] (rows=600037902 width=38) | | Output:["_col0","_col1","_col2"] | | TableScan [TS_0] (rows=600037902 width=38) | | mv_ssb_100_scale@ssb_mv_druid_100,ssb_mv_druid_100,Tbl:COMPLETE,Col:NONE,Output:["vc"],properties:\{"druid.fieldNames":"vc","druid.fieldTypes":"timestamp","druid.query.json":"{\"queryType\":\"scan\",\"dataSource\":\"mv_ssb_100_scale.ssb_mv_druid_100\",\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"virtualColumns\":[{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"\\\"__time\\\"\",\"outputType\":\"LONG\"}],\"columns\":[\"vc\"],\"resultFormat\":\"compactedList\"}","druid.query.type":"scan"} | | | ++ {code} If I use a simple Druid table without the MV {code} explain SELECT MONTH(`__time`) AS `mn___time_ok`, CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`, SUM(1) AS `sum_number_of_records_ok`, YEAR(`__time`) AS `yr___time_ok` FROM `druid_ssb.ssb_druid_100` GROUP BY MONTH(`__time`), CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT), YEAR(`__time`); {code} {code} ++ | Explain | ++ | Plan optimized by CBO.
| | | | Stage-0 | | Fetch Operator | | limit:-1 | | Select Operator [SEL_1] | | Output:["_col0","_col1","_col2","_col3"] | | TableScan [TS_0] | | Output:["extract_month","vc","$f3","extract_year"],properties:\{"druid.fieldNames":"extract_month,vc,extract_year,$f3","druid.fieldTypes":"int,bigint,int,bigint","druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_ssb.ssb_druid_100\",\"granularity\":\"all\",\"dimensions\":[{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"extract_month\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"M\",\"timeZone\":\"America/New_York\",\"locale\":\"en-US\"}},\{\"type\":\"default\",\"dimension\":\"vc\",\"outputName\":\"vc\",\"outputType\":\"LONG\"},\{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"extract_year\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"\",\"timeZone\":\"America/New_York\",\"locale\":\"en-US\"}}],\"virtualColumns\":[\{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"CAST(((CAST((timestamp_extract(\\\"__time\\\",'MONTH','America/New_York') - 1), 'DOUBLE') / CAST(3, 'DOUBLE')) + CAST(1, 'DOUBLE')),
[jira] [Created] (HIVE-21689) Buddy Allocator memory accounting does not account for failed allocation attempts
slim bouguerra created HIVE-21689: - Summary: Buddy Allocator memory accounting does not account for failed allocation attempts Key: HIVE-21689 URL: https://issues.apache.org/jira/browse/HIVE-21689 Project: Hive Issue Type: Bug Components: llap Reporter: slim bouguerra Assignee: slim bouguerra The allocation method on the Buddy Allocator does not release the reserved memory in case we fail to allocate the full sequence. Simple example: assume we have an allocation request of 1KB. We call reserve and reserve 1KB. The allocation attempt fails due to a race condition. The discard attempt fails due to no space. At this point we exit without releasing the reserved memory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
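The shape of the fix, sketched with hypothetical names (not the actual BuddyAllocator code): every reserve is paired with a guaranteed release on the failure path.

{code:java}
class AllocatorSketch {
  interface MemoryManager {
    void reserveMemory(long bytes);
    void releaseMemory(long bytes);
  }

  private final MemoryManager memoryManager;

  AllocatorSketch(MemoryManager memoryManager) {
    this.memoryManager = memoryManager;
  }

  long allocate(long size) {
    memoryManager.reserveMemory(size);         // step 1: account for the memory
    boolean success = false;
    try {
      long address = tryAllocateArenas(size);  // may fail under racing allocations
      success = true;
      return address;
    } finally {
      if (!success) {
        memoryManager.releaseMemory(size);     // the release the bug report says is missing
      }
    }
  }

  private long tryAllocateArenas(long size) {
    throw new IllegalStateException("lost the race; no free run of this size");
  }
}
{code}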
[jira] [Created] (HIVE-21686) Brute Force eviction can lead to a random uncontrolled eviction pattern.
slim bouguerra created HIVE-21686: - Summary: Brute Force eviction can lead to a random uncontrolled eviction pattern. Key: HIVE-21686 URL: https://issues.apache.org/jira/browse/HIVE-21686 Project: Hive Issue Type: Bug Reporter: slim bouguerra Assignee: slim bouguerra The current logic used by brute-force eviction can lead to a perpetual random eviction pattern. For instance, if the cache builds a small pocket of free memory whose total size is greater than the incoming allocation request, the allocator will randomly evict blocks that fit a particular size. This can happen over and over, therefore all the evictions will be random. In addition, this random eviction will leak entries in the linked list maintained by the policy, since the policy no longer knows what has been evicted and what has not. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21665) Unable to reconstruct valid SQL query from AST when back ticks are used
slim bouguerra created HIVE-21665: - Summary: Unable to reconstruct valid SQL query from AST when back ticks are used Key: HIVE-21665 URL: https://issues.apache.org/jira/browse/HIVE-21665 Project: Hive Issue Type: Bug Reporter: slim bouguerra HIVE-6013 introduced a parser rule that removes all the {code:java} `{code} from identifiers and query aliases; this can result in issues when we need to reconstruct the actual SQL query from the AST. To reproduce the bug you can use an explain analyze statement such as the following query {code:java} explain analyze select 'literal' as `alias with space`; {code} This bug will affect the Ranger plugin and probably the results cache, since in both places we need to reconstruct the query from the AST. The current workaround is to avoid whitespace within aliases. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21621) Update Kafka Clients to recent release 2.2.0
slim bouguerra created HIVE-21621: - Summary: Update Kafka Clients to recent release 2.2.0 Key: HIVE-21621 URL: https://issues.apache.org/jira/browse/HIVE-21621 Project: Hive Issue Type: Task Components: kafka integration Reporter: slim bouguerra Assignee: slim bouguerra All in the title: update the Kafka storage handler to the most recent clients library. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21334) Eviction of blocks is a major source of blockage for allocation requests. Allocation path needs to be lock-free.
slim bouguerra created HIVE-21334: - Summary: Eviction of blocks is a major source of blockage for allocation requests. Allocation path needs to be lock-free. Key: HIVE-21334 URL: https://issues.apache.org/jira/browse/HIVE-21334 Project: Hive Issue Type: Improvement Reporter: slim bouguerra Assignee: slim bouguerra Attachments: lock_profile.png Eviction is getting in the way of memory allocation when the query fragment has no cache entry. This causes a major bottleneck and wastes a lot of CPU cycles. To fix this, first we can batch the evictions to avoid taking the lock multiple times. The memory manager needs to be able to anticipate such issues and keep some spare space for queries that do not have any hits. {code} IO-Elevator-Thread-12 Blocked CPU usage on sample: 692ms org.apache.hadoop.hive.llap.cache.LowLevelLrfuCachePolicy.evictSomeBlocks(long) LowLevelLrfuCachePolicy.java:264 org.apache.hadoop.hive.llap.cache.CacheContentsTracker.evictSomeBlocks(long) CacheContentsTracker.java:194 org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(long, boolean, AtomicBoolean) LowLevelCacheMemoryManager.java:87 org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(long, AtomicBoolean) LowLevelCacheMemoryManager.java:63 org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(MemoryBuffer[], int, Allocator$BufferObjectFactory, AtomicBoolean) BuddyAllocator.java:263 org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.allocateMultiple(MemoryBuffer[], int) EncodedReaderImpl.java:1295 org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(long, DiskRangeList, long, long, EncodedColumnBatch$ColumnStreamData, long, long, IdentityHashMap) EncodedReaderImpl.java:923 org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(int, StripeInformation, OrcProto$RowIndex[], List, List, boolean[], boolean[], Consumer) EncodedReaderImpl.java:501 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead() OrcEncodedDataReader.java:407 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run() OrcEncodedDataReader.java:266 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run() OrcEncodedDataReader.java:263 java.security.AccessController.doPrivileged(PrivilegedExceptionAction, AccessControlContext) AccessController.java (native) javax.security.auth.Subject.doAs(Subject, PrivilegedExceptionAction) Subject.java:422 org.apache.hadoop.security.UserGroupInformation.doAs(PrivilegedExceptionAction) UserGroupInformation.java:1688 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal() OrcEncodedDataReader.java:263 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal() OrcEncodedDataReader.java:110 org.apache.tez.common.CallableWithNdc.call() CallableWithNdc.java:36 org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call() StatsRecordingThreadPool.java:110 java.util.concurrent.FutureTask.run() FutureTask.java:266 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1142 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:617 java.lang.Thread.run() Thread.java:745 {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21332) Cache Purge command does purge the in-use buffer.
slim bouguerra created HIVE-21332: - Summary: Cache Purge command does purge the in-use buffer. Key: HIVE-21332 URL: https://issues.apache.org/jira/browse/HIVE-21332 Project: Hive Issue Type: Bug Reporter: slim bouguerra Assignee: slim bouguerra The cache purge command is purging what it is not supposed to evict. This can lead to an unrecoverable state. {code} TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : attempt_1545278897356_0093_27_00_01_3:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: Failed to allocate 32768; at 0 out of 1 (entire cache is fragmented and locked, or an internal issue) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: Failed to allocate 32768; at 0 out of 1 (entire cache is fragmented and locked, or an internal issue) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:80) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) ...
15 more Caused by: java.io.IOException: java.io.IOException: org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: Failed to allocate 32768; at 0 out of 1 (entire cache is fragmented and locked, or an internal issue) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151) at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) ... 17 more Caused by: java.io.IOException: org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: Failed to allocate 32768; at 0 out of 1 (entire cache is fragmented and locked, or an internal issue) at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:513) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:407) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:266) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:263) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:263) at
[jira] [Created] (HIVE-21026) Druid Vectorized Reader is not using the correct input size
slim bouguerra created HIVE-21026: - Summary: Druid Vectorized Reader is not using the correct input size Key: HIVE-21026 URL: https://issues.apache.org/jira/browse/HIVE-21026 Project: Hive Issue Type: Bug Components: Druid integration, Vectorization Reporter: slim bouguerra Assignee: slim bouguerra In case the number of projected columns differs from the number of input row columns, we will get an array out of bounds exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21004) Less object creation for Hive Kafka reader
slim bouguerra created HIVE-21004: - Summary: Less object creation for Hive Kafka reader Key: HIVE-21004 URL: https://issues.apache.org/jira/browse/HIVE-21004 Project: Hive Issue Type: Improvement Components: kafka integration Reporter: slim bouguerra Assignee: slim bouguerra Reduce the amount of unneeded object allocation by using a row boat as a way to carry data around. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20997) Make Druid Cluster start on random ports.
slim bouguerra created HIVE-20997: - Summary: Make Druid Cluster start on random ports. Key: HIVE-20997 URL: https://issues.apache.org/jira/browse/HIVE-20997 Project: Hive Issue Type: Sub-task Reporter: slim bouguerra Assignee: slim bouguerra As of now the Druid tests run in a single batch. To avoid timeouts we need to support batching the tests. As suggested by [~vihangk1], it would be better to start the Druid test setups on totally random ports. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20995) Add mini Druid to the list of tests
slim bouguerra created HIVE-20995: - Summary: Add mini Druid to the list of tests Key: HIVE-20995 URL: https://issues.apache.org/jira/browse/HIVE-20995 Project: Hive Issue Type: Test Reporter: slim bouguerra Assignee: slim bouguerra -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20987) Split Druid Tests to avoid Timeouts
slim bouguerra created HIVE-20987: - Summary: Split Druid Tests to avoid Timeouts Key: HIVE-20987 URL: https://issues.apache.org/jira/browse/HIVE-20987 Project: Hive Issue Type: Test Reporter: slim bouguerra Assignee: slim bouguerra Currently the Druid tests fail with timeout issues. I am planning on splitting the tests into at least two batches to avoid timeouts. I will also tweak the test code to pick random Druid node ports, to minimize the collision issues that we saw before. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20982) Avoid the un-needed object creation within hotloop
slim bouguerra created HIVE-20982: - Summary: Avoid the un-needed object creation within hotloop Key: HIVE-20982 URL: https://issues.apache.org/jira/browse/HIVE-20982 Project: Hive Issue Type: Sub-task Reporter: slim bouguerra Assignee: slim bouguerra -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20958) Cleaning of imports at Hive-common
slim bouguerra created HIVE-20958: - Summary: Cleaning of imports at Hive-common Key: HIVE-20958 URL: https://issues.apache.org/jira/browse/HIVE-20958 Project: Hive Issue Type: Improvement Reporter: slim bouguerra Assignee: slim bouguerra -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20955) Calcite Rule HiveExpandDistinctAggregatesRule seems throwing IndexOutOfBoundsException
slim bouguerra created HIVE-20955: - Summary: Calcite Rule HiveExpandDistinctAggregatesRule seems throwing IndexOutOfBoundsException Key: HIVE-20955 URL: https://issues.apache.org/jira/browse/HIVE-20955 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: slim bouguerra Adding the following query to the Druid test ql/src/test/queries/clientpositive/druidmini_expressions.q {code} select count(distinct `__time`, cint) from (select * from druid_table_alltypesorc) as src; {code} leads to the error {code} 2018-11-21T07:36:39,449 ERROR [main] QTestUtil: Client execution failed with error code = 4 running " {code} with exception stack {code} 2018-11-21T07:36:39,443 ERROR [ecd48683-0286-4cb4-b0ad-e150fab51038 main] parse.CalcitePlanner: CBO failed, skipping CBO. java.lang.IndexOutOfBoundsException: index (1) must be less than size (1) at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:310) ~[guava-19.0.jar:?] at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:293) ~[guava-19.0.jar:?] at com.google.common.collect.SingletonImmutableList.get(SingletonImmutableList.java:41) ~[guava-19.0.jar:?] at org.apache.calcite.rel.metadata.RelMdColumnOrigins.getColumnOrigins(RelMdColumnOrigins.java:77) ~[calcite-core-1.17.0.jar:1.17.0] at GeneratedMetadataHandler_ColumnOrigin.getColumnOrigins_$(Unknown Source) ~[?:?] at GeneratedMetadataHandler_ColumnOrigin.getColumnOrigins(Unknown Source) ~[?:?] at org.apache.calcite.rel.metadata.RelMetadataQuery.getColumnOrigins(RelMetadataQuery.java:345) ~[calcite-core-1.17.0.jar:1.17.0] at org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveExpandDistinctAggregatesRule.onMatch(HiveExpandDistinctAggregatesRule.java:168) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:315) ~[calcite-core-1.17.0.jar:1.17.0] at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:556) ~[calcite-core-1.17.0.jar:1.17.0] at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:415) ~[calcite-core-1.17.0.jar:1.17.0] at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:280) ~[calcite-core-1.17.0.jar:1.17.0] at org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74) ~[calcite-core-1.17.0.jar:1.17.0] at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:211) ~[calcite-core-1.17.0.jar:1.17.0] at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:198) ~[calcite-core-1.17.0.jar:1.17.0] at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:2363) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:2314) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:2031) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1780) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1680) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118) ~[calcite-core-1.17.0.jar:1.17.0] at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1043)
~[calcite-core-1.17.0.jar:1.17.0] at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154) ~[calcite-core-1.17.0.jar:1.17.0] at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111) ~[calcite-core-1.17.0.jar:1.17.0] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1439) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:478) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12296) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:358) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:670) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1893) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1840)
[jira] [Created] (HIVE-20952) Cleaning VectorizationContext.java
slim bouguerra created HIVE-20952: - Summary: Cleaning VectorizationContext.java Key: HIVE-20952 URL: https://issues.apache.org/jira/browse/HIVE-20952 Project: Hive Issue Type: Improvement Reporter: slim bouguerra Assignee: slim bouguerra -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20931) Minor code cleaning for Druid Storage Handler
slim bouguerra created HIVE-20931: - Summary: Minor code cleaning for Druid Storage Handler Key: HIVE-20931 URL: https://issues.apache.org/jira/browse/HIVE-20931 Project: Hive Issue Type: Improvement Components: Druid integration Reporter: slim bouguerra Assignee: slim bouguerra Attachments: HIVE-20931.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20932) Vectorize Druid Storage Handler Reader
slim bouguerra created HIVE-20932: - Summary: Vectorize Druid Storage Handler Reader Key: HIVE-20932 URL: https://issues.apache.org/jira/browse/HIVE-20932 Project: Hive Issue Type: Improvement Reporter: slim bouguerra Assignee: slim bouguerra This patch aims at adding support for vectorized reads of data from Druid to Hive. [~t3rmin4t0r] suggested that this will improve the performance of the top-level operators that support vectorization. As a first cut I am just adding a wrapper around the existing record reader to read up to 1024 rows at a time. Future work will be to avoid going via the old reader and convert the JSON (Smile format) straight to vector primitive types. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20903) Cleanup code inspection issue on the druid adapter.
slim bouguerra created HIVE-20903: - Summary: Cleanup code inspection issue on the druid adapter. Key: HIVE-20903 URL: https://issues.apache.org/jira/browse/HIVE-20903 Project: Hive Issue Type: Improvement Reporter: slim bouguerra Assignee: slim bouguerra This is a simple cleanup of the code and a minor refactor. I did not change any of the behavior. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20902) Math.abs(rand.nextInt()) or Math.abs(rand.nextLong()) can return a negative number
slim bouguerra created HIVE-20902: - Summary: Math.abs(rand.nextInt()) or Math.abs(rand.nextLong()) can return a negative number Key: HIVE-20902 URL: https://issues.apache.org/jira/browse/HIVE-20902 Project: Hive Issue Type: Bug Reporter: slim bouguerra Assignee: slim bouguerra I see a lot of Math.abs(rand.nextInt()) in the code base, and this can return a negative number: Math.abs(Integer.MIN_VALUE) overflows and stays negative. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
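A quick demonstration of the failure mode and two safe replacements (generic example, not tied to any particular Hive call site):

{code:java}
import java.util.Random;

class NonNegativeRandom {
  public static void main(String[] args) {
    // the one value with no positive counterpart in two's complement:
    System.out.println(Math.abs(Integer.MIN_VALUE)); // prints -2147483648

    Random rand = new Random();
    int a = rand.nextInt(Integer.MAX_VALUE);    // uniform in [0, Integer.MAX_VALUE)
    int b = rand.nextInt() & Integer.MAX_VALUE; // clears the sign bit
    System.out.println(a >= 0 && b >= 0);       // always true
  }
}
{code}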
[jira] [Created] (HIVE-20892) Benchmark XXhash for 64 bit hashing function instead of Murmur hash
slim bouguerra created HIVE-20892: - Summary: Benchmark XXhash for 64 bit hashing function instead of Murmur hash Key: HIVE-20892 URL: https://issues.apache.org/jira/browse/HIVE-20892 Project: Hive Issue Type: Sub-task Reporter: slim bouguerra Assignee: slim bouguerra https://cyan4973.github.io/xxHash/ FYI this is used by a lot of other MPP systems ... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20893) BloomK Filter probing method is not thread safe
slim bouguerra created HIVE-20893: - Summary: BloomK Filter probing method is not thread safe Key: HIVE-20893 URL: https://issues.apache.org/jira/browse/HIVE-20893 Project: Hive Issue Type: Bug Components: storage-api Reporter: slim bouguerra As far as I can tell this is not an issue for Hive yet (most of the probing usage seems to be done by one thread at a time), but it is an issue for other users like Druid, as per the following issue: [https://github.com/apache/incubator-druid/issues/6546]. The fix proposed by the author of [https://github.com/apache/incubator-druid/pull/6584] is to make a couple of the local fields ThreadLocals. The idea looks good to me and doesn't have any perf drawbacks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
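An illustrative reduction of the proposed fix (field and method names hypothetical, not the actual BloomKFilter code): the shared scratch state used during probing moves into a ThreadLocal, so concurrent readers stop racing on it.

{code:java}
class BloomProbeSketch {
  // one scratch buffer per probing thread instead of one shared mutable field
  private final ThreadLocal<long[]> scratch = ThreadLocal.withInitial(() -> new long[2]);

  boolean testHash64(long hash64) {
    long[] buf = scratch.get();             // private to this thread: no synchronization
    buf[0] = hash64;                        // placeholder for the real probe computation,
    buf[1] = Long.rotateLeft(hash64, 32);   // which would derive bit positions from these
    return (buf[0] & buf[1]) != 0;
  }
}
{code}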
[jira] [Created] (HIVE-20875) Druid storage handler Kafka ingestion timestamp column name
slim bouguerra created HIVE-20875: - Summary: Druid storage handler Kafka ingestion timestamp column name Key: HIVE-20875 URL: https://issues.apache.org/jira/browse/HIVE-20875 Project: Hive Issue Type: Task Reporter: slim bouguerra Assignee: Nishant Bangarwa This question brought to my attention that the Druid-Hive-Kafka ingestion currently assumes that the Kafka stream has to include a column called __time as the timestamp column. https://community.hortonworks.com/questions/226191/druid-kafka-ingestion-from-hive-hdp-30.html?childToView=227242#answer-227242 Looking at the code here seems to confirm that: https://github.com/apache/hive/blob/a51e6aeaf816bdeea5e91ba3a0fab8a31b3a496d/druid-handler/src/java/org/apache/hadoop/hive/druid/DruidStorageHandler.java#L301. IMO this is a serious limitation, because the user cannot always guarantee that the Kafka record will contain a column called `__time`; we need to introduce a way to configure this, maybe via a table property. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20869) Fix test results file
slim bouguerra created HIVE-20869: - Summary: Fix test results file Key: HIVE-20869 URL: https://issues.apache.org/jira/browse/HIVE-20869 Project: Hive Issue Type: Sub-task Components: kafka integration Reporter: slim bouguerra Assignee: slim bouguerra Seems like between the time the tests ran and HIVE-20486 was merged, a new Hive property was added: {code} discover.partitions true {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20813) udf to_epoch_milli needs to support timestamp without time zone as well
slim bouguerra created HIVE-20813: - Summary: udf to_epoch_milli needs to support timestamp without time zone as well Key: HIVE-20813 URL: https://issues.apache.org/jira/browse/HIVE-20813 Project: Hive Issue Type: Bug Reporter: slim bouguerra Assignee: slim bouguerra Currently the following query will fail with a cast exception (it tries to cast timestamp to timestamp with local time zone). {code} select to_epoch_milli(current_timestamp) {code} As a simple fix we need to add support for the timestamp object inspector. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20782) Cleaning some unused code
slim bouguerra created HIVE-20782: - Summary: Cleaning some unused code Key: HIVE-20782 URL: https://issues.apache.org/jira/browse/HIVE-20782 Project: Hive Issue Type: Improvement Reporter: slim bouguerra Assignee: Teddy Choi I am making my way into the vectorization code and trying to understand the APIs. I ran into this unused one; I guess it is not used anymore. [~ashutoshc], maybe you can explain, as you are the main contributor to this file? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20768) Adding Tumbling Window UDF
slim bouguerra created HIVE-20768: - Summary: Adding Tumbling Window UDF Key: HIVE-20768 URL: https://issues.apache.org/jira/browse/HIVE-20768 Project: Hive Issue Type: New Feature Reporter: slim bouguerra Assignee: slim bouguerra The goal is to provide a UDF that truncates a timestamp to the beginning of a tumbling window interval. {code} /** * Tumbling windows are a series of fixed-sized, non-overlapping and contiguous time intervals. * Tumbling windows are inclusive start, exclusive end. * By default the beginning instant of the first window is Epoch 0, Thu Jan 01 00:00:00 1970 UTC. * Optionally users may provide a different origin as a timestamp arg3. * * This is an example of a series of windows with an interval of 5 seconds and origin Epoch 0, Thu Jan 01 00:00:00 1970 UTC: * * interval 1 interval 2 interval 3 * Jan 01 00:00:00 Jan 01 00:00:05 Jan 01 00:00:10 * 0 -- 4 : 5 --- 9 : 10 --- 14 * * This UDF rounds timestamp arg1 down to the beginning of the window interval it belongs to. */ {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
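A minimal sketch of the truncation math described above (not the UDF itself; class and method names are illustrative): round a timestamp down to the start of its tumbling window relative to an origin.

{code:java}
class TumblingWindowSketch {
  static long windowStartMillis(long tsMillis, long intervalMillis, long originMillis) {
    // floorMod keeps pre-origin timestamps in the correct (earlier) window
    return tsMillis - Math.floorMod(tsMillis - originMillis, intervalMillis);
  }

  public static void main(String[] args) {
    // 5-second windows, origin at epoch 0: 7,000 ms falls in the window starting at 5,000 ms
    System.out.println(windowStartMillis(7_000L, 5_000L, 0L)); // 5000
  }
}
{code}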
[jira] [Created] (HIVE-20736) Eagerly allocate Containers from Yarn
slim bouguerra created HIVE-20736: - Summary: Eagerly allocate Containers from Yarn Key: HIVE-20736 URL: https://issues.apache.org/jira/browse/HIVE-20736 Project: Hive Issue Type: Bug Components: llap, Tez Affects Versions: 4.0.0 Reporter: slim bouguerra Assignee: Sergey Shelukhin According to [~sershe], HS2 interactive at startup time tries to eagerly allocate whatever is needed to execute queries. But currently this process is kind of broken. As of now HS2I starts and tries to allocate the resources, but in case there are not enough YARN resources in the desired queue, HS2I will keep trying in the background forever and will not bubble this up as an issue. Trying forever to allocate without signaling an error defeats the idea of eager allocation, in my opinion. I think HS2I has to fail the start if, after XX minutes, it cannot eagerly allocate the minimum space needed to run the maximum number of concurrent queries. CC [~hagleitn]/[~t3rmin4t0r] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20735) Address some of the review comments.
slim bouguerra created HIVE-20735: - Summary: Address some of the review comments. Key: HIVE-20735 URL: https://issues.apache.org/jira/browse/HIVE-20735 Project: Hive Issue Type: Sub-task Components: kafka integration Reporter: slim bouguerra Assignee: slim bouguerra As part of the review comments we agreed to: # remove the start and end offset columns # remove the best-effort mode # make 2PC the default protocol for EOS Also, this patch will include an additional enhancement to add Kerberos support. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20639) Add ability to Write Data from Hive Table/Query to Kafka Topic
slim bouguerra created HIVE-20639: - Summary: Add ability to Write Data from Hive Table/Query to Kafka Topic Key: HIVE-20639 URL: https://issues.apache.org/jira/browse/HIVE-20639 Project: Hive Issue Type: New Feature Components: kafka integration Reporter: slim bouguerra Assignee: slim bouguerra This patch adds multiple record writers to allow Hive users to write data directly to a Kafka topic. The writer provides multiple write-semantics modes: * A) None: all the records will be delivered with no guarantees or retries. * B) At_least_once: each record will be delivered, with retries from the Kafka producer and the Hive write task. * C) Exactly_once: the writer will use the Kafka transaction API to ensure that each record is delivered exactly once. In addition to the new feature, I have refactored the existing code to make it more readable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20561) Use the position of the Kafka Consumer to track progress instead of Consumer Records offsets
slim bouguerra created HIVE-20561: - Summary: Use the position of the Kafka Consumer to track progress instead of Consumer Records offsets Key: HIVE-20561 URL: https://issues.apache.org/jira/browse/HIVE-20561 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: slim bouguerra Assignee: slim bouguerra Fix For: 4.0.0 Kafka partitions with transactional messages (post 0.11) will include commit or abort markers which indicate the result of a transaction. The markers are not returned to applications, yet they have an offset in the log. Therefore the end-of-stream position can be the offset of a control message. This patch changes the way we keep track of the consumer position by using {code} consumer.position(topicP) {code} as opposed to using the offset of the consumed messages. I have also done some refactoring to hopefully help code readability. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
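A minimal illustration of the difference (assuming standard kafka-clients APIs; this is not the Hive patch itself): after a poll, position() reports the next offset the consumer will actually read, which accounts for control markers, while deriving progress from the last record's offset can land on one.

{code:java}
import java.time.Duration;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

class ProgressTrackingSketch {
  long progressAfterPoll(KafkaConsumer<byte[], byte[]> consumer, TopicPartition tp) {
    ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(1000));
    List<ConsumerRecord<byte[], byte[]>> batch = records.records(tp);
    // fragile: batch.get(batch.size() - 1).offset() + 1 may be a commit/abort marker
    return consumer.position(tp); // correct even when the log contains control messages
  }
}
{code}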
[jira] [Created] (HIVE-20485) Test Storage Handler with Secured Kafka Cluster
slim bouguerra created HIVE-20485: - Summary: Test Storage Handler with Secured Kafka Cluster Key: HIVE-20485 URL: https://issues.apache.org/jira/browse/HIVE-20485 Project: Hive Issue Type: Sub-task Reporter: slim bouguerra Assignee: slim bouguerra Need to test this with a secured Kafka cluster: * Kerberos * SSL support -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20481) Add the Kafka Key record as part of the row.
slim bouguerra created HIVE-20481: - Summary: Add the Kafka Key record as part of the row. Key: HIVE-20481 URL: https://issues.apache.org/jira/browse/HIVE-20481 Project: Hive Issue Type: Sub-task Reporter: slim bouguerra Assignee: slim bouguerra Kafka records are keyed; in most cases this key is null or is used to route records to the same partition. This patch adds the key as a binary column {code} __record_key {code}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20427) Remove Druid Mock tests from CliDriver
slim bouguerra created HIVE-20427: - Summary: Remove Druid Mock tests from CliDriver Key: HIVE-20427 URL: https://issues.apache.org/jira/browse/HIVE-20427 Project: Hive Issue Type: Improvement Reporter: slim bouguerra Assignee: slim bouguerra As per the comment https://issues.apache.org/jira/browse/HIVE-20425?focusedCommentId=16586272=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16586272 we do not need to run those mock Druid tests anymore, since org.apache.hadoop.hive.cli.TestMiniDruidCliDriver covers most of these cases. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20426) Upload Druid Test Runner logs from Build Slaves
slim bouguerra created HIVE-20426: - Summary: Upload Druid Test Runner logs from Build Slaves Key: HIVE-20426 URL: https://issues.apache.org/jira/browse/HIVE-20426 Project: Hive Issue Type: Improvement Components: Druid integration Reporter: slim bouguerra Assignee: Vineet Garg Currently only the Hive log is uploaded from "hive/itests/qtest/tmp/log/". It would be very valuable if we could also add the following Druid logs: * coordinator.log * broker.log * historical.log -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20425) Use a custom range of port for embedded Derby used by Druid.
slim bouguerra created HIVE-20425: - Summary: Use a custom range of port for embedded Derby used by Druid. Key: HIVE-20425 URL: https://issues.apache.org/jira/browse/HIVE-20425 Project: Hive Issue Type: Improvement Components: Druid integration Reporter: slim bouguerra Assignee: slim bouguerra Seems like a good amount of the flakiness of the Druid tests is due to port collisions between the Derby instance used by Hive and the one used by Druid. The goal of this patch is to use a custom range, 60000 to 65535, and find the first available port to be used by the Druid Derby process. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
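A small sketch of such a scan (illustrative, not the patch itself): probe the range in order and return the first port that binds.

{code:java}
import java.io.IOException;
import java.net.ServerSocket;

class PortFinderSketch {
  static int firstFreePort() {
    for (int port = 60_000; port <= 65_535; port++) {
      try (ServerSocket ignored = new ServerSocket(port)) {
        return port; // bind succeeded, so the port was free at probe time
      } catch (IOException inUse) {
        // taken: try the next one
      }
    }
    throw new IllegalStateException("no free port in 60000-65535");
  }
}
{code}

Note the inherent race: the port can be grabbed between the probe and the moment Derby actually binds it, which is why retrying on startup failure remains useful.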
[jira] [Created] (HIVE-20377) Hive Kafka Storage Handler
slim bouguerra created HIVE-20377: - Summary: Hive Kafka Storage Handler Key: HIVE-20377 URL: https://issues.apache.org/jira/browse/HIVE-20377 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: slim bouguerra Assignee: slim bouguerra h1. Goal * Read streaming data from a Kafka queue as an external table. * Allow streaming navigation by pushing down filters on the Kafka record partition id, offset and timestamp. * Insert streaming data from Kafka into an actual Hive internal table, using a CTAS statement. h1. Example h2. Create the external table {code} CREATE EXTERNAL TABLE kafka_table (`timestamp` timestamp, page string, `user` string, language string, added int, deleted int, flags string, comment string, namespace string) STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler' TBLPROPERTIES ("kafka.topic" = "wikipedia", "kafka.bootstrap.servers"="brokeraddress:9092", "kafka.serde.class"="org.apache.hadoop.hive.serde2.JsonSerDe"); {code} h2. Kafka Metadata In order to keep track of Kafka records, the storage handler will automatically add the Kafka row metadata, e.g. partition id, record offset and record timestamp. {code} DESCRIBE EXTENDED kafka_table timestamp timestamp from deserializer page string from deserializer user string from deserializer language string from deserializer country string from deserializer continent string from deserializer namespace string from deserializer newpage boolean from deserializer unpatrolled boolean from deserializer anonymous boolean from deserializer robot boolean from deserializer added int from deserializer deleted int from deserializer delta bigint from deserializer __partition int from deserializer __offset bigint from deserializer __timestamp bigint from deserializer {code} h2. Filter push down. Newer Kafka consumers, 0.11.0 and higher, allow seeking on the stream based on a given offset. The proposed storage handler will be able to leverage such an API by pushing down filters over the metadata columns, namely __partition (int), __offset (long) and __timestamp (long). For instance a query like {code} select `__offset` from kafka_table where (`__offset` < 10 and `__offset`>3 and `__partition` = 0) or (`__partition` = 0 and `__offset` < 105 and `__offset` > 99) or (`__offset` = 109); {code} will result in a scan of partition 0 only, then read only the records between offsets 4 and 109. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20376) Timestamp Timezone parser doesn't handle ISO formats "2013-08-31T01:02:33Z"
slim bouguerra created HIVE-20376: - Summary: Timestamp Timezone parser doesn't handle ISO formats "2013-08-31T01:02:33Z" Key: HIVE-20376 URL: https://issues.apache.org/jira/browse/HIVE-20376 Project: Hive Issue Type: Bug Reporter: slim bouguerra It would be nice to add ISO formats to the timezone utils parser so it can handle the following: "2013-08-31T01:02:33Z" org.apache.hadoop.hive.common.type.TimestampTZUtil#parse(java.lang.String) CC [~jcamachorodriguez]/ [~ashutoshc] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
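For reference, java.time parses this ISO-8601 form out of the box, so the utils parser could add a fallback along these lines (illustrative only, not TimestampTZUtil itself):

{code:java}
import java.time.Instant;

class IsoParseSketch {
  public static void main(String[] args) {
    Instant instant = Instant.parse("2013-08-31T01:02:33Z");
    System.out.println(instant.toEpochMilli()); // 1377910953000
  }
}
{code}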
[jira] [Created] (HIVE-20375) Json SerDe ignoring the timestamp.formats property
slim bouguerra created HIVE-20375: - Summary: Json SerDe ignoring the timestamp.formats property Key: HIVE-20375 URL: https://issues.apache.org/jira/browse/HIVE-20375 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: slim bouguerra The JSON SerDe is supposed to accept the "timestamp.formats" SerDe property to allow different timestamp formats; after a recent refactor I see that this is not working anymore. Looking at the code I can see that the SerDe is not using the constructed parser with the added formats https://github.com/apache/hive/blob/1105ef3974d8a324637d3d35881a739af3aeb382/serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonStructReader.java#L82 but instead it is using a Converter https://github.com/apache/hive/blob/1105ef3974d8a324637d3d35881a739af3aeb382/serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonStructReader.java#L324 The Converter then uses org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter.TimestampConverter. This converter does not have any knowledge about user formats whatsoever; it uses the static converter org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils#getTimestampFromString -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20094) Update Druid to 0.12.1 version
slim bouguerra created HIVE-20094: - Summary: Update Druid to 0.12.1 version Key: HIVE-20094 URL: https://issues.apache.org/jira/browse/HIVE-20094 Project: Hive Issue Type: Bug Reporter: slim bouguerra Assignee: slim bouguerra As per the Jira title. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19923) Follow up of HIVE-19615, use UnaryFunction instead of prefix
slim bouguerra created HIVE-19923: - Summary: Follow up of HIVE-19615, use UnaryFunction instead of prefix Key: HIVE-19923 URL: https://issues.apache.org/jira/browse/HIVE-19923 Project: Hive Issue Type: Sub-task Reporter: slim bouguerra The correct usage of the Druid isnull function is {code} isnull(exp) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19879) Remove unused calcite sql operator.
slim bouguerra created HIVE-19879: - Summary: Remove unused calcite sql operator. Key: HIVE-19879 URL: https://issues.apache.org/jira/browse/HIVE-19879 Project: Hive Issue Type: Bug Components: Druid integration Reporter: slim bouguerra Assignee: slim bouguerra HIVE-19796 introduced an unused SQL operator by mistake. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19869) Remove double formatting bug followup of HIVE-19382
slim bouguerra created HIVE-19869: - Summary: Remove double formatting bug followup of HIVE-19382 Key: HIVE-19869 URL: https://issues.apache.org/jira/browse/HIVE-19869 Project: Hive Issue Type: Bug Reporter: slim bouguerra Assignee: slim bouguerra HIVE-19382 has a minor bug that happens when users provide a custom format as part of the FROM_UNIXTIME function. Here is an example query {code} SELECT SUM(`ssb_druid_100`.`lo_revenue`) AS `sum_lo_revenue_ok`, CAST(FROM_UNIXTIME(UNIX_TIMESTAMP(CAST(`ssb_druid_100`.`__time` AS TIMESTAMP)), 'yyyy-MM-dd HH:00:00') AS TIMESTAMP) AS `thr___time_ok` FROM `druid_ssb`.`ssb_druid_100` `ssb_druid_100` GROUP BY CAST(FROM_UNIXTIME(UNIX_TIMESTAMP(CAST(`ssb_druid_100`.`__time` AS TIMESTAMP)), 'yyyy-MM-dd HH:00:00') AS TIMESTAMP); {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19868) Extract support for float aggregator
slim bouguerra created HIVE-19868: - Summary: Extract support for float aggregator Key: HIVE-19868 URL: https://issues.apache.org/jira/browse/HIVE-19868 Project: Hive Issue Type: Sub-task Components: Druid integration Reporter: slim bouguerra Assignee: slim bouguerra -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19796) Push Down TRUNC Fn to Druid Storage Handler
slim bouguerra created HIVE-19796: - Summary: Push Down TRUNC Fn to Druid Storage Handler Key: HIVE-19796 URL: https://issues.apache.org/jira/browse/HIVE-19796 Project: Hive Issue Type: Bug Components: Druid integration Reporter: slim bouguerra Assignee: slim bouguerra Push down queries with the TRUNC date function, such as {code} SELECT SUM((`ssb_druid_100`.`discounted_price` * `ssb_druid_100`.`net_revenue`)) AS `sum_calculation_4998925219892510720_ok`, CAST(TRUNC(CAST(`ssb_druid_100`.`__time` AS TIMESTAMP),'MM') AS DATE) AS `tmn___time_ok` FROM `druid_ssb`.`ssb_druid_100` `ssb_druid_100` GROUP BY CAST(TRUNC(CAST(`ssb_druid_100`.`__time` AS TIMESTAMP),'MM') AS DATE) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19721) Druid Storage handler throws exception when query has a Cast to Date
slim bouguerra created HIVE-19721: - Summary: Druid Storage handler throws exception when query has a Cast to Date Key: HIVE-19721 URL: https://issues.apache.org/jira/browse/HIVE-19721 Project: Hive Issue Type: Bug Components: Druid integration Reporter: slim bouguerra Assignee: slim bouguerra Fix For: 3.0.1 {code} SELECT CAST(`ssb_druid_100`.`__time` AS DATE) AS `x_time`, SUM(`ssb_druid_100`.`metric_c`) AS `sum_lo_revenue_ok` FROM `default`.`druid_test_table` `ssb_druid_100` GROUP BY CAST(`ssb_druid_100`.`__time` AS DATE); {code} {code} 2018-05-26T06:54:56,570 DEBUG [HttpClient-Netty-Worker-5] client.NettyHttpClient: [POST http://localhost:8082/druid/v2/] Got chunk: 0B, last=true 2018-05-26T06:54:56,572 ERROR [1917f624-7b94-4990-9e3a-bbfff3656365 main] CliDriver: Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: Unknown type: DATE java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Unknown type: DATE at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:602) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:509) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:145) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2509) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1514) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1488) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:177) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104) at org.apache.hadoop.hive.cli.TestMiniDruidLocalCliDriver.testCliDriver(TestMiniDruidLocalCliDriver.java:43) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:92) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runners.Suite.runChild(Suite.java:127) at 
org.junit.runners.Suite.runChild(Suite.java:26) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.apache.hadoop.hive.cli.control.CliAdapter$1$1.evaluate(CliAdapter.java:73) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at
[jira] [Created] (HIVE-19695) Year Month Day extraction functions need to add an implicit cast for columns that are String types
slim bouguerra created HIVE-19695: - Summary: Year Month Day extraction functions need to add an implicit cast for columns that are String types Key: HIVE-19695 URL: https://issues.apache.org/jira/browse/HIVE-19695 Project: Hive Issue Type: Bug Components: Druid integration, Query Planning Affects Versions: 3.0.0 Reporter: slim bouguerra Assignee: slim bouguerra Fix For: 3.1.0 To avoid surprising/wrong results, the Hive query plan shall add an explicit cast over non date/timestamp column types when a user tries to extract Year/Month/Hour etc. This is an example of misleading results. {code} create table test_base_table(`timecolumn` timestamp, `date_c` string, `timestamp_c` string, `metric_c` double); insert into test_base_table values ('2015-03-08 00:00:00', '2015-03-10', '2015-03-08 00:00:00', 5.0); CREATE TABLE druid_test_table STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler' TBLPROPERTIES ("druid.segment.granularity" = "DAY") AS select cast(`timecolumn` as timestamp with local time zone) as `__time`, `date_c`, `timestamp_c`, `metric_c` FROM test_base_table; select year(date_c), month(date_c),day(date_c), hour(date_c), year(timestamp_c), month(timestamp_c),day(timestamp_c), hour(timestamp_c) from druid_test_table; {code} will return the following wrong results: {code} PREHOOK: query: select year(date_c), month(date_c),day(date_c), hour(date_c), year(timestamp_c), month(timestamp_c),day(timestamp_c), hour(timestamp_c) from druid_test_table PREHOOK: type: QUERY PREHOOK: Input: default@druid_test_table A masked pattern was here POSTHOOK: query: select year(date_c), month(date_c),day(date_c), hour(date_c), year(timestamp_c), month(timestamp_c),day(timestamp_c), hour(timestamp_c) from druid_test_table POSTHOOK: type: QUERY POSTHOOK: Input: default@druid_test_table A masked pattern was here 1969 12 31 16 1969 12 31 16 {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19684) Hive stats optimizer wrongly uses stats against non native tables
slim bouguerra created HIVE-19684: - Summary: Hive stats optimizer wrongly uses stats against non native tables Key: HIVE-19684 URL: https://issues.apache.org/jira/browse/HIVE-19684 Project: Hive Issue Type: Bug Reporter: slim bouguerra Assignee: slim bouguerra Stats of non native tables are inaccurate, thus queries over non native tables cannot be optimized by the stats optimizer. Take for example the query {code} Explain select count(*) from (select `__time` from druid_test_table limit 1) as src ; {code} the plan will be reduced to {code} POSTHOOK: query: explain extended select count(*) from (select `__time` from druid_test_table limit 1) as src POSTHOOK: type: QUERY STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: 1 Processor Tree: ListSink {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19680) Push down limit is not applied for Druid storage handler.
slim bouguerra created HIVE-19680: - Summary: Push down limit is not applied for Druid storage handler. Key: HIVE-19680 URL: https://issues.apache.org/jira/browse/HIVE-19680 Project: Hive Issue Type: Bug Reporter: slim bouguerra Assignee: slim bouguerra Fix For: 3.0.0 A query like {code} select `__time` from druid_test_table limit 1; {code} returns more than one row. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19675) Cast to timestamps on Druid time column leads to an exception
slim bouguerra created HIVE-19675: - Summary: Cast to timestamps on Druid time column leads to an exception Key: HIVE-19675 URL: https://issues.apache.org/jira/browse/HIVE-19675 Project: Hive Issue Type: Bug Components: Druid integration Affects Versions: 3.0.0 Reporter: slim bouguerra Assignee: Jesus Camacho Rodriguez The following query fails due to a formatting issue. {code} SELECT CAST(`ssb_druid_100`.`__time` AS TIMESTAMP) AS `x_time`, SUM(`ssb_druid_100`.`lo_revenue`) AS `sum_lo_revenue_ok` FROM `druid_ssb`.`ssb_druid_100` `ssb_druid_100` GROUP BY CAST(`ssb_druid_100`.`__time` AS TIMESTAMP); {code} Exception {code} Error: java.io.IOException: java.lang.NumberFormatException: For input string: "1991-12-31 19:00:00" (state=,code=0) {code} [~jcamachorodriguez] maybe this is fixed by your upcoming patches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19674) Group by Decimal Constants push down to Druid tables.
slim bouguerra created HIVE-19674: - Summary: Group by Decimal Constants push down to Druid tables. Key: HIVE-19674 URL: https://issues.apache.org/jira/browse/HIVE-19674 Project: Hive Issue Type: Bug Reporter: slim bouguerra Assignee: slim bouguerra Queries like the following get generated by Tableau. {code} SELECT SUM(`ssb_druid_100`.`lo_revenue`) AS `sum_lo_revenue_ok` FROM `druid_ssb`.`ssb_druid_100` `ssb_druid_100` GROUP BY 1.1001; {code} The group by key is pushed down to Druid as a constant column; this leads to an exception while parsing back the results, since the Druid input format does not allow decimals. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19672) Column Names mismatch between native Druid Tables and Hive External map
slim bouguerra created HIVE-19672: - Summary: Column Names mismatch between native Druid Tables and Hive External map Key: HIVE-19672 URL: https://issues.apache.org/jira/browse/HIVE-19672 Project: Hive Issue Type: Bug Components: Druid integration Affects Versions: 3.0.0 Reporter: slim bouguerra Fix For: 4.0.0 Druid column names are case sensitive while Hive's are case insensitive. This implies that any Druid datasource that has upper-case characters in a column name will not return the expected results. One possible fix is to remap the column names before issuing the JSON query to Druid.
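A sketch of the failure mode, with hypothetical table and column names (pageViews is assumed to exist in the Druid datasource):
{code}
CREATE EXTERNAL TABLE druid_ext
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "wikipedia");
-- Hive normalizes identifiers to lower case, so the reference below is sent
-- to Druid as "pageviews", which does not match the case-sensitive Druid
-- column "pageViews" and therefore returns no data.
SELECT `pageViews` FROM druid_ext;
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)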
[jira] [Created] (HIVE-19615) Proper handling of is null and not is null predicate when pushed to Druid
slim bouguerra created HIVE-19615: - Summary: Proper handling of is null and not is null predicate when pushed to Druid Key: HIVE-19615 URL: https://issues.apache.org/jira/browse/HIVE-19615 Project: Hive Issue Type: Bug Reporter: slim bouguerra Assignee: slim bouguerra Fix For: 3.0.0 Recent development in Druid introduced new semantics for null handling [here|https://github.com/b-slim/druid/commit/219e77aeac9b07dc20dd9ab2dd537f3f17498346] Based on those changes we need to honor push down of expressions with is null / is not null predicates. The proposed fix overrides the mapping of Calcite functions to Druid expressions to match the correct semantics.
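For illustration, the kind of predicates whose push-down semantics the fix corrects; the table and column names below are hypothetical:
{code}
-- Both predicates must keep Hive's null semantics once translated
-- to Druid filters/expressions.
SELECT count(*) FROM druid_table WHERE dim IS NULL;
SELECT count(*) FROM druid_table WHERE dim IS NOT NULL;
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)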
[jira] [Created] (HIVE-19607) Pushing Aggregates on Top of Aggregates
slim bouguerra created HIVE-19607: - Summary: Pushing Aggregates on Top of Aggregates Key: HIVE-19607 URL: https://issues.apache.org/jira/browse/HIVE-19607 Project: Hive Issue Type: Sub-task Reporter: slim bouguerra Fix For: 3.1.0 This plan shows an instance where the count aggregates can be pushed to Druid which will eliminate the last stage reducer. {code} +PREHOOK: query: EXPLAIN select count(DISTINCT cstring2), sum(cdouble) FROM druid_table +PREHOOK: type: QUERY +POSTHOOK: query: EXPLAIN select count(DISTINCT cstring2), sum(cdouble) FROM druid_table +POSTHOOK: type: QUERY +STAGE DEPENDENCIES: + Stage-1 is a root stage + Stage-0 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-1 +Tez + A masked pattern was here + Edges: +Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE) + A masked pattern was here + Vertices: +Map 1 +Map Operator Tree: +TableScan + alias: druid_table + properties: +druid.fieldNames cstring2,$f1 +druid.fieldTypes string,double +druid.query.json {"queryType":"groupBy","dataSource":"default.druid_table","granularity":"all","dimensions":[{"type":"default","dimension":"cstring2","outputName":"cstring2","outputType":"STRING"}],"limitSpec":{"type":"default"},"aggregations":[{"type":"doubleSum","name":"$f1","fieldName":"cdouble"}],"intervals":["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"]} +druid.query.type groupBy + Statistics: Num rows: 9173 Data size: 1673472 Basic stats: COMPLETE Column stats: NONE + Select Operator +expressions: cstring2 (type: string), $f1 (type: double) +outputColumnNames: cstring2, $f1 +Statistics: Num rows: 9173 Data size: 1673472 Basic stats: COMPLETE Column stats: NONE +Group By Operator + aggregations: count(cstring2), sum($f1) + mode: hash + outputColumnNames: _col0, _col1 + Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE + Reduce Output Operator +sort order: +Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE +value expressions: _col0 (type: bigint), _col1 (type: double) +Reducer 2 +Reduce Operator Tree: + Group By Operator +aggregations: count(VALUE._col0), sum(VALUE._col1) +mode: mergepartial +outputColumnNames: _col0, _col1 +Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE +File Output Operator + compressed: false + Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE + table: + input format: org.apache.hadoop.mapred.SequenceFileInputFormat + output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat + serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19601) Unsupported Post join function
slim bouguerra created HIVE-19601: - Summary: Unsupported Post join function Key: HIVE-19601 URL: https://issues.apache.org/jira/browse/HIVE-19601 Project: Hive Issue Type: Sub-task Reporter: slim bouguerra h1. As part of trying to use the Calcite rule {code} org.apache.calcite.rel.rules.AggregateExpandDistinctAggregatesRule#JOIN {code} I got the following Calcite plan {code} 2018-05-17T09:26:02,781 DEBUG [80d6d405-ed78-4f60-bd93-b3e08e424f73 main] translator.PlanModifierForASTConv: Final plan after modifier HiveProject(_c0=[$1], _c1=[$2]) HiveProject(zone=[$0], $f1=[$1], $f2=[$3]) HiveJoin(condition=[IS NOT DISTINCT FROM($0, $2)], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(zone=[$0], $f1=[$1]) HiveAggregate(group=[{0}], agg#0=[count($1)]) HiveProject(zone=[$0], interval_marker=[$1]) HiveAggregate(group=[{0, 1}]) HiveProject(zone=[$3], interval_marker=[$1]) HiveTableScan(table=[[druid_test_dst.test_base_table]], table:alias=[test_base_table]) HiveProject(zone=[$0], $f1=[$1]) HiveAggregate(group=[{0}], agg#0=[count($1)]) HiveProject(zone=[$0], dim=[$1]) HiveAggregate(group=[{0, 1}]) HiveProject(zone=[$3], dim=[$4]) HiveTableScan(table=[[druid_test_dst.test_base_table]], table:alias=[test_base_table]) {code} and I ran into this issue {code} 2018-05-17T09:26:02,876 ERROR [80d6d405-ed78-4f60-bd93-b3e08e424f73 main] parse.CalcitePlanner: CBO failed, skipping CBO. org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid function 'IS NOT DISTINCT FROM' at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:1069) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1464) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19600) Hive and Calcite have different semantics for Grouping sets
slim bouguerra created HIVE-19600: - Summary: Hive and Calcite have different semantics for Grouping sets Key: HIVE-19600 URL: https://issues.apache.org/jira/browse/HIVE-19600 Project: Hive Issue Type: Sub-task Reporter: slim bouguerra Fix For: 3.1.0 h1. Issue: I tried to use the Calcite rule {code} org.apache.calcite.rel.rules.AggregateExpandDistinctAggregatesRule#AggregateExpandDistinctAggregatesRule(java.lang.Class, boolean, org.apache.calcite.tools.RelBuilderFactory) {code} to replace the current rule used by Hive {code} org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveExpandDistinctAggregatesRule#HiveExpandDistinctAggregatesRule {code} but I got an exception when generating the operator tree out of the Calcite plan. This is the Calcite plan {code} HiveProject.HIVE.[](input=rel#50:HiveAggregate.HIVE.[](input=rel#48:HiveProject.HIVE.[](input=rel#44:HiveAggregate.HIVE.[](input=rel#38:HiveProject.HIVE.[](input=rel#0:HiveTableScan.HIVE.[] (table=[druid_test_dst.test_base_table],table:alias=test_base_table)[false],$f0=$3,$f1=$1,$f2=$4),group={0, 1, 2},groups=[{0, 1}, {0, 2}],$g=GROUPING($0, $1, $2)),$f0=$0,$f1=$1,$f2=$2,$g_1==($3, 1),$g_2==($3, 2)),group={0},agg#0=count($1) FILTER $3,agg#1=count($2) FILTER $4),_o__c0=$1,_o__c1=$2) {code} This is the exception stack {code} 2018-05-17T08:46:48,604 ERROR [649a61b0-d8c7-45d8-962d-b1d38397feb4 main] ql.Driver: FAILED: SemanticException Line 0:-1 Argument type mismatch 'zone': The first argument to grouping() must be an int/long. Got: STRING org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Argument type mismatch 'zone': The first argument to grouping() must be an int/long. Got: STRING at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1467) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) at org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120) at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:239) at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:185) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:12566) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12521) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4525) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4298) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10487) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10426) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11339) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11196) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11223) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11209) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:517) at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12074) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:164) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:643) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1686) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1633) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1628) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) at
[jira] [Created] (HIVE-19586) Optimize Count(distinct X) pushdown based on the storage capabilities
slim bouguerra created HIVE-19586: - Summary: Optimize Count(distinct X) pushdown based on the storage capabilities Key: HIVE-19586 URL: https://issues.apache.org/jira/browse/HIVE-19586 Project: Hive Issue Type: Improvement Components: Druid integration, Logical Optimizer Reporter: slim bouguerra Assignee: slim bouguerra Fix For: 3.0.0 h1. Goal Provide a way to rewrite queries with a combination of COUNT(DISTINCT) and aggregates like SUM as a series of Group Bys. This can be useful to push down to Druid queries like {code} select count(DISTINCT interval_marker), count (distinct dim), sum(num_l) FROM druid_test_table GROUP BY `__time`, `zone` ; {code} In general this is useful in cases where storage handlers cannot perform count(distinct column). h1. How to do it. Use the Calcite rule {code} org.apache.calcite.rel.rules.AggregateExpandDistinctAggregatesRule{code} that breaks down a count distinct into a single Group By with grouping sets, or into a series of Group Bys that might be linked with joins if multiple counts are present; see the sketch after this report. FYI, today Hive does have a similar rule {code} org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveExpandDistinctAggregatesRule{code}, but it only provides a rewrite to a grouping-sets-based plan. I am planning to use the actual Calcite rule, [~ashutoshc] any concerns or caveats to be aware of? h2. Concerns/questions We need a way to switch between grouping sets and simple chained Group Bys based on the plan cost. For instance, for a Druid-based scan it always makes sense (at least today) to push down a series of Group Bys and stitch the result sets together in Hive later (as opposed to scanning everything). But this might not be true for other storage handlers: for a handler that can evaluate grouping sets, it is better to push down the grouping sets as one table scan. I am still unsure how I can lean on the cost optimizer to select the best plan, [~ashutoshc]/[~jcamachorodriguez] any inputs?
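For illustration, my reading of what the chained group-by rewrite looks like for a single count(distinct); this is a sketch of the idea, not the exact output of the rule:
{code}
-- Original: select count(DISTINCT dim) FROM druid_test_table GROUP BY `zone`;
-- Rewritten as two stacked aggregations; the inner one (deduplication)
-- is the part that could be pushed to the storage handler:
SELECT `zone`, count(dim) AS cnt_distinct_dim
FROM (SELECT `zone`, dim FROM druid_test_table GROUP BY `zone`, dim) t
GROUP BY `zone`;
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)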
[jira] [Created] (HIVE-19490) Locking on Insert into for non native and managed tables.
slim bouguerra created HIVE-19490: - Summary: Locking on Insert into for non native and managed tables. Key: HIVE-19490 URL: https://issues.apache.org/jira/browse/HIVE-19490 Project: Hive Issue Type: Improvement Components: Druid integration Reporter: slim bouguerra Current state of the art: managed non-native tables, like Druid tables, need to acquire a lock on INSERT INTO or INSERT OVERWRITE. The nature of this lock is set to exclusive by default for any non-native table. This implies that an insert into a Druid table will also lock out any read query for the duration of the insert. IMO this lock (on insert into) is not needed, since the insert statement is appending data, and the state of loading it is managed partially by the Hive storage handler hook and partially by Druid. What I am proposing is to relax the lock level to shared for all non-native tables on insert into operations, and to keep it as exclusive write for insert overwrite for now. Any feedback is welcome. cc [~ekoifman] / [~ashutoshc] / [~jdere] / [~hagleitn] Also, I am not sure what the best way to unit test this is; currently I am using the debugger to check that the locks are what I expect. Please let me know if there is a better way to do this.
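A sketch of the scenario being discussed; the session labels and table names are illustrative:
{code}
-- Session 1: appends data; today this acquires an exclusive lock.
INSERT INTO druid_table SELECT * FROM staging_table;
-- Session 2: under the exclusive lock this read waits until the insert
-- finishes; under the proposed shared lock it would proceed concurrently.
SELECT count(*) FROM druid_table;
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)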
[jira] [Created] (HIVE-19474) Decimal type should be casted as part of the CTAS or INSERT Clause.
slim bouguerra created HIVE-19474: - Summary: Decimal type should be casted as part of the CTAS or INSERT Clause. Key: HIVE-19474 URL: https://issues.apache.org/jira/browse/HIVE-19474 Project: Hive Issue Type: Bug Reporter: slim bouguerra Assignee: slim bouguerra HIVE-18569 introduced a runtime config variable to allow indexing Decimal as Double. This leads to kind of a messy state: the Hive metadata thinks the column is still decimal while it is stored as double. Since the Hive metadata of the column is Decimal, the logical optimizer will not push down aggregates. I tried to fix this by adding some logic to the application, but it makes the code very clumsy, with a lot of branches. Instead I propose to revert this patch and let the user introduce an explicit cast. This is better since the metadata reflects the actual storage type, push down of aggregates kicks in, and there is no config needed. cc [~ashutoshc] and [~nishantbangarwa] You can see the difference with the following DDL {code} create table test_base_table(`timecolumn` timestamp, `interval_marker` string, `num_l` DECIMAL(10,2)); insert into test_base_table values ('2015-03-08 00:00:00', 'i1-start', 4.5); set hive.druid.approx.result=true; CREATE TABLE druid_test_table STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler' TBLPROPERTIES ("druid.segment.granularity" = "DAY") AS select cast(`timecolumn` as timestamp with local time zone) as `__time`, `interval_marker`, cast(`num_l` as double) FROM test_base_table; describe druid_test_table; explain select sum(num_l), min(num_l) FROM druid_test_table; CREATE TABLE druid_test_table_2 STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler' TBLPROPERTIES ("druid.segment.granularity" = "DAY") AS select cast(`timecolumn` as timestamp with local time zone) as `__time`, `interval_marker`, `num_l` FROM test_base_table; describe druid_test_table_2; explain select sum(num_l), min(num_l) FROM druid_test_table_2; {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19462) Fix mapping for char_length function to enable pushdown to Druid.
slim bouguerra created HIVE-19462: - Summary: Fix mapping for char_length function to enable pushdown to Druid. Key: HIVE-19462 URL: https://issues.apache.org/jira/browse/HIVE-19462 Project: Hive Issue Type: Improvement Components: Druid integration Reporter: slim bouguerra Assignee: slim bouguerra Currently char_length is not pushed down to Druid because of a missing mapping from/to Calcite. This patch adds that mapping.
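An example of the query shape that should become pushable once the mapping exists; the table and column names are hypothetical:
{code}
-- Illustrative only: both the projection and the filter use char_length.
SELECT char_length(page) FROM druid_table WHERE char_length(page) > 10;
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)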
[jira] [Created] (HIVE-19443) Issue with Druid timestamp with timezone handling
slim bouguerra created HIVE-19443: - Summary: Issue with Druid timestamp with timezone handling Key: HIVE-19443 URL: https://issues.apache.org/jira/browse/HIVE-19443 Project: Hive Issue Type: Bug Components: Druid integration Reporter: slim bouguerra Attachments: test_resutls.out, test_timestamp.q As you can see in the attached file [^test_resutls.out], when switching the current timezone to UTC, the insert of values from a Hive table into a Druid table misses some rows. You can use this to reproduce it: [^test_timestamp.q] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19441) Add support for float aggregator and use LLAP test Driver
slim bouguerra created HIVE-19441: - Summary: Add support for float aggregator and use LLAP test Driver Key: HIVE-19441 URL: https://issues.apache.org/jira/browse/HIVE-19441 Project: Hive Issue Type: Bug Reporter: slim bouguerra Assignee: slim bouguerra Adding support for the float aggregator. Use LLAP as the test driver to reduce the execution time of the tests from about 2 hours to 15 min. This patch also unveils an issue with timezones; maybe it is fixed by [~jcamachorodriguez]'s upcoming set of patches. Before {code} [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.6.1:testCompile (default-testCompile) @ hive-it-qfile --- [INFO] Compiling 21 source files to /Users/sbouguerra/Hdev/hive/itests/qtest/target/test-classes [INFO] [INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hive-it-qfile --- [INFO] [INFO] --- [INFO] T E S T S [INFO] --- [INFO] Running org.apache.hadoop.hive.cli.TestMiniDruidCliDriver [INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6,654.117 s - in org.apache.hadoop.hive.cli.TestMiniDruidCliDriver [INFO] [INFO] Results: [INFO] [INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0 [INFO] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 01:51 h [INFO] Finished at: 2018-05-04T12:43:19-07:00 [INFO] {code} After {code} [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.6.1:testCompile (default-testCompile) @ hive-it-qfile --- [INFO] Compiling 22 source files to /Users/sbouguerra/Hdev/hive/itests/qtest/target/test-classes [INFO] [INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hive-it-qfile --- [INFO] [INFO] --- [INFO] T E S T S [INFO] --- [INFO] Running org.apache.hadoop.hive.cli.TestMiniDruidCliDriver [INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 907.167 s - in org.apache.hadoop.hive.cli.TestMiniDruidCliDriver [INFO] [INFO] Results: [INFO] [INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0 [INFO] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 15:31 min [INFO] Finished at: 2018-05-04T13:15:11-07:00 [INFO] {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19298) Fix operator tree of CTAS for Druid Storage Handler
slim bouguerra created HIVE-19298: - Summary: Fix operator tree of CTAS for Druid Storage Handler Key: HIVE-19298 URL: https://issues.apache.org/jira/browse/HIVE-19298 Project: Hive Issue Type: Bug Components: Druid integration Reporter: slim bouguerra Assignee: slim bouguerra Fix For: 3.1.0 The current operator plan of CTAS for the Druid storage handler is broken when the user sets the property {code}hive.exec.parallel{code} to {code}true{code}.
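A reproduction sketch, reusing the Druid CTAS shape from the other reports in this list; the base table is assumed to exist with the columns shown:
{code}
set hive.exec.parallel=true;
CREATE TABLE druid_ctas
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS select cast(`timecolumn` as timestamp with local time zone) as `__time`, `num_l`
FROM test_base_table;
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)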
[jira] [Created] (HIVE-19239) Check for possible null timestamp fields during SerDe from Druid events
slim bouguerra created HIVE-19239: - Summary: Check for possible null timestamp fields during SerDe from Druid events Key: HIVE-19239 URL: https://issues.apache.org/jira/browse/HIVE-19239 Project: Hive Issue Type: Bug Reporter: slim bouguerra Assignee: slim bouguerra Currently we do not check for possible null timestamp events. This might lead to an NPE. This patch adds an additional check for that case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19187) Update Druid Storage Handler to Druid 0.12.0
slim bouguerra created HIVE-19187: - Summary: Update Druid Storage Handler to Druid 0.12.0 Key: HIVE-19187 URL: https://issues.apache.org/jira/browse/HIVE-19187 Project: Hive Issue Type: Bug Components: Druid integration Reporter: slim bouguerra Assignee: slim bouguerra The currently used Druid version is 0.11.0. This patch updates the Druid version to the most recent version, 0.12.0. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19157) Assert that Insert into Druid Table fails.
slim bouguerra created HIVE-19157: - Summary: Assert that Insert into Druid Table fails. Key: HIVE-19157 URL: https://issues.apache.org/jira/browse/HIVE-19157 Project: Hive Issue Type: Bug Reporter: slim bouguerra Assignee: slim bouguerra The usual workflow of loading data into Druid relies on the fact that HS2 is able to load the segment metadata from HDFS that is produced by the LLAP/TEZ workers. In some cases where HS2 is not able to perform `ls` on the HDFS path, the insert into query will return success and will not insert any data. This bug was introduced in the function {code} org.apache.hadoop.hive.druid.DruidStorageHandlerUtils#getCreatedSegments{code} when we added the feature allowing the creation of empty tables. {code}
try {
  fss = fs.listStatus(taskDir);
} catch (FileNotFoundException e) {
  // This is a CREATE TABLE statement or query executed for CTAS/INSERT
  // did not produce any result. We do not need to do anything, this is
  // expected behavior.
  return publishedSegmentsBuilder.build();
}
{code} I am still looking for the best way to fix this, [~jcamachorodriguez]/[~ashutoshc] any idea what is the best way to detect that it is an empty create table statement? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19155) Daylight saving time causes Druid inserts to fail with org.apache.hive.druid.io.druid.java.util.common.UOE: Cannot add overlapping segments
slim bouguerra created HIVE-19155: - Summary: Daylight saving time causes Druid inserts to fail with org.apache.hive.druid.io.druid.java.util.common.UOE: Cannot add overlapping segments Key: HIVE-19155 URL: https://issues.apache.org/jira/browse/HIVE-19155 Project: Hive Issue Type: Bug Components: Druid integration Reporter: slim bouguerra Assignee: slim bouguerra If you try to insert data around the daylight saving time hour, the query fails with the following exception {code} 2018-04-10T11:24:58,836 ERROR [065fdaa2-85f9-4e49-adaf-3dc14d51be90 main] exec.DDLTask: Failed org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hive.druid.io.druid.java.util.common.UOE: Cannot add overlapping segments [2015-03-08T05:00:00.000Z/2015-03-09T05:00:00.000Z and 2015-03-09T04:00:00.000Z/2015-03-10T04:00:00.000Z] with the same version [2018-04-10T11:24:48.388-07:00] at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:914) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:919) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4831) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:394) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2443) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2114) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1797) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1538) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1532) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:204) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) [hive-cli-3.1.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) [hive-cli-3.1.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) [hive-cli-3.1.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) [hive-cli-3.1.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1455) [hive-it-util-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1429) [hive-it-util-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:177) [hive-it-util-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104) [hive-it-util-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:59) [test-classes/:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_92] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_92] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_92] {code} You can reproduce this using the following DDL {code} create database druid_test; use druid_test; create table test_table(`timecolumn` timestamp, `userid` string, `num_l` float); insert into test_table values ('2015-03-08 00:00:00', 'i1-start', 4); insert into test_table values ('2015-03-08 23:59:59', 'i1-end', 1); insert into test_table values ('2015-03-09 00:00:00', 'i2-start', 4); insert into test_table values ('2015-03-09 23:59:59', 'i2-end', 1); insert into test_table values ('2015-03-10 00:00:00', 'i3-start', 2); insert into test_table values ('2015-03-10 23:59:59', 'i3-end', 2); CREATE TABLE druid_table STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler' TBLPROPERTIES ("druid.segment.granularity" = "DAY") AS select cast(`timecolumn` as timestamp with local time zone) as `__time`, `userid`, `num_l` FROM test_table; {code} The fix is to always adjust the Druid segments identifiers to UTC.
[jira] [Created] (HIVE-19070) Add More Tests To Druid Mini Cluster: 200 Tableau kind queries.
slim bouguerra created HIVE-19070: - Summary: Add More Tests To Druid Mini Cluster: 200 Tableau kind queries. Key: HIVE-19070 URL: https://issues.apache.org/jira/browse/HIVE-19070 Project: Hive Issue Type: Improvement Components: Druid integration Reporter: slim bouguerra Assignee: slim bouguerra Fix For: 3.0.0 In this patch I am adding 200 new Tableau-style queries that run over a new data set called calcs. The data set is very small. I have also consolidated 3 different tests to run as one test; this will help keep the execution time low. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19044) Duplicate field names within Druid Query Generated by Calcite plan
slim bouguerra created HIVE-19044: - Summary: Duplicate field names within Druid Query Generated by Calcite plan Key: HIVE-19044 URL: https://issues.apache.org/jira/browse/HIVE-19044 Project: Hive Issue Type: Bug Components: Druid integration Reporter: slim bouguerra Assignee: slim bouguerra This is the query plan; as you can see, "$f4" is duplicated (it is used both as an aggregation name and as a post-aggregation name). {code} PREHOOK: query: EXPLAIN SELECT Calcs.key AS none_key_nk, SUM(Calcs.num0) AS temp_z_stdevp_num0___1723718801__0_, COUNT(Calcs.num0) AS temp_z_stdevp_num0___2730138885__0_, SUM((Calcs.num0 * Calcs.num0)) AS temp_z_stdevp_num0___4071133194__0_, STDDEV_POP(Calcs.num0) AS stp_num0_ok FROM druid_tableau.calcs Calcs GROUP BY Calcs.key PREHOOK: type: QUERY POSTHOOK: query: EXPLAIN SELECT Calcs.key AS none_key_nk, SUM(Calcs.num0) AS temp_z_stdevp_num0___1723718801__0_, COUNT(Calcs.num0) AS temp_z_stdevp_num0___2730138885__0_, SUM((Calcs.num0 * Calcs.num0)) AS temp_z_stdevp_num0___4071133194__0_, STDDEV_POP(Calcs.num0) AS stp_num0_ok FROM druid_tableau.calcs Calcs GROUP BY Calcs.key POSTHOOK: type: QUERY STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: calcs properties: druid.fieldNames key,$f1,$f2,$f3,$f4 druid.fieldTypes string,double,bigint,double,double druid.query.json {"queryType":"groupBy","dataSource":"druid_tableau.calcs","granularity":"all","dimensions":[{"type":"default","dimension":"key","outputName":"key","outputType":"STRING"}],"limitSpec":{"type":"default"},"aggregations":[{"type":"doubleSum","name":"$f1","fieldName":"num0"},{"type":"filtered","filter":{"type":"not","field":{"type":"selector","dimension":"num0","value":null}},"aggregator":{"type":"count","name":"$f2","fieldName":"num0"}},{"type":"doubleSum","name":"$f3","expression":"(\"num0\" * \"num0\")"},{"type":"doubleSum","name":"$f4","expression":"(\"num0\" * \"num0\")"}],"postAggregations":[{"type":"expression","name":"$f4","expression":"pow(((\"$f4\" - ((\"$f1\" * \"$f1\") / \"$f2\")) / \"$f2\"),0.5)"}],"intervals":["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"]} druid.query.type groupBy Select Operator expressions: key (type: string), $f1 (type: double), $f2 (type: bigint), $f3 (type: double), $f4 (type: double) outputColumnNames: _col0, _col1, _col2, _col3, _col4 ListSink {code} Table DDL {code} create database druid_tableau; use druid_tableau; drop table if exists calcs; create table calcs STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler' TBLPROPERTIES ( "druid.segment.granularity" = "MONTH", "druid.query.granularity" = "DAY") AS SELECT cast(datetime0 as timestamp with local time zone) `__time`, key, str0, str1, str2, str3, date0, date1, date2, date3, time0, time1, datetime1, zzz, cast(bool0 as string) bool0, cast(bool1 as string) bool1, cast(bool2 as string) bool2, cast(bool3 as string) bool3, int0, int1, int2, int3, num0, num1, num2, num3, num4 from default.calcs_orc; {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19023) Druid storage Handler still using old select query when the CBO fails
slim bouguerra created HIVE-19023: - Summary: Druid storage Handler still using old select query when the CBO fails Key: HIVE-19023 URL: https://issues.apache.org/jira/browse/HIVE-19023 Project: Hive Issue Type: Improvement Components: Druid integration Reporter: slim bouguerra Assignee: slim bouguerra See the usage of the function {code} org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat#createSelectStarQuery{code}; this can be replaced by the Scan query, which is more efficient. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19011) Druid Storage Handler returns conflicting results for Qtest druidmini_dynamic_partition.q
slim bouguerra created HIVE-19011: - Summary: Druid Storage Handler returns conflicting results for Qtest druidmini_dynamic_partition.q Key: HIVE-19011 URL: https://issues.apache.org/jira/browse/HIVE-19011 Project: Hive Issue Type: Bug Reporter: slim bouguerra This git diff shows the conflicting results {code} diff --git a/ql/src/test/results/clientpositive/druid/druidmini_dynamic_partition.q.out b/ql/src/test/results/clientpositive/druid/druidmini_dynamic_partition.q.out index 714778ebfc..cea9b7535c 100644 --- a/ql/src/test/results/clientpositive/druid/druidmini_dynamic_partition.q.out +++ b/ql/src/test/results/clientpositive/druid/druidmini_dynamic_partition.q.out @@ -243,7 +243,7 @@ POSTHOOK: query: SELECT sum(cint), max(cbigint), sum(cbigint), max(cint) FROM POSTHOOK: type: QUERY POSTHOOK: Input: default@druid_partitioned_table POSTHOOK: Output: hdfs://### HDFS PATH ### -1408069801800 4139540644 10992545287 165393120 +1408069801800 3272553822 10992545287 -648527473 PREHOOK: query: SELECT sum(cint), max(cbigint), sum(cbigint), max(cint) FROM druid_partitioned_table_0 PREHOOK: type: QUERY PREHOOK: Input: default@druid_partitioned_table_0 @@ -429,7 +429,7 @@ POSTHOOK: query: SELECT sum(cint), max(cbigint), sum(cbigint), max(cint) FROM d POSTHOOK: type: QUERY POSTHOOK: Input: default@druid_partitioned_table POSTHOOK: Output: hdfs://### HDFS PATH ### -2857395071862 4139540644 -1661313883124 885815256 +2857395071862 3728054572 -1661313883124 71894663 PREHOOK: query: EXPLAIN INSERT OVERWRITE TABLE druid_partitioned_table SELECT cast (`ctimestamp1` as timestamp with local time zone) as `__time`, cstring1, @@ -566,7 +566,7 @@ POSTHOOK: query: SELECT sum(cint), max(cbigint), sum(cbigint), max(cint) FROM d POSTHOOK: type: QUERY POSTHOOK: Input: default@druid_partitioned_table POSTHOOK: Output: hdfs://### HDFS PATH ### -1408069801800 7115092987 10992545287 1232243564 +1408069801800 4584782821 10992545287 -1808876374 PREHOOK: query: SELECT sum(cint), max(cbigint), sum(cbigint), max(cint) FROM druid_partitioned_table_0 PREHOOK: type: QUERY PREHOOK: Input: default@druid_partitioned_table_0 @@ -659,7 +659,7 @@ POSTHOOK: query: SELECT sum(cint), max(cbigint), sum(cbigint), max(cint) FROM d POSTHOOK: type: QUERY POSTHOOK: Input: default@druid_partitioned_table POSTHOOK: Output: hdfs://### HDFS PATH ### -1408069801800 7115092987 10992545287 1232243564 +1408069801800 4584782821 10992545287 -1808876374 PREHOOK: query: EXPLAIN SELECT sum(cint), max(cbigint), sum(cbigint), max(cint) FROM druid_max_size_partition PREHOOK: type: QUERY POSTHOOK: query: EXPLAIN SELECT sum(cint), max(cbigint), sum(cbigint), max(cint) FROM druid_max_size_partition @@ -758,7 +758,7 @@ POSTHOOK: query: SELECT sum(cint), max(cbigint), sum(cbigint), max(cint) FROM d POSTHOOK: type: QUERY POSTHOOK: Input: default@druid_partitioned_table POSTHOOK: Output: hdfs://### HDFS PATH ### -1408069801800 7115092987 10992545287 1232243564 +1408069801800 4584782821 10992545287 -1808876374 PREHOOK: query: DROP TABLE druid_partitioned_table_0 PREHOOK: type: DROPTABLE PREHOOK: Input: default@druid_partitioned_table_0 {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18996) SubString Druid convertor assuming that index is always constant literal value
slim bouguerra created HIVE-18996: - Summary: SubString Druid convertor assuming that index is always constant literal value Key: HIVE-18996 URL: https://issues.apache.org/jira/browse/HIVE-18996 Project: Hive Issue Type: Bug Reporter: slim bouguerra A query like the following {code} SELECT substring(namespace, CAST(deleted AS INT), 4) FROM druid_table_1; {code} will fail with {code} java.lang.AssertionError: not a literal: $13 at org.apache.calcite.rex.RexLiteral.findValue(RexLiteral.java:963) at org.apache.calcite.rex.RexLiteral.findValue(RexLiteral.java:955) at org.apache.calcite.rex.RexLiteral.intValue(RexLiteral.java:938) at org.apache.calcite.adapter.druid.SubstringOperatorConversion.toDruidExpression(SubstringOperatorConversion.java:46) at org.apache.calcite.adapter.druid.DruidExpressions.toDruidExpression(DruidExpressions.java:120) at org.apache.calcite.adapter.druid.DruidQuery.computeProjectAsScan(DruidQuery.java:746) at org.apache.calcite.adapter.druid.DruidRules$DruidProjectRule.onMatch(DruidRules.java:308) at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:317) {code} because it assumes that the index is always a constant literal. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18993) Use Druid Expressions
slim bouguerra created HIVE-18993: - Summary: Use Druid Expressions Key: HIVE-18993 URL: https://issues.apache.org/jira/browse/HIVE-18993 Project: Hive Issue Type: Task Reporter: slim bouguerra -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18959) Avoid creating extra pool of threads within LLAP
slim bouguerra created HIVE-18959: - Summary: Avoid creating extra pool of threads within LLAP Key: HIVE-18959 URL: https://issues.apache.org/jira/browse/HIVE-18959 Project: Hive Issue Type: Task Components: Druid integration Environment: Kerberos Cluster Reporter: slim bouguerra Assignee: slim bouguerra Fix For: 3.0.0 The current Druid-Kerberos-Http client is using an external single-threaded pool to handle retried auth calls (e.g. when a cookie expires, or on other transient auth issues). First, this is not buying us anything, since the whole Druid task is executed as one synchronous task. Second, this can cause a major issue if an exception occurs that leads to shutting down the LLAP main thread. Thus, to fix this, we should avoid using an external thread pool and handle retrying in a synchronous way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18780) Improve schema discovery For Druid Storage Handler
slim bouguerra created HIVE-18780: - Summary: Improve schema discovery For Druid Storage Handler Key: HIVE-18780 URL: https://issues.apache.org/jira/browse/HIVE-18780 Project: Hive Issue Type: Improvement Reporter: slim bouguerra Assignee: slim bouguerra Currently, the Druid storage adapter issues a segment metadata query every time the query is of type Select or Scan. Not only that, but every input split (map) will then do the same as well, since it uses the same SerDe; this is very expensive and puts a lot of pressure on the Druid cluster. The way to fix this is to take the schema out of the Calcite plan instead of serializing the query itself as part of the Hive query context. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18732) Push order/limit to Druid historical when approximate results are allowed
slim bouguerra created HIVE-18732: - Summary: Push order/limit to Druid historical when approximate results are allowed Key: HIVE-18732 URL: https://issues.apache.org/jira/browse/HIVE-18732 Project: Hive Issue Type: Improvement Reporter: slim bouguerra Druid 0.11 allows forced push down of Order By / Limit to the historicals using the query context flag {code}forceLimitPushDown{code}. As per the docs [http://druid.io/docs/latest/querying/groupbyquery.html], this is a great optimization that can be used when approximate results are allowed.
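An example of the query shape that benefits; the table and column names are hypothetical:
{code}
-- With approximate results allowed, the ORDER BY ... LIMIT below could be
-- evaluated on the historicals instead of only after merging on the broker.
SELECT page, sum(added) AS total
FROM druid_table
GROUP BY page
ORDER BY total DESC
LIMIT 10;
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)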
[jira] [Created] (HIVE-18731) Add Documentation about this feature.
slim bouguerra created HIVE-18731: - Summary: Add Documentation about this feature. Key: HIVE-18731 URL: https://issues.apache.org/jira/browse/HIVE-18731 Project: Hive Issue Type: Sub-task Components: Druid integration Reporter: slim bouguerra Need to add basic docs about the new table properties and what they mean in practice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18730) Use LLAP as execution engine for Druid mini Cluster Tests
slim bouguerra created HIVE-18730: - Summary: Use LLAP as execution engine for Druid mini Cluster Tests Key: HIVE-18730 URL: https://issues.apache.org/jira/browse/HIVE-18730 Project: Hive Issue Type: Improvement Components: Druid integration Reporter: slim bouguerra Assignee: slim bouguerra Fix For: 3.0.0 Currently, we are using local MR to run the mini cluster tests. It would be better to use an LLAP cluster or TEZ. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18729) Druid Time column type
slim bouguerra created HIVE-18729: - Summary: Druid Time column type Key: HIVE-18729 URL: https://issues.apache.org/jira/browse/HIVE-18729 Project: Hive Issue Type: Task Components: Druid integration Reporter: slim bouguerra Assignee: Jesus Camacho Rodriguez I have talked offline with [~jcamachorodriguez] about this, and we agreed that the best way to go is to support both cases, where the Druid time column can be Timestamp or Timestamp with local time zone. In fact, for the Hive-Druid internal table this makes perfect sense: since we have Hive metadata about the time column during the CTAS statement, we can handle both cases as we do for other types of storage, e.g. ORC. For Druid external tables, we can have a default type and allow the user to override that via table properties. CC [~ashutoshc] and [~nishantbangarwa].
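A sketch of the external-table override mentioned above; the property name is purely hypothetical and would be decided as part of this task:
{code}
CREATE EXTERNAL TABLE druid_ext_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
  "druid.datasource" = "wikipedia",
  -- hypothetical property overriding the default time column type:
  "druid.time.column.type" = "timestamp with local time zone"
);
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)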
[jira] [Created] (HIVE-18595) UNIX_TIMESTAMP UDF fails when type is Timestamp with local timezone
slim bouguerra created HIVE-18595: - Summary: UNIX_TIMESTAMP UDF fails when type is Timestamp with local timezone Key: HIVE-18595 URL: https://issues.apache.org/jira/browse/HIVE-18595 Project: Hive Issue Type: Bug Reporter: slim bouguerra {code} 2018-01-31T12:59:45,464 ERROR [10e97c86-7f90-406b-a8fa-38be5d3529cc main] ql.Driver: FAILED: SemanticException [Error 10014]: Line 3:456 Wrong arguments ''yyyy-MM-dd HH:mm:ss'': The function UNIX_TIMESTAMP takes only string/date/timestamp types org.apache.hadoop.hive.ql.parse.SemanticException: Line 3:456 Wrong arguments ''yyyy-MM-dd HH:mm:ss'': The function UNIX_TIMESTAMP takes only string/date/timestamp types at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1394) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) at org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120) at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:235) at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:181) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11847) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11780) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genGBLogicalPlan(CalcitePlanner.java:3140) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4330) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1407) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1354) at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118) at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052) at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154) at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1159) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1175) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:422) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11393) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:304) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:163) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:639) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1504) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1632) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1382) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:240) at
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:343) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1331) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1305) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:173) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104) at org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:59) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at
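For context, the failing pattern is a two-argument UNIX_TIMESTAMP call over a TIMESTAMP WITH LOCAL TIME ZONE column. The HiveQL sketch below is hypothetical (the table and column names are invented, not from the report), and the CAST workaround is an assumption based on the types the error message says are accepted:
{code}
-- Hypothetical table with a TIMESTAMP WITH LOCAL TIME ZONE column
CREATE TABLE events (id INT, ts_ltz TIMESTAMP WITH LOCAL TIME ZONE);

-- Fails at semantic analysis with Error 10014:
-- "The function UNIX_TIMESTAMP takes only string/date/timestamp types"
SELECT UNIX_TIMESTAMP(ts_ltz, 'yyyy-MM-dd HH:mm:ss') FROM events;

-- Possible workaround (assumption): cast to plain TIMESTAMP, which the error
-- message lists as accepted, and use the one-argument form of the UDF
SELECT UNIX_TIMESTAMP(CAST(ts_ltz AS TIMESTAMP)) FROM events;
{code}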
[jira] [Created] (HIVE-18594) DATEDIFF UDF fails when type is timestamp with local timezone.
Slim Bouguerra created HIVE-18594: - Summary: DATEDIFF UDF fails when type is timestamp with local timezone. Key: HIVE-18594 URL: https://issues.apache.org/jira/browse/HIVE-18594 Project: Hive Issue Type: Bug Components: Hive Reporter: Slim Bouguerra
{code}
2018-01-31T12:45:08,488 ERROR [9b5c5020-b1f5-4703-8c2e-bac4aa01a578 main] ql.Driver: FAILED: SemanticException [Error 10014]: Line 3:88 Wrong arguments ''2004-07-04'': DATEDIFF() only takes STRING/TIMESTAMP/DATEWRITABLE types as 1-th argument, got TIMESTAMPLOCALTZ
org.apache.hadoop.hive.ql.parse.SemanticException: Line 3:88 Wrong arguments ''2004-07-04'': DATEDIFF() only takes STRING/TIMESTAMP/DATEWRITABLE types as 1-th argument, got TIMESTAMPLOCALTZ
    at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1394)
    at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
    at org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
    at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:235)
    at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:181)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11847)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11802)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genSelectLogicalPlan(CalcitePlanner.java:4005)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4336)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1407)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1354)
    at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
    at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052)
    at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
    at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1159)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1175)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:422)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11393)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:304)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268)
    at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:163)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:639)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1504)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1632)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1382)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:240)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:343)
    at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1331)
    at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1305)
    at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:173)
    at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
    at org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:59)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at
{code}
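Similarly for DATEDIFF, a hypothetical HiveQL reproduction (the table and column names are invented here; the CAST workaround is an assumption based on the accepted types listed in the error message, and '2004-07-04' is the literal quoted in the report):
{code}
-- Hypothetical table with a TIMESTAMP WITH LOCAL TIME ZONE column
CREATE TABLE orders (id INT, ts_ltz TIMESTAMP WITH LOCAL TIME ZONE);

-- Fails at semantic analysis with Error 10014:
-- "DATEDIFF() only takes STRING/TIMESTAMP/DATEWRITABLE types as 1-th argument, got TIMESTAMPLOCALTZ"
SELECT DATEDIFF(ts_ltz, '2004-07-04') FROM orders;

-- Possible workaround (assumption): cast the column to plain TIMESTAMP first,
-- since TIMESTAMP is one of the types the error message says is accepted
SELECT DATEDIFF(CAST(ts_ltz AS TIMESTAMP), '2004-07-04') FROM orders;
{code}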