[jira] [Created] (HIVE-25646) Thrift metastore URI reverse resolution could fail in some environments
Prasanth Jayachandran created HIVE-25646: Summary: Thrift metastore URI reverse resolution could fail in some environments Key: HIVE-25646 URL: https://issues.apache.org/jira/browse/HIVE-25646 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 3.1.2, 4.0.0 Reporter: Prasanth Jayachandran When a custom URI resolver is not specified, the default Thrift metastore URI goes through DNS reverse resolution (getCanonicalHostName), which is unlikely to resolve correctly when HMS is sitting behind load balancers and proxies. This is a change in behaviour from the Hive 2.x branch that isn't required. If reverse resolution is required, a custom URI resolver can be implemented. -- This message was sent by Atlassian Jira (v8.3.4#803005)
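The fix described above amounts to using the configured host verbatim instead of reverse-resolving it. A minimal sketch, with an illustrative helper name (not the actual Hive resolver API):

```java
import java.net.URI;

public class MetastoreUriResolver {
    // Sketch of the proposed default: keep the configured metastore host
    // exactly as given (Hive 2.x behaviour) instead of calling
    // InetAddress.getCanonicalHostName(), which may return an address that
    // is unreachable when HMS sits behind a load balancer or proxy.
    public static URI resolveVerbatim(String thriftUri) {
        return URI.create(thriftUri);
    }
}
```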
[jira] [Created] (HIVE-24866) FileNotFoundException during alter table concat
Prasanth Jayachandran created HIVE-24866: Summary: FileNotFoundException during alter table concat Key: HIVE-24866 URL: https://issues.apache.org/jira/browse/HIVE-24866 Project: Hive Issue Type: Bug Affects Versions: 2.4.0, 3.2.0, 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Because of the way the combine-file input format groups files based on node and rack locality, there are cases where a single big ORC file gets spread across two or more combined Hive splits. When the first task completes, the source ORC file of the concatenation is moved/renamed as part of jobCloseOp, which can lead to FileNotFoundException in subsequent mappers that hold a partial split of that file. A simple fix would be for the mapper with the start of the split to own the entire ORC file for concatenation. If a mapper gets a partial split which is not the start, it can skip the entire file.
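The proposed fix boils down to a simple ownership rule; the helper below is illustrative, not the actual Hive code:

```java
public class ConcatSplitOwnership {
    // Sketch of the proposed fix: the mapper whose split begins at byte
    // offset 0 owns the whole ORC file for concatenation; a mapper holding
    // a partial split that starts mid-file skips the file entirely, so the
    // source file is never moved out from under another mapper.
    public static boolean ownsFileForConcat(long splitStartOffset) {
        return splitStartOffset == 0L;
    }
}
```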
[jira] [Created] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods
Prasanth Jayachandran created HIVE-24786: Summary: JDBC HttpClient should retry for idempotent and unsent http methods Key: HIVE-24786 URL: https://issues.apache.org/jira/browse/HIVE-24786 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran When HiveServer2 is behind multiple proxies, there is a possibility of "broken pipe", "connect timeout" and "read timeout" exceptions if one of the intermediate proxies or load balancers decides to reset the underlying TCP socket after an idle timeout. When the connection is broken and a query is submitted after the idle timeout, from the Beeline (or client) perspective the connection is open, but the HTTP methods (POST/GET) fail with socket-related exceptions. Since these methods were never sent to the server, they are safe for client-side retries.
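The retry condition described above can be sketched as a standalone decision function; the method and parameter names are illustrative, not the JDBC driver's API:

```java
public class JdbcHttpRetryPolicy {
    // Hypothetical retry decision mirroring the report's reasoning: a
    // request that was never sent (connect timeout, broken pipe on write)
    // is always safe to retry, and an idempotent method such as GET is
    // safe to retry even after it was sent, because re-execution cannot
    // change server state.
    public static boolean shouldRetry(String httpMethod, boolean requestSent,
                                      int attempt, int maxRetries) {
        if (attempt >= maxRetries) {
            return false;
        }
        if (!requestSent) {
            return true; // the request never reached the server
        }
        return "GET".equals(httpMethod); // idempotent; safe to resend
    }
}
```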
[jira] [Created] (HIVE-24514) UpdateMDatabaseURI does not update managed location URI
Prasanth Jayachandran created HIVE-24514: Summary: UpdateMDatabaseURI does not update managed location URI Key: HIVE-24514 URL: https://issues.apache.org/jira/browse/HIVE-24514 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran When the FS root is updated using metatool, if the DB has a managed location defined, the updateMDatabaseURI API should update the managed location as well. Currently it only updates the location URI.
[jira] [Created] (HIVE-24501) UpdateInputAccessTimeHook should not update stats
Prasanth Jayachandran created HIVE-24501: Summary: UpdateInputAccessTimeHook should not update stats Key: HIVE-24501 URL: https://issues.apache.org/jira/browse/HIVE-24501 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran UpdateInputAccessTimeHook can fail for transactional tables with the following exception. The hook should skip updating the stats and only update the access time. {code:java} ERROR : FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. Cannot change stats state for a transactional table default.test without providing the transactional write state for verification (new write ID 0, valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state null)org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. 
Cannot change stats state for a transactional table default.test without providing the transactional write state for verification (new write ID 0, valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state null at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:821) at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:769) at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:756) at org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec.run(UpdateInputAccessTimeHook.java:70) at org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296) at org.apache.hadoop.hive.ql.HookRunner.runPreHooks(HookRunner.java:273) at org.apache.hadoop.hive.ql.Executor.preExecutionActions(Executor.java:155) at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)Caused by: MetaException(message:Cannot change stats state for a transactional table default.test without providing the transactional write state for verification (new write ID 0, valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state null) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result.read(ThriftHiveMetastore.java) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_table_req(ThriftHiveMetastore.java:2584) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.alter_table_req(ThriftHiveMetastore.java:2571) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table(HiveMetaStoreClient.java:487) at org.apache.hadoop.hive.ql.metadata.SessionHiv
[jira] [Created] (HIVE-24142) Provide config to skip umask validation during scratch dir creation
Prasanth Jayachandran created HIVE-24142: Summary: Provide config to skip umask validation during scratch dir creation Key: HIVE-24142 URL: https://issues.apache.org/jira/browse/HIVE-24142 Project: Hive Issue Type: Bug Reporter: Prasanth Jayachandran When HS2 sessions create scratch dirs, umask validation is performed, and it is mainly specific to HDFS. There are environments where scratch dirs can be on a different filesystem, with writable paths but a different umask. It would be good to have a config to skip umask validation.
[jira] [Created] (HIVE-24068) ReExecutionOverlayPlugin can handle DAG submission failures as well
Prasanth Jayachandran created HIVE-24068: Summary: ReExecutionOverlayPlugin can handle DAG submission failures as well Key: HIVE-24068 URL: https://issues.apache.org/jira/browse/HIVE-24068 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran ReExecutionOverlayPlugin handles cases where there is a vertex failure. DAG submission failures can also happen in environments where the AM container died, causing DNS issues. DAG submissions are safe to retry as the DAG hasn't started execution yet.
[jira] [Created] (HIVE-23582) LLAP: Make SplitLocationProvider impl pluggable
Prasanth Jayachandran created HIVE-23582: Summary: LLAP: Make SplitLocationProvider impl pluggable Key: HIVE-23582 URL: https://issues.apache.org/jira/browse/HIVE-23582 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran LLAP uses the HostAffinitySplitLocationProvider implementation by default. For non-ZooKeeper-based environments, a different split location provider may be used. To facilitate that, make the SplitLocationProvider implementation class pluggable.
[jira] [Created] (HIVE-23477) [LLAP] mmap allocation interruptions fails to notify other threads
Prasanth Jayachandran created HIVE-23477: Summary: [LLAP] mmap allocation interruptions fails to notify other threads Key: HIVE-23477 URL: https://issues.apache.org/jira/browse/HIVE-23477 Project: Hive Issue Type: Bug Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran BuddyAllocator always uses lazy allocation if mmap is enabled. If a query fragment is interrupted at the time of arena allocation, ClosedByInterruptException is thrown. This exception artificially triggers the allocator's OutOfMemoryError and fails to notify other threads waiting to allocate arenas. {code:java} 2020-05-15 00:03:23.254 WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed trying to allocate memory mapped arena java.nio.channels.ClosedByInterruptException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970) at org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867) at org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69) at org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900) at org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458) at org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884) at org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740) at org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330) at org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257) at org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216) at org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343) at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.(VectorizedParquetRecordReader.java:160) at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50) at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111) at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156) at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82) at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703) at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662) at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150) at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38
[jira] [Created] (HIVE-23476) [LLAP] Preallocate arenas for mmap case as well
Prasanth Jayachandran created HIVE-23476: Summary: [LLAP] Preallocate arenas for mmap case as well Key: HIVE-23476 URL: https://issues.apache.org/jira/browse/HIVE-23476 Project: Hive Issue Type: Bug Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran BuddyAllocator pre-allocation of arenas does not happen for the mmap cache case. Since we are not filling up the mmap'ed buffers, the upfront allocation in the constructor is cheap. This can avoid the lock-free allocation of arenas later in the code.
[jira] [Created] (HIVE-23472) LLAP Guaranteed state update should trigger queue re-ordering
Prasanth Jayachandran created HIVE-23472: Summary: LLAP Guaranteed state update should trigger queue re-ordering Key: HIVE-23472 URL: https://issues.apache.org/jira/browse/HIVE-23472 Project: Hive Issue Type: Bug Reporter: Prasanth Jayachandran This is a follow-up to HIVE-23443 to handle the guaranteed state update case.
[jira] [Created] (HIVE-23466) ZK registry base should remove only specific instance instead of host
Prasanth Jayachandran created HIVE-23466: Summary: ZK registry base should remove only specific instance instead of host Key: HIVE-23466 URL: https://issues.apache.org/jira/browse/HIVE-23466 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran When ZKRegistryBase detects new ZK nodes, it maintains a path-based cache and a host-based cache. The host-based cache already handles multiple instances running on the same host. But even if a single instance is removed, all instances belonging to that host are removed. Another issue is that if a single host has multiple instances, it returns a Set with no ordering. Ideally, we want the newest instance at the top of the set (use a TreeSet maybe?).
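The TreeSet idea suggested above can be sketched as follows; the entry encoding ("workerId:registrationTime") is illustrative, since the real ZKRegistryBase records carry much more state:

```java
import java.util.Comparator;
import java.util.TreeSet;

public class InstanceOrdering {
    // Order a host's instances newest-first so the freshest registration
    // is the head of the set; ties on time fall back to the worker id so
    // distinct instances are never collapsed by the TreeSet.
    public static String newest(String... entries) {
        TreeSet<String> ordered = new TreeSet<>(
            Comparator.comparingLong((String e) -> Long.parseLong(e.split(":")[1]))
                      .reversed()
                      .thenComparing(Comparator.naturalOrder()));
        for (String e : entries) {
            ordered.add(e);
        }
        return ordered.first();
    }
}
```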
[jira] [Created] (HIVE-23443) LLAP speculative task pre-emption seems to be not working
Prasanth Jayachandran created HIVE-23443: Summary: LLAP speculative task pre-emption seems to be not working Key: HIVE-23443 URL: https://issues.apache.org/jira/browse/HIVE-23443 Project: Hive Issue Type: Bug Reporter: Prasanth Jayachandran I think after HIVE-23210 we are getting a stable sort order in the pre-emption queue and it is causing pre-emption to not work in certain cases. {code:java} "attempt_1589167813851__119_01_08_0 (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started at 2020-05-11 05:59:22, in preemption queue, can finish)", "attempt_1589167813851_0008_84_01_08_1 (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started at 2020-05-11 06:00:23, in preemption queue, can finish)" {code} The scheduler only peeks at the pre-emption queue and looks at whether the head is non-finishable. [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420] In the above case, all tasks are speculative but the state change is not triggering pre-emption queue re-ordering, so peek() always returns a canFinish task even though non-finishable tasks are in the queue.
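The re-ordering problem above rests on a general property of priority queues: java.util.PriorityQueue does not re-sort when an element's ordering key is mutated in place, so a state change must remove and re-offer the task before peek() reflects it. A minimal illustration (not the actual TaskExecutorService code):

```java
import java.util.PriorityQueue;

public class PreemptionQueue {
    // Simplified task: only the finishable flag matters for ordering here.
    public static class Task {
        public final String id;
        public boolean canFinish;
        public Task(String id, boolean canFinish) {
            this.id = id;
            this.canFinish = canFinish;
        }
    }

    // Non-finishable tasks sort first so peek() surfaces a preemption candidate.
    public final PriorityQueue<Task> queue =
        new PriorityQueue<>((a, b) -> Boolean.compare(a.canFinish, b.canFinish));

    // Remove and re-offer so the queue's ordering reflects the new state.
    public void updateState(Task t, boolean canFinish) {
        queue.remove(t);
        t.canFinish = canFinish;
        queue.offer(t);
    }
}
```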
[jira] [Created] (HIVE-23441) Support foreground option for running llap scripts
Prasanth Jayachandran created HIVE-23441: Summary: Support foreground option for running llap scripts Key: HIVE-23441 URL: https://issues.apache.org/jira/browse/HIVE-23441 Project: Hive Issue Type: Bug Components: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran LLAP scripts always run in the background. To make them container friendly, support foreground execution of the script as an option.
[jira] [Created] (HIVE-23118) Option for exposing compile time counters as tez counters
Prasanth Jayachandran created HIVE-23118: Summary: Option for exposing compile time counters as tez counters Key: HIVE-23118 URL: https://issues.apache.org/jira/browse/HIVE-23118 Project: Hive Issue Type: Improvement Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran TezCounters are currently runtime only. Some compile-time information from the optimizer can be exposed as counters, which can then be used by workload management to make runtime decisions.
[jira] [Created] (HIVE-22994) Add total file size to explain
Prasanth Jayachandran created HIVE-22994: Summary: Add total file size to explain Key: HIVE-22994 URL: https://issues.apache.org/jira/browse/HIVE-22994 Project: Hive Issue Type: Improvement Reporter: Prasanth Jayachandran HIVE-22979 added total file size to the Statistics object for the table scan operator. It will be very useful for debugging to know the actual on-disk file size just from the explain output (instead of getting it from describe formatted output).
[jira] [Created] (HIVE-22988) LLAP: If consistent splits is disabled ordering instances is not required
Prasanth Jayachandran created HIVE-22988: Summary: LLAP: If consistent splits is disabled ordering instances is not required Key: HIVE-22988 URL: https://issues.apache.org/jira/browse/HIVE-22988 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran LlapTaskSchedulerService always gets a consistently ordered list of all LLAP instances even if consistent splits are disabled. When consistent splits are disabled, ordering isn't really useful as there is no cache locality.
[jira] [Created] (HIVE-22979) Support total file size in statistics annotation
Prasanth Jayachandran created HIVE-22979: Summary: Support total file size in statistics annotation Key: HIVE-22979 URL: https://issues.apache.org/jira/browse/HIVE-22979 Project: Hive Issue Type: Improvement Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Hive statistics annotation provides estimated Statistics for each operator. The data size provided in TableScanOperator is the raw data size (after decompression and decoding), but there are some optimizations that can be performed based on the total file size on disk (scan cost estimation).
[jira] [Created] (HIVE-22922) LLAP: ShuffleHandler may not find shuffle data if pod restarts in k8s
Prasanth Jayachandran created HIVE-22922: Summary: LLAP: ShuffleHandler may not find shuffle data if pod restarts in k8s Key: HIVE-22922 URL: https://issues.apache.org/jira/browse/HIVE-22922 Project: Hive Issue Type: Bug Reporter: Nita Dembla Assignee: Prasanth Jayachandran Executor logs show "Invalid map id: TTP/1.1 500 Internal Server Error". This happens when an executor pod restarts with the same hostname and port but is missing its shuffle data.
[jira] [Created] (HIVE-22908) AM caching connections to LLAP based on hostname and port does not work in kubernetes
Prasanth Jayachandran created HIVE-22908: Summary: AM caching connections to LLAP based on hostname and port does not work in kubernetes Key: HIVE-22908 URL: https://issues.apache.org/jira/browse/HIVE-22908 Project: Hive Issue Type: Bug Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran The AM caches all connections to LLAP services using a combination of hostname and port, which does not work in a Kubernetes environment where a pod's hostname and port can stay the same with a StatefulSet. This causes the AM to talk to an old LLAP instance that could have died due to an OOM, a pod kill, etc.
[jira] [Created] (HIVE-22859) Tez external sessions are leaking
Prasanth Jayachandran created HIVE-22859: Summary: Tez external sessions are leaking Key: HIVE-22859 URL: https://issues.apache.org/jira/browse/HIVE-22859 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran When Tez external/unmanaged sessions are used, TezSessionPoolManager "opens" the session using the ApplicationId, which essentially connects to the existing external/unmanaged session. But when the session is returned it is not closed (close for an external session essentially releases it from the sessions list and does not really kill the session). If a session is not closed, it is never removed from the openSessions linked list that HS2 maintains, hence leaking the session.
[jira] [Created] (HIVE-21970) Avoid using RegistryUtils.currentUser()
Prasanth Jayachandran created HIVE-21970: Summary: Avoid using RegistryUtils.currentUser() Key: HIVE-21970 URL: https://issues.apache.org/jira/browse/HIVE-21970 Project: Hive Issue Type: Bug Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran RegistryUtils.currentUser() replaces '_' with '-' for DNS reasons. This is used inconsistently in some places, causing issues w.r.t. ZK (delegation token secret manager, LLAP cluster membership for external clients). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21925) HiveConnection retries should support backoff
Prasanth Jayachandran created HIVE-21925: Summary: HiveConnection retries should support backoff Key: HIVE-21925 URL: https://issues.apache.org/jira/browse/HIVE-21925 Project: Hive Issue Type: Bug Components: Clients Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Hive JDBC connections support retries. In HTTP mode, retries always seem to happen immediately without any backoff.
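The ticket only observes that retries fire immediately; one common remedy, sketched with illustrative names (this is not the HiveConnection API), is exponential backoff bounded by a maximum delay:

```java
public class RetryBackoff {
    // Delay before retry attempt N: base * 2^N, capped at maxMillis.
    // The shift is clamped so large attempt counts cannot overflow a long.
    public static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis << Math.min(attempt, 20);
        return Math.min(delay, maxMillis);
    }
}
```

A retry loop would sleep for `backoffMillis(attempt, ...)` between attempts instead of retrying at once.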
[jira] [Created] (HIVE-21924) Split text files if only header/footer is present
Prasanth Jayachandran created HIVE-21924: Summary: Split text files if only header/footer is present Key: HIVE-21924 URL: https://issues.apache.org/jira/browse/HIVE-21924 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 2.4.0, 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran https://github.com/apache/hive/blob/967a1cc98beede8e6568ce750ebeb6e0d048b8ea/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L494-L503 This piece of code makes CSV files (or any text files with a header/footer) non-splittable if a header or footer is present. If only a header is present, we can find the offset after the first line break and use that to split. Similarly for the footer, maybe read a few KBs of data at the end and find the last line break offset. Use that to determine the data range which can be used for splitting. A few reads during split generation are cheaper than not splitting the file at all.
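The header half of the proposal can be sketched as a byte scan over the start of the file; the helper below is illustrative, not the HiveInputFormat change itself (footer handling would scan the tail similarly for the last line break):

```java
public class HeaderFooterSplits {
    // Return the byte offset just past the line break that ends the header,
    // i.e. the start of the splittable region. If the sampled prefix does
    // not contain enough line breaks, the whole prefix is the header.
    public static long headerEndOffset(byte[] filePrefix, int headerLines) {
        long offset = 0;
        int breaksSeen = 0;
        for (byte b : filePrefix) {
            offset++;
            if (b == '\n' && ++breaksSeen == headerLines) {
                return offset;
            }
        }
        return offset; // header spans the whole sampled prefix
    }
}
```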
[jira] [Created] (HIVE-21913) GenericUDTFGetSplits should handle usernames in the same way as LLAP
Prasanth Jayachandran created HIVE-21913: Summary: GenericUDTFGetSplits should handle usernames in the same way as LLAP Key: HIVE-21913 URL: https://issues.apache.org/jira/browse/HIVE-21913 Project: Hive Issue Type: Bug Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran LLAP ZK registry namespacing includes the current user name, which is typically hive. But in some deployments, usernames are created with '_', like hive_dev. RegistryUtils.currentUser() (which LLAP uses) replaces '_' with '-' for DNS reasons. But GenericUDTFGetSplits uses the UGI login user, which does not do the underscore replacement. As a result, LlapBaseInputFormat is not finding any LLAP daemons even though they are running.
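The mismatch above comes down to one side applying the DNS-safe replacement and the other side using the raw name. A sketch of the replacement as the report describes it (the helper name is illustrative):

```java
public class RegistryUserName {
    // '_' is not a legal character in DNS labels, so the registry-side
    // user name replaces it with '-'. A client that namespaces ZK paths
    // with the raw UGI login user will therefore look under the wrong
    // path whenever the user name contains an underscore.
    public static String dnsSafeUser(String user) {
        return user.replace('_', '-');
    }
}
```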
[jira] [Created] (HIVE-21892) Trusted domain authentication should look at X-Forwarded-For header as well
Prasanth Jayachandran created HIVE-21892: Summary: Trusted domain authentication should look at X-Forwarded-For header as well Key: HIVE-21892 URL: https://issues.apache.org/jira/browse/HIVE-21892 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran HIVE-21783 added trusted domain authentication. However, it looks only at request.getRemoteAddr(), which works in most cases where there are no intermediate forward/reverse proxies. In trusted domain scenarios, if there are intermediate proxies, each proxy typically appends its own IP address to the "X-Forwarded-For" header. The X-Forwarded-For header will look like clientIp -> proxyIp1 -> proxyIp2. The leftmost IP address in X-Forwarded-For represents the real client IP address. For such scenarios, add a config to optionally look at the X-Forwarded-For header when available to determine the real client IP.
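The lookup proposed above can be sketched as follows; the helper name is illustrative, not the Hive authentication code:

```java
public class ForwardedForParser {
    // Each proxy appends its own address to X-Forwarded-For, so the
    // leftmost entry is the original client. Fall back to the socket
    // peer address (request.getRemoteAddr()) when the header is absent.
    public static String clientIp(String xForwardedFor, String remoteAddr) {
        if (xForwardedFor == null || xForwardedFor.isEmpty()) {
            return remoteAddr;
        }
        return xForwardedFor.split(",")[0].trim();
    }
}
```

Note that the header is client-suppliable, which is why the description scopes this to trusted-domain scenarios behind known proxies.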
[jira] [Created] (HIVE-21825) Improve client error msg when Active/Passive HA is enabled
Prasanth Jayachandran created HIVE-21825: Summary: Improve client error msg when Active/Passive HA is enabled Key: HIVE-21825 URL: https://issues.apache.org/jira/browse/HIVE-21825 Project: Hive Issue Type: Bug Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran When Active/Passive HA is enabled and a client tries to connect to the passive instance, or when HS2 is still starting up, clients will receive the following error msg {code:java} 'Cannot open sessions on an inactive HS2 instance; use service discovery to connect'{code} This error msg can be improved to say that HS2 is still starting up (or some more user-friendly error msg).
[jira] [Created] (HIVE-21624) LLAP: Cpu metrics and gc time metrics at thread level is broken
Prasanth Jayachandran created HIVE-21624: Summary: LLAP: Cpu metrics and gc time metrics at thread level is broken Key: HIVE-21624 URL: https://issues.apache.org/jira/browse/HIVE-21624 Project: Hive Issue Type: Bug Components: llap Affects Versions: 4.0.0, 3.2.0 Reporter: Nita Dembla Assignee: Prasanth Jayachandran ExecutorThreadCPUTime and ExecutorThreadUserTime rely on thread MXBean CPU metrics when available. At some point, the thread name which the metrics publisher looks for changed, causing no metrics to be published for these counters. The above counters look for threads with names starting with "ContainerExecutor", but the LLAP task executor thread got renamed to "Task-Executor".
[jira] [Created] (HIVE-21597) WM trigger validation should happen at the time of create or alter
Prasanth Jayachandran created HIVE-21597: Summary: WM trigger validation should happen at the time of create or alter Key: HIVE-21597 URL: https://issues.apache.org/jira/browse/HIVE-21597 Project: Hive Issue Type: Bug Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran When a query guardrail trigger is created, the trigger expression is not validated immediately upon creating or altering the trigger. Instead, it gets validated at the start of HS2, which could prevent resource plans from being applied correctly. The trigger expression validation should happen in DDLTask.
[jira] [Created] (HIVE-21591) Using triggers in non-LLAP mode should not require wm queue
Prasanth Jayachandran created HIVE-21591: Summary: Using triggers in non-LLAP mode should not require wm queue Key: HIVE-21591 URL: https://issues.apache.org/jira/browse/HIVE-21591 Project: Hive Issue Type: Bug Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Resource plan triggers are supported in non-LLAP (Tez container) mode. But fetching of the resource plan happens only when hive.server2.tez.interactive.queue is set. For Tez container mode, only triggers are applicable, so this queue dependency can be removed.
[jira] [Created] (HIVE-21582) Prefix msck configs with metastore
Prasanth Jayachandran created HIVE-21582: Summary: Prefix msck configs with metastore Key: HIVE-21582 URL: https://issues.apache.org/jira/browse/HIVE-21582 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran HIVE-20707 moved msck configs to the metastore, but the configs are not prefixed with "metastore". It would be good to prefix them with "metastore" for consistency with other configs.
[jira] [Created] (HIVE-21527) LLAP: Table property to skip cache
Prasanth Jayachandran created HIVE-21527: Summary: LLAP: Table property to skip cache Key: HIVE-21527 URL: https://issues.apache.org/jira/browse/HIVE-21527 Project: Hive Issue Type: Improvement Components: llap Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Similar to HIVE-21305, there can be text tables with big string columns that are not cache friendly (and often pollute the cache).
[jira] [Created] (HIVE-21497) Direct SQL exception thrown by PartitionManagementTask
Prasanth Jayachandran created HIVE-21497: Summary: Direct SQL exception thrown by PartitionManagementTask Key: HIVE-21497 URL: https://issues.apache.org/jira/browse/HIVE-21497 Project: Hive Issue Type: Bug Components: Standalone Metastore Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran The metastore runs several background threads, one of which handles partition discovery. While removing expired partitions, the following exception is thrown {code:java} 2019-03-24 04:24:59.583 WARN [PartitionDiscoveryTask-0] metastore.MetaStoreDirectSql: Failed to execute [select "PARTITIONS"."PART_ID" from "PARTITIONS" inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ? inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = ? inner join "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 inner join "PARTITION_KEY_VALS" "FILTER1" on "FILTER1"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER1"."INTEGER_IDX" = 1 inner join "PARTITION_KEY_VALS" "FILTER2" on "FILTER2"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER2"."INTEGER_IDX" = 2 where "DBS"."CTLG_NAME" = ? and ( ( (((case when "FILTER0"."PART_KEY_VAL" <> ? and "TBLS"."TBL_NAME" = ? and "DBS"."NAME" = ? and "DBS"."CTLG_NAME" = ? and "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 then cast("FILTER0"."PART_KEY_VAL" as date) else null end) = ?) and ("FILTER1"."PART_KEY_VAL" = ?)) and ("FILTER2"."PART_KEY_VAL" = ?)) )] with parameters [logs, sys, hive, __HIVE_DEFAULT_PARTITION__, logs, sys, hive, 2019-03-23, warehouse-1553300821-692w, metastore-db-create-job] javax.jdo.JDODataStoreException: Error executing SQL query "select "PARTITIONS"."PART_ID" from "PARTITIONS" inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ? inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = ? 
inner join "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 inner join "PARTITION_KEY_VALS" "FILTER1" on "FILTER1"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER1"."INTEGER_IDX" = 1 inner join "PARTITION_KEY_VALS" "FILTER2" on "FILTER2"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER2"."INTEGER_IDX" = 2 where "DBS"."CTLG_NAME" = ? and ( ( (((case when "FILTER0"."PART_KEY_VAL" <> ? and "TBLS"."TBL_NAME" = ? and "DBS"."NAME" = ? and "DBS"."CTLG_NAME" = ? and "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 then cast("FILTER0"."PART_KEY_VAL" as date) else null end) = ?) and ("FILTER1"."PART_KEY_VAL" = ?)) and ("FILTER2"."PART_KEY_VAL" = ?)) )". at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543) at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:391) at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:267) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.executeWithArray(MetaStoreDirectSql.java:2042) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionIdsViaSqlFilter(MetaStoreDirectSql.java:621) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:487) at org.apache.hadoop.hive.metastore.ObjectStore$9.getSqlResult(ObjectStore.java:3426) at org.apache.hadoop.hive.metastore.ObjectStore$9.getSqlResult(ObjectStore.java:3418) at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:3702) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3453) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:3406) at sun.reflect.GeneratedMethodAccessor82.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) at com.sun.proxy.$Proxy33.getPartitionsByExpr(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:4521) at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) at com.sun.proxy.$Proxy34.drop_partitions_req(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropPartitions(HiveMetaStoreClient.java:1288) at org.apache.hadoop.hive.metastore.Msck$2.execute(Msck.java:474) at org.apache.hadoop.hive.metastore.Msck$2.execute(Msck.java:435) at
[jira] [Created] (HIVE-21496) Automatic sizing of unordered buffer can overflow
Prasanth Jayachandran created HIVE-21496: Summary: Automatic sizing of unordered buffer can overflow Key: HIVE-21496 URL: https://issues.apache.org/jira/browse/HIVE-21496 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Attachments: hive.log HIVE-21329 added automatic sizing of the tez unordered partitioned KV buffer based on group by statistics. However, in some corner cases group by statistics set Long.MAX_VALUE for the data size, which ends up setting Integer.MAX_VALUE for the unordered KV buffer size. This buffer size is expected to be in MB; converting Integer.MAX_VALUE from MB to bytes overflows, and the following exception is thrown. {code:java} 2019-03-23T01:35:17,760 INFO [Dispatcher thread {Central}] HistoryEventHandler.criticalEvents: [HISTORY][DAG:dag_1553330105749_0001_1][Event:TASK_ATTEMPT_FINISHED]: vertexName=Map 1, taskAttemptId=attempt_1553330105749_0001_1_00_00_0, creationTime=1553330117468, allocationTime=1553330117524, startTime=1553330117562, finishTime=1553330117755, timeTaken=193, status=FAILED, taskFailureType=NON_FATAL, errorEnum=FRAMEWORK_ERROR, diagnostics=Error: Error while running task ( failure ) : attempt_1553330105749_0001_1_00_00_0:java.lang.IllegalArgumentException at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108) at org.apache.tez.runtime.common.resources.MemoryDistributor.registerRequest(MemoryDistributor.java:177) at org.apache.tez.runtime.common.resources.MemoryDistributor.requestMemory(MemoryDistributor.java:110) at org.apache.tez.runtime.api.impl.TezTaskContextImpl.requestInitialMemory(TezTaskContextImpl.java:214) at org.apache.tez.runtime.library.output.UnorderedPartitionedKVOutput.initialize(UnorderedPartitionedKVOutput.java:76) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable._callInternal(LogicalIOProcessorRuntimeTask.java:537) at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:520) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:505) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745){code} Stats for GBY operator is getting Long.MAX_VALUE as seen below {code:java} 2019-03-23T01:35:16,466 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] annotation.StatsRulesProcFactory: [0] STATS-TS[0] (logs): numRows: 1795 dataSize: 4443078 basicStatsState: PARTIAL colStatsState: NONE colStats: {severity= colName: severity colType: string countDistincts: 359 numNulls: 89 avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: true} 2019-03-23T01:35:16,466 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] annotation.StatsRulesProcFactory: Estimating row count for GenericUDFOPEqual(Column[severity], Const string ERROR) Original num rows: 1795 New num rows: 5 2019-03-23T01:35:16,467 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] annotation.StatsRulesProcFactory: [1] STATS-FIL[8]: numRows: 5 dataSize: 12376 basicStatsState: PARTIAL colStatsState: NONE colStats: {severity= colName: severity colType: string countDistincts: 359 numNulls: 89 avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: true} 2019-03-23T01:35:16,467 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] exec.FilterOperator: Setting stats (Num rows: 5 Data size: 12376 Basic stats: PARTIAL Column stats: NONE) on: FIL[8] 
2019-03-23T01:35:16,468 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] exec.SelectOperator: Setting stats (Num rows: 5 Data size: 12376 Basic stats: PARTIAL Column stats: NONE) on: SEL[2] 2019-03-23T01:35:16,468 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] annotation.StatsRulesProcFactory: [1] STATS-SEL[2]: numRows: 5 dataSize: 12376 basicStatsState: PARTIAL colStatsState: NONE colStats: {severity= colName: severity colType: string countDistincts: 359 numNulls: 89 avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: true} 2019-03-23T01:35:16,471 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] annotation.StatsRulesProcFactory: STATS-GBY[3]: inputSize: 4443078 maxSplitSize: 25600 parallelism: 1 containsGroupingSet: false sizeOfGroupingSet: 1 2019-03-23T01:35:16,471 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] annotation.StatsRulesProcFactory: [Case
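The overflow described above can be sketched independently of Hive: multiplying an int MB value by 1024 * 1024 in int arithmetic wraps around, so the MB-to-bytes conversion must widen to long first. Class and method names here are illustrative, not Hive's actual code.

```java
public class BufferSizeOverflow {
    // Converting a size in MB to bytes with int arithmetic wraps around
    // for any value above 2047 MB -- including Integer.MAX_VALUE.
    static int mbToBytesInt(int mb) {
        return mb * 1024 * 1024; // overflows
    }

    // Widening to long before multiplying keeps the result exact.
    static long mbToBytesLong(int mb) {
        return (long) mb * 1024 * 1024;
    }

    public static void main(String[] args) {
        System.out.println(mbToBytesInt(Integer.MAX_VALUE));  // negative: overflow
        System.out.println(mbToBytesLong(Integer.MAX_VALUE)); // 2251799812636672
    }
}
```

A negative size then fails the `Preconditions.checkArgument` seen in the stack trace above.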
[jira] [Created] (HIVE-21495) Calcite assertion error when UDF returns null
Prasanth Jayachandran created HIVE-21495: Summary: Calcite assertion error when UDF returns null Key: HIVE-21495 URL: https://issues.apache.org/jira/browse/HIVE-21495 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Jesus Camacho Rodriguez Calcite throws the following error when UDFs return null. "current_authorizer()" for example can return null if authorizer is disabled. {code:java} org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.AssertionError: Cannot add expression of different type to set: set type is RecordType(CHAR(7) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" NOT NULL $f0, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f1, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f2, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f3, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f4, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f5, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f6, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f7, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f8, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f9, CHAR(2) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" NOT NULL $f10, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f11) NOT NULL expression type is RecordType(CHAR(7) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" NOT NULL $f0, CHAR(7) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" NOT NULL $f1, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f2, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE 
"ISO-8859-1$en_US$primary" $f3, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f4, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f5, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f6, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f7, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f8, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f9, CHAR(2) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" NOT NULL $f10, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f11) NOT NULL set is rel#784:HiveAggregate.HIVE.[](input=HepRelVertex#783,group={0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}) expression is HiveProject#829 at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:210) at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:342) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.AssertionError: Cannot add expression of different type to set: set type is RecordType(CHAR(7) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" NOT 
NULL $f0, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f1, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f2, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f3, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f4, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f5, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f6, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f7, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f8, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f9, CHAR(2) CHARACTER SET "UTF-16LE" COL
[jira] [Created] (HIVE-21482) Partition discovery table property is added to non-partitioned external tables
Prasanth Jayachandran created HIVE-21482: Summary: Partition discovery table property is added to non-partitioned external tables Key: HIVE-21482 URL: https://issues.apache.org/jira/browse/HIVE-21482 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran The automatic partition discovery property is added to external tables by default, but it doesn't check whether the external table is partitioned. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21457) Perf optimizations in split-generation
Prasanth Jayachandran created HIVE-21457: Summary: Perf optimizations in split-generation Key: HIVE-21457 URL: https://issues.apache.org/jira/browse/HIVE-21457 Project: Hive Issue Type: Improvement Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Minor split generation optimizations * Reuse vectorization checks * Reuse isAcid checks * Reuse filesystem objects * Improved logging (log at top-level instead of inside the thread pool) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21415) Parallel build is failing, trying to download incorrect hadoop-hdfs-client version
Prasanth Jayachandran created HIVE-21415: Summary: Parallel build is failing, trying to download incorrect hadoop-hdfs-client version Key: HIVE-21415 URL: https://issues.apache.org/jira/browse/HIVE-21415 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Running the following build command {code:java} mvn clean install -Pdist -DskipTests -Dpackaging.minimizeJar=false -T 1C -DskipShade -Dremoteresources.skip=true -Dmaven.javadoc.skip=true{code} fails with the following exception for 3 modules (hplql, kryo-registrator, packaging) {code:java} [ERROR] Failed to execute goal on project hive-packaging: Could not resolve dependencies for project org.apache.hive:hive-packaging:pom:4.0.0-SNAPSHOT: Failure to find org.apache.hadoop:hadoop-hdfs-client:jar:2.7.3 in http://www.datanucleus.org/downloads/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of datanucleus has elapsed or updates are forced -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :hive-packaging{code} It tries to download version 2.7.3 even though hadoop.version refers to 3.1.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21391) LLAP: Pool of column vector buffers can cause memory pressure
Prasanth Jayachandran created HIVE-21391: Summary: LLAP: Pool of column vector buffers can cause memory pressure Key: HIVE-21391 URL: https://issues.apache.org/jira/browse/HIVE-21391 Project: Hive Issue Type: Bug Components: llap Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran When there are too many columns (on the order of hundreds) with decimal or string types, the column vector pool of buffers created here [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/EncodedDataConsumer.java#L59] can cause memory pressure. Example: 128 (poolSize) * 300 (numCols) * 1024 (batchSize) * 80 (decimalSize) ~= 3GB The pool size keeps increasing when there is a slow consumer but fast llap io (SSDs), leading to GC pressure when all LLAP io threads read splits from the same table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
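The back-of-envelope arithmetic above can be reproduced directly; all figures are the illustrative numbers from the report (128 pooled buffer sets, 300 columns, 1024 rows per batch, ~80 bytes per decimal value), not measured values.

```java
public class VectorPoolEstimate {
    // Rough footprint: pooled sets * columns * rows per batch * bytes per value.
    static long poolBytes(long poolSize, long numCols, long batchSize, long perValueBytes) {
        return poolSize * numCols * batchSize * perValueBytes;
    }

    public static void main(String[] args) {
        long bytes = poolBytes(128, 300, 1024, 80);
        // 3,145,728,000 bytes, i.e. roughly 2.93 GB -- the ~3GB in the report.
        System.out.printf("%d bytes (%.2f GB)%n", bytes, bytes / (1024.0 * 1024 * 1024));
    }
}
```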
[jira] [Created] (HIVE-21390) BI split strategy does not work for blob stores
Prasanth Jayachandran created HIVE-21390: Summary: BI split strategy does not work for blob stores Key: HIVE-21390 URL: https://issues.apache.org/jira/browse/HIVE-21390 Project: Hive Issue Type: Bug Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran The BI split strategy cuts splits at block boundaries; however, there are no block boundaries in blob storage, so we end up with a single split under the BI split strategy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
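A minimal sketch of the fix direction: when the filesystem reports no block boundaries, fall back to cutting at a configured fixed split size. The class and method names are hypothetical, not Hive's actual split generator.

```java
import java.util.ArrayList;
import java.util.List;

public class FixedSizeSplitter {
    // On blob stores the whole file appears as one "block", so instead of
    // cutting at block boundaries we cut at a fixed size. Each split is
    // represented as {offset, length}.
    static List<long[]> splits(long fileLen, long splitSize) {
        List<long[]> result = new ArrayList<>();
        for (long offset = 0; offset < fileLen; offset += splitSize) {
            result.add(new long[]{offset, Math.min(splitSize, fileLen - offset)});
        }
        return result;
    }

    public static void main(String[] args) {
        // A 1 GB file with 256 MB splits yields 4 splits instead of 1.
        System.out.println(splits(1L << 30, 256L << 20).size()); // 4
    }
}
```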
[jira] [Created] (HIVE-21373) Expose query result cache info as a table
Prasanth Jayachandran created HIVE-21373: Summary: Expose query result cache info as a table Key: HIVE-21373 URL: https://issues.apache.org/jira/browse/HIVE-21373 Project: Hive Issue Type: Improvement Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran To be able to look into the metadata of the query result cache (size, cache hits/misses, location, etc.) with a query like {code:java} select * from query_cache();{code} it would be good to expose this information as a queryable table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21369) LLAP: Logging is expensive in encoded reader path
Prasanth Jayachandran created HIVE-21369: Summary: LLAP: Logging is expensive in encoded reader path Key: HIVE-21369 URL: https://issues.apache.org/jira/browse/HIVE-21369 Project: Hive Issue Type: Bug Components: Logging Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Assignee: Nita Dembla There should be no INFO logging in EncodedReaderImpl. Stringifying of disk ranges is expensive in core read path. {code:java} 2019-03-01T17:55:56.322852142Z 2019-03-01T17:55:56,306 INFO [IO-Elevator-Thread-3 (hive_20190301175546_a279f33c-4f2b-4cd5-8695-57bc8b042a61)] encoded.EncodedReaderImpl: Disk ranges after cache (found everything true; file [-3693547618692831801, 1551190876000, 1047660824], base offset 792920167): [{start: 887940 end: 1003508 cache buffer: 0x5165f83d(1)}, {start: 1003508 end: 1119078 cache buffer: 0xb63cac3(1)}, {start: 1119078 end: 1234745 cache buffer: 0x41a724fa(1)}, {start: 1234745 end: 1350261 cache buffer: 0x2f71bc38(1)}, {start: 1350261 end: 1465752 cache buffer: 0x2c38e1bb(1)}, {start: 1465752 end: 1581231 cache buffer: 0x5827982(1)}, {start: 1581231 end: 1696885 cache buffer: 0x75a6773c(1)}, {start: 1696885 end: 1812492 cache buffer: 0x2ed060f9(1)},{start: 1812492 end: 1928086 cache buffer: 0x20b2c8aa(1)}, {start: 1928086 end: 2043588 cache buffer: 0x6559aacb(1)}, {start: 2043588 end: 2159089 cache buffer: 0x569c85e1(1)}, {start: 2159089 end: 2274725 cache buffer: 0x25a88dd0(1)}, {start: 2274725 end: 2390228 cache buffer: 0x738b7e87(1)}, {start: 2390228 end: 2505715 cache buffer: 0x26edafa0(1)}, {start: 2505715 end: 2621322 cache buffer: 0x69db7752(1)}, {start: 2621322 end: 2736844 cache b{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21305) LLAP: Option to skip cache for ETL queries
Prasanth Jayachandran created HIVE-21305: Summary: LLAP: Option to skip cache for ETL queries Key: HIVE-21305 URL: https://issues.apache.org/jira/browse/HIVE-21305 Project: Hive Issue Type: Improvement Components: llap Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran To prevent ETL queries from polluting the cache, it would be good to detect such queries at compile time and optionally skip llap io for them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21254) Pre-upgrade tool should handle exceptions and skip db/tables
Prasanth Jayachandran created HIVE-21254: Summary: Pre-upgrade tool should handle exceptions and skip db/tables Key: HIVE-21254 URL: https://issues.apache.org/jira/browse/HIVE-21254 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran When exceptions like AccessControlException are thrown, the pre-upgrade tool fails. If the hive user does not have read access to a database or tables (some external tables deny read access to hive), the pre-upgrade tool should just assume they are external tables and move on without failing the pre-upgrade process. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21244) NPE in Hive Proto Logger
Prasanth Jayachandran created HIVE-21244: Summary: NPE in Hive Proto Logger Key: HIVE-21244 URL: https://issues.apache.org/jira/browse/HIVE-21244 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran [https://github.com/apache/hive/blob/4ddc9de90b6de032d77709c9631ab787cef225d5/ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java#L308] can cause an NPE. There is no uncaught-exception handler for this thread, so the NPE can fail silently and drop the event. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21235) LLAP: make the name of log4j2 properties file configurable
Prasanth Jayachandran created HIVE-21235: Summary: LLAP: make the name of log4j2 properties file configurable Key: HIVE-21235 URL: https://issues.apache.org/jira/browse/HIVE-21235 Project: Hive Issue Type: Bug Components: llap Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran For llap daemon, the name of llap-daemon-log4j2.properties is fixed. If a conf dir and jar contain the same filename, it will mess up log4j2 initialization. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21223) CachedStore returns null partition when partition does not exist
Prasanth Jayachandran created HIVE-21223: Summary: CachedStore returns null partition when partition does not exist Key: HIVE-21223 URL: https://issues.apache.org/jira/browse/HIVE-21223 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran CachedStore can return null partition for getPartitionWithAuth() when partition does not exist. null value serialization in thrift will break the connection. Instead if partition does not exist it should throw NoSuchObjectException. Clients will see this exception {code:java} org.apache.thrift.TApplicationException: get_partition_with_auth failed: unknown result at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition_with_auth(ThriftHiveMetastore.java:3017) ~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266] at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition_with_auth(ThriftHiveMetastore.java:2990) ~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266] at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionWithAuthInfo(HiveMetaStoreClient.java:1679) ~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266] at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionWithAuthInfo(HiveMetaStoreClient.java:1671) ~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181] at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212) ~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266] at com.sun.proxy.$Proxy36.getPartitionWithAuthInfo(Unknown Source) ~[?:?] 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181] at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2976) ~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266] at com.sun.proxy.$Proxy36.getPartitionWithAuthInfo(Unknown Source) ~[?:?] at org.apache.hadoop.hive.metastore.SynchronizedMetaStoreClient.getPartitionWithAuthInfo(SynchronizedMetaStoreClient.java:101) ~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266] at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2870) ~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266] at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2835) ~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266] at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1950) ~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266] at org.apache.hadoop.hive.ql.metadata.Hive$4.call(Hive.java:2490) ~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266] at org.apache.hadoop.hive.ql.metadata.Hive$4.call(Hive.java:2481) ~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_181] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_181] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_181] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
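The proposed fix can be sketched with a generic cache; the class and field names below are hypothetical stand-ins for CachedStore's internals, not the real implementation. The point is that a lookup miss must surface as NoSuchObjectException rather than a null result, since thrift cannot serialize a null return value.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PartitionCacheSketch {
    // Stand-in for the metastore's NoSuchObjectException.
    static class NoSuchObjectException extends Exception {
        NoSuchObjectException(String msg) { super(msg); }
    }

    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    // Throw on a miss instead of returning null: returning null would make
    // the thrift layer fail with "unknown result" and break the connection.
    Object getPartitionWithAuth(String partName) throws NoSuchObjectException {
        Object part = cache.get(partName);
        if (part == null) {
            throw new NoSuchObjectException("partition not found: " + partName);
        }
        return part;
    }
}
```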
[jira] [Created] (HIVE-21222) ACID: When there are no delete deltas skip finding min max keys
Prasanth Jayachandran created HIVE-21222: Summary: ACID: When there are no delete deltas skip finding min max keys Key: HIVE-21222 URL: https://issues.apache.org/jira/browse/HIVE-21222 Project: Hive Issue Type: Bug Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran We create an orc reader in VectorizedOrcAcidRowBatchReader.findMinMaxKeys (which will read a 16K footer) even in cases where delete deltas do not exist. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21212) LLAP: shuffle port config uses internal configuration
Prasanth Jayachandran created HIVE-21212: Summary: LLAP: shuffle port config uses internal configuration Key: HIVE-21212 URL: https://issues.apache.org/jira/browse/HIVE-21212 Project: Hive Issue Type: Bug Components: llap Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran LlapDaemon main() reads the daemon configuration, but for the shuffle port it reads an internal config instead of hive.llap.daemon.yarn.shuffle.port [https://github.com/apache/hive/blob/c8eb03affa2533f4827cf6497e7c9873bc9520a7/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java#L535] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21103) PartitionManagementTask should not modify DN configs to avoid closing persistence manager
Prasanth Jayachandran created HIVE-21103: Summary: PartitionManagementTask should not modify DN configs to avoid closing persistence manager Key: HIVE-21103 URL: https://issues.apache.org/jira/browse/HIVE-21103 Project: Hive Issue Type: Bug Components: Standalone Metastore Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran HIVE-20707 added automatic partition management, which uses thread pools to run parallel msck repair. It also modifies the datanucleus connection pool size to avoid an explosion of connections to the backend database. But the object store closes the persistence manager when it detects a change in datanucleus or jdo configs. So while PartitionManagementTask is running, if HS2 tries to connect to the metastore, HS2 will get a persistence-manager-closed exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20876) Use tez provided AM registry client for external sessions
Prasanth Jayachandran created HIVE-20876: Summary: Use tez provided AM registry client for external sessions Key: HIVE-20876 URL: https://issues.apache.org/jira/browse/HIVE-20876 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Continuation of HIVE-20547: replace the hive-side AM external sessions registry with the one provided by tez. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20841) LLAP: Make dynamic ports configurable
Prasanth Jayachandran created HIVE-20841: Summary: LLAP: Make dynamic ports configurable Key: HIVE-20841 URL: https://issues.apache.org/jira/browse/HIVE-20841 Project: Hive Issue Type: Bug Components: llap Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: 4.0.0, 3.2.0 Some ports in the llap -> tez interaction code are dynamic; provide an option to make them configurable, to facilitate adding them to iptables rules in some environments. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20713) Use percentage for join conversion size thresholds
Prasanth Jayachandran created HIVE-20713: Summary: Use percentage for join conversion size thresholds Key: HIVE-20713 URL: https://issues.apache.org/jira/browse/HIVE-20713 Project: Hive Issue Type: Improvement Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran There are many places in join conversion that rely on absolute byte sizes (mapjoin, dynamic hashjoin etc.). When container sizes change, these join conversion thresholds have to be retuned for the new container size. Instead, make the join conversion byte-size thresholds a percentage/fraction of the container size. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
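The proposal above can be sketched as deriving the threshold from the container size at runtime; the names and the 0.25 fraction are illustrative, not actual Hive config values.

```java
public class JoinThresholdSketch {
    // Derive the join conversion threshold from the container size and a
    // configured fraction, instead of a fixed absolute byte count.
    static long thresholdBytes(long containerBytes, double fraction) {
        return (long) (containerBytes * fraction);
    }

    public static void main(String[] args) {
        // With a 0.25 fraction, a 4 GB container gives a 1 GB threshold and an
        // 8 GB container gives 2 GB -- no manual retuning when sizes change.
        System.out.println(thresholdBytes(4L << 30, 0.25)); // 1073741824
        System.out.println(thresholdBytes(8L << 30, 0.25)); // 2147483648
    }
}
```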
[jira] [Created] (HIVE-20707) Automatic MSCK REPAIR for external tables
Prasanth Jayachandran created HIVE-20707: Summary: Automatic MSCK REPAIR for external tables Key: HIVE-20707 URL: https://issues.apache.org/jira/browse/HIVE-20707 Project: Hive Issue Type: New Feature Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Currently, to add partitions for external tables to the metastore, the MSCK REPAIR command has to be executed manually. To avoid this manual step, a table property can be specified on external tables based on which a background metastore thread can add/drop/sync partitions periodically. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20656) Map aggregation memory configs are too aggressive
Prasanth Jayachandran created HIVE-20656: Summary: Map aggregation memory configs are too aggressive Key: HIVE-20656 URL: https://issues.apache.org/jira/browse/HIVE-20656 Project: Hive Issue Type: Bug Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran The defaults for the following configs seem too aggressive. In Java this can easily lead to several full GC pauses for memory that cannot be reclaimed. {code:java} HIVEMAPAGGRHASHMEMORY("hive.map.aggr.hash.percentmemory", (float) 0.99, "Portion of total memory to be used by map-side group aggregation hash table"), HIVEMAPAGGRMEMORYTHRESHOLD("hive.map.aggr.hash.force.flush.memory.threshold", (float) 0.9, "The max memory to be used by map-side group aggregation hash table.\n" + "If the memory usage is higher than this number, force to flush data"),{code} We can be a little more conservative with these configs to avoid getting into GC pauses. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20649) LLAP aware memory manager for Orc writers
Prasanth Jayachandran created HIVE-20649: Summary: LLAP aware memory manager for Orc writers Key: HIVE-20649 URL: https://issues.apache.org/jira/browse/HIVE-20649 Project: Hive Issue Type: Bug Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran The ORC writer has its own memory manager that estimates memory usage and available memory based on the JVM heap (MemoryMXBean). This works in the Tez container execution model but not in LLAP, where container sizes (and Xmx) are typically high and there are multiple executors per LLAP daemon. This custom memory manager should be aware of per-executor memory bounds. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20648) LLAP: Vector group by operator should use memory per executor
Prasanth Jayachandran created HIVE-20648: Summary: LLAP: Vector group by operator should use memory per executor Key: HIVE-20648 URL: https://issues.apache.org/jira/browse/HIVE-20648 Project: Hive Issue Type: Bug Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran The HIVE-15503 treatment has to be applied to the vector group by operator as well. Vector group by currently uses the MemoryMXBean to get heap usage and max heap memory, which will not work for LLAP. Instead it should use memory per executor as the upper bound for flush decisions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
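The per-executor bound described in the two issues above can be sketched as follows (hypothetical names; the real LLAP memory accounting is more involved): divide the usable heap by the executor count rather than taking the whole JVM heap from MemoryMXBean.

```java
// Hypothetical sketch of a per-executor memory bound for LLAP daemons,
// where many executors share one large JVM heap.
public class ExecutorMemorySketch {
    // Upper bound for one executor's hash table before a flush is forced.
    public static long perExecutorBytes(long usableHeapBytes, int numExecutors) {
        return usableHeapBytes / Math.max(1, numExecutors);
    }
}
```

For example, a 32 GB daemon heap shared by 16 executors gives each executor a 2 GB bound instead of the misleading 32 GB that MemoryMXBean would report.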
[jira] [Created] (HIVE-20621) GetOperationStatus called in resultset.next causing incremental slowness
Prasanth Jayachandran created HIVE-20621: Summary: GetOperationStatus called in resultset.next causing incremental slowness Key: HIVE-20621 URL: https://issues.apache.org/jira/browse/HIVE-20621 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 4.0.0, 3.2.0 Environment: Fetching the result set for a result cache hit query gets slower as more rows are fetched. Fetching a 10-row result set took about 900ms, but fetching a 200-row result set took 8 seconds. The reason for this slowness is that GetOperationStatus is invoked inside resultset.next() for every row, even after the operation has completed. This is one RPC call per row fetched. Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran -- This message was sent by Atlassian JIRA (v7.6.3#76005)
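One possible shape of a client-side fix for the per-row RPC described above, sketched with hypothetical names (not the actual Hive JDBC internals): once a terminal operation status has been observed, cache it so subsequent next() calls skip the RPC entirely.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: remember the first terminal status so that
// ResultSet.next() stops issuing one GetOperationStatus RPC per row.
public class OperationStatusCache {
    private boolean completed; // set once a terminal status is observed

    // rpcCall stands in for the GetOperationStatus thrift call.
    public boolean isCompleted(BooleanSupplier rpcCall) {
        if (!completed && rpcCall.getAsBoolean()) {
            completed = true;
        }
        return completed;
    }
}
```

After the first call that reports completion, every further check is served from the cached flag with no network round trip.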
[jira] [Created] (HIVE-20583) Use canonical hostname only for kerberos auth in HiveConnection
Prasanth Jayachandran created HIVE-20583: Summary: Use canonical hostname only for kerberos auth in HiveConnection Key: HIVE-20583 URL: https://issues.apache.org/jira/browse/HIVE-20583 Project: Hive Issue Type: Bug Reporter: Prasanth Jayachandran -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20582) Make hflush in hive proto logging configurable
Prasanth Jayachandran created HIVE-20582: Summary: Make hflush in hive proto logging configurable Key: HIVE-20582 URL: https://issues.apache.org/jira/browse/HIVE-20582 Project: Hive Issue Type: New Feature Affects Versions: 4.0.0, 3.2.0 Environment: Hive proto logging does hflush to avoid the small files issue in HDFS. This may not be ideal for blob storage, where hflush gets applied only on closing the file. Make hflush configurable so that blob storage can do close instead of hflush. Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20428) HiveStreamingConnection should use addPartition if not exists API
Prasanth Jayachandran created HIVE-20428: Summary: HiveStreamingConnection should use addPartition if not exists API Key: HIVE-20428 URL: https://issues.apache.org/jira/browse/HIVE-20428 Project: Hive Issue Type: New Feature Components: Streaming, Transactions Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran [https://github.com/apache/hive/blob/f280361374c6219d8734d5972c740d6d6c3fb7ef/streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java#L379-L381] catches AlreadyExistsException when adding a partition. Instead, use the add_partitions API with ifNotExists set to true. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
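A toy model of the suggested pattern (a Set standing in for the metastore; this is not the real IMetaStoreClient API): an ifNotExists flag makes the add idempotent, so no AlreadyExistsException needs to be caught by the caller.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of idempotent partition adds, not Hive's client code.
public class PartitionAdder {
    private final Set<String> existing = new HashSet<>();

    // Returns the partitions actually created.
    public List<String> addPartitions(List<String> parts, boolean ifNotExists) {
        List<String> created = new ArrayList<>();
        for (String p : parts) {
            if (existing.add(p)) {
                created.add(p);
            } else if (!ifNotExists) {
                // models the AlreadyExistsException the current code catches
                throw new IllegalStateException("AlreadyExists: " + p);
            }
        }
        return created;
    }
}
```

Adding the same partition twice with ifNotExists=true simply returns an empty list the second time instead of raising an error.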
[jira] [Created] (HIVE-20249) LLAP IO: NPE during refCount decrement
Prasanth Jayachandran created HIVE-20249: Summary: LLAP IO: NPE during refCount decrement Key: HIVE-20249 URL: https://issues.apache.org/jira/browse/HIVE-20249 Project: Hive Issue Type: New Feature Components: llap Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran This was observed on one of the old build which was digesting the exception root cause. {code:java} Ignoring exception when closing input calls(cleanup). Exception class=java.lang.NullPointerException java.lang.NullPointerException: null at org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.deallocate(BuddyAllocator.java:1355) ~[hive-llap-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.llap.cache.BuddyAllocator.deallocate(BuddyAllocator.java:685) ~[hive-llap-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.releaseInitialRefcounts(EncodedReaderImpl.java:676) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:543) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:404) ~[hive-llap-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:263) ~[hive-llap-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:260) ~[hive-llap-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_112] at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_112] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) ~[hadoop-common-3.0.0.3.0.0.0-SNAPSHOT.jar:?] 
at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:260) ~[hive-llap-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:109) ~[hive-llap-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) ~[tez-common-0.9.2-SNAPSHOT.jar:0.9.2-SNAPSHOT] at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110) ~[hive-llap-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_112] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_112] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_112] at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20202) Add profiler endpoint to httpserver
Prasanth Jayachandran created HIVE-20202: Summary: Add profiler endpoint to httpserver Key: HIVE-20202 URL: https://issues.apache.org/jira/browse/HIVE-20202 Project: Hive Issue Type: New Feature Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Add a web endpoint for profiling based on async-profiler. This servlet should be added to httpserver so that HS2 and LLAP daemons can output flamegraphs when their /prof endpoint is hit. Since this will be based on [https://github.com/jvm-profiling-tools/async-profiler] heap allocation, lock contentions, HW counters etc. will also be supported in addition to cpu profiling. In most cases the profiling overhead is pretty low and is safe to run on production. More analysis on CPU and memory overhead here [https://github.com/jvm-profiling-tools/async-profiler/issues/14] and [https://github.com/jvm-profiling-tools/async-profiler/issues/131] For the impatient, here is the usage doc and the sample output [https://github.com/prasanthj/nightswatch/blob/master/README.md] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20165) Enable ZLIB for streaming ingest
Prasanth Jayachandran created HIVE-20165: Summary: Enable ZLIB for streaming ingest Key: HIVE-20165 URL: https://issues.apache.org/jira/browse/HIVE-20165 Project: Hive Issue Type: Bug Components: Streaming, Transactions Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Per [~gopalv]'s recommendation, tried running streaming ingest with and without ZLIB. Following are the numbers: *Compression: NONE* Total rows committed: 9380 Throughput: *156* rows/second [prasanth@cn105-10 culvert]$ hdfs dfs -du -s -h /apps/hive/warehouse/prasanth.db/culvert *14.1 G* /apps/hive/warehouse/prasanth.db/culvert *Compression: ZLIB* Total rows committed: 9210 Throughput: *1535000* rows/second [prasanth@cn105-10 culvert]$ hdfs dfs -du -s -h /apps/hive/warehouse/prasanth.db/culvert *7.4 G* /apps/hive/warehouse/prasanth.db/culvert ZLIB gets us 2x compression with only 2% lower throughput. We should enable ZLIB by default for streaming ingest. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20147) Hive streaming ingest is contented on synchronized logging
Prasanth Jayachandran created HIVE-20147: Summary: Hive streaming ingest is contented on synchronized logging Key: HIVE-20147 URL: https://issues.apache.org/jira/browse/HIVE-20147 Project: Hive Issue Type: Bug Components: Streaming Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: Screen Shot 2018-07-11 at 4.17.27 PM.png In one of the observed profiles, >30% of time was spent on synchronized logging (see attachment). We should use async logging for Hive streaming ingest by default. !Screen Shot 2018-07-11 at 4.17.27 PM.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20129) Revert to position based schema evolution for orc tables
Prasanth Jayachandran created HIVE-20129: Summary: Revert to position based schema evolution for orc tables Key: HIVE-20129 URL: https://issues.apache.org/jira/browse/HIVE-20129 Project: Hive Issue Type: Bug Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-20129.1.patch Hive has been doing position-based schema evolution. ORC-54 changed it to column-name-based schema evolution, causing unexpected results. Queries that returned results earlier now return no results. Change the default in Hive back to positional schema evolution. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20126) VectorizedOrcInputFormat does not pass conf to orc reader options
Prasanth Jayachandran created HIVE-20126: Summary: VectorizedOrcInputFormat does not pass conf to orc reader options Key: HIVE-20126 URL: https://issues.apache.org/jira/browse/HIVE-20126 Project: Hive Issue Type: Bug Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran VectorizedOrcInputFormat creates ORC reader options without passing in the configuration object. Without it, setting ORC configurations has no effect. Example: set orc.force.positional.evolution=true; does not enable positional schema evolution (will attach test case). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20116) TezTask is using parent logger
Prasanth Jayachandran created HIVE-20116: Summary: TezTask is using parent logger Key: HIVE-20116 URL: https://issues.apache.org/jira/browse/HIVE-20116 Project: Hive Issue Type: Bug Components: Logging Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-20116.1.patch TezTask is using parent's logger (Task). It should instead use its own class name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20075) Re-enable TestTriggersWorkloadManager disabled in HIVE-20074
Prasanth Jayachandran created HIVE-20075: Summary: Re-enable TestTriggersWorkloadManager disabled in HIVE-20074 Key: HIVE-20075 URL: https://issues.apache.org/jira/browse/HIVE-20075 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20074) Disable TestTriggersWorkloadManager as it is unstable again
Prasanth Jayachandran created HIVE-20074: Summary: Disable TestTriggersWorkloadManager as it is unstable again Key: HIVE-20074 URL: https://issues.apache.org/jira/browse/HIVE-20074 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20059) Hive streaming should try shade prefix unconditionally on exception
Prasanth Jayachandran created HIVE-20059: Summary: Hive streaming should try shade prefix unconditionally on exception Key: HIVE-20059 URL: https://issues.apache.org/jira/browse/HIVE-20059 Project: Hive Issue Type: Bug Components: Streaming Affects Versions: 3.1.0, 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-20059.1.patch AbstractRecordWriter tries hive.classloader.shade.prefix on ClassNotFoundException, but there are instances where OrcOutputFormat from an old Hive version gets loaded, resulting in ClassCastException. We should try the shade prefix, when defined, whenever any exception is thrown. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
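The suggested fallback can be sketched as follows (a simplified, hypothetical helper, not AbstractRecordWriter's actual code; in the real bug the ClassCastException surfaces later, when the wrong class is used): retry the load with the shade prefix on any failure, not only ClassNotFoundException.

```java
// Hypothetical sketch: load a class, falling back to the configured
// shade prefix on any exception rather than only ClassNotFoundException.
public class ShadedClassLoading {
    public static Class<?> loadWithFallback(String className, String shadePrefix)
            throws ClassNotFoundException {
        try {
            return Class.forName(className);
        } catch (Exception e) { // any failure, not just ClassNotFoundException
            if (shadePrefix != null && !shadePrefix.isEmpty()) {
                return Class.forName(shadePrefix + "." + className);
            }
            throw new ClassNotFoundException(className, e);
        }
    }
}
```

For example, with shade prefix "java", a lookup of the (unresolvable) name "util.ArrayList" falls back to loading "java.util.ArrayList".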
[jira] [Created] (HIVE-20038) Update queries on non-bucketed + partitioned tables throws NPE
Prasanth Jayachandran created HIVE-20038: Summary: Update queries on non-bucketed + partitioned tables throws NPE Key: HIVE-20038 URL: https://issues.apache.org/jira/browse/HIVE-20038 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 4.0.0, 3.2.0 Reporter: Kavan Suresh Assignee: Prasanth Jayachandran With HIVE-19890 delete deltas of non-bucketed tables are computed from ROW__ID. This can create holes in output paths in FSOp.commit() resulting in NPE. Following is the exception {code:java} Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commitOneOutPath(FileSinkOperator.java:246) at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:235) at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$400(FileSinkOperator.java:168) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1325) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:733) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:757) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:383){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20028) Metastore client cache config is used incorrectly
Prasanth Jayachandran created HIVE-20028: Summary: Metastore client cache config is used incorrectly Key: HIVE-20028 URL: https://issues.apache.org/jira/browse/HIVE-20028 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Metastore client cache config is not used correctly. Enabling the cache actually disables it and vice versa. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20019) Remove commons-logging and move to slf4j
Prasanth Jayachandran created HIVE-20019: Summary: Remove commons-logging and move to slf4j Key: HIVE-20019 URL: https://issues.apache.org/jira/browse/HIVE-20019 Project: Hive Issue Type: Improvement Components: Logging Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Still seeing several references to commons-logging. We should move all classes to slf4j instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20004) Wrong scale used by ConvertDecimal64ToDecimal results in incorrect results
Prasanth Jayachandran created HIVE-20004: Summary: Wrong scale used by ConvertDecimal64ToDecimal results in incorrect results Key: HIVE-20004 URL: https://issues.apache.org/jira/browse/HIVE-20004 Project: Hive Issue Type: Bug Affects Versions: 3.1.0, 3.0.1, 4.0.0 Reporter: Prasanth Jayachandran ConvertDecimal64ToDecimal uses the scale from the output column vector, which produces incorrect results. Input: decimal(8,1) Output: decimal(9,2) Input value: 963.8 gets converted to 96.38, which is wrong. The value should not change in this case (it should be 963.8 even after the conversion). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
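The arithmetic behind this bug can be shown directly: a Decimal64 value is an unscaled long plus a scale, so 963.8 at scale 1 is stored as the long 9638, and reinterpreting that same long with the output column's scale of 2 yields 96.38 (the class name below is illustrative only).

```java
import java.math.BigDecimal;

// Demonstrates why the scale must come from the INPUT column: the stored
// long 9638 means 963.8 at scale 1 but 96.38 at scale 2.
public class Decimal64ScaleDemo {
    public static BigDecimal fromScaledLong(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale);
    }
}
```

Widening decimal(8,1) to decimal(9,2) should keep the value 963.8; only rescaling the BigDecimal (not reinterpreting the raw long) preserves it.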
[jira] [Created] (HIVE-20000) woooohoo20000ooooooo
Prasanth Jayachandran created HIVE-20000: Summary: whoo2ooo Key: HIVE-20000 URL: https://issues.apache.org/jira/browse/HIVE-20000 Project: Hive Issue Type: New Feature Components: Hive Affects Versions: All Versions Reporter: Prasanth Jayachandran Fix For: All Versions -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19980) GenericUDTFGetSplits fails when order by query returns 0 rows
Prasanth Jayachandran created HIVE-19980: Summary: GenericUDTFGetSplits fails when order by query returns 0 rows Key: HIVE-19980 URL: https://issues.apache.org/jira/browse/HIVE-19980 Project: Hive Issue Type: Bug Affects Versions: 3.1.0, 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran When an order by query returns 0 rows, there will not be any files in the temporary table location for GenericUDTFGetSplits, which results in the following exception {code:java} Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:217) at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.getSplits(GenericUDTFGetSplits.java:420) ... 52 more{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19964) Apply resource plan fails if trigger expression has quotes
Prasanth Jayachandran created HIVE-19964: Summary: Apply resource plan fails if trigger expression has quotes Key: HIVE-19964 URL: https://issues.apache.org/jira/browse/HIVE-19964 Project: Hive Issue Type: Bug Affects Versions: 3.1.0, 4.0.0 Reporter: Aswathy Chellammal Sreekumar {code:java} 0: jdbc:hive2://localhost:1> CREATE TRIGGER global.big_hdfs_read WHEN HDFS_BYTES_READ > '300kb' DO KILL; INFO : Compiling command(queryId=pjayachandran_20180621131017_72b1441b-d790-4db7-83ca-479735843890): CREATE TRIGGER global.big_hdfs_read WHEN HDFS_BYTES_READ > '300kb' DO KILL INFO : Semantic Analysis Completed (retrial = false) INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null) INFO : Completed compiling command(queryId=pjayachandran_20180621131017_72b1441b-d790-4db7-83ca-479735843890); Time taken: 0.015 seconds INFO : Executing command(queryId=pjayachandran_20180621131017_72b1441b-d790-4db7-83ca-479735843890): CREATE TRIGGER global.big_hdfs_read WHEN HDFS_BYTES_READ > '300kb' DO KILL INFO : Starting task [Stage-0:DDL] in serial mode INFO : Completed executing command(queryId=pjayachandran_20180621131017_72b1441b-d790-4db7-83ca-479735843890); Time taken: 0.025 seconds INFO : OK No rows affected (0.054 seconds) 0: jdbc:hive2://localhost:1> ALTER TRIGGER global.big_hdfs_read ADD TO UNMANAGED; INFO : Compiling command(queryId=pjayachandran_20180621131031_dd489324-db23-412f-9409-32ba697a10e5): ALTER TRIGGER global.big_hdfs_read ADD TO UNMANAGED INFO : Semantic Analysis Completed (retrial = false) INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null) INFO : Completed compiling command(queryId=pjayachandran_20180621131031_dd489324-db23-412f-9409-32ba697a10e5); Time taken: 0.014 seconds INFO : Executing command(queryId=pjayachandran_20180621131031_dd489324-db23-412f-9409-32ba697a10e5): ALTER TRIGGER global.big_hdfs_read ADD TO UNMANAGED INFO : Starting task [Stage-0:DDL] in serial mode INFO : Completed executing 
command(queryId=pjayachandran_20180621131031_dd489324-db23-412f-9409-32ba697a10e5); Time taken: 0.029 seconds INFO : OK No rows affected (0.054 seconds) 0: jdbc:hive2://localhost:1> ALTER RESOURCE PLAN global ENABLE; INFO : Compiling command(queryId=pjayachandran_20180621131036_26a5f4f3-91e3-4bec-ab42-800adb90104e): ALTER RESOURCE PLAN global ENABLE INFO : Semantic Analysis Completed (retrial = false) INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null) INFO : Completed compiling command(queryId=pjayachandran_20180621131036_26a5f4f3-91e3-4bec-ab42-800adb90104e); Time taken: 0.012 seconds INFO : Executing command(queryId=pjayachandran_20180621131036_26a5f4f3-91e3-4bec-ab42-800adb90104e): ALTER RESOURCE PLAN global ENABLE INFO : Starting task [Stage-0:DDL] in serial mode INFO : Completed executing command(queryId=pjayachandran_20180621131036_26a5f4f3-91e3-4bec-ab42-800adb90104e); Time taken: 0.021 seconds INFO : OK No rows affected (0.045 seconds) 0: jdbc:hive2://localhost:1> ALTER RESOURCE PLAN global ACTIVATE; INFO : Compiling command(queryId=pjayachandran_20180621131037_551b2af0-321b-4638-8ac0-76771a159f4b): ALTER RESOURCE PLAN global ACTIVATE INFO : Semantic Analysis Completed (retrial = false) INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null) INFO : Completed compiling command(queryId=pjayachandran_20180621131037_551b2af0-321b-4638-8ac0-76771a159f4b); Time taken: 0.017 seconds INFO : Executing command(queryId=pjayachandran_20180621131037_551b2af0-321b-4638-8ac0-76771a159f4b): ALTER RESOURCE PLAN global ACTIVATE INFO : Starting task [Stage-0:DDL] in serial mode ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
Invalid expression: HDFS_BYTES_READ > 300kb INFO : Completed executing command(queryId=pjayachandran_20180621131037_551b2af0-321b-4638-8ac0-76771a159f4b); Time taken: 0.037 seconds Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Invalid expression: HDFS_BYTES_READ > 300kb (state=08S01,code=1){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19956) Include yarn registry classes to jdbc standalone jar
Prasanth Jayachandran created HIVE-19956: Summary: Include yarn registry classes to jdbc standalone jar Key: HIVE-19956 URL: https://issues.apache.org/jira/browse/HIVE-19956 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 3.1.0, 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran HS2 Active/Passive HA requires some yarn registry classes. Include them in the JDBC standalone jar. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19926) Remove deprecated hcatalog streaming
Prasanth Jayachandran created HIVE-19926: Summary: Remove deprecated hcatalog streaming Key: HIVE-19926 URL: https://issues.apache.org/jira/browse/HIVE-19926 Project: Hive Issue Type: Improvement Components: Streaming Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran hcatalog streaming is deprecated in 3.0.0. We should remove it in 4.0.0. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19886) Logs may be directed to 2 files if --hiveconf hive.log.file is used
Prasanth Jayachandran created HIVE-19886: Summary: Logs may be directed to 2 files if --hiveconf hive.log.file is used Key: HIVE-19886 URL: https://issues.apache.org/jira/browse/HIVE-19886 Project: Hive Issue Type: Bug Components: Logging Affects Versions: 3.1.0, 4.0.0 Reporter: Prasanth Jayachandran The hive launch script explicitly specifies the log4j2 configuration file to use. The main() methods in HiveServer2 and HiveMetastore reconfigure the logger based on user input via --hiveconf hive.log.file. This may cause logs to end up in 2 different files. Initial logs go to the file specified in hive-log4j2.properties, and after logger reconfiguration the rest of the logs go to the file specified via --hiveconf hive.log.file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19877) Remove setting hive.execution.engine as mr in HiveStreamingConnection
Prasanth Jayachandran created HIVE-19877: Summary: Remove setting hive.execution.engine as mr in HiveStreamingConnection Key: HIVE-19877 URL: https://issues.apache.org/jira/browse/HIVE-19877 Project: Hive Issue Type: Bug Components: Streaming Affects Versions: 3.1.0, 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran HiveStreamingConnection explicitly sets the execution engine to mr, a leftover from old code. It is no longer required. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19875) increase LLAP IO queue size for perf
Prasanth Jayachandran created HIVE-19875: Summary: increase LLAP IO queue size for perf Key: HIVE-19875 URL: https://issues.apache.org/jira/browse/HIVE-19875 Project: Hive Issue Type: Bug Affects Versions: 3.1.0, 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran According to [~gopalv], the queue limit has a perf impact, especially during hashtable load for mapjoin, where in the past IO used to queue up more data for processing. 1) Overall, the default limit could be adjusted higher. 2) Depending on Decimal64 availability, the weight for decimal columns could be reduced. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19873) Cleanup operation log on query cancellation after some delay
Prasanth Jayachandran created HIVE-19873: Summary: Cleanup operation log on query cancellation after some delay Key: HIVE-19873 URL: https://issues.apache.org/jira/browse/HIVE-19873 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 3.1.0, 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran When a query executed from beeline is cancelled (due to query timeout, kill query, or triggers) while there is a cursor on the operation log row set, the cursor can throw an exception because cancel cleans up the operation log in the background. This can return a non-zero exit code in beeline. So add a delay to the operation log cleanup on operation cancel. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
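The fix described above can be sketched as scheduling the deletion instead of running it inline on cancel (hypothetical names; the real change would live in HiveServer2's operation handling):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: delay the operation log deletion on cancel so
// readers holding a cursor on the log row set can drain it first.
public class DelayedLogCleanup {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true); // don't keep the JVM alive for pending cleanups
                return t;
            });

    public ScheduledFuture<?> scheduleCleanup(Runnable deleteOperationLog, long delayMs) {
        return scheduler.schedule(deleteOperationLog, delayMs, TimeUnit.MILLISECONDS);
    }
}
```

The trade-off is that cancelled operations briefly hold on to their log files, in exchange for beeline's log cursor not failing mid-read.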
[jira] [Created] (HIVE-19864) Address TestTriggersWorkloadManager flakiness
Prasanth Jayachandran created HIVE-19864: Summary: Address TestTriggersWorkloadManager flakiness Key: HIVE-19864 URL: https://issues.apache.org/jira/browse/HIVE-19864 Project: Hive Issue Type: Bug Affects Versions: 3.1.0, 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran TestTriggersWorkloadManager seems flaky; at times all test cases time out. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19852) update jackson to latest
Prasanth Jayachandran created HIVE-19852: Summary: update jackson to latest Key: HIVE-19852 URL: https://issues.apache.org/jira/browse/HIVE-19852 Project: Hive Issue Type: Sub-task Affects Versions: 3.1.0, 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Update jackson version to latest 2.9.5 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19851) upgrade jQuery version
Prasanth Jayachandran created HIVE-19851: Summary: upgrade jQuery version Key: HIVE-19851 URL: https://issues.apache.org/jira/browse/HIVE-19851 Project: Hive Issue Type: Sub-task Affects Versions: 3.1.0, 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran jQuery version seems to be very old. Update to latest stable version. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19817) Hive streaming API + dynamic partitioning + json/regex writer does not work
Prasanth Jayachandran created HIVE-19817: Summary: Hive streaming API + dynamic partitioning + json/regex writer does not work Key: HIVE-19817 URL: https://issues.apache.org/jira/browse/HIVE-19817 Project: Hive Issue Type: Bug Components: Streaming Affects Versions: 3.1.0, 3.0.1, 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran The new streaming API with dynamic partitioning only works with the delimited record writer. The JSON and regex writers do not work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19799) remove jasper dependency
Prasanth Jayachandran created HIVE-19799: Summary: remove jasper dependency Key: HIVE-19799 URL: https://issues.apache.org/jira/browse/HIVE-19799 Project: Hive Issue Type: Sub-task Affects Versions: 3.1.0, 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran The jasper dependency looks old and unwanted. There is a comment which says it is required by thrift, but I don't see jasper as a thrift dependency. Try removing it to see if it is safe (after a precommit test run). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19794) Disable removing order by from subquery in GenericUDTFGetSplits
Prasanth Jayachandran created HIVE-19794: Summary: Disable removing order by from subquery in GenericUDTFGetSplits Key: HIVE-19794 URL: https://issues.apache.org/jira/browse/HIVE-19794 Project: Hive Issue Type: Bug Affects Versions: 3.1.0, 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran spark-llap always wraps the query in a subquery. Until that is removed from spark-llap, the Hive compiler will remove the inner order by in GenericUDTFGetSplits; disable that optimization until then. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19792) Enable schema evolution tests for decimal 64
Prasanth Jayachandran created HIVE-19792: Summary: Enable schema evolution tests for decimal 64 Key: HIVE-19792 URL: https://issues.apache.org/jira/browse/HIVE-19792 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran The following tests are disabled in HIVE-19629 as the ORC ConvertTreeReaderFactory does not handle Decimal64ColumnVectors. This jira is to re-enable those tests once ORC supports it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19772) Streaming ingest V2 API can generate invalid orc file if interrupted
Prasanth Jayachandran created HIVE-19772: Summary: Streaming ingest V2 API can generate invalid orc file if interrupted Key: HIVE-19772 URL: https://issues.apache.org/jira/browse/HIVE-19772 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.1.0, 3.0.1, 4.0.0 Reporter: Gopal V Assignee: Prasanth Jayachandran Hive streaming ingest generated 0-length and 3-byte files, which are invalid ORC files. This throws the following exception during compaction:
{code}
Error: org.apache.orc.FileFormatException: Not a valid ORC file hdfs://cn105-10.l42scl.hortonworks.com:8020/apps/hive/warehouse/culvert/year=2018/month=7/delta_025_025/bucket_5 (maxFileLength= 3)
at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:546)
at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:370)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:60)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:90)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:1124)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:2373)
at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:1000)
at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:977)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:460)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
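The `maxFileLength= 3` in the trace suggests files containing nothing but the 3-byte "ORC" header magic. A minimal Python sketch of the kind of cheap pre-check a compactor could apply before handing such files to the ORC reader (the helper name is hypothetical, not a Hive or ORC API; a longer file could still be corrupt, this only screens out the degenerate cases described above):

```python
import os

ORC_MAGIC = b"ORC"

def looks_like_valid_orc(path):
    """Cheap sanity check mirroring the failure above: a file that is empty,
    or exactly as long as the 3-byte "ORC" header magic, cannot also hold a
    footer/postscript, so the ORC reader is guaranteed to reject it."""
    size = os.path.getsize(path)
    if size <= len(ORC_MAGIC):  # covers both the 0-length and 3-byte cases
        return False
    with open(path, "rb") as f:
        return f.read(len(ORC_MAGIC)) == ORC_MAGIC
```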
[jira] [Created] (HIVE-19742) Fix test added in HIVE-19726 by running with -Duser.timezone="Europe/Paris"
Prasanth Jayachandran created HIVE-19742: Summary: Fix test added in HIVE-19726 by running with -Duser.timezone="Europe/Paris" Key: HIVE-19742 URL: https://issues.apache.org/jira/browse/HIVE-19742 Project: Hive Issue Type: Bug Affects Versions: 3.1.0, 4.0.0 Reporter: Prasanth Jayachandran Make sure the test added in HIVE-19726 works with the Europe/Paris timezone after ORC-370 is fixed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19726) ORC date PPD is broken
Prasanth Jayachandran created HIVE-19726: Summary: ORC date PPD is broken Key: HIVE-19726 URL: https://issues.apache.org/jira/browse/HIVE-19726 Project: Hive Issue Type: Bug Affects Versions: 2.4.0, 3.1.0, 3.0.1, 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran When kryo was at version 2.22 we added a workaround in HIVE-7222 and later in HIVE-10819, but after updating kryo to 3.0.3 that old workaround was never removed. The issue was that kryo serialized Timestamp as a Date. To recover the timestamp, deserialization converted *any* Date instance into a Timestamp object, which is wrong (we cannot tell whether a Date was serialized as a date, or a Timestamp was serialized as a date in the first place). This breaks PPD on date columns: kryo deserialization always converts Date to Timestamp, and the resulting type mismatch defeats the pushdown. Now that we have a newer kryo version we can remove the code added in HIVE-10819. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
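A toy sketch of the type confusion described above (plain Python standing in for kryo; the function names and the "tag" are illustrative, not kryo's API). The old workaround turned every deserialized date into a timestamp, so a genuine date came back as the wrong type and no longer matched the predicate's date type:

```python
from datetime import date, datetime

def old_deserialize(tag, value):
    # The broken workaround: since timestamps were written out as dates,
    # *every* date read back is promoted to a timestamp, even when it
    # really was a plain date.
    return datetime(value.year, value.month, value.day)

def fixed_deserialize(tag, value):
    # With a kryo that round-trips the type faithfully, a date stays a
    # date and only a real timestamp comes back as one.
    if tag == "date":
        return value
    return datetime(value.year, value.month, value.day)

d = date(2018, 5, 28)
assert type(old_deserialize("date", d)) is datetime    # type changed: PPD type mismatch
assert type(fixed_deserialize("date", d)) is date      # type preserved: PPD works
```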
[jira] [Created] (HIVE-19664) LLAP reader changes for decimal 64
Prasanth Jayachandran created HIVE-19664: Summary: LLAP reader changes for decimal 64 Key: HIVE-19664 URL: https://issues.apache.org/jira/browse/HIVE-19664 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran With ORC 1.5.0, the LLAP readers have to be updated to support decimal 64. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19641) sync up hadoop version used by storage-api with hive
Prasanth Jayachandran created HIVE-19641: Summary: sync up hadoop version used by storage-api with hive Key: HIVE-19641 URL: https://issues.apache.org/jira/browse/HIVE-19641 Project: Hive Issue Type: Sub-task Affects Versions: 3.1.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran There is a hadoop version mismatch between hive and storage-api, so different transitive dependency versions get pulled in. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19640) dependency version upgrades/fixes/convergence
Prasanth Jayachandran created HIVE-19640: Summary: dependency version upgrades/fixes/convergence Key: HIVE-19640 URL: https://issues.apache.org/jira/browse/HIVE-19640 Project: Hive Issue Type: Bug Affects Versions: 3.1.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran There are several dependency versioning issues: some jars are old, some appear in multiple versions, there are transitive version conflicts, etc. This is an umbrella jira to fix all of that. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19637) Add slow test report script to testutils
Prasanth Jayachandran created HIVE-19637: Summary: Add slow test report script to testutils Key: HIVE-19637 URL: https://issues.apache.org/jira/browse/HIVE-19637 Project: Hive Issue Type: Sub-task Components: Test Affects Versions: 3.1.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Wrote the attached utility script to find the top K slow tests from a precommit test URL. Would like to get it committed to testutils so that it's useful for everyone.
{code:title=ascii mode}
$ python gen-report.py -b 11102 -a
Processing 1073 test xml reports from http://104.198.109.242/logs/PreCommit-HIVE-Build-11102/test-results/..

Top 25 testsuites in terms of execution time (in seconds) [Total time: 73882.661 seconds]:
20806 TestCliDriver
 9601 TestMiniLlapLocalCliDriver
 8210 TestSparkCliDriver
 2744 TestMinimrCliDriver
 2262 TestEncryptedHDFSCliDriver
 2021 TestMiniSparkOnYarnCliDriver
 1808 TestHiveCli
 1566 TestMiniLlapCliDriver
 1345 TestReplicationScenarios
 1238 TestMiniDruidCliDriver
  940 TestNegativeCliDriver
  865 TestHBaseCliDriver
  681 TestMiniTezCliDriver
  555 TestTxnCommands2WithSplitUpdateAndVectorization
  543 TestCompactor
  528 TestTxnCommands2
  378 TestStreaming
  374 TestBlobstoreCliDriver
  328 TestNegativeMinimrCliDriver
  302 TestTxnCommandsWithSplitUpdateAndVectorization
  301 TestHCatClient
  299 TestTxnCommands
  261 TestTxnLoadData
  258 TestAcidOnTez
  240 TestHBaseNegativeCliDriver

Top 25 testcases in terms of execution time (in seconds) [Total time: 63102.607 seconds]:
  680 TestMinimrCliDriver_testCliDriver[infer_bucket_sort_reducers_power_two]
  623 TestMinimrCliDriver_testCliDriver[infer_bucket_sort_map_operators]
  429 TestMinimrCliDriver_testCliDriver[infer_bucket_sort_dyn_part]
  374 TestSparkCliDriver_testCliDriver[vectorization_short_regress]
  374 TestMiniLlapLocalCliDriver_testCliDriver[vectorization_short_regress]
  330 TestMiniDruidCliDriver_testCliDriver[druidmini_dynamic_partition]
  238 TestMiniLlapLocalCliDriver_testCliDriver[vector_outer_join5]
  227 TestMiniDruidCliDriver_testCliDriver[druidmini_test_insert]
  214 TestEncryptedHDFSCliDriver_testCliDriver[encryption_auto_purge_tables]
  211 TestMiniLlapCliDriver_testCliDriver[unionDistinct_1]
  210 TestMiniSparkOnYarnCliDriver_testCliDriver[vector_outer_join5]
  206 TestMinimrCliDriver_testCliDriver[bucket_num_reducers_acid]
  202 TestMinimrCliDriver_testCliDriver[infer_bucket_sort_merge]
  198 TestCliDriver_testCliDriver[typechangetest]
{code}
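The core of such a report is just summing the `time` attribute of JUnit-style surefire XML reports per suite. A minimal sketch of that aggregation (plain stdlib Python; the function name is illustrative, and the real gen-report.py presumably also fetches the reports over HTTP and renders the bars):

```python
import xml.etree.ElementTree as ET
from collections import Counter

def top_slow_suites(xml_reports, k=25):
    """Sum per-testsuite execution time from JUnit-style XML report strings
    and return the k slowest suites as (name, seconds) pairs."""
    totals = Counter()
    for xml_text in xml_reports:
        suite = ET.fromstring(xml_text)  # root element is <testsuite>
        # Keep only the simple class name, e.g. "org.apache...TestCliDriver" -> "TestCliDriver"
        name = suite.get("name", "").rsplit(".", 1)[-1]
        totals[name] += float(suite.get("time", 0.0))
    return totals.most_common(k)
```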
[jira] [Created] (HIVE-19636) Fix druidmini_dynamic_partition.q slowness
Prasanth Jayachandran created HIVE-19636: Summary: Fix druidmini_dynamic_partition.q slowness Key: HIVE-19636 URL: https://issues.apache.org/jira/browse/HIVE-19636 Project: Hive Issue Type: Sub-task Affects Versions: 3.1.0 Reporter: Prasanth Jayachandran druidmini_dynamic_partition.q runs for >5 mins -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19635) Fix vectorization_short_regress slowness
Prasanth Jayachandran created HIVE-19635: Summary: Fix vectorization_short_regress slowness Key: HIVE-19635 URL: https://issues.apache.org/jira/browse/HIVE-19635 Project: Hive Issue Type: Sub-task Affects Versions: 3.1.0 Reporter: Prasanth Jayachandran The vectorization_short_regress.q file runs for >5 mins on each CLI driver -- This message was sent by Atlassian JIRA (v7.6.3#76005)