[jira] [Created] (HIVE-25519) Knox homepage service UI links missing when CM intermittently unavailable
Attila Magyar created HIVE-25519:

Summary: Knox homepage service UI links missing when CM intermittently unavailable
Key: HIVE-25519
URL: https://issues.apache.org/jira/browse/HIVE-25519
Project: Hive
Issue Type: Task
Reporter: Attila Magyar
Assignee: Attila Magyar

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25242) Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen
Attila Magyar created HIVE-25242:

Summary: Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen
Key: HIVE-25242
URL: https://issues.apache.org/jira/browse/HIVE-25242
Project: Hive
Issue Type: Bug
Components: HiveServer2
Affects Versions: 4.0.0
Environment: If hive.vectorized.adaptor.usage.mode is set to chosen, only certain UDFs are vectorized through the vectorized adaptor. Queries like the one below perform very slowly because concat is not chosen to be vectorized.
{code:java}
select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) between to_date('2018-12-01') and to_date('2021-03-01');
{code}
Reporter: Attila Magyar
Fix For: 4.0.0
[jira] [Created] (HIVE-25223) Select with limit returns no rows on non native table
Attila Magyar created HIVE-25223:

Summary: Select with limit returns no rows on non native table
Key: HIVE-25223
URL: https://issues.apache.org/jira/browse/HIVE-25223
Project: Hive
Issue Type: Bug
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0

Steps to reproduce:
{code:java}
CREATE EXTERNAL TABLE hht (key string, value int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hht", "hbase.mapred.output.outputtable" = "hht");

insert into hht select uuid(), cast((rand() * 100) as int);
insert into hht select uuid(), cast((rand() * 100) as int) from hht;
insert into hht select uuid(), cast((rand() * 100) as int) from hht;
insert into hht select uuid(), cast((rand() * 100) as int) from hht;
insert into hht select uuid(), cast((rand() * 100) as int) from hht;
insert into hht select uuid(), cast((rand() * 100) as int) from hht;
insert into hht select uuid(), cast((rand() * 100) as int) from hht;

set hive.fetch.task.conversion=none;
select * from hht limit 10;
+----------+------------+
| hht.key  | hht.value  |
+----------+------------+
+----------+------------+
No rows selected (5.22 seconds)
{code}
This is caused by GlobalLimitOptimizer. The table directory of a non-native table is always empty, since the data is not managed by Hive (but by HBase in this case). The optimizer scans the directory and sets the file list to an empty list.
[jira] [Created] (HIVE-25033) HPL/SQL thrift call fails when returning null
Attila Magyar created HIVE-25033:

Summary: HPL/SQL thrift call fails when returning null
Key: HIVE-25033
URL: https://issues.apache.org/jira/browse/HIVE-25033
Project: Hive
Issue Type: Sub-task
Components: hpl/sql
Affects Versions: 4.0.0
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0
[jira] [Created] (HIVE-25004) HPL/SQL subsequent statements are failing after typing a malformed input in beeline
Attila Magyar created HIVE-25004:

Summary: HPL/SQL subsequent statements are failing after typing a malformed input in beeline
Key: HIVE-25004
URL: https://issues.apache.org/jira/browse/HIVE-25004
Project: Hive
Issue Type: Bug
Components: hpl/sql
Affects Versions: 4.0.0
Reporter: Attila Magyar
Fix For: 4.0.0
[jira] [Created] (HIVE-24997) HPL/SQL udf doesn't work in tez container mode
Attila Magyar created HIVE-24997:

Summary: HPL/SQL udf doesn't work in tez container mode
Key: HIVE-24997
URL: https://issues.apache.org/jira/browse/HIVE-24997
Project: Hive
Issue Type: Sub-task
Components: hpl/sql
Affects Versions: 4.0.0
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0
[jira] [Created] (HIVE-24813) thrift regeneration is failing with cannot find symbol TABLE_IS_CTAS
Attila Magyar created HIVE-24813:

Summary: thrift regeneration is failing with cannot find symbol TABLE_IS_CTAS
Key: HIVE-24813
URL: https://issues.apache.org/jira/browse/HIVE-24813
Project: Hive
Issue Type: Bug
Components: Standalone Metastore
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0

{code:java}
[ERROR] /Users/amagyar/development/hive/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java:[2145,34] cannot find symbol
[ERROR] symbol: variable TABLE_IS_CTAS
[ERROR] location: class org.apache.hadoop.hive.metastore.HMSHandler
[ERROR] /Users/amagyar/development/hive/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDefaultTransformer.java:[591,58] cannot find symbol
[ERROR] symbol: variable TABLE_IS_CTAS
[ERROR] location: class org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer
[ERROR] -> [Help 1]
{code}
[jira] [Created] (HIVE-24715) Increase bucketId range
Attila Magyar created HIVE-24715:

Summary: Increase bucketId range
Key: HIVE-24715
URL: https://issues.apache.org/jira/browse/HIVE-24715
Project: Hive
Issue Type: Bug
Components: HiveServer2
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0
[jira] [Created] (HIVE-24696) Drop procedure and drop package syntax for HPLSQL
Attila Magyar created HIVE-24696:

Summary: Drop procedure and drop package syntax for HPLSQL
Key: HIVE-24696
URL: https://issues.apache.org/jira/browse/HIVE-24696
Project: Hive
Issue Type: Sub-task
Components: hpl/sql
Reporter: Attila Magyar
[jira] [Created] (HIVE-24625) CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect directory
Attila Magyar created HIVE-24625:

Summary: CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect directory
Key: HIVE-24625
URL: https://issues.apache.org/jira/browse/HIVE-24625
Project: Hive
Issue Type: Bug
Components: HiveServer2, Metastore
Reporter: Attila Magyar
Assignee: Attila Magyar
[jira] [Created] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair
Attila Magyar created HIVE-24584:

Summary: IndexOutOfBoundsException from Kryo when running msck repair
Key: HIVE-24584
URL: https://issues.apache.org/jira/browse/HIVE-24584
Project: Hive
Issue Type: Bug
Reporter: Attila Magyar
Assignee: Attila Magyar

The following exception is thrown when running "msck repair table t1 sync partitions".
{code:java}
java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
  at org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834) ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
  at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
  at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814) ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
  at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
  at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116) [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
  at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88) [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
{code}
[jira] [Created] (HIVE-24427) HPL/SQL improvements
Attila Magyar created HIVE-24427:

Summary: HPL/SQL improvements
Key: HIVE-24427
URL: https://issues.apache.org/jira/browse/HIVE-24427
Project: Hive
Issue Type: Improvement
Components: hpl/sql
Reporter: Attila Magyar
Assignee: Attila Magyar
[jira] [Created] (HIVE-24383) Add Table type to HPL/SQL
Attila Magyar created HIVE-24383:

Summary: Add Table type to HPL/SQL
Key: HIVE-24383
URL: https://issues.apache.org/jira/browse/HIVE-24383
Project: Hive
Issue Type: Improvement
Components: hpl/sql
Reporter: Attila Magyar
Assignee: Attila Magyar
[jira] [Created] (HIVE-24346) Store HPL/SQL packages into HMS
Attila Magyar created HIVE-24346:

Summary: Store HPL/SQL packages into HMS
Key: HIVE-24346
URL: https://issues.apache.org/jira/browse/HIVE-24346
Project: Hive
Issue Type: New Feature
Components: hpl/sql, Metastore
Reporter: Attila Magyar
Assignee: Attila Magyar
[jira] [Created] (HIVE-24338) HPL/SQL missing features
Attila Magyar created HIVE-24338:

Summary: HPL/SQL missing features
Key: HIVE-24338
URL: https://issues.apache.org/jira/browse/HIVE-24338
Project: Hive
Issue Type: Improvement
Components: hpl/sql
Reporter: Attila Magyar
Assignee: Attila Magyar

There are some features which are supported by Oracle's PL/SQL but not by HPL/SQL. The goal of this Jira is to prioritize them and to investigate the feasibility of implementing them.
* ForAll syntax like: ForAll j in i..j save exceptions
* Bulk collect: Fetch cursor Bulk Collect Into list Limit n;
* Type declaration: Type T_cab is TABLE of
* TABLE datatype
* GOTO and LABEL
* Global variables like $$PLSQL_UNIT and others
* Named parameters: func(name1 => value1, name2 => value2);
* Built-in functions: trunc, lpad, to_date, ltrim, rtrim, sysdate
[jira] [Created] (HIVE-24315) Improve validation and semantic analysis in HPL/SQL
Attila Magyar created HIVE-24315:

Summary: Improve validation and semantic analysis in HPL/SQL
Key: HIVE-24315
URL: https://issues.apache.org/jira/browse/HIVE-24315
Project: Hive
Issue Type: Improvement
Components: hpl/sql
Reporter: Attila Magyar
Assignee: Attila Magyar

There are some known issues that need to be fixed. For example, it seems that the arity of a function is not checked when calling it, and the same is true for parameter types. Calling an undefined function evaluates to null, and sometimes it seems that incorrect syntax is silently ignored. In cases like these a helpful error message would be expected, though we should also consider how PL/SQL works and maintain compatibility.
[jira] [Created] (HIVE-24230) Integrate HPL/SQL into HiveServer2
Attila Magyar created HIVE-24230:

Summary: Integrate HPL/SQL into HiveServer2
Key: HIVE-24230
URL: https://issues.apache.org/jira/browse/HIVE-24230
Project: Hive
Issue Type: Bug
Components: HiveServer2, hpl/sql
Reporter: Attila Magyar
Assignee: Attila Magyar

HPL/SQL is a standalone command line program that can store and load scripts from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL depends on Hive and not the other way around.

Changing the dependency order between HPL/SQL and HiveServer would open up some possibilities which are currently not feasible to implement. For example, one might want to use a third-party SQL tool to run selects on stored procedure (or rather function in this case) outputs.
{code:java}
SELECT * from myStoredProcedure(1, 2);
{code}
HPL/SQL doesn't have a JDBC interface and it's not a daemon, so this would not work with the current architecture.

Another important factor is performance. Declarative SQL commands are sent to Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC and use HiveServer's internal API for compilation and execution.

The third factor is that existing tools like Beeline or Hue cannot be used with HPL/SQL since it has its own, separate CLI.
[jira] [Created] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
Attila Magyar created HIVE-24217:

Summary: HMS storage backend for HPL/SQL stored procedures
Key: HIVE-24217
URL: https://issues.apache.org/jira/browse/HIVE-24217
Project: Hive
Issue Type: Bug
Components: Hive, hpl/sql, Metastore
Reporter: Attila Magyar
Assignee: Attila Magyar

HPL/SQL procedures are currently stored in text files. The goal of this Jira is to implement a Metastore backend for storing and loading these procedures.
[jira] [Created] (HIVE-24149) HiveStreamingConnection doesn't close HMS connection
Attila Magyar created HIVE-24149:

Summary: HiveStreamingConnection doesn't close HMS connection
Key: HIVE-24149
URL: https://issues.apache.org/jira/browse/HIVE-24149
Project: Hive
Issue Type: Bug
Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0

There are 3 HMS connections used by HiveStreamingConnection: one for TX, one for heartbeat and one for notifications. The close method only closes the first 2, leaving the last one open, which eventually overloads HMS until it becomes unresponsive.
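The leak described above can be illustrated with a minimal sketch of a close routine that visits every client, even when an earlier close() fails. `MultiClientCloser` and `closeAll` are hypothetical names chosen for illustration; this is not the HiveStreamingConnection code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Hive: closes a group of clients as a unit. */
final class MultiClientCloser {
    private MultiClientCloser() {}

    /**
     * Closes every resource, even if an earlier close() throws.
     * Returns the failures so that the caller can log or rethrow them.
     */
    static List<Exception> closeAll(AutoCloseable... resources) {
        List<Exception> failures = new ArrayList<>();
        for (AutoCloseable resource : resources) {
            if (resource == null) {
                continue; // this client was never opened
            }
            try {
                resource.close();
            } catch (Exception e) {
                failures.add(e); // keep going so that later clients still get closed
            }
        }
        return failures;
    }
}
```

Passing all three clients (TX, heartbeat, notification) through one such call makes it harder to forget one of them when a new client is added later.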
[jira] [Created] (HIVE-24137) Race condition when copying llap.tar.gz by multiple HSI
Attila Magyar created HIVE-24137:

Summary: Race condition when copying llap.tar.gz by multiple HSI
Key: HIVE-24137
URL: https://issues.apache.org/jira/browse/HIVE-24137
Project: Hive
Issue Type: Bug
Components: llap
Reporter: Attila Magyar

When both HSIs are started simultaneously, one of them fails to start. This seems to be caused by a race condition in DFSClient when multiple HSIs try to copy the LLAP tar package to HDFS at the same time.

Restarting them one after another resolves the issue, and a second restart attempt might also help. As a long-term fix, we would need to change llap-server/src/main/resources/templates.py to retry copyFromLocal.
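The proposed fix lives in templates.py, so the following is only a sketch of the retry idea, written in Java for consistency with the other snippets in this archive. `Retry`, `withRetries` and the backoff values are assumptions made for illustration.

```java
import java.util.function.Supplier;

/** Hypothetical retry helper; names and backoff policy are illustrative only. */
final class Retry {
    private Retry() {}

    /** Runs the action up to maxAttempts times, doubling the delay after each failure. */
    static <T> T withRetries(Supplier<T> action, int maxAttempts, long initialDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMs);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted while waiting to retry", ie);
                    }
                    delayMs *= 2; // exponential backoff between attempts
                }
            }
        }
        throw last; // every attempt failed; surface the last error
    }
}
```

A copy step wrapped this way would let the HSI that loses the race try again after the other instance has finished writing the package.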
[jira] [Created] (HIVE-23957) Limit followed by TopNKey improvement
Attila Magyar created HIVE-23957:

Summary: Limit followed by TopNKey improvement
Key: HIVE-23957
URL: https://issues.apache.org/jira/browse/HIVE-23957
Project: Hive
Issue Type: Improvement
Reporter: Attila Magyar
Assignee: Attila Magyar

The Limit + TopNKey pushdown might result in a limit operator followed by a TNK in the physical plan. The TNK is likely unnecessary in cases like this. We need to investigate if/when the TNK can be removed.
[jira] [Created] (HIVE-23937) Take null ordering into consideration when pushing TNK through inner joins
Attila Magyar created HIVE-23937:

Summary: Take null ordering into consideration when pushing TNK through inner joins
Key: HIVE-23937
URL: https://issues.apache.org/jira/browse/HIVE-23937
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Affects Versions: 4.0.0
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0
[jira] [Created] (HIVE-23817) Pushing TopN Key operator PKFK inner joins
Attila Magyar created HIVE-23817:

Summary: Pushing TopN Key operator PKFK inner joins
Key: HIVE-23817
URL: https://issues.apache.org/jira/browse/HIVE-23817
Project: Hive
Issue Type: Improvement
Reporter: Attila Magyar
Assignee: Attila Magyar

If there is a primary key/foreign key relationship between the tables, we can push the TopNKey operator through the join.
[jira] [Created] (HIVE-23757) Pushing TopN Key operator through MAPJOIN
Attila Magyar created HIVE-23757:

Summary: Pushing TopN Key operator through MAPJOIN
Key: HIVE-23757
URL: https://issues.apache.org/jira/browse/HIVE-23757
Project: Hive
Issue Type: Improvement
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0
Attachments: HIVE-23757.1.patch

So far only MERGEJOIN + JOIN cases are handled.
[jira] [Created] (HIVE-23723) Limit operator pushdown through LOJ
Attila Magyar created HIVE-23723:

Summary: Limit operator pushdown through LOJ
Key: HIVE-23723
URL: https://issues.apache.org/jira/browse/HIVE-23723
Project: Hive
Issue Type: Improvement
Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0

A Limit operator (without an order by) can be pushed through SELECTs and LEFT OUTER JOINs.
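Why the pushdown is safe can be shown with a small in-memory sketch: every left row produces at least one output row in a left outer join, so the first n output rows can only come from the first n left rows. The class and method names below are hypothetical and unrelated to Hive's optimizer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative in-memory model of a limit on top of a left outer join. */
final class LimitThroughLeftOuterJoin {
    private LimitThroughLeftOuterJoin() {}

    /** Joins left keys against right rows, emitting "key=value" or "key=null" when unmatched. */
    static List<String> leftOuterJoin(List<Integer> left, Map<Integer, List<String>> right) {
        List<String> out = new ArrayList<>();
        for (Integer key : left) {
            List<String> matches = right.get(key);
            if (matches == null || matches.isEmpty()) {
                out.add(key + "=null"); // unmatched left rows are preserved
            } else {
                for (String match : matches) {
                    out.add(key + "=" + match);
                }
            }
        }
        return out;
    }

    /** Returns at most the first n rows. */
    static <T> List<T> limit(List<T> rows, int n) {
        return new ArrayList<>(rows.subList(0, Math.min(n, rows.size())));
    }
}
```

In this model, `limit(leftOuterJoin(left, right), n)` and `limit(leftOuterJoin(limit(left, n), right), n)` return the same rows. The limit above the join has to stay in place, because a left row may match several right rows.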
[jira] [Created] (HIVE-23580) deleteOnExit set is not cleaned up, causing memory pressure
Attila Magyar created HIVE-23580:

Summary: deleteOnExit set is not cleaned up, causing memory pressure
Key: HIVE-23580
URL: https://issues.apache.org/jira/browse/HIVE-23580
Project: Hive
Issue Type: Bug
Components: HiveServer2
Affects Versions: 4.0.0
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0

removeScratchDir doesn't always call cancelDeleteOnExit() on context::clear
[jira] [Created] (HIVE-23518) Tez may skip file permission update on intermediate output
Attila Magyar created HIVE-23518:

Summary: Tez may skip file permission update on intermediate output
Key: HIVE-23518
URL: https://issues.apache.org/jira/browse/HIVE-23518
Project: Hive
Issue Type: Bug
Reporter: Attila Magyar
Assignee: Attila Magyar

Before updating file permissions, Tez checks whether the permission change is needed with the following conditional:
{code:java}
if (!SPILL_FILE_PERMS.equals(SPILL_FILE_PERMS.applyUMask(FsPermission.getUMask(conf)))) {
  rfs.setPermission(filename, SPILL_FILE_PERMS);
}
{code}
If the config object is changing in the background, then the setPermission() call will be skipped. The rfs file system is always a local file system, so there is no need to do this check beforehand (it doesn't generate an additional NameNode call).
{code:java}
rfs = ((LocalFileSystem)FileSystem.getLocal(this.conf)).getRaw();
{code}
[jira] [Created] (HIVE-23500) [Kubernetes] Use Extend NodeId for LLAP registration
Attila Magyar created HIVE-23500:

Summary: [Kubernetes] Use Extend NodeId for LLAP registration
Key: HIVE-23500
URL: https://issues.apache.org/jira/browse/HIVE-23500
Project: Hive
Issue Type: Bug
Components: llap
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0

In a Kubernetes environment, where pods can have the same host name and port, there can be situations where node trackers retain an old instance of the pod in their cache. In the case of Hive LLAP, where the LLAP Tez task scheduler maintains the membership of nodes based on ZooKeeper registry events, a NODE_ADDED event followed by a NODE_REMOVED event could end up removing the node/host from the node trackers because of the stable hostname and service port. The NODE_REMOVED event in this case is a stale event of the already dead pod, but ZK sends it only after the session timeout (in case of a non-graceful shutdown). If this sequence of events happens, a node/host is completely lost from the scheduler's perspective.

To support this scenario, Tez can extend YARN's NodeId to include a uniqueIdentifier. The LLAP task scheduler can then construct the container object with this new NodeId, so that stale events like the one above only remove the host/node that matches the old uniqueIdentifier.
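The effect of matching on the unique identifier can be sketched with a tracker keyed by host and port, where a removal only succeeds if it names the instance that is currently registered. `NodeTracker` and its methods are hypothetical; the real change concerns Tez's NodeId and the LLAP task scheduler, which are not modeled here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical tracker: key is "host:port", value is the pod's unique identifier. */
final class NodeTracker {
    private final ConcurrentMap<String, String> liveNodes = new ConcurrentHashMap<>();

    /** Registers (or replaces) the instance that currently owns this host and port. */
    void onNodeAdded(String hostAndPort, String uniqueId) {
        liveNodes.put(hostAndPort, uniqueId);
    }

    /**
     * Removes the node only if the event refers to the instance that is tracked right now.
     * A stale NODE_REMOVED for a dead pod carries the old identifier and is ignored.
     */
    boolean onNodeRemoved(String hostAndPort, String uniqueId) {
        return liveNodes.remove(hostAndPort, uniqueId);
    }

    boolean isLive(String hostAndPort) {
        return liveNodes.containsKey(hostAndPort);
    }
}
```

The two-argument `ConcurrentMap.remove(key, value)` performs the compare-and-remove atomically, so a late event cannot evict a freshly registered pod.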
[jira] [Created] (HIVE-23469) Use hostname + pod UID for shuffle manager caching
Attila Magyar created HIVE-23469:

Summary: Use hostname + pod UID for shuffle manager caching
Key: HIVE-23469
URL: https://issues.apache.org/jira/browse/HIVE-23469
Project: Hive
Issue Type: Bug
Components: Tez
Reporter: Attila Magyar
Assignee: Attila Magyar

When a pod restarts, it uses the same hostname and shuffle port. When fetcher threads connect to download the shuffle data, they use the cached connection info, but since the pod has died its shuffle data has also been cleaned up. After the restart, the pod receives connections from clients to download specific shuffle data, but the daemon does not have it because of the restart.

In ShuffleManager.java's knownSrcHosts, the key should be updated to HostInfo, which is a combination of host+port and the host's unique ID. The host ID changes when a node is killed or restarted.
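A composite cache key of this kind could look like the sketch below. The class name `HostInfo` is taken from the description, but the fields and their types are assumptions; this is not the Tez implementation.

```java
import java.util.Objects;

/** Illustrative cache key: host and port plus an identifier that changes on restart. */
final class HostInfo {
    private final String host;
    private final int port;
    private final String uniqueId;

    HostInfo(String host, int port, String uniqueId) {
        this.host = Objects.requireNonNull(host, "host");
        this.port = port;
        this.uniqueId = Objects.requireNonNull(uniqueId, "uniqueId");
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof HostInfo)) {
            return false;
        }
        HostInfo that = (HostInfo) other;
        return port == that.port && host.equals(that.host) && uniqueId.equals(that.uniqueId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(host, port, uniqueId);
    }
}
```

With such a key, an entry cached for the old pod is not found for the restarted pod, even though host and port are unchanged.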
[jira] [Created] (HIVE-23305) NullPointerException in LlapTaskSchedulerService addNode due to race condition
Attila Magyar created HIVE-23305:

Summary: NullPointerException in LlapTaskSchedulerService addNode due to race condition
Key: HIVE-23305
URL: https://issues.apache.org/jira/browse/HIVE-23305
Project: Hive
Issue Type: Bug
Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0

{code:java}
java.lang.NullPointerException
  at org.apache.hadoop.hive.llap.tezplugins.LlapTaskSchedulerService.addNode(LlapTaskSchedulerService.java:1575)
  at org.apache.hadoop.hive.llap.tezplugins.LlapTaskSchedulerService.registerAndAddNode(LlapTaskSchedulerService.java:1566)
  at org.apache.hadoop.hive.llap.tezplugins.LlapTaskSchedulerService.access$1800(LlapTaskSchedulerService.java:128)
  at org.apache.hadoop.hive.llap.tezplugins.LlapTaskSchedulerService$NodeStateChangeListener.onCreate(LlapTaskSchedulerService.java:831)
  at org.apache.hadoop.hive.llap.tezplugins.LlapTaskSchedulerService$NodeStateChangeListener.onCreate(LlapTaskSchedulerService.java:823)
  at org.apache.hadoop.hive.registry.impl.ZkRegistryBase$InstanceStateChangeListener.childEvent(ZkRegistryBase.java:612)
  at
{code}
The above exception happens when a node registers too fast, before the activeInstances field has been initialized.
[jira] [Created] (HIVE-23295) Possible NPE when on getting predicate literal list when dynamic values are not available
Attila Magyar created HIVE-23295:

Summary: Possible NPE when on getting predicate literal list when dynamic values are not available
Key: HIVE-23295
URL: https://issues.apache.org/jira/browse/HIVE-23295
Project: Hive
Issue Type: Bug
Components: storage-api
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0

getLiteralList() in SearchArgumentImpl$PredicateLeafImpl returns null if dynamic values are not available. There are multiple call sites where the return value is used without a null check, e.g. leaf.getLiteralList().stream().
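One way a call site can guard against the null return value is a small normalizing helper, sketched below. `Literals.orEmpty` is a hypothetical name; whether the actual fix changes the call sites or getLiteralList() itself is not stated in this issue.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for call sites that receive a possibly null list. */
final class Literals {
    private Literals() {}

    /** Returns the list itself, or an immutable empty list when the argument is null. */
    static <T> List<T> orEmpty(List<T> maybeNull) {
        return maybeNull == null ? Collections.<T>emptyList() : maybeNull;
    }
}
```

A call such as `Literals.orEmpty(leaf.getLiteralList()).stream()` then yields an empty stream instead of throwing when dynamic values are missing.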
[jira] [Created] (HIVE-23253) Synchronization between external SerDe schemas and Metastore
Attila Magyar created HIVE-23253:

Summary: Synchronization between external SerDe schemas and Metastore
Key: HIVE-23253
URL: https://issues.apache.org/jira/browse/HIVE-23253
Project: Hive
Issue Type: Bug
Components: Hive, Metastore
Affects Versions: 3.1.2
Reporter: Attila Magyar
Fix For: 3.0.0

In HIVE-15995 an ALTER UPDATE COLUMNS statement was introduced to sync external SerDe schema changes with the metastore. This command can only be invoked manually. See the documentation:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionUpdatecolumns

Maybe it would make sense to run an update columns automatically in certain cases, to prevent problems when the user forgets to run it manually.

One way to reproduce the issue is to change the schema URL via an alter table statement.
{code:java}
[root@c7401 vagrant]# cat test_schema1.avsc
{
  "type":"record",
  "name":"test_schema",
  "namespace":"gdc_datascience_qa",
  "fields":[
    { "name":"name", "type":[ "null", "string" ], "default":null }
  ]
}
[root@c7401 vagrant]# cat test_schema2.avsc
{
  "type":"record",
  "name":"test_schema",
  "namespace":"gdc_datascience_qa",
  "fields":[
    { "name":"name", "type":[ "null", "string" ], "default":null },
    { "name":"last_name", "type":[ "null", "string" ], "default":null }
  ]
}
{code}
{code:java}
$ hadoop fs -copyFromLocal *.avsc /tmp/
[beeline] create external table t1 stored as avro tblproperties ('avro.schema.url'='/tmp/test_schema1.avsc');
[beeline] alter table t1 set tblproperties('avro.schema.url'='/tmp/test_schema2.avsc');
[beeline] insert into t1 values ('n1', 'l1');
[beeline] create external table t2 stored as avro tblproperties ('avro.schema.url'='/tmp/test_schema2.avsc');
[beeline] insert into t2 values ('n2', 'l2');
[beeline] insert overwrite table t1 select * from t2;
{code}
Error:
{code:java}
MetaException(message:Column last_name doesn't exist in table t1 in database default)
  at org.apache.hadoop.hive.metastore.ObjectStore.validateTableCols(ObjectStore.java:8652)
  at org.apache.hadoop.hive.metastore.ObjectStore.getMTableColumnStatistics(ObjectStore.java:8602)
  at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionColStats(ObjectStore.java:8416)
  at org.apache.hadoop.hive.metastore.ObjectStore.updateTableColumnStatistics(ObjectStore.java:8446
{code}
Running an ALTER UPDATE COLUMNS fixes the problem.

cc: [~szita]
[jira] [Created] (HIVE-23056) LLAP registry getAll doesn't filter compute groups
Attila Magyar created HIVE-23056:

Summary: LLAP registry getAll doesn't filter compute groups
Key: HIVE-23056
URL: https://issues.apache.org/jira/browse/HIVE-23056
Project: Hive
Issue Type: Bug
Components: llap
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0

ZkRegistryBase's InstanceStateChangeListener gets notified every time a node is added or removed, even when the node doesn't belong to the same compute group as the registry. These znodes are stored internally and returned by getAll(). This causes query coordinators to assign tasks to executors that are in different compute groups.
[jira] [Created] (HIVE-22982) TopN Key efficiency check might disable filter too soon
Attila Magyar created HIVE-22982:

Summary: TopN Key efficiency check might disable filter too soon
Key: HIVE-22982
URL: https://issues.apache.org/jira/browse/HIVE-22982
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 4.0.0
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0

The check is triggered after every n batches, but there can be multiple filters, one for each partition. Some filters might have less data than the others.
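One possible way to avoid judging a filter on too little data is to keep counters per partition and to require a minimum number of rows before the check applies. The sketch below assumes this approach; `PerPartitionFilterStats`, `minRowsBeforeCheck` and `maxForwardRatio` are hypothetical names, and the issue does not say how the fix was actually done.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative per-partition bookkeeping for a forwarding filter. */
final class PerPartitionFilterStats {
    private static final class Counters {
        long total;
        long forwarded;
    }

    private final Map<String, Counters> byPartition = new HashMap<>();
    private final long minRowsBeforeCheck;
    private final double maxForwardRatio;

    PerPartitionFilterStats(long minRowsBeforeCheck, double maxForwardRatio) {
        this.minRowsBeforeCheck = minRowsBeforeCheck;
        this.maxForwardRatio = maxForwardRatio;
    }

    /** Records one row seen by the filter of the given partition. */
    void record(String partitionKey, boolean forwarded) {
        Counters c = byPartition.computeIfAbsent(partitionKey, k -> new Counters());
        c.total++;
        if (forwarded) {
            c.forwarded++;
        }
    }

    /** A filter is only judged once it has seen enough rows of its own. */
    boolean shouldDisable(String partitionKey) {
        Counters c = byPartition.get(partitionKey);
        if (c == null || c.total < minRowsBeforeCheck) {
            return false;
        }
        return (double) c.forwarded / c.total > maxForwardRatio;
    }
}
```

A partition with only a handful of rows then keeps its filter, while a partition that has forwarded nearly everything over many rows loses it.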
[jira] [Created] (HIVE-22974) Metastore's table location check should be optional
Attila Magyar created HIVE-22974:

Summary: Metastore's table location check should be optional
Key: HIVE-22974
URL: https://issues.apache.org/jira/browse/HIVE-22974
Project: Hive
Issue Type: Bug
Components: Metastore
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0

In HIVE-22189 a check was introduced to make sure managed and external tables are located at the proper space. This condition cannot be satisfied during an upgrade.
[jira] [Created] (HIVE-22960) Approximate TopN Key Operator
Attila Magyar created HIVE-22960:

Summary: Approximate TopN Key Operator
Key: HIVE-22960
URL: https://issues.apache.org/jira/browse/HIVE-22960
Project: Hive
Issue Type: Bug
Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0
Attachments: Screen Shot 2020-03-02 at 4.55.46 PM.png

??Different from other operators, top n operator demonstrates the notable “long tail” characteristics which makes it distinct from other operators like join, group by and etc. will saturate very quickly. Update is pretty frequent at the beginning and then diverges to a very slow update frequently. The approximation can be implemented in two ways: one way is to stop the array/heap update after certain percentage of the data is been read, for example, 10% or 20%, if we know the table size. The other way is to set a frequency threshold of the array/heap update. After the threshold is met, then stop the top n processing.?? [~rzhappy]
[jira] [Created] (HIVE-22925) Implement TopNKeyFilter efficiency check
Attila Magyar created HIVE-22925:

Summary: Implement TopNKeyFilter efficiency check
Key: HIVE-22925
URL: https://issues.apache.org/jira/browse/HIVE-22925
Project: Hive
Issue Type: Bug
Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0

In certain cases the TopNKey filter might work in an inefficient way and add extra CPU overhead. For example, if the rows are coming in an ascending order but the filter wants the top N smallest elements, the filter will forward everything. Inefficiency should be detected at runtime so that the filter can be disabled if the ratio forwarded_rows/total_rows is too high.
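The ratio check itself is simple arithmetic, as the sketch below shows. The class name, method names and the threshold parameter are assumptions made for illustration; the issue does not specify the threshold or where the counters live.

```java
/** Illustrative ratio check; not the TopNKeyFilter code. */
final class TopNKeyEfficiency {
    private TopNKeyEfficiency() {}

    /** Fraction of rows that passed the filter; 0.0 when no rows have been seen yet. */
    static double forwardRatio(long forwardedRows, long totalRows) {
        return totalRows == 0 ? 0.0 : (double) forwardedRows / totalRows;
    }

    /** The filter is not worth its CPU cost when it forwards more than maxForwardRatio of the rows. */
    static boolean shouldDisable(long forwardedRows, long totalRows, double maxForwardRatio) {
        return forwardRatio(forwardedRows, totalRows) > maxForwardRatio;
    }
}
```

For example, with a threshold of 0.8 a filter that forwarded 990 of 1000 rows would be disabled, while one that forwarded 100 of 1000 would stay active.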
[jira] [Created] (HIVE-22870) DML execution on TEZ always outputs the message 'No rows affected'
Attila Magyar created HIVE-22870:

Summary: DML execution on TEZ always outputs the message 'No rows affected'
Key: HIVE-22870
URL: https://issues.apache.org/jira/browse/HIVE-22870
Project: Hive
Issue Type: Bug
Reporter: Attila Magyar
Assignee: Attila Magyar

Executing an update or insert statement in beeline doesn't show the actual rows inserted/updated.
[jira] [Created] (HIVE-22726) TopN Key optimizer should use array instead of priority queue
Attila Magyar created HIVE-22726:

Summary: TopN Key optimizer should use array instead of priority queue
Key: HIVE-22726
URL: https://issues.apache.org/jira/browse/HIVE-22726
Project: Hive
Issue Type: Bug
Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0

The TopN key optimizer currently uses a priority queue for keeping track of the largest/smallest rows. Its max size is the same as the user-specified limit. This should be replaced with a more cache-line-friendly array with a small (128) maximum size, to see how much performance is gained.
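An array-based alternative to the priority queue can be sketched as a small sorted array of the n smallest distinct keys. The sketch uses primitive long keys for brevity, while the real operator compares row keys; `SmallTopN` and its methods are hypothetical names.

```java
import java.util.Arrays;

/** Illustrative structure: keeps the n smallest distinct keys in a sorted array. */
final class SmallTopN {
    private final long[] keys;
    private int size;

    SmallTopN(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.keys = new long[capacity];
    }

    /** Returns true if the key is within the current top n, meaning the row should be forwarded. */
    boolean offer(long key) {
        if (size == keys.length && key > keys[size - 1]) {
            return false; // larger than the current n-th smallest key
        }
        int pos = Arrays.binarySearch(keys, 0, size, key);
        if (pos >= 0) {
            return true; // key is already tracked
        }
        int insertAt = -pos - 1;
        if (size < keys.length) {
            size++;
        }
        // Shift the tail one slot to the right; when full, this drops the largest key.
        System.arraycopy(keys, insertAt, keys, insertAt + 1, size - insertAt - 1);
        keys[insertAt] = key;
        return true;
    }

    /** Copy of the tracked keys in ascending order. */
    long[] snapshot() {
        return Arrays.copyOf(keys, size);
    }
}
```

Insertion costs a binary search plus a shift within one contiguous array, which stays cheap for a small maximum size such as the 128 mentioned above.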
[jira] [Created] (HIVE-22647) enable session pool by default
Attila Magyar created HIVE-22647:

Summary: enable session pool by default
Key: HIVE-22647
URL: https://issues.apache.org/jira/browse/HIVE-22647
Project: Hive
Issue Type: Bug
Components: HiveServer2
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0

Non-pooled sessions may leak when the client doesn't close the connection.
[jira] [Created] (HIVE-22577) StringIndexOutOfBoundsException when getting sessionId from worker node name
Attila Magyar created HIVE-22577:

Summary: StringIndexOutOfBoundsException when getting sessionId from worker node name
Key: HIVE-22577
URL: https://issues.apache.org/jira/browse/HIVE-22577
Project: Hive
Issue Type: Bug
Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0

When the node name is "worker-", the following exception is thrown:
{code:java}
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
  at java.lang.String.substring(String.java:1931)
  at org.apache.hadoop.hive.registry.impl.ZkRegistryBase.extractSeqNum(ZkRegistryBase.java:781)
  at org.apache.hadoop.hive.registry.impl.ZkRegistryBase.populateCache(ZkRegistryBase.java:507)
  at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.access$000(LlapZookeeperRegistryImpl.java:65)
  at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl$DynamicServiceInstanceSet.<init>(LlapZookeeperRegistryImpl.java:313)
  at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:462)
  at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getApplicationId(LlapZookeeperRegistryImpl.java:469)
  at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getApplicationId(LlapRegistryService.java:212)
  at org.apache.hadoop.hive.ql.exec.tez.Utils.getCustomSplitLocationProvider(Utils.java:77)
  at org.apache.hadoop.hive.ql.exec.tez.Utils.getSplitLocationProvider(Utils.java:53)
  at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:140)
{code}
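A defensive way to read a numeric suffix from such a node name is sketched below. This is a hypothetical reimplementation for illustration, not the code of ZkRegistryBase.extractSeqNum; the naming scheme "prefix-digits" and the -1 sentinel are assumptions.

```java
/** Hypothetical parser for node names of the assumed form "prefix-digits". */
final class SeqNumParser {
    private SeqNumParser() {}

    /** Returns the number after the last '-', or -1 when the suffix is missing or not numeric. */
    static long parseSeqNum(String nodeName) {
        int dash = nodeName.lastIndexOf('-');
        if (dash < 0 || dash == nodeName.length() - 1) {
            return -1L; // no dash at all, or nothing after it (e.g. "worker-")
        }
        try {
            return Long.parseLong(nodeName.substring(dash + 1));
        } catch (NumberFormatException e) {
            return -1L; // suffix is not a number
        }
    }
}
```

Checking the position of the separator before calling substring() is what keeps a bare "worker-" from raising an exception.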
[jira] [Created] (HIVE-22523) The error handler in LlapRecordReader might block if its queue is full
Attila Magyar created HIVE-22523: Summary: The error handler in LlapRecordReader might block if its queue is full Key: HIVE-22523 URL: https://issues.apache.org/jira/browse/HIVE-22523 Project: Hive Issue Type: Bug Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 In setError() we set the value of an atomic reference (pendingError) and we also put the error in a queue. The latter seems not just unnecessary, it might also block the caller of the handler if the queue is full. Also, closing of the reader might not be handled properly, as some of the flags are not volatile. -- This message was sent by Atlassian Jira (v8.3.4#803005)
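The idea can be sketched as follows, assuming a bounded queue of batches and a single consumer. Apart from setError and pendingError, every name here is hypothetical and the class is not the LlapRecordReader code; it only shows an error handler that never touches the queue, plus a volatile close flag.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative stand-in for a reader with a bounded queue; not the Hive class.
final class ErrorHandlerSketch {
    private final AtomicReference<Throwable> pendingError = new AtomicReference<>();
    private final BlockingQueue<String> batches;
    // volatile so that a close() from another thread is visible to the consumer
    private volatile boolean closed = false;

    ErrorHandlerSketch(int capacity) {
        this.batches = new ArrayBlockingQueue<>(capacity);
    }

    /** Producer side: non-blocking, returns false when the queue is full. */
    boolean offerBatch(String batch) {
        return batches.offer(batch);
    }

    /** Records the first error only; never blocks, even if the queue is full. */
    void setError(Throwable t) {
        pendingError.compareAndSet(null, t);
    }

    /** Consumer side: surfaces a pending error before handing out data. */
    String next() {
        Throwable t = pendingError.get();
        if (t != null) {
            throw new IllegalStateException("reader failed", t);
        }
        if (closed) {
            return null;
        }
        return batches.poll();
    }

    void close() {
        closed = true;
        batches.clear();
    }
}
```

Because the consumer checks pendingError on every call, the error does not need to travel through the queue to be noticed.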
[jira] [Created] (HIVE-22514) HiveProtoLoggingHook might leak memory
Attila Magyar created HIVE-22514: Summary: HiveProtoLoggingHook might leak memory Key: HIVE-22514 URL: https://issues.apache.org/jira/browse/HIVE-22514 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 4.0.0 Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 Attachments: Screen Shot 2019-11-18 at 2.19.24 PM.png HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer tasks and to periodically handle rollover. The built-in ScheduledThreadPoolExecutor uses an unbounded queue which cannot be replaced from the outside. If log events are generated at a very fast rate, this queue can grow large. !Screen Shot 2019-11-18 at 2.19.24 PM.png|width=650,height=101! -- This message was sent by Atlassian Jira (v8.3.4#803005)
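One possible mitigation is to check the executor's queue size before submitting and drop events above a limit. The sketch below assumes that dropping log events is acceptable; the class name, trySubmit, and maxQueued are made up for illustration and this is not necessarily how HiveProtoLoggingHook was changed.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;

// Illustrative wrapper that caps the number of queued tasks.
final class BoundedSubmitSketch {
    private final ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
    private final int maxQueued;

    BoundedSubmitSketch(int maxQueued) {
        this.maxQueued = maxQueued;
    }

    /**
     * Submits the task unless too many tasks are already waiting.
     * The size check is approximate under concurrent submitters, which is
     * acceptable when the goal is only to keep the queue from growing unbounded.
     */
    boolean trySubmit(Runnable task) {
        if (executor.getQueue().size() >= maxQueued) {
            return false; // drop the task instead of queueing it
        }
        executor.execute(task);
        return true;
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

The cap bounds memory use at the cost of losing events during bursts, so the limit would have to be chosen with the expected event rate in mind.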
[jira] [Created] (HIVE-22502) ConcurrentModificationException in TriggerValidatorRunnable stops trigger processing
Attila Magyar created HIVE-22502: Summary: ConcurrentModificationException in TriggerValidatorRunnable stops trigger processing Key: HIVE-22502 URL: https://issues.apache.org/jira/browse/HIVE-22502 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Attila Magyar Assignee: Attila Magyar -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22411) Performance degradation on single row inserts
Attila Magyar created HIVE-22411: Summary: Performance degradation on single row inserts Key: HIVE-22411 URL: https://issues.apache.org/jira/browse/HIVE-22411 Project: Hive Issue Type: Bug Components: Hive Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 Attachments: Screen Shot 2019-10-17 at 8.40.50 PM.png Executing single insert statements on a transactional table affects write performance on an S3 file system. Each insert creates a new delta directory. After each insert Hive calculates statistics such as the number of files in the table and the total size of the table. For this it traverses the directory recursively. During the recursion a separate listStatus call is executed for each path. In the end, the more delta directories you have, the more time it takes to calculate the statistics. Therefore insertion time goes up linearly: !Screen Shot 2019-10-17 at 8.40.50 PM.png|width=601,height=436! The fix is to use fs.listFiles(path, /*recursive*/ true) instead of the handcrafted recursive method. -- This message was sent by Atlassian Jira (v8.3.4#803005)
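The shape of the statistics pass can be illustrated with a single recursive walk that counts files and sums their sizes. The sketch uses java.nio's Files.walk on a local directory as a stand-in so that it runs without Hadoop; it is not the Hive code and does not use the Hadoop FileSystem API, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.stream.Stream;

// Local-filesystem stand-in for the table statistics pass.
final class TableStatsSketch {
    /** Returns {fileCount, totalSizeInBytes} for all regular files under root. */
    static long[] collect(Path root) throws IOException {
        long count = 0;
        long size = 0;
        // One recursive traversal instead of a hand-written per-directory recursion.
        try (Stream<Path> paths = Files.walk(root)) {
            Iterator<Path> it = paths.filter(Files::isRegularFile).iterator();
            while (it.hasNext()) {
                Path p = it.next();
                count++;
                size += Files.size(p);
            }
        }
        return new long[] {count, size};
    }
}
```

On an object store the benefit comes from the file system being able to serve the recursive listing with fewer requests than one call per directory; the local walk above only mirrors the structure of the computation.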
[jira] [Created] (HIVE-22094) Mondrian queries failing with ClassCastException: hive.ql.exec.vector.DecimalColumnVector cannot be cast to hive.ql.exec.vector.Decimal64ColumnVector
Attila Magyar created HIVE-22094: Summary: Mondrian queries failing with ClassCastException: hive.ql.exec.vector.DecimalColumnVector cannot be cast to hive.ql.exec.vector.Decimal64ColumnVector Key: HIVE-22094 URL: https://issues.apache.org/jira/browse/HIVE-22094 Project: Hive Issue Type: Task Components: Hive Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 When running a query like this select sum(salary.salary_paid) from salary, employee_closure where salary.employee_id = employee_closure.employee_id; with hive.auto.convert.join=true and hive.vectorized.execution.enabled=true the following exception occurs {code:java} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.Decimal64ColumnVector at org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.VectorUDAFSumDecimal64ToDecimal.aggregateInput(VectorUDAFSumDecimal64ToDecimal.java:320) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processAggregators(VectorGroupByOperator.java:217) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeHashAggregate.doProcessBatch(VectorGroupByOperator.java:414) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processBatch(VectorGroupByOperator.java:182) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:1124) at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919) at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardOverflow(VectorMapJoinGenerateResultOperator.java:706) at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyGenerateResultOperator.generateHashMultiSetResultMultiValue(VectorMapJoinInnerBigOnlyGenerateResultOperator.java:268) at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyGenerateResultOperator.finishInnerBigOnly(VectorMapJoinInnerBigOnlyGenerateResultOperator.java:180) at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyLongOperator.processBatch(VectorMapJoinInnerBigOnlyLongOperator.java:379) ... 28 more{code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-22055) select count gives incorrect result after loading data from text file
Attila Magyar created HIVE-22055: Summary: select count gives incorrect result after loading data from text file Key: HIVE-22055 URL: https://issues.apache.org/jira/browse/HIVE-22055 Project: Hive Issue Type: Task Components: Hive Reporter: Attila Magyar Assignee: Attila Magyar Load data 3 times (both kv1.txt and kv2.txt contain 500 records) {code:java} create table load0_mm (key string, value string) stored as textfile tblproperties("transactional"="true", "transactional_properties"="insert_only"); load data local inpath '../../data/files/kv1.txt' into table load0_mm; select count(1) from load0_mm; load data local inpath '../../data/files/kv2.txt' into table load0_mm; select count(1) from load0_mm; load data local inpath '../../data/files/kv2.txt' into table load0_mm; select count(1) from load0_mm;{code} Expected output {code:java} PREHOOK: query: load data local inpath '../../data/files/kv2.txt' into table load0_mm PREHOOK: type: LOAD A masked pattern was here PREHOOK: Output: default@load0_mm POSTHOOK: query: load data local inpath '../../data/files/kv2.txt' into table load0_mm POSTHOOK: type: LOAD A masked pattern was here POSTHOOK: Output: default@load0_mm PREHOOK: query: select count(1) from load0_mm PREHOOK: type: QUERY PREHOOK: Input: default@load0_mm A masked pattern was here POSTHOOK: query: select count(1) from load0_mm POSTHOOK: type: QUERY POSTHOOK: Input: default@load0_mm A masked pattern was here 1500{code} Got: [ERROR] TestMiniLlapLocalCliDriver.testCliDriver:59 Client Execution succeeded but contained differences (error code = 1) after executing mm_loaddata.q 63c63 < 1480 --- > 1500 -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-15585) LLAP failed to start on a host with only 1 cp
Attila Magyar created HIVE-15585: Summary: LLAP failed to start on a host with only 1 cp Key: HIVE-15585 URL: https://issues.apache.org/jira/browse/HIVE-15585 Project: Hive Issue Type: Bug Components: llap Affects Versions: 2.1.1 Reporter: Attila Magyar Assignee: Attila Magyar LLAP failed to start on a host with only 1 CPU. The number of threads was calculated by dividing the number of CPUs by 2. This resulted in zero when the CPU count was 1 and caused an IllegalArgumentException upon startup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
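The arithmetic behind the failure, and one simple guard, can be sketched as below. The class and method names are hypothetical and the lower bound of 1 is an assumption about a reasonable fix, not a quote from the LLAP code.

```java
// Illustrative calculation only; not the LLAP startup code.
final class ThreadCountSketch {
    /**
     * Half the available CPUs, but never less than one thread.
     * Plain integer division gives 1 / 2 == 0 on a single-CPU host.
     */
    static int threadCount(int availableCpus) {
        return Math.max(1, availableCpus / 2);
    }
}
```

A caller would typically pass Runtime.getRuntime().availableProcessors() as the argument.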