[jira] [Created] (HIVE-25519) Knox homepage service UI links missing when CM intermittently unavailable
Attila Magyar created HIVE-25519: Summary: Knox homepage service UI links missing when CM intermittently unavailable Key: HIVE-25519 URL: https://issues.apache.org/jira/browse/HIVE-25519 Project: Hive Issue Type: Task Reporter: Attila Magyar Assignee: Attila Magyar -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25242) Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen
Attila Magyar created HIVE-25242: Summary: Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen Key: HIVE-25242 URL: https://issues.apache.org/jira/browse/HIVE-25242 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 4.0.0 Environment: If hive.vectorized.adaptor.usage.mode is set to chosen, only certain UDFs are vectorized through the vectorized adaptor. Queries like the one below perform very slowly because the concat is not chosen to be vectorized. {code:java} select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) between to_date('2018-12-01') and to_date('2021-03-01'); {code} Reporter: Attila Magyar Fix For: 4.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25223) Select with limit returns no rows on non native table
Attila Magyar created HIVE-25223: Summary: Select with limit returns no rows on non native table Key: HIVE-25223 URL: https://issues.apache.org/jira/browse/HIVE-25223 Project: Hive Issue Type: Bug Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 Steps to reproduce: {code:java} CREATE EXTERNAL TABLE hht (key string, value int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "hht", "hbase.mapred.output.outputtable" = "hht"); insert into hht select uuid(), cast((rand() * 100) as int); insert into hht select uuid(), cast((rand() * 100) as int) from hht; insert into hht select uuid(), cast((rand() * 100) as int) from hht; insert into hht select uuid(), cast((rand() * 100) as int) from hht; insert into hht select uuid(), cast((rand() * 100) as int) from hht; insert into hht select uuid(), cast((rand() * 100) as int) from hht; insert into hht select uuid(), cast((rand() * 100) as int) from hht; set hive.fetch.task.conversion=none; select * from hht limit 10; +----------+------------+ | hht.key | hht.value | +----------+------------+ +----------+------------+ No rows selected (5.22 seconds) {code} This is caused by GlobalLimitOptimizer. The table directory is always empty with a non-native table, since the data is not managed by Hive (but by HBase in this case). The optimizer scans the directory and sets the file list to an empty list. -- This message was sent by Atlassian Jira (v8.3.4#803005)
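The root cause above suggests a missing guard in the optimizer: the directory-based file-count shortcut is only meaningful for native tables, because only those keep their data under the Hive table directory. A minimal sketch of that guard (class and method names are hypothetical, not Hive's actual API):

```java
// Hypothetical sketch: the global-limit shortcut should not trust a scan of
// the table directory when a storage handler (e.g. HBase) owns the data.
public class GlobalLimitGuard {

    /** A table is non-native when a storage handler manages its data. */
    static boolean isNonNative(String storageHandlerClass) {
        return storageHandlerClass != null && !storageHandlerClass.isEmpty();
    }

    /** The directory-based file list is only meaningful for native tables. */
    static boolean canUseDirectoryScan(String storageHandlerClass) {
        return !isNonNative(storageHandlerClass);
    }

    public static void main(String[] args) {
        // native managed/external table: directory scan is fine
        System.out.println(canUseDirectoryScan(null));
        // HBase-backed table: the directory is empty, skip the optimization
        System.out.println(canUseDirectoryScan(
                "org.apache.hadoop.hive.hbase.HBaseStorageHandler"));
    }
}
```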
[jira] [Created] (HIVE-25033) HPL/SQL thrift call fails when returning null
Attila Magyar created HIVE-25033: Summary: HPL/SQL thrift call fails when returning null Key: HIVE-25033 URL: https://issues.apache.org/jira/browse/HIVE-25033 Project: Hive Issue Type: Sub-task Components: hpl/sql Affects Versions: 4.0.0 Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25004) HPL/SQL subsequent statements are failing after typing a malformed input in beeline
Attila Magyar created HIVE-25004: Summary: HPL/SQL subsequent statements are failing after typing a malformed input in beeline Key: HIVE-25004 URL: https://issues.apache.org/jira/browse/HIVE-25004 Project: Hive Issue Type: Bug Components: hpl/sql Affects Versions: 4.0.0 Reporter: Attila Magyar Fix For: 4.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24997) HPL/SQL udf doesn't work in tez container mode
Attila Magyar created HIVE-24997: Summary: HPL/SQL udf doesn't work in tez container mode Key: HIVE-24997 URL: https://issues.apache.org/jira/browse/HIVE-24997 Project: Hive Issue Type: Sub-task Components: hpl/sql Affects Versions: 4.0.0 Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24813) thrift regeneration is failing with cannot find symbol TABLE_IS_CTAS
Attila Magyar created HIVE-24813: Summary: thrift regeneration is failing with cannot find symbol TABLE_IS_CTAS Key: HIVE-24813 URL: https://issues.apache.org/jira/browse/HIVE-24813 Project: Hive Issue Type: Bug Components: Standalone Metastore Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 {code:java} [ERROR] /Users/amagyar/development/hive/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java:[2145,34] cannot find symbol [ERROR] symbol: variable TABLE_IS_CTAS [ERROR] location: class org.apache.hadoop.hive.metastore.HMSHandler [ERROR] /Users/amagyar/development/hive/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDefaultTransformer.java:[591,58] cannot find symbol [ERROR] symbol: variable TABLE_IS_CTAS [ERROR] location: class org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer [ERROR] -> [Help 1] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24715) Increase bucketId range
Attila Magyar created HIVE-24715: Summary: Increase bucketId range Key: HIVE-24715 URL: https://issues.apache.org/jira/browse/HIVE-24715 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24696) Drop procedure and drop package syntax for HPLSQL
Attila Magyar created HIVE-24696: Summary: Drop procedure and drop package syntax for HPLSQL Key: HIVE-24696 URL: https://issues.apache.org/jira/browse/HIVE-24696 Project: Hive Issue Type: Sub-task Components: hpl/sql Reporter: Attila Magyar -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24625) CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect directory
Attila Magyar created HIVE-24625: Summary: CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect directory Key: HIVE-24625 URL: https://issues.apache.org/jira/browse/HIVE-24625 Project: Hive Issue Type: Bug Components: HiveServer2, Metastore Reporter: Attila Magyar Assignee: Attila Magyar -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair
Attila Magyar created HIVE-24584: Summary: IndexOutOfBoundsException from Kryo when running msck repair Key: HIVE-24584 URL: https://issues.apache.org/jira/browse/HIVE-24584 Project: Hive Issue Type: Bug Reporter: Attila Magyar Assignee: Attila Magyar The following exception occurs when running "msck repair table t1 sync partitions". {code:java} java.lang.IndexOutOfBoundsException: Index: 97, Size: 0 at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232] at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232] at org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] at org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834) ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814) ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116) [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88) [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24427) HPL/SQL improvements
Attila Magyar created HIVE-24427: Summary: HPL/SQL improvements Key: HIVE-24427 URL: https://issues.apache.org/jira/browse/HIVE-24427 Project: Hive Issue Type: Improvement Components: hpl/sql Reporter: Attila Magyar Assignee: Attila Magyar -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24383) Add Table type to HPL/SQL
Attila Magyar created HIVE-24383: Summary: Add Table type to HPL/SQL Key: HIVE-24383 URL: https://issues.apache.org/jira/browse/HIVE-24383 Project: Hive Issue Type: Improvement Components: hpl/sql Reporter: Attila Magyar Assignee: Attila Magyar -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24346) Store HPL/SQL packages into HMS
Attila Magyar created HIVE-24346: Summary: Store HPL/SQL packages into HMS Key: HIVE-24346 URL: https://issues.apache.org/jira/browse/HIVE-24346 Project: Hive Issue Type: New Feature Components: hpl/sql, Metastore Reporter: Attila Magyar Assignee: Attila Magyar -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24338) HPL/SQL missing features
Attila Magyar created HIVE-24338: Summary: HPL/SQL missing features Key: HIVE-24338 URL: https://issues.apache.org/jira/browse/HIVE-24338 Project: Hive Issue Type: Improvement Components: hpl/sql Reporter: Attila Magyar Assignee: Attila Magyar There are some features which are supported by Oracle's PL/SQL but not by HPL/SQL. This Jira is about prioritizing them and investigating the feasibility of the implementation. * ForAll syntax like: ForAll j in i..j save exceptions * Bulk collect: Fetch cursor Bulk Collect Into list Limit n; * Type declaration: Type T_cab is TABLE of * TABLE datatype * GOTO and LABEL * Global variables like $$PLSQL_UNIT and others * Named parameters func(name1 => value1, name2 => value2); * Built in functions: trunc, lpad, to_date, ltrim, rtrim, sysdate -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24315) Improve validation and semantic analysis in HPL/SQL
Attila Magyar created HIVE-24315: Summary: Improve validation and semantic analysis in HPL/SQL Key: HIVE-24315 URL: https://issues.apache.org/jira/browse/HIVE-24315 Project: Hive Issue Type: Improvement Components: hpl/sql Reporter: Attila Magyar Assignee: Attila Magyar There are some known issues that need to be fixed. For example, it seems that the arity of a function is not checked when calling it, and the same is true for parameter types. Calling an undefined function is evaluated to null, and sometimes it seems that incorrect syntax is silently ignored. In cases like this a helpful error message would be expected, though we should also consider how PL/SQL works and maintain compatibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24230) Integrate HPL/SQL into HiveServer2
Attila Magyar created HIVE-24230: Summary: Integrate HPL/SQL into HiveServer2 Key: HIVE-24230 URL: https://issues.apache.org/jira/browse/HIVE-24230 Project: Hive Issue Type: Bug Components: HiveServer2, hpl/sql Reporter: Attila Magyar Assignee: Attila Magyar HPL/SQL is a standalone command line program that can store and load scripts from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL depends on Hive and not the other way around. Changing the dependency order between HPL/SQL and HiveServer would open up some possibilities which are currently not feasible to implement. For example one might want to use a third party SQL tool to run selects on stored procedure (or rather function in this case) outputs. {code:java} SELECT * from myStoredProcedure(1, 2); {code} HPL/SQL doesn’t have a JDBC interface and it’s not a daemon, so this would not work with the current architecture. Another important factor is performance. Declarative SQL commands are sent to Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC and use HiveServer’s internal API for compilation and execution. The third factor is that existing tools like Beeline or Hue cannot be used with HPL/SQL since it has its own, separate CLI. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
Attila Magyar created HIVE-24217: Summary: HMS storage backend for HPL/SQL stored procedures Key: HIVE-24217 URL: https://issues.apache.org/jira/browse/HIVE-24217 Project: Hive Issue Type: Bug Components: Hive, hpl/sql, Metastore Reporter: Attila Magyar Assignee: Attila Magyar HPL/SQL procedures are currently stored in text files. The goal of this Jira is to implement a Metastore backend for storing and loading these procedures. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24149) HiveStreamingConnection doesn't close HMS connection
Attila Magyar created HIVE-24149: Summary: HiveStreamingConnection doesn't close HMS connection Key: HIVE-24149 URL: https://issues.apache.org/jira/browse/HIVE-24149 Project: Hive Issue Type: Bug Components: Hive Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 There are 3 HMS connections used by HiveStreamingConnection: one for transactions, one for heartbeat, and one for notifications. The close method only closes the first two, leaving the last one open, which eventually overloads HMS until it becomes unresponsive. -- This message was sent by Atlassian Jira (v8.3.4#803005)
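The fix amounts to closing the third client as well. A minimal sketch of the leak and its repair, with made-up class names standing in for the three metastore clients the real connection holds:

```java
// Hypothetical sketch: close() must release all three metastore clients,
// not just the transaction and heartbeat ones. Names are illustrative.
import java.util.concurrent.atomic.AtomicInteger;

public class StreamingConnectionSketch implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();

    /** Stand-in for an HMS client connection. */
    static final class HmsClient implements AutoCloseable {
        HmsClient() { OPEN.incrementAndGet(); }
        @Override public void close() { OPEN.decrementAndGet(); }
    }

    private final HmsClient txClient = new HmsClient();        // transactions
    private final HmsClient heartbeatClient = new HmsClient(); // heartbeats
    private final HmsClient notifClient = new HmsClient();     // notifications

    @Override
    public void close() {
        txClient.close();
        heartbeatClient.close();
        notifClient.close(); // the connection the bug report says was leaked
    }
}
```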
[jira] [Created] (HIVE-24137) Race condition when copying llap.tar.gz by multiple HSI
Attila Magyar created HIVE-24137: Summary: Race condition when copying llap.tar.gz by multiple HSI Key: HIVE-24137 URL: https://issues.apache.org/jira/browse/HIVE-24137 Project: Hive Issue Type: Bug Components: llap Reporter: Attila Magyar When both HSIs are started simultaneously, one of them fails to start. This seems to be because multiple HSIs start at the same time and there is a race condition in DFSClient when copying the llap tar package to HDFS. Restarting them one after another resolves the issue, or a second restart might help. But for a long-term fix, we would need to fix llap-server/src/main/resources/templates.py and retry copyFromLocal. -- This message was sent by Atlassian Jira (v8.3.4#803005)
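The long-term fix suggested above would wrap the copyFromLocal step in a retry, tolerating the transient failure when two HSIs race on the same HDFS path. A hedged sketch (the interface and names are illustrative; the real change would live in templates.py's generated logic):

```java
// Hypothetical retry wrapper for a racy copy-to-HDFS step.
import java.io.IOException;

public class RetryCopy {
    interface CopyAction { void run() throws IOException; }

    /** Retry the copy up to maxAttempts times, backing off between tries. */
    static void copyWithRetry(CopyAction copy, int maxAttempts, long backoffMs)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                copy.run();
                return; // success
            } catch (IOException e) {
                last = e; // another HSI may be writing the file; wait and retry
                if (attempt < maxAttempts) Thread.sleep(backoffMs);
            }
        }
        throw last; // all attempts failed
    }
}
```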
[jira] [Created] (HIVE-23957) Limit followed by TopNKey improvement
Attila Magyar created HIVE-23957: Summary: Limit followed by TopNKey improvement Key: HIVE-23957 URL: https://issues.apache.org/jira/browse/HIVE-23957 Project: Hive Issue Type: Improvement Reporter: Attila Magyar Assignee: Attila Magyar The Limit + topnkey pushdown might result in a limit operator followed by a TNK in the physical plan. This likely makes the TNK unnecessary in cases like this. We need to investigate if/when we can remove the TNK. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23937) Take null ordering into consideration when pushing TNK through inner joins
Attila Magyar created HIVE-23937: Summary: Take null ordering into consideration when pushing TNK through inner joins Key: HIVE-23937 URL: https://issues.apache.org/jira/browse/HIVE-23937 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 4.0.0 Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23817) Pushing TopN Key operator PKFK inner joins
Attila Magyar created HIVE-23817: Summary: Pushing TopN Key operator PKFK inner joins Key: HIVE-23817 URL: https://issues.apache.org/jira/browse/HIVE-23817 Project: Hive Issue Type: Improvement Reporter: Attila Magyar Assignee: Attila Magyar If there is a primary key/foreign key relationship between the tables, we can push the topnkey operator through the join. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23757) Pushing TopN Key operator through MAPJOIN
Attila Magyar created HIVE-23757: Summary: Pushing TopN Key operator through MAPJOIN Key: HIVE-23757 URL: https://issues.apache.org/jira/browse/HIVE-23757 Project: Hive Issue Type: Improvement Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 Attachments: HIVE-23757.1.patch So far only MERGEJOIN + JOIN cases are handled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23723) Limit operator pushdown through LOJ
Attila Magyar created HIVE-23723: Summary: Limit operator pushdown through LOJ Key: HIVE-23723 URL: https://issues.apache.org/jira/browse/HIVE-23723 Project: Hive Issue Type: Improvement Components: Hive Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 Limit operator (without an order by) can be pushed through SELECTS and LEFT OUTER JOINs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Review Request 72570: HiveProtoLogger should carry out JSON conversion in its own thread
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72570/ --- Review request for hive, Ashutosh Chauhan and Rajesh Balamohan. Bugs: HIVE-23277 https://issues.apache.org/jira/browse/HIVE-23277 Repository: hive-git Description --- This is to avoid JSON serialization being in the hotpath of the compiler thread. In short queries, where subsecond latency matters, this becomes a measurable overhead, growing with query complexity. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java 750abcb6a61 ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveHookEventProtoPartialBuilder.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 86a68008515 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/StageIDsRearranger.java 6c874754a1f ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveHookEventProtoPartialBuilder.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java add4b6863d8 Diff: https://reviews.apache.org/r/72570/diff/1/ Testing --- pending Thanks, Attila Magyar
[jira] [Created] (HIVE-23580) deleteOnExit set is not cleaned up, causing memory pressure
Attila Magyar created HIVE-23580: Summary: deleteOnExit set is not cleaned up, causing memory pressure Key: HIVE-23580 URL: https://issues.apache.org/jira/browse/HIVE-23580 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 4.0.0 Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 removeScratchDir doesn't always call cancelDeleteOnExit() on context::clear -- This message was sent by Atlassian Jira (v8.3.4#803005)
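The pairing the report asks for can be sketched like this (names are illustrative; in Hive the bookkeeping is the file system's deleteOnExit set, driven from the query Context):

```java
// Hypothetical sketch: every removeScratchDir must also cancel the
// deleteOnExit registration, or the set grows for the process lifetime.
import java.util.HashSet;
import java.util.Set;

public class ScratchDirTracker {
    final Set<String> deleteOnExit = new HashSet<>();

    void createScratchDir(String path) {
        deleteOnExit.add(path); // registered for cleanup at shutdown
    }

    void removeScratchDir(String path) {
        // ... delete the directory itself, then drop the bookkeeping entry.
        // Skipping this second step is what causes the memory pressure.
        cancelDeleteOnExit(path);
    }

    void cancelDeleteOnExit(String path) {
        deleteOnExit.remove(path);
    }
}
```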
[jira] [Created] (HIVE-23518) Tez may skip file permission update on intermediate output
Attila Magyar created HIVE-23518: Summary: Tez may skip file permission update on intermediate output Key: HIVE-23518 URL: https://issues.apache.org/jira/browse/HIVE-23518 Project: Hive Issue Type: Bug Reporter: Attila Magyar Assignee: Attila Magyar Before updating file permissions, Tez checks whether the permission change is needed with the following conditional: {code:java} if (!SPILL_FILE_PERMS.equals(SPILL_FILE_PERMS.applyUMask(FsPermission.getUMask(conf)))) { rfs.setPermission(filename, SPILL_FILE_PERMS); } {code} If the config object is changed in the background, then the setPermission() call will be skipped. The rfs file system is always a local file system, so there is no need to do this check beforehand (calling setPermission unconditionally doesn't generate an additional NameNode call). {code:java} rfs = ((LocalFileSystem)FileSystem.getLocal(this.conf)).getRaw(); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
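Since rfs is always local, the proposed simplification is to set the permission unconditionally instead of comparing against a umask that a concurrent config change can alter. A hedged sketch (names and the permission string are illustrative stand-ins for the real FsPermission-based call):

```java
// Hypothetical sketch: drop the umask pre-check and always apply the
// spill file permission on the local file system.
public class SpillPerms {
    static final String SPILL_FILE_PERMS = "rw-------"; // 0600, illustrative
    static String lastApplied;

    // stands in for rfs.setPermission(filename, SPILL_FILE_PERMS)
    static void setPermission(String perms) { lastApplied = perms; }

    /** Fixed version: no pre-check that a concurrent conf change can defeat. */
    static void secureSpillFile() {
        setPermission(SPILL_FILE_PERMS);
    }
}
```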
[jira] [Created] (HIVE-23500) [Kubernetes] Use Extend NodeId for LLAP registration
Attila Magyar created HIVE-23500: Summary: [Kubernetes] Use Extend NodeId for LLAP registration Key: HIVE-23500 URL: https://issues.apache.org/jira/browse/HIVE-23500 Project: Hive Issue Type: Bug Components: llap Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 In a Kubernetes environment, where pods can have the same host name and port, there can be situations where node trackers retain an old instance of the pod in their cache. In the case of Hive LLAP, where the llap tez task scheduler maintains the membership of nodes based on zookeeper registry events, a NODE_ADDED followed by a NODE_REMOVED event could end up removing the node/host from node trackers because of the stable hostname and service port. The NODE_REMOVED event in this case is a stale event of the already dead pod, but ZK will send it only after the session timeout (in case of a non-graceful shutdown). If this sequence of events happens, a node/host is completely lost from the scheduler's perspective. To support this scenario, tez can extend yarn's NodeId to include a uniqueIdentifier. The llap task scheduler can construct the container object with this new NodeId that includes the uniqueIdentifier as well, so that stale events like the above will only remove the host/node that matches the old uniqueIdentifier. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23469) Use hostname + pod UID for shuffle manager caching
Attila Magyar created HIVE-23469: Summary: Use hostname + pod UID for shuffle manager caching Key: HIVE-23469 URL: https://issues.apache.org/jira/browse/HIVE-23469 Project: Hive Issue Type: Bug Components: Tez Reporter: Attila Magyar Assignee: Attila Magyar When a pod restarts, it uses the same hostname and shuffle port. Now when fetcher threads connect to download the shuffle data, they will use the cached connection info, and since the pod has died, its shuffle data will also have been cleaned up. When the pod restarts, it receives connections from clients to download specific shuffle data, but the daemon will not have it because of the restart. In ShuffleManager.java's knownSrcHosts the key should be updated to HostInfo, which is a combination of host+port and the host's unique ID. The host Id changes when a node is killed or restarted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
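The proposed key can be sketched as a small value class: by folding the pod's unique ID into equals/hashCode, a restarted pod with the same hostname and port hashes to a fresh cache entry. This is an illustrative sketch, not the actual Tez HostInfo type:

```java
// Hypothetical cache key: host + port + unique pod/host id, so a restarted
// pod with the same hostname and port does not hit the stale entry.
import java.util.Objects;

public final class HostInfo {
    final String host;
    final int port;
    final String uniqueId; // changes when the pod is killed or restarted

    HostInfo(String host, int port, String uniqueId) {
        this.host = host;
        this.port = port;
        this.uniqueId = uniqueId;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof HostInfo)) return false;
        HostInfo h = (HostInfo) o;
        return port == h.port && host.equals(h.host) && uniqueId.equals(h.uniqueId);
    }

    @Override public int hashCode() {
        return Objects.hash(host, port, uniqueId);
    }
}
```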
Re: Review Request 72437: Reduce number of DB calls in ObjectStore::getPartitionsByExprInternal
> On May 5, 2020, 5:04 a.m., Ashutosh Chauhan wrote: > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java > > Line 832 (original), 830 (patched) > > <https://reviews.apache.org/r/72437/diff/3/?file=2230109#file2230109line839> > > > > isView only used for this check here, which can be eliminated. Not sure what you mean by eliminating it? Removing it altogether? - Attila --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72437/#review220616 --- On April 27, 2020, 9:15 a.m., Attila Magyar wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72437/ > --- > > (Updated April 27, 2020, 9:15 a.m.) > > > Review request for hive, Ashutosh Chauhan, Rajesh Balamohan, and Vineet Garg. > > > Bugs: HIVE-23282 > https://issues.apache.org/jira/browse/HIVE-23282 > > > Repository: hive-git > > > Description > --- > > ObjectStore::getPartitionsByExprInternal internally uses Table information > for getting partitionKeys, table, catalog name. > > > > For this, it ends up populating entire table data from DB (including skew > column, parameters, sort, bucket cols etc). This makes it a lot more > expensive call. It would be good to check if MTable itself can be used > instead of Table. > > > Diffs > - > > > ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java > 4f58cd91efc > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java > d1558876f14 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java > 53b7a67a429 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java > 9834883f00f > > > Diff: https://reviews.apache.org/r/72437/diff/4/ > > > Testing > --- > > > Thanks, > > Attila Magyar > >
[jira] [Created] (HIVE-23305) NullPointerException in LlapTaskSchedulerService addNode due to race condition
Attila Magyar created HIVE-23305: Summary: NullPointerException in LlapTaskSchedulerService addNode due to race condition Key: HIVE-23305 URL: https://issues.apache.org/jira/browse/HIVE-23305 Project: Hive Issue Type: Bug Components: Hive Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 {code:java} java.lang.NullPointerException at org.apache.hadoop.hive.llap.tezplugins.LlapTaskSchedulerService.addNode(LlapTaskSchedulerService.java:1575) at org.apache.hadoop.hive.llap.tezplugins.LlapTaskSchedulerService.registerAndAddNode(LlapTaskSchedulerService.java:1566) at org.apache.hadoop.hive.llap.tezplugins.LlapTaskSchedulerService.access$1800(LlapTaskSchedulerService.java:128) at org.apache.hadoop.hive.llap.tezplugins.LlapTaskSchedulerService$NodeStateChangeListener.onCreate(LlapTaskSchedulerService.java:831) at org.apache.hadoop.hive.llap.tezplugins.LlapTaskSchedulerService$NodeStateChangeListener.onCreate(LlapTaskSchedulerService.java:823) at org.apache.hadoop.hive.registry.impl.ZkRegistryBase$InstanceStateChangeListener.childEvent(ZkRegistryBase.java:612) at {code} The above exception happens when a node registers too fast, before the activeInstances field is initialized. -- This message was sent by Atlassian Jira (v8.3.4#803005)
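One way to make the listener robust against this race is to guard addNode against the not-yet-initialized field. This sketch uses illustrative names; the actual fix might instead initialize activeInstances before the ZK listener is registered:

```java
// Hypothetical sketch: tolerate a ZK child event that arrives before the
// scheduler finished initializing its instance map.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SchedulerInitSketch {
    private volatile Map<String, Object> activeInstances; // null until start()

    void start() {
        activeInstances = new ConcurrentHashMap<>();
    }

    /** Returns false when the registration event arrives before init. */
    boolean addNode(String name, Object node) {
        Map<String, Object> m = activeInstances; // single volatile read
        if (m == null) {
            return false; // previously: NullPointerException
        }
        m.put(name, node);
        return true;
    }
}
```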
[jira] [Created] (HIVE-23295) Possible NPE when on getting predicate literal list when dynamic values are not available
Attila Magyar created HIVE-23295: Summary: Possible NPE when on getting predicate literal list when dynamic values are not available Key: HIVE-23295 URL: https://issues.apache.org/jira/browse/HIVE-23295 Project: Hive Issue Type: Bug Components: storage-api Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 getLiteralList() in SearchArgumentImpl$PredicateLeafImpl returns null if dynamic values are not available. There are multiple call sites where the return value is used without a null check. E.g: leaf.getLiteralList().stream(). -- This message was sent by Atlassian Jira (v8.3.4#803005)
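Call sites can be made null-safe with a small helper around the possibly-null return value; this is just a sketch of the defensive pattern, not the actual patch:

```java
// Hypothetical helper: wrap PredicateLeaf#getLiteralList()'s result so that
// chained calls like .stream() never dereference null.
import java.util.Collections;
import java.util.List;

public class SafeLiterals {
    /** Null-safe view of a literal list that may be absent (dynamic values). */
    static <T> List<T> literalListOrEmpty(List<T> fromLeaf) {
        return fromLeaf == null ? Collections.<T>emptyList() : fromLeaf;
    }
}
```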
[jira] [Created] (HIVE-23253) Synchronization between external SerDe schemas and Metastore
Attila Magyar created HIVE-23253: Summary: Synchronization between external SerDe schemas and Metastore Key: HIVE-23253 URL: https://issues.apache.org/jira/browse/HIVE-23253 Project: Hive Issue Type: Bug Components: Hive, Metastore Affects Versions: 3.1.2 Reporter: Attila Magyar Fix For: 3.0.0 In HIVE-15995 an ALTER UPDATE COLUMNS statement was introduced to sync external SerDe schema changes with the metastore. This command can only be invoked manually. See the documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionUpdatecolumns Maybe it would make sense to run an update columns automatically in certain cases, to prevent problems coming from cases where the user forgets to run the update columns manually. One way to reproduce the issue is to change the schema url via an alter table statement. {code:java} [root@c7401 vagrant]# cat test_schema1.avsc { "type":"record", "name":"test_schema", "namespace":"gdc_datascience_qa", "fields":[ { "name":"name", "type":[ "null", "string" ], "default":null } ] } [root@c7401 vagrant]# cat test_schema2.avsc { "type":"record", "name":"test_schema", "namespace":"gdc_datascience_qa", "fields":[ { "name":"name", "type":[ "null", "string" ], "default":null }, { "name":"last_name", "type":[ "null", "string" ], "default":null } ] } {code} {code:java} $ hadoop fs -copyFromLocal *.avsc /tmp/ [beeline] create external table t1 stored as avro tblproperties ('avro.schema.url'='/tmp/test_schema1.avsc'); [beeline] alter table t1 set tblproperties('avro.schema.url'='/tmp/test_schema2.avsc'); [beeline] insert into t1 values ('n1', 'l1'); [beeline] create external table t2 stored as avro tblproperties ('avro.schema.url'='/tmp/test_schema2.avsc'); [beeline] insert into t2 values ('n2', 'l2'); [beeline] insert overwrite table t1 select * from t2; {code} Error: {code:java} MetaException(message:Column last_name doesn't exist in table t1 in database default) at
org.apache.hadoop.hive.metastore.ObjectStore.validateTableCols(ObjectStore.java:8652) at org.apache.hadoop.hive.metastore.ObjectStore.getMTableColumnStatistics(ObjectStore.java:8602) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionColStats(ObjectStore.java:8416) at org.apache.hadoop.hive.metastore.ObjectStore.updateTableColumnStatistics(ObjectStore.java:8446 {code} Running an ALTER UPDATE COLUMNS fixes the problem. cc: [~szita] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23056) LLAP registry getAll doesn't filter compute groups
Attila Magyar created HIVE-23056: Summary: LLAP registry getAll doesn't filter compute groups Key: HIVE-23056 URL: https://issues.apache.org/jira/browse/HIVE-23056 Project: Hive Issue Type: Bug Components: llap Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 ZkRegistryBase's InstanceStateChangeListener gets notified every time a new node is added/removed, even when the node doesn't belong to the same compute group as the registry. These znodes are stored internally and returned by getAll(). This causes query coordinators to assign tasks to executors that are in different compute groups. -- This message was sent by Atlassian Jira (v8.3.4#803005)
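The fix direction is to filter by compute group before returning instances. A hedged sketch (the node representation and the attribute name are made up for illustration; the real code works on registry service records):

```java
// Hypothetical sketch: getAll() should only return instances whose compute
// group matches the registry's own group.
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ComputeGroupFilter {
    /** Keep only registered instances belonging to the given compute group. */
    static List<Map<String, String>> getAllForGroup(
            List<Map<String, String>> znodes, String computeGroup) {
        return znodes.stream()
                .filter(n -> computeGroup.equals(n.get("compute-group")))
                .collect(Collectors.toList());
    }
}
```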
Re: Review Request 72200: TopN Key efficiency check might disable filter too soon
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72200/ --- (Updated March 6, 2020, 12:44 p.m.) Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and Rajesh Balamohan. Bugs: HIVE-22982 https://issues.apache.org/jira/browse/HIVE-22982 Repository: hive-git Description --- The check is triggered after every n batches but there can be multiple filters, one for each partition. Some filters might have less data than the others. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 12f4822e381 ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java f09867bb4e8 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 0f8eb173c66 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperBatch.java b487480b938 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneralComparator.java 06ac661028f ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java ddd657e5552 ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java a91bc7354a7 Diff: https://reviews.apache.org/r/72200/diff/2/ Changes: https://reviews.apache.org/r/72200/diff/1-2/ Testing --- manually Thanks, Attila Magyar
Review Request 72200: TopN Key efficiency check might disable filter too soon
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72200/ --- Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and Rajesh Balamohan. Bugs: HIVE-22982 https://issues.apache.org/jira/browse/HIVE-22982 Repository: hive-git Description --- The check is triggered after every n batches but there can be multiple filters, one for each partition. Some filters might have less data than the others. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7ea2de9019c ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java f09867bb4e8 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 0f8eb173c66 ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java a91bc7354a7 Diff: https://reviews.apache.org/r/72200/diff/1/ Testing --- manually Thanks, Attila Magyar
[jira] [Created] (HIVE-22982) TopN Key efficiency check might disable filter too soon
Attila Magyar created HIVE-22982: Summary: TopN Key efficiency check might disable filter too soon Key: HIVE-22982 URL: https://issues.apache.org/jira/browse/HIVE-22982 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 4.0.0 Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 The check is triggered after every n batches but there can be multiple filters, one for each partition. Some filters might have less data than the others. -- This message was sent by Atlassian Jira (v8.3.4#803005)
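The premature-disable problem described in HIVE-22982 can be sketched as follows: instead of one global check after every n batches, each partition's filter keeps its own counters and is only judged once it has seen enough rows of its own. This is a minimal illustrative sketch, not Hive's actual `TopNKeyFilter` API; the class names and thresholds are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a per-partition efficiency check: each partition key gets its
// own filter, and a filter is only evaluated (and possibly disabled) after
// IT has seen CHECK_EVERY_N_ROWS rows, not after the operator as a whole
// has processed n batches. Names and thresholds are hypothetical.
public class PerPartitionEfficiencyCheck {
    static final long CHECK_EVERY_N_ROWS = 1000;
    static final double MAX_FORWARD_RATIO = 0.8;

    static class PartitionFilter {
        long total;
        long forwarded;
        boolean disabled;

        boolean tryForward(boolean wouldForward) {
            if (disabled) return true; // disabled filter passes everything through
            total++;
            if (wouldForward) forwarded++;
            if (total % CHECK_EVERY_N_ROWS == 0
                    && (double) forwarded / total > MAX_FORWARD_RATIO) {
                disabled = true; // this partition's filter is not helping
            }
            return disabled || wouldForward;
        }
    }

    final Map<String, PartitionFilter> filters = new HashMap<>();

    boolean process(String partitionKey, boolean wouldForward) {
        return filters.computeIfAbsent(partitionKey, k -> new PartitionFilter())
                      .tryForward(wouldForward);
    }
}
```

With this shape, a partition that has seen only a handful of rows is never disabled on the strength of another partition's statistics.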
[jira] [Created] (HIVE-22974) Metastore's table location check should be optional
Attila Magyar created HIVE-22974: Summary: Metastore's table location check should be optional Key: HIVE-22974 URL: https://issues.apache.org/jira/browse/HIVE-22974 Project: Hive Issue Type: Bug Components: Metastore Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 In HIVE-22189 a check was introduced to make sure managed and external tables are located at the proper space. This condition cannot be satisfied during an upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22960) Approximate TopN Key Operator
Attila Magyar created HIVE-22960: Summary: Approximate TopN Key Operator Key: HIVE-22960 URL: https://issues.apache.org/jira/browse/HIVE-22960 Project: Hive Issue Type: Bug Components: Hive Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 Attachments: Screen Shot 2020-03-02 at 4.55.46 PM.png ??Different from other operators like join and group by, the top n operator demonstrates a notable “long tail” characteristic: the top n array/heap will saturate very quickly. Updates are pretty frequent at the beginning and then become very infrequent. The approximation can be implemented in two ways: one way is to stop the array/heap update after a certain percentage of the data has been read, for example 10% or 20%, if we know the table size. The other way is to set a frequency threshold for the array/heap update. After the threshold is met, stop the top n processing.?? [~rzhappy] -- This message was sent by Atlassian Jira (v8.3.4#803005)
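The second strategy quoted above (freeze the structure once updates become rare) can be sketched as follows. This is only an illustrative sketch of the idea under stated assumptions; the class, the gap threshold, and the use of a `TreeSet` are hypothetical and not Hive's implementation.

```java
import java.util.TreeSet;

// Sketch of the "frequency threshold" approximation: once the gap between
// two successive updates of the top-n structure exceeds a threshold, we
// assume the long tail has been reached and stop updating. All names and
// thresholds are illustrative.
public class ApproximateTopN {
    final int n;
    final TreeSet<Integer> smallest = new TreeSet<>(); // current top-n smallest keys
    long rowsSeen;
    long lastUpdateAt;
    boolean frozen; // true once updates have become too rare to bother
    static final long UPDATE_GAP_THRESHOLD = 10_000;

    ApproximateTopN(int n) { this.n = n; }

    void offer(int value) {
        rowsSeen++;
        if (frozen) return; // approximate: skip all further maintenance
        if (smallest.size() < n || value < smallest.last()) {
            smallest.add(value);
            if (smallest.size() > n) smallest.pollLast();
            lastUpdateAt = rowsSeen;
        } else if (rowsSeen - lastUpdateAt > UPDATE_GAP_THRESHOLD) {
            frozen = true; // update frequency has diverged to near zero
        }
    }
}
```

The trade-off is exactly the one in the quote: after freezing, a late-arriving small key is missed, which is acceptable when the filter only prunes rows ahead of a final exact limit.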
Re: Review Request 71783: Implement TopNKeyFilter efficiency check
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71783/ --- (Updated Feb. 27, 2020, 12:57 p.m.) Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and Panos Garefalakis. Bugs: HIVE-22925 https://issues.apache.org/jira/browse/HIVE-22925 Repository: hive-git Description --- In certain cases the TopNKey filter might work in an inefficient way and add extra CPU overhead. For example, if the rows are coming in descending order but the filter wants the top N smallest elements, the filter will forward everything. Inefficiency should be detected at runtime so that the filter can be disabled if the ratio forwarded_rows/total_rows is too high. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e419dc5eb3b ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 38d2e08b760 ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java dd66dfcd72e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 7feadd3137d ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java 3869ffa2b83 ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 31735c9ea3d ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java 19910a341e0 ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java 95cd45978a8 Diff: https://reviews.apache.org/r/71783/diff/3/ Changes: https://reviews.apache.org/r/71783/diff/2-3/ Testing --- on dwx Thanks, Attila Magyar
Review Request 71783: Implement TopNKeyFilter efficiency check
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71783/ --- Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and Panos Garefalakis. Bugs: HIVE-22925 https://issues.apache.org/jira/browse/HIVE-22925 Repository: hive-git Description --- In certain cases the TopNKey filter might work in an inefficient way and add extra CPU overhead. For example, if the rows are coming in descending order but the filter wants the top N smallest elements, the filter will forward everything. Inefficiency should be detected at runtime so that the filter can be disabled if the ratio forwarded_rows/total_rows is too high. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e419dc5eb3b ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 38d2e08b760 ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java dd66dfcd72e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 7feadd3137d ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java 3869ffa2b83 ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 31735c9ea3d ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java 19910a341e0 ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java 95cd45978a8 Diff: https://reviews.apache.org/r/71783/diff/1/ Testing --- on dwx Thanks, Attila Magyar
[jira] [Created] (HIVE-22925) Implement TopNKeyFilter efficiency check
Attila Magyar created HIVE-22925: Summary: Implement TopNKeyFilter efficiency check Key: HIVE-22925 URL: https://issues.apache.org/jira/browse/HIVE-22925 Project: Hive Issue Type: Bug Components: Hive Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 In certain cases the TopNKey filter might work in an inefficient way and add extra CPU overhead. For example, if the rows are coming in descending order but the filter wants the top N smallest elements, the filter will forward everything. Inefficiency should be detected at runtime so that the filter can be disabled if the ratio forwarded_rows/total_rows is too high. -- This message was sent by Atlassian Jira (v8.3.4#803005)
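The runtime check described in HIVE-22925 boils down to counting forwarded rows against total rows and disabling the filter when the ratio is too high. A minimal sketch, with illustrative (not Hive's actual) names and thresholds:

```java
// Sketch of the efficiency check: record each row's forward decision and
// periodically disable the filter when it forwards nearly everything,
// since then it only burns CPU without pruning. The class name, threshold,
// and check interval are hypothetical.
public class TopNKeyEfficiencyCheck {
    static final float MAX_EFFICIENT_RATIO = 0.95f;
    static final long CHECK_EVERY_N_ROWS = 10_000;

    long totalRows;
    long forwardedRows;
    boolean disabled;

    void record(boolean forwarded) {
        totalRows++;
        if (forwarded) forwardedRows++;
        if (totalRows % CHECK_EVERY_N_ROWS == 0) {
            float ratio = (float) forwardedRows / totalRows;
            if (ratio > MAX_EFFICIENT_RATIO) {
                // e.g. descending input while the filter wants top-N smallest:
                // everything is forwarded, so the filter cannot help.
                disabled = true;
            }
        }
    }
}
```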
Re: Review Request 72113: DML execution on TEZ always outputs the message 'No rows affected'
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72113/ --- (Updated Feb. 13, 2020, 3:40 p.m.) Review request for hive, Laszlo Bodor, Mustafa Iman, Panos Garefalakis, and Ramesh Kumar Thangarajan. Bugs: HIVE-22870 https://issues.apache.org/jira/browse/HIVE-22870 Repository: hive-git Description --- Executing an update or insert statement in beeline doesn't show the actual number of rows inserted/updated. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java 25dd970a9b1 ql/src/test/results/clientpositive/llap/orc_llap_counters.q.out 9c5695ae603 ql/src/test/results/clientpositive/llap/orc_llap_counters1.q.out f9b5f8f0d4d ql/src/test/results/clientpositive/llap/orc_ppd_basic.q.out 9ad0a9b7faf ql/src/test/results/clientpositive/llap/orc_ppd_schema_evol_3a.q.out 3e99e0ee627 ql/src/test/results/clientpositive/llap/retry_failure_reorder.q.out baeac434d79 ql/src/test/results/clientpositive/llap/tez_input_counters.q.out 885cb0a9cba Diff: https://reviews.apache.org/r/72113/diff/2/ Changes: https://reviews.apache.org/r/72113/diff/1-2/ Testing --- with insert and updates Thanks, Attila Magyar
Review Request 72113: DML execution on TEZ always outputs the message 'No rows affected'
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72113/ --- Review request for hive, Laszlo Bodor, Mustafa Iman, Panos Garefalakis, and Ramesh Kumar Thangarajan. Bugs: HIVE-22870 https://issues.apache.org/jira/browse/HIVE-22870 Repository: hive-git Description --- Executing an update or insert statement in beeline doesn't show the actual number of rows inserted/updated. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java 25dd970a9b1 Diff: https://reviews.apache.org/r/72113/diff/1/ Testing --- with insert and updates Thanks, Attila Magyar
[jira] [Created] (HIVE-22870) DML execution on TEZ always outputs the message 'No rows affected'
Attila Magyar created HIVE-22870: Summary: DML execution on TEZ always outputs the message 'No rows affected' Key: HIVE-22870 URL: https://issues.apache.org/jira/browse/HIVE-22870 Project: Hive Issue Type: Bug Reporter: Attila Magyar Assignee: Attila Magyar Executing an update or insert statement in beeline doesn't show the actual number of rows inserted/updated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Review Request 72108: HIVE-22867
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72108/#review219544 --- Ship it! Ship It! - Attila Magyar On Feb. 11, 2020, 9:58 a.m., Krisztian Kasa wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72108/ > --- > > (Updated Feb. 11, 2020, 9:58 a.m.) > > > Review request for hive, Attila Magyar and Jesús Camacho Rodríguez. > > > Bugs: HIVE-22867 > https://issues.apache.org/jira/browse/HIVE-22867 > > > Repository: hive-git > > > Description > --- > > Add partitioning support to VectorTopNKeyOperator > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java bd8ff6285e > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java > f03d65030d > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java > 27ff0c2484 > ql/src/java/org/apache/hadoop/hive/ql/plan/VectorTopNKeyDesc.java > 9a266a0c57 > ql/src/test/queries/clientpositive/subquery_in.q 96ed1bae41 > ql/src/test/queries/clientpositive/subquery_notin.q f25168ab77 > ql/src/test/queries/clientpositive/topnkey_windowing.q a5352d2d6c > ql/src/test/queries/clientpositive/vector_windowing_streaming.q 2f7b628db3 > ql/src/test/queries/clientpositive/windowing_filter.q 14d0c5a7c8 > ql/src/test/results/clientpositive/llap/subquery_in.q.out ea8fe5ea96 > ql/src/test/results/clientpositive/llap/subquery_notin.q.out c24b79db86 > ql/src/test/results/clientpositive/llap/topnkey_windowing.q.out 52ba490c01 > ql/src/test/results/clientpositive/llap/vector_windowing_streaming.q.out > b63bcf47f3 > ql/src/test/results/clientpositive/llap/windowing_filter.q.out 8ef2261755 > ql/src/test/results/clientpositive/topnkey_windowing.q.out c186790bea > > > Diff: https://reviews.apache.org/r/72108/diff/1/ > > > Testing > --- > > mvn test -Dtest.output.overwrite -DskipSparkTests > -Dtest=TestMiniLlapLocalCliDriver > 
-Dqfile=vector_windowing_streaming.q,subquery_notin.q,subquery_in.q,windowing_filter.q,topnkey_windowing.q > -pl itests/qtest -Pitests > > > Thanks, > > Krisztian Kasa > >
Re: Review Request 71995: TopN Key optimizer should use array instead of priority queue
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71995/ --- (Updated Jan. 29, 2020, 2:23 p.m.) Review request for hive, Gopal V, Jesús Camacho Rodríguez, and Krisztian Kasa. Bugs: HIVE-22726 https://issues.apache.org/jira/browse/HIVE-22726 Repository: hive-git Description --- The TopN key optimizer currently uses a priority queue for keeping track of the largest/smallest rows. Its max size is the same as the user-specified limit. This should be replaced with a more cache-line-friendly array with a small (128) maximum size to see how much performance is gained. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e3ee06ab5fa ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 4998766f064 ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java 0ccaeea1da5 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 5faa038c18d ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperBatch.java 0786c82b7be ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneralComparator.java 8cb48473785 ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java a9ff6b4a830 ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ff815434f0c ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java fce850f4fc2 Diff: https://reviews.apache.org/r/71995/diff/4/ Changes: https://reviews.apache.org/r/71995/diff/3-4/ Testing --- with the following query: use tpcds_bin_partitioned_orc_100; set hive.optimize.topnkey=true; set hive.optimize.topnkey.max=5; select i_item_id, s_state, grouping(s_state) g_state, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4 from store_sales, customer_demographics, date_dim, store, item where ss_sold_date_sk = d_date_sk and ss_item_sk = i_item_sk and ss_store_sk = s_store_sk and ss_cdemo_sk = cd_demo_sk group by rollup (i_item_id, 
s_state) order by i_item_id ,s_state limit 5; Results: enabled: 5 rows selected (715.26 seconds) enabled: 5 rows selected (605.888 seconds) disabled: 5 rows selected (1208.168 seconds) disabled: 5 rows selected (1219.482 seconds) Thanks, Attila Magyar
Re: Review Request 71995: TopN Key optimizer should use array instead of priority queue
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71995/ --- (Updated Jan. 22, 2020, 9:44 p.m.) Review request for hive, Gopal V, Jesús Camacho Rodríguez, and Krisztian Kasa. Bugs: HIVE-22726 https://issues.apache.org/jira/browse/HIVE-22726 Repository: hive-git Description --- The TopN key optimizer currently uses a priority queue for keeping track of the largest/smallest rows. Its max size is the same as the user-specified limit. This should be replaced with a more cache-line-friendly array with a small (128) maximum size to see how much performance is gained. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b79515fcf07 ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 4998766f064 ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java b7c12502204 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 5faa038c18d ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperBatch.java 0786c82b7be ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneralComparator.java 8cb48473785 ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java ce6efa49192 ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ff815434f0c Diff: https://reviews.apache.org/r/71995/diff/3/ Changes: https://reviews.apache.org/r/71995/diff/2-3/ Testing --- with the following query: use tpcds_bin_partitioned_orc_100; set hive.optimize.topnkey=true; set hive.optimize.topnkey.max=5; select i_item_id, s_state, grouping(s_state) g_state, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4 from store_sales, customer_demographics, date_dim, store, item where ss_sold_date_sk = d_date_sk and ss_item_sk = i_item_sk and ss_store_sk = s_store_sk and ss_cdemo_sk = cd_demo_sk group by rollup (i_item_id, s_state) order by i_item_id ,s_state limit 5; Results: enabled: 5 rows selected 
(715.26 seconds) enabled: 5 rows selected (605.888 seconds) disabled: 5 rows selected (1208.168 seconds) disabled: 5 rows selected (1219.482 seconds) Thanks, Attila Magyar
Re: Review Request 71995: TopN Key optimizer should use array instead of priority queue
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71995/ --- (Updated Jan. 22, 2020, 12:09 p.m.) Review request for hive, Gopal V, Jesús Camacho Rodríguez, and Krisztian Kasa. Changes --- added counter + small comparator optimization Bugs: HIVE-22726 https://issues.apache.org/jira/browse/HIVE-22726 Repository: hive-git Description --- The TopN key optimizer currently uses a priority queue for keeping track of the largest/smallest rows. Its max size is the same as the user-specified limit. This should be replaced with a more cache-line-friendly array with a small (128) maximum size to see how much performance is gained. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b79515fcf07 ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 4998766f064 ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java b7c12502204 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 5faa038c18d ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperBatch.java 0786c82b7be ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneralComparator.java 8cb48473785 ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java ce6efa49192 ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ff815434f0c Diff: https://reviews.apache.org/r/71995/diff/2/ Changes: https://reviews.apache.org/r/71995/diff/1-2/ Testing --- with the following query: use tpcds_bin_partitioned_orc_100; set hive.optimize.topnkey=true; set hive.optimize.topnkey.max=5; select i_item_id, s_state, grouping(s_state) g_state, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4 from store_sales, customer_demographics, date_dim, store, item where ss_sold_date_sk = d_date_sk and ss_item_sk = i_item_sk and ss_store_sk = s_store_sk and ss_cdemo_sk = cd_demo_sk group by rollup (i_item_id, s_state) order by 
i_item_id ,s_state limit 5; Results: enabled: 5 rows selected (715.26 seconds) enabled: 5 rows selected (605.888 seconds) disabled: 5 rows selected (1208.168 seconds) disabled: 5 rows selected (1219.482 seconds) Thanks, Attila Magyar
Review Request 71995: TopN Key optimizer should use array instead of priority queue
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71995/ --- Review request for hive, Gopal V, Jesús Camacho Rodríguez, and Krisztian Kasa. Bugs: HIVE-22726 https://issues.apache.org/jira/browse/HIVE-22726 Repository: hive-git Description --- The TopN key optimizer currently uses a priority queue for keeping track of the largest/smallest rows. Its max size is the same as the user-specified limit. This should be replaced with a more cache-line-friendly array with a small (128) maximum size to see how much performance is gained. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e7724f9084f ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 4998766f064 ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java b7c12502204 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 5faa038c18d ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java ce6efa49192 ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ff815434f0c Diff: https://reviews.apache.org/r/71995/diff/1/ Testing --- with the following query: use tpcds_bin_partitioned_orc_100; set hive.optimize.topnkey=true; set hive.optimize.topnkey.max=5; select i_item_id, s_state, grouping(s_state) g_state, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4 from store_sales, customer_demographics, date_dim, store, item where ss_sold_date_sk = d_date_sk and ss_item_sk = i_item_sk and ss_store_sk = s_store_sk and ss_cdemo_sk = cd_demo_sk group by rollup (i_item_id, s_state) order by i_item_id ,s_state limit 5; Results: enabled: 5 rows selected (715.26 seconds) enabled: 5 rows selected (605.888 seconds) disabled: 5 rows selected (1208.168 seconds) disabled: 5 rows selected (1219.482 seconds) Thanks, Attila Magyar
[jira] [Created] (HIVE-22726) TopN Key optimizer should use array instead of priority queue
Attila Magyar created HIVE-22726: Summary: TopN Key optimizer should use array instead of priority queue Key: HIVE-22726 URL: https://issues.apache.org/jira/browse/HIVE-22726 Project: Hive Issue Type: Bug Components: Hive Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 The TopN key optimizer currently uses a priority queue for keeping track of the largest/smallest rows. Its max size is the same as the user-specified limit. This should be replaced with a more cache-line-friendly array with a small (128) maximum size to see how much performance is gained. -- This message was sent by Atlassian Jira (v8.3.4#803005)
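The array-based structure proposed in HIVE-22726 can be sketched as a small sorted array with binary-search insertion: a row is "in the top N" exactly when it would land inside the array. This is an illustrative sketch (keys reduced to `long` for brevity), not the actual `TopNKeyFilter` code.

```java
import java.util.Arrays;

// Sketch of a cache-friendly bounded top-N structure: a sorted array capped
// at 128 entries. Insertion uses binary search plus arraycopy, which keeps
// the working set in a few contiguous cache lines, unlike a heap of boxed
// key wrappers. Names and the long-key simplification are illustrative.
public class TopNKeyArray {
    static final int MAX_SIZE = 128;
    final long[] keys;
    int size;

    TopNKeyArray(int n) {
        keys = new long[Math.min(n, MAX_SIZE)];
    }

    // Returns true if the key belongs to the current top-N (smallest) set.
    boolean offer(long key) {
        int pos = Arrays.binarySearch(keys, 0, size, key);
        if (pos < 0) pos = -pos - 1;               // insertion point
        if (pos >= keys.length) return false;      // larger than every kept key
        if (size < keys.length) size++;
        System.arraycopy(keys, pos, keys, pos + 1,
                         size - pos - 1);          // shift tail right, drop last
        keys[pos] = key;
        return true;
    }
}
```

Since N is capped at 128 longs (one KiB), the whole structure stays hot in L1 cache, which is the performance argument made in the description.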
[jira] [Created] (HIVE-22647) enable session pool by default
Attila Magyar created HIVE-22647: Summary: enable session pool by default Key: HIVE-22647 URL: https://issues.apache.org/jira/browse/HIVE-22647 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 Non pooled sessions may leak when the client doesn't close the connection. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Review Request 71871: StringIndexOutOfBoundsException when getting sessionId from worker node name
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71871/ --- (Updated Dec. 12, 2019, 12:22 p.m.) Review request for hive, Laszlo Bodor, prasanthj, and Slim Bouguerra. Changes --- warning log if there is no suffix after worker- Bugs: HIVE-22577 https://issues.apache.org/jira/browse/HIVE-22577 Repository: hive-git Description --- The sequence number from the worker node name might be missing under some circumstances (the root cause is not fully clear; it might be a ZooKeeper bug). In this case the following exception occurs: Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1 at java.lang.String.substring(String.java:1931) at org.apache.hadoop.hive.registry.impl.ZkRegistryBase.extractSeqNum(ZkRegistryBase.java:781) at org.apache.hadoop.hive.registry.impl.ZkRegistryBase.populateCache(ZkRegistryBase.java:507) at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.access$000(LlapZookeeperRegistryImpl.java:65) at Diffs (updated) - llap-client/src/java/org/apache/hadoop/hive/registry/impl/ZkRegistryBase.java 5751b8ed939 Diff: https://reviews.apache.org/r/71871/diff/3/ Changes: https://reviews.apache.org/r/71871/diff/2-3/ Testing --- qtest Thanks, Attila Magyar
Re: Review Request 71871: StringIndexOutOfBoundsException when getting sessionId from worker node name
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71871/ --- (Updated Dec. 5, 2019, 6:03 p.m.) Review request for hive, Laszlo Bodor, prasanthj, and Slim Bouguerra. Changes --- there is a 2nd bug in the original code: an off-by-one error when using the substring. Bugs: HIVE-22577 https://issues.apache.org/jira/browse/HIVE-22577 Repository: hive-git Description --- The sequence number from the worker node name might be missing under some circumstances (the root cause is not fully clear; it might be a ZooKeeper bug). In this case the following exception occurs: Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1 at java.lang.String.substring(String.java:1931) at org.apache.hadoop.hive.registry.impl.ZkRegistryBase.extractSeqNum(ZkRegistryBase.java:781) at org.apache.hadoop.hive.registry.impl.ZkRegistryBase.populateCache(ZkRegistryBase.java:507) at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.access$000(LlapZookeeperRegistryImpl.java:65) at Diffs (updated) - llap-client/src/java/org/apache/hadoop/hive/registry/impl/ZkRegistryBase.java 5751b8ed939 Diff: https://reviews.apache.org/r/71871/diff/2/ Changes: https://reviews.apache.org/r/71871/diff/1-2/ Testing --- qtest Thanks, Attila Magyar
Re: Review Request 71871: StringIndexOutOfBoundsException when getting sessionId from worker node name
> On Dec. 5, 2019, 4:43 p.m., Panos Garefalakis wrote: > > llap-client/src/java/org/apache/hadoop/hive/registry/impl/ZkRegistryBase.java > > Lines 478 (patched) > > <https://reviews.apache.org/r/71871/diff/1/?file=2181935#file2181935line478> > > > > Hey Attila, > > > > With Java's short circuiting the left expression in the && > > operator will always be evaluated, which could also throw the error you > > are trying to avoid -- to safeguard this operation you would place the > > **nodeName.length() > workerNodePrefix.length()** check on the left part of > > the expression. Hey Panos, That's true, but the error does not originate from the startsWith() but from a substring() expression later on. The startsWith() method doesn't throw any exceptions; it won't fail regardless of the length of nodeName or workerNodePrefix. - Attila --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71871/#review218941 --- On Dec. 4, 2019, 11:05 a.m., Attila Magyar wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/71871/ > --- > > (Updated Dec. 4, 2019, 11:05 a.m.) > > > Review request for hive, Laszlo Bodor, prasanthj, and Slim Bouguerra. > > > Bugs: HIVE-22577 > https://issues.apache.org/jira/browse/HIVE-22577 > > > Repository: hive-git > > > Description > --- > > The sequence number from the worker node name might be missing under some > circumstances (the root cause is not fully clear; it might be a ZooKeeper bug). 
> > In this case the following exception occurs: > > Caused by: java.lang.StringIndexOutOfBoundsException: String index out of > range: -1Caused by: java.lang.StringIndexOutOfBoundsException: String index > out of range: -1 at java.lang.String.substring(String.java:1931) at > org.apache.hadoop.hive.registry.impl.ZkRegistryBase.extractSeqNum(ZkRegistryBase.java:781) > at > org.apache.hadoop.hive.registry.impl.ZkRegistryBase.populateCache(ZkRegistryBase.java:507) > at > org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.access$000(LlapZookeeperRegistryImpl.java:65) > at > > > Diffs > - > > > llap-client/src/java/org/apache/hadoop/hive/registry/impl/ZkRegistryBase.java > 5751b8ed939 > > > Diff: https://reviews.apache.org/r/71871/diff/1/ > > > Testing > --- > > qtest > > > Thanks, > > Attila Magyar > >
Review Request 71871: StringIndexOutOfBoundsException when getting sessionId from worker node name
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71871/ --- Review request for hive, Laszlo Bodor, prasanthj, and Slim Bouguerra. Bugs: HIVE-22577 https://issues.apache.org/jira/browse/HIVE-22577 Repository: hive-git Description --- The sequence number from the worker node name might be missing under some circumstances (the root cause is not fully clear; it might be a ZooKeeper bug). In this case the following exception occurs: Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1 at java.lang.String.substring(String.java:1931) at org.apache.hadoop.hive.registry.impl.ZkRegistryBase.extractSeqNum(ZkRegistryBase.java:781) at org.apache.hadoop.hive.registry.impl.ZkRegistryBase.populateCache(ZkRegistryBase.java:507) at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.access$000(LlapZookeeperRegistryImpl.java:65) at Diffs - llap-client/src/java/org/apache/hadoop/hive/registry/impl/ZkRegistryBase.java 5751b8ed939 Diff: https://reviews.apache.org/r/71871/diff/1/ Testing --- qtest Thanks, Attila Magyar
[jira] [Created] (HIVE-22577) StringIndexOutOfBoundsException when getting sessionId from worker node name
Attila Magyar created HIVE-22577: Summary: StringIndexOutOfBoundsException when getting sessionId from worker node name Key: HIVE-22577 URL: https://issues.apache.org/jira/browse/HIVE-22577 Project: Hive Issue Type: Bug Components: Hive Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 When the node name is "worker-" the following exception is thrown {code:java} Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1 at java.lang.String.substring(String.java:1931) at org.apache.hadoop.hive.registry.impl.ZkRegistryBase.extractSeqNum(ZkRegistryBase.java:781) at org.apache.hadoop.hive.registry.impl.ZkRegistryBase.populateCache(ZkRegistryBase.java:507) at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.access$000(LlapZookeeperRegistryImpl.java:65) at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl$DynamicServiceInstanceSet.(LlapZookeeperRegistryImpl.java:313) at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:462) at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getApplicationId(LlapZookeeperRegistryImpl.java:469) at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getApplicationId(LlapRegistryService.java:212) at org.apache.hadoop.hive.ql.exec.tez.Utils.getCustomSplitLocationProvider(Utils.java:77) at org.apache.hadoop.hive.ql.exec.tez.Utils.getSplitLocationProvider(Utils.java:53) at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.(HiveSplitGenerator.java:140) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
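The defensive parsing discussed in the review above (warn and bail out when there is no suffix after "worker-") can be sketched like this. The class and method names are hypothetical, not the actual `ZkRegistryBase.extractSeqNum`; note that, as pointed out in the review thread, `startsWith()` itself never throws regardless of the operands' lengths — the danger is the later `substring()`/parse.

```java
// Sketch of a crash-proof sequence-number extractor for names like
// "worker-0000000012". A bare substring on "worker-" (empty suffix) is
// what produced the StringIndexOutOfBoundsException; here we return a
// sentinel and log a warning instead. Names are illustrative.
public class WorkerNodeSeqNum {
    static final String WORKER_PREFIX = "worker-";

    // Returns the sequence number, or -1 if the name has no usable suffix.
    static int extractSeqNum(String nodeName) {
        if (nodeName == null || !nodeName.startsWith(WORKER_PREFIX)
                || nodeName.length() <= WORKER_PREFIX.length()) {
            System.err.println("WARN: no sequence number in node name: " + nodeName);
            return -1;
        }
        try {
            return Integer.parseInt(nodeName.substring(WORKER_PREFIX.length()));
        } catch (NumberFormatException e) {
            System.err.println("WARN: malformed sequence number in: " + nodeName);
            return -1;
        }
    }
}
```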
Review Request 71845: ConcurrentModificationException in TriggerValidatorRunnable stops trigger processing
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71845/ --- Review request for hive, Laszlo Bodor, prasanthj, and Slim Bouguerra. Bugs: HIVE-22502 https://issues.apache.org/jira/browse/HIVE-22502 Repository: hive-git Description --- A ConcurrentModificationException was thrown from the main loop of TriggerValidatorRunnable. The ConcurrentModificationException happened because another thread (from TezSessionPoolManager) updated the sessions list while TriggerValidatorRunnable was iterating over it. The sessions list is updated by TezSessionPoolManager when opening or closing a session. These operations are synchronized but the iteration in TriggerValidatorRunnable is not. The TriggerValidatorRunnable is executed frequently (it is scheduled at a 500ms rate by default), therefore I was reluctant to put the whole iteration into a synchronized block. Opening and closing a session happens much less often, so I decided to make a copy of the sessions list before passing it to the TriggerValidatorRunnable. Let me know if you think otherwise. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 7c0a1fe120b Diff: https://reviews.apache.org/r/71845/diff/1/ Testing --- qtest Thanks, Attila Magyar
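The snapshot approach described above (copy the list under the same lock that guards mutation, then let the validator iterate the copy lock-free) can be sketched as follows. The class is an illustrative stand-in, not the real `TezSessionPoolManager`.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the fix: open/close mutate the list under a lock, and the
// frequently-scheduled validator receives a snapshot copy, so its
// iteration can never race with a mutation and throw
// ConcurrentModificationException. Names are illustrative.
public class SessionPoolSketch {
    private final List<String> sessions = new ArrayList<>();
    private final Object lock = new Object();

    void openSession(String s) {
        synchronized (lock) { sessions.add(s); }
    }

    void closeSession(String s) {
        synchronized (lock) { sessions.remove(s); }
    }

    // Called every ~500ms by the validator; only the brief copy happens
    // under the lock, not the whole validation pass.
    List<String> sessionsSnapshot() {
        synchronized (lock) { return new ArrayList<>(sessions); }
    }
}
```

An alternative with the same effect is `CopyOnWriteArrayList`, which copies on every mutation instead; the explicit snapshot keeps the copy on the rare path (validation tick) versus the even rarer path (open/close), which is the trade-off the description argues for.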
Review Request 71801: The error handler in LlapRecordReader might block if its queue is full
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71801/ --- Review request for hive, Laszlo Bodor, Panos Garefalakis, and Slim Bouguerra. Bugs: HIVE-22523 https://issues.apache.org/jira/browse/HIVE-22523 Repository: hive-git Description --- In setError() we set the value of an atomic reference (pendingError) and we also put the error in a queue. The latter is not just unnecessary; it might also block the caller of the handler if the queue is full. Also, closing the reader might not be properly handled as some of the flags are not volatile. Diffs - llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java 77966aa9650 Diff: https://reviews.apache.org/r/71801/diff/1/ Testing --- q tests Thanks, Attila Magyar
[jira] [Created] (HIVE-22523) The error handler in LlapRecordReader might block if its queue is full
Attila Magyar created HIVE-22523: Summary: The error handler in LlapRecordReader might block if its queue is full Key: HIVE-22523 URL: https://issues.apache.org/jira/browse/HIVE-22523 Project: Hive Issue Type: Bug Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 In setError() we set the value of an atomic reference (pendingError) and we also put the error in a queue. The latter is not just unnecessary; it might also block the caller of the handler if the queue is full. Also, closing the reader might not be properly handled as some of the flags are not volatile. -- This message was sent by Atlassian Jira (v8.3.4#803005)
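The shape of the fix implied by the description — keep only the non-blocking atomic reference and make the close flag volatile — can be sketched like this. This is a hypothetical reduction of the pattern, not the actual `LlapRecordReader` code.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the non-blocking error path: setError() only performs a CAS on
// an AtomicReference (no bounded queue.put(), so the caller can never
// block), and `closed` is volatile so the reader loop reliably observes a
// close() from another thread. Names are illustrative.
public class RecordReaderErrorHandling {
    private final AtomicReference<Throwable> pendingError = new AtomicReference<>();
    private volatile boolean closed; // written by close(), read by the reader loop

    void setError(Throwable t) {
        // Keep the first error; never blocks regardless of load.
        pendingError.compareAndSet(null, t);
    }

    void close() { closed = true; }

    // The reader loop checks this between batches.
    boolean shouldStop() {
        return closed || pendingError.get() != null;
    }
}
```

The key property is that `setError` is wait-free: the error-producing thread can always make progress even if the consumer side is stalled, which is exactly what the blocking `put()` on a full queue violated.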
Re: Review Request 71784: HiveProtoLoggingHook might consume lots of memory
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71784/ --- (Updated Nov. 21, 2019, 9:40 a.m.) Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and Panos Garefalakis. Changes --- test might be flaky, ignore it until we find a better solution Bugs: HIVE-22514 https://issues.apache.org/jira/browse/HIVE-22514 Repository: hive-git Description --- HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer tasks and to periodically handle rollover. The builtin ScheduledThreadPoolExecutor uses an unbounded queue which cannot be replaced from the outside. If log events are generated at a very fast rate this queue can grow large. Since ScheduledThreadPoolExecutor does not support changing the default unbounded queue to a bounded one, the queue capacity is checked manually by the patch. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 8eab54859bf ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 450a0b544d6 Diff: https://reviews.apache.org/r/71784/diff/2/ Changes: https://reviews.apache.org/r/71784/diff/1-2/ Testing --- unittest Thanks, Attila Magyar
Re: Review Request 71784: HiveProtoLoggingHook might leak memory
> On Nov. 20, 2019, 1:58 a.m., Harish Jaiprakash wrote: > > Thanks for the change. This does solve the memory problem and it looks good > > for me. > > > > We need a follow up JIRA to address why the queue size was 17,000 events. > > Was this hdfs or s3fs? In either case we should have some more > > optimizations like: > > * if there are lot of events, batch the flush to hdfs. > > * if its one event per file mode, increase parallelism since writes are not > > happening in different files. > > > > FYI, the events are lost since it is not written to the hdfs file and DAS > > will not get these events. But that is better than crashing hiveserver2. Thanks for the review. The hive.hook.proto.base-directory points to an s3a path. - Attila --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71784/#review218709 ------- On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/71784/ > --- > > (Updated Nov. 19, 2019, 3:43 p.m.) > > > Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and > Panos Garefalakis. > > > Bugs: HIVE-22514 > https://issues.apache.org/jira/browse/HIVE-22514 > > > Repository: hive-git > > > Description > --- > > HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer > tasks and to periodically handle rollover. The builtin > ScheduledThreadPoolExecutor uses a unbounded queue which cannot be replaced > from the outside. If log events are generated at a very fast rate this queue > can grow large. > > Since ScheduledThreadPoolExecutor does not support changing the default > unbounded queue to a bounded one, the queue capacity is checked manually by > the patch. 
> > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java > 8eab54859bf > ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java > 450a0b544d6 > > > Diff: https://reviews.apache.org/r/71784/diff/1/ > > > Testing > --- > > unittest > > > Thanks, > > Attila Magyar > >
Re: Review Request 71784: HiveProtoLoggingHook might leak memory
> On Nov. 20, 2019, 2:02 a.m., Harish Jaiprakash wrote: > > ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java > > Lines 176 (patched) > > <https://reviews.apache.org/r/71784/diff/1/?file=2173908#file2173908line176> > > > > We expect the dequeue to have not happened by this time. There is no > > guarantee, since its another thread. Can we atleast add a comment that this > > test can fail intermittently? I guess this affects the existing tests as well, right? However, I don't remember seeing any of those failing. Maybe because we're calling shutdown() on the evtLogger. According to its javadoc it waits for already submitted tasks to complete. - Attila --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71784/#review218710 ------- On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/71784/ > --- > > (Updated Nov. 19, 2019, 3:43 p.m.) > > > Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and > Panos Garefalakis. > > > Bugs: HIVE-22514 > https://issues.apache.org/jira/browse/HIVE-22514 > > > Repository: hive-git > > > Description > --- > > HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer > tasks and to periodically handle rollover. The builtin > ScheduledThreadPoolExecutor uses a unbounded queue which cannot be replaced > from the outside. If log events are generated at a very fast rate this queue > can grow large. > > Since ScheduledThreadPoolExecutor does not support changing the default > unbounded queue to a bounded one, the queue capacity is checked manually by > the patch. 
> > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java > 8eab54859bf > ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java > 450a0b544d6 > > > Diff: https://reviews.apache.org/r/71784/diff/1/ > > > Testing > --- > > unittest > > > Thanks, > > Attila Magyar > >
Re: Review Request 71784: HiveProtoLoggingHook might leak memory
> On Nov. 19, 2019, 10:38 p.m., Slim Bouguerra wrote: > > I recommend using a bounded queue instead of checking the size and doing > > the if else everytime > > something like this might work > > ```java > > BlockingQueue linkedBlockingDeque = new > > LinkedBlockingDeque( > > 1); > > ExecutorService service = new ThreadPoolExecutor(1, 1, 30, > > TimeUnit.SECONDS, linkedBlockingDeque, > > new > > ThreadPoolExecutor.DiscardPolicy()); > > > > ``` We also have a periodic event which is scheduled with a fixed delay by the ScheduledThreadPoolExecutor. The ThreadPoolExecutor can't do this scheduling; it would require either using 2 executors (a ThreadPoolExecutor and a ScheduledThreadPoolExecutor) or 1 executor plus a Timer, plus some synchronization. The queue size check looked like the simplest solution I could find. - Attila --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71784/#review218697 ------- On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/71784/ > --- > > (Updated Nov. 19, 2019, 3:43 p.m.) > > > Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and > Panos Garefalakis. > > > Bugs: HIVE-22514 > https://issues.apache.org/jira/browse/HIVE-22514 > > > Repository: hive-git > > > Description > --- > > HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer > tasks and to periodically handle rollover. The builtin > ScheduledThreadPoolExecutor uses a unbounded queue which cannot be replaced > from the outside. If log events are generated at a very fast rate this queue > can grow large. > > Since ScheduledThreadPoolExecutor does not support changing the default > unbounded queue to a bounded one, the queue capacity is checked manually by > the patch. 
> > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java > 8eab54859bf > ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java > 450a0b544d6 > > > Diff: https://reviews.apache.org/r/71784/diff/1/ > > > Testing > --- > > unittest > > > Thanks, > > Attila Magyar > >
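For reference, the suggestion quoted in the reply above compiles to something like the following (a sketch only; as the reply explains, the patch did not take this route because a plain ThreadPoolExecutor has no scheduleWithFixedDelay for the periodic rollover task):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// A bounded queue plus DiscardPolicy: once the single worker is busy and
// the queue is full, further submissions are dropped silently instead of
// blocking or throwing RejectedExecutionException.
class BoundedDropExecutor {
    final ThreadPoolExecutor service = new ThreadPoolExecutor(
        1, 1, 30, TimeUnit.SECONDS,
        new LinkedBlockingQueue<>(1),             // at most one task queued
        new ThreadPoolExecutor.DiscardPolicy());  // full queue => silent drop

    void shutdown() { service.shutdown(); }
}
```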
Re: Review Request 71784: HiveProtoLoggingHook might leak memory
> On Nov. 19, 2019, 10:24 p.m., Panos Garefalakis wrote: > > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java > > Lines 192 (patched) > > <https://reviews.apache.org/r/71784/diff/1/?file=2173907#file2173907line193> > > > > The solution makes sense to me, however maybe we need to investigate > > further why the queueCapacity change (which is similar to what you are > > proposing) was reverted in HIVE-20746? > > > > Also this logwriter is used to track all protobuf messages right? Is it > > acceptable to drop messages here? > > Panos Garefalakis wrote: > Update: Seems like HIVE-20746 only makes sure that logFile is closed at > the end of the day (even when no events are triggered) -- so the remaining > question is if its acceptible to start dropping messages here (because even > if we drop the messages the events are still going to happen) @harishjp is already added to the review, but he is on vacation, so I don't know if he'll respond or not. But my impression is that removing the capacity limit was not the goal, but a side effect of adding the ScheduledThreadPoolExecutor. Before HIVE-20746 we were dropping events. HIVE-20746 was about making sure that the file is closed, not about making sure that there are no drops. As far as I understand, these events are for external systems like DAS. The events are still visible in the log, so they're not lost. But @harishjp will hopefully confirm or refute this. - Attila --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71784/#review218694 --- On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/71784/ > --- > > (Updated Nov. 19, 2019, 3:43 p.m.) > > > Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and > Panos Garefalakis. 
> > > Bugs: HIVE-22514 > https://issues.apache.org/jira/browse/HIVE-22514 > > > Repository: hive-git > > > Description > --- > > HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer > tasks and to periodically handle rollover. The builtin > ScheduledThreadPoolExecutor uses a unbounded queue which cannot be replaced > from the outside. If log events are generated at a very fast rate this queue > can grow large. > > Since ScheduledThreadPoolExecutor does not support changing the default > unbounded queue to a bounded one, the queue capacity is checked manually by > the patch. > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java > 8eab54859bf > ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java > 450a0b544d6 > > > Diff: https://reviews.apache.org/r/71784/diff/1/ > > > Testing > --- > > unittest > > > Thanks, > > Attila Magyar > >
Re: Review Request 71784: HiveProtoLoggingHook might leak memory
> On Nov. 19, 2019, 6:53 p.m., Slim Bouguerra wrote: > > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java > > Lines 274 (patched) > > <https://reviews.apache.org/r/71784/diff/1/?file=2173907#file2173907line275> > > > > looking at the java code i see that it is using a bounded queue so not > > sure what you mean by unbounded ? > > can you please clarify ? > > Attila Magyar wrote: > No, unfortunately it uses an unbounded DelayedWorkQueue internally and > it cannot be changed. > > Slim Bouguerra wrote: > ```java > /** > * Specialized delay queue. To mesh with TPE declarations, this > * class must be declared as a BlockingQueue even though > * it can only hold RunnableScheduledFutures. > */ > static class DelayedWorkQueue extends AbstractQueue > implements BlockingQueue { > > ``` > but it is blocking means that it should block and therefore we get the > RejectedException. Yes, but a BlockingQueue can be created without a size limit. It will still block if I want to take from an empty queue, but it will never be full, so it will never block when adding an element. (The implementation might use Integer.MAX_VALUE as a size limit, which is practically the same: it will likely never block, or it will block too late.) - Attila --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71784/#review218679 --- On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/71784/ > --- > > (Updated Nov. 19, 2019, 3:43 p.m.) > > > Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and > Panos Garefalakis. > > > Bugs: HIVE-22514 > https://issues.apache.org/jira/browse/HIVE-22514 > > > Repository: hive-git > > > Description > --- > > HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer > tasks and to periodically handle rollover. 
The builtin > ScheduledThreadPoolExecutor uses a unbounded queue which cannot be replaced > from the outside. If log events are generated at a very fast rate this queue > can grow large. > > Since ScheduledThreadPoolExecutor does not support changing the default > unbounded queue to a bounded one, the queue capacity is checked manually by > the patch. > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java > 8eab54859bf > ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java > 450a0b544d6 > > > Diff: https://reviews.apache.org/r/71784/diff/1/ > > > Testing > --- > > unittest > > > Thanks, > > Attila Magyar > >
Re: Review Request 71784: HiveProtoLoggingHook might leak memory
> On Nov. 19, 2019, 6:58 p.m., Slim Bouguerra wrote: > > the description is kind of confusing, is this a leak or we have spike of > > overload ? > > Leak means we are not cleaning the resources thus that is why we have an > > OOM. > > What you are describing seems to be a system overload that cause a memory > > spike. Yes, that's right, calling it an overload might be better than a leak; I'll modify the description. - Attila --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71784/#review218680 ------- On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/71784/ > --- > > (Updated Nov. 19, 2019, 3:43 p.m.) > > > Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and > Panos Garefalakis. > > > Bugs: HIVE-22514 > https://issues.apache.org/jira/browse/HIVE-22514 > > > Repository: hive-git > > > Description > --- > > HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer > tasks and to periodically handle rollover. The builtin > ScheduledThreadPoolExecutor uses a unbounded queue which cannot be replaced > from the outside. If log events are generated at a very fast rate this queue > can grow large. > > Since ScheduledThreadPoolExecutor does not support changing the default > unbounded queue to a bounded one, the queue capacity is checked manually by > the patch. > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java > 8eab54859bf > ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java > 450a0b544d6 > > > Diff: https://reviews.apache.org/r/71784/diff/1/ > > > Testing > --- > > unittest > > > Thanks, > > Attila Magyar > >
Re: Review Request 71784: HiveProtoLoggingHook might leak memory
> On Nov. 19, 2019, 6:53 p.m., Slim Bouguerra wrote: > > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java > > Line 217 (original), 219 (patched) > > <https://reviews.apache.org/r/71784/diff/1/?file=2173907#file2173907line220> > > > > how this can solve the issue ? > > Seems like it is doing the same thing > > > > `Executors.newSingleThreadScheduledExecutor` > > is the same as what you are doing > > ``` > > public static ScheduledExecutorService > > newSingleThreadScheduledExecutor(ThreadFactory threadFactory) { > > return new DelegatedScheduledExecutorService > > (new ScheduledThreadPoolExecutor(1, threadFactory)); > > } > > ``` This is not the fix. This change is only needed to get back the proper type so that I can invoke the getQueue() method. I can't do that on ScheduledExecutorService, which is the interface returned by Executors.newSingleThreadScheduledExecutor. I need the concrete class (ScheduledThreadPoolExecutor). > On Nov. 19, 2019, 6:53 p.m., Slim Bouguerra wrote: > > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java > > Lines 274 (patched) > > <https://reviews.apache.org/r/71784/diff/1/?file=2173907#file2173907line275> > > > > looking at the java code i see that it is using a bounded queue so not > > sure what you mean by unbounded ? > > can you please clarify ? No, unfortunately it uses an unbounded DelayedWorkQueue internally and it cannot be changed. > On Nov. 19, 2019, 6:53 p.m., Slim Bouguerra wrote: > > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java > > Lines 279 (patched) > > <https://reviews.apache.org/r/71784/diff/1/?file=2173907#file2173907line280> > > > > i am still not sure how this is going to work? > > the original code was dropping events when the queue is full that is > > the case where you see the `RejectedExecutionException` RejectedExecutionException was never thrown with the original code because of the unbounded queue. The queue just kept growing. 
In the heap dump there were 17,000 elements in the queue in total, taking about 2.5 GB of space. - Attila --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71784/#review218679 --- On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/71784/ > --- > > (Updated Nov. 19, 2019, 3:43 p.m.) > > > Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and > Panos Garefalakis. > > > Bugs: HIVE-22514 > https://issues.apache.org/jira/browse/HIVE-22514 > > > Repository: hive-git > > > Description > --- > > HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer > tasks and to periodically handle rollover. The builtin > ScheduledThreadPoolExecutor uses a unbounded queue which cannot be replaced > from the outside. If log events are generated at a very fast rate this queue > can grow large. > > Since ScheduledThreadPoolExecutor does not support changing the default > unbounded queue to a bounded one, the queue capacity is checked manually by > the patch. > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java > 8eab54859bf > ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java > 450a0b544d6 > > > Diff: https://reviews.apache.org/r/71784/diff/1/ > > > Testing > --- > > unittest > > > Thanks, > > Attila Magyar > >
Review Request 71784: HiveProtoLoggingHook might leak memory
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71784/ --- Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and Panos Garefalakis. Bugs: HIVE-22514 https://issues.apache.org/jira/browse/HIVE-22514 Repository: hive-git Description --- HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer tasks and to periodically handle rollover. The builtin ScheduledThreadPoolExecutor uses an unbounded queue which cannot be replaced from the outside. If log events are generated at a very fast rate this queue can grow large. Since ScheduledThreadPoolExecutor does not support changing the default unbounded queue to a bounded one, the queue capacity is checked manually by the patch. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 8eab54859bf ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 450a0b544d6 Diff: https://reviews.apache.org/r/71784/diff/1/ Testing --- unittest Thanks, Attila Magyar
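A minimal sketch of the manual capacity check described above (hypothetical names; the actual patch lives in HiveProtoLoggingHook and reads the capacity from HiveConf):

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;

// Keep the ScheduledThreadPoolExecutor (still needed for the periodic
// rollover), but before submitting a writer task, check the size of its
// internal queue and drop the event when a configured capacity is
// exceeded. The field must be the concrete type, not
// ScheduledExecutorService, so that getQueue() is accessible.
class CapacityCheckedLogger {
    private final ScheduledThreadPoolExecutor executor =
        new ScheduledThreadPoolExecutor(1);
    private final int queueCapacity;

    CapacityCheckedLogger(int queueCapacity) { this.queueCapacity = queueCapacity; }

    /** Returns false when the event is dropped because the queue is full. */
    boolean offerEvent(Runnable writeTask) {
        if (executor.getQueue().size() >= queueCapacity) {
            return false; // drop instead of growing without bound
        }
        executor.submit(writeTask);
        return true;
    }

    void shutdown() { executor.shutdown(); }
}
```

Note the check is advisory rather than atomic: the queue size may change between the size() call and submit(), which is acceptable here since the capacity is a soft bound against unbounded growth, not a hard limit.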
[jira] [Created] (HIVE-22514) HiveProtoLoggingHook might leak memory
Attila Magyar created HIVE-22514: Summary: HiveProtoLoggingHook might leak memory Key: HIVE-22514 URL: https://issues.apache.org/jira/browse/HIVE-22514 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 4.0.0 Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 Attachments: Screen Shot 2019-11-18 at 2.19.24 PM.png HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer tasks and to periodically handle rollover. The builtin ScheduledThreadPoolExecutor uses an unbounded queue which cannot be replaced from the outside. If log events are generated at a very fast rate this queue can grow large. !Screen Shot 2019-11-18 at 2.19.24 PM.png|width=650,height=101! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22502) ConcurrentModificationException in TriggerValidatorRunnable stops trigger processing
Attila Magyar created HIVE-22502: Summary: ConcurrentModificationException in TriggerValidatorRunnable stops trigger processing Key: HIVE-22502 URL: https://issues.apache.org/jira/browse/HIVE-22502 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Attila Magyar Assignee: Attila Magyar -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Review Request 71707: Performance degradation on single row inserts
> On Nov. 5, 2019, 11:59 p.m., Ashutosh Chauhan wrote: > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java > > Line 331 (original), 324 (patched) > > <https://reviews.apache.org/r/71707/diff/2/?file=2171542#file2171542line331> > > > > you may use BlobStorageUtils::isBlobStorageFileSystem() here. isBlobStorageFileSystem matches s3, s3a, and s3n, but only S3AFileSystem (https://github.com/apache/hadoop/blob/1d5d7d0989e9ee2f4527dc47ba5c80e1c38f641a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L3861) has an optimized listFiles() implementation. NativeS3FileSystem (https://github.com/apache/hadoop/blob/1d5d7d0989e9ee2f4527dc47ba5c80e1c38f641a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3native/NativeS3FileSystem.java) uses the same tree-traversing algorithm from the base class. - Attila --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71707/#review218518 ------- On Nov. 7, 2019, 9:23 a.m., Attila Magyar wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/71707/ > --- > > (Updated Nov. 7, 2019, 9:23 a.m.) > > > Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra. > > > Bugs: HIVE-22411 > https://issues.apache.org/jira/browse/HIVE-22411 > > > Repository: hive-git > > > Description > --- > > Executing single insert statements on a transactional table effects write > performance on a s3 file system. Each insert creates a new delta directory. > After each insert hive calculates statistics like number of file in the table > and total size of the table. In order to calculate these, it traverses the > directory recursively. During the recursion for each path a separate > listStatus call is executed. In the end the more delta directory you have the > more time it takes to calculate the statistics. > > Therefore insertion time goes up linearly. 
> > > Diffs > - > > common/src/java/org/apache/hadoop/hive/common/FileUtils.java 651b842f688 > common/src/java/org/apache/hadoop/hive/common/HiveStatsUtils.java > 09343e56166 > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java > 38e843aeacf > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java > bf206fffc26 > > > Diff: https://reviews.apache.org/r/71707/diff/3/ > > > Testing > --- > > measured and plotted insertation time > > > Thanks, > > Attila Magyar > >
Re: Review Request 71707: Performance degradation on single row inserts
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71707/#review218524 --- standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java Line 331 (original), 324 (patched) <https://reviews.apache.org/r/71707/#comment306265> BlobStorageUtils::isBlobStorageFileSystem() checks if the scheme is either "s3", "s3n", or "s3a". But only S3AFileSystem has the optimized listFiles(); NativeS3FileSystem does not override the tree-walking algorithm from the base class. See: https://github.com/apache/hadoop/blob/1d5d7d0989e9ee2f4527dc47ba5c80e1c38f641a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L3861 and: https://github.com/apache/hadoop/blob/1d5d7d0989e9ee2f4527dc47ba5c80e1c38f641a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3native/NativeS3FileSystem.java - Attila Magyar On Nov. 7, 2019, 9:23 a.m., Attila Magyar wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/71707/ > --- > > (Updated Nov. 7, 2019, 9:23 a.m.) > > > Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra. > > > Bugs: HIVE-22411 > https://issues.apache.org/jira/browse/HIVE-22411 > > > Repository: hive-git > > > Description > --- > > Executing single insert statements on a transactional table effects write > performance on a s3 file system. Each insert creates a new delta directory. > After each insert hive calculates statistics like number of file in the table > and total size of the table. In order to calculate these, it traverses the > directory recursively. During the recursion for each path a separate > listStatus call is executed. In the end the more delta directory you have the > more time it takes to calculate the statistics. > > Therefore insertion time goes up linearly. 
> > > Diffs > - > > common/src/java/org/apache/hadoop/hive/common/FileUtils.java 651b842f688 > common/src/java/org/apache/hadoop/hive/common/HiveStatsUtils.java > 09343e56166 > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java > 38e843aeacf > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java > bf206fffc26 > > > Diff: https://reviews.apache.org/r/71707/diff/3/ > > > Testing > --- > > measured and plotted insertation time > > > Thanks, > > Attila Magyar > >
Re: Review Request 71707: Performance degradation on single row inserts
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71707/ --- (Updated Nov. 7, 2019, 9:23 a.m.) Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra. Changes --- addressing review comments Bugs: HIVE-22411 https://issues.apache.org/jira/browse/HIVE-22411 Repository: hive-git Description --- Executing single insert statements on a transactional table affects write performance on an s3 file system. Each insert creates a new delta directory. After each insert hive calculates statistics like the number of files in the table and the total size of the table. In order to calculate these, it traverses the directory recursively. During the recursion a separate listStatus call is executed for each path. In the end, the more delta directories you have, the more time it takes to calculate the statistics. Therefore insertion time goes up linearly. Diffs (updated) - common/src/java/org/apache/hadoop/hive/common/FileUtils.java 651b842f688 common/src/java/org/apache/hadoop/hive/common/HiveStatsUtils.java 09343e56166 standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java 38e843aeacf standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java bf206fffc26 Diff: https://reviews.apache.org/r/71707/diff/3/ Changes: https://reviews.apache.org/r/71707/diff/2-3/ Testing --- measured and plotted insertion time Thanks, Attila Magyar
Re: Review Request 71707: Performance degradation on single row inserts
> On Nov. 5, 2019, 4:33 p.m., Panos Garefalakis wrote: > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java > > Lines 328 (patched) > > <https://reviews.apache.org/r/71707/diff/2/?file=2171542#file2171542line335> > > > > Hey Attila, the solution looks good however, as other fileSystems might > > face similar issues in the future using this recursive method (i.e. Azure > > Blob storage) wouldn't it make sense to have hdfs a the base case and > > others separately? and maybe throw a warn message here when the filesystem > > is not supported? Hey Panos, I checked the hadoop project and I found only one FS implementation with an optimized recursive listFiles(); the other implementations use the tree-walking impl. from the base class. I think that's the more common case. Do you know where the source of this Azure Blob storage is? Is it open source at all? - Attila --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71707/#review218505 ------- On Nov. 5, 2019, 3:32 p.m., Attila Magyar wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/71707/ > --- > > (Updated Nov. 5, 2019, 3:32 p.m.) > > > Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra. > > > Bugs: HIVE-22411 > https://issues.apache.org/jira/browse/HIVE-22411 > > > Repository: hive-git > > > Description > --- > > Executing single insert statements on a transactional table effects write > performance on a s3 file system. Each insert creates a new delta directory. > After each insert hive calculates statistics like number of file in the table > and total size of the table. In order to calculate these, it traverses the > directory recursively. During the recursion for each path a separate > listStatus call is executed. In the end the more delta directory you have the > more time it takes to calculate the statistics. 
> > Therefore insertion time goes up linearly. > > > Diffs > - > > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java > 38e843aeacf > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java > bf206fffc26 > > > Diff: https://reviews.apache.org/r/71707/diff/2/ > > > Testing > --- > > measured and plotted insertation time > > > Thanks, > > Attila Magyar > >
Re: Review Request 71707: Performance degradation on single row inserts
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71707/ --- (Updated Nov. 5, 2019, 3:32 p.m.) Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra. Changes --- Addressing Ashutosh's comments Bugs: HIVE-22411 https://issues.apache.org/jira/browse/HIVE-22411 Repository: hive-git Description --- Executing single insert statements on a transactional table affects write performance on an s3 file system. Each insert creates a new delta directory. After each insert hive calculates statistics like the number of files in the table and the total size of the table. In order to calculate these, it traverses the directory recursively. During the recursion a separate listStatus call is executed for each path. In the end, the more delta directories you have, the more time it takes to calculate the statistics. Therefore insertion time goes up linearly. Diffs (updated) - standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java 38e843aeacf standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java bf206fffc26 Diff: https://reviews.apache.org/r/71707/diff/2/ Changes: https://reviews.apache.org/r/71707/diff/1-2/ Testing --- measured and plotted insertion time Thanks, Attila Magyar
Review Request 71707: Performance degradation on single row inserts
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71707/ --- Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra. Bugs: HIVE-22411 https://issues.apache.org/jira/browse/HIVE-22411 Repository: hive-git Description --- Executing single insert statements on a transactional table affects write performance on an s3 file system. Each insert creates a new delta directory. After each insert hive calculates statistics like the number of files in the table and the total size of the table. In order to calculate these, it traverses the directory recursively. During the recursion a separate listStatus call is executed for each path. In the end, the more delta directories you have, the more time it takes to calculate the statistics. Therefore insertion time goes up linearly. Diffs - standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java 38e843aeacf standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java 155ecb18bf5 Diff: https://reviews.apache.org/r/71707/diff/1/ Testing --- measured and plotted insertion time Thanks, Attila Magyar
[jira] [Created] (HIVE-22411) Performance degradation on single row inserts
Attila Magyar created HIVE-22411: Summary: Performance degradation on single row inserts Key: HIVE-22411 URL: https://issues.apache.org/jira/browse/HIVE-22411 Project: Hive Issue Type: Bug Components: Hive Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 Attachments: Screen Shot 2019-10-17 at 8.40.50 PM.png Executing single insert statements on a transactional table affects write performance on an s3 file system. Each insert creates a new delta directory. After each insert hive calculates statistics like the number of files in the table and the total size of the table. To calculate these, it traverses the directory recursively. During the recursion a separate listStatus call is executed for each path. In the end, the more delta directories you have, the more time it takes to calculate the statistics. Therefore insertion time goes up linearly: !Screen Shot 2019-10-17 at 8.40.50 PM.png|width=601,height=436! The fix is to use fs.listFiles(path, /*recursive*/ true) instead of the handcrafted recursive method. -- This message was sent by Atlassian Jira (v8.3.4#803005)
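The cost difference between the handcrafted walk and a single recursive listFiles() call can be illustrated with a stdlib analogy. This is a sketch using java.nio.file in place of Hadoop's FileSystem API; the class and method names are illustrative, not Hive's actual code:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListingDemo {
    // Analog of the old code path: one listing call per directory level, so N
    // delta directories mean N+1 listing calls (each an S3 round trip in Hive's case).
    static List<Path> listRecursively(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry)) {
                    files.addAll(listRecursively(entry)); // separate call per subdirectory
                } else {
                    files.add(entry);
                }
            }
        }
        return files;
    }

    // Analog of the fix: a single recursive traversal entry point, which an
    // object-store-aware FileSystem implementation can answer with one bulk listing.
    static List<Path> listFlat(Path dir) throws IOException {
        try (Stream<Path> walk = Files.walk(dir)) {
            return walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("deltas");
        for (int i = 0; i < 3; i++) {
            Path delta = Files.createDirectory(root.resolve("delta_" + i));
            Files.createFile(delta.resolve("bucket_0"));
        }
        System.out.println(listRecursively(root).size()); // 3
        System.out.println(listFlat(root).size());        // 3
    }
}
```

Both methods return the same files; the point is that the first issues one listing call per directory while the second exposes a single entry point that optimized file systems (HDFS's DistributedFileSystem being the one found in the hadoop project) can satisfy more cheaply.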
Review Request 71555: Incompatible java.util.ArrayList for java 11
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71555/ --- Review request for hive, Laszlo Bodor, Ashutosh Chauhan, and Prasanth_J. Bugs: HIVE-22097 https://issues.apache.org/jira/browse/HIVE-22097 Repository: hive-git Description --- The following exception occurs when running a query on Java 11: java.lang.RuntimeException: java.lang.NoSuchFieldException: parentOffset at org.apache.hadoop.hive.ql.exec.SerializationUtilities$ArrayListSubListSerializer.<init>(SerializationUtilities.java:390) at org.apache.hadoop.hive.ql.exec.SerializationUtilities$1.create(SerializationUtilities.java:235) at org.apache.hive.com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48) at org.apache.hadoop.hive.ql.exec.SerializationUtilities.borrowKryo(SerializationUtilities.java:280) at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:595) at org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:587) at org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:579) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:357) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:159) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2317) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1969) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1636) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1396) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1390) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:162) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:223) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:189) at 
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:838) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:777) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:696) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.apache.hadoop.util.RunJar.run(RunJar.java:323) at org.apache.hadoop.util.RunJar.main(RunJar.java:236) Caused by: java.lang.NoSuchFieldException: parentOffset at java.base/java.lang.Class.getDeclaredField(Class.java:2412) at org.apache.hadoop.hive.ql.exec.SerializationUtilities$ArrayListSubListSerializer.<init>(SerializationUtilities.java:384) ... 29 more The internal structure of ArrayList$SubList changed and our serializer fails. This serializer comes from the kryo-serializers package, where the code has already been updated. This patch does the same. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java e4d33e82168 Diff: https://reviews.apache.org/r/71555/diff/1/ Testing --- Tested on a real cluster with Java 11. Thanks, Attila Magyar
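The general fix pattern here (probing for a field under its new name and falling back to the old one, instead of hard-coding a single name) can be sketched as follows. SubListLike is a hypothetical stand-in for the JDK-internal ArrayList$SubList, used so the sketch stays runnable without reflecting on JDK internals; it is not the actual patch:

```java
import java.lang.reflect.Field;

public class FieldFallbackDemo {
    // Stand-in for a JDK-internal class whose field names changed between releases.
    static class SubListLike {
        int offset = 7; // Java 9+ SubList layout uses "offset"; Java 8 used "parentOffset"
    }

    // Pattern used by serializers that reflect on JDK internals: probe the new
    // field name first and fall back to the old one instead of failing outright.
    static Field findOffsetField(Class<?> cls) throws NoSuchFieldException {
        try {
            return cls.getDeclaredField("offset");
        } catch (NoSuchFieldException e) {
            return cls.getDeclaredField("parentOffset"); // pre-Java-9 name
        }
    }

    public static void main(String[] args) throws Exception {
        Field f = findOffsetField(SubListLike.class);
        f.setAccessible(true);
        System.out.println(f.getInt(new SubListLike())); // prints 7
    }
}
```

A hard-coded getDeclaredField("parentOffset") is exactly what threw the NoSuchFieldException in the trace above once Java 11 removed that field.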
Review Request 71456: select count gives incorrect result after loading data from text file
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71456/ --- Review request for hive, Ashutosh Chauhan, Jesús Camacho Rodríguez, and Slim Bouguerra. Bugs: HIVE-22055 https://issues.apache.org/jira/browse/HIVE-22055 Repository: hive-git Description --- This happens when tez.grouping.min-size is set to a small value (for example 1) so that the split size calculated from the file size is used. This changes as the table grows, so a different split size is used for each select. load 90 records from f1; select count(1) gives back 90. load 90 records from f2; select count(1) gives back 172 // 8 records missing. When running the second select the split size is larger, and SerDeLowLevelCacheImpl is already populated with stripes from the first select (when the split size was smaller). There is a problem with how LineRecordReader works together with the cache. If a larger split is requested and an overlapping smaller one is already in the cache, SerDeEncodedDataReader will try to extend the existing split by reading the difference between the large and the small split. But it starts reading right after the point where the last stripe physically ends, and LineRecordReader always skips the first row unless it is at the beginning of the file. This line-skipping behaviour is not accounted for at one point, and that's why some rows are missing. Diffs - itests/src/test/resources/testconfiguration.properties 98280c52fe9 llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/SerDeEncodedDataReader.java 462b25fa234 ql/src/test/queries/clientpositive/mm_loaddata_split_change.q PRE-CREATION ql/src/test/results/clientpositive/llap/mm_loaddata_split_change.q.out PRE-CREATION Diff: https://reviews.apache.org/r/71456/diff/1/ Testing --- with q test Thanks, Attila Magyar
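The first-row-skip convention the description refers to can be sketched in isolation. This is a simplified model, not the actual LineRecordReader code: a reader whose split does not start at byte 0 discards everything up to the first newline, and reads past its split end to finish the last record it started. Every line is then read exactly once, but only if each reader honors both halves of the convention; rows go missing when a range is read without accounting for the skip.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitReadDemo {
    // Simplified model of line-oriented split reading: skip the (possibly
    // partial) first line unless the split starts at byte 0, and let the last
    // line run past the split end. Adjacent splits then partition the rows.
    static List<String> readSplit(String data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        if (start != 0) {                                  // skip the partial first line
            while (pos < data.length() && data.charAt(pos) != '\n') pos++;
            pos++;
        }
        while (pos < data.length() && pos <= end) {        // last line may cross 'end'
            int nl = data.indexOf('\n', pos);
            if (nl < 0) nl = data.length();
            lines.add(data.substring(pos, nl));
            pos = nl + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        String data = "row1\nrow2\nrow3\nrow4\n";
        // Split boundary falls mid-"row2": reader 1 finishes row2, reader 2
        // skips into row3, and no row is read twice or lost.
        System.out.println(readSplit(data, 0, 6));   // [row1, row2]
        System.out.println(readSplit(data, 7, 19));  // [row3, row4]
    }
}
```

If the cache-extension path reads the bytes between the small and large split starting exactly where the last stripe ends, the skip in this model throws away a full row instead of a partial one, which matches the missing-rows symptom.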
Review Request 71262: Mondrian queries failing with ClassCastException: hive.ql.exec.vector.DecimalColumnVector cannot be cast to hive.ql.exec.vector.Decimal64ColumnVector
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71262/ --- Review request for hive, Ashutosh Chauhan, Gopal V, and Jesús Camacho Rodríguez. Bugs: HIVE-22094 https://issues.apache.org/jira/browse/HIVE-22094 Repository: hive-git Description --- ClassCastException when running a join on a decimal column: Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.Decimal64ColumnVector at org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.VectorUDAFSumDecimal64ToDecimal.aggregateInput(VectorUDAFSumDecimal64ToDecimal.java:320) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processAggregators(VectorGroupByOperator.java:217) Diffs - data/files/employee_closure/employee_closure.tsv PRE-CREATION data/files/salary/salary.tsv PRE-CREATION itests/src/test/resources/testconfiguration.properties 84c20426763 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java 573368829e5 ql/src/test/queries/clientpositive/vector_decimal_mapjoin2.q PRE-CREATION ql/src/test/results/clientpositive/llap/vector_decimal_mapjoin2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/71262/diff/1/ Testing --- qtest Thanks, Attila Magyar
[jira] [Created] (HIVE-22094) Mondrian queries failing with ClassCastException: hive.ql.exec.vector.DecimalColumnVector cannot be cast to hive.ql.exec.vector.Decimal64ColumnVector
Attila Magyar created HIVE-22094: Summary: Mondrian queries failing with ClassCastException: hive.ql.exec.vector.DecimalColumnVector cannot be cast to hive.ql.exec.vector.Decimal64ColumnVector Key: HIVE-22094 URL: https://issues.apache.org/jira/browse/HIVE-22094 Project: Hive Issue Type: Task Components: Hive Reporter: Attila Magyar Assignee: Attila Magyar Fix For: 4.0.0 When running a query like this select sum(salary.salary_paid) from salary, employee_closure where salary.employee_id = employee_closure.employee_id; with hive.auto.convert.join=true and hive.vectorized.execution.enabled=true the following exception occurs {code:java} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.Decimal64ColumnVector at org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.VectorUDAFSumDecimal64ToDecimal.aggregateInput(VectorUDAFSumDecimal64ToDecimal.java:320) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processAggregators(VectorGroupByOperator.java:217) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeHashAggregate.doProcessBatch(VectorGroupByOperator.java:414) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processBatch(VectorGroupByOperator.java:182) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:1124) at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919) at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardOverflow(VectorMapJoinGenerateResultOperator.java:706) at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyGenerateResultOperator.generateHashMultiSetResultMultiValue(VectorMapJoinInnerBigOnlyGenerateResultOperator.java:268) at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyGenerateResultOperator.finishInnerBigOnly(VectorMapJoinInnerBigOnlyGenerateResultOperator.java:180) at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyLongOperator.processBatch(VectorMapJoinInnerBigOnlyLongOperator.java:379) ... 28 more{code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-22055) select count gives incorrect result after loading data from text file
Attila Magyar created HIVE-22055: Summary: select count gives incorrect result after loading data from text file Key: HIVE-22055 URL: https://issues.apache.org/jira/browse/HIVE-22055 Project: Hive Issue Type: Task Components: Hive Reporter: Attila Magyar Assignee: Attila Magyar Load data 3 times (both kv1.txt and kv2.txt contain 500 records) {code:java} create table load0_mm (key string, value string) stored as textfile tblproperties("transactional"="true", "transactional_properties"="insert_only"); load data local inpath '../../data/files/kv1.txt' into table load0_mm; select count(1) from load0_mm; load data local inpath '../../data/files/kv2.txt' into table load0_mm; select count(1) from load0_mm; load data local inpath '../../data/files/kv2.txt' into table load0_mm; select count(1) from load0_mm;{code} Expected output {code:java} PREHOOK: query: load data local inpath '../../data/files/kv2.txt' into table load0_mm PREHOOK: type: LOAD A masked pattern was here PREHOOK: Output: default@load0_mm POSTHOOK: query: load data local inpath '../../data/files/kv2.txt' into table load0_mm POSTHOOK: type: LOAD A masked pattern was here POSTHOOK: Output: default@load0_mm PREHOOK: query: select count(1) from load0_mm PREHOOK: type: QUERY PREHOOK: Input: default@load0_mm A masked pattern was here POSTHOOK: query: select count(1) from load0_mm POSTHOOK: type: QUERY POSTHOOK: Input: default@load0_mm A masked pattern was here 1500{code} Got: [ERROR] TestMiniLlapLocalCliDriver.testCliDriver:59 Client Execution succeeded but contained differences (error code = 1) after executing mm_loaddata.q 63c63 < 1480 --- > 1500 -- This message was sent by Atlassian JIRA (v7.6.14#76016)
Re: Review Request 71156: Tez: Use a pre-parsed TezConfiguration from DagUtils
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71156/ --- (Updated July 25, 2019, 8:05 a.m.) Review request for hive, Laszlo Bodor, Gopal V, and Jesús Camacho Rodríguez. Bugs: HIVE-21828 https://issues.apache.org/jira/browse/HIVE-21828 Repository: hive-git Description --- The HS2 tez-site.xml does not change dynamically - the XML-parsed components of the config can be obtained statically and kept across sessions. This allows replacing "new TezConfiguration()" with an HS2-local version. The configuration object, however, has to reference the right resource file (i.e. the location of tez-site.xml) without reparsing it for each query. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2b7468a1ab7 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 3278dfea061 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezConfigurationFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java dd7ccd4764d ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRegExp.java 3bf3cfd3d9e ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java befeb4f2dd4 ql/src/test/org/apache/hive/testutils/HiveTestEnvSetup.java f872da02a3c ql/src/test/queries/clientpositive/mm_loaddata.q 7e5787f2a65 ql/src/test/results/clientpositive/llap/tez_fixed_bucket_pruning.q.out eaed60c1ba7 Diff: https://reviews.apache.org/r/71156/diff/2/ Changes: https://reviews.apache.org/r/71156/diff/1-2/ Testing --- unittests Thanks, Attila Magyar
Review Request 71156: Tez: Use a pre-parsed TezConfiguration from DagUtils
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71156/ --- Review request for hive, Laszlo Bodor, Gopal V, and Jesús Camacho Rodríguez. Bugs: HIVE-21828 https://issues.apache.org/jira/browse/HIVE-21828 Repository: hive-git Description --- The HS2 tez-site.xml does not change dynamically - the XML-parsed components of the config can be obtained statically and kept across sessions. This allows replacing "new TezConfiguration()" with an HS2-local version. The configuration object, however, has to reference the right resource file (i.e. the location of tez-site.xml) without reparsing it for each query. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 440d761f03d ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 3278dfea061 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezConfigurationFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java dd7ccd4764d ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRegExp.java 3bf3cfd3d9e ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java befeb4f2dd4 ql/src/test/org/apache/hive/testutils/HiveTestEnvSetup.java f872da02a3c ql/src/test/queries/clientpositive/mm_loaddata.q 7e5787f2a65 Diff: https://reviews.apache.org/r/71156/diff/1/ Testing --- unittests Thanks, Attila Magyar
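The parse-once/copy-many idea behind such a factory can be sketched with java.util.Properties standing in for a Hadoop Configuration. The class and method names here are hypothetical, not the actual TezConfigurationFactory API:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class CachedConfigFactory {
    // Parse-once / copy-many pattern: the expensive parse of a static config
    // happens a single time; each query then gets a cheap copy it may mutate.
    private static volatile Properties parsed;

    static Properties get(String configText) throws IOException {
        Properties cached = parsed;
        if (cached == null) {
            synchronized (CachedConfigFactory.class) {
                if (parsed == null) {
                    Properties p = new Properties();
                    p.load(new StringReader(configText)); // done once, not per query
                    parsed = p;
                }
                cached = parsed;
            }
        }
        return (Properties) cached.clone(); // per-query copy, safe to modify
    }

    public static void main(String[] args) throws IOException {
        Properties a = get("tez.queue.name=default");
        a.setProperty("tez.queue.name", "etl");              // mutation stays local
        Properties b = get("ignored, already cached");
        System.out.println(b.getProperty("tez.queue.name")); // prints "default"
    }
}
```

Returning a copy rather than the cached object is what lets the shared parsed state survive per-query modifications, which is the same constraint the review description raises about still referencing the right resource file without reparsing it.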
Review Request 70990: Vectorization: Decimal64 division with integer columns
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/70990/ --- Review request for hive, Laszlo Bodor, Gopal V, and prasanthj. Bugs: HIVE-21437 https://issues.apache.org/jira/browse/HIVE-21437 Repository: hive-git Description --- Vectorizer fails for CREATE temporary TABLE `catalog_Sales`( `cs_quantity` int, `cs_wholesale_cost` decimal(7,2), `cs_list_price` decimal(7,2), `cs_sales_price` decimal(7,2), `cs_ext_discount_amt` decimal(7,2), `cs_ext_sales_price` decimal(7,2), `cs_ext_wholesale_cost` decimal(7,2), `cs_ext_list_price` decimal(7,2), `cs_ext_tax` decimal(7,2), `cs_coupon_amt` decimal(7,2), `cs_ext_ship_cost` decimal(7,2), `cs_net_paid` decimal(7,2), `cs_net_paid_inc_tax` decimal(7,2), `cs_net_paid_inc_ship` decimal(7,2), `cs_net_paid_inc_ship_tax` decimal(7,2), `cs_net_profit` decimal(7,2)) ; explain vectorization detail select max((((cs_ext_list_price - cs_ext_wholesale_cost) - cs_ext_discount_amt) + cs_ext_sales_price) / 2) from catalog_sales; SELECT operator: Could not instantiate DecimalColDivideDecimalScalar with arguments arguments: [21, 20, 22], argument classes: [Integer, Integer, Integer], exception: java.lang.IllegalArgumentException Diffs - ql/src/gen/vectorization/ExpressionTemplates/ColumnDivideScalarDecimal.txt 0bd7c004215 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/ConstantVectorExpression.java 0a16e08d61e ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 52e8dcb0904 ql/src/test/queries/clientpositive/vector_decimal_col_scalar_division.q PRE-CREATION ql/src/test/results/clientpositive/perf/spark/query4.q.out a7e317cc3c9 ql/src/test/results/clientpositive/perf/tez/constraints/query4.q.out 293b2816a13 ql/src/test/results/clientpositive/perf/tez/query4.q.out 47515eda2f8 ql/src/test/results/clientpositive/vector_decimal_col_scalar_division.q.out PRE-CREATION Diff: https://reviews.apache.org/r/70990/diff/1/ Testing --- new q test: vector_decimal_col_scalar_division.q Test 
Result 16,752 tests 0 failures (-2) , 379 skipped (±0) Thanks, Attila Magyar
[jira] [Created] (HIVE-15585) LLAP failed to start on a host with only 1 cpu
Attila Magyar created HIVE-15585: Summary: LLAP failed to start on a host with only 1 cpu Key: HIVE-15585 URL: https://issues.apache.org/jira/browse/HIVE-15585 Project: Hive Issue Type: Bug Components: llap Affects Versions: 2.1.1 Reporter: Attila Magyar Assignee: Attila Magyar LLAP failed to start on a host with only 1 cpu. The number of threads was calculated by dividing the number of cpus by 2. This resulted in zero when the cpu count was 1, and caused an IllegalArgumentException upon startup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
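The usual guard for this kind of derived pool size is a one-line clamp. A minimal sketch, with illustrative names rather than LLAP's actual code:

```java
public class ThreadCountDemo {
    // Deriving a pool size as cpus/2 yields 0 on a single-core host, which an
    // executor rejects with IllegalArgumentException; clamping the result to a
    // minimum of 1 keeps the service startable everywhere.
    static int threadCount(int cpus) {
        return Math.max(1, cpus / 2);
    }

    public static void main(String[] args) {
        System.out.println(threadCount(1)); // 1, not 0
        System.out.println(threadCount(8)); // 4
    }
}
```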