[jira] [Created] (HIVE-25519) Knox homepage service UI links missing when CM intermittently unavailable

2021-09-14 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-25519:


 Summary: Knox homepage service UI links missing when CM 
intermittently unavailable
 Key: HIVE-25519
 URL: https://issues.apache.org/jira/browse/HIVE-25519
 Project: Hive
  Issue Type: Task
Reporter: Attila Magyar
Assignee: Attila Magyar






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25242) Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen

2021-06-14 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-25242:


 Summary:  Query performs extremely slow with 
hive.vectorized.adaptor.usage.mode = chosen
 Key: HIVE-25242
 URL: https://issues.apache.org/jira/browse/HIVE-25242
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 4.0.0
 Environment: If hive.vectorized.adaptor.usage.mode is set to chosen, 
only certain UDFs are vectorized through the vectorized adaptor.

Queries like the following one perform very slowly because the concat is not 
chosen to be vectorized.
{code:java}
select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) 
between to_date('2018-12-01') and to_date('2021-03-01');  {code}
Reporter: Attila Magyar
 Fix For: 4.0.0








[jira] [Created] (HIVE-25223) Select with limit returns no rows on non native table

2021-06-09 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-25223:


 Summary: Select with limit returns no rows on non native table
 Key: HIVE-25223
 URL: https://issues.apache.org/jira/browse/HIVE-25223
 Project: Hive
  Issue Type: Bug
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


Steps to reproduce:
{code:java}
CREATE EXTERNAL TABLE hht (key string, value int) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hht", "hbase.mapred.output.outputtable" = 
"hht");

insert into hht select uuid(), cast((rand() * 100) as int);

insert into hht select uuid(), cast((rand() * 100) as int) from hht;
insert into hht select uuid(), cast((rand() * 100) as int) from hht;
insert into hht select uuid(), cast((rand() * 100) as int) from hht;
insert into hht select uuid(), cast((rand() * 100) as int) from hht;
insert into hht select uuid(), cast((rand() * 100) as int) from hht;
insert into hht select uuid(), cast((rand() * 100) as int) from hht;

 set hive.fetch.task.conversion=none;
 select * from hht limit 10;

+----------+------------+
| hht.key  | hht.value  |
+----------+------------+
+----------+------------+
No rows selected (5.22 seconds) {code}
 

This is caused by GlobalLimitOptimizer. The table directory is always empty 
for a non-native table, since the data is not managed by Hive (but by HBase in 
this case).

The optimizer scans the directory and sets the file list to an empty list.
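As a sketch of the kind of guard that would avoid this, the optimization could check whether the table is native before trusting the file count of the table directory. The class and method below are hypothetical illustrations, not Hive's actual GlobalLimitOptimizer API:

```java
// Sketch only: skip global-limit file pruning for non-native tables,
// whose data lives outside the table directory (e.g. in HBase).
// Names and signature are hypothetical, not Hive's actual API.
public class GlobalLimitGuard {
    public static boolean canPruneInputFiles(boolean isNativeTable, long filesInTableDir) {
        // For a non-native table the directory is empty even though the
        // table has rows, so pruning must be skipped entirely.
        if (!isNativeTable) {
            return false;
        }
        return filesInTableDir > 0;
    }
}
```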





[jira] [Created] (HIVE-25033) HPL/SQL thrift call fails when returning null

2021-04-20 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-25033:


 Summary: HPL/SQL thrift call fails when returning null
 Key: HIVE-25033
 URL: https://issues.apache.org/jira/browse/HIVE-25033
 Project: Hive
  Issue Type: Sub-task
  Components: hpl/sql
Affects Versions: 4.0.0
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0








[jira] [Created] (HIVE-25004) HPL/SQL subsequent statements are failing after typing a malformed input in beeline

2021-04-12 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-25004:


 Summary: HPL/SQL subsequent statements are failing after typing a 
malformed input in beeline
 Key: HIVE-25004
 URL: https://issues.apache.org/jira/browse/HIVE-25004
 Project: Hive
  Issue Type: Bug
  Components: hpl/sql
Affects Versions: 4.0.0
Reporter: Attila Magyar
 Fix For: 4.0.0








[jira] [Created] (HIVE-24997) HPL/SQL udf doesn't work in tez container mode

2021-04-09 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-24997:


 Summary: HPL/SQL udf doesn't work in tez container mode
 Key: HIVE-24997
 URL: https://issues.apache.org/jira/browse/HIVE-24997
 Project: Hive
  Issue Type: Sub-task
  Components: hpl/sql
Affects Versions: 4.0.0
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0








[jira] [Created] (HIVE-24813) thrift regeneration is failing with cannot find symbol TABLE_IS_CTAS

2021-02-23 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-24813:


 Summary: thrift regeneration is failing with cannot find symbol 
TABLE_IS_CTAS
 Key: HIVE-24813
 URL: https://issues.apache.org/jira/browse/HIVE-24813
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


{code:java}
[ERROR] 
/Users/amagyar/development/hive/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java:[2145,34]
 cannot find symbol
[ERROR]   symbol:   variable TABLE_IS_CTAS
[ERROR]   location: class org.apache.hadoop.hive.metastore.HMSHandler
[ERROR] 
/Users/amagyar/development/hive/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDefaultTransformer.java:[591,58]
 cannot find symbol
[ERROR]   symbol:   variable TABLE_IS_CTAS
[ERROR]   location: class 
org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer
[ERROR] -> [Help 1] {code}





[jira] [Created] (HIVE-24715) Increase bucketId range

2021-02-01 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-24715:


 Summary: Increase bucketId range
 Key: HIVE-24715
 URL: https://issues.apache.org/jira/browse/HIVE-24715
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0








[jira] [Created] (HIVE-24696) Drop procedure and drop package syntax for HPLSQL

2021-01-28 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-24696:


 Summary: Drop procedure and drop package syntax for HPLSQL
 Key: HIVE-24696
 URL: https://issues.apache.org/jira/browse/HIVE-24696
 Project: Hive
  Issue Type: Sub-task
  Components: hpl/sql
Reporter: Attila Magyar








[jira] [Created] (HIVE-24625) CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect directory

2021-01-12 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-24625:


 Summary: CTAS with TBLPROPERTIES ('transactional'='false') loads 
data into incorrect directory
 Key: HIVE-24625
 URL: https://issues.apache.org/jira/browse/HIVE-24625
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Metastore
Reporter: Attila Magyar
Assignee: Attila Magyar








[jira] [Created] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-01-05 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-24584:


 Summary: IndexOutOfBoundsException from Kryo when running msck 
repair
 Key: HIVE-24584
 URL: https://issues.apache.org/jira/browse/HIVE-24584
 Project: Hive
  Issue Type: Bug
Reporter: Attila Magyar
Assignee: Attila Magyar


The following exception occurs when running "msck repair table t1 sync 
partitions".
{code:java}
java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
at 
org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
 ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
 ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
 ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
 ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
 ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
 [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
 [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}





[jira] [Created] (HIVE-24427) HPL/SQL improvements

2020-11-25 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-24427:


 Summary: HPL/SQL improvements
 Key: HIVE-24427
 URL: https://issues.apache.org/jira/browse/HIVE-24427
 Project: Hive
  Issue Type: Improvement
  Components: hpl/sql
Reporter: Attila Magyar
Assignee: Attila Magyar








[jira] [Created] (HIVE-24383) Add Table type to HPL/SQL

2020-11-13 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-24383:


 Summary: Add Table type to HPL/SQL
 Key: HIVE-24383
 URL: https://issues.apache.org/jira/browse/HIVE-24383
 Project: Hive
  Issue Type: Improvement
  Components: hpl/sql
Reporter: Attila Magyar
Assignee: Attila Magyar








[jira] [Created] (HIVE-24346) Store HPL/SQL packages into HMS

2020-11-02 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-24346:


 Summary: Store HPL/SQL packages into HMS
 Key: HIVE-24346
 URL: https://issues.apache.org/jira/browse/HIVE-24346
 Project: Hive
  Issue Type: New Feature
  Components: hpl/sql, Metastore
Reporter: Attila Magyar
Assignee: Attila Magyar








[jira] [Created] (HIVE-24338) HPL/SQL missing features

2020-10-30 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-24338:


 Summary: HPL/SQL missing features
 Key: HIVE-24338
 URL: https://issues.apache.org/jira/browse/HIVE-24338
 Project: Hive
  Issue Type: Improvement
  Components: hpl/sql
Reporter: Attila Magyar
Assignee: Attila Magyar


There are some features which are supported by Oracle's PL/SQL but not by 
HPL/SQL. This Jira is for prioritizing them and investigating the feasibility 
of implementing them.
 * ForAll syntax like: ForAll j in i..j save exceptions
 * Bulk collect: Fetch cursor Bulk Collect Into list Limit n;
 * Type declaration: Type T_cab is TABLE of
 * TABLE datatype
 * GOTO and LABEL
 * Global variables like $$PLSQL_UNIT and others
 * Named parameters func(name1 => value1, name2 => value2);
 * Built in functions: trunc, lpad, to_date, ltrim, rtrim, sysdate





[jira] [Created] (HIVE-24315) Improve validation and semantic analysis in HPL/SQL

2020-10-27 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-24315:


 Summary: Improve validation and semantic analysis in HPL/SQL 
 Key: HIVE-24315
 URL: https://issues.apache.org/jira/browse/HIVE-24315
 Project: Hive
  Issue Type: Improvement
  Components: hpl/sql
Reporter: Attila Magyar
Assignee: Attila Magyar


There are some known issues that need to be fixed. For example, it seems that 
the arity of a function is not checked when calling it, and the same is true 
for parameter types. Calling an undefined function evaluates to null, and 
sometimes incorrect syntax seems to be silently ignored.

In cases like these, a helpful error message would be expected, though we 
should also consider how PL/SQL works and maintain compatibility.
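As a minimal sketch of the missing arity validation (names are illustrative, not HPL/SQL's actual internals), the interpreter could compare the declared and supplied argument counts before invoking a function:

```java
// Sketch: an arity check an HPL/SQL interpreter could run before
// invoking a user-defined function. Names are illustrative only.
public class ArityCheck {
    public static void checkArity(String function, int declared, int supplied) {
        if (declared != supplied) {
            throw new IllegalArgumentException(
                "wrong number of arguments to " + function
                + ": expected " + declared + ", got " + supplied);
        }
    }
}
```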





[jira] [Created] (HIVE-24230) Integrate HPL/SQL into HiveServer2

2020-10-05 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-24230:


 Summary: Integrate HPL/SQL into HiveServer2
 Key: HIVE-24230
 URL: https://issues.apache.org/jira/browse/HIVE-24230
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, hpl/sql
Reporter: Attila Magyar
Assignee: Attila Magyar


HPL/SQL is a standalone command line program that can store and load scripts 
from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL 
depends on Hive and not the other way around.

Changing the dependency order between HPL/SQL and HiveServer would open up some 
possibilities which are currently not feasible to implement. For example, one 
might want to use a third-party SQL tool to run selects on stored procedure (or 
rather function, in this case) outputs.
{code:java}
SELECT * from myStoredProcedure(1, 2); {code}
HPL/SQL doesn’t have a JDBC interface and it’s not a daemon, so this would not 
work with the current architecture.

Another important factor is performance. Declarative SQL commands are sent to 
Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC 
and use HiveServer’s internal API for compilation and execution.

The third factor is that existing tools like Beeline or Hue cannot be used with 
HPL/SQL, since it has its own separate CLI.





[jira] [Created] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures

2020-09-30 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-24217:


 Summary: HMS storage backend for HPL/SQL stored procedures
 Key: HIVE-24217
 URL: https://issues.apache.org/jira/browse/HIVE-24217
 Project: Hive
  Issue Type: Bug
  Components: Hive, hpl/sql, Metastore
Reporter: Attila Magyar
Assignee: Attila Magyar


HPL/SQL procedures are currently stored in text files. The goal of this Jira is 
to implement a Metastore backend for storing and loading these procedures.





[jira] [Created] (HIVE-24149) HiveStreamingConnection doesn't close HMS connection

2020-09-11 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-24149:


 Summary: HiveStreamingConnection doesn't close HMS connection
 Key: HIVE-24149
 URL: https://issues.apache.org/jira/browse/HIVE-24149
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


There are 3 HMS connections used by HiveStreamingConnection: one for TX, one 
for heartbeat, and one for notifications. The close method only closes the 
first two, leaving the last one open, which eventually overloads HMS and makes 
it unresponsive.
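A sketch of the fix shape: close every client, including the notification one, and keep going even if one close fails. The helper below is illustrative; the real field names and client types in HiveStreamingConnection differ.

```java
// Sketch: best-effort close of all metastore clients, including the
// notification one the bug description says is leaked. Field and type
// names are illustrative, not Hive's actual ones.
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

public class ConnectionCloser {
    public static int closeAll(List<Closeable> clients) {
        int closed = 0;
        for (Closeable c : clients) {
            try {
                if (c != null) {
                    c.close();
                    closed++;
                }
            } catch (IOException e) {
                // best-effort: keep closing the remaining clients
            }
        }
        return closed;
    }
}
```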





[jira] [Created] (HIVE-24137) Race condition when copying llap.tar.gz by multiple HSI

2020-09-09 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-24137:


 Summary: Race condition when copying llap.tar.gz by multiple HSI
 Key: HIVE-24137
 URL: https://issues.apache.org/jira/browse/HIVE-24137
 Project: Hive
  Issue Type: Bug
  Components: llap
Reporter: Attila Magyar


When both HSIs are started simultaneously, one of them fails to start.

This seems to happen because multiple HSIs start at the same time and there is 
a race condition in DFSClient when copying the llap tar package to HDFS.

Restarting one after the other resolves the issue, or a second restart might 
help. But as a long-term fix, we would need to fix 
llap-server/src/main/resources/templates.py and retry copyFromLocal.
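The proposed fix is essentially a retry around the copy. The actual change belongs in the Python template mentioned above; the generic helper below only sketches the shape of the logic (in the real code the Callable would wrap the copyFromLocal call):

```java
// Sketch: retry a transient operation a few times before giving up,
// as proposed for the llap package copy. Illustrative helper only.
import java.util.concurrent.Callable;

public class Retry {
    public static <T> T withRetries(Callable<T> op, int attempts, long sleepMs)
            throws Exception {
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;               // remember the failure
                Thread.sleep(sleepMs);  // back off before retrying
            }
        }
        throw last;
    }
}
```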





[jira] [Created] (HIVE-23957) Limit followed by TopNKey improvement

2020-07-30 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-23957:


 Summary: Limit followed by TopNKey improvement
 Key: HIVE-23957
 URL: https://issues.apache.org/jira/browse/HIVE-23957
 Project: Hive
  Issue Type: Improvement
Reporter: Attila Magyar
Assignee: Attila Magyar


The Limit + TopNKey pushdown might result in a limit operator followed by a TNK 
in the physical plan. This likely makes the TNK unnecessary in cases like this. 
We need to investigate if/when we can remove the TNK.





[jira] [Created] (HIVE-23937) Take null ordering into consideration when pushing TNK through inner joins

2020-07-27 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-23937:


 Summary: Take null ordering into consideration when pushing TNK 
through inner joins
 Key: HIVE-23937
 URL: https://issues.apache.org/jira/browse/HIVE-23937
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 4.0.0
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0








[jira] [Created] (HIVE-23817) Pushing TopN Key operator through PKFK inner joins

2020-07-08 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-23817:


 Summary: Pushing TopN Key operator through PKFK inner joins
 Key: HIVE-23817
 URL: https://issues.apache.org/jira/browse/HIVE-23817
 Project: Hive
  Issue Type: Improvement
Reporter: Attila Magyar
Assignee: Attila Magyar


If there is a primary key-foreign key relationship between the tables, we can 
push the TopNKey operator through the join.





[jira] [Created] (HIVE-23757) Pushing TopN Key operator through MAPJOIN

2020-06-24 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-23757:


 Summary: Pushing TopN Key operator through MAPJOIN
 Key: HIVE-23757
 URL: https://issues.apache.org/jira/browse/HIVE-23757
 Project: Hive
  Issue Type: Improvement
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0
 Attachments: HIVE-23757.1.patch

So far only MERGEJOIN + JOIN cases are handled.





[jira] [Created] (HIVE-23723) Limit operator pushdown through LOJ

2020-06-18 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-23723:


 Summary: Limit operator pushdown through LOJ
 Key: HIVE-23723
 URL: https://issues.apache.org/jira/browse/HIVE-23723
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


Limit operator (without an order by) can be pushed through SELECTs and LEFT 
OUTER JOINs.





Review Request 72570: HiveProtoLogger should carry out JSON conversion in its own thread

2020-06-05 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72570/
---

Review request for hive, Ashutosh Chauhan and Rajesh Balamohan.


Bugs: HIVE-23277
https://issues.apache.org/jira/browse/HIVE-23277


Repository: hive-git


Description
---

This is to avoid JSON serialization being on the hot path of the compiler 
thread. In short queries, where subsecond latency matters, this overhead 
becomes an issue, and it grows with query complexity.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java 750abcb6a61 
  
ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveHookEventProtoPartialBuilder.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
86a68008515 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/StageIDsRearranger.java
 6c874754a1f 
  
ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveHookEventProtoPartialBuilder.java
 PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
add4b6863d8 


Diff: https://reviews.apache.org/r/72570/diff/1/


Testing
---

pending


Thanks,

Attila Magyar



[jira] [Created] (HIVE-23580) deleteOnExit set is not cleaned up, causing memory pressure

2020-05-29 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-23580:


 Summary: deleteOnExit set is not cleaned up, causing memory 
pressure
 Key: HIVE-23580
 URL: https://issues.apache.org/jira/browse/HIVE-23580
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 4.0.0
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


removeScratchDir doesn't always call cancelDeleteOnExit() on context::clear





[jira] [Created] (HIVE-23518) Tez may skip file permission update on intermediate output

2020-05-20 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-23518:


 Summary: Tez may skip file permission update on intermediate output
 Key: HIVE-23518
 URL: https://issues.apache.org/jira/browse/HIVE-23518
 Project: Hive
  Issue Type: Bug
Reporter: Attila Magyar
Assignee: Attila Magyar


Before updating file permissions, Tez checks whether the permission change is 
needed with the following conditional:
{code:java}
if (!SPILL_FILE_PERMS.equals(SPILL_FILE_PERMS.applyUMask(FsPermission.getUMask(conf)))) {
  rfs.setPermission(filename, SPILL_FILE_PERMS);
} {code}
If the config object is changed in the background, the setPermission() call 
will be skipped.

The rfs file system is always a local file system, so there is no need to do 
this check beforehand (calling setPermission on it doesn't generate an 
additional NameNode call).
{code:java}
rfs = ((LocalFileSystem)FileSystem.getLocal(this.conf)).getRaw(); {code}
 





[jira] [Created] (HIVE-23500) [Kubernetes] Use Extend NodeId for LLAP registration

2020-05-19 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-23500:


 Summary: [Kubernetes] Use Extend NodeId for LLAP registration
 Key: HIVE-23500
 URL: https://issues.apache.org/jira/browse/HIVE-23500
 Project: Hive
  Issue Type: Bug
  Components: llap
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


In a Kubernetes environment, where pods can have the same host name and port, 
node trackers can end up retaining an old instance of a pod in their cache. In 
the case of Hive LLAP, where the LLAP Tez task scheduler maintains node 
membership based on ZooKeeper registry events, a NODE_ADDED followed by a 
NODE_REMOVED event can end up removing the node/host from the node trackers 
because of the stable hostname and service port. The NODE_REMOVED event in 
this case is a stale event for the already-dead pod, but ZK only sends it 
after the session timeout (in the case of a non-graceful shutdown). If this 
sequence of events happens, a node/host is completely lost from the 
scheduler's perspective.

To support this scenario, Tez can extend YARN's NodeId to include a 
uniqueIdentifier. The LLAP task scheduler can then construct the container 
object with this new NodeId, so that stale events like the above will only 
remove the host/node that matches the old uniqueIdentifier.





[jira] [Created] (HIVE-23469) Use hostname + pod UID for shuffle manager caching

2020-05-14 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-23469:


 Summary: Use hostname + pod UID for shuffle manager caching
 Key: HIVE-23469
 URL: https://issues.apache.org/jira/browse/HIVE-23469
 Project: Hive
  Issue Type: Bug
  Components: Tez
Reporter: Attila Magyar
Assignee: Attila Magyar


When a pod restarts, it uses the same hostname and shuffle port. When fetcher 
threads connect to download the shuffle data, they use the cached connection 
info, and since the pod has died, its shuffle data has also been cleaned up. 
After the restart, the pod receives connections from clients to download 
specific shuffle data, but the daemon no longer has it because of the restart.

In ShuffleManager.java's knownSrcHosts, the key should be updated to a 
HostInfo, which is a combination of host+port and the host's unique ID. The 
host ID changes when a node is killed or restarted.
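A sketch of the proposed HostInfo key: host + port + a unique per-pod identifier (such as the Kubernetes pod UID), so a restarted pod on the same host:port hashes to a new entry instead of reusing stale cached connection info. The class shape below is illustrative, not Tez's actual API:

```java
// Sketch of a value-type key for knownSrcHosts: equal only when host,
// port, AND the unique per-pod ID all match. Illustrative only.
import java.util.Objects;

public final class HostInfo {
    private final String host;
    private final int port;
    private final String uniqueId;  // e.g. pod UID; changes on restart

    public HostInfo(String host, int port, String uniqueId) {
        this.host = host;
        this.port = port;
        this.uniqueId = uniqueId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof HostInfo)) {
            return false;
        }
        HostInfo other = (HostInfo) o;
        return port == other.port
            && host.equals(other.host)
            && Objects.equals(uniqueId, other.uniqueId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(host, port, uniqueId);
    }
}
```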






Re: Review Request 72437: Reduce number of DB calls in ObjectStore::getPartitionsByExprInternal

2020-05-05 Thread Attila Magyar


> On May 5, 2020, 5:04 a.m., Ashutosh Chauhan wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> > Line 832 (original), 830 (patched)
> > <https://reviews.apache.org/r/72437/diff/3/?file=2230109#file2230109line839>
> >
> > isView only used for this check here, which can be eliminated.

Not sure what you mean by eliminating it? Removing it altogether?


- Attila


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72437/#review220616
---


On April 27, 2020, 9:15 a.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72437/
> ---
> 
> (Updated April 27, 2020, 9:15 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Rajesh Balamohan, and Vineet Garg.
> 
> 
> Bugs: HIVE-23282
> https://issues.apache.org/jira/browse/HIVE-23282
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> ObjectStore::getPartitionsByExprInternal internally uses Table information 
> for getting partitionKeys, table, catalog name.
> 
>  
> 
> For this, it ends up populating the entire table data from the DB (including 
> skew columns, parameters, sort and bucket cols, etc.). This makes it a much 
> more expensive call. It would be good to check if MTable itself can be used 
> instead of Table.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
>  4f58cd91efc 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
>  d1558876f14 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
>  53b7a67a429 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java
>  9834883f00f 
> 
> 
> Diff: https://reviews.apache.org/r/72437/diff/4/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



[jira] [Created] (HIVE-23305) NullPointerException in LlapTaskSchedulerService addNode due to race condition

2020-04-27 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-23305:


 Summary: NullPointerException in LlapTaskSchedulerService addNode 
due to race condition
 Key: HIVE-23305
 URL: https://issues.apache.org/jira/browse/HIVE-23305
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


{code:java}
java.lang.NullPointerException at 
org.apache.hadoop.hive.llap.tezplugins.LlapTaskSchedulerService.addNode(LlapTaskSchedulerService.java:1575)
    at 
org.apache.hadoop.hive.llap.tezplugins.LlapTaskSchedulerService.registerAndAddNode(LlapTaskSchedulerService.java:1566)
 at 
org.apache.hadoop.hive.llap.tezplugins.LlapTaskSchedulerService.access$1800(LlapTaskSchedulerService.java:128)
 at 
org.apache.hadoop.hive.llap.tezplugins.LlapTaskSchedulerService$NodeStateChangeListener.onCreate(LlapTaskSchedulerService.java:831)
    at 
org.apache.hadoop.hive.llap.tezplugins.LlapTaskSchedulerService$NodeStateChangeListener.onCreate(LlapTaskSchedulerService.java:823)
    at 
org.apache.hadoop.hive.registry.impl.ZkRegistryBase$InstanceStateChangeListener.childEvent(ZkRegistryBase.java:612)
   at  {code}
The above exception happens when a node registers too fast, before the 
activeInstances field is initialized.





[jira] [Created] (HIVE-23295) Possible NPE when getting predicate literal list when dynamic values are not available

2020-04-24 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-23295:


 Summary: Possible NPE when getting predicate literal list when 
dynamic values are not available
 Key: HIVE-23295
 URL: https://issues.apache.org/jira/browse/HIVE-23295
 Project: Hive
  Issue Type: Bug
  Components: storage-api
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


getLiteralList() in SearchArgumentImpl$PredicateLeafImpl returns null if 
dynamic values are not available. There are multiple call sites where the 
return value is used without a null check, e.g. leaf.getLiteralList().stream().
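The usual defensive pattern for call sites like this is to normalize a null list to an empty one before streaming. A minimal sketch (the helper is illustrative, not part of the storage-api):

```java
// Sketch: treat a null literal list as empty so that downstream
// .stream() calls cannot NPE. Illustrative helper only.
import java.util.Collections;
import java.util.List;

public class Literals {
    public static <T> List<T> orEmpty(List<T> literals) {
        return literals == null ? Collections.emptyList() : literals;
    }
}
```

Call sites would then write Literals.orEmpty(leaf.getLiteralList()).stream() instead of streaming the raw return value.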





[jira] [Created] (HIVE-23253) Synchronization between external SerDe schemas and Metastore

2020-04-20 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-23253:


 Summary: Synchronization between external SerDe schemas and 
Metastore
 Key: HIVE-23253
 URL: https://issues.apache.org/jira/browse/HIVE-23253
 Project: Hive
  Issue Type: Bug
  Components: Hive, Metastore
Affects Versions: 3.1.2
Reporter: Attila Magyar
 Fix For: 3.0.0


In HIVE-15995 an ALTER  UPDATE COLUMNS statement was introduced to sync 
external SerDe schema changes with the metastore. This command can only be 
invoked manually.

See the documentation:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionUpdatecolumns

 

Maybe it would make sense to run an update columns automatically in certain 
cases, to prevent problems when the user forgets to run the update columns 
manually.

 

One way to reproduce the issue is to change the schema url via an alter table 
statement.
{code:java}
[root@c7401 vagrant]# cat test_schema1.avsc
{
  "type": "record",
  "name": "test_schema",
  "namespace": "gdc_datascience_qa",
  "fields": [
    {
      "name": "name",
      "type": ["null", "string"],
      "default": null
    }
  ]
}
[root@c7401 vagrant]# cat test_schema2.avsc
{
  "type": "record",
  "name": "test_schema",
  "namespace": "gdc_datascience_qa",
  "fields": [
    {
      "name": "name",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "last_name",
      "type": ["null", "string"],
      "default": null
    }
  ]
}
 {code}
{code:java}
 $ hadoop fs -copyFromLocal *.avsc /tmp/
  [beeline] create external table t1 stored as avro tblproperties 
('avro.schema.url'='/tmp/test_schema1.avsc');
  [beeline] alter table t1 set 
tblproperties('avro.schema.url'='/tmp/test_schema2.avsc'); 
  [beeline] insert into t1 values ('n1', 'l1');
  [beeline] create external table t2 stored as avro tblproperties 
('avro.schema.url'='/tmp/test_schema2.avsc');
  [beeline] insert into t2 values ('n2', 'l2');
  [beeline] insert overwrite table t1 select * from t2; {code}
Error:
{code:java}
 MetaException(message:Column last_name doesn't exist in table t1 in database 
default)
        at 
org.apache.hadoop.hive.metastore.ObjectStore.validateTableCols(ObjectStore.java:8652)
        at 
org.apache.hadoop.hive.metastore.ObjectStore.getMTableColumnStatistics(ObjectStore.java:8602)
        at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitionColStats(ObjectStore.java:8416)
        at 
org.apache.hadoop.hive.metastore.ObjectStore.updateTableColumnStatistics(ObjectStore.java:8446
 {code}
Running an ALTER UPDATE COLUMNS fixes the problem.

 

cc: [~szita]





[jira] [Created] (HIVE-23056) LLAP registry getAll doesn't filter compute groups

2020-03-20 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-23056:


 Summary: LLAP registry getAll doesn't filter compute groups
 Key: HIVE-23056
 URL: https://issues.apache.org/jira/browse/HIVE-23056
 Project: Hive
  Issue Type: Bug
  Components: llap
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


ZkRegistryBase's InstanceStateChangeListener gets notified every time a new 
node is added/removed, even when the node doesn't belong to the same compute 
group as the registry. These znodes are stored internally and returned by 
getAll(). This causes query coordinators to assign tasks to executors that are 
in different compute groups.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 72200: TopN Key efficiency check might disable filter too soon

2020-03-06 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72200/
---

(Updated March 6, 2020, 12:44 p.m.)


Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and 
Rajesh Balamohan.


Bugs: HIVE-22982
https://issues.apache.org/jira/browse/HIVE-22982


Repository: hive-git


Description
---

The check is triggered after every n batches, but there can be multiple filters, 
one for each partition. Some filters might have less data than the others.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 12f4822e381 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java f09867bb4e8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 
0f8eb173c66 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperBatch.java
 b487480b938 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneralComparator.java
 06ac661028f 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java ddd657e5552 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java a91bc7354a7 


Diff: https://reviews.apache.org/r/72200/diff/2/

Changes: https://reviews.apache.org/r/72200/diff/1-2/


Testing
---

manually


Thanks,

Attila Magyar



Review Request 72200: TopN Key efficiency check might disable filter too soon

2020-03-05 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72200/
---

Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and 
Rajesh Balamohan.


Bugs: HIVE-22982
https://issues.apache.org/jira/browse/HIVE-22982


Repository: hive-git


Description
---

The check is triggered after every n batches, but there can be multiple filters, 
one for each partition. Some filters might have less data than the others.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7ea2de9019c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java f09867bb4e8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 
0f8eb173c66 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java a91bc7354a7 


Diff: https://reviews.apache.org/r/72200/diff/1/


Testing
---

manually


Thanks,

Attila Magyar



[jira] [Created] (HIVE-22982) TopN Key efficiency check might disable filter too soon

2020-03-05 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-22982:


 Summary: TopN Key efficiency check might disable filter too soon
 Key: HIVE-22982
 URL: https://issues.apache.org/jira/browse/HIVE-22982
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 4.0.0
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


The check is triggered after every n batches, but there can be multiple filters, 
one for each partition. Some filters might have less data than the others.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22974) Metastore's table location check should be optional

2020-03-04 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-22974:


 Summary: Metastore's table location check should be optional
 Key: HIVE-22974
 URL: https://issues.apache.org/jira/browse/HIVE-22974
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


In HIVE-22189 a check was introduced to make sure managed and external tables 
are located at the proper space. This condition cannot be satisfied during an 
upgrade.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22960) Approximate TopN Key Operator

2020-03-02 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-22960:


 Summary: Approximate TopN Key Operator
 Key: HIVE-22960
 URL: https://issues.apache.org/jira/browse/HIVE-22960
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0
 Attachments: Screen Shot 2020-03-02 at 4.55.46 PM.png

??Different from other operators, the top n operator demonstrates notable "long 
tail" characteristics which make it distinct from other operators like join, 
group by, etc.: the top n array/heap will saturate very quickly. Updates are 
frequent at the beginning and then slow to a very low frequency.

The approximation can be implemented in two ways: one way is to stop the 
array/heap update after a certain percentage of the data has been read, for 
example 10% or 20%, if we know the table size. The other way is to set a 
frequency threshold on the array/heap updates. After the threshold is met, 
stop the top n processing.??
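
The second variant (the update-frequency threshold) could be sketched roughly as follows. This is a minimal illustration only; the class name, method names, and the threshold value are assumptions, not Hive's implementation:

```java
// Hypothetical sketch of the frequency-threshold approximation described
// above. All names and the threshold constant are illustrative, not Hive's
// actual code.
public class ApproxTopNSketch {
    private long rowsSinceLastUpdate = 0;
    private boolean topNDisabled = false;

    // Assumed threshold: if the top-n array/heap has not been updated for
    // this many consecutive rows, the "long tail" has been reached.
    private static final long UPDATE_GAP_THRESHOLD = 100_000;

    /** @param updatedTopN whether this row actually changed the top-n structure */
    public void onRow(boolean updatedTopN) {
        if (topNDisabled) {
            return; // already gave up maintaining the structure
        }
        if (updatedTopN) {
            rowsSinceLastUpdate = 0; // still in the frequent-update phase
        } else if (++rowsSinceLastUpdate >= UPDATE_GAP_THRESHOLD) {
            topNDisabled = true; // stop the top-n processing from here on
        }
    }

    public boolean isTopNDisabled() { return topNDisabled; }
}
```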

[~rzhappy]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 71783: Implement TopNKeyFilter efficiency check

2020-02-27 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71783/
---

(Updated Feb. 27, 2020, 12:57 p.m.)


Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and 
Panos Garefalakis.


Bugs: HIVE-22925
https://issues.apache.org/jira/browse/HIVE-22925


Repository: hive-git


Description
---

In certain cases the TopNKey filter might work inefficiently and add extra CPU 
overhead. For example, if the rows are coming in descending order but the 
filter wants the top N smallest elements, the filter will forward everything.

Inefficiency should be detected at runtime so that the filter can be disabled 
if the ratio forwarded_rows/total_rows is too high.
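
A minimal sketch of such a runtime efficiency check; the class name, the check interval, and the threshold are assumptions for illustration, not Hive's actual implementation:

```java
// Hypothetical sketch of the runtime efficiency check described above:
// track forwarded vs. total rows and disable the filter once the
// forwarding ratio exceeds a threshold. Names and constants are assumed.
public class TopNKeyEfficiencySketch {
    private long totalRows = 0;
    private long forwardedRows = 0;
    private boolean disabled = false;

    private static final long CHECK_INTERVAL = 10_000;   // rows between checks (assumed)
    private static final double MAX_FORWARD_RATIO = 0.8; // disable above this (assumed)

    /** Returns true if the row should be forwarded downstream. */
    public boolean process(boolean passesTopNFilter) {
        totalRows++;
        // Once disabled, the filter forwards everything without filtering.
        boolean forward = disabled || passesTopNFilter;
        if (forward) {
            forwardedRows++;
        }
        if (!disabled && totalRows % CHECK_INTERVAL == 0
                && (double) forwardedRows / totalRows > MAX_FORWARD_RATIO) {
            disabled = true; // filter adds CPU overhead without dropping rows
        }
        return forward;
    }

    public boolean isDisabled() { return disabled; }
}
```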


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e419dc5eb3b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 38d2e08b760 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java dd66dfcd72e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 
7feadd3137d 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java 
3869ffa2b83 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 31735c9ea3d 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java 19910a341e0 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java 95cd45978a8 


Diff: https://reviews.apache.org/r/71783/diff/3/

Changes: https://reviews.apache.org/r/71783/diff/2-3/


Testing
---

on dwx


Thanks,

Attila Magyar



Review Request 71783: Implement TopNKeyFilter efficiency check

2020-02-25 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71783/
---

Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and 
Panos Garefalakis.


Bugs: HIVE-22925
https://issues.apache.org/jira/browse/HIVE-22925


Repository: hive-git


Description
---

In certain cases the TopNKey filter might work inefficiently and add extra CPU 
overhead. For example, if the rows are coming in ascending order but the 
filter wants the top N smallest elements, the filter will forward everything.

Inefficiency should be detected at runtime so that the filter can be disabled 
if the ratio forwarded_rows/total_rows is too high.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e419dc5eb3b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 38d2e08b760 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java dd66dfcd72e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 
7feadd3137d 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java 
3869ffa2b83 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 31735c9ea3d 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java 19910a341e0 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java 95cd45978a8 


Diff: https://reviews.apache.org/r/71783/diff/1/


Testing
---

on dwx


Thanks,

Attila Magyar



[jira] [Created] (HIVE-22925) Implement TopNKeyFilter efficiency check

2020-02-24 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-22925:


 Summary: Implement TopNKeyFilter efficiency check
 Key: HIVE-22925
 URL: https://issues.apache.org/jira/browse/HIVE-22925
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


In certain cases the TopNKey filter might work inefficiently and add extra CPU 
overhead. For example, if the rows are coming in ascending order but the 
filter wants the top N smallest elements, the filter will forward everything.

Inefficiency should be detected at runtime so that the filter can be disabled 
if the ratio forwarded_rows/total_rows is too high.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 72113: DML execution on TEZ always outputs the message 'No rows affected'

2020-02-13 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72113/
---

(Updated Feb. 13, 2020, 3:40 p.m.)


Review request for hive, Laszlo Bodor, Mustafa Iman, Panos Garefalakis, and 
Ramesh Kumar Thangarajan.


Bugs: HIVE-22870
https://issues.apache.org/jira/browse/HIVE-22870


Repository: hive-git


Description
---

Executing an update or insert statement in beeline doesn't show the actual rows 
inserted/updated.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java 25dd970a9b1 
  ql/src/test/results/clientpositive/llap/orc_llap_counters.q.out 9c5695ae603 
  ql/src/test/results/clientpositive/llap/orc_llap_counters1.q.out f9b5f8f0d4d 
  ql/src/test/results/clientpositive/llap/orc_ppd_basic.q.out 9ad0a9b7faf 
  ql/src/test/results/clientpositive/llap/orc_ppd_schema_evol_3a.q.out 
3e99e0ee627 
  ql/src/test/results/clientpositive/llap/retry_failure_reorder.q.out 
baeac434d79 
  ql/src/test/results/clientpositive/llap/tez_input_counters.q.out 885cb0a9cba 


Diff: https://reviews.apache.org/r/72113/diff/2/

Changes: https://reviews.apache.org/r/72113/diff/1-2/


Testing
---

with insert and updates


Thanks,

Attila Magyar



Review Request 72113: DML execution on TEZ always outputs the message 'No rows affected'

2020-02-11 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72113/
---

Review request for hive, Laszlo Bodor, Mustafa Iman, Panos Garefalakis, and 
Ramesh Kumar Thangarajan.


Bugs: HIVE-22870
https://issues.apache.org/jira/browse/HIVE-22870


Repository: hive-git


Description
---

Executing an update or insert statement in beeline doesn't show the actual rows 
inserted/updated.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java 25dd970a9b1 


Diff: https://reviews.apache.org/r/72113/diff/1/


Testing
---

with insert and updates


Thanks,

Attila Magyar



[jira] [Created] (HIVE-22870) DML execution on TEZ always outputs the message 'No rows affected'

2020-02-11 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-22870:


 Summary: DML execution on TEZ always outputs the message 'No rows 
affected'
 Key: HIVE-22870
 URL: https://issues.apache.org/jira/browse/HIVE-22870
 Project: Hive
  Issue Type: Bug
Reporter: Attila Magyar
Assignee: Attila Magyar


Executing an update or insert statement in beeline doesn't show the actual rows 
inserted/updated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 72108: HIVE-22867

2020-02-11 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72108/#review219544
---


Ship it!




Ship It!

- Attila Magyar


On Feb. 11, 2020, 9:58 a.m., Krisztian Kasa wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72108/
> ---
> 
> (Updated Feb. 11, 2020, 9:58 a.m.)
> 
> 
> Review request for hive, Attila Magyar and Jesús Camacho Rodríguez.
> 
> 
> Bugs: HIVE-22867
> https://issues.apache.org/jira/browse/HIVE-22867
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Add partitioning support to VectorTopNKeyOperator
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java bd8ff6285e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 
> f03d65030d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
> 27ff0c2484 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorTopNKeyDesc.java 
> 9a266a0c57 
>   ql/src/test/queries/clientpositive/subquery_in.q 96ed1bae41 
>   ql/src/test/queries/clientpositive/subquery_notin.q f25168ab77 
>   ql/src/test/queries/clientpositive/topnkey_windowing.q a5352d2d6c 
>   ql/src/test/queries/clientpositive/vector_windowing_streaming.q 2f7b628db3 
>   ql/src/test/queries/clientpositive/windowing_filter.q 14d0c5a7c8 
>   ql/src/test/results/clientpositive/llap/subquery_in.q.out ea8fe5ea96 
>   ql/src/test/results/clientpositive/llap/subquery_notin.q.out c24b79db86 
>   ql/src/test/results/clientpositive/llap/topnkey_windowing.q.out 52ba490c01 
>   ql/src/test/results/clientpositive/llap/vector_windowing_streaming.q.out 
> b63bcf47f3 
>   ql/src/test/results/clientpositive/llap/windowing_filter.q.out 8ef2261755 
>   ql/src/test/results/clientpositive/topnkey_windowing.q.out c186790bea 
> 
> 
> Diff: https://reviews.apache.org/r/72108/diff/1/
> 
> 
> Testing
> ---
> 
> mvn test -Dtest.output.overwrite -DskipSparkTests 
> -Dtest=TestMiniLlapLocalCliDriver 
> -Dqfile=vector_windowing_streaming.q,subquery_notin.q,subquery_in.q,windowing_filter.q,topnkey_windowing.q
>  -pl itests/qtest -Pitests
> 
> 
> Thanks,
> 
> Krisztian Kasa
> 
>



Re: Review Request 71995: TopN Key optimizer should use array instead of priority queue

2020-01-29 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71995/
---

(Updated Jan. 29, 2020, 2:23 p.m.)


Review request for hive, Gopal V, Jesús Camacho Rodríguez, and Krisztian Kasa.


Bugs: HIVE-22726
https://issues.apache.org/jira/browse/HIVE-22726


Repository: hive-git


Description
---

The TopN key optimizer currently uses a priority queue for keeping track of the 
largest/smallest rows. Its max size is the same as the user-specified limit. 
This should be replaced with a more cache-line-friendly array with a small (128) 
maximum size, to see how much performance is gained.
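
The array-based tracking could look roughly like this. An illustrative sketch with assumed names only, not Hive's actual TopNKeyFilter code:

```java
import java.util.Arrays;

// Illustrative sketch of array-based top-N tracking, as described above.
// For a small N (capped at 128 in the patch) a sorted array fits in a few
// cache lines, unlike a pointer-chasing heap. Names are assumed.
public class TopNArraySketch {
    private final long[] keys; // sorted ascending; keys[size-1] is the worst kept key
    private int size = 0;

    public TopNArraySketch(int n) {
        keys = new long[n];
    }

    /** Returns true if key belongs to the current top-N smallest keys. */
    public boolean offer(long key) {
        if (size == keys.length && key >= keys[size - 1]) {
            return false; // not better than the current worst: cheap rejection
        }
        int pos = Arrays.binarySearch(keys, 0, size, key);
        if (pos < 0) {
            pos = -pos - 1; // binarySearch encodes the insertion point
        }
        // Shift the tail right by one; when full, the worst key falls off.
        int end = Math.min(size, keys.length - 1);
        System.arraycopy(keys, pos, keys, pos + 1, end - pos);
        keys[pos] = key;
        if (size < keys.length) {
            size++;
        }
        return true;
    }
}
```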


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e3ee06ab5fa 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 4998766f064 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java 0ccaeea1da5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 
5faa038c18d 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperBatch.java
 0786c82b7be 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneralComparator.java
 8cb48473785 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java 
a9ff6b4a830 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ff815434f0c 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java fce850f4fc2 


Diff: https://reviews.apache.org/r/71995/diff/4/

Changes: https://reviews.apache.org/r/71995/diff/3-4/


Testing
---

with the following query:


use tpcds_bin_partitioned_orc_100;
set hive.optimize.topnkey=true;
set hive.optimize.topnkey.max=5;

select  i_item_id,
s_state, grouping(s_state) g_state,
avg(ss_quantity) agg1,
avg(ss_list_price) agg2,
avg(ss_coupon_amt) agg3,
avg(ss_sales_price) agg4
 from store_sales, customer_demographics, date_dim, store, item
 where ss_sold_date_sk = d_date_sk and
   ss_item_sk = i_item_sk and
   ss_store_sk = s_store_sk and
   ss_cdemo_sk = cd_demo_sk
 group by rollup (i_item_id, s_state)
 order by i_item_id
 ,s_state
 limit 5;


Results:
  enabled:   5 rows selected (715.26 seconds)
  enabled:   5 rows selected (605.888 seconds)
  disabled:  5 rows selected (1208.168 seconds)
  disabled:  5 rows selected (1219.482 seconds)


Thanks,

Attila Magyar



Re: Review Request 71995: TopN Key optimizer should use array instead of priority queue

2020-01-22 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71995/
---

(Updated Jan. 22, 2020, 9:44 p.m.)


Review request for hive, Gopal V, Jesús Camacho Rodríguez, and Krisztian Kasa.


Bugs: HIVE-22726
https://issues.apache.org/jira/browse/HIVE-22726


Repository: hive-git


Description
---

The TopN key optimizer currently uses a priority queue for keeping track of the 
largest/smallest rows. Its max size is the same as the user-specified limit. 
This should be replaced with a more cache-line-friendly array with a small (128) 
maximum size, to see how much performance is gained.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b79515fcf07 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 4998766f064 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java b7c12502204 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 
5faa038c18d 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperBatch.java
 0786c82b7be 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneralComparator.java
 8cb48473785 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java 
ce6efa49192 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ff815434f0c 


Diff: https://reviews.apache.org/r/71995/diff/3/

Changes: https://reviews.apache.org/r/71995/diff/2-3/


Testing
---

with the following query:


use tpcds_bin_partitioned_orc_100;
set hive.optimize.topnkey=true;
set hive.optimize.topnkey.max=5;

select  i_item_id,
s_state, grouping(s_state) g_state,
avg(ss_quantity) agg1,
avg(ss_list_price) agg2,
avg(ss_coupon_amt) agg3,
avg(ss_sales_price) agg4
 from store_sales, customer_demographics, date_dim, store, item
 where ss_sold_date_sk = d_date_sk and
   ss_item_sk = i_item_sk and
   ss_store_sk = s_store_sk and
   ss_cdemo_sk = cd_demo_sk
 group by rollup (i_item_id, s_state)
 order by i_item_id
 ,s_state
 limit 5;


Results:
  enabled:   5 rows selected (715.26 seconds)
  enabled:   5 rows selected (605.888 seconds)
  disabled:  5 rows selected (1208.168 seconds)
  disabled:  5 rows selected (1219.482 seconds)


Thanks,

Attila Magyar



Re: Review Request 71995: TopN Key optimizer should use array instead of priority queue

2020-01-22 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71995/
---

(Updated Jan. 22, 2020, 12:09 p.m.)


Review request for hive, Gopal V, Jesús Camacho Rodríguez, and Krisztian Kasa.


Changes
---

added counter + small comparator optimization


Bugs: HIVE-22726
https://issues.apache.org/jira/browse/HIVE-22726


Repository: hive-git


Description
---

The TopN key optimizer currently uses a priority queue for keeping track of the 
largest/smallest rows. Its max size is the same as the user-specified limit. 
This should be replaced with a more cache-line-friendly array with a small (128) 
maximum size, to see how much performance is gained.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b79515fcf07 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 4998766f064 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java b7c12502204 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 
5faa038c18d 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperBatch.java
 0786c82b7be 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneralComparator.java
 8cb48473785 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java 
ce6efa49192 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ff815434f0c 


Diff: https://reviews.apache.org/r/71995/diff/2/

Changes: https://reviews.apache.org/r/71995/diff/1-2/


Testing
---

with the following query:


use tpcds_bin_partitioned_orc_100;
set hive.optimize.topnkey=true;
set hive.optimize.topnkey.max=5;

select  i_item_id,
s_state, grouping(s_state) g_state,
avg(ss_quantity) agg1,
avg(ss_list_price) agg2,
avg(ss_coupon_amt) agg3,
avg(ss_sales_price) agg4
 from store_sales, customer_demographics, date_dim, store, item
 where ss_sold_date_sk = d_date_sk and
   ss_item_sk = i_item_sk and
   ss_store_sk = s_store_sk and
   ss_cdemo_sk = cd_demo_sk
 group by rollup (i_item_id, s_state)
 order by i_item_id
 ,s_state
 limit 5;


Results:
  enabled:   5 rows selected (715.26 seconds)
  enabled:   5 rows selected (605.888 seconds)
  disabled:  5 rows selected (1208.168 seconds)
  disabled:  5 rows selected (1219.482 seconds)


Thanks,

Attila Magyar



Review Request 71995: TopN Key optimizer should use array instead of priority queue

2020-01-14 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71995/
---

Review request for hive, Gopal V, Jesús Camacho Rodríguez, and Krisztian Kasa.


Bugs: HIVE-22726
https://issues.apache.org/jira/browse/HIVE-22726


Repository: hive-git


Description
---

The TopN key optimizer currently uses a priority queue for keeping track of the 
largest/smallest rows. Its max size is the same as the user-specified limit. 
This should be replaced with a more cache-line-friendly array with a small (128) 
maximum size, to see how much performance is gained.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e7724f9084f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 4998766f064 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java b7c12502204 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 
5faa038c18d 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java 
ce6efa49192 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ff815434f0c 


Diff: https://reviews.apache.org/r/71995/diff/1/


Testing
---

with the following query:


use tpcds_bin_partitioned_orc_100;
set hive.optimize.topnkey=true;
set hive.optimize.topnkey.max=5;

select  i_item_id,
s_state, grouping(s_state) g_state,
avg(ss_quantity) agg1,
avg(ss_list_price) agg2,
avg(ss_coupon_amt) agg3,
avg(ss_sales_price) agg4
 from store_sales, customer_demographics, date_dim, store, item
 where ss_sold_date_sk = d_date_sk and
   ss_item_sk = i_item_sk and
   ss_store_sk = s_store_sk and
   ss_cdemo_sk = cd_demo_sk
 group by rollup (i_item_id, s_state)
 order by i_item_id
 ,s_state
 limit 5;


Results:
  enabled:   5 rows selected (715.26 seconds)
  enabled:   5 rows selected (605.888 seconds)
  disabled:  5 rows selected (1208.168 seconds)
  disabled:  5 rows selected (1219.482 seconds)


Thanks,

Attila Magyar



[jira] [Created] (HIVE-22726) TopN Key optimizer should use array instead of priority queue

2020-01-14 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-22726:


 Summary: TopN Key optimizer should use array instead of priority 
queue
 Key: HIVE-22726
 URL: https://issues.apache.org/jira/browse/HIVE-22726
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


The TopN key optimizer currently uses a priority queue for keeping track of the 
largest/smallest rows. Its max size is the same as the user-specified limit. 
This should be replaced with a more cache-line-friendly array with a small (128) 
maximum size, to see how much performance is gained.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22647) enable session pool by default

2019-12-13 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-22647:


 Summary: enable session pool by default
 Key: HIVE-22647
 URL: https://issues.apache.org/jira/browse/HIVE-22647
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


Non-pooled sessions may leak when the client doesn't close the connection.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 71871: StringIndexOutOfBoundsException when getting sessionId from worker node name

2019-12-12 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71871/
---

(Updated Dec. 12, 2019, 12:22 p.m.)


Review request for hive, Laszlo Bodor, prasanthj, and Slim Bouguerra.


Changes
---

warning log if there is no suffix after worker-


Bugs: HIVE-22577
https://issues.apache.org/jira/browse/HIVE-22577


Repository: hive-git


Description
---

The sequence number from the worker node name might be missing under some 
circumstances (the root cause is not fully clear; it might be a ZooKeeper bug).

In this case the following exception occurs:

Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
range: -1Caused by: java.lang.StringIndexOutOfBoundsException: String index out 
of range: -1 at java.lang.String.substring(String.java:1931) at 
org.apache.hadoop.hive.registry.impl.ZkRegistryBase.extractSeqNum(ZkRegistryBase.java:781)
 at 
org.apache.hadoop.hive.registry.impl.ZkRegistryBase.populateCache(ZkRegistryBase.java:507)
 at 
org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.access$000(LlapZookeeperRegistryImpl.java:65)
 at
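
A minimal, hypothetical sketch of a defensive extraction that avoids the exception above; the class, method, and parameter names are illustrative only, while the real fix lives in ZkRegistryBase.extractSeqNum:

```java
// Hypothetical sketch of a defensive sequence-number parse for znode names
// like "worker-0000000012". All names are illustrative, not Hive's code.
public class SeqNumSketch {
    /** Returns the numeric suffix, or -1 when the name has no suffix (e.g. "worker-"). */
    public static int extractSeqNum(String nodeName, String workerPrefix) {
        if (!nodeName.startsWith(workerPrefix)
                || nodeName.length() <= workerPrefix.length()) {
            // Missing suffix: caller can log a warning and skip the node
            // instead of hitting StringIndexOutOfBoundsException.
            return -1;
        }
        return Integer.parseInt(nodeName.substring(workerPrefix.length()));
    }
}
```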


Diffs (updated)
-

  llap-client/src/java/org/apache/hadoop/hive/registry/impl/ZkRegistryBase.java 
5751b8ed939 


Diff: https://reviews.apache.org/r/71871/diff/3/

Changes: https://reviews.apache.org/r/71871/diff/2-3/


Testing
---

qtest


Thanks,

Attila Magyar



Re: Review Request 71871: StringIndexOutOfBoundsException when getting sessionId from worker node name

2019-12-05 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71871/
---

(Updated Dec. 5, 2019, 6:03 p.m.)


Review request for hive, Laszlo Bodor, prasanthj, and Slim Bouguerra.


Changes
---

there is a 2nd bug in the original code: an off-by-one error when using the 
substring.


Bugs: HIVE-22577
https://issues.apache.org/jira/browse/HIVE-22577


Repository: hive-git


Description
---

The sequence number from the worker node name might be missing under some 
circumstances (the root cause is not fully clear; it might be a ZooKeeper bug).

In this case the following exception occurs:

Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
range: -1Caused by: java.lang.StringIndexOutOfBoundsException: String index out 
of range: -1 at java.lang.String.substring(String.java:1931) at 
org.apache.hadoop.hive.registry.impl.ZkRegistryBase.extractSeqNum(ZkRegistryBase.java:781)
 at 
org.apache.hadoop.hive.registry.impl.ZkRegistryBase.populateCache(ZkRegistryBase.java:507)
 at 
org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.access$000(LlapZookeeperRegistryImpl.java:65)
 at


Diffs (updated)
-

  llap-client/src/java/org/apache/hadoop/hive/registry/impl/ZkRegistryBase.java 
5751b8ed939 


Diff: https://reviews.apache.org/r/71871/diff/2/

Changes: https://reviews.apache.org/r/71871/diff/1-2/


Testing
---

qtest


Thanks,

Attila Magyar



Re: Review Request 71871: StringIndexOutOfBoundsException when getting sessionId from worker node name

2019-12-05 Thread Attila Magyar


> On Dec. 5, 2019, 4:43 p.m., Panos Garefalakis wrote:
> > llap-client/src/java/org/apache/hadoop/hive/registry/impl/ZkRegistryBase.java
> > Lines 478 (patched)
> > <https://reviews.apache.org/r/71871/diff/1/?file=2181935#file2181935line478>
> >
> > Hey Attila,
> > 
> > With Java's short circuiting, the left expression in the && 
> > operator will always be evaluated, which could also throw the error you 
> > are trying to avoid -- to safeguard this operation you would place the 
> > **nodeName.length() > workerNodePrefix.length()** check on the left part of 
> > the expression.

Hey Panos,

That's true, but the error does not originate from the startsWith() but from a 
substring() expression later on. The startsWith() method doesn't throw any 
exceptions; it won't fail regardless of the length of nodeName or workerNodePrefix.
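
A tiny self-contained illustration of the point above (the strings here are made-up examples, not the actual registry values):

```java
// Demonstrates that startsWith() never throws regardless of operand
// lengths, while substring() past the end of the string does. The
// example strings are made up for illustration.
public class StartsWithDemo {
    public static void main(String[] args) {
        String nodeName = "worker-";               // shorter than the prefix below
        String workerNodePrefix = "worker-node-";

        // Safe: startsWith simply returns false when nodeName is too short.
        System.out.println(nodeName.startsWith(workerNodePrefix)); // false

        // Unsafe: a substring index beyond the string's length throws.
        try {
            nodeName.substring(workerNodePrefix.length());
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("substring threw StringIndexOutOfBoundsException");
        }
    }
}
```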


- Attila


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71871/#review218941
-------


On Dec. 4, 2019, 11:05 a.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71871/
> ---
> 
> (Updated Dec. 4, 2019, 11:05 a.m.)
> 
> 
> Review request for hive, Laszlo Bodor, prasanthj, and Slim Bouguerra.
> 
> 
> Bugs: HIVE-22577
> https://issues.apache.org/jira/browse/HIVE-22577
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The sequence number from the worker node name might be missing under some 
> circumstances (the root cause is not fully clear; it might be a ZooKeeper bug).
> 
> In this case the following exception occurs:
> 
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
> range: -1Caused by: java.lang.StringIndexOutOfBoundsException: String index 
> out of range: -1 at java.lang.String.substring(String.java:1931) at 
> org.apache.hadoop.hive.registry.impl.ZkRegistryBase.extractSeqNum(ZkRegistryBase.java:781)
>  at 
> org.apache.hadoop.hive.registry.impl.ZkRegistryBase.populateCache(ZkRegistryBase.java:507)
>  at 
> org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.access$000(LlapZookeeperRegistryImpl.java:65)
>  at
> 
> 
> Diffs
> -
> 
>   
> llap-client/src/java/org/apache/hadoop/hive/registry/impl/ZkRegistryBase.java 
> 5751b8ed939 
> 
> 
> Diff: https://reviews.apache.org/r/71871/diff/1/
> 
> 
> Testing
> ---
> 
> qtest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Review Request 71871: StringIndexOutOfBoundsException when getting sessionId from worker node name

2019-12-04 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71871/
---

Review request for hive, Laszlo Bodor, prasanthj, and Slim Bouguerra.


Bugs: HIVE-22577
https://issues.apache.org/jira/browse/HIVE-22577


Repository: hive-git


Description
---

The sequence number from the worker node name might be missing under some 
circumstances (the root cause is not fully clear; it might be a ZooKeeper bug).

In this case the following exception occurs:

Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
range: -1Caused by: java.lang.StringIndexOutOfBoundsException: String index out 
of range: -1 at java.lang.String.substring(String.java:1931) at 
org.apache.hadoop.hive.registry.impl.ZkRegistryBase.extractSeqNum(ZkRegistryBase.java:781)
 at 
org.apache.hadoop.hive.registry.impl.ZkRegistryBase.populateCache(ZkRegistryBase.java:507)
 at 
org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.access$000(LlapZookeeperRegistryImpl.java:65)
 at


Diffs
-

  llap-client/src/java/org/apache/hadoop/hive/registry/impl/ZkRegistryBase.java 
5751b8ed939 


Diff: https://reviews.apache.org/r/71871/diff/1/


Testing
---

qtest


Thanks,

Attila Magyar



[jira] [Created] (HIVE-22577) StringIndexOutOfBoundsException when getting sessionId from worker node name

2019-12-04 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-22577:


 Summary: StringIndexOutOfBoundsException when getting sessionId 
from worker node name
 Key: HIVE-22577
 URL: https://issues.apache.org/jira/browse/HIVE-22577
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


When the node name is "worker-", the following exception is thrown:

 
{code:java}
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
 at java.lang.String.substring(String.java:1931)
 at org.apache.hadoop.hive.registry.impl.ZkRegistryBase.extractSeqNum(ZkRegistryBase.java:781)
 at org.apache.hadoop.hive.registry.impl.ZkRegistryBase.populateCache(ZkRegistryBase.java:507)
 at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.access$000(LlapZookeeperRegistryImpl.java:65)
 at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl$DynamicServiceInstanceSet.<init>(LlapZookeeperRegistryImpl.java:313)
 at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:462)
 at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getApplicationId(LlapZookeeperRegistryImpl.java:469)
 at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getApplicationId(LlapRegistryService.java:212)
 at org.apache.hadoop.hive.ql.exec.tez.Utils.getCustomSplitLocationProvider(Utils.java:77)
 at org.apache.hadoop.hive.ql.exec.tez.Utils.getSplitLocationProvider(Utils.java:53)
 at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:140)
  {code}
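A defensive version of the sequence-number extraction could return a sentinel instead of calling substring() with a -1 index. This is a hypothetical sketch only; the class and method names below are illustrative, and the actual fix lives in ZkRegistryBase.extractSeqNum:

```java
// Hypothetical sketch of a defensive sequence-number parser. The real
// logic is in ZkRegistryBase.extractSeqNum and may differ in detail.
public class WorkerNameParser {

    /** Returns the numeric suffix of names like "worker-0000000012", or -1 if absent. */
    public static int extractSeqNum(String nodeName) {
        int idx = nodeName.lastIndexOf('-');
        // Guard against a missing separator or an empty suffix ("worker-").
        if (idx < 0 || idx == nodeName.length() - 1) {
            return -1;
        }
        try {
            return Integer.parseInt(nodeName.substring(idx + 1));
        } catch (NumberFormatException e) {
            return -1; // non-numeric suffix
        }
    }
}
```

With this guard, a malformed node name such as "worker-" is skipped instead of aborting cache population.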



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Review Request 71845: ConcurrentModificationException in TriggerValidatorRunnable stops trigger processing

2019-11-29 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71845/
---

Review request for hive, Laszlo Bodor, prasanthj, and Slim Bouguerra.


Bugs: HIVE-22502
https://issues.apache.org/jira/browse/HIVE-22502


Repository: hive-git


Description
---

A ConcurrentModificationException was thrown from the main loop of 
TriggerValidatorRunnable.


The ConcurrentModificationException happened because another thread (from 
TezSessionPoolManager) updated the sessions list while the 
TriggerValidatorRunnable was iterating over it.

The sessions list is updated by TezSessionPoolManager when opening or closing a 
session. These operations are synchronized but the iteration in 
TriggerValidatorRunnable is not.

The TriggerValidatorRunnable is executed frequently (it is scheduled at a 500ms 
rate by default), so I was reluctant to put the whole iteration into a 
synchronized block. Opening and closing a session happens much less often, so I 
decided to make a copy of the sessions list before passing it to the 
TriggerValidatorRunnable. Let me know if you think otherwise.

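The copy-before-iterate idea can be sketched roughly as follows. SessionPool and the String session type are simplified stand-ins, not the actual Hive types; the real code is in TezSessionPoolManager:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the copy-on-read idea with assumed, simplified names.
// open()/close() mutate the list under the pool's lock; snapshot() hands
// the validator a private copy, so its iteration cannot race with updates.
class SessionPool {
    private final List<String> sessions = new ArrayList<>();

    synchronized void open(String session)  { sessions.add(session); }
    synchronized void close(String session) { sessions.remove(session); }

    /** Snapshot taken under the same lock that guards mutations. */
    synchronized List<String> snapshot() {
        return new ArrayList<>(sessions);
    }
}
```

The validator then iterates over snapshot() freely; the cost is one short copy per open/close, which is cheap because those operations are rare compared to the 500ms validation cadence.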

Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 
7c0a1fe120b 


Diff: https://reviews.apache.org/r/71845/diff/1/


Testing
---

qtest


Thanks,

Attila Magyar



Review Request 71801: The error handler in LlapRecordReader might block if its queue is full

2019-11-21 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71801/
---

Review request for hive, Laszlo Bodor, Panos Garefalakis, and Slim Bouguerra.


Bugs: HIVE-22523
https://issues.apache.org/jira/browse/HIVE-22523


Repository: hive-git


Description
---

In setError() we set the value of an atomic reference (pendingError) and also 
put the error in a queue. The latter is not just unnecessary; it might block 
the caller of the handler if the queue is full. Also, closing the reader might 
not be handled properly, as some of the flags are not volatile.

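A minimal sketch of the non-blocking error handoff described above, with assumed names (the real class is LlapRecordReader; this is not its actual API):

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the idea: setError() only records the failure in an atomic
// reference and never offers to the bounded row queue, so the caller can
// never block. The closed flag is volatile so the reader thread observes
// close() promptly. Names are illustrative, not LlapRecordReader's API.
class ReaderErrorState {
    private final AtomicReference<Throwable> pendingError = new AtomicReference<>();
    private volatile boolean closed;

    public void setError(Throwable t) {
        pendingError.compareAndSet(null, t); // keep the first error, never block
    }

    public void close() { closed = true; }

    public boolean isClosed() { return closed; }

    /** Called from the consumer loop; rethrows a recorded failure. */
    public void checkError() throws Exception {
        Throwable t = pendingError.get();
        if (t != null) {
            throw new Exception("reader failed", t);
        }
    }
}
```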

Diffs
-

  
llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java
 77966aa9650 


Diff: https://reviews.apache.org/r/71801/diff/1/


Testing
---

q tests


Thanks,

Attila Magyar



[jira] [Created] (HIVE-22523) The error handler in LlapRecordReader might block if its queue is full

2019-11-21 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-22523:


 Summary: The error handler in LlapRecordReader might block if its 
queue is full
 Key: HIVE-22523
 URL: https://issues.apache.org/jira/browse/HIVE-22523
 Project: Hive
  Issue Type: Bug
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


In setError() we set the value of an atomic reference (pendingError) and also 
put the error in a queue. The latter is not just unnecessary; it might block 
the caller of the handler if the queue is full. Also, closing the reader might 
not be handled properly, as some of the flags are not volatile.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 71784: HiveProtoLoggingHook might consume lots of memory

2019-11-21 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/
---

(Updated Nov. 21, 2019, 9:40 a.m.)


Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
Panos Garefalakis.


Changes
---

test might be flaky, ignore it until we find a better solution


Bugs: HIVE-22514
https://issues.apache.org/jira/browse/HIVE-22514


Repository: hive-git


Description
---

HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer tasks 
and to periodically handle rollover. The built-in ScheduledThreadPoolExecutor 
uses an unbounded queue which cannot be replaced from the outside. If log events 
are generated at a very fast rate, this queue can grow large.

Since ScheduledThreadPoolExecutor does not support changing the default 
unbounded queue to a bounded one, the patch checks the queue capacity manually.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
  ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
8eab54859bf 
  ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
450a0b544d6 


Diff: https://reviews.apache.org/r/71784/diff/2/

Changes: https://reviews.apache.org/r/71784/diff/1-2/


Testing
---

unittest


Thanks,

Attila Magyar



Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-20 Thread Attila Magyar


> On Nov. 20, 2019, 1:58 a.m., Harish Jaiprakash wrote:
> > Thanks for the change. This does solve the memory problem and it looks good 
> > for me.
> > 
> > We need a follow up JIRA to address why the queue size was 17,000 events. 
> > Was this hdfs or s3fs? In either case we should have some more 
> > optimizations like:
> > * if there are lot of events, batch the flush to hdfs.
> > * if its one event per file mode, increase parallelism since writes are not 
> > happening in different files.
> > 
> > FYI, the events are lost since it is not written to the hdfs file and DAS 
> > will not get these events. But that is better than crashing hiveserver2.

Thanks for the review. The hive.hook.proto.base-directory points to an s3a path.


- Attila


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218709
-------


On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 3:43 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses a unbounded queue which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-20 Thread Attila Magyar


> On Nov. 20, 2019, 2:02 a.m., Harish Jaiprakash wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java
> > Lines 176 (patched)
> > <https://reviews.apache.org/r/71784/diff/1/?file=2173908#file2173908line176>
> >
> > We expect the dequeue to have not happened by this time. There is no 
> > guarantee, since its another thread. Can we atleast add a comment that this 
> > test can fail intermittently?

I guess this affects the existing tests as well, right? However, I don't 
remember seeing any of those failing. Maybe that's because we're calling 
shutdown() on the evtLogger; according to its javadoc, it waits for already 
submitted tasks to complete.


- Attila


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218710
-------


On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 3:43 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses a unbounded queue which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Attila Magyar


> On Nov. 19, 2019, 10:38 p.m., Slim Bouguerra wrote:
> > I recommend using a bounded queue instead of checking the size and doing 
> > the if-else every time.
> > something like this might work
> > ```java
> > BlockingQueue<Runnable> linkedBlockingDeque = new LinkedBlockingDeque<>(1);
> > ExecutorService service = new ThreadPoolExecutor(1, 1, 30,
> >     TimeUnit.SECONDS, linkedBlockingDeque,
> >     new ThreadPoolExecutor.DiscardPolicy());
> > 
> > ```

We also have a periodic event which is scheduled with a fixed delay by the 
ScheduledThreadPoolExecutor. The ThreadPoolExecutor can't do this scheduling; 
it would require either using two executors (a ThreadPoolExecutor and a 
ScheduledThreadPoolExecutor) or one executor plus a Timer, plus some 
synchronization. The queue size check looked like the simplest solution I 
could find.


- Attila


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218697
-------


On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 3:43 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses a unbounded queue which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Attila Magyar


> On Nov. 19, 2019, 10:24 p.m., Panos Garefalakis wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java
> > Lines 192 (patched)
> > <https://reviews.apache.org/r/71784/diff/1/?file=2173907#file2173907line193>
> >
> > The solution makes sense to me, however maybe we need to investigate 
> > further why the queueCapacity change (which is similar to what you are 
> > proposing) was reverted in HIVE-20746?
> > 
> > Also this logwriter is used to track all protobuf messages right? Is it 
> > acceptable to drop messages here?
> 
> Panos Garefalakis wrote:
> Update: Seems like HIVE-20746 only makes sure that logFile is closed at 
> the end of the day (even when no events are triggered) -- so the remaining 
> question is if its acceptible to start dropping messages here (because even 
> if we drop the messages the events are still going to happen)

@harishjp is already added to the review, but he is on vacation, so I don't know 
if he'll respond or not. My impression is that removing the capacity limit was 
not the goal but a side effect of adding the ScheduledThreadPoolExecutor. 
Before HIVE-20746 we were dropping events; HIVE-20746 was about making sure 
that the file is closed, not about making sure that there are no drops. As far 
as I understand, these events are for external systems like DAS. The events are 
still visible in the log, so they're not lost. But @harishjp will hopefully 
confirm or refute this.


- Attila


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218694
-------


On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 3:43 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses a unbounded queue which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Attila Magyar


> On Nov. 19, 2019, 6:53 p.m., Slim Bouguerra wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java
> > Lines 274 (patched)
> > <https://reviews.apache.org/r/71784/diff/1/?file=2173907#file2173907line275>
> >
> > looking at the java code i see that it is using a bounded queue so not 
> > sure what you mean by unbounded ?
> > can you please clarify ?
> 
> Attila Magyar wrote:
> No, unfortunatelly it uses an unbounded DelayedWorkQueue internally and 
> it cannot be changed.
> 
> Slim Bouguerra wrote:
> ```java
>  /**
>  * Specialized delay queue. To mesh with TPE declarations, this
>  * class must be declared as a BlockingQueue even though
>  * it can only hold RunnableScheduledFutures.
>  */
> static class DelayedWorkQueue extends AbstractQueue<Runnable>
> implements BlockingQueue<Runnable> {
> 
> ```
> but it is blocking means that it should block and therefore we get the 
> RejectedException.

Yes, but a BlockingQueue can be created without a size limit. It will still 
block if I try to take from an empty queue, but it will never be full, so it 
will never block when adding an element.

(The implementation might use Integer.MAX_VALUE as the size limit, which is 
practically the same thing: it will likely never block, or it will block too 
late.)


- Attila


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218679
---


On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 3:43 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses a unbounded queue which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Attila Magyar


> On Nov. 19, 2019, 6:58 p.m., Slim Bouguerra wrote:
> > the description is kind of confusing, is this a leak or we have spike of 
> > overload ?
> > Leak means we are not cleaning the resources thus that is why we have an 
> > OOM.
> > What you are describing seems to be a system overload that cause a memory 
> > spike.

Yeah, that's right; calling it an overload might be better than a leak. I'll 
modify the description.


- Attila


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218680
---


On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 3:43 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses a unbounded queue which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Attila Magyar


> On Nov. 19, 2019, 6:53 p.m., Slim Bouguerra wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java
> > Line 217 (original), 219 (patched)
> > <https://reviews.apache.org/r/71784/diff/1/?file=2173907#file2173907line220>
> >
> > how this can solve the issue ?
> > Seems like it is doing the same thing 
> > 
> > `Executors.newSingleThreadScheduledExecutor`
> > is the same as what you are doing
> > ```
> > public static ScheduledExecutorService 
> > newSingleThreadScheduledExecutor(ThreadFactory threadFactory) {
> > return new DelegatedScheduledExecutorService
> > (new ScheduledThreadPoolExecutor(1, threadFactory));
> > }
> > ```

This is not the fix. This change is only needed to get back the proper type so 
that I can invoke the getQueue() method. I can't do that on 
ScheduledExecutorService, which is the interface returned by 
Executors.newSingleThreadScheduledExecutor; I need the concrete class 
(ScheduledThreadPoolExecutor).


> On Nov. 19, 2019, 6:53 p.m., Slim Bouguerra wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java
> > Lines 274 (patched)
> > <https://reviews.apache.org/r/71784/diff/1/?file=2173907#file2173907line275>
> >
> > looking at the java code i see that it is using a bounded queue so not 
> > sure what you mean by unbounded ?
> > can you please clarify ?

No, unfortunately it uses an unbounded DelayedWorkQueue internally, and it 
cannot be changed.


> On Nov. 19, 2019, 6:53 p.m., Slim Bouguerra wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java
> > Lines 279 (patched)
> > <https://reviews.apache.org/r/71784/diff/1/?file=2173907#file2173907line280>
> >
> > i am still not sure how this is going to work?
> > the original code was dropping events when the queue is full that is 
> > the case where you see the `RejectedExecutionException`

RejectedExecutionException was never thrown with the original code because of 
the unbounded queue; the queue just kept growing. In the heap dump there were 
about 17,000 elements in the queue in total, taking up about 2.5 GB of space.


- Attila


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218679
---


On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 3:43 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses a unbounded queue which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/
---

Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
Panos Garefalakis.


Bugs: HIVE-22514
https://issues.apache.org/jira/browse/HIVE-22514


Repository: hive-git


Description
---

HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer tasks 
and to periodically handle rollover. The built-in ScheduledThreadPoolExecutor 
uses an unbounded queue which cannot be replaced from the outside. If log events 
are generated at a very fast rate, this queue can grow large.

Since ScheduledThreadPoolExecutor does not support changing the default 
unbounded queue to a bounded one, the patch checks the queue capacity manually.

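The manual capacity check can be sketched as below. The class name and the cap value are assumptions for illustration (the patch reads the limit from a HiveConf setting); note that the concrete ScheduledThreadPoolExecutor type is needed because getQueue() is not exposed on the ScheduledExecutorService interface:

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;

// Sketch of checking the queue size before submitting, since the executor's
// internal DelayedWorkQueue cannot be swapped for a bounded queue.
// Class name and MAX_QUEUE are illustrative; the real limit is configurable.
class BoundedEventLogger {
    private static final int MAX_QUEUE = 10_000; // hypothetical cap

    private final ScheduledThreadPoolExecutor executor =
        new ScheduledThreadPoolExecutor(1);

    /** Drops the event instead of letting the unbounded queue grow. */
    public boolean tryLog(Runnable writeTask) {
        if (executor.getQueue().size() >= MAX_QUEUE) {
            return false; // dropped; the event is still visible in normal logs
        }
        executor.execute(writeTask);
        return true;
    }
}
```

Periodic rollover tasks can still be scheduled on the same executor with scheduleWithFixedDelay(), which is why a single ScheduledThreadPoolExecutor plus this check is simpler than splitting the work across two executors.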

Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
  ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
8eab54859bf 
  ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
450a0b544d6 


Diff: https://reviews.apache.org/r/71784/diff/1/


Testing
---

unittest


Thanks,

Attila Magyar



[jira] [Created] (HIVE-22514) HiveProtoLoggingHook might leak memory

2019-11-19 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-22514:


 Summary: HiveProtoLoggingHook might leak memory
 Key: HIVE-22514
 URL: https://issues.apache.org/jira/browse/HIVE-22514
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 4.0.0
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0
 Attachments: Screen Shot 2019-11-18 at 2.19.24 PM.png

HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer tasks 
and to periodically handle rollover. The built-in ScheduledThreadPoolExecutor 
uses an unbounded queue which cannot be replaced from the outside. If log events 
are generated at a very fast rate, this queue can grow large.

!Screen Shot 2019-11-18 at 2.19.24 PM.png|width=650,height=101!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22502) ConcurrentModificationException in TriggerValidatorRunnable stops trigger processing

2019-11-15 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-22502:


 Summary: ConcurrentModificationException in 
TriggerValidatorRunnable stops trigger processing
 Key: HIVE-22502
 URL: https://issues.apache.org/jira/browse/HIVE-22502
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Attila Magyar
Assignee: Attila Magyar






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 71707: Performance degradation on single row inserts

2019-11-07 Thread Attila Magyar


> On Nov. 5, 2019, 11:59 p.m., Ashutosh Chauhan wrote:
> > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
> > Line 331 (original), 324 (patched)
> > <https://reviews.apache.org/r/71707/diff/2/?file=2171542#file2171542line331>
> >
> > you may use BlobStorageUtils::isBlobStorageFileSystem() here.

isBlobStorageFileSystem matches s3, s3a, and s3n, but only S3AFileSystem 
(https://github.com/apache/hadoop/blob/1d5d7d0989e9ee2f4527dc47ba5c80e1c38f641a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L3861)
 has an optimized listFiles() implementation.

NativeS3FileSystem 
(https://github.com/apache/hadoop/blob/1d5d7d0989e9ee2f4527dc47ba5c80e1c38f641a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3native/NativeS3FileSystem.java)
 uses the same tree-traversing algorithm from the base class.


- Attila


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71707/#review218518
-------


On Nov. 7, 2019, 9:23 a.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71707/
> ---
> 
> (Updated Nov. 7, 2019, 9:23 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra.
> 
> 
> Bugs: HIVE-22411
> https://issues.apache.org/jira/browse/HIVE-22411
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Executing single insert statements on a transactional table effects write 
> performance on a s3 file system. Each insert creates a new delta directory. 
> After each insert hive calculates statistics like number of file in the table 
> and total size of the table. In order to calculate these, it traverses the 
> directory recursively. During the recursion for each path a separate 
> listStatus call is executed. In the end the more delta directory you have the 
> more time it takes to calculate the statistics.
> 
> Therefore insertion time goes up linearly.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/FileUtils.java 651b842f688 
>   common/src/java/org/apache/hadoop/hive/common/HiveStatsUtils.java 
> 09343e56166 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
>  38e843aeacf 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
>  bf206fffc26 
> 
> 
> Diff: https://reviews.apache.org/r/71707/diff/3/
> 
> 
> Testing
> ---
> 
> measured and plotted insertion time
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71707: Performance degradation on single row inserts

2019-11-07 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71707/#review218524
---




standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
Line 331 (original), 324 (patched)
<https://reviews.apache.org/r/71707/#comment306265>

BlobStorageUtils::isBlobStorageFileSystem() checks whether the scheme is 
"s3", "s3n", or "s3a". But only S3AFileSystem has the optimized listFiles(); 
NativeS3FileSystem does not override the tree-walking algorithm from the base 
class.

See: 
https://github.com/apache/hadoop/blob/1d5d7d0989e9ee2f4527dc47ba5c80e1c38f641a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L3861

and:


https://github.com/apache/hadoop/blob/1d5d7d0989e9ee2f4527dc47ba5c80e1c38f641a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3native/NativeS3FileSystem.java


- Attila Magyar


On Nov. 7, 2019, 9:23 a.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71707/
> ---
> 
> (Updated Nov. 7, 2019, 9:23 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra.
> 
> 
> Bugs: HIVE-22411
> https://issues.apache.org/jira/browse/HIVE-22411
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Executing single insert statements on a transactional table affects write 
> performance on an S3 file system. Each insert creates a new delta directory. 
> After each insert Hive calculates statistics such as the number of files in 
> the table and the total size of the table. In order to calculate these, it 
> traverses the directory recursively. During the recursion a separate 
> listStatus call is executed for each path. In the end, the more delta 
> directories you have, the more time it takes to calculate the statistics.
> 
> Therefore insertion time goes up linearly.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/FileUtils.java 651b842f688 
>   common/src/java/org/apache/hadoop/hive/common/HiveStatsUtils.java 
> 09343e56166 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
>  38e843aeacf 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
>  bf206fffc26 
> 
> 
> Diff: https://reviews.apache.org/r/71707/diff/3/
> 
> 
> Testing
> ---
> 
measured and plotted insertion time
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71707: Performance degradation on single row inserts

2019-11-07 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71707/
---

(Updated Nov. 7, 2019, 9:23 a.m.)


Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra.


Changes
---

addressing review comments


Bugs: HIVE-22411
https://issues.apache.org/jira/browse/HIVE-22411


Repository: hive-git


Description
---

Executing single insert statements on a transactional table affects write 
performance on an S3 file system. Each insert creates a new delta directory. 
After each insert Hive calculates statistics such as the number of files in the 
table and the total size of the table. In order to calculate these, it 
traverses the directory recursively. During the recursion a separate listStatus 
call is executed for each path. In the end, the more delta directories you 
have, the more time it takes to calculate the statistics.

Therefore insertion time goes up linearly.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/common/FileUtils.java 651b842f688 
  common/src/java/org/apache/hadoop/hive/common/HiveStatsUtils.java 09343e56166 
  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
 38e843aeacf 
  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
 bf206fffc26 


Diff: https://reviews.apache.org/r/71707/diff/3/

Changes: https://reviews.apache.org/r/71707/diff/2-3/


Testing
---

measured and plotted insertion time


Thanks,

Attila Magyar



Re: Review Request 71707: Performance degradation on single row inserts

2019-11-05 Thread Attila Magyar


> On Nov. 5, 2019, 4:33 p.m., Panos Garefalakis wrote:
> > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
> > Lines 328 (patched)
> > <https://reviews.apache.org/r/71707/diff/2/?file=2171542#file2171542line335>
> >
> > Hey Attila, the solution looks good; however, as other fileSystems might 
> > face similar issues in the future using this recursive method (i.e. Azure 
> > Blob storage), wouldn't it make sense to have hdfs as the base case and 
> > handle others separately? And maybe throw a warning message here when the 
> > filesystem is not supported?

Hey Panos, I checked the hadoop project and I found only one FS implementation 
with an optimized recursive listFiles(); other implementations use the 
tree-walking implementation from the base class. I think that's the more common 
case. Do you know where the source of this Azure Blob storage implementation 
is? Is it open source at all?


- Attila


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71707/#review218505
-----------


On Nov. 5, 2019, 3:32 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71707/
> ---
> 
> (Updated Nov. 5, 2019, 3:32 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra.
> 
> 
> Bugs: HIVE-22411
> https://issues.apache.org/jira/browse/HIVE-22411
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Executing single insert statements on a transactional table affects write 
> performance on an S3 file system. Each insert creates a new delta directory. 
> After each insert Hive calculates statistics such as the number of files in 
> the table and the total size of the table. In order to calculate these, it 
> traverses the directory recursively. During the recursion a separate 
> listStatus call is executed for each path. In the end, the more delta 
> directories you have, the more time it takes to calculate the statistics.
> 
> Therefore insertion time goes up linearly.
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
>  38e843aeacf 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
>  bf206fffc26 
> 
> 
> Diff: https://reviews.apache.org/r/71707/diff/2/
> 
> 
> Testing
> ---
> 
> measured and plotted insertion time
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71707: Performance degradation on single row inserts

2019-11-05 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71707/
---

(Updated Nov. 5, 2019, 3:32 p.m.)


Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra.


Changes
---

Addressing Ashutosh's comments


Bugs: HIVE-22411
https://issues.apache.org/jira/browse/HIVE-22411


Repository: hive-git


Description
---

Executing single insert statements on a transactional table affects write 
performance on an S3 file system. Each insert creates a new delta directory. 
After each insert Hive calculates statistics such as the number of files in the 
table and the total size of the table. In order to calculate these, it 
traverses the directory recursively. During the recursion a separate listStatus 
call is executed for each path. In the end, the more delta directories you 
have, the more time it takes to calculate the statistics.

Therefore insertion time goes up linearly.


Diffs (updated)
-

  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
 38e843aeacf 
  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
 bf206fffc26 


Diff: https://reviews.apache.org/r/71707/diff/2/

Changes: https://reviews.apache.org/r/71707/diff/1-2/


Testing
---

measured and plotted insertion time


Thanks,

Attila Magyar



Review Request 71707: Performance degradation on single row inserts

2019-10-31 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71707/
---

Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra.


Bugs: HIVE-22411
https://issues.apache.org/jira/browse/HIVE-22411


Repository: hive-git


Description
---

Executing single insert statements on a transactional table affects write 
performance on an S3 file system. Each insert creates a new delta directory. 
After each insert Hive calculates statistics such as the number of files in the 
table and the total size of the table. In order to calculate these, it 
traverses the directory recursively. During the recursion a separate listStatus 
call is executed for each path. In the end, the more delta directories you 
have, the more time it takes to calculate the statistics.

Therefore insertion time goes up linearly.


Diffs
-

  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
 38e843aeacf 
  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
 155ecb18bf5 


Diff: https://reviews.apache.org/r/71707/diff/1/


Testing
---

measured and plotted insertion time


Thanks,

Attila Magyar



[jira] [Created] (HIVE-22411) Performance degradation on single row inserts

2019-10-28 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-22411:


 Summary: Performance degradation on single row inserts
 Key: HIVE-22411
 URL: https://issues.apache.org/jira/browse/HIVE-22411
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0
 Attachments: Screen Shot 2019-10-17 at 8.40.50 PM.png

Executing single insert statements on a transactional table affects write 
performance on an S3 file system. Each insert creates a new delta directory. 
After each insert Hive calculates statistics such as the number of files in the 
table and the total size of the table. For this it traverses the directory 
recursively. During the recursion a separate listStatus call is executed for 
each path. In the end, the more delta directories you have, the more time it 
takes to calculate the statistics.

Therefore insertion time goes up linearly:

!Screen Shot 2019-10-17 at 8.40.50 PM.png|width=601,height=436!

The fix is to use fs.listFiles(path, /*recursive*/ true) instead of the 
handcrafted recursive method.
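The cost difference can be sketched without Hadoop on the classpath. The toy below counts directory-listing calls for a hand-rolled walk over a delta-style layout; java.nio.file stands in for the Hadoop FileSystem API, so class and method names here are illustrative, not Hive's:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ListingCost {
  static long listCalls = 0;

  // One directory-listing call per directory, mirroring the per-path
  // listStatus pattern described above: cost grows with the delta count.
  static long countRecursive(Path dir) throws IOException {
    listCalls++;
    long files = 0;
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      for (Path p : entries) {
        files += Files.isDirectory(p) ? countRecursive(p) : 1;
      }
    }
    return files;
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("table");
    for (int i = 0; i < 5; i++) {
      Path delta = Files.createDirectory(root.resolve("delta_" + i));
      Files.createFile(delta.resolve("bucket_0"));
    }
    long files = countRecursive(root);
    // 1 call for the root plus 1 per delta directory
    System.out.println(files + " files, " + listCalls + " listing calls");
    // prints: 5 files, 6 listing calls
  }
}
```

An S3AFileSystem-style listFiles(path, true) can answer the same question with a flat key listing, i.e. a constant number of listing requests instead of one per delta directory.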



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Review Request 71555: Incompatible java.util.ArrayList for java 11

2019-09-30 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71555/
---

Review request for hive, Laszlo Bodor, Ashutosh Chauhan, and Prasanth_J.


Bugs: HIVE-22097
https://issues.apache.org/jira/browse/HIVE-22097


Repository: hive-git


Description
---

The following exception occurs when running a query on Java 11:

java.lang.RuntimeException: java.lang.NoSuchFieldException: parentOffset
at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities$ArrayListSubListSerializer.<init>(SerializationUtilities.java:390)
at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities$1.create(SerializationUtilities.java:235)
at 
org.apache.hive.com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)
at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.borrowKryo(SerializationUtilities.java:280)
at 
org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:595)
at 
org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:587)
at 
org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:579)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:357)
at 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:159)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2317)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1969)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1636)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1396)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1390)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:162)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:223)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:189)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:838)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:777)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:696)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.lang.NoSuchFieldException: parentOffset
at java.base/java.lang.Class.getDeclaredField(Class.java:2412)
at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities$ArrayListSubListSerializer.<init>(SerializationUtilities.java:384)
... 29 more

The internal structure of ArrayList$SubList changed and our serializer fails. 
This serializer comes from the kryo-serializers package, where the code has 
already been updated. This patch does the same.
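The shape of the fix can be sketched as a reflective probe over the known JDK layouts (illustrative only; these field names are JDK implementation details, and the real serializer in SerializationUtilities caches more than one field):

```java
import java.lang.reflect.Field;

public class SubListLayout {
  // Probe for whichever ArrayList$SubList layout the running JDK uses:
  // Java 8 exposed "parentOffset", while Java 9+ rewrote SubList and
  // the comparable field is "offset".
  static Field findOffsetField(Class<?> subListClass) {
    for (String name : new String[] {"parentOffset", "offset"}) {
      try {
        return subListClass.getDeclaredField(name);
      } catch (NoSuchFieldException ignored) {
        // fall through to the next known layout
      }
    }
    throw new IllegalStateException("unknown ArrayList$SubList layout");
  }

  public static void main(String[] args) {
    Class<?> subList = new java.util.ArrayList<>(java.util.List.of(1, 2, 3))
        .subList(0, 2).getClass();
    // Prints the field name found on this JDK
    System.out.println(findOffsetField(subList).getName());
  }
}
```

Probing both names lets the same serializer class load on Java 8 and Java 11 instead of throwing NoSuchFieldException at construction time.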


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java 
e4d33e82168 


Diff: https://reviews.apache.org/r/71555/diff/1/


Testing
---

Tested on a real cluster with Java 11.


Thanks,

Attila Magyar



Review Request 71456: select count gives incorrect result after loading data from text file

2019-09-09 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71456/
---

Review request for hive, Ashutosh Chauhan, Jesús Camacho Rodríguez, and Slim 
Bouguerra.


Bugs: HIVE-22055
https://issues.apache.org/jira/browse/HIVE-22055


Repository: hive-git


Description
---

This happens when tez.grouping.min-size is set to a small value (for example 1) 
so that the split size calculated from the file size is going to be used. This 
changes as the table grows, and a different split size will be used for each 
select.

load 90 records from f1
select count(1) gives back 90
load 90 records from f2
select count(1) gives back 172 // 8 records missing


When running the second select the split size is larger, and 
SerDeLowLevelCacheImpl is already populated with stripes from the first select 
(and by that time the split size was smaller).


There is a problem with how LineRecordReader works together with the cache. If 
a larger split is requested and an overlapping smaller one is already in the 
cache, SerDeEncodedDataReader will try to extend the existing split by reading 
the difference between the large and the small split. But it will start reading 
right after the point where the last stripe physically ends, and 
LineRecordReader always skips the first row unless we are at the beginning of 
the file. This line-skipping behaviour is not taken into account at one point, 
and that's why some rows are missing.
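A minimal model of that line-skipping convention (plain Java, not Hive's actual LineRecordReader; method names are illustrative) shows how a split whose start offset is not the byte after a newline silently drops a row:

```java
import java.util.ArrayList;
import java.util.List;

public class LineSkipSketch {
  // Models the convention described above: a reader whose split does not
  // begin at byte 0 skips up to the first '\n', assuming the previous
  // split already consumed that (possibly partial) line.
  static List<String> readSplit(String data, int start, int end) {
    int pos = start;
    if (start != 0) {
      pos = data.indexOf('\n', start) + 1;      // skip the first line
      if (pos == 0) return new ArrayList<>();   // no newline in remainder
    }
    List<String> rows = new ArrayList<>();
    while (pos < data.length() && pos <= end) { // a reader may run past 'end'
      int nl = data.indexOf('\n', pos);
      if (nl < 0) nl = data.length();
      rows.add(data.substring(pos, nl));
      pos = nl + 1;
    }
    return rows;
  }

  public static void main(String[] args) {
    String file = "r1\nr2\nr3\nr4\n";
    // A split from byte 0 reads its rows normally.
    System.out.println(readSplit(file, 0, 5));  // prints: [r1, r2]
    // A split starting exactly where "r3" begins (e.g. right after a
    // physical stripe end) still skips to the next newline, so "r3" is lost.
    System.out.println(readSplit(file, 6, 11)); // prints: [r4]
  }
}
```

The split at offset 6 starts on a line boundary, yet the skip rule fires anyway; that is the kind of mismatch between stripe boundaries and line boundaries that loses rows when the cache extends a split.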


Diffs
-

  itests/src/test/resources/testconfiguration.properties 98280c52fe9 
  
llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/SerDeEncodedDataReader.java
 462b25fa234 
  ql/src/test/queries/clientpositive/mm_loaddata_split_change.q PRE-CREATION 
  ql/src/test/results/clientpositive/llap/mm_loaddata_split_change.q.out 
PRE-CREATION 


Diff: https://reviews.apache.org/r/71456/diff/1/


Testing
---

with q test


Thanks,

Attila Magyar



Review Request 71262: Mondrian queries failing with ClassCastException: hive.ql.exec.vector.DecimalColumnVector cannot be cast to hive.ql.exec.vector.Decimal64ColumnVector

2019-08-09 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71262/
---

Review request for hive, Ashutosh Chauhan, Gopal V, and Jesús Camacho Rodríguez.


Bugs: HIVE-22094
https://issues.apache.org/jira/browse/HIVE-22094


Repository: hive-git


Description
---

ClassCastException when running a join on a decimal column:

Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to 
org.apache.hadoop.hive.ql.exec.vector.Decimal64ColumnVector

at 
org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.VectorUDAFSumDecimal64ToDecimal.aggregateInput(VectorUDAFSumDecimal64ToDecimal.java:320)

at 
org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processAggregators(VectorGroupByOperator.java:217)


Diffs
-

  data/files/employee_closure/employee_closure.tsv PRE-CREATION 
  data/files/salary/salary.tsv PRE-CREATION 
  itests/src/test/resources/testconfiguration.properties 84c20426763 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java
 573368829e5 
  ql/src/test/queries/clientpositive/vector_decimal_mapjoin2.q PRE-CREATION 
  ql/src/test/results/clientpositive/llap/vector_decimal_mapjoin2.q.out 
PRE-CREATION 


Diff: https://reviews.apache.org/r/71262/diff/1/


Testing
---

qtest


Thanks,

Attila Magyar



[jira] [Created] (HIVE-22094) Mondrian queries failing with ClassCastException: hive.ql.exec.vector.DecimalColumnVector cannot be cast to hive.ql.exec.vector.Decimal64ColumnVector

2019-08-09 Thread Attila Magyar (JIRA)
Attila Magyar created HIVE-22094:


 Summary: Mondrian queries failing with ClassCastException: 
hive.ql.exec.vector.DecimalColumnVector cannot be cast to 
hive.ql.exec.vector.Decimal64ColumnVector
 Key: HIVE-22094
 URL: https://issues.apache.org/jira/browse/HIVE-22094
 Project: Hive
  Issue Type: Task
  Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


When running a query like this

select sum(salary.salary_paid) from salary, employee_closure where 
salary.employee_id = employee_closure.employee_id;

with hive.auto.convert.join=true and hive.vectorized.execution.enabled=true the 
following exception occurs
{code:java}
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to 
org.apache.hadoop.hive.ql.exec.vector.Decimal64ColumnVector

at 
org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.VectorUDAFSumDecimal64ToDecimal.aggregateInput(VectorUDAFSumDecimal64ToDecimal.java:320)

at 
org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processAggregators(VectorGroupByOperator.java:217)

at 
org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeHashAggregate.doProcessBatch(VectorGroupByOperator.java:414)

at 
org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processBatch(VectorGroupByOperator.java:182)

at 
org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:1124)

at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)

at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardOverflow(VectorMapJoinGenerateResultOperator.java:706)

at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyGenerateResultOperator.generateHashMultiSetResultMultiValue(VectorMapJoinInnerBigOnlyGenerateResultOperator.java:268)

at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyGenerateResultOperator.finishInnerBigOnly(VectorMapJoinInnerBigOnlyGenerateResultOperator.java:180)

at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyLongOperator.processBatch(VectorMapJoinInnerBigOnlyLongOperator.java:379)

... 28 more{code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (HIVE-22055) select count gives incorrect result after loading data from text file

2019-07-26 Thread Attila Magyar (JIRA)
Attila Magyar created HIVE-22055:


 Summary: select count gives incorrect result after loading data 
from text file
 Key: HIVE-22055
 URL: https://issues.apache.org/jira/browse/HIVE-22055
 Project: Hive
  Issue Type: Task
  Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar


Load data 3 times (both kv1.txt and kv2.txt contains 500 records)
{code:java}
create table load0_mm (key string, value string) stored as textfile 
tblproperties("transactional"="true", "transactional_properties"="insert_only");
load data local inpath '../../data/files/kv1.txt' into table load0_mm;
select count(1) from load0_mm;
load data local inpath '../../data/files/kv2.txt' into table load0_mm;
select count(1) from load0_mm;
load data local inpath '../../data/files/kv2.txt' into table load0_mm;
select count(1) from load0_mm;{code}
Expected output


{code:java}
PREHOOK: query: load data local inpath '../../data/files/kv2.txt' into table 
load0_mm
PREHOOK: type: LOAD
 A masked pattern was here 
PREHOOK: Output: default@load0_mm
POSTHOOK: query: load data local inpath '../../data/files/kv2.txt' into table 
load0_mm
POSTHOOK: type: LOAD
 A masked pattern was here 
POSTHOOK: Output: default@load0_mm
PREHOOK: query: select count(1) from load0_mm
PREHOOK: type: QUERY
PREHOOK: Input: default@load0_mm
 A masked pattern was here 
POSTHOOK: query: select count(1) from load0_mm
POSTHOOK: type: QUERY
POSTHOOK: Input: default@load0_mm
 A masked pattern was here 
1500{code}
Got:

[ERROR]   TestMiniLlapLocalCliDriver.testCliDriver:59 Client Execution 
succeeded but contained differences (error code = 1) after executing 
mm_loaddata.q 

63c63

< 1480

---

> 1500

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: Review Request 71156: Tez: Use a pre-parsed TezConfiguration from DagUtils

2019-07-25 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71156/
---

(Updated July 25, 2019, 8:05 a.m.)


Review request for hive, Laszlo Bodor, Gopal V, and Jesús Camacho Rodríguez.


Bugs: HIVE-21828
https://issues.apache.org/jira/browse/HIVE-21828


Repository: hive-git


Description
---

The HS2 tez-site.xml does not change dynamically - the XML-parsed components of 
the config can be obtained statically and kept across sessions.

This allows replacing "new TezConfiguration()" with an HS2-local version 
instead.

The configuration object, however, has to reference the right resource file 
(i.e., the location of tez-site.xml) without reparsing it for each query.
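One way to sketch the idea (all names here are illustrative; the actual change lives in TezConfigurationFactory): parse the resource once, cache the result keyed by its location, and hand each query a cheap mutable copy:

```java
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

public class ConfigCache {
  // Parse once per resource path, then serve copies: each query can still
  // mutate its own view without re-reading the XML from disk.
  private static final Map<String, Properties> PARSED = new ConcurrentHashMap<>();

  static Properties configFor(String resourcePath) {
    Properties cached = PARSED.computeIfAbsent(resourcePath, ConfigCache::parseOnce);
    Properties copy = new Properties();
    copy.putAll(cached);
    return copy;
  }

  // Stand-in for the expensive tez-site.xml parse.
  private static Properties parseOnce(String resourcePath) {
    Properties p = new Properties();
    p.setProperty("tez.site.location", resourcePath);
    return p;
  }
}
```

The per-query copy is what keeps the cached object from accidentally becoming shared mutable state across sessions.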


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2b7468a1ab7 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 3278dfea061 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezConfigurationFactory.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java dd7ccd4764d 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRegExp.java 
3bf3cfd3d9e 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java befeb4f2dd4 
  ql/src/test/org/apache/hive/testutils/HiveTestEnvSetup.java f872da02a3c 
  ql/src/test/queries/clientpositive/mm_loaddata.q 7e5787f2a65 
  ql/src/test/results/clientpositive/llap/tez_fixed_bucket_pruning.q.out 
eaed60c1ba7 


Diff: https://reviews.apache.org/r/71156/diff/2/

Changes: https://reviews.apache.org/r/71156/diff/1-2/


Testing
---

unittests


Thanks,

Attila Magyar



Review Request 71156: Tez: Use a pre-parsed TezConfiguration from DagUtils

2019-07-24 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71156/
---

Review request for hive, Laszlo Bodor, Gopal V, and Jesús Camacho Rodríguez.


Bugs: HIVE-21828
https://issues.apache.org/jira/browse/HIVE-21828


Repository: hive-git


Description
---

The HS2 tez-site.xml does not change dynamically - the XML parsed components of 
the config can be obtained statically and kept across sessions.

This allows replacing "new TezConfiguration()" with an HS2-local version 
instead.

The configuration object, however, has to reference the right resource file 
(i.e., the location of tez-site.xml) without reparsing it for each query.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 440d761f03d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 3278dfea061 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezConfigurationFactory.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java dd7ccd4764d 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRegExp.java 
3bf3cfd3d9e 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java befeb4f2dd4 
  ql/src/test/org/apache/hive/testutils/HiveTestEnvSetup.java f872da02a3c 
  ql/src/test/queries/clientpositive/mm_loaddata.q 7e5787f2a65 


Diff: https://reviews.apache.org/r/71156/diff/1/


Testing
---

unittests


Thanks,

Attila Magyar



Review Request 70990: Vectorization: Decimal64 division with integer columns

2019-07-02 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70990/
---

Review request for hive, Laszlo Bodor, Gopal V, and prasanthj.


Bugs: HIVE-21437
https://issues.apache.org/jira/browse/HIVE-21437


Repository: hive-git


Description
---

Vectorizer fails for

CREATE temporary TABLE `catalog_Sales`(
  `cs_quantity` int, 
  `cs_wholesale_cost` decimal(7,2), 
  `cs_list_price` decimal(7,2), 
  `cs_sales_price` decimal(7,2), 
  `cs_ext_discount_amt` decimal(7,2), 
  `cs_ext_sales_price` decimal(7,2), 
  `cs_ext_wholesale_cost` decimal(7,2), 
  `cs_ext_list_price` decimal(7,2), 
  `cs_ext_tax` decimal(7,2), 
  `cs_coupon_amt` decimal(7,2), 
  `cs_ext_ship_cost` decimal(7,2), 
  `cs_net_paid` decimal(7,2), 
  `cs_net_paid_inc_tax` decimal(7,2), 
  `cs_net_paid_inc_ship` decimal(7,2), 
  `cs_net_paid_inc_ship_tax` decimal(7,2), 
  `cs_net_profit` decimal(7,2))
 ;

explain vectorization detail select max((((cs_ext_list_price - 
cs_ext_wholesale_cost) - cs_ext_discount_amt) + cs_ext_sales_price) / 2) from 
catalog_sales;


SELECT operator: Could not instantiate DecimalColDivideDecimalScalar with 
arguments arguments: [21, 20, 22], argument classes: [Integer, Integer, 
Integer], exception: java.lang.IllegalArgumentException


Diffs
-

  ql/src/gen/vectorization/ExpressionTemplates/ColumnDivideScalarDecimal.txt 
0bd7c004215 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/ConstantVectorExpression.java
 0a16e08d61e 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
52e8dcb0904 
  ql/src/test/queries/clientpositive/vector_decimal_col_scalar_division.q 
PRE-CREATION 
  ql/src/test/results/clientpositive/perf/spark/query4.q.out a7e317cc3c9 
  ql/src/test/results/clientpositive/perf/tez/constraints/query4.q.out 
293b2816a13 
  ql/src/test/results/clientpositive/perf/tez/query4.q.out 47515eda2f8 
  ql/src/test/results/clientpositive/vector_decimal_col_scalar_division.q.out 
PRE-CREATION 


Diff: https://reviews.apache.org/r/70990/diff/1/


Testing
---

new q test: vector_decimal_col_scalar_division.q

Test Result
16,752 tests 0 failures (-2) , 379 skipped (±0)


Thanks,

Attila Magyar



[jira] [Created] (HIVE-15585) LLAP failed to start on a host with only 1 cpu

2017-01-11 Thread Attila Magyar (JIRA)
Attila Magyar created HIVE-15585:


 Summary: LLAP failed to start on a host with only 1 cpu
 Key: HIVE-15585
 URL: https://issues.apache.org/jira/browse/HIVE-15585
 Project: Hive
  Issue Type: Bug
  Components: llap
Affects Versions: 2.1.1
Reporter: Attila Magyar
Assignee: Attila Magyar


LLAP failed to start on a host with only 1 cpu. The number of threads was 
calculated by dividing the number of cpus by 2. This resulted in zero when the 
cpu count was 1 and caused an IllegalArgumentException upon startup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)