
[jira] [Created] (HIVE-25599) Addendum of HIVE-25570 Hive should send full URL path for authorization for the command insert overwrite location

2021-10-07 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-25599:
-

 Summary: Addendum of HIVE-25570 Hive should send full URL path for 
authorization for the command insert overwrite location
 Key: HIVE-25599
 URL: https://issues.apache.org/jira/browse/HIVE-25599
 Project: Hive
  Issue Type: Bug
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (HIVE-25541) JsonSerDe: TBLPROPERTY treating nested json as String

2021-09-20 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-25541:
-

 Summary: JsonSerDe: TBLPROPERTY treating nested json as String
 Key: HIVE-25541
 URL: https://issues.apache.org/jira/browse/HIVE-25541
 Project: Hive
  Issue Type: Bug
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


The native JsonSerDe 'org.apache.hive.hcatalog.data.JsonSerDe' currently does not 
support loading nested JSON into a string column directly. It requires declaring 
the column as a complex type (struct, map, array) to unpack nested JSON data.

Even though the data field is not a valid JSON string, there is value in 
treating it as a plain String instead of throwing an exception, as we currently do.

{code:java}
create table json_table(data string, messageid string, publish_time bigint, 
attributes string);

{"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}"}}
{code}

This JIRA introduces an extra table property that stringifies complex JSON 
values instead of forcing the user to define the complete nested structure.
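The ticket does not name the new table property, so the following only illustrates the "stringify" step itself. `JsonStringifySketch.stringify` is a hypothetical helper, not JsonSerDe code: it turns an already-parsed nested value (Map / List / String / Number / Boolean / null) back into JSON text that could then be stored in a STRING column. Escaping is deliberately minimal.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the JsonSerDe code: re-serializes an already-parsed
// JSON value tree (Map / List / String / Number / Boolean / null) into JSON text
// so that a nested value can be stored in a STRING column.
final class JsonStringifySketch {

    static String stringify(Object value) {
        StringBuilder sb = new StringBuilder();
        write(value, sb);
        return sb.toString();
    }

    private static void write(Object value, StringBuilder sb) {
        if (value == null) {
            sb.append("null");
        } else if (value instanceof Map) {
            sb.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                first = false;
                writeString(String.valueOf(e.getKey()), sb);
                sb.append(':');
                write(e.getValue(), sb);
            }
            sb.append('}');
        } else if (value instanceof List) {
            sb.append('[');
            boolean first = true;
            for (Object item : (List<?>) value) {
                if (!first) {
                    sb.append(',');
                }
                first = false;
                write(item, sb);
            }
            sb.append(']');
        } else if (value instanceof String) {
            writeString((String) value, sb);
        } else {
            // Numbers and booleans are emitted via toString().
            sb.append(value);
        }
    }

    // Minimal escaping: quotes, backslashes and newlines only.
    private static void writeString(String s, StringBuilder sb) {
        sb.append('"');
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                default: sb.append(c);
            }
        }
        sb.append('"');
    }

    public static void main(String[] args) {
        Map<String, Object> attributes = new LinkedHashMap<>();
        attributes.put("region", "IN");
        System.out.println(stringify(attributes)); // {"region":"IN"}
    }
}
```

Insertion-ordered maps keep the key order of the input, which makes the stringified output stable.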




[jira] [Created] (HIVE-25398) Converted external tables should be able to configure purge behaviour

2021-07-28 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-25398:
-

 Summary: Converted external tables should be able to configure 
purge behaviour
 Key: HIVE-25398
 URL: https://issues.apache.org/jira/browse/HIVE-25398
 Project: Hive
  Issue Type: Bug
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


Creating non-ACID MANAGED tables is not allowed in Hive; instead, Hive converts 
these tables to EXTERNAL: 
https://issues.apache.org/jira/browse/HIVE-22158
During the table translation, both TRANSLATED_TO_EXTERNAL and 
'external.table.purge' are set to true. However, the user may have already set 
the second parameter in the table properties. This ticket adds an extra check to 
preserve that property if it is set.

PS: A cleaner solution would be to create these tables as EXTERNAL directly, but 
the user may be taking advantage of the translation and expecting the data NOT 
to be purged!

Example:
{code:java}
-- Non-ACID table will be translated to EXTERNAL
create table c(c int) LOCATION 'etp_1' 
TBLPROPERTIES('transactional'='false','external.table.purge'='false');
insert into c values(1);

-- Maintain the purge=false property set above
desc formatted c;
select count(*) from c;
drop table c;

-- Create table in same location, data should still be there
create table c(c int) LOCATION 'etp_1' 
TBLPROPERTIES('transactional'='false','external.table.purge'='false');
desc formatted c;
select count(*) from c;
{code}
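A minimal sketch of the extra check, modeling only the table-property map. `ExternalTranslationSketch.translateToExternal` is a hypothetical name, and the real translation also changes the table type, which is not modeled here. The point is that `putIfAbsent` only defaults 'external.table.purge' when the user did not set it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the managed-to-external translation (HIVE-22158),
// limited to the table properties touched by this ticket.
final class ExternalTranslationSketch {
    static final String TRANSLATED_TO_EXTERNAL = "TRANSLATED_TO_EXTERNAL";
    static final String EXTERNAL_TABLE_PURGE = "external.table.purge";

    static Map<String, String> translateToExternal(Map<String, String> userProps) {
        Map<String, String> props = new HashMap<>(userProps);
        props.put(TRANSLATED_TO_EXTERNAL, "TRUE");
        // Default purge to TRUE only when the user did not set it explicitly.
        props.putIfAbsent(EXTERNAL_TABLE_PURGE, "TRUE");
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("transactional", "false");
        props.put(EXTERNAL_TABLE_PURGE, "false");
        // The user's purge=false survives the translation.
        System.out.println(translateToExternal(props).get(EXTERNAL_TABLE_PURGE)); // false
    }
}
```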









[jira] [Created] (HIVE-25362) LLAP: ensure tasks with locality are added to DelayQueue

2021-07-21 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-25362:
-

 Summary: LLAP: ensure tasks with locality are added to DelayQueue
 Key: HIVE-25362
 URL: https://issues.apache.org/jira/browse/HIVE-25362
 Project: Hive
  Issue Type: Bug
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


HIVE-24914 introduced a short-circuit optimization that, when all nodes are busy, 
returns DELAYED_RESOURCES and resets the locality delay for the given task.

However, this may prevent tasks from being added to the DelayQueue, leading to 
worse locality when all LLAP resources are fully utilized.
To address the issue, we should handle the two cases separately.
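One possible reading of "handle the two cases separately", sketched with the JDK's `java.util.concurrent.DelayQueue`: when all nodes are busy, a task that has a locality request is still registered in the delay queue, while a task without one simply reports delayed resources. `LocalityDelaySketch`, `DelayedTask` and `onAllNodesBusy` are hypothetical names, and the outcome enum only loosely mirrors the scheduler's result values.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

final class LocalityDelaySketch {
    enum Outcome { DELAYED_LOCALITY, DELAYED_RESOURCES }

    // Hypothetical pending-task entry whose locality delay expires at a fixed time.
    static final class DelayedTask implements Delayed {
        final String id;
        final long expireAtNanos;

        DelayedTask(String id, long delayMillis) {
            this.id = id;
            this.expireAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMillis);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(expireAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    final DelayQueue<DelayedTask> delayQueue = new DelayQueue<>();

    // All nodes are busy: tasks with locality requests still enter the DelayQueue
    // (keeping their locality delay), instead of short-circuiting straight to
    // DELAYED_RESOURCES like tasks without locality.
    Outcome onAllNodesBusy(String taskId, boolean hasLocalityRequest, long localityDelayMillis) {
        if (hasLocalityRequest) {
            delayQueue.add(new DelayedTask(taskId, localityDelayMillis));
            return Outcome.DELAYED_LOCALITY;
        }
        return Outcome.DELAYED_RESOURCES;
    }

    public static void main(String[] args) {
        LocalityDelaySketch s = new LocalityDelaySketch();
        System.out.println(s.onAllNodesBusy("attempt_1", true, 100));  // DELAYED_LOCALITY
        System.out.println(s.onAllNodesBusy("attempt_2", false, 100)); // DELAYED_RESOURCES
        System.out.println(s.delayQueue.size());                       // 1
    }
}
```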




[jira] [Created] (HIVE-25155) Bump ORC to 1.6.8

2021-05-23 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-25155:
-

 Summary: Bump ORC to 1.6.8
 Key: HIVE-25155
 URL: https://issues.apache.org/jira/browse/HIVE-25155
 Project: Hive
  Issue Type: Improvement
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


 https://orc.apache.org/news/2021/05/21/ORC-1.6.8/




[jira] [Created] (HIVE-25149) Support parallel load for Optimized HT implementations

2021-05-21 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-25149:
-

 Summary: Support parallel load for Optimized HT implementations
 Key: HIVE-25149
 URL: https://issues.apache.org/jira/browse/HIVE-25149
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis







[jira] [Created] (HIVE-25148) Support parallel load for Fast HT implementation

2021-05-21 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-25148:
-

 Summary: Support parallel load for Fast HT implementation
 Key: HIVE-25148
 URL: https://issues.apache.org/jira/browse/HIVE-25148
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis







[jira] [Created] (HIVE-25146) JMH tests for Multi HT and parallel load

2021-05-21 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-25146:
-

 Summary: JMH tests for Multi HT and parallel load
 Key: HIVE-25146
 URL: https://issues.apache.org/jira/browse/HIVE-25146
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


As the title suggests, add some benchmarks for the parallel HT construction feature.




[jira] [Created] (HIVE-25145) Improve Multi-HashTable EstimatedMemorySize

2021-05-21 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-25145:
-

 Summary: Improve Multi-HashTable EstimatedMemorySize
 Key: HIVE-25145
 URL: https://issues.apache.org/jira/browse/HIVE-25145
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


When Multi HashTable is used for parallel HT loading, we calculate the 
estimatedMemorySize as the sum of the estimates of all HTs.
However, each of those HTs already adds some constants to its own memory 
estimate, e.g., 16KB of constant memory for keyBinarySortableDeserializeRead, so 
these constants are counted once per HT in the sum.

This ticket aims to improve the memory estimation for Multi HT.
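The arithmetic below illustrates how the per-HT constant accumulates when estimates are summed. It is a hypothetical model: only the 16KB constant comes from the ticket, and counting the constant once is just one possible refinement, valid only if that overhead is in fact shared across the HTs.

```java
final class MultiHtEstimateSketch {
    // Per-HT constant cited in the ticket (keyBinarySortableDeserializeRead).
    static final long PER_HT_CONSTANT = 16L * 1024;

    // Approach described in the ticket: sum per-HT estimates, each including the constant.
    static long summedEstimate(long[] dataBytesPerHt) {
        long total = 0;
        for (long data : dataBytesPerHt) {
            total += data + PER_HT_CONSTANT;
        }
        return total;
    }

    // Hypothetical refinement: count the constant overhead only once.
    static long constantOnceEstimate(long[] dataBytesPerHt) {
        long total = PER_HT_CONSTANT;
        for (long data : dataBytesPerHt) {
            total += data;
        }
        return total;
    }

    public static void main(String[] args) {
        long[] fourHts = {1L << 20, 1L << 20, 1L << 20, 1L << 20}; // 4 x 1MB of data
        System.out.println(summedEstimate(fourHts));       // 4259840
        System.out.println(constantOnceEstimate(fourHts)); // 4210688
    }
}
```

With N hash tables, the gap between the two estimates is (N - 1) x 16KB.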




[jira] [Created] (HIVE-25117) Vector PTF ClassCastException with Decimal64

2021-05-14 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-25117:
-

 Summary: Vector PTF ClassCastException with Decimal64
 Key: HIVE-25117
 URL: https://issues.apache.org/jira/browse/HIVE-25117
 Project: Hive
  Issue Type: Bug
Reporter: Panagiotis Garefalakis


This only reproduces when there is at least one buffered batch, so it needs 2 rows 
with 1 row per batch:

{code:java}
set hive.vectorized.testing.reducer.batch.size=1;
{code}

{code:java}
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
    at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.copyNonSelectedColumnVector(VectorizedBatchUtil.java:664)
    at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.forwardBufferedBatches(VectorPTFGroupBatches.java:228)
    at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.fillGroupResultsAndForward(VectorPTFGroupBatches.java:318)
    at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.process(VectorPTFOperator.java:403)
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:497)
{code}





[jira] [Created] (HIVE-25103) Update row.serde excludes defaults

2021-05-11 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-25103:
-

 Summary: Update row.serde excludes defaults
 Key: HIVE-25103
 URL: https://issues.apache.org/jira/browse/HIVE-25103
 Project: Hive
  Issue Type: Improvement
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


HIVE-16222 introduced the row.serde.inputformat.excludes setting to disable 
row.serde for specific non-vectorized formats.
Since MapredParquetInputFormat is now natively vectorized, it should be 
removed from that list.

Even when hive.vectorized.use.vectorized.input.format is DISABLED, the 
Vectorizer will not vectorize in row deserialize mode if the input format is 
natively vectorized, so it is safe to remove.




[jira] [Created] (HIVE-25083) Extra reviewer pattern

2021-04-30 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-25083:
-

 Summary: Extra reviewer pattern
 Key: HIVE-25083
 URL: https://issues.apache.org/jira/browse/HIVE-25083
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis







[jira] [Created] (HIVE-25082) Make SettableTreeReader updateTimezone a default method

2021-04-30 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-25082:
-

 Summary: Make SettableTreeReader updateTimezone a default method
 Key: HIVE-25082
 URL: https://issues.apache.org/jira/browse/HIVE-25082
 Project: Hive
  Issue Type: Improvement
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


Avoid useless TimestampStreamReader instanceof checks by making updateTimezone() 
a default method in SettableTreeReader.
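A sketch of the Java 8 default-method pattern the ticket describes, using hypothetical, heavily simplified types: the interface below is not the real Hive/ORC SettableTreeReader, and the `String` timezone parameter is an assumption. The interface supplies a no-op `updateTimezone`, only the timestamp reader overrides it, and callers invoke it on every reader without an `instanceof` check.

```java
import java.util.List;

// Hypothetical, simplified stand-in for the real interface.
interface SettableTreeReader {
    // Default no-op: most readers do not care about the writer timezone.
    default void updateTimezone(String writerTimezoneId) { }
}

final class TimestampStreamReaderSketch implements SettableTreeReader {
    String writerTimezoneId = "UTC";

    @Override
    public void updateTimezone(String writerTimezoneId) {
        this.writerTimezoneId = writerTimezoneId;
    }
}

final class StringStreamReaderSketch implements SettableTreeReader { }

final class TimezoneUpdateSketch {
    // Before: if (reader instanceof TimestampStreamReader) { ... }
    // After: a plain virtual call on every reader.
    static void updateAll(List<? extends SettableTreeReader> readers, String tz) {
        for (SettableTreeReader r : readers) {
            r.updateTimezone(tz);
        }
    }

    public static void main(String[] args) {
        TimestampStreamReaderSketch ts = new TimestampStreamReaderSketch();
        updateAll(List.of(ts, new StringStreamReaderSketch()), "Europe/Athens");
        System.out.println(ts.writerTimezoneId); // Europe/Athens
    }
}
```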




[jira] [Created] (HIVE-25049) LlapDaemon preemption should not be triggered for same Vertex tasks

2021-04-22 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-25049:
-

 Summary: LlapDaemon preemption should not be triggered for same 
Vertex tasks
 Key: HIVE-25049
 URL: https://issues.apache.org/jira/browse/HIVE-25049
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


Due to the asynchronous nature of QueryInfo$FinishableState notifications, we 
usually end up receiving finishable state updates across tasks/query fragments 
with some time difference.

Imagine vertex Map 8 with a dependency on Map 7.
If the Map 8 vertex is already running with some tasks still pending, we can end up 
in a situation where, on Map 7 completion, some of the pending Map 8 tasks 
get the finishable state update BEFORE the already running Map 8 tasks, 
ending up preempting tasks for no reason!
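A hypothetical sketch of the rule in the title. It does not reproduce TaskExecutorService: `PreemptionSketch`, `TaskInfo` and `selectVictim` are illustrative names. When picking a running task to preempt, tasks from the candidate's own vertex are skipped, because a sibling that still looks non-finishable is most likely just waiting for the same notification.

```java
import java.util.List;

final class PreemptionSketch {
    // Hypothetical, simplified view of a task as seen by the preemption logic.
    static final class TaskInfo {
        final String attemptId;
        final String vertexName;
        // Finishable state as currently known; may be stale because
        // notifications arrive asynchronously.
        final boolean canFinish;

        TaskInfo(String attemptId, String vertexName, boolean canFinish) {
            this.attemptId = attemptId;
            this.vertexName = vertexName;
            this.canFinish = canFinish;
        }
    }

    // Returns a running task that `candidate` may preempt, or null if none.
    // Same-vertex tasks are never chosen: they depend on the same upstream
    // vertices, so their finishable update is simply still in flight.
    static TaskInfo selectVictim(TaskInfo candidate, List<TaskInfo> running) {
        for (TaskInfo t : running) {
            if (!t.canFinish && !t.vertexName.equals(candidate.vertexName)) {
                return t;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TaskInfo incoming = new TaskInfo("a14", "Map 8", true);
        TaskInfo sibling = new TaskInfo("a06", "Map 8", false);
        System.out.println(selectVictim(incoming, List.of(sibling))); // null
    }
}
```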


{code:java}
2021-04-22T15:30:45.124Z source:Map 7 updated, notifying: [Map 8] 

2021-04-22T15:30:45.125Z query-executor-0 class="impl.TaskExecutorService" 
level="INFO" thread="IPC Server handler 3 on 25000"] vertex: Map 8 Received 
finishable state update for attempt_1619105382691_0001_1_05_14_0, state=true
2021-04-22T15:30:45.125Z query-executor-0 
class="impl.QueryInfo$FinishableStateTracker" level="INFO" thread="IPC Server 
handler 3 on 25000"] Now notifying: Map 8
2021-04-22T15:30:45.125Z query-executor-0 class="impl.TaskExecutorService" 
level="INFO" thread="Wait-Queue-Scheduler-0"] Attempting to execute 
TaskWrapper{task=attempt_1619105382691_0001_1_05_14_0, Vertex=Map 8, 
inWaitQueue=true, inPreemptionQueue=false, registeredForNotifications=true, 
canFinish=true, canFinish(in


 2021-04-22T15:30:45.126Z query-executor-0 class="impl.TaskExecutorService" 
level="INFO" thread="Wait-Queue-Scheduler-0"] Task 
TaskWrapper{task=attempt_1619105382691_0001_1_05_14_0, Vertex=Map 8, 
inWaitQueue=true, inPreemptionQueue=false, registeredForNotifications=true, 
canFinish=true, canFinish(in queue)=true, isGuaranteed=false, 
firstAttemptStartTime=1619105437749, dagStartTime=1619105422608, 
withinDagPriority=74, vertexParallelism= 232, selfAndUpstreamParallelism= 256, 
selfAndUpstreamComplete= 17} managed to preempt task 
TaskWrapper{task=attempt_1619105382691_0001_1_05_06_0, Vertex=Map 8, 
inWaitQueue=false, inPreemptionQueue=true, registeredForNotifications=true, 
canFinish=true, canFinish(in queue)=false, isGuaranteed=false, 
firstAttemptStartTime=1619105437737, dagStartTime=1619105422608, 
withinDagPriority=74, vertexParallelism= 232, selfAndUpstreamParallelism= 256, 
selfAndUpstreamComplete= 15}

 2021-04-22T15:30:45.126Z query-executor-0 class="impl.TaskExecutorService" 
level="INFO" thread="Wait-Queue-Scheduler-0"] Invoking kill task for 
attempt_1619105382691_0001_1_05_06_0 due to pre-emption to run 
attempt_1619105382691_0001_1_05_14_0
 2021-04-22T15:30:45.126Z query-executor-0 
class="impl.QueryInfo$FinishableStateTracker" level="INFO" thread="IPC Server 
handler 3 on 25000"] Now notifying: Map 8
 2021-04-22T15:30:45.126Z query-executor-0 class="impl.TaskExecutorService" 
level="INFO" thread="IPC Server handler 3 on 25000"] vertex: Map 8 Received 
finishable state update for attempt_1619105382691_0001_1_05_11_0, state=true
 2021-04-22T15:30:45.127Z query-executor-0 class="impl.TaskRunnerCallable" 
level="INFO" thread="Wait-Queue-Scheduler-0"] Kill task requested for 
id=attempt_1619105382691_0001_1_05_06_0, taskRunnerSetup=true
 2021-04-22T15:30:45.127Z query-executor-0 class="impl.TaskRunnerCallable" 
level="INFO" thread="Wait-Queue-Scheduler-0"] Issuing kill to task 
attempt_1619105382691_0001_1_05_06_0
 2021-04-22T15:30:45.127Z query-executor-0 
class="impl.QueryInfo$FinishableStateTracker" level="INFO" thread="IPC Server 
handler 3 on 25000"] Now notifying: Map 8
 2021-04-22T15:30:45.127Z query-executor-0 class="task.TezTaskRunner2" 
level="INFO" thread="Wait-Queue-Scheduler-0"] Attempting to abort 
attempt_1619105382691_0001_1_05_06_0 due to an invocation of killTask
 2021-04-22T15:30:45.128Z query-executor-0 class="tez.TezProcessor" 
level="INFO" thread="Wait-Queue-Scheduler-0"] Received abort
 2021-04-22T15:30:45.128Z query-executor-0 class="tez.TezProcessor" 
level="INFO" thread="Wait-Queue-Scheduler-0"] Forwarding abort to 
RecordProcessor
 2021-04-22T15:30:45.128Z query-executor-0 
class="runtime.LogicalIOProcessorRuntimeTask" dagId="dag_1619105382691_0001_1" 
fragmentId="1619105382691_0001_1_05_11_0" level="INFO" 
queryId="hive_20210422153013_397b96bf-d5a6-493a-9c51-9446f64eeed4" 
thread="TezTR-382691_1_1_5_11_0"] Waiting for 1 initializers to finish
 2021-04-22T15:30:45.128Z query-executor-0 class="tez.MapRecordProcessor" 
level="INFO" thread="Wait-Queue-Scheduler-0"] Forwarding abort to mapOp: {} MAP
 2021-04-22T15:30:45.128Z query-executor-0 class="vector.VectorMapOperator" 
level="INFO" thread="Wait-Queue-Scheduler-0"]
{code}

[jira] [Created] (HIVE-24914) Improve scheduling by only traversing hosts with capacity

2021-03-19 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-24914:
-

 Summary: Improve scheduling by only traversing hosts with capacity
 Key: HIVE-24914
 URL: https://issues.apache.org/jira/browse/HIVE-24914
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


*schedulePendingTasks* on the LlapTaskScheduler currently goes through all the 
pending tasks and tries to allocate them based on their priority -- if a 
priority cannot be scheduled completely, we bail out, as lower priorities would 
not be able to get allocations either.

An optimization here could be to walk only through the nodes with capacity (if 
any), rather than all available hosts, when scheduling these tasks based on their 
priority and locality preferences.
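A hypothetical sketch of the idea: filter to hosts with free slots once, then walk priorities in order and stop as soon as one priority cannot be fully scheduled. `CapacityAwareSchedulingSketch.schedule` and its slot/count model are assumptions, and locality preferences are ignored for brevity (lower number means higher priority here).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CapacityAwareSchedulingSketch {
    // pendingByPriority: priority -> number of pending tasks (lower key = higher priority).
    // freeSlotsByHost: host -> free executor slots. Returns the host chosen per allocation.
    static List<String> schedule(TreeMap<Integer, Integer> pendingByPriority,
                                 Map<String, Integer> freeSlotsByHost) {
        // Only hosts with capacity are considered at all.
        Map<String, Integer> withCapacity = new LinkedHashMap<>();
        freeSlotsByHost.forEach((host, free) -> {
            if (free > 0) {
                withCapacity.put(host, free);
            }
        });

        List<String> assignments = new ArrayList<>();
        for (Map.Entry<Integer, Integer> level : pendingByPriority.entrySet()) {
            int remaining = level.getValue();
            while (remaining > 0 && !withCapacity.isEmpty()) {
                String host = withCapacity.keySet().iterator().next();
                assignments.add(host);
                remaining--;
                int left = withCapacity.get(host) - 1;
                if (left == 0) {
                    withCapacity.remove(host); // host is now full
                } else {
                    withCapacity.put(host, left);
                }
            }
            // This priority was not fully scheduled: lower priorities cannot be either.
            if (remaining > 0) {
                break;
            }
        }
        return assignments;
    }

    public static void main(String[] args) {
        TreeMap<Integer, Integer> pending = new TreeMap<>(Map.of(1, 2, 2, 2));
        Map<String, Integer> free = new LinkedHashMap<>();
        free.put("h1", 0);
        free.put("h2", 2);
        free.put("h3", 1);
        System.out.println(schedule(pending, free)); // [h2, h2, h3]
    }
}
```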





[jira] [Created] (HIVE-24913) LlapTaskScheduler Improvements

2021-03-19 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-24913:
-

 Summary: LlapTaskScheduler Improvements
 Key: HIVE-24913
 URL: https://issues.apache.org/jira/browse/HIVE-24913
 Project: Hive
  Issue Type: Improvement
Reporter: Panagiotis Garefalakis







[jira] [Created] (HIVE-24793) Compiler probe MJ selection algorithm fallback

2021-02-18 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-24793:
-

 Summary: Compiler probe MJ selection algorithm fallback
 Key: HIVE-24793
 URL: https://issues.apache.org/jira/browse/HIVE-24793
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


As per the discussion on #1286, the current probe MJ selection algorithm:

* selects the best MJ candidate (currently based on the distinct row ratio)
* does some further processing, which may bail out

Bailing out for the best candidate doesn't necessarily mean that we cannot use 
a less attractive candidate. The extra compilation step can be wrapped in a for 
loop: instead of selecting only the best candidate, the first part could be 
implemented as a priority ordering.
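The fallback loop could look roughly like this. The ticket does not say whether a higher or lower distinct row ratio is "best", so the preference is passed in as a `Comparator`; `ProbeMjSelectionSketch`, `Candidate` and the processing `Predicate` are hypothetical stand-ins for the compiler's real structures.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

final class ProbeMjSelectionSketch {
    // Hypothetical candidate: a map join plus its distinct row ratio.
    static final class Candidate {
        final String name;
        final double distinctRowRatio;

        Candidate(String name, double distinctRowRatio) {
            this.name = name;
            this.distinctRowRatio = distinctRowRatio;
        }
    }

    // Tries candidates in preference order and returns the first one for which
    // the further processing step succeeds, instead of giving up after the best.
    static Optional<Candidate> select(List<Candidate> candidates,
                                      Comparator<Candidate> preference,
                                      Predicate<Candidate> furtherProcessing) {
        List<Candidate> ordered = new ArrayList<>(candidates);
        ordered.sort(preference);
        for (Candidate c : ordered) {
            if (furtherProcessing.test(c)) {
                return Optional.of(c);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Candidate> cands = List.of(new Candidate("mj1", 0.1), new Candidate("mj2", 0.5));
        // Example only: prefer lower ratios, and pretend processing bails out for mj1.
        Optional<Candidate> pick = select(cands,
                Comparator.comparingDouble(c -> c.distinctRowRatio),
                c -> !c.name.equals("mj1"));
        System.out.println(pick.map(c -> c.name).orElse("none")); // mj2
    }
}
```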




[jira] [Created] (HIVE-24780) Consolidate type.checking configuration

2021-02-15 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-24780:
-

 Summary: Consolidate type.checking configuration
 Key: HIVE-24780
 URL: https://issues.apache.org/jira/browse/HIVE-24780
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


It seems that hive.strict.timestamp.conversion, hive.strict.checks.type.safety, 
hive.strict.checks.no.partition.filter, etc. serve a similar purpose.




[jira] [Created] (HIVE-24779) Document type checks and allowed conversions

2021-02-15 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-24779:
-

 Summary: Document type checks and allowed conversions
 Key: HIVE-24779
 URL: https://issues.apache.org/jira/browse/HIVE-24779
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


Currently, with hive.strict.checks.type.safety=true we disallow:
* Comparing bigints and strings/(var)chars
* Comparing bigints and doubles
* Comparing decimals and strings/(var)chars

There is also a TODO to add any remaining checks.
We need to decide how strict we are going to be here (following MySQL checks? 
Postgres?) and then document everything.
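Encoding the three disallowed comparisons as an explicit, symmetric rule is one way to make them documentable and testable. `StrictTypeSafetySketch` and its `Kind` enum are hypothetical and model only the rules listed above, not Hive's actual type-check implementation.

```java
import java.util.EnumSet;
import java.util.Set;

final class StrictTypeSafetySketch {
    enum Kind { INT, BIGINT, DOUBLE, DECIMAL, STRING, CHAR, VARCHAR }

    private static final Set<Kind> STRING_LIKE = EnumSet.of(Kind.STRING, Kind.CHAR, Kind.VARCHAR);

    // True if comparing a and b is rejected under hive.strict.checks.type.safety=true,
    // per the three rules listed in the ticket (order of operands does not matter).
    static boolean isDisallowed(Kind a, Kind b) {
        return oneWay(a, b) || oneWay(b, a);
    }

    private static boolean oneWay(Kind a, Kind b) {
        return (a == Kind.BIGINT && (STRING_LIKE.contains(b) || b == Kind.DOUBLE))
            || (a == Kind.DECIMAL && STRING_LIKE.contains(b));
    }

    public static void main(String[] args) {
        System.out.println(isDisallowed(Kind.VARCHAR, Kind.BIGINT)); // true
        System.out.println(isDisallowed(Kind.INT, Kind.STRING));     // false
    }
}
```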




[jira] [Created] (HIVE-24777) Standardize Hive's Type conversions

2021-02-15 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-24777:
-

 Summary: Standardize Hive's Type conversions
 Key: HIVE-24777
 URL: https://issues.apache.org/jira/browse/HIVE-24777
 Project: Hive
  Issue Type: New Feature
Reporter: Panagiotis Garefalakis


Currently, Hive does some type safety checks looking for lossy type conversions 
such as decimal to char and long to varchar (even though these are not complete).
These conversions are not documented, and they might even differ depending on the 
CBO status. For example:


{code:java}
set hive.cbo.enable=false;
create table comparison_table (col_v varchar(3), col_i int) stored as orc;
insert into comparison_table values ('001', 15), ('002', 25), ('007', 75);

select * from comparison_table where col_v >= 4L;
-- Converts the predicate literal 4L to the String '4'
-- No result when CBO is off

set hive.cbo.enable=true;
select * from comparison_table where col_v >= 4L;
-- Converts both sides to Double
-- Returns 007 when CBO is on
{code}
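The two results follow directly from the two comparison semantics described in the comments, which this small model reproduces on the same three values. `ComparisonSemanticsSketch` is purely illustrative and does not call into Hive: lexicographically, "001", "002" and "007" all sort before "4" because '0' < '4', while numerically only 7.0 >= 4.0.

```java
import java.util.ArrayList;
import java.util.List;

final class ComparisonSemanticsSketch {
    // The three col_v values inserted in the example above.
    static final List<String> COL_V = List.of("001", "002", "007");

    // CBO off, as described: 4L becomes the string "4" and values compare lexicographically.
    static List<String> asStrings(long literal) {
        String rhs = String.valueOf(literal);
        List<String> out = new ArrayList<>();
        for (String v : COL_V) {
            if (v.compareTo(rhs) >= 0) {
                out.add(v);
            }
        }
        return out;
    }

    // CBO on, as described: both sides are converted to double.
    static List<String> asDoubles(long literal) {
        List<String> out = new ArrayList<>();
        for (String v : COL_V) {
            if (Double.parseDouble(v) >= (double) literal) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("CBO off: " + asStrings(4L)); // CBO off: []
        System.out.println("CBO on: " + asDoubles(4L));  // CBO on: [007]
    }
}
```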

Type conversions on the predicates also affect SArg evaluation: in the first 
case (CBO off) string padding is missing, and in the latter case (CBO on) the 
UDFBridge cannot be evaluated.

Finally, it seems that there are multiple configurations tracking the same thing:
* hive.strict.timestamp.conversion
* hive.strict.checks.type.safety

This uber Jira targets the standardisation/documentation of these type 
checks and their conversions in Hive.




[jira] [Created] (HIVE-24735) Implement TIMESTAMP WITH LOCAL TIME ZONE integration with ORC

2021-02-04 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-24735:
-

 Summary: Implement TIMESTAMP WITH LOCAL TIME ZONE integration with 
ORC
 Key: HIVE-24735
 URL: https://issues.apache.org/jira/browse/HIVE-24735
 Project: Hive
  Issue Type: New Feature
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


TIMESTAMP_INSTANT in ORC is equivalent to the TIMESTAMP WITH LOCAL TIME ZONE type 
in Hive. Support to read/write timestamps with local time zone in ORC was added 
as part of ORC-189.

We should implement the 
[integration|https://github.com/apache/hive/pull/1823#discussion_r564077084] between the two.




[jira] [Created] (HIVE-24721) Align CacheWriter bufferSize with Llap max alloc

2021-02-02 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-24721:
-

 Summary: Align CacheWriter bufferSize with Llap max alloc
 Key: HIVE-24721
 URL: https://issues.apache.org/jira/browse/HIVE-24721
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


Before bumping to ORC 1.6, the LLAP_ALLOCATOR_MAX_ALLOC value was also used as the 
ORC CacheWriter buffer size.

As per ORC-238, the bufferSize argument must be less than 2^(3*8 - 1) = 2^23 bytes, 
i.e., less than 8MB, and since we enforce the size to be a power of 2, the largest 
allowed value is 4MB.

In HIVE-23553 we decoupled the two configurations (LLAP max alloc and CacheWriter 
buffer size), with the former being 16MB and the latter 4MB; this ticket 
is to investigate whether these two configurations need to converge.

More details: 
https://github.com/apache/hive/pull/1823#pullrequestreview-575698916
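The size limit can be checked with a few lines of arithmetic. As I understand the ORC specification, each compressed chunk has a 3-byte header storing `length * 2 + isOriginal`, which leaves 23 bits for the length; that matches the 2^(3*8 - 1) bound above. `OrcBufferSizeSketch` is only an illustration.

```java
final class OrcBufferSizeSketch {
    // Upper bound: 2^(3*8 - 1) = 2^23 bytes (8MB); buffer sizes must stay below it.
    static final int LIMIT = 1 << (3 * 8 - 1);

    // Largest power of two strictly below LIMIT.
    static int maxPowerOfTwoBufferSize() {
        return Integer.highestOneBit(LIMIT - 1);
    }

    public static void main(String[] args) {
        System.out.println(LIMIT);                     // 8388608
        System.out.println(maxPowerOfTwoBufferSize()); // 4194304 (4MB)
        System.out.println(16 * 1024 * 1024 < LIMIT);  // false: a 16MB max alloc does not fit
    }
}
```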




[jira] [Created] (HIVE-24703) LLAP move RowGroup reading to StrippePlanner

2021-01-29 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-24703:
-

 Summary: LLAP move RowGroup reading to StrippePlanner
 Key: HIVE-24703
 URL: https://issues.apache.org/jira/browse/HIVE-24703
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


LlapDataReader is almost identical to ORC's DataReader; however, the main issue is 
that the readFileData method of the former takes a DiskRangeList instead of a 
BufferChunkList.

ORC 1.6 uses a separate StripePlanner that reads RowGroups, converting them to 
BufferChunks, while in LLAP we are doing our own custom planning with DiskRanges.

I am opening this ticket to explore moving LLAP to stripe planning.




[jira] [Created] (HIVE-24631) Move to ChronoLocalDate and day of epoch instead of java's sql Date

2021-01-13 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-24631:
-

 Summary: Move to ChronoLocalDate and day of epoch instead of 
java's sql Date
 Key: HIVE-24631
 URL: https://issues.apache.org/jira/browse/HIVE-24631
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


For Predicate evaluation see https://issues.apache.org/jira/browse/ORC-661




[jira] [Created] (HIVE-24539) OrcInputFormat schema generation should respect column delimiter

2020-12-15 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-24539:
-

 Summary: OrcInputFormat schema generation should respect column 
delimiter
 Key: HIVE-24539
 URL: https://issues.apache.org/jira/browse/HIVE-24539
 Project: Hive
  Issue Type: Bug
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


OrcInputFormat currently generates the schema using the given configuration and the 
default delimiter, which causes inconsistencies when column names contain commas.

We should follow a similar approach to 
[OrcOutputFormat|https://github.com/apache/hive/blob/9563dd63188280f4b7c307f36e1ea0c69aec/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java#L145].
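A minimal sketch of splitting column names with a configurable delimiter instead of always using a comma. The property keys "columns" and "column.name.delimiter" are my assumption of the relevant serde constants and should be checked against serdeConstants. `ColumnNameSplitSketch` is hypothetical, and `Pattern.quote` keeps delimiters such as "|" from being treated as regex syntax.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;

final class ColumnNameSplitSketch {
    // Assumed property keys; verify against serdeConstants before relying on them.
    static final String LIST_COLUMNS = "columns";
    static final String COLUMN_NAME_DELIMITER = "column.name.delimiter";

    static List<String> columnNames(Properties props) {
        String names = props.getProperty(LIST_COLUMNS, "");
        if (names.isEmpty()) {
            return List.of();
        }
        // Fall back to ',' only when no custom delimiter is configured.
        String delimiter = props.getProperty(COLUMN_NAME_DELIMITER, ",");
        return Arrays.asList(names.split(Pattern.quote(delimiter)));
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty(LIST_COLUMNS, "a,b;c");       // first column name contains a comma
        p.setProperty(COLUMN_NAME_DELIMITER, ";");
        System.out.println(columnNames(p));          // [a,b, c]
    }
}
```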




[jira] [Created] (HIVE-24478) Inner GroupBy with Distinct SemanticException: Invalid column reference

2020-12-03 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-24478:
-

 Summary: Inner GroupBy with Distinct SemanticException: Invalid 
column reference
 Key: HIVE-24478
 URL: https://issues.apache.org/jira/browse/HIVE-24478
 Project: Hive
  Issue Type: Bug
Reporter: Panagiotis Garefalakis


{code:java}
CREATE TABLE tmp_src1(
  `npp` string,
  `nsoc` string) stored as orc;

INSERT INTO tmp_src1 (npp,nsoc) VALUES ('1-1000CG61', '7273111');

SELECT `min_nsoc`
FROM
 (SELECT `npp`,
 MIN(`nsoc`) AS `min_nsoc`,
 COUNT(DISTINCT `nsoc`) AS `nb_nsoc`
  FROM tmp_src1
  GROUP BY `npp`) `a`
WHERE `nb_nsoc` > 0;
{code}
Issue:
{code:java}
org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid column reference 'nsoc'
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanGroupByOperator1(SemanticAnalyzer.java:5405)
{code}
Query runs fine when we include `nb_nsoc` in the Select expression




[jira] [Created] (HIVE-24430) DiskRangeInfo should make use of DiskRangeList

2020-11-25 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-24430:
-

 Summary: DiskRangeInfo should make use of DiskRangeList
 Key: HIVE-24430
 URL: https://issues.apache.org/jira/browse/HIVE-24430
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


DiskRangeInfo should make use of DiskRangeList instead of List; 
this will help us transition to ORC 1.6.




[jira] [Created] (HIVE-24391) Fix FIX TestOrcFile failures in branch-3.1

2020-11-16 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-24391:
-

 Summary: Fix FIX TestOrcFile failures in branch-3.1 
 Key: HIVE-24391
 URL: https://issues.apache.org/jira/browse/HIVE-24391
 Project: Hive
  Issue Type: Improvement
Affects Versions: 3.1.3
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


The recent ORC upgrade to 1.5.8 in branch-3.1 (HIVE-24316) introduced some 
failures; this ticket tackles them.




[jira] [Created] (HIVE-24309) Simplify ConvertJoinMapJoin logic

2020-10-23 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-24309:
-

 Summary: Simplify ConvertJoinMapJoin logic 
 Key: HIVE-24309
 URL: https://issues.apache.org/jira/browse/HIVE-24309
 Project: Hive
  Issue Type: Improvement
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


ConvertJoinMapJoin logic can be further simplified:

[https://github.com/pgaref/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java#L92]




[jira] [Created] (HIVE-24308) FIX conditions used for DPHJ conversion

2020-10-23 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-24308:
-

 Summary: FIX conditions used for DPHJ conversion  
 Key: HIVE-24308
 URL: https://issues.apache.org/jira/browse/HIVE-24308
 Project: Hive
  Issue Type: Bug
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


Found a weird scenario when looking at the ConvertJoinMapJoin logic: 
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java#L1198]
When the distinct keys cannot fit in memory AND the DPHJ ShuffleSize is lower 
than expected, the code returns a MJ because of the condition above!

In general, I believe the ShuffleSize check 
[https://github.com/apache/hive/blob/052c9da958f5cf3998091a7eb4b24192a5bb61e9/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java#L1624]
should be part of the shuffleJoin DPHJ conversion.

The preferred conversion order would then be: MJ > DPHJ > SMB.
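A hypothetical sketch of the proposed ordering, in which each check only gates its own conversion: the ShuffleSize check affects DPHJ only, so it can never cause a map join to be returned. The three boolean inputs and the empty result meaning "keep the shuffle join" are assumptions, not ConvertJoinMapJoin's actual conditions.

```java
import java.util.EnumSet;
import java.util.Optional;

final class JoinConversionSketch {
    // Declared in preference order: MJ > DPHJ > SMB.
    enum JoinKind { MAPJOIN, DPHJ, SMB }

    static Optional<JoinKind> choose(boolean hashTableFitsInMemory,
                                     boolean dphjShuffleSizeOk,
                                     boolean smbApplicable) {
        EnumSet<JoinKind> feasible = EnumSet.noneOf(JoinKind.class);
        if (hashTableFitsInMemory) {
            feasible.add(JoinKind.MAPJOIN);
        }
        // The ShuffleSize check only gates the DPHJ conversion.
        if (dphjShuffleSizeOk) {
            feasible.add(JoinKind.DPHJ);
        }
        if (smbApplicable) {
            feasible.add(JoinKind.SMB);
        }
        for (JoinKind kind : JoinKind.values()) {
            if (feasible.contains(kind)) {
                return Optional.of(kind);
            }
        }
        return Optional.empty(); // assumption: keep the plain shuffle join
    }

    public static void main(String[] args) {
        // Keys do not fit in memory: never fall back to a MJ.
        System.out.println(choose(false, false, false)); // Optional.empty
        System.out.println(choose(false, true, true));   // Optional[DPHJ]
    }
}
```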




[jira] [Created] (HIVE-24225) FIX S3A recordReader policy selection

2020-10-02 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-24225:
-

 Summary: FIX S3A recordReader policy selection
 Key: HIVE-24225
 URL: https://issues.apache.org/jira/browse/HIVE-24225
 Project: Hive
  Issue Type: Bug
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


Dynamic S3A recordReader policy selection can cause issues with lazily initialized 
FS objects.




[jira] [Created] (HIVE-24224) Fix skipping header/footer for Hive on Tez on compressed files

2020-10-02 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-24224:
-

 Summary: Fix skipping header/footer for Hive on Tez on compressed 
files
 Key: HIVE-24224
 URL: https://issues.apache.org/jira/browse/HIVE-24224
 Project: Hive
  Issue Type: Bug
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


For compressed files, Hive on Tez returns the header and footer rows for both select 
* and select count ( * ):
{noformat}
printf "offset,id,other\n9,\"20200315 X00 1356\",123\n17,\"20200315 X00 1357\",123\nrst,rst,rst" > data.csv
hdfs dfs -put -f data.csv /apps/hive/warehouse/bz2test/bz2tbl1/
bzip2 -f data.csv 
hdfs dfs -put -f data.csv.bz2 /apps/hive/warehouse/bz2test/bz2tbl2/

beeline -e "CREATE EXTERNAL TABLE default.bz2tst2 (
  sequence   int,
  id string,
  other  string) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
LOCATION '/apps/hive/warehouse/bz2test/bz2tbl2' 
TBLPROPERTIES (
  'skip.header.line.count'='1',
  'skip.footer.line.count'='1');"

beeline -e "
  SET hive.fetch.task.conversion = none;
  SELECT * FROM default.bz2tst2;"
+---+++
| bz2tst2.sequence  | bz2tst2.id | bz2tst2.other  |
+---+++
| offset| id | other  |
| 9 | 20200315 X00 1356  | 123|
| 17| 20200315 X00 1357  | 123|
| rst   | rst| rst|
+---+++
{noformat}





PS: HIVE-22769 addressed the issue for Hive on LLAP.
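For reference, this is what honoring 'skip.header.line.count' and 'skip.footer.line.count' amounts to on a line stream, independent of compression. It is a minimal, hypothetical sketch, not Hive's record-reader code: the header is skipped by counting, and the footer is handled by holding back the last N lines in a small buffer, so the input is read only once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class HeaderFooterSkipSketch {
    static List<String> skip(Iterable<String> lines, int headerCount, int footerCount) {
        List<String> out = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        int seen = 0;
        for (String line : lines) {
            if (seen++ < headerCount) {
                continue; // header line
            }
            pending.addLast(line);
            // Hold back footerCount lines; anything older is safe to emit.
            if (pending.size() > footerCount) {
                out.add(pending.removeFirst());
            }
        }
        // Whatever is still pending at end of input is the footer and is dropped.
        return out;
    }

    public static void main(String[] args) {
        List<String> data = List.of("offset,id,other",
                "9,\"20200315 X00 1356\",123",
                "17,\"20200315 X00 1357\",123",
                "rst,rst,rst");
        skip(data, 1, 1).forEach(System.out::println); // prints only the 9,... and 17,... rows
    }
}
```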





[jira] [Created] (HIVE-24192) Properly log TaskExecutorService eviction details

2020-09-23 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-24192:
-

 Summary: Properly log TaskExecutorService eviction details
 Key: HIVE-24192
 URL: https://issues.apache.org/jira/browse/HIVE-24192
 Project: Hive
  Issue Type: Improvement
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


HIVE-23122 introduced task eviction logging, but the log condition is 
problematic. This ticket fixes the issue below (when DEBUG is ON, INFO is 
also ON, so the debug info never shows up) and also adds a separate logger conf 
for TaskExecutorService.


{code:java}
if (LOG.isInfoEnabled())  {
...
}
else if (LOG.isDebugEnabled()) {
...
}
{code}
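The fix is to test the more verbose level first. The hypothetical `EvictionLogSketch` below models the two level checks as plain booleans rather than a real logger, and returns which branch would run under each ordering.

```java
final class EvictionLogSketch {
    // Buggy ordering from the ticket: whenever DEBUG is on, INFO is on too,
    // so the DEBUG branch is unreachable.
    static String buggy(boolean debugEnabled, boolean infoEnabled) {
        if (infoEnabled) {
            return "info";
        } else if (debugEnabled) {
            return "debug";
        }
        return "none";
    }

    // Fixed ordering: check the more verbose level first.
    static String fixed(boolean debugEnabled, boolean infoEnabled) {
        if (debugEnabled) {
            return "debug";
        } else if (infoEnabled) {
            return "info";
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(buggy(true, true)); // info
        System.out.println(fixed(true, true)); // debug
    }
}
```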




[jira] [Created] (HIVE-24185) Upgrade snappy-java to 1.1.7.5

2020-09-21 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-24185:
-

 Summary: Upgrade snappy-java to 1.1.7.5
 Key: HIVE-24185
 URL: https://issues.apache.org/jira/browse/HIVE-24185
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


Bump the version to take advantage of performance improvements, glibc compatibility fixes, etc.

https://github.com/xerial/snappy-java/blob/master/Milestone.md#snappy-java-117-2017-11-30


[jira] [Created] (HIVE-23914) Support Char, VarChar, Small/tiny Int for Struct IN clause

2020-07-23 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23914:
-

 Summary: Support Char, VarChar, Small/tiny Int for Struct IN clause
 Key: HIVE-23914
 URL: https://issues.apache.org/jira/browse/HIVE-23914
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


[jira] [Created] (HIVE-23913) Support Date, Decimal and Timestamp for Struct IN clause

2020-07-23 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23913:
-

 Summary: Support Date, Decimal and Timestamp for Struct IN clause
 Key: HIVE-23913
 URL: https://issues.apache.org/jira/browse/HIVE-23913
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


[jira] [Created] (HIVE-23912) Extend vectorization support for Struct IN() clause

2020-07-23 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23912:
-

 Summary: Extend vectorization support for Struct IN() clause
 Key: HIVE-23912
 URL: https://issues.apache.org/jira/browse/HIVE-23912
 Project: Hive
  Issue Type: Improvement
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


Currently, Struct IN() vectorization does not support all writable types.
As a result, operators using such conditions fail to vectorize: for example, we 
support the String type but not Char or Varchar.


[jira] [Created] (HIVE-23882) Compiler should skip MJ keyExpr for probe optimization

2020-07-20 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23882:
-

 Summary: Compiler should skip MJ keyExpr for probe optimization
 Key: HIVE-23882
 URL: https://issues.apache.org/jira/browse/HIVE-23882
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


With probe decode, we currently cannot support key expressions (on the big-table 
side), as the ORC column vectors (CVs) directly probe the small-table hash table 
(HT); there is no expression evaluation at that level.

TezCompiler should take this into account when picking MapJoins (MJs) to push probe details to.
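Below is a minimal sketch of the check described above. It assumes a simplified, hypothetical model of MapJoin key expressions; KeyExpr is not a Hive class. Probe details are only pushed when every big-table key is a plain column reference.

{code:java}
import java.util.List;

final class ProbeKeyCheckSketch {

  // Hypothetical stand-in for a MapJoin big-table key expression.
  static final class KeyExpr {
    final String text;
    final boolean isColumnRef; // true for a bare column, false for e.g. upper(col)

    KeyExpr(String text, boolean isColumnRef) {
      this.text = text;
      this.isColumnRef = isColumnRef;
    }
  }

  // The reader probes the small-table hash table with raw column values,
  // so any computed key expression disqualifies the MapJoin.
  static boolean canPushProbe(List<KeyExpr> bigTableKeys) {
    if (bigTableKeys.isEmpty()) {
      return false;
    }
    for (KeyExpr key : bigTableKeys) {
      if (!key.isColumnRef) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(canPushProbe(List.of(new KeyExpr("key", true))));         // true
    System.out.println(canPushProbe(List.of(new KeyExpr("upper(key)", false)))); // false
  }
}
{code}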


[jira] [Created] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties

2020-07-17 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23871:
-

 Summary: ObjectStore should properly handle MicroManaged Table properties
 Key: HIVE-23871
 URL: https://issues.apache.org/jira/browse/HIVE-23871
 Project: Hive
  Issue Type: Improvement
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


[jira] [Created] (HIVE-23854) Natively support Double and Decimal CVs in ReduceSinkOperator

2020-07-15 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23854:
-

 Summary: Natively support Double and Decimal CVs in ReduceSinkOperator
 Key: HIVE-23854
 URL: https://issues.apache.org/jira/browse/HIVE-23854
 Project: Hive
  Issue Type: Improvement
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


[jira] [Created] (HIVE-23852) Natively support Date and Timestamp types in ReduceSink operator

2020-07-15 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23852:
-

 Summary: Natively support Date and Timestamp types in ReduceSink operator
 Key: HIVE-23852
 URL: https://issues.apache.org/jira/browse/HIVE-23852
 Project: Hive
  Issue Type: Improvement
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


There is currently no native support, meaning that these types end up being 
serialized as multi-key columns, which is much slower (iterating through batch 
columns instead of writing a value directly).


[jira] [Created] (HIVE-23823) Fix violated naming conventions

2020-07-09 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23823:
-

 Summary: Fix violated naming conventions
 Key: HIVE-23823
 URL: https://issues.apache.org/jira/browse/HIVE-23823
 Project: Hive
  Issue Type: Bug
Reporter: Panagiotis Garefalakis


For example, the method naming convention violations in the PerfLogger class.
See: 
https://github.com/apache/hive/pull/1161/files/dbed3ff5e69d81cedae9c1254a90326d26a19d63#diff-bdbfbb8352f29fd90c559eef871f4853R137


[jira] [Created] (HIVE-23773) Support multi-key probe MapJoins

2020-06-29 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23773:
-

 Summary: Support multi-key probe MapJoins
 Key: HIVE-23773
 URL: https://issues.apache.org/jira/browse/HIVE-23773
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


[jira] [Created] (HIVE-23734) Untangle LlapRecordReader Includes construction

2020-06-20 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23734:
-

 Summary: Untangle LlapRecordReader Includes construction
 Key: HIVE-23734
 URL: https://issues.apache.org/jira/browse/HIVE-23734
 Project: Hive
  Issue Type: Improvement
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


[jira] [Created] (HIVE-23733) [LLAP] Extend InputFormat to genIncludedColNames

2020-06-20 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23733:
-

 Summary: [LLAP] Extend InputFormat to genIncludedColNames 
 Key: HIVE-23733
 URL: https://issues.apache.org/jira/browse/HIVE-23733
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


Extend LLAP ORCInputFormat to generate includedColNames, similar to what we 
currently do for generating includedColIds. This will enable us to do the 
ColNames->ReaderId mapping when we need to apply filters and only know the 
column names (e.g., HIVE-23730).
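Here is a sketch of the idea. For illustration only, it assumes that a column's reader id is the same index used in includedColIds. The method names are hypothetical, and the real LLAP mapping may differ (for example, for ACID or nested schemas).

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class IncludedColNamesSketch {

  // Derive the included column names from the schema names and includedColIds,
  // preserving the order of the ids.
  static List<String> genIncludedColNames(List<String> schemaColNames, List<Integer> includedColIds) {
    List<String> names = new ArrayList<>(includedColIds.size());
    for (int id : includedColIds) {
      names.add(schemaColNames.get(id));
    }
    return names;
  }

  // Name -> id lookup so a filter that only knows column names can locate its column.
  static Map<String, Integer> colNameToReaderId(List<String> schemaColNames, List<Integer> includedColIds) {
    Map<String, Integer> mapping = new LinkedHashMap<>();
    for (int id : includedColIds) {
      mapping.put(schemaColNames.get(id), id);
    }
    return mapping;
  }

  public static void main(String[] args) {
    List<String> schema = List.of("key", "value", "ts", "region");
    List<Integer> included = List.of(1, 3);
    System.out.println(genIncludedColNames(schema, included)); // [value, region]
    System.out.println(colNameToReaderId(schema, included));   // {value=1, region=3}
  }
}
{code}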


[jira] [Created] (HIVE-23730) Compiler support tracking TS keyColName for Probe MapJoin

2020-06-19 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23730:
-

 Summary: Compiler support tracking TS keyColName for Probe MapJoin
 Key: HIVE-23730
 URL: https://issues.apache.org/jira/browse/HIVE-23730
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


TezCompiler needs to track the original TS key column name used for MJ 
probedecode.
Even though we know the MJ keyCol at compile time, it could be generated by 
previous (parent) operators, so we don't always know the original TS column it 
maps to.

To find the original column mapping, we need to track the MJ keyCol through the 
operator pipeline. The tracking is done through the operators' maps from output 
column name to input expression.
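The tracking step can be sketched as follows. Each operator is modelled, hypothetically, as a plain map from output column name to the input column its expression reads. The real operators carry expression descriptors rather than strings, and the pass-through rule for columns missing from a map is an assumption made for this sketch.

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class KeyColTracerSketch {

  // Maps are ordered from the MapJoin's parent back to the TableScan.
  // A null value means the column is computed by a non-column expression.
  static String traceToTableScan(String mjKeyCol, List<Map<String, String>> colExprMaps) {
    String current = mjKeyCol;
    for (Map<String, String> outputToInput : colExprMaps) {
      if (!outputToInput.containsKey(current)) {
        continue; // assumed pass-through (e.g. a Filter keeps column names)
      }
      current = outputToInput.get(current);
      if (current == null) {
        return null; // computed key: no single original TS column
      }
    }
    return current;
  }

  public static void main(String[] args) {
    Map<String, String> select = new HashMap<>();
    select.put("_col0", "_col2");
    Map<String, String> filter = new HashMap<>(); // no renames
    Map<String, String> scanSelect = new HashMap<>();
    scanSelect.put("_col2", "customer_id");
    System.out.println(traceToTableScan("_col0", List.of(select, filter, scanSelect))); // customer_id
  }
}
{code}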


[jira] [Created] (HIVE-23698) Compiler support for row-level filtering on filterPredicates

2020-06-15 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23698:
-

 Summary: Compiler support for row-level filtering on filterPredicates
 Key: HIVE-23698
 URL: https://issues.apache.org/jira/browse/HIVE-23698
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


[jira] [Created] (HIVE-23658) Fix FindBug issues in hive-kudu-handler

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23658:
-

 Summary: Fix FindBug issues in hive-kudu-handler
 Key: HIVE-23658
 URL: https://issues.apache.org/jira/browse/HIVE-23658
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23656) Fix FindBug issues in llap-server

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23656:
-

 Summary: Fix FindBug issues in llap-server
 Key: HIVE-23656
 URL: https://issues.apache.org/jira/browse/HIVE-23656
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23657) Fix FindBug issues in hive-shims

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23657:
-

 Summary: Fix FindBug issues in hive-shims
 Key: HIVE-23657
 URL: https://issues.apache.org/jira/browse/HIVE-23657
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23655) Fix FindBug issues in llap-tez

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23655:
-

 Summary: Fix FindBug issues in llap-tez
 Key: HIVE-23655
 URL: https://issues.apache.org/jira/browse/HIVE-23655
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23654) Fix FindBug issues in llap-ext-client

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23654:
-

 Summary: Fix FindBug issues in llap-ext-client
 Key: HIVE-23654
 URL: https://issues.apache.org/jira/browse/HIVE-23654
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23653) Fix FindBug issues in llap-client

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23653:
-

 Summary: Fix FindBug issues in llap-client
 Key: HIVE-23653
 URL: https://issues.apache.org/jira/browse/HIVE-23653
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23652) Fix FindBug issues in llap-common

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23652:
-

 Summary: Fix FindBug issues in llap-common
 Key: HIVE-23652
 URL: https://issues.apache.org/jira/browse/HIVE-23652
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23651) Fix FindBug issues in hive-service

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23651:
-

 Summary: Fix FindBug issues in hive-service
 Key: HIVE-23651
 URL: https://issues.apache.org/jira/browse/HIVE-23651
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23650) Fix FindBug issues in hive-streaming

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23650:
-

 Summary: Fix FindBug issues in hive-streaming
 Key: HIVE-23650
 URL: https://issues.apache.org/jira/browse/HIVE-23650
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23649) Fix FindBug issues in hive-service-rpc

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23649:
-

 Summary: Fix FindBug issues in hive-service-rpc
 Key: HIVE-23649
 URL: https://issues.apache.org/jira/browse/HIVE-23649
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23648) Fix FindBug issues in hive-serde

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23648:
-

 Summary: Fix FindBug issues in hive-serde
 Key: HIVE-23648
 URL: https://issues.apache.org/jira/browse/HIVE-23648
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23646) Fix FindBug issues in hive-ql

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23646:
-

 Summary: Fix FindBug issues in hive-ql
 Key: HIVE-23646
 URL: https://issues.apache.org/jira/browse/HIVE-23646
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23647) Fix FindBug issues in hive-parser

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23647:
-

 Summary: Fix FindBug issues in hive-parser
 Key: HIVE-23647
 URL: https://issues.apache.org/jira/browse/HIVE-23647
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23645) Fix FindBug issues in hive-metastore

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23645:
-

 Summary: Fix FindBug issues in hive-metastore
 Key: HIVE-23645
 URL: https://issues.apache.org/jira/browse/HIVE-23645
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23643) Fix FindBug issues in hive-hplsql

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23643:
-

 Summary: Fix FindBug issues in hive-hplsql
 Key: HIVE-23643
 URL: https://issues.apache.org/jira/browse/HIVE-23643
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23644) Fix FindBug issues in hive-jdbc

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23644:
-

 Summary: Fix FindBug issues in hive-jdbc
 Key: HIVE-23644
 URL: https://issues.apache.org/jira/browse/HIVE-23644
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23642) Fix FindBug issues in hive-jdbc-handler

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23642:
-

 Summary: Fix FindBug issues in hive-jdbc-handler
 Key: HIVE-23642
 URL: https://issues.apache.org/jira/browse/HIVE-23642
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23641) Fix FindBug issues in hive-hbase-handler

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23641:
-

 Summary: Fix FindBug issues in hive-hbase-handler
 Key: HIVE-23641
 URL: https://issues.apache.org/jira/browse/HIVE-23641
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23640) Fix FindBug issues in hive-druid-handler

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23640:
-

 Summary: Fix FindBug issues in hive-druid-handler
 Key: HIVE-23640
 URL: https://issues.apache.org/jira/browse/HIVE-23640
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23639) Fix FindBug issues in hive-contrib

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23639:
-

 Summary: Fix FindBug issues in hive-contrib
 Key: HIVE-23639
 URL: https://issues.apache.org/jira/browse/HIVE-23639
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23638) Fix FindBug issues in hive-common

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23638:
-

 Summary: Fix FindBug issues in hive-common
 Key: HIVE-23638
 URL: https://issues.apache.org/jira/browse/HIVE-23638
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23637) Fix FindBug issues in cli

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23637:
-

 Summary: Fix FindBug issues in cli
 Key: HIVE-23637
 URL: https://issues.apache.org/jira/browse/HIVE-23637
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23636) Fix FindBug issues in beeline

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23636:
-

 Summary: Fix FindBug issues in beeline
 Key: HIVE-23636
 URL: https://issues.apache.org/jira/browse/HIVE-23636
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23635) Fix FindBug issues in vector-code-gen

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23635:
-

 Summary: Fix FindBug issues in vector-code-gen
 Key: HIVE-23635
 URL: https://issues.apache.org/jira/browse/HIVE-23635
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23634) Fix FindBug issues in storage-api accumulo-handler

2020-06-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23634:
-

 Summary: Fix FindBug issues in storage-api accumulo-handler
 Key: HIVE-23634
 URL: https://issues.apache.org/jira/browse/HIVE-23634
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23620) Explore moving to SpotBugs

2020-06-05 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23620:
-

 Summary: Explore moving to SpotBugs
 Key: HIVE-23620
 URL: https://issues.apache.org/jira/browse/HIVE-23620
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


We may want to switch to [SpotBugs|https://github.com/spotbugs/spotbugs], 
the spiritual successor of FindBugs, carrying on from the point where it left 
off with the support of its community.

SpotBugs is in reality a fork of FindBugs: 
https://mailman.cs.umd.edu/pipermail/findbugs-discuss/2016-November/004321.html


[jira] [Created] (HIVE-23617) Fix FindBug issues on storage-api

2020-06-05 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23617:
-

 Summary: Fix FindBug issues on storage-api
 Key: HIVE-23617
 URL: https://issues.apache.org/jira/browse/HIVE-23617
 Project: Hive
  Issue Type: Sub-task
  Components: storage-api
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


[jira] [Created] (HIVE-23561) FIX Arrow Decimal serialization for VectorRowBatches

2020-05-28 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23561:
-

 Summary: FIX Arrow Decimal serialization for VectorRowBatches
 Key: HIVE-23561
 URL: https://issues.apache.org/jira/browse/HIVE-23561
 Project: Hive
  Issue Type: Bug
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


[jira] [Created] (HIVE-23554) [LLAP] ReadPipeline support for ColumnVectorBatch with FilterContext

2020-05-27 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23554:
-

 Summary: [LLAP] ReadPipeline support for ColumnVectorBatch with FilterContext
 Key: HIVE-23554
 URL: https://issues.apache.org/jira/browse/HIVE-23554
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


Currently, the readPipeline in LLAP supports consuming ColumnVectorBatches.
As each batch can now be tied to a Filter (HIVE-23215), we should update the 
pipeline to consume BatchWrappers of a ColumnVectorBatch and a Filter instead.


[jira] [Created] (HIVE-23553) Bump ORC version

2020-05-27 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23553:
-

 Summary: Bump ORC version
 Key: HIVE-23553
 URL: https://issues.apache.org/jira/browse/HIVE-23553
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


Make sure we are using one of the more recent ORC versions that include 
row-filtering (ORC-577, ORC-622).


[jira] [Created] (HIVE-23537) RecordReader support for row-filtering

2020-05-22 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23537:
-

 Summary: RecordReader support for row-filtering
 Key: HIVE-23537
 URL: https://issues.apache.org/jira/browse/HIVE-23537
 Project: Hive
  Issue Type: Sub-task
  Components: llap, Reader
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


ORC-577 enables row-level filtering for the ORC format, while HIVE-23167 aims 
to extend the existing compiler logic and push filters further down the 
pipeline wherever possible.

In this JIRA, we extend the Hive record readers to utilize the above filtering 
functionality (similar to what we already do for PPD).


[jira] [Created] (HIVE-23486) Compiler support for row-level filtering on transactional deletes

2020-05-17 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23486:
-

 Summary: Compiler support for row-level filtering on transactional deletes
 Key: HIVE-23486
 URL: https://issues.apache.org/jira/browse/HIVE-23486
 Project: Hive
  Issue Type: Improvement
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


[jira] [Created] (HIVE-23475) Track MJ HashTable mem usage

2020-05-15 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23475:
-

 Summary: Track MJ HashTable mem usage
 Key: HIVE-23475
 URL: https://issues.apache.org/jira/browse/HIVE-23475
 Project: Hive
  Issue Type: Improvement
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


[jira] [Created] (HIVE-23393) LLapInputFormat reader policy for Random IO formats

2020-05-07 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23393:
-

 Summary: LLapInputFormat reader policy for Random IO formats
 Key: HIVE-23393
 URL: https://issues.apache.org/jira/browse/HIVE-23393
 Project: Hive
  Issue Type: Improvement
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


Extension of HIVE-23158 for LLAPInputFormat


[jira] [Created] (HIVE-23375) Track MJ HashTable Load time

2020-05-06 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23375:
-

 Summary: Track MJ HashTable Load time
 Key: HIVE-23375
 URL: https://issues.apache.org/jira/browse/HIVE-23375
 Project: Hive
  Issue Type: Improvement
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


Introduce a TezCounter to track MapJoin (MJ) hash table load time.


[jira] [Created] (HIVE-23352) More user friendly StorageAuthorizationProvider log messages

2020-05-01 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23352:
-

 Summary: More user friendly StorageAuthorizationProvider log messages
 Key: HIVE-23352
 URL: https://issues.apache.org/jira/browse/HIVE-23352
 Project: Hive
  Issue Type: Improvement
  Components: Security
Affects Versions: 4.0.0
Reporter: Panagiotis Garefalakis


Currently, *StorageBasedAuthorizationProvider* returns messages (like the one 
below) about data paths even for _External_ tables, where a drop command would 
just remove metadata. Let's make those messages more user-friendly.
{code:java}
Permission Denied: User hive can't delete hdfs://XXX.com:8020/tmp/testuser 
because sticky bit is set on the parent dir and user does not own this file or 
its parent)
{code}


[jira] [Created] (HIVE-23238) FIX PreemptionQueueComparator edge cases

2020-04-17 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23238:
-

 Summary: FIX PreemptionQueueComparator edge cases
 Key: HIVE-23238
 URL: https://issues.apache.org/jira/browse/HIVE-23238
 Project: Hive
  Issue Type: Improvement
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis
 Fix For: llap


Properly handle preemption comparator edge cases where tasks are of the same 
type and have the same number of upstream tasks.


[jira] [Created] (HIVE-23214) Remove skipCorrupt from OrcEncodedDataConsumer

2020-04-15 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23214:
-

 Summary: Remove skipCorrupt from OrcEncodedDataConsumer
 Key: HIVE-23214
 URL: https://issues.apache.org/jira/browse/HIVE-23214
 Project: Hive
  Issue Type: Improvement
  Components: llap
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


SkipCorrupt is always the default (false), so there is no reason to pass it 
around.

[https://github.com/apache/hive/blob/3e4f6122c32b1ffa22e1458806ae8ee30e51a41f/llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java#L86]

If we want to change the default behaviour, we could set "orc.skip.corrupt.data" 
as part of the configuration.
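Below is a minimal sketch of reading the flag from configuration instead of passing it around. java.util.Properties stands in for Hadoop's Configuration to keep the example self-contained, and the class and method names are hypothetical.

{code:java}
import java.util.Properties;

final class SkipCorruptConfSketch {
  // Configuration key mentioned in the description above.
  static final String SKIP_CORRUPT_KEY = "orc.skip.corrupt.data";

  // Resolve the flag from configuration once (default: false) instead of
  // threading a skipCorrupt argument through the consumer constructors.
  static boolean skipCorrupt(Properties conf) {
    return Boolean.parseBoolean(conf.getProperty(SKIP_CORRUPT_KEY, "false"));
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(skipCorrupt(conf)); // false
    conf.setProperty(SKIP_CORRUPT_KEY, "true");
    System.out.println(skipCorrupt(conf)); // true
  }
}
{code}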


[jira] [Created] (HIVE-23170) Probe support for ORC DataConsumer

2020-04-09 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23170:
-

 Summary: Probe support for ORC DataConsumer
 Key: HIVE-23170
 URL: https://issues.apache.org/jira/browse/HIVE-23170
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23169) Probe runtime support for LLAP

2020-04-09 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23169:
-

 Summary: Probe runtime support for LLAP
 Key: HIVE-23169
 URL: https://issues.apache.org/jira/browse/HIVE-23169
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23168) Implement MJ HashTable contains key functionality

2020-04-09 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23168:
-

 Summary: Implement MJ HashTable contains key functionality
 Key: HIVE-23168
 URL: https://issues.apache.org/jira/browse/HIVE-23168
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


[jira] [Created] (HIVE-23167) Extend compiler support for Probe static filters

2020-04-09 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23167:
-

 Summary: Extend compiler support for Probe static filters
 Key: HIVE-23167
 URL: https://issues.apache.org/jira/browse/HIVE-23167
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


[jira] [Created] (HIVE-23166) Protect VGB from flushing too often

2020-04-09 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23166:
-

 Summary: Protect VGB from flushing too often
 Key: HIVE-23166
 URL: https://issues.apache.org/jira/browse/HIVE-23166
 Project: Hive
  Issue Type: Improvement
  Components: llap
Affects Versions: 4.0.0
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


The existing flush logic in our VectorGroupByOperator is completely static.
It depends on: the number of HT entries (*hive.vectorized.groupby.maxentries*) 
and the maximum memory threshold (by default 90% of available memory).

Assuming that we are not memory constrained, the flushing frequency is 
currently dictated by the static number of entries (1M by default), which can 
also be misconfigured to a very low value.

I am proposing to also take current memory usage into account along with 
maxHtEntries, to avoid flushing too often, as frequent flushes can hurt operator 
throughput for particular workloads.
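The proposal could look roughly like the decision function below. This is a sketch under stated assumptions. The minMemoryForEntryFlush knob is hypothetical, and so is the rule that the entry limit only applies once memory use is non-trivial. Neither is existing VectorGroupByOperator behaviour.

{code:java}
final class GroupByFlushPolicySketch {

  /**
   * Hypothetical flush decision for a hash-based GROUP BY.
   *
   * @param numEntries             current number of hash table entries
   * @param maxHtEntries           entry limit (cf. hive.vectorized.groupby.maxentries)
   * @param usedMemoryFraction     fraction of available memory currently used (0..1)
   * @param maxMemoryFraction      hard memory threshold, e.g. 0.9
   * @param minMemoryForEntryFlush memory fraction below which the entry limit alone
   *                               does not trigger a flush (assumed tuning knob)
   */
  static boolean shouldFlush(long numEntries, long maxHtEntries, double usedMemoryFraction,
                             double maxMemoryFraction, double minMemoryForEntryFlush) {
    if (usedMemoryFraction >= maxMemoryFraction) {
      return true; // memory pressure always wins
    }
    // The entry limit only counts once memory usage is non-trivial, so a
    // misconfigured (very low) maxentries does not cause constant flushing.
    return numEntries >= maxHtEntries && usedMemoryFraction >= minMemoryForEntryFlush;
  }

  public static void main(String[] args) {
    System.out.println(shouldFlush(2_000_000, 1_000_000, 0.10, 0.9, 0.5)); // false: plenty of memory left
    System.out.println(shouldFlush(2_000_000, 1_000_000, 0.60, 0.9, 0.5)); // true
    System.out.println(shouldFlush(10, 1_000_000, 0.95, 0.9, 0.5));        // true: over the hard threshold
  }
}
{code}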


[jira] [Created] (HIVE-23158) Optimize S3A recordReader policy for Random IO formats

2020-04-08 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23158:
-

 Summary: Optimize S3A recordReader policy for Random IO formats
 Key: HIVE-23158
 URL: https://issues.apache.org/jira/browse/HIVE-23158
 Project: Hive
  Issue Type: Bug
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


The S3A filesystem client (inherited from Hadoop) supports the notion of input 
policies.
These policies tune the behaviour of the HTTP requests used for reading 
different file types such as TEXT or ORC.

Formats such as ORC and Parquet perform a lot of seek operations, so there is 
an optimized RANDOM mode that reads files only partially instead of fully (the 
default).

I suggest adding some extra logic to HiveInputFormat to make sure 
we optimize for random IO when data is stored on S3A using formats such as ORC 
or Parquet.
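A minimal sketch of such logic follows. The property name is Hadoop's S3A input policy (fadvise) option. A plain Map stands in for the Hadoop Configuration, and the helper names and the class-name-based format detection are hypothetical. The sketch uses putIfAbsent so that an explicit user setting is never overridden.

{code:java}
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class S3AInputPolicySketch {
  // Hadoop S3A input policy key; values include "normal" (default), "sequential" and "random".
  static final String FADVISE_KEY = "fs.s3a.experimental.input.fadvise";

  // Crude, hypothetical format detection based on the InputFormat class name.
  static boolean isRandomIoFormat(String inputFormatClassName) {
    String name = inputFormatClassName.toLowerCase(Locale.ROOT);
    return name.contains("orc") || name.contains("parquet");
  }

  // Opt into random IO only for s3a:// paths read by columnar formats, and
  // never override a policy the user set explicitly.
  static void maybeSetRandomPolicy(Map<String, String> conf, String path, String inputFormatClassName) {
    if (path.startsWith("s3a://") && isRandomIoFormat(inputFormatClassName)) {
      conf.putIfAbsent(FADVISE_KEY, "random");
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    maybeSetRandomPolicy(conf, "s3a://bucket/warehouse/t/000000_0",
        "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat");
    System.out.println(conf); // {fs.s3a.experimental.input.fadvise=random}
  }
}
{code}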


[jira] [Created] (HIVE-23036) Incorrect ORC PPD eval with sub-millisecond timestamps

2020-03-17 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23036:
-

 Summary: Incorrect ORC PPD eval with sub-millisecond timestamps
 Key: HIVE-23036
 URL: https://issues.apache.org/jira/browse/HIVE-23036
 Project: Hive
  Issue Type: Bug
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


See [ORC-611|https://issues.apache.org/jira/browse/ORC-611] for more details

ORC stores timestamps with:
 - nanosecond precision for the data itself
 - millisecond precision for the min/max statistics

As both min and max are rounded to the same millisecond value, timestamps with 
sub-millisecond (ns) precision will not pass the PPD evaluator.
{code:sql}
create table tsstat (ts timestamp) stored as orc;
insert into tsstat values ("1970-01-01 00:00:00.0005");
select * from tsstat where ts = "1970-01-01 00:00:00.0005";
-- returned 0 rows
{code}

ORC PPD evaluation currently happens as part of OrcInputFormat 
[https://github.com/apache/hive/blob/7e39a2c13711f9377c9ce1edb4224880421b1ea5/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L2314]
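To illustrate the failure mode, here is a small sketch using plain long arithmetic. It assumes the statistics truncate to the millisecond and treats the literal as UTC epoch nanoseconds. It is not ORC's evaluator code. The conservative variant widens the max bound by up to one millisecond; this is just one possible workaround, and ORC-611 may resolve the issue differently.

{code:java}
final class TimestampPpdSketch {
  static final long NANOS_PER_MILLI = 1_000_000L;

  // Assumption: statistics keep only whole milliseconds (truncated toward -inf).
  static long toStatMillis(long epochNanos) {
    return Math.floorDiv(epochNanos, NANOS_PER_MILLI);
  }

  // Naive range check: compares a nanosecond literal against millisecond bounds.
  static boolean naiveMightMatch(long literalNanos, long minMillis, long maxMillis) {
    return literalNanos >= minMillis * NANOS_PER_MILLI
        && literalNanos <= maxMillis * NANOS_PER_MILLI;
  }

  // Conservative check: a truncated max may hide up to 999,999 extra nanoseconds.
  static boolean conservativeMightMatch(long literalNanos, long minMillis, long maxMillis) {
    return literalNanos >= minMillis * NANOS_PER_MILLI
        && literalNanos <= maxMillis * NANOS_PER_MILLI + (NANOS_PER_MILLI - 1);
  }

  public static void main(String[] args) {
    long value = 500_000L;                 // 1970-01-01 00:00:00.0005 as UTC epoch nanos
    long statMillis = toStatMillis(value); // 0: min == max == 0 ms
    System.out.println(naiveMightMatch(value, statMillis, statMillis));        // false: data wrongly skipped
    System.out.println(conservativeMightMatch(value, statMillis, statMillis)); // true
  }
}
{code}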


[jira] [Created] (HIVE-23006) Compiler support for Probe MapJoin

2020-03-10 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-23006:
-

 Summary: Compiler support for Probe MapJoin
 Key: HIVE-23006
 URL: https://issues.apache.org/jira/browse/HIVE-23006
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis







[jira] [Created] (HIVE-22959) Extend storage-api to expose FilterContext

2020-03-02 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-22959:
-

 Summary: Extend storage-api to expose FilterContext
 Key: HIVE-22959
 URL: https://issues.apache.org/jira/browse/HIVE-22959
 Project: Hive
  Issue Type: Sub-task
  Components: storage-api
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


To enable row-level filtering at the ORC level (ORC-577), or, as an extension,
ProbeDecode MapJoin (HIVE-22731), we need a common context class that holds all
the information the filter needs.

I propose that this class be part of the storage-api, similar to the
VectorizedRowBatch class, and hold the following information:
 * A boolean variable showing whether the filter is enabled
 * An int array storing the row IDs that are actually selected (passing the filter)
 * An int variable storing the number of rows that passed the filter
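The three fields above map naturally onto a small class. The sketch below is a minimal, hypothetical illustration (the name SimpleFilterContext and the applyFilter helper are not the storage-api's actual API): a reader fills the selected row IDs for one batch, and downstream code consults them only when the filter is in use.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

/** Hypothetical sketch of the proposed filter context; names are illustrative. */
final class SimpleFilterContext {
    private boolean selectedInUse;  // is the filter enabled for the current batch?
    private final int[] selected;   // row IDs (within the batch) that passed the filter
    private int selectedSize;       // number of valid entries at the start of `selected`

    SimpleFilterContext(int batchCapacity) {
        this.selected = new int[batchCapacity];
    }

    /** Clears state before the next batch is read. */
    void reset() {
        selectedInUse = false;
        selectedSize = 0;
    }

    /** Records the rows of a batch that pass the predicate and enables the filter. */
    void applyFilter(int batchSize, IntPredicate rowPasses) {
        if (batchSize > selected.length) {
            throw new IllegalArgumentException("batch larger than capacity");
        }
        int n = 0;
        for (int row = 0; row < batchSize; row++) {
            if (rowPasses.test(row)) {
                selected[n++] = row;
            }
        }
        selectedSize = n;
        selectedInUse = true;
    }

    boolean isSelectedInUse() { return selectedInUse; }

    /** Returns a copy of the valid prefix; a production version would likely expose the array directly. */
    int[] getSelected() { return Arrays.copyOf(selected, selectedSize); }

    int getSelectedSize() { return selectedSize; }
}
```

Keeping the array preallocated at batch capacity mirrors how VectorizedRowBatch reuses its buffers, so no allocation happens per batch apart from the defensive copy in this sketch.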
 




[jira] [Created] (HIVE-22958) Extend storage-api to expose FilterContext

2020-03-02 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-22958:
-

 Summary: Extend storage-api to expose FilterContext
 Key: HIVE-22958
 URL: https://issues.apache.org/jira/browse/HIVE-22958
 Project: Hive
  Issue Type: New Feature
  Components: storage-api
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


To enable row-level filtering at the ORC level (ORC-577), or, as an extension,
ProbeDecode MapJoin (HIVE-22731), we need a common context class that holds all
the information the filter needs.

I propose that this class be part of the storage-api, similar to the
VectorizedRowBatch class, and hold the following information:
 * A boolean variable showing whether the filter is enabled
 * An int array storing the row IDs that are actually selected (passing the filter)
 * An int variable storing the number of rows that passed the filter




[jira] [Created] (HIVE-22731) Use MapJoin hashtables for row level filtering

2020-01-15 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-22731:
-

 Summary: Use MapJoin hashtables for row level filtering
 Key: HIVE-22731
 URL: https://issues.apache.org/jira/browse/HIVE-22731
 Project: Hive
  Issue Type: Bug
  Components: Hive, llap
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis
 Attachments: decode_time_bars.pdf

Currently, RecordReaders such as ORC support filtering only at coarse-grained
levels, namely File, Stripe (64 to 256MB), and Row group (10k rows). They only
skip a set of rows if they can guarantee that none of its rows can pass the
filter (usually given as a search argument).

However, a significant amount of time can be spent decoding rows with multiple
columns that are not even used in the final result. See the attached figure,
where "original" is what happens today and "LazyDecode" skips decoding rows that
do not match the key.

To enable more fine-grained filtering in the particular case of a MapJoin, we
could use the key HashTable built from the smaller table to skip deserializing
the columns of rows in the larger table that do not match any key, thus saving
CPU time.
This Jira investigates this direction.
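The idea can be sketched as "probe first, decode later". In the minimal example below, which is an assumption-laden model rather than Hive's implementation, encoded columns are String arrays, decoding is Long.parseLong, and a Set of Long stands in for the MapJoin hash table; the class and method names are hypothetical. Only the key column is decoded for every row, and the remaining columns are decoded only for rows whose key matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of probe-based lazy decoding; the decode counter shows the work saved. */
final class LazyDecodeSketch {
    int decodeCalls = 0;

    /** Stand-in for an expensive column decode. */
    private long decode(String encoded) {
        decodeCalls++;
        return Long.parseLong(encoded);
    }

    /**
     * keyColumn: encoded join keys of the big table; otherColumns[c][row]: other encoded columns.
     * smallTableKeys: keys of the hash table built from the small table.
     * Returns fully decoded rows (key first) whose key matches.
     */
    List<long[]> probeThenDecode(String[] keyColumn, String[][] otherColumns, Set<Long> smallTableKeys) {
        List<long[]> matches = new ArrayList<>();
        for (int row = 0; row < keyColumn.length; row++) {
            long key = decode(keyColumn[row]);     // the key column is always decoded
            if (!smallTableKeys.contains(key)) {
                continue;                          // no match: skip decoding the other columns
            }
            long[] decoded = new long[1 + otherColumns.length];
            decoded[0] = key;
            for (int c = 0; c < otherColumns.length; c++) {
                decoded[c + 1] = decode(otherColumns[c][row]);
            }
            matches.add(decoded);
        }
        return matches;
    }
}
```

With four rows, two extra columns, and two matching keys, this performs 8 decodes instead of the 12 an eager reader would do; the saving grows with the number of non-key columns and the selectivity of the join.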




[jira] [Created] (HIVE-22505) ClassCastException caused by wrong Vectorized operator selection

2019-11-15 Thread Panagiotis Garefalakis (Jira)

Panagiotis Garefalakis created HIVE-22505:
-

 Summary: ClassCastException caused by wrong Vectorized operator 
selection
 Key: HIVE-22505
 URL: https://issues.apache.org/jira/browse/HIVE-22505
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis
 Attachments: query_error.out, query_vector_explain.out

VectorMapJoinOuterFilteredOperator does not currently support full outer joins,
but the current Vectorizer logic can still select it when a filter is involved.
Queries can then fail with a ClassCastException because their data and metadata
in VectorMapJoinOuterFilteredOperator do not match.

The attached query demonstrates the issue, and the attached log shows the
java.lang.ClassCastException.
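The missing guard can be expressed as a simple predicate. The sketch below is hypothetical: JoinKind and canUseOuterFiltered are illustrative names that do not mirror Hive's actual Vectorizer code, and it assumes the filtered outer operator is only valid for one-sided outer joins that carry a filter.

```java
/** Hypothetical selection guard; names do not mirror Hive's Vectorizer. */
final class OuterFilteredSelectionSketch {
    enum JoinKind { INNER, OUTER, FULL_OUTER }

    /** Select the filtered outer operator only for one-sided outer joins with a filter. */
    static boolean canUseOuterFiltered(JoinKind kind, boolean hasFilter) {
        // FULL_OUTER must fall through to an operator that supports it,
        // otherwise the operator's data and metadata disagree at runtime.
        return hasFilter && kind == JoinKind.OUTER;
    }
}
```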



