[jira] [Created] (HIVE-25599) Addendum of HIVE-25570 Hive should send full URL path for authorization for the command insert overwrite location
Panagiotis Garefalakis created HIVE-25599: - Summary: Addendum of HIVE-25570 Hive should send full URL path for authorization for the command insert overwrite location Key: HIVE-25599 URL: https://issues.apache.org/jira/browse/HIVE-25599 Project: Hive Issue Type: Bug Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25541) JsonSerDe: TBLPROPERTY treating nested json as String
Panagiotis Garefalakis created HIVE-25541: - Summary: JsonSerDe: TBLPROPERTY treating nested json as String Key: HIVE-25541 URL: https://issues.apache.org/jira/browse/HIVE-25541 Project: Hive Issue Type: Bug Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis The native JsonSerDe 'org.apache.hive.hcatalog.data.JsonSerDe' currently does not support loading nested JSON into a string type directly. It requires declaring the column as a complex type (struct, map, array) to unpack nested JSON data. Even though the data field is not a valid JSON String type, there is value in treating it as a plain String instead of throwing an exception as we currently do.
{code:java}
create table json_table(data string, messageid string, publish_time bigint, attributes string);

{"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}"}}
{code}
This JIRA introduces an extra table property that allows stringifying complex JSON values instead of forcing the user to define the complete nested structure.
[jira] [Created] (HIVE-25398) Converted external tables should be able to configure purge behaviour
Panagiotis Garefalakis created HIVE-25398: - Summary: Converted external tables should be able to configure purge behaviour Key: HIVE-25398 URL: https://issues.apache.org/jira/browse/HIVE-25398 Project: Hive Issue Type: Bug Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis Creating non-ACID MANAGED tables is not allowed on Hive, which instead converts these tables to External: https://issues.apache.org/jira/browse/HIVE-22158 During table translation both TRANSLATED_TO_EXTERNAL and 'external.table.purge' are set to True. However, the second parameter may already be set in the table properties by the user. This ticket adds an extra check to maintain that property if it is set. PS: A cleaner solution would be to create these tables as External directly, but the user may be taking advantage of the translation and expecting the data NOT to be purged! Example:
{code:java}
-- Non-ACID table will be translated to EXTERNAL
create table c(c int) LOCATION 'etp_1' TBLPROPERTIES('transactional'='false','external.table.purge'='false');
insert into c values(1);
-- Maintain the purge=false property set above
desc formatted c;
select count(*) from c;
drop table c;
-- Create table in same location, data should still be there
create table c(c int) LOCATION 'etp_1' TBLPROPERTIES('transactional'='false','external.table.purge'='false');
desc formatted c;
select count(*) from c;
{code}
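A minimal sketch of the "maintain the property if set" check described above. The class and method names are hypothetical and the table properties are modelled as a plain map; this is not the actual metastore translation code, only an illustration of keeping a user-provided 'external.table.purge' value.

```java
import java.util.HashMap;
import java.util.Map;

final class PurgePropertySketch {
    // Hypothetical helper: returns the properties of the translated table.
    static Map<String, String> translateToExternal(Map<String, String> tableProps) {
        Map<String, String> translated = new HashMap<>(tableProps);
        // The translation marker is always set.
        translated.put("TRANSLATED_TO_EXTERNAL", "TRUE");
        // Only default the purge flag when the user has not set it already.
        translated.putIfAbsent("external.table.purge", "TRUE");
        return translated;
    }
}
```

With this shape, a table created with 'external.table.purge'='false' keeps that value after translation, while a table without the property still gets the default.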
[jira] [Created] (HIVE-25362) LLAP: ensure tasks with locality are added to DelayQueue
Panagiotis Garefalakis created HIVE-25362: - Summary: LLAP: ensure tasks with locality are added to DelayQueue Key: HIVE-25362 URL: https://issues.apache.org/jira/browse/HIVE-25362 Project: Hive Issue Type: Bug Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis HIVE-24914 introduced a short-circuit optimization that, when all nodes are busy, returns DELAYED_RESOURCES and resets the locality delay for a given task. However, this may prevent tasks from being added to the DelayQueue, leading to worse locality when all LLAP resources are fully utilized. To address the issue we should handle the two cases separately.
[jira] [Created] (HIVE-25155) Bump ORC to 1.6.8
Panagiotis Garefalakis created HIVE-25155: - Summary: Bump ORC to 1.6.8 Key: HIVE-25155 URL: https://issues.apache.org/jira/browse/HIVE-25155 Project: Hive Issue Type: Improvement Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis https://orc.apache.org/news/2021/05/21/ORC-1.6.8/
[jira] [Created] (HIVE-25149) Support parallel load for Optimized HT implementations
Panagiotis Garefalakis created HIVE-25149: - Summary: Support parallel load for Optimized HT implementations Key: HIVE-25149 URL: https://issues.apache.org/jira/browse/HIVE-25149 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis
[jira] [Created] (HIVE-25148) Support parallel load for Fast HT implementation
Panagiotis Garefalakis created HIVE-25148: - Summary: Support parallel load for Fast HT implementation Key: HIVE-25148 URL: https://issues.apache.org/jira/browse/HIVE-25148 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis
[jira] [Created] (HIVE-25146) JMH tests for Multi HT and parallel load
Panagiotis Garefalakis created HIVE-25146: - Summary: JMH tests for Multi HT and parallel load Key: HIVE-25146 URL: https://issues.apache.org/jira/browse/HIVE-25146 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis As the title suggests, add some benchmarks for the parallel HT construction feature.
[jira] [Created] (HIVE-25145) Improve Multi-HashTable EstimatedMemorySize
Panagiotis Garefalakis created HIVE-25145: - Summary: Improve Multi-HashTable EstimatedMemorySize Key: HIVE-25145 URL: https://issues.apache.org/jira/browse/HIVE-25145 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis When Multi HashTable is used for parallel HT loading, we calculate the estimatedMemorySize as the sum of all HTs. However, each of those HTs already adds some constants to the memory estimation, e.g., 16KB of constant memory for keyBinarySortableDeserializeRead. This ticket aims to improve the memory estimation for Multi HT.
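A rough sketch of the over-counting described above, assuming each hash table reports its payload plus one fixed constant. Only the 16KB figure comes from the ticket; the class, the method names and the "count the constant once" adjustment are illustrative assumptions, not Hive's implementation.

```java
import java.util.List;

final class MultiHtEstimateSketch {
    // 16KB constant mentioned in the ticket (keyBinarySortableDeserializeRead).
    static final long FIXED_OVERHEAD = 16L * 1024;

    // Hypothetical per-table estimate: payload plus the fixed constant.
    static long singleTableEstimate(long payloadBytes) {
        return payloadBytes + FIXED_OVERHEAD;
    }

    // Naive multi-HT estimate: sums the full per-table estimates,
    // so the constant is counted once per table.
    static long naiveEstimate(List<Long> payloads) {
        long total = 0;
        for (long payload : payloads) {
            total += singleTableEstimate(payload);
        }
        return total;
    }

    // One possible adjustment (assumption: the constant is shared):
    // sum the payloads and add the constant a single time.
    static long adjustedEstimate(List<Long> payloads) {
        long total = FIXED_OVERHEAD;
        for (long payload : payloads) {
            total += payload;
        }
        return total;
    }
}
```

For N tables the two estimates differ by (N - 1) * 16KB under this assumption, which is the kind of gap the ticket is concerned with.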
[jira] [Created] (HIVE-25117) Vector PTF ClassCastException with Decimal64
Panagiotis Garefalakis created HIVE-25117: - Summary: Vector PTF ClassCastException with Decimal64 Key: HIVE-25117 URL: https://issues.apache.org/jira/browse/HIVE-25117 Project: Hive Issue Type: Bug Reporter: Panagiotis Garefalakis Only reproduces when there is at least 1 buffered batch, so needed 2 rows with 1 row/batch:
{code:java}
set hive.vectorized.testing.reducer.batch.size=1;
{code}
{code:java}
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
    at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.copyNonSelectedColumnVector(VectorizedBatchUtil.java:664)
    at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.forwardBufferedBatches(VectorPTFGroupBatches.java:228)
    at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.fillGroupResultsAndForward(VectorPTFGroupBatches.java:318)
    at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.process(VectorPTFOperator.java:403)
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:497)
{code}
[jira] [Created] (HIVE-25103) Update row.serde excludes defaults
Panagiotis Garefalakis created HIVE-25103: - Summary: Update row.serde excludes defaults Key: HIVE-25103 URL: https://issues.apache.org/jira/browse/HIVE-25103 Project: Hive Issue Type: Improvement Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis HIVE-16222 introduced the row.serde.inputformat.excludes setting to disable row.serde for specific NON-vectorized formats. Since MapredParquetInputFormat is currently natively vectorized, it should be removed from that list. Even when hive.vectorized.use.vectorized.input.format is DISABLED, the Vectorizer will not vectorize in row-deserialize mode if the input format is natively vectorized, so it is safe to remove.
[jira] [Created] (HIVE-25083) Extra reviewer pattern
Panagiotis Garefalakis created HIVE-25083: - Summary: Extra reviewer pattern Key: HIVE-25083 URL: https://issues.apache.org/jira/browse/HIVE-25083 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis
[jira] [Created] (HIVE-25082) Make SettableTreeReader updateTimezone a default method
Panagiotis Garefalakis created HIVE-25082: - Summary: Make SettableTreeReader updateTimezone a default method Key: HIVE-25082 URL: https://issues.apache.org/jira/browse/HIVE-25082 Project: Hive Issue Type: Improvement Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis Avoid useless TimestampStreamReader instance checks by making updateTimezone() a default method in SettableTreeReader
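To illustrate the idea, here is a small sketch of how a default method removes the need for instance checks. The interface and reader classes below are simplified stand-ins with hypothetical names and signatures; they are not the actual SettableTreeReader or TimestampStreamReader code.

```java
import java.util.List;

final class DefaultMethodSketch {
    // Simplified stand-in for the reader interface named in the ticket.
    interface SettableReader {
        // Default no-op: readers without timezone state simply inherit it.
        default void updateTimezone(String writerTimezoneId) {
        }
    }

    // A reader with no timezone state; it does not override anything.
    static final class LongReader implements SettableReader {
    }

    // A timestamp-like reader that actually reacts to the update.
    static final class TimestampReader implements SettableReader {
        String timezoneId = "UTC";

        @Override
        public void updateTimezone(String writerTimezoneId) {
            this.timezoneId = writerTimezoneId;
        }
    }

    // Callers invoke the method on every reader, with no instanceof check.
    static void updateAll(List<SettableReader> readers, String writerTimezoneId) {
        for (SettableReader reader : readers) {
            reader.updateTimezone(writerTimezoneId);
        }
    }
}
```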
[jira] [Created] (HIVE-25049) LlapDaemon preemption should not be triggered for same Vertex tasks
Panagiotis Garefalakis created HIVE-25049: - Summary: LlapDaemon preemption should not be triggered for same Vertex tasks Key: HIVE-25049 URL: https://issues.apache.org/jira/browse/HIVE-25049 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis Due to the asynchronous nature of QueryInfo$FinishableState notification, we usually end up receiving finishable state updates across tasks/queryfragments with some time difference. Imagine vertex Map8 with dependency on Map7. If Map8 vertex is already running with some tasks still pending, we can end-up in a situation where on Map7 completion, some of the pending Map8 tasks are getting the finishable state update BEFORE the already running Map8 tasks, ending up preempting tasks for no reason! {code:java} 2021-04-22T15:30:45.124Z source:Map 7 updated, notifying: [Map 8] 2021-04-22T15:30:45.125Z query-executor-0 class="impl.TaskExecutorService" level="INFO" thread="IPC Server handler 3 on 25000"] vertex: Map 8 Received finishable state update for attempt_1619105382691_0001_1_05_14_0, state=true 2021-04-22T15:30:45.125Z query-executor-0 class="impl.QueryInfo$FinishableStateTracker" level="INFO" thread="IPC Server handler 3 on 25000"] Now notifying: Map 8 2021-04-22T15:30:45.125Z query-executor-0 class="impl.TaskExecutorService" level="INFO" thread="Wait-Queue-Scheduler-0"] Attempting to execute TaskWrapper{task=attempt_1619105382691_0001_1_05_14_0, Vertex=Map 8, inWaitQueue=true, inPreemptionQueue=false, registeredForNotifications=true, canFinish=true, canFinish(in 2021-04-22T15:30:45.126Z query-executor-0 class="impl.TaskExecutorService" level="INFO" thread="Wait-Queue-Scheduler-0"] Task TaskWrapper{task=attempt_1619105382691_0001_1_05_14_0, Vertex=Map 8, inWaitQueue=true, inPreemptionQueue=false, registeredForNotifications=true, canFinish=true, canFinish(in queue)=true, isGuaranteed=false, firstAttemptStartTime=1619105437749, dagStartTime=1619105422608, withinDagPriority=74, 
vertexParallelism= 232, selfAndUpstreamParallelism= 256, selfAndUpstreamComplete= 17} managed to preempt task TaskWrapper{task=attempt_1619105382691_0001_1_05_06_0, Vertex=Map 8, inWaitQueue=false, inPreemptionQueue=true, registeredForNotifications=true, canFinish=true, canFinish(in queue)=false, isGuaranteed=false, firstAttemptStartTime=1619105437737, dagStartTime=1619105422608, withinDagPriority=74, vertexParallelism= 232, selfAndUpstreamParallelism= 256, selfAndUpstreamComplete= 15} 2021-04-22T15:30:45.126Z query-executor-0 class="impl.TaskExecutorService" level="INFO" thread="Wait-Queue-Scheduler-0"] Invoking kill task for attempt_1619105382691_0001_1_05_06_0 due to pre-emption to run attempt_1619105382691_0001_1_05_14_0 2021-04-22T15:30:45.126Z query-executor-0 class="impl.QueryInfo$FinishableStateTracker" level="INFO" thread="IPC Server handler 3 on 25000"] Now notifying: Map 8 2021-04-22T15:30:45.126Z query-executor-0 class="impl.TaskExecutorService" level="INFO" thread="IPC Server handler 3 on 25000"] vertex: Map 8 Received finishable state update for attempt_1619105382691_0001_1_05_11_0, state=true 2021-04-22T15:30:45.127Z query-executor-0 class="impl.TaskRunnerCallable" level="INFO" thread="Wait-Queue-Scheduler-0"] Kill task requested for id=attempt_1619105382691_0001_1_05_06_0, taskRunnerSetup=true 2021-04-22T15:30:45.127Z query-executor-0 class="impl.TaskRunnerCallable" level="INFO" thread="Wait-Queue-Scheduler-0"] Issuing kill to task attempt_1619105382691_0001_1_05_06_0 2021-04-22T15:30:45.127Z query-executor-0 class="impl.QueryInfo$FinishableStateTracker" level="INFO" thread="IPC Server handler 3 on 25000"] Now notifying: Map 8 2021-04-22T15:30:45.127Z query-executor-0 class="task.TezTaskRunner2" level="INFO" thread="Wait-Queue-Scheduler-0"] Attempting to abort attempt_1619105382691_0001_1_05_06_0 due to an invocation of killTask 2021-04-22T15:30:45.128Z query-executor-0 class="tez.TezProcessor" level="INFO" thread="Wait-Queue-Scheduler-0"] Received 
abort 2021-04-22T15:30:45.128Z query-executor-0 class="tez.TezProcessor" level="INFO" thread="Wait-Queue-Scheduler-0"] Forwarding abort to RecordProcessor 2021-04-22T15:30:45.128Z query-executor-0 class="runtime.LogicalIOProcessorRuntimeTask" dagId="dag_1619105382691_0001_1" fragmentId="1619105382691_0001_1_05_11_0" level="INFO" queryId="hive_20210422153013_397b96bf-d5a6-493a-9c51-9446f64eeed4" thread="TezTR-382691_1_1_5_11_0"] Waiting for 1 initializers to finish 2021-04-22T15:30:45.128Z query-executor-0 class="tez.MapRecordProcessor" level="INFO" thread="Wait-Queue-Scheduler-0"] Forwarding abort to mapOp: {} MAP 2021-04-22T15:30:45.128Z query-executor-0 class="vector.VectorMapOperator" level="INFO" thread="Wait-Queue-Scheduler-0"]
[jira] [Created] (HIVE-24914) Improve scheduling by only traversing hosts with capacity
Panagiotis Garefalakis created HIVE-24914: - Summary: Improve scheduling by only traversing hosts with capacity Key: HIVE-24914 URL: https://issues.apache.org/jira/browse/HIVE-24914 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis *schedulePendingTasks* on the LlapTaskScheduler currently goes through all the pending tasks and tries to allocate them based on their priority -- if a priority cannot be scheduled completely, we bail out, as lower priorities would not be able to get allocations either. An optimization here could be to only walk through the nodes with capacity (if any), and not all available hosts, when scheduling these tasks based on their priority and locality preferences.
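A simplified sketch of the proposed optimization, assuming tasks arrive already ordered by priority and each task has a single preferred host. The Node class, the schedule method and the fallback to the first free node are hypothetical simplifications, not the LlapTaskScheduler logic.

```java
import java.util.ArrayList;
import java.util.List;

final class CapacityFilterSketch {
    static final class Node {
        final String host;
        int freeSlots;

        Node(String host, int freeSlots) {
            this.host = host;
            this.freeSlots = freeSlots;
        }
    }

    // preferredHosts: one entry per pending task, in priority order.
    // Returns the host chosen for each task that could be placed.
    static List<String> schedule(List<String> preferredHosts, List<Node> allNodes) {
        // Build the candidate list once: only nodes that still have capacity.
        List<Node> candidates = new ArrayList<>();
        for (Node node : allNodes) {
            if (node.freeSlots > 0) {
                candidates.add(node);
            }
        }
        List<String> assignments = new ArrayList<>();
        for (String preferred : preferredHosts) {
            if (candidates.isEmpty()) {
                break; // nothing left: lower priorities cannot be placed either
            }
            Node chosen = candidates.get(0); // fallback when locality is not available
            for (Node node : candidates) {
                if (node.host.equals(preferred)) {
                    chosen = node;
                    break;
                }
            }
            chosen.freeSlots--;
            if (chosen.freeSlots == 0) {
                candidates.remove(chosen); // full nodes are never visited again
            }
            assignments.add(chosen.host);
        }
        return assignments;
    }
}
```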
[jira] [Created] (HIVE-24913) LlapTaskScheduler Improvements
Panagiotis Garefalakis created HIVE-24913: - Summary: LlapTaskScheduler Improvements Key: HIVE-24913 URL: https://issues.apache.org/jira/browse/HIVE-24913 Project: Hive Issue Type: Improvement Reporter: Panagiotis Garefalakis
[jira] [Created] (HIVE-24793) Compiler probe MJ selection algorithm fallback
Panagiotis Garefalakis created HIVE-24793: - Summary: Compiler probe MJ selection algorithm fallback Key: HIVE-24793 URL: https://issues.apache.org/jira/browse/HIVE-24793 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis As per the discussion on #1286, the current probe MJ selection algorithm:
* selects the best MJ candidate (currently based on the distinct row ratio)
* does some further processing - which may bail out

Bailing out for the best candidate doesn't necessarily mean that we cannot use a less charming candidate. The extra compilation can be wrapped into a for loop and, instead of selecting only the best candidate, the first part could be implemented as priority logic.
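The fallback could look roughly like the sketch below: candidates are ordered by priority and the further processing is tried on each in turn. All names are hypothetical, the further processing is modelled as a predicate, and treating a lower distinct row ratio as "better" is an assumption made only for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

final class ProbeCandidateSketch {
    static final class Candidate {
        final String name;
        final double distinctRowRatio;

        Candidate(String name, double distinctRowRatio) {
            this.name = name;
            this.distinctRowRatio = distinctRowRatio;
        }
    }

    // Tries candidates in priority order and returns the first one that
    // survives the further processing, instead of giving up after the best one.
    static Optional<Candidate> select(List<Candidate> candidates,
                                      Predicate<Candidate> furtherProcessing) {
        List<Candidate> ordered = new ArrayList<>(candidates);
        // Assumption: lower ratio ranks first.
        ordered.sort(Comparator.comparingDouble((Candidate c) -> c.distinctRowRatio));
        for (Candidate candidate : ordered) {
            if (furtherProcessing.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty(); // every candidate bailed out
    }
}
```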
[jira] [Created] (HIVE-24780) Consolidate type.checking configuration
Panagiotis Garefalakis created HIVE-24780: - Summary: Consolidate type.checking configuration Key: HIVE-24780 URL: https://issues.apache.org/jira/browse/HIVE-24780 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Seems like hive.strict.timestamp.conversion, hive.strict.checks.type.safety, hive.strict.checks.no.partition.filter etc. serve a similar purpose.
[jira] [Created] (HIVE-24779) Document type checks and allowed conversions
Panagiotis Garefalakis created HIVE-24779: - Summary: Document type checks and allowed conversions Key: HIVE-24779 URL: https://issues.apache.org/jira/browse/HIVE-24779 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Currently, with hive.strict.checks.type.safety=true we disallow:
* Comparing bigints and strings/(var)chars
* Comparing bigints and doubles
* Comparing decimals and strings/(var)chars

while there is a TODO to add any remaining checks. We need to decide how strict we are going to be here (following MySQL checks? Postgres?) and then document everything.
[jira] [Created] (HIVE-24777) Standardize Hive's Type conversions
Panagiotis Garefalakis created HIVE-24777: - Summary: Standardize Hive's Type conversions Key: HIVE-24777 URL: https://issues.apache.org/jira/browse/HIVE-24777 Project: Hive Issue Type: New Feature Reporter: Panagiotis Garefalakis Currently, Hive does some type safety checks looking for lossy type conversions such as decimal to char and long to varchar (even though they are not complete). These conversions are not documented and they might even differ depending on the CBO status. For example:
{code:java}
set hive.cbo.enable=false;
create table comparison_table (col_v varchar(3), col_i int) stored as orc;
insert into comparison_table values ('001', 15), ('002', 25), ('007', 75);

select * from comparison_table where col_v >= 4L;
-- Converts pred '4' to String
-- No result when CBO off

set hive.cbo.enable=on;
select * from comparison_table where col_v >= 4L;
-- Converts both to Double
-- 007 when CBO is on
{code}
Type conversions on the predicates also affect Sargs evaluation, as in the first case (cbo off) string padding is missing, and in the latter case (cbo on) UDFBridge cannot be evaluated. Finally, it seems that there are multiple configurations tracking the same thing:
* hive.strict.timestamp.conversion
* hive.strict.checks.type.safety

This uber Jira is targeting the standardisation/documentation of these type checks and their conversions on Hive.
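The different results in the example above can be reproduced with plain Java comparisons. This is only a sketch of the two conversion strategies the ticket describes (literal to String versus both sides to Double); the class and method names are made up and this is not Hive's evaluation code.

```java
final class PredicateConversionSketch {
    // CBO-off behaviour as described: the literal becomes the string "4"
    // and the comparison is lexicographic.
    static boolean greaterOrEqualAsString(String columnValue, long literal) {
        return columnValue.compareTo(Long.toString(literal)) >= 0;
    }

    // CBO-on behaviour as described: both sides are converted to double.
    static boolean greaterOrEqualAsDouble(String columnValue, long literal) {
        return Double.parseDouble(columnValue) >= (double) literal;
    }
}
```

With the literal 4, the string comparison rejects '001', '002' and '007' alike because '0' sorts before '4', while the double comparison keeps only '007' (7.0 >= 4.0), matching the two outcomes noted in the ticket.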
[jira] [Created] (HIVE-24735) Implement TIMESTAMP WITH LOCAL TIME ZONE integration with ORC
Panagiotis Garefalakis created HIVE-24735: - Summary: Implement TIMESTAMP WITH LOCAL TIME ZONE integration with ORC Key: HIVE-24735 URL: https://issues.apache.org/jira/browse/HIVE-24735 Project: Hive Issue Type: New Feature Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis TIMESTAMP_INSTANT in ORC is equivalent to TIMESTAMP_WITH_LOCAL_TIME_ZONE type in Hive. Support to read/write timestamp with local time zone in ORC was added as part of ORC-189. We should implement their [integration|https://github.com/apache/hive/pull/1823#discussion_r564077084].
[jira] [Created] (HIVE-24721) Align CacheWriter bufferSize with Llap max alloc
Panagiotis Garefalakis created HIVE-24721: - Summary: Align CacheWriter bufferSize with Llap max alloc Key: HIVE-24721 URL: https://issues.apache.org/jira/browse/HIVE-24721 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Before bumping to ORC-1.6, the LLAP_ALLOCATOR_MAX_ALLOC value was also used as the ORC CacheWriter buffer size. As per ORC-238, the max bufferSize argument can be up to 2^(3*8 - 1) -- i.e., less than 8MB, and since we enforce the size to be a power of 2 the next available is 4MB. In HIVE-23553 we decoupled the two configurations (LLAP max alloc and CacheWriter buffer size), with the first one being 16MB and the latter 4MB -- this ticket is to investigate whether these two configurations need to converge. More details: https://github.com/apache/hive/pull/1823#pullrequestreview-575698916
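The arithmetic above can be checked with a short sketch. The helper name is hypothetical; the only inputs taken from the ticket are the exclusive limit 2^(3*8 - 1) and the power-of-2 requirement.

```java
final class BufferSizeSketch {
    // Largest power of two strictly below the given exclusive limit.
    static int largestPowerOfTwoBelow(int exclusiveLimit) {
        if (exclusiveLimit < 2) {
            throw new IllegalArgumentException("limit must be at least 2");
        }
        return Integer.highestOneBit(exclusiveLimit - 1);
    }

    // Exclusive limit from the ticket: 2^(3*8 - 1) = 8388608 bytes (8MB).
    static final int ORC_BUFFER_LIMIT = 1 << (3 * 8 - 1);
}
```

Applied to the 8MB limit, the helper returns 4194304 bytes, i.e. the 4MB value quoted for the CacheWriter buffer size.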
[jira] [Created] (HIVE-24703) LLAP move RowGroup reading to StrippePlanner
Panagiotis Garefalakis created HIVE-24703: - Summary: LLAP move RowGroup reading to StrippePlanner Key: HIVE-24703 URL: https://issues.apache.org/jira/browse/HIVE-24703 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis LlapDataReader is almost identical to ORC's DataReader; however, the main issue is that the readFileData method of the latter takes a DiskRangeList instead of a BufferChunkList. ORC-1.6 uses a separate StripePlanner that reads RowGroups, converting them to BufferChunks, while in LLAP we are doing our own custom planning with DiskRanges. I am opening this ticket to explore moving LLAP to Stripe planning.
[jira] [Created] (HIVE-24631) Move to ChronoLocalDate and day of epoch instead of java's sql Date
Panagiotis Garefalakis created HIVE-24631: - Summary: Move to ChronoLocalDate and day of epoch instead of java's sql Date Key: HIVE-24631 URL: https://issues.apache.org/jira/browse/HIVE-24631 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis For Predicate evaluation see https://issues.apache.org/jira/browse/ORC-661
[jira] [Created] (HIVE-24539) OrcInputFormat schema generation should respect column delimiter
Panagiotis Garefalakis created HIVE-24539: - Summary: OrcInputFormat schema generation should respect column delimiter Key: HIVE-24539 URL: https://issues.apache.org/jira/browse/HIVE-24539 Project: Hive Issue Type: Bug Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis OrcInputFormat currently generates schema using the given configuration and the default delimiter – that causes inconsistencies when names contain commas. We should follow a similar approach to [OrcOutputFormat|https://github.com/apache/hive/blob/9563dd63188280f4b7c307f36e1ea0c69aec/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java#L145]:
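A toy illustration of why the delimiter matters when column names contain commas. The helper and the '|' delimiter are hypothetical choices for this sketch; it does not reproduce the OrcOutputFormat code linked above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class ColumnDelimiterSketch {
    // Splits a serialized column-name list on the given delimiter,
    // treating the delimiter as a literal string rather than a regex.
    static List<String> splitColumnNames(String columns, String delimiter) {
        return Arrays.asList(columns.split(Pattern.quote(delimiter)));
    }
}
```

For the serialized list `id|first,last|ts`, splitting on the configured `|` yields three names including `first,last`, whereas splitting on the default comma breaks that name apart and yields a different, inconsistent schema.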
[jira] [Created] (HIVE-24478) Inner GroupBy with Distinct SemanticException: Invalid column reference
Panagiotis Garefalakis created HIVE-24478: - Summary: Inner GroupBy with Distinct SemanticException: Invalid column reference Key: HIVE-24478 URL: https://issues.apache.org/jira/browse/HIVE-24478 Project: Hive Issue Type: Bug Reporter: Panagiotis Garefalakis
{code:java}
CREATE TABLE tmp_src1( `npp` string, `nsoc` string) stored as orc;
INSERT INTO tmp_src1 (npp,nsoc) VALUES ('1-1000CG61', '7273111');
SELECT `min_nsoc` FROM (SELECT `npp`, MIN(`nsoc`) AS `min_nsoc`, COUNT(DISTINCT `nsoc`) AS `nb_nsoc` FROM tmp_src1 GROUP BY `npp`) `a` WHERE `nb_nsoc` > 0;
{code}
Issue:
{code:java}
org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid column reference 'nsoc'
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanGroupByOperator1(SemanticAnalyzer.java:5405)
{code}
Query runs fine when we include `nb_nsoc` in the Select expression
[jira] [Created] (HIVE-24430) DiskRangeInfo should make use of DiskRangeList
Panagiotis Garefalakis created HIVE-24430: - Summary: DiskRangeInfo should make use of DiskRangeList Key: HIVE-24430 URL: https://issues.apache.org/jira/browse/HIVE-24430 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis DiskRangeInfo should make use of DiskRangeList instead of List – this will help us transition to ORC 1.6.
[jira] [Created] (HIVE-24391) Fix FIX TestOrcFile failures in branch-3.1
Panagiotis Garefalakis created HIVE-24391: - Summary: Fix FIX TestOrcFile failures in branch-3.1 Key: HIVE-24391 URL: https://issues.apache.org/jira/browse/HIVE-24391 Project: Hive Issue Type: Improvement Affects Versions: 3.1.3 Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis Recent ORC upgrade to 1.5.8 in 3.1 introduced some failures HIVE-24316 – tackling those here
[jira] [Created] (HIVE-24309) Simplify ConvertJoinMapJoin logic
Panagiotis Garefalakis created HIVE-24309: - Summary: Simplify ConvertJoinMapJoin logic Key: HIVE-24309 URL: https://issues.apache.org/jira/browse/HIVE-24309 Project: Hive Issue Type: Improvement Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis The ConvertJoinMapJoin logic can be further simplified: [https://github.com/pgaref/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java#L92]
[jira] [Created] (HIVE-24308) FIX conditions used for DPHJ conversion
Panagiotis Garefalakis created HIVE-24308: - Summary: FIX conditions used for DPHJ conversion Key: HIVE-24308 URL: https://issues.apache.org/jira/browse/HIVE-24308 Project: Hive Issue Type: Bug Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis Found a weird scenario when looking at the ConvertJoinMapJoin logic: [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java#L1198] When the distinct keys cannot fit in memory AND the DPHJ ShuffleSize is lower than expected the code returns a MJ because of the condition above! In general, I believe the ShuffleSize check: [https://github.com/apache/hive/blob/052c9da958f5cf3998091a7eb4b24192a5bb61e9/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java#L1624] should be part of the shuffleJoin DPHJ conversion. And the preferred conversion would be: MJ > DPHJ > SMB
[jira] [Created] (HIVE-24225) FIX S3A recordReader policy selection
Panagiotis Garefalakis created HIVE-24225: - Summary: FIX S3A recordReader policy selection Key: HIVE-24225 URL: https://issues.apache.org/jira/browse/HIVE-24225 Project: Hive Issue Type: Bug Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis Dynamic S3A recordReader policy selection can cause issues on lazily initialized FS objects.
[jira] [Created] (HIVE-24224) Fix skipping header/footer for Hive on Tez on compressed files
Panagiotis Garefalakis created HIVE-24224: - Summary: Fix skipping header/footer for Hive on Tez on compressed files Key: HIVE-24224 URL: https://issues.apache.org/jira/browse/HIVE-24224 Project: Hive Issue Type: Bug Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis A compressed file with Hive on Tez returns the header and footer - for both select * and select count ( * ):
{noformat}
printf "offset,id,other\n9,\"20200315 X00 1356\",123\n17,\"20200315 X00 1357\",123\nrst,rst,rst" > data.csv
hdfs dfs -put -f data.csv /apps/hive/warehouse/bz2test/bz2tbl1/
bzip2 -f data.csv
hdfs dfs -put -f data.csv.bz2 /apps/hive/warehouse/bz2test/bz2tbl2/
beeline -e "CREATE EXTERNAL TABLE default.bz2tst2 ( sequence int, id string, other string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' LOCATION '/apps/hive/warehouse/bz2test/bz2tbl2' TBLPROPERTIES ( 'skip.header.line.count'='1', 'skip.footer.line.count'='1');"
beeline -e " SET hive.fetch.task.conversion = none; SELECT * FROM default.bz2tst2;"
+------------------+-------------------+---------------+
| bz2tst2.sequence | bz2tst2.id        | bz2tst2.other |
+------------------+-------------------+---------------+
| offset           | id                | other         |
| 9                | 20200315 X00 1356 | 123           |
| 17               | 20200315 X00 1357 | 123           |
| rst              | rst               | rst           |
+------------------+-------------------+---------------+
{noformat}
PS: HIVE-22769 addressed the issue for Hive on LLAP.
[jira] [Created] (HIVE-24192) Properly log TaskExecutorService eviction details
Panagiotis Garefalakis created HIVE-24192: - Summary: Properly log TaskExecutorService eviction details Key: HIVE-24192 URL: https://issues.apache.org/jira/browse/HIVE-24192 Project: Hive Issue Type: Improvement Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis HIVE-23122 introduced task eviction logging but the log condition is problematic -- this ticket fixes the issue below (when debug is ON, info is also ON, so the debug details never show up). It also adds a separate logger conf for TaskExecutorService.
{code:java}
if (LOG.isInfoEnabled()) {
  ...
} else if (LOG.isDebugEnabled()) {
  ...
}
{code}
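A self-contained sketch of why the branch order matters. The tiny Log interface below is a hypothetical stand-in for the real logger, and the returned strings stand in for the log messages; checking the more verbose level first is one straightforward way to make the debug branch reachable, not necessarily the exact fix applied in Hive.

```java
final class EvictionLogSketch {
    // Hypothetical stand-in for the logger's level checks.
    interface Log {
        boolean isInfoEnabled();

        boolean isDebugEnabled();
    }

    // Problematic order: with debug ON, info is also ON,
    // so the debug branch is never taken.
    static String problematic(Log log) {
        if (log.isInfoEnabled()) {
            return "summary";
        } else if (log.isDebugEnabled()) {
            return "details";
        }
        return "";
    }

    // Checking the more verbose level first makes both branches reachable.
    static String fixed(Log log) {
        if (log.isDebugEnabled()) {
            return "details";
        } else if (log.isInfoEnabled()) {
            return "summary";
        }
        return "";
    }
}
```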
[jira] [Created] (HIVE-24185) Upgrade snappy-java to 1.1.7.5
Panagiotis Garefalakis created HIVE-24185: - Summary: Upgrade snappy-java to 1.1.7.5 Key: HIVE-24185 URL: https://issues.apache.org/jira/browse/HIVE-24185 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis Bump version to take advantage of perf improvements, glibc compatibility etc. https://github.com/xerial/snappy-java/blob/master/Milestone.md#snappy-java-117-2017-11-30
[jira] [Created] (HIVE-23914) Support Char, VarChar, Small/tiny Int for Struct IN clause
Panagiotis Garefalakis created HIVE-23914: - Summary: Support Char, VarChar, Small/tiny Int for Struct IN clause Key: HIVE-23914 URL: https://issues.apache.org/jira/browse/HIVE-23914 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis
[jira] [Created] (HIVE-23913) Support Date, Decimal and Timestamp for Struct IN clause
Panagiotis Garefalakis created HIVE-23913: - Summary: Support Date, Decimal and Timestamp for Struct IN clause Key: HIVE-23913 URL: https://issues.apache.org/jira/browse/HIVE-23913 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis
[jira] [Created] (HIVE-23912) Extend vectorization support for Struct IN() clause
Panagiotis Garefalakis created HIVE-23912: - Summary: Extend vectorization support for Struct IN() clause Key: HIVE-23912 URL: https://issues.apache.org/jira/browse/HIVE-23912 Project: Hive Issue Type: Improvement Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis Currently, Struct IN() vectorization does not support all writable types. As a result, operators using such conditions fail to vectorize: for example, we support the String type but not Char or Varchar.
[jira] [Created] (HIVE-23882) Compiler should skip MJ keyExpr for probe optimization
Panagiotis Garefalakis created HIVE-23882: - Summary: Compiler should skip MJ keyExpr for probe optimization Key: HIVE-23882 URL: https://issues.apache.org/jira/browse/HIVE-23882 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis In probe we cannot currently support key expressions (on the big-table side), as ORC CVs probe the small-table HT directly (there is no expr evaluation at that level). TezCompiler should take this into account when picking MJs to push probe details.
[jira] [Created] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
Panagiotis Garefalakis created HIVE-23871: - Summary: ObjectStore should properly handle MicroManaged Table properties Key: HIVE-23871 URL: https://issues.apache.org/jira/browse/HIVE-23871 Project: Hive Issue Type: Improvement Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23854) Natively support Double and Decimal CVs in ReduceSinkOperator
Panagiotis Garefalakis created HIVE-23854: - Summary: Natively support Double and Decimal CVs in ReduceSinkOperator Key: HIVE-23854 URL: https://issues.apache.org/jira/browse/HIVE-23854 Project: Hive Issue Type: Improvement Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23852) Natively support Date and Timestamp types in ReduceSink operator
Panagiotis Garefalakis created HIVE-23852: - Summary: Natively support Date and Timestamp types in ReduceSink operator Key: HIVE-23852 URL: https://issues.apache.org/jira/browse/HIVE-23852 Project: Hive Issue Type: Improvement Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis There is currently no native support, meaning that these types end up being serialized as multi-key columns, which is much slower (iterating through batch columns instead of writing a value directly). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23823) Fix violated naming conventions
Panagiotis Garefalakis created HIVE-23823: - Summary: Fix violated naming conventions Key: HIVE-23823 URL: https://issues.apache.org/jira/browse/HIVE-23823 Project: Hive Issue Type: Bug Reporter: Panagiotis Garefalakis For example, the violated method naming conventions in the PerfLogger class. See: https://github.com/apache/hive/pull/1161/files/dbed3ff5e69d81cedae9c1254a90326d26a19d63#diff-bdbfbb8352f29fd90c559eef871f4853R137 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23773) Support multi-key probe MapJoins
Panagiotis Garefalakis created HIVE-23773: - Summary: Support multi-key probe MapJoins Key: HIVE-23773 URL: https://issues.apache.org/jira/browse/HIVE-23773 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23734) Untangle LlapRecordReader Includes construction
Panagiotis Garefalakis created HIVE-23734: - Summary: Untangle LlapRecordReader Includes construction Key: HIVE-23734 URL: https://issues.apache.org/jira/browse/HIVE-23734 Project: Hive Issue Type: Improvement Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23733) [LLAP] Extend InputFormat to genIncludedColNames
Panagiotis Garefalakis created HIVE-23733: - Summary: [LLAP] Extend InputFormat to genIncludedColNames Key: HIVE-23733 URL: https://issues.apache.org/jira/browse/HIVE-23733 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis Extend LLAP ORCInputFormat to generate includedColNames -- similar to what we currently do for generating includedColIds. This will enable us to do ColNames->ReaderId mapping when we need to apply filters and we are only aware of colNames (e.g., HIVE-23730) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23730) Compiler support tracking TS keyColName for Probe MapJoin
Panagiotis Garefalakis created HIVE-23730: - Summary: Compiler support tracking TS keyColName for Probe MapJoin Key: HIVE-23730 URL: https://issues.apache.org/jira/browse/HIVE-23730 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis TezCompiler needs to track the original TS key columnName used for MJ probedecode. Even though we know the MJ keyCol at compile time, it could be generated by previous (parent) operators, so we don't always know the original TS column it maps to. To find the original columnMapping, we need to track the MJ keyCol through the operator pipeline. The tracking is done through each operator's map of output column name to input expression. -- This message was sent by Atlassian Jira (v8.3.4#803005)
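The tracking described in HIVE-23730 can be pictured with a small, simplified sketch. It assumes each operator exposes a plain map from output column name to the input column name it was derived from. The class `KeyColumnTrackerSketch`, the method `resolveTableScanColumn` and the column names are hypothetical and are not Hive's actual classes, which map to expressions rather than strings.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper; names and types are illustrative, not Hive's. */
final class KeyColumnTrackerSketch {

    /**
     * Walks from the MapJoin towards the TableScan. Each map goes from an
     * operator's output column name to the input column it was derived from.
     * Returns empty when the column cannot be traced through some operator.
     */
    static Optional<String> resolveTableScanColumn(String mjKeyCol,
                                                   List<Map<String, String>> columnMaps) {
        String current = mjKeyCol;
        for (Map<String, String> outputToInput : columnMaps) {
            String input = outputToInput.get(current);
            if (input == null) {
                // Unknown mapping: give up rather than guess.
                return Optional.empty();
            }
            current = input;
        }
        return Optional.of(current);
    }

    public static void main(String[] args) {
        // Assumed pipeline: MJ key "_col0" <- "_col1" <- TS column "customer_id".
        List<Map<String, String>> maps = List.of(
            Map.of("_col0", "_col1"),
            Map.of("_col1", "customer_id"));
        System.out.println(resolveTableScanColumn("_col0", maps));
    }
}
```

Returning an empty result when a mapping is missing keeps the optimization conservative: the compiler can simply skip pushing probe details for that MapJoin.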
[jira] [Created] (HIVE-23698) Compiler support for row-level filtering on filterPredicates
Panagiotis Garefalakis created HIVE-23698: - Summary: Compiler support for row-level filtering on filterPredicates Key: HIVE-23698 URL: https://issues.apache.org/jira/browse/HIVE-23698 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23658) Fix FindBug issues in hive-kudu-handler
Panagiotis Garefalakis created HIVE-23658: - Summary: Fix FindBug issues in hive-kudu-handler Key: HIVE-23658 URL: https://issues.apache.org/jira/browse/HIVE-23658 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23656) Fix FindBug issues in llap-server
Panagiotis Garefalakis created HIVE-23656: - Summary: Fix FindBug issues in llap-server Key: HIVE-23656 URL: https://issues.apache.org/jira/browse/HIVE-23656 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23657) Fix FindBug issues in hive-shims
Panagiotis Garefalakis created HIVE-23657: - Summary: Fix FindBug issues in hive-shims Key: HIVE-23657 URL: https://issues.apache.org/jira/browse/HIVE-23657 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23655) Fix FindBug issues in llap-tez
Panagiotis Garefalakis created HIVE-23655: - Summary: Fix FindBug issues in llap-tez Key: HIVE-23655 URL: https://issues.apache.org/jira/browse/HIVE-23655 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23654) Fix FindBug issues in llap-ext-client
Panagiotis Garefalakis created HIVE-23654: - Summary: Fix FindBug issues in llap-ext-client Key: HIVE-23654 URL: https://issues.apache.org/jira/browse/HIVE-23654 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23653) Fix FindBug issues in llap-client
Panagiotis Garefalakis created HIVE-23653: - Summary: Fix FindBug issues in llap-client Key: HIVE-23653 URL: https://issues.apache.org/jira/browse/HIVE-23653 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23652) Fix FindBug issues in llap-common
Panagiotis Garefalakis created HIVE-23652: - Summary: Fix FindBug issues in llap-common Key: HIVE-23652 URL: https://issues.apache.org/jira/browse/HIVE-23652 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23651) Fix FindBug issues in hive-service
Panagiotis Garefalakis created HIVE-23651: - Summary: Fix FindBug issues in hive-service Key: HIVE-23651 URL: https://issues.apache.org/jira/browse/HIVE-23651 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23650) Fix FindBug issues in hive-streaming
Panagiotis Garefalakis created HIVE-23650: - Summary: Fix FindBug issues in hive-streaming Key: HIVE-23650 URL: https://issues.apache.org/jira/browse/HIVE-23650 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23649) Fix FindBug issues in hive-service-rpc
Panagiotis Garefalakis created HIVE-23649: - Summary: Fix FindBug issues in hive-service-rpc Key: HIVE-23649 URL: https://issues.apache.org/jira/browse/HIVE-23649 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23648) Fix FindBug issues in hive-serde
Panagiotis Garefalakis created HIVE-23648: - Summary: Fix FindBug issues in hive-serde Key: HIVE-23648 URL: https://issues.apache.org/jira/browse/HIVE-23648 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23646) Fix FindBug issues in hive-ql
Panagiotis Garefalakis created HIVE-23646: - Summary: Fix FindBug issues in hive-ql Key: HIVE-23646 URL: https://issues.apache.org/jira/browse/HIVE-23646 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23647) Fix FindBug issues in hive-parser
Panagiotis Garefalakis created HIVE-23647: - Summary: Fix FindBug issues in hive-parser Key: HIVE-23647 URL: https://issues.apache.org/jira/browse/HIVE-23647 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23645) Fix FindBug issues in hive-metastore
Panagiotis Garefalakis created HIVE-23645: - Summary: Fix FindBug issues in hive-metastore Key: HIVE-23645 URL: https://issues.apache.org/jira/browse/HIVE-23645 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23643) Fix FindBug issues in hive-hplsql
Panagiotis Garefalakis created HIVE-23643: - Summary: Fix FindBug issues in hive-hplsql Key: HIVE-23643 URL: https://issues.apache.org/jira/browse/HIVE-23643 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23644) Fix FindBug issues in hive-jdbc
Panagiotis Garefalakis created HIVE-23644: - Summary: Fix FindBug issues in hive-jdbc Key: HIVE-23644 URL: https://issues.apache.org/jira/browse/HIVE-23644 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23642) Fix FindBug issues in hive-jdbc-handler
Panagiotis Garefalakis created HIVE-23642: - Summary: Fix FindBug issues in hive-jdbc-handler Key: HIVE-23642 URL: https://issues.apache.org/jira/browse/HIVE-23642 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23641) Fix FindBug issues in hive-hbase-handler
Panagiotis Garefalakis created HIVE-23641: - Summary: Fix FindBug issues in hive-hbase-handler Key: HIVE-23641 URL: https://issues.apache.org/jira/browse/HIVE-23641 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23640) Fix FindBug issues in hive-druid-handler
Panagiotis Garefalakis created HIVE-23640: - Summary: Fix FindBug issues in hive-druid-handler Key: HIVE-23640 URL: https://issues.apache.org/jira/browse/HIVE-23640 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23639) Fix FindBug issues in hive-contrib
Panagiotis Garefalakis created HIVE-23639: - Summary: Fix FindBug issues in hive-contrib Key: HIVE-23639 URL: https://issues.apache.org/jira/browse/HIVE-23639 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23638) Fix FindBug issues in hive-common
Panagiotis Garefalakis created HIVE-23638: - Summary: Fix FindBug issues in hive-common Key: HIVE-23638 URL: https://issues.apache.org/jira/browse/HIVE-23638 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23637) Fix FindBug issues in cli
Panagiotis Garefalakis created HIVE-23637: - Summary: Fix FindBug issues in cli Key: HIVE-23637 URL: https://issues.apache.org/jira/browse/HIVE-23637 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23636) Fix FindBug issues in beeline
Panagiotis Garefalakis created HIVE-23636: - Summary: Fix FindBug issues in beeline Key: HIVE-23636 URL: https://issues.apache.org/jira/browse/HIVE-23636 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23635) Fix FindBug issues in vector-code-gen
Panagiotis Garefalakis created HIVE-23635: - Summary: Fix FindBug issues in vector-code-gen Key: HIVE-23635 URL: https://issues.apache.org/jira/browse/HIVE-23635 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23634) Fix FindBug issues in storage-api accumulo-handler
Panagiotis Garefalakis created HIVE-23634: - Summary: Fix FindBug issues in storage-api accumulo-handler Key: HIVE-23634 URL: https://issues.apache.org/jira/browse/HIVE-23634 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23620) Explore moving to SpotBugs
Panagiotis Garefalakis created HIVE-23620: - Summary: Explore moving to SpotBugs Key: HIVE-23620 URL: https://issues.apache.org/jira/browse/HIVE-23620 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis We may want to switch to [SpotBugs|https://github.com/spotbugs/spotbugs] -- the spiritual successor of FindBugs, carrying on from the point where it left off with the support of its community. SpotBugs is in reality a fork of FindBugs: https://mailman.cs.umd.edu/pipermail/findbugs-discuss/2016-November/004321.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23617) Fix FindBug issues on storage-api
Panagiotis Garefalakis created HIVE-23617: - Summary: Fix FindBug issues on storage-api Key: HIVE-23617 URL: https://issues.apache.org/jira/browse/HIVE-23617 Project: Hive Issue Type: Sub-task Components: storage-api Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23561) FIX Arrow Decimal serialization for VectorRowBatches
Panagiotis Garefalakis created HIVE-23561: - Summary: FIX Arrow Decimal serialization for VectorRowBatches Key: HIVE-23561 URL: https://issues.apache.org/jira/browse/HIVE-23561 Project: Hive Issue Type: Bug Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23554) [LLAP] ReadPipeline support for ColumnVectorBatch with FilterContext
Panagiotis Garefalakis created HIVE-23554: - Summary: [LLAP] ReadPipeline support for ColumnVectorBatch with FilterContext Key: HIVE-23554 URL: https://issues.apache.org/jira/browse/HIVE-23554 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis Currently the readPipeline in LLAP supports consuming ColumnVectorBatches. As each batch can be now tied with a Filter (HIVE-23215) we should update the pipeline to consume BatchWrappers of ColumnVectorBatch and a Filter instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23553) Bump ORC version
Panagiotis Garefalakis created HIVE-23553: - Summary: Bump ORC version Key: HIVE-23553 URL: https://issues.apache.org/jira/browse/HIVE-23553 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis Make sure we are using one of the more recent orc.versions that include row-filtering (ORC-577 , ORC-622) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23537) RecordReader support for row-filtering
Panagiotis Garefalakis created HIVE-23537: - Summary: RecordReader support for row-filtering Key: HIVE-23537 URL: https://issues.apache.org/jira/browse/HIVE-23537 Project: Hive Issue Type: Sub-task Components: llap, Reader Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis ORC-577 enables row-level filtering for the ORC format while HIVE-23167 is aiming to extend the existing compiler logic and push filters further down the pipeline wherever possible. In this jira we extend the HIVE Record readers to utilize the above filtering functionality (similar to what we already do for PPD). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23486) Compiler support for row-level filtering on transactional deletes
Panagiotis Garefalakis created HIVE-23486: - Summary: Compiler support for row-level filtering on transactional deletes Key: HIVE-23486 URL: https://issues.apache.org/jira/browse/HIVE-23486 Project: Hive Issue Type: Improvement Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23475) Track MJ HashTable mem usage
Panagiotis Garefalakis created HIVE-23475: - Summary: Track MJ HashTable mem usage Key: HIVE-23475 URL: https://issues.apache.org/jira/browse/HIVE-23475 Project: Hive Issue Type: Improvement Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23393) LLapInputFormat reader policy for Random IO formats
Panagiotis Garefalakis created HIVE-23393: - Summary: LLapInputFormat reader policy for Random IO formats Key: HIVE-23393 URL: https://issues.apache.org/jira/browse/HIVE-23393 Project: Hive Issue Type: Improvement Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis Extension of HIVE-23158 for LLAPInputFormat -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23375) Track MJ HashTable Load time
Panagiotis Garefalakis created HIVE-23375: - Summary: Track MJ HashTable Load time Key: HIVE-23375 URL: https://issues.apache.org/jira/browse/HIVE-23375 Project: Hive Issue Type: Improvement Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis Introduce TezCounter to track MJ HashTable Load time -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23352) More user friendly StorageAuthorizationProvider log messages
Panagiotis Garefalakis created HIVE-23352: - Summary: More user friendly StorageAuthorizationProvider log messages Key: HIVE-23352 URL: https://issues.apache.org/jira/browse/HIVE-23352 Project: Hive Issue Type: Improvement Components: Security Affects Versions: 4.0.0 Reporter: Panagiotis Garefalakis Currently *StorageBasedAuthorizationProvider* returns messages (like the one below) about data paths even for _External_ tables, where a drop command would just remove metadata. Let's make those messages more user-friendly. {code:java} Permission Denied: User hive can't delete hdfs://XXX.com:8020/tmp/testuser because sticky bit is set on the parent dir and user does not own this file or its parent) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23238) FIX PreemptionQueueComparator edge cases
Panagiotis Garefalakis created HIVE-23238: - Summary: FIX PreemptionQueueComparator edge cases Key: HIVE-23238 URL: https://issues.apache.org/jira/browse/HIVE-23238 Project: Hive Issue Type: Improvement Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis Fix For: llap Properly handle preemption comparator edge cases where tasks are of the same type and have the same number of upstream tasks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23214) Remove skipCorrupt from OrcEncodedDataConsumer
Panagiotis Garefalakis created HIVE-23214: - Summary: Remove skipCorrupt from OrcEncodedDataConsumer Key: HIVE-23214 URL: https://issues.apache.org/jira/browse/HIVE-23214 Project: Hive Issue Type: Improvement Components: llap Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis SkipCorrupt is always the default (false) so there is no reason to pass it around. [https://github.com/apache/hive/blob/3e4f6122c32b1ffa22e1458806ae8ee30e51a41f/llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java#L86 ] If we want to change the default behaviour we could set "orc.skip.corrupt.data" as part of the configuration. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23170) Probe support for ORC DataConsumer
Panagiotis Garefalakis created HIVE-23170: - Summary: Probe support for ORC DataConsumer Key: HIVE-23170 URL: https://issues.apache.org/jira/browse/HIVE-23170 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23169) Probe runtime support for LLAP
Panagiotis Garefalakis created HIVE-23169: - Summary: Probe runtime support for LLAP Key: HIVE-23169 URL: https://issues.apache.org/jira/browse/HIVE-23169 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23168) Implement MJ HashTable contains key functionality
Panagiotis Garefalakis created HIVE-23168: - Summary: Implement MJ HashTable contains key functionality Key: HIVE-23168 URL: https://issues.apache.org/jira/browse/HIVE-23168 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23167) Extend compiler support for Probe static filters
Panagiotis Garefalakis created HIVE-23167: - Summary: Extend compiler support for Probe static filters Key: HIVE-23167 URL: https://issues.apache.org/jira/browse/HIVE-23167 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23166) Protect VGB from flushing too often
Panagiotis Garefalakis created HIVE-23166: - Summary: Protect VGB from flushing too often Key: HIVE-23166 URL: https://issues.apache.org/jira/browse/HIVE-23166 Project: Hive Issue Type: Improvement Components: llap Affects Versions: 4.0.0 Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis The existing flush logic in our VectorGroupByOperator is completely static. It depends on the number of HtEntries (*hive.vectorized.groupby.maxentries*) and the MAX memory threshold (by default 90% of available memory). Assuming that we are not memory constrained, the periodicity of flushing is currently dictated by the static number of entries (1M by default), which can also be misconfigured to a very low value. I am proposing, along with maxHtEntries, to also take into account current memory usage, to avoid flushing too often, as it can hurt operator throughput for particular workloads. -- This message was sent by Atlassian Jira (v8.3.4#803005)
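One way to read the HIVE-23166 proposal is as a flush decision that combines the entry count with current memory usage. The sketch below is only an illustration under assumptions: the class `FlushPolicySketch`, the method `shouldFlush` and the `lowMemoryFraction` knob are hypothetical and do not describe the actual VectorGroupByOperator code.

```java
/** Hypothetical flush policy; not the actual VectorGroupByOperator logic. */
final class FlushPolicySketch {
    private final long maxHtEntries;        // static entry limit, e.g. 1M
    private final double maxMemoryFraction; // hard memory threshold, e.g. 0.9
    private final double lowMemoryFraction; // assumed knob: below this, skip entry-based flushes

    FlushPolicySketch(long maxHtEntries, double maxMemoryFraction, double lowMemoryFraction) {
        this.maxHtEntries = maxHtEntries;
        this.maxMemoryFraction = maxMemoryFraction;
        this.lowMemoryFraction = lowMemoryFraction;
    }

    boolean shouldFlush(long htEntries, long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        double usage = (double) usedBytes / maxBytes;
        if (usage >= maxMemoryFraction) {
            return true; // memory pressure always forces a flush
        }
        // The entry limit alone is not enough: memory must also be reasonably used.
        return htEntries >= maxHtEntries && usage >= lowMemoryFraction;
    }

    public static void main(String[] args) {
        FlushPolicySketch policy = new FlushPolicySketch(1_000_000L, 0.9, 0.5);
        // Many entries but only 10% memory used: the entry-based flush is skipped.
        System.out.println(policy.shouldFlush(2_000_000L, 100L, 1_000L));
    }
}
```

With this shape, a misconfigured, very low entry limit no longer triggers a flush on its own while memory is mostly free.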
[jira] [Created] (HIVE-23158) Optimize S3A recordReader policy for Random IO formats
Panagiotis Garefalakis created HIVE-23158: - Summary: Optimize S3A recordReader policy for Random IO formats Key: HIVE-23158 URL: https://issues.apache.org/jira/browse/HIVE-23158 Project: Hive Issue Type: Bug Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis The S3A filesystem client (inherited from Hadoop) supports the notion of input policies. These policies tune the behaviour of the HTTP requests that are used for reading different file types such as TEXT or ORC. Formats such as ORC and Parquet do a lot of seek operations, thus there is an optimized RANDOM mode that reads files only partially instead of fully (default). I am suggesting adding some extra logic as part of HiveInputFormat to make sure we optimize for random IO when data is stored on S3A using formats such as ORC or Parquet. -- This message was sent by Atlassian Jira (v8.3.4#803005)
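As a rough illustration of the HIVE-23158 idea, the selection logic could look like the sketch below. The property name `fs.s3a.experimental.input.fadvise` and its `random` value come from the Hadoop S3A client. The class `S3aInputPolicySketch`, the method `policyFor` and the way the file format is passed in are assumptions made for this example, not the actual HiveInputFormat change.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical policy selection; not the actual HiveInputFormat code. */
final class S3aInputPolicySketch {
    // Hadoop S3A property controlling the input (fadvise) policy.
    static final String FADVISE_KEY = "fs.s3a.experimental.input.fadvise";

    // Formats assumed to be seek-heavy, as named in the issue.
    private static final Set<String> RANDOM_IO_FORMATS = Set.of("orc", "parquet");

    /** Returns the policy value to set, or empty to leave the configuration untouched. */
    static Optional<String> policyFor(String uriScheme, String fileFormat) {
        boolean onS3a = "s3a".equalsIgnoreCase(uriScheme);
        boolean randomIo = fileFormat != null
            && RANDOM_IO_FORMATS.contains(fileFormat.toLowerCase(Locale.ROOT));
        return onS3a && randomIo ? Optional.of("random") : Optional.empty();
    }

    public static void main(String[] args) {
        policyFor("s3a", "ORC")
            .ifPresent(value -> System.out.println(FADVISE_KEY + "=" + value));
    }
}
```

Returning an empty result for other schemes or formats leaves any policy the user already configured in place.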
[jira] [Created] (HIVE-23036) Incorrect ORC PPD eval with sub-millisecond timestamps
Panagiotis Garefalakis created HIVE-23036: - Summary: Incorrect ORC PPD eval with sub-millisecond timestamps Key: HIVE-23036 URL: https://issues.apache.org/jira/browse/HIVE-23036 Project: Hive Issue Type: Bug Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis See [ORC-611|https://issues.apache.org/jira/browse/ORC-611] for more details. ORC stores timestamps with: - nanosecond precision for the data itself - millisecond precision for min-max statistics As both min and max are rounded to the same value, timestamps with ns precision will not pass the PPD evaluator. {code:java} create table tsstat (ts timestamp) stored as orc; insert into tsstat values ("1970-01-01 00:00:00.0005"); select * from tsstat where ts = "1970-01-01 00:00:00.0005"; -- returned 0 rows{code} ORC PPD evaluation currently happens as part of OrcInputFormat [https://github.com/apache/hive/blob/7e39a2c13711f9377c9ce1edb4224880421b1ea5/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L2314] -- This message was sent by Atlassian Jira (v8.3.4#803005)
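The precision mismatch in HIVE-23036 can be reproduced outside of ORC with plain `java.time`. The sketch below assumes the statistics simply drop the sub-millisecond part (truncation is used here for illustration). `TimestampStatsSketch`, `toStatPrecision` and `mightContain` are hypothetical names, and the second check shows only one possible way to compare, not the fix that was applied.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

/** Hypothetical min/max check; not ORC's or Hive's actual PPD evaluator. */
final class TimestampStatsSketch {

    /** Assumed behaviour: statistics keep millisecond precision only. */
    static LocalDateTime toStatPrecision(LocalDateTime value) {
        return value.truncatedTo(ChronoUnit.MILLIS);
    }

    /** True when the literal falls inside the [min, max] range of the statistics. */
    static boolean mightContain(LocalDateTime min, LocalDateTime max, LocalDateTime literal) {
        return !literal.isBefore(min) && !literal.isAfter(max);
    }

    public static void main(String[] args) {
        LocalDateTime value = LocalDateTime.parse("1970-01-01T00:00:00.0005");
        LocalDateTime stat = toStatPrecision(value); // min == max == 00:00:00.000

        // Full-precision literal against millisecond statistics: the row group is skipped.
        System.out.println(mightContain(stat, stat, value));
        // Comparing at the statistics' precision keeps the row group (illustrative only).
        System.out.println(mightContain(stat, stat, toStatPrecision(value)));
    }
}
```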
[jira] [Created] (HIVE-23006) Compiler support for Probe MapJoin
Panagiotis Garefalakis created HIVE-23006: - Summary: Compiler support for Probe MapJoin Key: HIVE-23006 URL: https://issues.apache.org/jira/browse/HIVE-23006 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22959) Extend storage-api to expose FilterContext
Panagiotis Garefalakis created HIVE-22959: - Summary: Extend storage-api to expose FilterContext Key: HIVE-22959 URL: https://issues.apache.org/jira/browse/HIVE-22959 Project: Hive Issue Type: Sub-task Components: storage-api Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis To enable row-level filtering at the ORC level (ORC-577) or, as an extension, ProbeDecode MapJoin (HIVE-22731), we need a common context class that will hold all the needed information for the filter. I propose this class to be part of the storage-api – similar to the VectorizedRowBatch class – and hold the information below: * A boolean variable showing if the filter is enabled * An int array storing the row Ids that are actually selected (passing the filter) * An int variable storing the number of rows that passed the filter -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22958) Extend storage-api to expose FilterContext
Panagiotis Garefalakis created HIVE-22958: - Summary: Extend storage-api to expose FilterContext Key: HIVE-22958 URL: https://issues.apache.org/jira/browse/HIVE-22958 Project: Hive Issue Type: New Feature Components: storage-api Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis To enable row-level filtering at the ORC level (ORC-577) or, as an extension, ProbeDecode MapJoin (HIVE-22731), we need a common context class that will hold all the needed information for the filter. I propose this class to be part of the storage-api – similar to the VectorizedRowBatch class – and hold the information below: * A boolean variable showing if the filter is enabled * An int array storing the row Ids that are actually selected (passing the filter) * An int variable storing the number of rows that passed the filter -- This message was sent by Atlassian Jira (v8.3.4#803005)
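The three pieces of state listed for the FilterContext could be held in a class shaped roughly like the one below. This is a minimal sketch: the class name `FilterContextSketch` and its method names are hypothetical and are not the API that ended up in storage-api.

```java
import java.util.Arrays;

/** Hypothetical per-batch filter context; not the actual storage-api class. */
final class FilterContextSketch {
    private boolean filterEnabled;       // whether a filter applies to this batch
    private int[] selected = new int[0]; // row ids that passed the filter
    private int selectedSize;            // number of rows that passed

    /** Records the first {@code count} entries of {@code rowIds} as the selected rows. */
    void setFilter(int[] rowIds, int count) {
        if (count < 0 || count > rowIds.length) {
            throw new IllegalArgumentException("count out of range: " + count);
        }
        filterEnabled = true;
        selected = Arrays.copyOf(rowIds, count);
        selectedSize = count;
    }

    /** Clears the filter so the context can be reused for the next batch. */
    void reset() {
        filterEnabled = false;
        selected = new int[0];
        selectedSize = 0;
    }

    boolean isFilterEnabled() { return filterEnabled; }
    int[] getSelected() { return selected; }
    int getSelectedSize() { return selectedSize; }

    public static void main(String[] args) {
        FilterContextSketch ctx = new FilterContextSketch();
        // Rows 0, 3 and 7 passed; trailing slots of the scratch array are ignored.
        ctx.setFilter(new int[] {0, 3, 7, 0, 0}, 3);
        System.out.println(ctx.getSelectedSize() + " rows: " + Arrays.toString(ctx.getSelected()));
    }
}
```

Keeping an explicit count next to the array mirrors how a reusable scratch array is usually only partially filled for each batch.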
[jira] [Created] (HIVE-22731) Use MapJoin hashtables for row level filtering
Panagiotis Garefalakis created HIVE-22731: - Summary: Use MapJoin hashtables for row level filtering Key: HIVE-22731 URL: https://issues.apache.org/jira/browse/HIVE-22731 Project: Hive Issue Type: Bug Components: Hive, llap Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis Attachments: decode_time_bars.pdf Currently, RecordReaders such as ORC support filtering at coarser-grained levels, namely: File, Stripe (64 to 256 MB), and Row group (10k rows) level. They only filter sets of rows if they can guarantee that none of the rows can pass a filter (usually given as a searchable argument). However, a significant amount of time can be spent decoding rows with multiple columns that are not even used in the final result. See the figure, where original is what happens today and in LazyDecode we skip decoding rows that do not match the key. To enable more fine-grained filtering in the particular case of a MapJoin, we could utilize the key HashTable created from the smaller table to skip deserializing row columns of the larger table that do not match any key, and thus save CPU time. This Jira investigates this direction. -- This message was sent by Atlassian Jira (v8.3.4#803005)
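The probe idea in HIVE-22731 can be sketched as follows: decode only the key column of the big table, look each key up in the small table's hash table, and decode the remaining columns only for the rows that matched. The example stands in a plain `HashSet<Long>` for the MapJoin hash table. `ProbeFilterSketch` and `selectRows` are hypothetical names, not Hive's implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical probe filter; a HashSet stands in for the MapJoin hash table. */
final class ProbeFilterSketch {

    /** Returns the ids of rows whose key is present in the small-table key set. */
    static int[] selectRows(long[] keyColumn, Set<Long> smallTableKeys) {
        int[] selected = new int[keyColumn.length];
        int count = 0;
        for (int row = 0; row < keyColumn.length; row++) {
            if (smallTableKeys.contains(keyColumn[row])) {
                selected[count++] = row;
            }
        }
        // Only these rows need their remaining columns decoded.
        return Arrays.copyOf(selected, count);
    }

    public static void main(String[] args) {
        Set<Long> smallTableKeys = new HashSet<>(Arrays.asList(10L, 30L));
        long[] bigTableKeys = {10L, 20L, 30L, 40L, 10L};
        System.out.println(Arrays.toString(selectRows(bigTableKeys, smallTableKeys)));
    }
}
```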
[jira] [Created] (HIVE-22505) ClassCastException caused by wrong Vectorized operator selection
Panagiotis Garefalakis created HIVE-22505: - Summary: ClassCastException caused by wrong Vectorized operator selection Key: HIVE-22505 URL: https://issues.apache.org/jira/browse/HIVE-22505 Project: Hive Issue Type: Bug Components: Vectorization Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis Attachments: query_error.out, query_vector_explain.out VectorMapJoinOuterFilteredOperator does not currently support full outer joins, but with the current Vectorizer logic it can be selected when there is a filter involved. This can make queries fail with a ClassCastException when their data and metadata in the VectorMapJoinOuterFilteredOperator do not match. The attached query demonstrates the issue and the attached log shows the java.lang.ClassCastException. -- This message was sent by Atlassian Jira (v8.3.4#803005)