[jira] [Created] (HIVE-23218) LlapRecordReader queue limit computation is not optimal

2020-04-15 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-23218:
---

 Summary: LlapRecordReader queue limit computation is not optimal
 Key: HIVE-23218
 URL: https://issues.apache.org/jira/browse/HIVE-23218
 Project: Hive
  Issue Type: Improvement
  Components: llap
Reporter: Rajesh Balamohan


After decoding in {{OrcEncodedDataConsumer::decodeBatch}}, data is enqueued
into a queue in LlapRecordReader. The limit for this queue is determined in
LlapRecordReader. If the limit is too small, enqueueing ends up waiting for
100 ms until the queue has capacity.

https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java#L168

https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java#L590

https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java#L260

{{determineQueueLimit}} takes all columns into consideration even though only
a few columns are needed for the projection. Here is an example.

{noformat}

create table test_acid(
  a1 string, a2 string, a3 string, a4 string, a5 string,
  a6 string, a7 string, a8 string, a9 string, a10 string,
  a11 string, a22 string, a33 string, a44 string, a55 string,
  a66 string, a77 string, a88 string, a99 string, a100 string,
  a111 decimal(25,2), a222 decimal(25,2), a333 decimal(25,2),
  a444 decimal(25,2), a555 decimal(25,2), a666 decimal(25,2),
  a777 decimal(25,2), a888 decimal(25,2), a999 decimal(25,2),
  a1000 decimal(25,2)
) stored as orc;

insert into table test_acid values 
("a1","a2","a3","a4","a5","a6","a7","a8","a9","a10",
"a11","a22","a33","a44","a55","a66","a77","a88","a99","a100",
10.23,10.23,10.23,10.23,10.23,10.23,10.23,10.23,10.23,10.23
);

select a44, count(*) from test_acid where a44 like "a4%" group by a44 order by a44;

{noformat}

For this query, the predicted queue size would be "138" because it takes all
fields into account instead of just 2. This causes unwanted delays in adding
data to the queue.
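
The idea can be sketched roughly as follows. This is not the code in
LlapRecordReader: the class name, the per-column weights, the budget, and the
bounds are all hypothetical, chosen only to show that a limit derived from a
fixed budget divided by the total column weight grows when only the projected
columns are counted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; names and weights are illustrative, not Hive's. */
final class QueueLimitSketch {

    /** Made-up relative cost of buffering one batch of a column. */
    enum ColumnKind {
        STRING(10), DECIMAL(8), OTHER(1);

        final int weight;

        ColumnKind(int weight) {
            this.weight = weight;
        }
    }

    /** Divides the budget by the total column weight, clamped to [minLimit, maxLimit]. */
    static int queueLimit(long budget, int minLimit, int maxLimit, List<ColumnKind> columns) {
        long totalWeight = 0;
        for (ColumnKind column : columns) {
            totalWeight += column.weight;
        }
        if (totalWeight == 0) {
            return maxLimit; // nothing to buffer, so use the upper bound
        }
        long raw = budget / totalWeight;
        return (int) Math.max(minLimit, Math.min(maxLimit, raw));
    }

    /** The 20 string and 10 decimal columns of the example table. */
    static List<ColumnKind> exampleTableColumns() {
        List<ColumnKind> columns = new ArrayList<>();
        columns.addAll(Collections.nCopies(20, ColumnKind.STRING));
        columns.addAll(Collections.nCopies(10, ColumnKind.DECIMAL));
        return columns;
    }
}
```

With these made-up numbers, a budget of 100,000 units and bounds of 50 and
10,000 give a limit of 357 for all 30 columns, while a single projected string
column reaches the upper bound of 10,000. The values are illustrative only and
are unrelated to the "138" reported above.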




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23217) Move q tests to TestMiniLlapLocal from TestCliDriver where the output is unchanged

2020-04-15 Thread Miklos Gergely (Jira)
Miklos Gergely created HIVE-23217:
-

 Summary: Move q tests to TestMiniLlapLocal from TestCliDriver 
where the output is unchanged
 Key: HIVE-23217
 URL: https://issues.apache.org/jira/browse/HIVE-23217
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Reporter: Miklos Gergely
Assignee: Miklos Gergely








[jira] [Created] (HIVE-23216) Add new api as replacement of get_partitions_by_expr to return PartitionSpec instead of Partitions

2020-04-15 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-23216:
--

 Summary: Add new api as replacement of get_partitions_by_expr to 
return PartitionSpec instead of Partitions
 Key: HIVE-23216
 URL: https://issues.apache.org/jira/browse/HIVE-23216
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 4.0.0
Reporter: Vineet Garg
Assignee: Vineet Garg








[jira] [Created] (HIVE-23215) Make FilterContext and MutableFilterContext interfaces

2020-04-15 Thread Owen O'Malley (Jira)
Owen O'Malley created HIVE-23215:


 Summary: Make FilterContext and MutableFilterContext interfaces
 Key: HIVE-23215
 URL: https://issues.apache.org/jira/browse/HIVE-23215
 Project: Hive
  Issue Type: Bug
  Components: storage-api
Reporter: Owen O'Malley
Assignee: Owen O'Malley


HIVE-22959 introduced FilterContext to support ORC-577. The duplication of 
fields between the FilterContext and VectorizedRowBatch seems likely to cause 
user confusion. This patch makes them interfaces that VectorizedRowBatch 
implements.

Thus, there is a single copy of the data and no need to copy them back and 
forth. LLAP can make its own implementation of the interfaces if it doesn't 
want to use VectorizedRowBatch.
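
A minimal sketch of the shape this describes, assuming simplified stand-in
types: FilterContextSketch, MutableFilterContextSketch, and RowBatchSketch are
hypothetical names, and their methods are illustrative rather than the
storage-api signatures. The point is only that a reader and a writer can share
one object, so the selection state exists once.

```java
/** Hypothetical read-only view of the selection state. */
interface FilterContextSketch {
    boolean isSelectedInUse();

    int[] getSelected();

    int getSelectedSize();
}

/** Hypothetical writable view; extends the read-only one. */
interface MutableFilterContextSketch extends FilterContextSketch {
    void setFilterContext(boolean selectedInUse, int[] selected, int selectedSize);
}

/** Stand-in for a row batch that owns the single copy of the selection state. */
final class RowBatchSketch implements MutableFilterContextSketch {
    private boolean selectedInUse;
    private int[] selected = new int[0];
    private int selectedSize;

    @Override
    public boolean isSelectedInUse() {
        return selectedInUse;
    }

    @Override
    public int[] getSelected() {
        return selected;
    }

    @Override
    public int getSelectedSize() {
        return selectedSize;
    }

    @Override
    public void setFilterContext(boolean selectedInUse, int[] selected, int selectedSize) {
        // The batch stores the state directly; nothing is copied to a second object.
        this.selectedInUse = selectedInUse;
        this.selected = selected;
        this.selectedSize = selectedSize;
    }
}
```

A filter would be handed the batch as a MutableFilterContextSketch and a
consumer as a FilterContextSketch; both see the same fields.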





Re: Remove REGEX Column Specification

2020-04-15 Thread David Mollitor
I've got all tests passing on this.

Are there other questions?

Is anyone willing to +1 ?

Thanks.

On Tue, Apr 14, 2020, 12:28 PM David Mollitor  wrote:

> Hey Zoltan,
>
> Thanks for the feedback and for sharing HIVE-16496.
>
> I think HIVE-16496 is a better approach because it allows for the standard
> SQL behavior of object identifiers, but the SQL syntax is expanded (instead
> of overloaded) to provide this feature.
>
> Also, if a user would like to do some sort of regex, they can query the
> information_schema (if/when Hive gets that).
>
> Also, I just re-read my previous email and I do apologize, I provided the
> wrong jira.  The correct one for removal is:
>
> https://issues.apache.org/jira/browse/HIVE-23176
>
> Thanks.
>
>
>
> David
>
> On Tue, Apr 14, 2020 at 12:16 PM Zoltan Haindrich  wrote:
>
>> Hey,
>>
>> I don't want to protect this feature - but I think it could be useful;
>> probably it would be ok to remove it but we should provide something else
>> instead - I think this is the only way to "exclude" some specific columns
>> from the output - without listing all the columns.
>>
>> How much do users actually use this feature?
>>
>> We had a somewhat related discussion a few years ago:
>> https://issues.apache.org/jira/browse/HIVE-16496
>>
>> cheers,
>> Zoltan
>>
>> On 4/13/20 3:56 PM, David Mollitor wrote:
>> > Hello Gang,
>> >
>> > I've been tracking a lot of issues recently regarding qualified table
>> > names, table names using back ticks, and other similar circumstances.
>> >
>> > I've looked into trying to address some of these and noted that these
>> > issues go way back and go all the way down to the core of Hive.
>> >
>> > To start with, I wanted to use the ANTLR grammar to address some of these
>> > issues and to standardize behavior across all queries.  For example, there
>> > is currently a patch that disallows table names from having a 'dot' in the
>> > name.  I'm not 100% sure it applies to all queries, so I wanted to codify
>> > this restriction in the parser grammar.  So it got me looking at the
>> > grammar.
>> >
>> > In parallel, I also tried to build a supplemental parser in Java for
>> > parsing table names (HIVE-23150) and I was hitting some weird, and
>> > confusing, edge cases bubbling up from the parser.  I eventually traced it
>> > back to the fact that there are a lot of weird rules around table names in
>> > the grammar including something called "REGEX Column Specification."
>> >
>> > This feature is problematic as it blindly labels most table names as being
>> > a regex.  It really should only apply to column names, but the grammar
>> > defines a table name as also possibly being a regex. There is a lot of
>> > ambiguity because a table named "a" could be a literal value or a legal
>> > regex.  When a table name is defined as a regex, a different code path is
>> > taken from when a table name is considered to be a literal value. Where I
>> > first saw this issue was in a qtest where a table name `s/c` was producing
>> > a different result than a table named `s+c`.
>> >
>> > This regex feature is not something I've seen in MySQL or Postgres.  In
>> > MySQL, any table name surrounded with a back tick can be just about any
>> > UTF-8 character, so it's not really feasible to tell, without some kind of
>> > SQL hint, whether this table name is a regex or a literal value.
>> >
>> > This feature adds a lot of ambiguity and complexity, it is not supported by
>> > other major RDBMS, and it adds only very minor benefit.  I also hope to
>> > move Hive in a direction of fully supporting UTF-8.
>> >
>> > I have put a patch up to remove it:
>> > https://issues.apache.org/jira/browse/HIVE-23183
>> >
>> >
>> > References:
>> >
>> > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-REGEXColumnSpecification
>> >
>> >
>> > https://dev.mysql.com/doc/refman/8.0/en/identifiers.html
>> >
>> >
>> > Thanks,
>> > David
>> >
>>
>
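
The ambiguity described in the thread can be illustrated with plain
java.util.regex. This sketch does not reproduce Hive's parser or its code
paths; RegexAmbiguitySketch and its methods are hypothetical helpers that only
show how the same identifier text behaves when read as a regex versus as a
literal.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; shows regex versus literal interpretation of a name. */
final class RegexAmbiguitySketch {

    /** Treats the identifier as a regular expression that must match the whole candidate. */
    static boolean matchesAsRegex(String identifier, String candidate) {
        return Pattern.matches(identifier, candidate);
    }

    /** Treats the identifier as a literal name. */
    static boolean matchesAsLiteral(String identifier, String candidate) {
        return identifier.equals(candidate);
    }
}
```

Read as a regex, "a" and "s/c" still match themselves, so the two
interpretations agree. "s+c" does not: as a regex it matches "sc" or "ssc" but
not the literal text "s+c", because '+' is a quantifier.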


[jira] [Created] (HIVE-23214) Remove skipCorrupt from OrcEncodedDataConsumer

2020-04-15 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-23214:
-

 Summary: Remove skipCorrupt from OrcEncodedDataConsumer
 Key: HIVE-23214
 URL: https://issues.apache.org/jira/browse/HIVE-23214
 Project: Hive
  Issue Type: Improvement
  Components: llap
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


SkipCorrupt is always the default (false) so there is no reason to pass it 
around.

[https://github.com/apache/hive/blob/3e4f6122c32b1ffa22e1458806ae8ee30e51a41f/llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java#L86]

If we want to change the default behaviour we could set "orc.skip.corrupt.data" 
as part of the configuration.





[jira] [Created] (HIVE-23213) HiveStrictManagedMigration should handle legacy Kudu tables

2020-04-15 Thread Ádám Szita (Jira)
Ádám Szita created HIVE-23213:
-

 Summary: HiveStrictManagedMigration should handle legacy Kudu 
tables
 Key: HIVE-23213
 URL: https://issues.apache.org/jira/browse/HIVE-23213
 Project: Hive
  Issue Type: Improvement
Reporter: Ádám Szita
Assignee: Ádám Szita
 Attachments: HIVE-23213.0.patch

Legacy Kudu-backed tables might have
"com.cloudera.kudu.hive.KuduStorageHandler" set as their storage handler. This
needs to be upgraded to org.apache.hadoop.hive.kudu.KuduStorageHandler, as
that's the actual class shipped with Hive 3.
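
A minimal sketch of the rewrite, assuming the handler class is stored in the
table parameters under the key "storage_handler". KuduHandlerUpgradeSketch is
a hypothetical helper working on a plain map, not the code in
HiveStrictManagedMigration or in the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; operates on a plain map instead of a metastore Table. */
final class KuduHandlerUpgradeSketch {
    // Assumed table-parameter key that holds the storage handler class name.
    static final String STORAGE_HANDLER_KEY = "storage_handler";
    static final String LEGACY_HANDLER = "com.cloudera.kudu.hive.KuduStorageHandler";
    static final String HIVE_HANDLER = "org.apache.hadoop.hive.kudu.KuduStorageHandler";

    /** Returns a copy of the parameters with the legacy handler class replaced. */
    static Map<String, String> upgrade(Map<String, String> tableParams) {
        Map<String, String> result = new HashMap<>(tableParams);
        if (LEGACY_HANDLER.equals(result.get(STORAGE_HANDLER_KEY))) {
            result.put(STORAGE_HANDLER_KEY, HIVE_HANDLER);
        }
        return result;
    }
}
```

Tables with any other handler, or with none, come back unchanged.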





[DISCUSS] Enable Github Autolink feature

2020-04-15 Thread Panos Garefalakis
Hello all,

GitHub's autolink feature recognizes patterns like `HIVE-XXX` in commits or
PRs and can automatically link to an issue tracking system like
https://issues.apache.org/jira/browse/HIVE-XXX.
It's already being used in a variety of open source projects and can save
us a bunch of time navigating through issues and commits.

Shall we enable this feature?

Cheers,
Panagiotis


[jira] [Created] (HIVE-23212) SemanticAnalyzer::getStagingDirectoryPathname should check for encryption zone only when needed

2020-04-15 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-23212:
---

 Summary: SemanticAnalyzer::getStagingDirectoryPathname should 
check for encryption zone only when needed
 Key: HIVE-23212
 URL: https://issues.apache.org/jira/browse/HIVE-23212
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Rajesh Balamohan


[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L2572]

When the cluster does not have encryption zones configured, this ends up
making 2 unnecessary calls to the NN. It would be good to guard it with a
config, or to check for the KMS config from HDFS, and invoke it only when
needed.
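
One way the guard could look, as a sketch under stated assumptions: the
configuration is modeled as a plain map, the NameNode lookup as a
BooleanSupplier, and the key name "hadoop.security.key.provider.path" is
assumed to be the relevant KMS setting and should be verified against the
Hadoop version in use. EncryptionCheckSketch is hypothetical and is not
SemanticAnalyzer code.

```java
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical guard; skips the remote lookup when no key provider is configured. */
final class EncryptionCheckSketch {
    // Assumed name of the key-provider (KMS) setting; verify for your Hadoop version.
    static final String KEY_PROVIDER_KEY = "hadoop.security.key.provider.path";

    static boolean isPathEncrypted(Map<String, String> conf, BooleanSupplier nameNodeLookup) {
        String provider = conf.get(KEY_PROVIDER_KEY);
        if (provider == null || provider.trim().isEmpty()) {
            // Assumption: without a key provider there can be no encryption zones,
            // so the NameNode does not need to be asked.
            return false;
        }
        return nameNodeLookup.getAsBoolean();
    }
}
```

The lookup is passed in as a callback so that it is invoked only when the
guard lets it through.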





[jira] [Created] (HIVE-23211) Fix metastore schema differences between init scripts, and upgrade scripts

2020-04-15 Thread Barnabas Maidics (Jira)
Barnabas Maidics created HIVE-23211:
---

 Summary: Fix metastore schema differences between init scripts, 
and upgrade scripts
 Key: HIVE-23211
 URL: https://issues.apache.org/jira/browse/HIVE-23211
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Reporter: Barnabas Maidics
Assignee: Barnabas Maidics


The metastore schema differs (character encoding, defaults, etc.) depending
on whether we initialize it using the init scripts or upgrade it using the
upgrade scripts. The schema should be identical.





[jira] [Created] (HIVE-23210) Fix shortestjobcomparator when jobs submitted have 1 task their vertices

2020-04-15 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-23210:
---

 Summary: Fix shortestjobcomparator when jobs submitted have 1 task 
their vertices
 Key: HIVE-23210
 URL: https://issues.apache.org/jira/browse/HIVE-23210
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


In latency-sensitive queries, many jobs can have vertices with 1 task.
Currently, shortestjobcomparator does not work correctly for such jobs and
returns tasks in random order.

[https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/comparator/ShortestJobFirstComparator.java#L51]

This delays the job runtime. I will attach a simple test case shortly.
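
To illustrate the kind of tie described here, the sketch below compares tasks
by a single key and then adds a deterministic tie-breaker. It is not
ShortestJobFirstComparator and not necessarily the fix for this issue: the
Task class, its fields, and the choice of start time as the tie-breaker are
assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a comparator tie and one possible tie-breaker. */
final class TieBreakSketch {

    static final class Task {
        final String name;
        final int pendingTasks;
        final long startTimeMs;

        Task(String name, int pendingTasks, long startTimeMs) {
            this.name = name;
            this.pendingTasks = pendingTasks;
            this.startTimeMs = startTimeMs;
        }
    }

    /** Every job with one task per vertex compares as equal under this key. */
    static final Comparator<Task> BY_PENDING_ONLY =
        Comparator.comparingInt((Task t) -> t.pendingTasks);

    /** Same key, then the earlier start time first (an assumed tie-breaker). */
    static final Comparator<Task> WITH_TIE_BREAK =
        BY_PENDING_ONLY.thenComparingLong(t -> t.startTimeMs);

    /** Returns the task names in the order the comparator produces. */
    static List<String> sortedNames(List<Task> tasks, Comparator<Task> order) {
        List<Task> copy = new ArrayList<>(tasks);
        copy.sort(order);
        List<String> names = new ArrayList<>();
        for (Task task : copy) {
            names.add(task.name);
        }
        return names;
    }
}
```

When all tasks report one pending task, BY_PENDING_ONLY returns 0 for every
pair, so any ordering among them is arbitrary from the comparator's point of
view; WITH_TIE_BREAK orders them by start time instead.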





[jira] [Created] (HIVE-23209) ptest2 compilation failure after HIVE-21603

2020-04-15 Thread László Bodor (Jira)
László Bodor created HIVE-23209:
---

 Summary: ptest2 compilation failure after HIVE-21603
 Key: HIVE-23209
 URL: https://issues.apache.org/jira/browse/HIVE-23209
 Project: Hive
  Issue Type: Sub-task
Reporter: László Bodor





