Re: Review Request 68099: SerDe to support Teradata Binary Format

2018-08-06 Thread Carl Steinbach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68099/#review206929
---



Please note that while I have not commented on every occurrence of the 
following issues, I would still like them all to be fixed:
 * Unnecessary 'else' clauses
 * Unnecessary uses of 'this'
 * RuntimeExceptions which should be replaced with checked exceptions.
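As an illustrative sketch (not code from the patch; all names here are invented), the three patterns look like this after being fixed:

```java
// Hypothetical examples of the three review points above.
public class ReviewExamples {

    // Unnecessary 'else': after an early return, the else branch is redundant.
    static String status(int code) {
        if (code == 0) {
            return "ok";
        }
        return "error";
    }

    private final int limit;

    ReviewExamples(int limit) {
        this.limit = limit;   // 'this' is required here to disambiguate the field
    }

    int limit() {
        return limit;         // ...but unnecessary here, so it is omitted
    }

    // Checked exception instead of RuntimeException: callers must handle it.
    static int parseLength(String s) throws java.io.IOException {
        try {
            return Integer.parseInt(s.trim());
        } catch (NumberFormatException e) {
            throw new java.io.IOException("Invalid record length: " + s, e);
        }
    }
}
```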


contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryFileInputFormat.java
Lines 1 (patched)


I think this code and the other files in this patch belong in the serde 
module. Please move the code to 
serde/src/java/org/apache/hadoop/hive/serde2/teradata



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryFileOutputFormat.java
Lines 71 (patched)


This should be a static final class variable, i.e.:

static final byte RECORD_END_BYTE = ...



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 74 (patched)


Please avoid unnecessary uses of "this", both in this file and others in 
the patch.



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 89 (patched)


Change message to "Input file is compressed. Using compression codec %s"



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 92 (patched)


Please remove or change message to "The input file is not compressed".



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 116 (patched)


Statically import String.format() in order to avoid constantly using the 
"String." prefix.
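For example (illustrative only, not the patch's actual code):

```java
// With the static import, format(...) replaces String.format(...) everywhere.
import static java.lang.String.format;

public class LogMessages {
    static String compressedMessage(String codec) {
        return format("Input file is compressed. Using compression codec %s", codec);
    }
}
```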



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 155 (patched)


Magic constants (e.g. "0x0a") should be defined in one place (e.g. 
TeradataConstants.java) as a static final variable.
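A sketch of such a constants holder follows; the class name matches the reviewer's suggestion, but the member name and value are illustrative assumptions:

```java
// Shared constants class so magic values are defined in exactly one place.
public final class TeradataConstants {
    private TeradataConstants() { }  // non-instantiable holder

    // Record terminator byte in Teradata binary files (0x0a = '\n').
    public static final byte RECORD_END_BYTE = 0x0a;
}
```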



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 197 (patched)


Remove



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 211 (patched)


Please make this more readable by replacing the ?: operator with equivalent 
if () {} else {} code.
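For example, with hypothetical variable names:

```java
public class SplitMath {
    // Before: long end = (len < 0) ? Long.MAX_VALUE : start + len;
    // After, in the if/else form the review asks for:
    static long splitEnd(long start, long len) {
        long end;
        if (len < 0) {
            end = Long.MAX_VALUE;
        } else {
            end = start + len;
        }
        return end;
    }
}
```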



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataInputStream.java
Lines 101 (patched)


Unnecessary else clause.



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataInputStream.java
Lines 122 (patched)


This else clause is unnecessary.



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java
Lines 105 (patched)


This else clause is unnecessary if you explicitly return from the previous 
block.



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java
Lines 148 (patched)


Consider breaking this into multiple lines for improved readability:

int toWrite = date.get().getYear() * 10000 +
  date.get().getMonth() * 100 +
  ...
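
Spelled out fully, the computation might look like the following. This assumes Teradata's documented DATE encoding of (year - 1900) * 10000 + month * 100 + day; the patch's exact expression may differ:

```java
public class TeradataDate {
    // Assumed Teradata DATE encoding: (year - 1900) * 10000 + month * 100 + day.
    static int encode(int year, int month, int day) {
        int toWrite = (year - 1900) * 10000
            + month * 100
            + day;
        return toWrite;
    }
}
```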



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java
Lines 178 (patched)


Add explicit return and remove unnecessary else clause.



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java
Lines 184 (patched)


Instead of logging this info separately, I think it would make more sense 
to include this in the exception message.



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java
Lines 191 (patched)


Unnecessary else clause.



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinarySerde.java
Lines 190 (patched)


Is this worth logging? If so, consider changing the log level to DEBUG.




HIVE-20220: Incorrect result when hive.groupby.skewindata is enabled

2018-08-06 Thread Ganesha Shreedhara
Hi Team,

I found a corner case bug with *hive.groupby.skewindata* configuration
parameter as explained in the following jira.

https://issues.apache.org/jira/browse/HIVE-20220

I need some help in reviewing the fix.

RB request: https://reviews.apache.org/r/68121/


Thanks,
Ganesha


[jira] [Created] (HIVE-20328) Reenable: TestMiniDruidCliDriver

2018-08-06 Thread Matt McCline (JIRA)
Matt McCline created HIVE-20328:
---

 Summary: Reenable: TestMiniDruidCliDriver
 Key: HIVE-20328
 URL: https://issues.apache.org/jira/browse/HIVE-20328
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: slim bouguerra


Reenable tests disabled in HIVE-20322.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20327) Compactor should gracefully handle 0 length files and invalid orc files

2018-08-06 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20327:
-

 Summary: Compactor should gracefully handle 0 length files and 
invalid orc files
 Key: HIVE-20327
 URL: https://issues.apache.org/jira/browse/HIVE-20327
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 2.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


Older versions of Streaming API did not handle interrupts well and could leave 
0-length ORC files behind which cannot be read.

These should be just skipped.

Other cases where an ORC Reader cannot be created:
1. regular write (1 txn delta) where the client died and didn't properly close 
the file - this delta should be aborted and never read
2. streaming ingest write (delta_x_y, x < y). There should always be a side 
file if the file was not closed properly (though it may still indicate that 
the length is 0).


If we check these cases and still can't create a reader, the file should not be 
silently skipped, since the system thinks it contains at least some committed 
data but the file is corrupted (and the side file doesn't point at a valid 
footer). We should never be in this situation, so we should throw, allowing the 
end user to try manual intervention (where the only option may be deleting the 
file).
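The policy described above could be sketched roughly as follows; the names and the decision order are assumptions for illustration, not Hive's actual compactor API:

```java
public class CompactorFilePolicy {
    enum Action { READ, SKIP, ABORT_DELTA, FAIL }

    static Action classify(long fileLength, boolean isStreamingDelta,
                           boolean sideFileSaysZeroLength,
                           boolean readerCreatable) {
        if (readerCreatable) {
            return Action.READ;          // normal, readable ORC file
        }
        if (fileLength == 0) {
            return Action.SKIP;          // leftover from an interrupted streaming write
        }
        if (!isStreamingDelta) {
            return Action.ABORT_DELTA;   // 1-txn delta whose writer died mid-write
        }
        if (sideFileSaysZeroLength) {
            return Action.SKIP;          // side file shows no committed data
        }
        // Unreadable file that should contain committed data: fail loudly so
        // the end user can intervene manually.
        return Action.FAIL;
    }
}
```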





[jira] [Created] (HIVE-20326) Create constraints with RELY as default instead of NO RELY

2018-08-06 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-20326:
--

 Summary: Create constraints with RELY as default instead of NO RELY
 Key: HIVE-20326
 URL: https://issues.apache.org/jira/browse/HIVE-20326
 Project: Hive
  Issue Type: Task
Reporter: Vineet Garg
Assignee: Vineet Garg
 Attachments: HIVE-20326.1.patch

Currently constraints such as NOT NULL and CHECK are created with ENABLE and NO 
RELY as the default; instead they should be created with ENABLE and RELY as the 
default so that the optimizer can take advantage of these constraints.





[jira] [Created] (HIVE-20325) FlakyTest: TestMiniDruidCliDriver

2018-08-06 Thread Matt McCline (JIRA)
Matt McCline created HIVE-20325:
---

 Summary: FlakyTest: TestMiniDruidCliDriver
 Key: HIVE-20325
 URL: https://issues.apache.org/jira/browse/HIVE-20325
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


TestMiniDruidCliDriver is failing intermittently a significant percentage of 
the time.

druid_timestamptz
druidmini_joins
druidmini_masking
druidmini_test1





[jira] [Created] (HIVE-20324) change hive.compactor.max.num.delta default to 50

2018-08-06 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20324:
-

 Summary: change hive.compactor.max.num.delta default to 50
 Key: HIVE-20324
 URL: https://issues.apache.org/jira/browse/HIVE-20324
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 2.0.0
Reporter: Eugene Koifman


The current default is 500 - this is way too high. OOM is likely at 50 or so.
Need to update the default.






[jira] [Created] (HIVE-20323) Update desc formatted/extended table to show if constraint is enabled or disabled and rely/norely

2018-08-06 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-20323:
--

 Summary: Update desc formatted/extended table to show if 
constraint is enabled or disabled and rely/norely
 Key: HIVE-20323
 URL: https://issues.apache.org/jira/browse/HIVE-20323
 Project: Hive
  Issue Type: Improvement
Reporter: Vineet Garg
Assignee: Vineet Garg


Currently {{desc formatted }} does not show whether a constraint is 
enabled or disabled (or RELY or NORELY). It is hard to figure out whether a 
constraint is enabled or disabled.





[GitHub] hive pull request #409: HIVE-20283: Logs may be directed to 2 files if --hiv...

2018-08-06 Thread beltran
GitHub user beltran opened a pull request:

https://github.com/apache/hive/pull/409

HIVE-20283: Logs may be directed to 2 files if --hiveconf hive.log.file is 
used (metastore)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/beltran/hive HIVE-20283

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/409.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #409


commit a60c2297aa0639cf2fe4c68aa9cc44a550dd3384
Author: Jaume Marhuenda 
Date:   2018-08-06T20:01:04Z

HIVE-20283: Logs may be directed to 2 files if --hiveconf hive.log.file is 
used (metastore)




---


[jira] [Created] (HIVE-20322) FlakyTest: TestMiniDruidCliDriver

2018-08-06 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-20322:
---

 Summary: FlakyTest: TestMiniDruidCliDriver
 Key: HIVE-20322
 URL: https://issues.apache.org/jira/browse/HIVE-20322
 Project: Hive
  Issue Type: Bug
Reporter: Eric Wohlstadter


TestMiniDruidCliDriver is failing intermittently but I'm seeing it fail a 
significant percentage of the time.





[jira] [Created] (HIVE-20321) Vectorization: Cut down memory size of 1 col VectorHashKeyWrapper to <1 CacheLine

2018-08-06 Thread Gopal V (JIRA)
Gopal V created HIVE-20321:
--

 Summary: Vectorization: Cut down memory size of 1 col 
VectorHashKeyWrapper to <1 CacheLine
 Key: HIVE-20321
 URL: https://issues.apache.org/jira/browse/HIVE-20321
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V


With a full sized LLAP instance, the memory size of the VectorHashKeyWrapper is 
bigger than the low Xmx JVMs.

{code}
* 64-bit VM: **
org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapper object internals:
 OFFSET  SIZE  TYPE                                                     DESCRIPTION                                 VALUE
      0    16                                                           (object header)                             N/A
     16     4  int                                                      VectorHashKeyWrapper.hashcode               N/A
     20     4                                                           (alignment/padding gap)
     24     8  long[]                                                   VectorHashKeyWrapper.longValues             N/A
     32     8  double[]                                                 VectorHashKeyWrapper.doubleValues           N/A
     40     8  byte[][]                                                 VectorHashKeyWrapper.byteValues             N/A
     48     8  int[]                                                    VectorHashKeyWrapper.byteStarts             N/A
     56     8  int[]                                                    VectorHashKeyWrapper.byteLengths            N/A
     64     8  org.apache.hadoop.hive.serde2.io.HiveDecimalWritable[]   VectorHashKeyWrapper.decimalValues          N/A
     72     8  java.sql.Timestamp[]                                     VectorHashKeyWrapper.timestampValues        N/A
     80     8  org.apache.hadoop.hive.common.type.HiveIntervalDayTime[] VectorHashKeyWrapper.intervalDayTimeValues  N/A
     88     8  boolean[]                                                VectorHashKeyWrapper.isNull                 N/A
     96     8  org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapper.HashContext  VectorHashKeyWrapper.hashCtx  N/A
Instance size: 104 bytes
Space losses: 4 bytes internal + 0 bytes external = 4 bytes total
{code}

Pulling this up to a parent class allows for this to be cut down to 32 bytes 
for the single column case.





[jira] [Created] (HIVE-20320) Turn on hive.optimize.remove.sq_count_check flag

2018-08-06 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-20320:
--

 Summary: Turn on hive.optimize.remove.sq_count_check flag
 Key: HIVE-20320
 URL: https://issues.apache.org/jira/browse/HIVE-20320
 Project: Hive
  Issue Type: Task
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg








[jira] [Created] (HIVE-20319) group by and union all always generate empty query result

2018-08-06 Thread Wang Yan (JIRA)
Wang Yan created HIVE-20319:
---

 Summary: group by and union all always generate empty query result
 Key: HIVE-20319
 URL: https://issues.apache.org/jira/browse/HIVE-20319
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 2.3.2
Reporter: Wang Yan


The following query always generates an empty result, which is wrong.

{code:sql}
create table if not exists test_table(column1 string, column2 int);
insert into test_table values('a',1),('b',2);
set hive.optimize.union.remove=true;

select column1 from test_table group by column1
union all
select column1 from test_table group by column1;

{code}

Actual result: empty

Expected result:

{code}
a
b
a
b
{code}

Note that the correct result is generated when 
hive.optimize.union.remove=false is set.

It seems like the fix in https://issues.apache.org/jira/browse/HIVE-12788 is 
insufficient.





[jira] [Created] (HIVE-20318) NullPointerException when union with lateral view

2018-08-06 Thread Wang Yan (JIRA)
Wang Yan created HIVE-20318:
---

 Summary: NullPointerException when union with lateral view
 Key: HIVE-20318
 URL: https://issues.apache.org/jira/browse/HIVE-20318
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 2.3.2
 Environment: Run on MR, hadoop 2.7.3
Reporter: Wang Yan


The following SQL throws a NullPointerException.

This SQL is not table/data specific and can be run directly.

{code:sql}
WITH t1 AS (SELECT 0 AS c1),
     t2 AS (SELECT 0 AS c1
            FROM (SELECT COLLECT_SET('line') AS c2) t3
            LATERAL VIEW explode(ARRAY("a")) er AS c3)
SELECT c1 FROM t1
UNION ALL
SELECT c1 FROM t2;
{code}

This is the exception:
2018-04-20 01:53:50,845 WARN [Thread-5] org.apache.hadoop.mapred.YarnChild: 
Exception running child : java.lang.RuntimeException: Hive Runtime Error while 
closing operators
  at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.hive.conf.HiveConf.getVar(HiveConf.java:3901)
  at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1020)
  at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697)
  at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:711)
  at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:711)
  at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:711)
  at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189)
  ... 8 more





[jira] [Created] (HIVE-20317) Spark Dynamic Partition Pruning - Use Stats to Determine Partition Count

2018-08-06 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-20317:
--

 Summary: Spark Dynamic Partition Pruning - Use Stats to Determine 
Partition Count
 Key: HIVE-20317
 URL: https://issues.apache.org/jira/browse/HIVE-20317
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Affects Versions: 3.1.0, 4.0.0
Reporter: BELUGA BEHR


{code:xml|hive-site.xml}
<property>
  <name>hive.metastore.limit.partition.request</name>
  <value>2</value>
</property>
{code}

{code:sql}
CREATE TABLE partitioned_user(
firstname VARCHAR(64),
lastname  VARCHAR(64)
) PARTITIONED BY (country VARCHAR(64))
STORED AS PARQUET;

CREATE TABLE country(
name VARCHAR(64)
) STORED AS PARQUET;

insert into partitioned_user partition (country='USA') values ("John", "Doe");
insert into partitioned_user partition (country='UK') values ("Sir", "Arthur");
insert into partitioned_user partition (country='FR') values ("Jacque", 
"Martin");

insert into country values ('USA');

set hive.execution.engine=spark;
set hive.spark.dynamic.partition.pruning=true;
explain select * from partitioned_user u where u.country in (select c.name from 
country c);
-- Error while compiling statement: FAILED: SemanticException 
MetaException(message:Number of partitions scanned (=3) on table 
'partitioned_user' exceeds limit (=2). This is controlled on the metastore 
server by hive.metastore.limit.partition.request.)
{code}

The EXPLAIN plan generation fails because there are three partitions involved 
in this query. However, since Spark DPP is enabled, Hive should be able to use 
table stats to know that the {{country}} table only has one record, and 
therefore only one partition will need to be scanned, allowing this query to 
execute.





[GitHub] hive pull request #408: HIVE-20316: Skip external table file listing for cre...

2018-08-06 Thread sankarh
GitHub user sankarh opened a pull request:

https://github.com/apache/hive/pull/408

HIVE-20316: Skip external table file listing for create table event.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sankarh/hive HIVE-20316

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/408.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #408


commit 0c89ab0d5811beeb58800b9f8f66f6f2d6119116
Author: Sankar Hariappan 
Date:   2018-08-06T09:42:10Z

HIVE-20316: Skip external table file listing for create table event.




---