[GitHub] [hive] rmsmani commented on issue #571: HIVE-21392 Fix misconfigurations of DataNucleus log in log4j.properties
rmsmani commented on issue #571: HIVE-21392 Fix misconfigurations of DataNucleus log in log4j.properties URL: https://github.com/apache/hive/pull/571#issuecomment-473493360 @coder-chenzhi If test cases other than yours are failing, it may be due to flaky tests. The only way to resolve this is to resubmit the patch. Please also add some test cases for this change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] coder-chenzhi edited a comment on issue #556: HIVE-21392 Fix misconfigurations of DataNucleus log in log4j.properties
coder-chenzhi edited a comment on issue #556: HIVE-21392 Fix misconfigurations of DataNucleus log in log4j.properties URL: https://github.com/apache/hive/pull/556#issuecomment-473483540 Hi @rmsmani, I have resolved the error in the patch and created another [PR](https://github.com/apache/hive/pull/571). The new test report in JIRA shows that a test case has failed, but I can't figure out why my patch would affect that test case.
Review Request 70224: HIVE-21457: Perf optimizations in ORC split-generation
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/70224/ --- Review request for hive and Gopal V. Bugs: HIVE-21457 https://issues.apache.org/jira/browse/HIVE-21457 Repository: hive-git Description --- HIVE-21457: Perf optimizations in ORC split-generation Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java e6b47de877e4931f30f1fab725ea0e62c98bdf26 ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 50a233d5de20491e0107af7eeefdc1515f706894 ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java 11876fbb10ac45772153c357202645fe08ed28a7 ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 6bac285c15ced93cf4215281447c7adafa98bd1c ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 9dac185067c68fd94fbec53d5bb5274b878bbb00 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 62a1061dfd9499954ff2ed9432ab235d3b28a819 ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java 1795bb54570e5b71a19b3a9091c2172c6b284cb4 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 10192859a7326a223ec9d9cce7d284fd83122f86 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java deabec6f8767c5397a7503fa64d1b03f0cb41ac2 Diff: https://reviews.apache.org/r/70224/diff/1/ Testing --- Thanks, Prasanth_J
Re: Review Request 70224: HIVE-21457: Perf optimizations in ORC split-generation
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/70224/ --- (Updated March 16, 2019, 12:28 a.m.) Review request for hive and Gopal V. Changes --- Another place for reuse. Bugs: HIVE-21457 https://issues.apache.org/jira/browse/HIVE-21457 Repository: hive-git Description --- HIVE-21457: Perf optimizations in ORC split-generation Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java e6b47de877 ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 50a233d5de ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java 11876fbb10 ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 6bac285c15 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 9dac185067 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 62a1061dfd ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java 1795bb5457 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 10192859a7 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java deabec6f87 Diff: https://reviews.apache.org/r/70224/diff/2/ Changes: https://reviews.apache.org/r/70224/diff/1-2/ Testing --- Thanks, Prasanth_J
[GitHub] [hive] asfgit closed pull request #567: HIVE-21382: Group by keys reduction optimization - keys are not reduced in query23
asfgit closed pull request #567: HIVE-21382: Group by keys reduction optimization - keys are not reduced in query23 URL: https://github.com/apache/hive/pull/567
[jira] [Created] (HIVE-21459) DOCO - HiveonSpark (HOS) do custom serde/udf jars go in hive or spark folder?
t oo created HIVE-21459: --- Summary: DOCO - HiveonSpark (HOS) do custom serde/udf jars go in hive or spark folder? Key: HIVE-21459 URL: https://issues.apache.org/jira/browse/HIVE-21459 Project: Hive Issue Type: Improvement Reporter: t oo [https://cwiki.apache.org//confluence/display/Hive/Hive+on+Spark:+Getting+Started] does not mention how to register custom serde/udf jars/classes. For example, if I want to query a {{com.uber.hoodie.hadoop.HoodieInputFormat}} table (this class relies on Parquet), the docs don't say where to place the jar. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
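For readers hitting the same documentation gap: the usual Hive mechanisms for registering auxiliary jars are a session-scoped ADD JAR, or the hive.aux.jars.path property in hive-site.xml. Whether these behave identically under Hive on Spark is exactly what the requested documentation would need to confirm; the jar path below is a placeholder, not a real artifact location.

```sql
-- Session-scoped: the jar is shipped to the execution engine for this session only.
ADD JAR /path/to/hoodie-hadoop-mr-bundle.jar;

-- Service-scoped alternative, configured in hive-site.xml and loaded at startup:
-- <property>
--   <name>hive.aux.jars.path</name>
--   <value>/path/to/hoodie-hadoop-mr-bundle.jar</value>
-- </property>
```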
[jira] [Created] (HIVE-21458) ACID: Optimize AcidUtils$MetaDataFile.isRawFormat check by caching the split reader
Vaibhav Gumashta created HIVE-21458: --- Summary: ACID: Optimize AcidUtils$MetaDataFile.isRawFormat check by caching the split reader Key: HIVE-21458 URL: https://issues.apache.org/jira/browse/HIVE-21458 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.1.1 Reporter: Vaibhav Gumashta In the transactional subsystems, in several places we check to see if a data file has ROW__ID fields or not. Every time we do that (even within the context of the same query), we open a Reader for that file/split. We could optimize this by caching. Also, perhaps we don't need to do this for every split. An example call stack:
{code}
OrcFile.createReader(Path, OrcFile$ReaderOptions) line: 105
AcidUtils$MetaDataFile.isRawFormatFile(Path, FileSystem) line: 2026
AcidUtils$MetaDataFile.isRawFormat(Path, FileSystem) line: 2022
AcidUtils.parsedDelta(Path, String, FileSystem) line: 1007
OrcRawRecordMerger$TransactionMetaData.findWriteIDForSynthetcRowIDs(Path, Path, Configuration) line: 1231
OrcRawRecordMerger.discoverOriginalKeyBounds(Reader, int, Reader$Options, Configuration, OrcRawRecordMerger$Options) line: 722
OrcRawRecordMerger.(Configuration, boolean, Reader, boolean, int, ValidWriteIdList, Reader$Options, Path[], OrcRawRecordMerger$Options) line: 1022
OrcInputFormat.getReader(InputSplit, Options) line: 2108
OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter) line: 2006
FetchOperator$FetchInputFormatSplit.getRecordReader(JobConf) line: 776
FetchOperator.getRecordReader() line: 344
FetchOperator.getNextRow() line: 540
FetchOperator.pushRow() line: 509
FetchTask.fetch(List) line: 146
{code}
Here, for each split we'll make that check.
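The caching suggested above could be as simple as memoizing the per-file result keyed by path, so repeated splits of the same file skip the reader open. A minimal sketch, with the class name, the cache shape, and the path heuristic all illustrative rather than the actual Hive API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache for the "is this file in raw (pre-ACID) format?" check.
// In Hive the expensive part is OrcFile.createReader(); here it is stubbed out.
class RawFormatCache {
    private final Map<String, Boolean> cache = new ConcurrentHashMap<>();
    private int readerOpens = 0; // instrumentation for the sketch; not thread-safe

    boolean isRawFormat(String path) {
        // computeIfAbsent runs the expensive check at most once per path
        return cache.computeIfAbsent(path, this::expensiveCheck);
    }

    private boolean expensiveCheck(String path) {
        readerOpens++;                    // stands in for opening an ORC Reader
        return !path.contains("/delta_"); // illustrative heuristic only
    }

    int readerOpens() { return readerOpens; }
}
```

With this shape, N splits over the same file trigger one reader open instead of N.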
[jira] [Created] (HIVE-21457) Perf optimizations in split-generation
Prasanth Jayachandran created HIVE-21457: Summary: Perf optimizations in split-generation Key: HIVE-21457 URL: https://issues.apache.org/jira/browse/HIVE-21457 Project: Hive Issue Type: Improvement Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Minor split generation optimizations * Reuse vectorization checks * Reuse isAcid checks * Reuse filesystem objects * Improved logging (log at top-level instead of inside the thread pool)
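The "reuse" items above amount to hoisting invariant computations out of the per-split path. A generic sketch of that pattern, not the actual OrcInputFormat code: wrap the check in a memoizing supplier so per-split code can query it freely while the computation runs at most once.

```java
import java.util.function.Supplier;

// Memoize a result so per-split code can call get() repeatedly
// while the underlying computation runs at most once.
class Memoized<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private T value;
    private boolean computed = false;

    Memoized(Supplier<T> delegate) { this.delegate = delegate; }

    @Override
    public synchronized T get() {
        if (!computed) {
            value = delegate.get();
            computed = true;
        }
        return value;
    }
}
```

For example, an isAcid or vectorization check computed once per query instead of once per split would be wrapped as `new Memoized<>(() -> computeIsAcid(...))`.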
[jira] [Created] (HIVE-21456) Hive Metastore HTTP Thrift
Amit Khanna created HIVE-21456: -- Summary: Hive Metastore HTTP Thrift Key: HIVE-21456 URL: https://issues.apache.org/jira/browse/HIVE-21456 Project: Hive Issue Type: Bug Components: Metastore, Standalone Metastore Reporter: Amit Khanna Assignee: Amit Khanna Hive Metastore currently has no support for HTTP transport, which makes it impossible to access via Knox. Adding support for Thrift over HTTP transport will allow clients to access the Metastore via Knox.
[jira] [Created] (HIVE-21455) Too verbose logging in AvroGenericRecordReader
Miklos Szurap created HIVE-21455: Summary: Too verbose logging in AvroGenericRecordReader Key: HIVE-21455 URL: https://issues.apache.org/jira/browse/HIVE-21455 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Miklos Szurap {{AvroGenericRecordReader}} logs the Avro schema for each datafile. This is too verbose; we likely don't need to log that at INFO level. For example, a table:
{noformat}
create table avro_tbl (c1 string, c2 int, c3 float) stored as avro;
{noformat}
and querying it with a select star (with 3 datafiles), HiveServer2 logs the following:
{noformat}
2019-03-15 09:18:35,999 INFO org.apache.hadoop.mapred.FileInputFormat: [HiveServer2-Handler-Pool: Thread-64]: Total input paths to process : 3
2019-03-15 09:18:35,999 INFO org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader: [HiveServer2-Handler-Pool: Thread-64]: Found the avro schema in the job: {"type":"record","name":"avro_tbl","namespace":"test","fields":[{"name":"c1","type":["null","string"],"default":null},{"name":"c2","type":["null","int"],"default":null},{"name":"c3","type":["null","float"],"default":null}]}
2019-03-15 09:18:36,004 INFO org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader: [HiveServer2-Handler-Pool: Thread-64]: Found the avro schema in the job: {"type":"record","name":"avro_tbl","namespace":"test","fields":[{"name":"c1","type":["null","string"],"default":null},{"name":"c2","type":["null","int"],"default":null},{"name":"c3","type":["null","float"],"default":null}]}
2019-03-15 09:18:36,010 INFO org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader: [HiveServer2-Handler-Pool: Thread-64]: Found the avro schema in the job: {"type":"record","name":"avro_tbl","namespace":"test","fields":[{"name":"c1","type":["null","string"],"default":null},{"name":"c2","type":["null","int"],"default":null},{"name":"c3","type":["null","float"],"default":null}]}
{noformat}
This has a huge performance and storage penalty on a table with a big schema and thousands of datafiles.
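The conventional fix for this class of issue is to demote the message to a debug level and guard it, so the potentially multi-kilobyte schema string is neither concatenated nor written on the default path. A sketch using java.util.logging purely for illustration (Hive itself logs through SLF4J, where parameterized logging serves the same purpose):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class SchemaLogging {
    private static final Logger LOG = Logger.getLogger(SchemaLogging.class.getName());

    static void logSchema(String schemaJson) {
        // Guarded debug-level log: at the default (INFO) level, the schema
        // string is never concatenated or emitted at all.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Found the avro schema in the job: " + schemaJson);
        }
    }
}
```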
[jira] [Created] (HIVE-21454) Tez default configs get overwritten by MR default configs
Syed Shameerur Rahman created HIVE-21454: Summary: Tez default configs get overwritten by MR default configs Key: HIVE-21454 URL: https://issues.apache.org/jira/browse/HIVE-21454 Project: Hive Issue Type: Bug Reporter: Syed Shameerur Rahman Due to changes made in HIVE-17781, Tez default configs such as tez.counters.max (default value 1200) get overwritten by their MR counterparts such as mapreduce.job.counters.max (default value 120).
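A generic sketch of the precedence guard that avoids this class of bug: copy the MR value onto the Tez key only when the Tez key is not already set, so an engine-specific default is never clobbered. The key names come from the report; the merge helper itself is hypothetical, not Hive code.

```java
import java.util.Map;

class ConfMerge {
    // Copy a value from the MR key to the Tez key only if the Tez key is
    // unset, so Tez defaults are never clobbered by MR defaults.
    static void mergeIfAbsent(Map<String, String> conf, String tezKey, String mrKey) {
        if (!conf.containsKey(tezKey) && conf.containsKey(mrKey)) {
            conf.put(tezKey, conf.get(mrKey));
        }
    }
}
```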
[GitHub] [hive] dingqiangliu commented on issue #566: HIVE-21453: HPL/SQL can not SELECT Date and Timestamp type value INTO variable
dingqiangliu commented on issue #566: HIVE-21453: HPL/SQL can not SELECT Date and Timestamp type value INTO variable URL: https://github.com/apache/hive/pull/566#issuecomment-473240651 @rmsmani thanks for your patience, patch submitted.
[GitHub] [hive] rmsmani commented on issue #566: HIVE-21453: HPL/SQL can not SELECT Date and Timestamp type value INTO variable
rmsmani commented on issue #566: HIVE-21453: HPL/SQL can not SELECT Date and Timestamp type value INTO variable URL: https://github.com/apache/hive/pull/566#issuecomment-473238518 @dingqiangliu In JIRA there is a button labeled **Submit Patch**. Once the patch is submitted, the build is triggered automatically and the results are published to the JIRA ticket.
[GitHub] [hive] dingqiangliu commented on issue #566: HIVE-21453: HPL/SQL can not SELECT Date and Timestamp type value INTO variable
dingqiangliu commented on issue #566: HIVE-21453: HPL/SQL can not SELECT Date and Timestamp type value INTO variable URL: https://github.com/apache/hive/pull/566#issuecomment-473234237 thank you @rmsmani! it's done, please review it.
[jira] [Created] (HIVE-21453) HPL/SQL can not SELECT Date and Timestamp type value into variable
DingQiang Liu created HIVE-21453: Summary: HPL/SQL can not SELECT Date and Timestamp type value into variable Key: HIVE-21453 URL: https://issues.apache.org/jira/browse/HIVE-21453 Project: Hive Issue Type: Bug Components: hpl/sql Affects Versions: 3.1.1 Environment: Centos 7.1, Hive 3.1.1 Reporter: DingQiang Liu Assignee: DingQiang Liu HPL/SQL omits the Date and Timestamp types when handling SELECT ... INTO variables. For example, the current implementation sets the variables to null instead of the correct values for the following case, select_into3.sql:

declare v_date date;
declare v_timestamp timestamp(17, 3);
select cast('2019-02-20 12:23:45.678' as date), cast('2019-02-20 12:23:45.678' as timestamp) into v_date, v_timestamp from src limit 1;
print 'date: ' || v_date;
print 'timestamp: ' || v_timestamp;

The result when running bin/hplsql -f select_into3.sql --trace:

Configuration file: file:/hive/conf/hplsql-site.xml
Parser tree: (program (block (stmt (declare_stmt declare (declare_stmt_item (declare_var_item (ident v_date) (dtype date) (stmt (semicolon_stmt ;)) (stmt (declare_stmt declare (declare_stmt_item (declare_var_item (ident v_timestamp) (dtype timestamp) (dtype_len ( 17 , 3 )) (stmt (semicolon_stmt ;)) (stmt (select_stmt (fullselect_stmt (fullselect_stmt_item (subselect_stmt select (select_list (select_list_item (expr (expr_spec_func cast ( (expr (expr_atom (string '2019-02-20 12:23:45.678'))) as (dtype date) , (select_list_item (expr (expr_spec_func cast ( (expr (expr_atom (string '2019-02-20 12:23:45.678'))) as (dtype timestamp) ) (into_clause into (ident v_date) , (ident v_timestamp)) (from_clause from (from_table_clause (from_table_name_clause (table_name (ident src) (select_options (select_options_item limit (expr (expr_atom (int_number 1)) (stmt (semicolon_stmt ;)) (stmt (print_stmt print (expr (expr_concat (expr_concat_item (expr_atom (string 'date: '))) || (expr_concat_item (expr_atom (ident v_date))) (stmt (semicolon_stmt ;)) (stmt (print_stmt print (expr (expr_concat (expr_concat_item (expr_atom (string 'timestamp: '))) || (expr_concat_item (expr_atom (ident v_timestamp))) (stmt (semicolon_stmt ;))) )
Ln:1 DECLARE v_date date
Ln:2 DECLARE v_timestamp timestamp
Ln:4 SELECT
Ln:4 select cast('2019-02-20 12:23:45.678' as date), cast('2019-02-20 12:23:45.678' as timestamp) from src LIMIT 1
Open connection: jdbc:vertica://v001:5433/test (256 ms)
Starting query
Query executed successfully (55 ms)
Ln:4 SELECT completed successfully
Ln:4 SELECT INTO statement executed
Ln:4 COLUMN: ?column?, Date
Ln:4 SET v_date = null
Ln:4 COLUMN: ?column?, Timestamp
Ln:4 SET v_timestamp = null
Ln:12 PRINT date:
Ln:13 PRINT timestamp:
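The likely shape of the fix is a type dispatch when copying a fetched ResultSet column into an HPL/SQL variable: DATE and TIMESTAMP need their own branches rather than falling through to a default that leaves the variable null. A simplified sketch; the helper below is illustrative, not the actual HPL/SQL Var code.

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.sql.Types;

class VarAssign {
    // Map a fetched column to the variable's value by SQL type. The reported
    // bug matches the DATE/TIMESTAMP cases being missing: without them, the
    // default branch leaves the variable null.
    static Object assign(int sqlType, Object columnValue) {
        switch (sqlType) {
            case Types.VARCHAR:   return (String) columnValue;
            case Types.INTEGER:   return (Integer) columnValue;
            case Types.DATE:      return (Date) columnValue;      // previously missing
            case Types.TIMESTAMP: return (Timestamp) columnValue; // previously missing
            default:              return null;                    // unhandled type
        }
    }
}
```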
[GitHub] [hive] ashutosh-bapat opened a new pull request #572: HIVE-21430 : INSERT into a dynamically partitioned table with autogather stats disabled throws a MetaException
ashutosh-bapat opened a new pull request #572: HIVE-21430 : INSERT into a dynamically partitioned table with autogather stats disabled throws a MetaException URL: https://github.com/apache/hive/pull/572 loadDynamicPartitions is not passing a valid writeId list while altering multiple partitions. It's also fetching table snapshot separately for each of the partitions. Instead fetch the snapshot once to be used for all partitions. Use the same snapshot to alter partitions.
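The "fetch the snapshot once" change described above is the classic hoist-out-of-loop fix: obtain the table snapshot a single time and reuse it for every partition being altered. A generic sketch with hypothetical names (the real code lives in Hive's loadDynamicPartitions path):

```java
import java.util.List;

class SnapshotReuse {
    static int fetches = 0;

    // Stands in for the metastore RPC that builds the table snapshot.
    static String fetchTableSnapshot() {
        fetches++;
        return "writeIdList-v1";
    }

    // Before the fix: one snapshot fetch per partition.
    // After the fix: one fetch, reused for all partitions.
    static void alterPartitions(List<String> partitions) {
        String snapshot = fetchTableSnapshot();  // fetched once, outside the loop
        for (String p : partitions) {
            alterPartition(p, snapshot);         // same snapshot for every partition
        }
    }

    static void alterPartition(String partition, String snapshot) {
        // apply the alteration under the given snapshot (no-op in this sketch)
    }
}
```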
[jira] [Created] (HIVE-21452) Loss of query condition when EXISTS is used
zengxl created HIVE-21452: - Summary: Loss of query condition when EXISTS is used Key: HIVE-21452 URL: https://issues.apache.org/jira/browse/HIVE-21452 Project: Hive Issue Type: Bug Affects Versions: 1.1.0 Reporter: zengxl In our production environment, four tables are joined in one query. The conditions include an EXISTS subquery, and we found that two of the conditions were lost. The following two conditions are missing: {color:#f79232}t2.cust_no is null and t4.cust_level not in ('4','5'){color} In the test environment, I prepared the data of the following four tables and reproduced the loss of a condition.

tables:

test_table1
cust_no,name
60001,lisa
60002,tina
60003,kylin
60004,jeny
60005,john
60006,jamse

test_table2
cust_no,acct_type
60001,1
60001,1
60001,2
60002,1
60003,2
60003,3

test_table3
cust_no
60001
60002
60003
60004
60005
60007

test_table4
cust_no,cust_level
60001,1
60002,2
60003,3
60004,4
60005,5

create table tmp.test_table1(cust_no string,name string);
create table tmp.test_table2(cust_no string,acct_type string);
create table tmp.test_table3(cust_no string);
create table tmp.test_table4(cust_no string,cust_level string);
insert into tmp.test_table1 select '60001','lisa';
insert into tmp.test_table1 select '60002','tina';
insert into tmp.test_table1 select '60003','kylin';
insert into tmp.test_table1 select '60004','jeny';
insert into tmp.test_table1 select '60005','john';
insert into tmp.test_table1 select '60006','jamse';
insert into tmp.test_table2 select '60001','1';
insert into tmp.test_table2 select '60001','1';
insert into tmp.test_table2 select '60001','2';
insert into tmp.test_table2 select '60002','1';
insert into tmp.test_table2 select '60003','2';
insert into tmp.test_table2 select '60002','3';
insert into tmp.test_table3 select '60001';
insert into tmp.test_table3 select '60002';
insert into tmp.test_table3 select '60003';
insert into tmp.test_table3 select '60004';
insert into tmp.test_table3 select '60005';
insert into tmp.test_table3 select '60007';
insert into tmp.test_table4 select '60001','1';
insert into tmp.test_table4 select '60002','2';
insert into tmp.test_table4 select '60003','3';
insert into tmp.test_table4 select '60004','4';
insert into tmp.test_table4 select '60005','5';

Here is my query SQL, with mapjoin disabled:

set hive.auto.convert.join=false;
select t1.cust_no as cust_no,t2.cust_no as custNO,t1.name
from tmp.test_table1 t1
left join tmp.test_table2 t2 on t1.cust_no=t2.cust_no and t2.acct_type='1'
left join tmp.test_table4 t4 on t1.cust_no=t4.cust_no
where t2.cust_no is null and t4.cust_level not in ('4','5')
and exists (select 1 from tmp.test_table3 t3 where t1.cust_no=t3.cust_no)

The result should only include cust_no 60003, but it also includes 60004 and 60005, which is wrong. {color:#f79232}In my production environment, 60001 came out; the cust_no is null condition was lost.{color} {color:#f6c342}Viewing the execution plan, the t4.cust_level not in ('4','5') condition is missing.{color}

*result:*
60003 NULL kylin
60003 NULL kylin
60003 NULL kylin
60004 NULL jeny
60005 NULL john
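Until the optimizer issue is fixed, a common workaround (a sketch, not verified against this exact plan) is to express the EXISTS as a LEFT SEMI JOIN, which avoids the subquery-rewrite path where the report suggests the conditions are being dropped:

```sql
-- Hypothetical rewrite of the reporter's query: for an existence check on a
-- single equality predicate, LEFT SEMI JOIN matches EXISTS semantics in Hive.
select t1.cust_no as cust_no, t2.cust_no as custNO, t1.name
from tmp.test_table1 t1
left join tmp.test_table2 t2
  on t1.cust_no = t2.cust_no and t2.acct_type = '1'
left join tmp.test_table4 t4
  on t1.cust_no = t4.cust_no
left semi join tmp.test_table3 t3
  on t1.cust_no = t3.cust_no
where t2.cust_no is null
  and t4.cust_level not in ('4','5');
```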
[jira] [Created] (HIVE-21451) ACID: Avoid using hive.acid.key.index to determine if the file is original or not
Vaibhav Gumashta created HIVE-21451: --- Summary: ACID: Avoid using hive.acid.key.index to determine if the file is original or not Key: HIVE-21451 URL: https://issues.apache.org/jira/browse/HIVE-21451 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.1.1 Reporter: Vaibhav Gumashta The transactional files written in Hive have each row decorated with a ROW__ID column. However, files brought into transactional tables using the LOAD DATA... command do not have these metadata columns (in Hive ACID parlance, these are called original files). These original files are decorated with an inferred ROW__ID generated while reading them. However, after they are compacted, the ROW__ID metadata column becomes part of the file itself. To determine if a file is original or not, we currently check for the presence of hive.acid.key.index. For query based compaction, we currently do not write hive.acid.key.index (HIVE-21165). This means there is a possibility that even after compaction, files get treated as original files. Irrespective of HIVE-21165, we should avoid using hive.acid.key.index to decide whether the file is original or not, and instead look for the presence of ROW__ID. hive.acid.key.index should be treated as a performance optimization, as it was seemingly meant to be.
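Checking for ROW__ID could amount to inspecting the file's top-level schema for the ACID struct columns rather than reading a side index. A sketch over plain column-name lists: the column set below matches the well-known layout of Hive's ACID row schema, but the helper itself is hypothetical, not the proposed patch.

```java
import java.util.Arrays;
import java.util.List;

class AcidSchemaCheck {
    // Top-level columns of a transactional (ACID) ORC file's row schema.
    private static final List<String> ACID_COLS = Arrays.asList(
        "operation", "originalTransaction", "bucket",
        "rowId", "currentTransaction", "row");

    // A file is "original" (raw format) if its schema lacks the ACID columns,
    // regardless of whether hive.acid.key.index was written.
    static boolean isOriginal(List<String> topLevelColumns) {
        return !topLevelColumns.containsAll(ACID_COLS);
    }
}
```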