[GitHub] [hive] rmsmani commented on issue #571: HIVE-21392 Fix misconfigurations of DataNucleus log in log4j.properties
rmsmani commented on issue #571: HIVE-21392 Fix misconfigurations of DataNucleus log in log4j.properties URL: https://github.com/apache/hive/pull/571#issuecomment-473493360 @coder-chenzhi If test cases other than yours are failing, it may be due to flaky tests. The only way to resolve this is to resubmit the patch. Please also add some test cases for this change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] coder-chenzhi edited a comment on issue #556: HIVE-21392 Fix misconfigurations of DataNucleus log in log4j.properties
coder-chenzhi edited a comment on issue #556: HIVE-21392 Fix misconfigurations of DataNucleus log in log4j.properties URL: https://github.com/apache/hive/pull/556#issuecomment-473483540 Hi @rmsmani, I have resolved the error in the patch and created another [PR](https://github.com/apache/hive/pull/571). The new test report in JIRA shows that a test case has failed, but I can't figure out why my patch would affect that test case.
Review Request 70224: HIVE-21457: Perf optimizations in ORC split-generation
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/70224/ --- Review request for hive and Gopal V. Bugs: HIVE-21457 https://issues.apache.org/jira/browse/HIVE-21457 Repository: hive-git Description --- HIVE-21457: Perf optimizations in ORC split-generation Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java e6b47de877e4931f30f1fab725ea0e62c98bdf26 ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 50a233d5de20491e0107af7eeefdc1515f706894 ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java 11876fbb10ac45772153c357202645fe08ed28a7 ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 6bac285c15ced93cf4215281447c7adafa98bd1c ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 9dac185067c68fd94fbec53d5bb5274b878bbb00 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 62a1061dfd9499954ff2ed9432ab235d3b28a819 ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java 1795bb54570e5b71a19b3a9091c2172c6b284cb4 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 10192859a7326a223ec9d9cce7d284fd83122f86 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java deabec6f8767c5397a7503fa64d1b03f0cb41ac2 Diff: https://reviews.apache.org/r/70224/diff/1/ Testing --- Thanks, Prasanth_J
Re: Review Request 70224: HIVE-21457: Perf optimizations in ORC split-generation
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/70224/ --- (Updated March 16, 2019, 12:28 a.m.) Review request for hive and Gopal V. Changes --- Another place for reuse. Bugs: HIVE-21457 https://issues.apache.org/jira/browse/HIVE-21457 Repository: hive-git Description --- HIVE-21457: Perf optimizations in ORC split-generation Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java e6b47de877 ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 50a233d5de ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java 11876fbb10 ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 6bac285c15 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 9dac185067 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 62a1061dfd ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java 1795bb5457 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 10192859a7 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java deabec6f87 Diff: https://reviews.apache.org/r/70224/diff/2/ Changes: https://reviews.apache.org/r/70224/diff/1-2/ Testing --- Thanks, Prasanth_J
[GitHub] [hive] asfgit closed pull request #567: HIVE-21382: Group by keys reduction optimization - keys are not reduced in query23
asfgit closed pull request #567: HIVE-21382: Group by keys reduction optimization - keys are not reduced in query23 URL: https://github.com/apache/hive/pull/567
[jira] [Created] (HIVE-21459) DOCO - HiveonSpark (HOS) do custom serde/udf jars go in hive or spark folder?
t oo created HIVE-21459: --- Summary: DOCO - HiveonSpark (HOS) do custom serde/udf jars go in hive or spark folder? Key: HIVE-21459 URL: https://issues.apache.org/jira/browse/HIVE-21459 Project: Hive Issue Type: Improvement Reporter: t oo [https://cwiki.apache.org//confluence/display/Hive/Hive+on+Spark:+Getting+Started] does not mention how to register custom serde/udf jars/classes. For example, if I want to query a {{com.uber.hoodie.hadoop.HoodieInputFormat}} table (this class relies on Parquet), the docs don't say where to place the jar. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
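For readers hitting the same documentation gap: the usual Hive mechanisms for registering auxiliary jars are a session-scoped ADD JAR, or the hive.aux.jars.path property in hive-site.xml. Whether these behave identically under Hive on Spark is exactly what the requested documentation would need to confirm; the jar path below is a placeholder, not a real artifact location.

```sql
-- Session-scoped: the jar is shipped to the execution engine for this session only.
ADD JAR /path/to/hoodie-hadoop-mr-bundle.jar;

-- Service-scoped alternative, configured in hive-site.xml and loaded at startup:
-- <property>
--   <name>hive.aux.jars.path</name>
--   <value>/path/to/hoodie-hadoop-mr-bundle.jar</value>
-- </property>
```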
[jira] [Created] (HIVE-21458) ACID: Optimize AcidUtils$MetaDataFile.isRawFormat check by caching the split reader
Vaibhav Gumashta created HIVE-21458: --- Summary: ACID: Optimize AcidUtils$MetaDataFile.isRawFormat check by caching the split reader Key: HIVE-21458 URL: https://issues.apache.org/jira/browse/HIVE-21458 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.1.1 Reporter: Vaibhav Gumashta In the transactional subsystems, in several places we check to see if a data file has ROW__ID fields or not. Every time we do that (even within the context of the same query), we open a Reader for that file/split. We could optimize this by caching. Also, perhaps we don't need to do this for every split. An example call stack:
{code}
OrcFile.createReader(Path, OrcFile$ReaderOptions) line: 105
AcidUtils$MetaDataFile.isRawFormatFile(Path, FileSystem) line: 2026
AcidUtils$MetaDataFile.isRawFormat(Path, FileSystem) line: 2022
AcidUtils.parsedDelta(Path, String, FileSystem) line: 1007
OrcRawRecordMerger$TransactionMetaData.findWriteIDForSynthetcRowIDs(Path, Path, Configuration) line: 1231
OrcRawRecordMerger.discoverOriginalKeyBounds(Reader, int, Reader$Options, Configuration, OrcRawRecordMerger$Options) line: 722
OrcRawRecordMerger.(Configuration, boolean, Reader, boolean, int, ValidWriteIdList, Reader$Options, Path[], OrcRawRecordMerger$Options) line: 1022
OrcInputFormat.getReader(InputSplit, Options) line: 2108
OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter) line: 2006
FetchOperator$FetchInputFormatSplit.getRecordReader(JobConf) line: 776
FetchOperator.getRecordReader() line: 344
FetchOperator.getNextRow() line: 540
FetchOperator.pushRow() line: 509
FetchTask.fetch(List) line: 146
{code}
Here, for each split we'll make that check.
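The caching suggested above could be as simple as memoizing the per-file result keyed by path, so repeated splits of the same file skip the reader open. A minimal sketch, with the class name, the cache shape, and the path heuristic all illustrative rather than the actual Hive API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache for the "is this file in raw (pre-ACID) format?" check.
// In Hive the expensive part is OrcFile.createReader(); here it is stubbed out.
class RawFormatCache {
    private final Map<String, Boolean> cache = new ConcurrentHashMap<>();
    private int readerOpens = 0; // instrumentation for the sketch; not thread-safe

    boolean isRawFormat(String path) {
        // computeIfAbsent runs the expensive check at most once per path
        return cache.computeIfAbsent(path, this::expensiveCheck);
    }

    private boolean expensiveCheck(String path) {
        readerOpens++;                    // stands in for opening an ORC Reader
        return !path.contains("/delta_"); // illustrative heuristic only
    }

    int readerOpens() { return readerOpens; }
}
```

With this shape, N splits over the same file trigger one reader open instead of N.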
[jira] [Created] (HIVE-21457) Perf optimizations in split-generation
Prasanth Jayachandran created HIVE-21457: Summary: Perf optimizations in split-generation Key: HIVE-21457 URL: https://issues.apache.org/jira/browse/HIVE-21457 Project: Hive Issue Type: Improvement Affects Versions: 4.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Minor split generation optimizations * Reuse vectorization checks * Reuse isAcid checks * Reuse filesystem objects * Improved logging (log at top-level instead of inside the thread pool)
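The "reuse" items above amount to hoisting invariant computations out of the per-split path. A generic sketch of that pattern, not the actual OrcInputFormat code: wrap the check in a memoizing supplier so per-split code can query it freely while the computation runs at most once.

```java
import java.util.function.Supplier;

// Memoize a result so per-split code can call get() repeatedly
// while the underlying computation runs at most once.
class Memoized<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private T value;
    private boolean computed = false;

    Memoized(Supplier<T> delegate) { this.delegate = delegate; }

    @Override
    public synchronized T get() {
        if (!computed) {
            value = delegate.get();
            computed = true;
        }
        return value;
    }
}
```

For example, an isAcid or vectorization check computed once per query instead of once per split would be wrapped as `new Memoized<>(() -> computeIsAcid(...))`.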
[jira] [Created] (HIVE-21456) Hive Metastore HTTP Thrift
Amit Khanna created HIVE-21456: -- Summary: Hive Metastore HTTP Thrift Key: HIVE-21456 URL: https://issues.apache.org/jira/browse/HIVE-21456 Project: Hive Issue Type: Bug Components: Metastore, Standalone Metastore Reporter: Amit Khanna Assignee: Amit Khanna Hive Metastore currently has no support for HTTP transport, which makes it impossible to access via Knox. Adding support for Thrift over HTTP transport will allow clients to access the Metastore via Knox.
[jira] [Created] (HIVE-21455) Too verbose logging in AvroGenericRecordReader
Miklos Szurap created HIVE-21455: Summary: Too verbose logging in AvroGenericRecordReader Key: HIVE-21455 URL: https://issues.apache.org/jira/browse/HIVE-21455 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Miklos Szurap {{AvroGenericRecordReader}} logs the Avro schema for each datafile. This is too verbose; we likely don't need to log that at INFO level. For example, a table:
{noformat}
create table avro_tbl (c1 string, c2 int, c3 float) stored as avro;
{noformat}
and querying it with a select star (with 3 datafiles), HiveServer2 logs the following:
{noformat}
2019-03-15 09:18:35,999 INFO org.apache.hadoop.mapred.FileInputFormat: [HiveServer2-Handler-Pool: Thread-64]: Total input paths to process : 3
2019-03-15 09:18:35,999 INFO org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader: [HiveServer2-Handler-Pool: Thread-64]: Found the avro schema in the job: {"type":"record","name":"avro_tbl","namespace":"test","fields":[{"name":"c1","type":["null","string"],"default":null},{"name":"c2","type":["null","int"],"default":null},{"name":"c3","type":["null","float"],"default":null}]}
2019-03-15 09:18:36,004 INFO org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader: [HiveServer2-Handler-Pool: Thread-64]: Found the avro schema in the job: {"type":"record","name":"avro_tbl","namespace":"test","fields":[{"name":"c1","type":["null","string"],"default":null},{"name":"c2","type":["null","int"],"default":null},{"name":"c3","type":["null","float"],"default":null}]}
2019-03-15 09:18:36,010 INFO org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader: [HiveServer2-Handler-Pool: Thread-64]: Found the avro schema in the job: {"type":"record","name":"avro_tbl","namespace":"test","fields":[{"name":"c1","type":["null","string"],"default":null},{"name":"c2","type":["null","int"],"default":null},{"name":"c3","type":["null","float"],"default":null}]}
{noformat}
This has a huge performance and storage penalty on a table with a big schema and thousands of datafiles.
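The conventional fix for this class of issue is to demote the message to a debug level and guard it, so the potentially multi-kilobyte schema string is neither concatenated nor written on the default path. A sketch using java.util.logging purely for illustration (Hive itself logs through SLF4J, where parameterized logging serves the same purpose):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class SchemaLogging {
    private static final Logger LOG = Logger.getLogger(SchemaLogging.class.getName());

    static void logSchema(String schemaJson) {
        // Guarded debug-level log: at the default (INFO) level, the schema
        // string is never concatenated or emitted at all.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Found the avro schema in the job: " + schemaJson);
        }
    }
}
```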
[jira] [Created] (HIVE-21454) Tez default configs get overwritten by MR default configs
Syed Shameerur Rahman created HIVE-21454: Summary: Tez default configs get overwritten by MR default configs Key: HIVE-21454 URL: https://issues.apache.org/jira/browse/HIVE-21454 Project: Hive Issue Type: Bug Reporter: Syed Shameerur Rahman Due to changes made in HIVE-17781, Tez default configs such as tez.counters.max (default value 1200) get overwritten by their MR counterparts such as mapreduce.job.counters.max (default value 120).
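A generic sketch of the precedence guard that avoids this class of bug: copy the MR value onto the Tez key only when the Tez key is not already set, so an engine-specific default is never clobbered. The key names come from the report; the merge helper itself is hypothetical, not Hive code.

```java
import java.util.Map;

class ConfMerge {
    // Copy a value from the MR key to the Tez key only if the Tez key is
    // unset, so Tez defaults are never clobbered by MR defaults.
    static void mergeIfAbsent(Map<String, String> conf, String tezKey, String mrKey) {
        if (!conf.containsKey(tezKey) && conf.containsKey(mrKey)) {
            conf.put(tezKey, conf.get(mrKey));
        }
    }
}
```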
[GitHub] [hive] dingqiangliu commented on issue #566: HIVE-21453: HPL/SQL can not SELECT Date and Timestamp type value INTO variable
dingqiangliu commented on issue #566: HIVE-21453: HPL/SQL can not SELECT Date and Timestamp type value INTO variable URL: https://github.com/apache/hive/pull/566#issuecomment-473240651 @rmsmani thanks for your patience, patch submitted.
[GitHub] [hive] rmsmani commented on issue #566: HIVE-21453: HPL/SQL can not SELECT Date and Timestamp type value INTO variable
rmsmani commented on issue #566: HIVE-21453: HPL/SQL can not SELECT Date and Timestamp type value INTO variable URL: https://github.com/apache/hive/pull/566#issuecomment-473238518 @dingqiangliu In JIRA there is a button labeled **Submit Patch**. Once the patch is submitted, the build is triggered automatically and the results are published to the JIRA ticket.
[GitHub] [hive] dingqiangliu commented on issue #566: HIVE-21453: HPL/SQL can not SELECT Date and Timestamp type value INTO variable
dingqiangliu commented on issue #566: HIVE-21453: HPL/SQL can not SELECT Date and Timestamp type value INTO variable URL: https://github.com/apache/hive/pull/566#issuecomment-473234237 thank you @rmsmani! it's done, please review it.
[jira] [Created] (HIVE-21453) HPL/SQL can not SELECT Date and Timestamp type value into variable
DingQiang Liu created HIVE-21453: Summary: HPL/SQL can not SELECT Date and Timestamp type value into variable Key: HIVE-21453 URL: https://issues.apache.org/jira/browse/HIVE-21453 Project: Hive Issue Type: Bug Components: hpl/sql Affects Versions: 3.1.1 Environment: Centos 7.1, Hive 3.1.1 Reporter: DingQiang Liu Assignee: DingQiang Liu HPL/SQL omits the Date and Timestamp types when handling SELECT ... INTO variables. For example, the current implementation sets the variables to null instead of the correct values for the following case, select_into3.sql:

declare v_date date;
declare v_timestamp timestamp(17, 3);
select cast('2019-02-20 12:23:45.678' as date), cast('2019-02-20 12:23:45.678' as timestamp) into v_date, v_timestamp from src limit 1;
print 'date: ' || v_date;
print 'timestamp: ' || v_timestamp;

The result when running bin/hplsql -f select_into3.sql --trace:

Configuration file: file:/hive/conf/hplsql-site.xml
Parser tree: (program (block (stmt (declare_stmt declare (declare_stmt_item (declare_var_item (ident v_date) (dtype date) (stmt (semicolon_stmt ;)) (stmt (declare_stmt declare (declare_stmt_item (declare_var_item (ident v_timestamp) (dtype timestamp) (dtype_len ( 17 , 3 )) (stmt (semicolon_stmt ;)) (stmt (select_stmt (fullselect_stmt (fullselect_stmt_item (subselect_stmt select (select_list (select_list_item (expr (expr_spec_func cast ( (expr (expr_atom (string '2019-02-20 12:23:45.678'))) as (dtype date) , (select_list_item (expr (expr_spec_func cast ( (expr (expr_atom (string '2019-02-20 12:23:45.678'))) as (dtype timestamp) ) (into_clause into (ident v_date) , (ident v_timestamp)) (from_clause from (from_table_clause (from_table_name_clause (table_name (ident src) (select_options (select_options_item limit (expr (expr_atom (int_number 1)) (stmt (semicolon_stmt ;)) (stmt (print_stmt print (expr (expr_concat (expr_concat_item (expr_atom (string 'date: '))) || (expr_concat_item (expr_atom (ident v_date))) (stmt (semicolon_stmt ;)) (stmt (print_stmt print (expr (expr_concat (expr_concat_item (expr_atom (string 'timestamp: '))) || (expr_concat_item (expr_atom (ident v_timestamp))) (stmt (semicolon_stmt ;))) )
Ln:1 DECLARE v_date date
Ln:2 DECLARE v_timestamp timestamp
Ln:4 SELECT
Ln:4 select cast('2019-02-20 12:23:45.678' as date), cast('2019-02-20 12:23:45.678' as timestamp) from src LIMIT 1
Open connection: jdbc:vertica://v001:5433/test (256 ms)
Starting query
Query executed successfully (55 ms)
Ln:4 SELECT completed successfully
Ln:4 SELECT INTO statement executed
Ln:4 COLUMN: ?column?, Date
Ln:4 SET v_date = null
Ln:4 COLUMN: ?column?, Timestamp
Ln:4 SET v_timestamp = null
Ln:12 PRINT date:
Ln:13 PRINT timestamp:
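The likely shape of the fix is a type dispatch when copying a fetched ResultSet column into an HPL/SQL variable: DATE and TIMESTAMP need their own branches rather than falling through to a default that leaves the variable null. A simplified sketch; the helper below is illustrative, not the actual HPL/SQL Var code.

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.sql.Types;

class VarAssign {
    // Map a fetched column to the variable's value by SQL type. The reported
    // bug matches the DATE/TIMESTAMP cases being missing: without them, the
    // default branch leaves the variable null.
    static Object assign(int sqlType, Object columnValue) {
        switch (sqlType) {
            case Types.VARCHAR:   return (String) columnValue;
            case Types.INTEGER:   return (Integer) columnValue;
            case Types.DATE:      return (Date) columnValue;      // previously missing
            case Types.TIMESTAMP: return (Timestamp) columnValue; // previously missing
            default:              return null;                    // unhandled type
        }
    }
}
```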
[GitHub] [hive] ashutosh-bapat opened a new pull request #572: HIVE-21430 : INSERT into a dynamically partitioned table with autogather stats disabled throws a MetaException
ashutosh-bapat opened a new pull request #572: HIVE-21430 : INSERT into a dynamically partitioned table with autogather stats disabled throws a MetaException URL: https://github.com/apache/hive/pull/572 loadDynamicPartitions is not passing a valid writeId list while altering multiple partitions. It's also fetching table snapshot separately for each of the partitions. Instead fetch the snapshot once to be used for all partitions. Use the same snapshot to alter partitions.
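The "fetch the snapshot once" change described above is the classic hoist-out-of-loop fix: obtain the table snapshot a single time and reuse it for every partition being altered. A generic sketch with hypothetical names (the real code lives in Hive's loadDynamicPartitions path):

```java
import java.util.List;

class SnapshotReuse {
    static int fetches = 0;

    // Stands in for the metastore RPC that builds the table snapshot.
    static String fetchTableSnapshot() {
        fetches++;
        return "writeIdList-v1";
    }

    // Before the fix: one snapshot fetch per partition.
    // After the fix: one fetch, reused for all partitions.
    static void alterPartitions(List<String> partitions) {
        String snapshot = fetchTableSnapshot();  // fetched once, outside the loop
        for (String p : partitions) {
            alterPartition(p, snapshot);         // same snapshot for every partition
        }
    }

    static void alterPartition(String partition, String snapshot) {
        // apply the alteration under the given snapshot (no-op in this sketch)
    }
}
```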
[jira] [Created] (HIVE-21452) Loss of query condition when EXISTS is used
zengxl created HIVE-21452: - Summary: Loss of query condition when EXISTS is used Key: HIVE-21452 URL: https://issues.apache.org/jira/browse/HIVE-21452 Project: Hive Issue Type: Bug Affects Versions: 1.1.0 Reporter: zengxl In our production environment, four tables are joined in one query. The conditions include an EXISTS subquery, and we found that two of the conditions were lost. The following two conditions are missing: {color:#f79232}t2.cust_no is null and t4.cust_level not in ('4','5'){color} In the test environment, I prepared the data of the following four tables and reproduced the loss of a condition.

tables:

test_table1
cust_no,name
60001,lisa
60002,tina
60003,kylin
60004,jeny
60005,john
60006,jamse

test_table2
cust_no,acct_type
60001,1
60001,1
60001,2
60002,1
60003,2
60003,3

test_table3
cust_no
60001
60002
60003
60004
60005
60007

test_table4
cust_no,cust_level
60001,1
60002,2
60003,3
60004,4
60005,5

create table tmp.test_table1(cust_no string,name string);
create table tmp.test_table2(cust_no string,acct_type string);
create table tmp.test_table3(cust_no string);
create table tmp.test_table4(cust_no string,cust_level string);
insert into tmp.test_table1 select '60001','lisa';
insert into tmp.test_table1 select '60002','tina';
insert into tmp.test_table1 select '60003','kylin';
insert into tmp.test_table1 select '60004','jeny';
insert into tmp.test_table1 select '60005','john';
insert into tmp.test_table1 select '60006','jamse';
insert into tmp.test_table2 select '60001','1';
insert into tmp.test_table2 select '60001','1';
insert into tmp.test_table2 select '60001','2';
insert into tmp.test_table2 select '60002','1';
insert into tmp.test_table2 select '60003','2';
insert into tmp.test_table2 select '60002','3';
insert into tmp.test_table3 select '60001';
insert into tmp.test_table3 select '60002';
insert into tmp.test_table3 select '60003';
insert into tmp.test_table3 select '60004';
insert into tmp.test_table3 select '60005';
insert into tmp.test_table3 select '60007';
insert into tmp.test_table4 select '60001','1';
insert into tmp.test_table4 select '60002','2';
insert into tmp.test_table4 select '60003','3';
insert into tmp.test_table4 select '60004','4';
insert into tmp.test_table4 select '60005','5';

Here is my query SQL, with mapjoin disabled:

set hive.auto.convert.join=false;
select t1.cust_no as cust_no,t2.cust_no as custNO,t1.name
from tmp.test_table1 t1
left join tmp.test_table2 t2 on t1.cust_no=t2.cust_no and t2.acct_type='1'
left join tmp.test_table4 t4 on t1.cust_no=t4.cust_no
where t2.cust_no is null and t4.cust_level not in ('4','5')
and exists (select 1 from tmp.test_table3 t3 where t1.cust_no=t3.cust_no)

The result should only include cust_no 60003, but it also includes 60004 and 60005, which is wrong. {color:#f79232}In my production environment, 60001 came out; the cust_no is null condition was lost.{color} {color:#f6c342}Viewing the execution plan, the t4.cust_level not in ('4','5') condition is missing.{color}

*result:*
60003 NULL kylin
60003 NULL kylin
60003 NULL kylin
60004 NULL jeny
60005 NULL john
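Until the optimizer issue is fixed, a common workaround (a sketch, not verified against this exact plan) is to express the EXISTS as a LEFT SEMI JOIN, which avoids the subquery-rewrite path where the report suggests the conditions are being dropped:

```sql
-- Hypothetical rewrite of the reporter's query: for an existence check on a
-- single equality predicate, LEFT SEMI JOIN matches EXISTS semantics in Hive.
select t1.cust_no as cust_no, t2.cust_no as custNO, t1.name
from tmp.test_table1 t1
left join tmp.test_table2 t2
  on t1.cust_no = t2.cust_no and t2.acct_type = '1'
left join tmp.test_table4 t4
  on t1.cust_no = t4.cust_no
left semi join tmp.test_table3 t3
  on t1.cust_no = t3.cust_no
where t2.cust_no is null
  and t4.cust_level not in ('4','5');
```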
[jira] [Created] (HIVE-21451) ACID: Avoid using hive.acid.key.index to determine if the file is original or not
Vaibhav Gumashta created HIVE-21451: --- Summary: ACID: Avoid using hive.acid.key.index to determine if the file is original or not Key: HIVE-21451 URL: https://issues.apache.org/jira/browse/HIVE-21451 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.1.1 Reporter: Vaibhav Gumashta The transactional files written in Hive have each row decorated with a ROW__ID column. However, files brought into transactional tables using the LOAD DATA... command do not have these metadata columns (in Hive ACID parlance, these are called original files). These original files are decorated with an inferred ROW__ID generated while reading them. However, after they are compacted, the ROW__ID metadata column becomes part of the file itself. To determine if a file is original or not, we currently check for the presence of hive.acid.key.index. For query based compaction, we currently do not write hive.acid.key.index (HIVE-21165). This means there is a possibility that even after compaction, files get treated as original files. Irrespective of HIVE-21165, we should avoid using hive.acid.key.index to decide whether the file is original or not, and instead look for the presence of ROW__ID. hive.acid.key.index should be treated as a performance optimization, as it was seemingly meant to be.
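Checking for ROW__ID could amount to inspecting the file's top-level schema for the ACID struct columns rather than reading a side index. A sketch over plain column-name lists: the column set below matches the well-known layout of Hive's ACID row schema, but the helper itself is hypothetical, not the proposed patch.

```java
import java.util.Arrays;
import java.util.List;

class AcidSchemaCheck {
    // Top-level columns of a transactional (ACID) ORC file's row schema.
    private static final List<String> ACID_COLS = Arrays.asList(
        "operation", "originalTransaction", "bucket",
        "rowId", "currentTransaction", "row");

    // A file is "original" (raw format) if its schema lacks the ACID columns,
    // regardless of whether hive.acid.key.index was written.
    static boolean isOriginal(List<String> topLevelColumns) {
        return !topLevelColumns.containsAll(ACID_COLS);
    }
}
```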