[jira] [Commented] (HIVE-10016) Remove duplicated Hive table schema parsing in DataWritableReadSupport

2015-04-15 Thread Dong Chen (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495723#comment-14495723 ]

Dong Chen commented on HIVE-10016:
--

Thanks for working on the branch! [~spena]

I was going to upload the patch, but ran into a problem. When I rebase the latest 
patch 'HIVE-10016.patch' (targeting trunk) onto the 'parquet' branch, a merge 
conflict occurs, because the branch is about one month behind trunk.

Should we sync the branch with trunk first and then update the patch? (If so, I 
will rebase the latest patch once the branch is synced.)

Or should we merge all the patches first, then sync with trunk and resolve the 
conflicts together? (If so, patch 'HIVE-10016.1-parquet.patch' is ready to commit now.)

 Remove duplicated Hive table schema parsing in DataWritableReadSupport
 --

 Key: HIVE-10016
 URL: https://issues.apache.org/jira/browse/HIVE-10016
 Project: Hive
  Issue Type: Sub-task
Reporter: Dong Chen
Assignee: Dong Chen
 Attachments: HIVE-10016-parquet.patch, HIVE-10016.1-parquet.patch, 
 HIVE-10016.patch


 In {{DataWritableReadSupport.init()}}, the table schema is created and its 
 string form is stored in the conf. When constructing the 
 {{ParquetRecordReaderWrapper}}, the schema is fetched from the conf and parsed 
 several times.
 We could remove this repeated parsing and make {{getRecordReader}} a bit 
 faster.
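A minimal Java sketch of the idea, under stated assumptions: `Schema` is a hypothetical stand-in for Parquet's `MessageType`, `parse()` is a hypothetical string-to-schema parser, and `example.table.schema` is an invented conf key. It is not the actual patch; it only shows that keeping the object built in `init()` lets later steps skip re-parsing the string stored in the conf.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SchemaReuseSketch {
    /** Hypothetical stand-in for a parsed schema (e.g. Parquet's MessageType). */
    static final class Schema {
        final List<String> columns;

        Schema(List<String> columns) {
            this.columns = columns;
        }

        @Override
        public String toString() {
            return String.join(",", columns);
        }
    }

    /** Counts calls to parse() so the two approaches can be compared. */
    static int parseCalls = 0;

    /** Hypothetical parser: "a,b,c" becomes a Schema with columns [a, b, c]. */
    static Schema parse(String text) {
        parseCalls++;
        return new Schema(Arrays.asList(text.split(",")));
    }

    /** Hypothetical conf key under which init() publishes the schema string. */
    static final String SCHEMA_KEY = "example.table.schema";

    /** Hypothetical read support: init() builds the schema once and keeps it. */
    static final class ReadSupportSketch {
        Schema tableSchema; // kept so later steps can reuse the object

        void init(Map<String, String> conf, List<String> columns) {
            tableSchema = new Schema(columns);
            conf.put(SCHEMA_KEY, tableSchema.toString()); // string form still published
        }
    }

    /** Before: every step that needs the schema re-parses the string from conf. */
    static int columnsViaConf(Map<String, String> conf, int steps) {
        int n = 0;
        for (int i = 0; i < steps; i++) {
            n = parse(conf.get(SCHEMA_KEY)).columns.size();
        }
        return n;
    }

    /** After: every step uses the object init() already built; no parsing. */
    static int columnsViaObject(ReadSupportSketch support, int steps) {
        int n = 0;
        for (int i = 0; i < steps; i++) {
            n = support.tableSchema.columns.size();
        }
        return n;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        ReadSupportSketch support = new ReadSupportSketch();
        support.init(conf, Arrays.asList("id", "name", "ts"));

        parseCalls = 0;
        int viaConf = columnsViaConf(conf, 3);
        System.out.println("via conf:   " + viaConf + " columns, " + parseCalls + " parses");

        parseCalls = 0;
        int viaObject = columnsViaObject(support, 3);
        System.out.println("via object: " + viaObject + " columns, " + parseCalls + " parses");
    }
}
```

Running `main` reports 3 parses on the conf path and 0 on the reuse path, while both see the same 3 columns.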



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10016) Remove duplicated Hive table schema parsing in DataWritableReadSupport

2015-04-14 Thread Sergio Peña (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494897#comment-14494897 ]

Sergio Peña commented on HIVE-10016:


[~dongc] Could you upload the patch for the 'parquet' branch so that I can 
commit it there?
Thanks.


[jira] [Commented] (HIVE-10016) Remove duplicated Hive table schema parsing in DataWritableReadSupport

2015-04-09 Thread Dong Chen (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486835#comment-14486835 ]

Dong Chen commented on HIVE-10016:
--

The failed tests are not related to this patch. The patch has been rebased onto 
trunk and is ready to go.


[jira] [Commented] (HIVE-10016) Remove duplicated Hive table schema parsing in DataWritableReadSupport

2015-04-08 Thread Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486617#comment-14486617 ]

Hive QA commented on HIVE-10016:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12723827/HIVE-10016.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 8665 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_acid
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3336/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3336/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3336/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12723827 - PreCommit-HIVE-TRUNK-Build


[jira] [Commented] (HIVE-10016) Remove duplicated Hive table schema parsing in DataWritableReadSupport

2015-03-23 Thread Sergio Peña (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376026#comment-14376026 ]

Sergio Peña commented on HIVE-10016:


+1

This is good [~dongc]
Thanks for the patch.


[jira] [Commented] (HIVE-10016) Remove duplicated Hive table schema parsing in DataWritableReadSupport

2015-03-20 Thread Dong Chen (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371002#comment-14371002 ]

Dong Chen commented on HIVE-10016:
--

Thanks for your review! [~Ferd].

Yes, Parquet has its own new instance there. The ReadSupport instance on the 
Hive side is only used to provide some information for creating the 
ParquetRecordReaderWrapper.
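The point about separate instances can be illustrated with a hedged Java sketch. It assumes, without checking Parquet's code, that the framework builds its own read-support object from a class, while Hive uses a separate short-lived instance only to compute what the wrapper needs. `ReadSupportSketch`, `WrapperSketch`, `frameworkInstance()` and the `example.columns` key are all hypothetical names. Because fields are per instance, whatever the Hive-side instance caches reaches the wrapper built from it but not the framework's fresh instance.

```java
import java.util.HashMap;
import java.util.Map;

public class ReadSupportInstancesSketch {
    /** Hypothetical read support that caches what init() computed in an instance field. */
    public static class ReadSupportSketch {
        String cachedSchema; // per-instance state, not shared between instances

        public ReadSupportSketch() {
        }

        void init(Map<String, String> conf) {
            cachedSchema = conf.get("example.columns"); // hypothetical conf key
        }
    }

    /** Hypothetical wrapper that only needs the info computed on the Hive side. */
    static final class WrapperSketch {
        final String schema;

        WrapperSketch(String schema) {
            this.schema = schema;
        }
    }

    /** Stand-in (assumption) for a framework that instantiates its own read support. */
    static ReadSupportSketch frameworkInstance(Class<? extends ReadSupportSketch> cls)
            throws ReflectiveOperationException {
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Map<String, String> conf = new HashMap<>();
        conf.put("example.columns", "id,name");

        // Hive side: a short-lived instance used only to build the wrapper.
        ReadSupportSketch hiveSide = new ReadSupportSketch();
        hiveSide.init(conf);
        WrapperSketch wrapper = new WrapperSketch(hiveSide.cachedSchema);

        // Framework side: a separate instance whose cache stays empty until its own init() runs.
        ReadSupportSketch own = frameworkInstance(ReadSupportSketch.class);

        System.out.println("wrapper schema: " + wrapper.schema);
        System.out.println("framework cache empty: " + (own.cachedSchema == null));
    }
}
```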


[jira] [Commented] (HIVE-10016) Remove duplicated Hive table schema parsing in DataWritableReadSupport

2015-03-20 Thread Sergio Peña (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371448#comment-14371448 ]

Sergio Peña commented on HIVE-10016:


Looks good [~dongc].

Just a couple of small comments:

- In DataWritableRecordConverter.java
  Could you remove the imports that are not used anymore:
  * import parquet.schema.MessageTypeParser;
  * import org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport;

- In DataWritableReadSupport.java
  I think the local variable 'MessageType tableSchema' is not needed. What if we
  assign the value directly to hiveTableSchema and use that variable in the rest
  of the block? Instead of:

   MessageType tableSchema = new MessageType(TABLE_SCHEMA, typeListTable);
   hiveTableSchema = tableSchema;

  it could be:

   hiveTableSchema = new MessageType(TABLE_SCHEMA, typeListTable);
