[jira] [Updated] (HIVE-8196) Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance

2014-09-29 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8196:
-
Attachment: HIVE-8196.4.patch

Fixes parallel.q test. Rebased patch to latest trunk.

> Joining on partition columns with fetch column stats enabled results in very 
> small CE which negatively affects query performance 
> -
>
> Key: HIVE-8196
> URL: https://issues.apache.org/jira/browse/HIVE-8196
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Prasanth J
>Priority: Blocker
>  Labels: performance
> Fix For: 0.14.0
>
> Attachments: HIVE-8196.1.patch, HIVE-8196.2.patch, HIVE-8196.3.patch, 
> HIVE-8196.4.patch
>
>
> To make the best of dynamic partition pruning, joins should be on the 
> partitioning columns, which results in dynamically pruning the partitions 
> from the fact table based on the qualifying column keys from the dimension 
> table. However, this type of join negatively affects cardinality estimates 
> with fetch column stats enabled.
> Currently we don't have statistics for partition columns, and as a result the 
> NDV is set to the row count; doing that negatively affects the estimated 
> selectivity of the join.
> A workaround is to capture statistics for partition columns, or to use the 
> number of partitions in case dynamic partitioning is used.
> StatsUtils.getColStatisticsFromExpression is where the distinct count gets 
> set to the row count:
> {code}
>   if (encd.getIsPartitionColOrVirtualCol()) {
> // virtual columns
> colType = encd.getTypeInfo().getTypeName();
> countDistincts = numRows;
> oi = encd.getWritableObjectInspector();
> {code}
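> A back-of-the-envelope illustration (not Hive source) of why forcing NDV to 
> the row count collapses the estimate; the numbers are taken from the plan 
> below:
> {code}
> long storeSalesRows = 550076554L;   // fact table rows
> long dateDimRows = 652L;            // dimension rows after the d_year filter
> long ndvFact = storeSalesRows;      // partition column NDV wrongly set to row count
> long ndvDim = 652L;
> // Standard join cardinality estimate: (r1 * r2) / max(ndv1, ndv2)
> long estimate = (long) (((double) storeSalesRows * dateDimRows)
>     / Math.max(ndvFact, ndvDim));   // = 652, absurdly small for a fact-table join
> {code}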
> Query used to repro the issue:
> {code}
> set hive.stats.fetch.column.stats=true;
> set hive.tez.dynamic.partition.pruning=true;
> explain select d_date 
> from store_sales, date_dim 
> where 
> store_sales.ss_sold_date_sk = date_dim.d_date_sk and 
> date_dim.d_year = 1998;
> {code}
> Plan 
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Map 1 <- Map 2 (BROADCAST_EDGE)
>   DagName: mmokhtar_20140919180404_945d29f5-d041-4420-9666-1c5d64fa6540:8
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: ss_sold_date_sk is not null (type: boolean)
>   Statistics: Num rows: 550076554 Data size: 47370018816 
> Basic stats: COMPLETE Column stats: COMPLETE
>   Map Join Operator
> condition map:
>  Inner Join 0 to 1
> condition expressions:
>   0 {ss_sold_date_sk}
>   1 {d_date_sk} {d_date}
> keys:
>   0 ss_sold_date_sk (type: int)
>   1 d_date_sk (type: int)
> outputColumnNames: _col22, _col26, _col28
> input vertices:
>   1 Map 2
> Statistics: Num rows: 652 Data size: 66504 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: (_col22 = _col26) (type: boolean)
>   Statistics: Num rows: 326 Data size: 33252 Basic stats: 
> COMPLETE Column stats: COMPLETE
>   Select Operator
> expressions: _col28 (type: string)
> outputColumnNames: _col0
> Statistics: Num rows: 326 Data size: 30644 Basic 
> stats: COMPLETE Column stats: COMPLETE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 326 Data size: 30644 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Execution mode: vectorized
> Map 2
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: (d_date_sk is not null and (d_year = 1998)) 
> (type: boolean)
>   Statistics: Num rows: 73049 Data size: 81741831 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Oper

[jira] [Commented] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator

2014-09-29 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151450#comment-14151450
 ] 

Prasanth J commented on HIVE-8226:
--

[~mmccline] Can you rebase the patch against current trunk? I saw a failure 
when I tried to commit this patch: there is a diff in the golden file when I 
ran the dynpart_sort_opt_vectorization.q test. The patch also did not apply 
cleanly on trunk. Is this going into branch-0.14 as well? If so, please check 
with [~vikram.dixit] and update the Affects and Fix versions accordingly.

> Vectorize dynamic partitioning in VectorFileSinkOperator
> 
>
> Key: HIVE-8226
> URL: https://issues.apache.org/jira/browse/HIVE-8226
> Project: Hive
>  Issue Type: Bug
>  Components: Tez, Vectorization
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch
>
>






[jira] [Updated] (HIVE-8287) StorageBasedAuth in metastore does not produce useful error message

2014-09-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-8287:

Status: Patch Available  (was: Open)

> StorageBasedAuth in metastore does not produce useful error message
> ---
>
> Key: HIVE-8287
> URL: https://issues.apache.org/jira/browse/HIVE-8287
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, Logging
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-8287.1.patch
>
>
> Example of an error message that doesn't give enough useful information:
> {noformat}
> 0: jdbc:hive2://localhost:1> alter table parttab1 drop partition 
> (p1='def');
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unknown error. Please check 
> logs. (state=08S01,code=1)
> {noformat}





[jira] [Updated] (HIVE-8287) StorageBasedAuth in metastore does not produce useful error message

2014-09-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-8287:

Attachment: HIVE-8287.1.patch

> StorageBasedAuth in metastore does not produce useful error message
> ---
>
> Key: HIVE-8287
> URL: https://issues.apache.org/jira/browse/HIVE-8287
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, Logging
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-8287.1.patch
>
>
> Example of an error message that doesn't give enough useful information:
> {noformat}
> 0: jdbc:hive2://localhost:1> alter table parttab1 drop partition 
> (p1='def');
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unknown error. Please check 
> logs. (state=08S01,code=1)
> {noformat}





[jira] [Updated] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator

2014-09-29 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-8226:
---
Status: In Progress  (was: Patch Available)

> Vectorize dynamic partitioning in VectorFileSinkOperator
> 
>
> Key: HIVE-8226
> URL: https://issues.apache.org/jira/browse/HIVE-8226
> Project: Hive
>  Issue Type: Bug
>  Components: Tez, Vectorization
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch
>
>






[jira] [Updated] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator

2014-09-29 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-8226:
---
Attachment: HIVE-8226.03.patch

> Vectorize dynamic partitioning in VectorFileSinkOperator
> 
>
> Key: HIVE-8226
> URL: https://issues.apache.org/jira/browse/HIVE-8226
> Project: Hive
>  Issue Type: Bug
>  Components: Tez, Vectorization
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, 
> HIVE-8226.03.patch
>
>






[jira] [Updated] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator

2014-09-29 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-8226:
---
Status: Patch Available  (was: In Progress)

> Vectorize dynamic partitioning in VectorFileSinkOperator
> 
>
> Key: HIVE-8226
> URL: https://issues.apache.org/jira/browse/HIVE-8226
> Project: Hive
>  Issue Type: Bug
>  Components: Tez, Vectorization
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, 
> HIVE-8226.03.patch
>
>






[jira] [Commented] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator

2014-09-29 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151468#comment-14151468
 ] 

Matt McCline commented on HIVE-8226:


Yes, I rebased and re-ran dynpart_sort_opt_vectorization.q and found that a 
few stages now vectorize. Perhaps I didn't create patch #2 correctly. Anyway, 
I submitted patch #3.

> Vectorize dynamic partitioning in VectorFileSinkOperator
> 
>
> Key: HIVE-8226
> URL: https://issues.apache.org/jira/browse/HIVE-8226
> Project: Hive
>  Issue Type: Bug
>  Components: Tez, Vectorization
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, 
> HIVE-8226.03.patch
>
>






[jira] [Commented] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to inefficient collection used to find a matching ReadEntity

2014-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151469#comment-14151469
 ] 

Hive QA commented on HIVE-7723:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671737/HIVE-7723.8.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6364 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_escape1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_escape2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1032/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1032/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1032/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671737

> Explain plan for complex query with lots of partitions is slow due to 
> inefficient collection used to find a matching ReadEntity
> 
>
> Key: HIVE-7723
> URL: https://issues.apache.org/jira/browse/HIVE-7723
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Physical Optimizer
>Affects Versions: 0.13.1
>Reporter: Mostafa Mokhtar
>Assignee: Mostafa Mokhtar
> Fix For: 0.14.0
>
> Attachments: HIVE-7723.1.patch, HIVE-7723.2.patch, HIVE-7723.3.patch, 
> HIVE-7723.4.patch, HIVE-7723.5.patch, HIVE-7723.6.patch, HIVE-7723.7.patch, 
> HIVE-7723.8.patch
>
>
> Explain on TPC-DS query 64 took 11 seconds; when the CLI was profiled, it 
> showed that ReadEntity.equals is taking ~40% of the CPU.
> ReadEntity.equals is called from the snippet below.
> The set is iterated over again and again to get the actual match; a HashMap 
> is a better option for this case, as Set doesn't have a get method.
> Also, for ReadEntity, equals is case-insensitive while hashCode is not, which 
> is undesired behavior.
> {code}
> public static ReadEntity addInput(Set<ReadEntity> inputs, ReadEntity 
> newInput) {
> // If the input is already present, make sure the new parent is added to 
> the input.
> if (inputs.contains(newInput)) {
>   for (ReadEntity input : inputs) {
> if (input.equals(newInput)) {
>   if ((newInput.getParents() != null) && 
> (!newInput.getParents().isEmpty())) {
> input.getParents().addAll(newInput.getParents());
> input.setDirect(input.isDirect() || newInput.isDirect());
>   }
>   return input;
> }
>   }
>   assert false;
> } else {
>   inputs.add(newInput);
>   return newInput;
> }
> // make compile happy
> return null;
>   }
> {code}
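> A minimal sketch, assuming the HashMap-based lookup the description proposes 
> (hypothetical, not the attached patch); the map from entity to itself 
> replaces the O(n) scan:
> {code}
> public static ReadEntity addInput(Map<ReadEntity, ReadEntity> inputs,
>     ReadEntity newInput) {
>   ReadEntity existing = inputs.get(newInput);  // O(1) instead of iterating
>   if (existing != null) {
>     if ((newInput.getParents() != null) && (!newInput.getParents().isEmpty())) {
>       existing.getParents().addAll(newInput.getParents());
>       existing.setDirect(existing.isDirect() || newInput.isDirect());
>     }
>     return existing;
>   }
>   inputs.put(newInput, newInput);
>   return newInput;
> }
> {code}
> Note this assumes hashCode is first made consistent with equals; otherwise 
> the map lookup inherits the same case-sensitivity problem.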
> This is the query used:
> {code}
> select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number 
> ,cs1.b_streen_name ,cs1.b_city
>  ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city 
> ,cs1.c_zip ,cs1.syear ,cs1.cnt
>  ,cs1.s1 ,cs1.s2 ,cs1.s3
>  ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt
> from
> (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
> store_name
>  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
> ,ad1.ca_street_name as b_streen_name
>  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
> c_street_number
>  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
> as c_zip
>  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
> as cnt
>  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
> ,sum(ss_coupon_amt) as s3
>   FROM   store_sales
> JOIN store_returns ON store_sales.ss_item_sk = 
> store_returns.sr_item_sk and store_sales.ss_ticket_number = 
> store_returns.sr_ticket_number
> JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
> JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
> JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
> JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
> JOIN store ON store_sales.ss_store_sk = store.s_store_sk
> JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= 
> cd1.cd_demo_sk
> JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = 
> cd2.cd_demo_sk
> JOIN promotion O

[jira] [Updated] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator

2014-09-29 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-8226:
---
Affects Version/s: 0.14.0

> Vectorize dynamic partitioning in VectorFileSinkOperator
> 
>
> Key: HIVE-8226
> URL: https://issues.apache.org/jira/browse/HIVE-8226
> Project: Hive
>  Issue Type: Bug
>  Components: Tez, Vectorization
>Affects Versions: 0.14.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, 
> HIVE-8226.03.patch
>
>






[jira] [Updated] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator

2014-09-29 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-8226:
---
Fix Version/s: 0.14.0

> Vectorize dynamic partitioning in VectorFileSinkOperator
> 
>
> Key: HIVE-8226
> URL: https://issues.apache.org/jira/browse/HIVE-8226
> Project: Hive
>  Issue Type: Bug
>  Components: Tez, Vectorization
>Affects Versions: 0.14.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, 
> HIVE-8226.03.patch
>
>






[jira] [Commented] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator

2014-09-29 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151472#comment-14151472
 ] 

Matt McCline commented on HIVE-8226:


[~pjayachandran] I added you to the e-mail I sent to Gunther about branch-0.14.

> Vectorize dynamic partitioning in VectorFileSinkOperator
> 
>
> Key: HIVE-8226
> URL: https://issues.apache.org/jira/browse/HIVE-8226
> Project: Hive
>  Issue Type: Bug
>  Components: Tez, Vectorization
>Affects Versions: 0.14.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, 
> HIVE-8226.03.patch
>
>






[jira] [Updated] (HIVE-8287) StorageBasedAuth in metastore does not produce useful error message

2014-09-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-8287:

Attachment: HIVE-8287.2.patch

> StorageBasedAuth in metastore does not produce useful error message
> ---
>
> Key: HIVE-8287
> URL: https://issues.apache.org/jira/browse/HIVE-8287
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, Logging
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-8287.1.patch, HIVE-8287.2.patch
>
>
> Example of an error message that doesn't give enough useful information:
> {noformat}
> 0: jdbc:hive2://localhost:1> alter table parttab1 drop partition 
> (p1='def');
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unknown error. Please check 
> logs. (state=08S01,code=1)
> {noformat}





[jira] [Commented] (HIVE-8287) StorageBasedAuth in metastore does not produce useful error message

2014-09-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151488#comment-14151488
 ] 

Thejas M Nair commented on HIVE-8287:
-

HIVE-8287.2.patch also includes changes to the webhcat e2e tests for the new 
error messages, and for the changes in HIVE-8221.



> StorageBasedAuth in metastore does not produce useful error message
> ---
>
> Key: HIVE-8287
> URL: https://issues.apache.org/jira/browse/HIVE-8287
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, Logging
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-8287.1.patch, HIVE-8287.2.patch
>
>
> Example of an error message that doesn't give enough useful information:
> {noformat}
> 0: jdbc:hive2://localhost:1> alter table parttab1 drop partition 
> (p1='def');
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unknown error. Please check 
> logs. (state=08S01,code=1)
> {noformat}





[jira] [Commented] (HIVE-7685) Parquet memory manager

2014-09-29 Thread Dong Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151496#comment-14151496
 ] 

Dong Chen commented on HIVE-7685:
-

Hi Brock,

I think a brief design for this memory manager is:
Every new writer registers itself with the manager, which then has an overall 
view of all the writers. When a condition is met (such as every 1000 rows), it 
notifies the writers to check memory usage and flush if necessary.
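A rough sketch of that design, with hypothetical names (these are not actual 
Hive or Parquet classes):
{code}
import java.util.ArrayList;
import java.util.List;

public class ParquetMemoryManager {
  public interface ManagedWriter {
    long bufferedSize();       // current in-memory row-group size, in bytes
    void flushRowGroup();      // force the buffered row group to disk
  }

  private final List<ManagedWriter> writers = new ArrayList<ManagedWriter>();
  private final long budgetBytes;
  private long rowsSinceCheck = 0;

  public ParquetMemoryManager(long budgetBytes) {
    this.budgetBytes = budgetBytes;
  }

  public synchronized void register(ManagedWriter writer) {
    writers.add(writer);
  }

  // Called by each writer per row; every 1000 rows the manager sums the
  // buffered sizes and asks everyone to flush if the shared budget is exceeded.
  public synchronized void rowWritten() {
    if (++rowsSinceCheck < 1000) {
      return;
    }
    rowsSinceCheck = 0;
    long used = 0;
    for (ManagedWriter w : writers) {
      used += w.bufferedSize();
    }
    if (used > budgetBytes) {
      for (ManagedWriter w : writers) {
        w.flushRowGroup();
      }
    }
  }
}
{code}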

However, a problem for Parquet specifically is that Hive only has a wrapper 
for the ParquetRecordWriter, and ParquetRecordWriter itself wraps the real 
writer (InternalParquetRecordWriter) in the Parquet project. Since the 
behaviors of measuring dynamic buffer size and flushing are private to the 
real writer, I think we also have to add code in InternalParquetRecordWriter 
to implement the memory manager functionality. 

It seems that changing only Hive code cannot fix this JIRA. 
I am not sure whether we should move this problem to the Parquet project and 
fix it there, if it is generic enough and not Hive-specific. 

Any other ideas?

Best Regards,
Dong

> Parquet memory manager
> --
>
> Key: HIVE-7685
> URL: https://issues.apache.org/jira/browse/HIVE-7685
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Brock Noland
>
> Similar to HIVE-4248, Parquet tries to write very large "row groups". This 
> causes Hive to run out of memory during dynamic partitioning, when a reducer 
> may have many Parquet files open at a given time.
> As such, we should implement a memory manager which ensures that we don't 
> run out of memory due to writing too many row groups within a single JVM.





[jira] [Commented] (HIVE-8222) CBO Trunk Merge: Fix Check Style issues

2014-09-29 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151498#comment-14151498
 ] 

Lars Francke commented on HIVE-8222:


Would anyone mind taking a look? Shall I open a review?
This one will probably go stale very fast so I'd appreciate a quick turnaround 
to avoid a lot of extra work.

> CBO Trunk Merge: Fix Check Style issues
> ---
>
> Key: HIVE-8222
> URL: https://issues.apache.org/jira/browse/HIVE-8222
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-8222.1.patch
>
>






[jira] [Updated] (HIVE-7776) enable sample10.q.[Spark Branch]

2014-09-29 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7776:

Attachment: HIVE-7776.2-spark.patch

Added several legacy MR configurations to enable Hive features that are based 
on mapred.task.id/mapreduce.task.attempt.id/mapred.task.partition. 

> enable sample10.q.[Spark Branch]
> 
>
> Key: HIVE-7776
> URL: https://issues.apache.org/jira/browse/HIVE-7776
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
> Attachments: HIVE-7776.1-spark.patch, HIVE-7776.2-spark.patch
>
>
> sample10.q contains a dynamic partition operation; this qtest should be 
> enabled after Hive on Spark supports dynamic partitioning.





Re: Review Request 25495: HIVE-7776, enable sample10.q

2014-09-29 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25495/
---

(Updated September 29, 2014, 9:10 a.m.)


Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.


Bugs: HIVE-7776
https://issues.apache.org/jira/browse/HIVE-7776


Repository: hive-git


Description (updated)
---

Hive gets the task ID in two ways in Utilities::getTaskId:
1. Get the value of the mapred.task.id parameter from the configuration.
2. Generate a random value when #1 returns null.
This patch sets mapred.task.id on the executor side, as we can now build it 
through TaskContext (see the sketch below).
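A hypothetical sketch of the executor-side setup (class and method names are 
approximations, not verbatim from the patch):
{code}
import org.apache.hadoop.mapred.JobConf;
import org.apache.spark.TaskContext;

public class MRLegacyConfigs {
  // Build an MR-style task attempt ID from Spark's TaskContext so that
  // Utilities.getTaskId() sees a stable per-task value instead of a random one.
  static void setup(TaskContext ctx, JobConf jobConf) {
    String attemptId = String.format("attempt_%d_0000_m_%06d_0",
        System.currentTimeMillis(), ctx.partitionId());
    jobConf.set("mapred.task.id", attemptId);
    jobConf.set("mapreduce.task.attempt.id", attemptId);
    jobConf.setInt("mapred.task.partition", ctx.partitionId());
  }
}
{code}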


Diffs
-

  itests/src/test/resources/testconfiguration.properties 155abad 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 3ff0782 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 02f9d99 
  ql/src/test/results/clientpositive/spark/sample10.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/25495/diff/


Testing
---


Thanks,

chengxiang li



Re: Review Request 25495: HIVE-7776, enable sample10.q

2014-09-29 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25495/
---

(Updated September 29, 2014, 9:11 a.m.)


Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.


Bugs: HIVE-7776
https://issues.apache.org/jira/browse/HIVE-7776


Repository: hive-git


Description
---

Hive gets the task ID in two ways in Utilities::getTaskId:
1. Get the value of the mapred.task.id parameter from the configuration.
2. Generate a random value when #1 returns null.
This patch sets mapred.task.id on the executor side, as we can now build it 
through TaskContext.


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties 89243fc 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 1674d4b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HivePairFlatMapFunction.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
0b8b7c9 
  ql/src/test/results/clientpositive/spark/sample10.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/25495/diff/


Testing
---


Thanks,

chengxiang li



Re: Review Request 25495: HIVE-7776, enable sample10.q

2014-09-29 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25495/
---

(Updated September 29, 2014, 9:13 a.m.)


Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.


Bugs: HIVE-7776
https://issues.apache.org/jira/browse/HIVE-7776


Repository: hive-git


Description
---

Hive gets the task ID in two ways in Utilities::getTaskId:
1. Get the value of the mapred.task.id parameter from the configuration.
2. Generate a random value when #1 returns null.
This patch sets mapred.task.id on the executor side, as we can now build it 
through TaskContext.


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties 89243fc 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 1674d4b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HivePairFlatMapFunction.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
0b8b7c9 
  ql/src/test/results/clientpositive/spark/sample10.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/25495/diff/


Testing
---


Thanks,

chengxiang li



[jira] [Updated] (HIVE-7776) enable sample10.q.[Spark Branch]

2014-09-29 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7776:

Attachment: HIVE-7776.3-spark.patch

> enable sample10.q.[Spark Branch]
> 
>
> Key: HIVE-7776
> URL: https://issues.apache.org/jira/browse/HIVE-7776
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
> Attachments: HIVE-7776.1-spark.patch, HIVE-7776.2-spark.patch, 
> HIVE-7776.3-spark.patch
>
>
> sample10.q contains a dynamic partition operation; this qtest should be 
> enabled after Hive on Spark supports dynamic partitioning.





[jira] [Updated] (HIVE-7776) enable sample10.q.[Spark Branch]

2014-09-29 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7776:

Status: Patch Available  (was: Open)

> enable sample10.q.[Spark Branch]
> 
>
> Key: HIVE-7776
> URL: https://issues.apache.org/jira/browse/HIVE-7776
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
> Attachments: HIVE-7776.1-spark.patch, HIVE-7776.2-spark.patch, 
> HIVE-7776.3-spark.patch
>
>
> sample10.q contains a dynamic partition operation; this qtest should be 
> enabled after Hive on Spark supports dynamic partitioning.





[jira] [Commented] (HIVE-7776) enable sample10.q.[Spark Branch]

2014-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151533#comment-14151533
 ] 

Hive QA commented on HIVE-7776:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671768/HIVE-7776.3-spark.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/171/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/171/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-171/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-SPARK-Build-171/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-spark-source ]]
+ [[ ! -d apache-svn-spark-source/.svn ]]
+ [[ ! -d apache-svn-spark-source ]]
+ cd apache-svn-spark-source
+ svn revert -R .
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java'
++ svn status --no-ignore
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
+ rm -rf target datanucleus.log ant/target shims/0.20/target shims/0.20S/target 
shims/0.23/target shims/aggregator/target shims/common/target 
shims/common-secure/target metastore/target common/target common/src/gen 
serde/target ql/target
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1628143.

At revision 1628143.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671768

> enable sample10.q.[Spark Branch]
> 
>
> Key: HIVE-7776
> URL: https://issues.apache.org/jira/browse/HIVE-7776
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
> Attachments: HIVE-7776.1-spark.patch, HIVE-7776.2-spark.patch, 
> HIVE-7776.3-spark.patch
>
>
> sample10.q contains a dynamic partition operation; this qtest should be 
> enabled after Hive on Spark supports dynamic partitioning.





[jira] [Commented] (HIVE-7776) enable sample10.q.[Spark Branch]

2014-09-29 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151539#comment-14151539
 ] 

Chengxiang Li commented on HIVE-7776:
-

This patch depends on HIVE-7627; I should re-upload it after HIVE-7627 has 
been committed.

> enable sample10.q.[Spark Branch]
> 
>
> Key: HIVE-7776
> URL: https://issues.apache.org/jira/browse/HIVE-7776
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
> Attachments: HIVE-7776.1-spark.patch, HIVE-7776.2-spark.patch, 
> HIVE-7776.3-spark.patch
>
>
> sample10.q contains a dynamic partition operation; this qtest should be 
> enabled after Hive on Spark supports dynamic partitioning.





[jira] [Commented] (HIVE-2573) Create per-session function registry

2014-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151555#comment-14151555
 ] 

Hive QA commented on HIVE-2573:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671743/HIVE-2573.4.patch.txt

{color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 6365 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_func1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_functions
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_collect_set
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_corr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_covar_pop
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_covar_samp
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_avg
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_count
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_min
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_percentile
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_std
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_stddev
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_stddev_samp
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_sum
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_var_pop
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_var_samp
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_variance
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udaf_example_max
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udaf_example_min
org.apache.hadoop.hive.cli.TestContribNegativeCliDriver.testNegativeCliDriver_invalid_row_sequence
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hadoop.hive.service.TestHiveServerSessions.testSessionFuncs
org.apache.hive.jdbc.TestJdbcDriver2.testGetQueryLog
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1033/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1033/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1033/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 24 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671743

> Create per-session function registry 
> -
>
> Key: HIVE-2573
> URL: https://issues.apache.org/jira/browse/HIVE-2573
> Project: Hive
>  Issue Type: Improvement
>  Components: Server Infrastructure
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2573.D3231.1.patch, 
> HIVE-2573.1.patch.txt, HIVE-2573.2.patch.txt, HIVE-2573.3.patch.txt, 
> HIVE-2573.4.patch.txt
>
>
> Currently the function registry is a shared resource and could be overridden 
> by other users when using HiveServer. If a per-session function registry 
> were provided, this situation could be prevented.
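> A minimal sketch of the idea (hypothetical names, not the attached patch): 
> consult a session-local registry before the shared system one.
> {code}
> import java.util.HashMap;
> import java.util.Map;
>
> public class SessionFunctionRegistry {
>   private static final Map<String, Object> SYSTEM =
>       new HashMap<String, Object>();
>   private static final ThreadLocal<Map<String, Object>> SESSION =
>       new ThreadLocal<Map<String, Object>>() {
>         @Override protected Map<String, Object> initialValue() {
>           return new HashMap<String, Object>();
>         }
>       };
>
>   // Session-scoped lookup first, so one user's CREATE FUNCTION cannot
>   // clobber what another session sees; fall back to the shared registry.
>   public static Object getFunctionInfo(String name) {
>     Object fi = SESSION.get().get(name.toLowerCase());
>     return fi != null ? fi : SYSTEM.get(name.toLowerCase());
>   }
> }
> {code}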





[jira] [Commented] (HIVE-8186) Self join may fail if one side has VCs and other doesn't

2014-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151597#comment-14151597
 ] 

Hive QA commented on HIVE-8186:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671748/HIVE-8186.2.patch.txt

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6362 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parallel
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1034/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1034/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1034/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671748

> Self join may fail if one side has VCs and other doesn't
> 
>
> Key: HIVE-8186
> URL: https://issues.apache.org/jira/browse/HIVE-8186
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-8186.1.patch.txt, HIVE-8186.2.patch.txt
>
>
> See comments. This also fails on trunk, although not on original join_vc query





[jira] [Commented] (HIVE-8267) Exposing hbase cell latest timestamp through hbase columns mappings to hive columns.

2014-09-29 Thread Muhammad Ehsan ul Haque (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151623#comment-14151623
 ] 

Muhammad Ehsan ul Haque commented on HIVE-8267:
---

My bad, I just copy-pasted from the HIVE-2828 description (it originated from 
HIVE-2781 and was not accepted, but I think this could be helpful to someone). 
However, there is one more, HIVE-2306, which is still open with no patch 
available.

HIVE-2828 has failing tests after a rebase 2.5 years later. It also exposes 
the timestamp by picking the timestamp of the first cell only:
{code}
long timestamp = result.rawCells()[0].getTimestamp();
{code}
It does not allow exposing the timestamp of all or particular cells in some 
column families.

> Exposing hbase cell latest timestamp through hbase columns mappings to hive 
> columns.
> 
>
> Key: HIVE-8267
> URL: https://issues.apache.org/jira/browse/HIVE-8267
> Project: Hive
>  Issue Type: New Feature
>  Components: HBase Handler
>Affects Versions: 0.14.0
>Reporter: Muhammad Ehsan ul Haque
>Priority: Minor
> Fix For: 0.14.0
>
> Attachments: HIVE-8267.0.patch
>
>
> Previous attempts HIVE-2781 (not accepted), HIVE-2828 (broken and proposed 
> with restricted feature).
> The feature is to make the latest HBase cell timestamp accessible in Hive 
> queries, by mapping the cell timestamp to a Hive column, using a mapping 
> format like 
> {code}:timestamp:cf:[optional qualifier or qualifier prefix]{code}
> The Hive CREATE TABLE statement would look like:
> h4. For mapping a cell latest timestamp.
> {code}
> CREATE TABLE hive_hbase_table (key STRING, col1 STRING, col1_ts BIGINT)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:qualifier, 
> :timestamp:cf:qualifier")
> TBLPROPERTIES ("hbase.table.name" = "hbase_table");
> {code}
> h4. For mapping a column family latest timestamp.
> {code}
> CREATE TABLE hive_hbase_table (key STRING, valuemap MAP<STRING, STRING>, 
> timestampmap MAP<STRING, BIGINT>)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:,:timestamp:cf:")
> TBLPROPERTIES ("hbase.table.name" = "hbase_table");
> {code}
> h4. Providing default cell value
> {code}
> CREATE TABLE hive_hbase_table(key int, value string, value_timestamp bigint)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf:qualifier, 
> :timestamp:cf:qualifier",
>   "hbase.put.default.cell.value" = "default value")
> TBLPROPERTIES ("hbase.table.name" = "hbase_table");
> {code}
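> A hypothetical query against the first mapping above, filtering on the 
> exposed cell timestamp (epoch milliseconds):
> {code}
> SELECT key, col1, col1_ts
> FROM hive_hbase_table
> WHERE col1_ts > 1400000000000;
> {code}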





[jira] [Commented] (HIVE-8283) Missing break in FilterSelectivityEstimator#visitCall()

2014-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151657#comment-14151657
 ] 

Hive QA commented on HIVE-8283:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671749/HIVE-8283.1.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6364 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1035/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1035/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1035/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671749

> Missing break in FilterSelectivityEstimator#visitCall()
> ---
>
> Key: HIVE-8283
> URL: https://issues.apache.org/jira/browse/HIVE-8283
> Project: Hive
>  Issue Type: Bug
>Reporter: Ted Yu
> Attachments: HIVE-8283.1.patch.txt
>
>
> {code}
> case NOT_EQUALS: {
>   selectivity = computeNotEqualitySelectivity(call);
> }
> case LESS_THAN_OR_EQUAL:
> case GREATER_THAN_OR_EQUAL:
> case LESS_THAN:
> case GREATER_THAN: {
>   selectivity = ((double) 1 / (double) 3);
>   break;
> }
> {code}
> A break is missing for the NOT_EQUALS case, so selectivity gets overwritten 
> with 1/3.
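> Presumably the fix is just adding the missing break (sketch):
> {code}
> case NOT_EQUALS: {
>   selectivity = computeNotEqualitySelectivity(call);
>   break;  // without this, fall-through overwrites selectivity with 1/3
> }
> {code}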





[jira] [Commented] (HIVE-8182) beeline fails when executing multiple-line queries with trailing spaces

2014-09-29 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151701#comment-14151701
 ] 

Yongzhi Chen commented on HIVE-8182:


Is trimming every line a good idea? I think that, to be consistent with the 
single-line case, trimming only the last line is the better choice.
My suggestion is to add
line = line.trim();
before
if (line.endsWith(";")) { line = line.substring(0, line.length() - 1); }
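A minimal sketch of the suggested placement (the surrounding BeeLine code is 
approximated, not verbatim):
{code}
// Trim only the fully assembled command before stripping the trailing
// semicolon, so a multi-line query with trailing spaces behaves the same
// as the single-line case.
line = line.trim();
if (line.endsWith(";")) {
  line = line.substring(0, line.length() - 1);
}
{code}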

> beeline fails when executing multiple-line queries with trailing spaces
> ---
>
> Key: HIVE-8182
> URL: https://issues.apache.org/jira/browse/HIVE-8182
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.12.0, 0.13.1
>Reporter: Yongzhi Chen
>Assignee: Sergio Peña
> Fix For: 0.14.0
>
> Attachments: HIVE-8181.1.patch, HIVE-8182.1.patch
>
>
> As the title indicates, when executing a multi-line query with trailing 
> spaces, beeline reports a syntax error: 
> Error: Error while compiling statement: FAILED: ParseException line 1:76 
> extraneous input ';' expecting EOF near '' (state=42000,code=4)
> If the query is put on one single line, beeline executes it successfully.





[jira] [Commented] (HIVE-8196) Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance

2014-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151736#comment-14151736
 ] 

Hive QA commented on HIVE-8196:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671753/HIVE-8196.4.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6364 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parallel
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1036/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1036/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1036/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671753

> Joining on partition columns with fetch column stats enabled results in very 
> small CE which negatively affects query performance 
> -
>
> Key: HIVE-8196
> URL: https://issues.apache.org/jira/browse/HIVE-8196
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Prasanth J
>Priority: Blocker
>  Labels: performance
> Fix For: 0.14.0
>
> Attachments: HIVE-8196.1.patch, HIVE-8196.2.patch, HIVE-8196.3.patch, 
> HIVE-8196.4.patch
>
>
> To make the best of dynamic partition pruning, joins should be on the 
> partitioning columns, which results in dynamically pruning the partitions 
> from the fact table based on the qualifying column keys from the dimension 
> table. However, this type of join negatively affects cardinality estimates 
> with fetch column stats enabled.
> Currently we don't have statistics for partition columns, and as a result the 
> NDV is set to the row count; doing that negatively affects the estimated 
> selectivity of the join.
> A workaround is to capture statistics for partition columns, or to use the 
> number of partitions in case dynamic partitioning is used.
> StatsUtils.getColStatisticsFromExpression is where the distinct count gets 
> set to the row count:
> {code}
>   if (encd.getIsPartitionColOrVirtualCol()) {
> // virtual columns
> colType = encd.getTypeInfo().getTypeName();
> countDistincts = numRows;
> oi = encd.getWritableObjectInspector();
> {code}
> Query used to repro the issue:
> {code}
> set hive.stats.fetch.column.stats=true;
> set hive.tez.dynamic.partition.pruning=true;
> explain select d_date 
> from store_sales, date_dim 
> where 
> store_sales.ss_sold_date_sk = date_dim.d_date_sk and 
> date_dim.d_year = 1998;
> {code}
> Plan 
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Map 1 <- Map 2 (BROADCAST_EDGE)
>   DagName: mmokhtar_20140919180404_945d29f5-d041-4420-9666-1c5d64fa6540:8
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: ss_sold_date_sk is not null (type: boolean)
>   Statistics: Num rows: 550076554 Data size: 47370018816 
> Basic stats: COMPLETE Column stats: COMPLETE
>   Map Join Operator
> condition map:
>  Inner Join 0 to 1
> condition expressions:
>   0 {ss_sold_date_sk}
>   1 {d_date_sk} {d_date}
> keys:
>   0 ss_sold_date_sk (type: int)
>   1 d_date_sk (type: int)
> outputColumnNames: _col22, _col26, _col28
> input vertices:
>   1 Map 2
> Statistics: Num rows: 652 Data size: 66504 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: (_col22 = _col26) (type: boolean)
>   Statistics: Num rows: 326 Data size: 33252 Basic stats: 
> COMPLETE Column stats: COMPLETE
>   Select Operator
> expressions: _col28 (type: string)
> outputColumnNames: _col0
> Statistics: Num rows: 326 D

[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID

2014-09-29 Thread Damien Carol (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151757#comment-14151757
 ] 

Damien Carol commented on HIVE-8231:


[~alangates] I ran a lot of tests over the weekend. It seems that 
INSERT/DELETE/UPDATE doesn't work at all with concurrency enabled.

If I deactivate ACID with:
{noformat}
<property>
  <name>hive.support.concurrency</name>
  <value>false</value>
</property>

<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager</value>
</property>
{noformat}
then everything is OK.

> Error when insert into empty table with ACID
> 
>
> Key: HIVE-8231
> URL: https://issues.apache.org/jira/browse/HIVE-8231
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Damien Carol
>Assignee: Damien Carol
> Fix For: 0.14.0
>
>
> Steps to show the bug:
> 1. create table 
> {code}
> create table encaissement_1b_64m like encaissement_1b;
> {code}
> 2. check table 
> {code}
> desc encaissement_1b_64m;
> dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m;
> {code}
> everything is ok:
> {noformat}
> 0: jdbc:hive2://nc-h04:1/casino> desc encaissement_1b_64m;
>   
> +++--+--+
> |  col_name  | data_type  | comment  |
> +++--+--+
> | id | int|  |
> | idmagasin  | int|  |
> | zibzin | string |  |
> | cheque | int|  |
> | montant| double |  |
> | date   | timestamp  |  |
> | col_6  | string |  |
> | col_7  | string |  |
> | col_8  | string |  |
> +++--+--+
> 9 rows selected (0.158 seconds)
> 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
> hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
> +-+--+
> | DFS Output  |
> +-+--+
> +-+--+
> No rows selected (0.01 seconds)
> {noformat}
> 3. Insert values into the new table
> {noformat}
> insert into table encaissement_1b_64m VALUES (1, 1, 
> '8909', 1, 12.5, '12/05/2014', '','','');
> {noformat}
> 4. Check
> {noformat}
> 0: jdbc:hive2://nc-h04:1/casino> select id from encaissement_1b_64m;
> +-+--+
> | id  |
> +-+--+
> +-+--+
> No rows selected (0.091 seconds)
> {noformat}
> There is already a problem: I don't see the inserted row.
> 5. When I check the HDFS directory, I see a {{delta_421_421}} folder
> {noformat}
> 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
> hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
> +-+--+
> | DFS 
> Output  |
> +-+--+
> | Found 1 items   
> |
> | drwxr-xr-x   - hduser supergroup  0 2014-09-23 12:17 
> hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421
>   |
> +-+--+
> 2 rows selected (0.014 seconds)
> {noformat}
> 6. Doing a major compaction solves the bug
> {noformat}
> 0: jdbc:hive2://nc-h04:1/casino> alter table encaissement_1b_64m compact 
> 'major';
> No rows affected (0.046 seconds)
> 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
> hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
> ++--+
> | DFS Output  
>|
> ++--+
> | Found 1 items   
>|
> | drwxr-xr-x   - hduser supergroup  0 2014-09-23 12:21 
> hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/base_421  
> |
> +---

[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID

2014-09-29 Thread Damien Carol (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151758#comment-14151758
 ] 

Damien Carol commented on HIVE-8231:


To be more precise, these commands work:
{code}
drop table if exists foo6;
create table foo6 (id int) clustered by (id) into 1 buckets;
insert into table foo6 VALUES(1);
select * from foo6;

drop table if exists foo7;
create table foo7 (id int) STORED AS ORC;
insert into table foo7 VALUES(1);
select * from foo7;
{code}

> Error when insert into empty table with ACID
> 
>
> Key: HIVE-8231
> URL: https://issues.apache.org/jira/browse/HIVE-8231
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Damien Carol
>Assignee: Damien Carol
> Fix For: 0.14.0
>
>
> Steps to show the bug:
> 1. create table 
> {code}
> create table encaissement_1b_64m like encaissement_1b;
> {code}
> 2. check table 
> {code}
> desc encaissement_1b_64m;
> dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m;
> {code}
> everything is ok:
> {noformat}
> 0: jdbc:hive2://nc-h04:1/casino> desc encaissement_1b_64m;
>   
> +++--+--+
> |  col_name  | data_type  | comment  |
> +++--+--+
> | id | int|  |
> | idmagasin  | int|  |
> | zibzin | string |  |
> | cheque | int|  |
> | montant| double |  |
> | date   | timestamp  |  |
> | col_6  | string |  |
> | col_7  | string |  |
> | col_8  | string |  |
> +++--+--+
> 9 rows selected (0.158 seconds)
> 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
> hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
> +-+--+
> | DFS Output  |
> +-+--+
> +-+--+
> No rows selected (0.01 seconds)
> {noformat}
> 3. Insert values into the new table
> {noformat}
> insert into table encaissement_1b_64m VALUES (1, 1, 
> '8909', 1, 12.5, '12/05/2014', '','','');
> {noformat}
> 4. Check
> {noformat}
> 0: jdbc:hive2://nc-h04:1/casino> select id from encaissement_1b_64m;
> +-+--+
> | id  |
> +-+--+
> +-+--+
> No rows selected (0.091 seconds)
> {noformat}
> There is already a problem: I don't see the inserted row.
> 5. When I check the HDFS directory, I see a {{delta_421_421}} folder
> {noformat}
> 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
> hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
> +-+--+
> | DFS 
> Output  |
> +-+--+
> | Found 1 items   
> |
> | drwxr-xr-x   - hduser supergroup  0 2014-09-23 12:17 
> hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421
>   |
> +-+--+
> 2 rows selected (0.014 seconds)
> {noformat}
> 6. Doing a major compaction solves the bug
> {noformat}
> 0: jdbc:hive2://nc-h04:1/casino> alter table encaissement_1b_64m compact 
> 'major';
> No rows affected (0.046 seconds)
> 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
> hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
> ++--+
> | DFS Output  
>|
> ++--+
> | Found 1 items   
>|
> | drwxr-xr-x   - hduser supergroup  0 2014-09-23 12:21 
> hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/base_421  
> |
> +--

[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID

2014-09-29 Thread Damien Carol (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151759#comment-14151759
 ] 

Damien Carol commented on HIVE-8231:


This bug is still here even with HIVE-8203 committed.

> Error when insert into empty table with ACID
> 
>
> Key: HIVE-8231
> URL: https://issues.apache.org/jira/browse/HIVE-8231
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Damien Carol
>Assignee: Damien Carol
> Fix For: 0.14.0
>
>
> Steps to show the bug:
> 1. create table 
> {code}
> create table encaissement_1b_64m like encaissement_1b;
> {code}
> 2. check table 
> {code}
> desc encaissement_1b_64m;
> dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m;
> {code}
> everything is ok:
> {noformat}
> 0: jdbc:hive2://nc-h04:1/casino> desc encaissement_1b_64m;
>   
> +++--+--+
> |  col_name  | data_type  | comment  |
> +++--+--+
> | id | int|  |
> | idmagasin  | int|  |
> | zibzin | string |  |
> | cheque | int|  |
> | montant| double |  |
> | date   | timestamp  |  |
> | col_6  | string |  |
> | col_7  | string |  |
> | col_8  | string |  |
> +++--+--+
> 9 rows selected (0.158 seconds)
> 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
> hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
> +-+--+
> | DFS Output  |
> +-+--+
> +-+--+
> No rows selected (0.01 seconds)
> {noformat}
> 3. Insert values into the new table
> {noformat}
> insert into table encaissement_1b_64m VALUES (1, 1, 
> '8909', 1, 12.5, '12/05/2014', '','','');
> {noformat}
> 4. Check
> {noformat}
> 0: jdbc:hive2://nc-h04:1/casino> select id from encaissement_1b_64m;
> +-+--+
> | id  |
> +-+--+
> +-+--+
> No rows selected (0.091 seconds)
> {noformat}
> There is already a problem: I don't see the inserted row.
> 5. When I check the HDFS directory, I see a {{delta_421_421}} folder
> {noformat}
> 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
> hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
> +-+--+
> | DFS Output  |
> +-+--+
> | Found 1 items   |
> | drwxr-xr-x   - hduser supergroup  0 2014-09-23 12:17 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421  |
> +-+--+
> 2 rows selected (0.014 seconds)
> {noformat}
> 6. Doing a major compaction solves the bug
> {noformat}
> 0: jdbc:hive2://nc-h04:1/casino> alter table encaissement_1b_64m compact 
> 'major';
> No rows affected (0.046 seconds)
> 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
> hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
> ++--+
> | DFS Output  |
> ++--+
> | Found 1 items   |
> | drwxr-xr-x   - hduser supergroup  0 2014-09-23 12:21 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/base_421  |
> ++--+
> 2 rows selected (0.02 seconds)
> {noformat}
>  





[jira] [Created] (HIVE-8289) Exclude temp tables in compactor threads

2014-09-29 Thread Damien Carol (JIRA)
Damien Carol created HIVE-8289:
--

 Summary: Exclude temp tables in compactor threads
 Key: HIVE-8289
 URL: https://issues.apache.org/jira/browse/HIVE-8289
 Project: Hive
  Issue Type: Improvement
Reporter: Damien Carol
Priority: Minor


Currently, the compactor threads try to compact temp tables.
This throws errors like this one:
{noformat}
2014-09-26 15:32:18,483 ERROR [Thread-8]: compactor.Initiator 
(Initiator.java:run(111)) - Caught exception while trying to determine if we 
should compact testsimon.values__tmp__table__11.  Marking clean to avoid 
repeated failures, java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.txn.compactor.Initiator.run(Initiator.java:88)

2014-09-26 15:32:18,484 ERROR [Thread-8]: txn.CompactionTxnHandler 
(CompactionTxnHandler.java:markCleaned(355)) - Unable to delete compaction 
record
{noformat}
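
A minimal sketch of the guard this issue asks for, with hypothetical stand-in 
types for the metastore Table and the Initiator's candidate loop (illustrative 
only, not the committed fix):
{code}
// Sketch only: skip temporary tables when scanning compaction candidates,
// so the Initiator never touches names like values__tmp__table__11.
import java.util.Arrays;
import java.util.List;

class CompactorTempTableSketch {
  // Hypothetical stand-in for the metastore Table object.
  static final class Table {
    final String name;
    final boolean temporary;
    Table(String name, boolean temporary) { this.name = name; this.temporary = temporary; }
  }

  static void initiate(List<Table> candidates) {
    for (Table t : candidates) {
      if (t.temporary) {
        continue;   // the proposed exclusion: temp tables are never compacted
      }
      System.out.println("would check " + t.name + " for compaction");
    }
  }

  public static void main(String[] args) {
    initiate(Arrays.asList(
        new Table("encaissement_1b_64m", false),
        new Table("values__tmp__table__11", true)));
  }
}
{code}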





[jira] [Commented] (HIVE-7685) Parquet memory manager

2014-09-29 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151792#comment-14151792
 ] 

Brock Noland commented on HIVE-7685:


Hi Dong,

Ok, thank you for the investigation. I think we can either put the Parquet 
memory manager in Parquet or add APIs to expose the information required to 
implement the memory manager in Hive. Either approach is fine by me; we can 
take this work up in PARQUET-108.

Brock

> Parquet memory manager
> --
>
> Key: HIVE-7685
> URL: https://issues.apache.org/jira/browse/HIVE-7685
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Brock Noland
>
> Similar to HIVE-4248, Parquet tries to write very large "row groups". 
> This causes Hive to run out of memory during dynamic partition inserts, when 
> a reducer may have many Parquet files open at a given time.
> As such, we should implement a memory manager which ensures that we don't run 
> out of memory due to writing too many row groups within a single JVM.
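
The description above calls for a per-JVM memory manager; a minimal sketch of 
that idea, assuming a simple pool that scales down each new row-group 
allocation once the pool is oversubscribed (names illustrative; the real work 
is tracked in PARQUET-108):
{code}
// Sketch only: cap total row-group buffer memory per JVM. Each writer asks
// for its desired row-group size; once the sum of requests exceeds the pool,
// the allocation is scaled down proportionally.
import java.util.ArrayList;
import java.util.List;

class ParquetMemoryManagerSketch {
  private final long poolBytes;
  private final List<Long> requests = new ArrayList<>();

  ParquetMemoryManagerSketch(long poolBytes) { this.poolBytes = poolBytes; }

  synchronized long register(long desiredRowGroupBytes) {
    requests.add(desiredRowGroupBytes);
    long total = 0;
    for (long r : requests) total += r;
    double scale = Math.min(1.0, (double) poolBytes / total);
    return (long) (desiredRowGroupBytes * scale);  // shrunk when oversubscribed
  }

  public static void main(String[] args) {
    ParquetMemoryManagerSketch mm = new ParquetMemoryManagerSketch(256L << 20);
    System.out.println(mm.register(128L << 20));   // fits: full 128 MB
    System.out.println(mm.register(128L << 20));   // fits: pool exactly full
    System.out.println(mm.register(128L << 20));   // oversubscribed: scaled down
  }
}
{code}
A real manager would also have to tell already-registered writers to shrink 
their buffers; this sketch only scales the newest request.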





[jira] [Updated] (HIVE-8182) beeline fails when executing multiple-line queries with trailing spaces

2014-09-29 Thread Sergio Peña (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-8182:
--
Status: Patch Available  (was: Open)

Trim the statement once instead of trimming every line.

> beeline fails when executing multiple-line queries with trailing spaces
> ---
>
> Key: HIVE-8182
> URL: https://issues.apache.org/jira/browse/HIVE-8182
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1, 0.12.0
>Reporter: Yongzhi Chen
>Assignee: Sergio Peña
> Fix For: 0.14.0
>
> Attachments: HIVE-8181.1.patch, HIVE-8182.1.patch, HIVE-8182.2.patch
>
>
> As the title indicates, when executing a multi-line query with trailing 
> spaces, beeline reports a syntax error: 
> Error: Error while compiling statement: FAILED: ParseException line 1:76 
> extraneous input ';' expecting EOF near '' (state=42000,code=4)
> If the query is put on a single line, beeline executes it successfully.





[jira] [Comment Edited] (HIVE-7685) Parquet memory manager

2014-09-29 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151792#comment-14151792
 ] 

Brock Noland edited comment on HIVE-7685 at 9/29/14 3:31 PM:
-

Hi Dong,

Ok, thank you for the investigation. I think we can either put the Parquet 
memory manager in Parquet or add APIs to expose the information required to 
implement the memory manager in Hive. Either approach is fine by me; we can 
take this work up in PARQUET-108.

Brock


was (Author: brocknoland):
Hi Dong,

Ok, thank you for the investigation. I think we can either put the parquet 
memory manager in Parquet or add API's to expose the information required to 
implement the memory manager in HIve. Either approach is fine by me, we can 
take this work up in PARQUET-108.

Brock

> Parquet memory manager
> --
>
> Key: HIVE-7685
> URL: https://issues.apache.org/jira/browse/HIVE-7685
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Brock Noland
>
> Similar to HIVE-4248, Parquet tries to write very large "row groups". 
> This causes Hive to run out of memory during dynamic partition inserts, when 
> a reducer may have many Parquet files open at a given time.
> As such, we should implement a memory manager which ensures that we don't run 
> out of memory due to writing too many row groups within a single JVM.





[jira] [Updated] (HIVE-8182) beeline fails when executing multiple-line queries with trailing spaces

2014-09-29 Thread Sergio Peña (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-8182:
--
Attachment: HIVE-8182.2.patch

> beeline fails when executing multiple-line queries with trailing spaces
> ---
>
> Key: HIVE-8182
> URL: https://issues.apache.org/jira/browse/HIVE-8182
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.12.0, 0.13.1
>Reporter: Yongzhi Chen
>Assignee: Sergio Peña
> Fix For: 0.14.0
>
> Attachments: HIVE-8181.1.patch, HIVE-8182.1.patch, HIVE-8182.2.patch
>
>
> As the title indicates, when executing a multi-line query with trailing 
> spaces, beeline reports a syntax error: 
> Error: Error while compiling statement: FAILED: ParseException line 1:76 
> extraneous input ';' expecting EOF near '' (state=42000,code=4)
> If the query is put on a single line, beeline executes it successfully.





[jira] [Updated] (HIVE-8182) beeline fails when executing multiple-line queries with trailing spaces

2014-09-29 Thread Sergio Peña (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-8182:
--
Status: Open  (was: Patch Available)

> beeline fails when executing multiple-line queries with trailing spaces
> ---
>
> Key: HIVE-8182
> URL: https://issues.apache.org/jira/browse/HIVE-8182
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1, 0.12.0
>Reporter: Yongzhi Chen
>Assignee: Sergio Peña
> Fix For: 0.14.0
>
> Attachments: HIVE-8181.1.patch, HIVE-8182.1.patch, HIVE-8182.2.patch
>
>
> As the title indicates, when executing a multi-line query with trailing 
> spaces, beeline reports a syntax error: 
> Error: Error while compiling statement: FAILED: ParseException line 1:76 
> extraneous input ';' expecting EOF near '' (state=42000,code=4)
> If the query is put on a single line, beeline executes it successfully.





[jira] [Commented] (HIVE-8182) beeline fails when executing multiple-line queries with trailing spaces

2014-09-29 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151789#comment-14151789
 ] 

Sergio Peña commented on HIVE-8182:
---

Thanks [~ychena]

I agree with trimming only once instead of doing it on every line. We can 
reduce extra work in Hive by using your suggestion. I tested it and it 
worked.

I'll upload another patch.
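
A minimal sketch of the approach agreed on above (not the actual Beeline 
code): accumulate the statement as typed, then trim once before testing for 
the trailing semicolon.
{code}
// Sketch only: one trim on the assembled statement, instead of trimming
// every line as it is appended.
class TrimOnceSketch {
  static String assemble(String[] lines) {
    StringBuilder sb = new StringBuilder();
    for (String line : lines) {
      sb.append(line).append('\n');          // keep lines exactly as entered
    }
    String stmt = sb.toString().trim();      // the single trim
    return stmt.endsWith(";") ? stmt.substring(0, stmt.length() - 1) : stmt;
  }

  public static void main(String[] args) {
    // Trailing spaces after the ';' no longer confuse the terminator check.
    System.out.println(assemble(new String[] {
        "select id", "from encaissement_1b_64m;   "}));
  }
}
{code}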

> beeline fails when executing multiple-line queries with trailing spaces
> ---
>
> Key: HIVE-8182
> URL: https://issues.apache.org/jira/browse/HIVE-8182
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.12.0, 0.13.1
>Reporter: Yongzhi Chen
>Assignee: Sergio Peña
> Fix For: 0.14.0
>
> Attachments: HIVE-8181.1.patch, HIVE-8182.1.patch, HIVE-8182.2.patch
>
>
> As the title indicates, when executing a multi-line query with trailing 
> spaces, beeline reports a syntax error: 
> Error: Error while compiling statement: FAILED: ParseException line 1:76 
> extraneous input ';' expecting EOF near '' (state=42000,code=4)
> If the query is put on a single line, beeline executes it successfully.





[jira] [Commented] (HIVE-8287) StorageBasedAuth in metastore does not produce useful error message

2014-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151844#comment-14151844
 ] 

Hive QA commented on HIVE-8287:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671758/HIVE-8287.2.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6364 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_alter_partition_with_whitelist
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_alter_rename_partition_failure2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_disallow_incompatible_type_change_on1
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_disallow_incompatible_type_change_on2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_temp_table_rename
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1037/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1037/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1037/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671758

> StorageBasedAuth in metastore does not produce useful error message
> ---
>
> Key: HIVE-8287
> URL: https://issues.apache.org/jira/browse/HIVE-8287
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, Logging
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-8287.1.patch, HIVE-8287.2.patch
>
>
> An example of an error message that doesn't give enough useful information:
> {noformat}
> 0: jdbc:hive2://localhost:1> alter table parttab1 drop partition 
> (p1='def');
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unknown error. Please check 
> logs. (state=08S01,code=1)
> {noformat}





[jira] [Commented] (HIVE-6148) Support arbitrary structs stored in HBase

2014-09-29 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151846#comment-14151846
 ] 

Swarnim Kulkarni commented on HIVE-6148:


The failed test seems flaky and unrelated to my changes here.

> Support arbitrary structs stored in HBase
> -
>
> Key: HIVE-6148
> URL: https://issues.apache.org/jira/browse/HIVE-6148
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.12.0
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
> Attachments: HIVE-6148.1.patch.txt, HIVE-6148.2.patch.txt, 
> HIVE-6148.3.patch.txt
>
>
> We should add support to be able to query arbitrary structs stored in HBase.





[jira] [Updated] (HIVE-7627) FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]

2014-09-29 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7627:
---
Attachment: HIVE-7627.5-spark.patch

Re-uploading the same patch to test the precommit infra.

> FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]
> -
>
> Key: HIVE-7627
> URL: https://issues.apache.org/jira/browse/HIVE-7627
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: spark-m1
> Attachments: HIVE-7627.1-spark.patch, HIVE-7627.2-spark.patch, 
> HIVE-7627.3-spark.patch, HIVE-7627.4-spark.patch, HIVE-7627.4-spark.patch, 
> HIVE-7627.5-spark.patch, HIVE-7627.5-spark.patch
>
>
> Hive table statistics fail in FSStatsPublisher mode, with the following 
> exception on the Spark executor side:
> {noformat}
> 14/08/05 16:46:24 WARN hdfs.DFSClient: DataStreamer Exception
> java.io.FileNotFoundException: ID mismatch. Request id and saved id: 20277 , 
> 20278 for file 
> /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
> at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1442)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): ID 
> mismatch. Request id and saved id: 20277 , 20278 for file 
> /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>

[jira] [Commented] (HIVE-8245) Collect table read entities at same time as view read entities

2014-09-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151883#comment-14151883
 ] 

Ashutosh Chauhan commented on HIVE-8245:


Committed to 0.14 branch.

> Collect table read entities at same time as view read entities 
> ---
>
> Key: HIVE-8245
> URL: https://issues.apache.org/jira/browse/HIVE-8245
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Security
>Affects Versions: 0.13.0, 0.14.0, 0.13.1
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>Priority: Blocker
> Fix For: 0.14.0
>
> Attachments: HIVE-8245.1.patch, HIVE-8245.patch
>
>






[jira] [Updated] (HIVE-8245) Collect table read entities at same time as view read entities

2014-09-29 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8245:
---
Fix Version/s: (was: 0.15.0)
   0.14.0

> Collect table read entities at same time as view read entities 
> ---
>
> Key: HIVE-8245
> URL: https://issues.apache.org/jira/browse/HIVE-8245
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Security
>Affects Versions: 0.13.0, 0.14.0, 0.13.1
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>Priority: Blocker
> Fix For: 0.14.0
>
> Attachments: HIVE-8245.1.patch, HIVE-8245.patch
>
>






[jira] [Updated] (HIVE-8191) Update and delete on tables with non Acid output formats gives runtime error

2014-09-29 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8191:
-
   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Patch 3 checked in.  Thanks Eugene for the review.

> Update and delete on tables with non Acid output formats gives runtime error
> 
>
> Key: HIVE-8191
> URL: https://issues.apache.org/jira/browse/HIVE-8191
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.14.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Blocker
> Fix For: 0.14.0
>
> Attachments: HIVE-8191.2.patch, HIVE-8191.3.patch, HIVE-8191.patch
>
>
> {code}
> create table not_an_acid_table(a int, b varchar(128));
> insert into table not_an_acid_table select cint, cast(cstring1 as 
> varchar(128)) from alltypesorc where cint is not null order by cint limit 10;
> delete from not_an_acid_table where b = '0ruyd6Y50JpdGRf6HqD';
> {code}
> This generates a runtime error.  It should produce a compile-time error instead.
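
A minimal sketch of the compile-time check being asked for (hypothetical 
names; in Hive this would live in the SemanticAnalyzer and throw a 
SemanticException):
{code}
// Sketch only: reject UPDATE/DELETE during compilation when the target
// table's output format is not ACID-capable, instead of failing at runtime.
class AcidCompileCheckSketch {
  enum Operation { UPDATE, DELETE, OTHER }

  static void validate(Operation op, boolean acidOutputFormat, String table) {
    if ((op == Operation.UPDATE || op == Operation.DELETE) && !acidOutputFormat) {
      throw new IllegalStateException(      // SemanticException in Hive itself
          "Table " + table + " does not use an ACID-compliant output format");
    }
  }

  public static void main(String[] args) {
    validate(Operation.UPDATE, true, "acid_table");          // passes
    validate(Operation.DELETE, false, "not_an_acid_table");  // throws
  }
}
{code}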





[jira] [Created] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional

2014-09-29 Thread Alan Gates (JIRA)
Alan Gates created HIVE-8290:


 Summary: With DbTxnManager configured, all ORC tables forced to be 
transactional
 Key: HIVE-8290
 URL: https://issues.apache.org/jira/browse/HIVE-8290
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0


Currently, once a user configures DbTxnManager to be the transaction manager, 
all tables that use ORC are expected to be transactional.  This means they all 
have to have buckets.  This most likely won't be what users want.

We need to add an explicit marker to a table so that users can indicate it 
should be treated in a transactional way.





[jira] [Commented] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional

2014-09-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151896#comment-14151896
 ] 

Alan Gates commented on HIVE-8290:
--

[~vikram.dixit] I'd like to get this into 0.14 as I believe not having it is a 
big usability issue, and it will be a backwards incompatible change if we add 
it later.

> With DbTxnManager configured, all ORC tables forced to be transactional
> ---
>
> Key: HIVE-8290
> URL: https://issues.apache.org/jira/browse/HIVE-8290
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Blocker
> Fix For: 0.14.0
>
>
> Currently, once a user configures DbTxnManager to be the transaction manager, 
> all tables that use ORC are expected to be transactional.  This means they 
> all have to have buckets.  This most likely won't be what users want.
> We need to add an explicit marker to a table so that users can indicate it 
> should be treated in a transactional way.





[jira] [Commented] (HIVE-7627) FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]

2014-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151918#comment-14151918
 ] 

Hive QA commented on HIVE-7627:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671815/HIVE-7627.5-spark.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6509 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/173/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/173/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-173/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671815

> FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]
> -
>
> Key: HIVE-7627
> URL: https://issues.apache.org/jira/browse/HIVE-7627
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: spark-m1
> Attachments: HIVE-7627.1-spark.patch, HIVE-7627.2-spark.patch, 
> HIVE-7627.3-spark.patch, HIVE-7627.4-spark.patch, HIVE-7627.4-spark.patch, 
> HIVE-7627.5-spark.patch, HIVE-7627.5-spark.patch
>
>
> Hive table statistics fail in FSStatsPublisher mode, with the following 
> exception on the Spark executor side:
> {noformat}
> 14/08/05 16:46:24 WARN hdfs.DFSClient: DataStreamer Exception
> java.io.FileNotFoundException: ID mismatch. Request id and saved id: 20277 , 
> 20278 for file 
> /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
> at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1442)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): ID 
> mismatch. Request id and saved id: 20277 , 20278 for fil

[jira] [Updated] (HIVE-8114) Type resolution for udf arguments of Decimal Type results in error

2014-09-29 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8114:
---
Fix Version/s: (was: 0.15.0)

> Type resolution for udf arguments of Decimal Type results in error
> --
>
> Key: HIVE-8114
> URL: https://issues.apache.org/jira/browse/HIVE-8114
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor, Types
>Affects Versions: 0.13.0, 0.13.1
>Reporter: Ashutosh Chauhan
>Assignee: Jason Dere
> Fix For: 0.14.0
>
> Attachments: HIVE-8114.1.patch
>
>
> {code}
> select log (2, 10.5BD) from src;
> {code}
> results in exception.





[jira] [Updated] (HIVE-8228) CBO: fix couple of issues with partition pruning

2014-09-29 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8228:
---
Fix Version/s: (was: 0.15.0)
   0.14.0

> CBO: fix couple of issues with partition pruning
> 
>
> Key: HIVE-8228
> URL: https://issues.apache.org/jira/browse/HIVE-8228
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Harish Butani
>Assignee: Harish Butani
> Fix For: 0.14.0
>
> Attachments: HIVE-8228.1.patch
>
>
> - The pruner doesn't handle non-deterministic UDFs correctly.
> - The plan generated after CBO has a Project between the TableScan and the 
> Filter, which prevents partition pruning from triggering in Hive post-CBO. 





[jira] [Updated] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional

2014-09-29 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8290:
-
Status: Patch Available  (was: Open)

> With DbTxnManager configured, all ORC tables forced to be transactional
> ---
>
> Key: HIVE-8290
> URL: https://issues.apache.org/jira/browse/HIVE-8290
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Blocker
> Fix For: 0.14.0
>
> Attachments: HIVE-8290.patch
>
>
> Currently, once a user configures DbTxnManager to be the transaction manager, 
> all tables that use ORC are expected to be transactional.  This means they 
> all have to have buckets.  This most likely won't be what users want.
> We need to add an explicit marker to a table so that users can indicate it 
> should be treated in a transactional way.





[jira] [Updated] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional

2014-09-29 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8290:
-
Attachment: HIVE-8290.patch

This patch changes the SemanticAnalyzer to look for a table property 
"transactional" before treating a table as requiring transactions.  I also 
added a number of negative tests for things such as making sure the buckets 
aren't sorted, etc.
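
A minimal sketch of the property check described above; the "transactional" 
key comes from the patch description, while the helper shape is hypothetical. 
A user would then presumably opt in with something like 
TBLPROPERTIES ('transactional'='true').
{code}
// Sketch only: a table is treated as transactional only when explicitly
// marked so, rather than merely because it is stored as ORC.
import java.util.Map;

class TransactionalFlagSketch {
  static boolean isTransactional(Map<String, String> tblProps) {
    return Boolean.parseBoolean(tblProps.getOrDefault("transactional", "false"));
  }
}
{code}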

> With DbTxnManager configured, all ORC tables forced to be transactional
> ---
>
> Key: HIVE-8290
> URL: https://issues.apache.org/jira/browse/HIVE-8290
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Blocker
> Fix For: 0.14.0
>
> Attachments: HIVE-8290.patch
>
>
> Currently, once a user configures DbTxnManager to be the transaction manager, 
> all tables that use ORC are expected to be transactional.  This means they 
> all have to have buckets.  This most likely won't be what users want.
> We need to add an explicit marker to a table so that users can indicate it 
> should be treated in a transactional way.





[jira] [Updated] (HIVE-8266) create function using statement compilation should include resource URI entity

2014-09-29 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-8266:
--
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks [~brocknoland] for the review!

> create function using  statement compilation should include 
> resource URI entity
> -
>
> Key: HIVE-8266
> URL: https://issues.apache.org/jira/browse/HIVE-8266
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 0.13.1
>Reporter: Prasad Mujumdar
>Assignee: Prasad Mujumdar
> Fix For: 0.15.0
>
> Attachments: HIVE-8266.2.patch, HIVE-8266.3.patch
>
>
> The compiler adds the function name and db name as write entities for the 
> "create function using ..." statement. We should also include the resource 
> URI path in the write entities.
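
A minimal sketch of the entity set described above, with strings standing in 
for Hive's WriteEntity objects (illustrative only):
{code}
// Sketch only: alongside the function and database write entities, register
// each resource URI from "create function ... using ..." so authorization
// hooks can check the resource path too.
import java.util.LinkedHashSet;
import java.util.Set;

class CreateFunctionEntitiesSketch {
  static Set<String> writeEntities(String db, String fn, Iterable<String> resourceUris) {
    Set<String> outputs = new LinkedHashSet<>();
    outputs.add("database:" + db);
    outputs.add("function:" + db + "." + fn);
    for (String uri : resourceUris) {
      outputs.add("uri:" + uri);   // the addition this issue proposes
    }
    return outputs;
  }
}
{code}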





[jira] [Updated] (HIVE-8223) CBO Trunk Merge: partition_wise_fileformat2 select result depends on ordering

2014-09-29 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8223:
---
Fix Version/s: (was: 0.15.0)
   0.14.0

> CBO Trunk Merge: partition_wise_fileformat2 select result depends on ordering
> -
>
> Key: HIVE-8223
> URL: https://issues.apache.org/jira/browse/HIVE-8223
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 0.14.0
>
> Attachments: HIVE-8223.01.patch, HIVE-8223.02.patch, HIVE-8223.patch
>
>






[jira] [Updated] (HIVE-8199) CBO Trunk Merge: quote2 test fails due to incorrect literal translation

2014-09-29 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8199:
---
Fix Version/s: (was: 0.15.0)
   0.14.0

> CBO Trunk Merge: quote2 test fails due to incorrect literal translation
> ---
>
> Key: HIVE-8199
> URL: https://issues.apache.org/jira/browse/HIVE-8199
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 0.14.0
>
> Attachments: HIVE-8199.01.patch, HIVE-8199.02.patch, HIVE-8199.patch
>
>
> Quoting of quotes and slashes is lost in translation back from CBO to AST, it 
> seems





[jira] [Commented] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator

2014-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151944#comment-14151944
 ] 

Hive QA commented on HIVE-8226:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671756/HIVE-8226.03.patch

{color:green}SUCCESS:{color} +1 6363 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1038/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1038/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1038/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671756

> Vectorize dynamic partitioning in VectorFileSinkOperator
> 
>
> Key: HIVE-8226
> URL: https://issues.apache.org/jira/browse/HIVE-8226
> Project: Hive
>  Issue Type: Bug
>  Components: Tez, Vectorization
>Affects Versions: 0.14.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, 
> HIVE-8226.03.patch
>
>






[jira] [Updated] (HIVE-8111) CBO trunk merge: duplicated casts for arithmetic expressions in Hive and CBO

2014-09-29 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8111:
---
Fix Version/s: (was: 0.15.0)
   0.14.0

> CBO trunk merge: duplicated casts for arithmetic expressions in Hive and CBO
> 
>
> Key: HIVE-8111
> URL: https://issues.apache.org/jira/browse/HIVE-8111
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 0.14.0
>
> Attachments: HIVE-8111.01.patch, HIVE-8111.02.patch, 
> HIVE-8111.03.patch, HIVE-8111.patch
>
>
> Original test failure: it looks like the column type changes to different 
> decimals in most cases. In one case the integer part becomes too big to fit, 
> so the result apparently becomes null.
> What happens is that CBO adds casts to arithmetic expressions to make them 
> type-compatible; these casts become part of the new AST, and then Hive adds 
> casts on top of those casts. This (the first part) also causes lots of out 
> file changes. It's not yet clear how best to fix it, in addition to the 
> incorrect decimal width and occasional nulls when the width is larger than 
> Hive allows.
> Option one - don't add those casts for numeric ops - cannot be done if the 
> numeric op is part of a comparison, for which CBO needs correct types.
> Option two - unwrap casts when determining the type in Hive - hard or 
> impossible to tell CBO-added casts apart from user casts.
> Option three - don't change types in Hive if CBO has run - seems hacky and 
> hard to ensure it's applied everywhere.
> Option four - map all expressions precisely between the two trees and remove 
> the casts again after optimization - will be pretty difficult.
> Option five - somehow mark those casts. Not sure how yet.





[jira] [Updated] (HIVE-8291) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader

2014-09-29 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8291:
--
Assignee: Prasanth J  (was: Alan Gates)

> Reading from partitioned bucketed tables has high overhead, 50% of time is 
> spent in OrcInputFormat.getReader
> 
>
> Key: HIVE-8291
> URL: https://issues.apache.org/jira/browse/HIVE-8291
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.14.0
> Environment: cn105
>Reporter: Mostafa Mokhtar
>Assignee: Prasanth J
> Fix For: 0.14.0
>
>
> When loading into a partitioned bucketed sorted table the query fails with 
> {code}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
>  Failed to create file 
> [/tmp/hive/mmokhtar/621d7923-90d1-4d9d-a4c6-b3bb075c7a8c/hive_2014-09-22_23-25-11_678_1598300430132235708-1/_task_tmp.-ext-1/ss_sold_date=1998-01-02/_tmp.00_3/delta_0123305_0123305/bucket_0]
>  for [DFSClient_attempt_1406566393272_6085_r_000144_3_-1677753045_12] for 
> client [172.21.128.111], because this file is already being created by 
> [DFSClient_attempt_1406566393272_6085_r_31_3_-1506661042_12] on 
> [172.21.128.122]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2543)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2308)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2237)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2190)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:520)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy15.create(Unknown Source)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>   at com.sun.proxy.$Proxy15.create(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:258)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1600)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1465)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1390)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:394)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:390)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:334)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1966)
>   at 
> org.apache.hadoop.hive.ql.io.orc.W

[jira] [Created] (HIVE-8291) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader

2014-09-29 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created HIVE-8291:
-

 Summary: Reading from partitioned bucketed tables has high 
overhead, 50% of time is spent in OrcInputFormat.getReader
 Key: HIVE-8291
 URL: https://issues.apache.org/jira/browse/HIVE-8291
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
 Environment: cn105
Reporter: Mostafa Mokhtar
Assignee: Alan Gates
 Fix For: 0.14.0


When loading into a partitioned bucketed sorted table the query fails with 
{code}
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
 Failed to create file 
[/tmp/hive/mmokhtar/621d7923-90d1-4d9d-a4c6-b3bb075c7a8c/hive_2014-09-22_23-25-11_678_1598300430132235708-1/_task_tmp.-ext-1/ss_sold_date=1998-01-02/_tmp.00_3/delta_0123305_0123305/bucket_0]
 for [DFSClient_attempt_1406566393272_6085_r_000144_3_-1677753045_12] for 
client [172.21.128.111], because this file is already being created by 
[DFSClient_attempt_1406566393272_6085_r_31_3_-1506661042_12] on 
[172.21.128.122]
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2543)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2308)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2237)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2190)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:520)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

at org.apache.hadoop.ipc.Client.call(Client.java:1410)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy15.create(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy15.create(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:258)
at 
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1600)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1465)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1390)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:394)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:390)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:334)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1966)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1983)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2287)
at 
org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.close(OrcRecordUpdater.java:356)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriter
{code}

DDL  
{code}
CREATE TABLE stor

[jira] [Updated] (HIVE-8291) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader

2014-09-29 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8291:
--
Description: 
Reading from bucketed partitioned tables has significantly higher overhead 
compared to non-bucketed, non-partitioned tables.


50% of the time is spent in these two lines of code in 
OrcInputFormat.getReader():
{code}
String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY,
    Long.MAX_VALUE + ":");
ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
{code}

{code}
Stack Trace                                                                                      Sample Count  Percentage(%)
hive.ql.exec.tez.MapRecordSource.pushRecord()                                                    2,981         87.215
  org.apache.tez.mapreduce.lib.MRReaderMapred.next()                                             2,002         58.572
    mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)   2,002         58.572
      mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()  1,984      58.046
        hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter)                1,983         58.016
          hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter)           1,891         55.325
            hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)         1,723         50.41
              hive.common.ValidTxnListImpl.<init>(String)                                        934           27.326
                conf.Configuration.get(String, String)                                           621           18.169
{code}
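
One plausible way to attack this hot spot (a sketch, not the committed patch): 
parse the transaction-list string once per distinct value and reuse the 
result, instead of re-running conf.get(...) plus new ValidTxnListImpl(...) for 
every reader.
{code}
// Sketch only: memoize the parsed transaction list keyed by its string form.
// parse(...) stands in for new ValidTxnListImpl(txnString).
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ValidTxnListCacheSketch {
  private static final Map<String, String[]> CACHE = new ConcurrentHashMap<>();

  static String[] validTxnList(String txnString) {
    return CACHE.computeIfAbsent(txnString, ValidTxnListCacheSketch::parse);
  }

  private static String[] parse(String s) {
    return s.split(":");   // placeholder for the real ValidTxnListImpl parse
  }
}
{code}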

  was:
When loading into a partitioned bucketed sorted table the query fails with 
{code}
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
 Failed to create file 
[/tmp/hive/mmokhtar/621d7923-90d1-4d9d-a4c6-b3bb075c7a8c/hive_2014-09-22_23-25-11_678_1598300430132235708-1/_task_tmp.-ext-1/ss_sold_date=1998-01-02/_tmp.00_3/delta_0123305_0123305/bucket_0]
 for [DFSClient_attempt_1406566393272_6085_r_000144_3_-1677753045_12] for 
client [172.21.128.111], because this file is already being created by 
[DFSClient_attempt_1406566393272_6085_r_31_3_-1506661042_12] on 
[172.21.128.122]
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2543)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2308)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2237)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2190)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:520)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

at org.apache.hadoop.ipc.Client.call(Client.java:1410)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy15.create(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy15.create(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:258)
at 
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1600)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1465)
at org.apac

[jira] [Commented] (HIVE-8270) JDBC uber jar is missing some classes required in secure setup.

2014-09-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151962#comment-14151962
 ] 

Ashutosh Chauhan commented on HIVE-8270:


LGTM +1
As Thejas pointed out, we should clarify in the docs that this is meant for a 
remote HS2, not for an embedded one.

> JDBC uber jar is missing some classes required in secure setup.
> ---
>
> Key: HIVE-8270
> URL: https://issues.apache.org/jira/browse/HIVE-8270
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 0.14.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 0.14.0
>
> Attachments: HIVE-8270.1.patch
>
>
> JDBC uber jar is missing some required classes for a secure setup.





Re: Review Request 25497: HIVE-7627, FSStatsPublisher does fit into Spark multi-thread task mode

2014-09-29 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25497/#review54837
---

Ship it!


Ship It!

- Xuefu Zhang


On Sept. 28, 2014, 9:50 a.m., chengxiang li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25497/
> ---
> 
> (Updated Sept. 28, 2014, 9:50 a.m.)
> 
> 
> Review request for hive, Brock Noland and Xuefu Zhang.
> 
> 
> Bugs: HIVE-7627
> https://issues.apache.org/jira/browse/HIVE-7627
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive table statistics fail in FSStatsPublisher mode because of the missing 
> "mapred.task.partition" parameter.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 
> 1674d4b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
> 0b8b7c9 
> 
> Diff: https://reviews.apache.org/r/25497/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> chengxiang li
> 
>
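
For context, a minimal sketch of the kind of fix the description refers to, 
assuming Spark's TaskContext API for the task's partition id (illustrative; 
the actual change is in the patch under review):
{code}
// Sketch only: give each Spark task a distinct "mapred.task.partition" so
// FSStatsPublisher writes to distinct tmpstats files instead of colliding.
import org.apache.hadoop.mapred.JobConf;
import org.apache.spark.TaskContext;

class SparkStatsConfSketch {
  static void tagTask(JobConf jobConf) {
    TaskContext tc = TaskContext.get();   // null when not inside a Spark task
    if (tc != null) {
      jobConf.setInt("mapred.task.partition", tc.partitionId());
    }
  }
}
{code}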



[jira] [Commented] (HIVE-7627) FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]

2014-09-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151963#comment-14151963
 ] 

Xuefu Zhang commented on HIVE-7627:
---

+1

> FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]
> -
>
> Key: HIVE-7627
> URL: https://issues.apache.org/jira/browse/HIVE-7627
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: spark-m1
> Attachments: HIVE-7627.1-spark.patch, HIVE-7627.2-spark.patch, 
> HIVE-7627.3-spark.patch, HIVE-7627.4-spark.patch, HIVE-7627.4-spark.patch, 
> HIVE-7627.5-spark.patch, HIVE-7627.5-spark.patch
>
>
> Hive table statistics fail in FSStatsPublisher mode, with the following 
> exception on the Spark executor side:
> {noformat}
> 14/08/05 16:46:24 WARN hdfs.DFSClient: DataStreamer Exception
> java.io.FileNotFoundException: ID mismatch. Request id and saved id: 20277 , 
> 20278 for file 
> /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
> at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
> at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1442)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): ID 
> mismatch. Request id and saved id: 20277 , 20278 for file 
> /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Serve

[jira] [Updated] (HIVE-8291) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader

2014-09-29 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8291:
--
Description: 
Reading from bucketed partitioned tables has significantly higher overhead 
compared to non-bucketed non-partitioned files.


50% of the time is spent in these two lines of code in 
OrcInputFormat.getReader():
{code}
String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY,
Long.MAX_VALUE + ":");
ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
{code}

{code}
Stack Trace                                                                Sample Count  Percentage(%)
hive.ql.exec.tez.MapRecordSource.pushRecord()                               2,981    87.215
  org.apache.tez.mapreduce.lib.MRReaderMapred.next()                        2,002    58.572
    mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)  2,002  58.572
      mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()  1,984  58.046
        hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,983  58.016
          hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,891  55.325
            hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)  1,723  50.41
              hive.common.ValidTxnListImpl.<init>(String)                     934    27.326
                conf.Configuration.get(String, String)                        621    18.169
{code}

Another 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:

5% of the CPU is in 
{code}
 Path onepath = normalizePath(onefile);
{code}

and 15% of the CPU is in 
{code}
 onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}

From the profiler:
{code}
Stack Trace                                                                Sample Count  Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)        978    28.613
  org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)  978    28.613
    org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()        866    25.336
      org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp() 866    25.336
        java.net.URI.relativize(URI)                                         655    19.163
          java.net.URI.relativize(URI, URI)                                  655    19.163
            java.net.URI.normalize(String)                                   517    15.126
              java.net.URI.needsNormalization(String)                        372    10.884
                java.lang.String.charAt(int)                                 235     6.875
            java.net.URI.equal(String, String)                                27     0.79
            java.lang.StringBuilder.toString()                                 1     0.029
            java.lang.StringBuilder.<init>()                                   1     0.029
            java.lang.StringBuilder.append(String)                             1     0.029
        org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)     167     4.886
          org.apache.hadoop.fs.Path.<init>(String)                           162     4.74
            org.apache.hadoop.fs.Path.initialize(String, String, String, String)  162  4.74
              org.apache.hadoop.fs.Path.normalizePath(String, String)         97     2.838
                org.apache.commons.lang.StringUtils.replace(String, String, String)  97  2.838
                  org.apache.commons.lang.StringUtils.replace(String, String, String, int)  97  2.838
                    java.lang.String.indexOf(String, int)                     97     2.838
              java.net.URI.<init>(String, String, String, String, String)     65     1.902
{code}


  was:
Reading from bucketed partitioned tables has significantly higher overhead 
compared to non-bucketed non-partitioned files.


50% of the time is spent in these two lines of code in 
OrcInputFormat.getReader():
{code}
String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY,
Long.MAX_VALUE + ":");
ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
{code}

{code}
Stack Trace                                                                Sample Count  Percentage(%)
hive.ql.exec.tez.MapRecordSource.pushRecord()                               2,981    87.215
  org.apache.tez.mapreduce.lib.MRReaderMapred.next()                        2,002    58.572
    mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)  2,002  58.572
      mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()  1,984  58.046
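The profile above makes the mitigation fairly direct: ValidTxnListImpl is 
being reconstructed from the same configuration string for every reader, so 
parsing each distinct transaction string once and reusing the result removes 
most of that cost. A minimal sketch of such a cache, assuming the Hive 0.14 
class names quoted above; a hypothetical helper, not necessarily the fix that 
landed:

{code}
// Hypothetical sketch: parse each distinct txn string once rather than once
// per reader; ValidTxnListImpl.<init> dominates getReader() in the profile.
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.common.ValidTxnList;
import org.apache.hadoop.hive.common.ValidTxnListImpl;

public final class TxnListCache {
  private static final ConcurrentHashMap<String, ValidTxnList> CACHE =
      new ConcurrentHashMap<String, ValidTxnList>();

  private TxnListCache() {}

  public static ValidTxnList get(Configuration conf) {
    String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY,
        Long.MAX_VALUE + ":");
    ValidTxnList cached = CACHE.get(txnString);
    if (cached == null) {
      cached = new ValidTxnListImpl(txnString);
      CACHE.putIfAbsent(txnString, cached);   // racy but idempotent
    }
    return cached;
  }
}
{code}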

[jira] [Updated] (HIVE-7627) FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]

2014-09-29 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7627:
--
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Patch committed to Spark branch. Thanks to Chengxiang for the contribution.

> FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]
> -
>
> Key: HIVE-7627
> URL: https://issues.apache.org/jira/browse/HIVE-7627
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: spark-m1
> Fix For: spark-branch
>
> Attachments: HIVE-7627.1-spark.patch, HIVE-7627.2-spark.patch, 
> HIVE-7627.3-spark.patch, HIVE-7627.4-spark.patch, HIVE-7627.4-spark.patch, 
> HIVE-7627.5-spark.patch, HIVE-7627.5-spark.patch
>
>
> Hive table statistics collection failed in FSStatsPublisher mode, with the 
> following exception on the Spark executor side:
> {noformat}
> 14/08/05 16:46:24 WARN hdfs.DFSClient: DataStreamer Exception
> java.io.FileNotFoundException: ID mismatch. Request id and saved id: 20277 , 
> 20278 for file 
> /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
> at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
> at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1442)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): ID 
> mismatch. Request id and saved id: 20277 , 20278 for file 
> /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.ja

[jira] [Updated] (HIVE-8291) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader

2014-09-29 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8291:
--
Assignee: Alan Gates  (was: Prasanth J)

> Reading from partitioned bucketed tables has high overhead, 50% of time is 
> spent in OrcInputFormat.getReader
> 
>
> Key: HIVE-8291
> URL: https://issues.apache.org/jira/browse/HIVE-8291
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.14.0
> Environment: cn105
>Reporter: Mostafa Mokhtar
>Assignee: Alan Gates
> Fix For: 0.14.0
>
>
> Reading from bucketed partitioned tables has significantly higher overhead 
> compared to non-bucketed non-partitioned files.
> 50% of the time is spent in these two lines of code in 
> OrcInputFormat.getReader():
> {code}
> String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY,
> Long.MAX_VALUE + ":");
> ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
> {code}
> {code}
> Stack Trace                                                                Sample Count  Percentage(%)
> hive.ql.exec.tez.MapRecordSource.pushRecord()                               2,981    87.215
>   org.apache.tez.mapreduce.lib.MRReaderMapred.next()                        2,002    58.572
>     mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)  2,002  58.572
>       mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()  1,984  58.046
>         hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,983  58.016
>           hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,891  55.325
>             hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)  1,723  50.41
>               hive.common.ValidTxnListImpl.<init>(String)                     934    27.326
>                 conf.Configuration.get(String, String)                        621    18.169
> {code}
> Another 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:
> 5% of the CPU is in 
> {code}
>  Path onepath = normalizePath(onefile);
> {code}
> and 15% of the CPU is in 
> {code}
>  onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
> {code}
> From the profiler:
> {code}
> Stack Trace                                                                Sample Count  Percentage(%)
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)        978    28.613
>   org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)  978    28.613
>     org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()        866    25.336
>       org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp() 866    25.336
>         java.net.URI.relativize(URI)                                         655    19.163
>           java.net.URI.relativize(URI, URI)                                  655    19.163
>             java.net.URI.normalize(String)                                   517    15.126
>               java.net.URI.needsNormalization(String)                        372    10.884
>                 java.lang.String.charAt(int)                                 235     6.875
>             java.net.URI.equal(String, String)                                27     0.79
>             java.lang.StringBuilder.toString()                                 1     0.029
>             java.lang.StringBuilder.<init>()                                   1     0.029
>             java.lang.StringBuilder.append(String)                             1     0.029
>         org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)     167     4.886
>           org.apache.hadoop.fs.Path.<init>(String)                           162     4.74
>             org.apache.hadoop.fs.Path.initialize(String, String, String, String)  162  4.74
>               org.apache.hadoop.fs.Path.normalizePath(String, String)         97     2.838
>                 org.apache.commons.lang.StringUtils.replace(String, String, String)  97  2.838
>                   org.apache.commons.lang.StringUtils.replace(String, String, String, int)  97  2.838
>                     java.lang.String.indexOf(String, int)                     97     2.838
>               java.net.URI.<init>(String, String, String, String, String)     65     1.902
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8238) [CBO] Preserve subquery alias while generating ast

2014-09-29 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8238:
---
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

Non-trivial to fix. HIVE-8245 solves immediate problem of view authorization.

> [CBO] Preserve subquery alias while generating ast
> --
>
> Key: HIVE-8238
> URL: https://issues.apache.org/jira/browse/HIVE-8238
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-8238.cbo.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7776) enable sample10.q.[Spark Branch]

2014-09-29 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7776:
--
Attachment: HIVE-7776.3-spark.patch

Reattaching the same patch to trigger a test run.

> enable sample10.q.[Spark Branch]
> 
>
> Key: HIVE-7776
> URL: https://issues.apache.org/jira/browse/HIVE-7776
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
> Attachments: HIVE-7776.1-spark.patch, HIVE-7776.2-spark.patch, 
> HIVE-7776.3-spark.patch, HIVE-7776.3-spark.patch
>
>
> sample10.q contains a dynamic partition operation; this qtest should be 
> enabled after Hive on Spark supports dynamic partitioning.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8261) CBO : Predicate pushdown is removed by Optiq

2014-09-29 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-8261:

Attachment: HIVE-8261.1.patch

> CBO : Predicate pushdown is removed by Optiq 
> -
>
> Key: HIVE-8261
> URL: https://issues.apache.org/jira/browse/HIVE-8261
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 0.14.0, 0.13.1
>Reporter: Mostafa Mokhtar
>Assignee: Harish Butani
> Fix For: 0.14.0
>
> Attachments: HIVE-8261.1.patch
>
>
> The plan for TPC-DS Q64 wasn't optimal; upon looking at the logical plan I 
> realized that predicate pushdown is not applied on date_dim d1.
> Interestingly, before Optiq we have the predicate pushed:
> {code}
> HiveFilterRel(condition=[<=($5, $1)])
> HiveJoinRel(condition=[=($3, $6)], joinType=[inner])
>   HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col2=[$3], 
> _o__col3=[$1])
> HiveFilterRel(condition=[=($0, 2000)])
>   HiveAggregateRel(group=[{0, 1}], agg#0=[count()], agg#1=[sum($2)])
> HiveProjectRel($f0=[$4], $f1=[$5], $f2=[$2])
>   HiveJoinRel(condition=[=($1, $8)], joinType=[inner])
> HiveJoinRel(condition=[=($1, $5)], joinType=[inner])
>   HiveJoinRel(condition=[=($0, $3)], joinType=[inner])
> HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], 
> ss_wholesale_cost=[$11])
>   
> HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.store_sales]])
> HiveProjectRel(d_date_sk=[$0], d_year=[$6])
>   
> HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.date_dim]])
>   HiveFilterRel(condition=[AND(in($2, 'maroon', 'burnished', 
> 'dim', 'steel', 'navajo', 'chocolate'), between(false, $1, 35, +(35, 10)), 
> between(false, $1, +(35, 1), +(35, 15)))])
> HiveProjectRel(i_item_sk=[$0], i_current_price=[$5], 
> i_color=[$17])
>   
> HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.item]])
> HiveProjectRel(_o__col0=[$0])
>   HiveAggregateRel(group=[{0}])
> HiveProjectRel($f0=[$0])
>   HiveJoinRel(condition=[AND(=($0, $2), =($1, $3))], 
> joinType=[inner])
> HiveProjectRel(cs_item_sk=[$15], 
> cs_order_number=[$17])
>   
> HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_sales]])
> HiveProjectRel(cr_item_sk=[$2], cr_order_number=[$16])
>   
> HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_returns]])
>   HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col3=[$1])
> HiveFilterRel(condition=[=($0, +(2000, 1))])
>   HiveAggregateRel(group=[{0, 1}], agg#0=[count()])
> HiveProjectRel($f0=[$4], $f1=[$5], $f2=[$2])
>   HiveJoinRel(condition=[=($1, $8)], joinType=[inner])
> HiveJoinRel(condition=[=($1, $5)], joinType=[inner])
>   HiveJoinRel(condition=[=($0, $3)], joinType=[inner])
> HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], 
> ss_wholesale_cost=[$11])
>   
> HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.store_sales]])
> HiveProjectRel(d_date_sk=[$0], d_year=[$6])
>   
> HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.date_dim]])
>   HiveFilterRel(condition=[AND(in($2, 'maroon', 'burnished', 
> 'dim', 'steel', 'navajo', 'chocolate'), between(false, $1, 35, +(35, 10)), 
> between(false, $1, +(35, 1), +(35, 15)))])
> HiveProjectRel(i_item_sk=[$0], i_current_price=[$5], 
> i_color=[$17])
>   
> HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.item]])
> HiveProjectRel(_o__col0=[$0])
>   HiveAggregateRel(group=[{0}])
> HiveProjectRel($f0=[$0])
>   HiveJoinRel(condition=[AND(=($0, $2), =($1, $3))], 
> joinType=[inner])
> HiveProjectRel(cs_item_sk=[$15], 
> cs_order_number=[$17])
>   
> HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_sales]])
> HiveProjectRel(cr_item_sk=[$2], cr_order_number=[$16])
>   
> HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_returns]])
> {code}
> While after Optiq, the filter on date_dim gets pulled up in the plan:
> {code}
>   HiveFilterRel(condition=[<=($5, $1)]): rowcount = 1.0, cumulative cost = 
> {5.50188454E8 rows, 0.0 cpu, 0.0 io}, id = 6895
> HiveProjectRel(_o__col0=[$0], _o__col1=[$1], _o__col2=[$2], 
> _o__col3=[$3], _o__col00=[$4], _o__col1

[jira] [Updated] (HIVE-8261) CBO : Predicate pushdown is removed by Optiq

2014-09-29 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-8261:

Status: Patch Available  (was: Open)

> CBO : Predicate pushdown is removed by Optiq 
> -
>
> Key: HIVE-8261
> URL: https://issues.apache.org/jira/browse/HIVE-8261
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 0.13.1, 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Harish Butani
> Fix For: 0.14.0
>
> Attachments: HIVE-8261.1.patch
>
>
> The plan for TPC-DS Q64 wasn't optimal; upon looking at the logical plan I 
> realized that predicate pushdown is not applied on date_dim d1.
> Interestingly, before Optiq we have the predicate pushed:
> {code}
> HiveFilterRel(condition=[<=($5, $1)])
> HiveJoinRel(condition=[=($3, $6)], joinType=[inner])
>   HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col2=[$3], 
> _o__col3=[$1])
> HiveFilterRel(condition=[=($0, 2000)])
>   HiveAggregateRel(group=[{0, 1}], agg#0=[count()], agg#1=[sum($2)])
> HiveProjectRel($f0=[$4], $f1=[$5], $f2=[$2])
>   HiveJoinRel(condition=[=($1, $8)], joinType=[inner])
> HiveJoinRel(condition=[=($1, $5)], joinType=[inner])
>   HiveJoinRel(condition=[=($0, $3)], joinType=[inner])
> HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], 
> ss_wholesale_cost=[$11])
>   
> HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.store_sales]])
> HiveProjectRel(d_date_sk=[$0], d_year=[$6])
>   
> HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.date_dim]])
>   HiveFilterRel(condition=[AND(in($2, 'maroon', 'burnished', 
> 'dim', 'steel', 'navajo', 'chocolate'), between(false, $1, 35, +(35, 10)), 
> between(false, $1, +(35, 1), +(35, 15)))])
> HiveProjectRel(i_item_sk=[$0], i_current_price=[$5], 
> i_color=[$17])
>   
> HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.item]])
> HiveProjectRel(_o__col0=[$0])
>   HiveAggregateRel(group=[{0}])
> HiveProjectRel($f0=[$0])
>   HiveJoinRel(condition=[AND(=($0, $2), =($1, $3))], 
> joinType=[inner])
> HiveProjectRel(cs_item_sk=[$15], 
> cs_order_number=[$17])
>   
> HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_sales]])
> HiveProjectRel(cr_item_sk=[$2], cr_order_number=[$16])
>   
> HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_returns]])
>   HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col3=[$1])
> HiveFilterRel(condition=[=($0, +(2000, 1))])
>   HiveAggregateRel(group=[{0, 1}], agg#0=[count()])
> HiveProjectRel($f0=[$4], $f1=[$5], $f2=[$2])
>   HiveJoinRel(condition=[=($1, $8)], joinType=[inner])
> HiveJoinRel(condition=[=($1, $5)], joinType=[inner])
>   HiveJoinRel(condition=[=($0, $3)], joinType=[inner])
> HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], 
> ss_wholesale_cost=[$11])
>   
> HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.store_sales]])
> HiveProjectRel(d_date_sk=[$0], d_year=[$6])
>   
> HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.date_dim]])
>   HiveFilterRel(condition=[AND(in($2, 'maroon', 'burnished', 
> 'dim', 'steel', 'navajo', 'chocolate'), between(false, $1, 35, +(35, 10)), 
> between(false, $1, +(35, 1), +(35, 15)))])
> HiveProjectRel(i_item_sk=[$0], i_current_price=[$5], 
> i_color=[$17])
>   
> HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.item]])
> HiveProjectRel(_o__col0=[$0])
>   HiveAggregateRel(group=[{0}])
> HiveProjectRel($f0=[$0])
>   HiveJoinRel(condition=[AND(=($0, $2), =($1, $3))], 
> joinType=[inner])
> HiveProjectRel(cs_item_sk=[$15], 
> cs_order_number=[$17])
>   
> HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_sales]])
> HiveProjectRel(cr_item_sk=[$2], cr_order_number=[$16])
>   
> HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_returns]])
> {code}
> While after Optiq, the filter on date_dim gets pulled up in the plan:
> {code}
>   HiveFilterRel(condition=[<=($5, $1)]): rowcount = 1.0, cumulative cost = 
> {5.50188454E8 rows, 0.0 cpu, 0.0 io}, id = 6895
> HiveProjectRel(_o__col0=[$0], _o__col1=[$1], _o__col2=[$2], 
> _o__col3=[$3], _o__col00=[$4], _

[jira] [Updated] (HIVE-7971) Support alter table change/replace/add columns for existing partitions

2014-09-29 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-7971:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk and 0.14 branch.

> Support alter table change/replace/add columns for existing partitions
> --
>
> Key: HIVE-7971
> URL: https://issues.apache.org/jira/browse/HIVE-7971
> Project: Hive
>  Issue Type: Bug
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 0.14.0
>
> Attachments: HIVE-7971.1.patch, HIVE-7971.2.patch, HIVE-7971.3.patch
>
>
> ALTER TABLE CHANGE COLUMN is allowed for tables, but not for partitions. Same 
> for add/replace columns.
> Allowing this for partitions can be useful in some cases. For example, one 
> user has tables with Hive 0.12 Decimal columns, which do not specify 
> precision/scale. To be able to properly read the decimal values from the 
> existing partitions, the column types in the partitions need to be changed to 
> decimal types with precision/scale.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8180) Update SparkReduceRecordHandler for processing the vectors [spark branch]

2014-09-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152002#comment-14152002
 ] 

Xuefu Zhang commented on HIVE-8180:
---

Hi [~chinnalalam], the patch looks very good. I just had a very minor comment 
on RB. Thanks.

> Update SparkReduceRecordHandler for processing the vectors [spark branch]
> -
>
> Key: HIVE-8180
> URL: https://issues.apache.org/jira/browse/HIVE-8180
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
>  Labels: Spark-M1
> Attachments: HIVE-8180-spark.patch, HIVE-8180.1-spark.patch, 
> HIVE-8180.2-spark.patch
>
>
> Update SparkReduceRecordHandler for processing the vectors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional

2014-09-29 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152008#comment-14152008
 ] 

Vikram Dixit K commented on HIVE-8290:
--

+1 for 0.14.

> With DbTxnManager configured, all ORC tables forced to be transactional
> ---
>
> Key: HIVE-8290
> URL: https://issues.apache.org/jira/browse/HIVE-8290
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Blocker
> Fix For: 0.14.0
>
> Attachments: HIVE-8290.patch
>
>
> Currently, once a user configures DbTxnManager to be the transaction manager, 
> all tables that use ORC are expected to be transactional.  This means they 
> all have to have buckets.  This most likely won't be what users want.
> We need to add a specific mark to a table so that users can indicate it 
> should be treated in a transactional way.
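A sketch of what checking such a per-table mark could look like: only treat a 
table as transactional if it carries an explicit opt-in property, regardless 
of its file format. The property name below is an assumption for illustration; 
the JIRA only says "a specific mark":

{code}
// Hypothetical sketch: per-table ACID opt-in via a table property instead of
// inferring transactionality from the ORC file format.
import java.util.Map;

public final class AcidCheck {
  // Assumed property name, purely for illustration.
  static final String TRANSACTIONAL_PROP = "transactional";

  private AcidCheck() {}

  public static boolean isAcidTable(Map<String, String> tableParams) {
    return tableParams != null
        && Boolean.parseBoolean(tableParams.get(TRANSACTIONAL_PROP));
  }
}
{code}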



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID

2014-09-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152007#comment-14152007
 ] 

Alan Gates commented on HIVE-8231:
--

I think I can reproduce the same bug with 2 command line sessions doing things 
in the following order:

# Start session 1
# in session 1, insert into the table
# start session 2
# in session 2, select *, see all rows
# in session 1, delete some rows
# in session 1, select *, see fewer rows
# in session 2, select *, see all rows

If I stop and restart session 2 after this, then it sees the appropriate number 
of rows.  So either it isn't getting new transaction information for each query 
in the session, or the results are being cached somewhere on it.

Does this match the behavior you're seeing?
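If the first explanation is right, the missing piece would look something like 
the sketch below: re-acquire the transaction snapshot at the start of every 
query instead of reusing the one captured when the session opened. The helper 
is hypothetical and only illustrates the idea:

{code}
// Hypothetical sketch of the suspected missing step: refresh the valid-txn
// snapshot before each query so later reads see newly committed deltas.
import org.apache.hadoop.hive.common.ValidTxnList;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.lockmgr.HiveTxnManager;

public final class SnapshotRefresher {
  private SnapshotRefresher() {}

  public static void refreshBeforeQuery(HiveConf conf, HiveTxnManager txnMgr)
      throws Exception {
    ValidTxnList txns = txnMgr.getValidTxns();   // fresh snapshot each time
    conf.set(ValidTxnList.VALID_TXNS_KEY, txns.toString());
  }
}
{code}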

> Error when insert into empty table with ACID
> 
>
> Key: HIVE-8231
> URL: https://issues.apache.org/jira/browse/HIVE-8231
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Damien Carol
>Assignee: Damien Carol
> Fix For: 0.14.0
>
>
> Steps to show the bug:
> 1. create table 
> {code}
> create table encaissement_1b_64m like encaissement_1b;
> {code}
> 2. check table 
> {code}
> desc encaissement_1b_64m;
> dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m;
> {code}
> everything is ok:
> {noformat}
> 0: jdbc:hive2://nc-h04:1/casino> desc encaissement_1b_64m;
>   
> +++--+--+
> |  col_name  | data_type  | comment  |
> +++--+--+
> | id | int|  |
> | idmagasin  | int|  |
> | zibzin | string |  |
> | cheque | int|  |
> | montant| double |  |
> | date   | timestamp  |  |
> | col_6  | string |  |
> | col_7  | string |  |
> | col_8  | string |  |
> +++--+--+
> 9 rows selected (0.158 seconds)
> 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
> hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
> +-+--+
> | DFS Output  |
> +-+--+
> +-+--+
> No rows selected (0.01 seconds)
> {noformat}
> 3. Insert values into the new table
> {noformat}
> insert into table encaissement_1b_64m VALUES (1, 1, 
> '8909', 1, 12.5, '12/05/2014', '','','');
> {noformat}
> 4. Check
> {noformat}
> 0: jdbc:hive2://nc-h04:1/casino> select id from encaissement_1b_64m;
> +-+--+
> | id  |
> +-+--+
> +-+--+
> No rows selected (0.091 seconds)
> {noformat}
> There is already a problem: I don't see the inserted row.
> 5. When I check the HDFS directory, I see the {{delta_421_421}} folder
> {noformat}
> 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
> hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
> +-+--+
> | DFS 
> Output  |
> +-+--+
> | Found 1 items   
> |
> | drwxr-xr-x   - hduser supergroup  0 2014-09-23 12:17 
> hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421
>   |
> +-+--+
> 2 rows selected (0.014 seconds)
> {noformat}
> 6. Doing a major compaction solves the bug
> {noformat}
> 0: jdbc:hive2://nc-h04:1/casino> alter table encaissement_1b_64m compact 
> 'major';
> No rows affected (0.046 seconds)
> 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
> hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
> ++--+
> | DFS Output  
>|
> ++--+
> | Found 1 items   
>  

[jira] [Commented] (HIVE-7843) orc_analyze.q fails due to random mapred.task.id in FileSinkOperator [Spark Branch]

2014-09-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152012#comment-14152012
 ] 

Xuefu Zhang commented on HIVE-7843:
---

Hi [~vkorukanti], would you like to reupload the patch to trigger the test run? 
The build VMs were killed over the weekend.

> orc_analyze.q fails due to random mapred.task.id in FileSinkOperator [Spark 
> Branch]
> ---
>
> Key: HIVE-7843
> URL: https://issues.apache.org/jira/browse/HIVE-7843
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Venki Korukanti
>Assignee: Venki Korukanti
>  Labels: Spark-M1
> Fix For: spark-branch
>
> Attachments: HIVE-7843.1-spark.patch
>
>
> {code}
> java.lang.AssertionError: data length is different from num of DP columns
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynPartDirectory(FileSinkOperator.java:809)
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:730)
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.startGroup(FileSinkOperator.java:829)
> org.apache.hadoop.hive.ql.exec.Operator.defaultStartGroup(Operator.java:502)
> org.apache.hadoop.hive.ql.exec.Operator.startGroup(Operator.java:525)
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:198)
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47)
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27)
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> scala.collection.Iterator$class.foreach(Iterator.scala:727)
> scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
> org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
> org.apache.spark.scheduler.Task.run(Task.scala:54)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator

2014-09-29 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152015#comment-14152015
 ] 

Prasanth J commented on HIVE-8226:
--

Committed patch to trunk. I will wait for [~vikram.dixit] to weigh in on this 
for the branch-0.14 commit.

> Vectorize dynamic partitioning in VectorFileSinkOperator
> 
>
> Key: HIVE-8226
> URL: https://issues.apache.org/jira/browse/HIVE-8226
> Project: Hive
>  Issue Type: Bug
>  Components: Tez, Vectorization
>Affects Versions: 0.14.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, 
> HIVE-8226.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator

2014-09-29 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152019#comment-14152019
 ] 

Vikram Dixit K commented on HIVE-8226:
--

+1 for 0.14

> Vectorize dynamic partitioning in VectorFileSinkOperator
> 
>
> Key: HIVE-8226
> URL: https://issues.apache.org/jira/browse/HIVE-8226
> Project: Hive
>  Issue Type: Bug
>  Components: Tez, Vectorization
>Affects Versions: 0.14.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, 
> HIVE-8226.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8291) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader

2014-09-29 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8291:
--
Assignee: Owen O'Malley  (was: Alan Gates)

> Reading from partitioned bucketed tables has high overhead, 50% of time is 
> spent in OrcInputFormat.getReader
> 
>
> Key: HIVE-8291
> URL: https://issues.apache.org/jira/browse/HIVE-8291
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.14.0
> Environment: cn105
>Reporter: Mostafa Mokhtar
>Assignee: Owen O'Malley
> Fix For: 0.14.0
>
>
> Reading from bucketed partitioned tables has significantly higher overhead 
> compared to non-bucketed non-partitioned files.
> 50% of the time is spent in these two lines of code in 
> OrcInputFormat.getReader():
> {code}
> String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY,
> Long.MAX_VALUE + ":");
> ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
> {code}
> {code}
> Stack Trace                                                                Sample Count  Percentage(%)
> hive.ql.exec.tez.MapRecordSource.pushRecord()                               2,981    87.215
>   org.apache.tez.mapreduce.lib.MRReaderMapred.next()                        2,002    58.572
>     mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)  2,002  58.572
>       mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()  1,984  58.046
>         hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,983  58.016
>           hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,891  55.325
>             hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)  1,723  50.41
>               hive.common.ValidTxnListImpl.<init>(String)                     934    27.326
>                 conf.Configuration.get(String, String)                        621    18.169
> {code}
> Another 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:
> 5% of the CPU is in 
> {code}
>  Path onepath = normalizePath(onefile);
> {code}
> and 15% of the CPU is in 
> {code}
>  onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
> {code}
> From the profiler:
> {code}
> Stack Trace                                                                Sample Count  Percentage(%)
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)        978    28.613
>   org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)  978    28.613
>     org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()        866    25.336
>       org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp() 866    25.336
>         java.net.URI.relativize(URI)                                         655    19.163
>           java.net.URI.relativize(URI, URI)                                  655    19.163
>             java.net.URI.normalize(String)                                   517    15.126
>               java.net.URI.needsNormalization(String)                        372    10.884
>                 java.lang.String.charAt(int)                                 235     6.875
>             java.net.URI.equal(String, String)                                27     0.79
>             java.lang.StringBuilder.toString()                                 1     0.029
>             java.lang.StringBuilder.<init>()                                   1     0.029
>             java.lang.StringBuilder.append(String)                             1     0.029
>         org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)     167     4.886
>           org.apache.hadoop.fs.Path.<init>(String)                           162     4.74
>             org.apache.hadoop.fs.Path.initialize(String, String, String, String)  162  4.74
>               org.apache.hadoop.fs.Path.normalizePath(String, String)         97     2.838
>                 org.apache.commons.lang.StringUtils.replace(String, String, String)  97  2.838
>                   org.apache.commons.lang.StringUtils.replace(String, String, String, int)  97  2.838
>                     java.lang.String.indexOf(String, int)                     97     2.838
>               java.net.URI.<init>(String, String, String, String, String)     65     1.902
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator

2014-09-29 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8226:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to branch-0.14 as well. Thanks [~mmccline] and [~vikram.dixit]!

> Vectorize dynamic partitioning in VectorFileSinkOperator
> 
>
> Key: HIVE-8226
> URL: https://issues.apache.org/jira/browse/HIVE-8226
> Project: Hive
>  Issue Type: Bug
>  Components: Tez, Vectorization
>Affects Versions: 0.14.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, 
> HIVE-8226.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8196) Joining on partition columns with fetch column stats enabled results it very small CE which negatively affects query performance

2014-09-29 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8196:
-
Attachment: HIVE-8196.5.patch

Not sure why parallel.q is adding and removing POSTHOOK between test runs. 
Anyway, trying again to see if it passes this time.

> Joining on partition columns with fetch column stats enabled results it very 
> small CE which negatively affects query performance 
> -
>
> Key: HIVE-8196
> URL: https://issues.apache.org/jira/browse/HIVE-8196
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Prasanth J
>Priority: Blocker
>  Labels: performance
> Fix For: 0.14.0
>
> Attachments: HIVE-8196.1.patch, HIVE-8196.2.patch, HIVE-8196.3.patch, 
> HIVE-8196.4.patch, HIVE-8196.5.patch
>
>
> To make the best out of dynamic partition pruning, joins should be on the 
> partitioning columns, which results in dynamically pruning the partitions from 
> the fact table based on the qualifying column keys from the dimension table. 
> This type of join negatively affects cardinality estimates with fetch 
> column stats enabled.
> Currently we don't have statistics for partition columns, and as a result NDV 
> is set to the row count; doing that negatively affects the estimated join 
> selectivity.
> A workaround is to capture statistics for partition columns, or to use the 
> number of partitions in case dynamic partitioning is used.
> StatsUtils.getColStatisticsFromExpression is where count distinct gets 
> set to the row count:
> {code}
>   if (encd.getIsPartitionColOrVirtualCol()) {
> // vitual columns
> colType = encd.getTypeInfo().getTypeName();
> countDistincts = numRows;
> oi = encd.getWritableObjectInspector();
> {code}
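The workaround mentioned above can be expressed directly: a partition column's 
number of distinct values can never exceed the number of partitions, so that 
count is a far better NDV estimate than the row count. A minimal sketch of the 
idea (hypothetical helper, not the committed change):

{code}
// Hypothetical sketch: bound a partition column's NDV by the partition count
// instead of defaulting to the row count, which wrecks join selectivity.
public final class PartitionColNdv {
  private PartitionColNdv() {}

  public static long estimateNdv(boolean isPartitionCol, long numRows,
      long numPartitions) {
    if (isPartitionCol && numPartitions > 0) {
      return Math.min(numPartitions, numRows);
    }
    return numRows;   // current behavior: countDistincts = numRows
  }
}
{code}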
> Query used to repro the issue :
> {code}
> set hive.stats.fetch.column.stats=true;
> set hive.tez.dynamic.partition.pruning=true;
> explain select d_date 
> from store_sales, date_dim 
> where 
> store_sales.ss_sold_date_sk = date_dim.d_date_sk and 
> date_dim.d_year = 1998;
> {code}
> Plan 
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Map 1 <- Map 2 (BROADCAST_EDGE)
>   DagName: mmokhtar_20140919180404_945d29f5-d041-4420-9666-1c5d64fa6540:8
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: ss_sold_date_sk is not null (type: boolean)
>   Statistics: Num rows: 550076554 Data size: 47370018816 
> Basic stats: COMPLETE Column stats: COMPLETE
>   Map Join Operator
> condition map:
>  Inner Join 0 to 1
> condition expressions:
>   0 {ss_sold_date_sk}
>   1 {d_date_sk} {d_date}
> keys:
>   0 ss_sold_date_sk (type: int)
>   1 d_date_sk (type: int)
> outputColumnNames: _col22, _col26, _col28
> input vertices:
>   1 Map 2
> Statistics: Num rows: 652 Data size: 66504 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: (_col22 = _col26) (type: boolean)
>   Statistics: Num rows: 326 Data size: 33252 Basic stats: 
> COMPLETE Column stats: COMPLETE
>   Select Operator
> expressions: _col28 (type: string)
> outputColumnNames: _col0
> Statistics: Num rows: 326 Data size: 30644 Basic 
> stats: COMPLETE Column stats: COMPLETE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 326 Data size: 30644 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Execution mode: vectorized
> Map 2
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: (d_date_sk is not null and (d_year = 1998)) 
> (type: boolean)
>   Statistics: Num rows: 73049 Data 

[jira] [Updated] (HIVE-8291) ACID : Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader

2014-09-29 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8291:
--
Summary: ACID : Reading from partitioned bucketed tables has high overhead, 
50% of time is spent in OrcInputFormat.getReader  (was: Reading from 
partitioned bucketed tables has high overhead, 50% of time is spent in 
OrcInputFormat.getReader)

> ACID : Reading from partitioned bucketed tables has high overhead, 50% of 
> time is spent in OrcInputFormat.getReader
> ---
>
> Key: HIVE-8291
> URL: https://issues.apache.org/jira/browse/HIVE-8291
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.14.0
> Environment: cn105
>Reporter: Mostafa Mokhtar
>Assignee: Owen O'Malley
> Fix For: 0.14.0
>
>
> Reading from bucketed partitioned tables has significantly higher overhead 
> compared to non-bucketed non-partitioned files.
> 50% of the time is spent in these two lines of code in 
> OrcInputFormat.getReader():
> {code}
> String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY,
> Long.MAX_VALUE + ":");
> ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
> {code}
> {code}
> Stack Trace                                                                Sample Count  Percentage(%)
> hive.ql.exec.tez.MapRecordSource.pushRecord()                               2,981    87.215
>   org.apache.tez.mapreduce.lib.MRReaderMapred.next()                        2,002    58.572
>     mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)  2,002  58.572
>       mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()  1,984  58.046
>         hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,983  58.016
>           hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,891  55.325
>             hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)  1,723  50.41
>               hive.common.ValidTxnListImpl.<init>(String)                     934    27.326
>                 conf.Configuration.get(String, String)                        621    18.169
> {code}
> Another 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:
> 5% of the CPU is in 
> {code}
>  Path onepath = normalizePath(onefile);
> {code}
> and 15% of the CPU is in 
> {code}
>  onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
> {code}
> From the profiler:
> {code}
> Stack Trace                                                                Sample Count  Percentage(%)
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)        978    28.613
>   org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)  978    28.613
>     org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()        866    25.336
>       org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp() 866    25.336
>         java.net.URI.relativize(URI)                                         655    19.163
>           java.net.URI.relativize(URI, URI)                                  655    19.163
>             java.net.URI.normalize(String)                                   517    15.126
>               java.net.URI.needsNormalization(String)                        372    10.884
>                 java.lang.String.charAt(int)                                 235     6.875
>             java.net.URI.equal(String, String)                                27     0.79
>             java.lang.StringBuilder.toString()                                 1     0.029
>             java.lang.StringBuilder.<init>()                                   1     0.029
>             java.lang.StringBuilder.append(String)                             1     0.029
>         org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)     167     4.886
>           org.apache.hadoop.fs.Path.<init>(String)                           162     4.74
>             org.apache.hadoop.fs.Path.initialize(String, String, String, String)  162  4.74
>               org.apache.hadoop.fs.Path.normalizePath(String, String)         97     2.838
>                 org.apache.commons.lang.StringUtils.replace(String, String, String)  97  2.838
>                   org.apache.commons.lang.StringUtils.replace(String, String, String, int)  97  2.838
>                     java.lang.String.indexOf(String, int)                     97     2.838
>               java.net.URI.<init>(String, String, String, 

[jira] [Created] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp

2014-09-29 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created HIVE-8292:
-

 Summary: Reading from partitioned bucketed tables has high 
overhead in MapOperator.cleanUpInputFileChangedOp
 Key: HIVE-8292
 URL: https://issues.apache.org/jira/browse/HIVE-8292
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
 Environment: cn105
Reporter: Mostafa Mokhtar
Assignee: Owen O'Malley
 Fix For: 0.14.0


Reading from bucketed partitioned tables has significantly higher overhead 
compared to non-bucketed non-partitioned files.


50% of the time is spent in these two lines of code in 
OrcInputFormat.getReader():
{code}
String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY,
Long.MAX_VALUE + ":");
ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
{code}

{code}
Stack Trace                                                                Sample Count  Percentage(%)
hive.ql.exec.tez.MapRecordSource.pushRecord()                               2,981    87.215
  org.apache.tez.mapreduce.lib.MRReaderMapred.next()                        2,002    58.572
    mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)  2,002  58.572
      mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()  1,984  58.046
        hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,983  58.016
          hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,891  55.325
            hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)  1,723  50.41
              hive.common.ValidTxnListImpl.<init>(String)                     934    27.326
                conf.Configuration.get(String, String)                        621    18.169
{code}

Another 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:

5% of the CPU is in 
{code}
 Path onepath = normalizePath(onefile);
{code}

and 15% of the CPU is in 
{code}
 onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}

From the profiler:
{code}
Stack Trace                                                                Sample Count  Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)        978    28.613
  org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)  978    28.613
    org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()        866    25.336
      org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp() 866    25.336
        java.net.URI.relativize(URI)                                         655    19.163
          java.net.URI.relativize(URI, URI)                                  655    19.163
            java.net.URI.normalize(String)                                   517    15.126
              java.net.URI.needsNormalization(String)                        372    10.884
                java.lang.String.charAt(int)                                 235     6.875
            java.net.URI.equal(String, String)                                27     0.79
            java.lang.StringBuilder.toString()                                 1     0.029
            java.lang.StringBuilder.<init>()                                   1     0.029
            java.lang.StringBuilder.append(String)                             1     0.029
        org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)     167     4.886
          org.apache.hadoop.fs.Path.<init>(String)                           162     4.74
            org.apache.hadoop.fs.Path.initialize(String, String, String, String)  162  4.74
              org.apache.hadoop.fs.Path.normalizePath(String, String)         97     2.838
                org.apache.commons.lang.StringUtils.replace(String, String, String)  97  2.838
                  org.apache.commons.lang.StringUtils.replace(String, String, String, int)  97  2.838
                    java.lang.String.indexOf(String, int)                     97     2.838
              java.net.URI.<init>(String, String, String, String, String)     65     1.902
{code}
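Since the relativize() test is re-evaluated on every input-file change against 
the same small set of parent paths, memoizing its result per (parent, file) 
pair would cut most of the URI work shown above. A minimal sketch under that 
assumption; the helper name and caching policy are illustrative, not the 
actual fix:

{code}
// Hypothetical sketch: memoize the URI.relativize() containment test so that
// repeated input-file changes against the same parent path hit a cache.
import java.net.URI;
import java.util.concurrent.ConcurrentHashMap;

public final class PathContainment {
  private final ConcurrentHashMap<String, Boolean> cache =
      new ConcurrentHashMap<String, Boolean>();

  /** True if 'file' lives under 'parent'; the result is cached per path pair. */
  public boolean contains(URI parent, URI file) {
    String key = parent + "\u0000" + file;    // NUL-separated composite key
    Boolean hit = cache.get(key);
    if (hit == null) {
      // relativize() returns its argument unchanged when the URIs are unrelated
      hit = Boolean.valueOf(!parent.relativize(file).equals(file));
      cache.putIfAbsent(key, hit);
    }
    return hit.booleanValue();
  }
}
{code}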




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp

2014-09-29 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8292:
--
Assignee: Prasanth J  (was: Owen O'Malley)

> Reading from partitioned bucketed tables has high overhead in 
> MapOperator.cleanUpInputFileChangedOp
> ---
>
> Key: HIVE-8292
> URL: https://issues.apache.org/jira/browse/HIVE-8292
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.14.0
> Environment: cn105
>Reporter: Mostafa Mokhtar
>Assignee: Prasanth J
> Fix For: 0.14.0
>
>
> Reading from bucketed partitioned tables has significantly higher overhead 
> compared to non-bucketed non-partitioned files.
> 50% of the time is spent in these two lines of code in 
> OrcInputFormat.getReader():
> {code}
> String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY,
> Long.MAX_VALUE + ":");
> ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
> {code}
> {code}
> Stack Trace                                                                Sample Count  Percentage(%)
> hive.ql.exec.tez.MapRecordSource.pushRecord()                               2,981    87.215
>   org.apache.tez.mapreduce.lib.MRReaderMapred.next()                        2,002    58.572
>     mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)  2,002  58.572
>       mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()  1,984  58.046
>         hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,983  58.016
>           hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,891  55.325
>             hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)  1,723  50.41
>               hive.common.ValidTxnListImpl.<init>(String)                     934    27.326
>                 conf.Configuration.get(String, String)                        621    18.169
> {code}
> Another 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:
> 5% of the CPU is in 
> {code}
>  Path onepath = normalizePath(onefile);
> {code}
> and 15% of the CPU is in 
> {code}
>  onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
> {code}
> From the profiler:
> {code}
> Stack Trace                                                                Sample Count  Percentage(%)
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)        978    28.613
>   org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)  978    28.613
>     org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()        866    25.336
>       org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp() 866    25.336
>         java.net.URI.relativize(URI)                                         655    19.163
>           java.net.URI.relativize(URI, URI)                                  655    19.163
>             java.net.URI.normalize(String)                                   517    15.126
>               java.net.URI.needsNormalization(String)                        372    10.884
>                 java.lang.String.charAt(int)                                 235     6.875
>             java.net.URI.equal(String, String)                                27     0.79
>             java.lang.StringBuilder.toString()                                 1     0.029
>             java.lang.StringBuilder.<init>()                                   1     0.029
>             java.lang.StringBuilder.append(String)                             1     0.029
>         org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)     167     4.886
>           org.apache.hadoop.fs.Path.<init>(String)                           162     4.74
>             org.apache.hadoop.fs.Path.initialize(String, String, String, String)  162  4.74
>               org.apache.hadoop.fs.Path.normalizePath(String, String)         97     2.838
>                 org.apache.commons.lang.StringUtils.replace(String, String, String)  97  2.838
>                   org.apache.commons.lang.StringUtils.replace(String, String, String, int)  97  2.838
>                     java.lang.String.indexOf(String, int)                     97     2.838
>               java.net.URI.<init>(String, String, String, String, String)     65     1.902
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp

2014-09-29 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8292:
--
Description: 
Reading from bucketed partitioned tables has significantly higher overhead 
compared to non-bucketed non-partitioned files.


20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:

5% of the CPU is in 
{code}
 Path onepath = normalizePath(onefile);
{code}

and 15% of the CPU is in 
{code}
 onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}

From the profiler:
{code}
Stack Trace                                                                Sample Count  Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)        978    28.613
  org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)  978    28.613
    org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()        866    25.336
      org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp() 866    25.336
        java.net.URI.relativize(URI)                                         655    19.163
          java.net.URI.relativize(URI, URI)                                  655    19.163
            java.net.URI.normalize(String)                                   517    15.126
              java.net.URI.needsNormalization(String)                        372    10.884
                java.lang.String.charAt(int)                                 235     6.875
            java.net.URI.equal(String, String)                                27     0.79
            java.lang.StringBuilder.toString()                                 1     0.029
            java.lang.StringBuilder.<init>()                                   1     0.029
            java.lang.StringBuilder.append(String)                             1     0.029
        org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)     167     4.886
          org.apache.hadoop.fs.Path.<init>(String)                           162     4.74
            org.apache.hadoop.fs.Path.initialize(String, String, String, String)  162  4.74
              org.apache.hadoop.fs.Path.normalizePath(String, String)         97     2.838
                org.apache.commons.lang.StringUtils.replace(String, String, String)  97  2.838
                  org.apache.commons.lang.StringUtils.replace(String, String, String, int)  97  2.838
                    java.lang.String.indexOf(String, int)                     97     2.838
              java.net.URI.<init>(String, String, String, String, String)     65     1.902
{code}


  was:
Reading from bucketed partitioned tables has significantly higher overhead 
compared to non-bucketed non-partitioned files.


50% of the time is spent in these two lines of code in OrcInputFormat.getReader():
{code}
String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY,
    Long.MAX_VALUE + ":");
ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
{code}

{code}
Stack Trace  Sample Count  Percentage(%)
hive.ql.exec.tez.MapRecordSource.pushRecord()  2,981  87.215
 org.apache.tez.mapreduce.lib.MRReaderMapred.next()  2,002  58.572
  mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)  2,002  58.572
   mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()  1,984  58.046
    hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,983  58.016
     hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,891  55.325
      hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)  1,723  50.41
       hive.common.ValidTxnListImpl.<init>(String)  934  27.326
       conf.Configuration.get(String, String)  621  18.169
 {code}

Another 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:

5% of the CPU in 
{code}
 Path onepath = normalizePath(onefile);
{code}

and 15% of the CPU in 
{code}
 onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}

From the profiler 
{code}
Stack Trace  Sample Count  Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)  978  28.613
 org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)  978  28.613
  org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()  866  25.336
   org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()  866  25.336
    java.net.URI.relativize(URI)  655  19.163
   java.net.URI.

[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp

2014-09-29 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8292:
--
Attachment: 2014_09_29_14_46_04.jfr

Hot function profile

> Reading from partitioned bucketed tables has high overhead in 
> MapOperator.cleanUpInputFileChangedOp
> ---
>
> Key: HIVE-8292
> URL: https://issues.apache.org/jira/browse/HIVE-8292
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.14.0
> Environment: cn105
>Reporter: Mostafa Mokhtar
>Assignee: Prasanth J
> Fix For: 0.14.0
>
> Attachments: 2014_09_29_14_46_04.jfr
>
>
> Reading from bucketed partitioned tables has significantly higher overhead 
> compared to non-bucketed non-partitioned files.
> 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:
> 5% of the CPU in 
> {code}
>  Path onepath = normalizePath(onefile);
> {code}
> and 15% of the CPU in 
> {code}
>  onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
> {code}
> From the profiler 
> {code}
> Stack Trace  Sample Count  Percentage(%)
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)  978  28.613
>  org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)  978  28.613
>   org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()  866  25.336
>    org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()  866  25.336
>     java.net.URI.relativize(URI)  655  19.163
>      java.net.URI.relativize(URI, URI)  655  19.163
>       java.net.URI.normalize(String)  517  15.126
>        java.net.URI.needsNormalization(String)  372  10.884
>         java.lang.String.charAt(int)  235  6.875
>       java.net.URI.equal(String, String)  27  0.79
>        java.lang.StringBuilder.toString()  1  0.029
>        java.lang.StringBuilder.<init>()  1  0.029
>        java.lang.StringBuilder.append(String)  1  0.029
>     org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)  167  4.886
>      org.apache.hadoop.fs.Path.<init>(String)  162  4.74
>       org.apache.hadoop.fs.Path.initialize(String, String, String, String)  162  4.74
>        org.apache.hadoop.fs.Path.normalizePath(String, String)  97  2.838
>         org.apache.commons.lang.StringUtils.replace(String, String, String)  97  2.838
>          org.apache.commons.lang.StringUtils.replace(String, String, String, int)  97  2.838
>           java.lang.String.indexOf(String, int)  97  2.838
>       java.net.URI.<init>(String, String, String, String, String)  65  1.902
> {code}
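
The hot path above is per-row re-normalization of the input path. A minimal sketch of the caching idea, assuming a helper map keyed by the raw path string (names are illustrative; this is not necessarily the committed HIVE-8292 patch):
{code}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.Path;

// Hedged sketch, not the committed fix: pay the Path construction and URI
// normalization cost once per distinct input file instead of on every
// cleanUpInputFileChangedOp call; the number of distinct files is tiny
// compared to the number of rows processed.
class PathNormalizationCache {
  private final Map<String, Path> cache = new HashMap<String, Path>();

  // Stands in for MapOperator.normalizePath(onefile).
  Path normalize(String onefile) {
    Path p = cache.get(onefile);
    if (p == null) {
      p = new Path(onefile);
      cache.put(onefile, p);
    }
    return p;
  }
}
{code}
The relativize/equals comparison would then run against cached, already-normalized paths, which removes most of the java.net.URI work shown in the profile.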



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8291) ACID : Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader

2014-09-29 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8291:
--
Attachment: 2014_09_28_16_48_48.jfr

Hot function profile.
Use Java Mission Control (jmc) to open the file; JMC ships with Java 7.

> ACID : Reading from partitioned bucketed tables has high overhead, 50% of 
> time is spent in OrcInputFormat.getReader
> ---
>
> Key: HIVE-8291
> URL: https://issues.apache.org/jira/browse/HIVE-8291
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.14.0
> Environment: cn105
>Reporter: Mostafa Mokhtar
>Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: 2014_09_28_16_48_48.jfr
>
>
> Reading from bucketed partitioned tables has significantly higher overhead 
> compared to non-bucketed non-partitioned files.
> 50% of the time is spent in these two lines of code in OrcInputFormat.getReader():
> {code}
> String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY,
>     Long.MAX_VALUE + ":");
> ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
> {code}
> {code}
> Stack Trace  Sample Count  Percentage(%)
> hive.ql.exec.tez.MapRecordSource.pushRecord()  2,981  87.215
>  org.apache.tez.mapreduce.lib.MRReaderMapred.next()  2,002  58.572
>   mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)  2,002  58.572
>    mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()  1,984  58.046
>     hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,983  58.016
>      hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,891  55.325
>       hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)  1,723  50.41
>        hive.common.ValidTxnListImpl.<init>(String)  934  27.326
>        conf.Configuration.get(String, String)  621  18.169
>  {code}
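
A minimal sketch of one mitigation, assuming the transaction string is identical for every split a task reads (illustrative only, not necessarily the committed fix):
{code}
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.hive.common.ValidTxnListImpl;

// Hedged sketch, not necessarily the committed fix: cache the parsed
// ValidTxnList keyed by its string form so repeated getReader() calls skip
// the expensive ValidTxnListImpl constructor seen in the profile.
class ValidTxnListCache {
  private static final ConcurrentHashMap<String, ValidTxnListImpl> CACHE =
      new ConcurrentHashMap<String, ValidTxnListImpl>();

  static ValidTxnListImpl get(String txnString) {
    ValidTxnListImpl list = CACHE.get(txnString);
    if (list == null) {
      list = new ValidTxnListImpl(txnString);  // parse once per string
      CACHE.put(txnString, list);
    }
    return list;
  }
}
{code}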



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8291) ACID : Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader

2014-09-29 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8291:
--
Description: 
Reading from bucketed partitioned tables has significantly higher overhead 
compared to non-bucketed non-partitioned files.


50% of the time is spent in these two lines of code in OrcInputFormat.getReader():
{code}
String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY,
    Long.MAX_VALUE + ":");
ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
{code}

{code}
Stack Trace  Sample Count  Percentage(%)
hive.ql.exec.tez.MapRecordSource.pushRecord()  2,981  87.215
 org.apache.tez.mapreduce.lib.MRReaderMapred.next()  2,002  58.572
  mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)  2,002  58.572
   mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()  1,984  58.046
    hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,983  58.016
     hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,891  55.325
      hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)  1,723  50.41
       hive.common.ValidTxnListImpl.<init>(String)  934  27.326
       conf.Configuration.get(String, String)  621  18.169
 {code}



  was:
Reading from bucketed partitioned tables has significantly higher overhead 
compared to non-bucketed non-partitioned files.


50% of the time is spent in these two lines of code in OrcInputFormat.getReader():
{code}
String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY,
    Long.MAX_VALUE + ":");
ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
{code}

{code}
Stack Trace  Sample Count  Percentage(%)
hive.ql.exec.tez.MapRecordSource.pushRecord()  2,981  87.215
 org.apache.tez.mapreduce.lib.MRReaderMapred.next()  2,002  58.572
  mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)  2,002  58.572
   mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()  1,984  58.046
    hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,983  58.016
     hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,891  55.325
      hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)  1,723  50.41
       hive.common.ValidTxnListImpl.<init>(String)  934  27.326
       conf.Configuration.get(String, String)  621  18.169
 {code}

Another 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:

5% of the CPU in 
{code}
 Path onepath = normalizePath(onefile);
{code}

and 15% of the CPU in 
{code}
 onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}

From the profiler 
{code}
Stack Trace  Sample Count  Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)  978  28.613
 org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)  978  28.613
  org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()  866  25.336
   org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()  866  25.336
    java.net.URI.relativize(URI)  655  19.163
     java.net.URI.relativize(URI, URI)  655  19.163
      java.net.URI.normalize(String)  517  15.126
       java.net.URI.needsNormalization(String)  372  10.884
        java.lang.String.charAt(int)  235  6.875
      java.net.URI.equal(String, String)  27  0.79
       java.lang.StringBuilder.toString()  1  0.029
       java.lang.StringBuilder.<init>()  1  0.029
       java.lang.StringBuilder.append(String)  1  0.029
    org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)  167  4.886
     org.apache.hadoop.fs.Path.<init>(String)  162  4.74
      org.apache.hadoop.fs.Path.initialize(String,

[jira] [Created] (HIVE-8293) Metastore direct SQL failed for Oracle because ORA-01722: invalid number

2014-09-29 Thread Selina Zhang (JIRA)
Selina Zhang created HIVE-8293:
--

 Summary: Metastore direct SQL failed for Oracle because ORA-01722: 
invalid number
 Key: HIVE-8293
 URL: https://issues.apache.org/jira/browse/HIVE-8293
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Selina Zhang
Assignee: Selina Zhang


The direct SQL route for retrieving partition objects through filters fails on 
Oracle. Similar to DERBY-6358, Oracle tries to cast PART_KEY_VAL in the 
PARTITION_KEY_VALS table to decimal before evaluating the condition. 

Here is the stack trace:
{quote}
2014-09-29 18:53:53,490 ERROR [pool-1-thread-1] metastore.ObjectStore 
(ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling back 
to ORM
javax.jdo.JDODataStoreException: Error executing SQL query "select 
"PARTITIONS"."PART_ID" from "PARTITIONS"  inner join "TBLS" on 
"PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ?   inner 
join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID"  and "DBS"."NAME" = ? inner 
join "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = 
"PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 where (((case when 
"TBLS"."TBL_NAME" = ? and "DBS"."NAME" = ? then cast("FILTER0"."PART_KEY_VAL" 
as decimal(21,0)) else null end) < ?))".
at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:300)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
at 
org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
at 
org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
at 
org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9350)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:206)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
NestedThrowablesStackTrace:
java.sql.SQLSyntaxErrorException: ORA-01722: invalid number
{quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8293) Metastore direct SQL failed for Oracle because ORA-01722: invalid number

2014-09-29 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152087#comment-14152087
 ] 

Selina Zhang commented on HIVE-8293:


It is easy to reproduce:
{code}
hive> create table a (col string) partitioned by (dt string);
hive> create table b (col string) partitioned by (idx int); 
hive> alter table a add partition(dt='20140808');
hive> alter table b add partition(idx=50);  
hive> select * from b where idx < 10;
{code}
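
One defensive pattern for this class of failure (a sketch against a hypothetical table t, not the actual Hive fix) is to guard the cast with a numeric-looking check, relying on Oracle's CASE short-circuiting:
{code}
-- Hedged sketch, hypothetical table t: only attempt the numeric cast on
-- values that actually look numeric, so Oracle never casts a non-numeric
-- string partition key value and raises ORA-01722.
select t.part_id
from t
where (case when regexp_like(t.part_key_val, '^-?[0-9]+$')
            then cast(t.part_key_val as decimal(21,0))
            else null end) < 10;
{code}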

> Metastore direct SQL failed for Oracle because ORA-01722: invalid number
> 
>
> Key: HIVE-8293
> URL: https://issues.apache.org/jira/browse/HIVE-8293
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Selina Zhang
>Assignee: Selina Zhang
>
> The direct SQL route for retrieving partition objects through filters fails on 
> Oracle. Similar to DERBY-6358, Oracle tries to cast PART_KEY_VAL in the 
> PARTITION_KEY_VALS table to decimal before evaluating the condition. 
> Here is the stack trace:
> {quote}
> 2014-09-29 18:53:53,490 ERROR [pool-1-thread-1] metastore.ObjectStore 
> (ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling 
> back to ORM
> javax.jdo.JDODataStoreException: Error executing SQL query "select 
> "PARTITIONS"."PART_ID" from "PARTITIONS"  inner join "TBLS" on 
> "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ?   inner 
> join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID"  and "DBS"."NAME" = ? inner 
> join "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 where (((case when 
> "TBLS"."TBL_NAME" = ? and "DBS"."NAME" = ? then cast("FILTER0"."PART_KEY_VAL" 
> as decimal(21,0)) else null end) < ?))".
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
> at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:300)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
> at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9350)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:206)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.con

[jira] [Commented] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to inefficient collection used to find a matching ReadEntity

2014-09-29 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152089#comment-14152089
 ] 

Vikram Dixit K commented on HIVE-7723:
--

+1 for 0.14 once tests pass/commit to trunk.

> Explain plan for complex query with lots of partitions is slow due to 
> inefficient collection used to find a matching ReadEntity
> 
>
> Key: HIVE-7723
> URL: https://issues.apache.org/jira/browse/HIVE-7723
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Physical Optimizer
>Affects Versions: 0.13.1
>Reporter: Mostafa Mokhtar
>Assignee: Mostafa Mokhtar
> Fix For: 0.14.0
>
> Attachments: HIVE-7723.1.patch, HIVE-7723.2.patch, HIVE-7723.3.patch, 
> HIVE-7723.4.patch, HIVE-7723.5.patch, HIVE-7723.6.patch, HIVE-7723.7.patch, 
> HIVE-7723.8.patch
>
>
> Explain on TPC-DS query 64 took 11 seconds; profiling the CLI showed that 
> ReadEntity.equals takes ~40% of the CPU.
> ReadEntity.equals is called from the snippet below.
> The set is iterated over again and again to find the actual match; a HashMap 
> is a better option here, since Set has no get method.
> Also, for ReadEntity, equals is case-insensitive while hashCode is 
> case-sensitive, which is undesired behavior.
> {code}
> public static ReadEntity addInput(Set<ReadEntity> inputs,
>     ReadEntity newInput) {
> // If the input is already present, make sure the new parent is added to 
> the input.
> if (inputs.contains(newInput)) {
>   for (ReadEntity input : inputs) {
> if (input.equals(newInput)) {
>   if ((newInput.getParents() != null) && 
> (!newInput.getParents().isEmpty())) {
> input.getParents().addAll(newInput.getParents());
> input.setDirect(input.isDirect() || newInput.isDirect());
>   }
>   return input;
> }
>   }
>   assert false;
> } else {
>   inputs.add(newInput);
>   return newInput;
> }
> // make compile happy
> return null;
>   }
> {code}
> This is the query used : 
> {code}
> select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number 
> ,cs1.b_streen_name ,cs1.b_city
>  ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city 
> ,cs1.c_zip ,cs1.syear ,cs1.cnt
>  ,cs1.s1 ,cs1.s2 ,cs1.s3
>  ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt
> from
> (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
> store_name
>  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
> ,ad1.ca_street_name as b_streen_name
>  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
> c_street_number
>  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
> as c_zip
>  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
> as cnt
>  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
> ,sum(ss_coupon_amt) as s3
>   FROM   store_sales
> JOIN store_returns ON store_sales.ss_item_sk = 
> store_returns.sr_item_sk and store_sales.ss_ticket_number = 
> store_returns.sr_ticket_number
> JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
> JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
> JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
> JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
> JOIN store ON store_sales.ss_store_sk = store.s_store_sk
> JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= 
> cd1.cd_demo_sk
> JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = 
> cd2.cd_demo_sk
> JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
> JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = 
> hd1.hd_demo_sk
> JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = 
> hd2.hd_demo_sk
> JOIN customer_address ad1 ON store_sales.ss_addr_sk = 
> ad1.ca_address_sk
> JOIN customer_address ad2 ON customer.c_current_addr_sk = 
> ad2.ca_address_sk
> JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
> JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
> JOIN item ON store_sales.ss_item_sk = item.i_item_sk
> JOIN
>  (select cs_item_sk
> ,sum(cs_ext_list_price) as 
> sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
>   from catalog_sales JOIN catalog_returns
>   ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
> and catalog_sales.cs_order_number = catalog_returns.cr_order_number
>   group by cs_item_sk
>   having 
> sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed
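
A minimal sketch of the HashMap-based lookup the description argues for (mirroring the quoted snippet's names; not the actual HIVE-7723 patch, and only safe once ReadEntity.hashCode is made consistent with its case-insensitive equals):
{code}
import java.util.Map;

import org.apache.hadoop.hive.ql.hooks.ReadEntity;

// Hedged sketch: Map.get finds the existing entity in O(1) instead of
// scanning the whole Set on every addInput call.
public final class InputLookup {
  public static ReadEntity addInput(Map<ReadEntity, ReadEntity> inputs,
      ReadEntity newInput) {
    ReadEntity existing = inputs.get(newInput);
    if (existing == null) {
      inputs.put(newInput, newInput);
      return newInput;
    }
    if (newInput.getParents() != null && !newInput.getParents().isEmpty()) {
      existing.getParents().addAll(newInput.getParents());
      existing.setDirect(existing.isDirect() || newInput.isDirect());
    }
    return existing;
  }
}
{code}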

[jira] [Commented] (HIVE-8281) NPE with dynamic partition pruning on Tez

2014-09-29 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152086#comment-14152086
 ] 

Vikram Dixit K commented on HIVE-8281:
--

+1 for both 0.14 and trunk.

> NPE with dynamic partition pruning on Tez
> -
>
> Key: HIVE-8281
> URL: https://issues.apache.org/jira/browse/HIVE-8281
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8281.1.patch
>
>
> Dynamic partition pruning can generate incorrect query plans during join 
> algorithm selection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7389) Reduce number of metastore calls in MoveTask (when loading dynamic partitions)

2014-09-29 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152090#comment-14152090
 ] 

Vikram Dixit K commented on HIVE-7389:
--

+1 for 0.14.

> Reduce number of metastore calls in MoveTask (when loading dynamic partitions)
> --
>
> Key: HIVE-7389
> URL: https://issues.apache.org/jira/browse/HIVE-7389
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Critical
>  Labels: performance
> Fix For: 0.14.0
>
> Attachments: HIVE-7389.1.patch, HIVE-7389.2.patch, 
> local_vm_testcase.txt
>
>
> When the number of dynamic partitions to be loaded is high, the time taken 
> by 'MoveTask' can exceed that of the actual job in some scenarios. Overall 
> runtime could be reduced by cutting the number of calls made to the 
> metastore from the MoveTask operation.
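
A minimal sketch of the batching idea, assuming the standard metastore client API (illustrative, not necessarily the committed patch):
{code}
import java.util.List;

import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Partition;

// Hedged sketch: one add_partitions() round trip for the whole batch
// instead of one add_partition() RPC per dynamic partition.
class PartitionBatcher {
  static void addAll(IMetaStoreClient client, List<Partition> newParts)
      throws Exception {
    client.add_partitions(newParts);
  }
}
{code}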



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp

2014-09-29 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8292:
--
Description: 
Reading from bucketed partitioned tables has significantly higher overhead 
compared to non-bucketed non-partitioned files.


50% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:

5% of the CPU in 
{code}
 Path onepath = normalizePath(onefile);
{code}

and 45% of the CPU in 
{code}
 onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}

From the profiler 
{code}
Stack Trace  Sample Count  Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)  978  28.613
 org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)  978  28.613
  org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()  866  25.336
   org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()  866  25.336
    java.net.URI.relativize(URI)  655  19.163
     java.net.URI.relativize(URI, URI)  655  19.163
      java.net.URI.normalize(String)  517  15.126
       java.net.URI.needsNormalization(String)  372  10.884
        java.lang.String.charAt(int)  235  6.875
      java.net.URI.equal(String, String)  27  0.79
       java.lang.StringBuilder.toString()  1  0.029
       java.lang.StringBuilder.<init>()  1  0.029
       java.lang.StringBuilder.append(String)  1  0.029
    org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)  167  4.886
     org.apache.hadoop.fs.Path.<init>(String)  162  4.74
      org.apache.hadoop.fs.Path.initialize(String, String, String, String)  162  4.74
       org.apache.hadoop.fs.Path.normalizePath(String, String)  97  2.838
        org.apache.commons.lang.StringUtils.replace(String, String, String)  97  2.838
         org.apache.commons.lang.StringUtils.replace(String, String, String, int)  97  2.838
          java.lang.String.indexOf(String, int)  97  2.838
      java.net.URI.<init>(String, String, String, String, String)  65  1.902
{code}


  was:
Reading from bucketed partitioned tables has significantly higher overhead 
compared to non-bucketed non-partitioned files.


20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:

5% of the CPU in 
{code}
 Path onepath = normalizePath(onefile);
{code}

and 15% of the CPU in 
{code}
 onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}

From the profiler 
{code}
Stack Trace  Sample Count  Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)  978  28.613
 org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)  978  28.613
  org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()  866  25.336
   org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()  866  25.336
    java.net.URI.relativize(URI)  655  19.163
     java.net.URI.relativize(URI, URI)  655  19.163
      java.net.URI.normalize(String)  517  15.126
       java.net.URI.needsNormalization(String)  372  10.884
        java.lang.String.charAt(int)  235  6.875
      java.net.URI.equal(String, String)  27  0.79
       java.lang.StringBuilder.toString()  1  0.029
       java.lang.StringBuilder.<init>()  1  0.029
       java.lang.StringBuilder.append(String)  1  0.029
    org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)  167  4.886
     org.apache.hadoop.fs.Path.<init>(String)  162  4.74
      org.apache.hadoop.fs.Path.initialize(String, String, String, String)  162  4.74
org.apache.hadoop.fs.Path.normalize

[jira] [Commented] (HIVE-8204) Dynamic partition pruning fails with IndexOutOfBoundsException

2014-09-29 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152091#comment-14152091
 ] 

Vikram Dixit K commented on HIVE-8204:
--

+1 for 0.14.

> Dynamic partition pruning fails with IndexOutOfBoundsException
> --
>
> Key: HIVE-8204
> URL: https://issues.apache.org/jira/browse/HIVE-8204
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Prasanth J
>Assignee: Gunther Hagleitner
> Attachments: HIVE-8204.1.patch, HIVE-8204.2.patch
>
>
> Dynamic partition pruning fails with IndexOutOfBounds exception when 
> dimension table is partitioned and fact table is not.
> Steps to reproduce:
> 1) Partition date_dim table from tpcds on d_date_sk
> 2) Fact table is store_sales which is not partitioned
> 3) Run the following
> {code}
> set hive.stats.fetch.column.stats=true;
> set hive.tez.dynamic.partition.pruning=true;
> explain select d_date 
> from store_sales, date_dim 
> where 
> store_sales.ss_sold_date_sk = date_dim.d_date_sk and 
> date_dim.d_year = 1998;
> {code}
> The stack trace is:
> {code}
> 2014-09-19 19:06:16,254 ERROR ql.Driver (SessionState.java:printError(825)) - 
> FAILED: IndexOutOfBoundsException Index: 0, Size: 0
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>   at java.util.ArrayList.get(ArrayList.java:411)
>   at 
> org.apache.hadoop.hive.ql.optimizer.RemoveDynamicPruningBySize.process(RemoveDynamicPruningBySize.java:61)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
>   at 
> org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:61)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsDependentOptimizations(TezCompiler.java:277)
>   at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:120)
>   at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:97)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9781)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
>   at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:407)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:303)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1060)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1130)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:997)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:987)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
> {code}
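
The trace points at a list.get(0) on an empty list; a hedged, hypothetical guard for that pattern (names assumed, not the actual RemoveDynamicPruningBySize code):
{code}
import java.util.List;

// Hedged sketch: when the fact table is unpartitioned there may be no
// dynamic pruning source, so check the list before dereferencing.
final class PruningGuard {
  static <T> T firstOrNull(List<T> pruningSources) {
    return pruningSources.isEmpty() ? null : pruningSources.get(0);
  }
}
{code}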



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp

2014-09-29 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8292:
--
Description: 
Reading from bucketed partitioned tables has significantly higher overhead 
compared to non-bucketed non-partitioned files.


50% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:

5% of the CPU in 
{code}
 Path onepath = normalizePath(onefile);
{code}

and 45% of the CPU in 
{code}
 onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}

From the profiler 
{code}
Stack Trace  Sample Count  Percentage(%)
hive.ql.exec.tez.MapRecordSource.processRow(Object)  5,327  62.348
 hive.ql.exec.vector.VectorMapOperator.process(Writable)  5,326  62.336
  hive.ql.exec.Operator.cleanUpInputFileChanged()  4,851  56.777
   hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()  4,849  56.753
    java.net.URI.relativize(URI)  3,903  45.681
     java.net.URI.relativize(URI, URI)  3,903  45.681
      java.net.URI.normalize(String)  2,169  25.386
      java.net.URI.equal(String, String)  526  6.156
      java.net.URI.equalIgnoringCase(String, String)  1  0.012
      java.lang.String.substring(int)  1  0.012
    hive.ql.exec.MapOperator.normalizePath(String)  506  5.922
    org.apache.commons.logging.impl.Log4JLogger.info(Object)  32  0.375
    java.net.URI.equals(Object)  12  0.14
    java.util.HashMap$KeySet.iterator()  5  0.059
    java.util.HashMap.get(Object)  4  0.047
    java.util.LinkedHashMap.get(Object)  3  0.035
    hive.ql.exec.Operator.cleanUpInputFileChanged()  1  0.012
  hive.ql.exec.Operator.forward(Object, ObjectInspector)  473  5.536
  hive.ql.exec.mr.ExecMapperContext.inputFileChanged()  1  0.012
{code}


  was:
Reading from bucketed partitioned tables has significantly higher overhead 
compared to non-bucketed non-partitioned files.


50% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:

5% of the CPU in 
{code}
 Path onepath = normalizePath(onefile);
{code}

and 45% of the CPU in 
{code}
 onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}

From the profiler 
{code}
Stack Trace  Sample Count  Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)  978  28.613
 org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)  978  28.613
  org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()  866  25.336
   org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()  866  25.336
    java.net.URI.relativize(URI)  655  19.163
     java.net.URI.relativize(URI, URI)  655  19.163
      java.net.URI.normalize(String)  517  15.126
       java.net.URI.needsNormalization(String)  372  10.884
        java.lang.String.charAt(int)  235  6.875
      java.net.URI.equal(String, String)  27  0.79
       java.lang.StringBuilder.toString()  1  0.029
       java.lang.StringBuilder.<init>()  1  0.029
       java.lang.StringBuilder.append(String)  1  0.029
    org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)  167  4.886
     org.apache.hadoop.fs.Path.<init>(String)  162  4.74
      org.apache.hadoop.fs.Path.initialize(String, String, String, String)  162  4.74
       org.apache.hadoop.fs.Path.normalizePath(String, String)  97  2.838
        org.apache.commons.lang.StringUtils.replace(String, String, String)  97  2.838
         org.apache.commons.lang.StringUtils.replace(String, String, String, int)  97  2.838
          java.lang.String.indexOf(String, int)  97  2.838
      java.net.URI.<init>(String, String, String, String, String)  65  1.902
{code}



> Reading from partitioned bucketed tables has high overhead in 
> MapOperator.cleanUpInputFileChangedOp
> --

[jira] [Updated] (HIVE-7857) Hive query fails after Tez session times out

2014-09-29 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7857:
-
Attachment: HIVE-7857.3.patch

Rebased to latest trunk.

> Hive query fails after Tez session times out
> 
>
> Key: HIVE-7857
> URL: https://issues.apache.org/jira/browse/HIVE-7857
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 0.14.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-7857.1.patch, HIVE-7857.2.patch, HIVE-7857.3.patch
>
>
> Originally reported by [~deepesh]
> Steps to reproduce:
> Open the Hive CLI, ensure that HIVE_AUX_JARS_PATH has hcatalog-core.jar 
> in the path.
> Keep it idle for more than 5 minutes (the default Tez session timeout), so 
> that the Tez session times out.
> Run a Hive on Tez query, the query fails. Here is a sample CLI session:
> {noformat}
> hive> select from_unixtime(unix_timestamp(), "dd-MMM-") from 
> vectortab10korc limit 1;
> Query ID = hrt_qa_20140626002525_6e964079-4031-406b-85ed-cda9c65dca22
> Total jobs = 1
> Launching Job 1 out of 1
> Tez session was closed. Reopening...
> Session re-established.
> Status: Running (application id: application_1403688364015_1930)
> Map 1: -/-
> Map 1: 0/1
> Map 1: 0/1
> Map 1: 0/1
> Map 1: 0/1
> Map 1: 0/1
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1403688364015_1930_1_00, 
> diagnostics=[Task failed, taskId=task_1403688364015_1930_1_00_00, 
> diagnostics=[AttemptID:attempt_1403688364015_1930_1_00_00_0 
> Info:Container container_1403688364015_1930_01_02 COMPLETED with 
> diagnostics set to [Resource 
> hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar
>  changed on src filesystem (expected 1403741969169, was 1403742347351
> ], AttemptID:attempt_1403688364015_1930_1_00_00_1 Info:Container 
> container_1403688364015_1930_01_03 COMPLETED with diagnostics set to 
> [Resource 
> hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar
>  changed on src filesystem (expected 1403741969169, was 1403742347351
> ], AttemptID:attempt_1403688364015_1930_1_00_00_2 Info:Container 
> container_1403688364015_1930_01_04 COMPLETED with diagnostics set to 
> [Resource 
> hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar
>  changed on src filesystem (expected 1403741969169, was 1403742347351
> ], AttemptID:attempt_1403688364015_1930_1_00_00_3 Info:Container 
> container_1403688364015_1930_01_05 COMPLETED with diagnostics set to 
> [Resource 
> hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar
>  changed on src filesystem (expected 1403741969169, was 1403742347351
> ]], Vertex failed as one or more tasks failed. failedTasks:1]
> DAG failed due to vertex failure. failedVertices:1 killedVertices:0
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7776) enable sample10.q.[Spark Branch]

2014-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152100#comment-14152100
 ] 

Hive QA commented on HIVE-7776:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671832/HIVE-7776.3-spark.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6510 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sample10
org.apache.hadoop.hive.ql.parse.TestParse.testParse_union
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/175/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/175/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-175/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671832

> enable sample10.q.[Spark Branch]
> 
>
> Key: HIVE-7776
> URL: https://issues.apache.org/jira/browse/HIVE-7776
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
> Attachments: HIVE-7776.1-spark.patch, HIVE-7776.2-spark.patch, 
> HIVE-7776.3-spark.patch, HIVE-7776.3-spark.patch
>
>
> sample10.q contains a dynamic partition operation; this qtest should be 
> enabled once Hive on Spark supports dynamic partitioning.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8182) beeline fails when executing multiple-line queries with trailing spaces

2014-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152102#comment-14152102
 ] 

Hive QA commented on HIVE-8182:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671800/HIVE-8182.2.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6368 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1039/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1039/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1039/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671800

> beeline fails when executing multiple-line queries with trailing spaces
> ---
>
> Key: HIVE-8182
> URL: https://issues.apache.org/jira/browse/HIVE-8182
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.12.0, 0.13.1
>Reporter: Yongzhi Chen
>Assignee: Sergio Peña
> Fix For: 0.14.0
>
> Attachments: HIVE-8181.1.patch, HIVE-8182.1.patch, HIVE-8182.2.patch
>
>
> As the title indicates, when executing a multi-line query with trailing 
> spaces, beeline reports a syntax error: 
> Error: Error while compiling statement: FAILED: ParseException line 1:76 
> extraneous input ';' expecting EOF near '' (state=42000,code=4)
> If the query is put on a single line, beeline executes it successfully.
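
A minimal sketch of the kind of check involved (illustrative, not necessarily the committed patch): trim trailing whitespace from the buffered line before testing for the ';' terminator.
{code}
// Hedged sketch: "select 1; " should still be treated as a complete
// statement, so strip trailing whitespace before checking for ';'.
static boolean isStatementComplete(String buffered) {
  int end = buffered.length();
  while (end > 0 && Character.isWhitespace(buffered.charAt(end - 1))) {
    end--;
  }
  return end > 0 && buffered.charAt(end - 1) == ';';
}
{code}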



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-5690) Support subquery for single sourced multi query

2014-09-29 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-5690:
-
Priority: Critical  (was: Minor)

> Support subquery for single sourced multi query
> ---
>
> Key: HIVE-5690
> URL: https://issues.apache.org/jira/browse/HIVE-5690
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: D13791.1.patch, HIVE-5690.10.patch.txt, 
> HIVE-5690.11.patch.txt, HIVE-5690.12.patch.txt, HIVE-5690.13.patch.txt, 
> HIVE-5690.14.patch.txt, HIVE-5690.2.patch.txt, HIVE-5690.3.patch.txt, 
> HIVE-5690.4.patch.txt, HIVE-5690.5.patch.txt, HIVE-5690.6.patch.txt, 
> HIVE-5690.7.patch.txt, HIVE-5690.8.patch.txt, HIVE-5690.9.patch.txt
>
>
> Single sourced multi (insert) query is very useful for various ETL processes, 
> but it does not allow subqueries to be included. For example: 
> {noformat}
> explain from src 
> insert overwrite table x1 select * from (select distinct key,value) b order 
> by key
> insert overwrite table x2 select * from (select distinct key,value) c order 
> by value;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

