[jira] [Updated] (HIVE-20304) When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, and the execution engine is mr, same stage may launch twice due to the wrong generated plan

2018-08-03 Thread Hui Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-20304:
-
Description: 
When hive.optimize.skewjoin and hive.auto.convert.join are both set to true 
and the execution engine is set to mr, the same stage may be launched twice 
because the generated plan is wrong. If hive.exec.parallel is also true, the 
job will fail because the first launch to complete clears map.xml/reduce.xml.
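
The failure mode can be sketched outside Hive. The Python snippet below is only an illustration under stated assumptions: it assumes both launches of the stage share one plan directory and that the launch that completes first deletes reduce.xml. The function name run_stage and the file contents are hypothetical and are not taken from Hive.

```python
import tempfile
from pathlib import Path


def run_stage(plan_dir: Path) -> str:
    """Pretend to run one launch of a stage: read the plan, then clean it up."""
    plan = plan_dir / "reduce.xml"
    text = plan.read_text()   # raises FileNotFoundError if the plan was already cleared
    plan.unlink()             # assumption: the launch that completes clears the plan file
    return text


outcomes = []
with tempfile.TemporaryDirectory() as scratch:
    plan_dir = Path(scratch)
    (plan_dir / "reduce.xml").write_text("<plan/>")  # placeholder content

    # The same stage is launched twice against the same plan directory.
    for launch in ("first launch", "second launch"):
        try:
            run_stage(plan_dir)
            outcomes.append((launch, "ok"))
        except FileNotFoundError:
            outcomes.append((launch, "File does not exist"))

print(outcomes)
```

The two launches run one after the other here to keep the sketch deterministic; with hive.exec.parallel the real launches overlap, but the ordering problem is the same.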

Use the following SQL to reproduce the issue:


{code:java}
CREATE TABLE `tbl1`(
  `fence` string);

CREATE TABLE `tbl2`(
  `order_id` string,
  `phone` string,
  `search_id` string
)
PARTITIONED BY (
  `dt` string);


CREATE TABLE `tbl3`(
  `order_id` string,
  `platform` string)
PARTITIONED BY (
  `dt` string);


CREATE TABLE `tbl4`(
  `groupname` string,
  `phone` string)
PARTITIONED BY (
  `dt` string);


CREATE TABLE `tbl5`(
  `search_id` string,
  `fence` string)
PARTITIONED BY (
  `dt` string);

SET hive.exec.parallel = TRUE;

SET hive.auto.convert.join = TRUE;

SET hive.optimize.skewjoin = TRUE;


SELECT dt,
       platform,
       groupname,
       count(1) AS cnt
FROM
  (SELECT dt,
          platform,
          groupname
   FROM
     (SELECT fence
      FROM tbl1) ta
   JOIN
     (SELECT a0.dt,
             a1.platform,
             a2.groupname,
             a3.fence
      FROM
        (SELECT dt,
                order_id,
                phone,
                search_id
         FROM tbl2
         WHERE dt = 20180703) a0
      JOIN
        (SELECT order_id,
                platform,
                dt
         FROM tbl3
         WHERE dt = 20180703) a1 ON a0.order_id = a1.order_id
      INNER JOIN
        (SELECT groupname,
                phone,
                dt
         FROM tbl4
         WHERE dt = 20180703) a2 ON a0.phone = a2.phone
      LEFT JOIN
        (SELECT search_id,
                fence,
                dt
         FROM tbl5
         WHERE dt = 20180703) a3 ON a0.search_id = a3.search_id) t0 ON ta.fence = t0.fence) t11
GROUP BY dt,
         platform,
         groupname;

DROP TABLE tbl1;
DROP TABLE tbl2;
DROP TABLE tbl3;
DROP TABLE tbl4;
DROP TABLE tbl5;

{code}

You will get an error message like this:

Examining task ID: task_1531284442065_3637_m_00 (and more) from job 
job_1531284442065_3637

Task with the most failures(4):
-
Task ID:
  task_1531284442065_3637_m_00

URL:
  
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1531284442065_3637=task_1531284442065_3637_m_00
-
Diagnostic Messages for this Task:
File does not exist: 
hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml
java.io.FileNotFoundException: File does not exist: 
hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml

When I checked the plan with EXPLAIN, I found that Stage-4 and Stage-5 can be 
reached from multiple root tasks, which is the cause of this issue.
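
To make the observation concrete, the sketch below hand-transcribes a subset of the STAGE DEPENDENCIES lines from this report into a graph and lists the stages that are reachable from more than one root stage. It is a simplification, not Hive's planner logic: "consists of" and "depends on" are both modelled as plain parent-to-child edges, and the names EDGES, ROOTS, reachable and shared_stages are hypothetical.

```python
from collections import defaultdict

# Parent -> children, transcribed by hand from a subset of the plan in this report.
# "A consists of B" and "B depends on stages: A" are both treated as an edge A -> B.
EDGES = {
    "Stage-21": ["Stage-34", "Stage-5"],
    "Stage-27": ["Stage-37", "Stage-38", "Stage-4"],
    "Stage-4": ["Stage-12"],
    "Stage-12": ["Stage-36", "Stage-5"],
    "Stage-30": ["Stage-42", "Stage-43", "Stage-3"],
    "Stage-3": ["Stage-14"],
    "Stage-14": ["Stage-41", "Stage-4"],
}
ROOTS = ["Stage-21", "Stage-27", "Stage-30"]


def reachable(root, edges):
    """Return every stage reachable from root (excluding root itself)."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def shared_stages(roots, edges):
    """Map each stage reachable from more than one root to the sorted list of those roots."""
    owners = defaultdict(set)
    for root in roots:
        for stage in reachable(root, edges):
            owners[stage].add(root)
    return {stage: sorted(r) for stage, r in owners.items() if len(r) > 1}


shared = shared_stages(ROOTS, EDGES)
for stage in sorted(shared):
    print(stage, "<-", ", ".join(shared[stage]))
```

On this subset, Stage-4 is reachable from Stage-27 and Stage-30, and Stage-5 from all three roots. Stage-12 and Stage-36 are also listed because they sit downstream of Stage-4.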


{code:java}
Explain
STAGE DEPENDENCIES:
  Stage-21 is a root stage , consists of Stage-34, Stage-5
  Stage-34 has a backup stage: Stage-5
  Stage-20 depends on stages: Stage-34
  Stage-17 depends on stages: Stage-5, Stage-18, Stage-20 , consists of Stage-32, Stage-33, Stage-1
  Stage-32 has a backup stage: Stage-1
  Stage-15 depends on stages: Stage-32
  Stage-10 depends on stages: Stage-1, Stage-15, Stage-16 , consists of Stage-31, Stage-2
  Stage-31
  Stage-9 depends on stages: Stage-31
  Stage-2 depends on stages: Stage-9
  Stage-33 has a backup stage: Stage-1
  Stage-16 depends on stages: Stage-33
  Stage-1
  Stage-5
  Stage-27 is a root stage , consists of Stage-37, Stage-38, Stage-4
  Stage-37 has a backup stage: Stage-4
  Stage-25 depends on stages: Stage-37
  Stage-12 depends on stages: Stage-4, Stage-22, Stage-23, Stage-25, Stage-26 , consists of Stage-36, Stage-5
  Stage-36
  Stage-11 depends on stages: Stage-36
  Stage-19 depends on stages: Stage-11 , consists of Stage-35, Stage-5
  Stage-35 has a backup stage: Stage-5
  Stage-18 depends on stages: Stage-35
  Stage-38 has a backup stage: Stage-4
  Stage-26 depends on stages: Stage-38
  Stage-4
  Stage-30 is a root stage , consists of Stage-42, Stage-43, Stage-3
  Stage-42 has a backup stage: Stage-3
  Stage-28 depends on stages: Stage-42
  Stage-14 depends on stages: Stage-3, Stage-28, Stage-29 , consists of Stage-41, Stage-4
  Stage-41
  Stage-13 depends on stages: Stage-41
  Stage-24 depends on stages: Stage-13 , consists of Stage-39, Stage-40, Stage-4
  Stage-39 has a 

[jira] [Assigned] (HIVE-20304) When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, and the execution engine is mr, same stage may launch twice due to the wrong generated plan

2018-08-03 Thread Hui Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang reassigned HIVE-20304:



> When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, 
> and the execution engine is mr, same stage may launch twice due to the wrong 
> generated plan
> 
>
> Key: HIVE-20304
> URL: https://issues.apache.org/jira/browse/HIVE-20304
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 1.2.1, 2.3.3
> Reporter: Hui Huang
> Assignee: Hui Huang
> Priority: Major
>

[jira] [Updated] (HIVE-20284) In strict mode, if constant propagation is enable, the partition filter may be folded before partition pruner lead to error "No partition predicate for Alias"

2018-08-02 Thread Hui Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-20284:
-
Attachment: HIVE-20284.3.patch

> In strict mode, if constant propagation is enable, the partition filter may 
> be folded before partition pruner lead to error "No partition predicate for 
> Alias"  
> 
>
> Key: HIVE-20284
> URL: https://issues.apache.org/jira/browse/HIVE-20284
> Project: Hive
> Issue Type: Improvement
> Components: CLI
> Affects Versions: 1.2.1, 2.3.3
> Reporter: Hui Huang
> Assignee: Hui Huang
> Priority: Trivial
> Fix For: 2.3.3, 4.0.0
>
> Attachments: HIVE-20284.1.patch, HIVE-20284.2.patch, 
> HIVE-20284.3.patch, HIVE-20284.patch
>
>
> In strict mode, with hive.optimize.constant.propagation set to true, the 
> following SQL fails:
> {code:java}
> hive> desc employee_part;
> OK
> col_name    data_type   comment
> eid         int
> name        string
> dept        string
> year        string
> month       string
> # Partition Information
> # col_name  data_type   comment
> year        string
> month       string
> Time taken: 0.564 seconds, Fetched: 11 row(s)
> hive> set hive.mapred.mode=strict;
> hive> select * from employee_part where false and concat(year,month)='201807';
> FAILED: SemanticException Queries against partitioned tables without a 
> partition filter are disabled for safety reasons. If you know what you are 
> doing, please sethive.strict.checks.large.query to false and that 
> hive.mapred.mode is not set to 'strict' to proceed. Note that if you may get 
> errors or incorrect results if you make a mistake while using some of the 
> unsafe features. No partition predicate for Alias "employee_part" Table 
> "employee_part"
> {code}
> The above error message is confusing because the expression 
> concat(year,month)='201807' is a partition filter.
> The cause is that, during logical optimization, the ConstantPropagate 
> optimizer runs before the PartitionPruner optimizer. When it finds an 
> expression like 'false and concat(year,month)=', it replaces the expression 
> with 'false' and the partition filter is dropped, so the PartitionPruner 
> cannot get the partition filter.
> As a workaround, users can remove the constant expression that always 
> evaluates to true or false.
> When views are used and some columns are constant values, users will be 
> confused.
> So we should add more information to the error message that is returned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20284) In strict mode, if constant propagation is enable, the partition filter may be folded before partition pruner lead to error "No partition predicate for Alias"

2018-08-02 Thread Hui Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-20284:
-
Status: Patch Available  (was: In Progress)



[jira] [Updated] (HIVE-20284) In strict mode, if constant propagation is enable, the partition filter may be folded before partition pruner lead to error "No partition predicate for Alias"

2018-08-02 Thread Hui Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-20284:
-
Status: In Progress  (was: Patch Available)



[jira] [Commented] (HIVE-20284) In strict mode, if constant propagation is enable, the partition filter may be folded before partition pruner lead to error "No partition predicate for Alias"

2018-08-01 Thread Hui Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566257#comment-16566257
 ] 

Hui Huang commented on HIVE-20284:
--

The 4 failed tests are irrelevant.



[jira] [Updated] (HIVE-20284) In strict mode, if constant propagation is enable, the partition filter may be folded before partition pruner lead to error "No partition predicate for Alias"

2018-08-01 Thread Hui Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-20284:
-
Description: 
In strict mode, with hive.optimize.constant.propagation set to true, the 
following SQL fails:

{code:java}
hive> desc employee_part;
OK
col_name    data_type   comment
eid         int
name        string
dept        string
year        string
month       string

# Partition Information
# col_name  data_type   comment

year        string
month       string
Time taken: 0.564 seconds, Fetched: 11 row(s)
hive> set hive.mapred.mode=strict;
hive> select * from employee_part where false and concat(year,month)='201807';
FAILED: SemanticException Queries against partitioned tables without a 
partition filter are disabled for safety reasons. If you know what you are 
doing, please sethive.strict.checks.large.query to false and that 
hive.mapred.mode is not set to 'strict' to proceed. Note that if you may get 
errors or incorrect results if you make a mistake while using some of the 
unsafe features. No partition predicate for Alias "employee_part" Table 
"employee_part"
{code}

The above error message is confusing because the expression 
concat(year,month)='201807' is a partition filter.

The cause is that, during logical optimization, the ConstantPropagate optimizer 
runs before the PartitionPruner optimizer. When it finds an expression like 
'false and concat(year,month)=', it replaces the expression with 'false' and the 
partition filter is dropped, so the PartitionPruner cannot get the partition 
filter.

As a workaround, users can remove the constant expression that always evaluates 
to true or false.
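
The effect described above can be illustrated with a toy model. The Python sketch below is not Hive's ConstantPropagate or PartitionPruner code; the classes Const, Pred and And and the functions fold and partition_columns_used are hypothetical names chosen for this illustration. It only shows that folding a predicate of the form "false and <partition filter>" leaves nothing that references a partition column, while the workaround predicate keeps the reference.

```python
from dataclasses import dataclass
from typing import Union

# Partition columns of employee_part, as shown by "desc employee_part" in this report.
PARTITION_COLS = frozenset({"year", "month"})


@dataclass(frozen=True)
class Const:
    value: bool


@dataclass(frozen=True)
class Pred:
    text: str            # predicate text, kept opaque in this toy model
    columns: frozenset   # columns the predicate reads


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Pred, And]


def fold(expr):
    """Toy constant folding: FALSE AND x -> FALSE, TRUE AND x -> x."""
    if isinstance(expr, And):
        left, right = fold(expr.left), fold(expr.right)
        for a, b in ((left, right), (right, left)):
            if isinstance(a, Const):
                return b if a.value else a
        return And(left, right)
    return expr


def partition_columns_used(expr):
    """Collect the partition columns that an expression still references."""
    if isinstance(expr, Pred):
        return expr.columns & PARTITION_COLS
    if isinstance(expr, And):
        return partition_columns_used(expr.left) | partition_columns_used(expr.right)
    return frozenset()


part_filter = Pred("concat(year,month)='201807'", frozenset({"year", "month"}))
where = And(Const(False), part_filter)

before = partition_columns_used(where)          # partition filter is visible
after = partition_columns_used(fold(where))     # folded to FALSE, filter is gone
workaround = partition_columns_used(fold(part_filter))  # constant removed by the user

print(sorted(before), sorted(after), sorted(workaround))
```

A strict-mode check that runs after folding and looks for a partition column reference would see an empty set for the folded predicate, which matches the "No partition predicate" error reported here.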

When views are used and some columns are constant values, users will be confused.

So we should add more information to the error message that is returned.


  was:
In strict mode, with hive.optimize.constant.propagation set to true, the 
following SQL fails:

{code:java}
hive> desc employee_part;
OK
col_name    data_type   comment
eid         int
name        string
dept        string
year        string
month       string

# Partition Information
# col_name  data_type   comment

year        string
month       string
Time taken: 0.564 seconds, Fetched: 11 row(s)
hive> set hive.mapred.mode=strict;
hive> select * from employee_part where false and concat(year,month)='201807';
FAILED: SemanticException Queries against partitioned tables without a 
partition filter are disabled for safety reasons. If you know what you are 
doing, please sethive.strict.checks.large.query to false and that 
hive.mapred.mode is not set to 'strict' to proceed. Note that if you may get 
errors or incorrect results if you make a mistake while using some of the 
unsafe features. No partition predicate for Alias "employee_part" Table 
"employee_part"
{code}

The above error message is confusing: concat(year,month)='201807' is the 
partition filter.

The cause is that, during logical optimization, the ConstantPropagate optimizer 
runs before the PartitionPruner optimizer. When it finds an expression like 
'false and concat(year,month)=', it replaces the expression with 'false' and the 
partition filter is dropped, so the PartitionPruner cannot get the partition 
filter.

As a workaround, users can remove the constant expression that always evaluates 
to true or false.

When views are used and some columns are constant values, users will be confused.

So we should add more information to the error message that is returned.




[jira] [Updated] (HIVE-20284) In strict mode, if constant propagation is enable, the partition filter may be folded before partition pruner lead to error "No partition predicate for Alias"

2018-08-01 Thread Hui Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-20284:
-
Summary: In strict mode, if constant propagation is enable, the partition 
filter may be folded before partition pruner lead to error "No partition 
predicate for Alias"  (was: In strict mode, if constant propagation is 
enable, the partition filter is folded before partition pruner lead to error 
"No partition predicate for Alias")



[jira] [Updated] (HIVE-20284) In strict mode, if constant propagation is enable, the partition filter is folded before partition pruner lead to error "No partition predicate for Alias"

2018-08-01 Thread Hui Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-20284:
-
Attachment: HIVE-20284.2.patch



[jira] [Updated] (HIVE-20284) In strict mode, if constant propagation is enable, the partition filter is folded before partition pruner lead to error "No partition predicate for Alias"

2018-08-01 Thread Hui Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-20284:
-
Status: Patch Available  (was: In Progress)

> In strict mode, if constant propagation is enable, the partition filter is 
> folded before partition pruner lead to error "No partition predicate for 
> Alias"  
> 
>
> Key: HIVE-20284
> URL: https://issues.apache.org/jira/browse/HIVE-20284
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Hui Huang
>Priority: Trivial
> Fix For: 2.3.3, 4.0.0
>
> Attachments: HIVE-20284.1.patch, HIVE-20284.2.patch, HIVE-20284.patch
>
>
> In strict mode and the hive.optimize.constant.propagation is set to true, the 
> following sql will failed:
> {code:java}
> hive> desc employee_part;
> OK
> col_name  data_type   comment
> eid   int
> name  string
> dept  string
> year  string
> month string
> # Partition Information
> # col_namedata_type   comment
> year  string
> month string
> Time taken: 0.564 seconds, Fetched: 11 row(s)
> hive> set hive.mapred.mode=strict;
> hive> select * from employee_part where false and concat(year,month)='201807';
> FAILED: SemanticException Queries against partitioned tables without a 
> partition filter are disabled for safety reasons. If you know what you are 
> doing, please sethive.strict.checks.large.query to false and that 
> hive.mapred.mode is not set to 'strict' to proceed. Note that if you may get 
> errors or incorrect results if you make a mistake while using some of the 
> unsafe features. No partition predicate for Alias "employee_part" Table 
> "employee_part"
> {code}
> The above error message is confusing, because concat(year,month)='201807' is 
> the partition filter.
> The reason is that during logical optimization the ConstantPropagate optimizer 
> runs before the PartitionPruner optimizer. When it finds an expression like 
> 'false and concat(year,month)=', it replaces the expression with 'false', and 
> the partition filter is dropped, so the PartitionPruner cannot find the 
> partition filter.
> As a workaround, users can remove constant expressions that always evaluate 
> to true or false.
> When views are used and some columns are constant values, users will be 
> confused.
> So we should add more detail to the returned error message.
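
The ordering problem described above can be illustrated with a small, self-contained sketch. The classes below (Expr, Const, PartitionPredicate, And) and the methods fold and hasPartitionPredicate are hypothetical stand-ins invented for illustration; they are not Hive's ConstantPropagate or PartitionPruner code. The sketch only shows that once "false AND x" is folded to "false", a later pass can no longer see x.

```java
public class FoldBeforePrune {
    // Hypothetical, simplified expression tree; these are not Hive's classes.
    static abstract class Expr {}

    static final class Const extends Expr {
        final boolean value;
        Const(boolean value) { this.value = value; }
    }

    static final class PartitionPredicate extends Expr {
        final String text;
        PartitionPredicate(String text) { this.text = text; }
    }

    static final class And extends Expr {
        final Expr left, right;
        And(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    // Constant folding for AND: "false AND x" becomes "false" (x is discarded),
    // "true AND x" becomes x.
    static Expr fold(Expr e) {
        if (!(e instanceof And)) return e;
        Expr l = fold(((And) e).left);
        Expr r = fold(((And) e).right);
        if (l instanceof Const) return ((Const) l).value ? r : l;
        if (r instanceof Const) return ((Const) r).value ? l : r;
        return new And(l, r);
    }

    // A pruner-like check: does any partition predicate survive in the tree?
    static boolean hasPartitionPredicate(Expr e) {
        if (e instanceof PartitionPredicate) return true;
        if (e instanceof And) {
            And a = (And) e;
            return hasPartitionPredicate(a.left) || hasPartitionPredicate(a.right);
        }
        return false;
    }

    public static void main(String[] args) {
        Expr where = new And(new Const(false),
                new PartitionPredicate("concat(year,month)='201807'"));
        System.out.println("before folding: " + hasPartitionPredicate(where));
        System.out.println("after folding:  " + hasPartitionPredicate(fold(where)));
    }
}
```

In this sketch the check succeeds on the original tree and fails on the folded one, which mirrors why the strict-mode check reports a missing partition predicate even though the query text contains one.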



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20284) In strict mode, if constant propagation is enable, the partition filter is folded before partition pruner lead to error "No partition predicate for Alias"

2018-08-01 Thread Hui Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-20284:
-
Status: In Progress  (was: Patch Available)

> In strict mode, if constant propagation is enable, the partition filter is 
> folded before partition pruner lead to error "No partition predicate for 
> Alias"  
> 
>
> Key: HIVE-20284
> URL: https://issues.apache.org/jira/browse/HIVE-20284
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Hui Huang
>Priority: Trivial
> Fix For: 2.3.3, 4.0.0
>
> Attachments: HIVE-20284.1.patch, HIVE-20284.patch
>
>


[jira] [Updated] (HIVE-20284) In strict mode, if constant propagation is enable, the partition filter is folded before partition pruner lead to error "No partition predicate for Alias"

2018-08-01 Thread Hui Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-20284:
-
Status: Patch Available  (was: In Progress)

> In strict mode, if constant propagation is enable, the partition filter is 
> folded before partition pruner lead to error "No partition predicate for 
> Alias"  
> 
>
> Key: HIVE-20284
> URL: https://issues.apache.org/jira/browse/HIVE-20284
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Hui Huang
>Priority: Trivial
> Fix For: 2.3.3, 4.0.0
>
> Attachments: HIVE-20284.1.patch, HIVE-20284.patch
>
>


[jira] [Updated] (HIVE-20284) In strict mode, if constant propagation is enable, the partition filter is folded before partition pruner lead to error "No partition predicate for Alias"

2018-08-01 Thread Hui Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-20284:
-
Attachment: HIVE-20284.1.patch

> In strict mode, if constant propagation is enable, the partition filter is 
> folded before partition pruner lead to error "No partition predicate for 
> Alias"  
> 
>
> Key: HIVE-20284
> URL: https://issues.apache.org/jira/browse/HIVE-20284
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Hui Huang
>Priority: Trivial
> Fix For: 2.3.3, 4.0.0
>
> Attachments: HIVE-20284.1.patch, HIVE-20284.patch
>
>


[jira] [Updated] (HIVE-20284) In strict mode, if constant propagation is enable, the partition filter is folded before partition pruner lead to error "No partition predicate for Alias"

2018-08-01 Thread Hui Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-20284:
-
   Fix Version/s: 4.0.0
Target Version/s: 2.3.3, 4.0.0  (was: 2.3.3)
  Status: In Progress  (was: Patch Available)

> In strict mode, if constant propagation is enable, the partition filter is 
> folded before partition pruner lead to error "No partition predicate for 
> Alias"  
> 
>
> Key: HIVE-20284
> URL: https://issues.apache.org/jira/browse/HIVE-20284
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Hui Huang
>Priority: Trivial
> Fix For: 2.3.3, 4.0.0
>
> Attachments: HIVE-20284.1.patch, HIVE-20284.patch
>
>


[jira] [Updated] (HIVE-20284) In strict mode, if constant propagation is enable, the partition filter is folded before partition pruner lead to error "No partition predicate for Alias"

2018-07-31 Thread Hui Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-20284:
-
Status: Patch Available  (was: Open)

> In strict mode, if constant propagation is enable, the partition filter is 
> folded before partition pruner lead to error "No partition predicate for 
> Alias"  
> 
>
> Key: HIVE-20284
> URL: https://issues.apache.org/jira/browse/HIVE-20284
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Hui Huang
>Priority: Trivial
> Fix For: 2.3.3
>
> Attachments: HIVE-20284.patch
>
>


[jira] [Commented] (HIVE-20284) In strict mode, if constant propagation is enable, the partition filter is folded before partition pruner lead to error "No partition predicate for Alias"

2018-07-31 Thread Hui Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564771#comment-16564771
 ] 

Hui Huang commented on HIVE-20284:
--

The patch adds more detail to the returned error string.

> In strict mode, if constant propagation is enable, the partition filter is 
> folded before partition pruner lead to error "No partition predicate for 
> Alias"  
> 
>
> Key: HIVE-20284
> URL: https://issues.apache.org/jira/browse/HIVE-20284
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Hui Huang
>Priority: Trivial
> Fix For: 2.3.3
>
> Attachments: HIVE-20284.patch
>
>


[jira] [Updated] (HIVE-20284) In strict mode, if constant propagation is enable, the partition filter is folded before partition pruner lead to error "No partition predicate for Alias"

2018-07-31 Thread Hui Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-20284:
-
Attachment: HIVE-20284.patch

> In strict mode, if constant propagation is enable, the partition filter is 
> folded before partition pruner lead to error "No partition predicate for 
> Alias"  
> 
>
> Key: HIVE-20284
> URL: https://issues.apache.org/jira/browse/HIVE-20284
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Hui Huang
>Priority: Trivial
> Fix For: 2.3.3
>
> Attachments: HIVE-20284.patch
>
>


[jira] [Updated] (HIVE-20284) In strict mode, if constant propagation is enable, the partition filter is folded before partition pruner lead to error "No partition predicate for Alias"

2018-07-31 Thread Hui Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-20284:
-
Description: 
In strict mode, when hive.optimize.constant.propagation is set to true, the 
following SQL fails:

{code:java}
hive> desc employee_part;
OK
col_name    data_type   comment
eid         int
name        string
dept        string
year        string
month       string

# Partition Information
# col_name  data_type   comment

year        string
month       string
Time taken: 0.564 seconds, Fetched: 11 row(s)
hive> set hive.mapred.mode=strict;
hive> select * from employee_part where false and concat(year,month)='201807';
FAILED: SemanticException Queries against partitioned tables without a 
partition filter are disabled for safety reasons. If you know what you are 
doing, please sethive.strict.checks.large.query to false and that 
hive.mapred.mode is not set to 'strict' to proceed. Note that if you may get 
errors or incorrect results if you make a mistake while using some of the 
unsafe features. No partition predicate for Alias "employee_part" Table 
"employee_part"
{code}

The above error message is confusing, because concat(year,month)='201807' is the 
partition filter.

The reason is that during logical optimization the ConstantPropagate optimizer 
runs before the PartitionPruner optimizer. When it finds an expression like 
'false and concat(year,month)=', it replaces the expression with 'false', and the 
partition filter is dropped, so the PartitionPruner cannot find the partition 
filter.

As a workaround, users can remove constant expressions that always evaluate to 
true or false.

When views are used and some columns are constant values, users will be confused.

So we should add more detail to the returned error message.


  was:
In strict mode, when hive.optimize.constant.propagation is set to true, the 
following SQL fails:

{code:java}
hive> desc employee_part;
OK
col_name    data_type   comment
eid         int
name        string
dept        string
year        string
month       string

# Partition Information
# col_name  data_type   comment

year        string
month       string
Time taken: 0.564 seconds, Fetched: 11 row(s)
hive> set hive.mapred.mode=strict;
hive> select * from employee_part where false and concat(year,month)='201807';
FAILED: SemanticException Queries against partitioned tables without a 
partition filter are disabled for safety reasons. If you know what you are 
doing, please sethive.strict.checks.large.query to false and that 
hive.mapred.mode is not set to 'strict' to proceed. Note that if you may get 
errors or incorrect results if you make a mistake while using some of the 
unsafe features. No partition predicate for Alias "employee_part" Table 
"employee_part"
{code}

The above error message is confusing, because concat(year,month)='201807' is the 
partition filter.

The reason is that during logical optimization the ConstantPropagate optimizer 
runs before the PartitionPruner optimizer. When it finds an expression like 
'false and concat(year,month)=', the expression will be replaced with 'false' and 
the partition filter is dropped, so the PartitionPruner cannot find the partition 
filter.

As a workaround, users can remove constant expressions that always evaluate to 
true or false.

When views are used and some columns are constant values, users will be confused.

So we should add more detail to the returned error message.



> In strict mode, if constant propagation is enable, the partition filter is 
> folded before partition pruner lead to error "No partition predicate for 
> Alias"  
> 
>
> Key: HIVE-20284
> URL: https://issues.apache.org/jira/browse/HIVE-20284
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Hui Huang
>Priority: Trivial
> Fix For: 2.3.3
>
>

[jira] [Assigned] (HIVE-20284) In strict mode, if constant propagation is enable, the partition filter is folded before partition pruner lead to error "No partition predicate for Alias"

2018-07-31 Thread Hui Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang reassigned HIVE-20284:



> In strict mode, if constant propagation is enable, the partition filter is 
> folded before partition pruner lead to error "No partition predicate for 
> Alias"  
> 
>
> Key: HIVE-20284
> URL: https://issues.apache.org/jira/browse/HIVE-20284
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Hui Huang
>Priority: Trivial
> Fix For: 2.3.3
>
>


[jira] [Updated] (HIVE-18265) desc formatted/extended or show create table can not fully display the result when field or table comment contains tab character

2018-04-28 Thread Hui Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-18265:
-
Status: Open  (was: Patch Available)

> desc formatted/extended or show create table can not fully display the result 
> when field or table comment contains tab character
> 
>
> Key: HIVE-18265
> URL: https://issues.apache.org/jira/browse/HIVE-18265
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 1.2.1, 3.1.0
>Reporter: Hui Huang
>Assignee: Hui Huang
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-18265.1.patch, HIVE-18265.2.patch, HIVE-18265.patch
>
>
> Here are some examples:
> create table test_comment (id1 string comment 'full_\tname1', id2 string 
> comment 'full_\tname2', id3 string comment 'full_\tname3') stored as textfile;
> When executing `show create table test_comment`, we see the following 
> content in the console:
> {quote}
> createtab_stmt
> CREATE TABLE `test_comment`(
>   `id1` string COMMENT 'full_
>   `id2` string COMMENT 'full_
>   `id3` string COMMENT 'full_
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'hdfs://xxx/user/huanghui/warehouse/huanghuitest.db/test_comment'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1513095570')
> {quote}
> The output of `desc formatted table` is similar:
> {quote}
> col_name  data_type   comment
> \# col_name   data_type   comment
> id1   string  full_
> id2   string  full_
> id3   string  full_
> \# Detailed Table Information
> (ignore)...
> {quote}
> When executing `desc extended test_comment`, the problem is more obvious:
> {quote}
> col_name  data_type   comment
> id1   string  full_
> id2   string  full_
> id3   string  full_
> Detailed Table InformationTable(tableName:test_comment, 
> dbName:huanghuitest, owner:huanghui, createTime:1513095570, lastAccessTime:0, 
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id1, type:string, 
> comment:full_name1), FieldSchema(name:id2, type:string, comment:full_
> {quote}
> *the rest of the content is lost*.
> The content is not really lost; it just cannot be displayed properly, because 
> Hive stores the result in a LazyStruct, and LazyStruct uses '\t' as the field 
> separator:
> {code:java}
> // LazyStruct.java#parse()
> // Go through all bytes in the byte[]
> while (fieldByteEnd <= structByteEnd) {
>   if (fieldByteEnd == structByteEnd || bytes[fieldByteEnd] == separator) {
>     // Reached the end of a field?
>     if (lastColumnTakesRest && fieldId == fields.length - 1) {
>       fieldByteEnd = structByteEnd;
>     }
>     startPosition[fieldId] = fieldByteBegin;
>     fieldId++;
>     if (fieldId == fields.length || fieldByteEnd == structByteEnd) {
>       // All fields have been parsed, or bytes have been parsed.
>       // We need to set the startPosition of fields.length to ensure we
>       // can use the same formula to calculate the length of each field.
>       // For missing fields, their starting positions will all be the same,
>       // which will make their lengths to be -1 and uncheckedGetField will
>       // return these fields as NULLs.
>       for (int i = fieldId; i <= fields.length; i++) {
>         startPosition[i] = fieldByteEnd + 1;
>       }
>       break;
>     }
>     fieldByteBegin = fieldByteEnd + 1;
>     fieldByteEnd++;
> {code}
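
The effect of a tab inside a comment can be reproduced with a minimal sketch. The class TabInComment and its methods parseRow and escapeTab are hypothetical and written only for illustration; parseRow is a simplified stand-in for tab-separated field parsing, not the real LazyStruct, and escapeTab shows one conceivable mitigation, which is an assumption and not necessarily what the attached patches do.

```java
public class TabInComment {
    // Split a serialized row into at most `columns` fields on `separator`.
    // Anything after the last expected field is silently dropped, which is
    // the behaviour this sketch is meant to demonstrate.
    static String[] parseRow(String row, char separator, int columns) {
        String[] fields = new String[columns];
        int begin = 0;
        int id = 0;
        for (int end = 0; end <= row.length() && id < columns; end++) {
            if (end == row.length() || row.charAt(end) == separator) {
                fields[id++] = row.substring(begin, end);
                begin = end + 1;
            }
        }
        return fields;
    }

    // Hypothetical mitigation: replace a raw tab with the two characters '\' and 't'.
    static String escapeTab(String s) {
        return s.replace("\t", "\\t");
    }

    public static void main(String[] args) {
        String comment = "full_\tname1";

        // The raw tab in the comment is taken as a field separator.
        String row = String.join("\t", "id1", "string", comment);
        System.out.println(parseRow(row, '\t', 3)[2]);     // prints "full_"

        // With the tab escaped, the comment survives as a single field.
        String safeRow = String.join("\t", "id1", "string", escapeTab(comment));
        System.out.println(parseRow(safeRow, '\t', 3)[2]); // prints "full_\tname1" (literal backslash-t)
    }
}
```

The first call loses "name1" because the parser stops after the third separator-delimited piece, which matches the truncated "full_" seen in the console output quoted above.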





[jira] [Updated] (HIVE-18265) desc formatted/extended or show create table can not fully display the result when field or table comment contains tab character

2018-04-28 Thread Hui Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-18265:
-
Status: Patch Available  (was: Open)

> desc formatted/extended or show create table can not fully display the result 
> when field or table comment contains tab character
> 
>
> Key: HIVE-18265
> URL: https://issues.apache.org/jira/browse/HIVE-18265
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 1.2.1, 3.1.0
>Reporter: Hui Huang
>Assignee: Hui Huang
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-18265.1.patch, HIVE-18265.2.patch, HIVE-18265.patch
>
>
> Here are some examples:
> create table test_comment (id1 string comment 'full_\tname1', id2 string 
> comment 'full_\tname2', id3 string comment 'full_\tname3') stored as textfile;
> When execute `show create table test_comment`, we can see the following 
> content in the console,
> {quote}
> createtab_stmt
> CREATE TABLE `test_comment`(
>   `id1` string COMMENT 'full_
>   `id2` string COMMENT 'full_
>   `id3` string COMMENT 'full_
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'hdfs://xxx/user/huanghui/warehouse/huanghuitest.db/test_comment'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1513095570')
> {quote}
> And the output of `desc formatted table ` is a little similar,
> {quote}
> col_name  data_type   comment
> \# col_name   data_type   comment
> id1   string  full_
> id2   string  full_
> id3   string  full_
> \# Detailed Table Information
> (ignore)...
> {quote}
> When execute `desc extended test_comment`, the problem is more obvious,
> {quote}
> col_name  data_type   comment
> id1   string  full_
> id2   string  full_
> id3   string  full_
> Detailed Table InformationTable(tableName:test_comment, 
> dbName:huanghuitest, owner:huanghui, createTime:1513095570, lastAccessTime:0, 
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id1, type:string, 
> comment:full_name1), FieldSchema(name:id2, type:string, comment:full_
> {quote}
> *the rest of the content is lost*.
> The content is not really lost, it's just can not display normal. Because 
> hive store the result in LazyStruct, and LazyStruct use '\t' as field 
> separator:
> {code:java}
> // LazyStruct.java#parse()
> // Go through all bytes in the byte[]
> while (fieldByteEnd <= structByteEnd) {
>   if (fieldByteEnd == structByteEnd || bytes[fieldByteEnd] == separator) {
>     // Reached the end of a field?
>     if (lastColumnTakesRest && fieldId == fields.length - 1) {
>       fieldByteEnd = structByteEnd;
>     }
>     startPosition[fieldId] = fieldByteBegin;
>     fieldId++;
>     if (fieldId == fields.length || fieldByteEnd == structByteEnd) {
>       // All fields have been parsed, or bytes have been parsed.
>       // We need to set the startPosition of fields.length to ensure we
>       // can use the same formula to calculate the length of each field.
>       // For missing fields, their starting positions will all be the same,
>       // which will make their lengths to be -1 and uncheckedGetField will
>       // return these fields as NULLs.
>       for (int i = fieldId; i <= fields.length; i++) {
>         startPosition[i] = fieldByteEnd + 1;
>       }
>       break;
>     }
>     fieldByteBegin = fieldByteEnd + 1;
>     fieldByteEnd++;
> {code}
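
The truncation described above can be reproduced outside Hive with a small, self-contained sketch. It does not use Hive's LazyStruct; `TabSeparatorDemo`, `splitRow` and `escapeTabs` are hypothetical names introduced only for illustration, and `splitRow` is a simplified imitation of the parse loop quoted in the description (it stops after the expected number of fields and pads missing fields with null). Escaping the tab before the row is assembled is shown as one possible mitigation; it is an assumption, not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.List;

public class TabSeparatorDemo {

  /**
   * Splits a row into exactly fieldCount fields on the separator.
   * Anything after the last expected field is dropped, and missing
   * fields are reported as null (simplified imitation, not Hive code).
   */
  public static List<String> splitRow(String row, int fieldCount, char separator) {
    List<String> fields = new ArrayList<>();
    int begin = 0;
    for (int end = 0; end <= row.length() && fields.size() < fieldCount; end++) {
      if (end == row.length() || row.charAt(end) == separator) {
        fields.add(row.substring(begin, end));
        begin = end + 1;
      }
    }
    while (fields.size() < fieldCount) {
      fields.add(null); // missing fields come back as NULL
    }
    return fields;
  }

  /** Hypothetical mitigation: replace a real tab with the two characters '\' and 't'. */
  public static String escapeTabs(String comment) {
    return comment.replace("\t", "\\t");
  }

  public static void main(String[] args) {
    String comment = "full_\tname1"; // comment containing a real tab character

    // Row assembled as col_name \t data_type \t comment, then split into 3 fields.
    String rawRow = "id1\tstring\t" + comment;
    System.out.println(splitRow(rawRow, 3, '\t'));
    // prints [id1, string, full_]

    String escapedRow = "id1\tstring\t" + escapeTabs(comment);
    System.out.println(splitRow(escapedRow, 3, '\t'));
    // prints [id1, string, full_\tname1]
  }
}
```

With the raw comment, the embedded tab is taken as a field boundary, so the third field ends at `full_`, which matches the console output quoted in the description. With the escaped comment the row contains only the two intended separators and the whole comment survives.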



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18265) desc formatted/extended or show create table can not fully display the result when field or table comment contains tab character

2018-04-28 Thread Hui Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-18265:
-
Attachment: HIVE-18265.2.patch

> desc formatted/extended or show create table can not fully display the result 
> when field or table comment contains tab character
> 
>
> Key: HIVE-18265
> URL: https://issues.apache.org/jira/browse/HIVE-18265
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 1.2.1, 3.1.0
> Reporter: Hui Huang
> Assignee: Hui Huang
> Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-18265.1.patch, HIVE-18265.2.patch, HIVE-18265.patch
>
>


[jira] [Updated] (HIVE-18265) desc formatted/extended or show create table can not fully display the result when field or table comment contains tab character

2018-04-28 Thread Hui Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-18265:
-
Target Version/s: 3.1.0  (was: 3.0.0)

> desc formatted/extended or show create table can not fully display the result 
> when field or table comment contains tab character
> 
>
> Key: HIVE-18265
> URL: https://issues.apache.org/jira/browse/HIVE-18265
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 1.2.1, 3.1.0
> Reporter: Hui Huang
> Assignee: Hui Huang
> Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-18265.1.patch, HIVE-18265.patch
>
>


[jira] [Updated] (HIVE-18265) desc formatted/extended or show create table can not fully display the result when field or table comment contains tab character

2018-04-28 Thread Hui Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-18265:
-
Affects Version/s: (was: 3.0.0)
   3.1.0

> desc formatted/extended or show create table can not fully display the result 
> when field or table comment contains tab character
> 
>
> Key: HIVE-18265
> URL: https://issues.apache.org/jira/browse/HIVE-18265
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 1.2.1, 3.1.0
> Reporter: Hui Huang
> Assignee: Hui Huang
> Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-18265.1.patch, HIVE-18265.patch
>
>


[jira] [Updated] (HIVE-18265) desc formatted/extended or show create table can not fully display the result when field or table comment contains tab character

2017-12-19 Thread Hui Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-18265:
-
Affects Version/s: 1.2.1

> desc formatted/extended or show create table can not fully display the result 
> when field or table comment contains tab character
> 
>
> Key: HIVE-18265
> URL: https://issues.apache.org/jira/browse/HIVE-18265
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 1.2.1, 3.0.0
> Reporter: Hui Huang
> Assignee: Hui Huang
> Fix For: 3.0.0
>
> Attachments: HIVE-18265.1.patch, HIVE-18265.patch
>
>


[jira] [Comment Edited] (HIVE-18265) desc formatted/extended or show create table can not fully display the result when field or table comment contains tab character

2017-12-18 Thread Hui Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296062#comment-16296062
 ] 

Hui Huang edited comment on HIVE-18265 at 12/19/17 2:45 AM:


[~asherman] could you take a look when you have time? I ran the test cases 
without the added code and the same test cases still fail, so I don't think 
these failures are related to the patch. Thanks!


was (Author: bigrey):
https://issues.apache.org/jira/secure/ViewProfile.jspa?name=asherman could you 
take a look when you have time? I ran the test cases without the added code 
and the same test cases still fail, so I don't think these failures are 
related to the patch. Thanks!

> desc formatted/extended or show create table can not fully display the result 
> when field or table comment contains tab character
> 
>
> Key: HIVE-18265
> URL: https://issues.apache.org/jira/browse/HIVE-18265
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 3.0.0
> Reporter: Hui Huang
> Assignee: Hui Huang
> Fix For: 3.0.0
>
> Attachments: HIVE-18265.1.patch, HIVE-18265.patch
>
>


[jira] [Commented] (HIVE-18265) desc formatted/extended or show create table can not fully display the result when field or table comment contains tab character

2017-12-18 Thread Hui Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296062#comment-16296062
 ] 

Hui Huang commented on HIVE-18265:
--

https://issues.apache.org/jira/secure/ViewProfile.jspa?name=asherman could you 
take a look when you have time? I ran the test cases without the added code 
and the same test cases still fail, so I don't think these failures are 
related to the patch. Thanks!

> desc formatted/extended or show create table can not fully display the result 
> when field or table comment contains tab character
> 
>
> Key: HIVE-18265
> URL: https://issues.apache.org/jira/browse/HIVE-18265
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 3.0.0
> Reporter: Hui Huang
> Assignee: Hui Huang
> Fix For: 3.0.0
>
> Attachments: HIVE-18265.1.patch, HIVE-18265.patch
>
>


[jira] [Updated] (HIVE-18265) desc formatted/extended or show create table can not fully display the result when field or table comment contains tab character

2017-12-15 Thread Hui Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-18265:
-
Status: Patch Available  (was: Open)

> desc formatted/extended or show create table can not fully display the result 
> when field or table comment contains tab character
> 
>
> Key: HIVE-18265
> URL: https://issues.apache.org/jira/browse/HIVE-18265
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 3.0.0
> Reporter: Hui Huang
> Assignee: Hui Huang
> Fix For: 3.0.0
>
> Attachments: HIVE-18265.1.patch, HIVE-18265.patch
>
>


[jira] [Work stopped] (HIVE-18265) desc formatted/extended or show create table can not fully display the result when field or table comment contains tab character

2017-12-15 Thread Hui Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-18265 stopped by Hui Huang.

> desc formatted/extended or show create table can not fully display the result 
> when field or table comment contains tab character
> 
>
> Key: HIVE-18265
> URL: https://issues.apache.org/jira/browse/HIVE-18265
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 3.0.0
> Reporter: Hui Huang
> Assignee: Hui Huang
> Fix For: 3.0.0
>
> Attachments: HIVE-18265.1.patch, HIVE-18265.patch
>
>


[jira] [Updated] (HIVE-18265) desc formatted/extended or show create table can not fully display the result when field or table comment contains tab character

2017-12-15 Thread Hui Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-18265:
-
Status: In Progress  (was: Patch Available)

> desc formatted/extended or show create table can not fully display the result 
> when field or table comment contains tab character
> 
>
> Key: HIVE-18265
> URL: https://issues.apache.org/jira/browse/HIVE-18265
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 3.0.0
> Reporter: Hui Huang
> Assignee: Hui Huang
> Fix For: 3.0.0
>
> Attachments: HIVE-18265.1.patch, HIVE-18265.patch
>
>
> Here is an example:
> create table test_comment (id1 string comment 'full_\tname1', id2 string 
> comment 'full_\tname2', id3 string comment 'full_\tname3') stored as textfile;
> When executing `show create table test_comment`, the console shows the 
> following content:
> {quote}
> createtab_stmt
> CREATE TABLE `test_comment`(
>   `id1` string COMMENT 'full_
>   `id2` string COMMENT 'full_
>   `id3` string COMMENT 'full_
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'hdfs://xxx/user/huanghui/warehouse/huanghuitest.db/test_comment'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1513095570')
> {quote}
> The output of `desc formatted test_comment` is truncated in a similar way:
> {quote}
> col_name  data_type   comment
> \# col_name   data_type   comment
> id1   string  full_
> id2   string  full_
> id3   string  full_
> \# Detailed Table Information
> (ignore)...
> {quote}
> With `desc extended test_comment`, the problem is more obvious:
> {quote}
> col_name  data_type   comment
> id1   string  full_
> id2   string  full_
> id3   string  full_
> Detailed Table Information  Table(tableName:test_comment, 
> dbName:huanghuitest, owner:huanghui, createTime:1513095570, lastAccessTime:0, 
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id1, type:string, 
> comment:full_name1), FieldSchema(name:id2, type:string, comment:full_
> {quote}
> *The rest of the content is lost.*
> The content is not really lost; it just cannot be displayed correctly, because 
> Hive stores the result in a LazyStruct, and LazyStruct uses '\t' as the field 
> separator:
> {code:java}
> // LazyStruct.java#parse()
> // Go through all bytes in the byte[]
> while (fieldByteEnd <= structByteEnd) {
>   if (fieldByteEnd == structByteEnd || bytes[fieldByteEnd] == separator) {
>     // Reached the end of a field?
>     if (lastColumnTakesRest && fieldId == fields.length - 1) {
>       fieldByteEnd = structByteEnd;
>     }
>     startPosition[fieldId] = fieldByteBegin;
>     fieldId++;
>     if (fieldId == fields.length || fieldByteEnd == structByteEnd) {
>       // All fields have been parsed, or bytes have been parsed.
>       // We need to set the startPosition of fields.length to ensure we
>       // can use the same formula to calculate the length of each field.
>       // For missing fields, their starting positions will all be the same,
>       // which will make their lengths to be -1 and uncheckedGetField will
>       // return these fields as NULLs.
>       for (int i = fieldId; i <= fields.length; i++) {
>         startPosition[i] = fieldByteEnd + 1;
>       }
>       break;
>     }
>     fieldByteBegin = fieldByteEnd + 1;
>     fieldByteEnd++;
> {code}
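
The truncation described in the quoted report can be reproduced outside Hive with a small sketch. This is a simplified stand-in for the separator scan in LazyStruct.parse(), not Hive code: `TabCommentDemo` and `splitRow` are hypothetical names, and the sketch assumes the case where the last column does not take the rest of the row, so bytes after the last expected field are dropped.

```java
import java.util.ArrayList;
import java.util.List;

final class TabCommentDemo {
    private TabCommentDemo() {}

    /**
     * Splits a row into at most fieldCount fields on the separator.
     * Anything after the last expected field is dropped (assumed to mirror
     * the behaviour described in the report; this is not Hive's own code).
     */
    static List<String> splitRow(String row, int fieldCount, char separator) {
        List<String> fields = new ArrayList<>();
        int begin = 0;
        for (int end = 0; end <= row.length() && fields.size() < fieldCount; end++) {
            // A field ends at a separator or at the end of the row.
            if (end == row.length() || row.charAt(end) == separator) {
                fields.add(row.substring(begin, end));
                begin = end + 1;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // col_name, data_type, comment: the comment itself contains a tab.
        String row = "id1\tstring\tfull_\tname1";
        System.out.println(splitRow(row, 3, '\t')); // the comment field is cut at "full_"
    }
}
```

With three expected fields, the tab inside the comment is treated as a fourth separator, so the comment field ends at `full_` and `name1` never reaches the console.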





[jira] [Commented] (HIVE-18265) desc formatted/extended or show create table can not fully display the result when field or table comment contains tab character

2017-12-14 Thread Hui Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292018#comment-16292018
 ] 

Hui Huang commented on HIVE-18265:
--

The failed tests are unrelated to this change.

> desc formatted/extended or show create table can not fully display the result 
> when field or table comment contains tab character
> 
>
> Key: HIVE-18265
> URL: https://issues.apache.org/jira/browse/HIVE-18265
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 3.0.0
> Reporter: Hui Huang
> Assignee: Hui Huang
> Fix For: 3.0.0
>
> Attachments: HIVE-18265.1.patch, HIVE-18265.patch
>
>


[jira] [Updated] (HIVE-18265) desc formatted/extended or show create table can not fully display the result when field or table comment contains tab character

2017-12-14 Thread Hui Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-18265:
-
Status: Patch Available  (was: Open)

Added some test cases.

> desc formatted/extended or show create table can not fully display the result 
> when field or table comment contains tab character
> 
>
> Key: HIVE-18265
> URL: https://issues.apache.org/jira/browse/HIVE-18265
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 3.0.0
> Reporter: Hui Huang
> Assignee: Hui Huang
> Fix For: 3.0.0
>
> Attachments: HIVE-18265.1.patch, HIVE-18265.patch
>
>


[jira] [Updated] (HIVE-18265) desc formatted/extended or show create table can not fully display the result when field or table comment contains tab character

2017-12-14 Thread Hui Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-18265:
-
Attachment: HIVE-18265.1.patch

> desc formatted/extended or show create table can not fully display the result 
> when field or table comment contains tab character
> 
>
> Key: HIVE-18265
> URL: https://issues.apache.org/jira/browse/HIVE-18265
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 3.0.0
> Reporter: Hui Huang
> Assignee: Hui Huang
> Fix For: 3.0.0
>
> Attachments: HIVE-18265.1.patch, HIVE-18265.patch
>
>


[jira] [Updated] (HIVE-18265) desc formatted/extended or show create table can not fully display the result when field or table comment contains tab character

2017-12-14 Thread Hui Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-18265:
-
Status: Open  (was: Patch Available)

> desc formatted/extended or show create table can not fully display the result 
> when field or table comment contains tab character
> 
>
> Key: HIVE-18265
> URL: https://issues.apache.org/jira/browse/HIVE-18265
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 3.0.0
> Reporter: Hui Huang
> Assignee: Hui Huang
> Fix For: 3.0.0
>
> Attachments: HIVE-18265.1.patch, HIVE-18265.patch
>
>


[jira] [Commented] (HIVE-18265) desc formatted/extended or show create table can not fully display the result when field or table comment contains tab character

2017-12-13 Thread Hui Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290231#comment-16290231
 ] 

Hui Huang commented on HIVE-18265:
--

Ok, I'll add the test cases today.

> desc formatted/extended or show create table can not fully display the result 
> when field or table comment contains tab character
> 
>
> Key: HIVE-18265
> URL: https://issues.apache.org/jira/browse/HIVE-18265
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 3.0.0
> Reporter: Hui Huang
> Assignee: Hui Huang
> Fix For: 3.0.0
>
> Attachments: HIVE-18265.patch
>
>


[jira] [Updated] (HIVE-18265) desc formatted/extended or show create table can not fully display the result when field or table comment contains tab character

2017-12-12 Thread Hui Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-18265:
-
Status: Patch Available  (was: Open)

Hi all,
I've tried the following approaches:
1. Modify HiveLexer.g and HiveParser.g (this did not work)
2. Replace the tab character with a space character
3. Check the comment during semantic analysis and throw a semantic exception

In the end, I went with the third one.
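
For illustration, options 2 and 3 can be sketched in plain Java. This is not the attached patch: `CommentChecks`, `replaceTabs`, and `rejectTabs` are hypothetical names, and `IllegalArgumentException` is only a stand-in for the semantic exception that Hive's analyzer would raise.

```java
final class CommentChecks {
    private CommentChecks() {}

    /** Option 2: silently replace every tab in the comment with a space. */
    static String replaceTabs(String comment) {
        return comment == null ? null : comment.replace('\t', ' ');
    }

    /**
     * Option 3: reject a comment that contains a tab character.
     * IllegalArgumentException stands in for Hive's semantic exception.
     */
    static String rejectTabs(String columnName, String comment) {
        if (comment != null && comment.indexOf('\t') >= 0) {
            throw new IllegalArgumentException(
                "Comment of column '" + columnName + "' must not contain a tab character");
        }
        return comment;
    }

    public static void main(String[] args) {
        System.out.println(replaceTabs("full_\tname1"));
        try {
            rejectTabs("id1", "full_\tname1");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Replacing the tab changes what the user stored without telling them, while rejecting it at analysis time surfaces the problem when the table is created; that trade-off is presumably why the check was preferred.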

> desc formatted/extended or show create table can not fully display the result 
> when field or table comment contains tab character
> 
>
> Key: HIVE-18265
> URL: https://issues.apache.org/jira/browse/HIVE-18265
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 3.0.0
> Reporter: Hui Huang
> Assignee: Hui Huang
> Fix For: 3.0.0
>
> Attachments: HIVE-18265.patch
>
>


[jira] [Updated] (HIVE-18265) desc formatted/extended or show create table can not fully display the result when field or table comment contains tab character

2017-12-12 Thread Hui Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-18265:
-
Attachment: HIVE-18265.patch

> desc formatted/extended or show create table can not fully display the result 
> when field or table comment contains tab character
> 
>
> Key: HIVE-18265
> URL: https://issues.apache.org/jira/browse/HIVE-18265
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 3.0.0
> Reporter: Hui Huang
> Assignee: Hui Huang
> Fix For: 3.0.0
>
> Attachments: HIVE-18265.patch
>
>


[jira] [Assigned] (HIVE-18265) desc formatted/extended or show create table can not fully display the result when field or table comment contains tab character

2017-12-12 Thread Hui Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang reassigned HIVE-18265:



> desc formatted/extended or show create table can not fully display the result 
> when field or table comment contains tab character
> 
>
> Key: HIVE-18265
> URL: https://issues.apache.org/jira/browse/HIVE-18265
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 3.0.0
> Reporter: Hui Huang
> Assignee: Hui Huang
> Fix For: 3.0.0
>
>


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-17 Thread Hui Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063320#comment-15063320
 ] 

Hui Zheng commented on HIVE-11531:
--

Hi [~sershe] and [~prasanth_j]
union9 passes on my local machine. Could you give more details on how to 
reproduce the failure?
{code}
mvn test -Dtest=TestCliDriver -Dqfile=union9.q,offset_limit.q
..
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-it-qfile-spark ---
[INFO] Compiling 3 source files to /Users/huzheng/git/hive/itests/qtest-spark/target/test-classes
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-it-qfile-spark ---
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Hive Integration - Parent . SUCCESS [1.191s]
[INFO] Hive Integration - Custom Serde ... SUCCESS [1.955s]
[INFO] Hive Integration - HCatalog Unit Tests  SUCCESS [2.974s]
[INFO] Hive Integration - Testing Utilities .. SUCCESS [2.312s]
[INFO] Hive Integration - Unit Tests . SUCCESS [4.672s]
[INFO] Hive Integration - Test Serde . SUCCESS [0.326s]
[INFO] Hive Integration - QFile Tests  SUCCESS [1:35.067s]
[INFO] Hive Integration - QFile Accumulo Tests ... SUCCESS [2.523s]
[INFO] JMH benchmark: Hive ... SUCCESS [0.399s]
[INFO] Hive Integration - Unit Tests - Hadoop 2 .. SUCCESS [1.358s]
[INFO] Hive Integration - Unit Tests with miniKdc  SUCCESS [1.407s]
[INFO] Hive Integration - QFile Spark Tests .. SUCCESS [3.650s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 1:58.710s
[INFO] Finished at: Fri Dec 18 11:18:28 JST 2015
[INFO] Final Memory: 164M/875M
[INFO] 
{code}

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Fix For: 2.1.0
>
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-07 Thread Hui Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Zheng updated HIVE-11531:
-
Attachment: HIVE-11531.07.patch

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch





[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-07 Thread Hui Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046273#comment-15046273
 ] 

Hui Zheng commented on HIVE-11531:
--

HIVE-11531.07.patch passes the limit- and offset-related tests on my local 
machine.

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch





[jira] [Updated] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-04 Thread Hui Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Zheng updated HIVE-11531:
-
Attachment: HIVE-11531.05.patch

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.WIP.1.patch, 
> HIVE-11531.WIP.2.patch, HIVE-11531.patch





[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-01 Thread Hui Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035243#comment-15035243
 ] 

Hui Zheng commented on HIVE-11531:
--

Yes, I have corrected it in HIVE-11531.03.patch.
It seems it will take a long time for Hive QA to test it, though.

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, HIVE-11531.patch





[jira] [Updated] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-11-30 Thread Hui Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Zheng updated HIVE-11531:
-
Attachment: HIVE-11531.03.patch

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, HIVE-11531.patch





[jira] [Updated] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-11-24 Thread Hui Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Zheng updated HIVE-11531:
-
Attachment: HIVE-11531.02.patch

Thanks, [~sershe] and [~jcamachorodriguez].
I have updated the patch.

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.02.patch, HIVE-11531.WIP.1.patch, 
> HIVE-11531.WIP.2.patch, HIVE-11531.patch





[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-11-24 Thread Hui Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026010#comment-15026010
 ] 

Hui Zheng commented on HIVE-11531:
--

Thanks [~jcamachorodriguez]
I have implemented 
{code}
LIMIT n OFFSET skip
{code}
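
For readers comparing the two syntaxes: in MySQL, `LIMIT X,Y` skips X rows and returns up to Y rows, which is equivalent to `LIMIT Y OFFSET X`. The sketch below builds both forms for a zero-based page index. `PageClause`, `limitOffset`, `mysqlStyle`, and the `page`/`pageSize` parameters are hypothetical names used only for illustration; they are not part of the patch, and this thread does not state which forms the final patch accepts.

```java
// Hypothetical helper for illustration only; not taken from the HIVE-11531 patch.
class PageClause {
    // Number of rows to skip for a zero-based page index.
    static long skip(long page, long pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(page, pageSize);
    }

    // "LIMIT n OFFSET skip" form.
    static String limitOffset(long page, long pageSize) {
        return "LIMIT " + pageSize + " OFFSET " + skip(page, pageSize);
    }

    // MySQL-style "LIMIT X,Y" form, where X is the offset and Y the row count.
    static String mysqlStyle(long page, long pageSize) {
        return "LIMIT " + skip(page, pageSize) + "," + pageSize;
    }

    public static void main(String[] args) {
        System.out.println(limitOffset(2, 50)); // prints LIMIT 50 OFFSET 100
        System.out.println(mysqlStyle(2, 50));  // prints LIMIT 100,50
    }
}
```

Note the argument order: the offset comes first in the comma form and last in the `OFFSET` form, which is an easy source of off-by-a-page errors.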


> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.02.patch, HIVE-11531.WIP.1.patch, 
> HIVE-11531.WIP.2.patch, HIVE-11531.patch





[jira] [Updated] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-11-09 Thread Hui Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Zheng updated HIVE-11531:
-
Attachment: HIVE-11531.patch

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch





[jira] [Updated] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-10-21 Thread Hui Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Zheng updated HIVE-11531:
-
Attachment: HIVE-11531.WIP.2.patch

Hi [~sershe],
I have updated the patch.
Next I will look into VectorLimitOperator, GlobalLimitOptimizer, and 
LimitPushdownOptimizer. Any advice you can give would be welcome.

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch





[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-10-15 Thread Hui Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958715#comment-14958715
 ] 

Hui Zheng commented on HIVE-11531:
--

Thanks, [~sershe] and [~jcamachorodriguez], for your guidance.
I will continue working on it.
 

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.WIP.1.patch





[jira] [Updated] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-10-13 Thread Hui Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Zheng updated HIVE-11531:
-
Attachment: HIVE-11531.WIP.1.patch

Hi [~sershe],
To get your opinion, I have uploaded a patch that implements mysql-style LIMIT 
in a simple way but is not yet complete.
Next I will implement it with CBO and investigate how to improve the 
optimizers (GroupByOptimizer and GlobalLimitOptimizer). Finally, I will do more 
testing.


> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.WIP.1.patch





[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-08-26 Thread Hui Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713004#comment-14713004
 ] 

Hui Zheng commented on HIVE-11531:
--

Hi [~sershe]
I'd like to work on this jira.

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng





