[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2022-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=766356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-766356
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 05/May/22 00:19
Start Date: 05/May/22 00:19
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2585: 
HIVE-25448: Invalid partition columns when skew with distinct
URL: https://github.com/apache/hive/pull/2585




Issue Time Tracking
---

Worklog Id: (was: 766356)
Time Spent: 3h 10m  (was: 3h)

> Invalid partition columns when skew with distinct
> -
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled,  we spray by the grouping key and 
> distinct key if distinct is present in the first reduce sink operator.
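
The partitioning rule stated in the description can be sketched roughly as follows. This is a minimal illustration in Python, not Hive's planner code: the function name `first_stage_partition_columns` and its parameters are hypothetical. The non-skew branch and the `rand()` fallback are assumptions, the latter loosely based on the `rand()` partition column that appears in an EXPLAIN output elsewhere in this thread.

```python
from typing import List


def first_stage_partition_columns(
    group_keys: List[str],
    distinct_keys: List[str],
    skew_in_data: bool,
) -> List[str]:
    """Hypothetical helper: pick partition columns for the first reduce sink."""
    if not skew_in_data:
        # Assumption: without skew handling, partition by the grouping key only.
        return list(group_keys)
    if distinct_keys:
        # Skew handling with DISTINCT: spray by grouping key plus distinct key,
        # as the issue description states.
        extra = [k for k in distinct_keys if k not in group_keys]
        return list(group_keys) + extra
    # Assumption: with no DISTINCT aggregation, rows can be sprayed at random.
    return ["rand()"]


print(first_stage_partition_columns(["col1"], ["col2"], skew_in_data=True))
# prints ['col1', 'col2']
```

With `col1` as the grouping key and `col2` as the distinct key, the sketch yields both columns, which matches the `_col0, _col1` partition columns shown in the updated test output in this thread.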



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2022-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=762636&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-762636
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 27/Apr/22 00:23
Start Date: 27/Apr/22 00:23
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on PR #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-1110372946

   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 762636)
Time Spent: 3h  (was: 2h 50m)

> Invalid partition columns when skew with distinct
> -
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled,  we spray by the grouping key and 
> distinct key if distinct is present in the first reduce sink operator.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2022-02-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=732049&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-732049
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 24/Feb/22 02:08
Start Date: 24/Feb/22 02:08
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #2585:
URL: https://github.com/apache/hive/pull/2585#discussion_r813487836



##
File path: ql/src/test/results/clientpositive/llap/partition_distinct_skew.q.out
##
@@ -0,0 +1,261 @@
+PREHOOK: query: create table partition_distinct_skew(col1 string, col2 string)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@partition_distinct_skew
+POSTHOOK: query: create table partition_distinct_skew(col1 string, col2 string)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@partition_distinct_skew
+PREHOOK: query: insert into table partition_distinct_skew values('a', 'b'), 
('a', 'a'), ('a', 'b')
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@partition_distinct_skew
+POSTHOOK: query: insert into table partition_distinct_skew values('a', 'b'), 
('a', 'a'), ('a', 'b')
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@partition_distinct_skew
+POSTHOOK: Lineage: partition_distinct_skew.col1 SCRIPT []
+POSTHOOK: Lineage: partition_distinct_skew.col2 SCRIPT []
+PREHOOK: query: select col1, col2 from partition_distinct_skew
+PREHOOK: type: QUERY
+PREHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+POSTHOOK: query: select col1, col2 from partition_distinct_skew
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+a  b
+a  a
+a  b
+PREHOOK: query: explain select col1, count(distinct col2), count(col2) from 
partition_distinct_skew group by col1
+PREHOOK: type: QUERY
+PREHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+POSTHOOK: query: explain select col1, count(distinct col2), count(col2) from 
partition_distinct_skew group by col1
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+STAGE DEPENDENCIES:
+  Stage-1 is a root stage
+  Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+  Stage: Stage-1
+Tez
+#### A masked pattern was here ####
+  Edges:
+Reducer 2 <- Map 1 (SIMPLE_EDGE)
+Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
+#### A masked pattern was here ####
+  Vertices:
+Map 1 
+Map Operator Tree:
+TableScan
+  alias: partition_distinct_skew
+  Statistics: Num rows: 3 Data size: 510 Basic stats: COMPLETE 
Column stats: COMPLETE
+  Select Operator
+expressions: col1 (type: string), col2 (type: string)
+outputColumnNames: col1, col2
+Statistics: Num rows: 3 Data size: 510 Basic stats: 
COMPLETE Column stats: COMPLETE
+Group By Operator
+  aggregations: count(DISTINCT col2), count(col2)
+  keys: col1 (type: string), col2 (type: string)
+  minReductionHashAggr: 0.4
+  mode: hash
+  outputColumnNames: _col0, _col1, _col2, _col3
+  Statistics: Num rows: 2 Data size: 372 Basic stats: 
COMPLETE Column stats: COMPLETE
+  Reduce Output Operator
+key expressions: _col0 (type: string), _col1 (type: 
string)
+null sort order: zz
+sort order: ++
+Map-reduce partition columns: _col0 (type: string), 
_col1 (type: string)
+Statistics: Num rows: 2 Data size: 372 Basic stats: 
COMPLETE Column stats: COMPLETE
+value expressions: _col3 (type: bigint)
+Execution mode: vectorized, llap
+LLAP IO: all inputs
+Reducer 2 
+Execution mode: llap
+Reduce Operator Tree:
+  Group By Operator
+aggregations: count(DISTINCT KEY._col1:0._col0), 
count(VALUE._col1)
+keys: KEY._col0 (type: string)
+mode: partials
+outputColumnNames: _col0, _col1, _col2
+Statistics: Num rows: 2 Data size: 202 Basic stats: COMPLETE 
Column stats: COMPLETE
+Reduce Output Operator
+  key expressions: _col0 (type: string)
+  null sort order: z
+  sort order: +
+  Map-reduce partition columns: _col0 (type: string)
+  Statistics: 

[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2022-02-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=731641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-731641
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 23/Feb/22 15:28
Start Date: 23/Feb/22 15:28
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 edited a comment on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-1048902780


   I found something interesting: when I run EXPLAIN on `select col1, count(distinct 
col2) from partition_distinct_skew group by col1;` on the master branch, the 
output is as follows:
   ```
 Vertices:
   Map 1
   Map Operator Tree:
   TableScan
 alias: partition_distinct_skew
 Statistics: Num rows: 3 Data size: 510 Basic stats: 
COMPLETE Column stats: COMPLETE
 Select Operator
   expressions: col1 (type: string), col2 (type: string)
   outputColumnNames: col1, col2
   Statistics: Num rows: 3 Data size: 510 Basic stats: 
COMPLETE Column stats: COMPLETE
   Group By Operator
 keys: col1 (type: string), col2 (type: string)
 minReductionHashAggr: 0.4
 mode: hash
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 2 Data size: 340 Basic stats: 
COMPLETE Column stats: COMPLETE
 Reduce Output Operator
   key expressions: _col0 (type: string), _col1 (type: 
string)
   null sort order: zz
   sort order: ++
   Map-reduce partition columns: rand() (type: double)
   Statistics: Num rows: 2 Data size: 340 Basic stats: 
COMPLETE Column stats: COMPLETE
   ```
   The partition column is **rand()** in this case. It seems we have already done 
something to improve the skew case, though I have not been able to find where 
this comes from.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 731641)
Time Spent: 2h 40m  (was: 2.5h)

> Invalid partition columns when skew with distinct
> -
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled,  we spray by the grouping key and 
> distinct key if distinct is present in the first reduce sink operator.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2022-02-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=731640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-731640
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 23/Feb/22 15:28
Start Date: 23/Feb/22 15:28
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-1048902780


   
   I found something interesting: when I run EXPLAIN on `select col1, count(distinct 
col2) from partition_distinct_skew group by col1;` on the master branch, the 
output is as follows:
   ```
 Vertices:
   Map 1
   Map Operator Tree:
   TableScan
 alias: partition_distinct_skew
 Statistics: Num rows: 3 Data size: 510 Basic stats: 
COMPLETE Column stats: COMPLETE
 Select Operator
   expressions: col1 (type: string), col2 (type: string)
   outputColumnNames: col1, col2
   Statistics: Num rows: 3 Data size: 510 Basic stats: 
COMPLETE Column stats: COMPLETE
   Group By Operator
 keys: col1 (type: string), col2 (type: string)
 minReductionHashAggr: 0.4
 mode: hash
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 2 Data size: 340 Basic stats: 
COMPLETE Column stats: COMPLETE
 Reduce Output Operator
   key expressions: _col0 (type: string), _col1 (type: 
string)
   null sort order: zz
   sort order: ++
   Map-reduce partition columns: rand() (type: double)
   Statistics: Num rows: 2 Data size: 340 Basic stats: 
COMPLETE Column stats: COMPLETE
   ```
   The partition column is **rand()** in this case. It seems we have already done 
something to improve the skew case, though I have not been able to find where 
this comes from.




Issue Time Tracking
---

Worklog Id: (was: 731640)
Time Spent: 2.5h  (was: 2h 20m)

> Invalid partition columns when skew with distinct
> -
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled,  we spray by the grouping key and 
> distinct key if distinct is present in the first reduce sink operator.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2022-02-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=731634&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-731634
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 23/Feb/22 15:14
Start Date: 23/Feb/22 15:14
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #2585:
URL: https://github.com/apache/hive/pull/2585


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 731634)
Time Spent: 2h 20m  (was: 2h 10m)

> Invalid partition columns when skew with distinct
> -
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled,  we spray by the grouping key and 
> distinct key if distinct is present in the first reduce sink operator.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2022-02-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=731535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-731535
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 23/Feb/22 13:25
Start Date: 23/Feb/22 13:25
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2585:
URL: https://github.com/apache/hive/pull/2585#discussion_r812739484



##
File path: ql/src/test/results/clientpositive/llap/autoColumnStats_7.q.out
##
@@ -56,7 +56,7 @@ STAGE PLANS:
   key expressions: _col0 (type: string), _col1 (type: 
string)
   null sort order: zz
   sort order: ++
-  Map-reduce partition columns: _col0 (type: string)
+  Map-reduce partition columns: _col0 (type: string), 
_col1 (type: string)

Review comment:
   Do you happen to have a targeted test case that was working incorrectly 
before this patch?
   
   I guess it was returning 3 for the distinct count when the rows arrived in 
this order:
   ```
   a | b
   a | a
   a | b
   ```
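
The scenario guessed at above can be reproduced in a toy model. The sketch below is plain Python, not Hive code, and every name in it (`two_stage_count_distinct`, `by_row_order`, `by_group_and_distinct_key`) is hypothetical. It assumes a two-stage plan in which each first-stage reducer emits a partial distinct count per grouping key and the second stage simply sums the partials. It does not establish that Hive actually returned 3 before the patch; it only shows why the first-stage partition columns matter under that assumption.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Row = Tuple[str, str]  # (col1, col2)


def two_stage_count_distinct(
    rows: List[Row],
    partition: Callable[[int, Row], int],
    num_reducers: int,
) -> Dict[str, int]:
    """Toy model: per-reducer partial distinct counts, then a final sum."""
    # Stage 1: route each row to a reducer and track distinct col2 per group.
    seen: List[Dict[str, Set[str]]] = [defaultdict(set) for _ in range(num_reducers)]
    for index, row in enumerate(rows):
        reducer = partition(index, row) % num_reducers
        seen[reducer][row[0]].add(row[1])
    # Stage 2: add up the partial counts for each grouping key.
    totals: Dict[str, int] = defaultdict(int)
    for reducer_state in seen:
        for group, values in reducer_state.items():
            totals[group] += len(values)
    return dict(totals)


def by_row_order(index: int, row: Row) -> int:
    # Ignores the keys entirely, so equal rows can land on different reducers.
    return index


def by_group_and_distinct_key(index: int, row: Row) -> int:
    # Deterministic hash of (col1, col2): equal pairs always share a reducer.
    return zlib.crc32("\x00".join(row).encode("utf-8"))


rows = [("a", "b"), ("a", "a"), ("a", "b")]
print(two_stage_count_distinct(rows, by_row_order, 3))               # {'a': 3}
print(two_stage_count_distinct(rows, by_group_and_distinct_key, 3))  # {'a': 2}
```

When the two `('a', 'b')` rows are sent to different reducers, each reducer counts `b` once and the summed result is 3 instead of 2. Partitioning by both the grouping key and the distinct key keeps equal pairs on one reducer, so the partial counts can be summed safely.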






Issue Time Tracking
---

Worklog Id: (was: 731535)
Time Spent: 2h 10m  (was: 2h)

> Invalid partition columns when skew with distinct
> -
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled,  we spray by the grouping key and 
> distinct key if distinct is present in the first reduce sink operator.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2022-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=729941&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-729941
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 19/Feb/22 00:17
Start Date: 19/Feb/22 00:17
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2585:
URL: https://github.com/apache/hive/pull/2585


   




Issue Time Tracking
---

Worklog Id: (was: 729941)
Time Spent: 2h  (was: 1h 50m)

> Invalid partition columns when skew with distinct
> -
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled,  we spray by the grouping key and 
> distinct key if distinct is present in the first reduce sink operator.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2022-02-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=724900&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-724900
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 11/Feb/22 02:32
Start Date: 11/Feb/22 02:32
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #2585:
URL: https://github.com/apache/hive/pull/2585


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 724900)
Time Spent: 1h 50m  (was: 1h 40m)

> Invalid partition columns when skew with distinct
> -
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled,  we spray by the grouping key and 
> distinct key if distinct is present in the first reduce sink operator.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2022-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=723311&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-723311
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 09/Feb/22 00:18
Start Date: 09/Feb/22 00:18
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2585:
URL: https://github.com/apache/hive/pull/2585


   




Issue Time Tracking
---

Worklog Id: (was: 723311)
Time Spent: 1h 40m  (was: 1.5h)

> Invalid partition columns when skew with distinct
> -
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled,  we spray by the grouping key and 
> distinct key if distinct is present in the first reduce sink operator.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2022-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=719102&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-719102
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 02/Feb/22 00:13
Start Date: 02/Feb/22 00:13
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-1027414846


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 719102)
Time Spent: 1.5h  (was: 1h 20m)

> Invalid partition columns when skew with distinct
> -
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled,  we spray by the grouping key and 
> distinct key if distinct is present in the first reduce sink operator.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2021-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=689917&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689917
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 03/Dec/21 11:13
Start Date: 03/Dec/21 11:13
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-985433461


   > > @dengzhhu653 do you happen to have a testcase for this?
   > 
   > Not yet. I have tested it in our environment on a skewed table, and it 
shows a good performance gain (MR).
   
   Hi @kgyrtkirk, what do you think about this? There are also some existing tests, such as 
[groupby11.q](https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/test/queries/clientpositive/groupby11.q)
 and 
[groupby8_map_skew.q](https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/test/queries/clientpositive/groupby8_map_skew.q),
 that show the change in partition columns after applying the fix. Thank you!




Issue Time Tracking
---

Worklog Id: (was: 689917)
Time Spent: 1h 20m  (was: 1h 10m)

> Invalid partition columns when skew with distinct
> -
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled,  we spray by the grouping key and 
> distinct key if distinct is present in the first reduce sink operator.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=684646&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684646
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 22/Nov/21 12:00
Start Date: 22/Nov/21 12:00
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-975448819


   > @dengzhhu653 do you happen to have a testcase for this?
   
   Not yet. I have tested it in our environment on a skewed table, and it 
shows a good performance gain (MR).




Issue Time Tracking
---

Worklog Id: (was: 684646)
Time Spent: 1h 10m  (was: 1h)

> Invalid partition columns when skew with distinct
> -
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled,  we spray by the grouping key and 
> distinct key if distinct is present in the first reduce sink operator.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=684642&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684642
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 22/Nov/21 11:45
Start Date: 22/Nov/21 11:45
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-975438145


   @dengzhhu653 do you happen to have a testcase for this?




Issue Time Tracking
---

Worklog Id: (was: 684642)
Time Spent: 1h  (was: 50m)

> Invalid partition columns when skew with distinct
> -
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled,  we spray by the grouping key and 
> distinct key if distinct is present in the first reduce sink operator.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2021-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=662753&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-662753
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 08/Oct/21 14:29
Start Date: 08/Oct/21 14:29
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-938688401


   Hey @pgaref, would you mind taking a look if you have a moment?




Issue Time Tracking
---

Worklog Id: (was: 662753)
Time Spent: 50m  (was: 40m)

> Invalid partition columns when skew with distinct
> -
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled,  we spray by the grouping key and 
> distinct key if distinct is present in the first reduce sink operator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2021-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=658091&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658091
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 04:05
Start Date: 30/Sep/21 04:05
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 removed a comment on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-918013599


   @kasakrisz could you please take a look at the changes?
   Thanks,
   Zhihua Deng




Issue Time Tracking
---

Worklog Id: (was: 658091)
Time Spent: 40m  (was: 0.5h)

> Invalid partition columns when skew with distinct
> -
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled,  we spray by the grouping key and 
> distinct key if distinct is present in the first reduce sink operator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2021-09-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=649894&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-649894
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 13/Sep/21 09:36
Start Date: 13/Sep/21 09:36
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-918013599


   @kasakrisz could you please take a look at the changes? 
   Thanks,
   Zhihua Deng


Issue Time Tracking
---

Worklog Id: (was: 649894)
Time Spent: 0.5h  (was: 20m)

> Invalid partition columns when skew with distinct
> -
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled, we spray by the grouping key and 
> distinct key if distinct is present in the first reduce sink operator.





[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2021-08-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=638492&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638492
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 17/Aug/21 03:27
Start Date: 17/Aug/21 03:27
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-899965525


   Hi @kgyrtkirk @zabetak, could you please take a look if you have a few seconds? 
   Thanks!


Issue Time Tracking
---

Worklog Id: (was: 638492)
Time Spent: 20m  (was: 10m)

> Invalid partition columns when skew with distinct
> -
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled, we spray by the grouping key and 
> distinct key if distinct is present in the first reduce sink operator.





[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2021-08-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=638023&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638023
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 16/Aug/21 02:35
Start Date: 16/Aug/21 02:35
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #2585:
URL: https://github.com/apache/hive/pull/2585


   
   
   
   


Issue Time Tracking
---

Worklog Id: (was: 638023)
Remaining Estimate: 0h
Time Spent: 10m

> Invalid partition columns when skew with distinct
> -
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled, we spray by the grouping key and 
> distinct key if distinct is present in the first reduce sink operator.


