[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct
[ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=766356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-766356 ]

ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 05/May/22 00:19
Start Date: 05/May/22 00:19
Worklog Time Spent: 10m
Work Description: github-actions[bot] closed pull request #2585: HIVE-25448: Invalid partition columns when skew with distinct
URL: https://github.com/apache/hive/pull/2585

Issue Time Tracking
-------------------

Worklog Id: (was: 766356)
Time Spent: 3h 10m (was: 3h)

> Invalid partition columns when skew with distinct
> -------------------------------------------------
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
> Issue Type: Bug
> Components: Logical Optimizer
> Reporter: Zhihua Deng
> Assignee: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled, we spray by the grouping key and
> distinct key if distinct is present in the first reduce sink operator.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
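The issue description says that, with hive.groupby.skewindata enabled and a DISTINCT present, the first reduce sink should spray rows by the grouping key and the distinct key together. The minimal Java sketch below shows such a composite-key partitioner. `CompositeKeyPartitioner` and `partitionFor` are hypothetical names, and the hash (`Objects.hash` followed by `Math.floorMod`) is an assumption for illustration, not Hive's actual reduce sink hashing.

```java
import java.util.Objects;

// Hypothetical sketch, not Hive's ReduceSinkOperator: choose a reducer by
// hashing the grouping key together with the DISTINCT key.
final class CompositeKeyPartitioner {

    private CompositeKeyPartitioner() {}

    // Returns a reducer index in [0, numReducers). Rows that share the same
    // (groupKey, distinctKey) pair always get the same index.
    static int partitionFor(String groupKey, String distinctKey, int numReducers) {
        if (numReducers <= 0) {
            throw new IllegalArgumentException("numReducers must be positive");
        }
        // Objects.hash combines both keys (and tolerates nulls);
        // floorMod keeps the index non-negative even for negative hashes.
        return Math.floorMod(Objects.hash(groupKey, distinctKey), numReducers);
    }

    public static void main(String[] args) {
        // The three rows inserted by the partition_distinct_skew test.
        String[][] rows = {{"a", "b"}, {"a", "a"}, {"a", "b"}};
        for (String[] row : rows) {
            System.out.println(row[0] + " | " + row[1] + " -> reducer "
                    + partitionFor(row[0], row[1], 4));
        }
    }
}
```

Because the index depends on both keys, the two ('a', 'b') rows always meet on one reducer, while rows of the same group with other distinct values are free to land elsewhere.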
[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct
[ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=762636&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-762636 ]

ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 27/Apr/22 00:23
Start Date: 27/Apr/22 00:23
Worklog Time Spent: 10m
Work Description: github-actions[bot] commented on PR #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-1110372946

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
-------------------

Worklog Id: (was: 762636)
Time Spent: 3h (was: 2h 50m)
[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct
[ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=732049&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-732049 ]

ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 24/Feb/22 02:08
Start Date: 24/Feb/22 02:08
Worklog Time Spent: 10m
Work Description: dengzhhu653 commented on a change in pull request #2585:
URL: https://github.com/apache/hive/pull/2585#discussion_r813487836

## File path: ql/src/test/results/clientpositive/llap/partition_distinct_skew.q.out ##
@@ -0,0 +1,261 @@
+PREHOOK: query: create table partition_distinct_skew(col1 string, col2 string)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@partition_distinct_skew
+POSTHOOK: query: create table partition_distinct_skew(col1 string, col2 string)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@partition_distinct_skew
+PREHOOK: query: insert into table partition_distinct_skew values('a', 'b'), ('a', 'a'), ('a', 'b')
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@partition_distinct_skew
+POSTHOOK: query: insert into table partition_distinct_skew values('a', 'b'), ('a', 'a'), ('a', 'b')
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@partition_distinct_skew
+POSTHOOK: Lineage: partition_distinct_skew.col1 SCRIPT []
+POSTHOOK: Lineage: partition_distinct_skew.col2 SCRIPT []
+PREHOOK: query: select col1, col2 from partition_distinct_skew
+PREHOOK: type: QUERY
+PREHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+POSTHOOK: query: select col1, col2 from partition_distinct_skew
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+a b
+a a
+a b
+PREHOOK: query: explain select col1, count(distinct col2), count(col2) from partition_distinct_skew group by col1
+PREHOOK: type: QUERY
+PREHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+POSTHOOK: query: explain select col1, count(distinct col2), count(col2) from partition_distinct_skew group by col1
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+STAGE DEPENDENCIES:
+  Stage-1 is a root stage
+  Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+  Stage: Stage-1
+    Tez
+#### A masked pattern was here ####
+      Edges:
+        Reducer 2 <- Map 1 (SIMPLE_EDGE)
+        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
+#### A masked pattern was here ####
+      Vertices:
+        Map 1
+            Map Operator Tree:
+                TableScan
+                  alias: partition_distinct_skew
+                  Statistics: Num rows: 3 Data size: 510 Basic stats: COMPLETE Column stats: COMPLETE
+                  Select Operator
+                    expressions: col1 (type: string), col2 (type: string)
+                    outputColumnNames: col1, col2
+                    Statistics: Num rows: 3 Data size: 510 Basic stats: COMPLETE Column stats: COMPLETE
+                    Group By Operator
+                      aggregations: count(DISTINCT col2), count(col2)
+                      keys: col1 (type: string), col2 (type: string)
+                      minReductionHashAggr: 0.4
+                      mode: hash
+                      outputColumnNames: _col0, _col1, _col2, _col3
+                      Statistics: Num rows: 2 Data size: 372 Basic stats: COMPLETE Column stats: COMPLETE
+                      Reduce Output Operator
+                        key expressions: _col0 (type: string), _col1 (type: string)
+                        null sort order: zz
+                        sort order: ++
+                        Map-reduce partition columns: _col0 (type: string), _col1 (type: string)
+                        Statistics: Num rows: 2 Data size: 372 Basic stats: COMPLETE Column stats: COMPLETE
+                        value expressions: _col3 (type: bigint)
+            Execution mode: vectorized, llap
+            LLAP IO: all inputs
+        Reducer 2
+            Execution mode: llap
+            Reduce Operator Tree:
+              Group By Operator
+                aggregations: count(DISTINCT KEY._col1:0._col0), count(VALUE._col1)
+                keys: KEY._col0 (type: string)
+                mode: partials
+                outputColumnNames: _col0, _col1, _col2
+                Statistics: Num rows: 2 Data size: 202 Basic stats: COMPLETE Column stats: COMPLETE
+                Reduce Output Operator
+                  key expressions: _col0 (type: string)
+                  null sort order: z
+                  sort order: +
+                  Map-reduce partition columns: _col0 (type: string)
+                  Statistics:
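The plan above splits the query into a partial stage (Reducer 2, `mode: partials`) and a later stage (Reducer 3, whose operators are cut off above). The Java sketch below simulates those two shuffles on the three test rows, assuming col2 has no NULLs and leaving out the map-side hash aggregation; the class and method names are hypothetical. It illustrates why adding up partial DISTINCT counts in the last stage is only correct when the first shuffle is keyed on (col1, col2).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical simulation of the two reduce stages for
// "select col1, count(distinct col2), count(col2) ... group by col1".
// Assumes col2 is never NULL; the map-side hash aggregation is omitted.
final class TwoStageDistinctAggregation {

    private TwoStageDistinctAggregation() {}

    // Returns col1 -> {count(DISTINCT col2), count(col2)}.
    static Map<String, long[]> aggregate(List<String[]> rows, int numReducers) {
        // Stage 1 ("Reducer 2", mode: partials): rows are partitioned by
        // (col1, col2), and each reducer keeps its own per-group state.
        List<Map<String, Set<String>>> seen = new ArrayList<>();
        List<Map<String, Long>> counts = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            seen.add(new HashMap<>());
            counts.add(new HashMap<>());
        }
        for (String[] row : rows) {
            int r = Math.floorMod(Objects.hash(row[0], row[1]), numReducers);
            seen.get(r).computeIfAbsent(row[0], k -> new HashSet<>()).add(row[1]);
            counts.get(r).merge(row[0], 1L, Long::sum);
        }

        // Stage 2 ("Reducer 3"): partials are shuffled by col1 only and summed.
        // Adding partial distinct counts is safe only because each
        // (col1, col2) pair was seen by exactly one stage-1 reducer.
        Map<String, long[]> result = new TreeMap<>();
        for (int r = 0; r < numReducers; r++) {
            for (Map.Entry<String, Set<String>> e : seen.get(r).entrySet()) {
                long[] acc = result.computeIfAbsent(e.getKey(), k -> new long[2]);
                acc[0] += e.getValue().size();
                acc[1] += counts.get(r).get(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"a", "b"}, new String[] {"a", "a"}, new String[] {"a", "b"});
        for (Map.Entry<String, long[]> e : aggregate(rows, 4).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue()[0] + "\t" + e.getValue()[1]);
        }
    }
}
```

Running `main` prints `a`, `2` and `3` separated by tabs, matching count(DISTINCT col2) = 2 and count(col2) = 3 for the rows ('a', 'b'), ('a', 'a'), ('a', 'b'), for any positive `numReducers`.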
[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct
[ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=731641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-731641 ]

ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 23/Feb/22 15:28
Start Date: 23/Feb/22 15:28
Worklog Time Spent: 10m
Work Description: dengzhhu653 edited a comment on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-1048902780

I found something interesting: when I run explain on `select col1, count(distinct col2) from partition_distinct_skew group by col1;` on the master branch, the output is the following:
```
Vertices:
  Map 1
      Map Operator Tree:
          TableScan
            alias: partition_distinct_skew
            Statistics: Num rows: 3 Data size: 510 Basic stats: COMPLETE Column stats: COMPLETE
            Select Operator
              expressions: col1 (type: string), col2 (type: string)
              outputColumnNames: col1, col2
              Statistics: Num rows: 3 Data size: 510 Basic stats: COMPLETE Column stats: COMPLETE
              Group By Operator
                keys: col1 (type: string), col2 (type: string)
                minReductionHashAggr: 0.4
                mode: hash
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 2 Data size: 340 Basic stats: COMPLETE Column stats: COMPLETE
                Reduce Output Operator
                  key expressions: _col0 (type: string), _col1 (type: string)
                  null sort order: zz
                  sort order: ++
                  Map-reduce partition columns: rand() (type: double)
                  Statistics: Num rows: 2 Data size: 340 Basic stats: COMPLETE Column stats: COMPLETE
```
The partition column is **rand()** for this case. It seems we have done something to improve the skew case, though I'm not able to find where that happens.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 731641)
Time Spent: 2h 40m (was: 2.5h)
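In this plan the map-side Group By Operator has keys col1, col2 and no aggregations, which suggests the single count(DISTINCT col2) was rewritten into a plain group-by on (col1, col2) before the shuffle. Assuming that, and assuming later stages (not shown in the comment) re-group on (col1, col2) before counting, a rand() spray stays correct because duplicates that land on different reducers are merged afterwards. The hypothetical Java sketch below models that sequence, using `java.util.Random` as a stand-in for rand().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: a random spray (standing in for rand() partitioning)
// followed by a re-grouping on (col1, col2), then a count per col1.
final class RandomSprayGroupBy {

    private RandomSprayGroupBy() {}

    static Map<String, Long> countDistinct(List<String[]> rows, int numReducers, long seed) {
        Random random = new Random(seed);

        // Stage 1: each row goes to a random reducer, which de-duplicates only
        // the (col1, col2) pairs it happens to receive.
        List<Set<List<String>>> partials = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partials.add(new HashSet<>());
        }
        for (String[] row : rows) {
            partials.get(random.nextInt(numReducers)).add(Arrays.asList(row[0], row[1]));
        }

        // Stage 2: shuffling the partial pairs by (col1, col2) brings duplicates
        // from different reducers together; one set models that merge.
        Set<List<String>> merged = new HashSet<>();
        for (Set<List<String>> partial : partials) {
            merged.addAll(partial);
        }

        // Stage 3: count the surviving pairs per col1.
        Map<String, Long> result = new TreeMap<>();
        for (List<String> pair : merged) {
            result.merge(pair.get(0), 1L, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"a", "b"}, new String[] {"a", "a"}, new String[] {"a", "b"});
        for (long seed = 0; seed < 5; seed++) {
            System.out.println("seed " + seed + ": " + countDistinct(rows, 3, seed));
        }
    }
}
```

Every seed yields `{a=2}`: under these assumptions the random first shuffle only changes how much de-duplication happens early, not the final answer.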
[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct
[ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=731640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-731640 ]

ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 23/Feb/22 15:28
Start Date: 23/Feb/22 15:28
Worklog Time Spent: 10m
Work Description: dengzhhu653 commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-1048902780

I found something interesting: when I run explain on `select col1, count(distinct col2) from partition_distinct_skew group by col1;` on the master branch, the output is the following:
```
Vertices:
  Map 1
      Map Operator Tree:
          TableScan
            alias: partition_distinct_skew
            Statistics: Num rows: 3 Data size: 510 Basic stats: COMPLETE Column stats: COMPLETE
            Select Operator
              expressions: col1 (type: string), col2 (type: string)
              outputColumnNames: col1, col2
              Statistics: Num rows: 3 Data size: 510 Basic stats: COMPLETE Column stats: COMPLETE
              Group By Operator
                keys: col1 (type: string), col2 (type: string)
                minReductionHashAggr: 0.4
                mode: hash
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 2 Data size: 340 Basic stats: COMPLETE Column stats: COMPLETE
                Reduce Output Operator
                  key expressions: _col0 (type: string), _col1 (type: string)
                  null sort order: zz
                  sort order: ++
                  Map-reduce partition columns: rand() (type: double)
                  Statistics: Num rows: 2 Data size: 340 Basic stats: COMPLETE Column stats: COMPLETE
```
The partition column is **rand()** for this case. It seems we have done something to improve the skew case, though I'm not able to find where that happens.

Issue Time Tracking
-------------------

Worklog Id: (was: 731640)
Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct
[ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=731634&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-731634 ]

ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 23/Feb/22 15:14
Start Date: 23/Feb/22 15:14
Worklog Time Spent: 10m
Work Description: dengzhhu653 opened a new pull request #2585:
URL: https://github.com/apache/hive/pull/2585

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Issue Time Tracking
-------------------

Worklog Id: (was: 731634)
Time Spent: 2h 20m (was: 2h 10m)
[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct
[ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=731535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-731535 ]

ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 23/Feb/22 13:25
Start Date: 23/Feb/22 13:25
Worklog Time Spent: 10m
Work Description: kgyrtkirk commented on a change in pull request #2585:
URL: https://github.com/apache/hive/pull/2585#discussion_r812739484

## File path: ql/src/test/results/clientpositive/llap/autoColumnStats_7.q.out ##
@@ -56,7 +56,7 @@ STAGE PLANS:
 key expressions: _col0 (type: string), _col1 (type: string)
 null sort order: zz
 sort order: ++
- Map-reduce partition columns: _col0 (type: string)
+ Map-reduce partition columns: _col0 (type: string), _col1 (type: string)

Review comment:
Do you happen to have a directed testcase which was working incorrectly before this patch?
I guess it was returning 3 for distinct in case the rows were in the order of:
```
a | b
a | a
a | b
```

Issue Time Tracking
-------------------

Worklog Id: (was: 731535)
Time Spent: 2h 10m (was: 2h)
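The reviewer's guess can be checked with a small simulation. If the last stage adds up per-reducer partial count(DISTINCT col2) values, any spray that ignores col2 can put the two ('a', 'b') rows on different reducers and count `b` twice. The hypothetical Java demo below uses round-robin assignment as a deterministic stand-in for such a spray (Hive itself is not claimed to route rows round-robin) and compares it with routing by (col1, col2), matching the partition columns `_col0, _col1` in the diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical demo: sum the per-reducer partial count(DISTINCT col2) values
// under two partitioning schemes. Round-robin stands in for any spray that
// ignores col2; it is an illustration, not Hive's routing.
final class DistinctPartitioningDemo {

    private DistinctPartitioningDemo() {}

    // Row i goes to reducer i % numReducers, regardless of its keys.
    static long roundRobinTotal(List<String[]> rows, int numReducers) {
        List<Set<List<String>>> reducers = newReducers(numReducers);
        for (int i = 0; i < rows.size(); i++) {
            String[] row = rows.get(i);
            reducers.get(i % numReducers).add(Arrays.asList(row[0], row[1]));
        }
        return sumPartials(reducers);
    }

    // Rows are routed by a hash of (col1, col2).
    static long compositeKeyTotal(List<String[]> rows, int numReducers) {
        List<Set<List<String>>> reducers = newReducers(numReducers);
        for (String[] row : rows) {
            int r = Math.floorMod(Objects.hash(row[0], row[1]), numReducers);
            reducers.get(r).add(Arrays.asList(row[0], row[1]));
        }
        return sumPartials(reducers);
    }

    private static List<Set<List<String>>> newReducers(int numReducers) {
        List<Set<List<String>>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducers.add(new HashSet<>());
        }
        return reducers;
    }

    // Each reducer's partial is the number of distinct (col1, col2) pairs it
    // saw; the last stage simply adds the partials. With the single group
    // "a" used here, the total is that group's count(DISTINCT col2).
    private static long sumPartials(List<Set<List<String>>> reducers) {
        long total = 0;
        for (Set<List<String>> seen : reducers) {
            total += seen.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Rows in the order given in the review comment.
        List<String[]> rows = Arrays.asList(
                new String[] {"a", "b"}, new String[] {"a", "a"}, new String[] {"a", "b"});
        System.out.println("round-robin, 3 reducers:   " + roundRobinTotal(rows, 3));
        System.out.println("(col1, col2), 3 reducers:  " + compositeKeyTotal(rows, 3));
    }
}
```

With the rows in the order a|b, a|a, a|b and three reducers, round-robin gives each row its own reducer and the total is 3; routing by (col1, col2) gives 2 for any reducer count.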
[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct
[ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=729941&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-729941 ]

ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 19/Feb/22 00:17
Start Date: 19/Feb/22 00:17
Worklog Time Spent: 10m
Work Description: github-actions[bot] closed pull request #2585:
URL: https://github.com/apache/hive/pull/2585

Issue Time Tracking
-------------------

Worklog Id: (was: 729941)
Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct
[ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=724900&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-724900 ]

ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 11/Feb/22 02:32
Start Date: 11/Feb/22 02:32
Worklog Time Spent: 10m
Work Description: dengzhhu653 opened a new pull request #2585:
URL: https://github.com/apache/hive/pull/2585

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Issue Time Tracking
-------------------

Worklog Id: (was: 724900)
Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct
[ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=723311&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-723311 ]

ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 09/Feb/22 00:18
Start Date: 09/Feb/22 00:18
Worklog Time Spent: 10m
Work Description: github-actions[bot] closed pull request #2585:
URL: https://github.com/apache/hive/pull/2585

Issue Time Tracking
-------------------

Worklog Id: (was: 723311)
Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct
[ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=719102&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-719102 ]

ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 02/Feb/22 00:13
Start Date: 02/Feb/22 00:13
Worklog Time Spent: 10m
Work Description: github-actions[bot] commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-1027414846

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
-------------------

Worklog Id: (was: 719102)
Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct
[ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=689917&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689917 ]

ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 03/Dec/21 11:13
Start Date: 03/Dec/21 11:13
Worklog Time Spent: 10m
Work Description: dengzhhu653 commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-985433461

> > @dengzhhu653 do you happen to have a testcase for this?
>
> Not yet. I have tested it in our environment on the skewed table, and it shows a pretty good performance gain (MR).

Hi @kgyrtkirk, what do you think about this? There are also some tests like [groupby11.q](https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/test/queries/clientpositive/groupby11.q) and [groupby8_map_skew.q](https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/test/queries/clientpositive/groupby8_map_skew.q) showing the changes in partition columns after applying the fix. Thank you!

Issue Time Tracking
-------------------

Worklog Id: (was: 689917)
Time Spent: 1h 20m (was: 1h 10m)
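The reported speed-up on a skewed table is consistent with how the partition columns change: with `_col0` alone, every row of a hot grouping key goes to one reducer, while adding the distinct key `_col1` spreads that key's rows across reducers. The hypothetical Java sketch below measures the busiest reducer's load on synthetic data (1,000 rows, one hot col1 value, 100 distinct col2 values). It only illustrates the mechanism; it is not a benchmark of Hive, and the hash is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical load comparison on synthetic skewed data: every row shares
// col1 = "a" (the hot key) while col2 takes many distinct values.
final class SkewLoadDemo {

    private SkewLoadDemo() {}

    // Rows per reducer when partitioning on col1 alone.
    static int[] loadsByGroupKey(List<String[]> rows, int numReducers) {
        int[] loads = new int[numReducers];
        for (String[] row : rows) {
            loads[Math.floorMod(Objects.hashCode(row[0]), numReducers)]++;
        }
        return loads;
    }

    // Rows per reducer when partitioning on (col1, col2).
    static int[] loadsByCompositeKey(List<String[]> rows, int numReducers) {
        int[] loads = new int[numReducers];
        for (String[] row : rows) {
            loads[Math.floorMod(Objects.hash(row[0], row[1]), numReducers)]++;
        }
        return loads;
    }

    // Load of the busiest reducer.
    static int max(int[] loads) {
        int busiest = 0;
        for (int load : loads) {
            busiest = Math.max(busiest, load);
        }
        return busiest;
    }

    // n rows with col1 = "a" and col2 cycling through distinctValues values.
    static List<String[]> skewedRows(int n, int distinctValues) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rows.add(new String[] {"a", "v" + (i % distinctValues)});
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> rows = skewedRows(1000, 100);
        System.out.println("busiest reducer, col1 only:    " + max(loadsByGroupKey(rows, 8)));
        System.out.println("busiest reducer, (col1, col2): " + max(loadsByCompositeKey(rows, 8)));
    }
}
```

Partitioning on col1 alone puts all 1,000 rows on a single reducer; the composite key spreads them across several reducers, which is the kind of relief the skew setting is meant to provide.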
[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct
[ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=684646&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684646 ]

ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 22/Nov/21 12:00
Start Date: 22/Nov/21 12:00
Worklog Time Spent: 10m
Work Description: dengzhhu653 commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-975448819

> @dengzhhu653 do you happen to have a testcase for this?

Not yet. I have tested it in our environment on the skewed table, and it shows a pretty good performance gain (MR).

Issue Time Tracking
-------------------

Worklog Id: (was: 684646)
Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct
[ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=684642&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684642 ]

ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 22/Nov/21 11:45
Start Date: 22/Nov/21 11:45
Worklog Time Spent: 10m
Work Description: kgyrtkirk commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-975438145

@dengzhhu653 do you happen to have a testcase for this?

Issue Time Tracking
-------------------

Worklog Id: (was: 684642)
Time Spent: 1h (was: 50m)
[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct
[ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=662753&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-662753 ]

ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 08/Oct/21 14:29
Start Date: 08/Oct/21 14:29
Worklog Time Spent: 10m
Work Description: dengzhhu653 commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-938688401

Hey @pgaref, would you mind taking a look if you have a moment?

Issue Time Tracking
-------------------

Worklog Id: (was: 662753)
Time Spent: 50m (was: 40m)
[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct
[ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=658091&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658091 ]

ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 30/Sep/21 04:05
Start Date: 30/Sep/21 04:05
Worklog Time Spent: 10m
Work Description: dengzhhu653 removed a comment on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-918013599

@kasakrisz could you please take a look at the changes? Thanks, Zhihua Deng

Issue Time Tracking
-------------------

Worklog Id: (was: 658091)
Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct
[ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=649894&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-649894 ]

ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 13/Sep/21 09:36
Start Date: 13/Sep/21 09:36
Worklog Time Spent: 10m
Work Description: dengzhhu653 commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-918013599

@kasakrisz could you please take a look at the changes? Thanks, Zhihua Deng

Issue Time Tracking
-------------------

Worklog Id: (was: 649894)
Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct
[ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=638492&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638492 ]

ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 17/Aug/21 03:27
Start Date: 17/Aug/21 03:27
Worklog Time Spent: 10m
Work Description: dengzhhu653 commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-899965525

Hi @kgyrtkirk @zabetak, could you please take a look if you have a moment? Thanks!

Issue Time Tracking
-------------------

Worklog Id: (was: 638492)
Time Spent: 20m (was: 10m)
[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct
[ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=638023&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638023 ]

ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 16/Aug/21 02:35
Start Date: 16/Aug/21 02:35
Worklog Time Spent: 10m
Work Description: dengzhhu653 opened a new pull request #2585:
URL: https://github.com/apache/hive/pull/2585

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Issue Time Tracking
-------------------

Worklog Id: (was: 638023)
Remaining Estimate: 0h
Time Spent: 10m