[jira] [Work logged] (HIVE-23462) Add option to rewrite CUME_DIST to sketch functions

2020-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23462?focusedWorklogId=443686&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-443686
 ]

ASF GitHub Bot logged work on HIVE-23462:
-

Author: ASF GitHub Bot
Created on: 10/Jun/20 11:40
Start Date: 10/Jun/20 11:40
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1031:
URL: https://github.com/apache/hive/pull/1031


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 443686)
Time Spent: 3.5h  (was: 3h 20m)

> Add option to rewrite CUME_DIST to sketch functions
> ---
>
> Key: HIVE-23462
> URL: https://issues.apache.org/jira/browse/HIVE-23462
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23462.01.patch, HIVE-23462.02.patch, 
> HIVE-23462.03.patch, HIVE-23462.04.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23462) Add option to rewrite CUME_DIST to sketch functions

2020-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23462?focusedWorklogId=443684&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-443684
 ]

ASF GitHub Bot logged work on HIVE-23462:
-

Author: ASF GitHub Bot
Created on: 10/Jun/20 11:39
Start Date: 10/Jun/20 11:39
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1031:
URL: https://github.com/apache/hive/pull/1031#discussion_r438057083



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/SqlFunctionConverter.java
##
@@ -392,10 +392,11 @@ private static String getName(GenericUDF hiveUDF) {
   registerFunction("istrue", SqlStdOperatorTable.IS_TRUE, hToken(HiveParser.Identifier, "istrue"));
   registerFunction("isnotfalse", SqlStdOperatorTable.IS_NOT_FALSE, hToken(HiveParser.Identifier, "isnotfalse"));
   registerFunction("isfalse", SqlStdOperatorTable.IS_FALSE, hToken(HiveParser.Identifier, "isfalse"));
-  registerFunction("is not distinct from", SqlStdOperatorTable.IS_NOT_DISTINCT_FROM, hToken(HiveParser.EQUAL_NS, "<=>"));

Review comment:
   The calcite2hive translation was not working and threw an exception.
   I'm not sure about hive2calcite; I've made a note on HIVE-23594 to check
   what was happening.






Issue Time Tracking
---

Worklog Id: (was: 443684)
Time Spent: 3h 20m  (was: 3h 10m)



[jira] [Work logged] (HIVE-23462) Add option to rewrite CUME_DIST to sketch functions

2020-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23462?focusedWorklogId=442563&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442563
 ]

ASF GitHub Bot logged work on HIVE-23462:
-

Author: ASF GitHub Bot
Created on: 08/Jun/20 03:54
Start Date: 08/Jun/20 03:54
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1031:
URL: https://github.com/apache/hive/pull/1031#discussion_r436445595



##
File path: ql/src/test/results/clientpositive/llap/cbo_rp_windowing_2.q.out
##
@@ -625,32 +625,32 @@ window w1 as (distribute by p_mfgr sort by p_mfgr, p_name 
rows between 2 precedi
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@part
  A masked pattern was here 
-Manufacturer#1 almond antique burnished rose metallic  2   1   1   
0   0.0 1   2   2.0 0.0 2   2   2
-Manufacturer#1 almond antique burnished rose metallic  2   1   1   
0   0.0 1   2   2.0 0.0 2   2   2
-Manufacturer#1 almond antique chartreuse lavender yellow   34  3   
2   0   0.4 2   3   12.666  
15.084944665313014  2   34  2
-Manufacturer#1 almond antique salmon chartreuse burlywood  6   4   
3   0   0.6 2   4   11.013.379088160259652  2   
6   2
-Manufacturer#1 almond aquamarine burnished black steel 28  5   4   
0   0.8 3   5   14.413.763720427268202  2   28  
34
-Manufacturer#1 almond aquamarine pink moccasin thistle 42  6   5   
1   1.0 3   6   19.016.237815945091466  2   42  
6
-Manufacturer#2 almond antique violet chocolate turquoise   14  1   
1   0   0.0 1   1   14.00.0 4   14  14
-Manufacturer#2 almond antique violet turquoise frosted 40  2   2   
0   0.251   2   27.013.04   40  14
-Manufacturer#2 almond aquamarine midnight light salmon 2   3   3   
0   0.5 2   3   18.668  15.86050300449376   
4   2   14
-Manufacturer#2 almond aquamarine rose maroon antique   25  4   4   
0   0.752   4   20.25   14.00669482783144   4   25  
40
-Manufacturer#2 almond aquamarine sandy cyan gainsboro  18  5   5   
1   1.0 3   5   19.812.560254774486067  4   18  
2
-Manufacturer#3 almond antique chartreuse khaki white   17  1   1   
0   0.0 1   1   17.00.0 2   17  17
-Manufacturer#3 almond antique forest lavender goldenrod14  2   
2   0   0.251   2   15.51.5 2   14  17
-Manufacturer#3 almond antique metallic orange dim  19  3   3   
0   0.5 2   3   16.668  2.0548046676563256  
2   19  17
-Manufacturer#3 almond antique misty red olive  1   4   4   0   
0.752   4   12.75   7.013380069552769   2   1   14
-Manufacturer#3 almond antique olive coral navajo   45  5   5   
1   1.0 3   5   19.214.344336861632886  2   45  
19
-Manufacturer#4 almond antique gainsboro frosted violet 10  1   1   
0   0.0 1   1   10.00.0 0   10  10
-Manufacturer#4 almond antique violet mint lemon39  2   2   
0   0.251   2   24.514.50   39  10
-Manufacturer#4 almond aquamarine floral ivory bisque   27  3   3   
0   0.5 2   3   25.332  11.897712198383164  
0   27  10
-Manufacturer#4 almond aquamarine yellow dodger mint7   4   4   
0   0.752   4   20.75   13.007209539328564  0   7   
39
-Manufacturer#4 almond azure aquamarine papaya violet   12  5   5   
1   1.0 3   5   19.012.149074038789951  0   12  
27
-Manufacturer#5 almond antique blue firebrick mint  31  1   1   
0   0.0 1   1   31.00.0 1   31  31
-Manufacturer#5 almond antique medium spring khaki  6   2   2   
0   0.251   2   18.512.51   6   31
-Manufacturer#5 almond antique sky peru orange  2   3   3   0   
0.5 2   3   13.012.832251036613439  1   2   31
-Manufacturer#5 almond aquamarine dodger light gainsboro46  4   
4   0   0.752   4   21.25   18.102140757380052  1   
46  6
-Manufacturer#5 almond azure blanched chiffon midnight  23  5   5   
1   1.0 3   5   21.6

[jira] [Work logged] (HIVE-23462) Add option to rewrite CUME_DIST to sketch functions

2020-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23462?focusedWorklogId=442562&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442562
 ]

ASF GitHub Bot logged work on HIVE-23462:
-

Author: ASF GitHub Bot
Created on: 08/Jun/20 03:53
Start Date: 08/Jun/20 03:53
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1031:
URL: https://github.com/apache/hive/pull/1031#discussion_r436445371



##
File path: ql/src/test/results/clientpositive/llap/cbo_rp_windowing_2.q.out
##

[jira] [Work logged] (HIVE-23462) Add option to rewrite CUME_DIST to sketch functions

2020-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23462?focusedWorklogId=442561&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442561
 ]

ASF GitHub Bot logged work on HIVE-23462:
-

Author: ASF GitHub Bot
Created on: 08/Jun/20 03:49
Start Date: 08/Jun/20 03:49
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1031:
URL: https://github.com/apache/hive/pull/1031#discussion_r436444684



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/SqlFunctionConverter.java
##
@@ -392,10 +392,11 @@ private static String getName(GenericUDF hiveUDF) {
   registerFunction("istrue", SqlStdOperatorTable.IS_TRUE, hToken(HiveParser.Identifier, "istrue"));
   registerFunction("isnotfalse", SqlStdOperatorTable.IS_NOT_FALSE, hToken(HiveParser.Identifier, "isnotfalse"));
   registerFunction("isfalse", SqlStdOperatorTable.IS_FALSE, hToken(HiveParser.Identifier, "isfalse"));
-  registerFunction("is not distinct from", SqlStdOperatorTable.IS_NOT_DISTINCT_FROM, hToken(HiveParser.EQUAL_NS, "<=>"));

Review comment:
   Just to confirm, does this mean that `is not distinct from` was never 
used?






Issue Time Tracking
---

Worklog Id: (was: 442561)
Time Spent: 2h 50m  (was: 2h 40m)



[jira] [Work logged] (HIVE-23462) Add option to rewrite CUME_DIST to sketch functions

2020-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23462?focusedWorklogId=442560&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442560
 ]

ASF GitHub Bot logged work on HIVE-23462:
-

Author: ASF GitHub Bot
Created on: 08/Jun/20 03:43
Start Date: 08/Jun/20 03:43
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1031:
URL: https://github.com/apache/hive/pull/1031#discussion_r436443600



##
File path: 
ql/src/test/queries/clientpositive/sketches_materialized_view_cume_dist.q
##
@@ -0,0 +1,54 @@
+--! qt:transactional
+set hive.fetch.task.conversion=none;
+
+create table sketch_input (id int, category char(1))
+STORED AS ORC
+TBLPROPERTIES ('transactional'='true');
+
+insert into table sketch_input values
+  (1,'a'),(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'a'), (6, 'a'), (7, 'a'), (8, 'a'), (9, 'a'), (10, 'a'),
+  (6,'b'),(6, 'b'), (7, 'b'), (8, 'b'), (9, 'b'), (10, 'b'), (11, 'b'), (12, 'b'), (13, 'b'), (14, 'b'), (15, 'b')
+;
+
+-- create an mv for the intermediate results
+create  materialized view mv_1 as
+  select category,ds_kll_sketch(cast(-id as float)) from sketch_input group by category;
+
+-- bi mode on
+set hive.optimize.bi.enabled=true;
+
+explain
+select 'rewrite; mv matching', id, cume_dist() over (order by id) from sketch_input order by id;
+select 'rewrite; mv matching', id, cume_dist() over (order by id) from sketch_input order by id;

Review comment:
   Yes, that's certainly useful. Something that may help for explain plans 
is to use `explain cbo` as it is definitely less verbose.






Issue Time Tracking
---

Worklog Id: (was: 442560)
Time Spent: 2h 40m  (was: 2.5h)



[jira] [Work logged] (HIVE-23462) Add option to rewrite CUME_DIST to sketch functions

2020-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23462?focusedWorklogId=440747&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440747
 ]

ASF GitHub Bot logged work on HIVE-23462:
-

Author: ASF GitHub Bot
Created on: 03/Jun/20 12:24
Start Date: 03/Jun/20 12:24
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1031:
URL: https://github.com/apache/hive/pull/1031#discussion_r434517408



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRewriteToDataSketchesRules.java
##
@@ -368,4 +388,210 @@ void rewrite(AggregateCall aggCall) {
   }
 }
   }
+
+  /**
+   * Generic support for rewriting Windowing expression into a different form usually using joins.
+   */
+  private static abstract class WindowingToProjectAggregateJoinProject extends RelOptRule {
+
+    protected final String sketchType;
+
+    public WindowingToProjectAggregateJoinProject(String sketchType) {
+      super(operand(HiveProject.class, any()), HiveRelFactories.HIVE_BUILDER, null);
+      this.sketchType = sketchType;
+    }
+
+    @Override
+    public void onMatch(RelOptRuleCall call) {
+      final Project project = call.rel(0);
+
+      VbuilderPAP vb = buildProcessor(call);
+      RelNode newProject = vb.processProject(project);
+
+      if (newProject == project) {
+        return;
+      } else {
+        call.transformTo(newProject);
+      }
+    }
+
+    protected abstract VbuilderPAP buildProcessor(RelOptRuleCall call);
+
+    protected static abstract class VbuilderPAP {
+      private final String sketchClass;
+      protected final RelBuilder relBuilder;
+      protected final RexBuilder rexBuilder;
+
+      protected VbuilderPAP(String sketchClass, RelBuilder relBuilder) {
+        this.sketchClass = sketchClass;
+        this.relBuilder = relBuilder;
+        rexBuilder = relBuilder.getRexBuilder();
+      }
+
+      final class ProcessShuttle extends RexShuttle {
+        public RexNode visitOver(RexOver over) {
+          return processCall(over);
+        }
+      };
+
+      protected final RelNode processProject(Project project) {
+        RelNode origInput = project.getInput();
+        relBuilder.push(origInput);
+        RexShuttle shuttle = new ProcessShuttle();
+        List newProjects = new ArrayList();
+        for (RexNode expr : project.getChildExps()) {
+          newProjects.add(expr.accept(shuttle));
+        }
+        if (relBuilder.peek() == origInput) {
+          relBuilder.clear();
+          return project;
+        }
+        relBuilder.project(newProjects);
+        return relBuilder.build();
+      }
+
+      private final RexNode processCall(RexNode expr) {
+        if (expr instanceof RexOver) {
+          RexOver over = (RexOver) expr;
+          if (isApplicable(over)) {
+            return rewrite(over);
+          }
+        }
+        return expr;
+      }
+
+      protected final SqlOperator getSqlOperator(String fnName) {
+        UDFDescriptor fn = DataSketchesFunctions.INSTANCE.getSketchFunction(sketchClass, fnName);
+        if (!fn.getCalciteFunction().isPresent()) {
+          throw new RuntimeException(fn.toString() + " doesn't have a Calcite function associated with it");
+        }
+        return fn.getCalciteFunction().get();
+      }
+
+      /**
+       * Do the rewrite for the given expression.
+       *
+       * When this method is invoked the {@link #relBuilder} will only contain the current input.
+       * Expectation is to leave the new input there after the method finishes.
+       */
+      abstract RexNode rewrite(RexOver expr);
+
+      abstract boolean isApplicable(RexOver expr);
+
+    }
+  }
+
+  public static class CumeDistRewrite extends WindowingToProjectAggregateJoinProject {
+
+    public CumeDistRewrite(String sketchType) {
+      super(sketchType);
+    }
+
+    @Override
+    protected VbuilderPAP buildProcessor(RelOptRuleCall call) {
+      return new VB(sketchType, call.builder());
+    }
+
+    private static class VB extends VbuilderPAP {
+
+      protected VB(String sketchClass, RelBuilder relBuilder) {
+        super(sketchClass, relBuilder);
+      }
+
+      @Override
+      boolean isApplicable(RexOver over) {
+        SqlAggFunction aggOp = over.getAggOperator();
+        RexWindow window = over.getWindow();
+        if (aggOp.getName().equalsIgnoreCase("cume_dist") && window.orderKeys.size() == 1
+            && window.getLowerBound().isUnbounded() && window.getUpperBound().isUnbounded()) {
+          return true;
+        }
+        return false;
+      }
+
+      @Override
+      RexNode rewrite(RexOver over) {
+        RexWindow w = over.getWindow();
+        RexFieldCollation orderKey = w.orderKeys.get(0);
+        // we don't really support nulls in aggregate/etc...they are actually ignored
+        // so some hack w

[jira] [Work logged] (HIVE-23462) Add option to rewrite CUME_DIST to sketch functions

2020-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23462?focusedWorklogId=440746&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440746
 ]

ASF GitHub Bot logged work on HIVE-23462:
-

Author: ASF GitHub Bot
Created on: 03/Jun/20 12:23
Start Date: 03/Jun/20 12:23
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1031:
URL: https://github.com/apache/hive/pull/1031#discussion_r434511996



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -2495,19 +2495,22 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
     HIVE_OPTIMIZE_BI_REWRITE_COUNTDISTINCT_ENABLED("hive.optimize.bi.rewrite.countdistinct.enabled", true,
         "Enables to rewrite COUNT(DISTINCT(X)) queries to be rewritten to use sketch functions."),
-    HIVE_OPTIMIZE_BI_REWRITE_COUNT_DISTINCT_SKETCH(
-        "hive.optimize.bi.rewrite.countdistinct.sketch", "hll",
+    HIVE_OPTIMIZE_BI_REWRITE_COUNT_DISTINCT_SKETCH("hive.optimize.bi.rewrite.countdistinct.sketch", "hll",
         new StringSet("hll"),

Review comment:
   About enabling other sketches for count-distinct: I think they should
   just work, although they might need a little testing. Probably more
   important would be to provide some way to change sketch construction
   parameters. Actually, for our rewrites the sketch type could be considered
   part of the parameters.

   Opened: HIVE-23600






Issue Time Tracking
---

Worklog Id: (was: 440746)
Time Spent: 2h 20m  (was: 2h 10m)



[jira] [Work logged] (HIVE-23462) Add option to rewrite CUME_DIST to sketch functions

2020-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23462?focusedWorklogId=440740&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440740
 ]

ASF GitHub Bot logged work on HIVE-23462:
-

Author: ASF GitHub Bot
Created on: 03/Jun/20 12:20
Start Date: 03/Jun/20 12:20
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1031:
URL: https://github.com/apache/hive/pull/1031#discussion_r434513609



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRewriteToDataSketchesRules.java
##
@@ -68,25 +82,32 @@
  *   ⇒ SELECT ds_kll_quantile(ds_kll_sketch(CAST(id AS FLOAT)), 0.2) FROM sketch_input;
  *
  *  
+ *  {@code cume_dist() over (order by id)}

Review comment:
   I think these apidocs could be moved to the rewrite rules, but they also
   have their meaning here as well. Maybe move them and add a briefer
   description here?

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -2495,19 +2495,22 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
     HIVE_OPTIMIZE_BI_REWRITE_COUNTDISTINCT_ENABLED("hive.optimize.bi.rewrite.countdistinct.enabled", true,
         "Enables to rewrite COUNT(DISTINCT(X)) queries to be rewritten to use sketch functions."),
-    HIVE_OPTIMIZE_BI_REWRITE_COUNT_DISTINCT_SKETCH(
-        "hive.optimize.bi.rewrite.countdistinct.sketch", "hll",
+    HIVE_OPTIMIZE_BI_REWRITE_COUNT_DISTINCT_SKETCH("hive.optimize.bi.rewrite.countdistinct.sketch", "hll",
         new StringSet("hll"),

Review comment:
   About enabling other sketches for count-distinct: I think they should
   just work, although they might need a little testing. Probably more
   important would be to provide some way to change sketch construction
   parameters. Actually, for our rewrites the sketch type could be considered
   part of the parameters.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelBuilder.java
##
@@ -165,4 +166,10 @@ protected boolean shouldMergeProject() {
 return false;
   }
 
+  /** Make the method visible */
+  @Override
+  public AggCall aggregateCall(SqlAggFunction aggFunction, boolean distinct, boolean approximate, boolean ignoreNulls,

Review comment:
   This method is needed to use the relbuilder to create aggregates;
   the overridden method is protected, and there is no way to access this
   level of detail without exposing it.
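
   The pattern described in this comment, widening a protected method to
   public in a subclass so that outside callers can reach it, can be sketched
   in plain Java. The classes below (`BaseBuilder`, `VisibleBuilder`,
   `VisibilityDemo`) and the simplified string-returning signature are
   hypothetical stand-ins for illustration only; they are not Calcite's
   `RelBuilder` or Hive's `HiveRelBuilder`.

```java
// Hypothetical stand-in for a library builder; not a real Calcite class.
class BaseBuilder {
  // Protected: reachable only from subclasses and from the same package.
  protected String aggregateCall(String function, boolean distinct) {
    return (distinct ? "DISTINCT " : "") + function;
  }
}

// Hypothetical stand-in for the project-specific builder subclass.
class VisibleBuilder extends BaseBuilder {
  /** Make the method visible: an override may widen access from protected to public. */
  @Override
  public String aggregateCall(String function, boolean distinct) {
    return super.aggregateCall(function, distinct);
  }
}

public class VisibilityDemo {
  public static void main(String[] args) {
    // Any caller can now use the method through the subclass.
    System.out.println(new VisibleBuilder().aggregateCall("ds_kll_sketch", false));
  }
}
```

   Java allows an override to widen access but never to narrow it, so the
   subclass can expose the call without changing the base class.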
   

##
File path: ql/src/test/queries/clientpositive/sketches_rewrite_cume_dist.q
##
@@ -0,0 +1,47 @@
+--! qt:transactional
+
+
+create table sketch_input (id int, category char(1))
+STORED AS ORC
+TBLPROPERTIES ('transactional'='true');
+
+insert into table sketch_input values
+  (1,'a'),(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'a'), (6, 'a'), (7, 'a'), (8, 'a'), (9, 'a'), (10, 'a'),
+  (6,'b'),(6, 'b'), (7, 'b'), (8, 'b'), (9, 'b'), (10, 'b'), (11, 'b'), (12, 'b'), (13, 'b'), (14, 'b'), (15, 'b')
+;
+
+select id,cume_dist() over (order by id) from sketch_input;
+
+select id,cume_dist() over (order by id),1.0-ds_kll_cdf(ds, CAST(-id AS FLOAT) )[0]

Review comment:
   These commands nicely show the original expression and the rewritten one
   alongside each other. It would be nice to also add an assertion that they
   are in the same neighbourhood.
