[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=439083&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439083
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 30/May/20 14:08
Start Date: 30/May/20 14:08
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk closed pull request #930:
URL: https://github.com/apache/hive/pull/930


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 439083)
Time Spent: 2h 40m  (was: 2.5h)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch, 
> HIVE-22940.03.patch, HIVE-22940.04.patch, HIVE-22940.05.patch, 
> HIVE-22940.06.patch, HIVE-22940.07.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=402471&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-402471
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 12/Mar/20 20:38
Start Date: 12/Mar/20 20:38
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r391881328
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider to rename it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesnt add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
+  private static final String SKETCH_TO_STRING = "sketchToString";
+  // FIXME: probably use simply "union" instead unionSketch?
+  private static final String UNION_SKETCH = "unionSketch";
+  private static final String GET_N = "getN";
+  private static final String GET_CDF = "getCdf";
+  private static final String GET_PMF = "getPmf";
+  private static final String GET_QUANTILES = "GetQuantiles";
+  private static final String GET_QUANTILE = "GetQuantile";
+  private static final String GET_RANK = "getRank";
+  private static final String INTERSECT_SKETCH = "intersection";
+  private static final String EXCLUDE_SKETCH = "exclude";
+  private static final String GET_K = "getK";
+  private static final String GET_FREQUENT_ITEMS = "getFrequentItems";
+  private static final String T_TEST = "TTest";
+  private static final String SKETCH_TO_MEANS = "sketchtomeans";
+  private static final String SKETCH_TO_NUMBER_OF_RETAINED_ENTRIES = "sketchtonumberofretainedentries";
+  private static final String SKETCH_TO_QUANTILES_SKETCH = "sketchToQuantilesSketch";
+  private static final String SKETCH_TO_VALUES = "sketchToValues";
+  private static final String SKETCH_TO_VARIANCES = "sketchToVariances";
+  private static final String SKETCH_TO_PERCENTILE = "sketchToPercentile";
+  private static final String UNION_SKETCH1 = "unionSketch1";
+  private static final String INTERSECT_SKETCH1 = "intersect";
+
+  private final Registry system;
+
+  public DataSketchesFunctions(Registry system) {
+    this.system = system;
+  }
+
+  public static void register(Registry system) {
+    DataSketchesFunctions dsf = new DataSketchesFunctions(system);
+    // FIXME: what this should be approx, ds ... other?
+    String prefix = "ds";
 
 Review comment:
   I like `ds` 
 



Issue Time Tracking
---

Worklog Id: (was: 402471)
Time Spent: 2.5h  (was: 2h 20m)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=402469&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-402469
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 12/Mar/20 20:37
Start Date: 12/Mar/20 20:37
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r391880759
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider to rename it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesnt add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
+  private static final String SKETCH_TO_STRING = "sketchToString";
+  // FIXME: probably use simply "union" instead unionSketch?
+  private static final String UNION_SKETCH = "unionSketch";
+  private static final String GET_N = "getN";
+  private static final String GET_CDF = "getCdf";
+  private static final String GET_PMF = "getPmf";
+  private static final String GET_QUANTILES = "GetQuantiles";
+  private static final String GET_QUANTILE = "GetQuantile";
+  private static final String GET_RANK = "getRank";
+  private static final String INTERSECT_SKETCH = "intersection";
+  private static final String EXCLUDE_SKETCH = "exclude";
+  private static final String GET_K = "getK";
+  private static final String GET_FREQUENT_ITEMS = "getFrequentItems";
+  private static final String T_TEST = "TTest";
+  private static final String SKETCH_TO_MEANS = "sketchtomeans";
+  private static final String SKETCH_TO_NUMBER_OF_RETAINED_ENTRIES = "sketchtonumberofretainedentries";
 
 Review comment:
   LGTM, if you remove `get` in all of them, use `n_retained` 
 



Issue Time Tracking
---

Worklog Id: (was: 402469)
Time Spent: 2h 20m  (was: 2h 10m)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=402468&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-402468
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 12/Mar/20 20:36
Start Date: 12/Mar/20 20:36
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r391880374
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
 
 Review comment:
   +1
 



Issue Time Tracking
---

Worklog Id: (was: 402468)
Time Spent: 2h 10m  (was: 2h)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=402467&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-402467
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 12/Mar/20 20:36
Start Date: 12/Mar/20 20:36
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r391880266
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
 
 Review comment:
   LGTM, I think they should be easy enough to identify, and it is good that we
   follow the same pattern.
 



Issue Time Tracking
---

Worklog Id: (was: 402467)
Time Spent: 2h  (was: 1h 50m)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=402466&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-402466
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 12/Mar/20 20:36
Start Date: 12/Mar/20 20:36
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r391880118
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider to rename it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesnt add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
 
 Review comment:
   makes sense, +1
 



Issue Time Tracking
---

Worklog Id: (was: 402466)
Time Spent: 1h 50m  (was: 1h 40m)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=400802&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400802
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 10/Mar/20 16:12
Start Date: 10/Mar/20 16:12
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r390434617
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
 
 Review comment:
   The final name of the method would be one of:
   * `ds_theta_gen_sketch`
   * `ds_theta_to_sketch`
   * `ds_theta_build_sketch`
   * `ds_theta_build`
   * `ds_theta_sketch`
   Right now I feel the last 2 are the best: they are easy to remember, and the
   naming of these methods will be structured anyway, so the `ds_theta_` prefix
   already gives the function some context and we may not need to repeat "sketch".
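
   As a rough illustration of the structured naming discussed above, the sketch below composes a function name from a common prefix, a sketch family and a short action. The class `SketchFunctionNames` and its `qualified` method are hypothetical and are not part of the patch; only the `ds` prefix and the candidate names come from this thread.

```java
import java.util.Locale;

// Hypothetical helper, not part of the Hive patch: builds names of the
// form <prefix>_<sketchFamily>_<action>, e.g. ds_theta_sketch.
public final class SketchFunctionNames {

  // Prefix under discussion in this review thread.
  private static final String PREFIX = "ds";

  private SketchFunctionNames() {
  }

  // Joins the three parts with underscores and lower-cases the result.
  public static String qualified(String sketchFamily, String action) {
    return String.join("_", PREFIX, sketchFamily, action).toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    System.out.println(qualified("theta", "sketch"));
    System.out.println(qualified("theta", "build"));
  }
}
```

   With this shape the family name carries the context, so the action part can stay short.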
 



Issue Time Tracking
---

Worklog Id: (was: 400802)
Time Spent: 1h 40m  (was: 1.5h)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=400790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400790
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 10/Mar/20 16:03
Start Date: 10/Mar/20 16:03
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r390409296
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
 
 Review comment:
   `get_estimate_bounds` ? 
 



Issue Time Tracking
---

Worklog Id: (was: 400790)
Time Spent: 1h 20m  (was: 1h 10m)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=400793&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400793
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 10/Mar/20 16:03
Start Date: 10/Mar/20 16:03
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r390411646
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider to rename it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesnt add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
 
 Review comment:
   I think it might make sense to remove the `get_` prefix, but in that case we
   should remove it from all of the functions.
   What do you think?
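
   To make the "remove it from all of the functions" idea concrete, here is a minimal sketch that normalizes the legacy camelCase names to snake_case and drops a leading `get_` uniformly. The class `LegacyNameNormalizer` is hypothetical, and the resulting names are only an illustration of the rule, not the names the patch settled on.

```java
import java.util.Locale;

// Hypothetical helper, not part of the Hive patch: applies one renaming
// rule to every legacy function name so the result stays consistent.
public final class LegacyNameNormalizer {

  private LegacyNameNormalizer() {
  }

  public static String normalize(String legacyName) {
    // Insert '_' between a lower-case letter or digit and the upper-case
    // letter that follows it, then lower-case the whole name.
    String snake = legacyName
        .replaceAll("([a-z0-9])([A-Z])", "$1_$2")
        .toLowerCase(Locale.ROOT);
    // Drop the "get_" prefix wherever it appears, never selectively.
    return snake.startsWith("get_") ? snake.substring("get_".length()) : snake;
  }

  public static void main(String[] args) {
    for (String name : new String[] {"getN", "GetQuantiles", "getFrequentItems", "sketchToEstimate"}) {
      System.out.println(name + " -> " + normalize(name));
    }
  }
}
```

   Applying a single rule in one place avoids the mixed casing visible in the constants above (`getCdf` next to `GetQuantiles`).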
 



Issue Time Tracking
---

Worklog Id: (was: 400793)
Time Spent: 1.5h  (was: 1h 20m)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=400792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400792
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 10/Mar/20 16:03
Start Date: 10/Mar/20 16:03
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r390410296
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
 
 Review comment:
   This was something which was really consistent, but I think one of:
   * `to_sketch`
   * `gen_sketch`
   * `build_sketch`
   would probably be better. I changed it to `gen_sketch` for now.
 



Issue Time Tracking
---

Worklog Id: (was: 400792)
Time Spent: 1h 20m  (was: 1h 10m)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=400794&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400794
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 10/Mar/20 16:03
Start Date: 10/Mar/20 16:03
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r390417433
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider to rename it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesnt add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
+  private static final String SKETCH_TO_STRING = "sketchToString";
+  // FIXME: probably use simply "union" instead unionSketch?
+  private static final String UNION_SKETCH = "unionSketch";
+  private static final String GET_N = "getN";
+  private static final String GET_CDF = "getCdf";
+  private static final String GET_PMF = "getPmf";
+  private static final String GET_QUANTILES = "GetQuantiles";
+  private static final String GET_QUANTILE = "GetQuantile";
+  private static final String GET_RANK = "getRank";
+  private static final String INTERSECT_SKETCH = "intersection";
+  private static final String EXCLUDE_SKETCH = "exclude";
+  private static final String GET_K = "getK";
+  private static final String GET_FREQUENT_ITEMS = "getFrequentItems";
+  private static final String T_TEST = "TTest";
+  private static final String SKETCH_TO_MEANS = "sketchtomeans";
+  private static final String SKETCH_TO_NUMBER_OF_RETAINED_ENTRIES = "sketchtonumberofretainedentries";
 
 Review comment:
   Right now I think we could change this to one of:
   * `get_n_retained`
   * `n_retained`
   
 



Issue Time Tracking
---

Worklog Id: (was: 400794)
Time Spent: 1.5h  (was: 1h 20m)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=400791&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400791
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 10/Mar/20 16:03
Start Date: 10/Mar/20 16:03
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r390419219
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider to rename it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesnt add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
+  private static final String SKETCH_TO_STRING = "sketchToString";
+  // FIXME: probably use simply "union" instead unionSketch?
+  private static final String UNION_SKETCH = "unionSketch";
+  private static final String GET_N = "getN";
+  private static final String GET_CDF = "getCdf";
+  private static final String GET_PMF = "getPmf";
+  private static final String GET_QUANTILES = "GetQuantiles";
+  private static final String GET_QUANTILE = "GetQuantile";
+  private static final String GET_RANK = "getRank";
+  private static final String INTERSECT_SKETCH = "intersection";
+  private static final String EXCLUDE_SKETCH = "exclude";
+  private static final String GET_K = "getK";
+  private static final String GET_FREQUENT_ITEMS = "getFrequentItems";
+  private static final String T_TEST = "TTest";
+  private static final String SKETCH_TO_MEANS = "sketchtomeans";
+  private static final String SKETCH_TO_NUMBER_OF_RETAINED_ENTRIES = "sketchtonumberofretainedentries";
+  private static final String SKETCH_TO_QUANTILES_SKETCH = "sketchToQuantilesSketch";
+  private static final String SKETCH_TO_VALUES = "sketchToValues";
+  private static final String SKETCH_TO_VARIANCES = "sketchToVariances";
+  private static final String SKETCH_TO_PERCENTILE = "sketchToPercentile";
+  private static final String UNION_SKETCH1 = "unionSketch1";
+  private static final String INTERSECT_SKETCH1 = "intersect";
+
+  private final Registry system;
+
+  public DataSketchesFunctions(Registry system) {
+    this.system = system;
+  }
+
+  public static void register(Registry system) {
+    DataSketchesFunctions dsf = new DataSketchesFunctions(system);
+    // FIXME: what this should be approx, ds ... other?
+    String prefix = "ds";
 
 Review comment:
   @jcamachor  what do you think about this "ds" prefix ?
 



Issue Time Tracking
---

Worklog Id: (was: 400791)
Time Spent: 1h 20m  (was: 1h 10m)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=396447&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396447
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 02/Mar/20 22:32
Start Date: 02/Mar/20 22:32
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r386674517
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
 
 Review comment:
   `gen_sketch` ?
 



Issue Time Tracking
---

Worklog Id: (was: 396447)
Time Spent: 40m  (was: 0.5h)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=396444=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396444
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 02/Mar/20 22:32
Start Date: 02/Mar/20 22:32
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r386676030
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider renaming it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesn't add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
+  private static final String SKETCH_TO_STRING = "sketchToString";
+  // FIXME: probably use simply "union" instead of unionSketch?
+  private static final String UNION_SKETCH = "unionSketch";
 
 Review comment:
   Makes sense to change to `union`. Maybe the UDF (`UNION_SKETCH1`), which I 
expect will be less used, can be changed to `union_f`.
 



Issue Time Tracking
---

Worklog Id: (was: 396444)
Time Spent: 20m  (was: 10m)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=396449=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396449
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 02/Mar/20 22:32
Start Date: 02/Mar/20 22:32
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r386675311
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider renaming it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesn't add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
+  private static final String SKETCH_TO_STRING = "sketchToString";
 
 Review comment:
   `stringify` ?
 



Issue Time Tracking
---

Worklog Id: (was: 396449)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=396448=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396448
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 02/Mar/20 22:32
Start Date: 02/Mar/20 22:32
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r386684215
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider renaming it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesn't add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
+  private static final String SKETCH_TO_STRING = "sketchToString";
+  // FIXME: probably use simply "union" instead of unionSketch?
+  private static final String UNION_SKETCH = "unionSketch";
+  private static final String GET_N = "getN";
+  private static final String GET_CDF = "getCdf";
+  private static final String GET_PMF = "getPmf";
+  private static final String GET_QUANTILES = "GetQuantiles";
+  private static final String GET_QUANTILE = "GetQuantile";
+  private static final String GET_RANK = "getRank";
+  private static final String INTERSECT_SKETCH = "intersection";
 
 Review comment:
   `intersection` ->  `intersect`
   
   Below:
   `intersect` ->  `intersect_f` ?
 



Issue Time Tracking
---

Worklog Id: (was: 396448)
Time Spent: 50m  (was: 40m)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=396451=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396451
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 02/Mar/20 22:32
Start Date: 02/Mar/20 22:32
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r386683403
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
 
 Review comment:
   `get_bounded_estimate` ?
 



Issue Time Tracking
---

Worklog Id: (was: 396451)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=396445=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396445
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 02/Mar/20 22:32
Start Date: 02/Mar/20 22:32
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r386675181
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider renaming it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesn't add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
 
 Review comment:
   `get_estimate` ?
 



Issue Time Tracking
---

Worklog Id: (was: 396445)
Time Spent: 0.5h  (was: 20m)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=396450=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396450
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 02/Mar/20 22:32
Start Date: 02/Mar/20 22:32
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r386684979
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider renaming it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesn't add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
+  private static final String SKETCH_TO_STRING = "sketchToString";
+  // FIXME: probably use simply "union" instead of unionSketch?
+  private static final String UNION_SKETCH = "unionSketch";
+  private static final String GET_N = "getN";
+  private static final String GET_CDF = "getCdf";
+  private static final String GET_PMF = "getPmf";
+  private static final String GET_QUANTILES = "GetQuantiles";
+  private static final String GET_QUANTILE = "GetQuantile";
+  private static final String GET_RANK = "getRank";
+  private static final String INTERSECT_SKETCH = "intersection";
+  private static final String EXCLUDE_SKETCH = "exclude";
+  private static final String GET_K = "getK";
+  private static final String GET_FREQUENT_ITEMS = "getFrequentItems";
+  private static final String T_TEST = "TTest";
+  private static final String SKETCH_TO_MEANS = "sketchtomeans";
 
 Review comment:
   As you mentioned above, probably `sketchto` can be removed from these 
function names. Additionally, we can use `_` when they consist of multiple 
words.
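
The renaming rule suggested here can be sketched roughly as follows. This is only an illustration, assuming the rule is "drop a leading `sketchto`, then separate words with `_`"; the class `SketchFunctionNames` and the method `normalize` are hypothetical names and are not part of the patch or of Hive.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of HIVE-22940.
class SketchFunctionNames {

  private static final String PREFIX = "sketchto";

  // Strips a leading "sketchto" (in any case) and turns each
  // lowercase-to-uppercase transition into an underscore.
  static String normalize(String name) {
    String stripped = name;
    boolean hasPrefix = name.toLowerCase(Locale.ROOT).startsWith(PREFIX);
    if (hasPrefix && name.length() > PREFIX.length()) {
      stripped = name.substring(PREFIX.length());
    }
    // Insert "_" between a lowercase letter or digit and the uppercase letter after it.
    String snake = stripped.replaceAll("([a-z0-9])([A-Z])", "$1_$2");
    return snake.toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    String[] names = {"sketchToEstimate", "getFrequentItems", "GetQuantiles", "getN"};
    for (String n : names) {
      System.out.println(n + " -> " + normalize(n));
    }
  }
}
```

Names that are already all lowercase, such as `sketchtomeans`, carry no word boundaries, so a mechanical rule like this can only strip the prefix; those would still need to be renamed by hand.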
 



Issue Time Tracking
---

Worklog Id: (was: 396450)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=396452=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396452
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 02/Mar/20 22:32
Start Date: 02/Mar/20 22:32
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r386684058
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider renaming it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesn't add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
+  private static final String SKETCH_TO_STRING = "sketchToString";
+  // FIXME: probably use simply "union" instead of unionSketch?
+  private static final String UNION_SKETCH = "unionSketch";
+  private static final String GET_N = "getN";
 
 Review comment:
   Can we use `_` for this function and those below instead of mixed case?
 



Issue Time Tracking
---

Worklog Id: (was: 396452)
Time Spent: 1h  (was: 50m)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=396453=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396453
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 02/Mar/20 22:32
Start Date: 02/Mar/20 22:32
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r386691018
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider renaming it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesn't add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
+  private static final String SKETCH_TO_STRING = "sketchToString";
+  // FIXME: probably use simply "union" instead of unionSketch?
+  private static final String UNION_SKETCH = "unionSketch";
+  private static final String GET_N = "getN";
+  private static final String GET_CDF = "getCdf";
+  private static final String GET_PMF = "getPmf";
+  private static final String GET_QUANTILES = "GetQuantiles";
+  private static final String GET_QUANTILE = "GetQuantile";
+  private static final String GET_RANK = "getRank";
+  private static final String INTERSECT_SKETCH = "intersection";
+  private static final String EXCLUDE_SKETCH = "exclude";
+  private static final String GET_K = "getK";
+  private static final String GET_FREQUENT_ITEMS = "getFrequentItems";
+  private static final String T_TEST = "TTest";
+  private static final String SKETCH_TO_MEANS = "sketchtomeans";
+  private static final String SKETCH_TO_NUMBER_OF_RETAINED_ENTRIES = "sketchtonumberofretainedentries";
 
 Review comment:
   `get_number_retained_entries` ?
 



Issue Time Tracking
---

Worklog Id: (was: 396453)
Time Spent: 1h 10m  (was: 1h)

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22940.01.patch, HIVE-22940.02.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22940) Make the datasketches functions available as predefined functions

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22940?focusedWorklogId=396135=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396135
 ]

ASF GitHub Bot logged work on HIVE-22940:
-

Author: ASF GitHub Bot
Created on: 02/Mar/20 14:48
Start Date: 02/Mar/20 14:48
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #930: HIVE-22940 
datasketches functions
URL: https://github.com/apache/hive/pull/930
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 396135)
Remaining Estimate: 0h
Time Spent: 10m

> Make the datasketches functions available as predefined functions 
> --
>
> Key: HIVE-22940
> URL: https://issues.apache.org/jira/browse/HIVE-22940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>



