[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390386#comment-15390386
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/534


> HashJoin's not fully parallelized in query plan
> ---
>
> Key: DRILL-4743
> URL: https://issues.apache.org/jira/browse/DRILL-4743
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.5.0
> Reporter: Gautam Kumar Parai
> Assignee: Gautam Kumar Parai
> Labels: doc-impacting
>
> The underlying problem is that filter selectivity is underestimated for
> queries with complicated predicates, e.g. deeply nested AND/OR predicates.
> This leads to under-parallelization of the major fragment doing the join.
> To really resolve this problem, we need table/column statistics to estimate
> the selectivity correctly. However, in the absence of statistics, or when
> existing statistics are insufficient to estimate the selectivity correctly,
> this change will serve as a workaround.
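
To make the arithmetic behind the description concrete, here is a minimal sketch. It assumes that the estimate for a conjunction is the product of per-predicate selectivities and that the workaround clamps the result between a minimum and a maximum factor. The class name, the 0.25 per-predicate guess, and the bounds are all made up for illustration; this is not Drill's planner code.

```java
// Illustrative sketch only; names and numbers are assumptions, not Drill's planner code.
final class SelectivitySketch {

  /** Naive estimate for a conjunction: multiply selectivities, assuming independence. */
  static double conjunction(double... selectivities) {
    double product = 1.0;
    for (double s : selectivities) {
      product *= s;
    }
    return product;
  }

  /** Clamp an estimate into [minFactor, maxFactor] (hypothetical bounds). */
  static double clamp(double estimate, double minFactor, double maxFactor) {
    return Math.max(minFactor, Math.min(maxFactor, estimate));
  }

  public static void main(String[] args) {
    // Four nested predicates, each guessed at 0.25, shrink the estimate quickly.
    double raw = conjunction(0.25, 0.25, 0.25, 0.25);
    double bounded = clamp(raw, 0.1, 1.0);
    long inputRows = 1_000_000L;
    System.out.println("raw estimate:     " + raw + " -> " + (long) (inputRows * raw) + " rows");
    System.out.println("bounded estimate: " + bounded + " -> " + (long) (inputRows * bounded) + " rows");
  }
}
```

With a larger estimated row count, a planner that sizes parallelism from row counts would presumably assign more minor fragments to the join, which is the effect the workaround is after.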



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390166#comment-15390166
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/534
  
Overall LGTM +1. 




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390165#comment-15390165
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r71942925
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/QueryClassLoader.java ---
@@ -44,8 +44,8 @@
   public static final String JAVA_COMPILER_OPTION = "exec.java_compiler";
   public static final StringValidator JAVA_COMPILER_VALIDATOR = new StringValidator(JAVA_COMPILER_OPTION, CompilerPolicy.DEFAULT.toString()) {
     @Override
-    public void validate(OptionValue v) {
-      super.validate(v);
+    public void validate(OptionValue v, OptionManager manager) {
--- End diff --

final ?




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390163#comment-15390163
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r71942865
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/InboundImpersonationManager.java ---
@@ -90,8 +91,8 @@ public InboundImpersonationPolicyValidator(String name, String def) {
     }
 
     @Override
-    public void validate(OptionValue v) {
-      super.validate(v);
+    public void validate(OptionValue v, final OptionManager manager) {
--- End diff --

final OptionValue ?




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-22 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389157#comment-15389157
 ] 

Gautam Kumar Parai commented on DRILL-4743:
---

I have updated the pull request (https://github.com/apache/drill/pull/534). 
[~amansinha100] [~sudheeshkatkam] can you please take a look? Thanks!



[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388850#comment-15388850
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r71823822
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/TypeValidators.java ---
@@ -90,6 +91,62 @@ public void validate(OptionValue v) {
     }
   }
 
+  public static class MinRangeDoubleValidator extends RangeDoubleValidator {
+    private final double min;
+    private final double max;
+    private final String maxValidatorName;
+
+    public MinRangeDoubleValidator(String name, double min, double max, double def, String maxValidatorName) {
+      super(name, min, max, def);
+      this.min = min;
+      this.max = max;
+      this.maxValidatorName = maxValidatorName;
+    }
+
+    @Override
+    public void validate(OptionValue v, final OptionManager manager) {
+      super.validate(v, manager);
+      if (manager != null) {
--- End diff --

In that case, the caller should not be using this validator, because it would
behave the same as `RangeDoubleValidator`.




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388734#comment-15388734
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r71816185
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/TypeValidators.java ---
@@ -90,6 +91,62 @@ public void validate(OptionValue v) {
     }
   }
 
+  public static class MinRangeDoubleValidator extends RangeDoubleValidator {
+    private final double min;
+    private final double max;
+    private final String maxValidatorName;
+
+    public MinRangeDoubleValidator(String name, double min, double max, double def, String maxValidatorName) {
+      super(name, min, max, def);
+      this.min = min;
+      this.max = max;
+      this.maxValidatorName = maxValidatorName;
+    }
+
+    @Override
+    public void validate(OptionValue v, final OptionManager manager) {
+      super.validate(v, manager);
+      if (manager != null) {
--- End diff --

The caller can pass in a null manager if they do not want any dependency
validation.




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384862#comment-15384862
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r71424000
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/TypeValidators.java ---
@@ -90,6 +91,62 @@ public void validate(OptionValue v) {
     }
   }
 
+  public static class MinRangeDoubleValidator extends RangeDoubleValidator {
+    private final double min;
+    private final double max;
+    private final String maxValidatorName;
+
+    public MinRangeDoubleValidator(String name, double min, double max, double def, String maxValidatorName) {
+      super(name, min, max, def);
+      this.min = min;
+      this.max = max;
+      this.maxValidatorName = maxValidatorName;
+    }
+
+    @Override
+    public void validate(OptionValue v, final OptionManager manager) {
+      super.validate(v, manager);
+      if (manager != null) {
+        OptionValue maxValue = manager.getOption(maxValidatorName);
+
+        if (v.float_val > maxValue.float_val) {
+          throw UserException.validationError()
+              .message(String.format("Option %s must be less than or equal to Option %s",
+                  getOptionName(), maxValidatorName))
+              .build(logger);
+        }
+      }
+    }
+  }
+
+  public static class MaxRangeDoubleValidator extends RangeDoubleValidator {
+    private final double min;
+    private final double max;
+    private final String minValidatorName;
+
+    public MaxRangeDoubleValidator(String name, double min, double max, double def, String minValidatorName) {
+      super(name, min, max, def);
+      this.min = min;
+      this.max = max;
+      this.minValidatorName = minValidatorName;
+    }
+
+    @Override
+    public void validate(OptionValue v, final OptionManager manager) {
--- End diff --

You could make the first argument final too, since it is read-only.




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382838#comment-15382838
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r71207858
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java ---
@@ -212,6 +219,14 @@ public static long getInitialPlanningMemorySize() {
     return INITIAL_OFF_HEAP_ALLOCATION_IN_BYTES;
   }
 
+  public double getFilterMinSelectivityEstimateFactor() {
+    return options.getOption(FILTER_MIN_SELECTIVITY_ESTIMATE_FACTOR.getOptionName()).float_val;
--- End diff --

Make the option validators above typed:
`public static final FloatValidator FILTER_MIN_SELECTIVITY_ESTIMATE_FACTOR ...`

and change this line to:
`return options.getOption(FILTER_MIN_SELECTIVITY_ESTIMATE_FACTOR);`

Same for the other option.
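
The difference between a name-based lookup and a typed lookup can be sketched as follows. This is a self-contained illustration, not Drill's option API: `DoubleOption`, `Options`, and the option name `example.min_factor` are hypothetical stand-ins chosen for the example.

```java
// Illustrative sketch only; these are not Drill's classes.
final class TypedOptionSketch {

  /** Hypothetical typed handle: an option name plus its default value. */
  static final class DoubleOption {
    final String name;
    final double defaultValue;

    DoubleOption(String name, double defaultValue) {
      this.name = name;
      this.defaultValue = defaultValue;
    }
  }

  /** Hypothetical option store with an untyped and a typed lookup. */
  static final class Options {
    private final java.util.Map<String, Object> values = new java.util.HashMap<>();

    void set(String name, Object value) {
      values.put(name, value);
    }

    /** Untyped lookup: the caller has to know the type and cast. */
    Object get(String name) {
      return values.get(name);
    }

    /** Typed lookup: the handle fixes the return type at compile time. */
    double get(DoubleOption option) {
      Object value = values.get(option.name);
      return value == null ? option.defaultValue : (Double) value;
    }
  }

  public static void main(String[] args) {
    DoubleOption minFactor = new DoubleOption("example.min_factor", 0.0);
    Options options = new Options();
    options.set(minFactor.name, 0.25);

    double viaName = (Double) options.get(minFactor.name); // cast at every call site
    double viaHandle = options.get(minFactor);             // no cast, no field access
    System.out.println(viaName + " " + viaHandle);
  }
}
```

The typed form keeps the knowledge of the value's type in one place (the handle) instead of repeating it at every call site.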




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382839#comment-15382839
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r71207876
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/TypeValidators.java ---
@@ -90,6 +91,62 @@ public void validate(OptionValue v) {
     }
   }
 
+  public static class MinRangeDoubleValidator extends RangeDoubleValidator {
+    private final double min;
+    private final double max;
+    private final String maxValidatorName;
+
+    public MinRangeDoubleValidator(String name, double min, double max, double def, String maxValidatorName) {
+      super(name, min, max, def);
+      this.min = min;
+      this.max = max;
+      this.maxValidatorName = maxValidatorName;
+    }
+
+    @Override
+    public void validate(OptionValue v, final OptionManager manager) {
+      super.validate(v, manager);
+      if (manager != null) {
--- End diff --

Is this null check necessary?




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379781#comment-15379781
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r71013627
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/TypeValidators.java ---
@@ -27,6 +27,21 @@
 import static com.google.common.base.Preconditions.checkArgument;
 
 public class TypeValidators {
+
+  /** Interface implemented by option validators which depend on other options
+   *  in order to perform validation. e.g. MIN/MAX option validators might be
+   *  dependent if MIN should always be less than MAX.
+   *
+   */
+  public interface DependentTypeValidators
+  {
+    /* Interface method requires providing an OptionManager which can
+     * be used to read option values of dependencies. As an example look at
+     * MinRangeDoubleValidator/MaxRangeDoubleValidator
+     */
+    public void validate(OptionValue v, BaseOptionManager manager);
--- End diff --

Would replacing `validate(OptionValue)` with
`validate(OptionValue, OptionManager)` in the `OptionValidator` interface work?
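
A minimal sketch of what this suggestion amounts to, under simplifying assumptions: every validator shares one two-argument `validate` signature, validators without dependencies ignore the second argument, and the caller never needs an `instanceof` check. The types below (`Options`, `Validator`, `RangeValidator`, `MinValidator`) and the option name `max_factor` are hypothetical stand-ins, not Drill's `OptionValidator` or `OptionManager`.

```java
// Illustrative sketch only; the types here are simplified stand-ins.
final class UnifiedValidateSketch {

  /** Hypothetical read-only view of current option values; null when unset. */
  interface Options {
    Double get(String name);
  }

  /** One signature for every validator; independent ones ignore the options. */
  interface Validator {
    void validate(double value, Options options);
  }

  /** Independent validator: checks a fixed range only. */
  static class RangeValidator implements Validator {
    private final double min;
    private final double max;

    RangeValidator(double min, double max) {
      this.min = min;
      this.max = max;
    }

    @Override
    public void validate(double value, Options options) {
      if (value < min || value > max) {
        throw new IllegalArgumentException(
            "value " + value + " is outside [" + min + ", " + max + "]");
      }
    }
  }

  /** Dependent validator: also checks the value against another option. */
  static final class MinValidator extends RangeValidator {
    private final String maxOptionName;

    MinValidator(double min, double max, String maxOptionName) {
      super(min, max);
      this.maxOptionName = maxOptionName;
    }

    @Override
    public void validate(double value, Options options) {
      super.validate(value, options);
      Double currentMax = options.get(maxOptionName);
      if (currentMax != null && value > currentMax) {
        throw new IllegalArgumentException(
            "value " + value + " must be <= " + maxOptionName + " (" + currentMax + ")");
      }
    }
  }

  /** The caller dispatches through the interface; no instanceof is needed. */
  static void setOption(Validator validator, double value, Options options) {
    validator.validate(value, options);
  }

  public static void main(String[] args) {
    java.util.Map<String, Double> values = new java.util.HashMap<>();
    values.put("max_factor", 0.5);
    Options options = values::get;

    setOption(new RangeValidator(0.0, 1.0), 0.8, options);             // accepted
    setOption(new MinValidator(0.0, 1.0, "max_factor"), 0.3, options); // accepted
    try {
      setOption(new MinValidator(0.0, 1.0, "max_factor"), 0.8, options);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The sketch requires a non-null `Options` argument on every call, which is one way to make a null check inside the dependent validator unnecessary.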




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379776#comment-15379776
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r71013391
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java ---
@@ -251,7 +254,12 @@ public void setOption(final OptionValue value) {
     final String name = value.name.toLowerCase();
     final OptionValidator validator = getValidator(name);
 
-    validator.validate(value); // validate the option
+    /* If the validator depends on other options */
+    if (validator instanceof DependentTypeValidators) {
+      ((DependentTypeValidators)validator).validate(value, this); // validate the option
--- End diff --

Oops, I missed that.




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379768#comment-15379768
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r71012854
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java ---
@@ -251,7 +254,12 @@ public void setOption(final OptionValue value) {
     final String name = value.name.toLowerCase();
     final OptionValidator validator = getValidator(name);
 
-    validator.validate(value); // validate the option
+    /* If the validator depends on other options */
+    if (validator instanceof DependentTypeValidators) {
+      ((DependentTypeValidators)validator).validate(value, this); // validate the option
--- End diff --

The validation is also done in the FallbackOptionManager. A test case checks
the validation when the option is changed at the session level.
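
Why the scope used for validation matters can be sketched as below. The sketch assumes a session scope that falls back to a system scope for unset options; a dependent check should then read the other option through the same scope in which the new value is being set. All class and option names are hypothetical, and the real FallbackOptionManager may differ in its details.

```java
// Illustrative sketch only; the class and option names are hypothetical.
final class SessionFallbackSketch {

  /** Read-only view of option values; returns null when an option is unset. */
  interface Options {
    Double get(String name);
  }

  /** System-wide values. */
  static final class SystemOptions implements Options {
    private final java.util.Map<String, Double> values = new java.util.HashMap<>();

    void put(String name, double value) {
      values.put(name, value);
    }

    @Override
    public Double get(String name) {
      return values.get(name);
    }
  }

  /** Session values that fall back to another scope when unset. */
  static final class SessionOptions implements Options {
    private final java.util.Map<String, Double> overrides = new java.util.HashMap<>();
    private final Options fallback;

    SessionOptions(Options fallback) {
      this.fallback = fallback;
    }

    void put(String name, double value) {
      overrides.put(name, value);
    }

    @Override
    public Double get(String name) {
      Double own = overrides.get(name);
      return own != null ? own : fallback.get(name);
    }
  }

  /** Rejects a new minimum that exceeds the maximum visible in the given scope. */
  static void checkMin(double newMin, String maxName, Options scope) {
    Double max = scope.get(maxName);
    if (max != null && newMin > max) {
      throw new IllegalArgumentException(newMin + " exceeds " + maxName + " = " + max);
    }
  }

  public static void main(String[] args) {
    SystemOptions system = new SystemOptions();
    system.put("max_factor", 0.5);
    SessionOptions session = new SessionOptions(system);
    session.put("max_factor", 0.9);

    checkMin(0.7, "max_factor", session); // accepted: the session sees 0.9
    try {
      checkMin(0.7, "max_factor", system); // rejected: the system scope sees 0.5
    } catch (IllegalArgumentException e) {
      System.out.println("rejected against system scope: " + e.getMessage());
    }
  }
}
```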




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379732#comment-15379732
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r71010198
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java ---
@@ -251,7 +254,12 @@ public void setOption(final OptionValue value) {
     final String name = value.name.toLowerCase();
     final OptionValidator validator = getValidator(name);
 
-    validator.validate(value); // validate the option
+    /* If the validator depends on other options */
+    if (validator instanceof DependentTypeValidators) {
+      ((DependentTypeValidators)validator).validate(value, this); // validate the option
--- End diff --

This will not work if the option is set at session level.




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379687#comment-15379687
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r71005763
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java ---
@@ -251,7 +254,12 @@ public void setOption(final OptionValue value) {
     final String name = value.name.toLowerCase();
     final OptionValidator validator = getValidator(name);
 
-    validator.validate(value); // validate the option
+    /* If the validator depends on other options */
+    if (validator instanceof DependentTypeValidators) {
--- End diff --

It would be good to avoid the instanceof check (although sometimes it is
unavoidable). Can you see what other changes are needed to achieve that? It
seems that validate with a second parameter could be implemented by
DependentTypeValidators only, and other validators could throw an
unsupported-operation exception.




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373871#comment-15373871
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r70537404
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestOptions.java ---
@@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill;
+
+import org.apache.drill.common.util.TestTools;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestOptions extends BaseTestQuery {
--- End diff --

Changed to TestSelectivity




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373835#comment-15373835
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r70535046
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestOptions.java ---
@@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill;
+
+import org.apache.drill.common.util.TestTools;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestOptions extends BaseTestQuery {
--- End diff --

Well, one reason I mentioned that is that it is odd to have Explain plan
matching tests in TestOptions. Also, what does it mean to 'test all options'?
Clearly, we have option settings spread throughout the unit tests, so
options are tested implicitly.




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373828#comment-15373828
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r70534443
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java ---
@@ -81,6 +83,11 @@
   new RangeLongValidator("planner.identifier_max_length", 128 /* A minimum length is needed because option names are identifiers themselves */,
   Integer.MAX_VALUE, DEFAULT_IDENTIFIER_MAX_LENGTH);
 
+  public static final OptionValidator FILTER_MIN_SELECTIVITY_ESTIMATE_FACTOR = new MinRangeDoubleValidator("planner.filter.min_selectivity_estimate_factor",
--- End diff --

On second thoughts, I agree with keeping it as it is.  




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-12 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373539#comment-15373539
 ] 

Gautam Kumar Parai commented on DRILL-4743:
---

[~amansinha100] I have updated the pull request
(https://github.com/apache/drill/pull/534). Please take a look.



[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373525#comment-15373525
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r70505090
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java ---
@@ -81,6 +83,11 @@
   new RangeLongValidator("planner.identifier_max_length", 128 /* A minimum length is needed because option names are identifiers themselves */,
   Integer.MAX_VALUE, DEFAULT_IDENTIFIER_MAX_LENGTH);
 
+  public static final OptionValidator FILTER_MIN_SELECTIVITY_ESTIMATE_FACTOR = new MinRangeDoubleValidator("planner.filter.min_selectivity_estimate_factor",
--- End diff --

This will reduce the flexibility for the user. I think we should still
maintain one-to-one mappings for RelMdRowCount, RelMdSelectivity and the
selectivity upper/lower bounds. Selectivity and its bounds will differ between
operators, e.g. for filters versus joins.
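
To illustrate the kind of bound discussed here, the following is a minimal
Java sketch, not Drill's actual planner code. It assumes a fixed per-conjunct
guess of 0.15 when no statistics exist (a hypothetical value chosen for the
example) and shows how multiplying that guess across ANDed predicates collapses
the estimate, and how a lower bound such as the one configured through
planner.filter.min_selectivity_estimate_factor could clamp it. The class and
method names are made up for illustration.

```java
/** Hypothetical sketch of clamping a filter selectivity estimate; not Drill's planner code. */
final class SelectivityClampSketch {
    /** Assumed per-conjunct guess used when no statistics are available (illustrative only). */
    static final double ASSUMED_CONJUNCT_SELECTIVITY = 0.15;

    private SelectivityClampSketch() {}

    /** Naive estimate: the assumed guess multiplied once per ANDed predicate. */
    static double naiveEstimate(int conjuncts) {
        if (conjuncts < 0) {
            throw new IllegalArgumentException("conjuncts must be non-negative");
        }
        return Math.pow(ASSUMED_CONJUNCT_SELECTIVITY, conjuncts);
    }

    /** Clamps a raw estimate into the closed interval [minFactor, maxFactor]. */
    static double clamp(double estimate, double minFactor, double maxFactor) {
        if (minFactor > maxFactor) {
            throw new IllegalArgumentException("minFactor must be less than or equal to maxFactor");
        }
        return Math.max(minFactor, Math.min(maxFactor, estimate));
    }

    /** Row count the planner would assume after the filter. */
    static double estimatedRows(double inputRows, double selectivity) {
        return inputRows * selectivity;
    }
}
```

With six conjuncts the naive estimate is about 1.1e-5, so a 10-million-row
input would be costed at roughly 114 rows; with a floor of 0.2 the same filter
is costed at about 2,000,000 rows. Keeping separate bounds per operator type
would amount to calling clamp with different factors for filters and for joins.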




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373516#comment-15373516
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r70504720
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestOptions.java ---
@@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill;
+
+import org.apache.drill.common.util.TestTools;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestOptions extends BaseTestQuery {
+    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestOptions.class);
+
+    static final String WORKING_PATH = TestTools.getWorkingPath();
+    static final String TEST_RES_PATH = WORKING_PATH + "/src/test/resources";
+    private static final String EXPLAIN_PREFIX = "EXPLAIN PLAN FOR ";
--- End diff --

Removed.




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373518#comment-15373518
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r70504733
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestOptions.java ---
@@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill;
+
+import org.apache.drill.common.util.TestTools;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestOptions extends BaseTestQuery {
+    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestOptions.class);
+
+    static final String WORKING_PATH = TestTools.getWorkingPath();
+    static final String TEST_RES_PATH = WORKING_PATH + "/src/test/resources";
+    private static final String EXPLAIN_PREFIX = "EXPLAIN PLAN FOR ";
+    private static final String EXPLAIN_WITH_ATTRIB_PREFIX = "EXPLAIN PLAN INCLUDING ATTRIBUTES FOR ";
--- End diff --

Removed.




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373520#comment-15373520
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r70504774
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/TypeValidators.java ---
@@ -90,6 +105,66 @@ public void validate(OptionValue v) {
 }
   }
 
+  public static class MinRangeDoubleValidator extends RangeDoubleValidator implements DependentTypeValidators {
+    private final double min;
+    private final double max;
+    private final String maxValidatorName;
+
+    public MinRangeDoubleValidator(String name, double min, double max, double def, String maxValidatorName) {
+      super(name, min, max, def);
+      this.min = min;
+      this.max = max;
+      this.maxValidatorName = maxValidatorName;
+    }
+
+    @Override
+    public void validate(OptionValue v) {
+      super.validate(v);
+    }
+
+    @Override
+    public void validate(OptionValue v, final BaseOptionManager manager) {
+      super.validate(v);
+      OptionValue maxValue = manager.getOption(maxValidatorName);
+
+      if (v.float_val > maxValue.float_val) {
+        throw UserException.validationError()
+            .message(String.format("Option %s must be less than Option %s", getOptionName(), maxValidatorName))
--- End diff --

Done




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373519#comment-15373519
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r70504761
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/TypeValidators.java ---
@@ -90,6 +105,66 @@ public void validate(OptionValue v) {
 }
   }
 
+  public static class MinRangeDoubleValidator extends RangeDoubleValidator implements DependentTypeValidators {
+    private final double min;
+    private final double max;
+    private final String maxValidatorName;
+
+    public MinRangeDoubleValidator(String name, double min, double max, double def, String maxValidatorName) {
+      super(name, min, max, def);
+      this.min = min;
+      this.max = max;
+      this.maxValidatorName = maxValidatorName;
+    }
+
+    @Override
+    public void validate(OptionValue v) {
+      super.validate(v);
+    }
+
+    @Override
+    public void validate(OptionValue v, final BaseOptionManager manager) {
+      super.validate(v);
+      OptionValue maxValue = manager.getOption(maxValidatorName);
+
+      if (v.float_val > maxValue.float_val) {
+        throw UserException.validationError()
+            .message(String.format("Option %s must be less than Option %s", getOptionName(), maxValidatorName))
+            .build(logger);
+      }
+    }
+  }
+
+  public static class MaxRangeDoubleValidator extends RangeDoubleValidator implements DependentTypeValidators {
+    private final double min;
+    private final double max;
+    private final String minValidatorName;
+
+    public MaxRangeDoubleValidator(String name, double min, double max, double def, String minValidatorName) {
+      super(name, min, max, def);
+      this.min = min;
+      this.max = max;
+      this.minValidatorName = minValidatorName;
+    }
+
+    @Override
+    public void validate(OptionValue v) {
+      super.validate(v);
+    }
+
+    @Override
+    public void validate(OptionValue v, final BaseOptionManager manager) {
+      super.validate(v);
+      OptionValue minValue = manager.getOption(minValidatorName);
+
+      if (v.float_val < minValue.float_val) {
+        throw UserException.validationError()
+            .message(String.format("Option %s must be greater than Option %s", getOptionName(), minValidatorName))
--- End diff --

Done




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373514#comment-15373514
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r70504693
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestOptions.java ---
@@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill;
+
+import org.apache.drill.common.util.TestTools;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestOptions extends BaseTestQuery {
--- End diff --

I assumed this would be used to test all options, not just selectivity
options. TestSelectivity will limit the scope to only selectivity options
(filter, join, aggregate, union, project, sort). Thoughts?




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373415#comment-15373415
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r70495155
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestOptions.java ---
@@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill;
+
+import org.apache.drill.common.util.TestTools;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestOptions extends BaseTestQuery {
+    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestOptions.class);
+
+    static final String WORKING_PATH = TestTools.getWorkingPath();
+    static final String TEST_RES_PATH = WORKING_PATH + "/src/test/resources";
+    private static final String EXPLAIN_PREFIX = "EXPLAIN PLAN FOR ";
+    private static final String EXPLAIN_WITH_ATTRIB_PREFIX = "EXPLAIN PLAN INCLUDING ATTRIBUTES FOR ";
--- End diff --

Not used?




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373414#comment-15373414
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r70495124
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestOptions.java ---
@@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill;
+
+import org.apache.drill.common.util.TestTools;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestOptions extends BaseTestQuery {
+    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestOptions.class);
+
+    static final String WORKING_PATH = TestTools.getWorkingPath();
+    static final String TEST_RES_PATH = WORKING_PATH + "/src/test/resources";
+    private static final String EXPLAIN_PREFIX = "EXPLAIN PLAN FOR ";
--- End diff --

Not used anywhere?




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373410#comment-15373410
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r70494718
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestOptions.java ---
@@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill;
+
+import org.apache.drill.common.util.TestTools;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestOptions extends BaseTestQuery {
--- End diff --

TestOptions sounds too generic, since this test suite is not really testing
all the options. How about TestSelectivity?




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373405#comment-15373405
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r70494339
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/TypeValidators.java ---
@@ -90,6 +105,66 @@ public void validate(OptionValue v) {
 }
   }
 
+  public static class MinRangeDoubleValidator extends RangeDoubleValidator implements DependentTypeValidators {
+    private final double min;
+    private final double max;
+    private final String maxValidatorName;
+
+    public MinRangeDoubleValidator(String name, double min, double max, double def, String maxValidatorName) {
+      super(name, min, max, def);
+      this.min = min;
+      this.max = max;
+      this.maxValidatorName = maxValidatorName;
+    }
+
+    @Override
+    public void validate(OptionValue v) {
+      super.validate(v);
+    }
+
+    @Override
+    public void validate(OptionValue v, final BaseOptionManager manager) {
+      super.validate(v);
+      OptionValue maxValue = manager.getOption(maxValidatorName);
+
+      if (v.float_val > maxValue.float_val) {
+        throw UserException.validationError()
+            .message(String.format("Option %s must be less than Option %s", getOptionName(), maxValidatorName))
+            .build(logger);
+      }
+    }
+  }
+
+  public static class MaxRangeDoubleValidator extends RangeDoubleValidator implements DependentTypeValidators {
+    private final double min;
+    private final double max;
+    private final String minValidatorName;
+
+    public MaxRangeDoubleValidator(String name, double min, double max, double def, String minValidatorName) {
+      super(name, min, max, def);
+      this.min = min;
+      this.max = max;
+      this.minValidatorName = minValidatorName;
+    }
+
+    @Override
+    public void validate(OptionValue v) {
+      super.validate(v);
+    }
+
+    @Override
+    public void validate(OptionValue v, final BaseOptionManager manager) {
+      super.validate(v);
+      OptionValue minValue = manager.getOption(minValidatorName);
+
+      if (v.float_val < minValue.float_val) {
+        throw UserException.validationError()
+            .message(String.format("Option %s must be greater than Option %s", getOptionName(), minValidatorName))
--- End diff --

should be "greater than or equal to"
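
The boundary wording requested in this review can be sketched as follows. This
is a simplified stand-in, not the code in TypeValidators.java: it takes plain
doubles instead of OptionValue, throws IllegalArgumentException instead of
UserException, and the class and method names are hypothetical. The point it
shows is that equal values pass both checks, so the messages should say
"or equal to".

```java
/** Hypothetical stand-in for the paired min/max option checks; not Drill's API. */
final class DependentBoundsSketch {
    private DependentBoundsSketch() {}

    /** Rejects a minimum that is strictly above the current maximum; equality passes. */
    static void validateMin(String minName, double minValue, String maxName, double maxValue) {
        if (minValue > maxValue) {
            throw new IllegalArgumentException(String.format(
                "Option %s must be less than or equal to Option %s", minName, maxName));
        }
    }

    /** Rejects a maximum that is strictly below the current minimum; equality passes. */
    static void validateMax(String maxName, double maxValue, String minName, double minValue) {
        if (maxValue < minValue) {
            throw new IllegalArgumentException(String.format(
                "Option %s must be greater than or equal to Option %s", maxName, minName));
        }
    }
}
```

Because the comparisons are strict (> and <), setting both options to the same
value is accepted, which is why "less than" and "greater than" alone would
misdescribe the rule.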




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373403#comment-15373403
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r70494274
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/TypeValidators.java ---
@@ -90,6 +105,66 @@ public void validate(OptionValue v) {
     }
   }
 
+  public static class MinRangeDoubleValidator extends RangeDoubleValidator implements DependentTypeValidators {
+    private final double min;
+    private final double max;
+    private final String maxValidatorName;
+
+    public MinRangeDoubleValidator(String name, double min, double max, double def, String maxValidatorName) {
+      super(name, min, max, def);
+      this.min = min;
+      this.max = max;
+      this.maxValidatorName = maxValidatorName;
+    }
+
+    @Override
+    public void validate(OptionValue v) {
+      super.validate(v);
+    }
+
+    @Override
+    public void validate(OptionValue v, final BaseOptionManager manager) {
+      super.validate(v);
+      OptionValue maxValue = manager.getOption(maxValidatorName);
+
+      if (v.float_val > maxValue.float_val) {
+        throw UserException.validationError()
+            .message(String.format("Option %s must be less than Option %s", getOptionName(), maxValidatorName))
--- End diff --

The message should say "less than or equal to", since the check only rejects values strictly above the other option.




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373400#comment-15373400
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r70493693
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java ---
@@ -81,6 +83,11 @@
       new RangeLongValidator("planner.identifier_max_length", 128 /* A minimum length is needed because option names are identifiers themselves */,
                               Integer.MAX_VALUE, DEFAULT_IDENTIFIER_MAX_LENGTH);
 
+  public static final OptionValidator FILTER_MIN_SELECTIVITY_ESTIMATE_FACTOR = new MinRangeDoubleValidator("planner.filter.min_selectivity_estimate_factor",
--- End diff --

Thinking about this more, we should probably remove the term 'Filter' from 
the option name. Calcite's RelMdSelectivity has a default implementation for 
various Rels, not just Filter. In the future we could potentially use the same 
min and max bounds for other Rels. What do you think? If you agree, other 
places need to be modified too.




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-11 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372060#comment-15372060
 ] 

Gautam Kumar Parai commented on DRILL-4743:
---

[~amansinha100] I have updated the pull request 
(https://github.com/apache/drill/pull/534) according to your comments. Please 
take a look.




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371411#comment-15371411
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r70316280
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java ---
@@ -81,6 +81,11 @@
       new RangeLongValidator("planner.identifier_max_length", 128 /* A minimum length is needed because option names are identifiers themselves */,
                               Integer.MAX_VALUE, DEFAULT_IDENTIFIER_MAX_LENGTH);
 
+  public static final OptionValidator FILTER_MIN_SELECTIVITY_ESTIMATE_FACTOR = new RangeDoubleValidator("planner.filter.min_selectivity_estimate_factor",
+      0.0, 1.0, 0.0d);
+  public static final OptionValidator FILTER_MAX_SELECTIVITY_ESTIMATE_FACTOR = new RangeDoubleValidator("planner.filter.max_selectivity_estimate_factor",
--- End diff --

Can you add validation that the min does not exceed the max, and that the max does not fall below the min?
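
One possible shape for such a check, as a rough standalone sketch rather than a proposal for the actual validator classes: each option is checked against its own range and against the current value of the other option before it is stored. The class `DependentOptionsSketch` and its map-backed store are hypothetical stand-ins for Drill's option manager; the option names and the 0.0 to 1.0 range come from the diff, while the 1.0 default for the max option is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, standalone sketch; not Drill's option manager or validators.
final class DependentOptionsSketch {
  static final String MIN = "planner.filter.min_selectivity_estimate_factor";
  static final String MAX = "planner.filter.max_selectivity_estimate_factor";

  private final Map<String, Double> values = new HashMap<>();

  DependentOptionsSketch() {
    values.put(MIN, 0.0); // default taken from the diff
    values.put(MAX, 1.0); // assumed default, not shown in the diff
  }

  // Stores a value only if it is within [0.0, 1.0] and consistent with the other option.
  void set(String name, double value) {
    if (value < 0.0 || value > 1.0) {
      throw new IllegalArgumentException("Option " + name + " must be between 0.0 and 1.0");
    }
    if (name.equals(MIN) && value > values.get(MAX)) {
      throw new IllegalArgumentException(
          "Option " + MIN + " must be less than or equal to Option " + MAX);
    }
    if (name.equals(MAX) && value < values.get(MIN)) {
      throw new IllegalArgumentException(
          "Option " + MAX + " must be greater than or equal to Option " + MIN);
    }
    values.put(name, value);
  }

  double get(String name) {
    return values.get(name);
  }

  public static void main(String[] args) {
    DependentOptionsSketch options = new DependentOptionsSketch();
    options.set(MAX, 0.8);
    options.set(MIN, 0.3);
    try {
      options.set(MIN, 0.9); // exceeds the current max of 0.8, so it is rejected
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

A rejected update leaves the previously stored value untouched, so the pair never ends up in an inconsistent state.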




[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-06-21 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343061#comment-15343061
 ] 

Gautam Kumar Parai commented on DRILL-4743:
---

[~amansinha100] I have updated the pull request. Please take a look.



[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-06-21 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342826#comment-15342826
 ] 

Gautam Kumar Parai commented on DRILL-4743:
---

I have created a pull request: https://github.com/apache/drill/pull/534. 
[~amansinha100], can you please take a look and provide feedback?



[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342825#comment-15342825
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

GitHub user gparai opened a pull request:

https://github.com/apache/drill/pull/534

[DRILL-4743] HashJoin's not fully parallelized in query plan

Provide a user parameter for defining a lower bound of selectivity to 
prevent under-estimates on filter selectivity.
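
As a rough illustration of the idea (a sketch under stated assumptions, not the code in this pull request): if an estimator multiplies per-predicate guesses for a deeply nested AND, the product shrinks quickly, and a configurable lower bound keeps the estimated row count from collapsing. The class `SelectivityFloorSketch`, the guess values, and the 0.1 lower bound are all hypothetical.

```java
// Hypothetical, standalone sketch; not Drill or Calcite code.
final class SelectivityFloorSketch {

  // Clamps an estimate into the range [minFactor, maxFactor].
  static double clamp(double estimate, double minFactor, double maxFactor) {
    return Math.max(minFactor, Math.min(maxFactor, estimate));
  }

  // Multiplies per-conjunct guesses, as an independence-based estimator would for AND.
  static double conjunction(double... guesses) {
    double product = 1.0;
    for (double g : guesses) {
      product *= g;
    }
    return product;
  }

  public static void main(String[] args) {
    double inputRows = 1_000_000;
    double raw = conjunction(0.15, 0.15, 0.15, 0.15); // illustrative guesses only
    double bounded = clamp(raw, 0.1, 1.0);            // 0.1 is an assumed lower bound
    System.out.printf("raw rows: %.0f, bounded rows: %.0f%n",
        inputRows * raw, inputRows * bounded);
  }
}
```

With these illustrative numbers the unbounded estimate is roughly 506 rows out of a million, while the bounded one is 100,000, which is the kind of difference that can change how many fragments the planner assigns to the join.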

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gparai/drill MD-880-ADM

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/534.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #534





