[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390386#comment-15390386 ]

ASF GitHub Bot commented on DRILL-4743:
---

Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/534

> HashJoin's not fully parallelized in query plan
> ---
>
> Key: DRILL-4743
> URL: https://issues.apache.org/jira/browse/DRILL-4743
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.5.0
> Reporter: Gautam Kumar Parai
> Assignee: Gautam Kumar Parai
> Labels: doc-impacting
>
> The underlying problem is filter selectivity under-estimate for a query with
> complicated predicates e.g. deeply nested and/or predicates. This leads to
> under parallelization of the major fragment doing the join.
> To really resolve this problem we need table/column statistics to correctly
> estimate the selectivity. However, in the absence of statistics OR even when
> existing statistics are insufficient to get a correct estimate of selectivity
> this will serve as a workaround.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
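The issue description above can be illustrated with a small sketch. This is not Drill's planner code: it assumes a conjunction's selectivity is guessed by multiplying a fixed per-predicate factor, and that the workaround bounds the result between a minimum and a maximum factor. The class `SelectivitySketch`, its method names, and the numbers used are all hypothetical.

```java
// Illustrative only: a toy model of selectivity under-estimation and clamping.
final class SelectivitySketch {

  // Assumption: each ANDed predicate contributes the same guessed factor,
  // so deeply nested conjunctions drive the estimate toward zero.
  static double naiveAndSelectivity(int predicateCount, double perPredicateGuess) {
    double selectivity = 1.0;
    for (int i = 0; i < predicateCount; i++) {
      selectivity *= perPredicateGuess;
    }
    return selectivity;
  }

  // Assumed workaround: keep the estimate within [minFactor, maxFactor].
  static double clamp(double selectivity, double minFactor, double maxFactor) {
    return Math.max(minFactor, Math.min(maxFactor, selectivity));
  }

  // Estimated output row count of the filter for a given input row count.
  static long estimatedRows(long inputRows, double selectivity) {
    return (long) Math.ceil(inputRows * selectivity);
  }
}
```

With ten predicates guessed at 0.5 each, 1,048,576 input rows shrink to an estimate of 1,024 rows; with a minimum factor of 0.25 the estimate becomes 262,144 rows. If the degree of parallelism is derived from the estimated row count, as the description implies, the larger estimate would let the join fragment be spread more widely.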
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390166#comment-15390166 ]

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on the issue:

    https://github.com/apache/drill/pull/534

Overall LGTM +1.
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390165#comment-15390165 ]

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/534#discussion_r71942925

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/QueryClassLoader.java ---
@@ -44,8 +44,8 @@
   public static final String JAVA_COMPILER_OPTION = "exec.java_compiler";
   public static final StringValidator JAVA_COMPILER_VALIDATOR = new StringValidator(JAVA_COMPILER_OPTION, CompilerPolicy.DEFAULT.toString()) {
     @Override
-    public void validate(OptionValue v) {
-      super.validate(v);
+    public void validate(OptionValue v, OptionManager manager) {
--- End diff --

final ?
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390163#comment-15390163 ]

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/534#discussion_r71942865

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/InboundImpersonationManager.java ---
@@ -90,8 +91,8 @@ public InboundImpersonationPolicyValidator(String name, String def) {
     }
     @Override
-    public void validate(OptionValue v) {
-      super.validate(v);
+    public void validate(OptionValue v, final OptionManager manager) {
--- End diff --

final OptionValue ?
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389157#comment-15389157 ]

Gautam Kumar Parai commented on DRILL-4743:
---

I have updated the pull request (https://github.com/apache/drill/pull/534). [~amansinha100] [~sudheeshkatkam] can you please take a look? Thanks!
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388850#comment-15388850 ]

ASF GitHub Bot commented on DRILL-4743:
---

Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/534#discussion_r71823822

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/TypeValidators.java ---
@@ -90,6 +91,62 @@ public void validate(OptionValue v) {
     }
   }
+
+  public static class MinRangeDoubleValidator extends RangeDoubleValidator {
+    private final double min;
+    private final double max;
+    private final String maxValidatorName;
+
+    public MinRangeDoubleValidator(String name, double min, double max, double def, String maxValidatorName) {
+      super(name, min, max, def);
+      this.min = min;
+      this.max = max;
+      this.maxValidatorName = maxValidatorName;
+    }
+
+    @Override
+    public void validate(OptionValue v, final OptionManager manager) {
+      super.validate(v, manager);
+      if (manager != null) {
--- End diff --

In that case, the caller should not be using this validator because this would be the same `RangeDoubleValidator`.
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388734#comment-15388734 ]

ASF GitHub Bot commented on DRILL-4743:
---

Github user gparai commented on a diff in the pull request:

    https://github.com/apache/drill/pull/534#discussion_r71816185

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/TypeValidators.java ---
@@ -90,6 +91,62 @@ public void validate(OptionValue v) {
     }
   }
+
+  public static class MinRangeDoubleValidator extends RangeDoubleValidator {
+    private final double min;
+    private final double max;
+    private final String maxValidatorName;
+
+    public MinRangeDoubleValidator(String name, double min, double max, double def, String maxValidatorName) {
+      super(name, min, max, def);
+      this.min = min;
+      this.max = max;
+      this.maxValidatorName = maxValidatorName;
+    }
+
+    @Override
+    public void validate(OptionValue v, final OptionManager manager) {
+      super.validate(v, manager);
+      if (manager != null) {
--- End diff --

The caller can pass in a NULL manager if they do not want any dependency validation.
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384862#comment-15384862 ]

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/534#discussion_r71424000

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/TypeValidators.java ---
@@ -90,6 +91,62 @@ public void validate(OptionValue v) {
     }
   }
+
+  public static class MinRangeDoubleValidator extends RangeDoubleValidator {
+    private final double min;
+    private final double max;
+    private final String maxValidatorName;
+
+    public MinRangeDoubleValidator(String name, double min, double max, double def, String maxValidatorName) {
+      super(name, min, max, def);
+      this.min = min;
+      this.max = max;
+      this.maxValidatorName = maxValidatorName;
+    }
+
+    @Override
+    public void validate(OptionValue v, final OptionManager manager) {
+      super.validate(v, manager);
+      if (manager != null) {
+        OptionValue maxValue = manager.getOption(maxValidatorName);
+
+        if (v.float_val > maxValue.float_val) {
+          throw UserException.validationError()
+              .message(String.format("Option %s must be less than or equal to Option %s",
+                  getOptionName(), maxValidatorName))
+              .build(logger);
+        }
+      }
+    }
+  }
+
+  public static class MaxRangeDoubleValidator extends RangeDoubleValidator {
+    private final double min;
+    private final double max;
+    private final String minValidatorName;
+
+    public MaxRangeDoubleValidator(String name, double min, double max, double def, String minValidatorName) {
+      super(name, min, max, def);
+      this.min = min;
+      this.max = max;
+      this.minValidatorName = minValidatorName;
+    }
+
+    @Override
+    public void validate(OptionValue v, final OptionManager manager) {
--- End diff --

you could make the 1st argument final too since it is read-only.
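The dependent validation under review in this message can be sketched in isolation. The following is a simplified stand-in, not the code from the pull request: `DependentValidatorSketch`, its `Options` map, and `MinRangeValidator` are hypothetical names, plain `double` values replace `OptionValue`, and `IllegalArgumentException` replaces `UserException`. It keeps the two behaviours discussed in the review: the range check always runs, and the cross-option check is skipped when no manager is supplied.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a minimum-option validator that also checks a paired maximum option.
final class DependentValidatorSketch {

  /** Hypothetical stand-in for an option manager: option name to current value. */
  static final class Options {
    private final Map<String, Double> values = new HashMap<>();

    void put(String name, double value) {
      values.put(name, value);
    }

    // Assumes the option has been set; a missing name would fail on unboxing.
    double get(String name) {
      return values.get(name);
    }
  }

  /** Range check plus a check that the value does not exceed the named max option. */
  static final class MinRangeValidator {
    private final String name;
    private final double min;
    private final double max;
    private final String maxOptionName;

    MinRangeValidator(String name, double min, double max, String maxOptionName) {
      this.name = name;
      this.min = min;
      this.max = max;
      this.maxOptionName = maxOptionName;
    }

    void validate(double value, Options options) {
      if (value < min || value > max) {
        throw new IllegalArgumentException(
            String.format("Option %s must be between %f and %f", name, min, max));
      }
      // A null options argument means "skip dependency validation".
      if (options != null && value > options.get(maxOptionName)) {
        throw new IllegalArgumentException(
            String.format("Option %s must be less than or equal to Option %s", name, maxOptionName));
      }
    }
  }
}
```

A maximum-side validator would mirror this with the comparison reversed against the named minimum option.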
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382838#comment-15382838 ]

ASF GitHub Bot commented on DRILL-4743:
---

Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/534#discussion_r71207858

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java ---
@@ -212,6 +219,14 @@ public static long getInitialPlanningMemorySize() {
     return INITIAL_OFF_HEAP_ALLOCATION_IN_BYTES;
   }
+  public double getFilterMinSelectivityEstimateFactor() {
+    return options.getOption(FILTER_MIN_SELECTIVITY_ESTIMATE_FACTOR.getOptionName()).float_val;
--- End diff --

Make the option validators above typed:
`public static final FloatValidator FILTER_MIN_SELECTIVITY_ESTIMATE_FACTOR ...`
and change this line to:
`return options.getOption(FILTER_MIN_SELECTIVITY_ESTIMATE_FACTOR);`
Same for the other option.
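The suggestion in this comment is to look an option up through a typed validator rather than by name followed by a field access. A minimal sketch of that idea follows; it is not Drill's option API. `TypedOptionSketch`, `TypedValidator<T>`, and the map-backed `Options` class are hypothetical, and only the option name `planner.filter.min_selectivity_estimate_factor` comes from this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a typed key lets the lookup return the right type directly.
final class TypedOptionSketch {

  /** Hypothetical typed validator: carries the option name and its value type. */
  static final class TypedValidator<T> {
    final String optionName;
    final Class<T> type;

    TypedValidator(String optionName, Class<T> type) {
      this.optionName = optionName;
      this.type = type;
    }
  }

  /** Hypothetical option store offering both an untyped and a typed lookup. */
  static final class Options {
    private final Map<String, Object> values = new HashMap<>();

    void set(String name, Object value) {
      values.put(name, value);
    }

    // Untyped lookup: the caller has to know and cast the value type.
    Object getOption(String name) {
      return values.get(name);
    }

    // Typed lookup: the validator supplies both the name and the type.
    <T> T getOption(TypedValidator<T> validator) {
      return validator.type.cast(values.get(validator.optionName));
    }
  }

  static final TypedValidator<Double> FILTER_MIN_SELECTIVITY_ESTIMATE_FACTOR =
      new TypedValidator<>("planner.filter.min_selectivity_estimate_factor", Double.class);
}
```

With a typed key, a getter such as `getFilterMinSelectivityEstimateFactor()` can reduce to a single `getOption` call, and a mismatch between the stored value and the expected type surfaces as a `ClassCastException` at the lookup.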
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382839#comment-15382839 ]

ASF GitHub Bot commented on DRILL-4743:
---

Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/534#discussion_r71207876

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/TypeValidators.java ---
@@ -90,6 +91,62 @@ public void validate(OptionValue v) {
     }
   }
+
+  public static class MinRangeDoubleValidator extends RangeDoubleValidator {
+    private final double min;
+    private final double max;
+    private final String maxValidatorName;
+
+    public MinRangeDoubleValidator(String name, double min, double max, double def, String maxValidatorName) {
+      super(name, min, max, def);
+      this.min = min;
+      this.max = max;
+      this.maxValidatorName = maxValidatorName;
+    }
+
+    @Override
+    public void validate(OptionValue v, final OptionManager manager) {
+      super.validate(v, manager);
+      if (manager != null) {
--- End diff --

Is this null check necessary?
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379781#comment-15379781 ]

ASF GitHub Bot commented on DRILL-4743:
---

Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/534#discussion_r71013627

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/TypeValidators.java ---
@@ -27,6 +27,21 @@
 import static com.google.common.base.Preconditions.checkArgument;
 public class TypeValidators {
+
+  /** Interface implemented by option validators which depend on other options
+   * in order to perform validation. e.g. MIN/MAX option validators might be
+   * dependent if MIN should always be less than MAX.
+   *
+   */
+  public interface DependentTypeValidators
+  {
+    /* Interface method requires providing an OptionManager which can
+     * be used to read option values of dependencies. As an example look at
+     * MinRangeDoubleValidator/MaxRangeDoubleValidator
+     */
+    public void validate(OptionValue v, BaseOptionManager manager);
--- End diff --

Would replacing `validate(OptionValue)` with `validate(OptionValue, OptionManager)` in the `OptionValidator` interface work?
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379776#comment-15379776 ]

ASF GitHub Bot commented on DRILL-4743:
---

Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/534#discussion_r71013391

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java ---
@@ -251,7 +254,12 @@ public void setOption(final OptionValue value) {
     final String name = value.name.toLowerCase();
     final OptionValidator validator = getValidator(name);
-    validator.validate(value); // validate the option
+    /* If the validator depends on other options */
+    if (validator instanceof DependentTypeValidators) {
+      ((DependentTypeValidators)validator).validate(value, this); // validate the option
--- End diff --

Oops, I missed that.
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379768#comment-15379768 ]

ASF GitHub Bot commented on DRILL-4743:
---

Github user gparai commented on a diff in the pull request:

    https://github.com/apache/drill/pull/534#discussion_r71012854

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java ---
@@ -251,7 +254,12 @@ public void setOption(final OptionValue value) {
     final String name = value.name.toLowerCase();
     final OptionValidator validator = getValidator(name);
-    validator.validate(value); // validate the option
+    /* If the validator depends on other options */
+    if (validator instanceof DependentTypeValidators) {
+      ((DependentTypeValidators)validator).validate(value, this); // validate the option
--- End diff --

The validation is also done in the FallbackOptionManager. A testcase is present which checks the validation when changing the option in the session.
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379732#comment-15379732 ]

ASF GitHub Bot commented on DRILL-4743:
---

Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/534#discussion_r71010198

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java ---
@@ -251,7 +254,12 @@ public void setOption(final OptionValue value) {
     final String name = value.name.toLowerCase();
     final OptionValidator validator = getValidator(name);
-    validator.validate(value); // validate the option
+    /* If the validator depends on other options */
+    if (validator instanceof DependentTypeValidators) {
+      ((DependentTypeValidators)validator).validate(value, this); // validate the option
--- End diff --

This will not work if the option is set at session level.
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379687#comment-15379687 ]

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/534#discussion_r71005763

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java ---
@@ -251,7 +254,12 @@ public void setOption(final OptionValue value) {
     final String name = value.name.toLowerCase();
     final OptionValidator validator = getValidator(name);
-    validator.validate(value); // validate the option
+    /* If the validator depends on other options */
+    if (validator instanceof DependentTypeValidators) {
--- End diff --

It would be good to avoid doing an instanceof (although sometimes it becomes unavoidable). Can you see what other changes are needed to achieve that ? Seems like validate with a second parameter can be implemented by DependentTypeValidators only and other validators could throw an unsupported exception.
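One way to avoid the `instanceof` check raised in this comment is to give every validator the same two-argument `validate` method, so the caller dispatches uniformly and validators without dependencies simply ignore the manager. The sketch below shows that shape under simplifying assumptions: `UniformValidateSketch`, `Validator`, `Manager`, `range`, and `atMost` are hypothetical names, and values are plain doubles rather than `OptionValue` objects. It is a different variant from the "throw an unsupported exception" idea floated in the comment.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: one validate signature for all validators, no instanceof at the call site.
final class UniformValidateSketch {

  /** Hypothetical validator contract: the manager is always passed in. */
  interface Validator {
    void validate(double value, Manager manager);
  }

  /** Hypothetical manager holding validators and current values by option name. */
  static final class Manager {
    private final Map<String, Validator> validators = new HashMap<>();
    private final Map<String, Double> values = new HashMap<>();

    void register(String name, Validator validator, double defaultValue) {
      validators.put(name, validator);
      values.put(name, defaultValue);
    }

    double get(String name) {
      return values.get(name);
    }

    void setOption(String name, double value) {
      // Same call for every validator; the value is stored only if validation passes.
      validators.get(name).validate(value, this);
      values.put(name, value);
    }
  }

  /** Independent validator: checks a fixed range and ignores the manager. */
  static Validator range(double min, double max) {
    return (value, manager) -> {
      if (value < min || value > max) {
        throw new IllegalArgumentException("value out of range");
      }
    };
  }

  /** Dependent validator: reads another option's current value from the manager. */
  static Validator atMost(String otherOption) {
    return (value, manager) -> {
      if (value > manager.get(otherOption)) {
        throw new IllegalArgumentException("value must not exceed " + otherOption);
      }
    };
  }
}
```

Because the manager is an argument rather than a field of the validator, the same validator instance can be used by whichever manager is handling the update, which is relevant to the session-level concern raised elsewhere in this review.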
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373871#comment-15373871 ]

ASF GitHub Bot commented on DRILL-4743:
---

Github user gparai commented on a diff in the pull request:

    https://github.com/apache/drill/pull/534#discussion_r70537404

--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestOptions.java ---
@@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill;
+
+import org.apache.drill.common.util.TestTools;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestOptions extends BaseTestQuery {
--- End diff --

Changed to TestSelectivity
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373835#comment-15373835 ]

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/534#discussion_r70535046

--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestOptions.java ---
@@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill;
+
+import org.apache.drill.common.util.TestTools;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestOptions extends BaseTestQuery {
--- End diff --

Well, one reason I mentioned that was it is odd to have Explain plan matching tests in TestOptions. Also, what does it mean to 'test all options' ? Clearly, we have option settings spread throughout the unit tests, so options are tested implicitly.
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373828#comment-15373828 ] ASF GitHub Bot commented on DRILL-4743: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/534#discussion_r70534443 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java --- @@ -81,6 +83,11 @@ new RangeLongValidator("planner.identifier_max_length", 128 /* A minimum length is needed because option names are identifiers themselves */, Integer.MAX_VALUE, DEFAULT_IDENTIFIER_MAX_LENGTH); + public static final OptionValidator FILTER_MIN_SELECTIVITY_ESTIMATE_FACTOR = new MinRangeDoubleValidator("planner.filter.min_selectivity_estimate_factor", --- End diff -- On second thoughts, I agree with keeping it as it is.
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373539#comment-15373539 ] Gautam Kumar Parai commented on DRILL-4743: --- [~amansinha100] I have updated the pull request (https://github.com/apache/drill/pull/534). Please take a look.
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373525#comment-15373525 ] ASF GitHub Bot commented on DRILL-4743: --- Github user gparai commented on a diff in the pull request: https://github.com/apache/drill/pull/534#discussion_r70505090 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java --- @@ -81,6 +83,11 @@ new RangeLongValidator("planner.identifier_max_length", 128 /* A minimum length is needed because option names are identifiers themselves */, Integer.MAX_VALUE, DEFAULT_IDENTIFIER_MAX_LENGTH); + public static final OptionValidator FILTER_MIN_SELECTIVITY_ESTIMATE_FACTOR = new MinRangeDoubleValidator("planner.filter.min_selectivity_estimate_factor", --- End diff -- This will reduce the flexibility for the user. I think we should still maintain one-to-one mappings between RelMdRowCount, RelMdSelectivity and the selectivity upper/lower bounds. Selectivity and bounds will be different, e.g. for filters than for joins.
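To illustrate what the upper/lower selectivity bounds discussed above are meant to achieve, here is a minimal Java sketch. It is not the code from the pull request: the class name SelectivityClampSketch, the per-predicate guess of 0.15, the row count, and the bound 0.05 are all assumptions chosen for illustration. It only shows how multiplying guesses for a conjunction of predicates can drive the estimate toward zero, and how clamping the result into a [min, max] range keeps the row-count estimate from collapsing.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not Drill's or Calcite's actual planner code. */
final class SelectivityClampSketch {

    /** Naive estimate for AND-ed predicates: multiply the per-predicate guesses. */
    static double conjunctionSelectivity(double... perPredicate) {
        return Arrays.stream(perPredicate).reduce(1.0, (a, b) -> a * b);
    }

    /** Clamp an estimate into [min, max]; assumes 0 <= min <= max <= 1. */
    static double clamp(double estimate, double min, double max) {
        return Math.max(min, Math.min(max, estimate));
    }

    public static void main(String[] args) {
        long inputRows = 10_000_000L;  // assumed table size

        // Four AND-ed predicates, each guessed at 0.15 (an assumed default).
        double raw = conjunctionSelectivity(0.15, 0.15, 0.15, 0.15);
        System.out.printf("raw selectivity=%.8f estimated rows=%.0f%n", raw, raw * inputRows);

        // With an assumed lower bound of 0.05 the estimate cannot fall below 5% of the input.
        double bounded = clamp(raw, 0.05, 1.0);
        System.out.printf("bounded selectivity=%.8f estimated rows=%.0f%n", bounded, bounded * inputRows);
    }
}
```

A larger row estimate for the filter output is what would give the planner a reason to parallelize the downstream join more widely; how Drill actually consumes the bound is defined by the pull request, not by this sketch.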
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373516#comment-15373516 ] ASF GitHub Bot commented on DRILL-4743: --- Github user gparai commented on a diff in the pull request: https://github.com/apache/drill/pull/534#discussion_r70504720 --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestOptions.java --- @@ -0,0 +1,79 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill; + +import org.apache.drill.common.util.TestTools; +import org.junit.Test; + +import static org.junit.Assert.assertEquals; + +public class TestOptions extends BaseTestQuery { +static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestOptions.class); + +static final String WORKING_PATH = TestTools.getWorkingPath(); +static final String TEST_RES_PATH = WORKING_PATH + "/src/test/resources"; +private static final String EXPLAIN_PREFIX = "EXPLAIN PLAN FOR "; --- End diff -- Removed. 
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373518#comment-15373518 ] ASF GitHub Bot commented on DRILL-4743: --- Github user gparai commented on a diff in the pull request: https://github.com/apache/drill/pull/534#discussion_r70504733 --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestOptions.java --- @@ -0,0 +1,79 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill; + +import org.apache.drill.common.util.TestTools; +import org.junit.Test; + +import static org.junit.Assert.assertEquals; + +public class TestOptions extends BaseTestQuery { +static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestOptions.class); + +static final String WORKING_PATH = TestTools.getWorkingPath(); +static final String TEST_RES_PATH = WORKING_PATH + "/src/test/resources"; +private static final String EXPLAIN_PREFIX = "EXPLAIN PLAN FOR "; +private static final String EXPLAIN_WITH_ATTRIB_PREFIX = "EXPLAIN PLAN INCLUDING ATTRIBUTES FOR "; --- End diff -- Removed. 
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373520#comment-15373520 ] ASF GitHub Bot commented on DRILL-4743: --- Github user gparai commented on a diff in the pull request: https://github.com/apache/drill/pull/534#discussion_r70504774 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/TypeValidators.java --- @@ -90,6 +105,66 @@ public void validate(OptionValue v) { } } + public static class MinRangeDoubleValidator extends RangeDoubleValidator implements DependentTypeValidators { +private final double min; +private final double max; +private final String maxValidatorName; + +public MinRangeDoubleValidator(String name, double min, double max, double def, String maxValidatorName) { + super(name, min, max, def); + this.min = min; + this.max = max; + this.maxValidatorName = maxValidatorName; +} + +@Override +public void validate(OptionValue v) { + super.validate(v); +} + +@Override +public void validate(OptionValue v, final BaseOptionManager manager) { + super.validate(v); + OptionValue maxValue = manager.getOption(maxValidatorName); + + if (v.float_val > maxValue.float_val) { +throw UserException.validationError() +.message(String.format("Option %s must be less than Option %s", getOptionName(), maxValidatorName)) --- End diff -- Done
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373519#comment-15373519 ] ASF GitHub Bot commented on DRILL-4743: --- Github user gparai commented on a diff in the pull request: https://github.com/apache/drill/pull/534#discussion_r70504761 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/TypeValidators.java --- @@ -90,6 +105,66 @@ public void validate(OptionValue v) { } } + public static class MinRangeDoubleValidator extends RangeDoubleValidator implements DependentTypeValidators { +private final double min; +private final double max; +private final String maxValidatorName; + +public MinRangeDoubleValidator(String name, double min, double max, double def, String maxValidatorName) { + super(name, min, max, def); + this.min = min; + this.max = max; + this.maxValidatorName = maxValidatorName; +} + +@Override +public void validate(OptionValue v) { + super.validate(v); +} + +@Override +public void validate(OptionValue v, final BaseOptionManager manager) { + super.validate(v); + OptionValue maxValue = manager.getOption(maxValidatorName); + + if (v.float_val > maxValue.float_val) { +throw UserException.validationError() +.message(String.format("Option %s must be less than Option %s", getOptionName(), maxValidatorName)) +.build(logger); + } +} + } + + public static class MaxRangeDoubleValidator extends RangeDoubleValidator implements DependentTypeValidators { +private final double min; +private final double max; +private final String minValidatorName; + +public MaxRangeDoubleValidator(String name, double min, double max, double def, String minValidatorName) { + super(name, min, max, def); + this.min = min; + this.max = max; + this.minValidatorName = minValidatorName; +} + +@Override +public void validate(OptionValue v) { + super.validate(v); +} + +@Override +public void validate(OptionValue v, final BaseOptionManager manager) { + super.validate(v); + OptionValue minValue = 
manager.getOption(minValidatorName); + + if (v.float_val < minValue.float_val) { +throw UserException.validationError() +.message(String.format("Option %s must be greater than Option %s", getOptionName(), minValidatorName)) --- End diff -- Done
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373514#comment-15373514 ] ASF GitHub Bot commented on DRILL-4743: --- Github user gparai commented on a diff in the pull request: https://github.com/apache/drill/pull/534#discussion_r70504693 --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestOptions.java --- @@ -0,0 +1,79 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill; + +import org.apache.drill.common.util.TestTools; +import org.junit.Test; + +import static org.junit.Assert.assertEquals; + +public class TestOptions extends BaseTestQuery { --- End diff -- I assumed this would be used to test all options not just selectivity options. testSelectivity will limit the scope to only selectivity options (filter, join, aggregate, union, project, sort). Thoughts? 
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373415#comment-15373415 ] ASF GitHub Bot commented on DRILL-4743: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/534#discussion_r70495155 --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestOptions.java --- @@ -0,0 +1,79 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill; + +import org.apache.drill.common.util.TestTools; +import org.junit.Test; + +import static org.junit.Assert.assertEquals; + +public class TestOptions extends BaseTestQuery { +static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestOptions.class); + +static final String WORKING_PATH = TestTools.getWorkingPath(); +static final String TEST_RES_PATH = WORKING_PATH + "/src/test/resources"; +private static final String EXPLAIN_PREFIX = "EXPLAIN PLAN FOR "; +private static final String EXPLAIN_WITH_ATTRIB_PREFIX = "EXPLAIN PLAN INCLUDING ATTRIBUTES FOR "; --- End diff -- Not used ? 
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373414#comment-15373414 ] ASF GitHub Bot commented on DRILL-4743: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/534#discussion_r70495124 --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestOptions.java --- @@ -0,0 +1,79 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill; + +import org.apache.drill.common.util.TestTools; +import org.junit.Test; + +import static org.junit.Assert.assertEquals; + +public class TestOptions extends BaseTestQuery { +static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestOptions.class); + +static final String WORKING_PATH = TestTools.getWorkingPath(); +static final String TEST_RES_PATH = WORKING_PATH + "/src/test/resources"; +private static final String EXPLAIN_PREFIX = "EXPLAIN PLAN FOR "; --- End diff -- Not used anywhere ? 
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373410#comment-15373410 ] ASF GitHub Bot commented on DRILL-4743: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/534#discussion_r70494718 --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestOptions.java --- @@ -0,0 +1,79 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill; + +import org.apache.drill.common.util.TestTools; +import org.junit.Test; + +import static org.junit.Assert.assertEquals; + +public class TestOptions extends BaseTestQuery { --- End diff -- TestOptions sounds too generic, since this test suite is not really testing all the options. How about TestSelectivity ?
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373405#comment-15373405 ] ASF GitHub Bot commented on DRILL-4743: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/534#discussion_r70494339 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/TypeValidators.java --- @@ -90,6 +105,66 @@ public void validate(OptionValue v) { } } + public static class MinRangeDoubleValidator extends RangeDoubleValidator implements DependentTypeValidators { +private final double min; +private final double max; +private final String maxValidatorName; + +public MinRangeDoubleValidator(String name, double min, double max, double def, String maxValidatorName) { + super(name, min, max, def); + this.min = min; + this.max = max; + this.maxValidatorName = maxValidatorName; +} + +@Override +public void validate(OptionValue v) { + super.validate(v); +} + +@Override +public void validate(OptionValue v, final BaseOptionManager manager) { + super.validate(v); + OptionValue maxValue = manager.getOption(maxValidatorName); + + if (v.float_val > maxValue.float_val) { +throw UserException.validationError() +.message(String.format("Option %s must be less than Option %s", getOptionName(), maxValidatorName)) +.build(logger); + } +} + } + + public static class MaxRangeDoubleValidator extends RangeDoubleValidator implements DependentTypeValidators { +private final double min; +private final double max; +private final String minValidatorName; + +public MaxRangeDoubleValidator(String name, double min, double max, double def, String minValidatorName) { + super(name, min, max, def); + this.min = min; + this.max = max; + this.minValidatorName = minValidatorName; +} + +@Override +public void validate(OptionValue v) { + super.validate(v); +} + +@Override +public void validate(OptionValue v, final BaseOptionManager manager) { + super.validate(v); + OptionValue 
minValue = manager.getOption(minValidatorName); + + if (v.float_val < minValue.float_val) { +throw UserException.validationError() +.message(String.format("Option %s must be greater than Option %s", getOptionName(), minValidatorName)) --- End diff -- should be "greater than or equal to"
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373403#comment-15373403 ] ASF GitHub Bot commented on DRILL-4743: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/534#discussion_r70494274 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/TypeValidators.java --- @@ -90,6 +105,66 @@ public void validate(OptionValue v) { } } + public static class MinRangeDoubleValidator extends RangeDoubleValidator implements DependentTypeValidators { +private final double min; +private final double max; +private final String maxValidatorName; + +public MinRangeDoubleValidator(String name, double min, double max, double def, String maxValidatorName) { + super(name, min, max, def); + this.min = min; + this.max = max; + this.maxValidatorName = maxValidatorName; +} + +@Override +public void validate(OptionValue v) { + super.validate(v); +} + +@Override +public void validate(OptionValue v, final BaseOptionManager manager) { + super.validate(v); + OptionValue maxValue = manager.getOption(maxValidatorName); + + if (v.float_val > maxValue.float_val) { +throw UserException.validationError() +.message(String.format("Option %s must be less than Option %s", getOptionName(), maxValidatorName)) --- End diff -- should be "less than or equal to"
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373400#comment-15373400 ] ASF GitHub Bot commented on DRILL-4743: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/534#discussion_r70493693 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java --- @@ -81,6 +83,11 @@ new RangeLongValidator("planner.identifier_max_length", 128 /* A minimum length is needed because option names are identifiers themselves */, Integer.MAX_VALUE, DEFAULT_IDENTIFIER_MAX_LENGTH); + public static final OptionValidator FILTER_MIN_SELECTIVITY_ESTIMATE_FACTOR = new MinRangeDoubleValidator("planner.filter.min_selectivity_estimate_factor", --- End diff -- Thinking about this more, we should probably remove the term 'Filter' from the option. Calcite's RelMdSelectivity has a default implementation for various Rels, not just Filter. In the future we could potentially use the same min and max bounds for other Rels. What do you think? If you agree, other places need to be modified too.
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372060#comment-15372060 ]

Gautam Kumar Parai commented on DRILL-4743:
---

[~amansinha100] I have updated the pull request (https://github.com/apache/drill/pull/534) according to your comments. Please take a look.
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371411#comment-15371411 ]

ASF GitHub Bot commented on DRILL-4743:
---

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/534#discussion_r70316280

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java ---
    @@ -81,6 +81,11 @@
     new RangeLongValidator("planner.identifier_max_length", 128 /* A minimum length is needed because option names are identifiers themselves */, Integer.MAX_VALUE, DEFAULT_IDENTIFIER_MAX_LENGTH);
    + public static final OptionValidator FILTER_MIN_SELECTIVITY_ESTIMATE_FACTOR = new RangeDoubleValidator("planner.filter.min_selectivity_estimate_factor",
    + 0.0, 1.0, 0.0d);
    + public static final OptionValidator FILTER_MAX_SELECTIVITY_ESTIMATE_FACTOR = new RangeDoubleValidator("planner.filter.max_selectivity_estimate_factor",
    --- End diff --

    Can you add validation that the min does not exceed the max, and vice versa?
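
The cross-check requested in this review comment can be sketched as follows. This is only an illustration of the idea, not the code in the pull request: the class `SelectivityBoundsCheck` and its `validate` method are hypothetical names, and the sketch does not use Drill's `OptionValidator` API. The only detail taken from the diff is that each option is individually constrained to the range 0.0 to 1.0.

```java
// Hypothetical sketch: NOT Drill's validator classes. It only illustrates
// checking the two selectivity bounds against each other.
public final class SelectivityBoundsCheck {

    private SelectivityBoundsCheck() {
    }

    /**
     * Rejects a pair of bounds if either value lies outside [0.0, 1.0]
     * or if the minimum is greater than the maximum.
     */
    public static void validate(double min, double max) {
        if (Double.isNaN(min) || min < 0.0 || min > 1.0) {
            throw new IllegalArgumentException("min must be in [0.0, 1.0]: " + min);
        }
        if (Double.isNaN(max) || max < 0.0 || max > 1.0) {
            throw new IllegalArgumentException("max must be in [0.0, 1.0]: " + max);
        }
        if (min > max) {
            throw new IllegalArgumentException(
                "min (" + min + ") must not exceed max (" + max + ")");
        }
    }
}
```

In a real option system the check would presumably have to run whenever either option is changed, since raising the minimum or lowering the maximum can each break the invariant.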
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343061#comment-15343061 ]

Gautam Kumar Parai commented on DRILL-4743:
---

[~amansinha100] I have updated the pull request. Please take a look.
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342826#comment-15342826 ]

Gautam Kumar Parai commented on DRILL-4743:
---

I have created a pull request: https://github.com/apache/drill/pull/534
[~amansinha100] Can you please take a look and provide feedback?
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342825#comment-15342825 ]

ASF GitHub Bot commented on DRILL-4743:
---

GitHub user gparai opened a pull request:

    https://github.com/apache/drill/pull/534

    [DRILL-4743] HashJoin's not fully parallelized in query plan

    Provide a user parameter for defining a lower bound of selectivity to prevent under-estimates on filter selectivity.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gparai/drill MD-880-ADM

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/534.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #534
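
The lower bound described in this pull request can be pictured as a clamp applied to the planner's selectivity estimate. The following is a minimal sketch under stated assumptions: `SelectivityClampSketch` and `clamp` are hypothetical names, the sketch is not taken from Drill or Calcite, and the bound value used in the note after the code is purely illustrative.

```java
// Hypothetical sketch: NOT Drill or Calcite code. It shows how a
// user-supplied lower (and upper) bound could limit a selectivity estimate.
public final class SelectivityClampSketch {

    private SelectivityClampSketch() {
    }

    /**
     * Returns the estimate limited to the closed interval [min, max].
     * An estimate below min is raised to min; one above max is lowered to max.
     */
    public static double clamp(double estimate, double min, double max) {
        if (min > max) {
            throw new IllegalArgumentException(
                "min (" + min + ") must not exceed max (" + max + ")");
        }
        return Math.max(min, Math.min(max, estimate));
    }
}
```

For example, if a deeply nested predicate yields a raw estimate of 0.001 and the minimum is set to 0.25, `clamp(0.001, 0.25, 1.0)` returns 0.25, so a 1,000,000-row input is estimated at 250,000 output rows instead of 1,000. A larger row estimate is what keeps the join's major fragment from being under-parallelized.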