[jira] [Commented] (IMPALA-7564) Conservative FK/PK join type detection with complex equi-join conjuncts

2018-09-24 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626577#comment-16626577 ] Paul Rogers commented on IMPALA-7564: - Great description. I think we can tease apart

[jira] [Commented] (IMPALA-7604) In AggregationNode.computeStats, handle cardinality overflow better

2018-09-24 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626508#comment-16626508 ] Paul Rogers commented on IMPALA-7604: - Thanks, [~tarmstrong], for the very clear exp

[jira] [Comment Edited] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-24 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626483#comment-16626483 ] Paul Rogers edited comment on IMPALA-7310 at 9/24/18 9:27 PM:

[jira] [Commented] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-24 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626483#comment-16626483 ] Paul Rogers commented on IMPALA-7310: - Final solution is even simpler, since we don'

[jira] [Updated] (IMPALA-7601) Improve cardinality and selectivity estimates

2018-09-24 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated IMPALA-7601: Description: Impala makes extensive use of table stats during query planning. For example, the ND

[jira] [Updated] (IMPALA-7601) Improve cardinality and selectivity estimates

2018-09-24 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated IMPALA-7601: Summary: Improve cardinality and selectivity estimates (was: Improve default selectivity values)

[jira] [Updated] (IMPALA-7601) Improve default selectivity values

2018-09-24 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated IMPALA-7601: Description: Impala makes extensive use of table stats during query planning. For example, the ND

[jira] [Updated] (IMPALA-7601) Improve default selectivity values

2018-09-24 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated IMPALA-7601: Description: Impala makes extensive use of table stats during query planning. For example, the ND

[jira] [Commented] (IMPALA-7601) Improve default selectivity values

2018-09-24 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626203#comment-16626203 ] Paul Rogers commented on IMPALA-7601: - Based on the above reasoning, here is a recom

[jira] [Updated] (IMPALA-7601) Improve default selectivity values

2018-09-24 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated IMPALA-7601: Description: Impala makes extensive use of table stats during query planning. For example, the ND

[jira] [Updated] (IMPALA-7601) Improve default selectivity values

2018-09-24 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated IMPALA-7601: Description: Impala makes extensive use of table stats during query planning. For example, the ND

[jira] [Updated] (IMPALA-7601) Improve default selectivity values

2018-09-24 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated IMPALA-7601: Summary: Improve default selectivity values (was: Define better default selectivity values) > Im

[jira] [Updated] (IMPALA-7601) Define better default selectivity values

2018-09-24 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated IMPALA-7601: Summary: Define better default selectivity values (was: Define a-priori selectivity and NDV value

[jira] [Updated] (IMPALA-7603) Incorrect NDV expression for col1 mathop col2

2018-09-24 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated IMPALA-7603: Summary: Incorrect NDV expression for col1 mathop col2 (was: Incorrect NDV expression for col1 op

[jira] [Updated] (IMPALA-7608) Estimate row count from file size when no stats available

2018-09-21 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated IMPALA-7608: Description: Impala makes heavy use of stats, which is a good thing. Stats feed into query planni

[jira] [Created] (IMPALA-7608) Estimate row count from file size when no stats available

2018-09-21 Thread Paul Rogers (JIRA)
Paul Rogers created IMPALA-7608: --- Summary: Estimate row count from file size when no stats available Key: IMPALA-7608 URL: https://issues.apache.org/jira/browse/IMPALA-7608 Project: IMPALA Issu

[jira] [Updated] (IMPALA-7601) Define a-priori selectivity and NDV values

2018-09-21 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated IMPALA-7601: Description: Impala makes extensive use of table stats during query planning. For example, the ND

[jira] [Commented] (IMPALA-7560) Better selectivity estimate for != (not equals) binary predicate

2018-09-21 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624019#comment-16624019 ] Paul Rogers commented on IMPALA-7560: - Created a unit test for this. {noformat}

[jira] [Commented] (IMPALA-7604) In AggregationNode.computeStats, handle cardinality overflow better

2018-09-20 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623051#comment-16623051 ] Paul Rogers commented on IMPALA-7604: - [~tarmstrong], in my experience, using planne

[jira] [Created] (IMPALA-7604) In AggregationNode.computeStats, handle cardinality overflow better

2018-09-20 Thread Paul Rogers (JIRA)
Paul Rogers created IMPALA-7604: --- Summary: In AggregationNode.computeStats, handle cardinality overflow better Key: IMPALA-7604 URL: https://issues.apache.org/jira/browse/IMPALA-7604 Project: IMPALA

[jira] [Comment Edited] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-20 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622856#comment-16622856 ] Paul Rogers edited comment on IMPALA-7310 at 9/21/18 12:04 AM: ---

[jira] [Comment Edited] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-20 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622856#comment-16622856 ] Paul Rogers edited comment on IMPALA-7310 at 9/20/18 11:36 PM: ---

[jira] [Commented] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-20 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622856#comment-16622856 ] Paul Rogers commented on IMPALA-7310: - The planner uses NDVs to make binary decision

[jira] [Commented] (IMPALA-7603) Incorrect NDV expression for col1 op col2

2018-09-20 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622779#comment-16622779 ] Paul Rogers commented on IMPALA-7603: - Turns out that a similar limitation exists fo

[jira] [Updated] (IMPALA-7603) Incorrect NDV expression for col1 op col2

2018-09-20 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated IMPALA-7603: Description: Consider the [[{{ExprNdvTest}}|https://github.com/apache/impala/blob/master/fe/src/t

[jira] [Updated] (IMPALA-7603) Incorrect NDV expression for col1 op col2

2018-09-20 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated IMPALA-7603: Description: Consider the [{{ExprNdvTest}}|https://github.com/apache/impala/blob/master/fe/src/te

[jira] [Created] (IMPALA-7603) Incorrect NDV expression for col1 op col2

2018-09-20 Thread Paul Rogers (JIRA)
Paul Rogers created IMPALA-7603: --- Summary: Incorrect NDV expression for col1 op col2 Key: IMPALA-7603 URL: https://issues.apache.org/jira/browse/IMPALA-7603 Project: IMPALA Issue Type: Bug

[jira] [Comment Edited] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-20 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622712#comment-16622712 ] Paul Rogers edited comment on IMPALA-7310 at 9/20/18 9:08 PM:

[jira] [Created] (IMPALA-7602) Definition of NDV differs between planner and stats mechanism

2018-09-20 Thread Paul Rogers (JIRA)
Paul Rogers created IMPALA-7602: --- Summary: Definition of NDV differs between planner and stats mechanism Key: IMPALA-7602 URL: https://issues.apache.org/jira/browse/IMPALA-7602 Project: IMPALA

[jira] [Commented] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-20 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622712#comment-16622712 ] Paul Rogers commented on IMPALA-7310: - Per the suggestion of [~jeszyb], created IMPA

[jira] [Commented] (IMPALA-7601) Define a-priori selectivity and NDV values

2018-09-20 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622709#comment-16622709 ] Paul Rogers commented on IMPALA-7601: - Please see [~tarmstr...@cloudera.com]'s comme

[jira] [Updated] (IMPALA-7601) Define a-priori selectivity and NDV values

2018-09-20 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated IMPALA-7601: Description: Impala makes extensive use of table stats during query planning. For example, the ND

[jira] [Created] (IMPALA-7601) Define a-priori selectivity and NDV values

2018-09-20 Thread Paul Rogers (JIRA)
Paul Rogers created IMPALA-7601: --- Summary: Define a-priori selectivity and NDV values Key: IMPALA-7601 URL: https://issues.apache.org/jira/browse/IMPALA-7601 Project: IMPALA Issue Type: Improve

[jira] [Comment Edited] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-19 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621434#comment-16621434 ] Paul Rogers edited comment on IMPALA-7310 at 9/20/18 3:02 AM:

[jira] [Commented] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-19 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621434#comment-16621434 ] Paul Rogers commented on IMPALA-7310: - Odd. Looked at the tests in {{ExprNdvTest}}.

[jira] [Comment Edited] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-19 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621096#comment-16621096 ] Paul Rogers edited comment on IMPALA-7310 at 9/20/18 1:50 AM:

[jira] [Commented] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-19 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621222#comment-16621222 ] Paul Rogers commented on IMPALA-7310: - [~jeszyb], agree completely. Here I'm digging

[jira] [Commented] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-19 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621196#comment-16621196 ] Paul Rogers commented on IMPALA-7310: - Here, it is worth pointing out the risk of an

[jira] [Commented] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-19 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621171#comment-16621171 ] Paul Rogers commented on IMPALA-7310: - The original description pointed out the meth

[jira] [Commented] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-19 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621126#comment-16621126 ] Paul Rogers commented on IMPALA-7310: - As noted above, the code uses -1 as an "undef

[jira] [Commented] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-19 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621096#comment-16621096 ] Paul Rogers commented on IMPALA-7310: - Simplest case: a binary predicate. Current be

[jira] [Commented] (IMPALA-7560) Better selectivity estimate for != (not equals) binary predicate

2018-09-19 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620946#comment-16620946 ] Paul Rogers commented on IMPALA-7560: - The table in DRILL-5254 suggests how to use t

[jira] [Commented] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-18 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619968#comment-16619968 ] Paul Rogers commented on IMPALA-7310: - Simple reproduction: {noformat} create table

[jira] [Work started] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-18 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-7310 started by Paul Rogers. --- > Compute Stats not computing NULLs as a distinct value causing wrong estimates >

[jira] [Assigned] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-18 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers reassigned IMPALA-7310: --- Assignee: Paul Rogers > Compute Stats not computing NULLs as a distinct value causing wrong

[jira] [Comment Edited] (IMPALA-7560) Better selectivity estimate for != (not equals) binary predicate

2018-09-18 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619827#comment-16619827 ] Paul Rogers edited comment on IMPALA-7560 at 9/19/18 1:34 AM:

[jira] [Commented] (IMPALA-7560) Better selectivity estimate for != (not equals) binary predicate

2018-09-18 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619827#comment-16619827 ] Paul Rogers commented on IMPALA-7560: - Turns out that Apache Drill did a similar ana
