[ 
https://issues.apache.org/jira/browse/IMPALA-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2842:
----------------------------------
    Fix Version/s: Impala 2.10.0

> "SCAN HDFS" "hosts" doesn't account for num_nodes or unsplittable formats
> -------------------------------------------------------------------------
>
>                 Key: IMPALA-2842
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2842
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.5.0
>            Reporter: Juan Yu
>            Priority: Minor
>             Fix For: Impala 2.10.0
>
>
> According to the comments, "hosts" should be the "number of nodes on which the
> plan tree rooted at this node would execute".
> But for "SCAN HDFS", it always equals the number of backends where the data resides.
> For example, for the query "select * from sc;":
> Distributed plan:
> {code}
> Query: explain select * from sc limit 1000
> +--------------------------------------------------------------+
> | Explain String                                               |
> +--------------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=32.00MB VCores=1     |
> |                                                              |
> | F01:PLAN FRAGMENT [UNPARTITIONED]                            |
> |   01:EXCHANGE [UNPARTITIONED]                                |
> |      limit: 1000                                             |
> |      hosts=3 per-host-mem=unavailable                        |
> |      tuple-ids=0 row-size=58B cardinality=8                  |
> |                                                              |
> | F00:PLAN FRAGMENT [RANDOM]                                   |
> |   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, UNPARTITIONED] |
> |   00:SCAN HDFS [default.sc, RANDOM]                          |
> |      partitions=1/1 files=3 size=163B                        |
> |      table stats: 8 rows total                               |
> |      column stats: all                                       |
> |      limit: 1000                                             |
> |      hosts=3 per-host-mem=32.00MB                            |
> |      tuple-ids=0 row-size=58B cardinality=8                  |
> +--------------------------------------------------------------+
> {code}
> Single-node plan:
> {code}
> Query: explain select * from sc
> +-----------------------------------------------------+
> | Explain String                                      |
> +-----------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=0B VCores=0 |
> |                                                     |
> | F00:PLAN FRAGMENT [UNPARTITIONED]                   |
> |   00:SCAN HDFS [default.sc]                         |
> |      partitions=1/1 files=3 size=163B               |
> |      table stats: 8 rows total                      |
> |      column stats: all                              |
> |      hosts=3 per-host-mem=unavailable               |
> |      tuple-ids=0 row-size=58B cardinality=8         |
> +-----------------------------------------------------+
> {code}
> The query summary and profile do show the correct number of executing nodes.
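
To make the expected behaviour concrete, here is a minimal sketch of how a "hosts" estimate for a scan node might take num_nodes and unsplittable formats into account. It is not the Impala frontend code: the function name `estimate_scan_hosts` and its parameters are hypothetical, and it assumes that NUM_NODES=1 forces single-node execution and that an unsplittable file is read by exactly one host, so the host count cannot exceed the file count.

```python
def estimate_scan_hosts(data_hosts, num_files, num_nodes=0, splittable=True):
    """Hypothetical estimate of how many hosts would execute a scan.

    data_hosts: number of backends that hold data for the table
    num_files:  number of files selected by the scan
    num_nodes:  value of the NUM_NODES query option (0 = all nodes, 1 = single node)
    splittable: False for formats that cannot be split within a file
    """
    # Single-node execution: the whole plan runs on one host.
    if num_nodes == 1:
        return 1

    hosts = data_hosts
    # Assumption: an unsplittable file is scanned by a single host,
    # so the scan cannot use more hosts than there are files.
    if not splittable:
        hosts = min(hosts, num_files)

    # A scan always runs on at least one host.
    return max(hosts, 1)


# The table from the plans above: 3 files on 3 backends.
distributed = estimate_scan_hosts(data_hosts=3, num_files=3)
single_node = estimate_scan_hosts(data_hosts=3, num_files=3, num_nodes=1)
print(distributed, single_node)
```

Under these assumptions the single-node plan would report one host rather than three, which matches what the query summary and profile show.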



