[ 
https://issues.apache.org/jira/browse/IMPALA-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18067101#comment-18067101
 ] 

Quanlong Huang commented on IMPALA-14796:
-----------------------------------------

A runtime filter can have multiple target nodes. For example, in the TPCH-Q5 
filters above, filter 2 has two target nodes (0, 3), so a single boolean 
"effective" column is not enough. Instead, we can add a column that lists the 
ids of the target nodes on which the filter is effective (i.e., it rejected 
some data).

Uploaded a patch for review: https://gerrit.cloudera.org/c/24123/
The new "Final filter table" for TPCH-Q5:
{noformat}
    Final filter table:
 ID  Src. Node  Tgt. Node(s)  Eff. Tgt. Node(s)     Target type  Partition filter  Pending (Expected)  First arrived  Completed  Enabled  Bloom Size   Est fpp  Min value  Max value   In-list size
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 10          6             2                  2           LOCAL             false               0 (3)            N/A        N/A    false     1.00 MB  2.06e-06
  8          7             1                  1          REMOTE             false               0 (3)      743.792ms  744.190ms     true     1.00 MB  2.06e-06
  5          8             2                  2           LOCAL             false               0 (3)            N/A        N/A    false     1.00 MB  3.54e-11
  4          8             0                  N          REMOTE             false               0 (3)      733.116ms  733.521ms     true     1.00 MB  7.53e-16
  2          9          0, 3               0, 3  REMOTE, REMOTE      false, false               0 (3)      725.755ms  726.151ms     true     1.00 MB  7.53e-16
  0         10             4                  4          REMOTE             false               0 (3)      716.720ms  717.109ms     true     1.00 MB  2.79e-17
{noformat}
Note that filter 4 doesn't reject any rows, so its "Eff. Tgt. Node(s)" value 
is "N".
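
To illustrate the rule for the new column, here is a minimal Python sketch (not 
the code in the patch, which lives in the Impala backend). It assumes we 
already have, per target node, the "rejected" counters of one filter. The 
function name, the dictionary layout, and the sample numbers are hypothetical.

```python
from typing import Dict

# Kinds of data a runtime filter can reject at a scan node
# (rows/RowGroups/splits/files, as listed in the issue description).
REJECT_KINDS = ("files", "row_groups", "rows", "splits")


def effective_targets(rejected_by_node: Dict[int, Dict[str, int]]) -> str:
    """Render a hypothetical "Eff. Tgt. Node(s)" cell for one filter.

    rejected_by_node maps a target node id to its "rejected" counters.
    A node counts as effective if any of its counters is non-zero.
    """
    effective = [
        node_id
        for node_id, counters in sorted(rejected_by_node.items())
        if any(counters.get(kind, 0) > 0 for kind in REJECT_KINDS)
    ]
    # "N" marks a filter that rejected nothing on any target node.
    return ", ".join(str(n) for n in effective) if effective else "N"


# Made-up counters, for illustration only.
print(effective_targets({0: {"rows": 10}, 3: {"row_groups": 1}}))  # 0, 3
print(effective_targets({0: {"rows": 0, "row_groups": 0}}))        # N
```

A list of node ids keeps the per-target information that a single boolean 
would lose when a filter has more than one target.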

> Add "effective" column in "Final filter table" in query profile
> ---------------------------------------------------------------
>
>                 Key: IMPALA-14796
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14796
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>              Labels: ramp-up
>
> In the query profile, there is a section about runtime filters, e.g., for 
> TPCH-Q5:
> {noformat}
>     Final filter table:
>  ID  Src. Node  Tgt. Node(s)     Target type  Partition filter  Pending (Expected)  First arrived  Completed  Enabled  Bloom Size   Est fpp  Min value  Max value   In-list size
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  10          6             2           LOCAL             false               0 (3)            N/A        N/A     true     1.00 MB  2.06e-06
>   8          7             1          REMOTE             false               0 (3)      392.242ms  393.338ms     true     1.00 MB  2.06e-06
>   5          8             2           LOCAL             false               0 (3)            N/A        N/A     true     1.00 MB  3.54e-11
>   4          8             0          REMOTE             false               0 (3)      351.647ms  351.978ms     true     1.00 MB  7.53e-16
>   2          9          0, 3  REMOTE, REMOTE      false, false               0 (3)      347.219ms  347.494ms     true     1.00 MB  7.53e-16
>   0         10             4          REMOTE             false               0 (3)      342.907ms  343.293ms     true     1.00 MB  2.79e-17
> {noformat}
> It'd be helpful to add a boolean column "effective" to show whether the 
> filter actually rejects any data (rows/RowGroups/splits/files).
> Currently, we have to check the "rejected" counters of the ScanNodes, e.g.,
> {noformat}
>         Filter 2 (1.00 MB):
>            - Files processed: 0 (0)
>            - Files rejected: 0 (0)
>            - Files total: 0 (0)
>            - RowGroups processed: 1 (1)
>            - RowGroups rejected: 0 (0)
>            - RowGroups total: 1 (1)
>            - Rows processed: 150.00K (150000)
>            - Rows rejected: 119.82K (119817)
>            - Rows total: 150.00K (150000)
>            - Splits processed: 0 (0)
>            - Splits rejected: 0 (0)
>            - Splits total: 0 (0)
>         Filter 4 (1.00 MB):
>            - Files processed: 0 (0)
>            - Files rejected: 0 (0)
>            - Files total: 0 (0)
>            - RowGroups processed: 1 (1)
>            - RowGroups rejected: 0 (0)
>            - RowGroups total: 1 (1)
>            - Rows processed: 16.38K (16384)
>            - Rows rejected: 0 (0)
>            - Rows total: 30.18K (30183)
>            - Splits processed: 0 (0)
>            - Splits rejected: 0 (0)
>            - Splits total: 0 (0){noformat}
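
The manual check described in the issue can be approximated with a small 
script. This is a rough sketch that assumes the counters appear in the text 
profile exactly in the layout quoted above. The function name and regular 
expressions are my own, and the sample text is a shortened copy of the 
counters from the description.

```python
import re
from typing import Dict, Optional

# Matches a header such as "Filter 2 (1.00 MB):".
FILTER_HEADER = re.compile(r"^\s*Filter (\d+) \(")
# Matches e.g. "- Rows rejected: 119.82K (119817)" and captures the exact count.
REJECTED_LINE = re.compile(r"^\s*- (\w+) rejected: .*\((\d+)\)\s*$")


def rejected_totals(profile_text: str) -> Dict[int, int]:
    """Sum all "rejected" counters per filter id found in profile_text.

    If a filter appears under several scan nodes, its counts are added up,
    so this tells whether the filter rejected anything, not on which node.
    """
    totals: Dict[int, int] = {}
    current: Optional[int] = None
    for line in profile_text.splitlines():
        header = FILTER_HEADER.match(line)
        if header:
            current = int(header.group(1))
            totals.setdefault(current, 0)
            continue
        counter = REJECTED_LINE.match(line)
        if counter and current is not None:
            totals[current] += int(counter.group(2))
    return totals


SAMPLE = """\
        Filter 2 (1.00 MB):
           - RowGroups rejected: 0 (0)
           - Rows rejected: 119.82K (119817)
        Filter 4 (1.00 MB):
           - RowGroups rejected: 0 (0)
           - Rows rejected: 0 (0)
"""

print(rejected_totals(SAMPLE))  # {2: 119817, 4: 0}
```

A total of zero corresponds to a filter that was not effective, which is the 
case for filter 4 here.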


