[ https://issues.apache.org/jira/browse/DRILL-4706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15625907#comment-15625907 ]
Padma Penumarthy commented on DRILL-4706:
-----------------------------------------

For the data mentioned in the description of the problem, 4 nodes have 16 files each, 3 nodes have 17 files, and the other 3 nodes have 15 files, i.e. the data is not distributed equally among all nodes. With the soft affinity parallelizer, we allocate 16 fragments on each node, so each of the 3 nodes that have only 15 parquet files locally must do a remote read for one of its fragments. Those 3 remote reads of 3 row groups (512 MB * 3 ~ 1.5 GB) explain the 2% (of 70 GB) read remotely. With the local affinity parallelizer, we schedule 16 fragments on the 4 nodes, 17 on the 3 nodes, and 15 on the other 3 nodes, matching the local data. There were no remote reads in this case.

> Fragment planning causes Drillbits to read remote chunks when local copies
> are available
> ----------------------------------------------------------------------------------------
>
>                 Key: DRILL-4706
>                 URL: https://issues.apache.org/jira/browse/DRILL-4706
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.6.0
>         Environment: CentOS, RHEL
>            Reporter: Kunal Khatua
>            Assignee: Sorabh Hamirwasia
>              Labels: performance, planning
>
> When a table (datasize=70GB) of 160 parquet files (each having a single
> rowgroup and fitting within one chunk) is available on a 10-node setup with
> replication=3, a pure data scan query causes about 2% of the data to be read
> remotely.
> Even with the creation of metadata cache, the planner is selecting a
> sub-optimal plan of executing the SCAN fragments such that some of the data
> is served from a remote server.
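To make the arithmetic in the comment concrete, here is a minimal standalone sketch, not Drill's actual parallelizer code: the AffinityDemo class and node names are hypothetical, and the two strategies are reduced to simple counting. It reproduces the remote-read counts for the data layout described above (4 nodes with 16 local row groups, 3 with 17, 3 with 15).

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy comparison of the two fragment-assignment strategies described in
 * the comment. Illustrative only; not Drill's parallelizer implementation.
 */
public class AffinityDemo {
    public static void main(String[] args) {
        // Local row-group counts from the comment: 160 row groups total.
        Map<String, Integer> localRowGroups = new LinkedHashMap<>();
        for (int i = 1; i <= 4; i++) localRowGroups.put("node-a" + i, 16);
        for (int i = 1; i <= 3; i++) localRowGroups.put("node-b" + i, 17);
        for (int i = 1; i <= 3; i++) localRowGroups.put("node-c" + i, 15);

        int totalRowGroups =
            localRowGroups.values().stream().mapToInt(Integer::intValue).sum();
        int nodes = localRowGroups.size();

        // Soft affinity: spread fragments evenly, 160 / 10 = 16 per node.
        // A node with fewer local row groups than fragments reads remotely.
        int perNode = totalRowGroups / nodes;
        int softRemote = 0;
        for (int local : localRowGroups.values()) {
            softRemote += Math.max(0, perNode - local);
        }

        // Local affinity: one fragment per locally stored row group,
        // so every fragment reads local data.
        int localRemote = 0;

        System.out.println("soft affinity remote reads:  " + softRemote);   // 3
        System.out.println("local affinity remote reads: " + localRemote);  // 0
    }
}
{code}

The 3 remote reads under soft affinity come from the 3 nodes holding only 15 local row groups but assigned 16 fragments each, which is exactly the 512 MB * 3 ~ 1.5 GB (about 2% of 70 GB) measured in the description.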