[ https://issues.apache.org/jira/browse/DRILL-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068908#comment-16068908 ]

Paul Rogers commented on DRILL-5617:
------------------------------------

[~aengelbrecht], I agree completely that good performance comes from writing to
a local node. Because spill files are temporary and never outlive a query, there
is no reason to replicate them: no harm is done if the files are lost when the
Drillbit node dies.

But since we want Drill to work out of the box, even for users who start with a
less-than-ideal setup such as a shared spill directory, we'll add the node name
and port number to the file path so that spill file names are guaranteed to be
unique. (The path already includes the query ID and fragment numbers.)
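A minimal Java sketch of the naming idea above, assuming a hypothetical `spillDir` helper and a hypothetical directory layout (Drill's actual spill path format may differ, and the host names, port, and query ID are illustrative): the node name and port are placed ahead of the query ID and fragment numbers, so paths built on two different Drillbits cannot coincide even when every other component matches.

```java
// Hypothetical sketch; not Drill's actual spill-path code.
class SpillPathSketch {

    /**
     * Builds a spill directory path that stays unique when several Drillbits
     * share one spill file system. Components are joined with "/" because
     * distributed file system paths (HDFS, MapR-FS) use forward slashes.
     */
    static String spillDir(String baseDir, String host, int port,
                           String queryId, int majorFragment, int minorFragment) {
        // Host + port identifies the Drillbit, even if two run on one machine.
        String node = host + "_" + port;
        // Query ID + fragment numbers identify the work within that Drillbit.
        String fragment = queryId + "_" + majorFragment + "_" + minorFragment;
        return String.join("/", baseDir, node, fragment);
    }

    public static void main(String[] args) {
        String a = spillDir("/tmp/drill/spill", "node1", 31010, "q1", 1, 3);
        String b = spillDir("/tmp/drill/spill", "node2", 31010, "q1", 1, 3);
        System.out.println(a);           // /tmp/drill/spill/node1_31010/q1_1_3
        System.out.println(b);           // /tmp/drill/spill/node2_31010/q1_1_3
        System.out.println(a.equals(b)); // false
    }
}
```

Putting the node component first also groups each Drillbit's spill files under one directory, which makes cleaning up after a failed node straightforward.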

> Spill file name collisions when spill file is on a shared file system
> ---------------------------------------------------------------------
>
>                 Key: DRILL-5617
>                 URL: https://issues.apache.org/jira/browse/DRILL-5617
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.11.0
>            Reporter: Chun Chang
>            Assignee: Paul Rogers
>
> The spill location can be configured to be on a distributed file system such as HDFS or MapR-FS, for example:
>   hashagg: {
>     # The partitions divide the work inside the hashagg, to ease
>     # handling spilling. This initial figure is tuned down when
>     # memory is limited.
>     #  Setting this option to 1 disables spilling !
>     num_partitions: 32,
>     spill: {
>         # The 2 options below override the common ones
>         # they should be deprecated in the future
>         directories : [ "/tmp/drill/spill" ],
>         fs : "maprfs:///"
>      }
>   }
> However, this could cause spill file name conflicts, since the naming convention does
> not include the node name.



