[ https://issues.apache.org/jira/browse/DRILL-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068908#comment-16068908 ]
Paul Rogers commented on DRILL-5617:
------------------------------------

[~aengelbrecht], agree completely that good performance comes from writing to a local node. Because spill files are temporary and never outlive a query, there is no reason to replicate them: nothing is lost if the files disappear when the Drillbit node dies. But, since we want Drill to work out of the box even if people start by doing something silly, we'll just add the node name and port number to the file path to ensure that spill file names are unique. (The path already includes the query ID and fragment numbers.)

> Spill file name collisions when spill file is on a shared file system
> ---------------------------------------------------------------------
>
>                 Key: DRILL-5617
>                 URL: https://issues.apache.org/jira/browse/DRILL-5617
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.11.0
>            Reporter: Chun Chang
>            Assignee: Paul Rogers
>
> Spill location can be configured to be written on hdfs such as:
>
> hashagg: {
>   # The partitions divide the work inside the hashagg, to ease
>   # handling spilling. This initial figure is tuned down when
>   # memory is limited.
>   # Setting this option to 1 disables spilling !
>   num_partitions: 32,
>   spill: {
>     # The 2 options below override the common ones
>     # they should be deprecated in the future
>     directories : [ "/tmp/drill/spill" ],
>     fs : "maprfs:///"
>   }
> }
>
> However, this could cause spill file name conflicts, since the naming convention does not include the node name.
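To illustrate the fix described in the comment, here is a minimal sketch of a path scheme that adds the node name and port in front of the query ID and fragment numbers, so two Drillbits writing the same query fragment to one shared spill directory cannot collide. The class `SpillPathSketch`, its methods, and the directory layout are hypothetical and for illustration only; this is not Drill's actual spill-file naming code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical sketch: builds a spill path that stays unique on a shared
 * file system by prefixing the Drillbit host and port to the query ID and
 * fragment numbers. Layout and names are illustrative, not Drill's own.
 */
public class SpillPathSketch {

  /** Directory for one operator in one fragment, scoped to a single Drillbit. */
  static Path spillDir(String baseDir, String host, int port,
                       String queryId, int majorFragment, int minorFragment,
                       int operatorId) {
    // Node-level component: distinguishes Drillbits that share one file system.
    String node = host + "_" + port;
    // Fragment-level component: already unique within a single Drillbit.
    String fragment = String.format("%d_%d_op%d", majorFragment, minorFragment, operatorId);
    return Paths.get(baseDir, node, queryId, fragment);
  }

  /** Individual spill file within the operator's directory. */
  static Path spillFile(Path dir, int spillIndex) {
    return dir.resolve("spill" + spillIndex);
  }

  public static void main(String[] args) {
    // Same query, fragment, and operator, but on two different nodes.
    Path a = spillFile(spillDir("/tmp/drill/spill", "node1", 31010, "q1", 1, 3, 5), 0);
    Path b = spillFile(spillDir("/tmp/drill/spill", "node2", 31010, "q1", 1, 3, 5), 0);
    System.out.println(a);
    System.out.println(b);
    // Without the node component these two paths would be identical.
    System.out.println(a.equals(b));
  }
}
```

Putting the node component first keeps each Drillbit's spill files under its own subtree, which also makes it easy to clean up after a node that died mid-query.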