Abhishek Girish created DRILL-3230:
--------------------------------------

             Summary: Local file system plug-in must be disabled in distributed mode
                 Key: DRILL-3230
                 URL: https://issues.apache.org/jira/browse/DRILL-3230
             Project: Apache Drill
          Issue Type: Bug
          Components: Client - HTTP
            Reporter: Abhishek Girish
            Assignee: Jacques Nadeau


The local file system plug-in (the "file:///" connection string in the dfs 
storage plug-in) does not behave as expected for either CTAS or queries on 
files when Drill runs in distributed mode (multiple Drillbits across nodes). 

In the case of CTAS, Parquet files are written to the local file system of a 
specific node, depending on which Drillbit the client connects to. If the 
table is moderate to large in size, Drill may process it in a distributed 
manner and write data on more than one node, so the table ends up partitioned 
across the local file systems of different nodes. 

Queries are similarly confusing, because the behavior depends on which 
Drillbit the client connects to. The results are therefore inconsistent: a 
query may return only partial results, determined by the Drillbit the client 
is connected to.
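
The behavior described above can be illustrated with a toy model. This is only 
a sketch under stated assumptions, not Drill code: the ToyCluster class, its 
method names, the node names, and the round-robin distribution of rows are all 
hypothetical and chosen purely for illustration. Each node has a private local 
file system, a distributed CTAS scatters rows across nodes, and a query sees 
only what is stored on the node the client connected to.

```python
from collections import defaultdict


class ToyCluster:
    """Toy model (not Drill): every node owns a private local file system."""

    def __init__(self, nodes):
        # node name -> {table name -> rows stored on that node's local disk}
        self.local_fs = {node: defaultdict(list) for node in nodes}

    def ctas(self, table, rows):
        # Assumption: the write is distributed round-robin, one writer per node.
        nodes = list(self.local_fs)
        for i, row in enumerate(rows):
            self.local_fs[nodes[i % len(nodes)]][table].append(row)

    def query(self, table, connected_node):
        # Assumption: only the connected node's file:/// path is read.
        return list(self.local_fs[connected_node].get(table, []))


cluster = ToyCluster(["node1", "node2", "node3"])
cluster.ctas("t", list(range(9)))

print(cluster.query("t", "node1"))  # [0, 3, 6]
print(cluster.query("t", "node2"))  # [1, 4, 7]
```

In this model no single node holds the complete table, so the answer changes 
with the connection, which matches the partial results reported here.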

My suggestion is that the local file system plug-in be disabled in 
distributed mode. With multiple Drillbits and a centralized plug-in for the 
local file system, consistent behavior cannot be expected. 

It should either be disabled when distributed mode is detected, or we could 
add support for multiple namespaces (using the IP of each node) for local file 
systems (which might still not fix all issues). There may also be other ways 
to resolve this that I am overlooking or not aware of. 

Users have reported many issues on the user mailing list involving 
inconsistent behavior.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
