[ 
https://issues.apache.org/jira/browse/DRILL-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-2424.
--------------------------------
    Resolution: Duplicate
      Assignee: Mehant Baid  (was: Steven Phillips)

> Ignore hidden files in directory path
> -------------------------------------
>
>                 Key: DRILL-2424
>                 URL: https://issues.apache.org/jira/browse/DRILL-2424
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JSON, Storage - Text & CSV
>    Affects Versions: 0.7.0
>            Reporter: Andries Engelbrecht
>            Assignee: Mehant Baid
>             Fix For: 1.2.0
>
>
> When streaming data to the DFS, some records can be incomplete during the 
> temporary write phase for the last file(s). These files typically have a 
> different extension such as '.tmp', or are marked hidden with a '.' prefix.
> Querying the directory path with Drill then causes a query error, because 
> some records in the temporary files may be incomplete. The ability to have 
> Drill ignore hidden files and/or read only files with designated extensions 
> in the workspace would resolve this problem.
> For example, when Flume streams JSON files to a directory structure, the 
> HDFS sink creates .tmp files (which can be hidden with a '.' prefix) that 
> contain incomplete JSON objects until the file is closed and the .tmp 
> extension (or prefix) is removed. Attempting to query the directory structure 
> with Drill then results in errors due to the incomplete JSON object(s) in the 
> .tmp files.
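
The filtering requested above can be sketched outside Drill as a plain
path filter. This is only an illustration of the idea, not Drill's
implementation: the function names, the '.' prefix rule, the '.tmp' suffix
rule, and the optional extension whitelist are all assumptions taken from the
conventions named in the description.

```python
from pathlib import PurePosixPath

# Assumed conventions (not Drill settings): a leading "." marks a hidden file,
# and a trailing ".tmp" marks a file that is still being written.
HIDDEN_PREFIXES = (".",)
IN_PROGRESS_SUFFIXES = (".tmp",)


def is_queryable(path, allowed_extensions=None):
    """Return True if the file at `path` looks safe to include in a scan."""
    name = PurePosixPath(path).name
    # Skip hidden files such as ".events.3.json.tmp".
    if name.startswith(HIDDEN_PREFIXES):
        return False
    # Skip files still in the temporary write phase.
    if name.endswith(IN_PROGRESS_SUFFIXES):
        return False
    # Optionally keep only files whose final extension is whitelisted.
    if allowed_extensions is not None:
        return PurePosixPath(name).suffix.lstrip(".") in allowed_extensions
    return True


def queryable_files(paths, allowed_extensions=None):
    """Filter a directory listing down to files that pass is_queryable."""
    return [p for p in paths if is_queryable(p, allowed_extensions)]
```

Called on a listing that mixes closed files with in-progress ones, the filter
keeps only the closed files; passing allowed_extensions={"json"} narrows the
result further to JSON files. The sketch checks only the file name, so a
hidden parent directory would need a separate check.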



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
