[ 
https://issues.apache.org/jira/browse/DRILL-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902288#comment-14902288
 ] 

Michael England commented on DRILL-2424:
----------------------------------------

I have also run into issues relating to the last paragraph of this feature. If 
Flume writes to a .tmp file and renames it during a Drill query, the query 
fails. As Drill is a very useful tool for querying files in real time, 
especially files that are streamed in, I'd like the feature described above, 
or at least the ability to ignore certain files (maybe via a regex filter).
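
To illustrate the kind of regex filter meant here, a minimal sketch follows. 
It is not Drill code and not an existing Drill option: the pattern, the 
function names and the sample paths are all hypothetical, and it assumes the 
only files to skip are those with a '.' prefix or a '.tmp' extension.

```python
import re

# Hypothetical ignore pattern (an assumption, not a Drill setting):
# skip names that start with '.' or end with '.tmp'.
IGNORE_PATTERN = re.compile(r"\..*|.*\.tmp")


def is_queryable(path, ignore=IGNORE_PATTERN):
    """Return True if the file name (last path component) is not ignored."""
    name = path.rsplit("/", 1)[-1]
    return ignore.fullmatch(name) is None


def filter_files(paths, ignore=IGNORE_PATTERN):
    """Keep only the paths whose file name does not match the ignore pattern."""
    return [p for p in paths if is_queryable(p, ignore)]


if __name__ == "__main__":
    # Hypothetical directory listing from a Flume HDFS sink.
    listing = [
        "/flume/events/a.json",
        "/flume/events/.b.json.tmp",
        "/flume/events/c.json.tmp",
    ]
    print(filter_files(listing))
```

A filter like this only checks the file name at listing time and does not 
look at hidden parent directories. It would probably not help on its own if 
a file is renamed after the listing but before it is read, so that case may 
need separate handling.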

> Ignore hidden files in directory path
> -------------------------------------
>
>                 Key: DRILL-2424
>                 URL: https://issues.apache.org/jira/browse/DRILL-2424
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JSON, Storage - Text & CSV
>    Affects Versions: 0.7.0
>            Reporter: Andries Engelbrecht
>            Assignee: Steven Phillips
>             Fix For: 1.2.0
>
>
> When streaming data to the DFS, some records can be incomplete during the 
> temporary write phase for the last file(s). These files typically have a 
> different extension like '.tmp' or can be marked hidden with a prefix of '.'.
> Querying the directory path with Drill will then cause a query error, as some 
> records may not be complete in the temporary files. Having the ability to 
> have Drill ignore hidden files and/or to only read files of a designated 
> extension in the workspace will resolve this problem.
> An example is using Flume to stream JSON files to a directory structure: the 
> HDFS sink creates .tmp files (which can be hidden with a . prefix) that 
> contain incomplete JSON objects until the file is closed and the .tmp 
> extension (or prefix) is removed. Attempting to query the directory structure 
> with Drill then results in errors due to the incomplete JSON object(s) in the 
> tmp files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
