[ https://issues.apache.org/jira/browse/DRILL-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Westin updated DRILL-2424:
--------------------------------
    Fix Version/s:     (was: 1.0.0)
                   1.2.0

> Ignore hidden files in directory path
> -------------------------------------
>
>                 Key: DRILL-2424
>                 URL: https://issues.apache.org/jira/browse/DRILL-2424
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JSON, Storage - Text & CSV
>    Affects Versions: 0.7.0
>            Reporter: Andries Engelbrecht
>            Assignee: Steven Phillips
>             Fix For: 1.2.0
>
>
> When streaming data to the DFS, some records can be incomplete during the
> temporary write phase for the last file(s). These files typically have a
> different extension such as '.tmp', or are marked hidden with a '.' prefix.
> Querying the directory path with Drill then causes a query error, because
> some records in the temporary files may be incomplete. The ability to have
> Drill ignore hidden files and/or read only files with a designated
> extension in the workspace would resolve this problem.
> An example is using Flume to stream JSON files to a directory structure: the
> HDFS sink creates .tmp files (which can be hidden with a '.' prefix) that
> contain incomplete JSON objects until the file is closed and the .tmp
> extension (or prefix) is removed. Attempting to query the directory structure
> with Drill then results in errors due to the incomplete JSON object(s) in the
> tmp files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
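
The filtering rule requested in this issue can be illustrated with a minimal sketch. This is not Drill's implementation; the function names `is_queryable` and `filter_queryable`, the `allowed_extensions` parameter, and the sample file names are hypothetical and exist only to show the two conditions the description asks for: skip files hidden with a '.' prefix, and optionally read only files with a designated extension.

```python
import os
from typing import Iterable, List, Optional


def is_queryable(file_name: str,
                 allowed_extensions: Optional[Iterable[str]] = None) -> bool:
    """Return True if a file should be included in a directory scan.

    Hypothetical helper, not part of Drill. A file is skipped when its
    base name starts with '.', or when an extension list is given and
    the file's final extension is not in that list.
    """
    base = os.path.basename(file_name)
    # Rule 1: ignore hidden files (names with a '.' prefix).
    if base.startswith("."):
        return False
    # Rule 2: with no extension list, every non-hidden file is accepted.
    if allowed_extensions is None:
        return True
    # Compare only the final extension, so 'a.json.tmp' counts as '.tmp'.
    ext = os.path.splitext(base)[1].lower()
    return ext in {e.lower() for e in allowed_extensions}


def filter_queryable(file_names: Iterable[str],
                     allowed_extensions: Optional[Iterable[str]] = None) -> List[str]:
    """Keep only the file names that pass is_queryable, preserving order."""
    return [n for n in file_names if is_queryable(n, allowed_extensions)]


if __name__ == "__main__":
    # Sample names resembling a streaming sink directory (made up for illustration).
    names = ["events.1.json", ".events.2.json.tmp", "events.3.json.tmp", "data.csv"]
    print(filter_queryable(names, [".json"]))
    print(filter_queryable(names))
```

With an extension list of `[".json"]`, both the hidden file and the visible `.tmp` file are excluded. Without an extension list, only the hidden file is excluded, so a visible `.tmp` file would still be read; this is why the description asks for both options.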