[ 
https://issues.apache.org/jira/browse/IMPALA-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968502#comment-16968502
 ] 

ASF subversion and git services commented on IMPALA-8557:
---------------------------------------------------------

Commit 8b8a49e617818e9bcf99b784b63587c95cebd622 in impala's branch 
refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8b8a49e ]

IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Writes to text tables on ABFS are failing because HADOOP-15860 recently
changed the ABFS behavior when writing files / folders that end with a
'.'. ABFS explicitly does not allow files / folders that end with a dot.
From the ABFS docs: "Avoid blob names that end with a dot (.), a forward
slash (/), or a sequence or combination of the two."

The behavior prior to HADOOP-15860 was to simply drop any trailing dots
when writing files or folders, but that can lead to various issues
because clients may try to read back a file that should exist on ABFS,
but doesn't. HADOOP-15860 changed the behavior so that any attempt to
write a file or folder with a trailing dot fails on ABFS.
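
The two behaviors described above can be contrasted with a small sketch.
This is an illustrative model only, not code from the ABFS driver (which is
written in Java); the function names DropTrailingDots and RejectTrailingDot
are hypothetical, and the exception type is an assumption chosen to mirror
the IllegalArgumentException quoted in the issue title.

```cpp
#include <stdexcept>
#include <string>

// Model of the behavior before HADOOP-15860: trailing dots are silently
// removed, so the stored name can differ from the name the client asked for.
std::string DropTrailingDots(std::string path) {
  while (!path.empty() && path.back() == '.') path.pop_back();
  return path;
}

// Model of the behavior after HADOOP-15860: a path ending in a dot is
// rejected outright instead of being rewritten.
std::string RejectTrailingDot(const std::string& path) {
  if (!path.empty() && path.back() == '.') {
    throw std::invalid_argument(
        "ABFS does not allow files or directories to end with a dot.");
  }
  return path;
}
```

With the first function, a client that writes "data.0." and later reads
"data.0." back is looking for a name that was never stored; the second
function surfaces the problem at write time instead.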

Impala writes all text files with a trailing dot due to some odd
behavior in hdfs-table-sink.cc. The table sink writes files with
a "file extension" which is dependent on the file type. For example,
Parquet files have a file extension of ".parq". For some reason, text
files had no file extension, so Impala would try to write text files of
the following form:
"244c5ee8ece6f759-8b1a1e3b00000000_45513034_data.0.".

Several tables created during dataload, such as alltypes, already use
the '.txt' extension for their files. These tables are not created via
Impala's INSERT code path; their files are copied into the table.
However, there are several tables created during dataload, such as
alltypesinsert, that are created via Impala. This patch will change
the files in these tables so that they end in '.txt'.

This patch adds the ".txt" extension to all written text files and
modifies the hdfs-table-sink.cc so that it doesn't add a trailing dot to
a filename if there is no file extension.
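
A minimal sketch of the naming rule described above, assuming the
"$0.$1.$2" layout mentioned in the issue (base name, sequence number,
extension). The helper name BuildOutputFileName and its parameters are
hypothetical and are not taken from hdfs-table-sink.cc; the extension is
assumed to be passed without a leading dot.

```cpp
#include <string>

// Hypothetical helper: builds "<base>.<seq>" and appends ".<ext>" only when
// an extension is present, so an empty extension never leaves a trailing dot.
std::string BuildOutputFileName(const std::string& base, int seq,
                                const std::string& ext) {
  std::string name = base + "." + std::to_string(seq);
  if (!ext.empty()) {
    name += "." + ext;
  }
  return name;
}
```

Under these assumptions, an empty extension yields a name such as
"..._data.0" rather than "..._data.0.", and a "txt" extension yields
"..._data.0.txt".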

Testing:
* Ran core tests
* Re-ran affected ABFS tests
* Added test to validate that the correct file extension is used for
Parquet and text tables
* Manually validated that without the addition of the '.txt' file
extension, files are not written with a trailing dot

Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
Reviewed-on: http://gerrit.cloudera.org:8080/14621
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Impala on ABFS failed with error "IllegalArgumentException: ABFS does not 
> allow files or directories to end with a dot."
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-8557
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8557
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: Eric Lin
>            Assignee: Sahil Takiar
>            Priority: Major
>
> Hadoop introduced the change below to stop users from creating a file whose 
> name ends with "." on ABFS:
> https://issues.apache.org/jira/browse/HADOOP-15860
> As a result of this change, Impala writes to ABFS now fail with this error.
> I can see that it generates temp file using this format "$0.$1.$2":
> https://github.com/cloudera/Impala/blob/cdh6.2.0/be/src/exec/hdfs-table-sink.cc#L329
> $2 is the file extension and will be empty if it is TEXT file format:
> https://github.com/cloudera/Impala/blob/cdh6.2.0/be/src/exec/hdfs-text-table-writer.cc#L65
> Since HADOOP-15860 was backported into CDH6.2, it is currently only affecting 
> 6.2 and works in older versions.
> There is no way to override this empty file extension, so no workaround is 
> possible unless the user chooses another file format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

