[https://issues.apache.org/jira/browse/HDDS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719011#comment-17719011]
Duong commented on HDDS-8496:
-----------------------------
The YARN mapper logs indicate why the rename fails: "destination parent is not
a directory". This is thrown by
[S3AFileSystem|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2154].
{code:java}
2023-05-02 20:39:25,854 [INFO] [TezChild] |s3a.S3AFileSystem|: rename `s3a://obs-bucket-link/mytable1/.hive-staging_hive_2023-05-02_20-38-56_222_3823760491772440442-10/_task_tmp.-ext-10000/_tmp.000000_3' to `s3a://obs-bucket-link/mytable1/.hive-staging_hive_2023-05-02_20-38-56_222_3823760491772440442-10/_tmp.-ext-10000/000000_3': destination parent is not a directory
{code}
S3AFileSystem relies on S3 returning NOT_FOUND from the HEAD operation on the
destination parent key.
{code:java}
try {
  // make sure parent isn't a file.
  // don't look for parent being a dir as there is a risk
  // of a race between dest dir cleanup and rename in different
  // threads.
  S3AFileStatus dstParentStatus = innerGetFileStatus(parent,
      false, StatusProbeEnum.FILE);
  // if this doesn't raise an exception then
  // the parent is a file or a dir.
  if (!dstParentStatus.isDirectory()) {
    throw new RenameFailedException(src, dst,
        "destination parent is not a directory");
  }
} catch (FileNotFoundException expected) {
  // nothing was found. Don't worry about it;
  // expect rename to implicitly create the parent dir
}
{code}
In the S3G audit log:
{code:java}
2023-05-02 20:38:56,682 | ERROR | S3GAudit | user=asdsadsa | ip=172.27.14.194 | op=HEAD_KEY {bucket=[obs-bucket-link], path=[mytable1/.hive-staging_hive_2023-05-02_20-38-56_222_3823760491772440442-10/_tmp.-ext-10000]} | ret=FAILURE | KEY_NOT_FOUND
org.apache.hadoop.ozone.om.exceptions.OMException: Key:mytable1/.hive-staging_hive_2023-05-02_20-38-56_222_3823760491772440442-10/_tmp.-ext-10000 not found
2023-05-02 20:38:56,722 | INFO | S3GAudit | user=asdsadsa | ip=172.27.14.194 | op=CREATE_KEY {bucket=[obs-bucket-link], path=[mytable1/.hive-staging_hive_2023-05-02_20-38-56_222_3823760491772440442-10/_tmp.-ext-10000/]} | ret=SUCCESS |
2023-05-02 20:39:07,885 | INFO | S3GAudit | user=duong | ip=172.27.129.7 | op=HEAD_KEY {bucket=[obs-bucket-link], path=[mytable1/.hive-staging_hive_2023-05-02_20-38-56_222_3823760491772440442-10/_tmp.-ext-10000]} | ret=SUCCESS |{code}
The destination parent directory is created as a fake key
"mytable1/.hive-staging_hive_2023-05-02_20-38-56_222_3823760491772440442-10/_tmp.-ext-10000/"
(ending with "/").
The head op on the key
"mytable1/.hive-staging_hive_2023-05-02_20-38-56_222_3823760491772440442-10/_tmp.-ext-10000"
(without the trailing "/") is expected to return 404, but S3G returns 200. That's the root cause.
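To make the expected contract concrete, here is a minimal, self-contained sketch (with shortened, hypothetical key names standing in for the .hive-staging paths above; this is not Ozone code) of the strict S3 HEAD semantics that S3A depends on: an exact-key lookup against the flat namespace, with no trailing-slash fallback.
{code:java}
import java.util.Set;

public class HeadKeySemantics {
  // Strict S3 semantics: HEAD is an exact-key lookup against the
  // flat key namespace; there is no implicit trailing-slash retry.
  static int headStatus(Set<String> keys, String key) {
    return keys.contains(key) ? 200 : 404;
  }

  public static void main(String[] args) {
    // Hypothetical store holding only the fake directory-marker key
    // (ending with "/"), as in the CREATE_KEY audit entry above.
    Set<String> store = Set.of("mytable1/stagingdir/");

    // The slash-less HEAD must miss, which is what S3A expects ...
    System.out.println(headStatus(store, "mytable1/stagingdir"));  // 404
    // ... while the exact marker key is a hit.
    System.out.println(headStatus(store, "mytable1/stagingdir/")); // 200
  }
}
{code}
Under these semantics the `innerGetFileStatus(parent, false, StatusProbeEnum.FILE)` probe above falls into the FileNotFoundException branch and the rename proceeds; the bug is that S3G answers 200 here instead.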
This happens because HDDS-7419 brought the following logic from OFS into the
S3 path, as the price of unifying the getKeyInfo API for both FS and
object-storage scenarios:
{code:java}
private OmKeyInfo getOmKeyInfoDirectoryAware(String volumeName,
    String bucketName, String keyName) throws IOException {
  OmKeyInfo keyInfo = getOmKeyInfo(volumeName, bucketName, keyName);
  // Check if the key is a directory.
  if (keyInfo != null) {
    keyInfo.setFile(true);
    return keyInfo;
  }
  String dirKey = OzoneFSUtils.addTrailingSlashIfNeeded(keyName);
  OmKeyInfo dirKeyInfo = getOmKeyInfo(volumeName, bucketName, dirKey);
  if (dirKeyInfo != null) {
    dirKeyInfo.setFile(false);
  }
  return dirKeyInfo;
}
{code}
If the original key is not hit, it automatically retries the lookup as a
folder by appending a trailing "/".
A feasible fix, from the S3G perspective: if the KeyInfo returned by OM
indicates a directory, S3G should return NOT_FOUND.
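A self-contained sketch of that idea (stand-in types, not the real OmKeyInfo/OMException classes, and a hypothetical helper name): when the directory-aware lookup resolved the key only as a directory and the requested key has no trailing slash, the S3 HEAD should still surface a not-found result.
{code:java}
import java.io.FileNotFoundException;

public class S3GHeadFix {
  // Stand-in for OmKeyInfo; the real class carries much more state.
  record KeyInfo(String name, boolean isFile) {}

  // Hypothetical S3G-side check applied after the directory-aware lookup.
  static KeyInfo headForS3(KeyInfo resolved, String requestedKey)
      throws FileNotFoundException {
    if (resolved == null
        || (!resolved.isFile() && !requestedKey.endsWith("/"))) {
      // Strict S3 semantics: only an exact object key is a hit,
      // so report KEY_NOT_FOUND (404) instead of 200.
      throw new FileNotFoundException(
          "Key:" + requestedKey + " not found");
    }
    return resolved;
  }

  public static void main(String[] args) throws Exception {
    KeyInfo dirMarker = new KeyInfo("mytable1/stagingdir/", false);
    try {
      headForS3(dirMarker, "mytable1/stagingdir");
    } catch (FileNotFoundException e) {
      System.out.println("404 as S3A expects: " + e.getMessage());
    }
  }
}
{code}
The real change would sit in S3G's HEAD handling rather than in a standalone helper; the point is only the extra directory check before answering 200, which restores the contract S3AFileSystem's rename precondition relies on.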
> Hive with s3a connector for OBS bucket hits error
> -------------------------------------------------
>
> Key: HDDS-8496
> URL: https://issues.apache.org/jira/browse/HDDS-8496
> Project: Apache Ozone
> Issue Type: Sub-task
> Components: Ozone Manager
> Affects Versions: 1.4.0
> Reporter: Saketa Chalamchala
> Assignee: Sumit Agrawal
> Priority: Blocker
> Labels: proton
>
> Error seen:
> {code:java}
> ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: s3a://obs-bucket-link/mytable1/.hive-staging_hive_2023-04-23_07-38-52_965_8384866535295320062-1/_task_tmp.-ext-10000/_tmp.000000_3 to: s3a://obs-bucket-link/mytable1/.hive-staging_hive_2023-04-23_07-38-52_965_8384866535295320062-1/_tmp.-ext-10000/000000_3
>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commitOneOutPath(FileSinkOperator.java:296)
>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:254)
>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$400(FileSinkOperator.java:157)
>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1458)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:731)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755)
>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:482)
> ... 17 more
> {code}
>
> Steps to reproduce
>
> {code:java}
> ozone sh volume create /vol1
> ozone sh bucket create /vol1/obs-bucket --layout OBJECT_STORE
> ozone sh bucket link /vol1/obs-bucket /s3v/obs-bucket-link
> beeline -u "jdbc:hive2://<host>:10000/default;principal=hive/<host>@<REALM>;ssl=true;sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks;sslTrustStorePassword=<redacted>"
> > create external table mytable1(key string, value int) location 's3a://obs-bucket-link/mytable1';
> > insert into mytable1 values("cldr",1);
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)