[ 
https://issues.apache.org/jira/browse/HDDS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719011#comment-17719011
 ] 

Duong commented on HDDS-8496:
-----------------------------

The YARN mapper logs show why the rename fails: "destination parent is not 
a directory". The exception is thrown by 
[S3AFileSystem|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2154].
{code:java}
2023-05-02 20:39:25,854 [INFO] [TezChild] |s3a.S3AFileSystem|: rename 
`s3a://obs-bucket-link/mytable1/.hive-staging_hive_2023-05-02_20-38-56_222_3823760491772440442-10/_task_tmp.-ext-10000/_tmp.000000_3'
 to 
`s3a://obs-bucket-link/mytable1/.hive-staging_hive_2023-05-02_20-38-56_222_3823760491772440442-10/_tmp.-ext-10000/000000_3':
 destination parent is not a directory
{code}
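S3AFileSystem relies on S3 returning NOT_FOUND for the HEAD operation on the 
destination parent key. When the probe finds an object at that key, the parent 
is not a directory and the rename is rejected: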
{code:java}
        try {
          // make sure parent isn't a file.
          // don't look for parent being a dir as there is a risk
          // of a race between dest dir cleanup and rename in different
          // threads.
          S3AFileStatus dstParentStatus = innerGetFileStatus(parent,
              false, StatusProbeEnum.FILE);
          // if this doesn't raise an exception then
          // the parent is a file or a dir.
          if (!dstParentStatus.isDirectory()) {
            throw new RenameFailedException(src, dst,
                "destination parent is not a directory");
          }
        } catch (FileNotFoundException expected) {
          // nothing was found. Don't worry about it;
          // expect rename to implicitly create the parent dir
        }
{code}
 

The S3G audit log shows the sequence of operations on the destination parent:
{code:java}
2023-05-02 20:38:56,682 | ERROR | S3GAudit | user=asdsadsa | ip=172.27.14.194 | 
op=HEAD_KEY {bucket=[obs-bucket-link], 
path=[mytable1/.hive-staging_hive_2023-05-02_20-38-56_222_3823760491772440442-10/_tmp.-ext-10000]}
 | ret=FAILURE | KEY_NOT_FOUND 
org.apache.hadoop.ozone.om.exceptions.OMException: 
Key:mytable1/.hive-staging_hive_2023-05-02_20-38-56_222_3823760491772440442-10/_tmp.-ext-10000
 not found


2023-05-02 20:38:56,722 | INFO  | S3GAudit | user=asdsadsa | ip=172.27.14.194 | 
op=CREATE_KEY {bucket=[obs-bucket-link], 
path=[mytable1/.hive-staging_hive_2023-05-02_20-38-56_222_3823760491772440442-10/_tmp.-ext-10000/]}
 | ret=SUCCESS |


2023-05-02 20:39:07,885 | INFO  | S3GAudit | user=duong | ip=172.27.129.7 | 
op=HEAD_KEY {bucket=[obs-bucket-link], 
path=[mytable1/.hive-staging_hive_2023-05-02_20-38-56_222_3823760491772440442-10/_tmp.-ext-10000]}
 | ret=SUCCESS |{code}
The destination parent directory is created as a fake directory key 
"mytable1/.hive-staging_hive_2023-05-02_20-38-56_222_3823760491772440442-10/_tmp.-ext-10000/"
 (ending with /).

A HEAD operation on the key 
"mytable1/.hive-staging_hive_2023-05-02_20-38-56_222_3823760491772440442-10/_tmp.-ext-10000"
 (not ending with /) is expected to return a 404, but S3G returns 200. That is 
the root cause.
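
To make the mismatch concrete, here is a minimal, self-contained sketch of the two lookup behaviours over a flat key namespace. It is not Ozone or S3A code: the class `HeadKeySemantics`, its methods, and the in-memory key set are hypothetical, and the key names are shortened for readability.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of a flat object-store namespace; not Ozone code. */
public class HeadKeySemantics {
  private final Set<String> keys = new HashSet<>();

  public void put(String key) {
    keys.add(key);
  }

  /** S3-style HEAD: only an exact key match counts as found. */
  public boolean headExact(String key) {
    return keys.contains(key);
  }

  /** Directory-aware HEAD: retries with a trailing "/" when the exact key is missing. */
  public boolean headDirectoryAware(String key) {
    if (keys.contains(key)) {
      return true;
    }
    String dirKey = key.endsWith("/") ? key : key + "/";
    return keys.contains(dirKey);
  }

  public static void main(String[] args) {
    HeadKeySemantics store = new HeadKeySemantics();
    // The parent directory exists only as a fake key ending with "/".
    store.put("mytable1/_tmp.-ext-10000/");

    String parent = "mytable1/_tmp.-ext-10000";
    System.out.println("exact: " + store.headExact(parent));
    System.out.println("directory-aware: " + store.headDirectoryAware(parent));
  }
}
```

In this model the exact lookup misses the parent key (the 404 that S3A expects), while the directory-aware lookup finds the fake directory key (the 200 that S3G currently returns).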

 

This happens because HDDS-7419 brought the following logic from OFS to S3, as 
the price of unifying the getKeyInfo API for both FS and object-storage 
scenarios:
{code:java}
private OmKeyInfo getOmKeyInfoDirectoryAware(String volumeName,
          String bucketName, String keyName) throws IOException {
  OmKeyInfo keyInfo = getOmKeyInfo(volumeName, bucketName, keyName);

  // Check if the key is a directory.
  if (keyInfo != null) {
    keyInfo.setFile(true);
    return keyInfo;
  }

  String dirKey = OzoneFSUtils.addTrailingSlashIfNeeded(keyName);
  OmKeyInfo dirKeyInfo = getOmKeyInfo(volumeName, bucketName, dirKey);
  if (dirKeyInfo != null) {
    dirKeyInfo.setFile(false);
  }
  return dirKeyInfo;
} {code}
It automatically looks up a directory by appending a trailing "/" when the 
original key is not found. 

A feasible fix is on the S3G side: if the KeyInfo returned by OM indicates a 
directory, S3G should return NOT_FOUND. 
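
A rough sketch of that idea follows, assuming the gateway can tell from the returned key info whether the entry is a file. The class `S3HeadFixSketch`, the `KeyInfo` stand-in, and `headStatus` are hypothetical names for illustration only; the real S3G endpoint and OmKeyInfo types are not reproduced here.

```java
/** Hypothetical sketch of the proposed S3G behaviour; not the actual patch. */
public class S3HeadFixSketch {

  /** Stand-in for the key info returned by OM (assumption, not OmKeyInfo). */
  public static final class KeyInfo {
    final String name;
    final boolean isFile;

    public KeyInfo(String name, boolean isFile) {
      this.name = name;
      this.isFile = isFile;
    }
  }

  /**
   * Maps a lookup result to the HTTP status of a HEAD request.
   * A missing key and a directory hit are both reported as 404.
   */
  public static int headStatus(KeyInfo info) {
    if (info == null || !info.isFile) {
      return 404;
    }
    return 200;
  }

  public static void main(String[] args) {
    // Directory-aware lookup matched the fake key ending with "/".
    KeyInfo dirHit = new KeyInfo("mytable1/_tmp.-ext-10000/", false);
    // Exact lookup matched a regular object.
    KeyInfo fileHit = new KeyInfo("mytable1/000000_3", true);

    System.out.println("directory hit: " + headStatus(dirHit));
    System.out.println("file hit: " + headStatus(fileHit));
    System.out.println("miss: " + headStatus(null));
  }
}
```

With a 404 for the directory hit, the S3A check shown earlier would take its FileNotFoundException branch and let the rename proceed.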

> Hive with s3a connector for OBS bucket hits error
> -------------------------------------------------
>
>                 Key: HDDS-8496
>                 URL: https://issues.apache.org/jira/browse/HDDS-8496
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: Ozone Manager
>    Affects Versions: 1.4.0
>            Reporter: Saketa Chalamchala
>            Assignee: Sumit Agrawal
>            Priority: Blocker
>              Labels: proton
>
> Error: seen
> {code:java}
>       ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename 
> output from: 
> [s3a://obs-bucket-link/mytable1/.hive-staging_hive_2023-04-23_07-38-52_965_8384866535295320062-1/_task_tmp.-ext-10000/_tmp.000000_3]
>  to: 
> [s3a://obs-bucket-link/mytable1/.hive-staging_hive_2023-04-23_07-38-52_965_8384866535295320062-1/_tmp.-ext-10000/000000_3]
>       at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commitOneOutPath(FileSinkOperator.java:296)
>       at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:254)
>       at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$400(FileSinkOperator.java:157)
>       at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1458)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:731)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:482)
>       ... 17 more
> {code}
>  
> Steps to reproduce 
>  
> {code:java}
> ozone volume create /vol1
> ozone sh bucket create /vol1/obs-bucket --layout OBJECT_STORE
> ozone sh bucket link /vol1/obs-bucket /s3v/obs-bucket-link
> beeline -u "jdbc:hive2://<host>:10000/default;principal=hive/<host>@<REALM>;ssl=true;sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks;sslTrustStorePassword=<redacted>"
> > create external table mytable1(key string, value int) location 's3a://obs-bucket-link/mytable1';
> > insert into mytable1 values("cldr",1);
> {code}


