[GitHub] [hadoop-ozone] smengcl commented on pull request #1144: HDDS-3803. [OFS] Add User Guide

2020-06-29 Thread GitBox


smengcl commented on pull request #1144:
URL: https://github.com/apache/hadoop-ozone/pull/1144#issuecomment-651539858


   Thanks @xiaoyuyao  for the review and comment. I have updated the doc. 
Please take another look.






[GitHub] [hadoop-ozone] smengcl merged pull request #1134: HDDS-3868. Implement getTrashRoot and getTrashRoots in o3fs

2020-06-29 Thread GitBox


smengcl merged pull request #1134:
URL: https://github.com/apache/hadoop-ozone/pull/1134


   






[GitHub] [hadoop-ozone] smengcl commented on a change in pull request #1144: HDDS-3803. [OFS] Add User Guide

2020-06-29 Thread GitBox


smengcl commented on a change in pull request #1144:
URL: https://github.com/apache/hadoop-ozone/pull/1144#discussion_r447411835



##
File path: hadoop-hdds/docs/content/design/ofs.md
##
@@ -22,12 +22,140 @@ author: Siyao Meng
 
 # Abstract
 
-  Existing scheme: o3fs://bucket.volume/key/../...
+  Scheme: ofs://<om host or service id>/[<volume>/<bucket>/path/to/key]
+
+# The Basics
+
+Examples of valid OFS paths:
+
+```
+ofs://om1/
+ofs://om3:9862/
+ofs://omservice/
+ofs://omservice/volume1/
+ofs://omservice/volume1/bucket1/
+ofs://omservice/volume1/bucket1/dir1
+ofs://omservice/volume1/bucket1/dir1/key1
+
+ofs://omservice/tmp/
+ofs://omservice/tmp/key1
+```
+
+Volumes and mount(s) are located at the root level of an OFS filesystem.
+Buckets are listed naturally under volumes.
+Keys and directories are under each bucket.
+
+Note that for mounts, only temp mount `/tmp` is supported at the moment.
+
+# Differences from existing o3fs
+
+## Creating files
+
+OFS doesn't allow creating keys (files) directly under root or volumes.
+Users will receive an error message when they try to do that:
+
+```
+$ ozone fs -touch /volume1/key1
+touch: Cannot create file under root or volume.
+```
+
+## Simplify fs.defaultFS
+
+With OFS, fs.defaultFS (in core-site.xml) no longer needs to have a specific
+volume and bucket in its path like o3fs did.
+Simply put the OM host or service ID:
+
+```
+<property>
+  <name>fs.defaultFS</name>
+  <value>ofs://omservice</value>
+</property>
+```
+
+The client would then be able to access every volume and bucket on the cluster
+without specifying the hostname or service ID.
+
+```
+$ ozone fs -mkdir -p /volume1/bucket1
+```
+
+## Volume and bucket management directly from FileSystem shell
+
+Admins can create and delete volumes and buckets easily with Hadoop FS shell.
+Volumes and buckets are treated similarly to directories, so with `-p` they
+will be created if they don't exist:
+
+```
+$ ozone fs -mkdir -p ofs://omservice/volume1/bucket1/dir1/
+```
+
+Note that the volume and bucket name character set rules still apply.
+For instance, bucket and volume names don't allow underscores (`_`):
+
+```
+$ ozone fs -mkdir -p /volume_1
+mkdir: Bucket or Volume name has an unsupported character : _
+```
+
+# Mounts
+
+In order to be compatible with legacy Hadoop applications that use /tmp/,
+we have a special temp mount located at the root of the FS.
+This feature may be expanded in the future to support custom mount paths.
+
+Important: To use it, first, an **admin** needs to create the volume tmp
+(the volume name is hardcoded for now) and set its ACL to world ALL access.
+Namely:
+
+```
+$ ozone sh volume create tmp
+$ ozone sh volume setacl tmp -al world::a
+```
+
+These commands only need to be run **once per cluster**.
+
+Then, **each user** needs to run mkdir once to initialize their own temp
+bucket.
+
+```
+$ ozone fs -mkdir /tmp
+2020-06-04 00:00:00,050 [main] INFO rpc.RpcClient: Creating Bucket: tmp/0238 
...
+```
+
+After that, they can write to it just like they would to a regular
+directory, e.g.:
+
+```
+$ ozone fs -touch /tmp/key1
+```
+
+# Delete to trash

Review comment:
   done








[GitHub] [hadoop-ozone] smengcl commented on pull request #1134: HDDS-3868. Implement getTrashRoot and getTrashRoots in o3fs

2020-06-29 Thread GitBox


smengcl commented on pull request #1134:
URL: https://github.com/apache/hadoop-ozone/pull/1134#issuecomment-651537831


   Thanks @xiaoyuyao  for the review. Will commit shortly.






[jira] [Created] (HDDS-3901) [OFS] Add a dfsadmin command for temp volume provisioning

2020-06-29 Thread Siyao Meng (Jira)
Siyao Meng created HDDS-3901:


 Summary: [OFS] Add a dfsadmin command for temp volume provisioning
 Key: HDDS-3901
 URL: https://issues.apache.org/jira/browse/HDDS-3901
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
  Components: Ozone Filesystem
Reporter: Siyao Meng


We should add a new dfsadmin command to simplify temp volume provisioning,
for a better user experience.

https://github.com/apache/hadoop-ozone/pull/1144#discussion_r447315619






[GitHub] [hadoop-ozone] smengcl commented on a change in pull request #1144: HDDS-3803. [OFS] Add User Guide

2020-06-29 Thread GitBox


smengcl commented on a change in pull request #1144:
URL: https://github.com/apache/hadoop-ozone/pull/1144#discussion_r447411059



##
File path: hadoop-hdds/docs/content/design/ofs.md
##
@@ -22,12 +22,140 @@ author: Siyao Meng
 
 # Abstract
 
-  Existing scheme: o3fs://bucket.volume/key/../...
+  Scheme: ofs://<om host or service id>/[<volume>/<bucket>/path/to/key]
+
+# The Basics
+
+Examples of valid OFS paths:
+
+```
+ofs://om1/
+ofs://om3:9862/
+ofs://omservice/
+ofs://omservice/volume1/
+ofs://omservice/volume1/bucket1/
+ofs://omservice/volume1/bucket1/dir1
+ofs://omservice/volume1/bucket1/dir1/key1
+
+ofs://omservice/tmp/
+ofs://omservice/tmp/key1
+```
+
+Volumes and mount(s) are located at the root level of an OFS filesystem.
+Buckets are listed naturally under volumes.
+Keys and directories are under each bucket.
+
+Note that for mounts, only temp mount `/tmp` is supported at the moment.
+
+# Differences from existing o3fs
+
+## Creating files
+
+OFS doesn't allow creating keys (files) directly under root or volumes.
+Users will receive an error message when they try to do that:
+
+```
+$ ozone fs -touch /volume1/key1
+touch: Cannot create file under root or volume.
+```
+
+## Simplify fs.defaultFS
+
+With OFS, fs.defaultFS (in core-site.xml) no longer needs to have a specific
+volume and bucket in its path like o3fs did.
+Simply put the OM host or service ID:
+
+```
+<property>
+  <name>fs.defaultFS</name>
+  <value>ofs://omservice</value>
+</property>
+```
+
+The client would then be able to access every volume and bucket on the cluster
+without specifying the hostname or service ID.
+
+```
+$ ozone fs -mkdir -p /volume1/bucket1
+```
+
+## Volume and bucket management directly from FileSystem shell
+
+Admins can create and delete volumes and buckets easily with Hadoop FS shell.
+Volumes and buckets are treated similarly to directories, so with `-p` they
+will be created if they don't exist:
+
+```
+$ ozone fs -mkdir -p ofs://omservice/volume1/bucket1/dir1/
+```
+
+Note that the volume and bucket name character set rules still apply.
+For instance, bucket and volume names don't allow underscores (`_`):
+
+```
+$ ozone fs -mkdir -p /volume_1
+mkdir: Bucket or Volume name has an unsupported character : _
+```
+
+# Mounts
+
+In order to be compatible with legacy Hadoop applications that use /tmp/,
+we have a special temp mount located at the root of the FS.
+This feature may be expanded in the future to support custom mount paths.
+
+Important: To use it, first, an **admin** needs to create the volume tmp
+(the volume name is hardcoded for now) and set its ACL to world ALL access.
+Namely:
+
+```
+$ ozone sh volume create tmp

Review comment:
   Good idea. Filed HDDS-3901








[jira] [Commented] (HDDS-3894) Noisy log at OM when the requested sequence is not written into DB

2020-06-29 Thread Mukul Kumar Singh (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148310#comment-17148310
 ] 

Mukul Kumar Singh commented on HDDS-3894:
-

cc [~avijayan]

> Noisy log at OM when the requested sequence is not written into DB
> --
>
> Key: HDDS-3894
> URL: https://issues.apache.org/jira/browse/HDDS-3894
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Rakesh Radhakrishnan
>Assignee: Rakesh Radhakrishnan
>Priority: Minor
>
> Too many logs at OM, which is noisy. It looks like Recon is making this call 
> to OM to update the delta info. OM can ignore this exception and reduce the 
> log priority to DEBUG or so.
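> A minimal sketch of the suggested handling (inside RDBStore#getUpdatesSince,
> per the stack trace below; an illustration only, not the actual patch):
> {code:java}
> try {
>   TransactionLogIterator logIterator = db.getUpdatesSince(sequenceNumber);
>   // ... build the delta updates response from logIterator as before ...
> } catch (RocksDBException e) {
>   // Recon polling with a sequence number that is not yet written is
>   // expected, so log at DEBUG instead of ERROR to avoid the noise below.
>   LOG.debug("Unable to get delta updates since sequenceNumber {}",
>       sequenceNumber, e);
> }
> {code}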
> *Ozone-om.log*
> {code:java}
> 2020-06-26 10:02:52,963 INFO 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 
> 5dca58f5-6231-4858-9589-4e64b1435aa4@group-C5BA1605619E-SegmentedRaftLogWorker:
>  Rolled log segment from 
> /data/3/jun27_new1/hadoop-ozone/om/ratis2/bf265839-605b-3f16-9796-c5ba1605619e/current/log_inprogress_106536
>  to 
> /data/3/jun27_new1/hadoop-ozone/om/ratis2/bf265839-605b-3f16-9796-c5ba1605619e/current/log_106536-106603
> 2020-06-26 10:02:52,982 INFO 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 
> 5dca58f5-6231-4858-9589-4e64b1435aa4@group-C5BA1605619E-SegmentedRaftLogWorker:
>  created new log segment 
> /data/3/jun27_new1/hadoop-ozone/om/ratis2/bf265839-605b-3f16-9796-c5ba1605619e/current/log_inprogress_106604
> 2020-06-26 10:09:54,828 ERROR org.apache.hadoop.hdds.utils.db.RDBStore: 
> Unable to get delta updates since sequenceNumber 11268864
> org.rocksdb.RocksDBException: Requested sequence not yet written in the db
> at org.rocksdb.RocksDB.getUpdatesSince(Native Method)
> at org.rocksdb.RocksDB.getUpdatesSince(RocksDB.java:3588)
> at 
> org.apache.hadoop.hdds.utils.db.RDBStore.getUpdatesSince(RDBStore.java:339)
> at 
> org.apache.hadoop.ozone.om.OzoneManager.getDBUpdates(OzoneManager.java:3422)
> at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOMDBUpdates(OzoneManagerRequestHandler.java:257)
> at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleReadRequest(OzoneManagerRequestHandler.java:194)
> at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitReadRequestToOM(OzoneManagerProtocolServerSideTranslatorPB.java:171)
> at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:109)
> at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:74)
> at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:99)
> at 
> org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882)
> 2020-06-26 10:11:08,442 INFO 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 
> 5dca58f5-6231-4858-9589-4e64b1435aa4@group-C5BA1605619E-SegmentedRaftLogWorker:
>  Rolling segment log-106604_106637 to index:106637
> 2020-06-26 10:11:08,443 INFO 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 
> 5dca58f5-6231-4858-9589-4e64b1435aa4@group-C5BA1605619E-SegmentedRaftLogWorker:
>  Rolling segment log-106638_106654 to index:106654
> 2020-06-26 10:11:08,444 INFO 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 
> 5dca58f5-6231-4858-9589-4e64b1435aa4@group-C5BA1605619E-SegmentedRaftLogWorker:
>  Rolling segment log-106655_106671 to index:106671
> {code}
> *Recon.log*
> {code:java}
> 2020-06-26 10:08:55,883 INFO 
> org.apache.hadoop.ozone.recon.fsck.MissingContainerTask: Missing Container 
> task Thread took 88 milliseconds for processing 0 containers.
> 2020-06-26 10:09:54,824 INFO 
> org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl: 
> Syncing data from Ozone Manager.
> 2020-06-26 

[jira] [Assigned] (HDDS-3897) OM startup failing to replay ratis log entries when configuring a different segment size

2020-06-29 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh reassigned HDDS-3897:
---

Assignee: Tsz-wo Sze

> OM startup failing to replay ratis log entries when configuring a different 
> segment size
> 
>
> Key: HDDS-3897
> URL: https://issues.apache.org/jira/browse/HDDS-3897
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Rakesh Radhakrishnan
>Assignee: Tsz-wo Sze
>Priority: Major
>
> OM is not able to read log entries which were created with a different log 
> segment size.
> More details and the steps to re-produce the issue:-
> *1)* Configured OM with a {{ozone.om.ratis.segment.size=16KB}} and 
> {{ozone.om.ratis.segment.preallocated.size=16KB}}. Then perform user ops like 
> createFile, deleteFile.
>  *2)* Stop OM and reconfigure the segment size to 16MB like, 
> {{ozone.om.ratis.segment.size=16MB}} and 
> {{ozone.om.ratis.segment.preallocated.size=16MB}}.
> *3)* Now OM startup fails to replay all these transactions, throwing the 
> following exception.
>  
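> For reference, the reconfiguration in step *2)* corresponds to the following
> ozone-site.xml properties (values taken from the steps above):
> {code:xml}
> <property>
>   <name>ozone.om.ratis.segment.size</name>
>   <value>16MB</value>
> </property>
> <property>
>   <name>ozone.om.ratis.segment.preallocated.size</name>
>   <value>16MB</value>
> </property>
> {code}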
> *Ozone-om.log*
> {code:java}
> 2020-06-28 22:54:31,468 INFO org.eclipse.jetty.server.Server: 
> jetty-9.4.26.v20200117; built: 2020-01-17T12:35:33.676Z; git: 
> 7b38981d25d14afb4a12ff1f2596756144edf695; jvm 1.8.0_232-b09
> 2020-06-28 22:54:31,480 ERROR 
> org.apache.hadoop.ozone.om.request.key.OMKeyDeleteRequest: Key delete failed. 
> Volume:vol2, Bucket:bucket2, 
> Keyfsperf-Jun-27-2020/dir0/dir2/dir2/ve1320.halxg.cloudera.com8296f35232-2ed6-4d3b-8392-db848f707dda.
>  Exception:{}
> KEY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Key not found
>   at 
> org.apache.hadoop.ozone.om.request.key.OMKeyDeleteRequest.validateAndUpdateCache(OMKeyDeleteRequest.java:135)
>   at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:240)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:418)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$applyTransaction$1(OzoneManagerStateMachine.java:236)
>   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 2020-06-28 22:54:31,481 ERROR 
> org.apache.hadoop.ozone.om.request.key.OMKeyDeleteRequest: Key delete failed. 
> Volume:vol2, Bucket:bucket2, 
> Keyfsperf-Jun-27-2020/dir0/dir3/dir0/ve1320.halxg.cloudera.com96bac30bc0-332c-442e-9691-244cb96b7c90.
>  Exception:{}
> KEY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Key not found
>   at 
> org.apache.hadoop.ozone.om.request.key.OMKeyDeleteRequest.validateAndUpdateCache(OMKeyDeleteRequest.java:135)
>   at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:240)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:418)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$applyTransaction$1(OzoneManagerStateMachine.java:236)
>   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}






[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

2020-06-29 Thread GitBox


ChenSammi commented on pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#issuecomment-651514206


   @bharatviswa504  we have HDDS-3217 deployed. 






[GitHub] [hadoop-ozone] adoroszlai commented on a change in pull request #1104: HDDS-3612. Allow mounting bucket under other volume

2020-06-29 Thread GitBox


adoroszlai commented on a change in pull request #1104:
URL: https://github.com/apache/hadoop-ozone/pull/1104#discussion_r447383235



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/BucketManagerImpl.java
##
@@ -136,54 +137,49 @@ public void createBucket(OmBucketInfo bucketInfo) throws 
IOException {
 throw new OMException("Bucket already exist",
 OMException.ResultCodes.BUCKET_ALREADY_EXISTS);
   }
+
   BucketEncryptionKeyInfo bek = bucketInfo.getEncryptionKeyInfo();
-  BucketEncryptionKeyInfo.Builder bekb = null;
-  if (bek != null) {
-if (kmsProvider == null) {
-  throw new OMException("Invalid KMS provider, check configuration " +
-  CommonConfigurationKeys.HADOOP_SECURITY_KEY_PROVIDER_PATH,
-  OMException.ResultCodes.INVALID_KMS_PROVIDER);
-}
-if (bek.getKeyName() == null) {
-  throw new OMException("Bucket encryption key needed.", OMException
-  .ResultCodes.BUCKET_ENCRYPTION_KEY_NOT_FOUND);
-}
-// Talk to KMS to retrieve the bucket encryption key info.
-KeyProvider.Metadata metadata = getKMSProvider().getMetadata(
-bek.getKeyName());
-if (metadata == null) {
-  throw new OMException("Bucket encryption key " + bek.getKeyName()
-  + " doesn't exist.",
-  OMException.ResultCodes.BUCKET_ENCRYPTION_KEY_NOT_FOUND);
-}
-// If the provider supports pool for EDEKs, this will fill in the pool
-kmsProvider.warmUpEncryptedKeys(bek.getKeyName());
-bekb = new BucketEncryptionKeyInfo.Builder()
-.setKeyName(bek.getKeyName())
-.setVersion(CryptoProtocolVersion.ENCRYPTION_ZONES)
-.setSuite(CipherSuite.convert(metadata.getCipher()));
-  }
-  List<OzoneAcl> acls = new ArrayList<>();
-  acls.addAll(bucketInfo.getAcls());
-  volumeArgs.getAclMap().getDefaultAclList().forEach(
-  a -> acls.add(OzoneAcl.fromProtobufWithAccessType(a)));
-
-  OmBucketInfo.Builder omBucketInfoBuilder = OmBucketInfo.newBuilder()
-  .setVolumeName(bucketInfo.getVolumeName())
-  .setBucketName(bucketInfo.getBucketName())
-  .setAcls(acls)
-  .setStorageType(bucketInfo.getStorageType())
-  .setIsVersionEnabled(bucketInfo.getIsVersionEnabled())
-  .setCreationTime(Time.now())
-  .addAllMetadata(bucketInfo.getMetadata());
+
+  boolean hasSourceVolume = bucketInfo.getSourceVolume() != null;
+  boolean hasSourceBucket = bucketInfo.getSourceBucket() != null;
+
+  if (hasSourceBucket != hasSourceVolume) {
+throw new OMException("Both source volume and source bucket are " +
+"required for bucket links",
+OMException.ResultCodes.INVALID_REQUEST);
+  }

Review comment:
   Thanks, I missed that.  Will update the patch.








[GitHub] [hadoop-ozone] adoroszlai commented on pull request #1104: HDDS-3612. Allow mounting bucket under other volume

2020-06-29 Thread GitBox


adoroszlai commented on pull request #1104:
URL: https://github.com/apache/hadoop-ozone/pull/1104#issuecomment-651495214


   > Question: Now when we list at the root "/" with the ofs filesystem, for
   > link buckets, will they be displayed as a new entry, or will only the
   > resolved buckets be shown? What should be the behavior here?
   
   I'm afraid I don't understand the question.  Can you please give an example?






[GitHub] [hadoop-ozone] adoroszlai commented on a change in pull request #1104: HDDS-3612. Allow mounting bucket under other volume

2020-06-29 Thread GitBox


adoroszlai commented on a change in pull request #1104:
URL: https://github.com/apache/hadoop-ozone/pull/1104#discussion_r447378831



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMDirectoryCreateRequest.java
##
@@ -149,6 +150,12 @@ public OMClientResponse 
validateAndUpdateCache(OzoneManager ozoneManager,
 List missingParentInfos;
 
 try {
+  ResolvedBucket bucket = ozoneManager.resolveBucketLink(keyArgs);
+  keyArgs = bucket.update(keyArgs);

Review comment:
   Audit log includes both.








[GitHub] [hadoop-ozone] adoroszlai commented on a change in pull request #1104: HDDS-3612. Allow mounting bucket under other volume

2020-06-29 Thread GitBox


adoroszlai commented on a change in pull request #1104:
URL: https://github.com/apache/hadoop-ozone/pull/1104#discussion_r447376939



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##
@@ -2197,20 +2234,25 @@ public void renameKey(OmKeyArgs args, String toKeyName) 
throws IOException {
*/
   @Override
   public void deleteKey(OmKeyArgs args) throws IOException {
+Map<String, String> auditMap = args.toAuditMap();

Review comment:
   I would have preferred removing old write code path before implementing 
links to avoid duplicate work.  However, now that these changes are in place, I 
prefer keeping them, and removing old write code path completely in a separate 
step.








[GitHub] [hadoop-ozone] adoroszlai commented on a change in pull request #1104: HDDS-3612. Allow mounting bucket under other volume

2020-06-29 Thread GitBox


adoroszlai commented on a change in pull request #1104:
URL: https://github.com/apache/hadoop-ozone/pull/1104#discussion_r447374977



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##
@@ -2145,37 +2168,51 @@ public OmKeyLocationInfo allocateBlock(OmKeyArgs args, 
long clientID,
*/
   @Override
   public OmKeyInfo lookupKey(OmKeyArgs args) throws IOException {
+ResolvedBucket bucket = resolveBucketLink(args);

Review comment:
   Read permission on the link is required to follow it.  Do you propose to 
completely skip ACL on link and allow anyone to use it?








[GitHub] [hadoop-ozone] adoroszlai commented on a change in pull request #1104: HDDS-3612. Allow mounting bucket under other volume

2020-06-29 Thread GitBox


adoroszlai commented on a change in pull request #1104:
URL: https://github.com/apache/hadoop-ozone/pull/1104#discussion_r447373492



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/BucketManagerImpl.java
##
@@ -136,54 +137,49 @@ public void createBucket(OmBucketInfo bucketInfo) throws 
IOException {
 throw new OMException("Bucket already exist",
 OMException.ResultCodes.BUCKET_ALREADY_EXISTS);
   }
+
   BucketEncryptionKeyInfo bek = bucketInfo.getEncryptionKeyInfo();
-  BucketEncryptionKeyInfo.Builder bekb = null;
-  if (bek != null) {
-if (kmsProvider == null) {
-  throw new OMException("Invalid KMS provider, check configuration " +
-  CommonConfigurationKeys.HADOOP_SECURITY_KEY_PROVIDER_PATH,
-  OMException.ResultCodes.INVALID_KMS_PROVIDER);
-}
-if (bek.getKeyName() == null) {
-  throw new OMException("Bucket encryption key needed.", OMException
-  .ResultCodes.BUCKET_ENCRYPTION_KEY_NOT_FOUND);
-}
-// Talk to KMS to retrieve the bucket encryption key info.
-KeyProvider.Metadata metadata = getKMSProvider().getMetadata(
-bek.getKeyName());
-if (metadata == null) {
-  throw new OMException("Bucket encryption key " + bek.getKeyName()
-  + " doesn't exist.",
-  OMException.ResultCodes.BUCKET_ENCRYPTION_KEY_NOT_FOUND);
-}
-// If the provider supports pool for EDEKs, this will fill in the pool
-kmsProvider.warmUpEncryptedKeys(bek.getKeyName());
-bekb = new BucketEncryptionKeyInfo.Builder()
-.setKeyName(bek.getKeyName())
-.setVersion(CryptoProtocolVersion.ENCRYPTION_ZONES)
-.setSuite(CipherSuite.convert(metadata.getCipher()));
-  }
-  List<OzoneAcl> acls = new ArrayList<>();
-  acls.addAll(bucketInfo.getAcls());
-  volumeArgs.getAclMap().getDefaultAclList().forEach(
-  a -> acls.add(OzoneAcl.fromProtobufWithAccessType(a)));
-
-  OmBucketInfo.Builder omBucketInfoBuilder = OmBucketInfo.newBuilder()
-  .setVolumeName(bucketInfo.getVolumeName())
-  .setBucketName(bucketInfo.getBucketName())
-  .setAcls(acls)
-  .setStorageType(bucketInfo.getStorageType())
-  .setIsVersionEnabled(bucketInfo.getIsVersionEnabled())
-  .setCreationTime(Time.now())
-  .addAllMetadata(bucketInfo.getMetadata());
+
+  boolean hasSourceVolume = bucketInfo.getSourceVolume() != null;
+  boolean hasSourceBucket = bucketInfo.getSourceBucket() != null;
+
+  if (hasSourceBucket != hasSourceVolume) {
+throw new OMException("Both source volume and source bucket are " +
+"required for bucket links",
+OMException.ResultCodes.INVALID_REQUEST);
+  }
+
+  if (bek != null && hasSourceBucket) {
+throw new OMException("Encryption cannot be set for bucket links",
+OMException.ResultCodes.INVALID_REQUEST);
+  }
+
+  BucketEncryptionKeyInfo.Builder bekb =
+  createBucketEncryptionKeyInfoBuilder(bek);
+
+  OmBucketInfo.Builder omBucketInfoBuilder = bucketInfo.toBuilder()
+  .setCreationTime(Time.now());
+
+  List defaultAclList =

Review comment:
   Source bucket can be deleted any time after the link is created.  We 
would have to perform a reverse lookup to check if it leaves any dangling link. 
 Since this is not done, checking upon creation would be inconsistent.








[GitHub] [hadoop-ozone] bharatviswa504 commented on pull request #1147: HDDS-3892. Datanode initialization is too slow when there are thousan…

2020-06-29 Thread GitBox


bharatviswa504 commented on pull request #1147:
URL: https://github.com/apache/hadoop-ozone/pull/1147#issuecomment-651480464


   Hi @ChenSammi 
   Thanks for the PR.
   I have the same comment as @sodonnel: I think we need to close the DB
instance when we get an instance from the cache.
   
   Also, do you have this commit deployed on your cluster?
   
https://github.com/apache/hadoop-ozone/commit/a2ab8d6e35f60af9762a191265942071755329be
   
   It avoids iterating the DB to compute block metadata; all the required
metadata is fetched with db.get(). We observed this issue during a
billion-object test.
   For older containers created before HDDS-3217, we still iterate and set
block metadata. (But this also fixed a bug where we used to iterate three
times for open containers.)
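   A rough sketch of the suggested handling (ReferenceCountedDB/BlockUtils
names are assumed from the datanode code; this is an illustration, not the
actual patch):
   
   ```java
   // Obtain the cached DB handle and always release the reference after use,
   // so the container DB cache can manage and evict the underlying instance.
   try (ReferenceCountedDB db = BlockUtils.getDB(kvContainerData, conf)) {
     // read the required container/block metadata via db.getStore()
   }
   ```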






[GitHub] [hadoop-ozone] bharatviswa504 commented on pull request #1104: HDDS-3612. Allow mounting bucket under other volume

2020-06-29 Thread GitBox


bharatviswa504 commented on pull request #1104:
URL: https://github.com/apache/hadoop-ozone/pull/1104#issuecomment-651426116


   Question: Now when we list at the root "/" with the ofs filesystem, for link
buckets, will they be displayed as a new entry, or will only the resolved
buckets be shown? What should be the behavior here?






[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1104: HDDS-3612. Allow mounting bucket under other volume

2020-06-29 Thread GitBox


bharatviswa504 commented on a change in pull request #1104:
URL: https://github.com/apache/hadoop-ozone/pull/1104#discussion_r447318712



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/BucketManagerImpl.java
##
@@ -136,54 +137,49 @@ public void createBucket(OmBucketInfo bucketInfo) throws 
IOException {
 throw new OMException("Bucket already exist",
 OMException.ResultCodes.BUCKET_ALREADY_EXISTS);
   }
+
   BucketEncryptionKeyInfo bek = bucketInfo.getEncryptionKeyInfo();
-  BucketEncryptionKeyInfo.Builder bekb = null;
-  if (bek != null) {
-if (kmsProvider == null) {
-  throw new OMException("Invalid KMS provider, check configuration " +
-  CommonConfigurationKeys.HADOOP_SECURITY_KEY_PROVIDER_PATH,
-  OMException.ResultCodes.INVALID_KMS_PROVIDER);
-}
-if (bek.getKeyName() == null) {
-  throw new OMException("Bucket encryption key needed.", OMException
-  .ResultCodes.BUCKET_ENCRYPTION_KEY_NOT_FOUND);
-}
-// Talk to KMS to retrieve the bucket encryption key info.
-KeyProvider.Metadata metadata = getKMSProvider().getMetadata(
-bek.getKeyName());
-if (metadata == null) {
-  throw new OMException("Bucket encryption key " + bek.getKeyName()
-  + " doesn't exist.",
-  OMException.ResultCodes.BUCKET_ENCRYPTION_KEY_NOT_FOUND);
-}
-// If the provider supports pool for EDEKs, this will fill in the pool
-kmsProvider.warmUpEncryptedKeys(bek.getKeyName());
-bekb = new BucketEncryptionKeyInfo.Builder()
-.setKeyName(bek.getKeyName())
-.setVersion(CryptoProtocolVersion.ENCRYPTION_ZONES)
-.setSuite(CipherSuite.convert(metadata.getCipher()));
-  }
-  List<OzoneAcl> acls = new ArrayList<>();
-  acls.addAll(bucketInfo.getAcls());
-  volumeArgs.getAclMap().getDefaultAclList().forEach(
-  a -> acls.add(OzoneAcl.fromProtobufWithAccessType(a)));
-
-  OmBucketInfo.Builder omBucketInfoBuilder = OmBucketInfo.newBuilder()
-  .setVolumeName(bucketInfo.getVolumeName())
-  .setBucketName(bucketInfo.getBucketName())
-  .setAcls(acls)
-  .setStorageType(bucketInfo.getStorageType())
-  .setIsVersionEnabled(bucketInfo.getIsVersionEnabled())
-  .setCreationTime(Time.now())
-  .addAllMetadata(bucketInfo.getMetadata());
+
+  boolean hasSourceVolume = bucketInfo.getSourceVolume() != null;
+  boolean hasSourceBucket = bucketInfo.getSourceBucket() != null;
+
+  if (hasSourceBucket != hasSourceVolume) {
+throw new OMException("Both source volume and source bucket are " +
+"required for bucket links",
+OMException.ResultCodes.INVALID_REQUEST);
+  }
+
+  if (bek != null && hasSourceBucket) {
+throw new OMException("Encryption cannot be set for bucket links",
+OMException.ResultCodes.INVALID_REQUEST);
+  }
+
+  BucketEncryptionKeyInfo.Builder bekb =
+  createBucketEncryptionKeyInfoBuilder(bek);
+
+  OmBucketInfo.Builder omBucketInfoBuilder = bucketInfo.toBuilder()
+  .setCreationTime(Time.now());
+
+  List defaultAclList =

Review comment:
   Reading more, I understood that if we create a link /vol1/buck1 ->
/vol2/buck2 (source), we create /vol1/buck1 in the DB with sourceVolume vol2
and sourceBucket buck2.
   
   Now, when someone calls lookupKey on an unresolved bucket, the actual
lookupKey request will result in BUCKET_NOT_FOUND. Do you think we need to
make sure that the source volume/source bucket exists during link creation to
avoid such scenarios?
   
   My reasoning: this looks strange. The user thinks they created a link
bucket with some source volume/source bucket, and that passed without any
issues, but now key creation says the bucket does not exist.
   
   Following ln -s <> <> looks confusing in our scenario.
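   A sketch of the creation-time check being discussed (accessor names assumed
from OMMetadataManager; not part of the current patch):
   
   ```java
   // Hypothetical: fail link creation early when the source bucket is missing.
   String sourceKey = metadataManager.getBucketKey(
       bucketInfo.getSourceVolume(), bucketInfo.getSourceBucket());
   if (metadataManager.getBucketTable().get(sourceKey) == null) {
     throw new OMException("Source bucket does not exist",
         OMException.ResultCodes.BUCKET_NOT_FOUND);
   }
   ```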











[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1144: HDDS-3803. [OFS] Add User Guide

2020-06-29 Thread GitBox


xiaoyuyao commented on a change in pull request #1144:
URL: https://github.com/apache/hadoop-ozone/pull/1144#discussion_r447316328



##
File path: hadoop-hdds/docs/content/design/ofs.md
##
@@ -22,12 +22,140 @@ author: Siyao Meng
 
 # Abstract
 
-  Existing scheme: o3fs://bucket.volume/key/../...
+  Scheme: ofs://<om host or service id>/[<volume>/<bucket>/path/to/key]
+
+# The Basics
+
+Examples of valid OFS paths:
+
+```
+ofs://om1/
+ofs://om3:9862/
+ofs://omservice/
+ofs://omservice/volume1/
+ofs://omservice/volume1/bucket1/
+ofs://omservice/volume1/bucket1/dir1
+ofs://omservice/volume1/bucket1/dir1/key1
+
+ofs://omservice/tmp/
+ofs://omservice/tmp/key1
+```
+
+Volumes and mount(s) are located at the root level of an OFS filesystem.
+Buckets are listed naturally under volumes.
+Keys and directories are under each bucket.
+
+Note that for mounts, only temp mount `/tmp` is supported at the moment.
+
+# Differences from existing o3fs
+
+## Creating files
+
+OFS doesn't allow creating keys (files) directly under root or volumes.
+Users will receive an error message when they try to do that:
+
+```
+$ ozone fs -touch /volume1/key1
+touch: Cannot create file under root or volume.
+```
+
+## Simplify fs.defaultFS
+
+With OFS, fs.defaultFS (in core-site.xml) no longer needs to have a specific
+volume and bucket in its path like o3fs did.
+Simply put the OM host or service ID:
+
+```
+<property>
+  <name>fs.defaultFS</name>
+  <value>ofs://omservice</value>
+</property>
+```
+
+The client would then be able to access every volume and bucket on the cluster
+without specifying the hostname or service ID.
+
+```
+$ ozone fs -mkdir -p /volume1/bucket1
+```
+
+## Volume and bucket management directly from FileSystem shell
+
+Admins can create and delete volumes and buckets easily with Hadoop FS shell.
+Volumes and buckets are treated similarly to directories, so with `-p` they
+will be created if they don't exist:
+
+```
+$ ozone fs -mkdir -p ofs://omservice/volume1/bucket1/dir1/
+```
+
+Note that the volume and bucket name character set rules still apply.
+For instance, bucket and volume names don't allow underscores (`_`):
+
+```
+$ ozone fs -mkdir -p /volume_1
+mkdir: Bucket or Volume name has an unsupported character : _
+```
+
+# Mounts
+
+In order to be compatible with legacy Hadoop applications that use /tmp/,
+we have a special temp mount located at the root of the FS.
+This feature may be expanded in the future to support custom mount paths.
+
+Important: To use it, first, an **admin** needs to create the volume tmp
+(the volume name is hardcoded for now) and set its ACL to world ALL access.
+Namely:
+
+```
+$ ozone sh volume create tmp
+$ ozone sh volume setacl tmp -al world::a
+```
+
+These commands only need to be run **once per cluster**.
+
+Then, **each user** needs to run mkdir once to initialize their own temp
+bucket.
+
+```
+$ ozone fs -mkdir /tmp
+2020-06-04 00:00:00,050 [main] INFO rpc.RpcClient: Creating Bucket: tmp/0238 
...
+```
+
+After that, they can write to it just like they would to a regular
+directory, e.g.:
+
+```
+$ ozone fs -touch /tmp/key1
+```
+
+# Delete to trash
+
+When keys are deleted to trash, they are moved to a trash directory under

Review comment:
   deleted to trash => deleted with trash enabled








[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1144: HDDS-3803. [OFS] Add User Guide

2020-06-29 Thread GitBox


xiaoyuyao commented on a change in pull request #1144:
URL: https://github.com/apache/hadoop-ozone/pull/1144#discussion_r447316226



##
File path: hadoop-hdds/docs/content/design/ofs.md
##
@@ -22,12 +22,140 @@ author: Siyao Meng
 
 # Abstract
 
-  Existing scheme: o3fs://bucket.volume/key/../...
+  Scheme: ofs://<om host or service id>/[<volume>/<bucket>/path/to/key]
+
+# The Basics
+
+Examples of valid OFS paths:
+
+```
+ofs://om1/
+ofs://om3:9862/
+ofs://omservice/
+ofs://omservice/volume1/
+ofs://omservice/volume1/bucket1/
+ofs://omservice/volume1/bucket1/dir1
+ofs://omservice/volume1/bucket1/dir1/key1
+
+ofs://omservice/tmp/
+ofs://omservice/tmp/key1
+```
+
+Volumes and mount(s) are located at the root level of an OFS filesystem.
+Buckets are listed naturally under volumes.
+Keys and directories are under each bucket.
+
+Note that for mounts, only temp mount `/tmp` is supported at the moment.
+
+# Differences from existing o3fs
+
+## Creating files
+
+OFS doesn't allow creating keys (files) directly under root or volumes.
+Users will receive an error message when they try to do that:
+
+```
+$ ozone fs -touch /volume1/key1
+touch: Cannot create file under root or volume.
+```
+
+## Simplify fs.defaultFS
+
+With OFS, fs.defaultFS (in core-site.xml) no longer needs to have a specific
+volume and bucket in its path like o3fs did.
+Simply put the OM host or service ID:
+
+```
+<property>
+  <name>fs.defaultFS</name>
+  <value>ofs://omservice</value>
+</property>
+```
+
+The client would then be able to access every volume and bucket on the cluster
+without specifying the hostname or service ID.
+
+```
+$ ozone fs -mkdir -p /volume1/bucket1
+```
+
+## Volume and bucket management directly from FileSystem shell
+
+Admins can create and delete volumes and buckets easily with Hadoop FS shell.
+Volumes and buckets are treated similarly to directories, so with `-p` they
+will be created if they don't exist:
+
+```
+$ ozone fs -mkdir -p ofs://omservice/volume1/bucket1/dir1/
+```
+
+Note that the volume and bucket name character set rules still apply.
+For instance, bucket and volume names don't allow underscores (`_`):
+
+```
+$ ozone fs -mkdir -p /volume_1
+mkdir: Bucket or Volume name has an unsupported character : _
+```
+
+# Mounts
+
+In order to be compatible with legacy Hadoop applications that use /tmp/,
+we have a special temp mount located at the root of the FS.
+This feature may be expanded in the future to support custom mount paths.
+
+Important: To use it, first, an **admin** needs to create the volume tmp
+(the volume name is hardcoded for now) and set its ACL to world ALL access.
+Namely:
+
+```
+$ ozone sh volume create tmp
+$ ozone sh volume setacl tmp -al world::a
+```
+
+These commands only need to be run **once per cluster**.
+
+Then, **each user** needs to run mkdir once to initialize their own temp
+bucket.
+
+```
+$ ozone fs -mkdir /tmp
+2020-06-04 00:00:00,050 [main] INFO rpc.RpcClient: Creating Bucket: tmp/0238 
...
+```
+
+After that, they can write to it just like they would to a regular
+directory, e.g.:
+
+```
+$ ozone fs -touch /tmp/key1
+```
+
+# Delete to trash

Review comment:
   NIT: can we rename this to "Delete with trash enabled"








[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1144: HDDS-3803. [OFS] Add User Guide

2020-06-29 Thread GitBox


xiaoyuyao commented on a change in pull request #1144:
URL: https://github.com/apache/hadoop-ozone/pull/1144#discussion_r447315619



##
File path: hadoop-hdds/docs/content/design/ofs.md
##
@@ -22,12 +22,140 @@ author: Siyao Meng
 
 # Abstract
 
-  Existing scheme: o3fs://bucket.volume/key/../...
+  Scheme: ofs://<om host or service id>/[<volume>/<bucket>/path/to/key]
+
+# The Basics
+
+Examples of valid OFS paths:
+
+```
+ofs://om1/
+ofs://om3:9862/
+ofs://omservice/
+ofs://omservice/volume1/
+ofs://omservice/volume1/bucket1/
+ofs://omservice/volume1/bucket1/dir1
+ofs://omservice/volume1/bucket1/dir1/key1
+
+ofs://omservice/tmp/
+ofs://omservice/tmp/key1
+```
+
+Volumes and mount(s) are located at the root level of an OFS filesystem.
+Buckets are listed naturally under volumes.
+Keys and directories are under each bucket.
+
+Note that for mounts, only temp mount `/tmp` is supported at the moment.
+
+# Differences from existing o3fs
+
+## Creating files
+
+OFS doesn't allow creating keys (files) directly under root or volumes.
+Users will receive an error message when they try to do that:
+
+```
+$ ozone fs -touch /volume1/key1
+touch: Cannot create file under root or volume.
+```
+
+## Simplify fs.defaultFS
+
+With OFS, fs.defaultFS (in core-site.xml) no longer needs to have a specific
+volume and bucket in its path like o3fs did.
+Simply put the OM host or service ID:
+
+```
+<property>
+  <name>fs.defaultFS</name>
+  <value>ofs://omservice</value>
+</property>
+```
+
+The client would then be able to access every volume and bucket on the cluster
+without specifying the hostname or service ID.
+
+```
+$ ozone fs -mkdir -p /volume1/bucket1
+```
+
+## Volume and bucket management directly from FileSystem shell
+
+Admins can create and delete volumes and buckets easily with Hadoop FS shell.
+Volumes and buckets are treated similarly to directories, so with `-p` they
+will be created if they don't exist:
+
+```
+$ ozone fs -mkdir -p ofs://omservice/volume1/bucket1/dir1/
+```
+
+Note that the volume and bucket name character set rules still apply.
+For instance, bucket and volume names don't allow underscores (`_`):
+
+```
+$ ozone fs -mkdir -p /volume_1
+mkdir: Bucket or Volume name has an unsupported character : _
+```
+
+# Mounts
+
+In order to be compatible with legacy Hadoop applications that use /tmp/,
+we have a special temp mount located at the root of the FS.
+This feature may be expanded in the future to support custom mount paths.
+
+Important: To use it, first, an **admin** needs to create the volume tmp
+(the volume name is hardcoded for now) and set its ACL to world ALL access.
+Namely:
+
+```
+$ ozone sh volume create tmp

Review comment:
   I think we could add a separate admin command called *provisionTmp* to 
hide these details. This can be a separate JIRA. 








[GitHub] [hadoop-ozone] maobaolong commented on pull request #1124: HDDS-3865. Export the SCM client IPC port in docker-compose

2020-06-29 Thread GitBox


maobaolong commented on pull request #1124:
URL: https://github.com/apache/hadoop-ozone/pull/1124#issuecomment-651419464


   @elek Thank you for your review.






[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1104: HDDS-3612. Allow mounting bucket under other volume

2020-06-29 Thread GitBox


bharatviswa504 commented on a change in pull request #1104:
URL: https://github.com/apache/hadoop-ozone/pull/1104#discussion_r447313113



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMDirectoryCreateRequest.java
##
@@ -149,6 +150,12 @@ public OMClientResponse 
validateAndUpdateCache(OzoneManager ozoneManager,
 List missingParentInfos;
 
 try {
+  ResolvedBucket bucket = ozoneManager.resolveBucketLink(keyArgs);
+  keyArgs = bucket.update(keyArgs);

Review comment:
   Also, do you think it would be useful to log the actual bucket/volume of
the request along with what it resolved to?
   This might help for debugging.








[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1104: HDDS-3612. Allow mounting bucket under other volume

2020-06-29 Thread GitBox


bharatviswa504 commented on a change in pull request #1104:
URL: https://github.com/apache/hadoop-ozone/pull/1104#discussion_r447313113



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMDirectoryCreateRequest.java
##
@@ -149,6 +150,12 @@ public OMClientResponse 
validateAndUpdateCache(OzoneManager ozoneManager,
 List missingParentInfos;
 
 try {
+  ResolvedBucket bucket = ozoneManager.resolveBucketLink(keyArgs);
+  keyArgs = bucket.update(keyArgs);

Review comment:
   Also, do you think it would be useful to log the actual bucket/volume of
the request, and then the resolvedBucket/resolvedVolume, instead of just
updating volumeName/bucketName directly?








[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1104: HDDS-3612. Allow mounting bucket under other volume

2020-06-29 Thread GitBox


bharatviswa504 commented on a change in pull request #1104:
URL: https://github.com/apache/hadoop-ozone/pull/1104#discussion_r447312649



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMAllocateBlockRequest.java
##
@@ -174,6 +175,12 @@ public OMClientResponse 
validateAndUpdateCache(OzoneManager ozoneManager,
 Result result = null;
 
 try {
+  ResolvedBucket bucket = ozoneManager.resolveBucketLink(keyArgs);
+  keyArgs = bucket.update(keyArgs);

Review comment:
   Looks like this logic is needed for all KeyRequests. Can we move this to
a common method instead of duplicating it?
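   For example, the repeated resolve step could live in a shared helper (a
sketch; placing it on the common key request base class is an assumption):
   
   ```java
   // Hypothetical helper, e.g. in OMKeyRequest: resolve bucket links once,
   // so each key request doesn't duplicate these two lines.
   protected KeyArgs resolveBucketLink(OzoneManager ozoneManager, KeyArgs keyArgs)
       throws IOException {
     ResolvedBucket bucket = ozoneManager.resolveBucketLink(keyArgs);
     return bucket.update(keyArgs);
   }
   ```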








[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1104: HDDS-3612. Allow mounting bucket under other volume

2020-06-29 Thread GitBox


bharatviswa504 commented on a change in pull request #1104:
URL: https://github.com/apache/hadoop-ozone/pull/1104#discussion_r447311657



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##
@@ -2145,37 +2168,51 @@ public OmKeyLocationInfo allocateBlock(OmKeyArgs args, 
long clientID,
*/
   @Override
   public OmKeyInfo lookupKey(OmKeyArgs args) throws IOException {
+ResolvedBucket bucket = resolveBucketLink(args);

Review comment:
   Okay, I see: in resolveBucketLink we check ACLs for read on the provided
bucket, and in the actual request we check ACLs on the sourceBucket/sourceVolume
it resolves to. Let me know if I am missing something here. Why do we need this?








[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1104: HDDS-3612. Allow mounting bucket under other volume

2020-06-29 Thread GitBox


bharatviswa504 commented on a change in pull request #1104:
URL: https://github.com/apache/hadoop-ozone/pull/1104#discussion_r447311657



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##
@@ -2145,37 +2168,51 @@ public OmKeyLocationInfo allocateBlock(OmKeyArgs args, 
long clientID,
*/
   @Override
   public OmKeyInfo lookupKey(OmKeyArgs args) throws IOException {
+ResolvedBucket bucket = resolveBucketLink(args);

Review comment:
   Okay, I see that in resolveBucketLink we check ACLs for READ on the provided bucket, and in the actual request we check ACLs on the sourceBucket/sourceVolume it resolves to. Why do we need this?








[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1104: HDDS-3612. Allow mounting bucket under other volume

2020-06-29 Thread GitBox


bharatviswa504 commented on a change in pull request #1104:
URL: https://github.com/apache/hadoop-ozone/pull/1104#discussion_r447302394



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##
@@ -2197,20 +2234,25 @@ public void renameKey(OmKeyArgs args, String toKeyName) 
throws IOException {
*/
   @Override
   public void deleteKey(OmKeyArgs args) throws IOException {
+Map<String, String> auditMap = args.toAuditMap();

Review comment:
   The same comment applies to all write requests: can we remove the changes from the old write code path, which is not required anymore?








[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1104: HDDS-3612. Allow mounting bucket under other volume

2020-06-29 Thread GitBox


bharatviswa504 commented on a change in pull request #1104:
URL: https://github.com/apache/hadoop-ozone/pull/1104#discussion_r447301716



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##
@@ -2145,37 +2168,51 @@ public OmKeyLocationInfo allocateBlock(OmKeyArgs args, 
long clientID,
*/
   @Override
   public OmKeyInfo lookupKey(OmKeyArgs args) throws IOException {
+ResolvedBucket bucket = resolveBucketLink(args);

Review comment:
   Now, for each operation on a link bucket (one that has sourceVolume/sourceBucket) we call checkAcls twice: once with READ permission on sourceBucket/sourceVolume in resolveBucketLink, and once in the actual request with the required ACL type. It is not clear why we need the first ACL check.
   
   If it is required, do you think we need an API where we can check all the required ACLs with a single checkAcl call?
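
   For example, a single entry point could look roughly like this sketch (the method and the `isLink()` accessor are hypothetical; the individual checkAcls calls mirror the ones in this PR):

```java
// Hypothetical combined check: READ on the link bucket itself, then the
// operation's ACL type on the resolved source volume/bucket.
private void checkLinkAndSourceAcls(ResolvedBucket bucket, OmKeyArgs args,
    ACLType requiredAcl) throws IOException {
  if (bucket.isLink()) {
    checkAcls(ResourceType.BUCKET, StoreType.OZONE, ACLType.READ,
        args.getVolumeName(), args.getBucketName(), null);
  }
  checkAcls(ResourceType.KEY, StoreType.OZONE, requiredAcl,
      bucket.realVolume(), bucket.realBucket(), args.getKeyName());
}
```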








[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1104: HDDS-3612. Allow mounting bucket under other volume

2020-06-29 Thread GitBox


bharatviswa504 commented on a change in pull request #1104:
URL: https://github.com/apache/hadoop-ozone/pull/1104#discussion_r447301716



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##
@@ -2145,37 +2168,51 @@ public OmKeyLocationInfo allocateBlock(OmKeyArgs args, 
long clientID,
*/
   @Override
   public OmKeyInfo lookupKey(OmKeyArgs args) throws IOException {
+ResolvedBucket bucket = resolveBucketLink(args);

Review comment:
   Now, for each operation on a link bucket (one that has sourceVolume/sourceBucket) we call checkAcls twice: once with READ permission on sourceBucket/sourceVolume in resolveBucketLink, and once in the actual request with the required ACL type. It is not clear why we need that.
   
   If it is required, do you think we need an API where we can check all the required ACLs with a single checkAcl call?








[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1104: HDDS-3612. Allow mounting bucket under other volume

2020-06-29 Thread GitBox


bharatviswa504 commented on a change in pull request #1104:
URL: https://github.com/apache/hadoop-ozone/pull/1104#discussion_r447301716



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##
@@ -2145,37 +2168,51 @@ public OmKeyLocationInfo allocateBlock(OmKeyArgs args, 
long clientID,
*/
   @Override
   public OmKeyInfo lookupKey(OmKeyArgs args) throws IOException {
+ResolvedBucket bucket = resolveBucketLink(args);

Review comment:
   Now, for each operation on a link bucket (one that has sourceVolume/sourceBucket) we call checkAcls twice, one of them with READ permission on sourceBucket/sourceVolume. It is not clear why we need that.
   
   If it is required, do you think we need an API where we can check all the required ACLs with a single checkAcl call?








[GitHub] [hadoop-ozone] smengcl commented on a change in pull request #1134: HDDS-3868. Implement getTrashRoot and getTrashRoots in o3fs

2020-06-29 Thread GitBox


smengcl commented on a change in pull request #1134:
URL: https://github.com/apache/hadoop-ozone/pull/1134#discussion_r447295269



##
File path: 
hadoop-ozone/ozonefs-common/src/main/java/org/apache/hadoop/fs/ozone/BasicOzoneFileSystem.java
##
@@ -606,6 +607,52 @@ public String getUsername() {
 return userName;
   }
 
+  /**
+   * Get the root directory of Trash for a path.
+   * Returns /.Trash/<username>
+   * Caller appends either Current or checkpoint timestamp for trash 
destination
+   * @param path the trash root of the path to be determined.
+   * @return trash root
+   */
+  @Override
+  public Path getTrashRoot(Path path) {
+final Path pathToTrash = new Path(OZONE_URI_DELIMITER, TRASH_PREFIX);
+return new Path(pathToTrash, getUsername());
+  }
+
+  /**
+   * Get all the trash roots for current user or all users.
+   *
+   * @param allUsers return trash roots for all users if true.
+   * @return all the trash root directories.
+   * Returns .Trash of users if {@code /.Trash/$USER} exists.
+   */
+  @Override
+  public Collection<FileStatus> getTrashRoots(boolean allUsers) {
+Path trashRoot = new Path(OZONE_URI_DELIMITER, TRASH_PREFIX);
+List<FileStatus> ret = new ArrayList<>();
+try {
+  if (!allUsers) {
+Path userTrash = new Path(trashRoot, userName);
+if (exists(userTrash)) {

Review comment:
   Done.
   
   Interestingly Hadoop's [default 
implementation](https://github.com/apache/hadoop/blob/263c76b678275dfff867415c71ba9dc00a9235ef/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L3262-L3266)
 doesn't check directories. But it would be nice to have.
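
   For reference, a sketch of the stricter check (existence plus directory check), fitted to the getTrashRoots code in this PR; the FileNotFoundException path is an assumption:

```java
Path userTrash = new Path(trashRoot, userName);
try {
  FileStatus status = getFileStatus(userTrash);
  // Only count /.Trash/$USER as a trash root if it is a directory.
  if (status.isDirectory()) {
    ret.add(status);
  }
} catch (FileNotFoundException e) {
  // No trash root exists for this user yet; return the empty list.
}
```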








[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1104: HDDS-3612. Allow mounting bucket under other volume

2020-06-29 Thread GitBox


bharatviswa504 commented on a change in pull request #1104:
URL: https://github.com/apache/hadoop-ozone/pull/1104#discussion_r447292219



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##
@@ -2020,60 +2026,72 @@ public OmBucketInfo getBucketInfo(String volume, String 
bucket)
*/
   @Override
   public OpenKeySession openKey(OmKeyArgs args) throws IOException {
+ResolvedBucket bucket = resolveBucketLink(args);
+
 if (isAclEnabled) {
   try {
 checkAcls(ResourceType.KEY, StoreType.OZONE, ACLType.WRITE,
-args.getVolumeName(), args.getBucketName(), args.getKeyName());
+bucket.realVolume(), bucket.realBucket(), args.getKeyName());

Review comment:
   Same for all write requests: the old code is not used anymore.








[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1104: HDDS-3612. Allow mounting bucket under other volume

2020-06-29 Thread GitBox


bharatviswa504 commented on a change in pull request #1104:
URL: https://github.com/apache/hadoop-ozone/pull/1104#discussion_r447291849



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/BucketManagerImpl.java
##
@@ -136,54 +137,49 @@ public void createBucket(OmBucketInfo bucketInfo) throws 
IOException {
 throw new OMException("Bucket already exist",
 OMException.ResultCodes.BUCKET_ALREADY_EXISTS);
   }
+
   BucketEncryptionKeyInfo bek = bucketInfo.getEncryptionKeyInfo();
-  BucketEncryptionKeyInfo.Builder bekb = null;
-  if (bek != null) {
-if (kmsProvider == null) {
-  throw new OMException("Invalid KMS provider, check configuration " +
-  CommonConfigurationKeys.HADOOP_SECURITY_KEY_PROVIDER_PATH,
-  OMException.ResultCodes.INVALID_KMS_PROVIDER);
-}
-if (bek.getKeyName() == null) {
-  throw new OMException("Bucket encryption key needed.", OMException
-  .ResultCodes.BUCKET_ENCRYPTION_KEY_NOT_FOUND);
-}
-// Talk to KMS to retrieve the bucket encryption key info.
-KeyProvider.Metadata metadata = getKMSProvider().getMetadata(
-bek.getKeyName());
-if (metadata == null) {
-  throw new OMException("Bucket encryption key " + bek.getKeyName()
-  + " doesn't exist.",
-  OMException.ResultCodes.BUCKET_ENCRYPTION_KEY_NOT_FOUND);
-}
-// If the provider supports pool for EDEKs, this will fill in the pool
-kmsProvider.warmUpEncryptedKeys(bek.getKeyName());
-bekb = new BucketEncryptionKeyInfo.Builder()
-.setKeyName(bek.getKeyName())
-.setVersion(CryptoProtocolVersion.ENCRYPTION_ZONES)
-.setSuite(CipherSuite.convert(metadata.getCipher()));
-  }
-  List<OzoneAcl> acls = new ArrayList<>();
-  acls.addAll(bucketInfo.getAcls());
-  volumeArgs.getAclMap().getDefaultAclList().forEach(
-  a -> acls.add(OzoneAcl.fromProtobufWithAccessType(a)));
-
-  OmBucketInfo.Builder omBucketInfoBuilder = OmBucketInfo.newBuilder()
-  .setVolumeName(bucketInfo.getVolumeName())
-  .setBucketName(bucketInfo.getBucketName())
-  .setAcls(acls)
-  .setStorageType(bucketInfo.getStorageType())
-  .setIsVersionEnabled(bucketInfo.getIsVersionEnabled())
-  .setCreationTime(Time.now())
-  .addAllMetadata(bucketInfo.getMetadata());
+
+  boolean hasSourceVolume = bucketInfo.getSourceVolume() != null;
+  boolean hasSourceBucket = bucketInfo.getSourceBucket() != null;
+
+  if (hasSourceBucket != hasSourceVolume) {
+throw new OMException("Both source volume and source bucket are " +
+"required for bucket links",
+OMException.ResultCodes.INVALID_REQUEST);
+  }
+
+  if (bek != null && hasSourceBucket) {
+throw new OMException("Encryption cannot be set for bucket links",
+OMException.ResultCodes.INVALID_REQUEST);
+  }
+
+  BucketEncryptionKeyInfo.Builder bekb =
+  createBucketEncryptionKeyInfoBuilder(bek);
+
+  OmBucketInfo.Builder omBucketInfoBuilder = bucketInfo.toBuilder()
+  .setCreationTime(Time.now());
+
+  List defaultAclList =

Review comment:
   Question: I don't see anywhere that we check whether the source volume/bucket exists or not.
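
   For illustration, the missing check could look like this sketch (assuming the usual OMMetadataManager bucket-table lookup; names may differ):

```java
if (hasSourceVolume) {
  // Verify the link target exists before creating the link bucket.
  String sourceKey = metadataManager.getBucketKey(
      bucketInfo.getSourceVolume(), bucketInfo.getSourceBucket());
  if (metadataManager.getBucketTable().get(sourceKey) == null) {
    throw new OMException("Source bucket " + bucketInfo.getSourceVolume()
        + "/" + bucketInfo.getSourceBucket() + " does not exist",
        OMException.ResultCodes.BUCKET_NOT_FOUND);
  }
}
```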








[GitHub] [hadoop-ozone] arp7 commented on a change in pull request #1129: HDDS-3741. Reload old OM state if Install Snapshot from Leader fails

2020-06-29 Thread GitBox


arp7 commented on a change in pull request #1129:
URL: https://github.com/apache/hadoop-ozone/pull/1129#discussion_r447284712



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##
@@ -3040,45 +3024,61 @@ public TermIndex installSnapshot(String leaderId) {
 } catch (Exception e) {
   LOG.error("Failed to stop/ pause the services. Cannot proceed with " +
   "installing the new checkpoint.", e);
+
+  // During stopServices, if KeyManager was stopped successfully and
+  // OMMetadataManager stop failed, we should restart the KeyManager.
+  keyManager.start(configuration);
+
   return null;
 }
 
-//TODO: un-pause SM if any failures and retry?
+File dbBackup;
+TermIndex termIndex = omRatisServer.getLastAppliedTermIndex();
+long currentTerm = termIndex.getTerm();
+long lastAppliedIndex = termIndex.getIndex();
+boolean loadSuccess = false;
 
-long lastAppliedIndex = omRatisServer.getLastAppliedTermIndex().getIndex();
+try {
+  // Check if current applied log index is smaller than the downloaded
+  // checkpoint transaction index. If yes, proceed by stopping the ratis
+  // server so that the OM state can be re-initialized. If no then do not
+  // proceed with installSnapshot.
+  boolean canProceed = OzoneManagerRatisUtils.verifyTransactionInfo(
+  omTransactionInfo, lastAppliedIndex, leaderId, newDBLocation);
+  if (!canProceed) {
+return null;
+  }
 
-boolean canProceed =
-OzoneManagerRatisUtils.verifyTransactionInfo(omTransactionInfo,
-lastAppliedIndex, leaderId, newDBlocation);
+  try {
+dbBackup = replaceOMDBWithCheckpoint(lastAppliedIndex, oldDBLocation,

Review comment:
   Also the marker file should be created before starting the move 
operations, and deleted on success.
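
   A sketch of that ordering, with a hypothetical marker file name (and assuming oldDBLocation is a File, as in the surrounding code):

```java
// Create the marker before touching the DB directories so a crash
// mid-move is detectable on restart; delete it only after success.
File marker = new File(oldDBLocation.getParentFile(),
    "om-db-install-in-progress");
Files.createFile(marker.toPath());
dbBackup = replaceOMDBWithCheckpoint(lastAppliedIndex, oldDBLocation,
    newDBLocation);
Files.deleteIfExists(marker.toPath());
```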








[GitHub] [hadoop-ozone] arp7 commented on a change in pull request #1129: HDDS-3741. Reload old OM state if Install Snapshot from Leader fails

2020-06-29 Thread GitBox


arp7 commented on a change in pull request #1129:
URL: https://github.com/apache/hadoop-ozone/pull/1129#discussion_r447280689



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##
@@ -3040,45 +3024,61 @@ public TermIndex installSnapshot(String leaderId) {
 } catch (Exception e) {
   LOG.error("Failed to stop/ pause the services. Cannot proceed with " +
   "installing the new checkpoint.", e);
+
+  // During stopServices, if KeyManager was stopped successfully and
+  // OMMetadataManager stop failed, we should restart the KeyManager.
+  keyManager.start(configuration);
+
   return null;
 }
 
-//TODO: un-pause SM if any failures and retry?
+File dbBackup;
+TermIndex termIndex = omRatisServer.getLastAppliedTermIndex();
+long currentTerm = termIndex.getTerm();
+long lastAppliedIndex = termIndex.getIndex();
+boolean loadSuccess = false;
 
-long lastAppliedIndex = omRatisServer.getLastAppliedTermIndex().getIndex();
+try {
+  // Check if current applied log index is smaller than the downloaded
+  // checkpoint transaction index. If yes, proceed by stopping the ratis
+  // server so that the OM state can be re-initialized. If no then do not
+  // proceed with installSnapshot.
+  boolean canProceed = OzoneManagerRatisUtils.verifyTransactionInfo(
+  omTransactionInfo, lastAppliedIndex, leaderId, newDBLocation);
+  if (!canProceed) {
+return null;
+  }
 
-boolean canProceed =
-OzoneManagerRatisUtils.verifyTransactionInfo(omTransactionInfo,
-lastAppliedIndex, leaderId, newDBlocation);
+  try {
+dbBackup = replaceOMDBWithCheckpoint(lastAppliedIndex, oldDBLocation,
+newDBLocation);
+  } catch (Exception e) {
+LOG.error("OM DB checkpoint replacement with new downloaded " +
+"checkpoint failed.", e);
+return null;
+  }
 
-// If downloaded DB has transaction info less than current one, return.
-if (!canProceed) {
-  return null;
+  loadSuccess = true;
+} finally {
+  if (!loadSuccess) {

Review comment:
   Remove this unpause; we already do a reload and unpause below.

##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##
@@ -3040,45 +3024,61 @@ public TermIndex installSnapshot(String leaderId) {
 } catch (Exception e) {
   LOG.error("Failed to stop/ pause the services. Cannot proceed with " +
   "installing the new checkpoint.", e);
+
+  // During stopServices, if KeyManager was stopped successfully and
+  // OMMetadataManager stop failed, we should restart the KeyManager.
+  keyManager.start(configuration);
+
   return null;
 }
 
-//TODO: un-pause SM if any failures and retry?
+File dbBackup;
+TermIndex termIndex = omRatisServer.getLastAppliedTermIndex();
+long currentTerm = termIndex.getTerm();
+long lastAppliedIndex = termIndex.getIndex();
+boolean loadSuccess = false;
 
-long lastAppliedIndex = omRatisServer.getLastAppliedTermIndex().getIndex();
+try {
+  // Check if current applied log index is smaller than the downloaded
+  // checkpoint transaction index. If yes, proceed by stopping the ratis
+  // server so that the OM state can be re-initialized. If no then do not
+  // proceed with installSnapshot.
+  boolean canProceed = OzoneManagerRatisUtils.verifyTransactionInfo(
+  omTransactionInfo, lastAppliedIndex, leaderId, newDBLocation);
+  if (!canProceed) {
+return null;
+  }
 
-boolean canProceed =
-OzoneManagerRatisUtils.verifyTransactionInfo(omTransactionInfo,
-lastAppliedIndex, leaderId, newDBlocation);
+  try {
+dbBackup = replaceOMDBWithCheckpoint(lastAppliedIndex, oldDBLocation,
+newDBLocation);
+  } catch (Exception e) {
+LOG.error("OM DB checkpoint replacement with new downloaded " +
+"checkpoint failed.", e);
+return null;
+  }
 
-// If downloaded DB has transaction info less than current one, return.
-if (!canProceed) {
-  return null;
+  loadSuccess = true;

Review comment:
   Don't need `loadSuccess` any more.

##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##
@@ -3040,45 +3024,61 @@ public TermIndex installSnapshot(String leaderId) {
 } catch (Exception e) {
   LOG.error("Failed to stop/ pause the services. Cannot proceed with " +
   "installing the new checkpoint.", e);
+
+  // During stopServices, if KeyManager was stopped successfully and
+  // OMMetadataManager stop failed, we should restart the KeyManager.
+  keyManager.start(configuration);
+
   return null;
 }
 
-//TODO: un-pause SM if any failures and retry?
+File dbBackup;

[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1104: HDDS-3612. Allow mounting bucket under other volume

2020-06-29 Thread GitBox


bharatviswa504 commented on a change in pull request #1104:
URL: https://github.com/apache/hadoop-ozone/pull/1104#discussion_r447279299



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/BucketManagerImpl.java
##
@@ -136,54 +137,49 @@ public void createBucket(OmBucketInfo bucketInfo) throws 
IOException {
 throw new OMException("Bucket already exist",
 OMException.ResultCodes.BUCKET_ALREADY_EXISTS);
   }
+
   BucketEncryptionKeyInfo bek = bucketInfo.getEncryptionKeyInfo();
-  BucketEncryptionKeyInfo.Builder bekb = null;
-  if (bek != null) {
-if (kmsProvider == null) {
-  throw new OMException("Invalid KMS provider, check configuration " +
-  CommonConfigurationKeys.HADOOP_SECURITY_KEY_PROVIDER_PATH,
-  OMException.ResultCodes.INVALID_KMS_PROVIDER);
-}
-if (bek.getKeyName() == null) {
-  throw new OMException("Bucket encryption key needed.", OMException
-  .ResultCodes.BUCKET_ENCRYPTION_KEY_NOT_FOUND);
-}
-// Talk to KMS to retrieve the bucket encryption key info.
-KeyProvider.Metadata metadata = getKMSProvider().getMetadata(
-bek.getKeyName());
-if (metadata == null) {
-  throw new OMException("Bucket encryption key " + bek.getKeyName()
-  + " doesn't exist.",
-  OMException.ResultCodes.BUCKET_ENCRYPTION_KEY_NOT_FOUND);
-}
-// If the provider supports pool for EDEKs, this will fill in the pool
-kmsProvider.warmUpEncryptedKeys(bek.getKeyName());
-bekb = new BucketEncryptionKeyInfo.Builder()
-.setKeyName(bek.getKeyName())
-.setVersion(CryptoProtocolVersion.ENCRYPTION_ZONES)
-.setSuite(CipherSuite.convert(metadata.getCipher()));
-  }
-  List<OzoneAcl> acls = new ArrayList<>();
-  acls.addAll(bucketInfo.getAcls());
-  volumeArgs.getAclMap().getDefaultAclList().forEach(
-  a -> acls.add(OzoneAcl.fromProtobufWithAccessType(a)));
-
-  OmBucketInfo.Builder omBucketInfoBuilder = OmBucketInfo.newBuilder()
-  .setVolumeName(bucketInfo.getVolumeName())
-  .setBucketName(bucketInfo.getBucketName())
-  .setAcls(acls)
-  .setStorageType(bucketInfo.getStorageType())
-  .setIsVersionEnabled(bucketInfo.getIsVersionEnabled())
-  .setCreationTime(Time.now())
-  .addAllMetadata(bucketInfo.getMetadata());
+
+  boolean hasSourceVolume = bucketInfo.getSourceVolume() != null;
+  boolean hasSourceBucket = bucketInfo.getSourceBucket() != null;
+
+  if (hasSourceBucket != hasSourceVolume) {
+throw new OMException("Both source volume and source bucket are " +
+"required for bucket links",
+OMException.ResultCodes.INVALID_REQUEST);
+  }

Review comment:
   The old write code is not being used anymore. This logic needs to be added to the new class OMBucketCreateRequest.java.








[GitHub] [hadoop-ozone] umamaheswararao commented on a change in pull request #1115: HDDS-3632. starter scripts can't manage Ozone and HDFS datandodes on the same machine

2020-06-29 Thread GitBox


umamaheswararao commented on a change in pull request #1115:
URL: https://github.com/apache/hadoop-ozone/pull/1115#discussion_r447274784



##
File path: hadoop-ozone/dist/src/shell/hdds/hadoop-functions.sh
##
@@ -2702,11 +2702,11 @@ function hadoop_generic_java_subcmd_handler
 
priv_outfile="${HADOOP_LOG_DIR}/privileged-${HADOOP_IDENT_STRING}-${HADOOP_SUBCMD}-${HOSTNAME}.out"
 
priv_errfile="${HADOOP_LOG_DIR}/privileged-${HADOOP_IDENT_STRING}-${HADOOP_SUBCMD}-${HOSTNAME}.err"
 
priv_pidfile="${HADOOP_PID_DIR}/privileged-${HADOOP_IDENT_STRING}-${HADOOP_SUBCMD}.pid"
-
daemon_outfile="${HADOOP_LOG_DIR}/hadoop-${HADOOP_SECURE_USER}-${HADOOP_IDENT_STRING}-${HADOOP_SUBCMD}-${HOSTNAME}.out"
-
daemon_pidfile="${HADOOP_PID_DIR}/hadoop-${HADOOP_SECURE_USER}-${HADOOP_IDENT_STRING}-${HADOOP_SUBCMD}.pid"
+
daemon_outfile="${HADOOP_LOG_DIR}/ozone-${HADOOP_SECURE_USER}-${HADOOP_IDENT_STRING}-${HADOOP_SUBCMD}-${HOSTNAME}.out"
+
daemon_pidfile="${HADOOP_PID_DIR}/ozone-${HADOOP_SECURE_USER}-${HADOOP_IDENT_STRING}-${HADOOP_SUBCMD}.pid"
   else
-
daemon_outfile="${HADOOP_LOG_DIR}/hadoop-${HADOOP_IDENT_STRING}-${HADOOP_SUBCMD}-${HOSTNAME}.out"
-
daemon_pidfile="${HADOOP_PID_DIR}/hadoop-${HADOOP_IDENT_STRING}-${HADOOP_SUBCMD}.pid"
+
daemon_outfile="${HADOOP_LOG_DIR}/ozone-${HADOOP_IDENT_STRING}-${HADOOP_SUBCMD}-${HOSTNAME}.out"
+
daemon_pidfile="${HADOOP_PID_DIR}/ozone-${HADOOP_IDENT_STRING}-${HADOOP_SUBCMD}.pid"
   fi
 
   # are we actually in daemon mode?

Review comment:
   In the branch below as well: do we have any scenario where a user sets their log dir to a common folder for all services? If yes, the case below may also need an update.








[GitHub] [hadoop-ozone] arp7 commented on a change in pull request #1129: HDDS-3741. Reload old OM state if Install Snapshot from Leader fails

2020-06-29 Thread GitBox


arp7 commented on a change in pull request #1129:
URL: https://github.com/apache/hadoop-ozone/pull/1129#discussion_r447272528



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##
@@ -2999,32 +3000,15 @@ public TermIndex installSnapshot(String leaderId) {
 }
 
 DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId);
-Path newDBlocation = omDBcheckpoint.getCheckpointLocation();
+Path newDBLocation = omDBcheckpoint.getCheckpointLocation();
 
 LOG.info("Downloaded checkpoint from Leader {}, in to the location {}",
-leaderId, newDBlocation);
+leaderId, newDBLocation);
 
-// Check if current ratis log index is smaller than the downloaded
-// checkpoint transaction index. If yes, proceed by stopping the ratis
-// server so that the OM state can be re-initialized. If no, then do not
-// proceed with installSnapshot.
+OMTransactionInfo omTransactionInfo = getTransactionInfoFromDB(
+newDBLocation);
 
-OMTransactionInfo omTransactionInfo = null;
-
-Path dbDir = newDBlocation.getParent();
-if (dbDir == null) {
-  LOG.error("Incorrect DB location path {} received from checkpoint.",
-  newDBlocation);
-  return null;
-}
-
-try {
-  omTransactionInfo =
-  OzoneManagerRatisUtils.getTransactionInfoFromDownloadedSnapshot(
-  configuration, dbDir);
-} catch (Exception ex) {
-  LOG.error("Failed during opening downloaded snapshot from " +
-  "{} to obtain transaction index", newDBlocation, ex);
+if (omTransactionInfo == null) {

Review comment:
   Let's log an error here.
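
   For instance (a minimal sketch in the style of the surrounding code):

```java
if (omTransactionInfo == null) {
  LOG.error("Failed to obtain transaction info from the downloaded "
      + "checkpoint at {}. Cannot proceed with installSnapshot.",
      newDBLocation);
  return null;
}
```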








[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1145: HDDS-3895. Implement container related operations in ContainerManagerImpl

2020-06-29 Thread GitBox


xiaoyuyao commented on a change in pull request #1145:
URL: https://github.com/apache/hadoop-ozone/pull/1145#discussion_r447268958



##
File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerManagerImpl.java
##
@@ -279,6 +257,24 @@ public void notifyContainerReportProcessing(final boolean 
isFullReport,
 throw new UnsupportedOperationException("Not yet implemented!");
   }
 
+  @Override
+  public void deleteContainer(final ContainerID containerID)
+  throws IOException {
+final HddsProtos.ContainerID id = containerID.getProtobuf();

Review comment:
   Need to hold the writeLock?








[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1145: HDDS-3895. Implement container related operations in ContainerManagerImpl

2020-06-29 Thread GitBox


xiaoyuyao commented on a change in pull request #1145:
URL: https://github.com/apache/hadoop-ozone/pull/1145#discussion_r447269086



##
File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerManagerImpl.java
##
@@ -279,6 +257,24 @@ public void notifyContainerReportProcessing(final boolean 
isFullReport,
 throw new UnsupportedOperationException("Not yet implemented!");
   }
 
+  @Override
+  public void deleteContainer(final ContainerID containerID)
+  throws IOException {
+final HddsProtos.ContainerID id = containerID.getProtobuf();
+if (!containerStateManager.contains(id)) {

Review comment:
   Can we use checkIfContainerExist()? 
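
   Taken together with the writeLock question above, the two suggestions might shape up like this sketch (the `lock` field is an assumption, and checkIfContainerExist and the state-manager call are taken to behave as their names suggest):

```java
@Override
public void deleteContainer(final ContainerID containerID)
    throws IOException {
  final HddsProtos.ContainerID id = containerID.getProtobuf();
  lock.writeLock().lock();  // hypothetical ReadWriteLock guarding SCM state
  try {
    checkIfContainerExist(id);  // throws if the container is absent
    containerStateManager.deleteContainer(id);
  } finally {
    lock.writeLock().unlock();
  }
}
```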








[GitHub] [hadoop-ozone] arp7 commented on a change in pull request #1115: HDDS-3632. starter scripts can't manage Ozone and HDFS datandodes on the same machine

2020-06-29 Thread GitBox


arp7 commented on a change in pull request #1115:
URL: https://github.com/apache/hadoop-ozone/pull/1115#discussion_r447260017



##
File path: hadoop-ozone/dist/src/shell/hdds/hadoop-functions.sh
##
@@ -2702,11 +2702,11 @@ function hadoop_generic_java_subcmd_handler
 
priv_outfile="${HADOOP_LOG_DIR}/privileged-${HADOOP_IDENT_STRING}-${HADOOP_SUBCMD}-${HOSTNAME}.out"
 
priv_errfile="${HADOOP_LOG_DIR}/privileged-${HADOOP_IDENT_STRING}-${HADOOP_SUBCMD}-${HOSTNAME}.err"
 
priv_pidfile="${HADOOP_PID_DIR}/privileged-${HADOOP_IDENT_STRING}-${HADOOP_SUBCMD}.pid"

Review comment:
   Same comment as @adoroszlai. Should we change the filenames for secure 
cluster also?








[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1134: HDDS-3868. Implement getTrashRoot and getTrashRoots in o3fs

2020-06-29 Thread GitBox


xiaoyuyao commented on a change in pull request #1134:
URL: https://github.com/apache/hadoop-ozone/pull/1134#discussion_r447251957



##
File path: 
hadoop-ozone/ozonefs-common/src/main/java/org/apache/hadoop/fs/ozone/BasicOzoneFileSystem.java
##
@@ -606,6 +607,52 @@ public String getUsername() {
 return userName;
   }
 
+  /**
+   * Get the root directory of Trash for a path.
+   * Returns /.Trash/<username>
+   * Caller appends either Current or checkpoint timestamp for trash 
destination
+   * @param path the trash root of the path to be determined.
+   * @return trash root
+   */
+  @Override
+  public Path getTrashRoot(Path path) {
+final Path pathToTrash = new Path(OZONE_URI_DELIMITER, TRASH_PREFIX);
+return new Path(pathToTrash, getUsername());
+  }
+
+  /**
+   * Get all the trash roots for current user or all users.
+   *
+   * @param allUsers return trash roots for all users if true.
+   * @return all the trash root directories.
+   * Returns .Trash of users if {@code /.Trash/$USER} exists.
+   */
+  @Override
+  public Collection<FileStatus> getTrashRoots(boolean allUsers) {
+Path trashRoot = new Path(OZONE_URI_DELIMITER, TRASH_PREFIX);
+List<FileStatus> ret = new ArrayList<>();
+try {
+  if (!allUsers) {
+Path userTrash = new Path(trashRoot, userName);
+if (exists(userTrash)) {

Review comment:
   Should we check both existence and isDirectory before returning?








[jira] [Commented] (HDDS-3214) Unhealthy datanodes repeatedly participate in pipeline creation

2020-06-29 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148135#comment-17148135
 ] 

Arpit Agarwal commented on HDDS-3214:
-

Moved to 0.7.0.

> Unhealthy datanodes repeatedly participate in pipeline creation
> ---
>
> Key: HDDS-3214
> URL: https://issues.apache.org/jira/browse/HDDS-3214
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nilotpal Nandi
>Assignee: Prashant Pogde
>Priority: Blocker
>  Labels: TriagePending, fault_injection
>
> steps taken :
> 1) Mounted noise injection FUSE on all datanodes
> 2) Selected 1 datanode from each open pipeline (factor=3)
> 3) Injected WRITE FAILURE noise with error code - ENOENT on 
> "hdds.datanode.dir" path of list of datanodes selected in step 2)
> 4) start PUT key operation of size  32 MB.
>  
> Observation :
> 
>  # Commit failed, pipelines were moved to exclusion list.
>  # Client retries , new pipeline is created with same set of datanodes. 
> Container creation fails as WRITE  FAILURE injection present.
>  # Pipeline is closed and the process is repeated for 
> "ozone.client.max.retries" retries.
> Everytime, same set of datanodes are selected for pipeline creation which 
> include 1 unhealthy datanode. 
> Expectation - pipeline should have been created by selecting 3 healthy  
> datanodes available.
>  
> cc - [~ljain]
>  






[jira] [Updated] (HDDS-3214) Unhealthy datanodes repeatedly participate in pipeline creation

2020-06-29 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-3214:

Target Version/s: 0.7.0  (was: 0.6.0)

> Unhealthy datanodes repeatedly participate in pipeline creation
> ---
>
> Key: HDDS-3214
> URL: https://issues.apache.org/jira/browse/HDDS-3214
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nilotpal Nandi
>Assignee: Prashant Pogde
>Priority: Blocker
>  Labels: TriagePending, fault_injection
>
> steps taken :
> 1) Mounted noise injection FUSE on all datanodes
> 2) Selected 1 datanode from each open pipeline (factor=3)
> 3) Injected WRITE FAILURE noise with error code - ENOENT on 
> "hdds.datanode.dir" path of list of datanodes selected in step 2)
> 4) start PUT key operation of size  32 MB.
>  
> Observation :
> 
>  # Commit failed, pipelines were moved to exclusion list.
>  # Client retries , new pipeline is created with same set of datanodes. 
> Container creation fails as WRITE  FAILURE injection present.
>  # Pipeline is closed and the process is repeated for 
> "ozone.client.max.retries" retries.
> Everytime, same set of datanodes are selected for pipeline creation which 
> include 1 unhealthy datanode. 
> Expectation - pipeline should have been created by selecting 3 healthy  
> datanodes available.
>  
> cc - [~ljain]
>  






[jira] [Updated] (HDDS-3741) Reload old OM state if Install Snapshot from Leader fails

2020-06-29 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-3741:

Component/s: OM HA

> Reload old OM state if Install Snapshot from Leader fails
> -
>
> Key: HDDS-3741
> URL: https://issues.apache.org/jira/browse/HDDS-3741
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: OM HA
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Critical
>  Labels: pull-request-available
>
> Follower OM issues a pause on its services before installing new checkpoint 
> from Leader OM (Install Snapshot). If this installation fails for some 
> reason, the OM stays in paused state. It should be unpaused and the old state 
> should be reloaded.






[jira] [Updated] (HDDS-3612) Allow mounting bucket under other volume

2020-06-29 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-3612:

Target Version/s: 0.7.0  (was: 0.6.0)

> Allow mounting bucket under other volume
> 
>
> Key: HDDS-3612
> URL: https://issues.apache.org/jira/browse/HDDS-3612
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>  Components: Ozone Manager
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Critical
>  Labels: Triaged, pull-request-available
>
> Step 2 from S3 [volume mapping design 
> doc|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/docs/content/design/ozone-volume-management.md#solving-the-mapping-problem-2-4-from-the-problem-listing]:
> Implement a bind mount mechanic which makes it possible to mount any 
> volume/buckets to the specific "s3" volume.






[jira] [Updated] (HDDS-3612) Allow mounting bucket under other volume

2020-06-29 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-3612:

Target Version/s: 0.6.0  (was: 0.7.0)

> Allow mounting bucket under other volume
> 
>
> Key: HDDS-3612
> URL: https://issues.apache.org/jira/browse/HDDS-3612
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>  Components: Ozone Manager
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Critical
>  Labels: Triaged, pull-request-available
>
> Step 2 from S3 [volume mapping design 
> doc|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/docs/content/design/ozone-volume-management.md#solving-the-mapping-problem-2-4-from-the-problem-listing]:
> Implement a bind mount mechanic which makes it possible to mount any 
> volume/buckets to the specific "s3" volume.






[jira] [Updated] (HDDS-3402) Use proper acls for sub directories created during CreateDirectory operation

2020-06-29 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-3402:

Target Version/s: 0.7.0  (was: 0.6.0)

> Use proper acls for sub directories created during CreateDirectory operation
> 
>
> Key: HDDS-3402
> URL: https://issues.apache.org/jira/browse/HDDS-3402
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Bharat Viswanadham
>Assignee: Rakesh Radhakrishnan
>Priority: Blocker
>  Labels: TriagePending
>
> Use proper ACLS for subdirectories created during create directory operation.
> All subdirectories/missing directories should inherit the ACLS from the 
> bucket if ancestors are not present in key table. If present should inherit 
> the ACLS from its ancestor.






[jira] [Updated] (HDDS-3612) Allow mounting bucket under other volume

2020-06-29 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-3612:

Priority: Blocker  (was: Critical)

> Allow mounting bucket under other volume
> 
>
> Key: HDDS-3612
> URL: https://issues.apache.org/jira/browse/HDDS-3612
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>  Components: Ozone Manager
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Blocker
>  Labels: Triaged, pull-request-available
>
> Step 2 from S3 [volume mapping design 
> doc|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/docs/content/design/ozone-volume-management.md#solving-the-mapping-problem-2-4-from-the-problem-listing]:
> Implement a bind mount mechanic which makes it possible to mount any 
> volume/buckets to the specific "s3" volume.






[jira] [Updated] (HDDS-3685) Remove replay logic from actual request logic

2020-06-29 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-3685:

Priority: Blocker  (was: Critical)

> Remove replay logic from actual request logic
> -
>
> Key: HDDS-3685
> URL: https://issues.apache.org/jira/browse/HDDS-3685
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: OM HA
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Blocker
>  Labels: pull-request-available
>
> HDDS-3476 used the transaction info persisted in OM DB during double buffer 
> flush when OM is restarted. This transaction info log index and the term are 
> used as a snapshot index. So, we can remove the replay logic from actual 
> request logic. (As now we shall never have the transaction which is applied 
> to OM DB will never be again replayed to DB)






[jira] [Updated] (HDDS-3599) [OFS] Add contract test for HA

2020-06-29 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-3599:

Target Version/s: 0.7.0  (was: 0.6.0)

> [OFS] Add contract test for HA
> --
>
> Key: HDDS-3599
> URL: https://issues.apache.org/jira/browse/HDDS-3599
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: test
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Blocker
>  Labels: Triaged, pull-request-available
>
> Add contract tests for HA as well.
> Since adding HA contract tests will be another ~10 new classes. [~xyao] and I 
> decided to put HA OFS contract tests in another jira.






[GitHub] [hadoop-ozone] nandakumar131 closed pull request #1145: HDDS-3895. Implement container related operations in ContainerManagerImpl

2020-06-29 Thread GitBox


nandakumar131 closed pull request #1145:
URL: https://github.com/apache/hadoop-ozone/pull/1145


   






[jira] [Comment Edited] (HDDS-3874) ITestRootedOzoneContract tests are flaky

2020-06-29 Thread Siyao Meng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148034#comment-17148034
 ] 

Siyao Meng edited comment on HDDS-3874 at 6/29/20, 6:09 PM:


[~elek] I doubt this has anything to do with the FS interface. Looks like it is 
stuck in a lock in SCM.

OFS contract cluster config is exactly the same as o3fs 
({{RootedOzoneContract#createCluster}}) so this wouldn't be a variable.

I recall seeing a mini cluster setup/teardown related bug locally that, if I 
setup and teardown mini cluster more than once in the same test class, access 
to the second cluster would get stuck and the test would time out (try 
{{TestOzoneManagerListVolumes}}). I was suspecting some clean up issues back 
then but the problem disappears in GH workflow runs. Could be related.


was (Author: smeng):
[~elek] I doubt this has anything to do with the FS interface. Looks like it is 
stuck in a lock in SCM.

OFS contract cluster config is exactly the same as o3fs 
({{RootedOzoneContract#createCluster}}) so this wouldn't be a variable.

I recall seeing a mini cluster setup/teardown related bug locally that, if I 
setup and teardown mini cluster more than once in the same test class, access 
to the second cluster would get stuck and the test would time out. I was 
suspecting some clean up issues back then but the problem disappears in GH 
workflow runs.

> ITestRootedOzoneContract tests are flaky
> 
>
> Key: HDDS-3874
> URL: https://issues.apache.org/jira/browse/HDDS-3874
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Marton Elek
>Assignee: Siyao Meng
>Priority: Blocker
>
> Different tests are failed with similar reasons:
> {code}
> java.lang.Exception: test timed out after 18 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitOnFlushFutures(BlockOutputStream.java:537)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:499)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:514)
>   at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:149)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleStreamAction(KeyOutputStream.java:483)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:457)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:510)
>   at 
> org.apache.hadoop.fs.ozone.OzoneFSOutputStream.close(OzoneFSOutputStream.java:56)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>   at 
> org.apache.hadoop.fs.contract.ContractTestUtils.createFile(ContractTestUtils.java:638)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractOpenTest.testOpenFileTwice(AbstractContractOpenTest.java:135)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}
> Example:
> 

[jira] [Commented] (HDDS-3874) ITestRootedOzoneContract tests are flaky

2020-06-29 Thread Siyao Meng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148034#comment-17148034
 ] 

Siyao Meng commented on HDDS-3874:
--

[~elek] I doubt this has anything to do with the FS interface. Looks like it is 
stuck in a lock in SCM.

OFS contract cluster config is exactly the same as o3fs 
({{RootedOzoneContract#createCluster}}) so this wouldn't be a variable.

I recall seeing a mini cluster setup/teardown related bug locally that, if I 
setup and teardown mini cluster more than once in the same test class, access 
to the second cluster would get stuck and the test would time out. I was 
suspecting some clean up issues back then but the problem disappears in GH 
workflow runs.

> ITestRootedOzoneContract tests are flaky
> 
>
> Key: HDDS-3874
> URL: https://issues.apache.org/jira/browse/HDDS-3874
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Marton Elek
>Assignee: Siyao Meng
>Priority: Blocker
>
> Different tests are failed with similar reasons:
> {code}
> java.lang.Exception: test timed out after 18 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitOnFlushFutures(BlockOutputStream.java:537)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:499)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:514)
>   at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:149)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleStreamAction(KeyOutputStream.java:483)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:457)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:510)
>   at 
> org.apache.hadoop.fs.ozone.OzoneFSOutputStream.close(OzoneFSOutputStream.java:56)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>   at 
> org.apache.hadoop.fs.contract.ContractTestUtils.createFile(ContractTestUtils.java:638)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractOpenTest.testOpenFileTwice(AbstractContractOpenTest.java:135)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}
> Example:
> https://github.com/elek/ozone-build-results/blob/master/2020/06/16/1051/it-filesystem-contract/hadoop-ozone/integration-test/org.apache.hadoop.fs.ozone.contract.rooted.ITestRootedOzoneContractOpen.txt
> But same problem here:
> https://github.com/elek/hadoop-ozone/runs/810175295?check_suite_focus=true 
> (contract)






[jira] [Assigned] (HDDS-3830) Introduce OM layout version 'v0'.

2020-06-29 Thread Siddharth Wagle (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle reassigned HDDS-3830:
-

Assignee: Aravindan Vijayan

> Introduce OM layout version 'v0'.
> -
>
> Key: HDDS-3830
> URL: https://issues.apache.org/jira/browse/HDDS-3830
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: upgrade-p0
>
> The first layout version for OzoneManager will be '0' which will be written 
> to the version file. Until a future Ozone release with Upgrade & Finalize 
> support, this will just be a dummy number, to support backward compatibility. 






[jira] [Updated] (HDDS-3830) Introduce OM layout version 'v0'.

2020-06-29 Thread Siddharth Wagle (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle updated HDDS-3830:
--
Fix Version/s: 0.6.0

> Introduce OM layout version 'v0'.
> -
>
> Key: HDDS-3830
> URL: https://issues.apache.org/jira/browse/HDDS-3830
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: upgrade-p0
> Fix For: 0.6.0
>
>
> The first layout version for OzoneManager will be '0', which will be written 
> to the version file. Until a future Ozone release with Upgrade & Finalize 
> support, this will just be a dummy number to support backward compatibility. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3805) [OFS] Remove usage of OzoneClientAdapter interface

2020-06-29 Thread Siyao Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDDS-3805:
-
Summary: [OFS] Remove usage of OzoneClientAdapter interface  (was: [OFS] 
Use ClientProtocol directly in Adapter and FS)

> [OFS] Remove usage of OzoneClientAdapter interface
> --
>
> Key: HDDS-3805
> URL: https://issues.apache.org/jira/browse/HDDS-3805
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Filesystem
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>
> Use ClientProtocol (proxy) directly instead of OzoneClient / ObjectStore in 
> BasicRootedOzoneClientAdapterImpl and BasicRootedOzoneFileSystem, as [~elek] 
> has suggested.
> This is part of the OFS refactoring effort.
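A minimal sketch of the direction, assuming the ObjectStore#getClientProxy() 
accessor present in the hadoop-ozone tree: obtain the ClientProtocol proxy 
once and let adapter methods call it directly, instead of going through the 
OzoneClient / ObjectStore wrappers on every call.

{code}
import org.apache.hadoop.hdds.conf.OzoneConfiguration;
import org.apache.hadoop.ozone.client.OzoneClient;
import org.apache.hadoop.ozone.client.OzoneClientFactory;
import org.apache.hadoop.ozone.client.OzoneVolume;
import org.apache.hadoop.ozone.client.protocol.ClientProtocol;

public class ClientProtocolSketch {
  public static void main(String[] args) throws Exception {
    OzoneConfiguration conf = new OzoneConfiguration();
    try (OzoneClient client = OzoneClientFactory.getRpcClient(conf)) {
      // Hold the proxy once; adapter methods then call it directly.
      ClientProtocol proxy = client.getObjectStore().getClientProxy();
      OzoneVolume volume = proxy.getVolumeDetails("volume1");
      System.out.println(volume.getName());
    }
  }
}
{code}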



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3820) [OFS] Follow-up work post merge

2020-06-29 Thread Siyao Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDDS-3820:
-
Component/s: (was: OFS)
 Ozone Filesystem

> [OFS] Follow-up work post merge
> ---
>
> Key: HDDS-3820
> URL: https://issues.apache.org/jira/browse/HDDS-3820
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Filesystem
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3803) [OFS] Add User Guide

2020-06-29 Thread Siyao Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDDS-3803:
-
Component/s: (was: Ozone Filesystem)
 documentation

> [OFS] Add User Guide
> 
>
> Key: HDDS-3803
> URL: https://issues.apache.org/jira/browse/HDDS-3803
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Blocker
>  Labels: pull-request-available
>
> We need to add a user guide markdown for OFS, especially covering the usage 
> of {{/tmp}}.
> Thanks [~umamaheswararao] and [~xyao] for the reminder.
> {{hadoop-hdds/docs/content/design/ofs.md}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3805) [OFS] Use ClientProtocol directly in Adapter and FS

2020-06-29 Thread Siyao Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDDS-3805:
-
Component/s: Ozone Filesystem

> [OFS] Use ClientProtocol directly in Adapter and FS
> ---
>
> Key: HDDS-3805
> URL: https://issues.apache.org/jira/browse/HDDS-3805
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Filesystem
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>
> Use ClientProtocol (proxy) directly instead of OzoneClient / ObjectStore in 
> BasicRootedOzoneClientAdapterImpl and BasicRootedOzoneFileSystem, as [~elek] 
> has suggested.
> This is part of the OFS refactoring effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-698) Support Topology Awareness for Ozone

2020-06-29 Thread Siddharth Wagle (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle updated HDDS-698:
-
Fix Version/s: 0.6.0

> Support Topology Awareness for Ozone
> 
>
> Key: HDDS-698
> URL: https://issues.apache.org/jira/browse/HDDS-698
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>  Components: SCM
>Reporter: Xiaoyu Yao
>Assignee: Sammi Chen
>Priority: Blocker
> Fix For: 0.6.0
>
> Attachments: HDDS-698.000.patch, network-topology-default.xml, 
> network-topology-nodegroup.xml
>
>
> This is an umbrella JIRA to add topology-aware support for Ozone Pipelines, 
> Containers and Blocks. Ever since HDFS was created, it has provided 
> rack/node-group awareness for reliability and high-performance data access. 
> Ozone needs a similar mechanism, and it can be made more flexible for cloud 
> scenarios. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1930) Test Topology Aware Job scheduling with Ozone Topology

2020-06-29 Thread Siddharth Wagle (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle updated HDDS-1930:
--
Fix Version/s: 0.6.0

> Test Topology Aware Job scheduling with Ozone Topology
> --
>
> Key: HDDS-1930
> URL: https://issues.apache.org/jira/browse/HDDS-1930
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 0.6.0
>
>
> My initial results with Terasort do not seem to report the counters 
> properly. Most of the requests are handled rack-locally, but none are 
> node-local. This ticket is opened to add more system testing to validate 
> the feature. 
> Total Allocated Containers: 3778
> Each table cell represents the number of NodeLocal/RackLocal/OffSwitch 
> containers satisfied by NodeLocal/RackLocal/OffSwitch resource requests.
>                                            Node Local  Rack Local  Off Switch
> Num Node Local Containers (satisfied by)   0
> Num Rack Local Containers (satisfied by)   0           3648
> Num Off Switch Containers (satisfied by)   0           96          34



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-3097) Fallback to ozone.om.address if suffixed key is not defined

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek resolved HDDS-3097.
---
Resolution: Duplicate

I accidentally created a duplicate of this issue and a patch for it:

HDDS-3878

It seems we agree that everything should work as before if only one 
service ID is defined for OM HA.

> Fallback to ozone.om.address if suffixed key is not defined
> ---
>
> Key: HDDS-3097
> URL: https://issues.apache.org/jira/browse/HDDS-3097
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: OM HA, Ozone Manager
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>
> Currently, if ozone.om.service.ids is defined but the suffixed 
> ozone.om.nodes. or ozone.om.address. keys are 
> not defined, then OM throws OzoneIllegalArgumentException. 
> For a single-node OM cluster, we should always fall back to ozone.om.address 
> (even if a service ID is defined).
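A hedged ozone-site.xml sketch of the single-node case in question (host and 
service ID values are made up for illustration): with the proposed fallback, 
this configuration should keep working even though no suffixed keys exist.

{code}
<property>
  <name>ozone.om.service.ids</name>
  <value>omservice</value>
</property>
<!-- No suffixed ozone.om.nodes.* / ozone.om.address.* keys here; OM should
     fall back to the plain address key below. -->
<property>
  <name>ozone.om.address</name>
  <value>om-host:9862</value>
</property>
{code}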



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-3853) Container marked as missing on datanode while container directory do exist

2020-06-29 Thread Marton Elek (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147988#comment-17147988
 ] 

Marton Elek commented on HDDS-3853:
---

Same comment as on HDDS-3852:

We discussed this during the Community Meeting. The problem seems hard to 
reproduce, so we moved it out to 0.7.0. Feel free to move it back if you 
think it's important to fix (especially as you, as the release manager, have 
the final decision). 

Personally, I think we need more testing with long-running Ozone clusters. 
The upgrade tests introduced by Attila might also help. 

If you have any more logs or other information, please share them and we can 
investigate. 

> Container marked as missing on datanode while container directory do exist
> --
>
> Key: HDDS-3853
> URL: https://issues.apache.org/jira/browse/HDDS-3853
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Shashikant Banerjee
>Priority: Major
>
> {code}
> INFO org.apache.hadoop.ozone.container.common.impl.HddsDispatcher: Operation: 
> PutBlock , Trace ID: 487c959563e884b9:509a3386ba37abc6:487c959563e884b9:0 , 
> Message: ContainerID 1744 has been lost and and cannot be recreated on this 
> DataNode , Result: CONTAINER_MISSING , StorageContainerException Occurred.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  ContainerID 1744 has been lost and and cannot be recreated on this DataNode
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:238)
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:166)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:395)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:405)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$6(ContainerStateMachine.java:749)
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>  ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
>  gid group-1376E41FD581 : ApplyTransaction failed. cmd PutBlock logIndex 
> 40079 msg : ContainerID 1744 has been lost and and cannot be recreated on 
> this DataNode Container Result: CONTAINER_MISSING
>  ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE on pipeline 
> PipelineID=de21dfcf-415c-4901-84ca-1376e41fd581.Reason : Ratis Transaction 
> failure in datanode 33b49c34-caa2-4b4f-894e-dce7db4f97b9 with role FOLLOWER 
> .Triggering pipeline close action
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-3852) Failed to import replicated container

2020-06-29 Thread Marton Elek (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147984#comment-17147984
 ] 

Marton Elek commented on HDDS-3852:
---

We discussed this during the Community Meeting. The problem seems hard to 
reproduce, so we moved it out to 0.7.0. Feel free to move it back if you 
think it's important to fix (especially as you, as the release manager, have 
the final decision). 

Personally, I think we need more testing with long-running Ozone clusters. 
The upgrade tests introduced by Attila might also help. 

If you have any more logs or other information, please share them and we can 
investigate. 

> Failed to import replicated container
> -
>
> Key: HDDS-3852
> URL: https://issues.apache.org/jira/browse/HDDS-3852
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Priority: Major
>
> Found several container replication failure logs after upgrading the Ozone 
> cluster to the June 12th master branch. The tar file is deleted after the 
> import failure. 
>  
> {code}
>  2020-06-23 14:11:19,662 [ContainerReplicationThread-0] INFO 
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator: 
> Starting replication of container 206 from 
> [33b49c34-caa2-4b4f-894e-dce7db4f97b9{ip: 9.180.20.222, host: 
> host-9-180-20-222, networkLocation: /rack1, certSerialId: null}, 
> f8d9ccf6-20c6-4dfa-8a49-012f43a1b27e{ip: 9.179.142.251, host: host251, 
> networkLocation: /rack3, certSerialId: null}, 
> db854037-4846-4093-89de-e492e0f14239{ip: 9.179.142.198, host: host198, 
> networkLocation: /rack3, certSerialId: null}]
> 2020-06-23 14:11:20,504 [grpc-default-executor-111] INFO 
> org.apache.hadoop.ozone.container.replication.GrpcReplicationClient: 
> Container 206 is downloaded to /tmp/container-copy/container-206.tar.gz
> 2020-06-23 14:11:20,505 [ContainerReplicationThread-0] INFO 
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator: 
> Container 206 is downloaded, starting to import.
> 2020-06-23 14:11:20,616 [ContainerReplicationThread-0] ERROR 
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator: 
> Can't import the downloaded container data id=206
> java.io.IOException: Container descriptor is missing from the container 
> archive.
> at 
> org.apache.hadoop.ozone.container.keyvalue.TarContainerPacker.unpackContainerDescriptor(TarContainerPacker.java:190)
> at 
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.importContainer(DownloadAndImportReplicator.java:74)
> at 
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.replicate(DownloadAndImportReplicator.java:121)
> at 
> org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:129)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2020-06-23 14:11:20,616 [ContainerReplicationThread-0] INFO 
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator: 
> Container 206 is replicated successfully
> 2020-06-23 14:11:20,616 [ContainerReplicationThread-0] INFO 
> org.apache.hadoop.ozone.container.replication.ReplicationSupervisor: 
> Container 206 is replicated.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3853) Container marked as missing on datanode while container directory do exist

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-3853:
--
Description: 
{code}
INFO org.apache.hadoop.ozone.container.common.impl.HddsDispatcher: Operation: 
PutBlock , Trace ID: 487c959563e884b9:509a3386ba37abc6:487c959563e884b9:0 , 
Message: ContainerID 1744 has been lost and and cannot be recreated on this 
DataNode , Result: CONTAINER_MISSING , StorageContainerException Occurred.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
ContainerID 1744 has been lost and and cannot be recreated on this DataNode
at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:238)
at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:166)
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:395)
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:405)
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$6(ContainerStateMachine.java:749)
at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

 ERROR 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
 gid group-1376E41FD581 : ApplyTransaction failed. cmd PutBlock logIndex 40079 
msg : ContainerID 1744 has been lost and and cannot be recreated on this 
DataNode Container Result: CONTAINER_MISSING

 ERROR 
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
 pipeline Action CLOSE on pipeline 
PipelineID=de21dfcf-415c-4901-84ca-1376e41fd581.Reason : Ratis Transaction 
failure in datanode 33b49c34-caa2-4b4f-894e-dce7db4f97b9 with role FOLLOWER 
.Triggering pipeline close action
 {code}

  was:
INFO org.apache.hadoop.ozone.container.common.impl.HddsDispatcher: Operation: 
PutBlock , Trace ID: 487c959563e884b9:509a3386ba37abc6:487c959563e884b9:0 , 
Message: ContainerID 1744 has been lost and and cannot be recreated on this 
DataNode , Result: CONTAINER_MISSING , StorageContainerException Occurred.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
ContainerID 1744 has been lost and and cannot be recreated on this DataNode
at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:238)
at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:166)
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:395)
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:405)
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$6(ContainerStateMachine.java:749)
at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

 ERROR 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
 gid group-1376E41FD581 : ApplyTransaction failed. cmd PutBlock logIndex 40079 
msg : ContainerID 1744 has been lost and and cannot be recreated on this 
DataNode Container Result: CONTAINER_MISSING

 ERROR 
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
 pipeline Action CLOSE on pipeline 
PipelineID=de21dfcf-415c-4901-84ca-1376e41fd581.Reason : Ratis Transaction 
failure in datanode 33b49c34-caa2-4b4f-894e-dce7db4f97b9 with role FOLLOWER 
.Triggering pipeline close action
 


> Container marked as missing on datanode while container directory do exist
> --
>
> Key: HDDS-3853
> URL: https://issues.apache.org/jira/browse/HDDS-3853
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Shashikant Banerjee
>Priority: Major
>
> {code}
> INFO org.apache.hadoop.ozone.container.common.impl.HddsDispatcher: Operation: 
> PutBlock , Trace ID: 

[jira] [Updated] (HDDS-3852) Failed to import replicated container

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-3852:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Failed to import replicated container
> -
>
> Key: HDDS-3852
> URL: https://issues.apache.org/jira/browse/HDDS-3852
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Priority: Major
>
> Found several container replication failure logs after upgrading the Ozone 
> cluster to the June 12th master branch. The tar file is deleted after the 
> import failure. 
>  
> {code}
>  2020-06-23 14:11:19,662 [ContainerReplicationThread-0] INFO 
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator: 
> Starting replication of container 206 from 
> [33b49c34-caa2-4b4f-894e-dce7db4f97b9{ip: 9.180.20.222, host: 
> host-9-180-20-222, networkLocation: /rack1, certSerialId: null}, 
> f8d9ccf6-20c6-4dfa-8a49-012f43a1b27e{ip: 9.179.142.251, host: host251, 
> networkLocation: /rack3, certSerialId: null}, 
> db854037-4846-4093-89de-e492e0f14239{ip: 9.179.142.198, host: host198, 
> networkLocation: /rack3, certSerialId: null}]
> 2020-06-23 14:11:20,504 [grpc-default-executor-111] INFO 
> org.apache.hadoop.ozone.container.replication.GrpcReplicationClient: 
> Container 206 is downloaded to /tmp/container-copy/container-206.tar.gz
> 2020-06-23 14:11:20,505 [ContainerReplicationThread-0] INFO 
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator: 
> Container 206 is downloaded, starting to import.
> 2020-06-23 14:11:20,616 [ContainerReplicationThread-0] ERROR 
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator: 
> Can't import the downloaded container data id=206
> java.io.IOException: Container descriptor is missing from the container 
> archive.
> at 
> org.apache.hadoop.ozone.container.keyvalue.TarContainerPacker.unpackContainerDescriptor(TarContainerPacker.java:190)
> at 
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.importContainer(DownloadAndImportReplicator.java:74)
> at 
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.replicate(DownloadAndImportReplicator.java:121)
> at 
> org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:129)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2020-06-23 14:11:20,616 [ContainerReplicationThread-0] INFO 
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator: 
> Container 206 is replicated successfully
> 2020-06-23 14:11:20,616 [ContainerReplicationThread-0] INFO 
> org.apache.hadoop.ozone.container.replication.ReplicationSupervisor: 
> Container 206 is replicated.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3855) Add upgrade smoketest

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-3855:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Add upgrade smoketest
> -
>
> Key: HDDS-3855
> URL: https://issues.apache.org/jira/browse/HDDS-3855
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>
> The goal of this task is to create an acceptance test environment where an 
> upgrade from a prior version can be tested.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3853) Container marked as missing on datanode while container directory do exist

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-3853:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Container marked as missing on datanode while container directory do exist
> --
>
> Key: HDDS-3853
> URL: https://issues.apache.org/jira/browse/HDDS-3853
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Shashikant Banerjee
>Priority: Major
>
> INFO org.apache.hadoop.ozone.container.common.impl.HddsDispatcher: Operation: 
> PutBlock , Trace ID: 487c959563e884b9:509a3386ba37abc6:487c959563e884b9:0 , 
> Message: ContainerID 1744 has been lost and and cannot be recreated on this 
> DataNode , Result: CONTAINER_MISSING , StorageContainerException Occurred.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  ContainerID 1744 has been lost and and cannot be recreated on this DataNode
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:238)
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:166)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:395)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:405)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$6(ContainerStateMachine.java:749)
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>  ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
>  gid group-1376E41FD581 : ApplyTransaction failed. cmd PutBlock logIndex 
> 40079 msg : ContainerID 1744 has been lost and and cannot be recreated on 
> this DataNode Container Result: CONTAINER_MISSING
>  ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE on pipeline 
> PipelineID=de21dfcf-415c-4901-84ca-1376e41fd581.Reason : Ratis Transaction 
> failure in datanode 33b49c34-caa2-4b4f-894e-dce7db4f97b9 with role FOLLOWER 
> .Triggering pipeline close action
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-3886) StackOverflowError in KeyValueBlockIterator.hasNext

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek resolved HDDS-3886.
---
Resolution: Duplicate

Closing based on the information from Attila. Feel free to re-open if you 
think it's still a problem.

> StackOverflowError in KeyValueBlockIterator.hasNext
> ---
>
> Key: HDDS-3886
> URL: https://issues.apache.org/jira/browse/HDDS-3886
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Sammi Chen
>Priority: Critical
>
> Set a higher thread stack size to mitigate the StackOverflowError. By 
> default, the thread stack size is 1 MB.
> Exception in thread "Thread-14" java.lang.StackOverflowError
> at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:304)
> at java.lang.StringCoding.encode(StringCoding.java:344)
> at java.lang.String.getBytes(String.java:918)
> at 
> org.apache.hadoop.hdds.StringUtils.string2Bytes(StringUtils.java:83)
> at 
> org.apache.hadoop.hdds.utils.MetadataKeyFilters$KeyPrefixFilter.lambda$filterKey$3(MetadataKeyFilters.java:175)
> at java.util.stream.MatchOps$1MatchSink.accept(MatchOps.java:90)
> at 
> java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1359)
> at 
> java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
> at 
> java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499)
> at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486)
> at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
> at 
> java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230)
> at 
> java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196)
> at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at 
> java.util.stream.ReferencePipeline.allMatch(ReferencePipeline.java:521)
> at 
> org.apache.hadoop.hdds.utils.MetadataKeyFilters$KeyPrefixFilter.filterKey(MetadataKeyFilters.java:174)
> at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueBlockIterator.hasNext(KeyValueBlockIterator.java:129)
> at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueBlockIterator.hasNext(KeyValueBlockIterator.java:137)
> at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueBlockIterator.hasNext(KeyValueBlockIterator.java:137)
> at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueBlockIterator.hasNext(KeyValueBlockIterator.java:137)
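The trace above shows KeyValueBlockIterator.hasNext calling itself at line 
137, so stack depth grows with the number of filtered-out records. As a 
stopgap along the lines described, the JDK lets a specific thread be created 
with a larger stack; a hedged sketch (the thread name and 4 MB size are 
arbitrary, and the stack size is only a hint the JVM may ignore):

{code}
public class BigStackSketch {
  public static void main(String[] args) throws InterruptedException {
    Runnable scan = () -> {
      // ... deep-recursion-prone work, e.g. iterating container blocks ...
    };
    long stackSize = 4L * 1024 * 1024; // 4 MB instead of the ~1 MB default
    Thread t = new Thread(null, scan, "container-scanner", stackSize);
    t.start();
    t.join();
  }
}
{code}

An equivalent process-wide knob is the -Xss JVM flag; the longer-term fix is 
making hasNext iterate instead of recurse.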



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3886) StackOverflowError in KeyValueBlockIterator.hasNext

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-3886:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> StackOverflowError in KeyValueBlockIterator.hasNext
> ---
>
> Key: HDDS-3886
> URL: https://issues.apache.org/jira/browse/HDDS-3886
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Sammi Chen
>Priority: Critical
>
> Set a higher thread stack size to mitigate the StackOverflowError. By 
> default, the thread stack size is 1 MB.
> Exception in thread "Thread-14" java.lang.StackOverflowError
> at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:304)
> at java.lang.StringCoding.encode(StringCoding.java:344)
> at java.lang.String.getBytes(String.java:918)
> at 
> org.apache.hadoop.hdds.StringUtils.string2Bytes(StringUtils.java:83)
> at 
> org.apache.hadoop.hdds.utils.MetadataKeyFilters$KeyPrefixFilter.lambda$filterKey$3(MetadataKeyFilters.java:175)
> at java.util.stream.MatchOps$1MatchSink.accept(MatchOps.java:90)
> at 
> java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1359)
> at 
> java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
> at 
> java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499)
> at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486)
> at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
> at 
> java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230)
> at 
> java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196)
> at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at 
> java.util.stream.ReferencePipeline.allMatch(ReferencePipeline.java:521)
> at 
> org.apache.hadoop.hdds.utils.MetadataKeyFilters$KeyPrefixFilter.filterKey(MetadataKeyFilters.java:174)
> at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueBlockIterator.hasNext(KeyValueBlockIterator.java:129)
> at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueBlockIterator.hasNext(KeyValueBlockIterator.java:137)
> at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueBlockIterator.hasNext(KeyValueBlockIterator.java:137)
> at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueBlockIterator.hasNext(KeyValueBlockIterator.java:137)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-390) Add method to check for valid key name based on URI characters

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-390:
-
Target Version/s: 0.7.0  (was: 0.6.0)

> Add method to check for valid key name based on URI characters
> --
>
> Key: HDDS-390
> URL: https://issues.apache.org/jira/browse/HDDS-390
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Reporter: Dinesh Chitlangia
>Assignee: Dinesh Chitlangia
>Priority: Major
>  Labels: Triaged
> Attachments: HDDS-390.001.patch
>
>
> As per the design, key names composed of valid URI characters must be 
> treated as valid key names.
> For the URI character set, see: [https://tools.ietf.org/html/rfc2396#appendix-A]
> This Jira proposes to define validateKeyName(), similar to the 
> validateResourceName() that validates bucket/volume names.
>  
> A valid key name must:
>  * conform to the URI character set
>  * allow /
> TBD whether key names should impose other rules similar to volume/bucket 
> names, such as:
>  * should not start with a period or dash
>  * should not end with a period or dash
>  * should not have contiguous periods
>  * should not have a period after a dash, and vice versa
> etc.
>  
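A hedged sketch of what such a validateKeyName() could look like: it accepts 
RFC 2396 unreserved/escaped URI characters plus '/', while the extra 
period/dash rules listed above are left out since they are still TBD. The 
class and method names are illustrative, not the proposed patch.

{code}
import java.util.regex.Pattern;

public final class KeyNameSketch {
  // URI character set per RFC 2396 appendix A, plus '/', plus %XX escapes.
  private static final Pattern VALID_KEY = Pattern.compile(
      "^(?:[A-Za-z0-9\\-_.!~*'()/:@&=+$,;?#\\[\\]]|%[0-9A-Fa-f]{2})+$");

  public static boolean isValidKeyName(String key) {
    return key != null && !key.isEmpty() && VALID_KEY.matcher(key).matches();
  }

  public static void main(String[] args) {
    System.out.println(isValidKeyName("dir1/key1")); // true
    System.out.println(isValidKeyName("bad key"));   // false: space not allowed
  }
}
{code}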



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1199) In Healthy Pipeline rule consider pipelines with all replicationType and replicationFactor

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-1199:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> In Healthy Pipeline rule consider pipelines with all replicationType and 
> replicationFactor
> --
>
> Key: HDDS-1199
> URL: https://issues.apache.org/jira/browse/HDDS-1199
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: TriagePending
>
> In the current HealthyPipelineRule, we consider only pipelines of type RATIS 
> with replication factor 3 for the 10% threshold.
>  
> This Jira is to consider pipelines of every replication type and factor for 
> the 10% threshold (meaning each pipeline type/factor combination should meet 10%).
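A hedged sketch of the generalized rule (the map keys and helper shape are 
illustrative; the real rule lives in SCM's safemode code): require every 
replication type/factor combination to reach 10% of its expected pipeline 
count, instead of checking only RATIS with factor 3.

{code}
import java.util.HashMap;
import java.util.Map;

public class HealthyPipelineRuleSketch {
  static final double THRESHOLD = 0.10;

  static boolean allCombinationsHealthy(Map<String, Integer> healthy,
                                        Map<String, Integer> expected) {
    for (Map.Entry<String, Integer> e : expected.entrySet()) {
      int h = healthy.getOrDefault(e.getKey(), 0);
      if (h < THRESHOLD * e.getValue()) {
        return false; // this type/factor combination is below 10%
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, Integer> expected = new HashMap<>();
    expected.put("RATIS/THREE", 30);
    expected.put("RATIS/ONE", 10);
    Map<String, Integer> healthy = new HashMap<>();
    healthy.put("RATIS/THREE", 5);
    healthy.put("RATIS/ONE", 1);
    System.out.println(allCombinationsHealthy(healthy, expected)); // true
  }
}
{code}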



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1199) In Healthy Pipeline rule consider pipelines with all replicationType and replicationFactor

2020-06-29 Thread Marton Elek (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147957#comment-17147957
 ] 

Marton Elek commented on HDDS-1199:
---

[~bharat], please move back if you think it's still a problem.

> In Healthy Pipeline rule consider pipelines with all replicationType and 
> replicationFactor
> --
>
> Key: HDDS-1199
> URL: https://issues.apache.org/jira/browse/HDDS-1199
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: TriagePending
>
> In the current HealthyPipelineRule, we consider only pipelines of type RATIS 
> with replication factor 3 for the 10% threshold.
>  
> This Jira is to consider pipelines of every replication type and factor for 
> the 10% threshold (meaning each pipeline type/factor combination should meet 10%).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1676) Datanode out file contains ReferenceCountedDB related exceptions

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek resolved HDDS-1676.
---
Resolution: Cannot Reproduce

> Datanode out file contains ReferenceCountedDB related exceptions
> 
>
> Key: HDDS-1676
> URL: https://issues.apache.org/jira/browse/HDDS-1676
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Aravindan Vijayan
>Priority: Critical
>  Labels: Triaged
>
> The following exception is continuously seen in the HDDS Datanode .out file 
> after startup.
> {code}
> java.lang.Exception at 
> org.apache.hadoop.ozone.container.common.utils.ReferenceCountedDB.decrementReference(ReferenceCountedDB.java:68)
>  at 
> org.apache.hadoop.ozone.container.common.utils.ReferenceCountedDB.close(ReferenceCountedDB.java:95)
>  at 
> org.apache.hadoop.ozone.container.keyvalue.impl.BlockManagerImpl.putBlock(BlockManagerImpl.java:127)
>  at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handlePutBlock(KeyValueHandler.java:416)
>  at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:182)
>  at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:271)
>  at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148)
>  at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:346)
>  at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:353)
>  at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$5(ContainerStateMachine.java:620)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1714) When restart om with Kerberos, NPException happened at addPersistedDelegationToken

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-1714:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> When restart om with Kerberos, NPException happened at 
> addPersistedDelegationToken 
> ---
>
> Key: HDDS-1714
> URL: https://issues.apache.org/jira/browse/HDDS-1714
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.0
>Reporter: luhuachao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: TriagePending
>
> the error stack:
> {code:java}
> 2019-06-21 15:17:41,744 [main] INFO - Loaded 11 tokens
> 2019-06-21 15:17:41,745 [main] INFO - Loading token state into token manager.
> 2019-06-21 15:17:41,748 [main] ERROR - Failed to start the OzoneManager.
> java.lang.NullPointerException
> at 
> org.apache.hadoop.ozone.security.OzoneDelegationTokenSecretManager.addPersistedDelegationToken(OzoneDelegationTokenSecretManager.java:371)
> at 
> org.apache.hadoop.ozone.security.OzoneDelegationTokenSecretManager.loadTokenSecretState(OzoneDelegationTokenSecretManager.java:358)
> at 
> org.apache.hadoop.ozone.security.OzoneDelegationTokenSecretManager.(OzoneDelegationTokenSecretManager.java:96)
> at 
> org.apache.hadoop.ozone.om.OzoneManager.createDelegationTokenSecretManager(OzoneManager.java:608)
> at org.apache.hadoop.ozone.om.OzoneManager.(OzoneManager.java:332)
> at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:941)
> at org.apache.hadoop.ozone.om.OzoneManager.main(OzoneManager.java:859)
> 2019-06-21 15:17:41,753 [pool-2-thread-1] INFO - SHUTDOWN_MSG:
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1933) Datanode should use hostname in place of ip addresses to allow DN's to work when ipaddress change

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-1933:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Datanode should use hostname in place of ip addresses to allow DN's to work 
> when ipaddress change
> -
>
> Key: HDDS-1933
> URL: https://issues.apache.org/jira/browse/HDDS-1933
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: runzhiwang
>Priority: Major
>  Labels: TriagePending
>
> This was noticed by [~elek] while deploying Ozone in a Kubernetes-based 
> environment.
> When the datanode IP address changes on restart, the recorded datanode 
> details cease to be correct for that datanode, and this prevents the cluster 
> from functioning after a restart.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2086) ReconServer throws SQLException but path present for ozone.recon.db.dir in ozone-site

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek resolved HDDS-2086.
---
Resolution: Invalid

Please re-open if you see it again; it should be fixed by the recent changes.

> ReconServer throws SQLException but path present for ozone.recon.db.dir in 
> ozone-site
> -
>
> Key: HDDS-2086
> URL: https://issues.apache.org/jira/browse/HDDS-2086
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Recon
>Reporter: Shweta
>Assignee: Shweta
>Priority: Major
>  Labels: TriagePending
>
> java.sql.SQLException: path to 
> '/${ozone.recon.db.dir}/ozone_recon_sqlite.db': '/${ozone.recon.db.dir}' does 
> not exist
> But the property is present in ozone-site.xml:
> <property>
>   <name>ozone.recon.db.dir</name>
>   <value>/tmp/metadata</value>
> </property>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2309) Optimise OzoneManagerDoubleBuffer::flushTransactions to flush in batches

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-2309:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Optimise OzoneManagerDoubleBuffer::flushTransactions to flush in batches
> 
>
> Key: HDDS-2309
> URL: https://issues.apache.org/jira/browse/HDDS-2309
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Rajesh Balamohan
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: TriagePending, performance
> Attachments: Screenshot 2019-10-15 at 4.19.13 PM.png
>
>
> When running a write-heavy benchmark, 
> {{org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions}} 
> was invoked for pretty much every write.
> This forces {{cleanupCache}} to be invoked, which ends up choking the 
> single-thread executor. The attached profiler output gives more details.
> Ideally, {{flushTransactions}} should batch up the work to reduce load on 
> RocksDB.
>  
> [https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.java#L130]
>  
> [https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.java#L322]
>  
>  
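A hedged sketch of the batching idea (the queue type and method names are 
illustrative, not the actual OzoneManagerDoubleBuffer code): block for the 
first queued transaction, then drain everything else that has accumulated and 
commit the whole batch at once, so flushes and cache cleanups are amortized 
across many writes.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BatchFlushSketch {
  private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

  public void submit(String txn) {
    pending.add(txn);
  }

  /** One flush iteration: one batch instead of one transaction per flush. */
  public void flushOnce() throws InterruptedException {
    List<String> batch = new ArrayList<>();
    batch.add(pending.take()); // block until at least one transaction exists
    pending.drainTo(batch);    // then grab everything else already queued
    commitBatch(batch);        // stand-in for a single batched DB write
  }

  private void commitBatch(List<String> batch) {
    System.out.println("flushed " + batch.size() + " transactions");
  }
}
{code}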



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2324) Enhance locking mechanism in OzoneMangaer

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-2324:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Enhance locking mechanism in OzoneMangaer
> -
>
> Key: HDDS-2324
> URL: https://issues.apache.org/jira/browse/HDDS-2324
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Rajesh Balamohan
>Assignee: Rakesh Radhakrishnan
>Priority: Critical
>  Labels: Triaged, performance
> Attachments: om_lock_100_percent_read_benchmark.svg, 
> om_lock_reader_and_writer_workload.svg
>
>
> OM has a reentrant RW lock. With 100% read or 100% write benchmarks, it 
> works out reasonably fine. There is already a ticket to optimize the write 
> codepath (as it incurs reading from the DB for key checks).
> However, when a small write workload (e.g. 3-5 threads) is added to the 
> running read benchmark, throughput suffers significantly, because the reader 
> threads get blocked often. I have observed around 10x slower throughput 
> (i.e. a 100% read benchmark was running at 12,000 TPS, and with a couple of 
> writer threads added to it, it goes down to 1200-1800 TPS).
> 1. Instead of a single write lock, one option could be to scale out the 
> write locks depending on the number of cores available in the system and 
> acquire the relevant lock by hashing the key (see the sketch below).
> 2. Another option is to explore whether we can make use of StampedLock from 
> JDK 8.x, which scales well with multiple readers and writers. But it is not 
> a reentrant lock, so we need to explore whether it can be an option or not.
>  
>  
>  
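A hedged sketch of option 1 (the stripe count and class shape are 
illustrative): spread the write lock across several ReentrantReadWriteLocks 
sized from the core count, and pick a stripe by hashing the key, so writers 
on different keys stop serializing behind a single lock.

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class StripedLockSketch {
  private final ReentrantReadWriteLock[] stripes;

  public StripedLockSketch() {
    int n = Runtime.getRuntime().availableProcessors() * 2; // arbitrary factor
    stripes = new ReentrantReadWriteLock[n];
    for (int i = 0; i < n; i++) {
      stripes[i] = new ReentrantReadWriteLock();
    }
  }

  private ReentrantReadWriteLock stripeFor(String key) {
    return stripes[Math.floorMod(key.hashCode(), stripes.length)];
  }

  public void withWriteLock(String key, Runnable action) {
    ReentrantReadWriteLock lock = stripeFor(key);
    lock.writeLock().lock();
    try {
      action.run();
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}

Note the trade-off: operations that must span keys (e.g. renames) would need 
an ordered multi-stripe acquisition to stay deadlock-free.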



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2351) Fix write performance issue in Non-HA OM

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-2351:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Fix write performance issue in Non-HA OM 
> -
>
> Key: HDDS-2351
> URL: https://issues.apache.org/jira/browse/HDDS-2351
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Rajesh Balamohan
>Assignee: Nanda kumar
>Priority: Major
>  Labels: TriagePending, performance
> Attachments: Screenshot 2019-10-23 at 2.27.05 PM.png
>
>
> HDDS-2333 enables the sync option in OM non-HA mode. However, this flushes 
> very frequently, causing the disk to saturate its IOPS quickly: it creates 
> lots of tiny writes and the disk hits its limit.
> To put it in perspective, a simple write benchmark creating keys with 10 
> clients generates a {{0.33MB/s}} write workload with {{116 IOPS}}. This 
> saturates the disk at 98%.
>  
> !Screenshot 2019-10-23 at 2.27.05 PM.png|width=621,height=370!
>  
> Reverting HDDS-2333 fixes this issue. I see a >{{10x}} degradation with 
> HDDS-2333.
> If non-HA is to be supported in OM, it would be good to call this out. 
> Currently, the code explicitly enables the sync option. 
> [https://github.com/apache/hadoop-ozone/commit/c6c9794fc590371ad9c3b8fdcd7a36ed42909b40#diff-3ed3ab4891d7b4fa31ca96740b78ae5bR261]
>  
> I used  {{commit 1baa5a158d13f469c12bef86ef288d60ef0eee85}} in master branch.
>  
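For context, a hedged sketch of the knob in question, using the RocksDB Java 
API (the database path is made up): WriteOptions.setSync(true) forces an 
fsync per write, which is durable but saturates the disk when writes are 
tiny; batching many keys per synced commit is the usual way to amortize that 
cost.

{code}
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteBatch;
import org.rocksdb.WriteOptions;

public class SyncWriteSketch {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (RocksDB db = RocksDB.open("/tmp/sketch-db");
         WriteOptions sync = new WriteOptions().setSync(true);
         WriteBatch batch = new WriteBatch()) {
      // Accumulate many keys, then pay the fsync cost once per batch.
      for (int i = 0; i < 1000; i++) {
        batch.put(("key" + i).getBytes(), ("value" + i).getBytes());
      }
      db.write(sync, batch);
    }
  }
}
{code}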



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2351) Fix write performance issue in Non-HA OM

2020-06-29 Thread Marton Elek (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147945#comment-17147945
 ] 

Marton Elek commented on HDDS-2351:
---

Is this still a problem, or can we close it? It seems to be better now.

> Fix write performance issue in Non-HA OM 
> -
>
> Key: HDDS-2351
> URL: https://issues.apache.org/jira/browse/HDDS-2351
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Rajesh Balamohan
>Assignee: Nanda kumar
>Priority: Major
>  Labels: TriagePending, performance
> Attachments: Screenshot 2019-10-23 at 2.27.05 PM.png
>
>
> HDDS-2333 enables the sync option in OM non-HA mode. However, this flushes 
> very frequently, causing the disk to saturate its IOPS quickly: it creates 
> lots of tiny writes and the disk hits its limit.
> To put it in perspective, a simple write benchmark creating keys with 10 
> clients generates a {{0.33MB/s}} write workload with {{116 IOPS}}. This 
> saturates the disk at 98%.
>  
> !Screenshot 2019-10-23 at 2.27.05 PM.png|width=621,height=370!
>  
> Reverting HDDS-2333 fixes this issue. I see a >{{10x}} degradation with 
> HDDS-2333.
> If non-HA is to be supported in OM, it would be good to call this out. 
> Currently, the code explicitly enables the sync option. 
> [https://github.com/apache/hadoop-ozone/commit/c6c9794fc590371ad9c3b8fdcd7a36ed42909b40#diff-3ed3ab4891d7b4fa31ca96740b78ae5bR261]
>  
> I used  {{commit 1baa5a158d13f469c12bef86ef288d60ef0eee85}} in master branch.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2694) HddsVolume#readVersionFile fails when reading older versions

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-2694:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> HddsVolume#readVersionFile fails when reading older versions
> 
>
> Key: HDDS-2694
> URL: https://issues.apache.org/jira/browse/HDDS-2694
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Attila Doroszlai
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: Triaged, upgrade
>
> {{HddsVolume#layoutVersion}} is a version number meant to be used for 
> handling upgrades from older versions.  Currently only one version is 
> defined, but should a new version be introduced, HddsVolume would fail to 
> read the older version file.  This is caused by a check in {{HddsVolumeUtil}} 
> that only considers the latest version valid:
> {code:title=https://github.com/apache/hadoop-ozone/blob/1d56bc244995e857b842f62d3d1e544ee100bbc1/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/HddsVolumeUtil.java#L137-L153}
>   /**
>* Returns layOutVersion if it is valid. Throws an exception otherwise.
>*/
>   @VisibleForTesting
>   public static int getLayOutVersion(Properties props, File versionFile) 
> throws
>   InconsistentStorageStateException {
> String lvStr = getProperty(props, OzoneConsts.LAYOUTVERSION, versionFile);
> int lv = Integer.parseInt(lvStr);
> if(DataNodeLayoutVersion.getLatestVersion().getVersion() != lv) {
>   throw new InconsistentStorageStateException("Invalid layOutVersion. " +
>   "Version file has layOutVersion as " + lv + " and latest Datanode " 
> +
>   "layOutVersion is " +
>   DataNodeLayoutVersion.getLatestVersion().getVersion());
> }
> return lv;
>   }
> {code}
> I think this should check whether the version number identifies a known 
> {{DataNodeLayoutVersion}}.
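A hedged sketch of that fix, reshaping the quoted method: accept any version 
number that matches a known layout version. The 
DataNodeLayoutVersion.getAllVersions() accessor is a naming assumption for 
illustration.

{code}
  /**
   * Returns layOutVersion if it matches any known layout version.
   */
  @VisibleForTesting
  public static int getLayOutVersion(Properties props, File versionFile)
      throws InconsistentStorageStateException {
    String lvStr = getProperty(props, OzoneConsts.LAYOUTVERSION, versionFile);
    int lv = Integer.parseInt(lvStr);
    // Hypothetical accessor over all defined layout versions.
    for (DataNodeLayoutVersion known : DataNodeLayoutVersion.getAllVersions()) {
      if (known.getVersion() == lv) {
        return lv; // any known layout version is acceptable
      }
    }
    throw new InconsistentStorageStateException(
        "Unknown layOutVersion: " + lv);
  }
{code}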



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2694) HddsVolume#readVersionFile fails when reading older versions

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-2694:
--
Priority: Critical  (was: Major)

> HddsVolume#readVersionFile fails when reading older versions
> 
>
> Key: HDDS-2694
> URL: https://issues.apache.org/jira/browse/HDDS-2694
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Attila Doroszlai
>Assignee: Aravindan Vijayan
>Priority: Critical
>  Labels: Triaged, upgrade
>
> {{HddsVolume#layoutVersion}} is a version number meant to be used for 
> handling upgrades from older versions.  Currently only one version is 
> defined, but should a new version be introduced, HddsVolume would fail to 
> read the older version file.  This is caused by a check in {{HddsVolumeUtil}} 
> that only considers the latest version valid:
> {code:title=https://github.com/apache/hadoop-ozone/blob/1d56bc244995e857b842f62d3d1e544ee100bbc1/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/HddsVolumeUtil.java#L137-L153}
>   /**
>* Returns layOutVersion if it is valid. Throws an exception otherwise.
>*/
>   @VisibleForTesting
>   public static int getLayOutVersion(Properties props, File versionFile) 
> throws
>   InconsistentStorageStateException {
> String lvStr = getProperty(props, OzoneConsts.LAYOUTVERSION, versionFile);
> int lv = Integer.parseInt(lvStr);
> if(DataNodeLayoutVersion.getLatestVersion().getVersion() != lv) {
>   throw new InconsistentStorageStateException("Invalid layOutVersion. " +
>   "Version file has layOutVersion as " + lv + " and latest Datanode " 
> +
>   "layOutVersion is " +
>   DataNodeLayoutVersion.getLatestVersion().getVersion());
> }
> return lv;
>   }
> {code}
> I think this should check whether the version number identifies a known 
> {{DataNodeLayoutVersion}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2697) SCM log is flooded with block deletion txId mismatch messages

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-2697:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> SCM log is flooded with block deletion txId mismatch messages
> -
>
> Key: HDDS-2697
> URL: https://issues.apache.org/jira/browse/HDDS-2697
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
>  Labels: TriagePending
>
> When you run Hive queries on the cluster (and I think this is true for other 
> MapReduce workloads as well), interim and temporary data is created and 
> deleted quite often.
> This leads to a flood of similar messages in the SCM log:
> {code}
> 2019-12-07 05:00:41,112 INFO 
> org.apache.hadoop.hdds.scm.block.SCMBlockDeletingService: Block deletion 
> txnID mismatch in datanode e590d08a-4a4e-428a-82e8-80f7221f639e for 
> containerID 307. Datanode delete txnID: 25145, SCM txnID: 25148
> {code}
> Either we need to decrease the log level of these messages, or we need to get 
> rid of the cause of the message. In a single log file I see over 21k lines 
> containing this message out of ~37k lines of log.
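
If lowering the log level is the route taken, a single logger override would
suffice. A sketch in log4j.properties, assuming the stock log4j setup; the
logger name is taken from the quoted log line above:

{code}
# Silence the per-transaction mismatch INFO messages from this one logger
# without reducing verbosity anywhere else in the SCM log.
log4j.logger.org.apache.hadoop.hdds.scm.block.SCMBlockDeletingService=WARN
{code}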



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2697) SCM log is flooded with block deletion txId mismatch messages

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-2697:
--
Priority: Critical  (was: Major)

> SCM log is flooded with block deletion txId mismatch messages
> -
>
> Key: HDDS-2697
> URL: https://issues.apache.org/jira/browse/HDDS-2697
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Critical
>  Labels: TriagePending
>
> When Hive queries run on the cluster (and likely other MapReduce workloads 
> as well), interim and temporary data is created and deleted quite often.
> This leads to a flood of similar messages in the SCM log:
> {code}
> 2019-12-07 05:00:41,112 INFO 
> org.apache.hadoop.hdds.scm.block.SCMBlockDeletingService: Block deletion 
> txnID mismatch in datanode e590d08a-4a4e-428a-82e8-80f7221f639e for 
> containerID 307. Datanode delete txnID: 25145, SCM txnID: 25148
> {code}
> Either we need to decrease the log level of these messages, or we need to get 
> rid of the cause of the message. In a single log file I see over 21k lines 
> containing this message out of ~37k lines of log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2253) Add container state to snapshot to avoid spurious missing container being reported.

2020-06-29 Thread Aravindan Vijayan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan updated HDDS-2253:

Target Version/s: 0.7.0  (was: 0.6.0)

> Add container state to snapshot to avoid spurious missing container being 
> reported.
> ---
>
> Key: HDDS-2253
> URL: https://issues.apache.org/jira/browse/HDDS-2253
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: TriagePending
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3090) Fix logging in OMFileRequest and OzoneManager

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-3090:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Fix logging in OMFileRequest and OzoneManager
> -
>
> Key: HDDS-3090
> URL: https://issues.apache.org/jira/browse/HDDS-3090
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Manager
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Trivial
>  Labels: Triaged
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HDDS-2940 introduced an INFO-level log in 
> hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMFileCreateRequest.java
> This needs to be TRACE because it occurs in the regular file create path.
> Also, trace logs introduced in OzoneManager and OMFileRequest.java need to be 
> parameterized.
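
For reference, a parameterized SLF4J call at TRACE level looks like the sketch
below; the class and message fields are illustrative, not the actual
OMFileCreateRequest code:

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class LoggingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(LoggingSketch.class);

  void logCreate(String volume, String bucket, String key) {
    // Placeholders defer message construction until TRACE is enabled,
    // so the call is nearly free on the regular file create path.
    LOG.trace("Creating file: volume={} bucket={} key={}",
        volume, bucket, key);
  }
}
{code}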



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3039) SCM sometimes cannot exit safe mode

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-3039:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> SCM sometimes cannot exit safe mode
> ---
>
> Key: HDDS-3039
> URL: https://issues.apache.org/jira/browse/HDDS-3039
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Attila Doroszlai
>Priority: Critical
>  Labels: Triaged
>
> Sometimes SCM cannot exit safe mode:
> {code:title=https://github.com/apache/hadoop-ozone/pull/563/checks?check_run_id=453543576}
> 2020-02-18T19:12:28.1108180Z [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 139.821 s <<< FAILURE! - in 
> org.apache.hadoop.ozone.fsck.TestContainerMapper
> 2020-02-18T19:12:28.1169327Z [ERROR] 
> org.apache.hadoop.ozone.fsck.TestContainerMapper  Time elapsed: 139.813 s  
> <<< ERROR!
> 2020-02-18T19:12:28.1202534Z java.util.concurrent.TimeoutException: 
> ...
>   at 
> org.apache.hadoop.ozone.MiniOzoneClusterImpl.waitForClusterToBeReady(MiniOzoneClusterImpl.java:164)
>   at 
> org.apache.hadoop.ozone.fsck.TestContainerMapper.init(TestContainerMapper.java:71)
> {code}
> despite nodes and pipeline being ready:
> {code}
> 2020-02-18 19:10:18,045 [main] INFO  ozone.MiniOzoneClusterImpl 
> (MiniOzoneClusterImpl.java:lambda$waitForClusterToBeReady$0(169)) - Nodes are 
> ready. Got 3 of 3 DN Heartbeats.
> ...
> 2020-02-18 19:10:18,847 [RatisPipelineUtilsThread] INFO  
> pipeline.PipelineStateManager (PipelineStateManager.java:addPipeline(54)) - 
> Created pipeline Pipeline[ Id: b56478a3-8816-459e-a007-db5ee4a5572e, Nodes: 
> 86e97873-2dbd-4f1b-b418-cf9fba405476{ip: 172.17.0.2, host: bedb6e0ff851, 
> networkLocation: /default-rack, certSerialId: 
> null}0fb407c1-4cda-4b3e-8e64-20c845872684{ip: 172.17.0.2, host: bedb6e0ff851, 
> networkLocation: /default-rack, certSerialId: 
> null}31baa82d-441c-41be-94c9-8dd7468b728e{ip: 172.17.0.2, host: bedb6e0ff851, 
> networkLocation: /default-rack, certSerialId: null}, Type:RATIS, 
> Factor:THREE, State:ALLOCATED, leaderId:null ]
> ...
> 2020-02-18 19:12:17,108 [main] INFO  ozone.MiniOzoneClusterImpl 
> (MiniOzoneClusterImpl.java:lambda$waitForClusterToBeReady$0(169)) - Nodes are 
> ready. Got 3 of 3 DN Heartbeats.
> 2020-02-18 19:12:17,108 [main] INFO  ozone.MiniOzoneClusterImpl 
> (MiniOzoneClusterImpl.java:lambda$waitForClusterToBeReady$0(172)) - Waiting 
> for cluster to exit safe mode
> 2020-02-18 19:12:17,151 [main] INFO  ozone.MiniOzoneClusterImpl 
> (MiniOzoneClusterImpl.java:shutdown(370)) - Shutting down the Mini Ozone 
> Cluster
> {code}
> [~shashikant] also noticed this in other integration tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1557) Datanode exits because Ratis fails to shutdown ratis server

2020-06-29 Thread Aravindan Vijayan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan resolved HDDS-1557.
-
Resolution: Cannot Reproduce

No longer able to reproduce this. Please reopen if we see this issue again.

> Datanode exits because Ratis fails to shutdown ratis server 
> 
>
> Key: HDDS-1557
> URL: https://issues.apache.org/jira/browse/HDDS-1557
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: MiniOzoneChaosCluster, TriagePending
>
> Datanode exits because Ratis fails to shutdown ratis server 
> {code}
> 2019-05-19 12:07:19,276 INFO  impl.RaftServerImpl 
> (RaftServerImpl.java:checkInconsistentAppendEntries(965)) - 
> 80747533-f47c-43de-85b8-e70db448c63f: inconsistency entries. 
> Reply:99930d0a-72ab-4795-a3ac-f3c
> fb61ca1bb<-80747533-f47c-43de-85b8-e70db448c63f#3132:FAIL,INCONSISTENCY,nextIndex:9057,term:33,followerCommit:9057
> 2019-05-19 12:07:19,276 WARN  impl.RaftServerProxy 
> (RaftServerProxy.java:lambda$close$4(320)) - 
> e143b976-ab35-4555-a800-7f05a2b1b738: Failed to close GRPC server
> java.io.InterruptedIOException: e143b976-ab35-4555-a800-7f05a2b1b738: 
> shutdown server with port 64605 failed
> at 
> org.apache.ratis.util.IOUtils.toInterruptedIOException(IOUtils.java:48)
> at 
> org.apache.ratis.grpc.server.GrpcService.closeImpl(GrpcService.java:160)
> at 
> org.apache.ratis.server.impl.RaftServerRpcWithProxy.lambda$close$2(RaftServerRpcWithProxy.java:76)
> at 
> org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$2(LifeCycle.java:231)
> at 
> org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:251)
> at 
> org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:229)
> at 
> org.apache.ratis.server.impl.RaftServerRpcWithProxy.close(RaftServerRpcWithProxy.java:76)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$close$4(RaftServerProxy.java:318)
> at 
> org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$2(LifeCycle.java:231)
> at 
> org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:251)
> at 
> org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:229)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.close(RaftServerProxy.java:313)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.stop(XceiverServerRatis.java:432)
> at 
> org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.stop(OzoneContainer.java:201)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.close(DatanodeStateMachine.java:270)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.stopDaemon(DatanodeStateMachine.java:394)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.stop(HddsDatanodeService.java:449)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.terminateDatanode(HddsDatanodeService.java:429)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:208)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:349)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.InterruptedException
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:502)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.awaitTermination(ServerImpl.java:282)
> at 
> org.apache.ratis.grpc.server.GrpcService.closeImpl(GrpcService.java:158)
> ... 19 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3133) Add export objectIds in Ozone as FileIds to allow LLAP to cache the files

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-3133:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Add export objectIds in Ozone as FileIds to allow LLAP to cache the files
> -
>
> Key: HDDS-3133
> URL: https://issues.apache.org/jira/browse/HDDS-3133
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem, Ozone Manager
>Reporter: Mukul Kumar Singh
>Priority: Critical
>  Labels: Triaged
>
> Hive LLAP makes use of fileIds to cache file data. Ozone's objectIds 
> need to be exported as fileIds to allow the caching to happen effectively.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java#L65



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3138) Fix pipeline datanode limit on cluster with rack awareness

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-3138:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Fix pipeline datanode limit on cluster with rack awareness
> --
>
> Key: HDDS-3138
> URL: https://issues.apache.org/jira/browse/HDDS-3138
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Simon Su
>Priority: Major
>  Labels: TriagePending
>
> Deployed on a cluster with 8 datanodes and rack awareness enabled (2/3/3). 
> The pipeline limit per datanode is 5.
>  
> It turned out that some datanodes can have over 5 pipelines.
>  
> Perhaps CLOSED pipelines should not be excluded from the count when filtering 
> against the limit.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3214) Unhealthy datanodes repeatedly participate in pipeline creation

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-3214:
--
Priority: Blocker  (was: Major)

> Unhealthy datanodes repeatedly participate in pipeline creation
> ---
>
> Key: HDDS-3214
> URL: https://issues.apache.org/jira/browse/HDDS-3214
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nilotpal Nandi
>Assignee: Prashant Pogde
>Priority: Blocker
>  Labels: TriagePending, fault_injection
>
> Steps taken:
> 1) Mounted noise injection FUSE on all datanodes
> 2) Selected 1 datanode from each open pipeline (factor=3)
> 3) Injected WRITE FAILURE noise with error code ENOENT on the 
> "hdds.datanode.dir" path of the datanodes selected in step 2)
> 4) Started a PUT key operation of size 32 MB.
>  
> Observation:
> 
>  # Commit failed, pipelines were moved to exclusion list.
>  # Client retries, a new pipeline is created with the same set of datanodes. 
> Container creation fails as the WRITE FAILURE injection is present.
>  # Pipeline is closed and the process is repeated for 
> "ozone.client.max.retries" retries.
> Every time, the same set of datanodes is selected for pipeline creation, which 
> includes 1 unhealthy datanode. 
> Expectation: the pipeline should have been created by selecting 3 healthy 
> datanodes from those available.
>  
> cc - [~ljain]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3246) Include OM hostname info in "getserviceroles" subcommand of OM CLI

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-3246:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Include OM hostname info in "getserviceroles" subcommand of OM CLI
> --
>
> Key: HDDS-3246
> URL: https://issues.apache.org/jira/browse/HDDS-3246
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone CLI, Ozone Manager
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
>  Labels: Triaged, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently,  "getserviceroles" subcommand of OM CLI displays only  node ID 
> along with its serviceRole. 
> ozone admin om getserviceroles -id=ozone1
> om2 : FOLLOWER
> om3 : FOLLOWER
> om1 : LEADER
> Need to include the hostname info as well.
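
For illustration, the extended output might look like this (hostnames are
hypothetical):

{code}
$ ozone admin om getserviceroles -id=ozone1
om2 (om2.example.com) : FOLLOWER
om3 (om3.example.com) : FOLLOWER
om1 (om1.example.com) : LEADER
{code}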



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3246) Include OM hostname info in "getserviceroles" subcommand of OM CLI

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-3246:
--
Priority: Major  (was: Minor)

> Include OM hostname info in "getserviceroles" subcommand of OM CLI
> --
>
> Key: HDDS-3246
> URL: https://issues.apache.org/jira/browse/HDDS-3246
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone CLI, Ozone Manager
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
>  Labels: Triaged, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently,  "getserviceroles" subcommand of OM CLI displays only  node ID 
> along with its serviceRole. 
> ozone admin om getserviceroles -id=ozone1
> om2 : FOLLOWER
> om3 : FOLLOWER
> om1 : LEADER
> Need to include the hostname info as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3277) Datanodes do not close pipeline when pipeline directory is deleted.

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-3277:
--
Priority: Critical  (was: Major)

> Datanodes do not close pipeline when pipeline directory is deleted.
> ---
>
> Key: HDDS-3277
> URL: https://issues.apache.org/jira/browse/HDDS-3277
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.6.0
>Reporter: Mukul Kumar Singh
>Priority: Critical
>  Labels: MiniOzoneChaosCluster, Triaged
>
> First the pipeline was deleted
> {code}
> 2020-03-25 19:44:22,669 [pool-22-thread-1] INFO  failure.Failures 
> (FailureManager.java:fail(49)) - failing with, DeletePipelineFailure
> 2020-03-25 19:44:22,669 [pool-22-thread-1] INFO  failure.Failures 
> (Failures.java:fail(118)) - deleteing pipeline directory 
> /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700
> c5/datanode-0/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9
> 2020-03-25 19:44:22,679 [pool-22-thread-1] INFO  failure.Failures 
> (Failures.java:fail(118)) - deleteing pipeline directory 
> /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700
> c5/datanode-3/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9
> 2020-03-25 19:44:22,681 [pool-22-thread-1] INFO  failure.Failures 
> (Failures.java:fail(118)) - deleteing pipeline directory 
> /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700
> c5/datanode-5/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9
> {code}
> However, no pipeline failure handling was issued to SCM.
> {code}
> 2020-03-25 19:44:24,532 
> [b5d165bc-d2b3-497c-ae38-10f649674a3f@group-C95A81785DF9-StateMachineUpdater] 
> ERROR ratis.ContainerStateMachine 
> (ContainerStateMachine.java:takeSnapshot(302)) - group-C95A81785DF9: Failed 
> to write snapshot at:(t:1, i:2037) file 
> /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700c5/datanode-3/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9/sm/snapshot.1_2037
> 2020-03-25 19:44:24,532 
> [b5d165bc-d2b3-497c-ae38-10f649674a3f@group-C95A81785DF9-StateMachineUpdater] 
> ERROR impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(269)) - 
> b5d165bc-d2b3-497c-ae38-10f649674a3f@group-C95A81785DF9-StateMachineUpdater: 
> Failed to take snapshot
> java.io.FileNotFoundException: 
> /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700c5/datanode-3/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9/sm/snapshot.1_2037
>  (No such file or directory)
> at java.io.FileOutputStream.open0(Native Method)
> at java.io.FileOutputStream.open(FileOutputStream.java:270)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.takeSnapshot(ContainerStateMachine.java:296)
> at 
> org.apache.ratis.server.impl.StateMachineUpdater.takeSnapshot(StateMachineUpdater.java:258)
> at 
> org.apache.ratis.server.impl.StateMachineUpdater.checkAndTakeSnapshot(StateMachineUpdater.java:250)
> at 
> org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:169)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3277) Datanodes do not close pipeline when pipeline directory is deleted.

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-3277:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Datanodes do not close pipeline when pipeline directory is deleted.
> ---
>
> Key: HDDS-3277
> URL: https://issues.apache.org/jira/browse/HDDS-3277
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.6.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>  Labels: MiniOzoneChaosCluster, Triaged
>
> First the pipeline was deleted
> {code}
> 2020-03-25 19:44:22,669 [pool-22-thread-1] INFO  failure.Failures 
> (FailureManager.java:fail(49)) - failing with, DeletePipelineFailure
> 2020-03-25 19:44:22,669 [pool-22-thread-1] INFO  failure.Failures 
> (Failures.java:fail(118)) - deleteing pipeline directory 
> /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700
> c5/datanode-0/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9
> 2020-03-25 19:44:22,679 [pool-22-thread-1] INFO  failure.Failures 
> (Failures.java:fail(118)) - deleteing pipeline directory 
> /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700
> c5/datanode-3/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9
> 2020-03-25 19:44:22,681 [pool-22-thread-1] INFO  failure.Failures 
> (Failures.java:fail(118)) - deleteing pipeline directory 
> /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700
> c5/datanode-5/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9
> {code}
> However, no pipeline failure handling was issued to SCM.
> {code}
> 2020-03-25 19:44:24,532 
> [b5d165bc-d2b3-497c-ae38-10f649674a3f@group-C95A81785DF9-StateMachineUpdater] 
> ERROR ratis.ContainerStateMachine 
> (ContainerStateMachine.java:takeSnapshot(302)) - group-C95A81785DF9: Failed 
> to write snapshot at:(t:1, i:2037) file 
> /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700c5/datanode-3/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9/sm/snapshot.1_2037
> 2020-03-25 19:44:24,532 
> [b5d165bc-d2b3-497c-ae38-10f649674a3f@group-C95A81785DF9-StateMachineUpdater] 
> ERROR impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(269)) - 
> b5d165bc-d2b3-497c-ae38-10f649674a3f@group-C95A81785DF9-StateMachineUpdater: 
> Failed to take snapshot
> java.io.FileNotFoundException: 
> /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700c5/datanode-3/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9/sm/snapshot.1_2037
>  (No such file or directory)
> at java.io.FileOutputStream.open0(Native Method)
> at java.io.FileOutputStream.open(FileOutputStream.java:270)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.takeSnapshot(ContainerStateMachine.java:296)
> at 
> org.apache.ratis.server.impl.StateMachineUpdater.takeSnapshot(StateMachineUpdater.java:258)
> at 
> org.apache.ratis.server.impl.StateMachineUpdater.checkAndTakeSnapshot(StateMachineUpdater.java:250)
> at 
> org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:169)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-3325) Handle Resource Unavailable exception in OM HA

2020-06-29 Thread Marton Elek (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147926#comment-17147926
 ] 

Marton Elek edited comment on HDDS-3325 at 6/29/20, 4:29 PM:
-

Is it a blocker for 0.6.0? Please move back to 0.6.0 if you think it's a blocker. 



was (Author: elek):
Is it a blocker for 0.6.0?

> Handle Resource Unavailable exception in OM HA
> --
>
> Key: HDDS-3325
> URL: https://issues.apache.org/jira/browse/HDDS-3325
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: OM HA, Ozone Manager
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: TriagePending
>
> Right now, when the future fails with an exception, we send that exception to 
> the client and retry with a new server. But when using the Ratis server, a 
> resource unavailable exception makes the future fail exceptionally. So, in 
> this case we need to wrap the exception and retry against the same server 
> with a retry policy such as MultiLinearRandomRetry.
> {code:java}
> try {
>  raftClientReply = server.submitClientRequestAsync(raftClientRequest)
>   .get();
> } catch (Exception ex) {
>   throw new ServiceException(ex.getMessage(), ex);
> }
> {code}
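
A minimal sketch of that approach, retrying the same server with a bounded
linear backoff before giving up; {{isResourceUnavailable()}}, {{MAX_RETRIES}}
and {{RETRY_INTERVAL_MS}} are hypothetical helpers for illustration, not the
actual OM HA code:

{code}
RaftClientReply submitWithRetry(RaftClientRequest raftClientRequest)
    throws ServiceException {
  for (int attempt = 0; ; attempt++) {
    try {
      // Same call as above; only the failure handling differs.
      return server.submitClientRequestAsync(raftClientRequest).get();
    } catch (Exception ex) {
      Throwable cause = (ex.getCause() != null) ? ex.getCause() : ex;
      if (attempt < MAX_RETRIES && isResourceUnavailable(cause)) {
        try {
          // Back off linearly, then retry against the same server.
          Thread.sleep(RETRY_INTERVAL_MS * (attempt + 1));
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new ServiceException(ie.getMessage(), ie);
        }
        continue;
      }
      throw new ServiceException(ex.getMessage(), ex);
    }
  }
}
{code}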



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org


