[ https://issues.apache.org/jira/browse/HDDS-2330?focusedWorklogId=330438&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330438 ]
ASF GitHub Bot logged work on HDDS-2330:
----------------------------------------

Author: ASF GitHub Bot
Created on: 18/Oct/19 11:29
Start Date: 18/Oct/19 11:29
Worklog Time Spent: 10m
Work Description: adoroszlai commented on pull request #53: HDDS-2330. Random key generator can get stuck
URL: https://github.com/apache/hadoop-ozone/pull/53

## What changes were proposed in this pull request?

Fix the problem that any exception/error not caught by `ObjectCreator` ends the object creation task, but Freon's main thread continues waiting indefinitely, since the exception is not stored.

https://issues.apache.org/jira/browse/HDDS-2330

## How was this patch tested?

Verified that OOME is caught, reported, and results in Freon exiting with failure.

```
$ cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone
$ docker-compose up -d
$ docker-compose exec scm ozone freon rk --numOfThreads 1 --numOfVolumes 1 --numOfBuckets 1 --replicationType RATIS --factor ONE --keySize $(echo '2^20' | bc -lq) --numOfKeys $(echo '5 * 2^10' | bc -lq) --bufferSize $(echo '2^16' | bc -lq)
...
6.66% |??????? | 341/5120 Time: 0:00:17
[pool-2-thread-1] ERROR - Exception while adding key: key-357-74353 in bucket: bucket-0-90611 of volume: vol-0-95721.
java.lang.OutOfMemoryError: Java heap space
    at java.base/java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:61)
    at java.base/java.nio.ByteBuffer.allocate(ByteBuffer.java:348)
    at org.apache.hadoop.hdds.scm.storage.BufferPool.allocateBufferIfNeeded(BufferPool.java:81)
    at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:233)
    at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:129)
    at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:208)
    at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:190)
    at org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
    at org.apache.hadoop.ozone.freon.RandomKeyGenerator.createKey(RandomKeyGenerator.java:710)
    at org.apache.hadoop.ozone.freon.RandomKeyGenerator.access$1100(RandomKeyGenerator.java:88)
    at org.apache.hadoop.ozone.freon.RandomKeyGenerator$ObjectCreator.run(RandomKeyGenerator.java:615)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
100.00% |?????????????????????????????????????????????????????????????????????????????????????????????????????| 5120/5120 Time: 0:00:20
java.lang.OutOfMemoryError: Java heap space
***************************************************
Status: Failed
Git Base Revision: e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
Number of Volumes created: 1
Number of Buckets created: 1
Number of Keys added: 357
Ratis replication factor: ONE
Ratis replication type: RATIS
Average Time spent in volume creation: 00:00:00,190
Average Time spent in bucket creation: 00:00:00,030
Average Time spent in key creation: 00:00:02,826
Average Time spent in key write: 00:00:14,607
Total bytes written: 374341632
Total Execution time: 00:00:21,593
***************************************************
```

Also verified that successful execution is not affected:

```
$ docker-compose exec scm ozone freon rk --numOfThreads 1 --numOfVolumes 1 --numOfBuckets 1 --replicationType RATIS --factor ONE --keySize $(echo '2^20' | bc -lq) --numOfKeys 3 --bufferSize $(echo '2^16' | bc -lq)
...
100.00% |?????????????????????????????????????????????????????????????????????????????????????????????????????| 3/3 Time: 0:00:02
***************************************************
Status: Success
Git Base Revision: e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
Number of Volumes created: 1
Number of Buckets created: 1
Number of Keys added: 3
Ratis replication factor: ONE
Ratis replication type: RATIS
Average Time spent in volume creation: 00:00:00,083
Average Time spent in bucket creation: 00:00:00,012
Average Time spent in key creation: 00:00:00,069
Average Time spent in key write: 00:00:01,611
Total bytes written: 3145728
Total Execution time: 00:00:05,986
***************************************************
```
Issue Time Tracking
-------------------

Worklog Id: (was: 330438)
Remaining Estimate: 0h
Time Spent: 10m

> Random key generator can get stuck
> ----------------------------------
>
> Key: HDDS-2330
> URL: https://issues.apache.org/jira/browse/HDDS-2330
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: freon
> Reporter: Attila Doroszlai
> Assignee: Attila Doroszlai
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Freon's random key generator can get stuck waiting for completion (without
> any hint to what's happening) if object creation encounters any
> non-IOException.
> Steps to reproduce:
> # Start Ozone cluster with 1 datanode
> # Start Freon (5K keys of size 1MB)
> Result: after a few hundred keys progress stops.
> {noformat}
> $ docker-compose exec scm ozone freon rk --numOfThreads 1 --numOfVolumes 1 --numOfBuckets 1 --replicationType RATIS --factor ONE --keySize $(echo '2^20' | bc -lq) --numOfKeys $(echo '5 * 2^10' | bc -lq) --bufferSize $(echo '2^16' | bc -lq)
> 2019-10-18 10:44:45,224 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
> 2019-10-18 10:44:45,381 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
> 2019-10-18 10:44:45,381 INFO impl.MetricsSystemImpl: ozone-freon metrics system started
> 2019-10-18 10:44:47,140 [main] INFO - Number of Threads: 1
> 2019-10-18 10:44:47,145 [main] INFO - Number of Volumes: 1.
> 2019-10-18 10:44:47,146 [main] INFO - Number of Buckets per Volume: 1.
> 2019-10-18 10:44:47,146 [main] INFO - Number of Keys per Bucket: 5120.
> 2019-10-18 10:44:47,147 [main] INFO - Key size: 1048576 bytes
> 2019-10-18 10:44:47,147 [main] INFO - Buffer size: 65536 bytes
> 2019-10-18 10:44:47,147 [main] INFO - validateWrites : false
> 2019-10-18 10:44:47,151 [main] INFO - Starting progress bar Thread.
> ...
> 7.07% |???????? | 362/5120
> {noformat}
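The hang described in the issue, and the kind of fix the pull request proposes, can be illustrated with a minimal Java sketch. This is not the actual `RandomKeyGenerator` or `ObjectCreator` code: the class `StuckWaitSketch`, the `Result` type, the `run` method, and the polling loop are hypothetical names and structure chosen for illustration, and an `IllegalStateException` stands in for the `OutOfMemoryError` seen in the log. The assumption is that the waiting thread polls a completion counter; if the worker dies without recording its failure, the counter never reaches the target and the wait never ends. Catching `Throwable` in the worker and storing it gives the waiter a second exit condition.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class StuckWaitSketch {

  /** Outcome of one run: items completed and the first recorded failure (or null). */
  public static final class Result {
    public final int completed;
    public final Throwable failure;

    Result(int completed, Throwable failure) {
      this.completed = completed;
      this.failure = failure;
    }
  }

  /**
   * Processes {@code total} items on a single worker thread.
   * The worker fails at index {@code failAt}; pass -1 for no failure.
   */
  public static Result run(int total, int failAt) {
    AtomicInteger completed = new AtomicInteger();
    AtomicReference<Throwable> failure = new AtomicReference<>();
    ExecutorService pool = Executors.newFixedThreadPool(1);

    pool.submit(() -> {
      try {
        for (int i = 0; i < total; i++) {
          if (i == failAt) {
            // Stand-in for any non-IOException, e.g. an OutOfMemoryError.
            throw new IllegalStateException("simulated failure at item " + i);
          }
          completed.incrementAndGet();
        }
      } catch (Throwable t) {
        // Record the first failure so the waiting thread can see it.
        // Without this catch, the task would end silently and the
        // loop below would spin forever.
        failure.compareAndSet(null, t);
      }
    });

    try {
      // Wait until all items are done OR a failure has been recorded.
      while (completed.get() < total && failure.get() == null) {
        Thread.sleep(10);
      }
      pool.shutdownNow();
      pool.awaitTermination(1, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      pool.shutdownNow();
    }
    return new Result(completed.get(), failure.get());
  }

  public static void main(String[] args) {
    Result r = run(5120, 357);
    System.out.println(r.failure == null
        ? "Status: Success"
        : "Status: Failed (" + r.failure + ")");
    System.out.println("Items completed: " + r.completed);
  }
}
```

Removing the `failure.get() == null` condition from the wait loop reproduces the stuck behaviour in this sketch: the worker stops after a few hundred items and the main thread keeps polling.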