[jira] [Work logged] (HDDS-2330) Random key generator can get stuck

2019-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2330?focusedWorklogId=331775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331775
 ]

ASF GitHub Bot logged work on HDDS-2330:


Author: ASF GitHub Bot
Created on: 22/Oct/19 02:41
Start Date: 22/Oct/19 02:41
Worklog Time Spent: 10m 
  Work Description: arp7 commented on pull request #53: HDDS-2330. Random 
key generator can get stuck
URL: https://github.com/apache/hadoop-ozone/pull/53
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 331775)
Time Spent: 20m  (was: 10m)

> Random key generator can get stuck
> --
>
> Key: HDDS-2330
> URL: https://issues.apache.org/jira/browse/HDDS-2330
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: freon
> Reporter: Attila Doroszlai
> Assignee: Attila Doroszlai
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Freon's random key generator can get stuck waiting for completion (without
> any hint as to what is happening) if object creation encounters any
> exception or error other than an IOException.
> Steps to reproduce:
> # Start Ozone cluster with 1 datanode
> # Start Freon (5K keys of size 1MB)
> Result: after a few hundred keys progress stops.
> {noformat}
> $ docker-compose exec scm ozone freon rk --numOfThreads 1 --numOfVolumes 1 \
>     --numOfBuckets 1 --replicationType RATIS --factor ONE \
>     --keySize $(echo '2^20' | bc -lq) \
>     --numOfKeys $(echo '5 * 2^10' | bc -lq) \
>     --bufferSize $(echo '2^16' | bc -lq)
> 2019-10-18 10:44:45,224 INFO impl.MetricsConfig: Loaded properties from 
> hadoop-metrics2.properties
> 2019-10-18 10:44:45,381 INFO impl.MetricsSystemImpl: Scheduled Metric 
> snapshot period at 10 second(s).
> 2019-10-18 10:44:45,381 INFO impl.MetricsSystemImpl: ozone-freon metrics 
> system started
> 2019-10-18 10:44:47,140 [main] INFO   - Number of Threads: 1
> 2019-10-18 10:44:47,145 [main] INFO   - Number of Volumes: 1.
> 2019-10-18 10:44:47,146 [main] INFO   - Number of Buckets per Volume: 1.
> 2019-10-18 10:44:47,146 [main] INFO   - Number of Keys per Bucket: 5120.
> 2019-10-18 10:44:47,147 [main] INFO   - Key size: 1048576 bytes
> 2019-10-18 10:44:47,147 [main] INFO   - Buffer size: 65536 bytes
> 2019-10-18 10:44:47,147 [main] INFO   - validateWrites : false
> 2019-10-18 10:44:47,151 [main] INFO   - Starting progress bar Thread.
> ...
>  7.07% |  |  362/5120
> {noformat}
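
One plausible explanation for the missing hint is the standard behaviour of `ExecutorService.submit`: a throwable that escapes a submitted task is captured in the returned `Future` instead of being printed. The sketch below is not Freon code. `SilentFailureDemo`, `runFailingTask`, and the `keysAdded` counter are hypothetical names, and the simulated failure is an unchecked exception so that it bypasses any handler written only for `IOException`.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SilentFailureDemo {

  /** Hypothetical stand-in for a progress counter; not a Freon field. */
  public static final AtomicInteger keysAdded = new AtomicInteger();

  /**
   * Submits a task that fails with an unchecked exception on its fourth
   * iteration and returns the throwable recorded in the task's Future.
   */
  public static Throwable runFailingTask() throws InterruptedException {
    keysAdded.set(0);
    ExecutorService pool = Executors.newFixedThreadPool(1);
    Future<?> future = pool.submit(() -> {
      for (int i = 0; i < 10; i++) {
        if (i == 3) {
          // Unchecked, so a catch (IOException e) block would not see it.
          throw new IllegalStateException("simulated non-IOException");
        }
        keysAdded.incrementAndGet();
      }
    });
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);

    // Nothing has been logged so far. A caller that only watches keysAdded
    // would keep waiting for a count that is never reached.
    try {
      future.get();
      return null;
    } catch (ExecutionException e) {
      return e.getCause();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Throwable captured = runFailingTask();
    System.out.println("keys added: " + keysAdded.get());
    System.out.println("captured:   " + captured);
  }
}
```

The failure becomes visible only when something calls `Future.get()`, which is why a caller that tracks progress purely through a counter can wait forever.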



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2330) Random key generator can get stuck

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2330?focusedWorklogId=330438&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330438
 ]

ASF GitHub Bot logged work on HDDS-2330:


Author: ASF GitHub Bot
Created on: 18/Oct/19 11:29
Start Date: 18/Oct/19 11:29
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on pull request #53: HDDS-2330. 
Random key generator can get stuck
URL: https://github.com/apache/hadoop-ozone/pull/53
 
 
   ## What changes were proposed in this pull request?
   
   Fix the problem where any exception or error not caught by `ObjectCreator` 
ends the object creation task while Freon's main thread keeps waiting 
indefinitely, because the exception is never stored.
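
As a rough illustration of the approach described above (not the actual patch), the sketch below has the worker catch `Throwable`, record the first failure in a shared `AtomicReference`, and lets the waiting thread stop as soon as a failure is recorded. The class `StoredFailureSketch`, its `createKey` method, the `failAt` parameter, and the polling loop are all hypothetical stand-ins; the real `RandomKeyGenerator` may be structured differently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class StoredFailureSketch {

  private final AtomicInteger keysAdded = new AtomicInteger();
  private final AtomicReference<Throwable> failure = new AtomicReference<>();
  private final int totalKeys;
  private final int failAt; // hypothetical: key index that fails, -1 for none

  public StoredFailureSketch(int totalKeys, int failAt) {
    this.totalKeys = totalKeys;
    this.failAt = failAt;
  }

  /** Hypothetical key creation; throws an Error for the key at failAt. */
  private void createKey(int index) {
    if (index == failAt) {
      throw new OutOfMemoryError("simulated: Java heap space");
    }
    keysAdded.incrementAndGet();
  }

  /** Worker body: catches Throwable so that Errors are recorded as well. */
  private void createAllKeys() {
    try {
      for (int i = 0; i < totalKeys; i++) {
        createKey(i);
      }
    } catch (Throwable t) {
      failure.compareAndSet(null, t); // keep only the first failure
    }
  }

  /** Runs the worker and waits; returns true only if every key was added. */
  public boolean run() throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(1);
    pool.execute(this::createAllKeys);
    pool.shutdown();
    // Wait for completion, but give up as soon as a failure has been stored.
    while (keysAdded.get() < totalKeys && failure.get() == null) {
      TimeUnit.MILLISECONDS.sleep(10);
    }
    pool.awaitTermination(10, TimeUnit.SECONDS);
    return failure.get() == null;
  }

  public int getKeysAdded() {
    return keysAdded.get();
  }

  public Throwable getFailure() {
    return failure.get();
  }

  public static void main(String[] args) throws InterruptedException {
    StoredFailureSketch sketch = new StoredFailureSketch(5120, 357);
    boolean ok = sketch.run();
    System.out.println("Status: " + (ok ? "Success" : "Failed"));
    System.out.println("Number of Keys added: " + sketch.getKeysAdded());
  }
}
```

Without the `failure.get() == null` check in the wait loop, the sketch would reproduce the hang: the counter would stop short of `totalKeys` and the loop would never exit.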
   
   https://issues.apache.org/jira/browse/HDDS-2330
   
   ## How was this patch tested?
   
   Verified that an `OutOfMemoryError` (OOME) is caught, reported, and causes 
Freon to exit with failure.
   
   ```
   $ cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone
   $ docker-compose up -d
   $ docker-compose exec scm ozone freon rk --numOfThreads 1 --numOfVolumes 1 \
       --numOfBuckets 1 --replicationType RATIS --factor ONE \
       --keySize $(echo '2^20' | bc -lq) \
       --numOfKeys $(echo '5 * 2^10' | bc -lq) \
       --bufferSize $(echo '2^16' | bc -lq)
   ...
   6.66% |???  |  341/5120 Time: 0:00:17
   [pool-2-thread-1] ERROR  - Exception while adding key: key-357-74353 in bucket: bucket-0-90611 of volume: vol-0-95721.
   java.lang.OutOfMemoryError: Java heap space
       at java.base/java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:61)
       at java.base/java.nio.ByteBuffer.allocate(ByteBuffer.java:348)
       at org.apache.hadoop.hdds.scm.storage.BufferPool.allocateBufferIfNeeded(BufferPool.java:81)
       at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:233)
       at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:129)
       at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:208)
       at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:190)
       at org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
       at org.apache.hadoop.ozone.freon.RandomKeyGenerator.createKey(RandomKeyGenerator.java:710)
       at org.apache.hadoop.ozone.freon.RandomKeyGenerator.access$1100(RandomKeyGenerator.java:88)
       at org.apache.hadoop.ozone.freon.RandomKeyGenerator$ObjectCreator.run(RandomKeyGenerator.java:615)
       at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
       at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
       at java.base/java.lang.Thread.run(Thread.java:834)
   100.00% |?|  5120/5120 Time: 0:00:20
   java.lang.OutOfMemoryError: Java heap space
   
   ***
   Status: Failed
   Git Base Revision: e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
   Number of Volumes created: 1
   Number of Buckets created: 1
   Number of Keys added: 357
   Ratis replication factor: ONE
   Ratis replication type: RATIS
   Average Time spent in volume creation: 00:00:00,190
   Average Time spent in bucket creation: 00:00:00,030
   Average Time spent in key creation: 00:00:02,826
   Average Time spent in key write: 00:00:14,607
   Total bytes written: 374341632
   Total Execution time: 00:00:21,593
   ***
   ```
   
   Also verified that successful execution is not affected:
   
   ```
   $ docker-compose exec scm ozone freon rk --numOfThreads 1 --numOfVolumes 1 \
       --numOfBuckets 1 --replicationType RATIS --factor ONE \
       --keySize $(echo '2^20' | bc -lq) \
       --numOfKeys 3 --bufferSize $(echo '2^16' | bc -lq)
   ...
   100.00% |?|  3/3 Time: 0:00:02
   
   ***
   Status: Success
   Git Base Revision: e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
   Number of Volumes created: 1
   Number of Buckets created: 1
   Number of Keys added: 3
   Ratis replication factor: ONE
   Ratis replication type: RATIS
   Average Time spent in volume creation: 00:00:00,083
   Average Time spent in bucket creation: 00:00:00,012
   Average Time spent in key creation: 00:00:00,069
   Average Time spent in key write: 00:00:01,611
   Total bytes written: 3145728
   Total Execution time: