[ https://issues.apache.org/jira/browse/OAK-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martin Wehner updated OAK-4565:
-------------------------------
    Attachment: OAK-4565-TRUNK.patch
                OAK-4565-1.4.patch

Attached are proposed patches (for trunk & 1.4) which spool the metadata record to a temporary file and hand that file over to the TransferManager. With this patch we were able to successfully run a Blob GC on our 47TB repository for the first time. It's a bit unfortunate that in the GC case the file already exists on disk and has to be recreated, but since the signature of {{SharedDataStore}} requires an InputStream I see no other way to solve this. It also slows down the common case of just creating 0-byte marker records somewhat, but that happens so infrequently that it shouldn't be a problem.

> S3Backend fails to upload large metadata records
> ------------------------------------------------
>
>                 Key: OAK-4565
>                 URL: https://issues.apache.org/jira/browse/OAK-4565
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: blob
>    Affects Versions: 1.4.5
>            Reporter: Martin Wehner
>              Labels: gc, s3
>         Attachments: OAK-4565-1.4.patch, OAK-4565-TRUNK.patch
>
>
> If a large enough metadata record is added to an S3 DS (like the list of blob references collected during the mark phase of the MarkSweepGC), the upload will fail (i.e. never start). This is caused by {{S3Backend.addMetadataRecord()}} providing an InputStream to the S3 TransferManager without specifying the size in the Metadata.
> A warning to this effect is logged by the AWS SDK each time a metadata record is added:
> {noformat}
> [s3-transfer-manager-worker-1] AmazonS3Client.java:1364 No content length specified for stream data. Stream contents will be buffered in memory and could result in out of memory errors.
> {noformat}
> Normally this shouldn't be too big of a problem, but in a repository with over 36 million blob references the list of marked refs produced by the GC is over 5GB.
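For illustration, a minimal sketch of the spooling approach described in the comment above (class and method names here are hypothetical, not the actual patch; the SDK calls referenced in the comments follow the AWS SDK v1 TransferManager API, which reads the content length from a File automatically):

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SpoolExample {

    // Spool an arbitrary InputStream to a temporary file so the upload layer
    // can obtain the size from the file system (File#length()) instead of
    // buffering the whole stream in memory to determine it.
    static File spoolToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("s3-metadata-", ".tmp");
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp.toFile();
    }

    // With the AWS SDK the spooled file would then be handed to the
    // TransferManager, e.g.:
    //
    //   Upload upload = transferManager.upload(bucket, key, spooledFile);
    //   upload.waitForCompletion();
    //   spooledFile.delete();
    //
    // Because the upload source is a File rather than an InputStream, the SDK
    // sets the Content-Length itself and streams from disk.

    public static void main(String[] args) throws IOException {
        File f = spoolToTempFile(new ByteArrayInputStream(new byte[1024]));
        System.out.println(f.length()); // prints 1024
        f.delete();
    }
}
```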
> In this case the S3 transfer worker thread will be stuck in a seemingly endless loop where it tries to allocate memory while reading the file into a buffer and never finishes (although the JVM has 80GB of heap), eating away resources in the process:
> {noformat}
> java.lang.Thread.State: RUNNABLE
>     at org.apache.http.util.ByteArrayBuffer.append(ByteArrayBuffer.java:90)
>     at org.apache.http.util.EntityUtils.toByteArray(EntityUtils.java:137)
>     at org.apache.http.entity.BufferedHttpEntity.<init>(BufferedHttpEntity.java:63)
>     at com.amazonaws.http.HttpRequestFactory.newBufferedHttpEntity(HttpRequestFactory.java:247)
>     at com.amazonaws.http.HttpRequestFactory.createHttpRequest(HttpRequestFactory.java:126)
>     at com.amazonaws.http.AmazonHttpClient$ExecOneRequestParams.newApacheRequest(AmazonHttpClient.java:650)
>     at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:730)
>     at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:505)
>     at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:317)
>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3595)
>     at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1382)
>     at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInOneChunk(UploadCallable.java:131)
>     at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:123)
>     at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:139)
>     at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:47)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The last log message by the GC thread will look like this:
> {noformat}
> *INFO* [sling-oak-observation-1273] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Number of valid blob references marked under mark phase of Blob garbage collection [36147734]
> {noformat}
> followed by the above AWS warning; then it will stall waiting for the transfer to finish.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)