Re: [jclouds] Delete objects in a container efficiently. (#214)
[jclouds-java-7-pull-requests #1165](https://jclouds.ci.cloudbees.com/job/jclouds-java-7-pull-requests/1165/) SUCCESS This pull request looks good --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214#issuecomment-38656006
Re: [jclouds] Delete objects in a container efficiently. (#214)
} - private boolean parentIsFolder(final ListContainerOptions options, final StorageMetadata md) { - return options.getDir() != null md.getName().indexOf('/') == -1; + private void waitForCompletion(final Semaphore semaphore, + final SetListenableFutureVoid outstandingFutures) { + // Wait for all futures to complete by waiting to acquire all + // semaphores. + try { + // TODO: Each individual blob delete operation itself has a time + // limit of 'maxTime'. Therefore having a time limit for this + // semaphore acquisition provides little value. This could be + // removed. I removed the timeout as the comment suggests since we can easily encounter this situation with large containers. --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214/files#r10997822
Re: [jclouds] Delete objects in a container efficiently. (#214)
I committed to master with some Checkstyle, Javadoc, and indentation fixes. Let's address some of the TODOs and test more before backporting to 1.7.x. Thank you for your contribution @shrinandj ! The previous implementation has troubled us for some years now. --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214#issuecomment-38737591
Re: [jclouds] Delete objects in a container efficiently. (#214)
Closed #214. --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214
Re: [jclouds] Delete objects in a container efficiently. (#214)
+ final String fullPath = parentIsFolder(options, md) ? options.getDir() + + / + md.getName() : md.getName(); + +// Attempt to acquire a semaphore within the time limit. At least +// one outstanding future should complete within this period for the +// semaphore to be acquired. + try { +if (!semaphore.tryAcquire(maxTime, TimeUnit.MILLISECONDS)) { + throw new TimeoutException(Timeout waiting for semaphore); +} + } catch (InterruptedException ie) { +logger.debug(Interrupted while deleting blobs); +Thread.currentThread().interrupt(); + } + + final ListenableFutureVoid blobDelFuture; If you wrap ```blobDelFuture``` in an ```Optional``` will this remove some of the duplicated null assignments? --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214/files#r10940709
Re: [jclouds] Delete objects in a container efficiently. (#214)
+ switch (md.getType()) { + case BLOB: +blobDelFuture = executorService.submit(new CallableVoid() { + @Override + public Void call() { + blobStore.removeBlob(containerName, fullPath); + return null; + } + }); +break; + case FOLDER: +if (options.isRecursive()) { + blobDelFuture = executorService.submit(new CallableVoid() { + @Override + public Void call() { + blobStore.deleteDirectory(containerName, fullPath); Previously ```clearContainer``` issued a batch of delete blob operations within a subtree before removing the parent but now it can issue delete blob operations concurrently with delete parent. I believe the new logic will generate operation failures. This is generally a hard problem to solve; can we simplify this for now by making the recursive call blocking? --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214/files#r10912447
Re: [jclouds] Delete objects in a container efficiently. (#214)
+ // If a future to delete a blob/directory actually got created above, + // keep a reference of that in the outstandingFutures list. This is + // useful in case of a timeout exception. All outstanding futures can + // then be cancelled. + if (blobDelFuture != null) { +final AtomicReferenceListenableFutureVoid atomicBlobDelFuture = + new AtomicReferenceListenableFutureVoid(blobDelFuture); +outstandingFutures.add(atomicBlobDelFuture); + +// Add a callback to release the semaphore. This is required for +// other threads waiting to acquire a semaphore above to make +// progress. +Futures.addCallback(blobDelFuture, new FutureCallbackObject() { + @Override + public void onSuccess(final Object o) { + outstandingFutures.remove(atomicBlobDelFuture); Done. --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214/files#r10872324
Re: [jclouds] Delete objects in a container efficiently. (#214)
import com.google.common.util.concurrent.ListenableFuture; import com.google.common.util.concurrent.ListeningExecutorService; import com.google.inject.Inject; /** * Deletes all keys in the container - * + * * @author Adrian Cole Done --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214/files#r10872338
Re: [jclouds] Delete objects in a container efficiently. (#214)
+ * a timeout. Also, when the reference is removed from this list and when + * the executorService removes the reference that it has maintained, the + * future will be marked for GC since there should be no other references + * to it. This is important because this code can generate an unbounded + * number of futures. + */ + final ListAtomicReferenceListenableFutureVoid outstandingFutures = Collections +.synchronizedList(new ArrayListAtomicReferenceListenableFutureVoid()); + while (retries 0) { + deleteFailure.set(false); + executeOneIteration(containerName, listOptions.clone(), semaphore, + outstandingFutures, deleteFailure); + // Wait for all futures to complete by waiting to acquire all + // semaphores. + try { +if (!semaphore.tryAcquire(numOutStandingRequests, maxTime, Added a TODO for this. --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214/files#r10872424
Re: [jclouds] Delete objects in a container efficiently. (#214)
+ listing = blobStore.list(containerName, options); + } catch (ContainerNotFoundException ce) { + return listing; + } + + // recurse on subdirectories + if (options.isRecursive()) { + for (StorageMetadata md : listing) { +String fullPath = parentIsFolder(options, md) ? options.getDir() + + / + md.getName() : md.getName(); +switch (md.getType()) { +case BLOB: + break; +case FOLDER: +case RELATIVE_PATH: + if (options.isRecursive() Removed. --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214/files#r10872428
Re: [jclouds] Delete objects in a container efficiently. (#214)
+ String marker = listing.getNextMarker(); + if (marker != null) { +logger.debug(%s with marker %s, message, marker); +options = options.afterMarker(marker); +listing = getListing(containerName, options, semaphore, + outstandingFutures, deleteFailure); + } else { +break; + } + } + } + + public void execute(final String containerName, + ListContainerOptions listOptions) { + final AtomicBoolean deleteFailure = new AtomicBoolean(); + final int numOutStandingRequests = 1024; Done. There is a configurable property called jclouds.max-parallel-deletes that can be used for this. You're point about whether the # of semaphores should be equal to the # of threads in the executorService is valid. The default value of jclouds.max-parallel-threads is equal to the # of user threads. Given that these two are equal, I thought about whether we need the extra config option. Can we just use the value of PROPERTY_USER_THREADS instead. However, I'm not sure if the executorService object injected into DeleteAllKeysInList is also being used somewhere else. If it is used somewhere, I guess, it would be good to have an option where the use can limit the number of parallel deletes being issued when deleting a container thereby limiting the number of threads being used from the executorService. Let me know if this isn't necessary. --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214/files#r10872420
Re: [jclouds] Delete objects in a container efficiently. (#214)
Looks like a network glitch on cloudbees. code ERROR: Error cloning remote repo 'origin' hudson.plugins.git.GitException: Could not clone git://github.com/jclouds/jclouds.git ... stderr: fatal: unable to connect to github.com: github.com[0: 192.30.252.129]: errno=Connection timed out /code Can we start another build? --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214#issuecomment-38413431
Re: [jclouds] Delete objects in a container efficiently. (#214)
import com.google.common.util.concurrent.ListenableFuture; import com.google.common.util.concurrent.ListeningExecutorService; import com.google.inject.Inject; /** * Deletes all keys in the container - * + * * @author Adrian Cole Add your name to since this is a large body of new code? --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214/files#r10853480
Re: [jclouds] Delete objects in a container efficiently. (#214)
Can you open a JIRA issue and reference it in the commit message? This will allow us to communicate this improvement in the release notes. --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214#issuecomment-38321551
Re: [jclouds] Delete objects in a container efficiently. (#214)
+ * acquired in 'maxTime', a TimeoutException is thrown. Any outstanding + * futures at that time are cancelled. + */ + final Semaphore semaphore = new Semaphore(numOutStandingRequests); + /* + * When a future is created, a reference for that is added to the + * outstandingFutures list. This reference is removed from the list in the + * FutureCallback since it no longer needs to be cancelled in the event of + * a timeout. Also, when the reference is removed from this list and when + * the executorService removes the reference that it has maintained, the + * future will be marked for GC since there should be no other references + * to it. This is important because this code can generate an unbounded + * number of futures. + */ + final ListAtomicReferenceListenableFutureVoid outstandingFutures = Collections +.synchronizedList(new ArrayListAtomicReferenceListenableFutureVoid()); Does the AtomicReference provide value that the synchronized wrapper does not? --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214/files#r10853586
Re: [jclouds] Delete objects in a container efficiently. (#214)
+ * a timeout. Also, when the reference is removed from this list and when + * the executorService removes the reference that it has maintained, the + * future will be marked for GC since there should be no other references + * to it. This is important because this code can generate an unbounded + * number of futures. + */ + final ListAtomicReferenceListenableFutureVoid outstandingFutures = Collections +.synchronizedList(new ArrayListAtomicReferenceListenableFutureVoid()); + while (retries 0) { + deleteFailure.set(false); + executeOneIteration(containerName, listOptions.clone(), semaphore, + outstandingFutures, deleteFailure); + // Wait for all futures to complete by waiting to acquire all + // semaphores. + try { +if (!semaphore.tryAcquire(numOutStandingRequests, maxTime, These timeouts provide little value over the per-operation timouts and we may want to remove them in a subsequent commit. --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214/files#r10853876
Re: [jclouds] Delete objects in a container efficiently. (#214)
@shrinandj This commit represents a big improvement and I apologize for my delayed comments. Can you address some of these and add TODOs for the rest so we can commit this as soon as possible? Specifically we must address the O(n) behavior and I do not understand some of the synchronization. Further I have some doubts about how the size of the ```ExecutorService``` interacts with the ```Semaphore```; should the sizes of the two be the same? --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214#issuecomment-38323072
Re: [jclouds] Delete objects in a container efficiently. (#214)
+ listing = blobStore.list(containerName, options); + } catch (ContainerNotFoundException ce) { + return listing; + } + + // recurse on subdirectories + if (options.isRecursive()) { + for (StorageMetadata md : listing) { +String fullPath = parentIsFolder(options, md) ? options.getDir() + + / + md.getName() : md.getName(); +switch (md.getType()) { +case BLOB: + break; +case FOLDER: +case RELATIVE_PATH: + if (options.isRecursive() Repeated test of isRecusive from above. --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214/files#r10856243
Re: [jclouds] Delete objects in a container efficiently. (#214)
but haven't been able to reproduce the problem. OK, let's kick the PR builders again, then. I also suspect it's transient. Thanks, @shrinandj! --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214#issuecomment-36737307
Re: [jclouds] Delete objects in a container efficiently. (#214)
Reopened #214. --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214
Re: [jclouds] Delete objects in a container efficiently. (#214)
[jclouds » jclouds #889](https://buildhive.cloudbees.com/job/jclouds/job/jclouds/889/) SUCCESS This pull request looks good [(what's this?)](https://www.cloudbees.com/what-is-buildhive) --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214#issuecomment-36740432
Re: [jclouds] Delete objects in a container efficiently. (#214)
[jclouds-pull-requests #639](https://jclouds.ci.cloudbees.com/job/jclouds-pull-requests/639/) SUCCESS This pull request looks good --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214#issuecomment-36741644
Re: [jclouds] Delete objects in a container efficiently. (#214)
[jclouds-java-7-pull-requests #1109](https://jclouds.ci.cloudbees.com/job/jclouds-java-7-pull-requests/1109/) SUCCESS This pull request looks good --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214#issuecomment-36742089
Re: [jclouds] Delete objects in a container efficiently. (#214)
I also suspect it's transient. Bingo ;-) --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214#issuecomment-36742345
Re: [jclouds] Delete objects in a container efficiently. (#214)
jclouds » jclouds #882 UNSTABLE This looks like a real [failure](https://buildhive.cloudbees.com/job/jclouds/job/jclouds/org.apache.jclouds$jclouds-blobstore/882/testReport/junit/org.jclouds.blobstore.strategy.internal/DeleteAllKeysInListTest/testExecuteInDirectory/)? Could you have a look, @shrinandj? We can also close'n'reopen this PR to trigger the builders again if we think it's likely to be transient... --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214#issuecomment-36670947
Re: [jclouds] Delete objects in a container efficiently. (#214)
I am trying to reproduce this locally. I have run 50 iterations of this test so far, but haven't been able to reproduce the problem. --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214#issuecomment-36681788
Re: [jclouds] Delete objects in a container efficiently. (#214)
I tried varying the number of threads from 10 to 50 and deleting a container with at least 5K blobs. Things worked fine. I'll rebase this patch on top of the current ToT and send out. --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214#issuecomment-36594782
Re: [jclouds] Delete objects in a container efficiently. (#214)
Sorry for ignoring this. By default jclouds uses an unlimited number of user threads, set in ```BaseApiMetadata.defaultProperties```. I believe this will cause problems for users of Atmos blobstores. We can work around this by setting a lower number in ```AtmosApiMetadata.defaultProperties```. Previously I have successfully tested with 8 user and 8 IO threads although this depends on the Atmos version and configuration. @shrinandj will run some further tests with the defaults and I will look at a solution to [JCLOUDS-137](https://issues.apache.org/jira/browse/JCLOUDS-137). I also opened [JCLOUDS-459](https://issues.apache.org/jira/browse/JCLOUDS-459) to track adding some limit to all blobstores. --- Reply to this email directly or view it on GitHub: https://github.com/jclouds/jclouds/pull/214#issuecomment-34665263