Thanks, guys.

Let me know if I can help in any way to debug this to the end.

Sumit

On Apr 26, 2014 2:10 AM, "Shrinand Javadekar" <[email protected]> wrote:
> Sumit,
>
> >> > After all experiments I feel there is a problem in Jclouds.
>
> There may be a problem with jclouds. I was trying to understand the
> test and the environment better so as to get to the bottom of this.
>
>
> On Fri, Apr 25, 2014 at 12:39 AM, Sumit Gaur <[email protected]> wrote:
> > Hi Ignasi
> > https://github.com/sumitkgaur/test
> >
> > 1) Example8.java is the original programme and requires all the jclouds libs.
> > 2) Worker.java is the delayed-delete programme.
> >
> > Thanks
> > sumit
> >
> > On Apr 25, 2014 3:45 PM, "Ignasi Barrera" <[email protected]> wrote:
> >
> >> Hi Sumit,
> >>
> >> Could you share the entire code of both programs in a git or pastie so we
> >> can understand better how your benchmark works, and reproduce it locally?
> >> On 25/04/2014 02:39, "Sumit Gaur" <[email protected]> wrote:
> >>
> >> > Hi Shri,
> >> > After all experiments I feel there is a problem in Jclouds.
> >> >
> >> > 1) I tried retries for every 409 error. After a successful retry, Jclouds
> >> > started reporting that the blob no longer exists, but in reality it is
> >> > still there in SWIFT storage.
> >> > 2) I tried delaying the delete until after 100 puts, and voila, there are no
> >> > 409s in 24 hours. That says there is some race condition in jclouds if we
> >> > do an immediate delete after a PUT.
> >> > 3) I know the 409 errors are coming all the way from the SWIFT object server,
> >> > but the same does not happen even if I generate a much higher "concurrent"
> >> > load with a curl (PUT-GET-DEL) cycle. I was getting a TPS of 150.
> >> > 4) I have run SWIFT without any extra daemons, like the auditor and others,
> >> > to avoid conflicts caused by them. The storage node runs only the
> >> > object/container/account servers.
> >> > 5) To generate concurrent curl load I am sending curl commands in the
> >> > background. I ran this test for 48 hours and saw not even a single 409 error.
> >> > 6) For an idea of the sequence of the client code:
> >> >
> >> > static BlobStoreContext getSwiftClientView() {
> >> >     return ContextBuilder.newBuilder("swift-keystone")
> >> >         .credentials("test:tester", "test123")
> >> >         .endpoint("http://a.x.y.z.:5000/v2.0/")
> >> >         .buildView(BlobStoreContext.class);
> >> > }
> >> >
> >> > BlobStoreContext context = getSwiftClientView();
> >> > blobStore = context.getBlobStore();
> >> > blobStore.createContainerInLocation(null, containerName);
> >> > Blob blob = blobStore.blobBuilder(key).payload(file).build();
> >> >
> >> > blobStore.putBlob(containerName, blob);
> >> > blobStore.getBlob(containerName, key);
> >> > blobStore.removeBlob(containerName, key);
> >> >
> >> > Let me know if you still see any gaps.
> >> > Thanks
> >> > sumit
> >> >
> >> > On Apr 23, 2014 2:36 AM, "Shrinand Javadekar" <[email protected]>
> >> > wrote:
> >> >
> >> > > So there are two problems:
> >> > >
> >> > > 1) 409 when deleting objects.
> >> > > 2) Transactions taking longer after 24-48 hours.
> >> > >
> >> > > For (1), it looks like the request reached the Swift cluster but the
> >> > > Swift cluster itself wasn't able to fulfill it. This could be because
> >> > > of the "eventual consistency" semantics of blobstores. When the delete
> >> > > request reached Swift, it could have been in the middle of some
> >> > > operation on the object itself (e.g. reading the object for
> >> > > replicating it, auditing it, etc). Jclouds did its job of actually
> >> > > sending the request. So not sure what else can be done here. Maybe we
> >> > > could add retries if the blobstore returns 409. But the main problem
> >> > > lies on the Swift side. The Openstack mailing list would be a better
> >> > > place for asking this question. There are many more Swift experts
> >> > > there.
> >> > >
> >> > > For (2), from the curl example code, it looks like you're creating
> >> > > multiple processes, each doing a put or a delete (no get).
> >> > > This is different from jclouds spawning multiple threads. It would be
> >> > > great if the experiments count the number of transactions they're doing
> >> > > and whether they both reach the same number of transactions in the given
> >> > > amount of time. If they do and yet there are fewer txns via jclouds
> >> > > compared to the shell script, we can conclude that jclouds is the
> >> > > cause of the problem.
> >> > >
> >> > > Now, answering some of the questions below.
> >> > >
> >> > > > It would be great if someone let me know how jclouds delete works. Is
> >> > > > there any internal queue during put or delete? I saw that if I put a
> >> > > > small sleep of 300ms between the put and del calls, it works fine.
> >> > >
> >> > > I presume the blobstore object you're using in Example9.blobStore is
> >> > > of type "BlobStore" and not "AsyncBlobStore". AsyncBlobStore is
> >> > > deprecated. The BlobStore object is synchronous. There is no queue.
> >> > > When you call removeBlob, the request gets created and sent to the
> >> > > Swift cluster.
> >> > >
> >> > > > Also I assume that jclouds calls are synchronous, and a put does not
> >> > > > return until the object is saved in Swift.
> >> > >
> >> > > For the BlobStore type, yes, it is sync.
> >> > >
> >> > > There are some jvm level settings that might also be at play here
> >> > > related to the amount of memory you're allocating to the heap. You
> >> > > could change the memory given to the jvm using the -Xms and -Xmx
> >> > > options.
> >> > >
> >> > > -Shri
> >> > >
> >> > > > On Apr 22, 2014 11:59 AM, "Sumit Gaur" <[email protected]> wrote:
> >> > > >
> >> > > >> Hi
> >> > > >> Please find my answers below.
> >> > > >>
> >> > > >> On Apr 22, 2014 10:49 AM, "Jasdeep Hundal" <[email protected]>
> >> > > >> wrote:
> >> > > >> >
> >> > > >> > Hey Sumit,
> >> > > >> >
> >> > > >> > I have a couple more questions that might help clarify the situation:
> >> > > >> >
> >> > > >> > 1. Are you running the stability test as a single long running Java
> >> > > >> > process (that just keeps cycling through the 10 uploads/gets/deletes)?
> >> > > >> >
> >> > > >>
> >> > > >> Yes. But this process has threads.
> >> > > >>
> >> > > >> > 2. Are you always running the test in the same container, or are you
> >> > > >> > creating new containers for each test iteration?
> >> > > >> >
> >> > > >> No, I am doing round-robin across 1000 containers.
> >> > > >>
> >> > > >> > 3. If the answer to #2 is that the test runs in a single container,
> >> > > >> > how many objects does that container currently have?
> >> > > >> >
> >> > > >>
> >> > > >> 0 in the ideal case. But as I am also facing 409 delete failures, there
> >> > > >> are some objects in each container, in the hundreds only.
> >> > > >>
> >> > > >> > It may also help to time each of the individual blobstore actions as
> >> > > >> > you run the test to see if any particular one is slowing down.
> >> > > >> >
> >> > > >>
> >> > > >> Even individual put and del times increase over time.
> >> > > >> >
> >> > > >> > Jasdeep
> >> > > >> >
> >> > > >> >
> >> > > >> > On Mon, Apr 21, 2014 at 6:21 PM, Sumit Gaur <[email protected]> wrote:
> >> > > >> >
> >> > > >> > > Hi Shri,
> >> > > >> > > Please find answers below.
> >> > > >> > >
> >> > > >> > > On Tue, Apr 22, 2014 at 9:23 AM, Shrinand Javadekar
> >> > > >> > > <[email protected]> wrote:
> >> > > >> > > Few more questions to try and understand this better:
> >> > > >> > >
> >> > > >> > > 1) On the Swift instance you are using, how many replicas do you have?
> >> > > >> > >
> >> > > >> > > 3 replicas.
> >> > > >> > >
> >> > > >> > > 2) Also, how are you using the curl command in the shell script?
> >> > > >> > >
> >> > > >> > > I send the commands below in the background for 10 iterations and wait,
> >> > > >> > > similar to the 10 threads in jclouds.
> >> > > >> > >
> >> > > >> > > curl -X PUT -i -T 100k -H "X-Auth-Token: $OS_AUTH_TOKEN" http://$PROXY_LOCAL_NET_IP:80/v1/AUTH_${KEYSTONE_ID}/zest1-${cn}/zest1-${k}-${i}-${j}.txt
> >> > > >> > > curl -X DELETE -i -H "X-Auth-Token: $OS_AUTH_TOKEN" http://$PROXY_LOCAL_NET_IP:80/v1/AUTH_${KEYSTONE_ID}/zest1-${cn}/zest1-${k}-${i}-${j}.txt
> >> > > >> > >
> >> > > >> > > I think the shell script and jclouds-with-10-parallel-threads may not be
> >> > > >> > > doing the same amount of work. In 20 hours jclouds might be doing much
> >> > > >> > > more work than the shell script. If you let the shell script also go
> >> > > >> > > up to that point, it might see failures too. Do you know how many
> >> > > >> > > PUT-GET-DEL operations have been performed when you start seeing the
> >> > > >> > > 409 errors?
> >> > > >> > >
> >> > > >> > > Actually, 409 errors have been coming since the start of the test, but
> >> > > >> > > TPS starts degrading after 24-48 hours.
> >> > > >> > > On Apr 22, 2014 9:23 AM, "Shrinand Javadekar" <[email protected]>
> >> > > >> > > wrote:
> >> > > >> > >
> >> > > >> > > > Few more questions to try and understand this better:
> >> > > >> > > >
> >> > > >> > > > 1) On the Swift instance you are using, how many replicas do you have?
> >> > > >> > > > 2) Also, how are you using the curl command in the shell script? I
> >> > > >> > > > think the shell script and jclouds-with-10-parallel-threads may not be
> >> > > >> > > > doing the same amount of work. In 20 hours jclouds might be doing much
> >> > > >> > > > more work than the shell script. If you let the shell script also go
> >> > > >> > > > up to that point, it might see failures too. Do you know how many
> >> > > >> > > > PUT-GET-DEL operations have been performed when you start seeing the
> >> > > >> > > > 409 errors?
> >> > > >> > > >
> >> > > >> > > > -Shri
> >> > > >> > > >
> >> > > >> > > >
> >> > > >> > > > On Mon, Apr 21, 2014 at 4:55 PM, Sumit Gaur <[email protected]> wrote:
> >> > > >> > > > > FYI, this is the block of code.
> >> > > >> > > > > Also, I am using jclouds 1.7.1 (stable branch).
> >> > > >> > > > > // key is declared outside the try block so the catch block can log it
> >> > > >> > > > > String key = "objkey" + UUID.randomUUID();
> >> > > >> > > > > try {
> >> > > >> > > > >     Blob blob = Example9.blobStore.blobBuilder(key).payload(Example9.file).build();
> >> > > >> > > > >     Example9.blobStore.putBlob(Example9.containerName + count, blob);
> >> > > >> > > > >     Example9.blobStore.getBlob(Example9.containerName + count, key);
> >> > > >> > > > >     Example9.blobStore.removeBlob(Example9.containerName + count, key);
> >> > > >> > > > > } catch (Exception ace) {
> >> > > >> > > > >     System.out.println("Request failed for objkey " + key + " " + ace);
> >> > > >> > > > > }
> >> > > >> > > > >
> >> > > >> > > > >
> >> > > >> > > > > On Tue, Apr 22, 2014 at 8:32 AM, Sumit Gaur <[email protected]> wrote:
> >> > > >> > > > >
> >> > > >> > > > >> Hi Shri,
> >> > > >> > > > >> Thanks for paying attention to this. Please find my answers below:
> >> > > >> > > > >>
> >> > > >> > > > >>
> >> > > >> > > > >> On Tue, Apr 22, 2014 at 2:31 AM, Shrinand Javadekar
> >> > > >> > > > >> <[email protected]> wrote:
> >> > > >> > > > >>
> >> > > >> > > > >>> Sumit,
> >> > > >> > > > >>>
> >> > > >> > > > >>> I realize that you had sent out a similar email sometime ago about
> >> > > >> > > > >>> performance degradation. I'm not sure if anyone has run these types of
> >> > > >> > > > >>> long running experiments with jclouds. So this may be a first.
> >> > > >> > > > >>>
> >> > > >> > > > >> I tried to debug it over the last 2 weeks without success.
> >> > > >> > > > >> I want to understand more about how the jclouds code handles this use
> >> > > >> > > > >> case; any pointers on whether this is a problematic use case would help.
> >> > > >> > > > >>
> >> > > >> > > > >>>
> >> > > >> > > > >>> The 409 status is returned because of a conflict [1]. Are you sure you
> >> > > >> > > > >>> didn't have two or more threads trying to delete the same object?
> >> > > >> > > > >>>
> >> > > >> > > > >> No two threads share the same object key in my programme (String key =
> >> > > >> > > > >> "objkey" + UUID.randomUUID();). It is some kind of race between the PUT
> >> > > >> > > > >> and DEL calls. If I put, say, a 10 ms sleep between the calls, then there
> >> > > >> > > > >> is no 409 error.
> >> > > >> > > > >>
> >> > > >> > > > >>
> >> > > >> > > > >>> Also, I see that 409 is returned by Swift if you try to delete a
> >> > > >> > > > >>> container that isn't empty [2]. Is that something your test code
> >> > > >> > > > >>> could've tried?
> >> > > >> > > > >>>
> >> > > >> > > > >> I am trying to delete objects, not containers.
> >> > > >> > > > >>
> >> > > >> > > > >>>
> >> > > >> > > > >>> When you say there was a similar test you're trying with curl, are you
> >> > > >> > > > >>> using the curl command-line utility or the libcurl library?
> >> > > >> > > > >>
> >> > > >> > > > >> The curl command in a shell script with for loops.
> >> > > >> > > > >>
> >> > > >> > > > >>
> >> > > >> > > > >>> How are
> >> > > >> > > > >>> you specifying the number of threads to use and what object each
> >> > > >> > > > >>> thread should get/put/delete?
> >> > > >> > > > >>>
> >> > > >> > > > >>
> >> > > >> > > > >> It is a Java test programme using ThreadPoolExecutor.
> >> > > >> > > > >> Something similar to the example here:
> >> > > >> > > > >>
> >> > > >> > > > >> http://www.javacodegeeks.com/2013/01/java-thread-pool-example-using-executors-and-threadpoolexecutor.html
> >> > > >> > > > >>
> >> > > >> > > > >> The object is a 5KB file with key = "objkey" + UUID.randomUUID();, with a
> >> > > >> > > > >> pool of 10 threads.
> >> > > >> > > > >>
> >> > > >> > > > >>
> >> > > >> > > > >> Hope this gives a good insight. Let me know if you find any problem
> >> > > >> > > > >> here.
> >> > > >> > > > >>
> >> > > >> > > > >>
> >> > > >> > > > >>>
> >> > > >> > > > >>> Thanks.
> >> > > >> > > > >>> -Shri
> >> > > >> > > > >>>
> >> > > >> > > > >>> [1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
> >> > > >> > > > >>> [2] https://bugs.launchpad.net/horizon/+bug/1096084
> >> > > >> > > > >>>
> >> > > >> > > > >>> On Sun, Apr 20, 2014 at 5:55 PM, Sumit Gaur <[email protected]> wrote:
> >> > > >> > > > >>> > Hi
> >> > > >> > > > >>> > I am using the jclouds lib integrated with an OpenStack Swift + Keystone
> >> > > >> > > > >>> > combination. Things are working fine except for the stability test. After
> >> > > >> > > > >>> > 20-30 hours of testing, jclouds/SWIFT starts degrading in TPS and keeps
> >> > > >> > > > >>> > going down over time.
> >> > > >> > > > >>> >
> >> > > >> > > > >>> > 1) I am running the (PUT-GET-DEL) cycle in 10 parallel threads.
> >> > > >> > > > >>> > 2) I am also getting a lot of 409 and DEL failures as responses from
> >> > > >> > > > >>> > SWIFT.
> >> > > >> > > > >>> > 3) A direct, similar test with curl does not show much impact, and TPS
> >> > > >> > > > >>> > remains constant.
> >> > > >> > > > >>> >
> >> > > >> > > > >>> > Can somebody help me understand what is going wrong here?
> >> > > >> > > > >>> >
> >> > > >> > > > >>> > Thanks
> >> > > >> > > > >>> > sumit
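
For anyone following up on the two suggestions in the thread (count the transactions each client completes, and time each blobstore action separately), here is a minimal sketch of how that could look. It is an assumption-laden illustration, not the code from Example8.java or Worker.java: ObjectStore, InMemoryStore and CycleBenchmark are hypothetical names, and the in-memory store only stands in for the putBlob/getBlob/removeBlob calls so the sketch runs without jclouds or a Swift cluster.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the three BlobStore calls used in the thread. */
interface ObjectStore {
    void put(String container, String key, byte[] data) throws Exception;
    byte[] get(String container, String key) throws Exception;
    void remove(String container, String key) throws Exception;
}

/** Assumption: an in-memory store, so the sketch runs without a Swift cluster. */
class InMemoryStore implements ObjectStore {
    private final Map<String, byte[]> objects = new ConcurrentHashMap<>();
    public void put(String c, String k, byte[] d) { objects.put(c + "/" + k, d); }
    public byte[] get(String c, String k) { return objects.get(c + "/" + k); }
    public void remove(String c, String k) { objects.remove(c + "/" + k); }
    int size() { return objects.size(); }
}

class CycleBenchmark {
    final AtomicLong cycles = new AtomicLong();    // completed PUT-GET-DEL cycles
    final AtomicLong failures = new AtomicLong();  // cycles that threw (e.g. a 409)
    final AtomicLong putNanos = new AtomicLong();
    final AtomicLong getNanos = new AtomicLong();
    final AtomicLong delNanos = new AtomicLong();

    /** One PUT-GET-DEL cycle with a unique key, timing each step separately. */
    void runCycle(ObjectStore store, String container, byte[] payload) {
        String key = "objkey" + UUID.randomUUID();
        try {
            long t0 = System.nanoTime();
            store.put(container, key, payload);
            long t1 = System.nanoTime();
            store.get(container, key);
            long t2 = System.nanoTime();
            store.remove(container, key);
            long t3 = System.nanoTime();
            putNanos.addAndGet(t1 - t0);
            getNanos.addAndGet(t2 - t1);
            delNanos.addAndGet(t3 - t2);
            cycles.incrementAndGet();
        } catch (Exception e) {
            failures.incrementAndGet();
            System.out.println("Request failed for " + key + " " + e);
        }
    }

    /** Runs totalCycles cycles on a fixed pool, round-robin over the containers. */
    void run(ObjectStore store, int threads, int containers, int totalCycles, byte[] payload) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < totalCycles; i++) {
            String container = "container" + (i % containers);
            pool.execute(() -> runCycle(store, container, payload));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }

    public static void main(String[] args) {
        CycleBenchmark bench = new CycleBenchmark();
        bench.run(new InMemoryStore(), 10, 1000, 10_000, new byte[5 * 1024]);
        long n = Math.max(1, bench.cycles.get());
        System.out.printf("cycles=%d failures=%d avg put=%dus get=%dus del=%dus%n",
                bench.cycles.get(), bench.failures.get(),
                bench.putNanos.get() / n / 1000, bench.getNanos.get() / n / 1000,
                bench.delNanos.get() / n / 1000);
    }
}
```

To use this against a real cluster, one would presumably replace InMemoryStore with a thin wrapper around the jclouds BlobStore and print the counters periodically; comparing the cycle count and the per-step averages with the curl script over the same time window would show whether both clients do the same amount of work and which step slows down.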
