Hi Shri,
After all these experiments, I feel there is a problem in jclouds.
1) I tried retries for every 409 error. After a successful retry, jclouds
started reporting that the blob no longer exists, but it is actually still
there in Swift storage.
2) I tried delaying each delete until after 100 further puts, and voila,
there were no 409s in 24 hours. That strongly suggests a race condition in
jclouds when we issue a DELETE immediately after a PUT.
3) I know the 409 errors come all the way from the Swift object server,
but the same thing does not happen even when I generate a much higher
"concurrent" load with a curl (PUT-GET-DEL) cycle. I was getting 150 TPS.
4) I have run Swift without any extra daemons, such as the auditor, to
avoid conflicts caused by them. The storage nodes run only the
object/container/account servers.
5) To generate concurrent curl load, I run the curl commands in the
background. I ran this test for 48 hours without a single 409 error.
6) To give an idea of the sequence of calls in the client code:
import org.jclouds.ContextBuilder;
import org.jclouds.blobstore.BlobStore;
import org.jclouds.blobstore.BlobStoreContext;
import org.jclouds.blobstore.domain.Blob;

static BlobStoreContext getSwiftClientView() {
    return ContextBuilder.newBuilder("swift-keystone")
            .credentials("test:tester", "test123")
            .endpoint("http://a.x.y.z.:5000/v2.0/")
            .buildView(BlobStoreContext.class);
}

// Context setup, then one PUT-GET-DEL cycle
BlobStoreContext context = getSwiftClientView();
BlobStore blobStore = context.getBlobStore();
blobStore.createContainerInLocation(null, containerName);
Blob blob = blobStore.blobBuilder(key).payload(file).build();
blobStore.putBlob(containerName, blob);
blobStore.getBlob(containerName, key);
blobStore.removeBlob(containerName, key);
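Point 2 above (deleting only after 100 further PUTs) could be packaged as a temporary workaround while the root cause is investigated. The following is a minimal sketch, not jclouds code: `ObjectStore`, `InMemoryStore`, and `DelayedDeleter` are hypothetical names, and in the real test `put`/`delete` would wrap `blobStore.putBlob` and `blobStore.removeBlob`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical interface (not part of jclouds); a real implementation would
// wrap blobStore.putBlob(...) and blobStore.removeBlob(...).
interface ObjectStore {
    void put(String key);

    void delete(String key);
}

// In-memory stand-in used only to exercise the buffering logic.
class InMemoryStore implements ObjectStore {
    final Set<String> keys = new HashSet<>();
    int deletes = 0;

    public void put(String key) {
        keys.add(key);
    }

    public void delete(String key) {
        keys.remove(key);
        deletes++;
    }
}

// Defers each DELETE until `lag` newer PUTs have been issued, mirroring the
// "delete after 100 puts" experiment. Not thread-safe: use one per worker thread.
class DelayedDeleter {
    private final ObjectStore store;
    private final int lag;
    private final Deque<String> pending = new ArrayDeque<>();

    DelayedDeleter(ObjectStore store, int lag) {
        this.store = store;
        this.lag = lag;
    }

    void putThenScheduleDelete(String key) {
        store.put(key);
        pending.addLast(key);
        // Once more than `lag` keys are outstanding, delete the oldest ones.
        while (pending.size() > lag) {
            store.delete(pending.removeFirst());
        }
    }

    // Deletes whatever is still pending, e.g. at the end of a run.
    void drain() {
        while (!pending.isEmpty()) {
            store.delete(pending.removeFirst());
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Each worker thread would own one `DelayedDeleter` (the sketch is not thread-safe) and call `drain()` when the run ends.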
Let me know if you still see any gaps.
Thanks
sumit
On Apr 23, 2014 2:36 AM, "Shrinand Javadekar" <[email protected]>
wrote:
> So there are two problems:
>
> 1) 409 when deleting objects.
> 2) Transactions taking longer after 24-48 hours.
>
> For (1), it looks like the request reached the Swift cluster but the
> Swift cluster itself wasn't able to fulfill it. This could be because
> of the "eventual consistency" semantics of blobstores. When the delete
> request reached Swift, it could have been in the middle of some
> operation on the object itself (e.g. reading the object for
> replicating it, auditing it, etc.). jclouds did its job of actually
> sending the request, so I'm not sure what else can be done here. Maybe we
> could add retries if the blobstore returns 409. But the main problem
> lies on the Swift side. The Openstack mailing list would be a better
> place for asking this question. There are many more Swift experts
> there.
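The retry idea above could look roughly like the sketch below. It assumes a hypothetical `IntSupplier` that performs the DELETE and returns the HTTP status; jclouds does not expose deletes this way, so a real wrapper would need to map its failure (typically an exception) to a status code. The backoff doubles the base delay on each attempt.

```java
import java.util.function.IntSupplier;

// Retries an operation while it returns HTTP 409 (Conflict), backing off
// exponentially. The IntSupplier is a hypothetical stand-in for "issue the
// DELETE and return its HTTP status"; it is not a jclouds API.
final class ConflictRetry {
    private ConflictRetry() {}

    static int retryOn409(IntSupplier op, int maxAttempts, long baseDelayMs) {
        int status = op.getAsInt();
        for (int attempt = 1; status == 409 && attempt < maxAttempts; attempt++) {
            try {
                // Wait base, 2x base, 4x base, ... before the next attempt.
                Thread.sleep(baseDelayMs << (attempt - 1));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt, stop retrying
                return status;
            }
            status = op.getAsInt();
        }
        // Last status seen: success, a non-409 error, or 409 after maxAttempts tries.
        return status;
    }
}
```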
>
> For (2), from the curl example code, it looks like you're creating
> multiple processes, each doing a put or a delete (no get). This is
> different from jclouds spawning multiple threads. It would be great if
> the experiments count the number of transactions they're doing and
> whether they both reach the same number of transactions in the given
> amount of time. If both run for the same duration and there are fewer
> txns via jclouds than via the shell script, we can conclude that jclouds
> is the cause of the problem.
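One way for the Java harness to report the number asked for above is a shared counter sampled over a fixed wall-clock window. A sketch, assuming a hypothetical `TxnCounter` helper that each worker calls after every PUT-GET-DEL cycle; the shell script would need an equivalent count of 2xx responses for the comparison to be fair.

```java
import java.util.concurrent.atomic.LongAdder;

// Counts completed and failed PUT-GET-DEL cycles across all worker threads so
// both harnesses can be compared on the same metric over the same window.
final class TxnCounter {
    private final LongAdder completedCount = new LongAdder();
    private final LongAdder failedCount = new LongAdder();

    void recordSuccess() {
        completedCount.increment();
    }

    void recordFailure() {
        failedCount.increment();
    }

    long completed() {
        return completedCount.sum();
    }

    long failed() {
        return failedCount.sum();
    }

    // Completed transactions per second over a window measured by the caller,
    // e.g. System.nanoTime() taken at the start and end of the run.
    double tps(long elapsedNanos) {
        return completedCount.sum() / (elapsedNanos / 1e9);
    }
}
```

For example, 300 completed cycles over a 2-second window gives `tps(2_000_000_000L) == 150.0`, the same unit as the 150 TPS quoted for curl.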
>
> Now, answering some of the questions below.
>
> > It would be great if someone could let me know how jclouds delete works.
> > Is there any internal queue for put or delete? I saw that if I put a
> > small sleep of 300ms between the put and del calls, it works fine.
>
> I presume the blobstore object you're using in Example9.blobStore is
> of type "BlobStore" and not "AsyncBlobStore". AsyncBlobStore is
> deprecated. The BlobStore object is synchronous. There is no queue.
> When you call removeBlob, the request gets created and sent to the
> Swift cluster.
>
> > Also, I assume that jclouds calls are synchronous, so a put does not
> > return until the object is saved in Swift.
>
> For the BlobStore type, yes, it is sync.
>
> There are some JVM-level settings that might also be at play here
> related to the amount of memory you're allocating to the heap. You
> could change the memory given to the jvm using the -Xms and -Xmx
> options.
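To see whether heap pressure grows during a 24-48 hour run, before and after changing `-Xms`/`-Xmx` (for example `java -Xms1g -Xmx1g ...`, where the sizes are placeholders), the test process could log its own heap usage. A minimal sketch using `java.lang.Runtime`; `HeapReport` is a hypothetical name.

```java
// Reports JVM heap usage via java.lang.Runtime. Logged periodically during a
// long run, it shows whether heap use grows alongside the TPS drop.
final class HeapReport {
    private HeapReport() {}

    // Bytes currently in use on the heap (committed minus free).
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    static String summary() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        // max reflects -Xmx; committed grows from -Xms up to max as needed.
        return String.format("heap used=%dMB committed=%dMB max=%dMB",
                usedBytes() / mb, rt.totalMemory() / mb, rt.maxMemory() / mb);
    }
}
```

Calling `HeapReport.summary()` every few minutes from a `ScheduledExecutorService` and lining it up with the TPS numbers would show whether the slowdown tracks memory use.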
>
> -Shri
>
> > On Apr 22, 2014 11:59 AM, "Sumit Gaur" <[email protected]> wrote:
> >
> >> Hi
> >> Please find my answers below
> >>
> >> On Apr 22, 2014 10:49 AM, "Jasdeep Hundal" <[email protected]> wrote:
> >> >
> >> > Hey Sumit,
> >> >
> >> > I have a couple more questions that might help clarify the situation:
> >> >
> >> > 1. Are you running the stability test as a single long-running Java
> >> > process (that just keeps cycling through the 10 uploads/gets/deletes)?
> >> >
> >>
> >> Yes, but the process runs multiple threads.
> >>
> >> > 2. Are you always running the test in the same container, or are you
> >> > creating new containers for each test iteration?
> >> >
> >> No, I am going round-robin across 1000 containers.
> >>
> >> > 3. If the answer to #2 is that the test runs in a single container,
> >> > how many objects does that container currently have?
> >> >
> >>
> >> Zero in the ideal case. But because I am also seeing 409 delete
> >> failures, each container has some leftover objects, in the hundreds only.
> >>
> >> > It may also help to time each of the individual blobstore actions as
> >> > you run the test to see if any particular one is slowing down.
> >> >
> >>
> >> Even the individual PUT and DEL times increase over time.
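The per-action timing suggested above can be done inside the worker without extra tooling. A sketch, assuming a hypothetical `OpTimer` helper shared by all threads; a call would be wrapped as `timer.time("put", () -> blobStore.putBlob(container, blob))`, and the per-operation means logged periodically.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Accumulates call count and total latency per operation name ("put", "get",
// "delete") across all worker threads, so a slowdown can be pinned to one call.
final class OpTimer {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();

    void time(String op, Runnable call) {
        long start = System.nanoTime();
        try {
            call.run();
        } finally {
            // Record the latency even if the call throws (e.g. a failed delete).
            long elapsed = System.nanoTime() - start;
            counts.computeIfAbsent(op, k -> new LongAdder()).increment();
            totalNanos.computeIfAbsent(op, k -> new LongAdder()).add(elapsed);
        }
    }

    long count(String op) {
        LongAdder c = counts.get(op);
        return c == null ? 0 : c.sum();
    }

    // Mean latency in milliseconds; 0.0 if the operation was never timed.
    double meanMillis(String op) {
        long n = count(op);
        LongAdder t = totalNanos.get(op);
        return n == 0 || t == null ? 0.0 : t.sum() / (n * 1e6);
    }
}
```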
> >>
> >> > Jasdeep
> >> >
> >> >
> >> > On Mon, Apr 21, 2014 at 6:21 PM, Sumit Gaur <[email protected]> wrote:
> >> >
> >> > > hi Shri,
> >> > > Please find answers below
> >> > >
> >> > > On Tue, Apr 22, 2014 at 9:23 AM, Shrinand Javadekar <[email protected]> wrote:
> >> > > Few more questions to try and understand this better:
> >> > >
> >> > > 1) On the Swift instance you are using, how many replicas do you have?
> >> > >
> >> > > 3 replicas.
> >> > >
> >> > > 2) Also, how are you using the curl command in the shell script?
> >> > >
> >> > > I send the commands below to the background for 10 iterations and then
> >> > > wait, similar to the 10 threads in jclouds.
> >> > >
> >> > > curl -X PUT -i -T 100k -H "X-Auth-Token: $OS_AUTH_TOKEN" http://$PROXY_LOCAL_NET_IP:80/v1/AUTH_${KEYSTONE_ID}/zest1-${cn}/zest1-${k}-${i}-${j}.txt
> >> > > curl -X DELETE -i -H "X-Auth-Token: $OS_AUTH_TOKEN" http://$PROXY_LOCAL_NET_IP:80/v1/AUTH_${KEYSTONE_ID}/zest1-${cn}/zest1-${k}-${i}-${j}.txt
> >> > >
> >> > > I think the shell script and jclouds-with-10-parallel-threads may not be
> >> > > doing the same amount of work. In 20 hours jclouds might be doing much
> >> > > more work than the shell script. If you let the shell script also go
> >> > > up to that point, it might see failures too. Do you know how many
> >> > > PUT-GET-DEL operations have been performed when you start seeing the
> >> > > 409 errors?
> >> > >
> >> > > Actually, the 409 errors appear from the very start of the test, but TPS
> >> > > starts degrading after 24-48 hours.
> >> > > >
> >> > > > On Mon, Apr 21, 2014 at 4:55 PM, Sumit Gaur <[email protected]> wrote:
> >> > > > > FYI, this is the block of code. I am also using jclouds 1.7.1
> >> > > > > (stable branch).
> >> > > > > // Declared outside the try so it is still in scope in the catch block
> >> > > > > String key = "objkey" + UUID.randomUUID();
> >> > > > > try {
> >> > > > >     Blob blob = Example9.blobStore.blobBuilder(key)
> >> > > > >             .payload(Example9.file).build();
> >> > > > >     Example9.blobStore.putBlob(Example9.containerName + count, blob);
> >> > > > >     Example9.blobStore.getBlob(Example9.containerName + count, key);
> >> > > > >     Example9.blobStore.removeBlob(Example9.containerName + count, key);
> >> > > > > } catch (Exception ace) {
> >> > > > >     System.out.println("Request failed for objkey " + key + " " + ace);
> >> > > > > }
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Tue, Apr 22, 2014 at 8:32 AM, Sumit Gaur <[email protected]> wrote:
> >> > > > >
> >> > > > >> Hi Shri,
> >> > > > >> Thanks for looking into this. Please find my answers below:
> >> > > > >>
> >> > > > >>
> >> > > > >> On Tue, Apr 22, 2014 at 2:31 AM, Shrinand Javadekar <
> >> > > > >> [email protected]> wrote:
> >> > > > >>
> >> > > > >>> Sumit,
> >> > > > >>>
> >> > > > >>> I realize that you had sent out a similar email some time ago about
> >> > > > >>> performance degradation. I'm not sure if anyone has run these types of
> >> > > > >>> long-running experiments with jclouds. So this may be a first.
> >> > > > >>>
> >> > > > >> I have tried to debug this over the last 2 weeks without success. I
> >> > > > >> want to understand better how the jclouds code handles this use case;
> >> > > > >> any pointers indicating that this is a problematic use case would
> >> > > > >> also help.
> >> > > > >>
> >> > > > >>>
> >> > > > >>> The 409 status is returned because of a conflict [1]. Are you sure you
> >> > > > >>> didn't have two or more threads trying to delete the same object?
> >> > > > >>>
> >> > > > >> No two threads share the same object key in my program
> >> > > > >> (String key = "objkey" + UUID.randomUUID();). It is some kind of race
> >> > > > >> between the PUT and DEL calls. If I put, say, a 10 ms sleep between
> >> > > > >> the calls, there are no 409 errors.
> >> > > > >>
> >> > > > >>
> >> > > > >>> Also, I see that 409 is returned by Swift if you try to delete a
> >> > > > >>> container that isn't empty [2]. Is that something your test code
> >> > > > >>> could have tried?
> >> > > > >>>
> >> > > > >> I am deleting objects, not containers.
> >> > > > >>
> >> > > > >>>
> >> > > > >>> When you say there was a similar test you're trying with curl, are you
> >> > > > >>> using the curl command-line utility or the libcurl library?
> >> > > > >>
> >> > > > >> The curl command-line utility, in a shell script with for loops.
> >> > > > >>
> >> > > > >>
> >> > > > >>> How are you specifying the number of threads to use and what object
> >> > > > >>> each thread should get/put/delete?
> >> > > > >>>
> >> > > > >>
> >> > > > >> It is a Java test program using ThreadPoolExecutor, similar to the
> >> > > > >> example here:
> >> > > > >> http://www.javacodegeeks.com/2013/01/java-thread-pool-example-using-executors-and-threadpoolexecutor.html
> >> > > > >>
> >> > > > >> Each object is a 5 KB file with key = "objkey" + UUID.randomUUID(),
> >> > > > >> processed by a pool of 10 threads.
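For reference, the harness described above (a fixed pool of 10 threads, UUID keys, round-robin across containers) can be sketched with `Executors.newFixedThreadPool`, which is backed by a `ThreadPoolExecutor`. `BlobOps`, `FakeBlobOps`, and `CycleHarness` are hypothetical names; in the real test `BlobOps` would delegate to `putBlob`, `getBlob`, and `removeBlob`, and the fake exists only so the harness can run without a Swift cluster.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical abstraction of the three jclouds calls made per cycle.
interface BlobOps {
    void put(String container, String key);

    boolean get(String container, String key);

    void delete(String container, String key);
}

// Thread-safe in-memory fake so the harness can run without Swift.
class FakeBlobOps implements BlobOps {
    final Set<String> objects = ConcurrentHashMap.newKeySet();

    public void put(String container, String key) {
        objects.add(container + "/" + key);
    }

    public boolean get(String container, String key) {
        return objects.contains(container + "/" + key);
    }

    public void delete(String container, String key) {
        if (!objects.remove(container + "/" + key)) {
            throw new IllegalStateException("no such object: " + key);
        }
    }
}

final class CycleHarness {
    private CycleHarness() {}

    // Submits `cycles` PUT-GET-DEL tasks to a fixed pool of `threads` workers.
    // Each task uses a fresh UUID key, so no two tasks touch the same object,
    // and tasks are spread round-robin over `containers` containers.
    // Returns the number of failed cycles.
    static int run(BlobOps ops, int threads, int cycles, int containers) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger failures = new AtomicInteger();
        for (int i = 0; i < cycles; i++) {
            String container = "container" + (i % containers);
            pool.execute(() -> {
                String key = "objkey" + UUID.randomUUID();
                try {
                    ops.put(container, key);
                    // Treat a GET that misses a just-written object as a failure.
                    if (!ops.get(container, key)) {
                        throw new IllegalStateException("GET after PUT missed " + key);
                    }
                    ops.delete(container, key);
                } catch (RuntimeException e) {
                    failures.incrementAndGet();
                    System.out.println("Request failed for " + key + " " + e);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // return the count observed so far
        }
        return failures.get();
    }
}
```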
> >> > > > >>
> >> > > > >>
> >> > > > >> I hope this gives good insight. Let me know if anything here is
> >> > > > >> unclear.
> >> > > > >>
> >> > > > >>
> >> > > > >>>
> >> > > > >>> Thanks.
> >> > > > >>> -Shri
> >> > > > >>>
> >> > > > >>> [1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
> >> > > > >>> [2] https://bugs.launchpad.net/horizon/+bug/1096084
> >> > > > >>>
> >> > > > >>> On Sun, Apr 20, 2014 at 5:55 PM, Sumit Gaur <[email protected]> wrote:
> >> > > > >>> > Hi,
> >> > > > >>> > I am using the jclouds library integrated with an OpenStack Swift +
> >> > > > >>> > Keystone combination. Things are working fine except for the
> >> > > > >>> > stability test. After 20-30 hours of testing, jclouds/Swift TPS
> >> > > > >>> > starts degrading and keeps going down over time.
> >> > > > >>> >
> >> > > > >>> > 1) I am running the (PUT-GET-DEL) cycle in 10 parallel threads.
> >> > > > >>> > 2) I am also getting a lot of DEL failures, with 409 as the
> >> > > > >>> > response from Swift.
> >> > > > >>> > 3) A similar test run directly with curl does not show much impact,
> >> > > > >>> > and TPS remains constant.
> >> > > > >>> >
> >> > > > >>> > Can somebody help me figure out what is going wrong here?
> >> > > > >>> >
> >> > > > >>> > Thanks
> >> > > > >>> > sumit
> >> > > > >>>
> >> > > > >>
> >> > > > >>
> >> > > >
> >> > >
> >>
>