Hi Chris, (inline)

On 17/08/11 21:12, David Lutterkort wrote:
On Wed, 2011-08-17 at 09:57 -0400, Chris Lalancette wrote:
Hey Marios,
      I know this is several months out of date, but I was just doing some
testing on the blob creation stuff and noticing that my libdeltacloud tests
were failing.  I traced it down to the fact that the blob_id parameter changed
from param[:blob_id] to param[:blob] when you added the streaming stuff to
blobs.

Yes, that's right. Initially we had one operation for creating blobs:

POST /api/buckets/:bucket

and this accepted (amongst others) the 'blob_id' parameter to define the name of the blob.

Then, in order to implement streaming PUT through deltacloud I added:

PUT /api/buckets/:bucket/:blob

The name change for the parameter was, I can only guess, an attempt to maintain consistency (i.e. 'blob' rather than 'blob_id'), though in hindsight it was not really necessary. Your suggested patch:


post "#{Sinatra::UrlForHelper::DEFAULT_URI_PREFIX}/buckets/:bucket" do
   bucket_id = params[:bucket]
-  blob_id = params['blob']
+  blob_id = params['blob'] || params['blob_id']

seems fine to me, in that it won't break anything. If it maintains compatibility with your stuff then I personally have no objection to making this addition. More on PUT vs POST below.
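
To spell out the fallback in the patch, here is a plain-Ruby sketch. The helper name blob_id_from is made up for illustration and is not in the deltacloud code, and an ordinary Hash with string keys stands in for Sinatra's params:

```ruby
# Hypothetical helper (not part of deltacloud): resolve the blob name from
# the request parameters, preferring the newer 'blob' key and falling back
# to the older 'blob_id' key.
def blob_id_from(params)
  params['blob'] || params['blob_id']
end

puts blob_id_from('blob' => 'new-name.txt')     # newer clients
puts blob_id_from('blob_id' => 'old-name.txt')  # older clients
```

If a client happens to send both keys, 'blob' wins, which matches the order of the || in the patch.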


I think it's another case where the code does something special for the
HTML UI - the official API for creating a new blob is
PUT /api/buckets/:bucket/:blob; looking at this now, it seems strange
that we have two different ways to create blobs, and I am wondering if
we shouldn't drop the PUT, and only use POST for everything.


Yes, we have two methods for creating blobs: POST (http://incubator.apache.org/deltacloud/api#h4_3_8) and PUT (http://incubator.apache.org/deltacloud/api#h4_3_7).

The POST method is non-streaming:

client ---TEMP_FILE---> deltacloud ---STREAM---> provider

i.e., the client sends the blob to deltacloud, which receives the entire request and creates a temp_file for the blob data, and then streams this to the provider.

The PUT operation is streaming:

client ---STREAM---> deltacloud ---STREAM---> provider

i.e., the client sends the blob to deltacloud, which does not wait to receive the entire request and instead starts streaming the blob data to the provider as it is received.
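
The difference between the two flows can be sketched in plain Ruby. This is only an illustration and not the deltacloud implementation: relay_buffered, relay_streaming, the tiny CHUNK size and the log array are all made up, and StringIO objects stand in for the client and provider connections:

```ruby
require 'stringio'
require 'tempfile'

CHUNK = 4 # deliberately tiny so the ordering is visible

# POST-style: spool the whole upload to a temp file, then forward it.
def relay_buffered(client, provider, log)
  Tempfile.create('blob') do |tmp|
    tmp.binmode
    while (chunk = client.read(CHUNK))
      tmp.write(chunk)
      log << :received
    end
    tmp.rewind
    while (chunk = tmp.read(CHUNK))
      provider.write(chunk)
      log << :forwarded
    end
  end
end

# PUT-style: forward each chunk as soon as it arrives.
def relay_streaming(client, provider, log)
  while (chunk = client.read(CHUNK))
    log << :received
    provider.write(chunk)
    log << :forwarded
  end
end

log = []
relay_streaming(StringIO.new('abcdefgh'), StringIO.new(+''), log)
p log
```

In the buffered case nothing is forwarded until everything has been received; in the streaming case the two interleave, so the server never holds more than one chunk.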

Now, in order to create a blob on a given cloud provider service, you invariably must specify the content_length of the blob. For a PUT operation, the content_length is exactly as defined by the sending client in the PUT to deltacloud. Thus, we can take that content_length and start sending the data to the provider as we are receiving it.
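
From the client side, such a PUT might be built roughly like this with Ruby's Net::HTTP. This is a sketch only: the helper name build_blob_put and the bucket and blob names are placeholders, authentication is left out, and the request is constructed but never sent:

```ruby
require 'net/http'
require 'stringio'

# Hypothetical helper: build a streaming PUT for a blob. The body is handed
# over as an IO (body_stream), and Content-Length is exactly the blob size.
def build_blob_put(bucket, blob, io, size)
  req = Net::HTTP::Put.new("/api/buckets/#{bucket}/#{blob}")
  req.body_stream = io
  req.content_length = size
  req
end

data = 'hello blob'
req = build_blob_put('mybucket', 'myblob', StringIO.new(data), data.bytesize)
puts "#{req.method} #{req.path} Content-Length: #{req['Content-Length']}"
```

Because the header equals the size of the blob itself, the receiving side can pass it straight on to the provider without inspecting the body.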

However, for a POST operation, the content_length of the blob is not the same as the content_length of the client's POST request to deltacloud. The request body is multipart/form-data, so it also contains the boundary markers and part headers, and these vary depending on the sending client. It became very messy/difficult to try and parse the boundary and 'guess' the content length of the blob in order to start streaming, which is why we decided to go with PUT. In fact, the cloud providers themselves (EC2, Rackspace, Azure) use PUT operations to create blobs (with POST supported as an alternative).
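
A small Ruby illustration of the mismatch. The helper multipart_body, the field name 'blob_data' and both boundary strings are invented for the example; real clients pick their own boundaries and may add further part headers:

```ruby
# Build a minimal multipart/form-data body carrying a single file field.
def multipart_body(boundary, field, filename, data)
  [
    "--#{boundary}",
    %(Content-Disposition: form-data; name="#{field}"; filename="#{filename}"),
    'Content-Type: application/octet-stream',
    '',
    data,
    "--#{boundary}--",
    ''
  ].join("\r\n")
end

blob = 'x' * 100
short_boundary = 'abc'
long_boundary  = 'a-much-longer-client-boundary-0123456789'

short_body = multipart_body(short_boundary, 'blob_data', 'f.bin', blob)
long_body  = multipart_body(long_boundary, 'blob_data', 'f.bin', blob)

puts "blob size:                  #{blob.bytesize}"
puts "request body, boundary one: #{short_body.bytesize}"
puts "request body, boundary two: #{long_body.bytesize}"
```

Both request bodies are larger than the blob, and they differ from each other even though the blob is identical, so the Content-Length of the POST alone does not tell the server how big the blob is.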

Thus, we have both POST (non-streaming, only to support HTML forms and the web browser interface) and PUT (streaming). If we want to remove one of those methods then I would definitely vote to remove POST, since imho the streaming functionality for creating blobs is absolutely necessary for 'real world' use. Forcing deltacloud to buffer every blob in full before sending it on to the provider is obviously not very useful.

marios

David


