Many of these deficiencies are drastically improved with static large objects (SLOs), and are non-trivial (perhaps impossible) to address with DLOs because of their dynamic nature. It's unfortunate, but DLOs don't really serve your use case very well, and you should find a way to transition to SLOs [1].
We talked about improving the checksumming behavior in SLOs for the general naive sync case back at the hack-a-thon before the Vancouver summit, but it's tricky (MD5 => CRC) and would probably require an API version bump. All we've been able to get done so far is improve the native client handling [2], but if you are using SLOs you may find a similar solution quite manageable.

Thanks for the feedback.

-Clay

1. http://docs-draft.openstack.org/91/219991/7/check/gate-swift-docs/75fb84c//doc/build/html/overview_large_objects.html#module-swift.common.middleware.slo
2. https://github.com/openstack/python-swiftclient/commit/ff0b3b02f07de341fa9eb81156ac2a0565d85cd4

On Friday, October 9, 2015, Pierre SOUCHAY <pierre.souc...@cloudwatt.com> wrote:

> Hi Swift Developers,
>
> We have been using Swift as an IaaS provider for more than two years now,
> but this mail is about feedback on the API side. I think it would be great
> to include some of these ideas in future revisions of the API.
>
> I’ve been developing a few Swift clients: in HTML (in the Cloudwatt
> Dashboard) with CORS, in Java with a Swing GUI
> (https://github.com/pierresouchay/swiftbrowser), and in Go for Swift to
> filesystem (https://github.com/pierresouchay/swiftsync/), so I now have a
> few ideas about how to improve the API a bit.
>
> The API is quite straightforward and intuitive to use, and writing a
> client is not that difficult, but unfortunately the Large Object support
> is not easy at all to deal with.
>
> The biggest issue is that there is no way to know whether a file is a
> large object when performing listings using the JSON format since, AFAIK,
> a large object is listed as an object with 0 bytes (so its size in bytes
> is 0), and it also has the hash of a zero-byte file.
>
> For instance, the listing entry of such an object is:
> {"hash": "d41d8cd98f00b204e9800998ecf8427e", "last_modified":
> "2015-06-04T10:23:57.618760", "bytes": 0, "name": "5G", "content_type":
> "octet/stream"}
>
> which is exactly the hash of a 0-byte file:
> $ echo -n | md5
> d41d8cd98f00b204e9800998ecf8427e
>
> OK, now let's try HEAD:
> $ curl -vv -XHEAD -H X-Auth-Token:$TOKEN
> 'https://storage.fr1.cloudwatt.com/v1/AUTH_61b8fe6dfd0a4ce69f6622ea74444e0f/large_files/5G
> …
> < HTTP/1.1 200 OK
> < Date: Fri, 09 Oct 2015 19:43:09 GMT
> < Content-Length: 5000000000
> < Accept-Ranges: bytes
> < X-Object-Manifest: large_files/5G/.part-5000000000-
> < Last-Modified: Thu, 04 Jun 2015 10:16:33 GMT
> < Etag: "479517ec4767ca08ed0547dca003d116"
> < X-Timestamp: 1433413437.61876
> < Content-Type: octet/stream
> < X-Trans-Id: txba36522b0b7743d683a5d-00561818cd
>
> WTF? While all regular files have the same value for ETag and hash, this
> is not the case for large files…
>
> Furthermore, the ETag is not the MD5 of the whole file, but the hash of
> the hashes of all the files referenced by the manifest (as described
> somewhere deep in the documentation).
>
> Why is this a problem?
> -------------------------------
>
> Imagine a « naive » client using the API which performs some kind of sync.
>
> The client downloads each file and, when it syncs, compares the local MD5
> to the MD5 in the listing… of course, that hash is the hash of a
> zero-byte file… so it downloads the file again… and again… and again.
> Unfortunately for our naive client, this is exactly the kind of file we
> don’t want to download twice… since the file is probably huge (after all,
> it has been split for a reason, no?)
>
> I think this is really a design flaw, since you need to know everything
> about the Swift API and its extensions to get proper behavior. The minimum
> would be to at least return the same value as the ETag header.
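The re-download loop described above can be avoided, at the cost of extra requests, by flagging listing entries that look like empty files. What follows is a minimal Python sketch, assuming the listing JSON carries the `hash` and `bytes` fields shown in the example entry. `needs_head_check` is a hypothetical helper name, not part of any Swift client.

```python
import hashlib
import json

# MD5 of zero bytes: the value a listing reports for an empty object and,
# as described in the mail, for a large-object manifest.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def needs_head_check(entry):
    """Return True if a listing entry cannot be trusted as-is.

    Hypothetical helper: an entry with 0 bytes and the empty-file hash may be
    a real empty object or a manifest, so only a HEAD request can tell.
    """
    return entry.get("bytes") == 0 and entry.get("hash") == EMPTY_MD5


# Listing entry taken from the example in the mail.
listing = json.loads(
    '[{"hash": "d41d8cd98f00b204e9800998ecf8427e",'
    ' "last_modified": "2015-06-04T10:23:57.618760",'
    ' "bytes": 0, "name": "5G", "content_type": "octet/stream"}]'
)
suspects = [e["name"] for e in listing if needs_head_check(e)]
```

Every name in `suspects` still needs one HEAD request, which is exactly the cost the mail goes on to complain about; the sketch only keeps a client from trusting the listing hash blindly.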
>
> OK, let’s continue…
>
> We are not so naive… our Swift Sync client knows that 0-byte files need
> more work.
>
> * First issue: we have to know whether the file is a « real » 0-byte
> file or not. You may think that most people do not create 0-byte files
> since that seems pointless. Actually, I have seen two Object Storage
> middlewares using many 0-byte files (for instance to store metadata or to
> set up some kind of directory-like structure). So, in this case, we need
> to perform a HEAD request on each 0-byte file. If you have 1000 files like
> this, you have to perform 1000 HEAD requests to finally learn that none of
> them is a large file. Not very efficient. Your Swift Sync client took 1
> second to sync 20G of data with the naive approach; now you need 5
> minutes… using the hash of 0 bytes is not a good idea at all.
>
> * Second issue: since the hash is the hash of all parts (I have an idea
> about why this decision was made, probably for performance reasons), your
> client cannot work on local files, since the hash of the local file is not
> the hash of the Swift aggregated file (which is the hash of all the hashes
> of the parts listed by the manifest). So you cannot work on existing data;
> you have to either:
> - split all the files in the same way as the manifest, compute the MD5 of
> each part, then compute the MD5 of the hashes and compare it to the MD5 on
> the server… (OK… doable, but I gave up on such a system)
> - have a local database in your client (when you download, store the REAL
> hash of the file and record that you have to compare it to the hash
> returned by the server)
> - perform some kind of crappy heuristics (size + grab the starting bytes
> of each part, or something like that…)
>
> * Third issue:
> - If you don’t want to store the parts of your large object, you have to
> wait for all your HEAD requests to finish, since that is the only way to
> find all the files that are referenced in your manifest headers.
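The first workaround listed under the second issue (split the local file like the manifest, hash each part, then hash the hashes) can be sketched in Python as follows. This assumes fixed-size segments uploaded in order, and that the server builds the manifest ETag as the MD5 of the concatenated hexadecimal segment ETags. `segmented_etag`, `matches_remote` and `segment_size` are hypothetical names, and the segment size must match whatever the uploader used.

```python
import hashlib
import io


def segmented_etag(stream, segment_size):
    """MD5 of the concatenated hex MD5s of each fixed-size segment.

    Assumption: segments are consecutive `segment_size`-byte slices of the
    file (the last one may be shorter), and `stream` is a buffered binary
    stream whose read(n) returns n bytes unless it hits end of file.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    outer = hashlib.md5()
    while True:
        segment = stream.read(segment_size)
        if not segment:
            break
        # Feed the hex digest of this segment into the outer hash.
        outer.update(hashlib.md5(segment).hexdigest().encode("ascii"))
    return outer.hexdigest()


def matches_remote(stream, segment_size, remote_etag):
    """Compare against an Etag header value, which may be wrapped in quotes."""
    return segmented_etag(stream, segment_size) == remote_etag.strip('"')
```

A caller would pass a file opened with `open(path, "rb")`. Note that this sketch reads one whole segment into memory at a time, so very large segments would call for hashing each segment in smaller chunks.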
>
> To summarize, I think the current API really needs some refinement of the
> listings, since a competent developer may trust the bytes value and the
> hash value and create an algorithm that does not behave nicely. So the API
> looks easy but is in fact much more complicated than expected.
>
> A few ideas to improve it:
>
> In listings, if an object is a large object:
> - either put the real MD5 of the file if that is technically doable… or
> remove it (so naive programs will work nicely)… same thing for bytes.
> - add an optional field in the JSON to tell that the object is in fact a
> large object. A nice way to do that would be to use the object-manifest
> header value. A client could then know whether the file is a large file or
> simply a zero-byte object, and also know which objects are in fact parts
> of a larger one (without waiting for thousands of HEAD requests to
> finish).
>
> Finally, to help people create interfaces quickly, add an option to
> enable CORS for all containers of an account. At our cloud provider, we
> added a REST call in another web service with CORS enabled that ensures a
> container has CORS set up. So browsing Swift with HTML5 interfaces is
> easy. Doing so would, I think, greatly increase Swift usage (by not
> needing any specific software to browse Swift).
>
> Best Regards
>
>
> --
> Pierre Souchay <pierre.souc...@cloudwatt.com>
> Software Architect @ CloudWatt
>
> Address: ETIK 892, Rue Yves Kermen 92100 Boulogne-Billancourt
> Phone: +33 1 84 01 04 04
> Fax: +33 1 84 01 04 05
>