On 19 Nov 2013, at 16:36 , Nick North <[email protected]> wrote:

> I agree - any sensible HTTP client will do chunking or supply a content
> length for the entire request. It's getting lengths of the individual
> attachments and putting them into the initial JSON part of the request
> that's difficult.

That’s the part where we accept patches :)

Best
Jan
--

> Luckily my main requirements are for file, string, or byte array
> attachments, rather than arbitrary streams, and it's easy to get the
> lengths of all of those without having to do stream traversal. So I'm
> leaving the problem alone for a while and also assuming non-chunked
> requests. But I would like to come back to it at some point, as there are
> hints in the Apache HTTP client that it may chunk requests even if you ask
> it not to, so the ability to accept chunked requests seems useful for
> CouchDb.
>
> Nick
>
>
> On 19 November 2013 01:45, Jan Lehnardt <[email protected]> wrote:
>
>> On 16 Nov 2013, at 13:31 , Nick North <[email protected]> wrote:
>>
>>> One more thought before I leave off for the moment. Although this
>>> endpoint was built for the replicator, it is very useful for other
>>> clients, as it is the only way to submit a document and its attachments
>>> in a single action. This is important if you're not allowed to update
>>> documents or if you want to guarantee that readers of documents in the
>>> database and its replicas never see a partial set of the document and
>>> its attachments. This use case suggests to me that the endpoint should
>>> be easy to use for everyone, if that can be done without harming
>>> replication. But the chunking business means I need to think some more
>>> before making a proposal on it.
>>
>> The API should totally work as simply as possible for clients other than
>> the replicator. It just hasn’t been built that way yet and we are happy
>> to accept patches :) The mention that it was custom-built for the
>> replicator is just to explain the current limitations.
>>
>> That said, I think you either need a length OR chunking, but any
>> self-respecting HTTP client should make that trivial for you as the end
>> user :)
>>
>> Best
>> Jan
>> --
>>
>>>
>>> Nick
>>>
>>>> On 16 Nov 2013, at 18:57, Robert Newson <[email protected]> wrote:
>>>>
>>>> Ah, no. HTTP requires either content length or a chunked encoding. We
>>>> could certainly enhance this. My point was that this endpoint was
>>>> built for the replicator.
>>>>
>>>>> On 16 Nov 2013 18:54, "Nick North" <[email protected]> wrote:
>>>>>
>>>>> Thanks for the quick reply. I see what you're saying, though it still
>>>>> seems to me that CouchDb could accept incoming non-chunked requests
>>>>> where individual attachments do not have their lengths specified.
>>>>> They could be calculated on receipt and kept for use in replication.
>>>>> That would make use of client libraries like the Apache Java
>>>>> HttpClient easier. But maybe my lack of detailed knowledge of HTTP is
>>>>> showing.
>>>>>
>>>>> Nick
>>>>>
>>>>>> On 16 Nov 2013, at 18:24, Robert Newson <[email protected]> wrote:
>>>>>>
>>>>>> Because we haven't written the code to handle multipart/related
>>>>>> responses where each item is also a chunked response, and we haven't
>>>>>> done that because the replicator could always form a non-chunked
>>>>>> request since it already knows the sizes.
>>>>>>
>>>>>> B.
>>>>>>
>>>>>>> On 16 November 2013 18:11, Nick North <[email protected]> wrote:
>>>>>>>
>>>>>>> I'm working with CouchDb documents with multiple attachments,
>>>>>>> submitted using MIME multipart/related requests. In this case the
>>>>>>> document JSON has to have an "_attachments" property specifying
>>>>>>> each attachment's name, content type and length as described here
>>>>>>> <http://wiki.apache.org/couchdb/HTTP_Document_API#Multiple_Attachments>.
>>>>>>> The document and attachments are MIME-encoded and submitted in a
>>>>>>> single request.
>>>>>>>
>>>>>>> Although this works, programming it is awkward as each attachment's
>>>>>>> length must be known in advance in order to populate the
>>>>>>> _attachments property. Attachments are often in the form of
>>>>>>> streams, and finding the length means having to read through the
>>>>>>> whole stream. Then you have to spool through the stream again when
>>>>>>> submitting the HTTP request. (In some languages I suspect the only
>>>>>>> way to do this is to buffer the entire stream contents in memory.)
>>>>>>> If the length did not have to be put into the initial JSON object,
>>>>>>> then the stream could just be passed straight through to the HTTP
>>>>>>> request with no need for reading twice or buffering in memory.
>>>>>>>
>>>>>>> So my question is: why does CouchDb require the length to be
>>>>>>> supplied? It's definitely necessary as I've tried giving the wrong
>>>>>>> length, or no length at all, and that causes the request to fail.
>>>>>>> But a quick look at the Erlang source suggests that the length is
>>>>>>> not used when parsing the request, and presumably that parsing
>>>>>>> process could calculate each attachment's length for use later on
>>>>>>> if it's needed.
>>>>>>>
>>>>>>> If, in principle, the length could be dropped when submitting
>>>>>>> requests, then I'd be interested in trying to modify the code to
>>>>>>> make that possible. But, if there is a good reason why it has to be
>>>>>>> supplied, then I don't want to waste time working out what's going
>>>>>>> on in the Erlang. So any advice on why attachments were designed as
>>>>>>> they are would be very welcome. Many thanks,
>>>>>>>
>>>>>>> Nick
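
For readers following the thread, the request shape under discussion can be sketched roughly as below. This is only an illustration of why the lengths must be known up front, not CouchDB's or any client library's actual code. The helper name `build_multipart_related` is hypothetical, and the stub fields (`follows`, `content_type`, `length`) are an assumption based on the multiple-attachments documentation linked in the original message. Attachments are taken as in-memory bytes, which matches the "file, string, or byte array" case mentioned above where lengths are cheap to obtain.

```python
import json
import uuid


def build_multipart_related(doc, attachments, boundary=None):
    """Return (content_type_header, body_bytes) for one document and its attachments.

    `attachments` maps name -> (content_type, data_bytes). Hypothetical helper.
    """
    boundary = boundary or uuid.uuid4().hex
    # The lengths go into the JSON part, which is sent first, so every
    # attachment must already be sized (here: held as bytes) at this point.
    stubs = {
        name: {"follows": True, "content_type": ctype, "length": len(data)}
        for name, (ctype, data) in attachments.items()
    }
    doc_json = json.dumps(dict(doc, _attachments=stubs)).encode("utf-8")

    delimiter = b"--" + boundary.encode("ascii")
    chunks = [delimiter, b"Content-Type: application/json", b"", doc_json]
    # Attachment parts follow in the same order as the stubs in the JSON.
    for ctype, data in attachments.values():
        chunks += [delimiter, b"Content-Type: " + ctype.encode("ascii"), b"", data]
    chunks += [delimiter + b"--", b""]
    body = b"\r\n".join(chunks)
    return 'multipart/related; boundary="%s"' % boundary, body
```

Because the whole body is built in memory, `len(body)` can be sent as the Content-Length of a non-chunked PUT. A streamed attachment of unknown size would not fit this shape, which is the difficulty described in the thread.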
