On 06/27/2017 08:36 AM, Brian Bouterse wrote:
> I thought that we pulled the chunking uploads out of the MVP. IIRC, @jortel
> and I thought that, since that use case was for high-performance (parallel)
> uploads, it should be on the 3.1+ page.
> 
> +1 to just sending data without having a file handle. If the entire file is
> delivered in one request, then having a file ID to upload to in a second
> request is just cumbersome.
> +1 to having the handler receiving that file just make it an Artifact() right
> away. This will work better with how Django handles file uploads.

A few things about uploading directly to an Artifact:

- The artifact FK to a content unit would need to become optional.

- Need to add use cases for cleaning up artifacts not associated with a
content unit.

- The upload API would need the additional information required to create an
artifact, such as relative path, size, and checksums.

- Since (I assume) you are proposing uploading/writing directly to artifact
storage (not staging in a working dir), the flow would need to involve
(optional) validation. If validation fails, the artifact must not be
inserted into the DB.
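
A rough sketch of that last point, in plain Python rather than Django so it
stands alone. Everything named here (store_artifact, ValidationError, the
dict standing in for the Artifact table, the ".upload-" temp prefix) is made
up for illustration and is not Pulp code. It assumes the client sends the
expected size and SHA-256 along with the upload:

```python
import hashlib
import os
import tempfile


class ValidationError(Exception):
    """Raised when an upload does not match its declared size or checksum."""


def store_artifact(chunks, storage_dir, relative_path,
                   expected_size, expected_sha256, db):
    """Stream an upload into artifact storage and record it only if valid.

    `db` is a plain dict standing in for the Artifact table (hypothetical).
    """
    digest = hashlib.sha256()
    size = 0
    # Write under a temporary name inside the storage directory itself, so
    # the final rename never crosses filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=storage_dir, prefix=".upload-")
    try:
        with os.fdopen(fd, "wb") as out:
            for chunk in chunks:
                out.write(chunk)
                digest.update(chunk)
                size += len(chunk)
        # Validate before anything becomes visible as an artifact.
        if size != expected_size:
            raise ValidationError("size mismatch")
        if digest.hexdigest() != expected_sha256:
            raise ValidationError("checksum mismatch")
        final_path = os.path.join(storage_dir, relative_path)
        os.makedirs(os.path.dirname(final_path), exist_ok=True)
        os.replace(tmp_path, final_path)
    except BaseException:
        # On any failure, remove the partial file and record nothing.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    record = {
        "relative_path": relative_path,
        "size": size,
        "sha256": digest.hexdigest(),
        "content_unit": None,  # the FK to a content unit is optional here
    }
    db[relative_path] = record
    return record
```

The record is created only after the file has passed validation and been
renamed into place, so a failed upload leaves neither a file nor a DB row.
Artifacts whose content_unit is still None are the ones a cleanup use case
would have to find and remove.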


-jeff


> 
> I also think we can skip making one Artifact from another. I don't think
> that is going to be a commonly used use case. So with that use case and
> chunking removed, the use cases would be:
> 
>   * As an authenticated user, I can upload a file which becomes an Artifact.
>     At the end of the upload, the server returns the JSON representation of
>     the created Artifact.
>   * As an authenticated user, I can create a content unit by providing the
>     content type, its Artifacts using IDs for each Artifact, and the metadata
>     supplied in the POST body. This call is atomic: the content unit is
>     created in the database and on the filesystem, or not at all.
> 
> The biggest reason I see for this adjustment is that it aligns with the
> users' desire to have uploads take fewer calls. This removes at least two
> calls from the workflow. It also avoids having to save the data multiple
> times, which I don't think is practical.
> 
> Thoughts or ideas?
> 
> -Brian
> 
> On Tue, Jun 27, 2017 at 8:55 AM, Dennis Kliban <dkli...@redhat.com> wrote:
> 
>     My motivations for writing this email include the recent discussion about
>     the Pulp 2 upload API in #pulp and Django's documentation on file uploads.
> 
>     Files uploaded to Django are initially stored in memory (if under
>     2.5 MB); otherwise Python's tempfile module is used to write them to the
>     /tmp directory. The file created in /tmp is deleted when, and if, the
>     last file handle is closed.
> 
>     If we implement the upload API as described in the MVP doc[0], then
>     according to the Django docs[1] we will be writing to disk 2 or 3 times
>     for each upload. When a file is bigger than 2.5 MB, it will first be
>     written to /tmp. The same file will then be written to
>     /var/lib/pulp/uploads (or a similar location) when the FileUpload model
>     is saved. A third write will occur when an artifact is created using the
>     FileUpload. This third write will likely be a move, though.
> 
>     I propose that we eliminate writing the uploaded file to
>     /var/lib/pulp/uploads and go directly to creating an artifact. The use
>     cases can then be rewritten as follows:
> 
>       * As an authenticated user, I can upload a file with an optional chunk
>         size and an optional offset. At the end of the upload, the server
>         returns the JSON representation of the artifact.
> 
>       * As an authenticated user, I can create a new artifact by specifying
>         an existing artifact ID.
> 
>       * As an authenticated user, I can create a content unit by providing
>         the content type, its Artifacts using IDs for each Artifact, and the
>         metadata supplied in the POST body. This call is atomic: the content
>         unit is created in the database and on the filesystem, or not at all.
> 
>     [0] https://pulp.plan.io/projects/pulp/wiki/Pulp_3_Minimum_Viable_Product#Upload-amp-Copy
>     [1] https://docs.djangoproject.com/en/1.9/topics/http/file-uploads/#handling-uploaded-files-with-a-model
> 
>     _______________________________________________
>     Pulp-dev mailing list
>     Pulp-dev@redhat.com
>     https://www.redhat.com/mailman/listinfo/pulp-dev
