Re: [pkg-discuss] Code review request for [bugs 1154, 1237, 1845, 1887, 1888]

Shawn Walker Thu, 15 May 2008 16:54:51 -0700

2008/5/15  <[EMAIL PROTECTED]>:
> Hey Dude,
> Thanks for doing this work.  I commented on your notes first, and the
> code-review feedback is follows that.
>
>> Notes:
>> * The tar streaming now has to write the file to a temporary file
>> first before streaming. This is an unfortunate side-effect of cherrypy
>> not providing a file object to write data directly back to the client.
>> Initially, I considered using a StringIO object, but that would have
>> bloated the server process quite a bit (memory usage).
>
> This effectively negates the intended benefit of streaming the files.
> The idea was to write the tar stream out to the socket as we read each
> file so that memory and disk usage would remain low.
>
> The approach that you're taking now doesn't need to employ streaming.
> You could just write the tarfile to a temporary file, serve it, and
> delete it.  I'm not sure that's the right approach, though.  Are you
> sure there's no sneaky way to get CherryPy to write a streaming response
> with an unknown length?


The issue is not the unknown length; cherrypy doesn't care about that.

The issue is that cherrypy does not provide a file object to write to
for the response.

tarfile requires that you have a file object to write to from what I can tell.

One possible thought I had was to try creating a temporary file and
use that for the tarfile object, and after each tar_strem.add(),
stream the temporary file and then truncate it. I would repeat that
until done. However, I'm not sure if such chicanery would work :-)

> In a previous e-mail you noted that it's possible to run CherryPy behind
> Apache.  The documentation for downloading files seems to hint that
> using Apache is preferable, at least for this task.
> (http://www.cherrypy.org/wiki/FileDownload).

That was one of my concerns and why I tried to ensure that all
"static" content was served via /static/.

Eventually, if we created /file/1/ and /filelist/1/ that did just GET
foo; we could let Apache serve those requests and let the depot
continue to handle /filelist/0/ and /file/0/.

> There has been a fair amount of hand-wringing about how filelist doesn't
> fit with our architectural principles.  Perhaps now would be an
> opportune time to investigate whether we could switch to sitting behind
> Apache and simply pipeline our requests for multiple files?
> To be clear, I'm not implying that you should do that work all by
> yourself.

I was actually considering that. The problem I have is that, as
Stephen pointed out, the current protocol must be supported for a
while. As such, that would have to be /filelist/1/.

>> * I have not yet done extensive performance testing, nor have I
>> checked for the threading issues that we previously had.
>
> Will you do both before you putback?  An extensive test may be above and
> beyond the call of duty; however, some amount of performance and
> threading testing would sure be nice.

Yes, and I have been doing some as time goes on.

> I have one more question that wasn't really in your notes.  One of the
> justifications for switching to CherryPy was that it would simplify our
> codebase.  However, most of the code in the depot seems to have been
> moved (more, or less) from depot.py to repository.py.  Is there some
> portion of the existing code that you think we could simplify once we
> switch to using CherryPy?  I'm also curious how you see this change
> improving our ability to add new features to the depot.  Perhaps you
> could give an example of how you see this helping future development?

Actually, I removed quite a large chunk of code even though I moved it
to repository.py.

In particular, all of the static file serving code is now gone and the
extra code that was required to setup a response or handle exceptions
is now gone as well in many cases.

For further simplification of output generation (mainly for the status
pages or other "web interfaces") a template system is needed and that
will come in a later phase.

The repository/depot code could use some further restructuring as well
which I believe could compact it further.

>> webrev:
>> http://cr.opensolaris.org/~swalker/pkg-depot-3/
>
> depot.py:
>
> 44 - Would it make sense to check if the import failed and print a message
> that CherryPy is required to run the depot?

I have made this change.

I have also added cherrypy as a dependency to the SUNWipkg pkgdef.

> catalog.py:
>
> I'm a little slow this morning.  Why do we need to embed two different
> methods in catalog.send()?  I'm sure it's justified.  I'm just having a
> hard time figuring out why.

The test cases for the api need the ability to output the catalog to a
file directly.

I can either move the file generation code to the tests and flatten
this function, or leave it as is.

Either one is fine with me.

> updatelog.py:
>
> 26 - This module is common to the client and the server.  Unless I'm
> being a moron, this import means the client will also depend upon
> cherrypy being installed.

Unfortunately, the updatelog.send() method is used by the server, and
it does need the request object.

However, I have worked around this by passing in the request and
response object to the function as arguments.

One of the things I struggled with while making these changes was
whether it was more efficient to pass the request and response object
around (and cleaner) or whether it was better to simply use the
singleton object to access them.

> repository.py:
>
> Just so I'm sure I understand, what does the @cherrypy.expose decorator do?

In short?

If you map an object (function) to a request path, and it hasn't been
explicitly "exposed", cherrypy will pass it off to the default page
handler you provide or return a 404 error.

> 79 - If cherrypy handles this automatically, can we assert this
> condition, or throw some kind of exception if we're processing a
> function that isn't exposed?

Yes, we can assert.

I have changed it as such.

> 307 - Does serve_file return a 404 if it can't find the file at the path
> that it has been given?

If you specify a request path that cherrypy is unable to map to an
object through the "mounted object tree" (see quickstart in depot.py)
it will pass it off to the default page handler if you have one. If
you don't have one it will return a 404.

So yes, currently the depot code is setup with a custom default page
handler that will return a 404 for unknown pages via face.py:unknown.

> There are a number of places in updatelog and transaction where you
> refer to the current request or response as cherrypy.request and
> cherrypy.response, respectively.  Are these guaranteed to always be for
> the HTTP session we're currently processing?  Do you know how cherrypy
> enforces this?

The cherrypy spec seems to indicate (though not explicitly) that it
enforces this [1].

I have posted an updated webrev with the changes you wanted except for
the one case noted above [2]:

http://cr.opensolaris.org/~swalker/pkg-depot-4/

The only files changed were:
src/depot.py
src/modules/server/repository.py
src/modules/updatelog.py
src/pkgdefs/SUNWipkg/SUNWipkg.import

Cheers,
-- 
Shawn Walker

"To err is human -- and to blame it on a computer is even more so." -
Robert Orben

[1] http://cherrypy.org/wiki/CherryPySpec (see sections 2.2.1, 2.2.2,
and 2.2.3).
_______________________________________________
pkg-discuss mailing list
pkg-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Re: [pkg-discuss] Code review request for [bugs 1154, 1237, 1845, 1887, 1888]

Reply via email to