Re: [pkg-discuss] Code Review: New Transport

johansen Wed, 24 Jun 2009 17:11:54 -0700

On Tue, Jun 23, 2009 at 11:13:18PM -0500, Shawn Walker wrote:
> On Jun 23, 2009, at 3:34 PM, [email protected] wrote:
>> The following webrev converts our transport from using the
>> urllib/httplib in python to PyCurl, which is based upon libcurl.
>>
>>      http://cr.opensolaris.org/~johansen/webrev-xport-1/


Thanks for responding with feedback so quickly.  I've fixed the
punctuation and spelling errors.  Anything else that requires follow-up
comments is in line.

> src/modules/client/api.py:
>   lines 165, 318:  does this mean that if a user already has everything 
> cached in /var/pkg/download, etc. and we can't contact the depot to 
> refresh the catalog (which they didn't explicitly request) that the 
> install/update will fail?

Yes, your concern is valid.  I thought we had a --no-refresh option that
would allow the user to work around this particular case.  The situation
that's going to catch people is when the catalog is up-to-date and all
files are on disk, we'll fail the captive portal test while trying to
send intent information.  I think the solution is to have an offline
mode, or to first install only from cache.  However, I probably won't be
able to get support for this into the current wad.  If you see a more
pragmatic option, I'm open to suggestion, though.

>   line 608: won't a bogus value in PKG_DUMP_STATS, like "true" cause a 
> traceback here?

Thanks.  I'll catch a ValueError here.

> src/modules/client/api_errors.py:
>   lines 298, 309: missing an extra newline

Thanks.  Is the standard two lines between classes?  I wasn't sure,
since I thought we were okay with just one newline between function
definitions.

> src/modules/client/image.py:
>   line 1228: question, what is the difference between 'raise' and 'raise 
> e'; i've seen both forms used

I'm not sure it matters -- we'd have to ask Danek.  I put this in here
while trying to debug where one of my InvalidDepotResponse exceptions
had gone.  I reasoned that since we had the exception assigned to e in
the except block, perhaps we should raise it by explicitly naming e.

>   line 1309: the comment says zero, but the code says '= None'

This should be None.  I've fixed the comment to match.

> src/modules/client/imagestate.py:
>   Nice!  This seems like a better home for intent.  Thanks for moving  
> it.

As I recall, you helped me find a suitable place to relocate this code.
Much obliged for your assistance.

> src/modules/client/transport/engine.py:
>   lines 28-32: nit: can you asciibetise imports in all your files?  it 
> makes it easier to update the imports later

Yes.  I had thought I did this, but apparently I forgot.

>   line 105: nit: shouldn't it be pathname instead of filename if it is 
> expected to optionally include the path?

Yes, in fact filename must be a path.  I'll change this accordingly.

>   line 111: nit: s/progress tracker/ProgressTracker/ to make it subtly 
> clear that only ProgressTracker objects are supported.  This likely is in 
> other places as well.

Fixed.  You were right.  There were a couple more cases at the end of
the file.

>   line 151: in the gui, the user might initiate another transport action 
> even after canceling, should the remove_handle, teardown, free happen 
> still (lines 218-220)?  I know we have the transport.reset() call in the 
> prepare() part of the client.api, but I thought I would ask.  It just 
> seemed odd that for raise_to_ex that we would cleanup first, while cancel 
> immediately returns instead of cleaning up.  Though perhaps you did that 
> because cancel has to be immediate.

No, you're right.  This needs to be fixed.  The ex_to_raise case got
caught because we failed to cleanup after an exception was raised, but
was caught and supressed elsewhere.  This lead to us continuing on with
fewer and fewer handles available.  In the case of the GUI and the
CanceledException, we'll always call transport.reset() but this code
should be modified to clean up in case that doesn't happen.

>   line 304: nit: since this is more like a read-only property than a  
> function, i'd decorate this with @property, and then callers can simply 
> say 'if engine.pending', but that's a matter of taste.

I don't really have a preference here.  I'm happy to change it.

>   line 404: should this function validate that size >= 0?

My instinct is to say yes, and that we should set a minimum size as
well.  However, the only code that currently calls this function gets
the size by calling statvfs(2), which tells us the optimal fs block size
to use.  Perhaps if it's given a nonsensical value for the buffer size,
it should choose a default like 4 or 8k.

>   line 554: if you made this 'if hdl.success: continue' you could de- 
> indent the block below and the formatting would be nicer

I would like to, but continue is only valid in a for or while loop.
There aren't any loops in __teardown_handle(). :(

> src/modules/client/transport/fileobj.py:
>
>   line 247: you hate it that much?

Heh.  Meant to delete that earlier.

> src/modules/client/transport/repo.py:

>   line 117: does this mean that clients of the transport engine aren't 
> intended to specify the versions of operations they want to perform?

The idea was that we choose the highest available version supported by
both the client and the server.  Since the client only supports one
version of every operation right now, I omitted the code to choose a
version.

>   line 256: why assign to respdata? not used

I was probably double-checking the output when debugging.  Changed to
just call .read() directly.

>   line 316: should this check to see if the engine object id() is the  
> same or does it matter?

I hadn't considered the case where someone might have multiple Transport
objects instantiated at once, since the CLI and the GUI only need one.
The short answer is yes, this matters.  For those who might be following
along, you sent me another e-mail about this off-list.  I think the
thing to do is make the repo cache per-Transport, which obviates the
need to check engine-id until we have a multi-engine transport.

> src/modules/client/transport/transport.py:
>   nit: pkg.client.transport.transport.Transport seems a little  
> redundant, shouldn't this be in pkg.client.transport.__init__.py so that 
> you have pkg.client.transport.Transport instead?  Or does the magic you 
> have in __init__.py do that?

I can move this to __init__.  I think the magic I have there right now
means that if someone were to try to import all of transport, they'd
only get the transport.transport module.

>   line 223: mcontent?

I wonder what that was too.  Removed.

>   line 434: right now, rename is fine because we aren't going cross- 
> device, but in the future, if the cache directory path is customisable  
> (?), or we separated things out into separate zfs filesystems, this may 
> become an issue

Ok.  In that case, we can write a helper that does a copy/unlink
two-step.

> src/modules/server/catalog.py:
>   lines 377-385: is this override actually needed?  I thought __init__ 
> got inherited

I thought that the parent class's __init__ method wasn't called by
default, making it necessary to do this.  Perhaps Danek can clarify?

> src/modules/server/depot.py:
>   You had way too much fun writing this.

Sorry.  Not trying to hog all the fun for myself.  :P

> src/modules/server/repository.py:
>   lines 339-380: Can you mark the differences between this and the  
> normal catalog method somehow?

I'll try.  I acutally got caught by this when I took my first pass over
the webrev.  I failed to notice that we're passing the scfg to
c.as_lines() and ul._gen_updates() and almost deleted the class because
I thought it was unused code.  *doh*

> src/patch/*:
>   It would be nice to see some comments at the beginning of each patch 
> file with a URL to the corresponding bug number of a brief explanation of 
> its purpose.

Can I do that without confusing patch(1)?  I left these as empty as
possible, so that they'd apply cleanly.

> src/tests/*:
>   Are there any unit tests missing or did you feel like the client just 
> using the depot is enough?

The existing unit tests actually turned up a lot of bugs in the code,
especially when I was modifying the search stuff.  I'm
comfortable that all of the common transport paths have been exercised.
I've spent a bunch of time performing operations against the Nasty
server in order to test the corner cases.  I'm running some more tests
there, just to make sure that catalog/manifest truncation detection
works.  (It seems to work, but I'd like to be confident).

> One final concern I had is that I noticed that touch_manifest now  
> performs a GET instead of a HEAD (I checked the depot server logs).  Was 
> this a limitation of pycurl?  The 'HEAD' operations should be faster for 
> the server...

It shouldn't be doing this.  You may have found a bug in PyCurl/libcurl.
In cases where we perform a "HEAD" I tell the handle to set
CURLOPT_NOBDOY.  According to the libcurl documentation, this should
make the client perform a HEAD.  If it's not doing this, then we have a
bug.  I'll investigate this further.

I have a list of about 4 remaining items to resolve.  Once I've got
those figured out, I'll send out a new webrev.

Thanks again for the feedback.

-j
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Re: [pkg-discuss] Code Review: New Transport

Reply via email to