Re: [pkg-discuss] Code review for 7960 please

johansen Tue, 29 Sep 2009 19:31:14 -0700

On Tue, Sep 29, 2009 at 05:55:31PM -0700, Brock Pytlik wrote:
>> In other places in that code, you simply raise PermissionsException, I
>> believe.  Whether or not the interpreter tolerates this behavior, it's
>> confusing to humans.  I'd either rename this exception to something
>> else, or prefix it with something that makes it obvious that it belongs
>> to the FileManager. FMPermissionException, maybe?
>>   
> Well, considering that it has no way to know about an api exception, I'm  
> not really seeing the confusion. But since FMPermisisonException seems  
> better to you, I'll change it.


Thanks.

> Right now, I don't allow the FileManager's constructor to choose the  
> layout. That doesn't mean it won't in the future. At one point, it was  
> choosing the layout based on what was stored on disk, but I decided that  
> over complicated things for no good reason at this point. Why would an  
> image know what layouts it supports? Part of the point here is to  
> encapsulate the file layout and not make the image deal with it, so that  
> we can enable other functionality later if we deem it appropriate (like  
> some form of cache clean up as you mentioned before).

I'm trying to ascertain what abstractions you're using, and determine
the modularity of the underlying design.  The constructor is part of the
FileManager object's interface.  If you're stating that the layout is
encapsulated by the FileManger, then you're telling me that layout is an
implementation detail of FileManager.  However, I don't think that's
truly the case.  The file layout is eventually going to become part of
the on-disk format, and will be used by any packaging component that
needs to interact with the filesystem.  Even if the abstraction that the
FileManager implements doesn't expose the layout, implementors who need
to make changes to the way the packaging system interacts with files
will still need to understand the interfaces of the objects that the
FileManager uses.

The image used to be the place where we kept ahold of the download
cache.  I'd like to know more about how you expect to keep layouts
modular, and properly detected at runtime.

>> The two scenarios this code supports right now is that either the source
>> and destination are both read/write, or both read only.  I'm asking if
>> it's reasonable to expect a deployment scenario where the source is
>> readonly but the destination is read/write.  (A read/write source with
>> readonly destination is a degenerate case of both sides being
>> read-only).  In the case where the source is read-only, you can't move
>> the file to the new layout, but you could copy it, but I don't see
>> support for this case.
>>
>>   
> No, I don't think that's a reasonable scenario at this time for the file  
> manager to support. For now, I want to keep its job narrow and  
> constrained, not make it a general repo seeding method, or anything else.

I didn't say now. I asked if this was a reasonable deployment scenario,
irrespective of any period in time.

>>>> file_layout/layout.py:
>>>>       
>>
>>   
>>>>   - Why do you allow Original to name itself "s1" by defining SHA1_160,
>>>>     but Fan256Once allows the class that instantiates it to give its
>>>>     name.  Wouldn't it be better to encode the name as a class property
>>>>     in both cases instead?
>>>>         
>>> Original can only be SHA_160 since that's the old format and we're 
>>> not  going back to change it. Fan256Once can use SHA_256, when we add 
>>> it, as  well, hence it's an argument to the constructor.

>>   That would make it a candidate for a class propery, at least in
>>   this case.  
>
> Yes, but the other layouts aren't amenable to a class property, so why  
> special case this one?

One possibility is to take the approach that Shawn used with the
Publisher objects.  In his code, the class property gets the default,
but the instance property overrides the class default if the caller
specifies a different set of arguments.  This allows you to provide a
standard case, even to classes that inherit from you, but still modify
the behavior when the object is instantiated.

>> In the case of Fan256Once, it's not clear to me that setting hash
>> type to "s1" is descriptive at all.
>
> My understanding was I was free to choose whatever label I desired ...

In the sense that it would be unconsitutional for the government to
prosecute you for the choice, yes, I agree. :P

> since  it's entirely internal to the file manager. It's simply there
> to allow  us to know what kind of hash algorithm was used so that if
> we use hash  algo 1 (which produces N bits) and hash algo 2 (which
> produces N bits)  at different times, we'll be able to avoid
> collisions. If you'd like to  suggest a more meaningful name, by all
> means do.

I'm actually confused what you're doing with this, partially due to the
absence of documentation in the code or elsewhere.  Are you using the
label to so that if you saved the name to a file, you could re-load the
appropriate algorithm later?   Or are you using the label to identify a
function in a table of algorithms?  Or something entirely different?

It looked to me like you were going to use this as a key for a
dictionary, where the value was a hash-function.  If you're doing that,
it makes far more sense to pass an object that implements a hash
function to the layout manager.  Using the label as a enum is a C-ish
construction that we should probably avoid here.

>> This object doesn't check that it can use the hash type, nor is it
>> apparent what this heiararchy does with the value.    
> It doesn't check the hash input, you're correct. This is not the  
> "support multiple hashes in our actions" wad.

I'm not talking about Actions.  I want the interface for adding hash
functions to the layout manager to be modular.

>> I'd suggest that you use a more object-oriented approach and pass an
>> object that knows how to perform the hash function to each layout
>> manager.
>>   
> I don't understand.

I mean that instead of passing value "s1" to the object, you pass it a
HashSha1160 object that implements a operations described in the
AbstractHashObject.  Just as an example, it could support hash(), which
generates the secure hash, and id() which returns a identifier that
uniquely describes this type of hash, and any other operations that you
think are necessary here.

>> Moreover, it seems like changing the layout function to work more
>> generically would be helpful.  Instead of having it be Fan256Once, how
>> about FanoutUniform() with the depth and the number of directories given
>> as arguments to the constructor. 
>
> Because I want to keep this as small and simple as possible. If you'd  
> like a more generic layout class, you're free to write a subclass.

I think you've missed my point.  If you write the fanout logic to be
sufficiently generic, then implementations can specify the
characteristics they desire.  The code will be resuable, and then nobody
will have to actually write the subclass you're suggesting that I write.

>>>> General comments:

>>>>   - You've created the file layout and file manager as part of an
>>>>     extensible architecture.  Would you provide more information about
>>>>     your design and architecture?  As an example, how would I go about
>>>>     adding a new layout to this design?
>>>>         
>>> You'd add a new class to the layout file, and add it an instance of 
>>> it  as the first item in self.layouts in FileManager.
>>>     
>>
>> This isn't exactly what I'm after.  I'm more curious about your
>> architecture.  How are you envisioning that this will be extended?  Can
>> you document your design goals?  Could someone who looks at this
>> document figure out how to enhance the archtiecture later?  You're
>> checking this in as a new subsystem, but it's currently handling a
>> narrow set of tasks.
>>   
> Yes, it's handling a narrow set of tasks, why is that an issue?

It's obvious that we'll need to add more functionality to this code
later on.  It sits in a location where many different components need to
interact with it.  The project is at the point where we need to get
things right.  The fact that you won't give me design or architecture
information, and refuse to discuss possible enhancements as, "not this
bug," makes me concerned that this hasn't been thought through
completely.

> I would assume that anyone who wanted to enhance it later would have as  
> much clue as I do how to go about it. I'm not sure what you're after.  
> This isn't the be-all end-all to manage the cache. At best, it's laying  
> the ground work to allow future work to do cache management. What  
> that'll look like? I don't know. I have real questions whether any sort  
> of time based cache makes sense.

I'm asking that even if the code isn't complete, the design is.  That
way, when other people come along to work on this code we have artifacts
that they can use to make sure the code is consistent with our overall
design.

>> You might not be implementing support for it in this bug, but if you're
>> designing a subsystem to replace the existing, admittedly naive,
>> download cache, you should think about how you plan to address these
>> issues.
>>   
> That's not what I've signed on for. If that's necessary, I'll vastly  
> reduce my change set and simply rewhack the bits in misc to try and be  
> slightly better, or hand it off to someone who wants to attack the  
> larger issue of cache management.

Don't let the idea of a design scare you.  I think that if you spent an
afternoon thinking about these issues, you'd probably have enough to
satisfy most people.  There certainly isn't any need to give up your
code and go home, especially since you have something that works.

>> If you haven't thought about how you expect someone to accomplish these
>> tasks, you should make sure that the current design doesn't preclude
>> intelligent cache management, because that's the very next thing that
>> we'll have to do with this code.
>>   
> In short, I believe the right place to do this is actually in the same  
> place we do indexing. By default, what it should do is remove the files  
> from the download area which have been removed or changed in the new  
> packages. Only the imageplan has this knowledge, and I don't believe  
> there's a sane way to put that approach into the file_manager b/c it  
> doesn't have access to the plan.

Ah, this is the kind of topic I was trying to get you to discuss before.  
In this example, you don't think that the imageplan would want to pass
this information to the file_manager as it completes an operation?  In
essence, giving it a list of candidates to remove if the cache is full,
or perhaps, a pre-generated list of files when a pkg houskeeping command
is run from cron?

In your model, the imageplan has to do something with this information,
but it probably shouldn't be managing the cache directly.  You're
suggesting that something other than the FileManager handles this?  What
problems do you see with adding this functionality to the FileManager,
perhaps in some kind of FileManager.cleanup() operation?

> In fact, I see this work making those steps easier b/c now  there's a
> central location through which well behaving code will add and  remove
> files, making it easier for there to be something that has  knowledge
> of the state of the layout as a whole.

Yes, I agree, and this is why I'm trying to ask more detailed questions
about the design.  Other components in the system are going to use this
code to access files, and having a well abstracted and modular interface
is important.

-j
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Re: [pkg-discuss] Code review for 7960 please

Reply via email to