Re: [pkg-discuss] Code review for 7960 please

johansen Wed, 30 Sep 2009 16:50:48 -0700

On Wed, Sep 30, 2009 at 03:47:30PM -0700, Brock Pytlik wrote:
> Ok, let's take a step back. Here are the goals for this wad:
>
> 1) make the layout so that it  doesn't take forever to remove the
> directory hierarchy
>
> 2) Centralize the insertion, lookup, and deletion of files.

Part of the disagreement here is that your goals are too narrow.  You're
replacing the exising file caching mechanism with FileManager and
Layout.  In order for this to be successful, you need a design, and you
need a design that minimally articulates the following:

1. The interface between Layout and the filesystem
2. The interface between FileManager and Layout
3. The interface between the client's cache and FileManager
  a. The current naive cache's use of FileManager
  b. A generic cache's use of Filemanager
4. The interface Repository and FileManager (for the server)

Whether or not your implement everything in the design in one pass is a
separate issue.

> [email protected] wrote:
>> On Tue, Sep 29, 2009 at 05:55:31PM -0700, Brock Pytlik wrote:
>>> Right now, I don't allow the FileManager's constructor to choose the  
>>> layout. That doesn't mean it won't in the future. At one point, it 
>>> was  choosing the layout based on what was stored on disk, but I 
>>> decided that  over complicated things for no good reason at this 
>>> point.
>>
>> The image used to be the place where we kept ahold of the download
>> cache.  I'd like to know more about how you expect to keep layouts
>> modular, and properly detected at runtime.
>>
>>   
> For now, that's also not this wad. Originally, I'd planned to use a file  
> at a known location in the file directory to specify the layout. Then I  
> realized there was no reason to do this. If you can provide an example  
> of why/how we'd need this, I can be convinced to put it back in. I think  
> one thing we've tried to do is not lay track until we need it. I don't  
> believe we'll need further layout changes anytime soon, hence I've  
> decided not to put in lots of code to handle a case I consider 
> unnecessary.

The FileManager accepts different layouts, and has to determine what
type of file layout its looking at as it goes.  You've hard-coded the
supported layouts into FileManager, making this code impossible to
extend without modification.  I disagree with that approach.  It should
be possible to write this code so that the participating classes plug
together modularly, but aren't hard-coded to know about each other's
specific implentations.

>>>> The two scenarios this code supports right now is that either the source
>>>> and destination are both read/write, or both read only.  I'm asking if
>>>> it's reasonable to expect a deployment scenario where the source is
>>>> readonly but the destination is read/write.  (A read/write source with
>>>> readonly destination is a degenerate case of both sides being
>>>> read-only).  In the case where the source is read-only, you can't move
>>>> the file to the new layout, but you could copy it, but I don't see
>>>> support for this case.
>>>>
>>>>         
>>> No, I don't think that's a reasonable scenario at this time for the 
>>> file  manager to support. For now, I want to keep its job narrow and  
>>> constrained, not make it a general repo seeding method, or anything 
>>> else.
>>>     
>>
>> I didn't say now. I asked if this was a reasonable deployment scenario,
>> irrespective of any period in time.
>>   
> Ok, then no, I don't think that's ever a reasonable deployment scenario.

Would you please explain why you think this?

>> One possibility is to take the approach that Shawn used with the
>> Publisher objects.  In his code, the class property gets the default,
>> but the instance property overrides the class default if the caller
>> specifies a different set of arguments.  This allows you to provide a
>> standard case, even to classes that inherit from you, but still modify
>> the behavior when the object is instantiated.
>>
>>   
> Again, that seems overly complex for no reason for this situation. If  
> you really care, fine, I'll change it, but I don't think it's the right  
> thing to do here.

I don't agree with your assertion about the complexity.  The code for
the Publisher object is self-describing, well documented, and flexible.

>>> since  it's entirely internal to the file manager. It's simply there
>>> to allow  us to know what kind of hash algorithm was used so that if
>>> we use hash  algo 1 (which produces N bits) and hash algo 2 (which
>>> produces N bits)  at different times, we'll be able to avoid
>>> collisions. If you'd like to  suggest a more meaningful name, by all
>>> means do.
>>>     
>>
>> I'm actually confused what you're doing with this, partially due to the
>> absence of documentation in the code or elsewhere.  Are you using the
>> label to so that if you saved the name to a file, you could re-load the
>> appropriate algorithm later?   Or are you using the label to identify a
>> function in a table of algorithms?  Or something entirely different?
>>   
> I'm using the label to prevent collisions between filenames created  
> using different hash algorithms.

Uh, okay, except there's no code in here to deal with different hash
algorithms that are defined for a Layout.  What do you do when more than
one algorithm is defined, and a Layout consists of files named by
multiple hashes?  Should we be supporting this?  One could make an
argument that each layout should only support one naming scheme, and
that to change the naming, you need to migrate the layout -- much as
you're doing now with Original and 256.  On the other hand, if you do
this cleverly, it might be feasible to keep files with identical content
but different hashes in the same tree.

>>>> This object doesn't check that it can use the hash type, nor is it
>>>> apparent what this heiararchy does with the value.          
>>> It doesn't check the hash input, you're correct. This is not the   
>>> "support multiple hashes in our actions" wad.
>>
>> I'm not talking about Actions.  I want the interface for adding hash
>> functions to the layout manager to be modular.
>>
>>   
> Please define modular.

What I mean is that when it comes time for this code to support more
than one hash value, it should be able to support this with minimal
change.  Hard-coding strings into tables isn't really extensible.  I'd
prefer a structure that allows us to add/remove hash algorithms
dynamically.  If we have to ship software to export-controlled regions,
this may be mandatory.

>> I mean that instead of passing value "s1" to the object, you pass it a
>> HashSha1160 object that implements a operations described in the
>> AbstractHashObject.  Just as an example, it could support hash(), which
>> generates the secure hash, and id() which returns a identifier that
>> uniquely describes this type of hash, and any other operations that you
>> think are necessary here.
>   
> I think I've explained why this is currently overkill, and I believe it  
> will always be overkill b/c the file manager is not ever actually  
> computing the hash.

I understand this a bit better now.  What it means, I think, is that
instead of just passing hash values around as a string, we should
consider turning them into objects that provide more descriptive
properties about the hash.

>>>> Moreover, it seems like changing the layout function to work more
>>>> generically would be helpful.  Instead of having it be Fan256Once, how
>>>> about FanoutUniform() with the depth and the number of directories given
>>>> as arguments to the constructor.       
>>> Because I want to keep this as small and simple as possible. If you'd 
>>>  like a more generic layout class, you're free to write a subclass.
>>
>> I think you've missed my point.  If you write the fanout logic to be
>> sufficiently generic, then implementations can specify the
>> characteristics they desire.  The code will be resuable, and then nobody
>> will have to actually write the subclass you're suggesting that I write.
>>
>>   
> See previous comment about unused track. It will make the code more  
> complex for a situation I view as unlikely to happen.

Again, you've missed my point.  I didn't ask you to write unused code, I
asked you to write reusable code.

-j
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Re: [pkg-discuss] Code review for 7960 please

Reply via email to