[email protected] wrote:
Brock,

On Mon, Sep 28, 2009 at 02:24:37PM -0700, Brock Pytlik wrote:
Webrev:
http://cr.opensolaris.org/~bpytlik/ips-7960-v1/

transport.py:

  - lines 456-460:  Why is this exception handler necessary?
Why wouldn't it be? This seemed like a reasonable place to translate the exception into something the client code was prepared to deal with.
  - lines 669:  This check is different from the previous conditional.
    Is it possible for cache_path not to exist, even if lookup returns a
    path?
If lookup returns a path, then at that moment, that path exists in the cache. If more than one process (with different layout configurations or with different readonly settings) is operating on that directory, it's possible for race conditions to happen. That's one reason the lookup method has an option to return a file handle.
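A minimal sketch of that distinction (function names here are illustrative, not the actual transport API): a path returned by lookup can be stale by the time the caller opens it, while an open handle survives a concurrent move or removal.

```python
import os

def lookup_path(cache_dir, name):
    """Return the path if the file currently exists in the cache (racy)."""
    path = os.path.join(cache_dir, name)
    return path if os.path.exists(path) else None

def lookup_handle(cache_dir, name):
    """Return an open read-only handle, or None.  Once open() succeeds,
    the caller keeps access to the content even if another process
    later moves or removes the file."""
    try:
        return open(os.path.join(cache_dir, name), "rb")
    except (FileNotFoundError, PermissionError):
        return None
```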
modules/file_layout:

  - The source under this directory hierarchy is < 300 lines of code.
    Could we coalesce this into a single file in modules/ or am I
    missing something?
I thought that since the layouts and file manager were fundamentally different pieces of functionality, they should go into different files.
file_layout/__init__.py:

  - lines 29-31: This means that if you import file_layout without
    further arguments, you'll get everything in file_manager and layout.
    Is this what you intended?  You're generally importing these modules
    specifically when you need them in other code.
I've changed this to __all__ = []. Thanks for catching this.
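For anyone following along, a small self-contained demonstration of what an empty __all__ buys you (the package name here is a throwaway stand-in, not the real file_layout package):

```python
import os
import sys
import tempfile

# Build a throwaway package whose __init__.py sets __all__ = [].
pkgdir = os.path.join(tempfile.mkdtemp(), "file_layout_demo")
os.mkdir(pkgdir)
with open(os.path.join(pkgdir, "__init__.py"), "w") as f:
    f.write("__all__ = []\n")
with open(os.path.join(pkgdir, "layout.py"), "w") as f:
    f.write("FAN_WIDTH = 256\n")
sys.path.insert(0, os.path.dirname(pkgdir))

ns = {}
exec("from file_layout_demo import *", ns)
assert "layout" not in ns            # the wildcard import pulled in nothing

from file_layout_demo import layout  # explicit imports still work
assert layout.FAN_WIDTH == 256
```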

file_layout/file_manager.py:

  - line 33: How about, "This exception is raised when the caller
    attempts to modify a read-only FileManager."
Sure
  - lines 40-42: This might be more clear as, "The FileManager cannot
    create %s because it is configured read-only."
Except that it's not always creation that's a problem. Assuming you're fine with create also being substitutable, I'm fine with this.
  - line 46: This name conflicts with a class in another module:
    api_errors.PermissionsException.
I was under the impression that it was fine to have names that were the same but in different modules... that's why the module names are attached to them. In fact, there are other places in the code where this is done today. (see api_errors.py vs search_errors.py)
  - General comment about these two exceptions: Should these inherit
    from some form of API exception?  Unless you're catching these in
    every code path that might emit them, it probably makes sense to
    give the API some kind of ability to catch these.
I don't think so. This is also what the server will be using to organize its files, so I didn't want to include client specific exceptions (which the api exceptions are). I'll double check that the exceptions are being caught properly on the server side as well.
  - lines 72-73: Should this be a class property?  It doesn't look like
    it's going to vary between instances.
I don't think so. If it ever changed, it'd probably vary by instance, not by class. For example, if the layouts were stored on the file system (something I considered for a bit before deciding it was currently unnecessary), then part of initializing an instance would be to load them off disk. Besides, in that world, if a multi-threaded program were operating on two different images with different layouts, having it as a class variable would break things.
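The breakage being argued about can be shown in a few lines (class and attribute names here are simplified stand-ins for the real FileManager):

```python
class Layout:
    def __init__(self, name):
        self.name = name

class FileManager:
    def __init__(self, root, layouts=None):
        self.root = root
        # Per-instance state: each image carries its own layout list, so
        # two managers with different configurations never share it.
        self.layouts = layouts if layouts is not None else [Layout("256-fan")]

fm1 = FileManager("/image1")
fm2 = FileManager("/image2", layouts=[Layout("flat")])
fm2.layouts.append(Layout("extra"))
assert [l.name for l in fm1.layouts] == ["256-fan"]   # fm1 untouched
```

Had layouts been a class attribute, the append on fm2 would have been visible through fm1 as well.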
  - line 91:  How does this function cope with files that may be
    readable in one tree, but not the other?  If you've found a file
    that exists in both layouts, but is only readable (or modifiable) in
    that exists in both layouts, but is only readable (or modifyable) in
    one, is there any mechanism to ensure that this works properly?
    What do you do if you can't move the file from the old layout to the
    new layout?  It looks like you have some code on lines 132-134 that
    tries to handle this, but it doesn't appear to make a distinction
    between the move failing because the destination wasn't writable,
    and the move failing because the source wasn't writable.  I can
    imagine some cases where we'd want to copy the file over instead,
    even if we can't remove it.

    In fact, if you had a depot on a stick, you might want the layout
    manager to copy the files over to a local disk as they're accessed,
    but this current model doesn't look like it would support that kind
    of incremental depot migration.
I'm not sure what you mean by "one tree but not the other." Assuming layout can replace tree, then right now it doesn't cope. What happens currently if someone goes in and makes selected files in the download directory unreadable? If it's useful, I can stat the file on line 110 and ensure that it's readable.

I'm also happy to introduce logic that copies the files, but in general, I think that (like the stat issue) is introducing needless complexity. If a user's messing with the directories/permissions/files inside /var/pkg, then they're on their own to some extent (at least that's always been my understanding).
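If we did go down that road, the distinction Danek is asking for could look roughly like this (a sketch only; the function name and return values are made up, not the code under review):

```python
import os
import shutil

def migrate(src, dst):
    """Move a file from an old layout to a new one, distinguishing an
    unreadable source from an unwritable destination, and falling back
    to a copy when the source can't be removed."""
    if not os.access(src, os.R_OK):
        return "source-unreadable"   # nothing we can do with this file
    try:
        os.rename(src, dst)          # cheap move within one filesystem
        return "moved"
    except OSError:
        pass
    try:
        shutil.copy2(src, dst)       # move failed; try copying instead
    except OSError:
        return "destination-unwritable"
    try:
        os.unlink(src)               # best effort: drop the original
        return "moved"
    except OSError:
        return "copied"              # e.g. a read-only depot on a stick
```

The "copied" branch is what would make the depot-on-a-stick incremental migration possible.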

  - lines 147-149: Why not defer the lookup on 146 until we've checked
    read-only?
Yeah, I had it that way originally, not sure what I was thinking when I changed it.
  - lines 182-184: Same comment as above.

file_layout/layout.py:

  - This module needs more documentation.
OK
  - Why do you allow Original to name itself "s1" by defining SHA1_160,
    but Fan256Once allows the class that instantiates it to give its
    name.  Wouldn't it be better to encode the name as a class property
    in both cases instead?
Original can only be SHA1_160 since that's the old format and we're not going back to change it. Fan256Once can also use SHA_256 when we add it, hence the name is an argument to the constructor.
  - lines 59/60 and 65/66:  These should match the function signatures
    in the parent class that they are overriding.
They do, except that in one class they're declared static and in the other they're not. I can make them all non-static if you prefer. This works, though.
General comments:

  - You did this work as part of a performance investigation surrounding
    file layouts and directory access times.  It would help the rest of
    us provide useful feedback if you would include information about
    the experiments that you ran, and the data that you obtained.  In
    particular, how did you decide that one level of 256 directories was
    optimal?
Here's the openoffice spread sheet if you'd like to see the numbers (http://cr.opensolaris.org/~bpytlik/ips-dir-timing.ods). I can also put out the python script I used, or the entire hg repo so you can see each experimental step I took.

Short version: I created various numbers of files, from 1 up to 10M, and placed them on disk using different layouts. The layouts included: flat (all files in one directory); multi-level fanout, where each level had a fanout of N and the number of levels was determined by rounding up log base N of the total number of files, for N of 16, 256, 1024, 4096, and 32768; the final choice of one level of 256-way fanout; and the current scheme. I stopped experimenting with the current scheme because its time to complete was growing at an unsustainable rate. I also looked at different encodings for the file names to see whether the difference in name length mattered. It didn't make a noticeable difference at the scales we're looking at.
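As a sanity check on the level arithmetic (this is just the round-up-the-log rule restated in code, not the benchmark script itself):

```python
import math

def fanout_levels(total_files, fan):
    """Levels needed so each directory holds roughly `fan` entries:
    round up log base `fan` of the total file count."""
    return max(1, math.ceil(math.log(total_files, fan)))

# For the 10M-file experiments:
assert fanout_levels(10_000_000, 16) == 6
assert fanout_levels(10_000_000, 256) == 3
assert fanout_levels(10_000_000, 4096) == 2
# And one level of 256, at 32768 files per directory, caps out at:
assert 256 * 32768 == 8_388_608
```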

Initially, I was evaluating four tasks. 1) The time it takes to create N files and place them in the layout correctly. 2) Take 10 percent of the files and consider that the working set; take 20% of that set and consider that the highly active set. Drawing from the highly active set twice as often as the rest of the working set (our rough guesstimate of a real-world situation), time how long it takes to open and close N files. 3) The time an rsync -n takes for the entire structure. 4) The time it takes to remove the entire directory tree.
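For the curious, the task-2 access pattern can be reconstructed roughly like this (a hypothetical sketch; the exact draw ratio and set boundaries in the real script may differ):

```python
import random

def access_sequence(files, n_opens, seed=0):
    """10% of the files form the working set, the first 20% of that is
    the highly active set, and the hot set is drawn from twice as often
    as the rest of the working set."""
    rng = random.Random(seed)
    working = files[: max(1, len(files) // 10)]
    hot = working[: max(1, len(working) // 5)]
    warm = working[len(hot):] or hot   # degenerate case: tiny file lists
    seq = []
    for _ in range(n_opens):
        # two draws from the hot set for every one from the rest
        pool = hot if rng.random() < 2 / 3 else warm
        seq.append(rng.choice(pool))
    return seq
```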

Because rsync -n and removal consistently produced the same graph shapes for all the layouts, I eventually dropped them from the tests.

Looking at the numbers, flat was the best performer on ZFS (the only FS we tested on) and also has the advantage of being the simplest to implement. However, some other file systems we would like to support can't really cope with 1M files in a single directory. It also has some downsides for a user: ls'ing the directory used 374M of RAM, if I remember right, while running os.listdir within Python used about 100M. For reference, the current dev depot was at about 1M files a month ago when I took my measurements. A client system seemed to be in the 300k to 600k range, depending on how long the machine had been upgraded, what packages were installed, etc. Rather than fork the code per platform/use, I tried the 256-once layout. This had performance comparable to flat (again, see the spreadsheet) and would let us avoid forking the code unless we had a client or server on a non-ZFS file system with more than 8,388,608 (256 * 32768) files in it. This seemed like a reasonable limit.

I don't claim that this is the optimal solution across all platforms. I make the following claims:
1) It won't break (in the absolute sense) on reasonably sized servers and clients across platforms.
2) It's a big performance improvement for ZFS servers and clients.
3) Other platforms can be forked at a later date if a more optimal layout is found for them.
4) A better layout can be added in the future if someone else thinks of one.
5) The most important bit: it's faster than what we have now, roughly an order of magnitude faster for removal and rsync.

  - You've created the file layout and file manager as part of an
    extensible architecture.  Would you provide more information about
    your design and architecture?  As an example, how would I go about
    adding a new layout to this design?
You'd add a new class to the layout file, and add an instance of it as the first item in self.layouts in FileManager.
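In skeleton form, the extension recipe looks something like this (class and method names are illustrative stand-ins, not the actual pkg(5) API):

```python
import os

class Layout:
    """A layout maps a file name to its relative path on disk."""
    def lookup(self, name):
        raise NotImplementedError

class Flat(Layout):
    """The old scheme: every file in one directory."""
    def lookup(self, name):
        return name

class FanOnce(Layout):
    """One level of fanout keyed on the first two characters."""
    def lookup(self, name):
        return os.path.join(name[:2], name)

class FileManager:
    def __init__(self):
        # A new layout is adopted by inserting an instance of it first;
        # older layouts remain behind it so existing files are still found
        # and can be migrated as they're accessed.
        self.layouts = [FanOnce(), Flat()]

fm = FileManager()
assert fm.layouts[0].lookup("ab12cd") == os.path.join("ab", "ab12cd")
assert fm.layouts[1].lookup("ab12cd") == "ab12cd"
```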
  - The FileManager implements a cache of sorts, but I don't see any
    hooks for some of the typical operations that would be performed on
    a cache.  How do you determine what items to evict when the cache is
    full?  How do you plan to support multiple policies, both for sizing
    and eviction?  Are there any options for those who need to perform
    external cache maintenance?
Not this bug.

-j

_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss