[email protected] wrote:
On Mon, Sep 28, 2009 at 09:45:32PM -0700, Brock Pytlik wrote:
Webrev:
http://cr.opensolaris.org/~bpytlik/ips-7960-v1/
transport.py:
- lines 456-460: Why is this exception handler necessary?
Why wouldn't it be? This seemed like a reasonable place to translate the
exception into something the client code was prepared to deal with.
It seemed like an obvious question to me, since this logic wasn't
present before. Is this code more likely to encounter a permissions
error than the previous code? Explain, please.
The exception handling was there. It was in the call to os._makedirs.
Since lookup cannot raise an api exception (see below) it has to
translate the exception into something meaningful to the api user. It is
neither more nor less likely to take any error path than it was before,
the location of the except clause has merely been shifted.
modules/file_layout:
- The source under this directory hierarchy is < 300 lines of code.
Could we coalesce this into a single file in modules/ or am I
missing something?
I thought that since the layouts and file manager were fundamentally
different pieces of functionality, they should go into different files.
Maybe, but if the code is small enough, and you're not designing a
subsystem, it may be more trouble than it's worth.
If you feel strongly, or others agree with you, I'll merge them,
otherwise I'd prefer to leave them as is.
modules/file_manager.py:
- line 46: This name conflicts with a class in another module:
api_errors.PermissionsException.
I was under the impression that it was fine to have names that were the
same but in different modules... that's why the module names are
attached to them. In fact, there are other places in the code where this
is done today. (see api_errors.py vs search_errors.py)
In other places in that code, you simply raise PermissionsException, I
believe. Whether or not the interpreter tolerates this behavior, it's
confusing to humans. I'd either rename this exception to something
else, or prefix it with something that makes it obvious that it belongs
to the FileManager. FMPermissionException, maybe?
Well, considering that it has no way to know about an api exception, I'm
not really seeing the confusion. But since FMPermisisonException seems
better to you, I'll change it.
- lines 72-73: Should this be a class property, it doesn't look like
it's going to vary between instances?
I don't think so. If it ever changed, it'd probably vary by instance,
not by class. For example, if the layouts were stored on the file system
(something I considered for a bit before deciding it was currently
unnecessary) then part of initialization of an instance would be to load
it off disk. Besides, in that world, if a multi threaded program were
operating on different two images with different layouts, having it as a
class variable would break things.
I find this answer confusing, since you don't allow the FileManager's
constructor to choose the layout. It's essentially hard-coded on these
two lines. It would make more sense to define the layout as properties
of the image. Then the image could pass the layouts it supports to the
FileManager.
Right now, I don't allow the FileManager's constructor to choose the
layout. That doesn't mean it won't in the future. At one point, it was
choosing the layout based on what was stored on disk, but I decided that
over complicated things for no good reason at this point. Why would an
image know what layouts it supports? Part of the point here is to
encapsulate the file layout and not make the image deal with it, so that
we can enable other functionality later if we deem it appropriate (like
some form of cache clean up as you mentioned before).
- line 91: How does this function cope with files that may be
readable in one tree, but not the other?
I'm not sure what you mean by "one tree but not the other." Assuming
layout can replace tree, then right now it doesn't cope. What happens
currently if someone goes in and makes selective files in the download
directory not readable? If it's useful, I can stat the file on line 110
and ensure that it's readable.
The two scenarios this code supports right now is that either the source
and destination are both read/write, or both read only. I'm asking if
it's reasonable to expect a deployment scenario where the source is
readonly but the destination is read/write. (A read/write source with
readonly destination is a degenerate case of both sides being
read-only). In the case where the source is read-only, you can't move
the file to the new layout, but you could copy it, but I don't see
support for this case.
No, I don't think that's a reasonable scenario at this time for the file
manager to support. For now, I want to keep its job narrow and
constrained, not make it a general repo seeding method, or anything else.
file_layout/layout.py:
- Why do you allow Original to name itself "s1" by defining SHA1_160,
but Fan256Once allows the class that instantiates it to give its
name. Wouldn't it be better to encode the name as a class property
in both cases instead?
Original can only be SHA_160 since that's the old format and we're not
going back to change it. Fan256Once can use SHA_256, when we add it, as
well, hence it's an argument to the constructor.
I don't understand your response. All instances of Original must use
SHA1_160, correct?
Correct
That would make it a candidate for a class propery,
at least in this case.
Yes, but the other layouts aren't amenable to a class property, so why
special case this one?
In the case of Fan256Once, it's not clear to me
that setting hash type to "s1" is descriptive at all.
My understanding was I was free to choose whatever label I desired since
it's entirely internal to the file manager. It's simply there to allow
us to know what kind of hash algorithm was used so that if we use hash
algo 1 (which produces N bits) and hash algo 2 (which produces N bits)
at different times, we'll be able to avoid collisions. If you'd like to
suggest a more meaningful name, by all means do.
This object
doesn't check that it can use the hash type, nor is it apparent what
this heiararchy does with the value.
It doesn't check the hash input, you're correct. This is not the
"support multiple hashes in our actions" wad. This is the "fix our
layout" wad. Now, because I know some things are coming down the pipe,
I've tried to lay some groundwork to ease that transition, where it made
sense and most importantly, didn't greatly increase the complexity of
this wad.
I'd suggest that you use a more
object-oriented approach and pass an object that knows how to perform
the hash function to each layout manager.
I don't understand.
Moreover, it seems like changing the layout function to work more
generically would be helpful. Instead of having it be Fan256Once, how
about FanoutUniform() with the depth and the number of directories given
as arguments to the constructor.
Because I want to keep this as small and simple as possible. If you'd
like a more generic layout class, you're free to write a subclass.
You can state that each level will
have N directories, and if people want to write Fanouts that fan
differently at different depths, they can use this as a starting point.
This also means that it becomes trivial to implement FanNone, where all
files are in one directory.
It's trivial now to implement FanNone.
- lines 59/60 and 65/66: These should match the function signatures
in the parent class that they are overriding.
They do except in one class they're declared static but not in the
other. I can make them all non-static if you prefer. This works though.
As long as this doesn't create problems for future objects that inherit
from Original, I'm okay with this.
I hadn't considered that scenario. I'll just change it to all be non-static.
General comments:
- You did this work as part of a performance investigation surrounding
file layouts and directory access times. It would help the rest of
us provide useful feedback if you would include information about
the experiments that you ran, and the data that you obtained. In
particular, how did you decide that one level of 256 directories was
optimal?
Here's the openoffice spread sheet if you'd like to see the numbers
(http://cr.opensolaris.org/~bpytlik/ips-dir-timing.ods).
I'm getting a 403 (Forbidden) when trying to access this URL.
Well that's strange, so am I. Any idea how to fix that, or how it happened?
In any case, I'll send it to you off list. If anyone else wants a copy,
let me know.
- You've created the file layout and file manager as part of an
extensible architecture. Would you provide more information about
your design and architecture? As an example, how would I go about
adding a new layout to this design?
You'd add a new class to the layout file, and add it an instance of it
as the first item in self.layouts in FileManager.
This isn't exactly what I'm after. I'm more curious about your
architecture. How are you envisioning that this will be extended? Can
you document your design goals? Could someone who looks at this
document figure out how to enhance the archtiecture later? You're
checking this in as a new subsystem, but it's currently handling a
narrow set of tasks.
Yes, it's handling a narrow set of tasks, why is that an issue?
I have vague ideas of how this can be extended, but I haven't designed
that b/c it's not in the scope of the bug I'm trying to fix: the layout
of files on disk.
I would assume that anyone who wanted to enhance it later would have as
much clue as I do how to go about it. I'm not sure what you're after.
This isn't the be-all end-all to manage the cache. At best, it's laying
the ground work to allow future work to do cache management. What
that'll look like? I don't know. I have real questions whether any sort
of time based cache makes sense.
- The FileManager implements a cache of sorts, but I don't see any
hooks for some of the typical operations that would be performed on
a cache. How do you determine what items to evict when the cache is
full? How do you plan to support multiple policies, both for sizing
and eviction? Are there any options for those who need to perform
external cache maintenance?
Not this bug.
You might not be implementing support for it in this bug, but if you're
designing a subsystem to replace the existing, admittedly naive,
download cache, you should think about how you plan to address these
issues.
That's not what I've signed on for. If that's necessary, I'll vastly
reduce my change set and simply rewhack the bits in misc to try and be
slightly better, or hand it off to someone who wants to attack the
larger issue of cache management.
I would argue that you should include this functionality, especially
since it's something that we've been lacking for a long time and do very
much need. There was a post about it on our list today, in fact.
http://mail.opensolaris.org/pipermail/pkg-discuss/2009-September/017353.html
Yep, I saw that, still not this bug.
If you haven't thought about how you expect someone to accomplish these
tasks, you should make sure that the current design doesn't preclude
intelligent cache management, because that's the very next thing that
we'll have to do with this code.
In short, I believe the right place to do this is actually in the same
place we do indexing. By default, what it should do is remove the files
from the download area which have been removed or changed in the new
packages. Only the imageplan has this knowledge, and I don't believe
there's a sane way to put that approach into the file_manager b/c it
doesn't have access to the plan. Changing the layout will make emptying
the cache each time significantly faster, allowing users to more easily
choose that setting. Of course, we'd also need a "keep everything
setting" and there are probably other settings based on time/size/the
phase of the moon that we may eventually want to support. All of these
are not this bug. I see nothing in the current design that would prevent
an implementation of any of these, if you do, please give concrete
examples. In fact, I see this work making those steps easier b/c now
there's a central location through which well behaving code will add and
remove files, making it easier for there to be something that has
knowledge of the state of the layout as a whole.
As I've said, I'm willing to rip this out and make the truly minimal set
of changes needed to change the layout used. I don't think that's a good
idea since I think this lays useful ground work, but if it doesn't, I
can remove it all.
Brock
-j
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss