Hi Brock,
On 05/24/12 03:53 PM, Brock Pytlik wrote:
On 05/13/12 22:35, Tim Foster wrote:
https://cr.opensolaris.org/action/browse/pkg/timf/sysrepo-config-cache/sysrepo-config-cache-webrev
Thanks for taking a look.
sysrepo.py
line 317, 365 Nit: use open instead of file (at least I think we settled
on open)
Good point, thanks - I'll switch to open. We seem to use both in the
gate, but open is predominant.
line 326: shouldn't this also generate an error? or do we hit this if no
publishers are configured?
You're right, it should report the error, I'll add that.
434,435,437: Why the change from tuples to lists?
The version of simplejson we have always converts tuple to lists:
>>> import simplejson
>>> import StringIO
>>> s = StringIO.StringIO()
>>> o = ["list with tuples", ("foo", "bar")]
>>> simplejson.dump(o, s)
>>> s.seek(0)
>>> s.read()
'["list with tuples", ["foo", "bar"]]'
>>> s.seek(0)
>>> simplejson.load(s)
['list with tuples', ['foo', 'bar']]
>>>
The simplejson.dump() option 'tuple_as_array' comes in version 2.2.0
which I think would solve this, but we use 2.1. In this case, switching
to lists is fine I think.
Could you discuss briefly what happens in the situation where a user
adds a (currently) unreachable publisher with --no-refresh? (And perhaps
add a test case for it)
There's basically no change to the current behaviour: the cache file
gets deleted, and system repository service will get restarted - if
pkg.sysrepo manages to timeout before the service hits the SMF start (or
refresh) method timeout, then we write a new cache, and continue serving
requests.
That cached configuration may be incomplete (in terms of server-side
redirects being missing because of that unavailable publisher) in which
case, when the publisher becomes available, we'll get transport errors
when trying to install packages if they need to go through those redirects.
On the testing side of things, I don't think I see tests for a) an image
with no publishers and no packages installed or b) an image with no
publishers but packages installed. I think it's probably worth just
checking those two corner cases.
Sure, I'll add tests there.
Is it the _follow_redirects bit of sysrepo.py that currently requires
that we have networking available and publishers are reachable? Anything
else?
Right, that's the only part of sysrepo.py that wants networking.
Of course the system repository itself needs networking any time it
proxies a request that points to a non-local repository (when all the
zones have come up too and are making those requests), but by that time,
I would expect networking to be available.
Just to make sure I understand how this works...
When a user adds a publisher or an origin to an existing publisher, that
refreshes/restarts the sysrepo service.
Right (that's what happens in the code currently in the gate too)
When it starts the next time, it
will look at the cached information. realize it's out of date since the
publisher or origin isn't in the cached info, and revert to existing
behavior?
We're being more forceful than that: when publisher information changes,
we remove the cache file in pkg.client.image.Image.save_config() to
ensure the cache gets regenerated (just before we cause a
system-repository service refresh).
We could perhaps have relied on the cache validation code in
pkg.sysrepo(1) checking the cache validity, but a total cache rebuild is
going to be more reliable (in terms of being able to revalidate
server-side redirects are still current)
There are perhaps more efficient ways we could have done this if cache
regeneration was expensive, but it isn't so long as the network is up.
I think though, when a user removes a publisher or origin, after the
restart/refresh, the sysrepo will continue using the previously cached
data? Have I missed something?
Yep, by virtue of completely nuking the cache on publisher configuration
change, we should be ok here.
I have a couple of other concerns, though I'm not sure how major they are...
Is it possible to have the sysrepo use the cache as a fallback if
publishers aren't reachable? That seems like it might be the safer route.
That's the problem in the current gate bits really: given enough origins
in our image, and a fixed SMF start/timeout_seconds threshold for the
system-repository service, any time we're going out to the network to
see if publishers are reachable in sysrepo.py, we risk running into that
timeout.
I'm wary of making changes that dynamically tune the
start/timeout_seconds value, as it could result in a system-repository
service that either never comes online, or takes long enough that the
other services that depend on it drop to maintenance.
Using the cache seems to be the least worst solution, but I'm open to
better suggestions :-)
If I'm reading the code right, if a server changes it server-side
redirections, the sysrepo would never pick up on that change and
wouldn't provide the right set of redirects in its configuration,
leading to transport errors. While I think that could happen now, at
least if sysrepo was restarted/refreshed, it would pick up on the new
redirects, with this, it seems like it would take a publisher change by
the user to cause it to pick up the new redirects.
That's correct. I felt that the chances of server-side redirects
changing were low, and decided that the tradeoff of making pkg.sysrepo
go faster/depend less on the network during boot was better.
If server-side redirects do change, then yes, we'll see transport errors
when installing packages through the system repository, so it will be
obvious that something is wrong. The fix would be for the GZ
administrator to remove the cache, and restart the service, though you
have me wondering whether we try to do this automatically... (though the
more I think about it, the less sure I am: it'd mean somehow being able
to pass a message to the system repository that it should rebuild it's
cache and restart: a recipe for zone->global-zone DOS attacks I think)
Perhaps we could start up immediately with the cached info then
periodically after revalidate the redirects?
That's an idea, though it would mean the system repository refreshing
itself, potentially interrupting package installs. When and how often
do we check? We could get a bit neurotic here.
I'll work on a few extra tests, but other comments welcome.
cheers,
tim
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss