On 01/11/2012 12:32 AM, Dave Miner wrote:
On 01/10/12 05:17, Jan Damborsky wrote:
On 01/ 9/12 08:54 PM, Dave Miner wrote:
On 01/09/12 06:58, Jan Damborsky wrote:
On 01/ 6/12 06:26 PM, Dave Miner wrote:
On 01/06/12 04:40, Jan Damborsky wrote:
Hi Paul, Mike,
I am currently evaluating how to approach a fix for the following problem
you reported and commented on:
7058014 if svc-system-config creates rpool/export, it should mount it at /export
As it's been a while since that discussion happened, let me start by
summarizing the problem and then take a look at possible solutions.
Overview
========
System configuration (the config-user SMF service in particular) makes it
possible to create an initial user account. As part of that, the config-user
service creates a separate ZFS dataset for the user's home directory.
In the default case, the ZFS dataset '<root_pool>/export/home/<login>' is
created with its mountpoint inherited from the '<root_pool>/export/home'
parent ZFS dataset.
Since all installers create the <root_pool>/export and <root_pool>/export/home
ZFS datasets during the installation process (utilizing the Target
Instantiation module) with mountpoints set to /export and /export/home
respectively, we end up with the desired '/export/home/<login>' mountpoint
for the home ZFS dataset.
Problem statement
=================
That said, the Automated Installer (used for installation of non-global zones)
is a little bit special in the sense that it provides complete control over
the hierarchy of ZFS datasets created.
That means it's possible to end up with a system without the
'<root_pool>/export' and '<root_pool>/export/home' datasets created during
installation. Such a configuration is accomplished by omitting the appropriate
entries in the target section of a customized AI manifest.
In such a case, the '<root_pool>/export' and '<root_pool>/export/home' datasets
are later created by the config-user service along with the home ZFS dataset,
as a side effect of calling 'zfs create' with the '-p' option, which forces
creation of all non-existent parent ZFS datasets. The problem is that those
datasets are mounted on mountpoints inherited from the parent dataset (the
<root_pool> ZFS dataset in this case), so we end up with the following
structure:
dataset:mountpoint
------------------
<root_pool>/export:/<root_pool>/export
<root_pool>/export/home:/<root_pool>/export/home
<root_pool>/export/home/<login>:/<root_pool>/export/home/<login>
This is neither what the user expects nor what the user wants.
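To illustrate the inheritance described above, here is a minimal Python sketch of how the mountpoints come out in both cases. It is a simplified model only: the function name mountpoint_of and the login 'jack' are hypothetical, and real ZFS has additional cases (such as 'legacy' or 'none' mountpoints) that are ignored here.

```python
def mountpoint_of(dataset, explicit):
    """Return the mountpoint a dataset would get under simple inheritance.

    dataset:  dataset name such as "rpool/export/home/jack".
    explicit: dict mapping dataset names to locally set mountpoints.
    A dataset without a local setting inherits its parent's mountpoint
    with its own last name component appended.
    """
    if dataset in explicit:
        return explicit[dataset]
    if "/" not in dataset:
        # Assumption: the pool's top-level dataset mounts at /<pool>.
        return "/" + dataset
    parent, leaf = dataset.rsplit("/", 1)
    return mountpoint_of(parent, explicit).rstrip("/") + "/" + leaf


# Installer-created parents carry explicit mountpoints.
installer_layout = {"rpool/export": "/export", "rpool/export/home": "/export/home"}
print(mountpoint_of("rpool/export/home/jack", installer_layout))  # /export/home/jack

# Parents created implicitly by 'zfs create -p' have no explicit mountpoint.
print(mountpoint_of("rpool/export/home/jack", {}))  # /rpool/export/home/jack
```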
Solution A
==========
If my understanding is correct, you propose to address that in the config-user
SMF service by explicitly setting the desired mountpoints for all parents
created.
To be honest, I am not quite convinced that this solution fits the existing
model, as sysconfig should not explicitly manipulate datasets which are
outside its scope (parent datasets). It is the job of the Target Instantiation
module to handle that task, and spreading that logic across several places
would be confusing and does not sound like a good principle in general.
Another issue I can see with this is that those datasets are explicitly
configured in the default AI manifests. If the user intentionally omits those
entries in a customized AI manifest, I believe we should honor that and not
implicitly create those datasets against the user's intent.
Based on that, I propose the following alternative.
Solution B
==========
If config-user is asked to create the home ZFS dataset and its parents are
missing, treat that as a fatal error. In such a case, let the config-user SMF
service inform the user about that on the console and let the service enter
maintenance mode.
The reasoning behind this is that such a situation would be the result of a
misconfiguration on the user's side; in particular, there seems to be a
requirement to create a ZFS dataset in a ZFS hierarchy that does not comply
with the one explicitly expressed via the AI manifest.
I believe we shouldn't try to remedy such a state, as we can't ensure the
result would comply with the user's intent. Instead, we should let the user
know that an invalid configuration was supplied.
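As a rough sketch of the check Solution B implies, the following Python function lists which parents of the requested home dataset do not exist. The function name and the way the set of existing datasets is passed in are assumptions for illustration; in a real start method that set would have to come from the system (for example from the output of 'zfs list -H -o name'), and the actual config-user implementation may look quite different.

```python
def missing_parents(home_dataset, existing):
    """List parent datasets of home_dataset that are absent, top-down.

    home_dataset: full name of the dataset to create,
                  e.g. "rpool/export/home/jack" (hypothetical login).
    existing:     set of dataset names that already exist.
    """
    parts = home_dataset.split("/")
    # Every proper prefix of the name is a parent: rpool, rpool/export, ...
    parents = ["/".join(parts[:i]) for i in range(1, len(parts))]
    return [p for p in parents if p not in existing]


# Only the root pool dataset exists, as in the problem scenario.
missing = missing_parents("rpool/export/home/jack", {"rpool"})
if missing:
    # Solution B: report the problem and fail instead of creating parents.
    print("fatal: missing parent datasets:", ", ".join(missing))
```

With an empty result the service would go on to create the home dataset as it does today; with a non-empty result it would fail without creating anything.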
Please let me know if that may be a reasonable alternative, or if I am missing
other aspects of this problem which should be taken into account when looking
for a solution.
I'm not sure this proposal addresses how the user would recover from
and correct the invalid input. Can you walk through that?
Let me elaborate more on that, as I agree I missed that part.
In accordance with the current design, if config-user ends up in maintenance
mode as a result of a fatal failure, the user is provided with an sulogin
prompt. In such a situation, the user is advised (on the console) to log in
and inspect the SMF log file.
In our particular case, I think we could populate the SMF log file with more
information about the error as well as instructions on how to proceed. I am
wondering if saying something like the following may do the job:
"Service failed to create home directory ZFS dataset for initial user, likely
because one or more parent ZFS datasets are missing (were not created during
installation).
Reinstall the system with ai_manifest(4) specifying appropriate entries for
all non-existent parent datasets (see
/usr/share/auto-install/manifest/default.xml for an example).
Alternatively, to recover from the failure, create the parent dataset(s)
manually on the command line and reboot the system."
I would expect the user should just be able to clear the config-user
service without having to go through a reboot cycle.
Thank you for pointing this out, Dave.
I agree that in such a case, it should be possible just to 'clear' the
service in order to recover.
I have verified that this is in fact already the case, i.e. 'svcadm clear
config-user' works and a reboot is not needed. This is reflected in the
updated error message below.
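To make the manual recovery concrete, here is a small Python sketch that builds the commands an administrator might run. It is an assumption-laden illustration: the function name recovery_commands is hypothetical, and the choice to set mountpoint=/export only on <root_pool>/export (letting export/home inherit /export/home) simply mirrors the installer layout described earlier in this thread. Nothing is executed; the commands are only assembled as argument lists.

```python
def recovery_commands(root_pool, missing):
    """Build the commands for manual recovery, without running them.

    root_pool: name of the root pool, e.g. "rpool".
    missing:   missing parent datasets, ordered top-down.
    """
    commands = []
    for name in missing:
        argv = ["zfs", "create"]
        if name == root_pool + "/export":
            # Match the installer layout: export is mounted at /export,
            # and its children inherit from there.
            argv += ["-o", "mountpoint=/export"]
        argv.append(name)
        commands.append(argv)
    # Once the parents exist, clearing the service is enough; no reboot.
    commands.append(["svcadm", "clear", "config-user"])
    return commands


for argv in recovery_commands("rpool", ["rpool/export", "rpool/export/home"]):
    print(" ".join(argv))
```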
A third possible fix is, of course, to modify the home directory
setting in the profile to use a dataset that's subordinate to a
dataset that has been created, then clear config-user (probably
requires a re-run of manifest-import, too).
Yep, though it would be more complex compared to the previous solution
(it would require the manifest-import step), and I am not sure if the result
would comply with the user's original intent.
From what I understand (Mike may correct me), the original problem happened
during the transition phase when AI switched from implicit creation of shared
ZFS datasets to explicit creation (and the appropriate entries were added to
the AI manifest). The problem was caused by the fact that an old AI manifest
was used as a template to install a system with the new AI.
In that particular scenario, the desired result was to end up with a system
with those shared ZFS datasets created, and I have been assuming that this is
what we should aim for in the proposed recovery solutions.
We shouldn't be running into this with that original case anymore. The thing
that concerns me to a fair extent is that a system that's installed with AI
using a profile that has been deliberately modified to exclude the export and
export home entries but then configured with sysconfig will now end up with a
failure in configuration if there's a user account created. What can we do to
prevent a failure from happening there? And how can we make it more obvious
in the configuration profiles and AI manifests that there is this linkage for
those that aren't using sysconfig? The mkdir solution mitigates that failure
case, but I agree it has its own problems.
Yet another possibility seems to be that we could, in this situation,
not create a separate dataset at all, and merely mkdir -p the home
directory. Yes, this would be a significantly different behavior.
The attraction is that it seems to get the system up and running,
though perhaps not ideally.
I can see this would be the least intrusive solution (no need for the user's
intervention), but to be honest, I am not quite convinced that it would be a
desirable approach.
The fact that we end up in that error situation may be a sign that there is
likely something wrong with the user's AI configuration, and I believe we
should make the user aware of this, so that the cause of the problem can be
repaired before the "malformed" configuration is used to deploy more systems.
Also, I think that the result of this solution would be a system in neither
an optimal nor a 'supported' state, which I think is not acceptable for the
enterprise scenarios where AI is used. Such systems may live for a while, so
I think that from a long-term point of view, it would be better to deploy
them as expected from the beginning.
I think it would be most unfortunate if the suggested solution is to
re-install, since that's 10-15 minutes at best, more likely quite a
bit longer.
I think it depends on the number of systems affected. If several systems
ended up in such a state as a result of using an inappropriate AI
configuration, then it might be faster to repair that configuration (done
once) and restart the installations rather than manually repair all of them.
That said, I can see it would make sense to propose the 'local' solution
first, so I am wondering if you think rewording the error message in the
following way may better fit the intent:
"Service failed to create home directory ZFS dataset for initial user,
likely because one or more parent ZFS datasets are missing (were not created
during installation).
To recover from the failure, create the parent dataset(s) manually on the
command line and clear the service using 'svcadm clear config-user'.
Alternatively, reinstall the affected system(s) using ai_manifest(4)
specifying appropriate entries for all non-existent parent datasets (refer to
/usr/share/auto-install/manifest/default.xml as an example)."
We need this error message to be definitive (not "most likely") and it needs to
prescribe the datasets that we think need to be created.
That would no doubt also be more professional :-)
Looking at the existing config-user start method, determining that
information should not be a big deal. The modified error message would then
look like:
"Service failed to create home directory ZFS dataset for initial user,
because the following parent ZFS datasets are missing (were not created
during installation):
rpool/export
rpool/export/home
To recover from the failure, create the parent dataset(s) manually on the
command line and clear the service using 'svcadm clear config-user'.
Alternatively, reinstall the affected system(s) using ai_manifest(4)
specifying appropriate entries for those ZFS datasets (refer to
/usr/share/auto-install/manifest/default.xml as an example)."
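For illustration, a message of this shape could be assembled from the list of missing datasets roughly as follows. This is only a sketch in Python: the function name is hypothetical, the wording follows the proposal in this thread rather than any shipped message, and the real start method would produce the text in its own way.

```python
def format_missing_parents_error(missing):
    """Return the proposed error text, naming each missing parent dataset."""
    lines = [
        "Service failed to create home directory ZFS dataset for initial user,",
        "because the following parent ZFS datasets are missing (were not created",
        "during installation):",
    ]
    # One dataset per line, so the user sees exactly what needs to be created.
    lines += ["  " + name for name in missing]
    lines += [
        "To recover from the failure, create the parent dataset(s) manually on the",
        "command line and clear the service using 'svcadm clear config-user'.",
        "Alternatively, reinstall the affected system(s) using ai_manifest(4)",
        "specifying appropriate entries for those ZFS datasets (refer to",
        "/usr/share/auto-install/manifest/default.xml as an example).",
    ]
    return "\n".join(lines)


print(format_missing_parents_error(["rpool/export", "rpool/export/home"]))
```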
Somewhat tangential, but while I'm thinking about it, is there a reason config-user can't just let useradd create the datasets now that it has that functionality?
In fact, I took a quick look at that feature with respect to a potential fix
for 7030232 and found out that what useradd currently provides does not quite
fit config-user's needs. In particular, useradd does not make it possible
to customize the home ZFS dataset (only its mountpoint), something which
config-user supports via the SC manifest. So for the non-default case,
config-user would still need to go with 'zfs create'.
Also, it would not help with the problem being discussed, as it relies
on the parent ZFS datasets already being created.
I am wondering if those limitations could be candidates for a useradd RFE.
Jan