William Schumann wrote:
> Ethan,
> I coded your suggestions, if not otherwise mentioned below.
>
> Made 2nd round of changes and issued webrev at:
> http://cr.opensolaris.org/~wmsch/bug-6590
> Previous webrev at:
> http://cr.opensolaris.org/~wmsch/bug-6590/oldwebrev2/
>
> Ethan Quach wrote:
>> William,
>>
>> General comments/questions
>> --------------------------
>> 1. On every AI boot now, where iSCSI is *not* specified, it seems
>> td will go asking DHCP to check if this client has an iscsi
>> Rootpath specified for it there, is that correct? If so, I'll
>> probably have follow-on comments/questions...
> Yes. The query will be replied to locally by dhcpagent(1m). At this
> time, since only one interface is configured at install time, it will
> take the value from that interface. When multiple interfaces is
> supported, the code may have to query individual interfaces.
a) Did you consider making this a declarative specification in
the manifest? i.e. make the user specify in the manifest
whether or not they want to probe DHCP for the client's iscsi
root target info. If so, what were the tradeoffs?
b) If always checked, there's going to be a lot more noise
in the log file from auto_td.c when it does this DHCP
querying. Can you make sure these messages are as
non-alarming as possible, since they will be seen on
successful install cases. (I can work with you here if
you want explicit suggestions.)
>>
>> 2. Is it possible for a user to specify iscsi parameters but
>> then the ai_target_device they've actually specified to install
>> onto isn't that device? If so, what would result?
> Yes. As coded, the iSCSI-mounted device would become the specified
> install device.
How/when does this happen? I'm not seeing this in the code.
> This assumes that if the user specifies iSCSI boot
> parameters, there is a specific target disk.
Is that specific target disk expected to be equal to whatever
disk the iSCSI boot disk parameters describe?
> The code takes the same
> path used when the device name is explicit. Any other ai_target_device
> device selection parameters are silently ignored.
Let me ask the question with an example. If the user specified an
explicit <target_device_name> value that is some local disk (so it
obviously *isn't* equal to the iSCSI boot target name) like this:
<ai_target_device>
<target_device_name>
c0t0d0
</target_device_name>
<target_device_iscsi_target_name>
c9t9d9
</target_device_iscsi_target_name>
<target_device_iscsi_target_ip>
129.146.111.10
</target_device_iscsi_target_ip>
... /*and whatever else is required for a valid isci spec*/
</ai_target_device>
What is the result? It would seem to me that the code will
go ahead and process the iscsi disk parameters, initiate and
validate the iscsi disk, and touch the /.isci_boot marker file.
But then the disk that will actually get selected to be installed
onto is the local disk, c0t0d0. Is this not the case? If it is,
we need to prevent against that.
>>
>>
>> ai_manifest.rng
>> ---------------
>> Is a flat set of all optional tags here what we really want?
>> Namely, in the auto_td code it seems 'iscsi_name' and 'iscsi_ip'
>> are expected to either both be there, or not at all. If that is
>> the case why didn't you define that in the schema instead?
> It is possible to precisely describe the legal combination of tags in
> the RNG schema for iSCSI.
>
> Recall that the manifest parser/validator is run early - before AI looks
> at the parsed output - and AI terminates if errors are found before AI
> can look at the result.
> If the user is in error, for whatever reason, the log has 2 lines:
> error validating the manifest
> Auto install failed - Invalid manifest <manifest name> specified.
> and says nothing explaining the error further.
>
> To test, I left out a mandatory iSCSI parameter in the manifest, ran AI,
> and the console showed the following messages:
> Relax-NG validity error: Extra element ai_target_device in interleave
> [this is overly technical; does not name the "extra element"
> correctly, but the parent element]
> /tmp/ai_manifest_temp_1116.xml:15: element ai_target_device: Relax
> -NG validity error : Element ai_manifest fails to validate content
> [temp file is deleted so user can't examine line 15, provided the
> user can decipher the message; the error message makes no sense literally]
> ... and so on.
>
> Now, even if the validation messages were improved, they would still be
> generic and still wouldn't help as much as logging statements provided
> from AI. However, if the schema were improved, these errors would be
> found on the AI server side during 'installadm add' without any
> additional code to validate. Because of the latter, perhaps the better
> option is to do the improved validation you suggest in the schema, and
> work on improving the validator output.
> Upgrading ai_manifest.rng per your suggestion.
So in addition to that, are there other aspects of these manifest
changes that can be dealt with in this way? For example, should
'chap_name' and 'chap_secret' be handled like this as well?
>>
>>
>> auto_install.h
>> --------------
>> 206-215 - since these are iscsi specific, why not define a new,
>> separate structure for this set of info and include an instance
>> in auto_disk_info?
> Made a separate typedef for the iSCSI target parameters.
>> auto_install.c
>> --------------
>> 63 - why is this declaration here instead of in td_lib.h
>> or something like that ?
> It is AI code in auto_td.c.
So then it should be in auto_install.h then it seems?
>>
>>
>> 664 - If the answer to general question #2 above is yes, it
>> seems 'p' should be compared against 'iscsidevnam' before
>> creating the iSCSI marker file?
> iscsidevnam is written as the install device, so no comparison is
> necessary.
Are you saying that if the user specifies iscsi disk information,
that the code that validates (and instantiates) that iscsi disk
also writes adi->diskname? Where is that happening? I'm not
finding that in the code. This logic might be happening somewhere
in code you're not touching, so can you point me to it if that is
the case?
>>
>>
>> auto_td.c
>> ---------
>> 454-455 - What's a "serious error" as opposed to a normal error? :-)
> Not every inconsistency or failure is sufficient to justify terminating
> the install. If dhcpinfo fails, we want to know about it, but not give
> up completely, eh?
>> Can you clarify in the comment when devname could be set? It looks
>> like with a return of -1, devname is untouched. With a return of 0,
>> devname will have been populated if found?
> Yes, you read it correctly. Will try again to improve the explanation.
Perhaps I'm being a little picky here, but the comment still doesn't
clarify which return codes correlate with when devname could be
modified, and when its left untouched.
>>
>> 645 - any reason why this is hardcoded instead of using MAXNAMELEN ?
> As indicated in the comment, the size of the buffer is used as a maximum
> length in sscanf. Since the scanf format string cannot have a variable
> maximum length, given options of token-pasting or dynamically building
> the sscanf format string, I thought this was the simplest choice.
I really would prefer against the hardcoding. Would a strncmp() and
then a strlcpy() be too much trouble?
Also, where did you come up with the value of 512? I looked, but
couldn't find a define in /usr/include that tells us the max size
of a disk name. Otherwise, wouldn't it be MAXNAMELEN, which is 256?
>>
>> 653 - What's the significance of the length 4? Why is something
>> less than that an invalid device name?
> The shortest device name I can think of for disks is cNdN, length 4.
> Changed length 4 to strlen("cNdN");
I would suggest that if there's a supported interface to validate
a disk name, then use it. If not, I don't see the point of this
sanity check here. I think it'd potentially get us into more
trouble than be helpful. I suggest removing it.
auto_td.c
---------
473 - initialize to NULL.
474,475,476,478 - These are always set to something before
being used, so I suggest initializing them to NULL instead.
479 - this variable doesn't seem to be used anywhere.
497,500 - Initialize to NULL.
664 - Initialize to NULL.
724 - Aren't IPv6 addresses separated by colons, and aren't
there up to eight sections, not six?
>>
>>
>> td_iscsi.c
>> ----------
>> 452 - These three pointers should all be initialized to NULL.
>> Not quite sure what you're trying to achieve by initializing
>> *lun_num = "" ??
> Subsequent code handles NULL-terminated strings.
*target_name and *ip_address aren't NULL-terminated strings. They're
not initialized so they're garbage aren't they? I know you fill
them with the calls to nvlist_lookup_string() at 458 and 463, but as
a matter of principle, can you please initialize them to NULL?
As for *lun_num, I *think* I see why you initialize it to the empty
string now.
At 482, you're not checking for the return code. If your intent is
to make it *okay* for the value to not exist in the nvlist, then
can I suggest using nvlist_lookup_pairs() instead.
if (nvlist_lookup_pairs(attrs, NV_FLAG_NOENTOK,
TD_ISCSI_ATTR_LUN, &lun_num, NULL) != 0) {
td_debug_print(LS_DBGLVL_ERR, "failed to lookup XXX\n");
return (TD_E_XXX);
}
That way you can still catch if some internal failure happened
with the nvlist processing. The only caveat is that you had to
have to initialized attrs with NV_UNIQUE_NAME, which you do.
thanks,
-ethan
> William
>> thanks,
>> -ethan
>>
>>
>> William Schumann wrote:
>>> Updated webrev including all requested changes with Bing's changes
>>> included.
>>>
>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=6590
>>> http://cr.opensolaris.org/~wmsch/bug-6590/
>>> previous webrev: http://cr.opensolaris.org/~wmsch/bug-6590/oldwebrev
>>>
>>> Thank you,
>>> William
>>> _______________________________________________
>>> caiman-discuss mailing list
>>> caiman-discuss at opensolaris.org
>>> http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
>>