Ethan,
Replies below.
 >>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=6590
 >>>> http://cr.opensolaris.org/~wmsch/bug-6590/
 >>>> previous webrev: http://cr.opensolaris.org/~wmsch/bug-6590/oldwebrev3

Ethan Quach wrote:
> William Schumann wrote:
>> Ethan,
>> I coded your suggestions, if not otherwise mentioned below.
>>
>> Made 2nd round of changes and issued webrev at:
>>    http://cr.opensolaris.org/~wmsch/bug-6590
>> Previous webrev at:
>>    http://cr.opensolaris.org/~wmsch/bug-6590/oldwebrev2/
>>
>> Ethan Quach wrote:
>>> William,
>>>
>>> General comments/questions
>>> --------------------------
>>> 1. On every AI boot now, where iSCSI is *not* specified, it seems
>>> td will go asking DHCP to check if this client has an iscsi
>>> Rootpath specified for it there, is that correct?  If so, I'll
>>> probably have follow-on comments/questions...
>> Yes. The query will be replied to locally by dhcpagent(1m).  At this 
>> time, since only one interface is configured at install time, it will 
>> take the value from that interface.  When multiple interfaces is 
>> supported, the code may have to query individual interfaces.
>
>    a) Did you consider making this a declarative specification in
>       the manifest?  i.e. make the user specify in the manifest
>       whether or not they want to probe DHCP for the client's iscsi
>       root target info.  If so, what were the tradeoffs?
After discussing this with you and Sanjay, we decided to add a tag directing AI 
to the source of the iSCSI parameters.
New tag: target_device_iscsi_parameter_source
Default: "manifest"
Optional: "DHCP"
This was chosen over the existing logic, which automatically took the DHCP 
Rootpath for iSCSI parameters if they were present (and
not in the AI manifest), in order to avoid a user accidentally using Rootpath.

So, if "manifest" is selected or the tag isn't present, DHCP Rootpath is 
ignored.  If "DHCP" is selected, the manifest is ignored 
for iSCSI parameters.

This opens the question of what AI does if "DHCP" is selected and the DHCP 
Rootpath doesn't contain iSCSI parameters.  Currently, it 
is coded to fail the install in this case.  This might not be flexible enough 
for more heterogeneous sites.  An alternative would be 
not to abort the AI install and to allow it to continue to search for a install 
disk target.  Another option might be to allow the 
user to choose the behavior in this regard.  Perhaps it could come in the form 
of a prioritized list, as one you might see in a BIOS 
definition of boot device order, e.g.:
- DHCP, otherwise fail
- DHCP, try local disk
- DHCP, manifest, otherwise fail
- manifest, DHCP, otherwise fail
- manifest, DHCP, try local disk
- etc.
Suggest to discuss this with real users before getting into complicated logic.
>
>    b) If always checked, there's going to be a lot more noise
>       in the log file from auto_td.c when it does this DHCP
>       querying.  Can you make sure these messages are as
>       non-alarming as possible, since they will be seen on
>       successful install cases.  (I can work with you here if
>       you want explicit suggestions.)
Changed so that it does DHCP query only when instructed (above).
>
>>>
>>> 2. Is it possible for a user to specify iscsi parameters but
>>> then the ai_target_device they've actually specified to install
>>> onto isn't that device?  If so, what would result?
>> Yes. As coded, the iSCSI-mounted device would become the specified 
>> install device.
>
> How/when does this happen?  I'm not seeing this in the code.
auto_install.c:
647                 (void) strncpy(adi.diskname, iscsi_devnam,
648                     sizeof (adi.diskname));

>
>> This assumes that if the user specifies iSCSI boot parameters, there 
>> is a specific target disk.  
>
> Is that specific target disk expected to be equal to whatever
> disk the iSCSI boot disk parameters describe?
OpenSolaris decides what the device name is in the end, and the IMA
library returns that device name for the device matching the iSCSI
parameters.
>
>> The code takes the same path used when the device name is explicit.  
>> Any other ai_target_device device selection parameters are silently 
>> ignored.
>
> Let me ask the question with an example.  If the user specified an
> explicit <target_device_name> value that is some local disk (so it
> obviously *isn't* equal to the iSCSI boot target name) like this:
>
>    <ai_target_device>
>       <target_device_name>
>          c0t0d0
>       </target_device_name>
>
>       <target_device_iscsi_target_name>
>          c9t9d9
>       </target_device_iscsi_target_name>
>
>       <target_device_iscsi_target_ip>
>          129.146.111.10
>       </target_device_iscsi_target_ip>
>
>       ... /*and whatever else is required for a valid isci spec*/
>
>    </ai_target_device>
>
>
> What is the result?  It would seem to me that the code will
> go ahead and process the iscsi disk parameters, initiate and
> validate the iscsi disk, and touch the /.isci_boot marker file.
> But then the disk that will actually get selected to be installed
> onto is the local disk, c0t0d0.  Is this not the case?  If it is,
> we need to prevent against that.
It is not, because initially, the AI manifest values are written in for
the target disk.  Then, the iSCSI target device name is written as the
target disk.  The target disk, if supplied, is taken as the install
target, so the iSCSI disk will always be the install target, regardless
of the other disk selection parameters.
>
>>>
>>>
>>> ai_manifest.rng
>>> ---------------
>>> Is a flat set of all optional tags here what we really want?
>>> Namely, in the auto_td code it seems 'iscsi_name' and 'iscsi_ip'
>>> are expected to either both be there, or not at all.  If that is
>>> the case why didn't you define that in the schema instead?
>> It is possible to precisely describe the legal combination of tags in 
>> the RNG schema for iSCSI.
>>
>> Recall that the manifest parser/validator is run early - before AI 
>> looks at the parsed output - and AI terminates if errors are found 
>> before AI can look at the result.
>> If the user is in error, for whatever reason, the log has 2 lines:
>>    error validating the manifest
>>    Auto install failed - Invalid manifest <manifest name> specified.
>> and says nothing explaining the error further.
>>
>> To test, I left out a mandatory iSCSI parameter in the manifest, ran 
>> AI, and the console showed the following messages:
>>    Relax-NG validity error: Extra element ai_target_device in interleave
>>    [this is overly technical; does not name the "extra element" 
>> correctly, but the parent element]
>>    /tmp/ai_manifest_temp_1116.xml:15: element ai_target_device: Relax 
>> -NG validity error : Element ai_manifest fails to validate content
>>    [temp file is deleted so user can't examine line 15, provided the 
>> user can decipher the message; the error message makes no sense 
>> literally]
>> ... and so on.
>>
>> Now, even if the validation messages were improved, they would still 
>> be generic and still wouldn't help as much as logging statements 
>> provided from AI.  However, if the schema were improved, these errors 
>> would be found on the AI server side during 'installadm add' without 
>> any additional code to validate.  Because of the latter, perhaps the 
>> better option is to do the improved validation you suggest in the 
>> schema, and work on improving the validator output.
>> Upgrading ai_manifest.rng per your suggestion.
>
> So in addition to that, are there other aspects of these manifest
> changes that can be dealt with in this way?  For example, should
> 'chap_name' and 'chap_secret' be handled like this as well?
Removing CHAP parameters from schema until CHAP authentication is fully 
implemented.
>
>>>
>>>
>>> auto_install.h
>>> --------------
>>> 206-215 - since these are iscsi specific, why not define a new,
>>> separate structure for this set of info and include an instance
>>> in auto_disk_info?
>> Made a separate typedef for the iSCSI target parameters.
>>> auto_install.c
>>> --------------
>>> 63 - why is this declaration here instead of in td_lib.h
>>> or something like that ?
>> It is AI code in auto_td.c.
>
> So then it should be in auto_install.h then it seems?
Doing so would enforce type checking against a single typedef in both
places, so taking your suggestion.
>
>>>
>>>
>>> 664 - If the answer to general question #2 above is yes, it
>>> seems 'p' should be compared against 'iscsidevnam' before
>>> creating the iSCSI marker file?
>> iscsidevnam is written as the install device, so no comparison is 
>> necessary.
>
> Are you saying that if the user specifies iscsi disk information,
> that the code that validates (and instantiates) that iscsi disk
> also writes adi->diskname?  Where is that happening?  I'm not
> finding that in the code.  This logic might be happening somewhere
> in code you're not touching, so can you point me to it if that is
> the case?
In auto_install.c: auto_select_install_target() takes the diskname from
the auto_disk_info, which has by this time been loaded with iSCSI device
name.  Then in auto_td.c: auto_validate_target(), TD is run, then the
diskname is validated against those found by TD.  See else clause for
*diskname == NULL (here, line 190).
>
>>>
>>>
>>> auto_td.c
>>> ---------
>>> 454-455 - What's a "serious error" as opposed to a normal error? :-)
>> Not every inconsistency or failure is sufficient to justify 
>> terminating the install.  If dhcpinfo fails, we want to know about 
>> it, but not give up completely, eh?
>>> Can you clarify in the comment when devname could be set?  It looks
>>> like with a return of -1, devname is untouched.  With a return of 0,
>>> devname will have been populated if found?
>> Yes, you read it correctly.  Will try again to improve the explanation.
>
> Perhaps I'm being a little picky here, but the comment still doesn't
> clarify which return codes correlate with when devname could be
> modified, and when its left untouched.
Trying again.
>
>>>
>>> 645 - any reason why this is hardcoded instead of using MAXNAMELEN ?
>> As indicated in the comment, the size of the buffer is used as a 
>> maximum length in sscanf.  Since the scanf format string cannot have 
>> a variable maximum length, given options of token-pasting or 
>> dynamically building the sscanf format string, I thought this was the 
>> simplest choice.
>
> I really would prefer against the hardcoding.  Would a strncmp() and
> then a strlcpy() be too much trouble?
>
Using sscanf() is clear and concise and uses a single line of code.
However, there was another complaint, so taking the suggestion, finally.
> Also, where did you come up with the value of 512?  I looked, but
> couldn't find a define in /usr/include that tells us the max size
> of a disk name.  Otherwise, wouldn't it be MAXNAMELEN, which is 256?
>
Using suggested.
>>>
>>> 653 - What's the significance of the length 4?  Why is something
>>>       less than that an invalid device name?
>> The shortest device name I can think of for disks is cNdN, length 4.  
>> Changed length 4 to strlen("cNdN");
>
> I would suggest that if there's a supported interface to validate
> a disk name, then use it.  If not, I don't see the point of this
> sanity check here.  I think it'd potentially get us into more
> trouble than be helpful.  I suggest removing it.
Removed.
>
>
> auto_td.c
> ---------
> 473 - initialize to NULL.
Pointers should not be initialized to NULL unless there is a specific meaning 
for NULL in the context.  In this case, the pointers
all point to NULL-terminated strings, so assigning them to NULL goes against 
the intended usage and can confuse programmers who pick 
up the code in the future.  This is a long-standing Sun coding standard.

Use of lint checking for uninitialized variables should be restored to the 
build process.  I will file a bug for this.
>
> 474,475,476,478 - These are always set to something before
> being used, so I suggest initializing them to NULL instead.
Correct, so they do not need to be initialized.
>
> 479 - this variable doesn't seem to be used anywhere.
>
> 497,500 - Initialize to NULL.
>
> 664 - Initialize to NULL.
>
> 724 - Aren't IPv6 addresses separated by colons, and aren't
> there up to eight sections, not six?
>
IPv6 is not supported for iSCSI boot, so removing the code.
>
>>>
>>>
>>> td_iscsi.c
>>> ----------
>>> 452 - These three pointers should all be initialized to NULL.
>>> Not quite sure what you're trying to achieve by initializing
>>> *lun_num = "" ??
>> Subsequent code handles NULL-terminated strings.
>
> *target_name and *ip_address aren't NULL-terminated strings.  They're
> not initialized so they're garbage aren't they?  I know you fill
> them with the calls to nvlist_lookup_string() at 458 and 463, but as
> a matter of principle, can you please initialize them to NULL?
>
> As for *lun_num, I *think* I see why you initialize it to the empty
> string now.
>
> At 482, you're not checking for the return code.  If your intent is
> to make it *okay* for the value to not exist in the nvlist, then
> can I suggest using nvlist_lookup_pairs() instead.
>
>    if (nvlist_lookup_pairs(attrs, NV_FLAG_NOENTOK,
>        TD_ISCSI_ATTR_LUN, &lun_num, NULL) != 0) {
>            td_debug_print(LS_DBGLVL_ERR, "failed to lookup XXX\n");
>            return (TD_E_XXX);
>    }
Above code is missing parameter for data type - here, DATA_TYPE_STRING.
Thank you,
William
>
> That way you can still catch if some internal failure happened
> with the nvlist processing.  The only caveat is that you had to
> have to initialized attrs with NV_UNIQUE_NAME, which you do.
>
>
> thanks,
> -ethan
>
>> William
>>> thanks,
>>> -ethan
>>>
>>>
>>> William Schumann wrote:
>>>> Updated webrev including all requested changes with Bing's changes 
>>>> included.
>>>>
>>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=6590
>>>> http://cr.opensolaris.org/~wmsch/bug-6590/
>>>> previous webrev: http://cr.opensolaris.org/~wmsch/bug-6590/oldwebrev
>>>>
>>>> Thank you,
>>>> William
>>>> _______________________________________________
>>>> caiman-discuss mailing list
>>>> caiman-discuss at opensolaris.org
>>>> http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
>>>
>



Reply via email to