presto phase II [LSARC/2008/389 FastTrack timeout 06/25/2008]
Closing as approved. --Irene

Artem Kachitchkine wrote: I have no further questions. -Artem

This case is due to time out on Tuesday; if there are any further issues, please send an email before then. Thanks --Irene

Halton Huo wrote: On Wed, 2008-06-18 at 23:48 -0700, Artem Kachitchkine wrote: The "signal between X and Y" or "X sends signal to Y" language is confusing. DBus signals in general are not peer to peer, they are messages broadcast on a bus. An arbitrary number of applications can listen on the bus. It is possible, however, to establish a peer to peer DBus connection between two applications, much like FIFOs or System V message queues. You need to specify:

Artem, Thanks for your review. Here are the updates on this part, please review.

Network Printer (via SNMP):
- Enable the network printer discovery service, svc:/network/device-discovery/printers:snmp
- The hald network printer add-on broadcasts an SNMP GET.
- Any network printer that is SNMP capable then responds to it.
- The SNMP agent then populates the HAL device tree with the network printer data.
- hald detects the changes in the HAL device tree, deduces that these are printers, and sends out the DeviceAdded DBus signal.
- ospm-applet, which is a daemon in the user's session, waits for and responds to these signals. Based on the unique udi (Unique Device Identifier) it receives from hald, it looks up the rest of the data in the HAL device tree. Then it adds print queues for these printers in the background until they are all done.
- ospm-applet pops up a generic notification bubble telling the user that network print queues have been added.
- ospm-applet also sends out a DBus message, PrinterAdded.
- If the Print Manager is running at the time, it is notified by the PrinterAdded message and refreshes its view immediately, so it shows the newly added queues. Otherwise, these messages are ignored.
- the path of the object(s) that implement the org.opensolaris.ospm.applet interface
- which of the many possible DBus buses (system? session?) or private connections the object is instantiated on
- signal parameters, if any (UDI? queue name?)

We're using two DBus signals. One is a system signal, DeviceAdded; the other is our application's custom signal, PrinterAdded.

1. DeviceAdded
   path: /org/freedesktop/Hal/Manager
   interface: org.freedesktop.Hal.Manager
   bus: system
2. PrinterAdded
   path: /org/opensolaris/ospm/applet
   interface: org.opensolaris.ospm.applet
   bus: session

Where do I need to mention this in the ARC document? Thanks, Halton.

Also your proposal has three variations on the signal name: PrinterAdded, printerAdded and Printeradded. Which one is it? -Artem
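A listener usually selects broadcast signals like these with a D-Bus match rule built from the path, interface, and member name. The following is a minimal sketch in C, assuming only the values given in this thread; the helper name build_match_rule is hypothetical, and registering the rule on a bus connection is left out.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of HAL or ospm-applet): format a D-Bus
 * match rule that selects one signal by object path, interface, and
 * member name. Returns what snprintf returns, so a result >= len means
 * the buffer was too small.
 */
static int
build_match_rule(char *buf, size_t len, const char *path,
    const char *iface, const char *member)
{
	return (snprintf(buf, len,
	    "type='signal',path='%s',interface='%s',member='%s'",
	    path, iface, member));
}
```

With the values from the thread, the DeviceAdded rule would be added on the system bus and the PrinterAdded rule on the session bus; the rule string itself does not name the bus.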
Xsane [LSARC FastTrack 2008/385 timeout 06/24/2008]
Closing as approved. --Irene Irene Huang wrote: Hi, all I am sponsoring this case. The timeout is set to be 06/24/2008 XSane is targeting opensolaris 2008/11 (hopefully). Please review the attached proposal. --Irene
Meta Tracker - A Desktop Search Tool [LSARC/2008/375 FastTrack timeout 07/01/2008]
Resetting time out to be July 1st. --Irene

Robert Kinsella - Sun Microsystems Ireland - Software Engineer wrote: Darren J Moffat wrote: Stephen Browne wrote: On Tue, 2008-06-24 at 14:52, Darren J Moffat wrote: /Jerry Tan wrote: When tracker is integrated, it will be disabled by default. Users can run gnome-session-properties to enable it. How does that work with TX ? Does that enable an instance per labelled zone or only in the global zone ? /

Since the startup is configurable it will be configurable per zone. The default for the zones will be the same as the default for the global zone.

How is that done with gnome-session-properties ? I don't believe that is label aware, is it ? If I select gnome-session-properties from the menu as Preferences-Sessions I see no indication that it is able to set separate policy per label, and isn't it running in the global zone ? Am I missing something ?

Hi Darren, when an application/preference dialog is launched, it displays the label it is launched in. When it is launched from e.g. the internal zone, the window is labeled Internal. Any settings changed in the window labeled Internal will affect the settings for the internal zone for that user. Some settings are only applicable in the global zone; these preference dialogs / applications are always launched in the global zone. To review a list of these (global zone only preference dialogs/applications) see /usr/share/gnome/TrustedPathExecutables

Bob
libtasn1 for OpenSolaris [LSARC/2008/390 FastTrack timeout 06/25/2008]
Hi, Mike, if you are OK with the license indicated in the copyright file, please let me know and I would like to close this case as approved. Thanks --Irene

Irene Huang wrote: Generally, we don't include license information in the manpage. For the OpenSolaris project, we ship a copyright file in each package; I think that will do. --Irene

On Tue, 2008-06-24 at 10:10 +0800, Jeff Cai wrote: On Mon, 2008-06-23 at 16:24 -0700, Mike Oliver wrote: Irene Huang wrote: ...

4.2. Interfaces:
Exported Interfaces
Interface               Classification   Comments
...
/usr/lib/libtasn1.so    Volatile         Shared library
...

Why is this library not versioned in the usual manner? (E.g. libtasn1.so.1, accompanied by a .so symlink pointing to the current version for use by the normal linker environment.)

Sorry, this is my mistake.

/usr/lib/libtasn1.so          Volatile   Symbolic link
/usr/lib/libtasn1.so.3        Volatile   Symbolic link
/usr/lib/libtasn1.so.3.0.15   Volatile   Shared library

Spec will be changed accordingly.

I don't know whether there's a SAC Best Practice for bringing the license terms of libraries to the notice of developers who might wish to use those libraries. AFAICT this one is LGPL; does that need to be mentioned anywhere?

I'll mention that in the libtasn1 man page. Thanks Jeff

Mike.
Fast Reboot PSARC/2008/382
This is an e-mail note indicating that the dry-run information will be suppressed from man pages and is reclassified as Project Private, per discussions between the project team, Jerry and Garrett. Sherry -- Sherry Moore, Solaris Core Kernel http://blogs.sun.com/sherrym
Fast Reboot PSARC/2008/382
Sherry Moore wrote: This is an e-mail note indicating that the dry-run information will be suppressed from man pages and is reclassified as Project Private, per discussions between the project team, Jerry and Garrett. Sherry

Thank you. Just for the clarity of the record, I think this means that the project team agrees that the dry run options to reboot, as well as to uadmin(2), are project private. Btw, given that the dry run option to uadmin is private, it seems that you probably could just skip modifying reboot for dry-run. For the internal testing purposes for which I think this is intended, uadmin(1M) should be adequate for testing. See the work done by the CPR project, where they use various different subcommands to uadmin for testing. -- Garrett
libc printf behaviour for NULL string [PSARC/2008/403 FastTrack timeout 07/02/2008]
Darren J Moffat writes: The behavior of HP-UX and AIX is unknown to the author of this case, and since they are closed source there is no easy way to determine what their implementation does.

AIX prints an empty string, as though "" had been passed in. Our HP-UX box has locked up, or I'd give you that one as well. :-/

This isn't a scalable way to approach the problem and hurts the reputation of Solaris and OpenSolaris releases. It also hinders the

The sad thing here is that it's really the bug-ridden application code that mishandles NULL pointers that's of poor quality, so it's not OpenSolaris's reputation that should be at stake. So, with this one under our belts, should we also fix up the str*(3C) family of functions so that they quietly ignore NULL pointers as well? An application that's incautious with NULL can't possibly just make that mistake with printf alone, can it? Is NULL the only bad pointer worth caring about? What sorts of bad pointer checks need to be made so that malfunctioning applications can continue running without dropping core? How deep does the rabbit hole go?

--
James Carlson, Solaris Networking              james.d.carlson at sun.com
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
libc printf behaviour for NULL string [PSARC/2008/403 FastTrack timeout 07/02/2008]
My only possible objection to this is that this change might mask certain kinds of bugs which might be useful to catch. Can we have an environment variable to turn off this behavior for debugging purposes (perhaps leaving it turned off by default in a debug build of ON?) A perhaps better solution might be to place an assert() in printf for the string not being NULL. (Hmmm... do we even *have* debug/non-debug versions of libc as we do for the kernel?) -- Garrett

Darren J Moffat wrote:

Template Version: @(#)sac_nextcase 1.66 04/17/08 SMI
This information is Copyright 2008 Sun Microsystems

1. Introduction
1.1. Project/Component Working Name: libc printf behaviour for NULL string
1.2. Name of Document Author/Supplier: Author: Darren Moffat
1.3 Date of This Document: 25 June, 2008

4. Technical Description

Background

The current behavior of the printf(3C) family of functions in libc when passed a NULL value for a string format is undefined and usually results in a SEGV and a crashed application. The workaround for applications written to depend on this behavior is to set LD_PRELOAD=/usr/lib/0@0.so.1 (or the 64 bit equivalent). The workaround isn't always easy to apply (or it is too late: data has been lost or corrupted by that point).

Some will often state, myself included, that you shouldn't assume that the printf(3C) family will deal with a NULL argument for a string and that arguments should be checked before calling printf(3C).

The behavior of the SunOS 4.x printf(3C), and that of the still shipping binary compatibility library /usr/4lib/libc.so.1, was to use the string "(null)" if the argument for a %s was NULL. This is also the behavior in current versions of GNU libc (as used on most (maybe all) mainstream Linux kernel based distributions) and the libc for FreeBSD, NetBSD, OpenBSD. The behavior of HP-UX and AIX is unknown to the author of this case, and since they are closed source there is no easy way to determine what their implementation does.
It is reasonably common to find FOSS applications that, when run on Solaris/OpenSolaris, fail due to the behavior of printf(3C) in libc. The GNOME libraries in snv_92 even appear to make this assumption in some places (the g_log function), based on cores I've seen for gnome-settings-daemon and pidgin. I know that the Sun GNOME team has fixed such issues in the past. In both of those cases it wasn't the application that caused it but an assumption in some library code they both use for error handling.

In some FOSS communities the upstream authors are often not interested in changing their source and view the failure as a Solaris bug (taking the stance that Linux and *BSD don't have this issue). Some are more accommodating and have taken a half way step and have their code check that 0@0.so.1 is LD_PRELOAD'd when running on Solaris; sometimes the authors will fix the code.

This isn't a scalable way to approach the problem and hurts the reputation of Solaris and OpenSolaris releases. It also hinders the building of binaries for upstream FOSS components targeting an OpenSolaris release repository. There is a large volume of software not originally authored on Solaris/OpenSolaris that is critical to the success of OpenSolaris. So a permanent fix for this is needed that scales well in OpenSolaris and upstream community developer time.

Proposal

This case proposes to change the default Solaris/OpenSolaris libc behavior for the printf(3C) family so that it reverts to the SunOS 4.x behavior of printing "(null)" instead of (likely) causing an application crash. This change will apply to the XPG and wide char variants as well.

There are no documentation changes from this case, as the current Solaris documentation says nothing about the behavior of the printf(3C) family when passed a NULL.

Since no application should be depending on the current behavior of getting a SEGV when passing a NULL, there is no need to make this change configurable.
In fact doing so could cause even more harm than the current situation. There are no interface taxonomy changes. The release binding for this change is patch. 6. Resources and Schedule 6.4. Steering Committee requested information 6.4.1. Consolidation C-team Name: ON 6.5. ARC review type: FastTrack 6.6. ARC Exposure: open
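The proposed behaviour amounts to substituting a fixed string whenever a %s argument is NULL. The following is a minimal caller-side sketch in C of that substitution, not the libc change itself; str_or_null and format_name are hypothetical names introduced only for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: map a NULL string to the literal "(null)". */
static const char *
str_or_null(const char *s)
{
	return (s != NULL ? s : "(null)");
}

/*
 * Hypothetical example caller: format "name=<value>" without ever
 * handing a NULL pointer to a %s conversion, so the result does not
 * depend on how the platform's printf treats NULL.
 */
static int
format_name(char *buf, size_t len, const char *name)
{
	return (snprintf(buf, len, "name=%s", str_or_null(name)));
}
```

Code written this way behaves the same on every libc discussed in the case, which is the check the Background section says callers should already be making.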
Fast Reboot PSARC/2008/382
Sherry Moore wrote: On Wed, Jun 25, 2008 at 04:56:06AM -0700, Garrett D'Amore wrote: Sherry Moore wrote: This is an e-mail note indicating that the dry-run information will be suppressed from man pages and is reclassified as Project Private, per discussions between the project team, Jerry and Garrett. Sherry

Thank you. Just for the clarity of the record, I think this means that the project team agrees that the dry run options to reboot, as well as to uadmin(2), are project private.

Yes. Thank you.

Btw, given that the dry run option to uadmin is private, it seems that you probably could just skip modifying reboot for dry-run. For the internal testing purposes for which I think this is intended, uadmin(1M) should be adequate for testing. See the work done by the CPR project, where they use various different subcommands to uadmin for testing.

I believe that's an implementation detail that the project team can choose to implement.

Agreed. I was just offering some friendly implementation advice, not architectural guidance. -- Garrett
libc printf behaviour for NULL string [PSARC/2008/403 FastTrack timeout 07/02/2008]
2008/6/25 James Carlson james.d.carlson at sun.com: Darren J Moffat writes: The behavior of HP-UX and AIX is unknown to the author of this case, and since they are closed source there is no easy way to determine what their implementation does.

AIX prints an empty string, as though "" had been passed in. Our HP-UX box has locked up, or I'd give you that one as well. :-/

According to Sun's developer pages [1], HP-UX has the same behaviour. -- Shawn Walker

[1] http://developers.sun.com/solaris/articles/portingUNIXapps.html
libc printf behaviour for NULL string [PSARC/2008/403 FastTrack timeout 07/02/2008]
James Carlson james.d.carlson at sun.com wrote: Darren J Moffat writes: The behavior of HP-UX and AIX is unknown to the author of this case, and since they are closed source there is no easy way to determine what their implementation does.

AIX prints an empty string, as though "" had been passed in. Our HP-UX box has locked up, or I'd give you that one as well. :-/

HP-UX also prints an empty string like:

Jörg

--
EMail: joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de (uni)
       schilling at fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Updated NPIV support for xVM [PSARC/2008/404 FastTrack timeout 07/02/2008]
Template Version: @(#)sac_nextcase 1.66 04/17/08 SMI
This information is Copyright 2008 Sun Microsystems

1. Introduction
1.1. Project/Component Working Name: Updated NPIV support for xVM
1.2. Name of Document Author/Supplier: Author: Jack Meng
1.3 Date of This Document: 25 June, 2008
4. Technical Description
1.3.1. Date this project was conceived: N/A
1.4. Name of Major Document Customer(s)/Consumer(s):
1.4.1. The PAC or CPT you expect to review your project: Solaris PAC
1.4.2. The ARC(s) you expect to review your project: PSARC
1.4.3. The Director/VP who is Sponsoring this project: Scott.Tracy at Sun.COM
1.4.4. The name of your business unit: Solaris SOFTWARE Group
1.5. Email Aliases:
1.5.1. Responsible Manager: Roger.Dong at sun.com
1.5.2. Responsible Engineer: Jack.Meng at sun.com
1.5.3. Marketing Manager:
1.5.4. Interest List: npiv-iteam at sun.com

2. Project Summary
2.1. Project Description: Support NPIV devices in Solaris xVM
2.2. Risks and Assumptions: This work is for Solaris xVM hosts only; therefore guest domains configured with an NPIV device may not be able to be migrated to hosts running on other platforms, e.g., Linux.

3. Business Summary
3.1. Problem Area: N/A
3.2. Market/Requester: N/A
3.3. Business Justification: N/A
3.4. Competitive Analysis: N/A
3.5. Opportunity Window/Exposure: N/A
3.6. How will you know when you are done?: Users are able to use NPIV within virtual machines in xVM.

4. Technical Description:
4.1. Details:
This project introduces two extensions for Solaris xVM utilities to configure NPIV devices with paravirtualized guest domains. NPIV is enabled in Solaris by PSARC 2007/501; refer to section 5 for more info.

The first one is to attach a specified LUN from a virtual FC port to a guest domain.
The xVM hypervisor in Solaris is extended to accept a new type of blk device, npiv, and to trigger the corresponding script to create the virtual port on the specified physical port, discover the LUN on the specified target, and finally attach it as a normal blk device to the guest domain.

The second one is to attach a specified virtual FC port to a guest domain as a pseudo device. 'Pseudo' means there will be no corresponding frontend in the guest domain for that virtual FC port. The xVM hypervisor in Solaris is extended to accept a new kind of device, pseudo, and to trigger a script to work on different pseudo devices. Currently the only pseudo device will be the NPIV port, and the corresponding script will create the virtual port on the specified physical port and then:
1) attach existing LUNs from that virtual port to the guest domain
2) register a script for device sysevents happening on the virtual port; afterwards newly added/deleted LUNs will be attached to/detached from the guest domain.

Either way, the NPIV device can be migrated if the destination host has the specified physical FC port and that port is on the same fabric as the physical port on the source host.

4.2. Bug/RFE Number(s):
6713736 NPIV lun support in XVM
6713700 Dynamic blk dev support in kernel
4.3. In Scope: N/A
4.4. Out of Scope: N/A
4.5. Interfaces: N/A
4.6. Doc Impact: Man page: virsh(1M), xm(1M); System Administration Guide: Virtualization Using the Solaris Operating System
4.7. Admin/Config Impact: Introduces a new format of options in 'virsh' and 'xm'. Refer to the docs listed in 4.6 for details.
4.8. HA Impact: N/A
4.9. I18N/L10N Impact: N/A
4.10. Packaging Delivery: N/A
4.11. Security Impact: N/A
4.12. Dependencies: N/A

5. Reference Documents: http://sac.sfbay/PSARC/2007/501/

6. Resources and Schedule:
6.1. Projected Availability: Solaris Nevada B94/B95
6.2. Cost of Effort: N/A
6.3. Cost of Capital Resources: N/A
6.4. Product Approval Committee requested information:
6.4.1. Consolidation or Component Name: xvm, on
6.4.3. Type of CPT Review and Approval expected: FastTrack
6.4.4. Project Boundary Conditions: N/A
6.4.5. Is this a necessary project for OEM agreements: N/A
6.4.6. Notes: N/A
6.4.7. Target RTI Date/Release: Nevada B94/B95
6.4.8. Target Code Design Review Date: 25/06/2008
6.4.9. Update approval addition: N/A
6.6.1.
libc printf behaviour for NULL string [PSARC/2008/403 FastTrack timeout 07/02/2008]
Garrett D'Amore wrote: My only possible objection to this is that this change might mask certain kinds of bugs which might be useful to catch.

Doesn't seem to be a problem for other platforms.

Can we have an environment variable to turn off this behavior for debugging purposes (perhaps leaving it turned off by default in a debug build of ON?)

I don't think that is a good idea. None of the other platforms I viewed have this and we didn't have this with SunOS 4.x either. More importantly I don't like the idea of having to check an environment variable on every printf(3C) family call; there could be a noticeable performance hit for that. Basically I see the current behaviour as a long standing regression from SunOS 4.x and was really tempted to not even file an ARC case but just do a bug fix. Let's not over design this or beat it to death in ARC unless there is a standards reason or sound architectural reason why this is the wrong thing to do (and given the behaviour of all the other platforms I'd have a hard time believing that).

A perhaps better solution might be to place an assert() in printf for the string not being NULL. (Hmmm... do we even *have* debug/non-debug versions of libc as we do for the kernel?)

No we don't, but then with DTrace we don't need to either. If the goal is to find applications that could be fixed to not do this then use DTrace with the pid provider to find them. -- Darren J Moffat
libc printf behaviour for NULL string [PSARC/2008/403 FastTrack timeout 07/02/2008]
James Carlson wrote: Stefan Teleman writes: Darren J Moffat wrote: The sad thing here is that it's really the bug-ridden application code that mishandles NULL pointers that's of poor quality, so it's not OpenSolaris's reputation that should be at stake. Oh I agree completely but we are the odd one out here. I apologize for interjecting in this discussion, but, wouldn't the character string (null) or (nichts) or NULL being printed on stdout/stderr act as a clear indicator of the bug, and of its precise location ? Actually, no, it's not. You know its apparent location in the stdout character stream, but nothing about where the problem might be in the code. In other words, the fact that i see (null) instead of some other printable value, printed out, provides me with absolutely no indication as to which char* pointer was NULL, in the sequence of arguments passed to printf(3C) and friends. That is because I do not have a reasonably defined set of expectations with respect to what *should* be the output of printf(3C) and friends. Speaking only for myself: when i see the string (null) printed out, when in fact i was expecting Giraffe, i do not think Oh, the stdout character stream contains the string \(null\). How odd. I wonder what could have caused this.. I think Why is Giraffe NULL, when it shouldn't be ?. --Stefan -- Stefan Teleman Sun Microsystems, Inc. Stefan.Teleman at Sun.COM
libc printf behaviour for NULL string [PSARC/2008/403 FastTrack timeout 07/02/2008]
On Wed, 25 Jun 2008 12:30:24 -0400 James Carlson wrote: Garrett D'Amore writes: My leaning is to #1. It seems like we're trying to make bad applications happy, to satisfy what is probably a small minority of developers who feel that such use of NULL should be legal (despite documentation to the contrary) -- and who are unwilling to use a perfectly reasonable workaround, at a potential cost to the greater set of well-behaved applications. I think what's missing there is that this is (unfortunately) not just a minority of developers. The bulk of user-space software looks like this these days. People are just plain careless, ... this is a bit harsh people tend to code to what their local system tolerates even the most meticulous coders can fall prey to this by assuming their favorite standard implementation actually implements the standard e.g., glibc implements posix well, yes, it might, but it also makes choices on some implementation defined behavior that could be mistaken for standard behavior, sometimes even after meticulous reading of the standard -- Glenn Fowler -- ATT Research, Florham Park NJ --
libc printf behaviour for NULL string [PSARC/2008/403 FastTrack timeout 07/02/2008]
On Wed, Jun 25, 2008 at 09:39:55AM -0700, Garrett D'Amore wrote: Is the next step really to start checking for null arguments to other string functions? What about null pointers passed to other library routines, such as free(), qsort(), bsearch()? free(NULL) is already allowed, always. To provide a general answer: if it were to turn out that, say, strlen(NULL) works (e.g., returns 0) on Linux and *BSD and that *many* applications depend on this behaviour, then we may have to consider making our strlen() do the same. If this were to violate some standard, then that will complicate the decision process -- we may need to resort to compile-/link-time behaviour selections (for libraries and executables both).
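A compile-time behaviour selection of the kind mentioned above could look roughly like the following C sketch. It is purely illustrative: the macro LENIENT_NULL_STRINGS and the wrapper lenient_strlen are hypothetical names, not anything proposed in the case or present in libc.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical opt-in switch (an assumption of this sketch, not a real
 * Solaris or libc macro). Build with -DLENIENT_NULL_STRINGS=0 to get
 * the strict behaviour, where a NULL argument reaches strlen() and is
 * undefined behaviour as usual.
 */
#ifndef LENIENT_NULL_STRINGS
#define	LENIENT_NULL_STRINGS 1
#endif

/* Hypothetical wrapper: treat NULL as the empty string when lenient. */
static size_t
lenient_strlen(const char *s)
{
#if LENIENT_NULL_STRINGS
	if (s == NULL)
		return (0);
#endif
	return (strlen(s));
}
```

The selection is made per compilation unit, which is why the mail notes that libraries and executables would each need to choose.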
libc printf behaviour for NULL string [PSARC/2008/403 FastTrack timeout 07/02/2008]
Stefan Teleman writes: Actually, no, it's not. You know its apparent location in the stdout character stream, but nothing about where the problem might be in the code. In other words, the fact that i see (null) instead of some other printable value, printed out, provides me with absolutely no indication as to which char* pointer was NULL, in the sequence of arguments passed to printf(3C) and friends.

In many cases, it doesn't tell you which executable was involved, or which loadable module in that executable, or anything about the conditions that caused the problem.

Speaking only for myself: when i see the string (null) printed out, when in fact i was expecting Giraffe, i do not think "Oh, the stdout character stream contains the string \(null\). How odd. I wonder what could have caused this.." I think "Why is Giraffe NULL, when it shouldn't be ?".

I suspect that's because you may be the developer of that software, and thus know the locations and contents of the printf() strings themselves. Try it as a user:

    Installing software modules.
    Unable to load module (null).
    Operation complete; 327 modules loaded.

Does that tell you anything useful? What executable failed? Was it a library function or the main program? Could it have been one of the modules that was (apparently) being loaded by way of dlopen? It may as well have said "something bad happened."

Glenn Fowler writes: On Wed, 25 Jun 2008 12:30:24 -0400 James Carlson wrote: I think what's missing there is that this is (unfortunately) not just a minority of developers. The bulk of user-space software looks like this these days. People are just plain careless, ...

this is a bit harsh people tend to code to what their local system tolerates

I've thought about it a bit, and I'm going to stand by it. Even if your system 'tolerates' something as an implementation artifact, that doesn't make it right.
In order for this to be something other than simple carelessness, I think the developer would have had to have seen that (null) output, and decided, eh, that's good enough. If that's what happened, then what I wrote isn't probably harsh enough. e.g., glibc implements posix well, yes, it might, but it also makes choices on some implementation defined behavior that could be mistaken for standard behavior, sometimes even after meticulous reading of the standard Regardless of whether glibc implements POSIX, I don't think that ignoring bad pointers is good programming practice. Perhaps it's just my opinion alone, but I think failing to consider whether a pointer ought to be NULL and doing something about it reflects a lack of due care. Sure; that can affect anyone. If you find it in my code, feel free to point the example out as an instance of carelessness on my part. (I'm not seeking a Gary Hart moment here, but rather saying that I don't think the criticism is unfair.) -- James Carlson, Solaris Networking james.d.carlson at sun.com Sun Microsystems / 35 Network Drive71.232W Vox +1 781 442 2084 MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677
fbconfig configuration utility for Xorg [PSARC/2008/396 FastTrack timeout 06/30/2008]
No substantive unresolved issues remaining, this case was approved during this morning's PSARC meeting. To summarize the modifications to the original proposal, here is the revised interface table:

Interfaces exported:
/usr/sbin/fbconfig                     Uncommitted         modified program
/usr/lib/fbconfig/fbconf_xorg          Project Private     new
/usr/lib/fbconfig/libfbconf_xorg.so    Project Private     new
/usr/lib/fbconfig/libSUNWkfb_conf.so   Project Private     new, kfb-specific
/etc/X11/xorg.conf                     External Volatile

-- Eric
libc printf behaviour for NULL string [PSARC/2008/403 FastTrack timeout 07/02/2008]
James Carlson wrote: Regardless of whether glibc implements POSIX, I don't think that ignoring bad pointers is good programming practice. Perhaps it's just my opinion alone, but I think failing to consider whether a pointer ought to be NULL and doing something about it reflects a lack of due care.

Indeed... but we're just being particularly picky about printf... in the libc malloc implementation, we were far more accommodating of broken programs - the existing implementation allows the same pointer to be freed multiple times (so long as there was no intervening call to {re,m,c}alloc), and realloc works if given a pointer that was already freed. These are much more pernicious and dangerous than allowing a NULL pointer to be passed to %s format specifiers. Also, we print NaN when an illegal floating point number is passed to printf rather than just raising a FP exception. We can either choose to be compatible w/ virtually everyone else, or we can continue to be particular about printf's string arguments. Personally, I'd vote for compatibility. - Bart

-- Bart Smaalders Solaris Kernel Performance barts at cyber.eng.sun.com http://blogs.sun.com/barts You will contribute more with mercurial than with thunderbird.
libc printf behaviour for NULL string [PSARC/2008/403 FastTrack timeout 07/02/2008]
On Wed, 25 Jun 2008 13:07:51 -0400 James Carlson wrote: Glenn Fowler writes: On Wed, 25 Jun 2008 12:30:24 -0400 James Carlson wrote: I think what's missing there is that this is (unfortunately) not just a minority of developers. The bulk of user-space software looks like this these days. People are just plain careless, ...

this is a bit harsh people tend to code to what their local system tolerates

I've thought about it a bit, and I'm going to stand by it. Even if your system 'tolerates' something as an implementation artifact, that doesn't make it right. In order for this to be something other than simple carelessness, I think the developer would have had to have seen that (null) output, and decided, "eh, that's good enough." If that's what happened, then what I wrote isn't probably harsh enough.

as is the case with many of my bugs, they are data dependent and my regression tests never hit all of the cases cooked up by ingenious users so instead of "eh, that's good enough" it's "didn't think of that" and the next release will have a fix and companion test(s) -- different from carelessness

-- Glenn Fowler -- AT&T Research, Florham Park NJ --
libc printf behaviour for NULL string [PSARC/2008/403 FastTrack timeout 07/02/2008]
Darren Moffat wrote: James Carlson wrote: [...] So, with this one under our belts, should we also fix up the str*(3C) family of functions so that they quietly ignore NULL pointers as well? The goal of this case was parity with the other mentioned libc implementations. I have looked at what the others do for strlen(NULL) and they will SEGV on that. I haven't looked at every str*(3C) function. That's fine, but FYI, the Microsoft C runtime does this substitution of for NULL in the str* functions. (Or they used to. I haven't tried recently.) An application that's incautious with NULL can't possibly just make that mistake with printf alone, can it? Probably not, but this is a safety net that is available on other platforms. Similar safety nets for the str*(3C) functions don't at initial glance appear to exist. If the application/lib is that free and loose with NULL then we still have the ability to LD_PRELOAD=0@0.so.1 if the code can't be fixed. And for the record, that is not a sufficient solution, because then you won't trap on other errant NULL pointers. But again, OK, not this case. This case is about fixing the very commonly encountered case and the case where Solaris is disastrously different to the common platforms. Is NULL the only bad pointer worth caring about? What sorts of bad pointer checks need to be made so that malfunctioning applications can continue running without dropping core? How deep does the rabbit hole go? The rabbit hole is very deep but this case is just about getting dinner for tonight, someone else can explore the rest of the warren. Understood. Later discussion is concerned with what to replace the null pointer with. Here's a suggestion for that: In libc:printf #pragma weak _printf_null_str_replacement() const char * _printf_null_str_replacement() { return (); } and in printf if (str_ptr == NULL) str_ptr = _printf_null_str_replacement(); and then let whatever links with libc provide something different if it wants to. I.e. to get SIGSEGV: provide a function that returns NULL instead.
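The weak-hook suggestion above can be sketched as follows. This is a minimal illustration under stated assumptions, not the libc change under review: it assumes GCC or Clang (`__attribute__((weak))` stands in for the Sun Studio `#pragma weak`), the "(null)" default is an assumption because the message leaves the return value unspecified, and `format_str_arg` and `null_str_selfcheck` are hypothetical helpers rather than libc functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Weak default for the hook named in the thread. The "(null)" value is
 * an assumption; the original message does not say what is returned.
 * A program linked with this can supply its own strong definition.
 */
__attribute__((weak)) const char *
_printf_null_str_replacement(void)
{
	return ("(null)");
}

/* Hypothetical stand-in for the %s branch inside printf. */
const char *
format_str_arg(const char *str_ptr)
{
	if (str_ptr == NULL)
		str_ptr = _printf_null_str_replacement();
	return (str_ptr);
}

/* Small self-check of the default behaviour. */
void
null_str_selfcheck(void)
{
	assert(strcmp(format_str_arg(NULL), "(null)") == 0);
	assert(strcmp(format_str_arg("abc"), "abc") == 0);
}
```

If an application defines its own `_printf_null_str_replacement` that returns NULL, `format_str_arg` hands NULL back to the formatter, which is how the old trap-on-NULL behaviour would be recovered.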
libc printf behaviour for NULL string [PSARC/2008/403 FastTrack timeout 07/02/2008]
Jyri Virkki wrote: Another ARC gone wild thread? Can we keep the lessons of the past months in mind? I read the materials and there are no exported interface changes, no imported interface changes and not even any documentation changes. Only an implementation change to something formally defined as undefined, so while your code reviewers should have something to say if the implementation chooses to, say, reboot the system, code reviews are not in scope for ARC. So there's actually nothing for ARC to review here.. why file this case? My vote is you close it approved automatic and go fix the bug already. I think someone else commented on the potential for changes in debuggability and performance. ARC review is not inappropriate. Everything below is just me adding to the noise, so ignore for this case purposes. James Carlson wrote: An application that's incautious with NULL can't possibly just make that mistake with printf alone, can it? They're not being incautious with NULLs, they (C developers) do it because printf is known and documented to handle it. Oh, not on OpenSolaris? Too bad for us, nobody cares. A great way to make people avoid adopting OpenSolaris is to make sure the apps they run successfully everywhere else crash only on OpenSolaris. GNU printf is documented to print '(null)', so no big surprise developers rely on documented behavior. If you accidentally pass a null pointer as the argument for a `%s' conversion, the GNU library prints it as `(null)'. We think this is more useful than crashing. http://www.gnu.org/software/libtool/manual/libc/Other-Output-Conversions.html (The text goes on to say But it's not good practice to pass a null argument intentionally but in true human/developer nature, people don't pay attention to that. Once the behavior has been promised and implemented, people will use it.) So if this is the case, then let's just follow suit. 
But let's do so explicitly, with similar language in our printf() documentation, rather than just silently doing something. I'd assume for familiarity that the same (null) string should be used, as well. Admittedly, I'm not thrilled with this (I want my cycles back!) but I'm OK with it, particularly if we just go ahead and document it as an acceptable practice. (In particular, I'm thinking about all the cases of (x ? x : null) that are in debug statements around. If you're going to make me burn the cycles to make the test in libc, at least let me reclaim them in my other code. :-) -- Garrett
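The caller-side guard mentioned above, `(x ? x : null)`, might look like this in a debug helper. This is a hedged sketch: `str_or_null`, `debug_line` and `debug_selfcheck` are hypothetical names, and the literal "null" is an assumed spelling of the fallback. The guard keeps the call well defined on any libc, because passing NULL for `%s` is undefined by the C standard; it only becomes redundant where libc documents the substitution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: the caller-side guard, i.e. (x ? x : "null"). */
const char *
str_or_null(const char *x)
{
	return (x != NULL ? x : "null");
}

/* Format a debug line into buf without ever handing NULL to %s. */
int
debug_line(char *buf, size_t len, const char *name)
{
	return (snprintf(buf, len, "name=%s", str_or_null(name)));
}

/* Small self-check: a NULL name is rendered as the fallback text. */
void
debug_selfcheck(void)
{
	char buf[32];

	(void) debug_line(buf, sizeof (buf), NULL);
	assert(strcmp(buf, "name=null") == 0);
}
```

Each such guard costs one comparison at the call site, which is the cost the message proposes to reclaim once libc performs and documents the same test.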
[zfs-discuss] zfs primarycache and secondarycache properties [PSARC/2008/393 FastTrack timeout 06/27/2008]
On Jun 25, 2008, at 11:49 AM, Darren Reed wrote: This would seem to be a significant use case for the model of having non-overlapping data types in each of the two caches. Since no reply was received on zfs-discuss, I'm redirecting it to psarc to indicate that this question isn't closed. I see some comments, but no direct question. So what is the question? eric Darren J Moffat wrote: Darren Reed wrote: So I spent some time thinking about different directions you could build on this in the future, for example: 1) controlling the size of the ARC/L2ARC by controlling the cache size 2) specifying different backing storage for primary/secondary cache 3) having more than two levels of cache ...none of which is precluded by current efforts. With (2), if the backing storage for each cache is different and it is slower to access the secondary cache than the primary, then you may not want metadata to be stored in the secondary cache for performance reasons. As an example, you might be using NVRAM (be it flash or otherwise) for the primary cache and ordinary RAM for the secondary. In this case you probably don't want any metadata to be stored in the secondary cache (power failure issues) but the same may not hold for user data. But I'm probably wrong about that. I doubt you would be, the primarycache is system memory not a cache device. The secondarycache is the L2ARC devices specified with the cache vdev type to zpool so your example would be the other way around.
2008/403 [libc printf behaviour for NULL string]
Date: Wed, 25 Jun 2008 15:45:44 -0500 From: Rick Matthews Richard.Matthews at sun.com Subject: Re: libc printf behaviour for NULL string [PSARC/2008/403 FastTrack timeout 07/02/2008] ... About the case: It seems to me this is about being bug compatible with other implementations. This one doesn't seem particularly offensive. WRT compatibility, is this more a gang of four issue as to whether this is the familiarity we want? Speaking both as a PSARC member and a gang of four member, yes, this case definitely helps achieve the sort of familiarity we're aiming for. From my perspective, it's a no-brainer. I was a charter member of the purity camp whose position I see many of the posters to this case arguing. But that kind of purity is a luxury we can no longer afford. -- Glenn
[zfs-discuss] zfs primarycache and secondarycache properties [PSARC/2008/393 FastTrack timeout 06/27/2008]
eric kustarz wrote: On Jun 25, 2008, at 12:53 PM, Darren Reed wrote: eric kustarz wrote: On Jun 25, 2008, at 12:02 PM, Darren Reed wrote: eric kustarz wrote: On Jun 25, 2008, at 11:49 AM, Darren Reed wrote: This would seem to be a significant use case for the model of having non-overlapping data types in each of the two caches. Since no reply was received on zfs-discuss, I'm redirecting it to psarc to indicate that this question isn't closed. I see some comments, but no direct question. So what is the question? If the primary and secondary cache are different media, especially in the case of one being non-volatile, shouldn't it be possible to allow the user to specify that they want to use the non-volatile cache for metadata without requiring them to forgo caching user data in a volatile cache? Sure: # zfs set primarycache=all tank/fs # zfs set secondarycache=metadata tank/fs ARC (server memory) is the primary cache, L2ARC (SSD) is the secondary cache. eric Oh. Are you saying that because metadata is directly specified to be cached in one place, it won't also be cached in the other? The case didn't make that behaviour clear, if so. No - the ARC will cache both data and metadata. The L2ARC will only cache metadata. the desire would be primary=user data, secondary=metadata... Desire for what workload? You would have to *always* go to the secondary cache (or disk) for metadata in order to get to the data cached in the primary cache. I don't see a sensible use case for this - this is why we are not allowing a data only option. But we've been over this already. Ugh, brain fade... I was thinking of the caches as being in parallel rather than layered. Darren
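The property semantics described above can be modelled with a small C sketch. This is a toy model for illustration, not ZFS source: the enum, `cache_eligible` and `cache_selfcheck` are hypothetical names, and the `none` value is assumed alongside the `all` and `metadata` values shown in the thread. There is deliberately no data-only value, matching the statement that such an option is not being allowed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the property values; not ZFS source code. */
enum cache_policy { CACHE_NONE, CACHE_METADATA, CACHE_ALL };

/* Is a buffer of this kind eligible for a cache with this policy? */
bool
cache_eligible(enum cache_policy policy, bool is_metadata)
{
	switch (policy) {
	case CACHE_ALL:
		return (true);
	case CACHE_METADATA:
		return (is_metadata);
	case CACHE_NONE:
	default:
		return (false);
	}
}

/* primarycache=all, secondarycache=metadata, as in the example above. */
void
cache_selfcheck(void)
{
	assert(cache_eligible(CACHE_ALL, false));	/* ARC: user data */
	assert(cache_eligible(CACHE_ALL, true));	/* ARC: metadata */
	assert(!cache_eligible(CACHE_METADATA, false));	/* L2ARC: no data */
	assert(cache_eligible(CACHE_METADATA, true));	/* L2ARC: metadata */
}
```

With the two settings from the thread, metadata is eligible for both the ARC and the L2ARC while user data is eligible only for the ARC, which is the layered behaviour the reply describes.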
libc printf behaviour for NULL string [PSARC/2008/403 FastTrack timeout 07/02/2008]
On Wed, Jun 25, 2008 at 09:20:54AM -1000, Joseph Kowalski wrote: Jyri Virkki wrote: I read the materials and there are no exported interface changes, no imported interface changes and not even any documentation changes. Sigh,... But it's a very visible semantic change. Yes: it lets certain apps run! :) Note: no tongue in cheek.
[zfs-discuss] zfs primarycache and secondarycache properties [PSARC/2008/393 FastTrack timeout 06/27/2008]
eric kustarz wrote: On Jun 25, 2008, at 12:02 PM, Darren Reed wrote: eric kustarz wrote: On Jun 25, 2008, at 11:49 AM, Darren Reed wrote: This would seem to be a significant use case for the model of having non-overlapping data types in each of the two caches. Since no reply was received on zfs-discuss, I'm redirecting it to psarc to indicate that this question isn't closed. I see some comments, but no direct question. So what is the question? If the primary and secondary cache are different media, especially in the case of one being non-volatile, shouldn't it be possible to allow the user to specify that they want to use the non-volatile cache for metadata without requiring them to forgo caching user data in a volatile cache? Sure: # zfs set primarycache=all tank/fs # zfs set secondarycache=metadata tank/fs ARC (server memory) is the primary cache, L2ARC (SSD) is the secondary cache. eric Oh. Are you saying that because metadata is directly specified to be cached in one place, it won't also be cached in the other? The case didn't make that behaviour clear, if so. the desire would be primary=user data, secondary=metadata... Darren Darren J Moffat wrote: Darren Reed wrote: So I spent some time thinking about different directions you could build on this in the future, for example: 1) controlling the size of the ARC/L2ARC by controlling the cache size 2) specifying different backing storage for primary/secondary cache 3) having more than two levels of cache ...none of which is precluded by current efforts. With (2), if the backing storage for each cache is different and it is slower to access the secondary cache than the primary, then you may not want metadata to be stored in the secondary cache for performance reasons. As an example, you might be using NVRAM (be it flash or otherwise) for the primary cache and ordinary RAM for the secondary. 
In this case you probably don't want any metadata to be stored in the secondary cache (power failure issues) but the same may not hold for user data. But I'm probably wrong about that. I doubt you would be, the primarycache is system memory not a cache device. The secondarycache is the L2ARC devices specified with the cache vdev type to zpool so your example would be the other way around.
2008/403 [libc printf behaviour for NULL string]
Darren (several timezones removed), When you get through this thread, I need your position (as submitter) on the proposed binding of Patch. From there, I can decide what to do with this case. - thanks, - jek3
libc printf behaviour for NULL string [PSARC/2008/403 FastTrack timeout 07/02/2008]
Jyri Virkki wrote: I read the materials and there are no exported interface changes, no imported interface changes and not even any documentation changes. Sigh,... But it's a very visible semantic change. - jek3
libc printf behaviour for NULL string [PSARC/2008/403 FastTrack timeout 07/02/2008]
Nicolas Williams wrote: On Wed, Jun 25, 2008 at 09:20:54AM -1000, Joseph Kowalski wrote: Jyri Virkki wrote: I read the materials and there are no exported interface changes, no imported interface changes and not even any documentation changes. Sigh,... But it's a very visible semantic change. Yes: it lets certain apps run! :) Note: no tongue in cheek. About the fast track length and protocol: I picked Nico's message to reply to for no particular reason except that the header should ensure it is recorded properly (note: use the ARC and case as part of the header). This thread is also getting a bit long. Darren, is it time to make it a discussion at PSARC rather than an email trail? About the case: It seems to me this is about being bug compatible with other implementations. This one doesn't seem particularly offensive. WRT compatibility, is this more a gang of four issue as to whether this is the familiarity we want? -- Rick Matthews, Sun Microsystems, Inc., 1270 Eagan Industrial Road, Suite 160, Eagan, MN 55121-1231 USA; email: Rick.Matthews at sun.com; phone: +1(651) 554-1518; phone (internal): 54418; fax: +1(651) 554-1540; main: +1(651) 554-1500
[zfs-discuss] zfs primarycache and secondarycache properties [PSARC/2008/393 FastTrack timeout 06/27/2008]
eric kustarz wrote: On Jun 25, 2008, at 11:49 AM, Darren Reed wrote: This would seem to be a significant use case for the model of having non-overlapping data types in each of the two caches. Since no reply was received on zfs-discuss, I'm redirecting it to psarc to indicate that this question isn't closed. I see some comments, but no direct question. So what is the question? If the primary and secondary cache are different media, especially in the case of one being non-volatile, shouldn't it be possible to allow the user to specify that they want to use the non-volatile cache for metadata without requiring them to forgo caching user data in a volatile cache? Darren Darren J Moffat wrote: Darren Reed wrote: So I spent some time thinking about different directions you could build on this in the future, for example: 1) controlling the size of the ARC/L2ARC by controlling the cache size 2) specifying different backing storage for primary/secondary cache 3) having more than two levels of cache ...none of which is precluded by current efforts. With (2), if the backing storage for each cache is different and it is slower to access the secondary cache than the primary, then you may not want metadata to be stored in the secondary cache for performance reasons. As an example, you might be using NVRAM (be it flash or otherwise) for the primary cache and ordinary RAM for the secondary. In this case you probably don't want any metadata to be stored in the secondary cache (power failure issues) but the same may not hold for user data. But I'm probably wrong about that. I doubt you would be, the primarycache is system memory not a cache device. The secondarycache is the L2ARC devices specified with the cache vdev type to zpool so your example would be the other way around.
libc printf behaviour for NULL string [PSARC/2008/403 FastTrack timeout 07/02/2008]
Another ARC gone wild thread? Can we keep the lessons of the past months in mind? I read the materials and there are no exported interface changes, no imported interface changes and not even any documentation changes. Only an implementation change to something formally defined as undefined, so while your code reviewers should have something to say if the implementation chooses to, say, reboot the system, code reviews are not in scope for ARC. So there's actually nothing for ARC to review here.. why file this case? My vote is you close it approved automatic and go fix the bug already. Everything below is just me adding to the noise, so ignore for this case purposes. James Carlson wrote: An application that's incautious with NULL can't possibly just make that mistake with printf alone, can it? They're not being incautious with NULLs, they (C developers) do it because printf is known and documented to handle it. Oh, not on OpenSolaris? Too bad for us, nobody cares. A great way to make people avoid adopting OpenSolaris is to make sure the apps they run successfully everywhere else crash only on OpenSolaris. GNU printf is documented to print '(null)', so no big surprise developers rely on documented behavior. If you accidentally pass a null pointer as the argument for a `%s' conversion, the GNU library prints it as `(null)'. We think this is more useful than crashing. http://www.gnu.org/software/libtool/manual/libc/Other-Output-Conversions.html (The text goes on to say But it's not good practice to pass a null argument intentionally but in true human/developer nature, people don't pay attention to that. Once the behavior has been promised and implemented, people will use it.) Garrett D'Amore wrote: Is the next step really to start checking for null arguments to other string functions? What about null pointers passed to other library routines, such as free(), qsort(), bsearch()? I didn't see Darren propose that so not this case. 
But if you'd like to go research all those functions to see if there are some other areas where there is a serious disconnect between the de facto industry standards and the OpenSolaris implementation, hurting OpenSolaris adoption, it would be useful info to share later. -- Jyri J. Virkki - jyri.virkki at sun.com - Sun Microsystems