[2] FOSS case: Yersinia Layer 2 Attack Tool [PSARC/2009/643 FastTrack timeout 12/01/2009]
The case should have used the term "multi-protocol packet generation" instead of "attack tool". At this point I'm withdrawing the case, and will consult with the project team on whether to submit it back as a full case or deliver it under /contrib, as was suggested here and off-line.

Kais

On 11/25/09 07:06, John Fischer wrote:

Kais,

Although this case might be a familiarity case, I think it fails the non-controversial condition for a fast track. Garrett and Darren have already questioned it, raising the controversial question. I also question the need to supply this in our repositories. I would suggest that it be a full case.

John
[2] FOSS case: Yersinia Layer 2 Attack Tool [PSARC/2009/643 FastTrack timeout 12/01/2009]
FOSS questionnaire and draft man page have been placed in the case directory. Kais.
FOSS case: Yersinia Layer 2 Attack Tool [PSARC/2009/643 FastTrack timeout 12/01/2009]
Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems

1. Introduction
    1.1. Project/Component Working Name:
         FOSS case: Yersinia Layer 2 Attack Tool
    1.2. Name of Document Author/Supplier:
         Author: Si-wei Liu
    1.3  Date of This Document:
         24 November, 2009

2. Project Summary
    2.1 Project Description
        This project introduces the package of yersinia 0.7.1 into the SFW consolidation.

4. Technical Description

    Yersinia implements several attacks for the following protocols: Spanning Tree (STP), Cisco Discovery (CDP), Dynamic Host Configuration (DHCP), Hot Standby Router (HSRP), Dynamic Trunking (DTP), 802.1q, Inter-Switch Link Protocol (ISL), and VLAN Trunking (VTP).

    It helps the pen-tester in different tasks, such as becoming the root role in the Spanning Tree, creating virtual CDP neighbors, setting up rogue DHCP servers, becoming the active router in a HSRP scenario, enabling trunk, performing ARP spoofing over VLAN hopping, adding or deleting VLANs (via VTP), and more.

    yersinia is quite portable and runs on a variety of platforms.

    Command name    Notes
    ============    =====
    yersinia        Penetration testing tool for layer 2 attacks

5. Interfaces

    Exported interface               Classification   Interface type
    ==================               ==============   ==============
    SUNWyersinia                     Uncommitted      Package name
    /usr/bin/yersinia                Uncommitted      Command
    /usr/share/man/man8/yersinia.8   Uncommitted      Manpage

    Imported interface               Classification   Interface type
    ==================               ==============   ==============
    /usr/lib/libnet.so.1.1.2.1       Volatile         Library provided by SUNWlibnet

    Yersinia does not use any environment variable.

    draft man page and FOSS questionnaire to follow

6. Resources and Schedule:
    6.4. Product Approval Committee requested information:
        6.4.1. Consolidation or Component Name: SFW
    6.5. ARC review type: FastTrack
    6.6. ARC Exposure: open

6. Resources and Schedule
    6.4. Steering Committee requested information
        6.4.1. Consolidation C-team Name: sfw
    6.5. ARC review type: FastTrack
    6.6. ARC Exposure: open
Public GLDv3 Interfaces [PSARC/2009/638 FastTrack timeout 11/26/2009]
+1

Kais

On 11/20/09 11:25, Nicolas Droux wrote:

Seb,

My intent is to include this entry point as part of the GLDv3 APIs being committed. It is documented in the mac(9F) man page draft [1], but I did not list it in the overview. I will update the spec to list it there as well.

Nicolas.

[1] Available in the materials for the case, see http://arc.opensolaris.org/caselog/PSARC/2009/638/materials/man/mac-9f.txt

Sebastien Roy wrote:

I haven't reviewed the materials fully yet, but a quick string search of the spec doesn't turn up any references to mac_init_ops(), and this function must be called in drivers' _init() routines. This may have been an oversight.

-Seb
Network Auto-Magic (NWAM) Phase 1 Updates [PSARC/2009/577 FastTrack timeout 10/29/2009]
On 10/22/09 10:40, Sebastien Roy wrote:

11. Add default-route properties to IP Interface NCUs

Two new properties, ipv4-default-route and ipv6-default-route, allow the user to specify statically configured default router address(es), to be associated with a specific interface. This provides a static alternative to a DHCP-specified default router, which may be associated with an interface if DHCP is in use.

Is this an alternative to the DHCP-specified default router (returned by dhcpinfo, I presume), or in addition to it, as a second default router?

17. Change upgrade behavior

Upon upgrade, earlier nwam link and interface configuration will be imported into the User NCP. However, the Automatic NCP will be active by default. The rationale for this change is that the default config implemented in earlier nwam versions is the same as the Automatic NCP behavior, and we expect that most users will not have made changes, and therefore will want the Automatic NCP. The previously discussed change with respect to automatic addition/removal of inserted/removed links makes this especially desirable. Users who actually modified their earlier configuration (which should be a small minority) can switch to the User NCP to get their changes.

There is one exception: if any static addresses are specified in the llp file, it is very clear that the user did in fact modify that file; therefore, if a static address is found, the User NCP will be active upon upgrade.

Location profiles did not exist in earlier versions of NWAM, so any configuration that NWAM does based on Location specifications may overwrite previous system configuration. On upgrade, the existing configuration will be saved into a User location. This location will be activated if it includes an nsswitch.conf file which uses a nameservice other than DNS (i.e. a nameservice that cannot be configured by NWAM based solely on information obtained from the network).

You mean other than DNS *and* files?

Kais
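The upgrade rules in item 17 read as a small decision procedure. Below is a minimal sketch in Python, assuming two hypothetical inputs: a flag saying whether the llp file contains a static address, and the list of nameservices found in the saved nsswitch.conf. The class, function and parameter names are illustrative only and are not part of NWAM. The `exempt` parameter exists to show why the DNS-versus-files question matters: with only DNS exempt, an nsswitch.conf that lists "files" would always activate the User location.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LegacyConfig:
    """Hypothetical summary of the pre-upgrade state (names are illustrative)."""
    llp_has_static_address: bool = False
    # Nameservices listed in the saved nsswitch.conf, e.g. ["files", "dns"].
    nameservices: List[str] = field(default_factory=list)


def active_ncp_after_upgrade(cfg: LegacyConfig) -> str:
    """Pick the NCP that is active after upgrade.

    The earlier configuration is always imported into the User NCP; a static
    address in the llp file is treated as proof that the user edited it.
    """
    return "User" if cfg.llp_has_static_address else "Automatic"


def activate_user_location(cfg: LegacyConfig,
                           exempt: Tuple[str, ...] = ("dns",)) -> bool:
    """Decide whether the saved User location is activated.

    It is activated when nsswitch.conf uses any nameservice outside `exempt`.
    The default exempts only DNS, as the proposal text reads; whether "files"
    belongs in that set too is the open question in this thread.
    """
    return any(ns not in exempt for ns in cfg.nameservices)
```

With `exempt=("dns", "files")`, a stock "files dns" configuration no longer triggers the User location, while a configuration that adds NIS still does.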
OVF Support in virt-convert [PSARC/2009/548 FastTrack timeout 10/16/2009]
On 10/19/09 07:12, Sebastien Roy wrote:

Kais, Are you satisfied with Susan's answers to your questions?

Almost there.

On 10/12/09 11:17, Susan Kamm-Worrell wrote:

The open source virt-convert import does not yet support the TAR (ova) format. It does support an input of a directory that contains the OVF package files or an input of the ovf file directly. If specifying the ovf file directly the ovf file will describe the other files required by the OVF package.

OK. Could you give an example or list in the text of the draft man page what the content of such a dir looks like? Can you clarify if there are files ignored in the package content dir while importing (I'm thinking about the cert files for integrity checks)?

Kais.
OpenSSL RSA keys by reference in PKCS#11 keystores through the PKCS11 engine [PSARC/2009/555 FastTrack timeout 10/20/2009]
On 10/19/09 01:21, Darren J Moffat wrote:

While these are all good points we (Solaris) don't own the documentation for these APIs and we didn't design them. These are OpenSSL APIs that are documented in OpenSSL documentation and we don't modify those docs.

Is there any doc that comes from Sun where we could capture these gotchas that a developer will encounter?

Kais.

What you have said is true regardless of which OpenSSL ENGINE is in use and isn't unique to the Solaris provided pkcs11 engine.
OpenSSL RSA keys by reference in PKCS#11 keystores through the PKCS11 engine [PSARC/2009/555 FastTrack timeout 10/20/2009]
I may add a note to our openssl(5) draft change that the high-level API must be used for that, and can add an example of a few such functions so that a user can get the picture. Is that OK?

Sounds good. +1.

Kais

thanks, Jan.
OpenSSL RSA keys by reference in PKCS#11 keystores through the PKCS11 engine [PSARC/2009/555 FastTrack timeout 10/20/2009]
+0.75

A couple of questions below.

+ OpenSSL can access RSA keys in PKCS#11 keystores using the
+ following functions of the ENGINE API:
+
+     EVP_PKEY *ENGINE_load_private_key(ENGINE *e,
+         const char *key_id, UI_METHOD *ui_method,
+         void *callback_data)
+
+     EVP_PKEY *ENGINE_load_public_key(ENGINE *e,
+         const char *key_id, UI_METHOD *ui_method,
+         void *callback_data)

Given the semantics described in the case, these functions can fail for multiple reasons: bad argument, key not found, bad internal state (the engine hasn't been initialized or hasn't authenticated to the token). Yet the return value can only be NULL (failure) or non-NULL (a matching key was retrieved). It would be more helpful to give app developers some information about the reason for the failure, so that they know what to do when the load function returns NULL.

Possibly missing:

1. Need to mention somewhere that the caller of the load functions is responsible for calling EVP_PKEY_free().

2. Since the private parts of the on-token keys are never read by the engine, there is an implication for all OpenSSL access routines, like EVP_PKEY_copy_parameters(), EVP_PKEY_get1_RSA(), etc. They will all fail when the pkey arg comes from a token. Rather than chasing the dozens of functions that use RSA private keys in OpenSSL, maybe it suffices to document that EVP_Decrypt() and EVP_PKEY_free() are the only routines that can use an RSA private key by reference.

Kais.
Pass-through iconv code conversion [PSARC/2009/561 FastTrack timeout 10/21/2009]
+1 Kais
OVF Support in virt-convert [PSARC/2009/548 FastTrack timeout 10/16/2009]
- Scope of this case:
  Typically, you don't get a naked .ovf file. You get an OVF package, which is a tarball of the content (the actual vmdk's of a disk image + a possible .iso if needed for startup, an optional signing cert ...) along with the .ovf that describes the metadata for configuring the VM. The package is what VMware's OVF tool exports and imports. It seems to me that stopping at importing and producing only the .ovf falls short of delivering a complete answer and leaves the user on his/her own to go assemble the needed parts.

- Interoperability:
  OVF defines 3 levels of conformance, depending on the attributes and optional extensions implemented. What is the conformance level of OVF files produced/exported by virt-convert? On the import side, what level is understood by this implementation?

- Evolution:
  Any tying to a particular version of the format?

Kais.

On 10/09/09 12:57, Sebastien Roy wrote:

The Open Virtualization Format (OVF) is the latest industry wide format used to move guest VMs between different v12n platforms. In following the upstream virt-install project, virt-convert has been enhanced to auto-detect the OVF format as an input type or to have it explicitly set on use as follows:

    usage: virt-convert -i ovf inputdir|input.vmx|input.ovf [outputdir|output.xml]

The manpage has also been updated to reflect the support of the OVF format (see the man page in the materials directory).

3. Input/output formats

virt-convert now supports both .vmx and OVF format input files.

4. Interface table

An additional input format is listed in the interface table describing the still changing OVF specification.

    virt-convert command line      Uncommitted
    virt-convert output            Not-an-interface
    virt-instance output format    Uncommitted
    VMX input format               Volatile
    OVF input format               Volatile

5. References

PSARC/2008/579 virt-convert
flowadm(1m) remote_port flow attribute [PSARC/2009/488 FastTrack timeout 09/21/2009]
This case has its +1 and timed out yesterday. Marking it closed. Kais.
Dynamic Ring Grouping on NICs [PSARC/2009/501 FastTrack timeout 09/25/2009]
Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems

1. Introduction
    1.1. Project/Component Working Name:
         Dynamic Ring Grouping on NICs
    1.2. Name of Document Author/Supplier:
         Author: Venu Iyer
    1.3  Date of This Document:
         18 September, 2009

4. Technical Description

I'm filing this fasttrack for Venu Iyer. The release binding is patch. The interface taxonomy is Uncommitted.

Background
==========

Project Crossbow (PSARC/2006/357) enables creating hardware-based MAC clients (some of these MAC clients are data links such as VNICs) both on the RX and TX side. We define hardware-based MAC clients as having dedicated hardware resources; an RX hardware-based MAC client will have one or more RX rings for exclusive use, while a TX hardware-based MAC client will have one or more TX rings for exclusive use.

MAC clients that are not hardware-based (RX or TX) share hardware resources with other MAC clients; such MAC clients will not have any TX or RX rings exclusively reserved for them. MAC clients may be hardware-based on RX, but not on TX (and vice-versa).

Currently, when a NIC registers with MAC it informs MAC if it supports dedicated hardware RX or TX rings. MAC assigns hardware rings to MAC clients as groups, where a group may contain 1 or more hardware rings.

dladm show-phys is currently used to show how RX rings are used by MAC clients.

    # dladm show-phys -H nxge4
    LINK    GROUP  GROUPTYPE  RINGS  CLIENTS
    nxge4   0      RX         3      nxge4
    nxge4   1      RX         1      vnic1

which says we have 1 RX hardware-based MAC client, vnic1, with 1 ring. nxge4, the primary MAC client, is using 3 rings, but will share these with any other MAC client that is subsequently created on the data link nxge4 (i.e. if vnic2 is created on nxge4, MAC clients vnic2 and nxge4 will share group 0, and hence the 3 rings), e.g.:

    # dladm show-phys -H nxge4
    LINK    GROUP  GROUPTYPE  RINGS  CLIENTS
    nxge4   0      RX         3      nxge4,vnic2
    nxge4   1      RX         1      vnic1

Information about TX rings is not shown by the show-phys subcommand.
Today, an administrator can specify that a VNIC must be hardware-based on the RX side (using the -H option to dladm create-vnic). However, there is no way for an administrator to specify

  o that a MAC client (VNIC or primary MAC client) should be software-based, i.e. should not have any dedicated hardware resource,
  o that a MAC client should be hardware- or software-based on TX,
  o the number of RX or TX rings needed for a MAC client.

Proposal

This proposal gives administrative control over whether a MAC client should be hardware-based or not (RX and TX) and also allows administrators to specify the number of RX or TX rings that a MAC client needs, if it is hardware-based. We introduce two properties for a link:

    rxringcnt: The number of RX rings needed.
    txringcnt: The number of TX rings needed.

The values for these properties could be:

    0     : This link must not be assigned any hardware rings of the specified type.
    x > 0 : This link needs x rings.

If the property is not specified for a link, the system will attempt to maximize the hardware resource utilization by making this MAC client hardware-based depending on ring availability.

E.g.:

    # dladm create-vnic -p rxringcnt=0 -l nxge0 vnic1

Will create vnic1 which will not be RX hardware-based.

    # dladm create-vnic -p txringcnt=2 -l nxge0 vnic2

Will create vnic2 that will be TX hardware-based with 2 TX rings.

    # dladm create-vnic -p rxringcnt=2,txringcnt=2 -l nxge0 vnic3

Will create vnic3 which will be both RX and TX hardware-based with 2 RX and 2 TX rings respectively.

Modifying the RX or TX rings assigned to an existing link, say nxge0, can be done using set-linkprop, e.g. if nxge0 needs to be given 2 RX rings:

    # dladm set-linkprop -p rxringcnt=2 nxge0

or for a VNIC, say vnic1, as:

    # dladm set-linkprop -p txringcnt=2 vnic1

The rings assigned to a link can be viewed using show-linkprop as:

    # dladm show-linkprop nxge0
    LINK    PROPERTY    PERM  VALUE  DEFAULT  POSSIBLE
    ...
    nxge0   rxringcnt   rw    2      --       0-4
    nxge0   txringcnt   rw    5      --       0-6
    ...
These new properties obsolete the -H option of dladm create-vnic (i.e. the -H option will be removed). Given that we allow specifying RX and TX rings for links, we need a way to display how many rings are available for use. Additionally, we need to provide the number of hardware-based MAC clients that can be created on the RX and TX side. We introduce 4 additional read-only properties to display this information: rxringavailcnt: The total number of RX rings available for use, i.e. not exclusively given to any MAC client.
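The three cases for rxringcnt and txringcnt (unset, 0, and x > 0) can be modelled in a few lines. This is a minimal sketch in Python, not the MAC layer's allocator: the function names are hypothetical, and two behaviours are assumptions because the proposal does not state them, namely that an explicit request larger than the free ring count is rejected, and that an unset property receives a single ring when one is free.

```python
from typing import Optional


def classify_ring_request(ringcnt: Optional[int]) -> str:
    """Map a rxringcnt/txringcnt value to the behaviour the proposal describes."""
    if ringcnt is None:
        return "auto"        # property not specified: system decides
    if ringcnt < 0:
        raise ValueError("ring count cannot be negative")
    return "software" if ringcnt == 0 else "hardware"


def grant_rings(available: int, ringcnt: Optional[int]) -> int:
    """Return how many rings are dedicated to the MAC client.

    Assumptions (not stated in the proposal): an explicit request that cannot
    be met raises, and "auto" takes one ring if any is free.
    """
    mode = classify_ring_request(ringcnt)
    if mode == "software":
        return 0
    if mode == "auto":
        return 1 if available > 0 else 0
    if ringcnt > available:
        raise ValueError("not enough free rings for this request")
    return ringcnt
```

For instance, `grant_rings(4, 2)` dedicates two rings, while `grant_rings(4, 0)` keeps the client software-based regardless of how many rings are free.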
IP_DONTFRAG socket option [PSARC/2009/494 FastTrack timeout 09/23/2009]
+1

Kais

On 09/16/09 09:03, Sebastien Roy wrote:

I'm submitting this fast-track for Erik Nordmark, it times out on 09/23/2009. The release binding is Patch.

Background:
-----------

Busy DNS servers, and other servers that do UDP request/response protocols, typically want to avoid Path MTU discovery, since path MTU discovery both adds latency (a packet would be dropped by routers instead of being fragmented and forwarded) and adds state on the server (Path MTU state would be created for the destination IP address). That is counterproductive when there are lots of clients that do a single UDP request/response to the server.

Details:
--------

For IPv6 we have had two socket options to control this (from RFC 3542): IPV6_USE_MIN_MTU and IPV6_DONTFRAG. But there is no standard for IPv4. However, FreeBSD implements an IP_DONTFRAG socket option, which is used by the BIND DNS server software. This case is to introduce IP_DONTFRAG in Solaris.

Exported Interfaces

    Interface      Classification   Comments
    -----------    --------------   --------
    IP_DONTFRAG    Committed        ip(7P)

Man page updates:

Add this text to ip(7P) after IP_TOS:

    IP_DONTFRAG    If enabled (the default) then the Don't Fragment flag is set on IP packets. Disabling the option means that Don't Fragment will not be set which will result in not creating any Path MTU state due to this socket.
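A UDP server that wants to opt out of Path MTU discovery would clear the option on its socket. The sketch below is in Python rather than C and makes one loud assumption: that the platform's Python build exposes an `IP_DONTFRAG` constant in the `socket` module. Many builds do not, so the helper (a hypothetical name) looks the constant up defensively and reports failure instead of guessing a numeric value.

```python
import socket


def clear_dont_fragment(sock: socket.socket) -> bool:
    """Try to disable IP_DONTFRAG on an IPv4 socket.

    Returns True if the option was cleared, False if this platform or Python
    build does not expose or accept IP_DONTFRAG (an assumption checked at
    run time, not a guarantee).
    """
    opt = getattr(socket, "IP_DONTFRAG", None)  # not defined everywhere
    if opt is None:
        return False
    try:
        # Per the ip(7P) text: disabling means Don't Fragment is not set,
        # so no Path MTU state is created because of this socket.
        sock.setsockopt(socket.IPPROTO_IP, opt, 0)
    except OSError:
        return False
    return True


with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    cleared = clear_dont_fragment(s)
```

A server would call the helper once, right after creating the socket and before binding, and fall back to its default behaviour when the helper returns False.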
netstat -r flags for blackhole and reject routes [PSARC/2009/495 FastTrack timeout 09/23/2009]
+1

Kais.

On 09/16/09 09:10, Sebastien Roy wrote:

I'm submitting this fast-track for Erik Nordmark. It times out on 09/23/2009. The release binding is Minor due to the change in semantics of the netstat -r 'B' route flag.
flowadm(1m) remote_port flow attribute [PSARC/2009/488 FastTrack timeout 09/21/2009]
On 09/15/09 07:16, Garrett D'Amore wrote:

+1

I guess this is mainly useful for sites that reuse the same connection (or reconnect on the same port) repeatedly, e.g. for making offsite backups or somesuch?

yep.

Kais.

-- Garrett
flowadm(1m) remote_port flow attribute [PSARC/2009/488 FastTrack timeout 09/21/2009]
Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems

1. Introduction
    1.1. Project/Component Working Name:
         flowadm(1m) remote_port flow attribute
    1.2. Name of Document Author/Supplier:
         Author: Kais Belgaied
    1.3  Date of This Document:
         14 September, 2009

4. Technical Description

flowadm(1m) currently offers only the specification of the local port with TCP and UDP transports. That addressed the needs for expressing bandwidth and priority constraints for services, identified by the transport and the local port number.

Support for transport + remote port is needed to allow the creation of flows that describe and regulate outbound communication to remote services.

This case adds a new flow attribute remote_port to flowadm(1m) with the same restrictions and interface taxonomy as the existing flow attributes.

The draft man page will be placed in the case directory.

The release binding is patch.

6. Resources and Schedule
    6.4. Steering Committee requested information
        6.4.1. Consolidation C-team Name: ON
    6.5. ARC review type: FastTrack
    6.6. ARC Exposure: open
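The difference between a local-port flow and a remote-port flow is easiest to see with a toy classifier. This is a minimal sketch in Python, assuming hypothetical `Flow` and `Packet` records; it is not how the flowadm classifier is implemented. Unset attributes act as wildcards here, and whether local_port and remote_port may be combined in one flow is not stated in the case, so the examples set only one of them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow description; None means the attribute is not set."""
    name: str
    transport: str                      # "tcp" or "udp"
    local_port: Optional[int] = None
    remote_port: Optional[int] = None


@dataclass(frozen=True)
class Packet:
    """Ports as seen from the local host, whatever the packet's direction."""
    transport: str
    local_port: int
    remote_port: int


def matches(flow: Flow, pkt: Packet) -> bool:
    """Return True when every attribute the flow sets equals the packet's."""
    if flow.transport != pkt.transport:
        return False
    if flow.local_port is not None and flow.local_port != pkt.local_port:
        return False
    if flow.remote_port is not None and flow.remote_port != pkt.remote_port:
        return False
    return True
```

A local-port flow such as `Flow("httpflow", "tcp", local_port=80)` describes a service hosted on this machine, while `Flow("backupflow", "tcp", remote_port=873)` describes outbound traffic to a remote service, whatever ephemeral local port the connection happens to use.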
Opinion for PSARC review - PSARC/2009/364 dlstat and flowstat
On 09/04/09 08:04, Sebastien Roy wrote:

On Wed, 2009-09-02 at 12:10 -0700, Kais Belgaied wrote:

3. Interfaces

Exported Interfaces

    Interface Name        Classification   Comments
    ------------------    --------------   --------
    /usr/sbin/dlstat      Committed        SUNWcsu
    /usr/sbin/flowstat    Committed        SUNWcnetr

The newly Obsolete interface should be listed there, no?

yes I added

    dladm show-* -s         Obsolete    See draft man pages.
    flowadm show-flow -s    Obsolete

to the text.

thanks,
Kais.
Opinion for PSARC review - PSARC/2009/364 dlstat and flowstat
The project's modified doc per requested spec update and off-line editorial comments is in the finals.materials of the case directory. Attached is the opinion for PSARC review by 09/09/09.

Kais.
http://blogs.sun.com/kais

Name: opinion.ascii
URL: http://mail.opensolaris.org/pipermail/opensolaris-arc/attachments/20090902/9d1d1a0d/attachment.ksh
Opinion for PSARC review - PSARC/2009/436 Anti-spoofing Link Protection
The project's modified doc per requested spec updates is in the finals.materials of the case directory. Attached is the opinion for PSARC review by 09/03/2009.

Kais.
http://blogs.sun.com/kais

Name: opinion.ascii
URL: http://mail.opensolaris.org/pipermail/opensolaris-arc/attachments/20090826/0afbe354/attachment.ksh
pool dladm link property [PSARC/2009/448 FastTrack timeout 08/25/2009]
On 08/19/09 01:58, Darren J Moffat wrote:

Looks perfectly reasonable and has the user interface I'd expect so +1 from me.

+1 from me too

Kais.
PSARC/2009/374 libxmlsec
The libxmlsec source code tarball will be in the SFW gate and use a build harness similar to other libxml2 libraries. It will be configured and compiled with only its libxmlsec-openssl module to support OpenSSL as the underlying encryption library. The libxmlsec-openssl crypto module is libxmlsec's default module, is MIT licensed and can make use of Sun's OpenSSL crypto engine to use the Userland Encryption Framework. Legal approval for this usage is covered by OSR 7806.

A follow-up ARC case may be filed after RFE#6479874 integrates in our OpenSSL implementation to improve crypto engine usage. A future ARC case could also switch us from using the OpenSSL module to a new module with more direct access to the crypto framework. Such a module would first need to be integrated in the community project.

So, any dependency on a particular version of OpenSSL's lib{crypto,ssl}? Doesn't this case require an ARC contract against PSARC/2003/500 for the import of OpenSSL?

Kais
PSARC/2009/374 libxmlsec
On 07/08/09 13:36, Nicolas Williams wrote:
On Wed, Jul 08, 2009 at 04:24:10PM -0400, Will Young wrote:
On Wed, 2009-07-08 at 15:19 -0500, Nicolas Williams wrote:
On Wed, Jul 08, 2009 at 04:06:53PM -0400, Will Young wrote:
On Wed, 2009-07-08 at 12:11 -0700, Kais Belgaied wrote:

so, any dependency on a particular version of OpenSSL's lib{crypto,ssl}? doesn't this case require an ARC contract against PSARC/2003/500 for the import of OpenSSL?

I no longer saw a need for a contract as of the integration of:

    6806387 Move OpenSSL from ON to SFW

The move alone could not imply a change of interface stability. PSARC/2006/555 (Move OpenSSL to /usr) did not change the interface stability of any part of OpenSSL (see section 4.5 of the one-pager).

It's my understanding that within one gate no contract is needed (updating any element of the gate one is responsible for examining/updating related elements.) Is that not accurate?

Well, looking back at PSARC/2003/500, it exports the interfaces as Project Private, not Consolidation Private. So the contract would still be needed. Alternatively, the supplier of PSARC/2003/500 can judge if it is the right thing for the openssl libs' visibility to be upgraded to Consolidation Private.

Kais.

Ah, sorry, another thinko on my part.
PSARC/2009/374 libxmlsec
On 07/08/09 13:48, Garrett D'Amore wrote:

Actually, it depends. A contract would be required if the interfaces are Project Private, even within the same consolidation. If the interfaces are Consolidation Private, then yes, no contract would be required.

Actually, with that last statement, it's clear that moving a subsystem which has Consolidation Private interfaces to a new subsystem has *ARCHITECTURAL* impact. With that in mind, I hope such moves are properly reviewed at ARC. (Something for folks doing such moves to consider...)

yep. Also, I haven't seen an answer to my other question about dependency on a particular version of openssl libs.

Kais

- Garrett
Crossbow Import of Interrupt Affinity Interfaces [PSARC/2009/382 Self Review]
Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems

1. Introduction
    1.1. Project/Component Working Name:
         Crossbow Import of Interrupt Affinity Interfaces
    1.2. Name of Document Author/Supplier:
         Author: Rajagopal Kunhappan
    1.3  Date of This Document:
         07 July, 2009

4. Technical Description

This fasttrack covers the changes required for the Crossbow architecture to import the DDI interfaces below, introduced by PSARC 2009/340. The interface taxonomy is Contracted Project Private. A copy of the contract is placed in this case's directory.

Since the architectural impact of the interrupt affinity DDI interfaces and their consumers has already been reviewed and approved with PSARC 2009/340, I am marking this case as approved, pending the manager's emails accepting the contract's terms. I'll be happy to set a timer if needed.

    typedef processorid_t ddi_intr_target_t;

    int ddi_intr_get_affinity(ddi_intr_handle_t h, ddi_intr_target_t *tgt_p);
    int ddi_intr_set_affinity(ddi_intr_handle_t h, ddi_intr_target_t tgt);

The Crossbow framework will be a consumer of these interfaces. More details on Crossbow requirements:

1) Crossbow provides a framework by which NIC resources such as Rx and Tx rings are exposed to the MAC layer. The MAC layer doles out these resources to VNICs when they get created while reserving a fixed amount for the primary NIC. CPUs, on which the processing of packets takes place, can be specified at VNIC creation time or later. If they are specified, the interrupts associated with the Rx/Tx rings need to be re-targeted to the specified CPUs. A mechanism by which a specific MSI-X interrupt can be re-targeted to a different CPU is needed. This is for the virtualization part of Crossbow.

2) For optimal performance of regular NICs (as well as VNICs), the poll thread associated with an Rx ring should be bound to the same CPU as the interrupt CPU.
So given an interrupt handle and a CPU, a mechanism is needed to re-target the interrupt to the specified CPU.

The above 2 requirements are addressed by the interfaces introduced in PSARC 2009/340.

6. Resources and Schedule
    6.4. Steering Committee requested information
        6.4.1. Consolidation C-team Name: ON
    6.5. ARC review type: Automatic
    6.6. ARC Exposure: open
inception review summary of PSARC/2009/364 - dlstat and flowstat
Below are the main architectural issues from the PSARC/2009/364 inception review, to be addressed before the commitment review.

- Use of kstats as underlying stats, rather than a new way. The project team will need to justify why kstats aren't suitable for accumulating and reporting the counters needed to be extracted by dlstat and flowstat.

- Drop the verb (show, show-history, reset) from the subcommand, replacing them with simple options. This is consistent with the rest of the {vm,io,net,..}stat commands in the system.

- Multiple issues with reset stats:
    - The loss of information accumulated since boot time may hurt the diagnosability of problems with flows and links.
    - The action needs to be privileged.
    - Suggestion: move the reset subcommand to {dl,flow}adm, with the expected effect of resetting the state of the datalink/flow, which includes the usage statistics thereof.

- Clarify the change of interface commitment level of the existing show-* statistic related subcommands of dladm and flowadm, from 'committed' to 'committed obsolete'.

- The '-o' (+ or -) option: be consistent with the rest of the commands in general (ps, dladm etc.). Use -o with the exact list of columns to be displayed.

- Before commitment, the man pages need to be updated to reflect the changes from the inception review.

Kais.
inception review summary of PSARC/2009/364 - dlstat and flowstat
Thanks Jim (and good to hear from you). I captured this issue as jdc-01 in the issues file, and Shri indicated off-line that he will answer and follow up on crossbow-discuss at opensolaris.org.

Kais.

On 06/24/09 12:09, James Carlson wrote:

Kais Belgaied writes:

- Multiple issues with reset stats:
    - The loss of information accumulated since boot time may hurt the diagnosability of problem with flows and links.
    - The action needs to be privileged.
    - suggestion: move the reset subcommand to {dl,flow}adm, with expected effect of resetting the state of the datalink/flow, which includes the usage statistics thereof.

I suggest getting rid of reset altogether. Besides being fundamentally incompatible with SNMP instrumentation, reset just isn't necessary or complete as long as you have decent delta-calculating tools. (And I'd really rather have a good way of computing deltas, especially as a non-privileged user, than having an inaccessible way to nuke kernel counters.)
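The delta approach suggested in this message needs nothing more than two snapshots of monotonically increasing counters. Here is a minimal sketch in Python; the function name is hypothetical, the counter names used with it are illustrative, and the 64-bit width is an assumption about the counters being sampled. The modulo keeps the result correct across a single counter wrap, and the kernel counters themselves are never modified.

```python
from typing import Dict


def counter_deltas(prev: Dict[str, int], curr: Dict[str, int],
                   width_bits: int = 64) -> Dict[str, int]:
    """Return per-counter increases between two snapshots.

    Counters are treated as unsigned values of `width_bits` bits that only
    ever count up, so a smaller current value means the counter wrapped once.
    Counters missing from the earlier snapshot are skipped.
    """
    modulus = 1 << width_bits
    return {
        name: (curr[name] - prev[name]) % modulus
        for name in curr
        if name in prev
    }
```

Because the tool only reads snapshots, any unprivileged user can compute "since I started watching" figures without losing the since-boot totals that a reset would destroy.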
Materials for PSARC/2009/364 (dlstat and flowstat) submitted for review on 06/24/2009
The materials have been submitted in the case directory and should be reflected on opensolaris.org shortly. The case still needs an intern, so volunteers are welcome.

Kais.
Interrupt affinity interfaces and PCITool enhancements [PSARC/2009/340 FastTrack timeout 06/17/2009]
On 06/15/09 20:38, Evan Yan wrote:

Hi Kais,

Thanks for the comments. Pcitool and the interrupt affinity interfaces use the same underlying implementation to re-target interrupts to some CPU. Any read operation will reflect the current binding status, and any write operation will override the former settings.

So, back to the example: let's say you use pcitool to bind the interrupts from a physical NIC nxge0 to cpu1, then use Crossbow's dladm to set-linkprop cpus=2 vnic1 and cpus=3 vnic2 (where vnic1 and vnic2 are built over nxge0). Will pcitool show that nxge0's interrupts are bound to CPUs 1, 2 and 3?

Kais

Thanks,
-Evan

Kais Belgaied wrote:

This case also includes the contract for Crossbow framework to use these interrupt affinity interfaces in place of existing PCITool ioctl interfaces.

If I look at this case in isolation from its expected consumers, and with pcitool as the only consumer of the CPU affinity APIs, I have no trouble sending a +1. However, when considering the overall architecture that includes both this case's deliverables as well as the changes expected imminently from its external consumers, I am unclear on how the system will behave when we use both pcitool and those consumers' interrupt settings.

I'll use the interaction with Crossbow as an example. The point is similar for the interaction with other tools (intd(1m), etc).

Say the system has a physical NIC nxge0, whose interrupts are bound to CPUs 1,2,3,4 using pcitool as modified by this case. Later one creates vnic1 over nxge0, which uses a couple of hardware rings out of nxge0's, and uses dladm set-linkprop cpus=5,6 vnic1. With the changes imported from this case, the implementation of that call of dladm will attempt to have the MSI/X interrupts assigned to the rings (thus to vnic1) bound to CPUs 5 and 6.

Will such a setting of vnic1's interrupts fail because of a conflict with a previous binding by pcitool? Will it succeed silently?
What will the call to pcitool querying the interrupt binding for nxge0 return then? CPUs 1,2,3,4 only, as set? Or will it surprisingly show 1,2,3,4,5,6?

How about the other way around? Will dladm show-linkprop vnic1 see the settings that were previously done by pcitool?

Kais.
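Taken at face value, the semantics described in the reply (one shared underlying table, reads reflect the current binding, writes override earlier ones) amount to last-writer-wins per interrupt. The Python sketch below models only that reading and is not a confirmed answer to the question; the class name, the interrupt names and the assignment of interrupts to vnic1 and vnic2 are hypothetical.

```python
from typing import Dict, Iterable, List


class InterruptTargets:
    """Toy model of one shared interrupt-to-CPU table with two writers."""

    def __init__(self) -> None:
        self._cpu: Dict[str, int] = {}

    def set_affinity(self, intr: str, cpu: int) -> None:
        # Any writer (pcitool or the DDI call behind dladm) simply overrides.
        self._cpu[intr] = cpu

    def get_affinity(self, intr: str) -> int:
        # Any reader sees the current binding, whoever set it.
        return self._cpu[intr]

    def cpus_in_use(self, intrs: Iterable[str]) -> List[int]:
        return sorted({self._cpu[i] for i in intrs})


# Hypothetical interrupts of nxge0; msix1 and msix2 are assumed to serve the
# rings given to vnic1 and vnic2 respectively.
nxge0_intrs = ["msix0", "msix1", "msix2", "msix3"]
table = InterruptTargets()
for intr in nxge0_intrs:
    table.set_affinity(intr, 1)      # pcitool binds everything to CPU 1
table.set_affinity("msix1", 2)       # dladm set-linkprop cpus=2 vnic1
table.set_affinity("msix2", 3)       # dladm set-linkprop cpus=3 vnic2
```

Under this model a later pcitool query over nxge0's interrupts would report CPUs 1, 2 and 3, and neither tool would notice that the other had changed a binding.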
Interrupt affinity interfaces and PCITool enhancements [PSARC/2009/340 FastTrack timeout 06/17/2009]
This case also includes the contract for Crossbow framework to use these interrupt affinity interfaces in place of existing PCITool ioctl interfaces.

If I look at this case in isolation from its expected consumers, and with pcitool as the only consumer of the CPU affinity APIs, I have no trouble sending a +1. However, when considering the overall architecture that includes both this case's deliverables as well as the changes expected imminently from its external consumers, I am unclear on how the system will behave when we use both pcitool and those consumers' interrupt settings.

I'll use the interaction with Crossbow as an example. The point is similar for the interaction with other tools (intd(1m), etc).

Say the system has a physical NIC nxge0, whose interrupts are bound to CPUs 1,2,3,4 using pcitool as modified by this case. Later one creates vnic1 over nxge0, which uses a couple of hardware rings out of nxge0's, and uses dladm set-linkprop cpus=5,6 vnic1. With the changes imported from this case, the implementation of that call of dladm will attempt to have the MSI/X interrupts assigned to the rings (thus to vnic1) bound to CPUs 5 and 6.

Will such a setting of vnic1's interrupts fail because of a conflict with a previous binding by pcitool? Will it succeed silently? What will the call to pcitool querying the interrupt binding for nxge0 return then? CPUs 1,2,3,4 only, as set? Or will it surprisingly show 1,2,3,4,5,6?

How about the other way around? Will dladm show-linkprop vnic1 see the settings that were previously done by pcitool?

Kais.

Constraints:

a) Set affinity limitations for certain interrupt types

Fixed or INTx interrupts could be either exclusive or sharable depending on hardware. Because there is no good way to detect that, the current implementation will refuse any set affinity requests for INTx interrupts.
On x86 platforms, multiple MSI interrupts of a single PCI function need to be rerouted together, since all MSI interrupts share the same MSI address, which in turn includes the same CPU number. Hence the current x86 implementation will refuse any set affinity requests for MSI interrupts. A future phase of this project may support MSI group retarget, similar to the PCITool method. b) CPU offline considerations CPUs may be onlined/offlined through administrative interfaces. When a CPU is offlined, all of the interrupts targeting it are re-targeted. The OS will pick any set of the surviving CPUs for re-targeting. The OS is under no obligation to maintain drivers' interrupt affinity preferences. The first phase of this project will not provide any callback on CPU online/offline events. Such callback events need to be defined in the future. If a driver or framework is interested in maintaining optimal CPU targeting, it should monitor its interrupt CPU bindings on a regular basis using ddi_intr_get_affinity(9f), or register a callback to receive various CPU specific events using register_cpu_setup_func(). Userland entities, by contrast, should subscribe to CPU DR specific sysevents. 4.5.2 PCITool Enhancements
Current syntax:
pcitool pci@unit-address -i ino=ino [ -r [ -c ] | -w cpu=CPU [ -g ] ] [ -v ] [ -q ]
Proposed syntax:
pcitool pci@unit-address -i ino# | all [ -r [ -c ] | -w cpu# [ -g ] ] [ -v ] [ -q ]
pcitool pci@unit-address -m msi# | all [ -r [ -c ] | -w cpu# [ -g ] ] [ -v ] [ -q ]
PCITool is a low-level tool which provides a facility for getting and setting interrupt routing information. This project is making some minor syntax changes to PCITool, since the current syntax is not compliant with existing userland guidelines. In addition, this project is adding a new -m option to retrieve and reroute the interrupt target CPU for MSI/Xs on SPARC platforms. On SPARC platforms, the INO is mapped to an interrupt mondo, whereas one or more MSI/Xs are mapped to an INO.
So, INO and MSI/Xs are individually retargetable. Use the -i option to retrieve or reroute a given INO, and the -m option for MSI/Xs. On x86 platforms, both INOs and MSI/Xs are mapped to the same interrupt vectors. Use the -i option to retrieve and reroute any interrupt vector (both INO and MSI/Xs). The -m option is therefore not required on x86 platforms and is not supported there. 4.6 Interfaces 4.6.1 Exported Interfaces
Interface               Stability         Comments
ddi_intr_target_t       Project Private   Interrupt target CPU
ddi_intr_get_affinity   Project           Get interrupt target CPU
sysbench [PSARC/2009/351 FastTrack timeout 06/18/2009]
+1, a quick question though: the release binding is minor; is there a dependency on future minor release features that are not available yet as patches to the current minor release of Solaris? Kais On 06/11/09 20:57, James Walker wrote: I'm sponsoring this familiarity case for Peter Rival. The requested release binding is minor. The man page has been posted in the materials directory. Tim Cook of PAE has accepted this benchmark for inclusion in Solaris and a pointer to filebench has been added to the man page. Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI This information is Copyright 2009 Sun Microsystems 1. Introduction 1.1. Project/Component Working Name: sysbench 1.2. Name of Document Author/Supplier: Author: Peter Rival 1.3 Date of This Document: 11 June, 2009 4. Technical Description Template Version: @(#)sac_nextcase %I% %G% SMI This information is Copyright 2009 Sun Microsystems 1. Introduction 1.1. Project/Component Working Name: sysbench 1.2. Name of Document Author/Supplier: Author: Frank Rival 1.3 Date of This Document: 04 May, 2009 4. Technical Description Sysbench Check List 1.0 Project Information 1.1 Name of project/component sysbench 1.2 Author of document Frank.Rival at Sun.COM 2.0 Project Summary 2.1 Project Description SysBench is a modular, cross-platform and multi-threaded benchmark tool for evaluating OS parameters that are important for a system running a database under intensive load. The idea of this benchmark suite is to quickly get an impression of system performance without setting up complex database benchmarks or even installing a database at all. Current features allow testing the following system parameters: * file I/O performance * scheduler performance * memory allocation and transfer speed * POSIX threads implementation performance * database server performance (OLTP benchmark) 2.2 Release binding What is the release binding?
(see http://opensolaris.org/os/community/arc/policies/release-taxonomy/) [ ] Major [*] Minor [ ] Patch or Micro [ ] Unknown -- ARC review required 2.3 Type of project Is this case a Linux Familiarity project? [ ] Yes [*] No 2.4 Originating Community 2.4.1 Community Name Sysbench[1] 2.4.2 Community Involvement Indicate Sun's involvement in the community [ ] Maintainer [ ] Contributor [*] Monitoring Will the project team work with the upstream community to resolve architectural issues of interest to Sun? [*] Yes [ ] No - briefly explain Will we or are we forking from the community? [ ] Yes - ARC review required prior to forking [*] No 3.0 Technical Description 3.1 Installation Sharable 3.1.1S Solaris Installation - section only required for Solaris Software (see http://opensolaris.org/os/community/arc/policies/install-locations/ for details) Does this project follow the Install Locations best practice? [*] Yes [ ] No - ARC review required Does this project install into /usr under [sbin|bin|lib|include|man|share]? [ ] Yes [*] No or N/A /usr/benchmarks/ - standard benchmark directory Does this project install into /opt? [ ] Yes - explain below [*] No or N/A Does this project install into a different directory structure? [*] Yes - ARC review required [ ] No or N/A Do any of the components of this project conflict with anything under /usr? (see http://opensolaris.org/os/community/arc/caselog/2007/047/ for details) [ ] Yes - explain below [*] No If conflicts exist then will this project install under /usr/gnu? [ ] Yes [ ] No - ARC review required [*] N/A Is this project installing into /usr/sfw? [ ] Yes - ARC review required [*] No 3.1.1W Windows Installation - section only required for Windows Software (see http://sac.sfbay/WSARC/2002/494 for details) Does this project install software into a system drive:\Program Files\Sun\product or system drive:\Sun\product directory? [ ] Yes [ ] No - ARC review required Does the project use the Windows registry? 
[ ] Yes [ ] No - ARC review required Does the project use HKEY_LOCAL_MACHINE\SOFTWARE\Sun Microsystems\product\version for the registry key? [ ] Yes [ ] No - ARC review required Is the project's stored location HKEY_LOCAL_MACHINE\SOFTWARE\Sun
Time Stamp Option for xxstat Commands Phase II [PSARC/2009/307 Self Review]
On 05/15/09 16:21, Sherry Moore wrote: I am sponsoring this case for Chad Mynhier and closing it as approved automatic as it's simply a follow-on to PSARC/2009/105 to cover more commands. Will there be more follow-ons to 2009/105 for the remaining xxstat commands (netstat, dladm show-link -s -i, flowadm show-flow -s -i, etc.)? Kais. Thanks, Sherry
bfe fast ethernet driver [PSARC/2009/242 FastTrack timeout 04/22/2009]
+1 Kais On 04/15/09 17:54, Garrett D'Amore - sun microsystems wrote: The following case is being submitted on behalf of Saurabh Mishra. It probably should qualify as automatic, but since it's a GLDv3 driver and there was some doubt about it, I'm submitting it as a fast track. Patch binding is appropriate, although a backport to Solaris 10 is probably unlikely. Note that as this driver is for a simple 10/100 part with only a single TX and a single RX ring, questions about Crossbow support, etc. are probably not terribly relevant. I believe that Saurabh *is* planning on supporting the appropriate Brussels interfaces.
PSARC/2009/232 Berkeley Packet Filter for OpenSolaris
Hi Darren, while the architecture looks sound, it has so many pieces and interactions with other subsystems (the pluggable sockets, MAC, etc.) that it goes well beyond what's suitable for a fasttrack. This should be a full case. Kais. On 04/10/09 14:01, Darren Reed wrote: This is a self sponsored fast track, timeout set for 2 weeks...

Abstract
This case seeks to build on the Crossbow (PSARC/2006/357[7]) infrastructure and provide a new (to OpenSolaris) mechanism for capturing packets: the use of the Berkeley Packet Filter (BPF). The goal of this project is to provide a method to capture packets that has higher performance than what we have to offer today on Solaris (DLPI based schemes.) It also has the added benefit of increasing our compatibility with other software that has been built to use BPF.

Release Binding
This case seeks to obtain approval for minor release binding.

Background
Packet capture on Solaris is currently built around the use of DLPI. Whilst the introduction of libdlpi (PSARC/2006/436[1]) has made it easier to program using DLPI and the IP Observability Project (PSARC/2006/475[2]) introduced the means by which packets that are local to the host could be intercepted, neither did anything to address the primary problem with DLPI: compared to other mechanisms, it is slow, its in-kernel filtering is either unused or very primitive, and it provides very little useful information about the packet capture itself by way of statistics.

Introduction
The architecture of BPF lends itself to more efficient means of doing packet capture, where a single read can transfer large numbers of packets per call. It also allows the sniffer to choose how much data from each packet it wishes to copy, be it the entire packet or just the first 128 bytes to capture headers.
Internal Architecture
Internally, the architecture of BPF is very simple: it has a lower half that receives packets from the NIC drivers, copying matching packets into a static buffer, and an upper half that implements a character pseudo-device.

Buffers
The pseudo-device, which operates as a character device, is backed by a buffer that the driver allocates for storing packet data. The size of that buffer is set by the application. By default libpcap sets it to the same size as the driver's default: 32k. The maximum this project allows is 16M. Two buffers of this size are allocated by the driver: an active buffer and a hold buffer. This supports applications doing sleeping reads, if they aren't using poll, and reading an entire buffer of data whilst the system continues to catch new packets. Applications can set the buffer size using libpcap or with the BIOCSBLEN ioctl (see man page.)

List of Interfaces
BPF maintains an internal list of network interfaces that it supports capturing packets for. What distinguishes this list from those in the mac or ip modules is that it uses the datalink type as part of the key for determining what is an identical entry. Additionally, on OpenSolaris the device structure used inside the ip module is different from that of the mac module, preventing either one from being used as a master list by BPF. Answering queries such as returning the complete list of datalink types supported by a device (BIOCGDLTLIST) would be much more complicated without that internal list.

Packet Capture
When BPF is called from the mac layer, it is handed the packet as it is received from the NIC driver, as part of the promiscuous callback handling in the mac layer. It is the same mblk_t for the packet that will later be passed on through the stack, and neither the mblk_t's nor the dblk_t's are duplicated. Thus the capturing of the packet becomes part of the execution of the datapath for each packet.
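
The active/hold buffer arrangement described under Buffers can be sketched roughly as follows. This is a toy model in Python, not the driver's code: the class and method names (DoubleBuffer, capture, read) are hypothetical, and the rotation policy (swap when the active buffer cannot fit the next packet, or when a reader finds no hold buffer) is an assumption modeled on how BPF implementations commonly behave, not something stated in the case.

```python
class DoubleBuffer:
    """Toy model of a two-buffer capture store (all names hypothetical)."""

    def __init__(self, size=32 * 1024, snaplen=128):
        self.size = size        # per-buffer capacity in bytes (32k default)
        self.snaplen = snaplen  # max bytes copied from each packet
        self.active = []        # buffer currently being filled
        self.active_len = 0
        self.hold = None        # filled buffer waiting for a reader
        self.dropped = 0        # packets lost because both buffers were busy

    def _rotate(self):
        # The active buffer becomes the hold buffer; start a fresh active one.
        self.hold, self.active, self.active_len = self.active, [], 0

    def capture(self, packet: bytes) -> bool:
        """Lower half: copy at most snaplen bytes of a matching packet."""
        data = packet[:self.snaplen]
        if self.active_len + len(data) > self.size:
            if self.hold is not None:
                # Reader has not drained the hold buffer yet: drop.
                self.dropped += 1
                return False
            self._rotate()
        self.active.append(data)
        self.active_len += len(data)
        return True

    def read(self):
        """Upper half: hand a whole buffer of packets to the reader."""
        if self.hold is None:
            self._rotate()
        packets, self.hold = self.hold, None
        return packets
```

A single read() returns every packet in the hold buffer at once, which is the property the Introduction credits for BPF's efficiency, while capture() keeps filling the other buffer in the meantime.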
Interactions with existing technology in Solaris
This section goes into detail about what impact this project has on other areas of Solaris or what impact they have on this project.

Vanity Naming
The Vanity Naming Project[6] introduced the means by which link names could be changed to be a different name than the underlying mac name. This project will only support packet capture on interfaces using the interface name allocated by the dls module that was delivered by the vanity naming project.

IP Observability
The IP observability project introduced the ability to capture packets from within IP, presenting them through device files in /dev/ipnet for libdlpi to use. This project will update some of the interfaces introduced by IP observability.

Updating IPNET
Unfortunately the mechanism used to do this is bound up within IP. To build upon the work done here, this project will change the
10G link properties [PSARC/2009/206 Self Review]
I requested more time for this case (timing out tomorrow). I'm fine with Paul Garrett's answers. +1 Kais.
libvirt 0.6 [PSARC/2009/212 FastTrack timeout 04/09/2009]
On 04/02/09 10:45, Tim Marsland wrote: I'm sponsoring the following fast-track for John Levon. Updated manpages in materials directory. +1 Kais.
2009/211 SMIT for OpenSolaris
Does this have anything to do with the April 1st today's date? Kais. On 04/01/09 08:55, James Carlson wrote: I'm just tickled pink to sponsor this request for Dan McDonald. The change looks entirely obvious to me, so I've marked it as closed approved automatic. OpenSolaris currently lacks a standard, interoperable system management tool. Fortunately, we are able to discern both the requirements for such a tool and the overall design by just looking at artifacts on other operating systems, so the architecture and top level design needed are trivial. This project provides SMIT for OpenSolaris. SMIT stands for System Modified by Invisible Things, but the project team isn't sure why. The user/administrator interfaces are: /usr/bin/smit [-C] [-D] [-m menu-entry] [-R alternate-root] /usr/bin/smitty [-D] [-m menu-entry] [-R alternate-root] /usr/bin/xsmit [-D] [-m menu-entry] [-R alternate-root] smitty is equivalent to smit -C. xsmit is an enhancement designed by the project team, and is just a symlink to smit. These commands all bring up configuration menus, allowing the user (with appropriate privileges) to modify system configuration by exec-ing commands that he could otherwise learn about via the system man pages. Menus to be delivered with the smit tool are not described in detail here, but will include: SMF FMRI management Networking Interfaces Dtrace Zones ZFS file systems and pools SMIT is a system management tool but it's located in /usr/bin on other systems, so we're placing it there on OpenSolaris as well for familiarity reasons. Other interfaces delivered by this project include: /etc/objrepos - symlink to /etc/svc/ smit.log, - droppings left in current directory smit.script A desktop link for GNOME will be provided. The icon will depict a stick figure frozen in mid-step. The release binding is Tight. The interfaces described are all Difficult. Related OpenSolaris projects may include Visual Panels. 
The SMIT project team is not in contact with that team, and doesn't expect their agreement with this project, but would like to proceed anyway.
call for email vote: 2008/772 Command Assistant
I vote to approve. Kais. On 03/19/09 12:03, James Carlson wrote: The project team has placed updated materials in the 'post-inception.materials' directory. The new materials reflect the change of direction (to implement an applet). From looking at them, it appears that we've gotten all that we need, so I'm calling for the vote by email. Members: please let me know if you're not ready to vote. Otherwise, please reply with your vote. I'm voting approve.
bwm-ng [PSARC/2009/160 FastTrack timeout 03/13/2009]
OK. +1 (in case it is needed) Kais. On 03/12/09 19:42, caijian guo - Sun Microsystems - Beijing China wrote: Kais, you misunderstood me. I meant that (1) bwm-ng can show statistics of physical links, and (2) bwm-ng can also show statistics of vnics, aggregations, and vnics over aggregations. The outputs were not empty. They were in curses format (like dladm show-link -S), so I could not copy the results into this email. You can try it too; my bwm-ng x86 executable is available at: /net/ns-x4200-22.sfbay/var/tmp/pkg/proto/root_i386/usr/bin/bwm-ng Caijian
bwm-ng [PSARC/2009/160 FastTrack timeout 03/13/2009]
Caijian, I guess you're saying that bwm-ng shows statistics about physical links only. In that case, how come the output of bwm-ng -I e1000g3 came out empty? Kais On 03/12/09 00:05, caijian guo - Sun Microsystems - Beijing China wrote: Kais, I have tested that, besides physical links, bwm-ng is able to display the statistics of vnics, aggregations, and vnics over aggregations. I used netperf to test; the throughput displayed by bwm-ng is the same as netperf's. I tested it like so:
# dladm create-vnic -l e1000g3 vnic1
# dladm create-aggr -l e1000g1 -l e1000g2 -L active aggr1
# dladm create-vnic -l aggr1 vnic2
# bwm-ng -I e1000g3
# bwm-ng -I vnic1
# bwm-ng -I vnic2
# bwm-ng -I aggr1
# bwm-ng
Caijian On 2009-03-12 07:22, Kais Belgaied wrote: For networking devices, is the command expected to display the stats of the physical links (the list reported by dladm show-phys) or the network links (the list reported by dladm show-link)? The lists may differ on Solaris when vnics or link aggregations are present. Kais.
bwm-ng [PSARC/2009/160 FastTrack timeout 03/13/2009]
For networking devices, is the command expected to display the stats of the physical links (the list reported by dladm show-phys) or the network links (the list reported by dladm show-link)? The lists may differ on Solaris when vnics or link aggregations are present. Kais. On 03/08/09 22:56, caijian guo - Sun Microsystems - Beijing China wrote: Comments: Since we have kstat, there is no need to use libstatgrab. I have tested it and have done lots of experiments. It works well using only kstat. The application does not lack any of its normal functionality; the output and the functionality are the same. Caijian On 2009-03-07 05:59, James Carlson wrote: Jim Walker writes: James Carlson wrote: bwm-ng is compiled such that it is not dependent on libstatgrab. Assuming that change means that the application lacks any of its normal functionality, I'll give this +1. I meant doesn't lack, obviously. :-/ Right. It does fine using only kstat. Sounds good; thanks.
conflict [PSARC/2009/003 FastTrack timeout 01/12/2009]
nit: the case seems to import the PATH env variable. That should be listed in the imported interfaces. Quick question about the -t type: is the expected output limited to executables too? Kais
2008/688 Sun Cluster TCP/IP Hooks Update
On 12/03/08 08:06, James Carlson wrote: I'm restarting the timer on this fast-track for Huafeng Lu and the Sun Cluster team. The changes from the last go-around include removing the version number string and adding a netstack ID and a flexible void * argument for future expansion, and the contract (contract-01) has been updated. The timer is set to 12/10/2008. What about the Sun Cluster hooks for SCTP? Kais.
2008/688 Sun Cluster TCP/IP Hooks Update
On 12/10/08 09:32, James Carlson wrote: The specification sent out for review says this: talk to external servers. Note: this proposal only handles TCP and UDP; SCTP is beyond its scope. I assume that's a future project, if SCTP is to be supported within Sun Cluster at all. (Just like IPv6, it'd likely require non-trivial changes to the Sun Cluster code to do it.) OK. +1 Kais
PSARC/2008/249 - Packet Interception for the MAC layer
There is a rather long discussion about the design and some architectural questions going on in parallel, outside this alias. I was hoping that discussion would converge before the case times out. Darren, this case needs to be put back in waiting need spec, at least until the variety of changes being proposed over the last week settle. Kais
Opinion for review: 2006/357 Crossbow - Network Virtualization and Resource Partitioning
Thanks for review Garrett. Fixed in the case directory. It now reads One of the reviewers pointed out that the metering information produced for flows and datalinks always follows the raw format intended for gnuplot. Kais. On 11/27/08 07:53, Garrett D'Amore wrote: The first sentence in section 4.2 doesn't parse properly. Otherwise it looks OK to me. -- Garrett
Opinion for review: 2006/357 Crossbow - Network Virtualization and Resource Partitioning
Attached is the opinion of the Crossbow project, submitted for PSARC review by December 2nd 2008. Since the commitment review, the project team needed to make four minor changes. The changes were motivated by internal and external feedback from early adopters and Beta customers, and by the integration with other consumers of the project interfaces. The changes do not constitute any architectural departure from the original specifications voted on; therefore, I'm including them in this opinion. In the case directory, final.materials has the exact specifications, taking into account the commitment review TCR and spec updates. The revised.materials directory there includes the four changes below. I can file a separate fasttrack to cover these changes if members believe it is necessary.
- Phased delivery of some of the resource controls. Initially, the maxbw (maximum bandwidth), priority, cpus and fanout properties were approved for flows and datalinks. Support for the cpus property for flows and the fanout property for flows and datalinks is now targeted for after the first integration. All other properties are still supported, and provide sufficient added value for the first phase.
- Add a -H option to dladm show-phys and dladm create-vnic. One of the project's added values is exposing a CLI to control the assignment of some of the NIC's hardware resources to MAC clients (i.e. VNICs). At commitment time, factory MAC addresses were the only such hardware resource. The need for controlling the assignment of Receive Rings became increasingly obvious during the implementation and Beta testing, thus the -H option.
- The integration with new features of LDOMs required the ability to allow an exclusive MAC client to set the interface's MTU, thus a new Consolidation Private MAC client function: mac_set_mtu().
- A limitation in the multi-threadedness of a major NIC device driver necessitated the addition of a flag to request the serialization of transmit operations submitted to that driver. A new flag needed to be added to the Consolidation Private MAC provider interface (MAC_VIRT_SERIALIZE flag of the mac_register_t's m_v12n). Kais. -- next part -- An embedded and charset-unspecified text was scrubbed... Name: opinion.ascii URL: http://mail.opensolaris.org/pipermail/opensolaris-arc/attachments/20081126/e0293c90/attachment.ksh
Welcome Sebastien Roy as a new PSARC member
Please join me in welcoming Sebastien Roy as a new PSARC member. Kais.
2007/272 Project Clearview: IPMP Rearchitecture (Commitment)
* Section 4.12: To improve security, the IP filter interaction has been tweaked such that once an IP interface joins a group, it is subject to any filtering rules for the associated IPMP group interface. and, conversely, when an IP interface leaves the group, it is not subject to the group's filtering rules any more, right? Kais.
Volo Interfaces Amendment [PSARC/2008/694 FastTrack timeout 11/18/2008]
Template Version: @(#)sac_nextcase %I% %G% SMI This information is Copyright 2008 Sun Microsystems 1. Introduction 1.1. Project/Component Working Name: Volo Interfaces Amendment 1.2. Name of Document Author/Supplier: Author: Rao Shoaib 1.3 Date of This Document: 11 November, 2008 4. Technical Description I am sponsoring the following fast-track for Rao Shoaib and the Volo project team. This case is a collection of minor changes to the interfaces introduced by Project Volo PSARC/2007/587. This amendment does not affect the original minor release binding. All interfaces added by this amendment are Consolidation Private. The updated full Volo design document will be placed in the case directory. Below is a summary of the changes covered by this fasttrack.
*) In the initial design a socket module writer was allowed to register its own sonodeops. This facility was deemed unnecessary and problematic. In the current design the module writer registers only one create function, which is called by the socket framework after allocating an sonode.
*) Functions used in fallback are no longer part of the public upcalls and downcalls vector. Since fallback is supported only on native protocols, it is handled via private function calls. There is no change in how fallback works.
*) To support 3rd party socket modules, two new downcalls, sd_send_uio and sd_recv_uio, have been introduced. These interfaces allow protocol writers to control how data is copied to and from the user buffer.
*) A new downcall, sd_poll, has been introduced. This downcall supports polling when the protocol is doing its own buffering.
*) To support evolution the interface is versioned. Current versions are obtained via the macros SOCK_UC_VERSION (upcall interface) and SOCK_DC_VERSION (downcall interface).
*) /etc/sock2path now supports either a module name or a device name as the fourth member of the table.
*) Two new socket options have been added: SO_SNDTIMEO and SO_RCVTIMEO. Both take a pointer to struct timeval and return EWOULDBLOCK if the timer expires. 6. Resources and Schedule 6.4. Steering Committee requested information 6.4.1. Consolidation C-team Name: on 6.5. ARC review type: FastTrack 6.6. ARC Exposure: open
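
For readers unfamiliar with the two socket options in the last item, here is a minimal sketch of how an application might use SO_RCVTIMEO. It is written in Python for brevity and is not taken from the Volo materials. Two assumptions are labeled in the code: struct timeval is packed as two native C longs, which holds on common 64-bit Unix systems but is not guaranteed everywhere, and the helper names set_recv_timeout and recv_or_none are made up for this example.

```python
import socket
import struct


def set_recv_timeout(sock, seconds, microseconds=0):
    """Arm SO_RCVTIMEO on a blocking socket (hypothetical helper)."""
    # Assumption: struct timeval is laid out as two native C longs.
    timeval = struct.pack("ll", seconds, microseconds)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, timeval)


def recv_or_none(sock, nbytes):
    """Return received bytes, or None if the receive timer expired."""
    try:
        return sock.recv(nbytes)
    except BlockingIOError:
        # Python maps EAGAIN/EWOULDBLOCK to BlockingIOError.
        return None
```

With a connected pair from socket.socketpair(), calling set_recv_timeout(a, 0, 100000) and then recv_or_none(a, 16) on an idle socket should come back with None after roughly a tenth of a second instead of blocking forever. SO_SNDTIMEO is set the same way and applies to sends.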
Integrate gbm (gnu-dbm) into Solaris [PSARC/2008/645 FastTrack timeout 10/28/2008]
On 10/29/08 06:23, Martina Tomisova wrote: Hi Rainer, I don't know about any other system which makes this special directory. I can remove the compatibility files from the package and place gdbm.h into /usr/include/ - that's no problem. There's already a /usr/include/ndbm.h shipped with Solaris. The exported interface table for this case includes a /usr/include/gdbm/ndbm.h, so it seems that a /usr/include/gdbm directory is unavoidable here. Kais Could someone else please express his opinion of this topic? Thank you and have a nice day, Martina
IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]
On 10/15/08 12:07, Ted H. Kim wrote: Kais, Kais Belgaied wrote: - Case boundary question: since this marks a flag day for both TI and CI, can you list the components that are affected by this flag day? Most of the IB modules in ON - framework: IBTL IB ULPs (TI): IPonIB, SDP, NFS/RDMA, uDAPL HCA Drivers (CI): Tavor, Hermon At the risk of restating the obvious, all changes in the above three sets of components are in scope for this case, right? - I am not clear on the consumer side of this new interface: What prompts a ULP to start using this interface? Is it expected to attempt ibt_alloc_io_mem() until it exhausts all resources? It would be easier to assess the completeness and the usefulness of the TI if you either extended the case's scope to include at least the changes on one transport consumer or gave a real example thereof. We are in the process of fixing bugs in certain IB ULPs to be good citizens in big SPARC platforms with memory DR where we have to be careful about what is in/out of the cage. The current plan for ULP usage (i.e. TI usage) is related to this motivation. - mi_ibt_version seems to be an enumeration of apparently mutually exclusive values IBTI_V{1,2,3}, yet the definition suggests a combination of independent (discrete) capabilities (FMR support, DMA wrapper support, etc.) The features are examples of what was included at each version change. More discussion of the relationship between ABI and features below ... . Is there any consumer of this interface that uses DMA wrapper but not FMR? Well, to be honest, FMR in the current form turned out to be a failure. So no one uses FMR in ON right now. But I think more generally what is going to happen is that the features will be used independently of each other, since they are generally not related to each other. . For future evolution, is the mi_ibt_version always intended to express a monotonically increasing set of capabilities (capabs of V(n+1) includes all capabs of V(n))?
Yes, that is the intent, but it is not guaranteed. However, as you might imagine, it would involve a great deal of discussion/agreement to remove anything and the ARC would be in the loop. Basically I'm trying to see if information of a different nature is being encoded in the same field. Without slipping into a design discussion, you should consider whether two fields are more appropriate: one version (number or enum) and one capabs (bitmask). There are in fact capability bitmasks elsewhere. In IB there are a number of optional features. So the bitmasks generally are for saying you have these optional features. But the version number is more an ABI thing, and it is a mistake to conflate the two, though the reason we have to change ABI is that new features demand more fields in the structs, etc. So are you fixing the mistake of conflating the capabs + version? I still see the updated material unchanged on that. Kais.
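
The two-field suggestion above (one ABI version plus one capability bitmask) can be illustrated with a small sketch. It is written in Python purely to show the shape of the idea; it is not the IBTF interface. Every name here (Capab, ModInfo, can_attach) is hypothetical, and the compatibility rule it encodes, that a client built for version n keeps working on any provider at version n or later, is the "incremental versions" semantic discussed in the thread, assumed for the example rather than confirmed by the case.

```python
from dataclasses import dataclass
from enum import IntFlag


class Capab(IntFlag):
    """Optional, independent features (hypothetical bit assignments)."""
    FMR = 0x1
    DMA_WRAPPER = 0x2
    IO_MEM = 0x4


@dataclass(frozen=True)
class ModInfo:
    abi_version: int  # struct layout / calling convention only
    capabs: Capab     # which optional features are actually present


def can_attach(client_version: int, needed: Capab, provider: ModInfo) -> bool:
    """A client attaches if the ABI is new enough and its features exist."""
    # Assumed rule: newer ABI versions stay backward compatible.
    abi_ok = provider.abi_version >= client_version
    features_ok = (provider.capabs & needed) == needed
    return abi_ok and features_ok
```

Keeping the two fields apart means a feature such as FMR can be withdrawn by clearing one bit without touching the ABI version, and a client can state exactly which features it depends on instead of inferring them from a version number.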
IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]
On 10/20/08 15:01, Ted H. Kim wrote: I am also not sure where this is leading. Are you suggesting some specific change to the case? I'm not clear on the future compatibility expectations around the interface introduced by this case: I asked whether versions are incremental all the time and you answered yes ((capabs of V(n+1) includes all capabs of V(n)) I asked if some capabs can be used independently, you also said yes, which suggests (capabs of V(n+1)) don't necessarily have to include all capabs of V(n). Capabs are independent. The former means that an IBT client module written to V(n) is guaranteed to work unmodified on a framework+HCAs that evolved to V(n+1) or later. The latter means modules may break or may continue to work. No backward compatibility is guaranteed. Choose one semantic for the interface and clearly document it in the case. Kais. -ted
Derailing PSARC/2008/628 Interrupt Resource Management
Roamer, since this is a full case now, the procedure is to add the comments to the issues file (for internal contributors). Kais. On 10/10/08 19:14, Yunsong (Roamer) Lu wrote: A few more concerns about the IRM proposed interfaces. 1. Where the material talks about the current interface limitation (4.1.2), why is it a problem to allow a driver to get more than *2* MSI-X? Integrated device drivers should be prepared for not getting any MSI-X interrupt vector, and might try legacy INTx instead. So it should not be a problem even if all MSI-X vectors have been given to already-attached drivers. Late-attached drivers will just use legacy INTx interrupts. The justification for the current *hard-coded* limitation doesn't make sense. 2. How does the IRM framework decide to decrease the number of interrupt vectors that have been given to a driver? 4.2.1 talks about how a driver participates in the IRM interfaces, but it's unclear how the framework can wisely move interrupt resources between drivers. 3. How does the IRM framework make a *wise* decision about which driver can take more interrupt vectors than others? For example, say you have a 10GbE NIC and a 1GbE NIC in the box, and both drivers ask for 16 vectors when you don't have enough vectors left. Giving the same number of interrupt vectors to the two driver instances is unreasonable. As part of the Crossbow project, hardware resources are allocated depending on the real link speed and bandwidth need. But as the low level I/O framework, IRM doesn't have that information. How do you prove that your management is reasonable? 4. What's the perimeter of IRM? In a virtualized environment, interrupts might have been bound to CPUs in an exclusive zone or a guest domain. When IRM asks for such interrupt vectors back from the driver, who will take care of the interrupt re-targeting? It's out of the driver's control, and I cannot find any relevant information in this document. Thanks, Roamer
IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]
- Case boundary question: since this marks a flag day for both TI and CI, can you list the components that are affected by this flag day?

- I am not clear on the consumer side of this new interface: what prompts a ULP to start using it? Is it expected to attempt ibt_alloc_io_mem() until it exhausts all resources? It would be easier to assess the completeness and the usefulness of the TI if you either extended the case's scope to include at least the changes to one transport consumer, or gave a real example thereof.

- mi_ibt_version seems to be an enumeration of apparently mutually exclusive values IBTI_V{1,2,3}, yet the definition suggests a combination of independent (discrete) capabilities (FMR support, DMA wrapper support, etc.).
  . Is there any consumer of this interface that uses the DMA wrapper but not FMR?
  . For future evolution, is mi_ibt_version always intended to express a monotonically increasing set of capabilities (the capabs of V(n+1) include all the capabs of V(n))?
  Basically, I'm trying to see whether information of a different nature is being encoded in the same field. Without slipping into a design discussion, you should consider whether two fields are more appropriate: one version (number or enum) and one capabs (bitmask).

- Under what condition can the caller of ibt_alloc_io_mem()/ibt_free_io_mem() expect the following error to be returned?

    IBT_MR_ACCESS_REQ_INVALID   Invalid Access Control Specified.
                                Remote Write or Remote Atomic access is
                                requested without specifying Local Write.

- In ibc_alloc_io_mem.9e these two sections are in conflict:

    ibt_status_t prefix_ibc_alloc_io_mem(ibc_hca_hdl_t hca_hdl,
        size_t size, ibt_mr_flags_t mr_flag, caddr_t *kaddrp,
        ibc_mem_alloc_hdl_t *mem_alloc_hdl);

  and

    hca_hdl     IBTF channel Interface (TI) HCA Handle previously obtained
                by calling ibt_open_hca(9F).

Kais.

On 10/10/08 11:04, Ted Kim wrote:

Template Version: @(#)sac_nextcase %I% %G% SMI
This information is Copyright 2008 Sun Microsystems

1. Introduction
   1.1. Project/Component Working Name: IBTF IO Memory
   1.2. Name of Document Author/Supplier: Author: Lida HornI
   1.3 Date of This Document: 10 October, 2008

4. Technical Description

A. Background

The DDI distinguishes between different types of memory. Memory from ddi_dma_mem_alloc(9F) is usable for DMA and takes into account various factors such as alignment and other device attributes. Memory from kmem_(z)alloc is not guaranteed to be usable for DMA, though most of the time it does work for that purpose, because of the capability of modern platforms. Nevertheless, there is value in maintaining these DDI distinctions, especially when considering certain platform issues such as memory DR.

In the context of InfiniBand, registered memory is the target of DMA operations. This case introduces new InfiniBand-related interfaces, analogous to the ddi_dma_mem_alloc family of functions, to IBTF (InfiniBand Transport Framework, PSARC/2002/132 and follow-on cases). This addition will help the InfiniBand stack maintain the proper DDI memory distinctions important for certain types of platforms.

B. Proposal

The proposal is to make additions to the IBTF Channel and Transport interfaces. The functionality added to the Transport Interface (TI) is used by the ULPs to allocate memory suitable for DMA and IB memory registration. In turn, the framework uses new entry points in the Channel Interface (CI) to request memory allocation from the underlying HCA driver. These interfaces are basically a wrapper for DDI functions which on the one hand abstract away HCA device-specific details at the ULP level, but at the same time allow the HCA driver to adjust the memory attributes (alignment, etc.) as necessary for efficiency.

These additions include an IBTF ABI change, so this case also marks an internal flag day, incrementing our interface version numbers for both the TI and CI as noted below. All interface additions and changes in this proposal have a micro/patch binding.

Transport Interface (ON Consolidation Private):
    ibt_alloc_io_mem()  - Allocates DMA memory (at the transport level)
    ibt_free_io_mem()   - Deallocates DMA memory
    IBTI_V3             - TI version change

Channel Interface (ON Consolidation Private):
    ibc_alloc_io_mem()  - Allocates DMA memory (at the HCA driver level)
    ibc_free_io_mem()   - Deallocates DMA memory
    IBCI_V3             - CI version change

C. Summary of Changes by man page

See materials directory for copies of man pages. Modified man pages have change bars in the left margin.

    ibci.9               - modified (new CI entry points added)
    ibc_alloc_io_mem.9e  - new (alloc free CI entry points)
    ibt_alloc_io_mem.9f  - new (alloc free TI functions)
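The suggestion in the review comments above, to carry one version field and one capability bitmask instead of overloading mi_ibt_version, can be sketched as follows. This is a Python model for illustration only: the class and field names are hypothetical, and the case materials do not say which capabilities belong to which IBTI version.

```python
from dataclasses import dataclass
from enum import IntEnum, IntFlag


class TiVersion(IntEnum):
    """Hypothetical mirror of IBTI_V1..IBTI_V3: says which ABI is spoken."""
    V1 = 1
    V2 = 2
    V3 = 3


class TiCapab(IntFlag):
    """Hypothetical capability bits: each one is independent of the others."""
    FMR = 0x1
    DMA_WRAPPER = 0x2


@dataclass
class ModInfo:
    """Model of a registration record with the two concerns kept apart."""
    version: TiVersion
    capabs: TiCapab


def supports(info, needed):
    """True if every capability bit in `needed` is present in `info`."""
    return (info.capabs & needed) == needed


# A consumer that wants the DMA wrapper but not FMR can say so directly,
# without implying anything about the version number.
info = ModInfo(version=TiVersion.V3, capabs=TiCapab.DMA_WRAPPER)
print(supports(info, TiCapab.DMA_WRAPPER), supports(info, TiCapab.FMR))
```

With a single monotonically increasing enum, the "DMA wrapper but not FMR" combination in the example cannot be expressed at all, which is what the first sub-question is probing.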
Derailing PSARC/2008/628 Interrupt Resource Management
I am derailing this case on grounds of non-obviousness of its architectural impact, and possible incompleteness.

The discussion already uncovered that this is more than a minor amendment to PSARC/2004/253 Advanced DDI Interrupt Functions. To prepare for the full review, the architecture should address the impact on device drivers and on the subsystems they are part of. If the scope of the project is intended to remain generic, the material needs to show that more than one class of device drivers was considered in the architecture.

To elaborate (see Garrett's previous email), the interrupt handles that a NIC driver acquires are actually exposed to the MAC layer (see PSARC/2006/357 - Crossbow) for enabling/disabling the interrupts on demand. The proposal should be clear on how the behavior of such drivers is intended to change when they are ported to the IRM interfaces. Should there be an extra notification event between MAC and the drivers to invalidate the interrupt handles registered with MAC? Are drivers supposed to insulate MAC from the real interrupt handles instead, and internally map to real handles that can be added or removed? Are they supposed to start faking the polling mode in software on rx rings that lost their real interrupts, for example?

Cryptographic accelerators are another class of I/O where an external framework (the Solaris crypto framework) relies on driver notifications coming from job completion interrupts. See PSARC/2001/557. What are such drivers supposed to do to handle DDI_CB_INTR_REMOVE properly? Should they block until the jobs drain and they get to call crypto_provider_notification(READY), or should they immediately notify an error for all pending crypto requests?

Kais.
No more monthly late meeting for PSARC
Given the limited interest in the monthly late meeting of PSARC, the PSARC members decided today to move back to the regular meeting time. Kais.
PSARC 2008/514 Python interface to dlpi(7P)
Cecilia Hu wrote:

I am sponsoring this case for Max Zhen. This project provides a wrapper for dlpi(7P) functions that enables sending and receiving layer 2 network packets directly from Python, and getting and setting link-related configuration. The requested release binding is patch. The interface and architecture are clear enough for a self-review. Of course, if there is a different opinion, I would be happy to shift it to a regular fast-track. Otherwise, the case is closed approved automatic.

Not so fast. Please re-open and put a timer on this. This is not an obvious case.

Thanks,
Kais

Thanks,
Cecilia
PSARC 2008/514 Python interface to dlpi(7P)
Cecilia,

I see that you already did. Never mind.

Kais

Kais Belgaied wrote:

Cecilia Hu wrote:

I am sponsoring this case for Max Zhen. This project provides a wrapper for dlpi(7P) functions that enables sending and receiving layer 2 network packets directly from Python, and getting and setting link-related configuration. The requested release binding is patch. The interface and architecture are clear enough for a self-review. Of course, if there is a different opinion, I would be happy to shift it to a regular fast-track. Otherwise, the case is closed approved automatic.

Not so fast. Please re-open and put a timer on this. This is not an obvious case.

Thanks,
Kais

Thanks,
Cecilia
Unix Domain Sockets for X11 clients in Trusted Extensions [LSARC/2008/506 FastTrack timeout 08/14/2008]
Nicolas Williams wrote:

On Thu, Aug 07, 2008 at 02:14:52PM -0700, Alan Coopersmith wrote:

Ric Aleshire wrote:

Yes - currently in the kernel socket I/O code, there is a check that the AF_UNIX socket endpoint is in the same zone as the server peer. The proposal for a) above means that this check will be modified, so that when TX is enabled and the socket zone and server zone do not match, then the server must be in the global zone.

Thanks for the answer, Ric. Which raises the interesting question of whether that check should really be for TX, or if this should be something that can be turned on for any machine with Zones, and which TX just happens to always set. It would seem things like running X clients in Etude or BrandZ zones could also benefit from this.

This sounds tempting. Anyway, the project team has the choice here whether to keep the scope of this case as-is, or extend it to permit privileged cross-zone communication through AF_UNIX sockets beyond TX.

Kais

I agree, though being careful to use untrusted cookies, of course. The problem this case is trying to solve affects non-TX zone uses too.
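The modified check that Ric describes can be written down as a small predicate. This is a sketch in Python, not the sockfs code: the function and parameter names are made up, and the only facts taken from the thread are the rule itself (same zone, or TX enabled and the server in the global zone). The global zone having zone ID 0 is standard Solaris behavior.

```python
GLOBAL_ZONE_ID = 0  # the global zone always has zone ID 0 on Solaris


def af_unix_connect_allowed(client_zone, server_zone, cross_zone_enabled):
    """Model of the proposed AF_UNIX zone check.

    client_zone / server_zone: zone IDs of the connecting socket and of
    the listening server.
    cross_zone_enabled: True when Trusted Extensions is enabled. In the
    generalization Alan suggests, this would instead be a setting any
    zones-capable system could turn on (hypothetical).
    """
    if client_zone == server_zone:
        return True  # existing behavior: same-zone connections are fine
    # Proposed relaxation: a mismatch is tolerated only when the server
    # lives in the global zone.
    return cross_zone_enabled and server_zone == GLOBAL_ZONE_ID


# Labeled zone 3 reaching an X server in the global zone.
print(af_unix_connect_allowed(3, GLOBAL_ZONE_ID, True))
```

Written this way, the scope question in the thread reduces to what drives the third argument: the TX-enabled state, or a more general per-system or per-zone policy.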
Unix Domain Sockets for X11 clients in Trusted Extensions [LSARC/2008/506 FastTrack timeout 08/14/2008]
Solution

a) Allow labeled zones to access global zone X11 server via UNIX domain sockets

If Trusted Extensions is enabled, the kernel will permit labeled zones to connect to global zone clients if the global zone UNIX domain rendezvous file is made available to the zone via a loopback mount.

When you do (b), (a) follows naturally without any extra change. connect(3SOCKET)'ing to the AF_UNIX socket named /var/tsol/door/.X11-unix will succeed the moment that node is visible to the zone. Am I missing a change proposed in sockfs or another part of the Solaris kernel as part of this case?

Kais.

b) The X11 server will use a new rendezvous directory when TX is enabled.

Normally, the UNIX domain rendezvous files are in the directory /tmp/.X11-unix. To allow the rendezvous files to be exported to labeled zones, the directory pathname will be changed to: /var/tsol/door/.X11-unix. This directory pathname is chosen because /var/tsol/doors is already loopback mounted into every labeled zone, to export the door rendezvous files for nscd and the label daemon. To make this change transparent to clients, a symbolic link to /tmp/.X11-unix will be created in each zone, including the global zone.

This solution will permit labeled zone X11 clients to use any of the various DISPLAY environment variables they have been using previously, and not require the use of TCP.
PSARC 2008/498 datalink sysevents
Completeness question: who's the intended consumer for this event?

Kais

Sebastien Roy wrote:

I'm submitting this case for Cathy Zhou. It is being filed as closed approved automatic.

Datalink sysevents -- release binding: patch

Summary
---
This case proposes to introduce a new EC_DATALINK sysevent class to report data-link related sysevents. For now, only one subclass (ESC_DATALINK_PHYS_ADD) will be introduced. It will be generated when a new physical data-link shows up on the system. In the future, the EC_DATALINK sysevent class can be extended to report other data-link sysevents, such as a data-link renaming event.

Since we are still experimenting with the new sysevent class, the format of the ESC_DATALINK_PHYS_ADD sysevent will be classified as Project Private.

Interface Table
---
Interface               Commitment Level        Comments
EC_DATALINK             Consolidation Private   Event class
ESC_DATALINK_PHYS_ADD   Project Private         Event subclass
PSARC 2008/498 datalink sysevents
Garrett D'Amore wrote:

Thanks for the clarification.

Ditto.

Kais

(I wasn't aware that RCM was being used this way. I recall that once upon a time there was a separate sysevent architecture where insertion events were handled without RCM interposing. The point, at the time, of having RCM separate from sysevent was that RCM could interpose, and ultimately refuse certain operations, based on consuming nodes. Of course, this goes back to the Solaris 8 timeframe and the design discussions I had surrounding RCM. Ancient history, now. Anyway, your suggested usage seems sane to me.)

-- Garrett

Cathy Zhou wrote:

RCM is also used to restore all the configuration when the device is plugged back in, and that is when this sysevent will be used.

- Cathy

RCM is normally (historically, anyway) used for device *removal*, rather than addition. What do you intend the RCM module to do with this event?

-- Garrett

Cathy Zhou wrote:

This event will be consumed by a syseventd module, which in turn will generate an RCM event, which will then be consumed by the RCM modules. But the usage of the EC_DATALINK class would not be limited to this.

- Cathy

Completeness question: who's the intended consumer for this event?

Kais

Sebastien Roy wrote:

I'm submitting this case for Cathy Zhou. It is being filed as closed approved automatic.

Datalink sysevents -- release binding: patch

Summary
---
This case proposes to introduce a new EC_DATALINK sysevent class to report data-link related sysevents. For now, only one subclass (ESC_DATALINK_PHYS_ADD) will be introduced. It will be generated when a new physical data-link shows up on the system. In the future, the EC_DATALINK sysevent class can be extended to report other data-link sysevents, such as a data-link renaming event.

Since we are still experimenting with the new sysevent class, the format of the ESC_DATALINK_PHYS_ADD sysevent will be classified as Project Private.

Interface Table
---
Interface               Commitment Level        Comments
EC_DATALINK             Consolidation Private   Event class
ESC_DATALINK_PHYS_ADD   Project Private         Event subclass
PSARC 2008/473 Fine-Grained Privileges for Datalink Administration
Could you include a delta of the privileges(5) man page, the out-of-the-box exec_attr(4), and dladm(1M) as modified by this case? Kais
libnet [PSARC/2008/409 FastTrack timeout 07/03/2008]
The changes made for using /dev/net seem to be a good compromise between benefiting from the UV features and getting this library into OpenSolaris quickly enough, and they open the door for adding further dependent apps and libs to FOSS.

Architecturally, this library injects, captures, and parses packets at the Ethernet frame level and at the level of IP and higher-level protocols, so I see value in a future project for porting libnet to the PF_PACKET socket as soon as the latter is ready. The project porting PF_PACKET to OpenSolaris is being implemented by the gld-iteam. GLD-iteam, it would be good to ARC it soon.

Kais.

Mark A. Carlson wrote:

More time was requested at today's PSARC meeting, so I have extended the timer on this case to 07/16/2008.

-- mark

Mark A. Carlson wrote:

The Project team has updated the FOSS checklist for libnet, with changes in sections: 2.3.2 (upstream support), 3.4.7 (privileges), 3.7 (code modifications - dlpi).

-- mark
PSARC 2007/611 Intel 10GbE PCIE NIC Driver
Cecilia Hu wrote:

I am sponsoring this case for Samuel Tu. This case is to provide a new NIC driver, ixge(7D), for the Intel 10GbE PCI Express Adapter. The requested release binding is micro/patch. The I-team considers it better to archive this driver in PSARC. Since the architecture is straightforward and the interface is clear, I am marking it as closed approved automatic.

-Cecilia

Template Version: @(#)sac_nextcase 1.56 10/26/05 SMI
This information is Sun Proprietary: Need-to-Know

1. Introduction
   1.1. Project/Component Working Name: Intel 10GbE PCIE NIC Driver
   1.2. Name of Document Author/Supplier: Author: Samuel Tu
   1.3 Date of This Document: 22 October, 2007

4. Technical Description

This case adds the Intel 10GbE PCI Express Adapter driver to ON. The architecture of the Intel 10GbE PCI Express Adapter differs significantly from the Intel 82597EX based PCIX Adapter, which is supported by ixgb. An important new feature of this adapter is I/OAT (I/O Acceleration Technology) from Intel, which will be helpful for performance improvement. So we introduce a new driver to support these adapters.

Let's see. I/OAT is a collection of new capabilities that may involve the NIC, the chipset and/or the CPU. Could you say more about which of these capabilities this case will be using/supporting? Which are expected to actually be present on SPARC, Intel, and AMD systems? Will the driver invoke any new kernel interface to query whether the platform-specific features are present or not? Is this case introducing these interfaces, or are they covered elsewhere?

Also, looking at Intel's docs (and previous presentations), there is a hint of a need for an optimized TCP/IP stack in order to benefit from I/OAT (see for instance http://download.intel.com/technology/comms/perfnet/download/98856.pdf). Is this case introducing changes to the OpenSolaris TCP/IP stack to be able to use I/OAT? Are any new interfaces needed to negotiate such capabs?

Last, one comment about the asynchronous low-cost data copy (a.k.a. Intel's QuickData component of I/OAT): this seems to be a generic enough functionality, with benefits beyond networking. My suggestion is to consider exposing the interfaces that use it.

Kais.

Intel has a software license agreement with Sun that allows Sun to integrate this driver and distribute the software in both source and binary object code forms. This SLA also grants Sun the right to modify the source code and distribute the modified driver in open source and binary forms.

The driver supports the x86/x64 and SPARC platforms. The vendor ID and device ID of the chips supported are:

    pci8086,10C6
    pci8086,10C7
[PSARC/2007/599 FastTrack timeout 10/23/2007]
David Marx wrote:

Quick question: when the RAM is shared with multiple OS instances (virtual machines), is 1/4 of all available memory a reasonable limit? Should this be 1/4 of the RAM available to the domain (host or guest)?

Like the resources project.max-crypto-memory and project.max-shm-memory, project.max-device-locked-memory is based on the kernel variable availrmem_initial. Therefore, I suspect that this is 1/4 of the memory available to the guest.

What about dom0? Assuming all RAM is seen as available to dom0, locking 1/4 of it is probably excessive. I'm not sure much can be done for such a situation, other than a word of caution about xVM in the man page documenting project.max-device-locked-memory.

Kais
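The dom0 concern is easy to see with numbers. The sketch below is a Python model only; it assumes (this is not stated in the thread) that availrmem_initial is a page count and that pages are 4 KiB, and the function name is made up. The only fact taken from the discussion is that the proposed default is 1/4 of what availrmem_initial reports.

```python
PAGE_SIZE = 4096  # assumed x86 base page size, in bytes
GIB = 1024 ** 3


def default_locked_limit(availrmem_initial_pages, page_size=PAGE_SIZE):
    """Proposed default for project.max-device-locked-memory, in bytes.

    Modelled as one quarter of the memory the kernel saw as available
    at boot (hypothetical helper, units are an assumption).
    """
    return (availrmem_initial_pages * page_size) // 4


# A guest domain that was given 2 GiB versus a dom0 that sees 16 GiB.
guest_pages = (2 * GIB) // PAGE_SIZE
dom0_pages = (16 * GIB) // PAGE_SIZE

print(default_locked_limit(guest_pages) / GIB)
print(default_locked_limit(dom0_pages) / GIB)
```

The guest's limit scales with what the guest was actually given, while dom0's limit scales with the whole machine even though much of that memory is expected to be handed to guests later, which is why a caution in the man page is suggested.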
[PSARC/2007/599 FastTrack timeout 10/23/2007]
3. Proposed Solution

To solve this, we propose increasing this to 1/4 of available memory, which is also the limit that agpgart imposes.

4. Risks

There is the risk that increasing this resource may allow the system to allocate too much memory, which may cause the Solaris kernel to run out. The kernel is probably not graceful when it runs out of memory. If increasing this resource is not acceptable, and having the user manually increase the resource is not acceptable, then either Sun or the Xorg community needs to change the Xorg Intel graphics drivers to use less memory for Sun to incorporate these drivers into the Solaris product. Increasing this resource affects both x86 and sparc, although it is only currently needed on x86.

Quick question: when the RAM is shared with multiple OS instances (virtual machines), is 1/4 of all available memory a reasonable limit? Should this be 1/4 of the RAM available to the domain (host or guest)?

Kais.
[clearview-discuss] 2007/527 Addendum for Clearview Vanity Naming and Nemo Unification
Cathy Zhou wrote:

John Plocher wrote:

Kais Belgaied wrote:

Ah! In this case, it seems to be just an internal interface between Nemo and itself. If that's true, then it's an implementation choice and shouldn't be exposed to device driver writers as part of the MAC_CAPAB* interface that is maturing and will soon become committed, and the case can just be withdrawn, as it turns out to be below the radar for an ARC review.

Probably better to change it to closed approved automatic - the project isn't being withdrawn, it /is/ going into the product, it just doesn't need the ARCs to do anything formal along the way :-)

Please note that, other than the MAC_CAPAB_NO_NATIVEVLAN interface, this case also proposed other interfaces (DLIOCMARGININFO ioctl, m_margin etc.) that would be exposed to the device drivers.

Noted. The DLIOCMARGININFO ioctl and the gldm_margin field in gld_mac_info_t (taken from an existing reserved field) seemed non-controversial to me.

Kais

Thanks
- Cathy
[clearview-discuss] 2007/527 Addendum for Clearview Vanity Naming and Nemo Unification
Sebastien Roy wrote:

Kais Belgaied wrote:

This sounds a little upside-down. A driver has to advertise a negative capability, essentially saying "Hey, I can't handle this feature", as opposed to the more intuitive approach: drivers that can handle native VLAN expose a capab (MAC_CAPAB_NATIVEVLAN), and those that can't do not. Any reason for this choice?

This discussion veered off a little bit, and I want to bring it back on-topic and make sure that progress is being made on this case. Kais, was your original question answered?

Well, not really. I'm not sure I understand the following argument:

It sounds really odd-ball to me, too. Plus, it would require touching all the existing GLDv3 drivers.

No. We introduce MAC_CAPAB_NO_NATIVEVLAN exactly because we do not want to touch most of the GLDv3 drivers. MAC_CAPAB_NO_NATIVEVLAN means that this driver cannot handle VLAN PPA access itself (therefore, it might also imply that this driver does not handle hardware checksum for VLAN packets). Existing GLDv3 drivers should *not* advertise this capability, except the aggr driver, which might, depending on the underlying aggregated drivers.

Does it mean that by default *all* GLDv3 drivers do, or are assumed to, support native VLAN, with the exception of only a few?

Two other unclear points brought up by the discussion are the interaction of the proposed capab with the HW checksum capability, and with MAC_CAPAB_PERSTREAM:

We handle MAC_CAPAB_NO_NATIVEVLAN differently in two places:

a. If mac_open() is for a VLAN PPA accessed stream, and the underlying MAC supports MAC_CAPAB_PERSTREAM, but *not* MAC_CAPAB_NO_NATIVEVLAN, we can open the underlying driver directly using its native VLAN PPA access.

b. If the MAC is MAC_CAPAB_NO_NATIVEVLAN, then do not advertise its HW_CKSUM capability on VLAN streams, even if the MAC claims it is capable of doing HW CKSUM.

To summarize, the only driver expected to implement this is the softmac driver introduced by UV (PSARC/2006/499). The capability's semantics were defined so that no other driver has to care about its existence.

Ah! In this case, it seems to be just an internal interface between Nemo and itself. If that's true, then it's an implementation choice and shouldn't be exposed to device driver writers as part of the MAC_CAPAB* interface that is maturing and will soon become committed, and the case can just be withdrawn, as it turns out to be below the radar for an ARC review.

Kais.

-Seb
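The two special cases (a) and (b) quoted in this thread can be modelled in a few lines. This is a Python sketch for illustration, not the GLDv3 code: the enum stands in for the MAC_CAPAB_* bits named in the thread, and the two helper functions are hypothetical.

```python
from enum import Flag, auto


class MacCapab(Flag):
    """Stand-ins for the capability bits discussed in the thread."""
    NONE = 0
    PERSTREAM = auto()
    NO_NATIVEVLAN = auto()
    HW_CKSUM = auto()


def opens_native_vlan(capabs, is_vlan_stream):
    """Case (a): open the underlying driver via its native VLAN PPA."""
    return (is_vlan_stream
            and MacCapab.PERSTREAM in capabs
            and MacCapab.NO_NATIVEVLAN not in capabs)


def advertised_capabs(capabs, is_vlan_stream):
    """Case (b): hide HW_CKSUM on VLAN streams of a NO_NATIVEVLAN MAC."""
    if is_vlan_stream and MacCapab.NO_NATIVEVLAN in capabs:
        return capabs & ~MacCapab.HW_CKSUM
    return capabs


# A legacy driver behind softmac versus an ordinary GLDv3 driver that
# advertises nothing special about VLANs.
softmac = MacCapab.PERSTREAM | MacCapab.NO_NATIVEVLAN | MacCapab.HW_CKSUM
ordinary = MacCapab.PERSTREAM | MacCapab.HW_CKSUM

print(opens_native_vlan(softmac, True), opens_native_vlan(ordinary, True))
print(advertised_capabs(softmac, True))
```

The sketch shows why the capability is negative: a driver that sets nothing keeps its existing behavior, and only the exceptional MAC has to opt out.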
[clearview-discuss] 2007/527 Addendum for Clearview Vanity Naming and Nemo Unification
** MAC_CAPAB_NO_NATIVEVLAN

A MAC_CAPAB_NO_NATIVEVLAN MAC capability will be added to the GLDv3 framework to indicate that a specific MAC cannot support VLAN PPA access by itself.

This sounds a little upside-down. A driver has to advertise a negative capability, essentially saying "Hey, I can't handle this feature", as opposed to the more intuitive approach: drivers that can handle native VLAN expose a capab (MAC_CAPAB_NATIVEVLAN), and those that can't do not. Any reason for this choice?

Kais.
2007/271 HME/QFE updates
Garrett,

The transition plan for a customer that deployed a trunk of qfe's is not clear to me. Say someone used the Sun Trunking software to build a 4-port trunk, with qfe2 as the head of the trunk, defined some load balancing policy, and used the name 'qfe2' in various config places (hostname.qfe0, IPFilter config files, third-party firewalls, etc.). What happens when they upgrade to the new version? Who will convert the trunking configurations to create aggrs, and replace the 'qfe2' names with aggr1 everywhere?

BTW, there was a precedent for this kind of renaming, with the transition from ipge to e1000g (the e1000g transition patch, 123334-01).

I don't believe documentation is sufficient here.

Kais

Sun Trunking Impact
---
Using the Nemo interfaces means that the owners of QFE will have to use the nemo link aggregation commands with dladm(1M). While this is to be viewed as a good thing, it does represent a change that will need to be noted in release notes and such.

The other Sun NIC drivers which are supported by Sun Trunking are the GEM (ge) and Cassini (ce) drivers. We hope to move both of those to Nemo as well, in the near future, and follow up with an EOF of the Sun Trunking product altogether. However, this is out of scope for this particular case.
GLDv3 link status logging [PSARC/2007/298 Self Review]
Problem
---
Various network drivers are inconsistent in their handling of logging of link messages. One of the more annoying things that some drivers do is flood the logs with link down messages (usually once every 10sec or so) when trying to transmit packets out the link.

The root cause of the problem as you describe it seems to be that the stack above kept submitting packets to a link that is known to be down, causing the flood of syslog messages. Somehow the link-down event was not generated, was lost during the notification, or was mishandled. That is a bug to be fixed between the stack and the specific drivers you observed the misbehavior on. The bug is probably below the radar screen for ARC.

Now, back to the symptoms (the scope of this case): each futile submission of a packet to be transmitted on a link that is down indicates a problem worth paying attention to. It could be uncovering a bug such as the above, or it could be a transient race. I don't believe it is a bad practice for driver writers to adopt a defensive approach and log an error on every occurrence of the offense.

Kais

Further, the detailed contents for link status changes are not consistent from one driver to another. Notably, the WIFI drivers generally do not do this.
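One possible middle ground between the two positions in this exchange is to log only on link state transitions while still counting the transmit attempts made against a down link, so the evidence of a stack or driver bug is not lost. The sketch below is a Python model under that assumption; it is not what the case proposes, and the class, method names, and message formats are invented for illustration.

```python
class LinkLogger:
    """Log link transitions once; count, rather than log, futile transmits."""

    def __init__(self, name, log=print):
        self.name = name
        self.log = log            # any callable taking one message string
        self.up = None            # link state is unknown until first report
        self.tx_while_down = 0    # transmit attempts seen while link was down

    def link_state(self, up):
        """Called when the driver learns the link state."""
        if up == self.up:
            return                # no transition, nothing to log
        if up and self.tx_while_down:
            # Summarize the offenses in one line when the link recovers.
            self.log(f"{self.name}: {self.tx_while_down} transmit attempts "
                     f"while link was down")
            self.tx_while_down = 0
        self.up = up
        self.log(f"{self.name}: link {'up' if up else 'down'}")

    def tx_attempt(self):
        """Called per packet; returns False if the link is known to be down."""
        if self.up is False:
            self.tx_while_down += 1
            return False
        return True


messages = []
nic = LinkLogger("nic0", log=messages.append)
nic.link_state(False)
for _ in range(3):
    nic.tx_attempt()
nic.link_state(True)
print(messages)
```

The counter keeps the diagnostic value of the defensive approach (a nonzero count means the stack kept submitting packets to a down link) without producing one log line per packet.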