Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
On Friday 26 March 2021 11:43:10 Don Bollinger wrote: > > Hello Don! > > > > I have read whole discussion and your EEPROM patch proposal. But for me it > > looks like some kernel glue code for some old legacy / proprietary access > > method which does not have any usage outside of that old code. > > I don't know if 'kernel glue code' is good or bad. It is a driver. It > locks access to a device so it can perform multiple accesses without > interference. It organizes the data on a weird device into a simple linear > address space that can be accessed with open(), seek(), read() and write() > calls. > > As for 'old code', this code and variations of it are under active > development by multiple Network OS vendors and multiple switch vendors, and > in production on hundreds of thousands of switches with millions of > SFP/QSFP/CMIS devices. This stuff is running the biggest clouds in the > world. > > > > > Your code does not contain any quirks which are needed to read different > > EEPROMs in different SFP modules. As Andrew wrote there are lot of broken > > SFPs which needs special handling and this logic is already implemented in > > sfp.c and sfp-bus.c kernel drivers. These drivers then export EEPROM > > content to userspace via ethtool -m API in unified way and userspace does > > not implement any quirks (nor does not have to deal with quirks). > > As a technical matter, you handle those quirks in the code that interprets > EEPROM data. You have figured out what devices have what quirks, then coded > up solutions to make them work. Then, after all the quirk handling is done, > you call the actual access routines (sfp_i2c_read() and sfp_i2c_write()) to > access the module EEPROMs. My code works the same way, except in my > community all the interpretation of EEPROM data is done in user space. You > may not like that, but it is not clear why you should care how my community > chooses to handle the data. In this architecture, the only thing needed > from the kernel is the equivalent of sfp_i2c_read() and sfp_i2c_write, which > optoe provides. The key point here is that my community wants the kernel to > just access the data. No interpretation, no identification, no special > cases. > > > > > If you try to read EEPROM "incorrectly" then SFP module with its EEPROM > > chip (or emulation of chip) locks and is fully unusable after you unplug > it and > > plug it again. Kernel really should not export API to userspace which can > > cause "damage" to SFP modules. And currently it does *not* do it. > > In my community, such devices are not tolerated. Modules which can be > "damaged" should be thrown away. Well, those tested / problematic modules are not damaged by real. They just stop working until you do power off / on cycle and unplug / plug them again. But I agree with you. Russel wrote about those SFP modules that they should have biohazard label :-) The best would be if those modules disappear, but that would not happen in near future. These modules are already used by lot of users and users wanted support for them. Hence we wrote what was possible. The main issue is that we do know know any well-behaved GPON SFP module. GPON is now very common technology for internet access, so it is not obvious that more people are asking about support for GPON HW compatible with Linux kernel. > Please be clear, I am not disagreeing with your implementation. For your > GPON devices, you handle this in kernel code. Cool. Keep it that way. > Just don't make my community implement that where it is not needed and not > wanted. Optoe just accesses the device. Other layers handle interpretation > and quirks. Your handling is in sfp.c, mine is in user space. Not right or > wrong, just different. Both work. > > > > > I have contributed code for some GPON SFP modules, so their EEPROM can > > be correctly read and exported to userspace via ethtool -m. So I know that > > this is very fragile area and needs to be properly handled. > > My code is in use in cloud datacenters and campus switches around the world. > I know it needs to be properly handled. > > > > > So I do not see any reason why can be a new optoe method in _current_ > > form useful. It does not implemented required things for handling > different > > EEPROM modules. > > optoe would be useful in your current environment as a replacement for > sfp_i2c_read() and sfp_i2c_write(). Those routines just access the EEPROM. > They don't identify or interpret or implement quirk handling. Neither does > optoe. Now that we know that there are SFP modules which needs special handling without it they lock themselves on i2c bus, we really cannot export to userspace interface which allows this locking. I understand that you do not care for these broken GPON SFP modules, but "generic" API which would be part of mainline kernel really cannot have known issue that can cause some modules to lock by just simple "read" operation. > AND, optoe is
RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
> > What I have works. Your consumers get quirk handling, mine don't need it. > > No compromise. > > Hi Don > > All this discussion is now a mute point. GregKH has spoken. Ack. I actually checked in with Greg a couple of days ago and got that answer. I just thought the discussion was going in an interesting direction and didn't want to end it yet. Response below is in the same vein. > > But i'm sure there are some on the side lines, eating popcorn, maybe > learning from the discussion. Honestly not sure what they are learning from the discussion. I think what I learned is not what you think you taught me. > > Would you think it is O.K. to add a KAPI which works for 3 1/2" SCSI disks, but > not 2", because you only make machines with 3 1/2" bays? No. Not sure how the analogy works. QSFP is a larger form factor than SFP, and the EEPROM layout changed at the same time, but optoe and my community had no problem accommodating both. CMIS changed the EEPROM layout again, but it was easily accommodated. > > This is an extreme, absurd example, but hopefully you get the point. We > don't design KAPIs with the intention to only work for a subset of devices. It > needs to work with as many devices as possible, even if the first > implementation below the KAPI is limited to just a subset. Let me run with your example... Suppose I used those 3 1/2" SCSI disks to build storage servers. Let's call them RAID devices. Innovation follows, I figure out how to make these devices hot swappable, hot backup, massively parallel... Innovation follows and I evolve this into a distributed architecture with redundant data, encrypted, compressed, distributed across servers and data centers. Massively parallel, synchronized, access to the data anywhere in the world, at bandwidth limited only by the size of your network pipe. Let's call it RIDSS (Redundant, Independent, Distributed Storage Servers). And I can use 2" disks. Or non-spinning storage. My fancy technology thinks the storage is dumb. I present a track/sector/length and I get back the bits. Just for grins, let's say I can also query the device for a list of bad blocks (sectors, whatever). Recently, with the addition of 2" devices, YOU figured out that you can stuff several disks into a small space, and manage them with software and call it RAID. You build a nice abstraction for storage, which contains individual disks, and handles parallelism, redundancy, hot swap. Very cool, works well, a genuine innovation. I've been using Linux for RIDSS for years, I get the memo that my linux driver to access these SCSI devices should be in the upstream kernel. Oddly, there are no drivers in the kernel that I can just present track/sector/length and get back the bits. So, I offer mine. The answer is no, that is a RAID device, you need to access the device through RAID data structures and APIs. Sorry, no, that is not a RAID device. That is a storage device. You use it for RAID storage, I use it for RIDSS storage. We need a layered architecture with raw data access at the bottom (we both need the same track/sector/length access). Beyond that, your RAID stack, while brilliant, has virtually nothing to do with my RIDSS stack. They sound superficially similar, but the technology and architecture are wildly different. The RAID stack is unnecessary for my RIDSS architecture, and requires widespread changes to my software that yield no benefit. So, no, I don't get your point. I think there is value in a simple layered architecture, where access to the module EEPROM is independent of the consumer of that access. You can access it for kernel networking, which is useful, innovative, valuable. I can access it for TOR switches which do not use kernel networking but are nonetheless useful, innovative, valuable. Your decision to reject optoe means the TOR community has to maintain this simple bit of kernel code outside the mainline tree. The judges have ruled, case closed. > > Anyway, i'm gratefull you have looked at the new ethtool netlink KAPI. It will > be better for your contributions. And i hope you can make use of it in the Thanks for the acknowledgement. > future. But i think this discussion about optoe in mainline is over. This discussion is indeed over if you say it is. I'll be moving on :-(. > > Andrew Don
Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
> What I have works. Your consumers get quirk handling, mine don't need it. > No compromise. Hi Don All this discussion is now a mute point. GregKH has spoken. But i'm sure there are some on the side lines, eating popcorn, maybe learning from the discussion. Would you think it is O.K. to add a KAPI which works for 3 1/2" SCSI disks, but not 2", because you only make machines with 3 1/2" bays? This is an extreme, absurd example, but hopefully you get the point. We don't design KAPIs with the intention to only work for a subset of devices. It needs to work with as many devices as possible, even if the first implementation below the KAPI is limited to just a subset. Anyway, i'm gratefull you have looked at the new ethtool netlink KAPI. It will be better for your contributions. And i hope you can make use of it in the future. But i think this discussion about optoe in mainline is over. Andrew
Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
On Fri, Mar 26, 2021 at 02:09:36PM -0700, Don Bollinger wrote: > > You keep missing the point. I always refer to the KAPI. The driver we can > > rework and rework, throw away and reimplement, as much as we want. > > The KAPI cannot be changed, it is ABI. It is pretty much frozen the day > the > > code is first committed. > > Maybe I don't understand what you mean by KAPI. The KAPI that optoe exposes > is in two parts. > > First, it makes the EEPROM accessible via the nvmem() interface, an existing > KAPI that I call from optoe. at24 implemented it, I made use of it. This > interface exposes EEPROM data to user space through a defined sysfs() file. > I didn't invent this, nor am I proposing it, it already exists. Again, a "raw" interface to a device that is just memory-mapping all of the device information directly is no sort of a real KABI at all. It is no different from trying to use /dev/mem/ to write a networking driver, just because you can mmap in the device's configuration space to userspace. That is not a real api, it is only using the kernel as a "pass-through" which works fine for one-off devices, and other oddities, but is not a unified user/kernel api for a class of device types at all. thanks, greg k-h
RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
> -Original Message- > From: Andrew Lunn [mailto:and...@lunn.ch] > Sent: Friday, March 26, 2021 2:54 PM > To: Don Bollinger > Cc: 'Jakub Kicinski' ; ar...@arndb.de; > gre...@linuxfoundation.org; linux-kernel@vger.kernel.org; > brandon_chu...@edge-core.com; wally_w...@accton.com; > aken_...@edge-core.com; g...@microsoft.com; jolev...@microsoft.com; > xinx...@microsoft.com; 'netdev' ; 'Moshe > Shemesh' > Subject: Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS > EEPROMS > > > The only thing wrong I can see is that it doesn't use the kernel > > network stack. Here's where "you keep missing the point". The kernel > > network stack is not being used in these systems. Implementing EEPROM > > access through ethtool is of course possible, but it is a lot of work > > with no benefit on these systems. The simple approach works. Let me use > it. > > > > > > > > The optoe KAPI needs to handle these 'interesting' SFP modules. The > > > KAPI design needs to be flexible enough that the driver underneath > > > it can be extended to support these SFPs. The code does not need to > > > be there, but the KAPI design needs to allow it. And i personally > > > cannot see how the > > optoe > > > KAPI can efficiently support these SFPs. > > > > Help me understand. Your KAPI specifies ethtool as the KAPI, and the > > ethtool/netlink stack as the interface through which the data flows. > > Nearly. It specifies netlink and the netlink messages definitions. > They happen to be in the ethtool group, but there is no requirement to use > ethtool(1). > > But there is a bit more to it. > > Either the MAC driver implements the needed code, or it defers to the > generic sfp.c code. In both cases, you have access to the GPIO lines. > > If you are using the generic sfp.c, the Device Tree definitions of the GPIOs > become part of the KAPI. DT is consider ABI. > > So the low level code knows when there was been a hotplug. It then can > determine what quirks to apply for all future reads until the module is > unplugged. Got it. All good. I agree, I would like optoe to have access to the GPIO lines, but it is a "nice to have", not a requirement... > > Your KAPI is missing how optoe gets access to the GPIOs. Without knowing if Right. It doesn't get access to the GPIOs. So, that is not part of its KAPI. > the module has been hotplugged, in a robust manor, you have problems > with quirks. For every userspace read, you need to assume the module has > been changed, read the ID information from the EEPROM a byte at a time, > figure out what quirks to apply, and then do the user requested read. I doubt Again, optoe does not deal with quirks. Your code filters them out before calling optoe or sfp_i2c_read. My stack does not deal with them. In my community, quirky devices are not tolerated. In the 4 years this code has been in use, nobody has ever asked me to accommodate a weird module. I inherited a limitation of writing one byte at a time from the ancestor of optoe, which I have kept. I don't know if it is needed, but someone once thought it was required. I apply it universally, all devices of all types. I do need one item of state from the EEPROM, the register that tells me whether pages are supported by the device. Due to the hotplug risk, I actually do read that register once for each access to non-zero pages. (Once per read or write call.) Access to GPIOs would eliminate that. It turns out the vast majority of calls are to page 0 or lower memory, and the performance penalty is at most 25% since there is also a page write, data access and page write in the same call. The penalty goes down as the number of bytes read increases. Overall, it has never come up as an issue. People don't expect these things to be fast. > that is good for performance. The design which has been chosen is that > userspace is monitoring the GPIO lines. So to make it efficient, your KAPI > need some way to pass down that the module has/has not been hot- > plugged since the last read. Nope. optoe does not need to know, it assumes a new device every time it is accessed. > > Or do you see some other way to implement these quirks? What I have works. Your consumers get quirk handling, mine don't need it. No compromise. > >Andrew Don
Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
> The only thing wrong I can see is that it doesn't use the kernel network > stack. Here's where "you keep missing the point". The kernel network stack > is not being used in these systems. Implementing EEPROM access through > ethtool is of course possible, but it is a lot of work with no benefit on > these systems. The simple approach works. Let me use it. > > > > > The optoe KAPI needs to handle these 'interesting' SFP modules. The KAPI > > design needs to be flexible enough that the driver underneath it can be > > extended to support these SFPs. The code does not need to be there, but > > the KAPI design needs to allow it. And i personally cannot see how the > optoe > > KAPI can efficiently support these SFPs. > > Help me understand. Your KAPI specifies ethtool as the KAPI, and the > ethtool/netlink stack as the interface through which the data flows. Nearly. It specifies netlink and the netlink messages definitions. They happen to be in the ethtool group, but there is no requirement to use ethtool(1). But there is a bit more to it. Either the MAC driver implements the needed code, or it defers to the generic sfp.c code. In both cases, you have access to the GPIO lines. If you are using the generic sfp.c, the Device Tree definitions of the GPIOs become part of the KAPI. DT is consider ABI. So the low level code knows when there was been a hotplug. It then can determine what quirks to apply for all future reads until the module is unplugged. Your KAPI is missing how optoe gets access to the GPIOs. Without knowing if the module has been hotplugged, in a robust manor, you have problems with quirks. For every userspace read, you need to assume the module has been changed, read the ID information from the EEPROM a byte at a time, figure out what quirks to apply, and then do the user requested read. I doubt that is good for performance. The design which has been chosen is that userspace is monitoring the GPIO lines. So to make it efficient, your KAPI need some way to pass down that the module has/has not been hot-plugged since the last read. Or do you see some other way to implement these quirks? Andrew
RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
On Fri, Mar 26, 2021 at 01:37PM -0700, Andrew Lunn wrote: > On Fri, Mar 26, 2021 at 01:16:14PM -0700, Don Bollinger wrote: > > > > In my community, the SFP/QSFP/CMIS devices (32 to 128 of them per > > > > switch) often cost more than the switch itself. Consumers (both > > > > vendors and > > > > customers) extensively test these devices to ensure correct and > > > > reliable operation. Then they buy them literally by the millions. > > > > Quirks lead to quick rejection in favor of reliable parts from > > > > reliable vendors. In this environment, for completely different > > > > reasons, optoe does not need to handle quirks. > > > > > > Well, if optoe were to be merged, it would not be just for your > community. > > It > > > has to work for everybody who wants to use the Linux kernel with an > SFP. > > > You cannot decide to add a KAPI which just supports a subset of > > > SFPs. It needs to support as many as possible, warts and all. > > > > > > So how would you handle these SFPs with the optoe KAPI? > > > > Just like they are handled now. Folks who use your stack would filter > > through sfp.c, with all the quirk handling, and eventually call optoe > > for the actual device access. Folks who don't use kernel networking > > would call optoe directly and limit themselves to well behaved SFPs, > > or would handle quirks in user space. You think that's dumb, but > > there is clearly a market for that approach. It is working, at scale, today. > > > > BTW, why can't we have a driver > > You keep missing the point. I always refer to the KAPI. The driver we can > rework and rework, throw away and reimplement, as much as we want. > The KAPI cannot be changed, it is ABI. It is pretty much frozen the day the > code is first committed. Maybe I don't understand what you mean by KAPI. The KAPI that optoe exposes is in two parts. First, it makes the EEPROM accessible via the nvmem() interface, an existing KAPI that I call from optoe. at24 implemented it, I made use of it. This interface exposes EEPROM data to user space through a defined sysfs() file. I didn't invent this, nor am I proposing it, it already exists. Second, it specifies how the data in the EEPROM is accessed. It says that low memory is in offset 0-127, and paged memory starts at offset (128 + (page * 128)). For SFP devices, memory at i2c address 0x50 is the first 256 bytes, and everything at 0x51 is pushed up 256 bytes. This is exactly the behavior of ethtool. Again, I didn't invent this, I adopted the existing convention for how to flatten the SFP/QSFP/CMIS address space. With these two parts, EEPROM data can be accessed by standard open(2), seek(2), read(2), write(2) calls. Nothing special there, the actual syntax is as old school and standard as you can possibly get. So, what is wrong with that KAPI? The only thing wrong I can see is that it doesn't use the kernel network stack. Here's where "you keep missing the point". The kernel network stack is not being used in these systems. Implementing EEPROM access through ethtool is of course possible, but it is a lot of work with no benefit on these systems. The simple approach works. Let me use it. > > The optoe KAPI needs to handle these 'interesting' SFP modules. The KAPI > design needs to be flexible enough that the driver underneath it can be > extended to support these SFPs. The code does not need to be there, but > the KAPI design needs to allow it. And i personally cannot see how the optoe > KAPI can efficiently support these SFPs. Help me understand. Your KAPI specifies ethtool as the KAPI, and the ethtool/netlink stack as the interface through which the data flows. As I see it, your KAPI only specifies i2c address, page, bank, offset and length. How does your KAPI support interesting SFP modules but mine does not? optoe could be reworked ad infinitum to implement support for quirks, just as yours can. Neither has any hint of quirks in the KAPI. Don
Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
On Fri, Mar 26, 2021 at 01:16:14PM -0700, Don Bollinger wrote: > > > In my community, the SFP/QSFP/CMIS devices (32 to 128 of them per > > > switch) often cost more than the switch itself. Consumers (both > > > vendors and > > > customers) extensively test these devices to ensure correct and > > > reliable operation. Then they buy them literally by the millions. > > > Quirks lead to quick rejection in favor of reliable parts from > > > reliable vendors. In this environment, for completely different > > > reasons, optoe does not need to handle quirks. > > > > Well, if optoe were to be merged, it would not be just for your community. > It > > has to work for everybody who wants to use the Linux kernel with an SFP. > > You cannot decide to add a KAPI which just supports a subset of SFPs. It > > needs to support as many as possible, warts and all. > > > > So how would you handle these SFPs with the optoe KAPI? > > Just like they are handled now. Folks who use your stack would filter > through sfp.c, with all the quirk handling, and eventually call optoe for > the actual device access. Folks who don't use kernel networking would call > optoe directly and limit themselves to well behaved SFPs, or would handle > quirks in user space. You think that's dumb, but there is clearly a market > for that approach. It is working, at scale, today. > > BTW, why can't we have a driver You keep missing the point. I always refer to the KAPI. The driver we can rework and rework, throw away and reimplement, as much as we want. The KAPI cannot be changed, it is ABI. It is pretty much frozen the day the code is first committed. The optoe KAPI needs to handle these 'interesting' SFP modules. The KAPI design needs to be flexible enough that the driver underneath it can be extended to support these SFPs. The code does not need to be there, but the KAPI design needs to allow it. And i personally cannot see how the optoe KAPI can efficiently support these SFPs. Andrew
RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
> > In my community, the SFP/QSFP/CMIS devices (32 to 128 of them per > > switch) often cost more than the switch itself. Consumers (both > > vendors and > > customers) extensively test these devices to ensure correct and > > reliable operation. Then they buy them literally by the millions. > > Quirks lead to quick rejection in favor of reliable parts from > > reliable vendors. In this environment, for completely different > > reasons, optoe does not need to handle quirks. > > Well, if optoe were to be merged, it would not be just for your community. It > has to work for everybody who wants to use the Linux kernel with an SFP. > You cannot decide to add a KAPI which just supports a subset of SFPs. It > needs to support as many as possible, warts and all. > > So how would you handle these SFPs with the optoe KAPI? Just like they are handled now. Folks who use your stack would filter through sfp.c, with all the quirk handling, and eventually call optoe for the actual device access. Folks who don't use kernel networking would call optoe directly and limit themselves to well behaved SFPs, or would handle quirks in user space. You think that's dumb, but there is clearly a market for that approach. It is working, at scale, today. BTW, why can't we have a driver that only handles devices that work correctly? Indeed, why would we tolerate an endless stream of special cases for each new device that comes along? Don
Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
> In my community, the SFP/QSFP/CMIS devices (32 to 128 of them per switch) > often cost more than the switch itself. Consumers (both vendors and > customers) extensively test these devices to ensure correct and reliable > operation. Then they buy them literally by the millions. Quirks lead to > quick rejection in favor of reliable parts from reliable vendors. In this > environment, for completely different reasons, optoe does not need to handle > quirks. Well, if optoe were to be merged, it would not be just for your community. It has to work for everybody who wants to use the Linux kernel with an SFP. You cannot decide to add a KAPI which just supports a subset of SFPs. It needs to support as many as possible, warts and all. So how would you handle these SFPs with the optoe KAPI? Andrew
RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
> > > Our experience is that a number of SFPs are broken, they don't > > > follow the standard. Some you cannot perform more than 16 bytes > > > reads without them locking up. Others will perform a 16 byte read, > > > but only give you one > > useful > > > byte of data. So you have to read enough of the EEPROM a byte at a > > > time to get the vendor and product strings in order to determine > > > what quirks need to be applied. optoe has nothing like this. Either > > > you don't care and only support well behaved SFPs, or you have the > > > quirk handling in user space, > > in > > > the various vendor code blobs, repeated again and again. To make > > > optoe generically usable, you are going to have to push the quirk > > > handling into optoe. The brokenness should be hidden from userspace. > > > > Interesting. I would throw away such devices. That's why switch > > vendors publish supported parts lists. > > > > Can you point me to the code that is handling those quirks? Since I > > haven't seen those problems, I don't know what they are and how to > address them. > > Take a look in drivers/net/phy/sfp.c Thanks for the pointers, I appreciate the chance to understand exactly what quirks are under discussion. It turns out that those quirks are not relevant to my patch. I don't mean you are wrong, or that they don't have to be handled for your use cases, just that my patch does not need to deal with them. If optoe were adopted into your framework, it would replace the routines sfp_i2c_read() and sfp_i2c_write(). By the time these two routines are called, all of the quirk handling has already been applied (there is no quirk handling in these routines today). The data requests have been broken down as necessary into short reads or targeted reads for each device. Those calls are then sent to the module EEPROM with no further need for quirk handling. So, optoe does not need to handle quirks to fit into your code. The quirks are handled, without optoe being involved. In my community, the SFP/QSFP/CMIS devices (32 to 128 of them per switch) often cost more than the switch itself. Consumers (both vendors and customers) extensively test these devices to ensure correct and reliable operation. Then they buy them literally by the millions. Quirks lead to quick rejection in favor of reliable parts from reliable vendors. In this environment, for completely different reasons, optoe does not need to handle quirks. > > commit f0b4f847673299577c29b71d3f3acd3c313d81b7 > Author: Pali Rohár > Date: Mon Jan 25 16:02:28 2021 +0100 > > net: sfp: add mode quirk for GPON module Ubiquiti U-Fiber Instant > > The Ubiquiti U-Fiber Instant SFP GPON module has nonsensical information > stored in its EEPROM. It claims to support all transceiver types including > 10G Ethernet. Clear all claimed modes and set only 1000baseX_Full, which > is > the only one supported. > > This module has also phys_id set to SFF, and the SFP subsystem currently > does not allow to use SFP modules detected as SFFs. Add exception for this > module so it can be detected as supported. In my community, all interpretation of the SFP/QSFP/CMIS content is done in user space. So, these issues don't affect optoe. I know you find that distasteful, but it is what my community wants and needs. > > This change finally allows to detect and use SFP GPON module Ubiquiti > U-Fiber Instant on Linux system. > > EEPROM content of this SFP module is (where XX is serial number): > > 00: 02 04 0b ff ff ff ff ff ff ff ff 03 0c 00 14 c8 ?????.?? > 10: 00 00 00 00 55 42 4e 54 20 20 20 20 20 20 20 20UBNT > 20: 20 20 20 20 00 18 e8 29 55 46 2d 49 4e 53 54 41 .??)UF-INSTA > 30: 4e 54 20 20 20 20 20 20 34 20 20 20 05 1e 00 36NT 4 ??.6 > 40: 00 06 00 00 55 42 4e 54 XX XX XX XX XX XX XX XX .?..UBNT > 50: 20 20 20 20 31 34 30 31 32 33 20 20 60 80 02 41140123 `??A > > > commit 426c6cbc409cbda9ab1a9dbf15d3c2ef947eb8c1 > Author: Pali Rohár > Date: Mon Jan 25 16:02:27 2021 +0100 > > net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips > > The workaround for VSOL V2801F brand based GPON SFP modules added > in commit > 0d035bed2a4a ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 > workaround") works only for IDs added explicitly to the list. Since there > are rebranded modules where OEM vendors put different strings into the > vendor name field, we cannot base workaround on IDs only. You might be fighting a losing battle here. There are companies that buy cheap SFP devices and reprogram the EEPROM to identify them as higher quality parts from reliable vendors. Checking the Vendor ID and part number may tell you have a high quality part, but the code inside still has the quirks you are trying to avoid. I developed optoe while working for a (high quality) vendor of these modules. Their support team runs into
RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
> Hello Don! > > I have read whole discussion and your EEPROM patch proposal. But for me it > looks like some kernel glue code for some old legacy / proprietary access > method which does not have any usage outside of that old code. I don't know if 'kernel glue code' is good or bad. It is a driver. It locks access to a device so it can perform multiple accesses without interference. It organizes the data on a weird device into a simple linear address space that can be accessed with open(), seek(), read() and write() calls. As for 'old code', this code and variations of it are under active development by multiple Network OS vendors and multiple switch vendors, and in production on hundreds of thousands of switches with millions of SFP/QSFP/CMIS devices. This stuff is running the biggest clouds in the world. > > Your code does not contain any quirks which are needed to read different > EEPROMs in different SFP modules. As Andrew wrote there are lot of broken > SFPs which needs special handling and this logic is already implemented in > sfp.c and sfp-bus.c kernel drivers. These drivers then export EEPROM > content to userspace via ethtool -m API in unified way and userspace does > not implement any quirks (nor does not have to deal with quirks). As a technical matter, you handle those quirks in the code that interprets EEPROM data. You have figured out what devices have what quirks, then coded up solutions to make them work. Then, after all the quirk handling is done, you call the actual access routines (sfp_i2c_read() and sfp_i2c_write()) to access the module EEPROMs. My code works the same way, except in my community all the interpretation of EEPROM data is done in user space. You may not like that, but it is not clear why you should care how my community chooses to handle the data. In this architecture, the only thing needed from the kernel is the equivalent of sfp_i2c_read() and sfp_i2c_write, which optoe provides. The key point here is that my community wants the kernel to just access the data. No interpretation, no identification, no special cases. > > If you try to read EEPROM "incorrectly" then SFP module with its EEPROM > chip (or emulation of chip) locks and is fully unusable after you unplug it and > plug it again. Kernel really should not export API to userspace which can > cause "damage" to SFP modules. And currently it does *not* do it. In my community, such devices are not tolerated. Modules which can be "damaged" should be thrown away. Please be clear, I am not disagreeing with your implementation. For your GPON devices, you handle this in kernel code. Cool. Keep it that way. Just don't make my community implement that where it is not needed and not wanted. Optoe just accesses the device. Other layers handle interpretation and quirks. Your handling is in sfp.c, mine is in user space. Not right or wrong, just different. Both work. > > I have contributed code for some GPON SFP modules, so their EEPROM can > be correctly read and exported to userspace via ethtool -m. So I know that > this is very fragile area and needs to be properly handled. My code is in use in cloud datacenters and campus switches around the world. I know it needs to be properly handled. > > So I do not see any reason why can be a new optoe method in _current_ > form useful. It does not implemented required things for handling different > EEPROM modules. optoe would be useful in your current environment as a replacement for sfp_i2c_read() and sfp_i2c_write(). Those routines just access the EEPROM. They don't identify or interpret or implement quirk handling. Neither does optoe. AND, optoe is useful to my community. An ethtool -m solution could of course be implemented, and all of the user space code that currently accesses module EEPROM could be rewritten, but there would be no value in that to my community. What they have works just fine. > > I would rather suggest you to use ethtool -m IOCTL API and in case it is not > suitable for QSFP (e.g. because of paging system) then extend this API. optoe already handles QSFP and CMIS just fine. The API does not need to be extended for pages. Indeed, the ethtool API has already implemented the same linear address space to flatten out the two i2c addresses plus pages in the various SFP/QSFP/CMIS specs. optoe flattens the same way. > > There were already proposals for using netlink socket interface which is > today de-facto standard interface for new code. sysfs API for such thing > really looks like some legacy code and nowadays we have better access > methods. netlink is the de-facto standard interface for kernel networking. My community does not have kernel networking (except for a puny management port). All of the switch networking is handled by the switch ASIC, and in user space network management software. Which is 'better' is complicated, depending on the needs and requirements of the application. A large and vibrant
Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
> > Our experience is that a number of SFPs are broken, they don't follow the > > standard. Some you cannot perform more than 16 bytes reads without them > > locking up. Others will perform a 16 byte read, but only give you one > useful > > byte of data. So you have to read enough of the EEPROM a byte at a time to > > get the vendor and product strings in order to determine what quirks need > > to be applied. optoe has nothing like this. Either you don't care and only > > support well behaved SFPs, or you have the quirk handling in user space, > in > > the various vendor code blobs, repeated again and again. To make optoe > > generically usable, you are going to have to push the quirk handling into > > optoe. The brokenness should be hidden from userspace. > > Interesting. I would throw away such devices. That's why switch vendors > publish supported parts lists. > > Can you point me to the code that is handling those quirks? Since I haven't > seen those problems, I don't know what they are and how to address them. Take a look in drivers/net/phy/sfp.c commit f0b4f847673299577c29b71d3f3acd3c313d81b7 Author: Pali Rohár Date: Mon Jan 25 16:02:28 2021 +0100 net: sfp: add mode quirk for GPON module Ubiquiti U-Fiber Instant The Ubiquiti U-Fiber Instant SFP GPON module has nonsensical information stored in its EEPROM. It claims to support all transceiver types including 10G Ethernet. Clear all claimed modes and set only 1000baseX_Full, which is the only one supported. This module has also phys_id set to SFF, and the SFP subsystem currently does not allow to use SFP modules detected as SFFs. Add exception for this module so it can be detected as supported. This change finally allows to detect and use SFP GPON module Ubiquiti U-Fiber Instant on Linux system. EEPROM content of this SFP module is (where XX is serial number): 00: 02 04 0b ff ff ff ff ff ff ff ff 03 0c 00 14 c8?????.?? 10: 00 00 00 00 55 42 4e 54 20 20 20 20 20 20 20 20UBNT 20: 20 20 20 20 00 18 e8 29 55 46 2d 49 4e 53 54 41.??)UF-INSTA 30: 4e 54 20 20 20 20 20 20 34 20 20 20 05 1e 00 36NT 4 ??.6 40: 00 06 00 00 55 42 4e 54 XX XX XX XX XX XX XX XX.?..UBNT 50: 20 20 20 20 31 34 30 31 32 33 20 20 60 80 02 41140123 `??A commit 426c6cbc409cbda9ab1a9dbf15d3c2ef947eb8c1 Author: Pali Rohár Date: Mon Jan 25 16:02:27 2021 +0100 net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips The workaround for VSOL V2801F brand based GPON SFP modules added in commit 0d035bed2a4a ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 workaround") works only for IDs added explicitly to the list. Since there are rebranded modules where OEM vendors put different strings into the vendor name field, we cannot base workaround on IDs only. Moreover the issue which the above mentioned commit tried to work around is generic not only to VSOL based modules, but rather to all GPON modules based on Realtek RTL8672 and RTL9601C chips. These include at least the following GPON modules: * V-SOL V2801F * C-Data FD511GX-RM0 * OPTON GP801R * BAUDCOM BD-1234-SFM * CPGOS03-0490 v2.0 * Ubiquiti U-Fiber Instant * EXOT EGS1 These Realtek chips have broken EEPROM emulator which for N-byte read operation returns just the first byte of EEPROM data, followed by N-1 zeros. Introduce a new function, sfp_id_needs_byte_io(), which detects SFP modules with broken EEPROM emulator based on N-1 zeros and switch to 1 byte EEPROM reading operation. Function sfp_i2c_read() now always uses single byte reading when it is required and when function sfp_hwmon_probe() detects single byte access, it disables registration of hwmon device, because in this case we cannot reliably and atomically read 2 bytes as is required by the standard for retrieving values from diagnostic area. (These Realtek chips are broken in a way that violates SFP standards for diagnostic interface. Kernel in this case simply cannot do anything less of skipping registration of the hwmon interface.) This patch fixes reading of EEPROM content from SFP modules based on Realtek RTL8672 and RTL9601C chips. Diagnostic interface of EEPROM stays broken and cannot be fixed. commit 624407d2cf14ff58e53bf4b2af9595c4f21d606e Author: Russell King Date: Sun Jan 10 10:58:32 2021 + net: sfp: cope with SFPs that set both LOS normal and LOS inverted The SFP MSA defines two option bits in byte 65 to indicate how the Rx_LOS signal on SFP pin 8 behaves: bit 2 - Loss of Signal implemented, signal inverted from standard definition in SFP MSA (often called "Signal Detect"). bit 1 - Loss of Signal implemented, signal as defined in SFP MSA (often called
RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
> > I have offered, in every response, to collaborate with the simple > > integration to use optoe as the default upstream driver to access the > > module EEPROMs. optoe would be superior to the existing default > > routines in sfp.c > > Actually, i'm not sure they would be. Since the KAPI issues are pretty much a > NACK on their own, i didn't bother raising other issues. Both Russell King and > I has issues with quirks and hotplug. > > Our experience is that a number of SFPs are broken, they don't follow the > standard. Some you cannot perform more than 16 bytes reads without them > locking up. Others will perform a 16 byte read, but only give you one useful > byte of data. So you have to read enough of the EEPROM a byte at a time to > get the vendor and product strings in order to determine what quirks need > to be applied. optoe has nothing like this. Either you don't care and only > support well behaved SFPs, or you have the quirk handling in user space, in > the various vendor code blobs, repeated again and again. To make optoe > generically usable, you are going to have to push the quirk handling into > optoe. The brokenness should be hidden from userspace. Interesting. I would throw away such devices. That's why switch vendors publish supported parts lists. Can you point me to the code that is handling those quirks? Since I haven't seen those problems, I don't know what they are and how to address them. Note there are a VAST number of data items in those EEPROMs, including proprietary capabilities. Many of the items are configuration dependent, and mean different things depending on the value of other data items. Most of these items are not of any interest to kernel networking. I try to minimize the size of the kernel footprint and move those decoding and management functions to user space. > > And then you repeat all the quirk handling sfp.c has. That does not scale, we > don't want the same quirks in two different places. However, because SFPs > are hot pluggable, you need to re-evaluate the quirks whenever there is a > hot-plug event. optoe has no idea if there has been a hotplug event, since it > does not have access to the GPIOs. Your user space vendor code might > know, it has access to the GPIOs. So maybe you could add an IOCTL call or > something, to let optoe know the module has changed and it needs to > update its quirks. Or for every user space read, you actually re-read the > vendor IDs and refresh the quirks before performing the read the user > actually wants. That all seems ugly and is missing from the current patch. Actually I do need to know whether the device supports paging, that's the only device state I need. Since I don't detect hotplug events, I read the 'paging supported' bit on every read that changes the page register. There is a GPIO line to detect 'presence', which presumably could be accessed via device tree configuration with the GPIO driver. I haven't figured out how to connect those pieces so I just read the page register on every access. Adding that would be a useful feature. > > I fully agree with Jakub NACK. > > Andrew Don
Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
On Tue, Mar 23, 2021 at 12:08:32PM -0700, Don Bollinger wrote: > On Tue, Mar 23, 2021 at 12:00AM -0700, Greg KH wrote: > > On Tue, Mar 23, 2021 at 11:43:55AM -0700, Don Bollinger wrote: > > > On Tue, Mar 23, 2021 at 7:12AM-0700, Greg KH wrote: > > > > On Mon, Feb 15, 2021 at 11:38:21AM -0800, Don Bollinger wrote: > > > > > optoe is an i2c based driver that supports read/write access to > > > > > all the pages (tables) of MSA standard SFP and similar devices > > > > > (conforming to the SFF-8472 spec), MSA standard QSFP and similar > > > > > devices (conforming to the SFF-8636 spec) and CMIS and similar > > > > > devices (conforming to the Common Management Interface > > Specfication). > > > > > > > > > > I promise not to engage in a drawn out email exchange over this, but I > > > would like to appeal your decision, just once... > > Thanks for your response. As promised, I'm done. > > Is there a correct protocol for withdrawing a patch, or does it just get > abandoned? Still trying to be a good citizen. Patches just get abandonded, nothing special to do here, we have mailing lists littered with them :) thanks, greg k-h
RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
On Tue, Mar 23, 2021 at 12:00AM -0700, Greg KH wrote: > On Tue, Mar 23, 2021 at 11:43:55AM -0700, Don Bollinger wrote: > > On Tue, Mar 23, 2021 at 7:12AM-0700, Greg KH wrote: > > > On Mon, Feb 15, 2021 at 11:38:21AM -0800, Don Bollinger wrote: > > > > optoe is an i2c based driver that supports read/write access to > > > > all the pages (tables) of MSA standard SFP and similar devices > > > > (conforming to the SFF-8472 spec), MSA standard QSFP and similar > > > > devices (conforming to the SFF-8636 spec) and CMIS and similar > > > > devices (conforming to the Common Management Interface > Specfication). > > > > > > > I promise not to engage in a drawn out email exchange over this, but I > > would like to appeal your decision, just once... Thanks for your response. As promised, I'm done. Is there a correct protocol for withdrawing a patch, or does it just get abandoned? Still trying to be a good citizen. Don > > > > > Given this thread, I think that using the SFP interface/api in the > > > kernel already seems like the best idea forward. > > > > > > That being said, your api here is whack, and I couldn't accept it anyway. > > > > I don't understand. I don't mean you are wrong, I literally don't > > understand what is whack about it. The interface is provided by > > nvmem. I modeled the calls on at24. The layout of the data provided > > by the driver is exactly the same layout that ethtool provides (device, > offset, length). > > Mapping i2c address, page and offset is exactly what ethtool provides. > > So, which part of this is whack? > > It's sysfs. Does nvmem use sysfs for device discovery and enablement? > > nvmem is just a "raw" maping of hardware (memory) to userspace. > > You have a "real" device here that you are trying to also map to userspace, > but when you just expose the "raw" registers (i.e. memory) to userspace, > you are forcing userspace to handle all of the device differences, instead of > the kernel. > > That's fine, for some things, but for anything with a standard, that's not ok, > that's what a kernel is for. > > In other words, you could do what you want today probably with a UIO > driver, just get the kernel out of the way and do it all in userspace. > But that's not a viable or suportable api in the long-run for any standard > hardware type. > > > > Not for the least being it's not even documented in > > > Documentation/ABI/ > > like > > > all sysfs files have to be :) > > > > This could obviously be fixed. I wasn't aware of this directory. Now > > that you've pointed it out, I see that nvmem is actually documented > > there, which is the API I am using. I document that optoe uses the > > nvmem interface, and the mapping of paged memory to linear memory in > > my patch in Documentation/misc-devices/optoe.rst. If you think it > > would be useful, I could provide similar information in > Documentation/ABI/stable. > > Again, nvmem in sysfs is just a dump of the hardware memory. That should > not be how to control a switch device. > > > > And it feels like you are abusing sysfs for things it was not ment > > > for, > > you > > > might want to look into configfs? > > > > I'm using nvmem, which in turn uses sysfs, just like at24. Why should > > optoe be different? I would think it is actually better to use the > > same API (and > > code) as at24, and NOT to put it in a different place. > > at24 too is just an eeprom behind an i2c bus. Accessing it for simple things is > fine for userspace, but not for a standard device type. > > The networking developers have said that they feel the kernel should > properly control devices like this, with a standard api. And I agree with them > (note, I'm biased, I like standard APIs, heck, I've even written specs for > them...) Doing "raw" hardware accesses is great fun for things like one-off > devices (I have Linux running in a keyboard for something like that, also as > my doorbell), but doing this for a "real" > set of devices is not ok. > > Again, it's the difference between the UIO interface and a real ethernet > driver in the kernel. You could just say "all PCI network devices should use > the UIO interface and put the hardware-specific logic in userspace", but > that's not what we (i.e. the Linux kernel developers) feel is the proper way > to handle the abstraction of device types. > > Again, we are kernel developers, we like nice hardware abstractions. > Bonus is that it lets new hardware companies create new devices and no > userspace modifications are needed! I think history is on our side here :) > > > > But really, these are networking devices, so they should be > > > controllable > > using > > > the standard networking apis, not one-off sysfs files. Moving to > > > the > > Linux- > > > standard tools is a good thing, and will work out better in the end > > instead of > > > having to encode lots of device-specific state in userspace like > > > this > > "raw" api > > > seems to require. > > > > This is the
Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
On Tue, Mar 23, 2021 at 11:43:55AM -0700, Don Bollinger wrote: > On Tue, Mar 23, 2021 at 7:12AM-0700, Greg KH wrote: > > On Mon, Feb 15, 2021 at 11:38:21AM -0800, Don Bollinger wrote: > > > optoe is an i2c based driver that supports read/write access to all > > > the pages (tables) of MSA standard SFP and similar devices (conforming > > > to the SFF-8472 spec), MSA standard QSFP and similar devices > > > (conforming to the SFF-8636 spec) and CMIS and similar devices > > > (conforming to the Common Management Interface Specfication). > > > > I promise not to engage in a drawn out email exchange over this, but I would > like to appeal your decision, just once... > > > Given this thread, I think that using the SFP interface/api in the kernel > > already seems like the best idea forward. > > > > That being said, your api here is whack, and I couldn't accept it anyway. > > I don't understand. I don't mean you are wrong, I literally don't > understand what is whack about it. The interface is provided by nvmem. I > modeled the calls on at24. The layout of the data provided by the driver is > exactly the same layout that ethtool provides (device, offset, length). > Mapping i2c address, page and offset is exactly what ethtool provides. So, > which part of this is whack? It's sysfs. Does nvmem use sysfs for device discovery and enablement? nvmem is just a "raw" maping of hardware (memory) to userspace. You have a "real" device here that you are trying to also map to userspace, but when you just expose the "raw" registers (i.e. memory) to userspace, you are forcing userspace to handle all of the device differences, instead of the kernel. That's fine, for some things, but for anything with a standard, that's not ok, that's what a kernel is for. In other words, you could do what you want today probably with a UIO driver, just get the kernel out of the way and do it all in userspace. But that's not a viable or suportable api in the long-run for any standard hardware type. > > Not for the least being it's not even documented in Documentation/ABI/ > like > > all sysfs files have to be :) > > This could obviously be fixed. I wasn't aware of this directory. Now that > you've pointed it out, I see that nvmem is actually documented there, which > is the API I am using. I document that optoe uses the nvmem interface, and > the mapping of paged memory to linear memory in my patch in > Documentation/misc-devices/optoe.rst. If you think it would be useful, I > could provide similar information in Documentation/ABI/stable. Again, nvmem in sysfs is just a dump of the hardware memory. That should not be how to control a switch device. > > And it feels like you are abusing sysfs for things it was not ment for, > you > > might want to look into configfs? > > I'm using nvmem, which in turn uses sysfs, just like at24. Why should optoe > be different? I would think it is actually better to use the same API (and > code) as at24, and NOT to put it in a different place. at24 too is just an eeprom behind an i2c bus. Accessing it for simple things is fine for userspace, but not for a standard device type. The networking developers have said that they feel the kernel should properly control devices like this, with a standard api. And I agree with them (note, I'm biased, I like standard APIs, heck, I've even written specs for them...) Doing "raw" hardware accesses is great fun for things like one-off devices (I have Linux running in a keyboard for something like that, also as my doorbell), but doing this for a "real" set of devices is not ok. Again, it's the difference between the UIO interface and a real ethernet driver in the kernel. You could just say "all PCI network devices should use the UIO interface and put the hardware-specific logic in userspace", but that's not what we (i.e. the Linux kernel developers) feel is the proper way to handle the abstraction of device types. Again, we are kernel developers, we like nice hardware abstractions. Bonus is that it lets new hardware companies create new devices and no userspace modifications are needed! I think history is on our side here :) > > But really, these are networking devices, so they should be controllable > using > > the standard networking apis, not one-off sysfs files. Moving to the > Linux- > > standard tools is a good thing, and will work out better in the end > instead of > > having to encode lots of device-specific state in userspace like this > "raw" api > > seems to require. > > This is the real issue. It turns out, on these switches, there are two > kinds of networking. Linux kernel networking handles one port, of 1Gb (or > less), which functions as a management port. This is typically used for > console access. It is configured and managed as an ordinary network port, > with a kernel network driver and the usual networking utilities. 'ip addr' > will show this port as well as loopback ports. The linux kernel has no > visibility
RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
On Tue, Mar 23, 2021 at 7:12AM-0700, Greg KH wrote: > On Mon, Feb 15, 2021 at 11:38:21AM -0800, Don Bollinger wrote: > > optoe is an i2c based driver that supports read/write access to all > > the pages (tables) of MSA standard SFP and similar devices (conforming > > to the SFF-8472 spec), MSA standard QSFP and similar devices > > (conforming to the SFF-8636 spec) and CMIS and similar devices > > (conforming to the Common Management Interface Specfication). > I promise not to engage in a drawn out email exchange over this, but I would like to appeal your decision, just once... > Given this thread, I think that using the SFP interface/api in the kernel > already seems like the best idea forward. > > That being said, your api here is whack, and I couldn't accept it anyway. I don't understand. I don't mean you are wrong, I literally don't understand what is whack about it. The interface is provided by nvmem. I modeled the calls on at24. The layout of the data provided by the driver is exactly the same layout that ethtool provides (device, offset, length). Mapping i2c address, page and offset is exactly what ethtool provides. So, which part of this is whack? > > Not for the least being it's not even documented in Documentation/ABI/ like > all sysfs files have to be :) This could obviously be fixed. I wasn't aware of this directory. Now that you've pointed it out, I see that nvmem is actually documented there, which is the API I am using. I document that optoe uses the nvmem interface, and the mapping of paged memory to linear memory in my patch in Documentation/misc-devices/optoe.rst. If you think it would be useful, I could provide similar information in Documentation/ABI/stable. > > And it feels like you are abusing sysfs for things it was not ment for, you > might want to look into configfs? I'm using nvmem, which in turn uses sysfs, just like at24. Why should optoe be different? I would think it is actually better to use the same API (and code) as at24, and NOT to put it in a different place. > > But really, these are networking devices, so they should be controllable using > the standard networking apis, not one-off sysfs files. Moving to the Linux- > standard tools is a good thing, and will work out better in the end instead of > having to encode lots of device-specific state in userspace like this "raw" api > seems to require. This is the real issue. It turns out, on these switches, there are two kinds of networking. Linux kernel networking handles one port, of 1Gb (or less), which functions as a management port. This is typically used for console access. It is configured and managed as an ordinary network port, with a kernel network driver and the usual networking utilities. 'ip addr' will show this port as well as loopback ports. The linux kernel has no visibility to the switch networking ports. 'ip addr' will not show any of the switch networking ports. The switch functions, switching at 25Tb/s, are completely invisible to the linux kernel. The switch ASIC is managed by a device driver provided by the ASIC vendor. That driver is driven by management code from the ASIC vendor and a host of network applications. Multiple vendors compete to provide the best, most innovative, most secure, easiest... network capabilities on top of this architecture. NONE of them use a kernel network driver, or the layers of control or management that the linux kernel offers. On these systems, if you ask ethtool to provide EEPROM data, you get 'function not implemented'. On these systems, SFP/QSFP/CMIS devices are actually not 'networking devices' from a Linux kernel perspective. They are GPIO targets and EEPROM memory. Switch networking just needs the kernel to toggle the GPIO lines and read/write the EEPROM. optoe is just trying to read/write the EEPROM. One last note... The networking folks need a better SFP/QSFP/CMIS EEPROM driver to access more pages, and to support the new CMIS standard. optoe could provide that, and I would love to collaborate on integrating optoe into that stack. It wouldn't be hard, and it would be useful. I am not opposed to netdev/netlink/phylink. I have been commenting on Moshe's KAPI proposal, with several of my suggestions adopted there. I am not insisting on my approach INSTEAD of theirs. I am insisting on my approach IN ADDITION TO theirs. Without the nvmem interface, optoe is useless to my community. > > thanks, > > greg k-h Thanks Don
Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
On Mon, Feb 15, 2021 at 11:38:21AM -0800, Don Bollinger wrote: > optoe is an i2c based driver that supports read/write access to all > the pages (tables) of MSA standard SFP and similar devices (conforming > to the SFF-8472 spec), MSA standard QSFP and similar devices (conforming > to the SFF-8636 spec) and CMIS and similar devices (conforming to the > Common Management Interface Specfication). Given this thread, I think that using the SFP interface/api in the kernel already seems like the best idea forward. That being said, your api here is whack, and I couldn't accept it anyway. Not for the least being it's not even documented in Documentation/ABI/ like all sysfs files have to be :) And it feels like you are abusing sysfs for things it was not ment for, you might want to look into configfs? But really, these are networking devices, so they should be controllable using the standard networking apis, not one-off sysfs files. Moving to the Linux-standard tools is a good thing, and will work out better in the end instead of having to encode lots of device-specific state in userspace like this "raw" api seems to require. thanks, greg k-h
Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
Hello Don! I have read whole discussion and your EEPROM patch proposal. But for me it looks like some kernel glue code for some old legacy / proprietary access method which does not have any usage outside of that old code. Your code does not contain any quirks which are needed to read different EEPROMs in different SFP modules. As Andrew wrote there are lot of broken SFPs which needs special handling and this logic is already implemented in sfp.c and sfp-bus.c kernel drivers. These drivers then export EEPROM content to userspace via ethtool -m API in unified way and userspace does not implement any quirks (nor does not have to deal with quirks). If you try to read EEPROM "incorrectly" then SFP module with its EEPROM chip (or emulation of chip) locks and is fully unusable after you unplug it and plug it again. Kernel really should not export API to userspace which can cause "damage" to SFP modules. And currently it does *not* do it. I have contributed code for some GPON SFP modules, so their EEPROM can be correctly read and exported to userspace via ethtool -m. So I know that this is very fragile area and needs to be properly handled. So I do not see any reason why can be a new optoe method in _current_ form useful. It does not implemented required things for handling different EEPROM modules. I would rather suggest you to use ethtool -m IOCTL API and in case it is not suitable for QSFP (e.g. because of paging system) then extend this API. There were already proposals for using netlink socket interface which is today de-facto standard interface for new code. sysfs API for such thing really looks like some legacy code and nowadays we have better access methods. If you want, I can help you with these steps and make patches to be in acceptable state; not written in "legacy" style. As I'm working with GPON SFP modules with different EEPROMs in them, I'm really interested in any improvements in their EEPROM area.
Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
> I have offered, in every response, to collaborate with the simple > integration to use optoe as the default upstream driver to access the module > EEPROMs. optoe would be superior to the existing default routines in sfp.c Actually, i'm not sure they would be. Since the KAPI issues are pretty much a NACK on their own, i didn't bother raising other issues. Both Russell King and I has issues with quirks and hotplug. Our experience is that a number of SFPs are broken, they don't follow the standard. Some you cannot perform more than 16 bytes reads without them locking up. Others will perform a 16 byte read, but only give you one useful byte of data. So you have to read enough of the EEPROM a byte at a time to get the vendor and product strings in order to determine what quirks need to be applied. optoe has nothing like this. Either you don't care and only support well behaved SFPs, or you have the quirk handling in user space, in the various vendor code blobs, repeated again and again. To make optoe generically usable, you are going to have to push the quirk handling into optoe. The brokenness should be hidden from userspace. And then you repeat all the quirk handling sfp.c has. That does not scale, we don't want the same quirks in two different places. However, because SFPs are hot pluggable, you need to re-evaluate the quirks whenever there is a hot-plug event. optoe has no idea if there has been a hotplug event, since it does not have access to the GPIOs. Your user space vendor code might know, it has access to the GPIOs. So maybe you could add an IOCTL call or something, to let optoe know the module has changed and it needs to update its quirks. Or for every user space read, you actually re-read the vendor IDs and refresh the quirks before performing the read the user actually wants. That all seems ugly and is missing from the current patch. I fully agree with Jakub NACK. Andrew
RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
On Mon, 15 Mar 2021 10:40 -0800 Jakub Kicinski wrote: > On Sat, 13 Mar 2021 13:35:56 -0800 Don Bollinger wrote: > > > away parts of the bottom end and replace it with a different KAPI, > > > and nobody will notice? In fact, this is probably how it was > > > designed. Anybody > > > > Actually everyone who touches this code would notice, each > > implementation would have to be modified, with literally no benefit to this > community. > > You keep saying that kernel API is "of no benefit to this community" > yet you don't want to accept the argument that your code is of no benefit to > the upstream community. I have offered, in every response, to collaborate with the simple integration to use optoe as the default upstream driver to access the module EEPROMs. optoe would be superior to the existing default routines in sfp.c and would allow multiple existing vendor specific upstream drivers to adopt the default. That would reduce the code base they maintain and provide better access to module eeproms. I already adopted the existing EEPROM api to make that integration easy (nvmem). I'm reluctant to submit the changes to sfp.c because I have no expertise in that stack and no platform to test it on. > > > optoe does not undermine the netlink KAPI that Moshe is working on. > > It does, although it may be hard to grasp for a vendor who can just EoL a > product at will once nobody is paying for it. We aim to provide uniform > support for all networking devices and an infinite backward compatibility > guarantee. I aim to provide uniform support for module EEPROM devices, with no less reason to expect an infinite backward compatibility guarantee. (Infinite is a bit much, that technology will inevitably become uninteresting to my community as well as yours.) > > People will try to use optoe-based tools on the upstream drivers and they > won't work. Realistically we will need to support both APIs. I assume they "won't work" because of new requirements or newly discovered defects. At that point optoe would be fixed by people who care, just like any other upstream code. If your stack adopts optoe, through Moshe's KAPI, then both communities benefit from ongoing support to maintain and enhance EEPROM access. If your stack does not adopt optoe, then my community will manage the support, since they need and use the code. As for "both APIs", the first API is Moshe's, which we both support (politically and technically). The second is nvmem, which is supported by the EEPROM driver folks, led by the support for the at24 driver. I'm calling the routines they created for this purpose, I'm not creating a new API. Bottom line: "Realistically we will need to support both APIs" even if optoe does not get accepted. > > > If your community is interested, it could adopt optoe, WITH your KAPI, > > to consolidate and improve module EEPROM access for mainstream netdev > > consumers. I am eager to collaborate on the fairly simple > > integration. > > Nacked-by: Jakub Kicinski > > Please move on. My community still has useful code that they would like in the upstream kernel. Don
Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
On Sat, 13 Mar 2021 13:35:56 -0800 Don Bollinger wrote: > > away parts of the bottom end and replace it with a different KAPI, and > > nobody will notice? In fact, this is probably how it was designed. Anybody > > Actually everyone who touches this code would notice, each implementation > would have to be modified, with literally no benefit to this community. You keep saying that kernel API is "of no benefit to this community" yet you don't want to accept the argument that your code is of no benefit to the upstream community. > optoe does not undermine the netlink KAPI that Moshe is working on. It does, although it may be hard to grasp for a vendor who can just EoL a product at will once nobody is paying for it. We aim to provide uniform support for all networking devices and an infinite backward compatibility guarantee. People will try to use optoe-based tools on the upstream drivers and they won't work. Realistically we will need to support both APIs. > If your community is interested, it could adopt optoe, WITH your > KAPI, to consolidate and improve module EEPROM access for mainstream > netdev consumers. I am eager to collaborate on the fairly simple > integration. Nacked-by: Jakub Kicinski Please move on.
RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
> > This interface is implemented in python scripts provided by the switch > > platform vendor. Those scripts encode the mapping of CPLD i2c muxes > > to i2c buses to port numbers, specific to each switch. > > > > At the bottom of that python stack, all EEPROM access goes through > > open/seek/read/close access to the optoe managed file in > > /sys/bus/i2c/devices/{num}-0050/eeprom. > > And this python stack is all open source? So you should be able to throw It is open in the sense that it is accessible to the world on github and the maintainers would presumably entertain contributions. In practice, these are developed and maintained by the white box switch vendors. There is one implementation for each platform on each supported NOS. There is lots of commonality among them, but each is nonetheless unique. Optoe has been adopted across this ecosystem as a common piece that has been factored out of many of the platform specific implementations. > away parts of the bottom end and replace it with a different KAPI, and > nobody will notice? In fact, this is probably how it was designed. Anybody Actually everyone who touches this code would notice, each implementation would have to be modified, with literally no benefit to this community. > working with out of tree code knows what gets merged later is going to be > different because of review comments. And KAPI code is even more likely to > be different. So nobody really expected optoe to get merged as is. The list of 'nobody' includes myself, my switch platform partners, my NOS partners and Greg KH. I did expect to accommodate constructive review of the code, which I have already done (this is v2). > > > You're not going to like this, but ethtool -e and ethtool -m both > > return ' Ethernet0 Cannot get EEPROM data: Operation not supported', > > for every port managed by the big switch silicon. > > You are still missing what i said. The existing IOCTL interface needs a network > interface name. But there is no reason why you cannot extend the new > netlink KAPI to take an alternative identifier, sfp42. That maps directly to the > SFP device, without using an interface name. Your pile of python can directly > use the netlink API, the ethtool command does not need to make use of this > form of identifier, and you don't need to "screen scrape" ethtool. It is just software, your proposal is certainly technically feasible. It provides no benefit to the community that is using optoe. optoe is using a perfectly good KAPI, the nvmem interface that is being developed and maintained by the folks who manage the EEPROM drivers in the kernel. It has been updated since the prior submittal in 2018 to use the nvmem interface and the regmap interface, both from the at24.c driver. This community isn't using the rest of the netdev/netlink interfaces, and has adopted (before I wrote optoe) a perfectly reasonable approach of writing a simple driver to access these simple devices. optoe does not undermine the netlink KAPI that Moshe is working on. If your community is interested, it could adopt optoe, WITH your KAPI, to consolidate and improve module EEPROM access for mainstream netdev consumers. I am eager to collaborate on the fairly simple integration. > > It seems very unlikely optoe is going to get merged. The network maintainers > are against it, due to KAPI issues. I'm trying to point out a path you can take > to get code merged. But it is up to you if you decided to follow it. Thank you, I decline. I respectfully request that optoe be accepted as a useful implementation of the EEPROM driver, with the same access methods as other EEPROM drivers, customized for the unique memory layout of SFP, QSFP and CMIS devices. I remain open to improvements in the implementation, but my community finds no value in an implementation that removes the standard EEPROM access via sysfs and open/seek/read/close calls. > > Andrew Don
Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
> This interface is implemented in python scripts provided by the switch > platform > vendor. Those scripts encode the mapping of CPLD i2c muxes to i2c buses to > port numbers, specific to each switch. > > At the bottom of that python stack, all EEPROM access goes through > open/seek/read/close access to the optoe managed file in > /sys/bus/i2c/devices/{num}-0050/eeprom. And this python stack is all open source? So you should be able to throw away parts of the bottom end and replace it with a different KAPI, and nobody will notice? In fact, this is probably how it was designed. Anybody working with out of tree code knows what gets merged later is going to be different because of review comments. And KAPI code is even more likely to be different. So nobody really expected optoe to get merged as is. > You're not going to like this, but ethtool -e and ethtool -m both > return ' Ethernet0 Cannot get EEPROM data: Operation not supported', > for every port managed by the big switch silicon. You are still missing what i said. The existing IOCTL interface needs a network interface name. But there is no reason why you cannot extend the new netlink KAPI to take an alternative identifier, sfp42. That maps directly to the SFP device, without using an interface name. Your pile of python can directly use the netlink API, the ethtool command does not need to make use of this form of identifier, and you don't need to "screen scrape" ethtool. It seems very unlikely optoe is going to get merged. The network maintainers are against it, due to KAPI issues. I'm trying to point out a path you can take to get code merged. But it is up to you if you decided to follow it. Andrew
RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
On Fri, 5 Mar 2021 19:32 -0800 Andrew Lunn wrote: > > I am proposing acceptance of a commonly used implementation for > > accessing SFPs because the one used by the netdev/netlink community > > does not fit the architecture of the white box NOS/switch community. > > Please could you expand on this. I've given suggests as to how the new > netlink KAPI could be used for this use case, without being attached to a > netdev. And you have said nothing about why it cannot be made to work. > You cannot argue the architecture does not fit without actually saying why it > does not fit. > >Andrew Sorry it took some time to clarify this for myself. I'm using SONiC (the NOS Microsoft uses to run the switches in its Azure cloud) as my example. They are users of optoe, and they actually initiated the request to push optoe upstream. Other white box NOS vendors are similar. SONiC manages all aspects of SFP/QSFP/CMIS interaction through user space. They have specified an API that is implemented by switch platform vendors, that provides things like presence detection, LowPower mode up/down/status, raw access to EEPROM content, interpretation of EEPROM content (including TxPower/RxPower/bias, high/low alarm/warning thresholds, static content like serial number and part number, and dozens of other items). This interface is implemented in python scripts provided by the switch platform vendor. Those scripts encode the mapping of CPLD i2c muxes to i2c buses to port numbers, specific to each switch. At the bottom of that python stack, all EEPROM access goes through open/seek/read/close access to the optoe managed file in /sys/bus/i2c/devices/{num}-0050/eeprom. You're not going to like this, but ethtool -e and ethtool -m both return ' Ethernet0 Cannot get EEPROM data: Operation not supported', for every port managed by the big switch silicon. So, my users are using Linux for all the usual Linux things (memory management, process management, I/O, IPC, containers), but they don't use Linux networking to manage the ports that are managed by the big switch silicon. (Linux networking is still in use for the management port that talks to the management processor running Linux.) optoe provides the device EEPROM foundation for this architecture, but requires the sysfs interface (via nvmem) to provide it. optoe can also easily provide the default EEPROM access for the netdev/netlink interface through the old and new KAPI. Don
Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
> I am proposing acceptance of a commonly used implementation for accessing > SFPs because the one used by the netdev/netlink community does not fit the > architecture of the white box NOS/switch community. Please could you expand on this. I've given suggests as to how the new netlink KAPI could be used for this use case, without being attached to a netdev. And you have said nothing about why it cannot be made to work. You cannot argue the architecture does not fit without actually saying why it does not fit. Andrew
RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
On Fri, 5 Mar 2021 2:55 PM -0800 Jakub Kicinski wrote: > On Fri, 5 Mar 2021 11:07:08 -0800 Don Bollinger wrote: > > Acknowledging your objections, I nonetheless request that optoe be > > accepted into upstream as an eeprom driver in drivers/misc/eeprom. It > > is a legitimate driver, with a legitimate user community, which > > deserves the benefits of being managed as a legitimate part of the linux > kernel. > > It's in the best interest of the community to standardize on how we expect > things to operate. You're free to do whatever you want in your proprietary > systems but please don't expect us to accept a parallel, in now way superior > method of accessing SFPs. These are not proprietary systems. The Network Operating Systems that use optoe are open source projects, Linux based, available on github, and contributing to the Linux source (see for example https://github.com/Azure/SONiC). The switches that these NOSs run on are open spec systems (https://www.opencompute.org/wiki/Networking/SpecsAndDesigns). The fact that they use the SDK from the switch silicon vendor should not mean they are banished from the Linux community. I am proposing acceptance of a commonly used implementation for accessing SFPs because the one used by the netdev/netlink community does not fit the architecture of the white box NOS/switch community. I am not trying to pick sides, I am trying to make optoe available to both communities, to improve EEPROM access for both of them. Don
Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
On Fri, 5 Mar 2021 11:07:08 -0800 Don Bollinger wrote: > Acknowledging your objections, I nonetheless request that optoe be accepted > into upstream as an eeprom driver in drivers/misc/eeprom. It is a > legitimate driver, with a legitimate user community, which deserves the > benefits of being managed as a legitimate part of the linux kernel. It's in the best interest of the community to standardize on how we expect things to operate. You're free to do whatever you want in your proprietary systems but please don't expect us to accept a parallel, in now way superior method of accessing SFPs.
RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
On Mon, Mar 1, 2021 at 12:46 PM-0800, Andrew Lunn wrote: > > To be more specific, optoe is only replacing the functionality of > > drivers/net/phy/sfp.c, the functions of sfp_i2c_read() and sfp_i2c_write(). > > These are the routines at the very bottom of the ethtool stack that > > actually execute the i2c calls to get the data. The existing routines > > are very limited, in that they don't handle pages at all. Hence they > > can only reach > > 256 bytes of QSFP EEPROM data and 512 bytes of SFP EEPROM data. I can > > propose a shorter cleaner replacement for each of those routines which > > will provide access to the rest of the data on those devices. > > drivers/net/phy/sfp.c is not the only code making use of this KAPI. > Any MAC driver can implement the ethtool op calls for reading SFP memory. > The MAC driver can either directly reply because it has the SFP hidden > behind firmware, or it can call into the sfp.c code, because Linux is driving the > SFP. OK, I have checked with my partners, including NOS vendors and switch platform vendors. They are not using the netdev framework, they are basically not using kernel networking for managing the networking through tens to over a hundred network ports at 10G to 400G speeds. The kernel is not the source of truth regarding the state of network devices. I know netdev *could* manage these systems, and that you are working toward that goal, that is not the approach they are taking nor the direction they are heading. I am not disparaging netdev, I respect and value the contribution to linux networking. It's all good. It just isn't the direction my partners are going at this time. You have described this architecture in the past as a 'bootloader'. In fact Linux is the operating system running on those switches. It is not temporary (eg loading the real OS and going away). It is allocating memory, dispatching processes and threads, handling interrupts, hosting docker containers, and running the proprietary network APIs that manage the networks. In this architecture, the optical modules are managed by the OS, through drivers. The network APIs interact through these drivers. For much of the configuration data, there are configuration files that match up device hardware (e.g. Low Power Mode GPIO lines and Tx disable lines) and i2c buses (through layers of i2c muxes) with switch ports as seen by the switch silicon. Network management software (user space apps) is responsible for enabling, configuring and monitoring optical modules to match the config files. Kernel drivers provide the access to the GPIO lines and the EEPROM control registers. Notably, in this architecture, there are actually NO kernel consumers of the module EEPROM data. The version of optoe that is in production in these NOS and switch environments does not even have an entry point callable by the kernel. All of the consumers are accessing the data via the sysfs file in /sys/bus/i2c/devices/*. I have closely modeled the updated version of optoe on the at24 driver (drivers/misc/eeprom/at24.c). Thus, the KAPI is actually the same as used by other eeprom drivers. It is an eeprom, it is accessed by the nvmem interfaces, in both kernel and user space. My primary motivation for creating optoe is to consolidate a bunch of different implementations by different vendors, to add page support which most implementations lacked, to extend the reach to all of the architected pages (the standards describe them as proprietary, not forbidden), to provide write access, and to enable CMIS devices. Those goals apply to the netdev/netlink environment as well. I added the kernel access via nvmem to facilitate adoption in your network stack, to achieve the same goals (standardization and improvement of access to module EEPROMs). > > Moshe is working on the Mellonox MAC drivers. As you say, the current sfp.c I love Moshe's KAPI, and am reviewing and commenting on it to ensure its success. > code is very limited. But once Moshe code is merged, i will do the work > needed to extend sfp.c to fully support the KAPI. It will then work for many > more MAC drivers, those using phylink. One piece of that work could be to replace the contents of drivers/net/phy/sfp.c, functions sfp_i2c_read() and sfp_i2c_write() with nvmem calls to optoe. That would be an easy change, and provide all of the features of optoe (pages, access to all of the EEPROM, write support, CMIS), without writing and maintaining that i2c access code. The actual i2c calls would be handled by the same code that is supporting at24. It is plausible that platform vendors would choose not to implement their own version of these functions if the generic sfp_i2c_read/write worked for them. Fewer implementations of the same code, with more capability in the common implementation, is obviously beneficial. > For me, the KAPI is the important thing, and less so how the implementation > underneath works. Ideally, we want one KAPI for
Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
> To be more specific, optoe is only replacing the functionality of > drivers/net/phy/sfp.c, the functions of sfp_i2c_read() and sfp_i2c_write(). > These are the routines at the very bottom of the ethtool stack that actually > execute the i2c calls to get the data. The existing routines are very > limited, in that they don't handle pages at all. Hence they can only reach > 256 bytes of QSFP EEPROM data and 512 bytes of SFP EEPROM data. I can > propose a shorter cleaner replacement for each of those routines which will > provide access to the rest of the data on those devices. drivers/net/phy/sfp.c is not the only code making use of this KAPI. Any MAC driver can implement the ethtool op calls for reading SFP memory. The MAC driver can either directly reply because it has the SFP hidden behind firmware, or it can call into the sfp.c code, because Linux is driving the SFP. Moshe is working on the Mellonox MAC drivers. As you say, the current sfp.c code is very limited. But once Moshe code is merged, i will do the work needed to extend sfp.c to fully support the KAPI. It will then work for many more MAC drivers, those using phylink. For me, the KAPI is the important thing, and less so how the implementation underneath works. Ideally, we want one KAPI for accessing SFP EEPROMs. Up until now, that KAPI is the ethtool IOCTL. But that KAPI is now showing its age, and it causing us problems. So we need to replace that KAPI. ethtool has recently moved to using netlink messages. So any replacement should be based on netlink. The whole network stack is pretty much controlled via netlink. So you will find it very difficult to argue for any other form of KAPI within the netdev community. Since optoe's KAPI is not netlink based, it is very unlikely to be accepted. But netlink is much more flexible than the older IOCTL interface. Please work with us to ensure this new KAPI can work with your use cases. Andrew
RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
On Fri, Feb 26, 2021 at 6:46 PM -0800, Don Bollinger wrote: > On Fri, Feb 26, 2021 at 2:35 PM -0800, Andrew Lunn wrote: > > On Mon, Feb 15, 2021 at 11:38:21AM -0800, Don Bollinger wrote: > > > optoe is an i2c based driver that supports read/write access to all > > > the pages (tables) of MSA standard SFP and similar devices > > > (conforming to the SFF-8472 spec), MSA standard QSFP and similar > > > devices (conforming to the SFF-8636 spec) and CMIS and similar > > > devices (conforming to the Common Management Interface > Specfication). > > > > Hi Don > > > > And we have seen this code before, and the netdev Maintainers have > > argued against it before. > > Yes, though I have tried to make it more acceptable, and also more useful to > you... > > > > These devices provide identification, operational status and control > > > registers via an EEPROM model. These devices support one or 3 fixed > > > pages > > > (128 bytes) of data, and one page that is selected via a page > > > register on the first fixed page. Thus the driver's main task is to > > > map these pages onto a simple linear address space for user space > > > management > > applications. > > > See the driver code for a detailed layout. > > > > I assume you have seen the work NVIDIA submitted last week? This idea > > of linear pages is really restrictive and we are moving away from it. Yes, I see at least most of it in your response in the netdev archive. The linear pages are really a very simple mapping, but they also provide a very simple access model. I can readily accommodate Moshe's addressing (i2c addr, page, bank, offset, len). Details below... > > > The EEPROM data is accessible to user space and kernel consumers via > > > the nvmem interface. > > > > ethtool -m ? > > This is actually below ethtool. Ethtool has to fetch the data from the eeprom > using an appropriate i2c interface (these are defined to be i2c devices). And, > to fetch the data from any but the first 256 bytes the code ethtool calls must > deal with the page register on the device. This driver handles the page > register, for both SFP and QSFP/CMIS devices. This driver would be a useful > path for ethtool to use when someone decides to make that data available. > Note for example, CMIS devices put the DOM data (per-channel Tx power, > Rx Power, laser bias) on page 0x11. When you want that data, you'll have to > go past 256 bytes and deal with the page register. > optoe will do that for you. To be more specific, optoe is only replacing the functionality of drivers/net/phy/sfp.c, the functions of sfp_i2c_read() and sfp_i2c_write(). These are the routines at the very bottom of the ethtool stack that actually execute the i2c calls to get the data. The existing routines are very limited, in that they don't handle pages at all. Hence they can only reach 256 bytes of QSFP EEPROM data and 512 bytes of SFP EEPROM data. I can propose a shorter cleaner replacement for each of those routines which will provide access to the rest of the data on those devices. ALL of the ethtool stack remains unchanged, works exactly the same, and will now provide access to more data on the EEPROMs. We may have to remove some range checks to allow requests to pages beyond page 0. Note that Moshe's RFC does not include the actual access routines, the equivalent of sfp_i2c_read/write. That will in fact require managing pages. Using optoe, those routines are a few lines of code to map his addr/page/bank/offset and a call to the new sfp_i2c_read/write calls. And, all of the i2c/EEPROM accesses go through the same code that routinely handles the rest of the EEPROMs in the system, with all of the architectural consolidation, i2c anomaly handling, and simplified support inherent in sharing common code. > > In the past, this code has been NACKed because it does not integrate > > into the networking stack. Is this attempt any different? > > Yes. I have updated the driver with all the latest changes in at24.c, the > primary eeprom driver. That update includes use of the nvmem interface, > which means this driver can now be called by kernel code. I believe it would > be useful and easy to replace the sfp.c code that accesses the eeprom with > nvmem calls to this driver. By doing so, you will be able to access the > additional pages of data on those devices. You would also get the benefit of > sharing common code the other eeprom drivers. As part of that cleanup, the > explicit sysfs creation of an eeprom device has been removed. > Full disclosure, the nvmem interface creates that device now. > > > Thanks > > Andrew > > One more point, I have been requested by the SONiC team to upstream this > driver. It is on their short list of kernel code that is not upstream, and they > want to fix that. They are not consumers of the netdev interface, but they > (and other NOS providers) have a need for a driver to access the eeprom > data. Greg KH was involved in that request, and I
Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
> > I assume you have seen the work NVIDIA submitted last week? This idea of > > linear pages is really restrictive and we are moving away from it. > > No, I haven't seen it. I can't seem to locate anything in the past month on > LMKL from NVIDIA. Please point me to it. [RFC PATCH net-next 0/5] ethtool: Extend module EEPROM dump Message-Id: <1614181274-28482-1-git-send-email-mo...@nvidia.com> b4 should be able to fetch it for you, using that message id. Clearly, we don't want two different kernel APIs for doing the same thing. This new KAPI is still in its early days. You can contribute to it, and make it work for your use case. If i understand correctly, you are using Linux as a bootloader, and running the complete switch driver in userspace, not making use of the Linux network stack. This is not something the netdev community likes, but if you work within the networking KAPI, rather than adding parallel KAPI, we can probably work together. I think the biggest problem you have is identifiers. Since you don't have the SFP associated to a netdev, the current IOCTL interface which us the netdev name as an identifier does not work. But the new code is netlink based. The identifier is just an attribute in the message. See if you can use an alternative attribute which directly identifies the SFP, not the netdev. It is O.K. to instantiate an SFP device and then not make use of it in PHYLINK. So this should work. Andrew
RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
On Fri, Feb 26, 2021 at 2:35 PM -0800, Andrew Lunn wrote: > On Mon, Feb 15, 2021 at 11:38:21AM -0800, Don Bollinger wrote: > > optoe is an i2c based driver that supports read/write access to all > > the pages (tables) of MSA standard SFP and similar devices (conforming > > to the SFF-8472 spec), MSA standard QSFP and similar devices > > (conforming to the SFF-8636 spec) and CMIS and similar devices > > (conforming to the Common Management Interface Specfication). > > Hi Don > > Please make sure you Cc: netdev. This is networking stuff. Will do. > And we have seen this code before, and the netdev Maintainers have > argued against it before. Yes, though I have tried to make it more acceptable, and also more useful to you... > > These devices provide identification, operational status and control > > registers via an EEPROM model. These devices support one or 3 fixed > > pages > > (128 bytes) of data, and one page that is selected via a page register > > on the first fixed page. Thus the driver's main task is to map these > > pages onto a simple linear address space for user space management > applications. > > See the driver code for a detailed layout. > > I assume you have seen the work NVIDIA submitted last week? This idea of > linear pages is really restrictive and we are moving away from it. No, I haven't seen it. I can't seem to locate anything in the past month on LMKL from NVIDIA. Please point me to it. What are you using instead of mapping the pages into a linear address space? Perhaps I can provide a different interface to call with some other mapping. > > The EEPROM data is accessible to user space and kernel consumers via > > the nvmem interface. > > ethtool -m ? This is actually below ethtool. Ethtool has to fetch the data from the eeprom using an appropriate i2c interface (these are defined to be i2c devices). And, to fetch the data from any but the first 256 bytes the code ethtool calls must deal with the page register on the device. This driver handles the page register, for both SFP and QSFP/CMIS devices. This driver would be a useful path for ethtool to use when someone decides to make that data available. Note for example, CMIS devices put the DOM data (per-channel Tx power, Rx Power, laser bias) on page 0x11. When you want that data, you'll have to go past 256 bytes and deal with the page register. optoe will do that for you. > In the past, this code has been NACKed because it does not integrate into > the networking stack. Is this attempt any different? Yes. I have updated the driver with all the latest changes in at24.c, the primary eeprom driver. That update includes use of the nvmem interface, which means this driver can now be called by kernel code. I believe it would be useful and easy to replace the sfp.c code that accesses the eeprom with nvmem calls to this driver. By doing so, you will be able to access the additional pages of data on those devices. You would also get the benefit of sharing common code the other eeprom drivers. As part of that cleanup, the explicit sysfs creation of an eeprom device has been removed. Full disclosure, the nvmem interface creates that device now. > Thanks > Andrew One more point, I have been requested by the SONiC team to upstream this driver. It is on their short list of kernel code that is not upstream, and they want to fix that. They are not consumers of the netdev interface, but they (and other NOS providers) have a need for a driver to access the eeprom data. Greg KH was involved in that request, and I related your concerns to him, as well as my position that this is an eeprom driver with partners that need it independent of netdev. His response: GKH> I can't remember the actual patch anymore, but you are right, it's "just" GKH> poking around in the EEPROM for the device, and doing what you want in GKH> userspace, which really should be fine. Submit it and I'll be glad to review it GKH> under that type of functionality being added. He didn't say he would override your position, but he suggested it was appropriate to submit it. So, here it is. Thus: 1. There are existing NOS platforms that need and use this functionality, they want it in the upstream kernel. 2. This driver is better than sfp.c, and could easily be plugged in to improve netdev and ethtool access to SFP/QSFP/CMIS EEPROM data. Don
Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
On Mon, Feb 15, 2021 at 11:38:21AM -0800, Don Bollinger wrote: > optoe is an i2c based driver that supports read/write access to all > the pages (tables) of MSA standard SFP and similar devices (conforming > to the SFF-8472 spec), MSA standard QSFP and similar devices (conforming > to the SFF-8636 spec) and CMIS and similar devices (conforming to the > Common Management Interface Specfication). Hi Don Please make sure you Cc: netdev. This is networking stuff. And we have seen this code before, and the netdev Maintainers have argued against it before. > These devices provide identification, operational status and control > registers via an EEPROM model. These devices support one or 3 fixed pages > (128 bytes) of data, and one page that is selected via a page register on > the first fixed page. Thus the driver's main task is to map these pages > onto a simple linear address space for user space management applications. > See the driver code for a detailed layout. I assume you have seen the work NVIDIA submitted last week? This idea of linear pages is really restrictive and we are moving away from it. > The EEPROM data is accessible to user space and kernel consumers via the > nvmem interface. ethtool -m ? In the past, this code has been NACKed because it does not integrate into the networking stack. Is this attempt any different? Thanks Andrew
[PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
optoe is an i2c based driver that supports read/write access to all the pages (tables) of MSA standard SFP and similar devices (conforming to the SFF-8472 spec), MSA standard QSFP and similar devices (conforming to the SFF-8636 spec) and CMIS and similar devices (conforming to the Common Management Interface Specfication). These devices provide identification, operational status and control registers via an EEPROM model. These devices support one or 3 fixed pages (128 bytes) of data, and one page that is selected via a page register on the first fixed page. Thus the driver's main task is to map these pages onto a simple linear address space for user space management applications. See the driver code for a detailed layout. Several variants and predecessors of this driver exist outside the kernel tree. Some of them don't support pages at all, only accessing the first 256 bytes of EEPROM. None of them handle all three specifications, none of them support pages beyond page 3. This is a problem since critical monitoring data on CMIS devices is on page 0x11. Some of them don't support pages at all, only accessing the first 256 bytes of data. optoe supports the full architected 256 page address space of these devices. optoe is currently in production on multiple platforms running SONiC (Microsoft's Software for Open Networking in the Cloud) and ONL (Open Network Linux). They have requested that optoe be submitted upstream. The EEPROM data is accessible to user space and kernel consumers via the nvmem interface. optoe devices can be configured via device tree or the 'new_device' paradigm. See Documentation/misc-devices/optoe.rst for details on configuring optoe and accessing the EEPROM. Signed-off-by: Don Bollinger --- Change in v2: - made function optoe_make_regmap() static to fix compiler warnings Reported-by: kernel test robot --- Documentation/devicetree/bindings/eeprom/optoe.txt | 26 + Documentation/misc-devices/optoe.rst | 127 +++ drivers/misc/eeprom/Kconfig| 24 + drivers/misc/eeprom/Makefile | 1 + drivers/misc/eeprom/optoe.c| 931 + 5 files changed, 1109 insertions(+) create mode 100644 Documentation/devicetree/bindings/eeprom/optoe.txt create mode 100644 Documentation/misc-devices/optoe.rst create mode 100644 drivers/misc/eeprom/optoe.c diff --git a/Documentation/devicetree/bindings/eeprom/optoe.txt b/Documentation/devicetree/bindings/eeprom/optoe.txt new file mode 100644 index ..3a8c350cf2f5 --- /dev/null +++ b/Documentation/devicetree/bindings/eeprom/optoe.txt @@ -0,0 +1,26 @@ +EEPROMs on SFP/QSFP/CMIS optoelectronic modules using SFF-8472, SFF-8676, +CMIS (Common Management Interface Spec) devices. + +Required properties: +- compatible: shall be one of : + 'optoe,optoe1' - for QSFP class devices, adhering to SFF-8636 + 'optoe,optoe2' - for SFP class devices, adhering to SFF-8472 + 'optoe,optoe3' - for CMIS devices (newer QSFP class devices) +- reg: 0x50 The only valid value is 0x50, as all three standards specify that + the device is at i2c address 0x50. (optoe allocates an i2c dummy to access + the data at i2c address 0x51.) + +Optional property: +- port_name: can be set to any string up to 19 characters. Note that the + actual mapping between i2c busses and network ports is platform dependent + and varies widely. The 'port_name' property provides a way to associate + specific network ports with their associated hardware ports. + +Example: + #address-cells = <1>; + #size-cells = <0>; + optoe@50 { + compatible = "optoe,optoe2"; + reg = <0x50>; + port_name = "port1"; + }; diff --git a/Documentation/misc-devices/optoe.rst b/Documentation/misc-devices/optoe.rst new file mode 100644 index ..2ffce09f18bb --- /dev/null +++ b/Documentation/misc-devices/optoe.rst @@ -0,0 +1,127 @@ + +optoe - EEPROMs on SFP/QSFP/CMIS optoelectronic modules + + +Author: Don Bollinger (d...@thebollingers.org) + +Description: + + +Optoe is an i2c based driver that supports read/write access to all +the pages (tables) of MSA standard SFP and similar devices (conforming +to the SFF-8472 spec), MSA standard QSFP and similar devices (conforming +to the SFF-8436 or SFF-8636 spec) and CMIS devices (conforming to the +Common Management Interface Specification). + +i2c based optoelectronic transceivers (SPF, QSFP, etc) provide identification, +operational status, and control registers via an EEPROM model. Unlike the +EEPROMs that at24 supports, these devices access data beyond byte 256 via +a page select register, which must be managed by the driver. optoe +represents the EEPROM as a linear address space, managing the page register +as needed. On QSFP