Re: [PATCH v2 1/3] net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips

2021-01-07 Pali Rohár
On Thursday 07 January 2021 21:21:16 Marek Behún wrote:
> On Thu, 7 Jan 2021 19:45:49 +
> Russell King - ARM Linux admin  wrote:
> 
> I think you're not reading the code very well. It checks that the bytes at
> offsets 1..blocksize-1, blocksize+1..2*blocksize-1, etc. are zero. It
> does _not_ check that byte 0 or the byte at N*blocksize is zero - these
> bytes are skipped. In other words, the first byte of each transfer can
> be any value. The other bytes of the _entire_ ID must be zero.
> 
> Wouldn't it be better, instead of checking if 1..blocksize-1 are zero,
> to check whether reading byte by byte returns the same as reading all
> 16 bytes at once?

That would mean reading the EEPROM twice, unconditionally, for every SFP.
With the current solution we read the EEPROM twice only for these buggy
RTL-based SFP modules; for all other SFPs the EEPROM content is read only
once. I like the current solution because it does not change how other
(non-broken) SFPs are detected. It is better not to touch things which
are not broken.

And as we know that these zeros are the expected behavior of these broken
RTL-based SFPs, I think such a test is fine.

Moreover, there are Nokia SFPs which do not like one-byte reads and lock
the I2C bus. Yes, this happens only for the EEPROM content at the second
address (so the ID part used by this test is not affected), but who knows
how broken other SFPs will be in the future.
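To make the trade-off concrete, here is a user-space sketch under stated assumptions. The RTL bug is modeled as in Russell's description (only the first byte of each multi-byte transfer is valid, the rest reads back as zero), and `fake_module`, `read_id`, `needs_byte_io`, `probe_conditional` and `probe_compare` are hypothetical names, not sfp.c code. It counts full passes over the 64-byte ID for the patch's conditional re-read and for Marek's always-compare alternative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ID_LEN 64 /* size of the SFP base ID area */

/* Hypothetical simulated module, not sfp.c code. With rtl_bug set, any
 * multi-byte transfer returns only its first byte; the rest reads as 0. */
struct fake_module {
	unsigned char id[ID_LEN];
	bool rtl_bug;
	int id_reads; /* number of complete passes over the ID area */
};

static void read_id(struct fake_module *m, size_t block_size,
		    unsigned char *buf)
{
	assert(block_size > 0);
	for (size_t off = 0; off < ID_LEN; off += block_size) {
		size_t n = ID_LEN - off < block_size ? ID_LEN - off : block_size;

		memcpy(buf + off, m->id + off, n);
		if (m->rtl_bug && n > 1)
			memset(buf + off + 1, 0, n - 1);
	}
	m->id_reads++;
}

/* Check from the patch: all bytes except the first of each transfer are
 * zero. The last chunk is clamped with a min(), as Andrew suggested. */
static bool needs_byte_io(size_t block_size, const unsigned char *buf)
{
	if (block_size == 1)
		return false; /* already using byte IO: nothing to switch to */
	for (size_t i = 1; i < ID_LEN; i += block_size) {
		size_t n = block_size - 1;

		if (n > ID_LEN - i)
			n = ID_LEN - i;
		for (size_t k = 0; k < n; k++)
			if (buf[i + k])
				return false;
	}
	return true;
}

/* Patch approach: a second pass only when the check fires. */
static size_t probe_conditional(struct fake_module *m, unsigned char *buf)
{
	size_t block_size = 16;

	read_id(m, block_size, buf);
	if (needs_byte_io(block_size, buf)) {
		block_size = 1;
		read_id(m, block_size, buf);
	}
	return block_size;
}

/* Marek's alternative: always read twice and compare the results. */
static size_t probe_compare(struct fake_module *m, unsigned char *buf)
{
	unsigned char bytewise[ID_LEN];

	read_id(m, 16, buf);
	read_id(m, 1, bytewise);
	if (memcmp(buf, bytewise, ID_LEN) != 0) {
		memcpy(buf, bytewise, ID_LEN);
		return 1;
	}
	return 16;
}
```

With a well-behaved module, `probe_conditional` reads the ID once while `probe_compare` always reads it twice; with the simulated RTL module both end up reading it twice and recover the correct ID through byte IO.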


2021-01-07 Marek Behún
On Thu, 7 Jan 2021 19:45:49 +
Russell King - ARM Linux admin  wrote:

> I think you're not reading the code very well. It checks that the bytes at
> offsets 1..blocksize-1, blocksize+1..2*blocksize-1, etc. are zero. It
> does _not_ check that byte 0 or the byte at N*blocksize is zero - these
> bytes are skipped. In other words, the first byte of each transfer can
> be any value. The other bytes of the _entire_ ID must be zero.

Wouldn't it be better, instead of checking if 1..blocksize-1 are zero,
to check whether reading byte by byte returns the same as reading all
16 bytes at once?

Marek


2021-01-07 Russell King - ARM Linux admin
On Thu, Jan 07, 2021 at 06:19:23PM +0100, Andrew Lunn wrote:
> > -static int sfp_quirk_i2c_block_size(const struct sfp_eeprom_base *base)
> > +static bool sfp_id_needs_byte_io(struct sfp *sfp, void *buf, size_t len)
> >  {
> > -	if (!memcmp(base->vendor_name, "VSOL", 16))
> > -		return 1;
> > -	if (!memcmp(base->vendor_name, "OEM ", 16) &&
> > -	    !memcmp(base->vendor_pn,   "V2801F  ", 16))
> > -		return 1;
> > +	size_t i, block_size = sfp->i2c_block_size;
> >  
> > -	/* Some modules can't cope with long reads */
> > -	return 16;
> > -}
> > +	/* Already using byte IO */
> > +	if (block_size == 1)
> > +		return false;
> 
> This seems counter-intuitive. We don't need byte IO because we are
> doing byte IO? Can we return True here?

It is counter-intuitive, but as this is indicating whether we need to
switch to byte IO, if we're already doing byte IO, then we don't need
to switch.

> > -static void sfp_quirks_base(struct sfp *sfp, const struct sfp_eeprom_base *base)
> > -{
> > -	sfp->i2c_block_size = sfp_quirk_i2c_block_size(base);
> > +	for (i = 1; i < len; i += block_size) {
> > +		if (memchr_inv(buf + i, '\0', block_size - 1))
> > +			return false;
> > +	}
> 
> Is the loop needed?

I think you're not reading the code very well. It checks that the bytes at
offsets 1..blocksize-1, blocksize+1..2*blocksize-1, etc. are zero. It
does _not_ check that byte 0 or the byte at N*blocksize is zero - these
bytes are skipped. In other words, the first byte of each transfer can
be any value. The other bytes of the _entire_ ID must be zero.

> I also wonder if on the last iteration of the loop you go past the
> end of buf? Don't you need a min(block_size - 1, len - i) or
> similar?

The ID is 64 bytes long, and is fixed. block_size could be a non-power
of two, but that is highly unlikely. block_size will never be larger
than 16 either.
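Russell's argument that the loop stays in bounds can be checked with a little arithmetic. The hypothetical helper `check_end` below (a user-space sketch, not driver code) returns one past the highest ID offset the zero-check loop would read, with and without clamping each chunk to `min(block_size - 1, len - i)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: one past the highest buffer offset read by the
 * zero-check loop "for (i = 1; i < len; i += block_size)", which checks
 * block_size - 1 bytes per iteration, optionally clamped to len - i. */
static size_t check_end(size_t block_size, size_t len, bool clamp)
{
	size_t end = 0;

	assert(block_size > 1); /* block_size == 1 returns before the loop */
	for (size_t i = 1; i < len; i += block_size) {
		size_t n = block_size - 1;

		if (clamp && n > len - i)
			n = len - i; /* min(block_size - 1, len - i) */
		if (i + n > end)
			end = i + n;
	}
	return end;
}
```

For the fixed 64-byte ID, block sizes of 4, 8 and 16 end exactly at offset 64. A hypothetical block size of 12 would start its last iteration at offset 61 and read 11 bytes, up to offset 71, i.e. 8 bytes past the end of the buffer; the clamped version stops at 64.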

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


2021-01-07 Pali Rohár
On Thursday 07 January 2021 17:40:06 Russell King - ARM Linux admin wrote:
> On Thu, Jan 07, 2021 at 06:19:23PM +0100, Andrew Lunn wrote:
> > Did we lose the comment:
> > 
> > /* Some modules (Nokia 3FE46541AA) lock up if byte 0x51 is read as a
> >  * single read. Switch back to reading 16 byte blocks ...
> > 
> > That explains why 16 is used. Given how broken stuff is and the number
> > of workarounds we need, we should try to document as much as we can, so
> > we don't break stuff when adding more workarounds.
> 
> It is _not_ why 16 is used at all.
> 
> We used to read the whole lot in one go. However, some modules could
> not cope with a full read - also some Linux I2C drivers struggled with
> it.
> 
> So, we reduced it down to 16 bytes. See commit 28e74a7cfd64 ("net: sfp:
> read eeprom in maximum 16 byte increments"). That had nothing to do
> with the 3FE46541AA, which came along later. It has been discovered
> that 3FE46541AA reacts badly to a single byte read to address 0x51 -
> it locks the I2C bus. Hence why we can't just go to single byte reads
> for every module.
> 
> So, the comment needs to be kept to explain why we are unable to go
> to single byte reads for all modules.  The choice of 16 remains
> relatively arbitrary.

Do you have an idea where to put the comment?


2021-01-07 Pali Rohár
On Thursday 07 January 2021 18:19:23 Andrew Lunn wrote:
> > +	if (sfp->i2c_block_size < 2) {
> > +		dev_info(sfp->dev, "skipping hwmon device registration "
> > +			 "due to broken EEPROM\n");
> > +		dev_info(sfp->dev, "diagnostic EEPROM area cannot be read "
> > +			 "atomically to guarantee data coherency\n");
> 
> Strings like this are the exception to the 80-character rule. People
> grep for them, and when they are split, they are harder to find.

Ok. I will fix it.

> > -static int sfp_quirk_i2c_block_size(const struct sfp_eeprom_base *base)
> > +static bool sfp_id_needs_byte_io(struct sfp *sfp, void *buf, size_t len)
> >  {
> > -	if (!memcmp(base->vendor_name, "VSOL", 16))
> > -		return 1;
> > -	if (!memcmp(base->vendor_name, "OEM ", 16) &&
> > -	    !memcmp(base->vendor_pn,   "V2801F  ", 16))
> > -		return 1;
> > +	size_t i, block_size = sfp->i2c_block_size;
> >  
> > -	/* Some modules can't cope with long reads */
> > -	return 16;
> > -}
> > +	/* Already using byte IO */
> > +	if (block_size == 1)
> > +		return false;
> 
> This seems counter-intuitive. We don't need byte IO because we are
> doing byte IO? Can we return True here?

I do not know; this part was written by Russell.

Currently the function is used to decide whether the SFP subsystem should
switch to byte IO. So if we are already using byte IO, there is nothing to
switch to, and therefore false is returned.

At least this is how I understood why 'return false' is there.

> >  
> > -static void sfp_quirks_base(struct sfp *sfp, const struct sfp_eeprom_base *base)
> > -{
> > -	sfp->i2c_block_size = sfp_quirk_i2c_block_size(base);
> > +	for (i = 1; i < len; i += block_size) {
> > +		if (memchr_inv(buf + i, '\0', block_size - 1))
> > +			return false;
> > +	}
> 
> Is the loop needed?

Originally I wanted to use just four memcmp() calls, but Russell told me
that the code should be generic (in case the initial block size is changed
in the future, which is a good argument) and came up with this code with
the for loop.

So I think the loop is needed.

> I also wonder if on the last iteration of the loop you go past the
> end of buf? Don't you need a min(block_size - 1, len - i) or
> similar?

You are right: if the code is generic, this needs to be fixed to prevent
reading undefined memory. I will replace it with the proposed min(...)
call.


2021-01-07 Russell King - ARM Linux admin
On Thu, Jan 07, 2021 at 06:19:23PM +0100, Andrew Lunn wrote:
> Did we lose the comment:
> 
> /* Some modules (Nokia 3FE46541AA) lock up if byte 0x51 is read as a
>  * single read. Switch back to reading 16 byte blocks ...
> 
> That explains why 16 is used. Given how broken stuff is and the number
> of workarounds we need, we should try to document as much as we can, so
> we don't break stuff when adding more workarounds.

It is _not_ why 16 is used at all.

We used to read the whole lot in one go. However, some modules could
not cope with a full read - also some Linux I2C drivers struggled with
it.

So, we reduced it down to 16 bytes. See commit 28e74a7cfd64 ("net: sfp:
read eeprom in maximum 16 byte increments"). That had nothing to do
with the 3FE46541AA, which came along later. It has been discovered
that 3FE46541AA reacts badly to a single byte read to address 0x51 -
it locks the I2C bus. Hence why we can't just go to single byte reads
for every module.

So, the comment needs to be kept to explain why we are unable to go
to single byte reads for all modules.  The choice of 16 remains
relatively arbitrary.
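The history above boils down to a cap on the size of each I2C transfer. A minimal sketch of that chunking, assuming a hypothetical per-transaction callback (`xfer_fn`, `read_in_blocks` and `fake_eeprom` are illustrative user-space names, not the sfp.c implementation, which goes through the I2C adapter):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transfer callback standing in for one I2C transaction:
 * reads len bytes at offset into buf, returns bytes read or < 0. */
typedef int (*xfer_fn)(void *ctx, size_t offset, unsigned char *buf,
		       size_t len);

/* Split one logical EEPROM read into transfers of at most block_size
 * bytes (16 in the driver since reads were capped at 16 bytes). */
static int read_in_blocks(xfer_fn xfer, void *ctx, size_t block_size,
			  size_t offset, unsigned char *buf, size_t len)
{
	size_t done = 0;

	assert(block_size > 0);
	while (done < len) {
		size_t this_len = len - done;
		int ret;

		if (this_len > block_size)
			this_len = block_size;
		ret = xfer(ctx, offset + done, buf + done, this_len);
		if (ret < 0)
			return ret;
		/* Treat a zero-length or oversized reply as an error. */
		if (ret == 0 || (size_t)ret > this_len)
			return -1;
		done += (size_t)ret;
	}
	return (int)done;
}

/* Hypothetical in-memory EEPROM used to exercise the sketch. */
struct fake_eeprom {
	const unsigned char *data;
	size_t size;
	size_t transfers; /* number of transactions issued */
	size_t largest;   /* largest single transaction seen */
};

static int fake_xfer(void *ctx, size_t offset, unsigned char *buf,
		     size_t len)
{
	struct fake_eeprom *e = ctx;

	if (offset > e->size || len > e->size - offset)
		return -1;
	memcpy(buf, e->data + offset, len);
	e->transfers++;
	if (len > e->largest)
		e->largest = len;
	return (int)len;
}
```

Reading a 96-byte area with `block_size = 16` takes six transfers of 16 bytes. Setting `block_size` to 1 makes every byte its own transaction, which is what the Nokia 3FE46541AA mishandles at address 0x51, so byte IO has to remain a per-module workaround.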



2021-01-07 Andrew Lunn
> +	if (sfp->i2c_block_size < 2) {
> +		dev_info(sfp->dev, "skipping hwmon device registration "
> +			 "due to broken EEPROM\n");
> +		dev_info(sfp->dev, "diagnostic EEPROM area cannot be read "
> +			 "atomically to guarantee data coherency\n");

Strings like this are the exception to the 80-character rule. People
grep for them, and when they are split, they are harder to find.

> -static int sfp_quirk_i2c_block_size(const struct sfp_eeprom_base *base)
> +static bool sfp_id_needs_byte_io(struct sfp *sfp, void *buf, size_t len)
>  {
> -	if (!memcmp(base->vendor_name, "VSOL", 16))
> -		return 1;
> -	if (!memcmp(base->vendor_name, "OEM ", 16) &&
> -	    !memcmp(base->vendor_pn,   "V2801F  ", 16))
> -		return 1;
> +	size_t i, block_size = sfp->i2c_block_size;
>  
> -	/* Some modules can't cope with long reads */
> -	return 16;
> -}
> +	/* Already using byte IO */
> +	if (block_size == 1)
> +		return false;

This seems counter-intuitive. We don't need byte IO because we are
doing byte IO? Can we return True here?

>  
> -static void sfp_quirks_base(struct sfp *sfp, const struct sfp_eeprom_base *base)
> -{
> -	sfp->i2c_block_size = sfp_quirk_i2c_block_size(base);
> +	for (i = 1; i < len; i += block_size) {
> +		if (memchr_inv(buf + i, '\0', block_size - 1))
> +			return false;
> +	}

Is the loop needed?

I also wonder if on the last iteration of the loop you go past the
end of buf? Don't you need a min(block_size - 1, len - i) or
similar?

> -	/* Some modules (CarlitoxxPro CPGOS03-0490) do not support multibyte
> -	 * reads from the EEPROM, so start by reading the base identifying
> -	 * information one byte at a time.
> -	 */
> -	sfp->i2c_block_size = 1;
> +	sfp->i2c_block_size = 16;

Did we lose the comment:

/* Some modules (Nokia 3FE46541AA) lock up if byte 0x51 is read as a
 * single read. Switch back to reading 16 byte blocks ...

That explains why 16 is used. Given how broken stuff is and the number
of workarounds we need, we should try to document as much as we can, so
we don't break stuff when adding more workarounds.

 Andrew


2021-01-07 Pali Rohár
On Thursday 07 January 2021 03:02:36 Andrew Lunn wrote:
> > +	/* hwmon interface needs to access 16bit registers in atomic way to
> > +	 * guarantee coherency of the diagnostic monitoring data. If it is not
> > +	 * possible to guarantee coherency because EEPROM is broken in such way
> > +	 * that does not support atomic 16bit read operation then we have to
> > +	 * skip registration of hwmon device.
> > +	 */
> > +	if (sfp->i2c_block_size < 2) {
> > +		dev_info(sfp->dev, "skipping hwmon device registration "
> > +			 "due to broken EEPROM\n");
> > +		dev_info(sfp->dev, "diagnostic EEPROM area cannot be read "
> > +			 "atomically to guarantee data coherency\n");
> > +		return;
> > +	}
> 
> This solves hwmon. But we still return the broken data to ethtool -m.
> I wonder if we should prevent that?

It looks like that is not so simple for now.

And because the current mainline kernel already exports this data for
these broken chips, I would propose postponing the ethtool fix and leaving
it for future patches. This patch series does not change that behavior
(nor make it worse).
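The coherency concern in the patch comment can be illustrated with a small user-space sketch. Assume (hypothetically; these names are not the hwmon code) a 16-bit diagnostic value stored MSB first that the module updates on its own between I2C transactions. A two-byte transfer takes both bytes from the same sample; with byte IO the MSB and LSB come from two different transactions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-bit diagnostic register that the module updates on
 * its own: each I2C transaction observes the next value in samples[]. */
struct fake_diag_reg {
	const uint16_t *samples;
	unsigned int count;
	unsigned int t; /* index of the next transaction */
};

static uint16_t next_sample(struct fake_diag_reg *r)
{
	assert(r->t < r->count);
	return r->samples[r->t++];
}

/* One two-byte transaction: MSB and LSB come from the same sample. */
static uint16_t read_word_atomic(struct fake_diag_reg *r)
{
	return next_sample(r);
}

/* Byte IO: MSB and LSB come from two separate transactions, so the
 * register may change in between (a torn read). */
static uint16_t read_word_bytewise(struct fake_diag_reg *r)
{
	uint8_t msb = (uint8_t)(next_sample(r) >> 8);
	uint8_t lsb = (uint8_t)(next_sample(r) & 0xff);

	return (uint16_t)((msb << 8) | lsb);
}
```

If the value moves from 0x00ff to 0x0100 between the two single-byte transactions, the byte-wise read returns 0x0000, a value the register never held, while the two-byte read returns 0x00ff. That is the reason the patch skips hwmon registration when `i2c_block_size` is below 2.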


2021-01-06 Andrew Lunn
> +	/* hwmon interface needs to access 16bit registers in atomic way to
> +	 * guarantee coherency of the diagnostic monitoring data. If it is not
> +	 * possible to guarantee coherency because EEPROM is broken in such way
> +	 * that does not support atomic 16bit read operation then we have to
> +	 * skip registration of hwmon device.
> +	 */
> +	if (sfp->i2c_block_size < 2) {
> +		dev_info(sfp->dev, "skipping hwmon device registration "
> +			 "due to broken EEPROM\n");
> +		dev_info(sfp->dev, "diagnostic EEPROM area cannot be read "
> +			 "atomically to guarantee data coherency\n");
> +		return;
> +	}

This solves hwmon. But we still return the broken data to ethtool -m.
I wonder if we should prevent that?

  Andrew