Re: Regression: at24 eeprom writing

2015-10-13 Thread Peter Rosin
On 2015-10-13 14:57, Nicolas Ferre wrote:
> Le 13/10/2015 12:38, Peter Rosin a écrit :
>> On 2015-10-12 18:13, Cyrille Pitchen wrote:
>>> Le 12/10/2015 17:13, Peter Rosin a écrit :
 On 2015-10-05 17:09, Peter Rosin wrote:
> But what trouble does the i2c bus driver see? Admittedly I only
> have a simple logic level bus viewer, and not a full-blown
> oscilloscope, so there might be something analogue going on?
> I don't think so though, those signals looked fine last time we
> looked (but we obviously didn't have these issues then, and
> didn't really look that closely). I'll see if I can recheck
> with a real scope too.

 We redid the tests with a real scope, and the signal looks nice
 and square, so it is not that.

 Speculating further on the cause of the long ACKs, I think that
 the i2c driver gets confused by an interrupt that marks the
 transfer complete, and thinks it's a NACK interrupt instead. Is that
 plausible?

 If the peripheral unit is such that it generates a stop automatically
 on NACKs, then this makes perfect sense. I.e. the TWI sees that the
 transfer is complete, generates an interrupt, and waits for further
 data or a stop command. Meanwhile the driver thinks it's a NACK and
 that a stop condition has already been sent to the bus, and just
 notifies the i2c consumer (the eeprom driver in this case) of the
 failure and frees up the bus for any future user.

 This also matches what I see when I turn on some more traffic on the
 bus, that is interleaved with the eeprom traffic. AFAICT, it can be
 any command that gets chewed up by the eeprom if it is sent to the
 i2c driver during the long ACK.

 Are you Atmel people making any progress on this data corrupting
 regression? Is there anything else I can do to help?

 Cheers,
 Peter

>>>
>>> Hi Peter,
>>>
>>> I have sent a patch to Ludovic for a first internal review before 
>>> publishing to
>>> mainline. The patch should fix your issue since it fixes it on my sama5d36ek
>>> board with an at24 eeprom.
>>>
>>> More details on the reason of this bug would be provided in both the commit
>>> message and comments in the code provided by the reviewed patch but I you 
>>> want
>>> an early fix just read the Status Register (AT91_TWI_SR) at the beginning of
>>> at91_do_twi_transfer(). This read clears the NACK bit in the Status 
>>> Register.
>>> Then the following source code can safely enable the NACK interrupt, 
>>> otherwise
>>> in some cases a pending NACK interrupt would rise immediately after the 
>>> line:
>>> at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_NACK);
>>> hence breaking the sequence of operations to be done because the interrupt
>>> handler would call complete() too early so wait_for_completion_timeout()
>>> also exits too early.
>>>
>>> So reading the Status Register at the beginning of at91_do_twi_transfer()
>>> should be enough to fix the issue.
>>
>> Yes, I see no more long ACKs after that reading the Status Register there.
>> Great!
>>
>>> Another mistake is in the interrupt handler itself, ie 
>>> atmel_twi_interrupt():
>>> we should check the TWI_TXRDY status bit before calling
>>> at91_twi_write_next_byte() only if both the TWI_TXCOMP and TWI_NACK status 
>>> bits
>>> are clear. Otherwise, writing a new byte into the THR tells the I2C 
>>> controller
>>> to start a new transfer. Then the I2C slave, the at24 eeprom, is likely to
>>> also reply by a second NACK. Hence the NACK bit is already set into the 
>>> Status
>>> Register on the next call of at91_do_twi_transfer().
>>> This is what I saw on my scope for PIO transfers.
>>
>> I interpret this as a proposed solution for the strange double NACKs?
>>
>> Anyway, I find it unnecessarily hard to grasp exactly what you mean
>> (wasteful policy you are apparently suffering from where it is OK to
>> publish a patch written in English, but apparently a big no-no to
>> send a diff until it passes some internal review???). I interpreted
> 
> I find your remark pretty rude as I'm reading it while just at the desk
> behind me Cyrille and Ludovic are together trying hard to understand and
> fix this issue.
> They are making sure that this fix doesn't interfere with another SoC's
> IP version with any of the PIO/DMA/FIFO transfer types.
> 
> Cyrille just wanted to keep you informed quickly as we were
> progressing... Anyone can tell you that, obviously, there is no stupid
> policy of not releasing patches @ Atmel!
> 
> Anyway, let's keep the good debugging session progressing well with your
> valuable help...

Yeah, right, that didn't come out right. Sorry!

It was just that from over here I was pulling my hair out over this bug,
and then there appeared to have existed a patch that had been stuck in
internal review since some point last week. I realize now that this might
not have been the case and that the solution could have been found just

Re: Regression: at24 eeprom writing

2015-10-13 Thread ludovic.desroc...@atmel.com
Hi Peter,

On Tue, Oct 13, 2015 at 12:38:39PM +0200, Peter Rosin wrote:
> On 2015-10-12 18:13, Cyrille Pitchen wrote:
> > Le 12/10/2015 17:13, Peter Rosin a écrit :
> >> On 2015-10-05 17:09, Peter Rosin wrote:

[...]

> Anyway, I find it unnecessarily hard to grasp exactly what you mean
> (wasteful policy you are apparently suffering from where it is OK to
> publish a patch written in English, but apparently a big no-no to
> send a diff until it passes some internal review???). I interpreted
> your "patch" in English as:
> 
>   at91_twi_read_next_byte(dev);
> - else if (irqstatus & AT_TWI_TXRDY)
> + else if ((irqstatus & (AT91_TWI_TXCOMP | AT91_TWI_TXRDY | 
> AT91_TWI_NACK)) == AT91_TWI_TXRDY)
>   at91_twi_write_next_byte(dev);

We don't have such kind of policy. I simply discussed about your bug
with Cyrille and he managed to fix it. Since there are many
combinations (IP version, PIO/dma, hdmac/xdmac), I wanted to test it on
other platforms that the one used by Cyrille and add my signed-off-by
before sending it.

If nothing goes wrong (one platform has not been tested yet), the patch will
be sent this afternoon.

Cyrille answered in English in order to not create confusion with a patch which
could not fit all combinations.

Regards

Ludovic
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: at24 eeprom writing

2015-10-13 Thread Nicolas Ferre
Le 13/10/2015 12:38, Peter Rosin a écrit :
> On 2015-10-12 18:13, Cyrille Pitchen wrote:
>> Le 12/10/2015 17:13, Peter Rosin a écrit :
>>> On 2015-10-05 17:09, Peter Rosin wrote:
 But what trouble does the i2c bus driver see? Admittedly I only
 have a simple logic level bus viewer, and not a full-blown
 oscilloscope, so there might be something analogue going on?
 I don't think so though, those signals looked fine last time we
 looked (but we obviously didn't have these issues then, and
 didn't really look that closely). I'll see if I can recheck
 with a real scope too.
>>>
>>> We redid the tests with a real scope, and the signal looks nice
>>> and square, so it is not that.
>>>
>>> Speculating further on the cause of the long ACKs, I think that
>>> the i2c driver gets confused by an interrupt that marks the
>>> transfer complete, and thinks it's a NACK interrupt instead. Is that
>>> plausible?
>>>
>>> If the peripheral unit is such that it generates a stop automatically
>>> on NACKs, then this makes perfect sense. I.e. the TWI sees that the
>>> transfer is complete, generates an interrupt, and waits for further
>>> data or a stop command. Meanwhile the driver thinks it's a NACK and
>>> that a stop condition has already been sent to the bus, and just
>>> notifies the i2c consumer (the eeprom driver in this case) of the
>>> failure and frees up the bus for any future user.
>>>
>>> This also matches what I see when I turn on some more traffic on the
>>> bus, that is interleaved with the eeprom traffic. AFAICT, it can be
>>> any command that gets chewed up by the eeprom if it is sent to the
>>> i2c driver during the long ACK.
>>>
>>> Are you Atmel people making any progress on this data corrupting
>>> regression? Is there anything else I can do to help?
>>>
>>> Cheers,
>>> Peter
>>>
>>
>> Hi Peter,
>>
>> I have sent a patch to Ludovic for a first internal review before publishing 
>> to
>> mainline. The patch should fix your issue since it fixes it on my sama5d36ek
>> board with an at24 eeprom.
>>
>> More details on the reason of this bug would be provided in both the commit
>> message and comments in the code provided by the reviewed patch but I you 
>> want
>> an early fix just read the Status Register (AT91_TWI_SR) at the beginning of
>> at91_do_twi_transfer(). This read clears the NACK bit in the Status Register.
>> Then the following source code can safely enable the NACK interrupt, 
>> otherwise
>> in some cases a pending NACK interrupt would rise immediately after the line:
>> at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_NACK);
>> hence breaking the sequence of operations to be done because the interrupt
>> handler would call complete() too early so wait_for_completion_timeout()
>> also exits too early.
>>
>> So reading the Status Register at the beginning of at91_do_twi_transfer()
>> should be enough to fix the issue.
> 
> Yes, I see no more long ACKs after that reading the Status Register there.
> Great!
> 
>> Another mistake is in the interrupt handler itself, ie atmel_twi_interrupt():
>> we should check the TWI_TXRDY status bit before calling
>> at91_twi_write_next_byte() only if both the TWI_TXCOMP and TWI_NACK status 
>> bits
>> are clear. Otherwise, writing a new byte into the THR tells the I2C 
>> controller
>> to start a new transfer. Then the I2C slave, the at24 eeprom, is likely to
>> also reply by a second NACK. Hence the NACK bit is already set into the 
>> Status
>> Register on the next call of at91_do_twi_transfer().
>> This is what I saw on my scope for PIO transfers.
> 
> I interpret this as a proposed solution for the strange double NACKs?
> 
> Anyway, I find it unnecessarily hard to grasp exactly what you mean
> (wasteful policy you are apparently suffering from where it is OK to
> publish a patch written in English, but apparently a big no-no to
> send a diff until it passes some internal review???). I interpreted

I find your remark pretty rude as I'm reading it while just at the desk
behind me Cyrille and Ludovic are together trying hard to understand and
fix this issue.
They are making sure that this fix doesn't interfere with another SoC's
IP version with any of the PIO/DMA/FIFO transfer types.

Cyrille just wanted to keep you informed quickly as we were
progressing... Anyone can tell you that, obviously, there is no stupid
policy of not releasing patches @ Atmel!

Anyway, let's keep the good debugging session progressing well with your
valuable help...

Bye.


> your "patch" in English as:
> 
>   at91_twi_read_next_byte(dev);
> - else if (irqstatus & AT_TWI_TXRDY)
> + else if ((irqstatus & (AT91_TWI_TXCOMP | AT91_TWI_TXRDY | 
> AT91_TWI_NACK)) == AT91_TWI_TXRDY)
>   at91_twi_write_next_byte(dev);
> 
> But still see some double NACKs. Not always though, and it doesn't wreak
> any havoc. But it still looks strange and I can't explain them when looking
> at what the eeprom driver requests. Does this mean that there are 

Re: Regression: at24 eeprom writing

2015-10-13 Thread Peter Rosin
On 2015-10-12 18:13, Cyrille Pitchen wrote:
> Le 12/10/2015 17:13, Peter Rosin a écrit :
>> On 2015-10-05 17:09, Peter Rosin wrote:
>>> But what trouble does the i2c bus driver see? Admittedly I only
>>> have a simple logic level bus viewer, and not a full-blown
>>> oscilloscope, so there might be something analogue going on?
>>> I don't think so though, those signals looked fine last time we
>>> looked (but we obviously didn't have these issues then, and
>>> didn't really look that closely). I'll see if I can recheck
>>> with a real scope too.
>>
>> We redid the tests with a real scope, and the signal looks nice
>> and square, so it is not that.
>>
>> Speculating further on the cause of the long ACKs, I think that
>> the i2c driver gets confused by an interrupt that marks the
>> transfer complete, and thinks it's a NACK interrupt instead. Is that
>> plausible?
>>
>> If the peripheral unit is such that it generates a stop automatically
>> on NACKs, then this makes perfect sense. I.e. the TWI sees that the
>> transfer is complete, generates an interrupt, and waits for further
>> data or a stop command. Meanwhile the driver thinks it's a NACK and
>> that a stop condition has already been sent to the bus, and just
>> notifies the i2c consumer (the eeprom driver in this case) of the
>> failure and frees up the bus for any future user.
>>
>> This also matches what I see when I turn on some more traffic on the
>> bus, that is interleaved with the eeprom traffic. AFAICT, it can be
>> any command that gets chewed up by the eeprom if it is sent to the
>> i2c driver during the long ACK.
>>
>> Are you Atmel people making any progress on this data corrupting
>> regression? Is there anything else I can do to help?
>>
>> Cheers,
>> Peter
>>
> 
> Hi Peter,
> 
> I have sent a patch to Ludovic for a first internal review before publishing 
> to
> mainline. The patch should fix your issue since it fixes it on my sama5d36ek
> board with an at24 eeprom.
> 
> More details on the reason of this bug would be provided in both the commit
> message and comments in the code provided by the reviewed patch but I you want
> an early fix just read the Status Register (AT91_TWI_SR) at the beginning of
> at91_do_twi_transfer(). This read clears the NACK bit in the Status Register.
> Then the following source code can safely enable the NACK interrupt, otherwise
> in some cases a pending NACK interrupt would rise immediately after the line:
> at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_NACK);
> hence breaking the sequence of operations to be done because the interrupt
> handler would call complete() too early so wait_for_completion_timeout()
> also exits too early.
> 
> So reading the Status Register at the beginning of at91_do_twi_transfer()
> should be enough to fix the issue.

Yes, I see no more long ACKs after that reading the Status Register there.
Great!

> Another mistake is in the interrupt handler itself, ie atmel_twi_interrupt():
> we should check the TWI_TXRDY status bit before calling
> at91_twi_write_next_byte() only if both the TWI_TXCOMP and TWI_NACK status 
> bits
> are clear. Otherwise, writing a new byte into the THR tells the I2C controller
> to start a new transfer. Then the I2C slave, the at24 eeprom, is likely to
> also reply by a second NACK. Hence the NACK bit is already set into the Status
> Register on the next call of at91_do_twi_transfer().
> This is what I saw on my scope for PIO transfers.

I interpret this as a proposed solution for the strange double NACKs?

Anyway, I find it unnecessarily hard to grasp exactly what you mean
(wasteful policy you are apparently suffering from where it is OK to
publish a patch written in English, but apparently a big no-no to
send a diff until it passes some internal review???). I interpreted
your "patch" in English as:

at91_twi_read_next_byte(dev);
-   else if (irqstatus & AT_TWI_TXRDY)
+   else if ((irqstatus & (AT91_TWI_TXCOMP | AT91_TWI_TXRDY | 
AT91_TWI_NACK)) == AT91_TWI_TXRDY)
at91_twi_write_next_byte(dev);

But still see some double NACKs. Not always though, and it doesn't wreak
any havoc. But it still looks strange and I can't explain them when looking
at what the eeprom driver requests. Does this mean that there are more
races present?

Or, did I just parse your English "patch" badly?

> By the way, in my case, the first NACK occurs because the at24 driver tries to
> perform a second write transfer too quickly and the eeprom is not ready yet,
> then replies with a NACK.

Yes, I believe this is by design. Noone wants to encode the exact delays
needed since they are hard to predict. So, the eeprom driver polls.

> However while debugging, placing debug traces in the I2C controller driver
> changed the timing: this small delay was enough to make the first NACK
> disappear because the eeprom was now ready to process the second write
> operation.

Yes, w/o NACKs everything is going smooth regardless of kernel version.

> For DMA 

Re: Regression: at24 eeprom writing

2015-10-13 Thread Peter Rosin
On 2015-10-12 18:13, Cyrille Pitchen wrote:
> Le 12/10/2015 17:13, Peter Rosin a écrit :
>> On 2015-10-05 17:09, Peter Rosin wrote:
>>> But what trouble does the i2c bus driver see? Admittedly I only
>>> have a simple logic level bus viewer, and not a full-blown
>>> oscilloscope, so there might be something analogue going on?
>>> I don't think so though, those signals looked fine last time we
>>> looked (but we obviously didn't have these issues then, and
>>> didn't really look that closely). I'll see if I can recheck
>>> with a real scope too.
>>
>> We redid the tests with a real scope, and the signal looks nice
>> and square, so it is not that.
>>
>> Speculating further on the cause of the long ACKs, I think that
>> the i2c driver gets confused by an interrupt that marks the
>> transfer complete, and thinks it's a NACK interrupt instead. Is that
>> plausible?
>>
>> If the peripheral unit is such that it generates a stop automatically
>> on NACKs, then this makes perfect sense. I.e. the TWI sees that the
>> transfer is complete, generates an interrupt, and waits for further
>> data or a stop command. Meanwhile the driver thinks it's a NACK and
>> that a stop condition has already been sent to the bus, and just
>> notifies the i2c consumer (the eeprom driver in this case) of the
>> failure and frees up the bus for any future user.
>>
>> This also matches what I see when I turn on some more traffic on the
>> bus, that is interleaved with the eeprom traffic. AFAICT, it can be
>> any command that gets chewed up by the eeprom if it is sent to the
>> i2c driver during the long ACK.
>>
>> Are you Atmel people making any progress on this data corrupting
>> regression? Is there anything else I can do to help?
>>
>> Cheers,
>> Peter
>>
> 
> Hi Peter,
> 
> I have sent a patch to Ludovic for a first internal review before publishing 
> to
> mainline. The patch should fix your issue since it fixes it on my sama5d36ek
> board with an at24 eeprom.
> 
> More details on the reason of this bug would be provided in both the commit
> message and comments in the code provided by the reviewed patch but I you want
> an early fix just read the Status Register (AT91_TWI_SR) at the beginning of
> at91_do_twi_transfer(). This read clears the NACK bit in the Status Register.
> Then the following source code can safely enable the NACK interrupt, otherwise
> in some cases a pending NACK interrupt would rise immediately after the line:
> at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_NACK);
> hence breaking the sequence of operations to be done because the interrupt
> handler would call complete() too early so wait_for_completion_timeout()
> also exits too early.
> 
> So reading the Status Register at the beginning of at91_do_twi_transfer()
> should be enough to fix the issue.

Yes, I see no more long ACKs after that reading the Status Register there.
Great!

> Another mistake is in the interrupt handler itself, ie atmel_twi_interrupt():
> we should check the TWI_TXRDY status bit before calling
> at91_twi_write_next_byte() only if both the TWI_TXCOMP and TWI_NACK status 
> bits
> are clear. Otherwise, writing a new byte into the THR tells the I2C controller
> to start a new transfer. Then the I2C slave, the at24 eeprom, is likely to
> also reply by a second NACK. Hence the NACK bit is already set into the Status
> Register on the next call of at91_do_twi_transfer().
> This is what I saw on my scope for PIO transfers.

I interpret this as a proposed solution for the strange double NACKs?

Anyway, I find it unnecessarily hard to grasp exactly what you mean
(wasteful policy you are apparently suffering from where it is OK to
publish a patch written in English, but apparently a big no-no to
send a diff until it passes some internal review???). I interpreted
your "patch" in English as:

at91_twi_read_next_byte(dev);
-   else if (irqstatus & AT_TWI_TXRDY)
+   else if ((irqstatus & (AT91_TWI_TXCOMP | AT91_TWI_TXRDY | 
AT91_TWI_NACK)) == AT91_TWI_TXRDY)
at91_twi_write_next_byte(dev);

But still see some double NACKs. Not always though, and it doesn't wreak
any havoc. But it still looks strange and I can't explain them when looking
at what the eeprom driver requests. Does this mean that there are more
races present?

Or, did I just parse your English "patch" badly?

> By the way, in my case, the first NACK occurs because the at24 driver tries to
> perform a second write transfer too quickly and the eeprom is not ready yet,
> then replies with a NACK.

Yes, I believe this is by design. Noone wants to encode the exact delays
needed since they are hard to predict. So, the eeprom driver polls.

> However while debugging, placing debug traces in the I2C controller driver
> changed the timing: this small delay was enough to make the first NACK
> disappear because the eeprom was now ready to process the second write
> operation.

Yes, w/o NACKs everything is going smooth regardless of kernel version.

> For DMA 

Re: Regression: at24 eeprom writing

2015-10-13 Thread ludovic.desroc...@atmel.com
Hi Peter,

On Tue, Oct 13, 2015 at 12:38:39PM +0200, Peter Rosin wrote:
> On 2015-10-12 18:13, Cyrille Pitchen wrote:
> > Le 12/10/2015 17:13, Peter Rosin a écrit :
> >> On 2015-10-05 17:09, Peter Rosin wrote:

[...]

> Anyway, I find it unnecessarily hard to grasp exactly what you mean
> (wasteful policy you are apparently suffering from where it is OK to
> publish a patch written in English, but apparently a big no-no to
> send a diff until it passes some internal review???). I interpreted
> your "patch" in English as:
> 
>   at91_twi_read_next_byte(dev);
> - else if (irqstatus & AT_TWI_TXRDY)
> + else if ((irqstatus & (AT91_TWI_TXCOMP | AT91_TWI_TXRDY | 
> AT91_TWI_NACK)) == AT91_TWI_TXRDY)
>   at91_twi_write_next_byte(dev);

We don't have such kind of policy. I simply discussed about your bug
with Cyrille and he managed to fix it. Since there are many
combinations (IP version, PIO/dma, hdmac/xdmac), I wanted to test it on
other platforms that the one used by Cyrille and add my signed-off-by
before sending it.

If nothing goes wrong (one platform has not been tested yet), the patch will
be sent this afternoon.

Cyrille answered in English in order to not create confusion with a patch which
could not fit all combinations.

Regards

Ludovic
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: at24 eeprom writing

2015-10-13 Thread Nicolas Ferre
Le 13/10/2015 12:38, Peter Rosin a écrit :
> On 2015-10-12 18:13, Cyrille Pitchen wrote:
>> Le 12/10/2015 17:13, Peter Rosin a écrit :
>>> On 2015-10-05 17:09, Peter Rosin wrote:
 But what trouble does the i2c bus driver see? Admittedly I only
 have a simple logic level bus viewer, and not a full-blown
 oscilloscope, so there might be something analogue going on?
 I don't think so though, those signals looked fine last time we
 looked (but we obviously didn't have these issues then, and
 didn't really look that closely). I'll see if I can recheck
 with a real scope too.
>>>
>>> We redid the tests with a real scope, and the signal looks nice
>>> and square, so it is not that.
>>>
>>> Speculating further on the cause of the long ACKs, I think that
>>> the i2c driver gets confused by an interrupt that marks the
>>> transfer complete, and thinks it's a NACK interrupt instead. Is that
>>> plausible?
>>>
>>> If the peripheral unit is such that it generates a stop automatically
>>> on NACKs, then this makes perfect sense. I.e. the TWI sees that the
>>> transfer is complete, generates an interrupt, and waits for further
>>> data or a stop command. Meanwhile the driver thinks it's a NACK and
>>> that a stop condition has already been sent to the bus, and just
>>> notifies the i2c consumer (the eeprom driver in this case) of the
>>> failure and frees up the bus for any future user.
>>>
>>> This also matches what I see when I turn on some more traffic on the
>>> bus, that is interleaved with the eeprom traffic. AFAICT, it can be
>>> any command that gets chewed up by the eeprom if it is sent to the
>>> i2c driver during the long ACK.
>>>
>>> Are you Atmel people making any progress on this data corrupting
>>> regression? Is there anything else I can do to help?
>>>
>>> Cheers,
>>> Peter
>>>
>>
>> Hi Peter,
>>
>> I have sent a patch to Ludovic for a first internal review before publishing 
>> to
>> mainline. The patch should fix your issue since it fixes it on my sama5d36ek
>> board with an at24 eeprom.
>>
>> More details on the reason of this bug would be provided in both the commit
>> message and comments in the code provided by the reviewed patch but I you 
>> want
>> an early fix just read the Status Register (AT91_TWI_SR) at the beginning of
>> at91_do_twi_transfer(). This read clears the NACK bit in the Status Register.
>> Then the following source code can safely enable the NACK interrupt, 
>> otherwise
>> in some cases a pending NACK interrupt would rise immediately after the line:
>> at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_NACK);
>> hence breaking the sequence of operations to be done because the interrupt
>> handler would call complete() too early so wait_for_completion_timeout()
>> also exits too early.
>>
>> So reading the Status Register at the beginning of at91_do_twi_transfer()
>> should be enough to fix the issue.
> 
> Yes, I see no more long ACKs after that reading the Status Register there.
> Great!
> 
>> Another mistake is in the interrupt handler itself, ie atmel_twi_interrupt():
>> we should check the TWI_TXRDY status bit before calling
>> at91_twi_write_next_byte() only if both the TWI_TXCOMP and TWI_NACK status 
>> bits
>> are clear. Otherwise, writing a new byte into the THR tells the I2C 
>> controller
>> to start a new transfer. Then the I2C slave, the at24 eeprom, is likely to
>> also reply by a second NACK. Hence the NACK bit is already set into the 
>> Status
>> Register on the next call of at91_do_twi_transfer().
>> This is what I saw on my scope for PIO transfers.
> 
> I interpret this as a proposed solution for the strange double NACKs?
> 
> Anyway, I find it unnecessarily hard to grasp exactly what you mean
> (wasteful policy you are apparently suffering from where it is OK to
> publish a patch written in English, but apparently a big no-no to
> send a diff until it passes some internal review???). I interpreted

I find your remark pretty rude as I'm reading it while just at the desk
behind me Cyrille and Ludovic are together trying hard to understand and
fix this issue.
They are making sure that this fix doesn't interfere with another SoC's
IP version with any of the PIO/DMA/FIFO transfer types.

Cyrille just wanted to keep you informed quickly as we were
progressing... Anyone can tell you that, obviously, there is no stupid
policy of not releasing patches @ Atmel!

Anyway, let's keep the good debugging session progressing well with your
valuable help...

Bye.


> your "patch" in English as:
> 
>   at91_twi_read_next_byte(dev);
> - else if (irqstatus & AT_TWI_TXRDY)
> + else if ((irqstatus & (AT91_TWI_TXCOMP | AT91_TWI_TXRDY | 
> AT91_TWI_NACK)) == AT91_TWI_TXRDY)
>   at91_twi_write_next_byte(dev);
> 
> But still see some double NACKs. Not always though, and it doesn't wreak
> any havoc. But it still looks strange and I can't explain them when looking
> at what the eeprom driver requests. Does this mean that there are 

Re: Regression: at24 eeprom writing

2015-10-13 Thread Peter Rosin
On 2015-10-13 14:57, Nicolas Ferre wrote:
> Le 13/10/2015 12:38, Peter Rosin a écrit :
>> On 2015-10-12 18:13, Cyrille Pitchen wrote:
>>> Le 12/10/2015 17:13, Peter Rosin a écrit :
 On 2015-10-05 17:09, Peter Rosin wrote:
> But what trouble does the i2c bus driver see? Admittedly I only
> have a simple logic level bus viewer, and not a full-blown
> oscilloscope, so there might be something analogue going on?
> I don't think so though, those signals looked fine last time we
> looked (but we obviously didn't have these issues then, and
> didn't really look that closely). I'll see if I can recheck
> with a real scope too.

 We redid the tests with a real scope, and the signal looks nice
 and square, so it is not that.

 Speculating further on the cause of the long ACKs, I think that
 the i2c driver gets confused by an interrupt that marks the
 transfer complete, and thinks it's a NACK interrupt instead. Is that
 plausible?

 If the peripheral unit is such that it generates a stop automatically
 on NACKs, then this makes perfect sense. I.e. the TWI sees that the
 transfer is complete, generates an interrupt, and waits for further
 data or a stop command. Meanwhile the driver thinks it's a NACK and
 that a stop condition has already been sent to the bus, and just
 notifies the i2c consumer (the eeprom driver in this case) of the
 failure and frees up the bus for any future user.

 This also matches what I see when I turn on some more traffic on the
 bus, that is interleaved with the eeprom traffic. AFAICT, it can be
 any command that gets chewed up by the eeprom if it is sent to the
 i2c driver during the long ACK.

 Are you Atmel people making any progress on this data corrupting
 regression? Is there anything else I can do to help?

 Cheers,
 Peter

>>>
>>> Hi Peter,
>>>
>>> I have sent a patch to Ludovic for a first internal review before 
>>> publishing to
>>> mainline. The patch should fix your issue since it fixes it on my sama5d36ek
>>> board with an at24 eeprom.
>>>
>>> More details on the reason of this bug would be provided in both the commit
>>> message and comments in the code provided by the reviewed patch but I you 
>>> want
>>> an early fix just read the Status Register (AT91_TWI_SR) at the beginning of
>>> at91_do_twi_transfer(). This read clears the NACK bit in the Status 
>>> Register.
>>> Then the following source code can safely enable the NACK interrupt, 
>>> otherwise
>>> in some cases a pending NACK interrupt would rise immediately after the 
>>> line:
>>> at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_NACK);
>>> hence breaking the sequence of operations to be done because the interrupt
>>> handler would call complete() too early so wait_for_completion_timeout()
>>> also exits too early.
>>>
>>> So reading the Status Register at the beginning of at91_do_twi_transfer()
>>> should be enough to fix the issue.
>>
>> Yes, I see no more long ACKs after that reading the Status Register there.
>> Great!
>>
>>> Another mistake is in the interrupt handler itself, ie 
>>> atmel_twi_interrupt():
>>> we should check the TWI_TXRDY status bit before calling
>>> at91_twi_write_next_byte() only if both the TWI_TXCOMP and TWI_NACK status 
>>> bits
>>> are clear. Otherwise, writing a new byte into the THR tells the I2C 
>>> controller
>>> to start a new transfer. Then the I2C slave, the at24 eeprom, is likely to
>>> also reply by a second NACK. Hence the NACK bit is already set into the 
>>> Status
>>> Register on the next call of at91_do_twi_transfer().
>>> This is what I saw on my scope for PIO transfers.
>>
>> I interpret this as a proposed solution for the strange double NACKs?
>>
>> Anyway, I find it unnecessarily hard to grasp exactly what you mean
>> (wasteful policy you are apparently suffering from where it is OK to
>> publish a patch written in English, but apparently a big no-no to
>> send a diff until it passes some internal review???). I interpreted
> 
> I find your remark pretty rude as I'm reading it while just at the desk
> behind me Cyrille and Ludovic are together trying hard to understand and
> fix this issue.
> They are making sure that this fix doesn't interfere with another SoC's
> IP version with any of the PIO/DMA/FIFO transfer types.
> 
> Cyrille just wanted to keep you informed quickly as we were
> progressing... Anyone can tell you that, obviously, there is no stupid
> policy of not releasing patches @ Atmel!
> 
> Anyway, let's keep the good debugging session progressing well with your
> valuable help...

Yeah, right, that didn't come out right. Sorry!

It was just that from over here I was pulling my hair out over this bug,
and then there appeared to have existed a patch that had been stuck in
internal review since some point last week. I realize now that this might
not have been the case and that the solution could have been found just

Re: Regression: at24 eeprom writing

2015-10-12 Thread Cyrille Pitchen
Le 12/10/2015 17:13, Peter Rosin a écrit :
> On 2015-10-05 17:09, Peter Rosin wrote:
>> But what trouble does the i2c bus driver see? Admittedly I only
>> have a simple logic level bus viewer, and not a full-blown
>> oscilloscope, so there might be something analogue going on?
>> I don't think so though, those signals looked fine last time we
>> looked (but we obviously didn't have these issues then, and
>> didn't really look that closely). I'll see if I can recheck
>> with a real scope too.
> 
> We redid the tests with a real scope, and the signal looks nice
> and square, so it is not that.
> 
> Speculating further on the cause of the long ACKs, I think that
> the i2c driver gets confused by an interrupt that marks the
> transfer complete, and thinks it's a NACK interrupt instead. Is that
> plausible?
> 
> If the peripheral unit is such that it generates a stop automatically
> on NACKs, then this makes perfect sense. I.e. the TWI sees that the
> transfer is complete, generates an interrupt, and waits for further
> data or a stop command. Meanwhile the driver thinks it's a NACK and
> that a stop condition has already been sent to the bus, and just
> notifies the i2c consumer (the eeprom driver in this case) of the
> failure and frees up the bus for any future user.
> 
> This also matches what I see when I turn on some more traffic on the
> bus, that is interleaved with the eeprom traffic. AFAICT, it can be
> any command that gets chewed up by the eeprom if it is sent to the
> i2c driver during the long ACK.
> 
> Are you Atmel people making any progress on this data corrupting
> regression? Is there anything else I can do to help?
> 
> Cheers,
> Peter
> 

Hi Peter,

I have sent a patch to Ludovic for a first internal review before publishing to
mainline. The patch should fix your issue since it fixes it on my sama5d36ek
board with an at24 eeprom.

More details on the reason of this bug would be provided in both the commit
message and comments in the code provided by the reviewed patch but I you want
an early fix just read the Status Register (AT91_TWI_SR) at the beginning of
at91_do_twi_transfer(). This read clears the NACK bit in the Status Register.
Then the following source code can safely enable the NACK interrupt, otherwise
in some cases a pending NACK interrupt would rise immediately after the line:
at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_NACK);
hence breaking the sequence of operations to be done because the interrupt
handler would call complete() too early so wait_for_completion_timeout()
also exits too early.

So reading the Status Register at the beginning of at91_do_twi_transfer()
should be enough to fix the issue.

Another mistake is in the interrupt handler itself, ie atmel_twi_interrupt():
we should check the TWI_TXRDY status bit before calling
at91_twi_write_next_byte() only if both the TWI_TXCOMP and TWI_NACK status bits
are clear. Otherwise, writing a new byte into the THR tells the I2C controller
to start a new transfer. Then the I2C slave, the at24 eeprom, is likely to
also reply by a second NACK. Hence the NACK bit is already set into the Status
Register on the next call of at91_do_twi_transfer().
This is what I saw on my scope for PIO transfers.

By the way, in my case, the first NACK occurs because the at24 driver tries to
perform a second write transfer too quickly and the eeprom is not ready yet,
then replies with a NACK.
However while debugging, placing debug traces in the I2C controller driver
changed the timing: this small delay was enough to make the first NACK
disappear because the eeprom was now ready to process the second write
operation.

For DMA transfer, you have to know that when NACK error is detected, the I2C
controller sets the TWI_TXCOMP, TWI_NACK and TWI_TXRDY bits all together in
the Status Register:
- The TWI_NACK is cleared by reading the Status Register.
- The TWI_TXRDY triggers the DMA controller to write the next byte in the THR,
  which should also tell the I2C controller to start a new transfer.
- The TWI_TXCOMP bit is cleared by writing into the THR: this is why the
  interrupt handler may fail to detect NACK if it relies on TWI_TXCOMP.

Best Regards,

Cyrille
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: at24 eeprom writing

2015-10-12 Thread Peter Rosin
On 2015-10-05 17:09, Peter Rosin wrote:
> But what trouble does the i2c bus driver see? Admittedly I only
> have a simple logic level bus viewer, and not a full-blown
> oscilloscope, so there might be something analogue going on?
> I don't think so though, those signals looked fine last time we
> looked (but we obviously didn't have these issues then, and
> didn't really look that closely). I'll see if I can recheck
> with a real scope too.

We redid the tests with a real scope, and the signal looks nice
and square, so it is not that.

Speculating further on the cause of the long ACKs, I think that
the i2c driver gets confused by an interrupt that marks the
transfer complete, and thinks it's a NACK interrupt instead. Is that
plausible?

If the peripheral unit is such that it generates a stop automatically
on NACKs, then this makes perfect sense. I.e. the TWI sees that the
transfer is complete, generates an interrupt, and waits for further
data or a stop command. Meanwhile the driver thinks it's a NACK and
that a stop condition has already been sent to the bus, and just
notifies the i2c consumer (the eeprom driver in this case) of the
failure and frees up the bus for any future user.

This also matches what I see when I turn on some more traffic on the
bus, that is interleaved with the eeprom traffic. AFAICT, it can be
any command that gets chewed up by the eeprom if it is sent to the
i2c driver during the long ACK.

Are you Atmel people making any progress on this data corrupting
regression? Is there anything else I can do to help?

Cheers,
Peter
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: at24 eeprom writing

2015-10-12 Thread Cyrille Pitchen
Le 12/10/2015 17:13, Peter Rosin a écrit :
> On 2015-10-05 17:09, Peter Rosin wrote:
>> But what trouble does the i2c bus driver see? Admittedly I only
>> have a simple logic level bus viewer, and not a full-blown
>> oscilloscope, so there might be something analogue going on?
>> I don't think so though, those signals looked fine last time we
>> looked (but we obviously didn't have these issues then, and
>> didn't really look that closely). I'll see if I can recheck
>> with a real scope too.
> 
> We redid the tests with a real scope, and the signal looks nice
> and square, so it is not that.
> 
> Speculating further on the cause of the long ACKs, I think that
> the i2c driver gets confused by an interrupt that marks the
> transfer complete, and thinks it's a NACK interrupt instead. Is that
> plausible?
> 
> If the peripheral unit is such that it generates a stop automatically
> on NACKs, then this makes perfect sense. I.e. the TWI sees that the
> transfer is complete, generates an interrupt, and waits for further
> data or a stop command. Meanwhile the driver thinks it's a NACK and
> that a stop condition has already been sent to the bus, and just
> notifies the i2c consumer (the eeprom driver in this case) of the
> failure and frees up the bus for any future user.
> 
> This also matches what I see when I turn on some more traffic on the
> bus, that is interleaved with the eeprom traffic. AFAICT, it can be
> any command that gets chewed up by the eeprom if it is sent to the
> i2c driver during the long ACK.
> 
> Are you Atmel people making any progress on this data corrupting
> regression? Is there anything else I can do to help?
> 
> Cheers,
> Peter
> 

Hi Peter,

I have sent a patch to Ludovic for a first internal review before publishing to
mainline. The patch should fix your issue since it fixes it on my sama5d36ek
board with an at24 eeprom.

More details on the reason of this bug would be provided in both the commit
message and comments in the code provided by the reviewed patch but I you want
an early fix just read the Status Register (AT91_TWI_SR) at the beginning of
at91_do_twi_transfer(). This read clears the NACK bit in the Status Register.
Then the following source code can safely enable the NACK interrupt, otherwise
in some cases a pending NACK interrupt would rise immediately after the line:
at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_NACK);
hence breaking the sequence of operations to be done because the interrupt
handler would call complete() too early so wait_for_completion_timeout()
also exits too early.

So reading the Status Register at the beginning of at91_do_twi_transfer()
should be enough to fix the issue.

Another mistake is in the interrupt handler itself, ie atmel_twi_interrupt():
we should check the TWI_TXRDY status bit before calling
at91_twi_write_next_byte() only if both the TWI_TXCOMP and TWI_NACK status bits
are clear. Otherwise, writing a new byte into the THR tells the I2C controller
to start a new transfer. Then the I2C slave, the at24 eeprom, is likely to
also reply by a second NACK. Hence the NACK bit is already set into the Status
Register on the next call of at91_do_twi_transfer().
This is what I saw on my scope for PIO transfers.

By the way, in my case, the first NACK occurs because the at24 driver tries to
perform a second write transfer too quickly and the eeprom is not ready yet,
then replies with a NACK.
However while debugging, placing debug traces in the I2C controller driver
changed the timing: this small delay was enough to make the first NACK
disappear because the eeprom was now ready to process the second write
operation.

For DMA transfer, you have to know that when NACK error is detected, the I2C
controller sets the TWI_TXCOMP, TWI_NACK and TWI_TXRDY bits all together in
the Status Register:
- The TWI_NACK is cleared by reading the Status Register.
- The TWI_TXRDY triggers the DMA controller to write the next byte in the THR,
  which should also tell the I2C controller to start a new transfer.
- The TWI_TXCOMP bit is cleared by writing into the THR: this is why the
  interrupt handler may fail to detect NACK if it relies on TWI_TXCOMP.

Best Regards,

Cyrille
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: at24 eeprom writing

2015-10-12 Thread Peter Rosin
On 2015-10-05 17:09, Peter Rosin wrote:
> But what trouble does the i2c bus driver see? Admittedly I only
> have a simple logic level bus viewer, and not a full-blown
> oscilloscope, so there might be something analogue going on?
> I don't think so though, those signals looked fine last time we
> looked (but we obviously didn't have these issues then, and
> didn't really look that closely). I'll see if I can recheck
> with a real scope too.

We redid the tests with a real scope, and the signal looks nice
and square, so it is not that.

Speculating further on the cause of the long ACKs, I think that
the i2c driver gets confused by an interrupt that marks the
transfer complete, and thinks it's a NACK interrupt instead. Is that
plausible?

If the peripheral unit is such that it generates a stop automatically
on NACKs, then this makes perfect sense. I.e. the TWI sees that the
transfer is complete, generates an interrupt, and waits for further
data or a stop command. Meanwhile the driver thinks it's a NACK and
that a stop condition has already been sent to the bus, and just
notifies the i2c consumer (the eeprom driver in this case) of the
failure and frees up the bus for any future user.

This also matches what I see when I turn on some more traffic on the
bus, that is interleaved with the eeprom traffic. AFAICT, it can be
any command that gets chewed up by the eeprom if it is sent to the
i2c driver during the long ACK.

Are you Atmel people making any progress on this data corrupting
regression? Is there anything else I can do to help?

Cheers,
Peter
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: at24 eeprom writing

2015-10-06 Thread Peter Rosin
On 2015-10-05 08:16, Christian Gmeiner wrote:
> Hi Peter.
> 
> Sorry for the late answer - I am currently on my way to Dublin. Maybe
> it helps if you enable
> I2C_DEBUG_CORE and I2C_DEBUG_BUS. In theory you should see a little
> bit better what
> happens on the bus.

Good suggestion. I did a new run (vanilla 4.2), and this is what I saw on
the bus this time:

[0.0] S W50 0x00 "product = 1-776-" P
[0.00184] S W50 NACK P
[0.00195] S W50 NACK P
[0.02027] S W50 0x10 ACK...
[0.04035] ...ACK 0x10 "3.0\n" P
[0.04100] S W50 NACK P
[0.04111] S W50 NACK P
[0.06032] S W50 "serial = 380" P
[0.06174] S W50 NACK P
[0.06185] S W50 NACK P
[0.08032] S W50 0x20 ACK...
[0.10034] ...ACK 0x20 "02\n" P
[0.10127] S W50 NACK P
[0.10138] S W50 NACK P
[0.12039] S W50 0x27 " " P
[0.12155] S W50 NACK P
[0.12166] S W50 NACK P
[0.14040] S W50 0x30 "" P
[0.14219] S W50 NACK P
[0.14230] S W50 NACK P
[0.16042] S W50 0x40 "" P
...

And this is the corresponding part of the log from the same run
(offset ~197.79 s):

[  197.79] i2c i2c-0: master_xfer[0] W, addr=0x50, len=17
[  197.79] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.79] at91_i2c f0014000.i2c: transfer: write 17 bytes.
[  197.80] at91_i2c f0014000.i2c: transfer complete
[  197.80] i2c i2c-0: master_xfer[0] W, addr=0x50, len=5
[  197.80] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.80] at91_i2c f0014000.i2c: transfer: write 5 bytes.
[  197.80] at91_i2c f0014000.i2c: wrote 0x10, to go 4
[  197.80] at91_i2c f0014000.i2c: wrote 0x33, to go 3
[  197.80] at91_i2c f0014000.i2c: received nack
[  197.82] i2c i2c-0: master_xfer[0] W, addr=0x50, len=5
[  197.82] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.82] at91_i2c f0014000.i2c: transfer: write 5 bytes.
[  197.82] at91_i2c f0014000.i2c: wrote 0x10, to go 4
[  197.82] at91_i2c f0014000.i2c: received nack
[  197.84] i2c i2c-0: master_xfer[0] W, addr=0x50, len=5
[  197.84] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.84] at91_i2c f0014000.i2c: transfer: write 5 bytes.
[  197.84] at91_i2c f0014000.i2c: wrote 0x10, to go 4
[  197.84] at91_i2c f0014000.i2c: wrote 0x33, to go 3
[  197.84] at91_i2c f0014000.i2c: wrote 0x2e, to go 2
[  197.84] at91_i2c f0014000.i2c: wrote 0x30, to go 1
[  197.84] at91_i2c f0014000.i2c: wrote 0xa, to go 0
[  197.84] at91_i2c f0014000.i2c: transfer complete
[  197.84] i2c i2c-0: master_xfer[0] W, addr=0x50, len=13
[  197.84] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.84] at91_i2c f0014000.i2c: transfer: write 13 bytes.
[  197.84] at91_i2c f0014000.i2c: received nack
[  197.86] i2c i2c-0: master_xfer[0] W, addr=0x50, len=13
[  197.86] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.86] at91_i2c f0014000.i2c: transfer: write 13 bytes.
[  197.86] at91_i2c f0014000.i2c: transfer complete
[  197.86] i2c i2c-0: master_xfer[0] W, addr=0x50, len=8
[  197.86] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.86] at91_i2c f0014000.i2c: transfer: write 8 bytes.
[  197.86] at91_i2c f0014000.i2c: wrote 0x20, to go 7
[  197.86] at91_i2c f0014000.i2c: wrote 0x30, to go 6
[  197.86] at91_i2c f0014000.i2c: received nack
[  197.88] i2c i2c-0: master_xfer[0] W, addr=0x50, len=8
[  197.88] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.88] at91_i2c f0014000.i2c: transfer: write 8 bytes.
[  197.88] at91_i2c f0014000.i2c: wrote 0x20, to go 7
[  197.88] at91_i2c f0014000.i2c: received nack
[  197.90] i2c i2c-0: master_xfer[0] W, addr=0x50, len=8
[  197.90] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.90] at91_i2c f0014000.i2c: transfer: write 8 bytes.
[  197.90] at91_i2c f0014000.i2c: wrote 0x20, to go 7
[  197.90] at91_i2c f0014000.i2c: wrote 0x30, to go 6
[  197.90] at91_i2c f0014000.i2c: wrote 0x30, to go 5
[  197.90] at91_i2c f0014000.i2c: wrote 0x30, to go 4
[  197.90] at91_i2c f0014000.i2c: wrote 0x30, to go 3
[  197.90] at91_i2c f0014000.i2c: wrote 0x30, to go 2
[  197.90] at91_i2c f0014000.i2c: wrote 0x32, to go 1
[  197.90] at91_i2c f0014000.i2c: wrote 0xa, to go 0
[  197.90] at91_i2c f0014000.i2c: transfer complete
[  197.90] i2c i2c-0: master_xfer[0] W, addr=0x50, len=10
[  197.90] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.90] at91_i2c f0014000.i2c: transfer: write 10 bytes.
[  197.90] at91_i2c f0014000.i2c: received nack
[  197.92] i2c i2c-0: master_xfer[0] W, addr=0x50, len=10
[  197.92] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.92] at91_i2c f0014000.i2c: transfer: write 10 bytes.
[  197.92] at91_i2c f0014000.i2c: transfer complete
[  197.92] i2c i2c-0: master_xfer[0] W, addr=0x50, len=17
[  197.92] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.92] at91_i2c f0014000.i2c: transfer: write 17 bytes.
[  197.92] at91_i2c f0014000.i2c: 

Re: Regression: at24 eeprom writing

2015-10-06 Thread Peter Rosin
On 2015-10-05 08:16, Christian Gmeiner wrote:
> Hi Peter.
> 
> Sorry for the late answer - I am currently on my way to Dublin. Maybe
> it helps if you enable
> I2C_DEBUG_CORE and I2C_DEBUG_BUS. In theory you should see a little
> bit better what
> happens on the bus.

Good suggestion. I did a new run (vanilla 4.2), and this is what I saw on
the bus this time:

[0.0] S W50 0x00 "product = 1-776-" P
[0.00184] S W50 NACK P
[0.00195] S W50 NACK P
[0.02027] S W50 0x10 ACK...
[0.04035] ...ACK 0x10 "3.0\n" P
[0.04100] S W50 NACK P
[0.04111] S W50 NACK P
[0.06032] S W50 "serial = 380" P
[0.06174] S W50 NACK P
[0.06185] S W50 NACK P
[0.08032] S W50 0x20 ACK...
[0.10034] ...ACK 0x20 "02\n" P
[0.10127] S W50 NACK P
[0.10138] S W50 NACK P
[0.12039] S W50 0x27 " " P
[0.12155] S W50 NACK P
[0.12166] S W50 NACK P
[0.14040] S W50 0x30 "" P
[0.14219] S W50 NACK P
[0.14230] S W50 NACK P
[0.16042] S W50 0x40 "" P
...

And this is the corresponding part of the log from the same run
(offset ~197.79 s):

[  197.79] i2c i2c-0: master_xfer[0] W, addr=0x50, len=17
[  197.79] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.79] at91_i2c f0014000.i2c: transfer: write 17 bytes.
[  197.80] at91_i2c f0014000.i2c: transfer complete
[  197.80] i2c i2c-0: master_xfer[0] W, addr=0x50, len=5
[  197.80] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.80] at91_i2c f0014000.i2c: transfer: write 5 bytes.
[  197.80] at91_i2c f0014000.i2c: wrote 0x10, to go 4
[  197.80] at91_i2c f0014000.i2c: wrote 0x33, to go 3
[  197.80] at91_i2c f0014000.i2c: received nack
[  197.82] i2c i2c-0: master_xfer[0] W, addr=0x50, len=5
[  197.82] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.82] at91_i2c f0014000.i2c: transfer: write 5 bytes.
[  197.82] at91_i2c f0014000.i2c: wrote 0x10, to go 4
[  197.82] at91_i2c f0014000.i2c: received nack
[  197.84] i2c i2c-0: master_xfer[0] W, addr=0x50, len=5
[  197.84] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.84] at91_i2c f0014000.i2c: transfer: write 5 bytes.
[  197.84] at91_i2c f0014000.i2c: wrote 0x10, to go 4
[  197.84] at91_i2c f0014000.i2c: wrote 0x33, to go 3
[  197.84] at91_i2c f0014000.i2c: wrote 0x2e, to go 2
[  197.84] at91_i2c f0014000.i2c: wrote 0x30, to go 1
[  197.84] at91_i2c f0014000.i2c: wrote 0xa, to go 0
[  197.84] at91_i2c f0014000.i2c: transfer complete
[  197.84] i2c i2c-0: master_xfer[0] W, addr=0x50, len=13
[  197.84] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.84] at91_i2c f0014000.i2c: transfer: write 13 bytes.
[  197.84] at91_i2c f0014000.i2c: received nack
[  197.86] i2c i2c-0: master_xfer[0] W, addr=0x50, len=13
[  197.86] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.86] at91_i2c f0014000.i2c: transfer: write 13 bytes.
[  197.86] at91_i2c f0014000.i2c: transfer complete
[  197.86] i2c i2c-0: master_xfer[0] W, addr=0x50, len=8
[  197.86] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.86] at91_i2c f0014000.i2c: transfer: write 8 bytes.
[  197.86] at91_i2c f0014000.i2c: wrote 0x20, to go 7
[  197.86] at91_i2c f0014000.i2c: wrote 0x30, to go 6
[  197.86] at91_i2c f0014000.i2c: received nack
[  197.88] i2c i2c-0: master_xfer[0] W, addr=0x50, len=8
[  197.88] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.88] at91_i2c f0014000.i2c: transfer: write 8 bytes.
[  197.88] at91_i2c f0014000.i2c: wrote 0x20, to go 7
[  197.88] at91_i2c f0014000.i2c: received nack
[  197.90] i2c i2c-0: master_xfer[0] W, addr=0x50, len=8
[  197.90] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.90] at91_i2c f0014000.i2c: transfer: write 8 bytes.
[  197.90] at91_i2c f0014000.i2c: wrote 0x20, to go 7
[  197.90] at91_i2c f0014000.i2c: wrote 0x30, to go 6
[  197.90] at91_i2c f0014000.i2c: wrote 0x30, to go 5
[  197.90] at91_i2c f0014000.i2c: wrote 0x30, to go 4
[  197.90] at91_i2c f0014000.i2c: wrote 0x30, to go 3
[  197.90] at91_i2c f0014000.i2c: wrote 0x30, to go 2
[  197.90] at91_i2c f0014000.i2c: wrote 0x32, to go 1
[  197.90] at91_i2c f0014000.i2c: wrote 0xa, to go 0
[  197.90] at91_i2c f0014000.i2c: transfer complete
[  197.90] i2c i2c-0: master_xfer[0] W, addr=0x50, len=10
[  197.90] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.90] at91_i2c f0014000.i2c: transfer: write 10 bytes.
[  197.90] at91_i2c f0014000.i2c: received nack
[  197.92] i2c i2c-0: master_xfer[0] W, addr=0x50, len=10
[  197.92] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.92] at91_i2c f0014000.i2c: transfer: write 10 bytes.
[  197.92] at91_i2c f0014000.i2c: transfer complete
[  197.92] i2c i2c-0: master_xfer[0] W, addr=0x50, len=17
[  197.92] i2c i2c-0: at91_xfer: processing 1 messages:
[  197.92] at91_i2c f0014000.i2c: transfer: write 17 bytes.
[  197.92] at91_i2c f0014000.i2c: 

Re: Regression: at24 eeprom writing

2015-10-05 Thread Peter Rosin
On 2015-10-05 17:28, Cyrille Pitchen wrote:
> Le 05/10/2015 10:45, Peter Rosin a écrit :
>> On 2015-10-03 01:05, Peter Rosin wrote:
>>> Hi!
>>>
>>> I recently upgraded from the atmel linux-3.18-at91 kernel to vanilla 4.2
>>> and everything seemed fine. Until I tried to write to the little eeprom
>>> chip. I then tried the linux-4.1-at91 kernel and that suffers too.
>>>
>>> The symptoms are that it seems like writes get interrupted, and restarted
>>> again without properly initializing everything again. Inspecting the i2c
>>> bus during these fails gets me something like this (int hex) when I
>>>
>>> echo abcdefghijklmnopqr > /sys/bus/i2c/devices/0-0050/eeprom
>>>
>>> S a0 00 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 P
>>> S a0 10 (clk and data low for a "long" time) 10 71 72 0a P
>>>
>>> Notice how the address byte in the second chunk (10) is repeated after
>>> the strange event on the i2c bus.
>>>
>>> I looked around and found that if I revert 
>>> a839ce663b3183209fdf7b1fc4796bfe2a4679c3
>>> "eeprom: at24: extend driver to allow writing via i2c_smbus_write_byte_data"
>>> eeprom writing starts working again.
>>>
>>> AFAICT, the i2c-at91 bus driver makes the eeprom driver use the
>>> i2c_transfer code path both with that patch and with it reverted,
>>> so I sadly don't see why the patch makes a difference.
>>>
>>> I'm on a board that is based on the sama5d31 evaluation kit, with a
>>> NXP SE97BTP,547 chip and this in the devicetree:
>>>
>>> i2c0: i2c@f0014000 {
>>> status = "okay";
>>>
>>> jc42@18 {
>>> compatible = "jc42";
>>> reg = <0x18>;
>>> };
>>>
>>> eeprom@50 {
>>> compatible = "24c02";
>>> reg = <0x50>;
>>> pagesize = <16>;
>>> };
>>> };
>>
>> Ok, I found the culprit, and I double and triple checked it this time...
>>
>> If I move to the very latest on the linux-3.18-at91 branch, the bug is
>> there too. Which made it vastly more palatable to bisect the bug.
>>
>> The offender (in the 4.2 kernel) is 93563a6a71bb69dd324fc7354c60fb05f84aae6b
>> "i2c: at91: fix a race condition when using the DMA controller"
>> which is far more understandable. Ao, adding Cyrille Pitchen to the Cc list.
>>
>> If I add that patch on top of my previously working tree, it behaves just
>> as newer kernels, i.e. equally bad. The patch doesn't revert cleanly, but
>> reverting the patch and quick-n-dirty-fixing the conflict on vanilla 4.2
>> makes the problem go away.
>>
>> I have attached what I actually reverted.
>>
>> Cheers,
>> Peter
>>
> 
> Hi Peter,
> 
> Can you tell me whether your device tree sets the I2C controller i2c0 to use
> dma channels, especially the "tx" one. I guess so but it is just to confirm 
> hence we look in the right direction.

I think yes, I'm including sama5d3.dtsi and am not overriding anything 
interesting
in that area.
 
> Then I think we should look at this part of the original patch:
> 
> } else {
> if (dev->use_dma && (dev->buf_len > AT91_I2C_DMA_THRESHOLD)) {
> +   at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_NACK);
> at91_twi_write_data_dma(dev);
> -   at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_TXCOMP);
> } else {
> at91_twi_write_next_byte(dev);
> at91_twi_write(dev, AT91_TWI_IER,
> 
> Here, for DMA TX transfers, we enable the NACK interrupt instead of the TXCOMP
> one. This is the actual fix of the DMA race. Indeed there were two issues when
> using TXCOMP to detect NACK conditions.
> 
> As written in the datasheet and confirmed by the IP designer, the TXCOMP bit 
> is
> set in the Status Register when both the Transmit Holding Register (THR) and
> its internal shifter are empty and the STOP condition has been sent.
> So when a first transfer successfully completes, the TXCOMP bit is set. Then
> this bit remains set until the next write into THR.
> 
> The first issue is the race condition:
> at91_twi_write_data_dma(dev);
> at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_TXCOMP);
> 
> The first line prepares a DMA transfer but when we execute the second line to
> enable the TXCOMP interrupt, we have no mean to know whether the DMA has
> already performed a first write access into THR, which also clears the TXCOMP
> bit in the Status Register. If the DMA controller hasn't completed this first
> write yet, the TXCOMP bit is still set in the Status Register. Hence the
> interrupt handler is executed immediately after the TXCOMP interrupt has been
> enabled. If the interrupt handler reads the Status Register before the DMA
> controller has written into the THR, the TXCOMP bit is still set. 

Re: Regression: at24 eeprom writing

2015-10-05 Thread Cyrille Pitchen
Le 05/10/2015 10:45, Peter Rosin a écrit :
> On 2015-10-03 01:05, Peter Rosin wrote:
>> Hi!
>>
>> I recently upgraded from the atmel linux-3.18-at91 kernel to vanilla 4.2
>> and everything seemed fine. Until I tried to write to the little eeprom
>> chip. I then tried the linux-4.1-at91 kernel and that suffers too.
>>
>> The symptoms are that it seems like writes get interrupted, and restarted
>> again without properly initializing everything again. Inspecting the i2c
>> bus during these fails gets me something like this (int hex) when I
>>
>> echo abcdefghijklmnopqr > /sys/bus/i2c/devices/0-0050/eeprom
>>
>> S a0 00 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 P
>> S a0 10 (clk and data low for a "long" time) 10 71 72 0a P
>>
>> Notice how the address byte in the second chunk (10) is repeated after
>> the strange event on the i2c bus.
>>
>> I looked around and found that if I revert 
>> a839ce663b3183209fdf7b1fc4796bfe2a4679c3
>> "eeprom: at24: extend driver to allow writing via i2c_smbus_write_byte_data"
>> eeprom writing starts working again.
>>
>> AFAICT, the i2c-at91 bus driver makes the eeprom driver use the
>> i2c_transfer code path both with that patch and with it reverted,
>> so I sadly don't see why the patch makes a difference.
>>
>> I'm on a board that is based on the sama5d31 evaluation kit, with a
>> NXP SE97BTP,547 chip and this in the devicetree:
>>
>>  i2c0: i2c@f0014000 {
>>  status = "okay";
>>
>>  jc42@18 {
>>  compatible = "jc42";
>>  reg = <0x18>;
>>  };
>>
>>  eeprom@50 {
>>  compatible = "24c02";
>>  reg = <0x50>;
>>  pagesize = <16>;
>>  };
>>  };
> 
> Ok, I found the culprit, and I double and triple checked it this time...
> 
> If I move to the very latest on the linux-3.18-at91 branch, the bug is
> there too. Which made it vastly more palatable to bisect the bug.
> 
> The offender (in the 4.2 kernel) is 93563a6a71bb69dd324fc7354c60fb05f84aae6b
> "i2c: at91: fix a race condition when using the DMA controller"
> which is far more understandable. Ao, adding Cyrille Pitchen to the Cc list.
> 
> If I add that patch on top of my previously working tree, it behaves just
> as newer kernels, i.e. equally bad. The patch doesn't revert cleanly, but
> reverting the patch and quick-n-dirty-fixing the conflict on vanilla 4.2
> makes the problem go away.
> 
> I have attached what I actually reverted.
> 
> Cheers,
> Peter
> 

Hi Peter,

Can you tell me whether your device tree sets the I2C controller i2c0 to use
dma channels, especially the "tx" one. I guess so but it is just to confirm 
hence we look in the right direction.

Then I think we should look at this part of the original patch:

} else {
if (dev->use_dma && (dev->buf_len > AT91_I2C_DMA_THRESHOLD)) {
+   at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_NACK);
at91_twi_write_data_dma(dev);
-   at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_TXCOMP);
} else {
at91_twi_write_next_byte(dev);
at91_twi_write(dev, AT91_TWI_IER,

Here, for DMA TX transfers, we enable the NACK interrupt instead of the TXCOMP
one. This is the actual fix of the DMA race. Indeed there were two issues when
using TXCOMP to detect NACK conditions.

As written in the datasheet and confirmed by the IP designer, the TXCOMP bit is
set in the Status Register when both the Transmit Holding Register (THR) and
its internal shifter are empty and the STOP condition has been sent.
So when a first transfer successfully completes, the TXCOMP bit is set. Then
this bit remains set until the next write into THR.

The first issue is the race condition:
at91_twi_write_data_dma(dev);
at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_TXCOMP);

The first line prepares a DMA transfer but when we execute the second line to
enable the TXCOMP interrupt, we have no mean to know whether the DMA has
already performed a first write access into THR, which also clears the TXCOMP
bit in the Status Register. If the DMA controller hasn't completed this first
write yet, the TXCOMP bit is still set in the Status Register. Hence the
interrupt handler is executed immediately after the TXCOMP interrupt has been
enabled. If the interrupt handler reads the Status Register before the DMA
controller has written into the THR, the TXCOMP bit is still set. Consequently,
the interrupt handler calls complete(>cmd_complete) thinking the transfer
has completed though it actually has not even started.


The second issue is about the detection of NACK condition when using the DMA
controller. Before the patch, the driver relied on 

Re: Regression: at24 eeprom writing

2015-10-05 Thread Peter Rosin
On 2015-10-05 17:00, Ludovic Desroches wrote:
> Hi Peter
> 
> On Mon, Oct 05, 2015 at 10:45:29AM +0200, Peter Rosin wrote:
>> On 2015-10-03 01:05, Peter Rosin wrote:
> 
> [...]
> 
>> Ok, I found the culprit, and I double and triple checked it this time...
>>
>> If I move to the very latest on the linux-3.18-at91 branch, the bug is
>> there too. Which made it vastly more palatable to bisect the bug.
>>
>> The offender (in the 4.2 kernel) is 93563a6a71bb69dd324fc7354c60fb05f84aae6b
>> "i2c: at91: fix a race condition when using the DMA controller"
>> which is far more understandable. Ao, adding Cyrille Pitchen to the Cc list.
>>
> 
> Thanks for the bisecting effort. I am currently at ELCE where I have
> met someone with the same kind of issue. Is it easily reproducible? It
> doesn't seem to be the case for him.

Yes, easy as pie, happens on every eeprom write of 256 bytes so far...

> I'll have a look once back.

Ok good, to further help understanding, I'm seeing this on the i2c bus (I
hope you understand the notation, or just ask):

Working (4.2 + the revert from my previous message)
===

S W50 0x00 "product = 1-776-" P   S W50 NACK P   S W50 NACK P
delay 15.2 ms
S W50 0x10 "3.0\n" P
delay 19.5 ms
S W50 0x10 "3.0\n" P   S W50 NACK P   S W50 NACK P   S W50 NACK P
delay 19.0 ms
S W50 0x14 "serial = 380" P
delay 18.8 ms
S W50 0x14 "serial = 380" P   S W50 NACK P   S W50 NACK P
delay 18.4 ms
S W50 0x20 "02\n" P
delay 19.2 ms
S W50 0x20 "02\n" P
delay 10.8 ms
S W50 0x27 " " P   S W50 NACK P (repeated 5 times)
delay 16.7 ms
S W50 0x30 "" P
delay 18.4 ms
S W50 0x30 "" P   S W50 NACK P (repeated 3 times)
delay 17.9 ms
S W50 0x40 "" P

etc

I.e. every write (but the first) seems to fail the first time and is
then retried, even if the i2c bus shows no failure indication (at
least that I can find).


Not working (vanilla 4.2)
=

S W50 0x00 "product = 1-776-" P   S W50 NACK P   S W50 NACK P
delay 17.3 ms
S W50 0x10 ACK...
delay 19.8 with both SDA and SCL low
...ACK 0x10 "3.0\n" P   S W50 NACK P   S W50 NACK P
delay 19.3 ms
S W50 0x14 "serial = 380" P   S W50 NACK P   S W50 NACK P
delay 18.5 ms
S W50 0x20 ACK...
delay 19.9 with both SDA and SCL low
...ACK 0x20 "02\n" P   S W50 NACK P   S W50 NACK P
delay 18.9 ms
S W50 0x27 " " P   S W50 NACK P   S W50 NACK P
delay 19.2 ms
S W50 0x30 "" P   S W50 NACK P   S W50 NACK P
delay 17.6 ms
S W50 0x40 "" P   S W50 NACK P   S W50 NACK P

etc

I.e. when there is a disturbance (the long ACks) the recovery
mechanism appears to attempt to heal it by resending only the
failing byte, but the eeprom appears to not see the failure and
takes both bytes instead of just the resend.



It seems dangerous to attempt to fix apparent trouble with an
i2c command by anything less than a full retry, like the working
version appears to do. No?

But what trouble does the i2c bus driver see? Admittedly I only
have a simple logic level bus viewer, and not a full-blown
oscilloscope, so there might be something analogue going on?
I don't think so though, those signals looked fine last time we
looked (but we obviously didn't have these issues then, and
didn't really look that closely). I'll see if I can recheck
with a real scope too.

Cheers,
Peter
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: at24 eeprom writing

2015-10-05 Thread Ludovic Desroches
Hi Peter

On Mon, Oct 05, 2015 at 10:45:29AM +0200, Peter Rosin wrote:
> On 2015-10-03 01:05, Peter Rosin wrote:

[...]

> Ok, I found the culprit, and I double and triple checked it this time...
> 
> If I move to the very latest on the linux-3.18-at91 branch, the bug is
> there too. Which made it vastly more palatable to bisect the bug.
> 
> The offender (in the 4.2 kernel) is 93563a6a71bb69dd324fc7354c60fb05f84aae6b
> "i2c: at91: fix a race condition when using the DMA controller"
> which is far more understandable. Ao, adding Cyrille Pitchen to the Cc list.
> 

Thanks for the bisecting effort. I am currently at ELCE where I have
met someone with the same kind of issue. Is it easily reproducible? It
doesn't seem to be the case for him.

I'll have a look once back.

> If I add that patch on top of my previously working tree, it behaves just
> as newer kernels, i.e. equally bad. The patch doesn't revert cleanly, but
> reverting the patch and quick-n-dirty-fixing the conflict on vanilla 4.2
> makes the problem go away.
> 
> I have attached what I actually reverted.
> 
> Cheers,
> Peter
> 

Ludovic
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: Regression: at24 eeprom writing

2015-10-05 Thread kbuild test robot
Hi Peter,

[auto build test WARNING on v4.3-rc4 -- if it's inappropriate base, please 
ignore]

config: arm-at91_dt_defconfig (attached as .config)
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All warnings (new ones prefixed by >>):

   drivers/i2c/busses/i2c-at91.c: In function 'at91_twi_read_data_dma_callback':
>> drivers/i2c/busses/i2c-at91.c:378:11: warning: unused variable 'ier' 
>> [-Wunused-variable]
 unsigned ier = AT91_TWI_TXCOMP;
  ^

vim +/ier +378 drivers/i2c/busses/i2c-at91.c

75b81f33 Marek Roszko  2014-08-20  362  dev->buf_len = 
1;
75b81f33 Marek Roszko  2014-08-20  363  }
fac368a0 Nikolaus Voss 2011-11-08  364  }
fac368a0 Nikolaus Voss 2011-11-08  365  
fac368a0 Nikolaus Voss 2011-11-08  366  /* send stop if second but last 
byte has been read */
0ef6f321 Cyrille Pitchen   2015-06-09  367  if (!dev->pdata->has_alt_cmd && 
dev->buf_len == 1)
fac368a0 Nikolaus Voss 2011-11-08  368  at91_twi_write(dev, 
AT91_TWI_CR, AT91_TWI_STOP);
fac368a0 Nikolaus Voss 2011-11-08  369  
fac368a0 Nikolaus Voss 2011-11-08  370  dev_dbg(dev->dev, "read 0x%x, 
to go %d\n", *dev->buf, dev->buf_len);
fac368a0 Nikolaus Voss 2011-11-08  371  
fac368a0 Nikolaus Voss 2011-11-08  372  ++dev->buf;
fac368a0 Nikolaus Voss 2011-11-08  373  }
fac368a0 Nikolaus Voss 2011-11-08  374  
60937b2c Ludovic Desroches 2012-11-23  375  static void 
at91_twi_read_data_dma_callback(void *data)
60937b2c Ludovic Desroches 2012-11-23  376  {
60937b2c Ludovic Desroches 2012-11-23  377  struct at91_twi_dev *dev = 
(struct at91_twi_dev *)data;
0ef6f321 Cyrille Pitchen   2015-06-09 @378  unsigned ier = AT91_TWI_TXCOMP;
60937b2c Ludovic Desroches 2012-11-23  379  
5e3cfc6c Cyrille Pitchen   2015-06-09  380  dma_unmap_single(dev->dev, 
sg_dma_address(>dma.sg[0]),
28772ac8 Wolfram Sang  2014-07-21  381   dev->buf_len, 
DMA_FROM_DEVICE);
60937b2c Ludovic Desroches 2012-11-23  382  
60937b2c Ludovic Desroches 2012-11-23  383  /* The last two bytes have to 
be read without using dma */
60937b2c Ludovic Desroches 2012-11-23  384  dev->buf += dev->buf_len - 2;
60937b2c Ludovic Desroches 2012-11-23  385  dev->buf_len = 2;
e0b440c7 Peter Rosin   2015-10-05  386  at91_twi_write(dev, 
AT91_TWI_IER, AT91_TWI_RXRDY);

:: The code at line 378 was first introduced by commit
:: 0ef6f3213dac48bbc71c98b5a2752db556f3e649 i2c: at91: add support for new 
alternative command mode

:: TO: Cyrille Pitchen 
:: CC: Wolfram Sang 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: Regression: at24 eeprom writing

2015-10-05 Thread Peter Rosin
On 2015-10-03 01:05, Peter Rosin wrote:
> Hi!
> 
> I recently upgraded from the atmel linux-3.18-at91 kernel to vanilla 4.2
> and everything seemed fine. Until I tried to write to the little eeprom
> chip. I then tried the linux-4.1-at91 kernel and that suffers too.
> 
> The symptoms are that it seems like writes get interrupted, and restarted
> again without properly initializing everything again. Inspecting the i2c
> bus during these fails gets me something like this (int hex) when I
> 
> echo abcdefghijklmnopqr > /sys/bus/i2c/devices/0-0050/eeprom
> 
> S a0 00 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 P
> S a0 10 (clk and data low for a "long" time) 10 71 72 0a P
> 
> Notice how the address byte in the second chunk (10) is repeated after
> the strange event on the i2c bus.
> 
> I looked around and found that if I revert 
> a839ce663b3183209fdf7b1fc4796bfe2a4679c3
> "eeprom: at24: extend driver to allow writing via i2c_smbus_write_byte_data"
> eeprom writing starts working again.
> 
> AFAICT, the i2c-at91 bus driver makes the eeprom driver use the
> i2c_transfer code path both with that patch and with it reverted,
> so I sadly don't see why the patch makes a difference.
> 
> I'm on a board that is based on the sama5d31 evaluation kit, with a
> NXP SE97BTP,547 chip and this in the devicetree:
> 
>   i2c0: i2c@f0014000 {
>   status = "okay";
> 
>   jc42@18 {
>   compatible = "jc42";
>   reg = <0x18>;
>   };
> 
>   eeprom@50 {
>   compatible = "24c02";
>   reg = <0x50>;
>   pagesize = <16>;
>   };
>   };

Ok, I found the culprit, and I double and triple checked it this time...

If I move to the very latest on the linux-3.18-at91 branch, the bug is
there too. Which made it vastly more palatable to bisect the bug.

The offender (in the 4.2 kernel) is 93563a6a71bb69dd324fc7354c60fb05f84aae6b
"i2c: at91: fix a race condition when using the DMA controller"
which is far more understandable. Ao, adding Cyrille Pitchen to the Cc list.

If I add that patch on top of my previously working tree, it behaves just
as newer kernels, i.e. equally bad. The patch doesn't revert cleanly, but
reverting the patch and quick-n-dirty-fixing the conflict on vanilla 4.2
makes the problem go away.

I have attached what I actually reverted.

Cheers,
Peter

>From d178e0636358e61503ac55d39c8612ef93c1d893 Mon Sep 17 00:00:00 2001
From: Peter Rosin 
Date: Mon, 5 Oct 2015 10:16:18 +0200
Subject: [PATCH] Revert "i2c: at91: fix a race condition when using the DMA
 controller"

This reverts commit 93563a6a71bb69dd324fc7354c60fb05f84aae6b.

Conflicts:
	drivers/i2c/busses/i2c-at91.c
---
 drivers/i2c/busses/i2c-at91.c |   97 +
 1 file changed, 21 insertions(+), 76 deletions(-)

diff --git a/drivers/i2c/busses/i2c-at91.c b/drivers/i2c/busses/i2c-at91.c
index 1c758cd1e1ba..b5a5ef26b142 100644
--- a/drivers/i2c/busses/i2c-at91.c
+++ b/drivers/i2c/busses/i2c-at91.c
@@ -74,9 +74,6 @@
 #define	AT91_TWI_NACK		BIT(8)	/* Not Acknowledged */
 #define	AT91_TWI_LOCK		BIT(23) /* TWI Lock due to Frame Errors */
 
-#define	AT91_TWI_INT_MASK \
-	(AT91_TWI_TXCOMP | AT91_TWI_RXRDY | AT91_TWI_TXRDY | AT91_TWI_NACK)
-
 #define	AT91_TWI_IER		0x0024	/* Interrupt Enable Register */
 #define	AT91_TWI_IDR		0x0028	/* Interrupt Disable Register */
 #define	AT91_TWI_IMR		0x002c	/* Interrupt Mask Register */
@@ -155,12 +152,13 @@ static void at91_twi_write(struct at91_twi_dev *dev, unsigned reg, unsigned val)
 
 static void at91_disable_twi_interrupts(struct at91_twi_dev *dev)
 {
-	at91_twi_write(dev, AT91_TWI_IDR, AT91_TWI_INT_MASK);
+	at91_twi_write(dev, AT91_TWI_IDR,
+		   AT91_TWI_TXCOMP | AT91_TWI_RXRDY | AT91_TWI_TXRDY);
 }
 
 static void at91_twi_irq_save(struct at91_twi_dev *dev)
 {
-	dev->imr = at91_twi_read(dev, AT91_TWI_IMR) & AT91_TWI_INT_MASK;
+	dev->imr = at91_twi_read(dev, AT91_TWI_IMR) & 0x7;
 	at91_disable_twi_interrupts(dev);
 }
 
@@ -255,16 +253,7 @@ static void at91_twi_write_data_dma_callback(void *data)
 	dma_unmap_single(dev->dev, sg_dma_address(>dma.sg[0]),
 			 dev->buf_len, DMA_TO_DEVICE);
 
-	/*
-	 * When this callback is called, THR/TX FIFO is likely not to be empty
-	 * yet. So we have to wait for TXCOMP or NACK bits to be set into the
-	 * Status Register to be sure that the STOP bit has been sent and the
-	 * transfer is completed. The NACK interrupt has already been enabled,
-	 * we just have to enable TXCOMP one.
-	 */
-	at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_TXCOMP);
-	if (!dev->pdata->has_alt_cmd)
-		at91_twi_write(dev, AT91_TWI_CR, AT91_TWI_STOP);
+	at91_twi_write(dev, AT91_TWI_CR, AT91_TWI_STOP);
 }
 
 static void 

Re: Regression: at24 eeprom writing

2015-10-05 Thread Christian Gmeiner
Hi Peter.

Sorry for the late answer - I am currently on my way to Dublin. Maybe
it helps if you enable
I2C_DEBUG_CORE and I2C_DEBUG_BUS. In theory you should see a little
bit better what
happens on the bus.

Greets
--
Christian Gmeiner, MSc

https://soundcloud.com/christian-gmeiner


2015-10-04 21:50 GMT+02:00 Peter Rosin :
> On 2015-10-03 01:05, Peter Rosin wrote:
>> I looked around and found that if I revert 
>> a839ce663b3183209fdf7b1fc4796bfe2a4679c3
>> "eeprom: at24: extend driver to allow writing via i2c_smbus_write_byte_data"
>> eeprom writing starts working again.
>>
>> AFAICT, the i2c-at91 bus driver makes the eeprom driver use the
>> i2c_transfer code path both with that patch and with it reverted,
>> so I sadly don't see why the patch makes a difference.
>
> And now when I retry the same thing, that patch is no longer affecting things.
> I must have confused myself over what kernel was actually running. Christian,
> please accept my deepest apologies for implicating you in this regression.
>
> But the regression is still there. In short, linux-3.18-at91 from the
> linux4sam tree works, linux-4.1-at91 from the same tree does not, and
> vanilla 4.2 also doesn't work. I have a hard time bisecting this thing
> though, since the last known good version has a long list of atmel
> patches that I refuse to even try to rebase...
>
> Ideas still welcome of course.
>
> Cheers,
> Peter
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: at24 eeprom writing

2015-10-05 Thread Christian Gmeiner
Hi Peter.

Sorry for the late answer - I am currently on my way to Dublin. Maybe
it helps if you enable
I2C_DEBUG_CORE and I2C_DEBUG_BUS. In theory you should see a little
bit better what
happens on the bus.

Greets
--
Christian Gmeiner, MSc

https://soundcloud.com/christian-gmeiner


2015-10-04 21:50 GMT+02:00 Peter Rosin :
> On 2015-10-03 01:05, Peter Rosin wrote:
>> I looked around and found that if I revert 
>> a839ce663b3183209fdf7b1fc4796bfe2a4679c3
>> "eeprom: at24: extend driver to allow writing via i2c_smbus_write_byte_data"
>> eeprom writing starts working again.
>>
>> AFAICT, the i2c-at91 bus driver makes the eeprom driver use the
>> i2c_transfer code path both with that patch and with it reverted,
>> so I sadly don't see why the patch makes a difference.
>
> And now when I retry the same thing, that patch is no longer affecting things.
> I must have confused myself over what kernel was actually running. Christian,
> please accept my deepest apologies for implicating you in this regression.
>
> But the regression is still there. In short, linux-3.18-at91 from the
> linux4sam tree works, linux-4.1-at91 from the same tree does not, and
> vanilla 4.2 also doesn't work. I have a hard time bisecting this thing
> though, since the last known good version has a long list of atmel
> patches that I refuse to even try to rebase...
>
> Ideas still welcome of course.
>
> Cheers,
> Peter
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: Regression: at24 eeprom writing

2015-10-05 Thread kbuild test robot
Hi Peter,

[auto build test WARNING on v4.3-rc4 -- if it's inappropriate base, please 
ignore]

config: arm-at91_dt_defconfig (attached as .config)
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All warnings (new ones prefixed by >>):

   drivers/i2c/busses/i2c-at91.c: In function 'at91_twi_read_data_dma_callback':
>> drivers/i2c/busses/i2c-at91.c:378:11: warning: unused variable 'ier' 
>> [-Wunused-variable]
 unsigned ier = AT91_TWI_TXCOMP;
  ^

vim +/ier +378 drivers/i2c/busses/i2c-at91.c

75b81f33 Marek Roszko  2014-08-20  362  dev->buf_len = 
1;
75b81f33 Marek Roszko  2014-08-20  363  }
fac368a0 Nikolaus Voss 2011-11-08  364  }
fac368a0 Nikolaus Voss 2011-11-08  365  
fac368a0 Nikolaus Voss 2011-11-08  366  /* send stop if second but last 
byte has been read */
0ef6f321 Cyrille Pitchen   2015-06-09  367  if (!dev->pdata->has_alt_cmd && 
dev->buf_len == 1)
fac368a0 Nikolaus Voss 2011-11-08  368  at91_twi_write(dev, 
AT91_TWI_CR, AT91_TWI_STOP);
fac368a0 Nikolaus Voss 2011-11-08  369  
fac368a0 Nikolaus Voss 2011-11-08  370  dev_dbg(dev->dev, "read 0x%x, 
to go %d\n", *dev->buf, dev->buf_len);
fac368a0 Nikolaus Voss 2011-11-08  371  
fac368a0 Nikolaus Voss 2011-11-08  372  ++dev->buf;
fac368a0 Nikolaus Voss 2011-11-08  373  }
fac368a0 Nikolaus Voss 2011-11-08  374  
60937b2c Ludovic Desroches 2012-11-23  375  static void 
at91_twi_read_data_dma_callback(void *data)
60937b2c Ludovic Desroches 2012-11-23  376  {
60937b2c Ludovic Desroches 2012-11-23  377  struct at91_twi_dev *dev = 
(struct at91_twi_dev *)data;
0ef6f321 Cyrille Pitchen   2015-06-09 @378  unsigned ier = AT91_TWI_TXCOMP;
60937b2c Ludovic Desroches 2012-11-23  379  
5e3cfc6c Cyrille Pitchen   2015-06-09  380  dma_unmap_single(dev->dev, 
sg_dma_address(>dma.sg[0]),
28772ac8 Wolfram Sang  2014-07-21  381   dev->buf_len, 
DMA_FROM_DEVICE);
60937b2c Ludovic Desroches 2012-11-23  382  
60937b2c Ludovic Desroches 2012-11-23  383  /* The last two bytes have to 
be read without using dma */
60937b2c Ludovic Desroches 2012-11-23  384  dev->buf += dev->buf_len - 2;
60937b2c Ludovic Desroches 2012-11-23  385  dev->buf_len = 2;
e0b440c7 Peter Rosin   2015-10-05  386  at91_twi_write(dev, 
AT91_TWI_IER, AT91_TWI_RXRDY);

:: The code at line 378 was first introduced by commit
:: 0ef6f3213dac48bbc71c98b5a2752db556f3e649 i2c: at91: add support for new 
alternative command mode

:: TO: Cyrille Pitchen 
:: CC: Wolfram Sang 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: Regression: at24 eeprom writing

2015-10-05 Thread Peter Rosin
On 2015-10-03 01:05, Peter Rosin wrote:
> Hi!
> 
> I recently upgraded from the atmel linux-3.18-at91 kernel to vanilla 4.2
> and everything seemed fine. Until I tried to write to the little eeprom
> chip. I then tried the linux-4.1-at91 kernel and that suffers too.
> 
> The symptoms are that it seems like writes get interrupted, and restarted
> again without properly initializing everything again. Inspecting the i2c
> bus during these fails gets me something like this (int hex) when I
> 
> echo abcdefghijklmnopqr > /sys/bus/i2c/devices/0-0050/eeprom
> 
> S a0 00 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 P
> S a0 10 (clk and data low for a "long" time) 10 71 72 0a P
> 
> Notice how the address byte in the second chunk (10) is repeated after
> the strange event on the i2c bus.
> 
> I looked around and found that if I revert 
> a839ce663b3183209fdf7b1fc4796bfe2a4679c3
> "eeprom: at24: extend driver to allow writing via i2c_smbus_write_byte_data"
> eeprom writing starts working again.
> 
> AFAICT, the i2c-at91 bus driver makes the eeprom driver use the
> i2c_transfer code path both with that patch and with it reverted,
> so I sadly don't see why the patch makes a difference.
> 
> I'm on a board that is based on the sama5d31 evaluation kit, with a
> NXP SE97BTP,547 chip and this in the devicetree:
> 
>   i2c0: i2c@f0014000 {
>   status = "okay";
> 
>   jc42@18 {
>   compatible = "jc42";
>   reg = <0x18>;
>   };
> 
>   eeprom@50 {
>   compatible = "24c02";
>   reg = <0x50>;
>   pagesize = <16>;
>   };
>   };

Ok, I found the culprit, and I double and triple checked it this time...

If I move to the very latest on the linux-3.18-at91 branch, the bug is
there too. Which made it vastly more palatable to bisect the bug.

The offender (in the 4.2 kernel) is 93563a6a71bb69dd324fc7354c60fb05f84aae6b
"i2c: at91: fix a race condition when using the DMA controller"
which is far more understandable. Ao, adding Cyrille Pitchen to the Cc list.

If I add that patch on top of my previously working tree, it behaves just
as newer kernels, i.e. equally bad. The patch doesn't revert cleanly, but
reverting the patch and quick-n-dirty-fixing the conflict on vanilla 4.2
makes the problem go away.

I have attached what I actually reverted.

Cheers,
Peter

>From d178e0636358e61503ac55d39c8612ef93c1d893 Mon Sep 17 00:00:00 2001
From: Peter Rosin 
Date: Mon, 5 Oct 2015 10:16:18 +0200
Subject: [PATCH] Revert "i2c: at91: fix a race condition when using the DMA
 controller"

This reverts commit 93563a6a71bb69dd324fc7354c60fb05f84aae6b.

Conflicts:
	drivers/i2c/busses/i2c-at91.c
---
 drivers/i2c/busses/i2c-at91.c |   97 +
 1 file changed, 21 insertions(+), 76 deletions(-)

diff --git a/drivers/i2c/busses/i2c-at91.c b/drivers/i2c/busses/i2c-at91.c
index 1c758cd1e1ba..b5a5ef26b142 100644
--- a/drivers/i2c/busses/i2c-at91.c
+++ b/drivers/i2c/busses/i2c-at91.c
@@ -74,9 +74,6 @@
 #define	AT91_TWI_NACK		BIT(8)	/* Not Acknowledged */
 #define	AT91_TWI_LOCK		BIT(23) /* TWI Lock due to Frame Errors */
 
-#define	AT91_TWI_INT_MASK \
-	(AT91_TWI_TXCOMP | AT91_TWI_RXRDY | AT91_TWI_TXRDY | AT91_TWI_NACK)
-
 #define	AT91_TWI_IER		0x0024	/* Interrupt Enable Register */
 #define	AT91_TWI_IDR		0x0028	/* Interrupt Disable Register */
 #define	AT91_TWI_IMR		0x002c	/* Interrupt Mask Register */
@@ -155,12 +152,13 @@ static void at91_twi_write(struct at91_twi_dev *dev, unsigned reg, unsigned val)
 
 static void at91_disable_twi_interrupts(struct at91_twi_dev *dev)
 {
-	at91_twi_write(dev, AT91_TWI_IDR, AT91_TWI_INT_MASK);
+	at91_twi_write(dev, AT91_TWI_IDR,
+		   AT91_TWI_TXCOMP | AT91_TWI_RXRDY | AT91_TWI_TXRDY);
 }
 
 static void at91_twi_irq_save(struct at91_twi_dev *dev)
 {
-	dev->imr = at91_twi_read(dev, AT91_TWI_IMR) & AT91_TWI_INT_MASK;
+	dev->imr = at91_twi_read(dev, AT91_TWI_IMR) & 0x7;
 	at91_disable_twi_interrupts(dev);
 }
 
@@ -255,16 +253,7 @@ static void at91_twi_write_data_dma_callback(void *data)
 	dma_unmap_single(dev->dev, sg_dma_address(>dma.sg[0]),
 			 dev->buf_len, DMA_TO_DEVICE);
 
-	/*
-	 * When this callback is called, THR/TX FIFO is likely not to be empty
-	 * yet. So we have to wait for TXCOMP or NACK bits to be set into the
-	 * Status Register to be sure that the STOP bit has been sent and the
-	 * transfer is completed. The NACK interrupt has already been enabled,
-	 * we just have to enable TXCOMP one.
-	 */
-	at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_TXCOMP);
-	if (!dev->pdata->has_alt_cmd)
-		at91_twi_write(dev, AT91_TWI_CR, AT91_TWI_STOP);
+	at91_twi_write(dev, AT91_TWI_CR, AT91_TWI_STOP);
 }
 
 static void 

Re: Regression: at24 eeprom writing

2015-10-05 Thread Ludovic Desroches
Hi Peter

On Mon, Oct 05, 2015 at 10:45:29AM +0200, Peter Rosin wrote:
> On 2015-10-03 01:05, Peter Rosin wrote:

[...]

> Ok, I found the culprit, and I double and triple checked it this time...
> 
> If I move to the very latest on the linux-3.18-at91 branch, the bug is
> there too. Which made it vastly more palatable to bisect the bug.
> 
> The offender (in the 4.2 kernel) is 93563a6a71bb69dd324fc7354c60fb05f84aae6b
> "i2c: at91: fix a race condition when using the DMA controller"
> which is far more understandable. Ao, adding Cyrille Pitchen to the Cc list.
> 

Thanks for the bisecting effort. I am currently at ELCE where I have
met someone with the same kind of issue. Is it easily reproducible? It
doesn't seem to be the case for him.

I'll have a look once back.

> If I add that patch on top of my previously working tree, it behaves just
> as newer kernels, i.e. equally bad. The patch doesn't revert cleanly, but
> reverting the patch and quick-n-dirty-fixing the conflict on vanilla 4.2
> makes the problem go away.
> 
> I have attached what I actually reverted.
> 
> Cheers,
> Peter
> 

Ludovic
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: at24 eeprom writing

2015-10-05 Thread Peter Rosin
On 2015-10-05 17:00, Ludovic Desroches wrote:
> Hi Peter
> 
> On Mon, Oct 05, 2015 at 10:45:29AM +0200, Peter Rosin wrote:
>> On 2015-10-03 01:05, Peter Rosin wrote:
> 
> [...]
> 
>> Ok, I found the culprit, and I double and triple checked it this time...
>>
>> If I move to the very latest on the linux-3.18-at91 branch, the bug is
>> there too. Which made it vastly more palatable to bisect the bug.
>>
>> The offender (in the 4.2 kernel) is 93563a6a71bb69dd324fc7354c60fb05f84aae6b
>> "i2c: at91: fix a race condition when using the DMA controller"
>> which is far more understandable. Ao, adding Cyrille Pitchen to the Cc list.
>>
> 
> Thanks for the bisecting effort. I am currently at ELCE where I have
> met someone with the same kind of issue. Is it easily reproducible? It
> doesn't seem to be the case for him.

Yes, easy as pie, happens on every eeprom write of 256 bytes so far...

> I'll have a look once back.

Ok good, to further help understanding, I'm seeing this on the i2c bus (I
hope you understand the notation, or just ask):

Working (4.2 + the revert from my previous message)
===

S W50 0x00 "product = 1-776-" P   S W50 NACK P   S W50 NACK P
delay 15.2 ms
S W50 0x10 "3.0\n" P
delay 19.5 ms
S W50 0x10 "3.0\n" P   S W50 NACK P   S W50 NACK P   S W50 NACK P
delay 19.0 ms
S W50 0x14 "serial = 380" P
delay 18.8 ms
S W50 0x14 "serial = 380" P   S W50 NACK P   S W50 NACK P
delay 18.4 ms
S W50 0x20 "02\n" P
delay 19.2 ms
S W50 0x20 "02\n" P
delay 10.8 ms
S W50 0x27 " " P   S W50 NACK P (repeated 5 times)
delay 16.7 ms
S W50 0x30 "" P
delay 18.4 ms
S W50 0x30 "" P   S W50 NACK P (repeated 3 times)
delay 17.9 ms
S W50 0x40 "" P

etc

I.e. every write (but the first) seems to fail the first time and is
then retried, even if the i2c bus shows no failure indication (at
least that I can find).


Not working (vanilla 4.2)
=

S W50 0x00 "product = 1-776-" P   S W50 NACK P   S W50 NACK P
delay 17.3 ms
S W50 0x10 ACK...
delay 19.8 with both SDA and SCL low
...ACK 0x10 "3.0\n" P   S W50 NACK P   S W50 NACK P
delay 19.3 ms
S W50 0x14 "serial = 380" P   S W50 NACK P   S W50 NACK P
delay 18.5 ms
S W50 0x20 ACK...
delay 19.9 with both SDA and SCL low
...ACK 0x20 "02\n" P   S W50 NACK P   S W50 NACK P
delay 18.9 ms
S W50 0x27 " " P   S W50 NACK P   S W50 NACK P
delay 19.2 ms
S W50 0x30 "" P   S W50 NACK P   S W50 NACK P
delay 17.6 ms
S W50 0x40 "" P   S W50 NACK P   S W50 NACK P

etc

I.e. when there is a disturbance (the long ACks) the recovery
mechanism appears to attempt to heal it by resending only the
failing byte, but the eeprom appears to not see the failure and
takes both bytes instead of just the resend.



It seems dangerous to attempt to fix apparent trouble with an
i2c command by anything less than a full retry, like the working
version appears to do. No?

But what trouble does the i2c bus driver see? Admittedly I only
have a simple logic level bus viewer, and not a full-blown
oscilloscope, so there might be something analogue going on?
I don't think so though, those signals looked fine last time we
looked (but we obviously didn't have these issues then, and
didn't really look that closely). I'll see if I can recheck
with a real scope too.

Cheers,
Peter
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: at24 eeprom writing

2015-10-05 Thread Peter Rosin
On 2015-10-05 17:28, Cyrille Pitchen wrote:
> Le 05/10/2015 10:45, Peter Rosin a écrit :
>> On 2015-10-03 01:05, Peter Rosin wrote:
>>> Hi!
>>>
>>> I recently upgraded from the atmel linux-3.18-at91 kernel to vanilla 4.2
>>> and everything seemed fine. Until I tried to write to the little eeprom
>>> chip. I then tried the linux-4.1-at91 kernel and that suffers too.
>>>
>>> The symptoms are that it seems like writes get interrupted, and restarted
>>> again without properly initializing everything again. Inspecting the i2c
>>> bus during these fails gets me something like this (int hex) when I
>>>
>>> echo abcdefghijklmnopqr > /sys/bus/i2c/devices/0-0050/eeprom
>>>
>>> S a0 00 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 P
>>> S a0 10 (clk and data low for a "long" time) 10 71 72 0a P
>>>
>>> Notice how the address byte in the second chunk (10) is repeated after
>>> the strange event on the i2c bus.
>>>
>>> I looked around and found that if I revert 
>>> a839ce663b3183209fdf7b1fc4796bfe2a4679c3
>>> "eeprom: at24: extend driver to allow writing via i2c_smbus_write_byte_data"
>>> eeprom writing starts working again.
>>>
>>> AFAICT, the i2c-at91 bus driver makes the eeprom driver use the
>>> i2c_transfer code path both with that patch and with it reverted,
>>> so I sadly don't see why the patch makes a difference.
>>>
>>> I'm on a board that is based on the sama5d31 evaluation kit, with a
>>> NXP SE97BTP,547 chip and this in the devicetree:
>>>
>>> i2c0: i2c@f0014000 {
>>> status = "okay";
>>>
>>> jc42@18 {
>>> compatible = "jc42";
>>> reg = <0x18>;
>>> };
>>>
>>> eeprom@50 {
>>> compatible = "24c02";
>>> reg = <0x50>;
>>> pagesize = <16>;
>>> };
>>> };
>>
>> Ok, I found the culprit, and I double and triple checked it this time...
>>
>> If I move to the very latest on the linux-3.18-at91 branch, the bug is
>> there too. Which made it vastly more palatable to bisect the bug.
>>
>> The offender (in the 4.2 kernel) is 93563a6a71bb69dd324fc7354c60fb05f84aae6b
>> "i2c: at91: fix a race condition when using the DMA controller"
>> which is far more understandable. Ao, adding Cyrille Pitchen to the Cc list.
>>
>> If I add that patch on top of my previously working tree, it behaves just
>> as newer kernels, i.e. equally bad. The patch doesn't revert cleanly, but
>> reverting the patch and quick-n-dirty-fixing the conflict on vanilla 4.2
>> makes the problem go away.
>>
>> I have attached what I actually reverted.
>>
>> Cheers,
>> Peter
>>
> 
> Hi Peter,
> 
> Can you tell me whether your device tree sets the I2C controller i2c0 to use
> dma channels, especially the "tx" one. I guess so but it is just to confirm 
> hence we look in the right direction.

I think yes, I'm including sama5d3.dtsi and am not overriding anything 
interesting
in that area.
 
> Then I think we should look at this part of the original patch:
> 
> } else {
> if (dev->use_dma && (dev->buf_len > AT91_I2C_DMA_THRESHOLD)) {
> +   at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_NACK);
> at91_twi_write_data_dma(dev);
> -   at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_TXCOMP);
> } else {
> at91_twi_write_next_byte(dev);
> at91_twi_write(dev, AT91_TWI_IER,
> 
> Here, for DMA TX transfers, we enable the NACK interrupt instead of the TXCOMP
> one. This is the actual fix of the DMA race. Indeed there were two issues when
> using TXCOMP to detect NACK conditions.
> 
> As written in the datasheet and confirmed by the IP designer, the TXCOMP bit 
> is
> set in the Status Register when both the Transmit Holding Register (THR) and
> its internal shifter are empty and the STOP condition has been sent.
> So when a first transfer successfully completes, the TXCOMP bit is set. Then
> this bit remains set until the next write into THR.
> 
> The first issue is the race condition:
> at91_twi_write_data_dma(dev);
> at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_TXCOMP);
> 
> The first line prepares a DMA transfer but when we execute the second line to
> enable the TXCOMP interrupt, we have no mean to know whether the DMA has
> already performed a first write access into THR, which also clears the TXCOMP
> bit in the Status Register. If the DMA controller hasn't completed this first
> write yet, the TXCOMP bit is still set in the Status Register. Hence the
> interrupt handler is executed immediately after the TXCOMP interrupt has been
> enabled. If the interrupt handler reads the Status Register before the DMA
> controller has written into the THR, the TXCOMP bit is still set. 

Re: Regression: at24 eeprom writing

2015-10-05 Thread Cyrille Pitchen
Le 05/10/2015 10:45, Peter Rosin a écrit :
> On 2015-10-03 01:05, Peter Rosin wrote:
>> Hi!
>>
>> I recently upgraded from the atmel linux-3.18-at91 kernel to vanilla 4.2
>> and everything seemed fine. Until I tried to write to the little eeprom
>> chip. I then tried the linux-4.1-at91 kernel and that suffers too.
>>
>> The symptoms are that it seems like writes get interrupted, and restarted
>> again without properly initializing everything again. Inspecting the i2c
>> bus during these fails gets me something like this (int hex) when I
>>
>> echo abcdefghijklmnopqr > /sys/bus/i2c/devices/0-0050/eeprom
>>
>> S a0 00 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 P
>> S a0 10 (clk and data low for a "long" time) 10 71 72 0a P
>>
>> Notice how the address byte in the second chunk (10) is repeated after
>> the strange event on the i2c bus.
>>
>> I looked around and found that if I revert 
>> a839ce663b3183209fdf7b1fc4796bfe2a4679c3
>> "eeprom: at24: extend driver to allow writing via i2c_smbus_write_byte_data"
>> eeprom writing starts working again.
>>
>> AFAICT, the i2c-at91 bus driver makes the eeprom driver use the
>> i2c_transfer code path both with that patch and with it reverted,
>> so I sadly don't see why the patch makes a difference.
>>
>> I'm on a board that is based on the sama5d31 evaluation kit, with a
>> NXP SE97BTP,547 chip and this in the devicetree:
>>
>>  i2c0: i2c@f0014000 {
>>  status = "okay";
>>
>>  jc42@18 {
>>  compatible = "jc42";
>>  reg = <0x18>;
>>  };
>>
>>  eeprom@50 {
>>  compatible = "24c02";
>>  reg = <0x50>;
>>  pagesize = <16>;
>>  };
>>  };
> 
> Ok, I found the culprit, and I double and triple checked it this time...
> 
> If I move to the very latest on the linux-3.18-at91 branch, the bug is
> there too. Which made it vastly more palatable to bisect the bug.
> 
> The offender (in the 4.2 kernel) is 93563a6a71bb69dd324fc7354c60fb05f84aae6b
> "i2c: at91: fix a race condition when using the DMA controller"
> which is far more understandable. Ao, adding Cyrille Pitchen to the Cc list.
> 
> If I add that patch on top of my previously working tree, it behaves just
> as newer kernels, i.e. equally bad. The patch doesn't revert cleanly, but
> reverting the patch and quick-n-dirty-fixing the conflict on vanilla 4.2
> makes the problem go away.
> 
> I have attached what I actually reverted.
> 
> Cheers,
> Peter
> 

Hi Peter,

Can you tell me whether your device tree sets the I2C controller i2c0 to use
dma channels, especially the "tx" one. I guess so but it is just to confirm 
hence we look in the right direction.

Then I think we should look at this part of the original patch:

} else {
if (dev->use_dma && (dev->buf_len > AT91_I2C_DMA_THRESHOLD)) {
+   at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_NACK);
at91_twi_write_data_dma(dev);
-   at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_TXCOMP);
} else {
at91_twi_write_next_byte(dev);
at91_twi_write(dev, AT91_TWI_IER,

Here, for DMA TX transfers, we enable the NACK interrupt instead of the TXCOMP
one. This is the actual fix of the DMA race. Indeed there were two issues when
using TXCOMP to detect NACK conditions.

As written in the datasheet and confirmed by the IP designer, the TXCOMP bit is
set in the Status Register when both the Transmit Holding Register (THR) and
its internal shifter are empty and the STOP condition has been sent.
So when a first transfer successfully completes, the TXCOMP bit is set. Then
this bit remains set until the next write into THR.

The first issue is the race condition:
at91_twi_write_data_dma(dev);
at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_TXCOMP);

The first line prepares a DMA transfer but when we execute the second line to
enable the TXCOMP interrupt, we have no mean to know whether the DMA has
already performed a first write access into THR, which also clears the TXCOMP
bit in the Status Register. If the DMA controller hasn't completed this first
write yet, the TXCOMP bit is still set in the Status Register. Hence the
interrupt handler is executed immediately after the TXCOMP interrupt has been
enabled. If the interrupt handler reads the Status Register before the DMA
controller has written into the THR, the TXCOMP bit is still set. Consequently,
the interrupt handler calls complete(>cmd_complete) thinking the transfer
has completed though it actually has not even started.


The second issue is about the detection of NACK condition when using the DMA
controller. Before the patch, the driver relied on 

Re: Regression: at24 eeprom writing

2015-10-04 Thread Peter Rosin
On 2015-10-03 01:05, Peter Rosin wrote:
> I looked around and found that if I revert 
> a839ce663b3183209fdf7b1fc4796bfe2a4679c3
> "eeprom: at24: extend driver to allow writing via i2c_smbus_write_byte_data"
> eeprom writing starts working again.
> 
> AFAICT, the i2c-at91 bus driver makes the eeprom driver use the
> i2c_transfer code path both with that patch and with it reverted,
> so I sadly don't see why the patch makes a difference.

And now when I retry the same thing, that patch is no longer affecting things.
I must have confused myself over what kernel was actually running. Christian,
please accept my deepest apologies for implicating you in this regression.

But the regression is still there. In short, linux-3.18-at91 from the
linux4sam tree works, linux-4.1-at91 from the same tree does not, and
vanilla 4.2 also doesn't work. I have a hard time bisecting this thing
though, since the last known good version has a long list of atmel
patches that I refuse to even try to rebase...

Ideas still welcome of course.

Cheers,
Peter
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: at24 eeprom writing

2015-10-04 Thread Peter Rosin
On 2015-10-03 01:05, Peter Rosin wrote:
> I looked around and found that if I revert 
> a839ce663b3183209fdf7b1fc4796bfe2a4679c3
> "eeprom: at24: extend driver to allow writing via i2c_smbus_write_byte_data"
> eeprom writing starts working again.
> 
> AFAICT, the i2c-at91 bus driver makes the eeprom driver use the
> i2c_transfer code path both with that patch and with it reverted,
> so I sadly don't see why the patch makes a difference.

And now when I retry the same thing, that patch is no longer affecting things.
I must have confused myself over what kernel was actually running. Christian,
please accept my deepest apologies for implicating you in this regression.

But the regression is still there. In short, linux-3.18-at91 from the
linux4sam tree works, linux-4.1-at91 from the same tree does not, and
vanilla 4.2 also doesn't work. I have a hard time bisecting this thing
though, since the last known good version has a long list of atmel
patches that I refuse to even try to rebase...

Ideas still welcome of course.

Cheers,
Peter
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/