Hi Naga,

Boris Brezillon <boris.brezil...@bootlin.com> wrote on Tue, 20 Nov 2018 12:02:44 +0100:
> On Tue, 20 Nov 2018 07:02:08 +0000
> Naga Sureshkumar Relli <nagas...@xilinx.com> wrote:
>
> > > Can you please run nandbiterrs (available in mtd-utils)? I fear your
> > > device won't pass the test.
> > Yes, the nandbiterrs test passes up to 24 bits; after that it fails.
>
> Can you paste the output of nandbiterrs, please?

Apparently 'nandbiterrs -i' just crashes the kernel with a segmentation
fault. Please run this test (from the mtd-utils package) and fix this
issue. Then we would like to see the output.

> > > > But we are hitting this because of erased page reading (needed in
> > > > the case of UBIFS).
> > > >
> > > > > Don't you have a bit (or several bits) reporting when the ECC
> > > > > engine was not able to correct data? If you do, you should base
> > > > > the "detect bitflips in erased pages" logic on this information.
> > > > Bit reporting for several bit errors is there only for Hamming
> > > > (1-bit correction and 2-bit detection), but not for BCH.
> > >
> > > Then I tend to agree with Miquel: your ECC engine is broken, and I'm
> > > not even sure how to deal with that yet.
> > So, as per Miquel's suggestion, can I proceed to add the following?
> > "You should re-read the page in raw mode and check for the number of
> > bitflips manually (thanks to the helpers in the core). Again, if the
> > number of BF is above 16, we can assume the page is bad and increment
> > ->ecc.failed accordingly."
>
> But that's just partially fixing the problem. And you didn't answer my
> previous question: what happens when you configure the ECC engine in,
> say, 12bit/1024 and you end up with uncorrectable errors (more than 12
> bitflips in a 1k block)? What's the number reported in ECC_ERR_CNT? Is
> it set to 13?

Please dump this register, and possibly also the value of the
Packet_bound_Err_count field ([0:7]), for each iteration of
'nandbiterrs -i'.
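
For illustration only, the raw-mode fallback quoted above could look
roughly like the userspace C sketch below: count the bits that read as 0
in a raw-read chunk that is expected to be erased (all 0xFF), and report
a failure when that count exceeds the threshold (16 in the suggestion).
The names count_erased_bitflips(), check_erased_chunk() and
ERASED_BITFLIP_THRESHOLD are hypothetical. In the kernel, the raw NAND
core already provides nand_check_erased_ecc_chunk() for this job, and a
driver should use that helper rather than open-coding it.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical threshold taken from the suggestion above: more than 16
 * bitflips in a chunk means it is accounted as an ECC failure.
 */
#define ERASED_BITFLIP_THRESHOLD 16

/* Count the bits that read as 0 in a buffer that should be all 0xFF. */
static int count_erased_bitflips(const uint8_t *buf, size_t len)
{
	int flips = 0;

	for (size_t i = 0; i < len; i++) {
		/* Bits set in 'zeros' are bits that flipped from 1 to 0. */
		uint8_t zeros = (uint8_t)~buf[i];

		/* Clear the lowest set bit on each round until none are left. */
		while (zeros) {
			zeros &= (uint8_t)(zeros - 1);
			flips++;
		}
	}
	return flips;
}

/*
 * Decide what to report for one raw-read chunk (data + ECC bytes).
 * Returns the number of bitflips if the chunk can still be treated as
 * erased, or -1 if the caller should account it as an ECC failure
 * (i.e. increment ->ecc.failed).
 */
static int check_erased_chunk(const uint8_t *raw, size_t len, int threshold)
{
	int flips = count_erased_bitflips(raw, len);

	return flips > threshold ? -1 : flips;
}
```

As Boris says, this only helps with erased pages: it cannot tell whether
the data of a programmed page was actually corrected, which is why the
ECC_ERR_CNT behaviour matters.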
If there is no way, when the status bit is set, to discriminate whether
the data is reliable or was not corrected at all, it is going to be a
real issue, and I don't think we want to support such an engine.

Thanks,
Miquèl