Re: [PATCH u-boot-marvell] PLEASE TEST ddr: marvell: a38x: fix SPLIT_OUT_MIX state decision
On Thu, Jan 6, 2022 at 11:51 AM Chris Packham wrote: > > On Wed, Jan 5, 2022 at 10:10 PM Marek Behún wrote: > > > > On Wed, 5 Jan 2022 11:10:51 +1300 > > Chris Packham wrote: > > > > > Hi Marek, > > > > > > On Wed, Jan 5, 2022 at 9:28 AM Marek Behún wrote: > > > > > > > > From: Marek Behún > > > > > > > > Hello, > > > > > > > > continuing my last discussion with Chris [1] about this, could you > > > > please test this change? (For Chris, mainly on your x530, since last > > > > time you said it hanged your board in SPL.) > > > > > > I still get a hang after "Returning to BootROM (return address > > > 0x05c4)... " (after the DDR training). > > > > Hi Chris, > > > > would it be possible for me to connect to a computer to which the x530 > > is connected via UART so that I can debug it? Could you prepare such an > > environment? It would also need a mechanism to reset the board remotely > > somehow. I prepared a similar remote debugging environment for Marvell > > when they were solving some issues for us. > > > > If not, then I guess we will just have to send debugging code and > > results back and forth. > > > > Marek > > It might be possible. We can probably set something up to be internet > accessible and have remotely controllable PDUs. Unfortunately most of > the people who can set this up are still on holiday and by the time we > get that sorted I'll be on leave again. > > In terms of the actual problem the plot thickens. The board I had been > testing with was a production unit with MV88F6820-B0 at 1600 MHz (DDR > 800MHz). To debug the hang I wanted to attach a JTAG debugger so I > found an old prototype that had the JTAG header fitted, but the > problem didn't occur. The older prototype has MV88F6820-A0 at 1866 MHz > (DDR 933MHz) (as well as the usual collection of re-work wires that > prototypes often accumulate). I don't know if the A0 vs B0 differences > are significant or if the clock speed has some impact. > > I'll get the JTAG header fitted to my production unit and see where I get to. When I got the JTAG debugger attached to my production board I could see that with the "split change" after training was complete (just before the jump to u-boot proper) some of the DDR was inaccessible (random 4byte words throughout the memory). The lockup was a data abort trying to run the code out of RAM. Without the change the memory was "good". I think we might be running over some old issues that we've experienced with our internal u-boot fork. We first started to see these as we ramped production and started seeing DDR training failures on some boards (but not others using the same PCB and DDR). With the increased scrutiny on the DDR we did end up making some layout changes on the newer boards and have tried to work around the issues in SW for the existing boards. I actually thought the recent changes to the mvebu ddr code may have addressed some of these issues as the board I had on my desk had been working happily with upstream u-boot despite having a suspect board layout. The workaround for the older boards that we've been using in our u-boot fork is to disable ECC. This appears to do the trick with Marek's patch so I'm tempted to say let's just go with that (I'll post a patch for that shortly).
Re: [PATCH u-boot-marvell] PLEASE TEST ddr: marvell: a38x: fix SPLIT_OUT_MIX state decision
On Wed, Jan 5, 2022 at 10:10 PM Marek Behún wrote: > > On Wed, 5 Jan 2022 11:10:51 +1300 > Chris Packham wrote: > > > Hi Marek, > > > > On Wed, Jan 5, 2022 at 9:28 AM Marek Behún wrote: > > > > > > From: Marek Behún > > > > > > Hello, > > > > > > continuing my last discussion with Chris [1] about this, could you > > > please test this change? (For Chris, mainly on your x530, since last > > > time you said it hanged your board in SPL.) > > > > I still get a hang after "Returning to BootROM (return address > > 0x05c4)... " (after the DDR training). > > Hi Chris, > > would it be possible for me to connect to a computer to which the x530 > is connected via UART so that I can debug it? Could you prepare such an > environment? It would also need a mechanism to reset the board remotely > somehow. I prepared a similar remote debugging environment for Marvell > when they were solving some issues for us. > > If not, then I guess we will just have to send debugging code and > results back and forth. > > Marek It might be possible. We can probably set something up to be internet accessible and have remotely controllable PDUs. Unfortunately most of the people who can set this up are still on holiday and by the time we get that sorted I'll be on leave again. In terms of the actual problem the plot thickens. The board I had been testing with was a production unit with MV88F6820-B0 at 1600 MHz (DDR 800MHz). To debug the hang I wanted to attach a JTAG debugger so I found an old prototype that had the JTAG header fitted, but the problem didn't occur. The older prototype has MV88F6820-A0 at 1866 MHz (DDR 933MHz) (as well as the usual collection of re-work wires that prototypes often accumulate). I don't know if the A0 vs B0 differences are significant or if the clock speed has some impact. I'll get the JTAG header fitted to my production unit and see where I get to.
Re: [PATCH u-boot-marvell] PLEASE TEST ddr: marvell: a38x: fix SPLIT_OUT_MIX state decision
On Wed, 5 Jan 2022 11:10:51 +1300 Chris Packham wrote: > Hi Marek, > > On Wed, Jan 5, 2022 at 9:28 AM Marek Behún wrote: > > > > From: Marek Behún > > > > Hello, > > > > continuing my last discussion with Chris [1] about this, could you > > please test this change? (For Chris, mainly on your x530, since last > > time you said it hanged your board in SPL.) > > I still get a hang after "Returning to BootROM (return address > 0x05c4)... " (after the DDR training). Hi Chris, would it be possible for me to connect to a computer to which the x530 is connected via UART so that I can debug it? Could you prepare such an environment? It would also need a mechanism to reset the board remotely somehow. I prepared a similar remote debugging environment for Marvell when they were solving some issues for us. If not, then I guess we will just have to send debugging code and results back and forth. Marek
Re: [PATCH u-boot-marvell] PLEASE TEST ddr: marvell: a38x: fix SPLIT_OUT_MIX state decision
Hi Marek, On Wed, Jan 5, 2022 at 9:28 AM Marek Behún wrote: > > From: Marek Behún > > Hello, > > continuing my last discussion with Chris [1] about this, could you > please test this change? (For Chris, mainly on your x530, since last > time you said it hanged your board in SPL.) I still get a hang after "Returning to BootROM (return address 0x05c4)... " (after the DDR training). > > It should fix DDR3 training issues. > > [1] https://lore.kernel.org/u-boot/20210208191225.14645-1-marek.be...@nic.cz/ > > Signed-off-by: Marek Behún > --- > .../a38x/ddr3_training_centralization.c | 27 +++ > 1 file changed, 27 insertions(+) > > diff --git a/drivers/ddr/marvell/a38x/ddr3_training_centralization.c > b/drivers/ddr/marvell/a38x/ddr3_training_centralization.c > index 648b37ef6f..ed799757b9 100644 > --- a/drivers/ddr/marvell/a38x/ddr3_training_centralization.c > +++ b/drivers/ddr/marvell/a38x/ddr3_training_centralization.c > @@ -55,6 +55,7 @@ static int ddr3_tip_centralization(u32 dev_num, u32 mode) > enum hws_training_ip_stat training_result[MAX_INTERFACE_NUM]; > u32 if_id, pattern_id, bit_id; > u8 bus_id; > + u8 current_byte_status; > u8 cur_start_win[BUS_WIDTH_IN_BITS]; > u8 centralization_result[MAX_INTERFACE_NUM][BUS_WIDTH_IN_BITS]; > u8 cur_end_win[BUS_WIDTH_IN_BITS]; > @@ -166,6 +167,10 @@ static int ddr3_tip_centralization(u32 dev_num, u32 mode) > result[search_dir_id][7])); > } > > + current_byte_status = > + > mv_ddr_tip_sub_phy_byte_status_get(if_id, > + > bus_id); > + > for (bit_id = 0; bit_id < BUS_WIDTH_IN_BITS; > bit_id++) { > /* check if this code is valid for 2 > edge, probably not :( */ > @@ -174,11 +179,33 @@ static int ddr3_tip_centralization(u32 dev_num, u32 > mode) >[HWS_LOW2HIGH] >[bit_id], >EDGE_1); > + if (current_byte_status & > + (BYTE_SPLIT_OUT_MIX | > +BYTE_HOMOGENEOUS_SPLIT_OUT)) { > + if (cur_start_win[bit_id] >= > 64) > + cur_start_win[bit_id] > -= 64; > + else > + cur_start_win[bit_id] > = 0; > + DEBUG_CENTRALIZATION_ENGINE > + (DEBUG_LEVEL_INFO, > +("pattern %d IF %d > pup %d bit %d subtract 64 adll from start\n", > + pattern_id, if_id, > bus_id, bit_id)); > + } > cur_end_win[bit_id] = > GET_TAP_RESULT(result >[HWS_HIGH2LOW] >[bit_id], >EDGE_1); > + if (cur_end_win[bit_id] >= 64 && > + (current_byte_status & > +BYTE_SPLIT_OUT_MIX)) { > + cur_end_win[bit_id] -= 64; > + DEBUG_CENTRALIZATION_ENGINE > + (DEBUG_LEVEL_INFO, > +("pattern %d IF %d > pup %d bit %d subtract 64 adll from end\n", > + pattern_id, if_id, > bus_id, bit_id)); > + } > + > /* window length */ > current_window[bit_id] = > cur_end_win[bit_id] - > -- > 2.34.1 >
[PATCH u-boot-marvell] PLEASE TEST ddr: marvell: a38x: fix SPLIT_OUT_MIX state decision
From: Marek Behún Hello, continuing my last discussion with Chris [1] about this, could you please test this change? (For Chris, mainly on your x530, since last time you said it hanged your board in SPL.) It should fix DDR3 training issues. [1] https://lore.kernel.org/u-boot/20210208191225.14645-1-marek.be...@nic.cz/ Signed-off-by: Marek Behún --- .../a38x/ddr3_training_centralization.c | 27 +++ 1 file changed, 27 insertions(+) diff --git a/drivers/ddr/marvell/a38x/ddr3_training_centralization.c b/drivers/ddr/marvell/a38x/ddr3_training_centralization.c index 648b37ef6f..ed799757b9 100644 --- a/drivers/ddr/marvell/a38x/ddr3_training_centralization.c +++ b/drivers/ddr/marvell/a38x/ddr3_training_centralization.c @@ -55,6 +55,7 @@ static int ddr3_tip_centralization(u32 dev_num, u32 mode) enum hws_training_ip_stat training_result[MAX_INTERFACE_NUM]; u32 if_id, pattern_id, bit_id; u8 bus_id; + u8 current_byte_status; u8 cur_start_win[BUS_WIDTH_IN_BITS]; u8 centralization_result[MAX_INTERFACE_NUM][BUS_WIDTH_IN_BITS]; u8 cur_end_win[BUS_WIDTH_IN_BITS]; @@ -166,6 +167,10 @@ static int ddr3_tip_centralization(u32 dev_num, u32 mode) result[search_dir_id][7])); } + current_byte_status = + mv_ddr_tip_sub_phy_byte_status_get(if_id, + bus_id); + for (bit_id = 0; bit_id < BUS_WIDTH_IN_BITS; bit_id++) { /* check if this code is valid for 2 edge, probably not :( */ @@ -174,11 +179,33 @@ static int ddr3_tip_centralization(u32 dev_num, u32 mode) [HWS_LOW2HIGH] [bit_id], EDGE_1); + if (current_byte_status & + (BYTE_SPLIT_OUT_MIX | +BYTE_HOMOGENEOUS_SPLIT_OUT)) { + if (cur_start_win[bit_id] >= 64) + cur_start_win[bit_id] -= 64; + else + cur_start_win[bit_id] = 0; + DEBUG_CENTRALIZATION_ENGINE + (DEBUG_LEVEL_INFO, +("pattern %d IF %d pup %d bit %d subtract 64 adll from start\n", + pattern_id, if_id, bus_id, bit_id)); + } cur_end_win[bit_id] = GET_TAP_RESULT(result [HWS_HIGH2LOW] [bit_id], EDGE_1); + if (cur_end_win[bit_id] >= 64 && + (current_byte_status & +BYTE_SPLIT_OUT_MIX)) { + cur_end_win[bit_id] -= 64; + DEBUG_CENTRALIZATION_ENGINE + (DEBUG_LEVEL_INFO, +("pattern %d IF %d pup %d bit %d subtract 64 adll from end\n", + pattern_id, if_id, bus_id, bit_id)); + } + /* window length */ current_window[bit_id] = cur_end_win[bit_id] - -- 2.34.1
Re: [PATCH u-boot-marvell] PLEASE TEST ddr: marvell: a38x: fix SPLIT_OUT_MIX state decision
On Mon, Jun 28, 2021 at 9:21 PM Chris Packham wrote: > > On Mon, Jun 28, 2021 at 7:20 PM Pali Rohár wrote: > > > > On Monday 08 February 2021 21:24:09 Marek Behun wrote: > > > On Mon, 8 Feb 2021 20:14:26 + > > > Chris Packham wrote: > > > > > > > On 9/02/21 8:15 am, Marek Behun wrote: > > > > > This patch is needed on some Turris Omnia boards with Samsung DDR > > > > > chips, > > > > > otherwise DDR training fails in ~60% of cases. > > > > > > > > > > Marvell send us this patch for testing, I have updated it a little. > > > > > > > > > > Please test this on other A38x boards. > > > > > > > > > > If it doesn't break anything on other boards, we can apply it and send > > > > > it to mv-ddr-marvell upstream. > > > > They gave us the same patch. I was hoping their SoC team would get it > > > > into mv-ddr-marvell (or even their vendor USP) but obviously they're > > > > still sitting on it. I know it improved matters for some of our boards > > > > but it didn't totally fix them we still had yield problems when we > > > > ramped up production. > > > > > > There is a bug in the version they sent us: > > > > > > in file ddr3_training_ip_engine.c there is a line added: > > >if ((bit_bit_mask[sybphy_id] & (1 << bit_id)) == 1) > > > > > > try chaning it to > > >if (bit_bit_mask[sybphy_id] & (1 << bit_id)) > > > > > > This is fixed in the version I sent to mailing list > > > > Hello Chris! Have you tried above modification if it fixes your issue? > > I haven't tried Marek's patch specifically. What we saw seemed to be a > yield issue rather than something that could be tested on just one > board. I must have missed Marek's response to my comment. Given that > he's identified at least one problem with Marvell's SPLIT_OUT_MIX > patch it's probably worth me trying it out on an x530 board. If that's > OK then I think U-Boot can take the patch (or even better > mv-ddr-marvell apply it and U-Boot syncs). With http://patchwork.ozlabs.org/project/uboot/patch/20210208191225.14645-1-marek.be...@nic.cz/mbox/ applied on top of v2021.07-rc4 the x530 hangs after the SPL. I haven't had time to look into why.
Re: [PATCH u-boot-marvell] PLEASE TEST ddr: marvell: a38x: fix SPLIT_OUT_MIX state decision
On Mon, Jun 28, 2021 at 7:20 PM Pali Rohár wrote: > > On Monday 08 February 2021 21:24:09 Marek Behun wrote: > > On Mon, 8 Feb 2021 20:14:26 + > > Chris Packham wrote: > > > > > On 9/02/21 8:15 am, Marek Behun wrote: > > > > This patch is needed on some Turris Omnia boards with Samsung DDR chips, > > > > otherwise DDR training fails in ~60% of cases. > > > > > > > > Marvell send us this patch for testing, I have updated it a little. > > > > > > > > Please test this on other A38x boards. > > > > > > > > If it doesn't break anything on other boards, we can apply it and send > > > > it to mv-ddr-marvell upstream. > > > They gave us the same patch. I was hoping their SoC team would get it > > > into mv-ddr-marvell (or even their vendor USP) but obviously they're > > > still sitting on it. I know it improved matters for some of our boards > > > but it didn't totally fix them we still had yield problems when we > > > ramped up production. > > > > There is a bug in the version they sent us: > > > > in file ddr3_training_ip_engine.c there is a line added: > >if ((bit_bit_mask[sybphy_id] & (1 << bit_id)) == 1) > > > > try chaning it to > >if (bit_bit_mask[sybphy_id] & (1 << bit_id)) > > > > This is fixed in the version I sent to mailing list > > Hello Chris! Have you tried above modification if it fixes your issue? I haven't tried Marek's patch specifically. What we saw seemed to be a yield issue rather than something that could be tested on just one board. I must have missed Marek's response to my comment. Given that he's identified at least one problem with Marvell's SPLIT_OUT_MIX patch it's probably worth me trying it out on an x530 board. If that's OK then I think U-Boot can take the patch (or even better mv-ddr-marvell apply it and U-Boot syncs).
Re: [PATCH u-boot-marvell] PLEASE TEST ddr: marvell: a38x: fix SPLIT_OUT_MIX state decision
On Monday 08 February 2021 21:24:09 Marek Behun wrote: > On Mon, 8 Feb 2021 20:14:26 + > Chris Packham wrote: > > > On 9/02/21 8:15 am, Marek Behun wrote: > > > This patch is needed on some Turris Omnia boards with Samsung DDR chips, > > > otherwise DDR training fails in ~60% of cases. > > > > > > Marvell send us this patch for testing, I have updated it a little. > > > > > > Please test this on other A38x boards. > > > > > > If it doesn't break anything on other boards, we can apply it and send > > > it to mv-ddr-marvell upstream. > > They gave us the same patch. I was hoping their SoC team would get it > > into mv-ddr-marvell (or even their vendor USP) but obviously they're > > still sitting on it. I know it improved matters for some of our boards > > but it didn't totally fix them we still had yield problems when we > > ramped up production. > > There is a bug in the version they sent us: > > in file ddr3_training_ip_engine.c there is a line added: >if ((bit_bit_mask[sybphy_id] & (1 << bit_id)) == 1) > > try chaning it to >if (bit_bit_mask[sybphy_id] & (1 << bit_id)) > > This is fixed in the version I sent to mailing list Hello Chris! Have you tried above modification if it fixes your issue?
Re: [PATCH u-boot-marvell] PLEASE TEST ddr: marvell: a38x: fix SPLIT_OUT_MIX state decision
On Mon, 8 Feb 2021 20:14:26 + Chris Packham wrote: > On 9/02/21 8:15 am, Marek Behun wrote: > > This patch is needed on some Turris Omnia boards with Samsung DDR chips, > > otherwise DDR training fails in ~60% of cases. > > > > Marvell send us this patch for testing, I have updated it a little. > > > > Please test this on other A38x boards. > > > > If it doesn't break anything on other boards, we can apply it and send > > it to mv-ddr-marvell upstream. > They gave us the same patch. I was hoping their SoC team would get it > into mv-ddr-marvell (or even their vendor USP) but obviously they're > still sitting on it. I know it improved matters for some of our boards > but it didn't totally fix them we still had yield problems when we > ramped up production. There is a bug in the version they sent us: in file ddr3_training_ip_engine.c there is a line added: if ((bit_bit_mask[sybphy_id] & (1 << bit_id)) == 1) try chaning it to if (bit_bit_mask[sybphy_id] & (1 << bit_id)) This is fixed in the version I sent to mailing list
Re: [PATCH u-boot-marvell] PLEASE TEST ddr: marvell: a38x: fix SPLIT_OUT_MIX state decision
On 9/02/21 8:15 am, Marek Behun wrote: > This patch is needed on some Turris Omnia boards with Samsung DDR chips, > otherwise DDR training fails in ~60% of cases. > > Marvell send us this patch for testing, I have updated it a little. > > Please test this on other A38x boards. > > If it doesn't break anything on other boards, we can apply it and send > it to mv-ddr-marvell upstream. They gave us the same patch. I was hoping their SoC team would get it into mv-ddr-marvell (or even their vendor USP) but obviously they're still sitting on it. I know it improved matters for some of our boards but it didn't totally fix them we still had yield problems when we ramped up production.
Re: [PATCH u-boot-marvell] PLEASE TEST ddr: marvell: a38x: fix SPLIT_OUT_MIX state decision
This patch is needed on some Turris Omnia boards with Samsung DDR chips, otherwise DDR training fails in ~60% of cases. Marvell send us this patch for testing, I have updated it a little. Please test this on other A38x boards. If it doesn't break anything on other boards, we can apply it and send it to mv-ddr-marvell upstream. Marek
[PATCH u-boot-marvell] PLEASE TEST ddr: marvell: a38x: fix SPLIT_OUT_MIX state decision
From: motib In each pattern cycle the bus state can be changed. In order to avoid it, we need to switch back to the same bus state on each pattern cycle. Signed-off-by: motib Fixed code style, removed commented code, switched to use DEBUG macros instead of printf. Signed-off-by: Marek Behún --- .../a38x/ddr3_training_centralization.c | 25 +++ .../marvell/a38x/ddr3_training_ip_engine.c| 8 -- 2 files changed, 31 insertions(+), 2 deletions(-) diff --git a/drivers/ddr/marvell/a38x/ddr3_training_centralization.c b/drivers/ddr/marvell/a38x/ddr3_training_centralization.c index 648b37ef6f..a92760681e 100644 --- a/drivers/ddr/marvell/a38x/ddr3_training_centralization.c +++ b/drivers/ddr/marvell/a38x/ddr3_training_centralization.c @@ -24,6 +24,8 @@ u8 bus_start_window[NUM_OF_CENTRAL_TYPES][MAX_INTERFACE_NUM][MAX_BUS_NUM]; u8 centralization_state[MAX_INTERFACE_NUM][MAX_BUS_NUM]; static u8 ddr3_tip_special_rx_run_once_flag; +extern u8 byte_status[MAX_INTERFACE_NUM][MAX_BUS_NUM]; + static int ddr3_tip_centralization(u32 dev_num, u32 mode); /* @@ -110,6 +112,7 @@ static int ddr3_tip_centralization(u32 dev_num, u32 mode) (max_win_size - 1) + cons_tap; bus_start_window[mode][if_id][bus_id] = 0; centralization_result[if_id][bus_id] = 0; + byte_status[if_id][bus_id] = BYTE_NOT_DEFINED; } } @@ -166,6 +169,12 @@ static int ddr3_tip_centralization(u32 dev_num, u32 mode) result[search_dir_id][7])); } + DEBUG_CENTRALIZATION_ENGINE + (DEBUG_LEVEL_TRACE, +("byte_status[%d][%d] = 0x%x\n", +if_id, +bus_id, +byte_status[if_id][bus_id])); for (bit_id = 0; bit_id < BUS_WIDTH_IN_BITS; bit_id++) { /* check if this code is valid for 2 edge, probably not :( */ @@ -174,11 +183,27 @@ static int ddr3_tip_centralization(u32 dev_num, u32 mode) [HWS_LOW2HIGH] [bit_id], EDGE_1); + if ((byte_status[if_id][bus_id] & BYTE_SPLIT_OUT_MIX) || + (byte_status[if_id][bus_id] & BYTE_HOMOGENEOUS_SPLIT_OUT)) { + if (cur_start_win[bit_id] >= 64) + cur_start_win[bit_id] -= 64; + else + cur_start_win[bit_id] = 0; + DEBUG_CENTRALIZATION_ENGINE + (DEBUG_LEVEL_TRACE, + ("--\n")); + } cur_end_win[bit_id] = GET_TAP_RESULT(result [HWS_HIGH2LOW] [bit_id], EDGE_1); + if (cur_end_win[bit_id] >= 64 && (byte_status[if_id][bus_id] & BYTE_SPLIT_OUT_MIX)) { + cur_end_win[bit_id] -= 64; + DEBUG_CENTRALIZATION_ENGINE + (DEBUG_LEVEL_TRACE, + ("++\n")); + } /* window length */ current_window[bit_id] = cur_end_win[bit_id] - diff --git a/drivers/ddr/marvell/a38x/ddr3_training_ip_engine.c b/drivers/ddr/marvell/a38x/ddr3_training_ip_engine.c index 5fd9a052fa..3d1fa1e74e 100644 --- a/drivers/ddr/marvell/a38x/ddr3_training_ip_engine.c +++ b/drivers/ddr/marvell/a38x/ddr3_training_ip_engine.c @@ -1174,7 +1174,6 @@ int ddr3_tip_ip_training_wrapper(u32 dev_num, enum hws_access_type access_type, /* zero the data base */ bit_bit_mask[sybphy_id] = 0; - byte_status[if_id][sybphy_id] = BYTE_NOT_DEFINED; for (bit_id = 0; bit_id <