On Tue, 12 Jun 2018 15:24:41 -0600
Jens Axboe <ax...@kernel.dk> wrote:

> On 6/12/18 2:20 PM, Stefan Agner wrote:
> > On 12.06.2018 17:24, Jens Axboe wrote:  
> >> On 6/12/18 3:17 AM, Stefan Agner wrote:  
> >>> [also added Jens Axboe]
> >>>
> >>> On 12.06.2018 10:27, Boris Brezillon wrote:  
> >>>> On Tue, 12 Jun 2018 10:06:42 +0200
> >>>> Stefan Agner <ste...@agner.ch> wrote:
> >>>>  
> >>>>> On 12.06.2018 02:03, Dmitry Osipenko wrote:  
> >>>>>> On Monday, 11 June 2018 23:52:22 MSK Stefan Agner wrote:  
> >>>>>>> Add support for the NAND flash controller found on NVIDIA
> >>>>>>> Tegra 2 SoCs. This implementation does not make use of the
> >>>>>>> command queue feature. Regular operations/data transfers are
> >>>>>>> done in PIO mode. Page read/writes with hardware ECC make
> >>>>>>> use of the DMA for data transfer.
> >>>>>>>
> >>>>>>> Signed-off-by: Lucas Stach <d...@lynxeye.de>
> >>>>>>> Signed-off-by: Stefan Agner <ste...@agner.ch>
> >>>>>>> ---
> >>>>>>>  MAINTAINERS                       |    7 +
> >>>>>>>  drivers/mtd/nand/raw/Kconfig      |    6 +
> >>>>>>>  drivers/mtd/nand/raw/Makefile     |    1 +
> >>>>>>>  drivers/mtd/nand/raw/tegra_nand.c | 1248 +++++++++++++++++++++++++++++
> >>>>>>>  4 files changed, 1262 insertions(+)
> >>>>>>>  create mode 100644 drivers/mtd/nand/raw/tegra_nand.c
> >>>>>>>  
> >>>>> [snip]  
> >>>>>>> +static int tegra_nand_cmd(struct nand_chip *chip,
> >>>>>>> +                      const struct nand_subop *subop)
> >>>>>>> +{
> >>>>>>> +     const struct nand_op_instr *instr;
> >>>>>>> +     const struct nand_op_instr *instr_data_in = NULL;
> >>>>>>> +     struct tegra_nand_controller *ctrl = to_tegra_ctrl(chip->controller);
> >>>>>>> +     unsigned int op_id, size = 0, offset = 0;
> >>>>>>> +     bool first_cmd = true;
> >>>>>>> +     u32 reg, cmd = 0;
> >>>>>>> +     int ret;
> >>>>>>> +
> >>>>>>> +     for (op_id = 0; op_id < subop->ninstrs; op_id++) {
> >>>>>>> +             unsigned int naddrs, i;
> >>>>>>> +             const u8 *addrs;
> >>>>>>> +             u32 addr1 = 0, addr2 = 0;
> >>>>>>> +
> >>>>>>> +             instr = &subop->instrs[op_id];
> >>>>>>> +
> >>>>>>> +             switch (instr->type) {
> >>>>>>> +             case NAND_OP_CMD_INSTR:
> >>>>>>> +                     if (first_cmd) {
> >>>>>>> +                             cmd |= COMMAND_CLE;
> >>>>>>> +                             writel_relaxed(instr->ctx.cmd.opcode,
> >>>>>>> +                                            ctrl->regs + CMD_REG1);
> >>>>>>> +                     } else {
> >>>>>>> +                             cmd |= COMMAND_SEC_CMD;
> >>>>>>> +                             writel_relaxed(instr->ctx.cmd.opcode,
> >>>>>>> +                                            ctrl->regs + CMD_REG2);
> >>>>>>> +                     }
> >>>>>>> +                     first_cmd = false;
> >>>>>>> +                     break;
> >>>>>>> +             case NAND_OP_ADDR_INSTR:
> >>>>>>> +                     offset = nand_subop_get_addr_start_off(subop, op_id);
> >>>>>>> +                     naddrs = nand_subop_get_num_addr_cyc(subop, op_id);
> >>>>>>> +                     addrs = &instr->ctx.addr.addrs[offset];
> >>>>>>> +
> >>>>>>> +                     cmd |= COMMAND_ALE | COMMAND_ALE_SIZE(naddrs);
> >>>>>>> +                     for (i = 0; i < min_t(unsigned int, 4, naddrs); i++)
> >>>>>>> +                             addr1 |= *addrs++ << (BITS_PER_BYTE * i);
> >>>>>>> +                     naddrs -= i;
> >>>>>>> +                     for (i = 0; i < min_t(unsigned int, 4, naddrs); i++)
> >>>>>>> +                             addr2 |= *addrs++ << (BITS_PER_BYTE * i);
> >>>>>>> +                     writel_relaxed(addr1, ctrl->regs + ADDR_REG1);
> >>>>>>> +                     writel_relaxed(addr2, ctrl->regs + ADDR_REG2);
> >>>>>>> +                     break;
> >>>>>>> +
> >>>>>>> +             case NAND_OP_DATA_IN_INSTR:
> >>>>>>> +                     size = nand_subop_get_data_len(subop, op_id);
> >>>>>>> +                     offset = nand_subop_get_data_start_off(subop, op_id);
> >>>>>>> +
> >>>>>>> +                     cmd |= COMMAND_TRANS_SIZE(size) | COMMAND_PIO |
> >>>>>>> +                             COMMAND_RX | COMMAND_A_VALID;
> >>>>>>> +
> >>>>>>> +                     instr_data_in = instr;
> >>>>>>> +                     break;
> >>>>>>> +
> >>>>>>> +             case NAND_OP_DATA_OUT_INSTR:
> >>>>>>> +                     size = nand_subop_get_data_len(subop, op_id);
> >>>>>>> +                     offset = nand_subop_get_data_start_off(subop, op_id);
> >>>>>>> +
> >>>>>>> +                     cmd |= COMMAND_TRANS_SIZE(size) | COMMAND_PIO |
> >>>>>>> +                             COMMAND_TX | COMMAND_A_VALID;
> >>>>>>> +
> >>>>>>> +                     memcpy(&reg, instr->ctx.data.buf.out + offset, size);
> >>>>>>> +                     writel_relaxed(reg, ctrl->regs + RESP);
> >>>>>>> +
> >>>>>>> +                     break;
> >>>>>>> +             case NAND_OP_WAITRDY_INSTR:
> >>>>>>> +                     cmd |= COMMAND_RBSY_CHK;
> >>>>>>> +                     break;
> >>>>>>> +
> >>>>>>> +             }
> >>>>>>> +     }
> >>>>>>> +
> >>>>>>> +     cmd |= COMMAND_GO | COMMAND_CE(ctrl->cur_cs);
> >>>>>>> +     writel_relaxed(cmd, ctrl->regs + COMMAND);
> >>>>>>> +     ret = wait_for_completion_io_timeout(&ctrl->command_complete,
> >>>>>>> +                                          msecs_to_jiffies(500));  
> >>>>>>
> >>>>>> It's not obvious to me whether the _io_ variant is appropriate to use
> >>>>>> here; it would be nice if somebody could clarify that. Maybe block/
> >>>>>> already does the IO accounting itself, and hence the IO time would be
> >>>>>> counted twice in that case.
> >>>>>
> >>>>> Good that you bring this up.
> >>>>>
> >>>>> I don't think that there is any higher layer which could take care of
> >>>>> accounting. Usually, with raw nand there is no block layer involved
> >>>>> anyway.
> >>>>>
> >>>>> In a quick test it seems that only when using wait_for_completion_io I/O
> >>>>> is properly accounted in the "wait" section of top.
> >>>>>
> >>>>> So far only a single driver (omap2) used the _io variant, but I think it
> >>>>> is the right thing to do! After all, it is I/O...
> >>>>>
> >>>>> Boris or any other MTD maintainer, any comment on this?  
> >>>>
> >>>> Given this definition of io_schedule_timeout() [1] (which is used when
> >>>> you call wait_for_completion_io_timeout()), I'd say it's not useful to
> >>>> use the _io_ version, simply because MTD devs are not exposed as blk
> >>>> devices, and thus don't need the blk_schedule_flush_plug() that is done
> >>>> in io_schedule_prepare(). But that also means MTD I/Os are not
> >>>> accounted as I/Os :-(.  
> >>>
> >>> Documentation of wait_for_completion_io says:
> >>> "The caller is accounted as waiting for IO (which traditionally means
> >>> blkio only)."
> >>>
> >>> Which sounds as if using _io is only an accounting thing...
> >>
> >> Yes, you should only use it for waiting for IO off a system call
> >> read path. So block IO, or file system IO. Don't use it for internal
> >> IO that isn't related to that.  
> > 
> > I guess that would be the case here, since MTD page read/writes are
> > typically file system IOs (e.g. UBIFS).
> > 
> > The problem is just that it is not block related at all, since it uses
> > the MTD subsystem... And it seems that the _io variants, besides
> > accounting, also play a role in the block subsystem's device plugging
> > mechanism. What is unclear to me is whether using the _io variant from
> > the MTD subsystem potentially disturbs the plugging mechanism...
> 
> No, it has nothing to do with plugging at the block level. So if you're
> doing regular user IO, then you should use the _io variants.

It's a bit more complicated than that. ->exec_op() is not only used for
read/write accesses, but also for all kinds of management operations on
the NAND chip, which can't really be considered storage device I/O (at
least that's my opinion). The one in tegra_nand_page_xfer() is probably
valid though.
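
To make the distinction concrete, here is a rough userspace sketch of the
rule of thumb above. It is not taken from the patch: the enum and both
helper functions are hypothetical names used only for illustration. The
only real kernel identifiers are the two wait helpers, which appear as
strings. The assumption is that page transfers done on behalf of a file
system are accounted as I/O wait, while management operations issued
through ->exec_op() are not.

```c
#include <stdbool.h>

/* Hypothetical classification of what a wait is for; not part of the patch. */
enum xfer_kind {
	XFER_MANAGEMENT,	/* e.g. reset, read ID, status via ->exec_op() */
	XFER_PAGE_DATA,		/* page read/write, as in tegra_nand_page_xfer() */
};

/* Only waits for page data are treated as user-visible storage I/O. */
static bool wait_counts_as_io(enum xfer_kind kind)
{
	return kind == XFER_PAGE_DATA;
}

/* Name of the kernel wait helper a driver might pick under that rule. */
static const char *wait_helper_name(enum xfer_kind kind)
{
	return wait_counts_as_io(kind) ? "wait_for_completion_io_timeout"
				       : "wait_for_completion_timeout";
}
```

Under this reading, the wait in tegra_nand_cmd() would use the plain
timeout helper and the one in tegra_nand_page_xfer() would keep the _io
variant. Whether that split is the right one is exactly what is still
being discussed in this thread.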
