On 04/07/17 11:50, Javier González wrote:
>  Documentation/lightnvm/pblk.txt  |   21 +
>  drivers/lightnvm/Kconfig         |   19 +
>  drivers/lightnvm/Makefile        |    5 +
>  drivers/lightnvm/pblk-cache.c    |  112 +++
>  drivers/lightnvm/pblk-core.c     | 1641 ++++++++++++++++++++++++++++++++++++++
>  drivers/lightnvm/pblk-gc.c       |  542 +++++++++++++
>  drivers/lightnvm/pblk-init.c     |  942 ++++++++++++++++++++++
>  drivers/lightnvm/pblk-map.c      |  135 ++++
>  drivers/lightnvm/pblk-rb.c       |  859 ++++++++++++++++++++
>  drivers/lightnvm/pblk-read.c     |  513 ++++++++++++
>  drivers/lightnvm/pblk-recovery.c | 1007 +++++++++++++++++++++++
>  drivers/lightnvm/pblk-rl.c       |  182 +++++
>  drivers/lightnvm/pblk-sysfs.c    |  500 ++++++++++++
>  drivers/lightnvm/pblk-write.c    |  408 ++++++++++
>  drivers/lightnvm/pblk.h          | 1127 ++++++++++++++++++++++++++
>  include/linux/lightnvm.h         |   57 +-
>  pblk-sysfs.c                     |  581 ++++++++++++++

This patch introduces two slightly different versions of pblk-sysfs.c - 
one at the top level and one in drivers/lightnvm. Please remove the file
at the top level.

> +config NVM_PBLK_L2P_OPT
> +     bool "PBLK with optimized L2P table for devices up to 8TB"
> +     depends on NVM_PBLK
> +     ---help---
> +     Physical device addresses are typical 64-bit integers. Since we store
> +     the logical to physical (L2P) table on the host, this takes 1/500 of
> +     host memory (e.g., 2GB per 1TB of storage media). On drives under 8TB,
> +     it is possible to reduce this to 1/1000 (e.g., 1GB per 1TB). This
> +     option allows to do this optimization on the host L2P table.

Why is NVM_PBLK_L2P_OPT a compile-time option instead of a run-time 
option? Since this define does not affect the definition of ppa_addr, 
I don't see why it has to be a compile-time option. For e.g. Linux 
distributors the only choice would be to disable NVM_PBLK_L2P_OPT. I 
think it would be unfortunate if no Linux distribution would ever be 
able to benefit from this optimization.
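
A run-time alternative could look roughly like the sketch below. The 
parameter name, the 32-bit packed entry type and the size limit constant 
are all made up here for illustration; only the idea of deciding the 
entry size at load time is the point:

```c
/* Hypothetical sketch: choose the compact L2P encoding when the module
 * is loaded instead of when the kernel is built. All names below are
 * illustrative, not part of the patch. */
static bool pblk_l2p_compact;
module_param(pblk_l2p_compact, bool, 0444);
MODULE_PARM_DESC(pblk_l2p_compact,
		 "Use 32-bit packed L2P entries on devices up to 8TB");

static size_t pblk_l2p_entry_size(struct pblk *pblk)
{
	/* Fall back to full 64-bit entries on devices that are too
	 * large for the packed encoding. */
	if (pblk_l2p_compact && pblk->rl.nr_secs <= PBLK_L2P_COMPACT_MAX_SECS)
		return sizeof(u32);
	return sizeof(struct ppa_addr);
}
```

That way a distribution kernel could ship a single binary and still let 
users of small drives opt in to the halved memory footprint.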

> +#ifdef CONFIG_NVM_DEBUG
> +     atomic_add(nr_entries, &pblk->inflight_writes);
> +     atomic_add(nr_entries, &pblk->req_writes);
> +#endif

Has it been considered to use the "static key" feature such that 
consistency checks can be enabled at run-time instead of having to 
rebuild the kernel to enable CONFIG_NVM_DEBUG?
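
To make the suggestion concrete, a static key version could look like 
the sketch below (the key name and the idea of toggling it from sysfs 
or debugfs are my own, not from the patch). When the key is disabled 
the branch is patched to a no-op, so the counters cost nothing in 
production:

```c
#include <linux/jump_label.h>

DEFINE_STATIC_KEY_FALSE(pblk_debug_enabled);

static void pblk_count_writes(struct pblk *pblk, int nr_entries)
{
	/* Compiled in unconditionally; enabled at run time via e.g.
	 * static_branch_enable(&pblk_debug_enabled) from a debugfs
	 * attribute. Disabled, this is a patched-out jump. */
	if (static_branch_unlikely(&pblk_debug_enabled)) {
		atomic_add(nr_entries, &pblk->inflight_writes);
		atomic_add(nr_entries, &pblk->req_writes);
	}
}
```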

> +#ifdef CONFIG_NVM_DEBUG
> +     BUG_ON(nr_rec_entries != valid_entries);
> +     atomic_add(valid_entries, &pblk->inflight_writes);
> +     atomic_add(valid_entries, &pblk->recov_gc_writes);
> +#endif

Are you aware that Linus is strongly opposed against using BUG_ON()?

> +#ifdef CONFIG_NVM_DEBUG
> +     lockdep_assert_held(&l_mg->free_lock);
> +#endif

Why is lockdep_assert_held() surrounded with #ifdef CONFIG_NVM_DEBUG / 
#endif? Are you aware that lockdep_assert_held() does not generate any 
code with CONFIG_PROVE_LOCKING=n?

> +static const struct block_device_operations pblk_fops = {
> +     .owner          = THIS_MODULE,
> +};

Is this data structure useful? If so, where is pblk_fops used?

> +static void pblk_l2p_free(struct pblk *pblk)
> +{
> +     vfree(pblk->trans_map);
> +}
> +
> +static int pblk_l2p_init(struct pblk *pblk)
> +{
> +     sector_t i;
> +
> +     pblk->trans_map = vmalloc(sizeof(pblk_addr) * pblk->rl.nr_secs);
> +     if (!pblk->trans_map)
> +             return -ENOMEM;
> +
> +     for (i = 0; i < pblk->rl.nr_secs; i++)
> +             pblk_ppa_set_empty(&pblk->trans_map[i]);
> +
> +     return 0;
> +}

Has it been considered to add support for keeping only a subset of the 
L2P translation table in memory instead of keeping the entire table in 
memory?

> +     sprintf(cache_name, "pblk_line_m_%s", pblk->disk->disk_name);

Please use snprintf() or kasprintf() instead of sprintf(). That makes it 
easier for humans to verify that no buffer overflow is triggered.
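
For illustration, a bounded userspace equivalent (the buffer size and 
helper name are made up here): snprintf() truncates instead of 
overflowing and returns the length the full string would have needed, 
so truncation is detectable.

```c
#include <stdio.h>
#include <stddef.h>

/* Format "pblk_line_m_<disk>" into buf without risking an overflow.
 * Returns 0 on success, -1 if the name would not fit. */
int format_cache_name(char *buf, size_t len, const char *disk_name)
{
	int n = snprintf(buf, len, "pblk_line_m_%s", disk_name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```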

> +/* physical block device target */
> +static struct nvm_tgt_type tt_pblk = {
> +     .name           = "pblk",
> +     .version        = {1, 0, 0},

Are version numbers useful inside the kernel tree?

> +void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
> +              unsigned long *lun_bitmap, unsigned int valid_secs,
> +              unsigned int off)
> +{
> +     struct pblk_sec_meta *meta_list = rqd->meta_list;
> +     unsigned int map_secs;
> +     int min = pblk->min_write_pgs;
> +     int i;
> +
> +     for (i = off; i < rqd->nr_ppas; i += min) {
> +             map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
> +             pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i],
> +                                     lun_bitmap, &meta_list[i], map_secs);
> +     }
> +}

Has it been considered to implement the above code such that no modulo 
(%) computation is needed, which is a relatively expensive operation? I 
think for this loop that can be done easily.
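
Since i advances in steps of min, the tail length can be computed by a 
subtraction (this is equivalent to the modulo expression as long as off 
is itself aligned to min, which appears to hold for the callers):

```c
/* Number of sectors to map in one iteration at offset i: a full
 * stripe of 'min' sectors, or the remaining tail 'valid_secs - i'. */
static unsigned int map_secs_at(unsigned int i, unsigned int min,
				unsigned int valid_secs)
{
	return (i + min > valid_secs) ? valid_secs - i : min;
}
```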

> +static DECLARE_RWSEM(pblk_rb_lock);
> +
> +void pblk_rb_data_free(struct pblk_rb *rb)
> +{
> +     struct pblk_rb_pages *p, *t;
> +
> +     down_write(&pblk_rb_lock);
> +     list_for_each_entry_safe(p, t, &rb->pages, list) {
> +             free_pages((unsigned long)page_address(p->pages), p->order);
> +             list_del(&p->list);
> +             kfree(p);
> +     }
> +     up_write(&pblk_rb_lock);
> +}

Global locks like pblk_rb_lock are a performance bottleneck on 
multi-socket systems. Why is that lock global?
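
If the lock only protects per-ring-buffer state, it could be embedded 
in struct pblk_rb instead, along the lines of this sketch (field 
placement is my guess):

```c
struct pblk_rb {
	/* ... existing fields ... */
	struct rw_semaphore lock;	/* protects this instance's pages list */
	struct list_head pages;
};

void pblk_rb_data_free(struct pblk_rb *rb)
{
	struct pblk_rb_pages *p, *t;

	down_write(&rb->lock);
	list_for_each_entry_safe(p, t, &rb->pages, list) {
		free_pages((unsigned long)page_address(p->pages), p->order);
		list_del(&p->list);
		kfree(p);
	}
	up_write(&rb->lock);
}
```

With one semaphore per ring buffer, instances on different devices no 
longer serialize against each other.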

Bart.
