Re: [PATCH RFC 1/2] scatterlist: add mempool based chained SG alloc/free api
On Wed, 2016-03-16 at 09:23 +0100, Christoph Hellwig wrote:

> We can definitely kill this one.

We want to support different sizes of pools. How can we kill this one?
Or did you mean we should just create a single pool with size SG_CHUNK_SIZE?

> > +static __init int sg_mempool_init(void)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < SG_MEMPOOL_NR; i++) {
> > +		struct sg_mempool *sgp = sg_pools + i;
> > +		int size = sgp->size * sizeof(struct scatterlist);
> > +
> > +		sgp->slab = kmem_cache_create(sgp->name, size, 0,
> > +				SLAB_HWCACHE_ALIGN, NULL);
>
> Having these mempools around in every kernel will make some embedded
> developers rather unhappy. We could either not create them at runtime,
> which would require either a check in the fast path or an init call in
> every driver, or just move the functions you added into a separate
> file, which will be compiled only based on a Kconfig symbol, and could
> even potentially be modular. I think that second option might be
> easier.

I created lib/sg_pool.c with CONFIG_SG_POOL.
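A minimal sketch of how such a Kconfig-gated helper could be wired up. The symbol name CONFIG_SG_POOL matches the reply above; the exact Kconfig text and Makefile line here are assumptions for illustration, not the final upstream patch:

```kconfig
# lib/Kconfig (sketch): drivers that need the chained SG mempool
# helpers select SG_POOL, so kernels without such drivers don't
# carry the pools at all.
config SG_POOL
	def_bool n

# lib/Makefile (sketch):
# obj-$(CONFIG_SG_POOL) += sg_pool.o
```

With `def_bool n` plus `select` from the using drivers, the fast path needs no runtime check and embedded configs pay nothing.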
[lkp] [drm/dp_helper] 31f8862c6e: No primary result change, 128.2% piglit.time.voluntary_context_switches
FYI, we noticed a +917.2% change in piglit.time.voluntary_context_switches with your commit.

https://github.com/0day-ci/linux Lyude/drm-dp_helper-retry-on-ETIMEDOUT-in-drm_dp_dpcd_access/20160317-234351
commit 31f8862c6e6303223e946e6fcbdfa7f87274baef ("drm/dp_helper: retry on -ETIMEDOUT in drm_dp_dpcd_access()")

=========================================================================================
compiler/group/kconfig/rootfs/tbox_group/testcase:
  gcc-4.9/igt-071/x86_64-rhel/debian-x86_64-2015-02-07.cgz/snb-black/piglit

commit:
  cf481068cdd430a22425d7712c8deeb25efdedc1
  31f8862c6e6303223e946e6fcbdfa7f87274baef

cf481068cdd430a2 31f8862c6e6303223e946e6fcb
---------------- --------------------------
         %stddev     %change         %stddev
             \          |                \
    111.96 ±  0%    +128.2%     255.52 ±  0%  piglit.time.elapsed_time
    111.96 ±  0%    +128.2%     255.52 ±  0%  piglit.time.elapsed_time.max
      8.25 ±  5%     -39.4%       5.00 ±  0%  piglit.time.percent_of_cpu_this_job_got
     31676 ±  0%    +917.2%         32 ±  0%  piglit.time.voluntary_context_switches
    111.96 ±  0%    +128.2%     255.52 ±  0%  time.elapsed_time
    111.96 ±  0%    +128.2%     255.52 ±  0%  time.elapsed_time.max
    115.50 ±  1%     +31.2%     151.50 ± 13%  time.involuntary_context_switches
      8.25 ±  5%     -39.4%       5.00 ±  0%  time.percent_of_cpu_this_job_got
      8.31 ±  0%     +45.7%      12.12 ±  0%  time.system_time
     31676 ±  0%    +917.2%         32 ±  0%  time.voluntary_context_switches

snb-black: Sandy Bridge
Memory: 8G

[ASCII plot of piglit.time.voluntary_context_switches omitted: bisect-bad
samples cluster around 10-32, bisect-good samples stay near 0]

To reproduce:

        git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Thanks,
Xiaolong Ye.
---
LKP_SERVER: inn
LKP_CGI_PORT: 80
LKP_CIFS_PORT: 139
testcase: piglit
default-monitors:
  wait: activate-monitor
  kmsg:
  heartbeat:
    interval: 10
default-watchdogs:
  oom-killer:
  watchdog:
commit: 31f8862c6e6303223e946e6fcbdfa7f87274baef
model: Sandy Bridge
memory: 8G
nr_cpu: 8
hdd_partitions:
swap_partitions:
rootfs_partition:
category: functional
timeout: 30m
piglit:
  group: igt-071
queue: bisect
testbox: snb-black
tbox_group: snb-black
kconfig: x86_64-rhel
enqueue_time: 2016-03-21 06:44:51.728433840 +08:00
compiler: gcc-4.9
rootfs: debian-x86_64-2015-02-07.cgz
id: 40af15dd8c1739d33ffe090f8e6d07650c5fbf07
user: lkp
head_commit: ff90403f45704fa7ad73121525559f3567886c53
base_commit: b562e44f507e863c6792946e4e1b1449fbbac85d
branch: linux-devel/devel-hourly-2016032017
result_root: "/result/piglit/igt-071/snb-black/debian-x86_64-2015-02-07.cgz/x86_64-rhel/gcc-4.9/31f8862c6e6303223e946e6fcbdfa7f87274baef/0"
job_file: "/lkp/scheduled/snb-black/bisect_piglit-igt-071-debian-x86_64-2015-02-07.cgz-x86_64-rhel-31f8862c6e6303223e946e6fcbdfa7f87274baef-20160321-39265-rnyf53-0.yaml"
max_uptime: 1800
initrd: "/osimage/debian/debian-x86_64-2015-02-07.cgz"
bootloader_append:
- root=/dev/ram0
- user=lkp
- job=/lkp/scheduled/snb-black/bisect_piglit-igt-071-debian-x86_64-2015-02-07.cgz-x86_64-rhel-31f8862c6e6303223e946e6fcbdfa7f87274baef-20160321-39265-rnyf53-0.yaml
- ARCH=x86_64
- kconfig=x86_64-rhel
- branch=linux-devel/devel-hourly-2016032017
- commit=31f8862c6e6303223e946e6fcbdfa7f87274baef
- BOOT_IMAGE=/pkg/linux/x86_64-rhel/gcc-4.9/31f8862c6e6303223e946e6fcbdfa7f87274baef/vmlinuz-4.5.0-rc7-
[lkp] [rcutorture] 5b3e3964db: torture_init_begin: refusing rcu init: spin_lock running
FYI, we noticed the below changes on

https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/dev
commit 5b3e3964dba5f5a3210ca931d523c1e1f3119b31 ("rcutorture: Add RCU grace-period performance tests")

As below, the log "torture_init_begin: refusing rcu init: spin_lock running" showed up with your commit.

[    3.310757] spin_lock-torture: --- Start of test [debug]: nwriters_stress=4 nreaders_stress=0 stat_interval=60 verbose=1 shuffle_interval=3 stutter=5 shutdown_secs=0 onoff_interval=0 onoff_holdoff=0
[    3.318722] spin_lock-torture: Creating torture_shuffle task
[    3.350213] spin_lock-torture: Creating torture_stutter task
[    3.353000] spin_lock-torture: torture_shuffle task started
[    3.355562] spin_lock-torture: Creating lock_torture_writer task
[    3.358373] spin_lock-torture: torture_stutter task started
[    3.361060] spin_lock-torture: lock_torture_writer task started
[    3.370011] spin_lock-torture: Creating lock_torture_writer task
[    3.372856] spin_lock-torture: Creating lock_torture_writer task
[    3.375817] spin_lock-torture: lock_torture_writer task started
[    3.378697] spin_lock-torture: Creating lock_torture_writer task
[    3.380049] spin_lock-torture: lock_torture_writer task started
[    3.410169] spin_lock-torture: Creating lock_torture_stats task
[    3.413129] spin_lock-torture: lock_torture_writer task started
[    3.420137] torture_init_begin: refusing rcu init: spin_lock running
[    3.430064] spin_lock-torture: lock_torture_stats task started
[    3.441101] futex hash table entries: 16 (order: -1, 2048 bytes)
[    3.443791] audit: initializing netlink subsys (disabled)
[    3.446329] audit: type=2000 audit(1458435960.381:1): initialized
[    3.470185] zbud: loaded

FYI, raw QEMU command line is:

qemu-system-x86_64 -enable-kvm -cpu Nehalem -kernel /pkg/linux/x86_64-randconfig-i0-201612/gcc-5/5b3e3964dba5f5a3210ca931d523c1e1f3119b31/vmlinuz-4.5.0-rc1-00035-g5b3e396 -append 'root=/dev/ram0 user=lkp job=/lkp/scheduled/vm-intel12-yocto-x86_64-6/bisect_boot-1-yocto-minimal-x86_64.cgz-x86_64-randconfig-i0-201612-5b3e3964dba5f5a3210ca931d523c1e1f3119b31-20160320-8459-1gazcic-1.yaml ARCH=x86_64 kconfig=x86_64-randconfig-i0-201612 branch=linux-devel/devel-spot-201603200631 commit=5b3e3964dba5f5a3210ca931d523c1e1f3119b31 BOOT_IMAGE=/pkg/linux/x86_64-randconfig-i0-201612/gcc-5/5b3e3964dba5f5a3210ca931d523c1e1f3119b31/vmlinuz-4.5.0-rc1-00035-g5b3e396 max_uptime=600 RESULT_ROOT=/result/boot/1/vm-intel12-yocto-x86_64/yocto-minimal-x86_64.cgz/x86_64-randconfig-i0-201612/gcc-5/5b3e3964dba5f5a3210ca931d523c1e1f3119b31/0 LKP_SERVER=inn earlyprintk=ttyS0,115200 systemd.log_level=err debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal rw ip=vm-intel12-yocto-x86_64-6::dhcp drbd.minor_count=8' -initrd /fs/KVM/initrd-vm-intel12-yocto-x86_64-6 -m 320 -smp 2 -device e1000,netdev=net0 -netdev user,id=net0 -boot order=nc -no-reboot -watchdog i6300esb -rtc base=localtime -drive file=/fs/KVM/disk0-vm-intel12-yocto-x86_64-6,media=disk,if=virtio -drive file=/fs/KVM/disk1-vm-intel12-yocto-x86_64-6,media=disk,if=virtio -pidfile /dev/shm/kboot/pid-vm-intel12-yocto-x86_64-6 -serial file:/dev/shm/kboot/serial-vm-intel12-yocto-x86_64-6 -daemonize -display none -monitor null

Thanks,
Xiaolong Ye.

[attachment: dmesg.xz]
Re: [PATCH 1/2] media/dvb-core: fix inverted check
Hi Max,

Already in the tree:
http://git.linuxtv.org/media_tree.git/commit/drivers/media/dvb-core?id=711f3fba6ffd3914fd1b5ed9faf8d22bab6f2203

Cheers,
-olli

On 18 March 2016 at 23:31, Max Kellermann wrote:
> Breakage caused by commit f50d51661a
>
> Signed-off-by: Max Kellermann
> ---
>  drivers/media/dvb-core/dvbdev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/media/dvb-core/dvbdev.c b/drivers/media/dvb-core/dvbdev.c
> index 560450a..c756d4b 100644
> --- a/drivers/media/dvb-core/dvbdev.c
> +++ b/drivers/media/dvb-core/dvbdev.c
> @@ -682,7 +682,7 @@ int dvb_create_media_graph(struct dvb_adapter *adap,
>  	if (demux && ca) {
>  		ret = media_create_pad_link(demux, 1, ca,
>  					    0, MEDIA_LNK_FL_ENABLED);
> -		if (!ret)
> +		if (ret)
>  			return -ENOMEM;
>  	}
[PATCH v2 05/18] zsmalloc: remove unused pool param in obj_free
Let's remove the unused pool param in obj_free.

Reviewed-by: Sergey Senozhatsky
Signed-off-by: Minchan Kim
---
 mm/zsmalloc.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 16556a6db628..a0890e9003e2 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1438,8 +1438,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 }
 EXPORT_SYMBOL_GPL(zs_malloc);
 
-static void obj_free(struct zs_pool *pool, struct size_class *class,
-			unsigned long obj)
+static void obj_free(struct size_class *class, unsigned long obj)
 {
 	struct link_free *link;
 	struct page *first_page, *f_page;
@@ -1485,7 +1484,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 	class = pool->size_class[class_idx];
 	spin_lock(&class->lock);
 
-	obj_free(pool, class, obj);
+	obj_free(class, obj);
 	fullness = fix_fullness_group(class, first_page);
 	if (fullness == ZS_EMPTY) {
 		zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
@@ -1648,7 +1647,7 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
 		free_obj |= BIT(HANDLE_PIN_BIT);
 		record_obj(handle, free_obj);
 		unpin_tag(handle);
-		obj_free(pool, class, used_obj);
+		obj_free(class, used_obj);
 	}
 
 	/* Remember last position in this iteration */
-- 
1.9.1
[lkp] [cpufreq] 9be4fd2c77: No primary result change, 56.4% fsmark.time.involuntary_context_switches
[ASCII plot of fsmark.time.involuntary_context_switches omitted: samples
fluctuate in the 700-1000 range; bisect-good and bisect-bad samples marked]

To reproduce:

        git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Thanks,
Xiaolong Ye.

---
LKP_SERVER: inn
LKP_CGI_PORT: 80
LKP_CIFS_PORT: 139
testcase: fsmark
default-monitors:
  wait: activate-monitor
  kmsg:
  uptime:
  iostat:
  heartbeat:
  vmstat:
  numa-numastat:
  numa-vmstat:
  numa-meminfo:
  proc-vmstat:
  proc-stat:
    interval: 10
  meminfo:
  slabinfo:
  interrupts:
  lock_stat:
  latency_stats:
  softirqs:
  bdi_dev_mapping:
  diskstats:
  nfsstat:
  cpuidle:
  cpufreq-stats:
  turbostat:
  pmeter:
  sched_debug:
    interval: 60
cpufreq_governor:
default-watchdogs:
  oom-killer:
  watchdog:
commit: 9be4fd2c7723a3057b0b39676fe4c8d5fd7118a4
model: Westmere-EP
memory: 16G
nr_hdd_partitions: 10
hdd_partitions: "/dev/disk/by-id/scsi-35000c500*-part1"
swap_partitions:
rootfs_partition: "/dev/disk/by-id/ata-WDC_WD1002FAEX-00Z3A0_WD-WCATR5408564-part3"
category: benchmark
iterations: 1x
nr_threads: 32t
disk: 1HDD
fs: ext4
fs2: nfsv4
fsmark:
  filesize: 8K
  test_size: 400M
  sync_method: fsyncBeforeClose
  nr_directories: 16d
  nr_files_per_directory: 256fpd
queue: bisect
testbox: lkp-ws02
tbox_group: lkp-ws02
kconfig: x86_64-rhel
enqueue_time: 2016-03-20 10:09:24.525219588 +08:00
compiler: gcc-4.9
rootfs: debian-x86_64-2015-02-07.cgz
id: fdd404daccaee5e0c96a90d1c6f11354ed761f51
user: lkp
head_commit: 6c01e36f36861235cc151706b1bcb674e965c5a5
base_commit: b562e44f507e863c6792946e4e1b1449fbbac85d
branch: linux-devel/devel-hourly-2016031901
result_root: "/result/fsmark/1x-32t-1HDD-ext4-nfsv4-8K-400M-fsyncBeforeClose-16d-256fpd/lkp-ws02/debian-x86_64-2015-02-07.cgz/x86_64-rhel/gcc-4.9/9be4fd2c7723a3057b0b39676fe4c8d5fd7118a4/0"
job_file: "/lkp/scheduled/lkp-ws02/bisect_fsmark-1x-32t-1HDD-ext4-nfsv4-8K-400M-fsyncBeforeClose-16d-256fpd-debian-x86_64-2015-02-07.cgz-x86_64-rhel-9be4fd2c7723a3057b0b39676fe4c8d5fd7118a4-20160320-39429-bq8oj6-0.yaml"
nr_cpu: "$(nproc)"
max_uptime: 1756.70003
initrd: "/osimage/debian/debian-x86_64-2015-02-07.cgz"
bootloader_append:
- root=/dev/ram0
- user=lkp
- job=/lkp/scheduled/lkp-ws02/bisect_fsmark-1x-32t-1HDD-ext4-nfsv4-8K-400M-fsyncBeforeClose-16d-256fpd-debian-x86_64-2015-02-07.cgz-x86_64-rhel-9be4fd2c7723a3057b0b39676fe4c8d5fd7118a4-20160320-39429-bq8oj6-0.yaml
- ARCH=x86_64
- kconfig=x86_64-rhel
- branch=linux-devel/devel-hourly-2016031901
- commit=9be4fd2c7723a3057b0b39676fe4c8d5fd7118a4
- BOOT_IMAGE=/pkg/linux/x86_64-rhel/gcc-4.9/9be4fd2c7723a3057b0b39676fe4c8d5fd7118a4/vmlinuz-4.5.0-rc2-4-g9be4fd2
- max_uptime=1756
- RESULT_ROOT=/result/fsmark/1x-32t-1HDD-ext4-nfsv4-8K-400M-fsyncBeforeClose-16d-256fpd/lkp-ws02/debian-x86_64-2015-02-07.cgz/x86_64-rhel/gcc-4.9/9be4fd2c7723a3057b0b39676fe4c8d5fd7118a4/0
- LKP_SERVER=inn
- |-
  ipmi_watchdog.start_now=1 earlyprintk=ttyS0,115200 systemd.log_level=err debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal rw
lkp_initrd: "/lkp/lkp/lkp-x86_64.cgz"
modules_initrd: "/pkg/linux/x86_64-rhel/gcc-4.9/9be4fd2c7723a3057b0b39676fe4c8d5fd7118a4/modules.cgz"
bm_initrd: "/osimage/deps/debian-x86_64-2015-02-07.cgz/lkp.cgz,/osimage/deps/debian-x86_64-2015-02-07.cgz/run-ipconfig.cgz,/osimage/deps/debian-x86_64-2015-02-07.cgz/turbostat.cgz,/lkp/benchmarks/turbostat.cgz,/osimage/deps/debian-x86_64-2015-02-07.cgz/fs.cgz,/osimage/deps/debian-x86_64-2015-02-07.cgz/fs2.cgz,/lkp/benchmarks/fsmark.cgz"
linux_headers_initrd: "/pkg/linux/x86_64-rhel/gcc-4.9/9be4fd2c7723a3057b0b39676fe4c8d5fd7118a4/linux-headers.cgz"
repeat_to: 2
kernel: "/pkg/linux/x86_64-rhel/gcc-4.9/9be4fd2c7723a3057b0b39676fe4c8d5fd7118a4/vmlinuz-4.5.0-rc2-4-g9be4fd2"
dequeue_time: 2016-03-20 10:16:38.380437312 +08:00
job_state: finished
loadavg: 37.39 31.57 16.10 3/373 13706
start_time: '1458440258'
end_time: '1458440752'
version: "/lkp/lkp/.src-20160318-155012"
[PATCH v2 10/18] zsmalloc: factor page chain functionality out
For migration, we need to create a sub-page chain of a zspage
dynamically, so this patch factors that code out of alloc_zspage.

As a minor refactoring, it makes the OBJ_ALLOCATED_TAG assignment in
obj_malloc clearer (it could be a separate patch, but it's trivial, so
I want to keep it in this one).

Signed-off-by: Minchan Kim
---
 mm/zsmalloc.c | 80 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 46 insertions(+), 34 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 958f27a9079d..833da8f4ffc9 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -982,7 +982,9 @@ static void init_zspage(struct size_class *class, struct page *first_page)
 	unsigned long off = 0;
 	struct page *page = first_page;
 
-	VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+	first_page->freelist = NULL;
+	INIT_LIST_HEAD(&first_page->lru);
+	set_zspage_inuse(first_page, 0);
 
 	while (page) {
 		struct page *next_page;
@@ -1027,13 +1029,44 @@ static void init_zspage(struct size_class *class, struct page *first_page)
 	set_freeobj(first_page, 0);
 }
 
+static void create_page_chain(struct page *pages[], int nr_pages)
+{
+	int i;
+	struct page *page;
+	struct page *prev_page = NULL;
+	struct page *first_page = NULL;
+
+	for (i = 0; i < nr_pages; i++) {
+		page = pages[i];
+
+		INIT_LIST_HEAD(&page->lru);
+		if (i == 0) {
+			SetPagePrivate(page);
+			set_page_private(page, 0);
+			first_page = page;
+		}
+
+		if (i == 1)
+			set_page_private(first_page, (unsigned long)page);
+		if (i >= 1)
+			set_page_private(page, (unsigned long)first_page);
+		if (i >= 2)
+			list_add(&page->lru, &prev_page->lru);
+		if (i == nr_pages - 1)
+			SetPagePrivate2(page);
+
+		prev_page = page;
+	}
+}
+
 /*
  * Allocate a zspage for the given size class
  */
 static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
 {
-	int i, error;
-	struct page *first_page = NULL, *uninitialized_var(prev_page);
+	int i;
+	struct page *first_page = NULL;
+	struct page *pages[ZS_MAX_PAGES_PER_ZSPAGE];
 
 	/*
 	 * Allocate individual pages and link them together as:
@@ -1046,43 +1079,23 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
 	 * (i.e. no other sub-page has this flag set) and PG_private_2 to
 	 * identify the last page.
 	 */
-	error = -ENOMEM;
 	for (i = 0; i < class->pages_per_zspage; i++) {
 		struct page *page;
 
 		page = alloc_page(flags);
-		if (!page)
-			goto cleanup;
-
-		INIT_LIST_HEAD(&page->lru);
-		if (i == 0) {	/* first page */
-			page->freelist = NULL;
-			SetPagePrivate(page);
-			set_page_private(page, 0);
-			first_page = page;
-			set_zspage_inuse(page, 0);
+		if (!page) {
+			while (--i >= 0)
+				__free_page(pages[i]);
+			return NULL;
 		}
-		if (i == 1)
-			set_page_private(first_page, (unsigned long)page);
-		if (i >= 1)
-			set_page_private(page, (unsigned long)first_page);
-		if (i >= 2)
-			list_add(&page->lru, &prev_page->lru);
-		if (i == class->pages_per_zspage - 1)	/* last page */
-			SetPagePrivate2(page);
-		prev_page = page;
+
+		pages[i] = page;
 	}
 
+	create_page_chain(pages, class->pages_per_zspage);
+	first_page = pages[0];
 	init_zspage(class, first_page);
 
-	error = 0; /* Success */
-
-cleanup:
-	if (unlikely(error) && first_page) {
-		free_zspage(first_page);
-		first_page = NULL;
-	}
-
 	return first_page;
 }
 
@@ -1422,7 +1435,6 @@ static unsigned long obj_malloc(struct size_class *class,
 	unsigned long m_offset;
 	void *vaddr;
 
-	handle |= OBJ_ALLOCATED_TAG;
 	obj = get_freeobj(first_page);
 	objidx_to_page_and_offset(class, first_page, obj,
 				&m_page, &m_offset);
@@ -1432,10 +1444,10 @@ static unsigned long obj_malloc(struct size_class *class,
 	set_freeobj(first_page, link->next >> OBJ_ALLOCATED_TAG);
 	if (!class->huge)
 		/* record handle in the header of allocated chunk */
-		link->handle = handle;
+		link->handle = handle | OBJ_ALLOCATED_TAG;
 	else
 		/* record handle in first_page
[PATCH v2 09/18] zsmalloc: move struct zs_meta from mapping to freelist
To support migration from the VM, we need an address_space on every
page, so zsmalloc shouldn't use page->mapping. This patch therefore
moves zs_meta from page->mapping to page->freelist.

Signed-off-by: Minchan Kim
---
 mm/zsmalloc.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 0c8ccd87c084..958f27a9079d 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -29,7 +29,7 @@
  *	Look at size_class->huge.
  * page->lru: links together first pages of various zspages.
  *	Basically forming list of zspages in a fullness group.
- * page->mapping: override by struct zs_meta
+ * page->freelist: override by struct zs_meta
  *
  * Usage of struct page flags:
  * PG_private: identifies the first component page
@@ -418,7 +418,7 @@ static int get_zspage_inuse(struct page *first_page)
 
 	VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-	m = (struct zs_meta *)&first_page->mapping;
+	m = (struct zs_meta *)&first_page->freelist;
 
 	return m->inuse;
 }
@@ -429,7 +429,7 @@ static void set_zspage_inuse(struct page *first_page, int val)
 
 	VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-	m = (struct zs_meta *)&first_page->mapping;
+	m = (struct zs_meta *)&first_page->freelist;
 	m->inuse = val;
 }
 
@@ -439,7 +439,7 @@ static void mod_zspage_inuse(struct page *first_page, int val)
 
 	VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-	m = (struct zs_meta *)&first_page->mapping;
+	m = (struct zs_meta *)&first_page->freelist;
 	m->inuse += val;
 }
 
@@ -449,7 +449,7 @@ static void set_freeobj(struct page *first_page, int idx)
 
 	VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-	m = (struct zs_meta *)&first_page->mapping;
+	m = (struct zs_meta *)&first_page->freelist;
 	m->freeobj = idx;
 }
 
@@ -459,7 +459,7 @@ static unsigned long get_freeobj(struct page *first_page)
 
 	VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-	m = (struct zs_meta *)&first_page->mapping;
+	m = (struct zs_meta *)&first_page->freelist;
 
 	return m->freeobj;
 }
@@ -471,7 +471,7 @@ static void get_zspage_mapping(struct page *first_page,
 
 	VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-	m = (struct zs_meta *)&first_page->mapping;
+	m = (struct zs_meta *)&first_page->freelist;
 	*fullness = m->fullness;
 	*class_idx = m->class;
 }
@@ -484,7 +484,7 @@ static void set_zspage_mapping(struct page *first_page,
 
 	VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-	m = (struct zs_meta *)&first_page->mapping;
+	m = (struct zs_meta *)&first_page->freelist;
 	m->fullness = fullness;
 	m->class = class_idx;
 }
@@ -946,7 +946,7 @@ static void reset_page(struct page *page)
 	clear_bit(PG_private, &page->flags);
 	clear_bit(PG_private_2, &page->flags);
 	set_page_private(page, 0);
-	page->mapping = NULL;
+	page->freelist = NULL;
 	page_mapcount_reset(page);
 }
 
@@ -1056,6 +1056,7 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
 		INIT_LIST_HEAD(&page->lru);
 		if (i == 0) {	/* first page */
+			page->freelist = NULL;
 			SetPagePrivate(page);
 			set_page_private(page, 0);
 			first_page = page;
@@ -2068,9 +2069,9 @@ static int __init zs_init(void)
 
 	/*
 	 * A zspage's a free object index, class index, fullness group,
-	 * inuse object count are encoded in its (first)page->mapping
+	 * inuse object count are encoded in its (first)page->freelist
 	 * so sizeof(struct zs_meta) should be less than
-	 * sizeof(page->mapping(i.e., unsigned long)).
+	 * sizeof(page->freelist(i.e., void *)).
 	 */
 	BUILD_BUG_ON(sizeof(struct zs_meta) > sizeof(unsigned long));
 
-- 
1.9.1
[PATCH v2 06/18] zsmalloc: keep max_object in size_class
Every zspage in a size_class has the same number of max objects, so we
can move that count into struct size_class.

Signed-off-by: Minchan Kim
---
 mm/zsmalloc.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index a0890e9003e2..8649d0243e6c 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -32,8 +32,6 @@
  * page->freelist: points to the first free object in zspage.
  *	Free objects are linked together using in-place
  *	metadata.
- * page->objects: maximum number of objects we can store in this
- *	zspage (class->zspage_order * PAGE_SIZE / class->size)
  * page->lru: links together first pages of various zspages.
  *	Basically forming list of zspages in a fullness group.
  * page->mapping: class index and fullness group of the zspage
@@ -211,6 +209,7 @@ struct size_class {
 	 * of ZS_ALIGN.
 	 */
 	int size;
+	int objs_per_zspage;
 	unsigned int index;
 
 	struct zs_size_stat stats;
@@ -627,21 +626,22 @@ static inline void zs_pool_stat_destroy(struct zs_pool *pool)
  * the pool (not yet implemented). This function returns fullness
  * status of the given page.
  */
-static enum fullness_group get_fullness_group(struct page *first_page)
+static enum fullness_group get_fullness_group(struct size_class *class,
+						struct page *first_page)
 {
-	int inuse, max_objects;
+	int inuse, objs_per_zspage;
 	enum fullness_group fg;
 
 	VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
 	inuse = first_page->inuse;
-	max_objects = first_page->objects;
+	objs_per_zspage = class->objs_per_zspage;
 
 	if (inuse == 0)
 		fg = ZS_EMPTY;
-	else if (inuse == max_objects)
+	else if (inuse == objs_per_zspage)
 		fg = ZS_FULL;
-	else if (inuse <= 3 * max_objects / fullness_threshold_frac)
+	else if (inuse <= 3 * objs_per_zspage / fullness_threshold_frac)
 		fg = ZS_ALMOST_EMPTY;
 	else
 		fg = ZS_ALMOST_FULL;
@@ -728,7 +728,7 @@ static enum fullness_group fix_fullness_group(struct size_class *class,
 	enum fullness_group currfg, newfg;
 
 	get_zspage_mapping(first_page, &class_idx, &currfg);
-	newfg = get_fullness_group(first_page);
+	newfg = get_fullness_group(class, first_page);
 	if (newfg == currfg)
 		goto out;
 
@@ -1008,9 +1008,6 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
 	init_zspage(class, first_page);
 
 	first_page->freelist = location_to_obj(first_page, 0);
-	/* Maximum number of objects we can store in this zspage */
-	first_page->objects = class->pages_per_zspage * PAGE_SIZE / class->size;
-
 	error = 0; /* Success */
 
 cleanup:
@@ -1238,11 +1235,11 @@ static bool can_merge(struct size_class *prev, int size, int pages_per_zspage)
 	return true;
 }
 
-static bool zspage_full(struct page *first_page)
+static bool zspage_full(struct size_class *class, struct page *first_page)
 {
 	VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-	return first_page->inuse == first_page->objects;
+	return first_page->inuse == class->objs_per_zspage;
 }
 
 unsigned long zs_get_total_pages(struct zs_pool *pool)
@@ -1628,7 +1625,7 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
 		}
 
 		/* Stop if there is no more space */
-		if (zspage_full(d_page)) {
+		if (zspage_full(class, d_page)) {
 			unpin_tag(handle);
 			ret = -ENOMEM;
 			break;
@@ -1687,7 +1684,7 @@ static enum fullness_group putback_zspage(struct zs_pool *pool,
 {
 	enum fullness_group fullness;
 
-	fullness = get_fullness_group(first_page);
+	fullness = get_fullness_group(class, first_page);
 	insert_zspage(class, fullness, first_page);
 	set_zspage_mapping(first_page, class->index, fullness);
 
@@ -1936,8 +1933,9 @@ struct zs_pool *zs_create_pool(const char *name, gfp_t flags)
 		class->size = size;
 		class->index = i;
 		class->pages_per_zspage = pages_per_zspage;
-		if (pages_per_zspage == 1 &&
-			get_maxobj_per_zspage(size, pages_per_zspage) == 1)
+		class->objs_per_zspage = class->pages_per_zspage *
+						PAGE_SIZE / class->size;
+		if (pages_per_zspage == 1 && class->objs_per_zspage == 1)
 			class->huge = true;
 		spin_lock_init(&class->lock);
 		pool->size_class[i] = class;
-- 
1.9.1
[PATCH v2 12/18] zsmalloc: zs_compact refactoring
Currently, we rely on class->lock to prevent zspage destruction. That
was okay until now because the critical section is short, but with
run-time migration it could become long, so class->lock is not a good
approach any more.

So, this patch introduces [un]freeze_zspage functions, which freeze
allocated objects in the zspage by pinning their tags so that users
cannot free objects that are in use. With those functions, this patch
redesigns compaction. The same functions will be used for implementing
zspage run-time migration, too.

Signed-off-by: Minchan Kim
---
 mm/zsmalloc.c | 393 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 257 insertions(+), 136 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 9c0ab1e92e9b..990d752fb65b 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -922,6 +922,13 @@ static unsigned long obj_to_head(struct size_class *class, struct page *page,
 	return *(unsigned long *)obj;
 }
 
+static inline int testpin_tag(unsigned long handle)
+{
+	unsigned long *ptr = (unsigned long *)handle;
+
+	return test_bit(HANDLE_PIN_BIT, ptr);
+}
+
 static inline int trypin_tag(unsigned long handle)
 {
 	unsigned long *ptr = (unsigned long *)handle;
@@ -950,8 +957,7 @@ static void reset_page(struct page *page)
 	page_mapcount_reset(page);
 }
 
-static void free_zspage(struct zs_pool *pool, struct size_class *class,
-			struct page *first_page)
+static void free_zspage(struct zs_pool *pool, struct page *first_page)
 {
 	struct page *nextp, *tmp, *head_extra;
 
@@ -974,11 +980,6 @@ static void free_zspage(struct zs_pool *pool, struct size_class *class,
 	}
 	reset_page(head_extra);
 	__free_page(head_extra);
-
-	zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
-			class->size, class->pages_per_zspage));
-	atomic_long_sub(class->pages_per_zspage,
-			&pool->pages_allocated);
 }
 
 /* Initialize a newly allocated zspage */
@@ -1326,6 +1327,11 @@ static bool zspage_full(struct size_class *class, struct page *first_page)
 	return get_zspage_inuse(first_page) == class->objs_per_zspage;
 }
 
+static bool zspage_empty(struct size_class *class, struct page *first_page)
+{
+	return get_zspage_inuse(first_page) == 0;
+}
+
 unsigned long zs_get_total_pages(struct zs_pool *pool)
 {
 	return atomic_long_read(&pool->pages_allocated);
@@ -1456,7 +1462,6 @@ static unsigned long obj_malloc(struct size_class *class,
 		set_page_private(first_page, handle | OBJ_ALLOCATED_TAG);
 	kunmap_atomic(vaddr);
 	mod_zspage_inuse(first_page, 1);
-	zs_stat_inc(class, OBJ_USED, 1);
 
 	obj = location_to_obj(m_page, obj);
 
@@ -1511,6 +1516,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 	}
 
 	obj = obj_malloc(class, first_page, handle);
+	zs_stat_inc(class, OBJ_USED, 1);
 	/* Now move the zspage to another fullness group, if required */
 	fix_fullness_group(class, first_page);
 	record_obj(handle, obj);
@@ -1541,7 +1547,6 @@ static void obj_free(struct size_class *class, unsigned long obj)
 	kunmap_atomic(vaddr);
 	set_freeobj(first_page, f_objidx);
 	mod_zspage_inuse(first_page, -1);
-	zs_stat_dec(class, OBJ_USED, 1);
 }
 
 void zs_free(struct zs_pool *pool, unsigned long handle)
@@ -1565,10 +1570,19 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 
 	spin_lock(&class->lock);
 	obj_free(class, obj);
+	zs_stat_dec(class, OBJ_USED, 1);
 	fullness = fix_fullness_group(class, first_page);
-	if (fullness == ZS_EMPTY)
-		free_zspage(pool, class, first_page);
+	if (fullness == ZS_EMPTY) {
+		zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
+				class->size, class->pages_per_zspage));
+		spin_unlock(&class->lock);
+		atomic_long_sub(class->pages_per_zspage,
+				&pool->pages_allocated);
+		free_zspage(pool, first_page);
+		goto out;
+	}
 	spin_unlock(&class->lock);
+out:
 	unpin_tag(handle);
 
 	free_handle(pool, handle);
@@ -1638,127 +1652,66 @@ static void zs_object_copy(struct size_class *class, unsigned long dst,
 	kunmap_atomic(s_addr);
 }
 
-/*
- * Find alloced object in zspage from index object and
- * return handle.
- */
-static unsigned long find_alloced_obj(struct size_class *class,
-					struct page *page, int index)
+static unsigned long handle_from_obj(struct size_class *class,
+				struct page *first_page, int obj_idx)
 {
-	unsigned long head;
-	int offset = 0;
-	unsigned long handle = 0;
-	void *addr = kmap_atomic(page);
-
-	if (!is_first_page(page))
-		offset = page->index;
-	offset += class->size * inde
[PATCH v2 18/18] zram: use __GFP_MOVABLE for memory allocation
Zsmalloc is ready for page migration, so zram can use __GFP_MOVABLE from now on. I ran a test to see how it helps create higher-order pages. The test scenario is as follows: a KVM guest, 1G of memory, and an ext4-formatted zram block device.

for i in `seq 1 8`; do
	dd if=/dev/vda1 of=mnt/test$i.txt bs=128M count=1 &
done
wait `pidof dd`
for i in `seq 1 2 8`; do
	rm -rf mnt/test$i.txt
done
fstrim -v mnt

echo "init"
cat /proc/buddyinfo
echo "compaction"
echo 1 > /proc/sys/vm/compact_memory
cat /proc/buddyinfo

old:

init
Node 0, zone      DMA    208    120     51     41     11      0      0      0      0      0      0
Node 0, zone    DMA32  16380  13777   9184   3805    789     54      3      0      0      0      0
compaction
Node 0, zone      DMA    132     82     40     39     16      2      1      0      0      0      0
Node 0, zone    DMA32   5219   5526   4969   3455   1831    677    139     15      0      0      0

new:

init
Node 0, zone      DMA    379    115     97     19      2      0      0      0      0      0      0
Node 0, zone    DMA32  18891  16774  10862   3947    637     21      0      0      0      0      0
compaction 1
Node 0, zone      DMA    214     66     87     29     10      3      0      0      0      0      0
Node 0, zone    DMA32   1612   3139   3154   2469   1745    990    384     94      7      0      0

As you can see, compaction made so many high-order pages. Yay!
Reviewed-by: Sergey Senozhatsky Signed-off-by: Minchan Kim --- drivers/block/zram/zram_drv.c | 3 ++- mm/zsmalloc.c | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 46055dbc4095..da8298b9f05e 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -517,7 +517,8 @@ static struct zram_meta *zram_meta_alloc(char *pool_name, u64 disksize) goto out_error; } - meta->mem_pool = zs_create_pool(pool_name, GFP_NOIO | __GFP_HIGHMEM); + meta->mem_pool = zs_create_pool(pool_name, GFP_NOIO|__GFP_HIGHMEM + |__GFP_MOVABLE); if (!meta->mem_pool) { pr_err("Error creating memory pool\n"); goto out_error; diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 35bafa0bc3f1..8557da6dbaf2 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -308,7 +308,7 @@ static void destroy_handle_cache(struct zs_pool *pool) static unsigned long alloc_handle(struct zs_pool *pool) { return (unsigned long)kmem_cache_alloc(pool->handle_cachep, - pool->flags & ~__GFP_HIGHMEM); + pool->flags & ~(__GFP_HIGHMEM|__GFP_MOVABLE)); } static void free_handle(struct zs_pool *pool, unsigned long handle) -- 1.9.1
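The key detail in the zram patch above is that __GFP_MOVABLE is added to the pool flags for zspage allocations but masked off again for handle allocations, since handles come from a slab and are dereferenced directly. A minimal userspace sketch of that masking, with made-up flag values (the real __GFP_* constants differ):

```c
#include <assert.h>

/* Illustrative flag bits -- NOT the kernel's real __GFP_* values. */
#define TOY_GFP_NOIO     0x1u
#define TOY_GFP_HIGHMEM  0x2u
#define TOY_GFP_MOVABLE  0x4u

/* zspage pages may honor every pool flag, including MOVABLE... */
static unsigned int zspage_gfp(unsigned int pool_flags)
{
	return pool_flags;
}

/* ...but handle objects are dereferenced through a direct pointer, so
 * HIGHMEM and MOVABLE must be stripped, mirroring the patch's
 * alloc_handle() change. */
static unsigned int handle_gfp(unsigned int pool_flags)
{
	return pool_flags & ~(TOY_GFP_HIGHMEM | TOY_GFP_MOVABLE);
}
```

The point of the split is that only memory the subsystem can relocate may be MOVABLE; anything pinned by a live pointer must not be.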
[PATCH v2 16/18] zsmalloc: use single linked list for page chain
For tail page migration, we shouldn't use page->lru, which was used for page chaining, because the VM will use it for its own purposes, so we need another field for chaining. A singly linked list is enough for chaining, and page->index of a tail page, which holds the first object offset in the page, can be replaced by a run-time calculation. So this patch switches page chaining from page->lru to a singly linked list squeezed into page->freelist, and introduces get_first_obj_ofs() to compute the first object offset in a page. With that, zsmalloc can maintain the page chain without using page->lru.

Signed-off-by: Minchan Kim
---
 mm/zsmalloc.c | 119 ++
 1 file changed, 78 insertions(+), 41 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index b3b31fdfea0f..9b4b03d8f993 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -17,10 +17,7 @@
  *
  * Usage of struct page fields:
  *	page->private: points to the first component (0-order) page
- *	page->index (union with page->freelist): offset of the first object
- *		starting in this page.
- * page->lru: links together all component pages (except the first page) - * of a zspage + * page->index (union with page->freelist): override by struct zs_meta * * For _first_ page only: * @@ -271,10 +268,19 @@ struct zs_pool { }; struct zs_meta { - unsigned long freeobj:FREEOBJ_BITS; - unsigned long class:CLASS_BITS; - unsigned long fullness:FULLNESS_BITS; - unsigned long inuse:INUSE_BITS; + union { + /* first page */ + struct { + unsigned long freeobj:FREEOBJ_BITS; + unsigned long class:CLASS_BITS; + unsigned long fullness:FULLNESS_BITS; + unsigned long inuse:INUSE_BITS; + }; + /* tail pages */ + struct { + struct page *next; + }; + }; }; struct mapping_area { @@ -491,6 +497,34 @@ static unsigned long get_freeobj(struct page *first_page) return m->freeobj; } +static void set_next_page(struct page *page, struct page *next) +{ + struct zs_meta *m; + + VM_BUG_ON_PAGE(is_first_page(page), page); + + m = (struct zs_meta *)&page->index; + m->next = next; +} + +static struct page *get_next_page(struct page *page) +{ + struct page *next; + + if (is_last_page(page)) + next = NULL; + else if (is_first_page(page)) + next = (struct page *)page_private(page); + else { + struct zs_meta *m = (struct zs_meta *)&page->index; + + VM_BUG_ON(!m->next); + next = m->next; + } + + return next; +} + static void get_zspage_mapping(struct page *first_page, unsigned int *class_idx, enum fullness_group *fullness) @@ -871,18 +905,30 @@ static struct page *get_first_page(struct page *page) return (struct page *)page_private(page); } -static struct page *get_next_page(struct page *page) +int get_first_obj_ofs(struct size_class *class, struct page *first_page, + struct page *page) { - struct page *next; + int pos, bound; + int page_idx = 0; + int ofs = 0; + struct page *cursor = first_page; - if (is_last_page(page)) - next = NULL; - else if (is_first_page(page)) - next = (struct page *)page_private(page); - else - next = list_entry(page->lru.next, struct page, lru); + if (first_page == page) 
+ goto out; - return next; + while (page != cursor) { + page_idx++; + cursor = get_next_page(cursor); + } + + bound = PAGE_SIZE * page_idx; + pos = (((class->objs_per_zspage * class->size) * + page_idx / class->pages_per_zspage) / class->size + ) * class->size; + + ofs = (pos + class->size) % PAGE_SIZE; +out: + return ofs; } static void objidx_to_page_and_offset(struct size_class *class, @@ -1008,27 +1054,25 @@ void lock_zspage(struct page *first_page) static void free_zspage(struct zs_pool *pool, struct page *first_page) { - struct page *nextp, *tmp, *head_extra; + struct page *nextp, *tmp; VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); VM_BUG_ON_PAGE(get_zspage_inuse(first_page), first_page); lock_zspage(first_page); - head_extra = (struct page *)page_private(first_page); + nextp = (struct page *)page_private(first_page); /* zspage with only 1 system page */ - if (!head_extra) + if (!nextp) goto out; - list_for_each_entry_safe(nextp, tmp, &head_extra->lru, lru) { - list_del(&nextp->lru); - reset_page(nextp); -
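get_first_obj_ofs() in the patch above recomputes, rather than caches, the offset of the first object that starts in a given tail page. Assuming objects of a class are packed back to back across the zspage's component pages (as zsmalloc does), the calculation reduces to a little arithmetic. This is an independent sketch of that idea, not the kernel code:

```c
#include <assert.h>

#define PG_SIZE 4096ul  /* stand-in for PAGE_SIZE */

/* Offset, within component page `page_idx` of a zspage, of the first
 * object that *starts* in that page, given objects of `size` bytes
 * packed back to back from the start of the chain.  This is the value
 * the patch derives at run time instead of caching in page->index. */
static unsigned long first_obj_ofs(unsigned long size, unsigned long page_idx)
{
	unsigned long start = page_idx * PG_SIZE;
	/* index of the first object whose start is >= this page's start */
	unsigned long obj = (start + size - 1) / size;

	return obj * size - start;
}
```

For example, with 48-byte objects the first object starting in page 1 begins at byte 4128 (object 86), i.e. offset 32 into that page.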
[PATCH v2 04/18] zsmalloc: reordering function parameter
This patch cleans up function parameter ordering to order higher data structure first. Reviewed-by: Sergey Senozhatsky Signed-off-by: Minchan Kim --- mm/zsmalloc.c | 50 ++ 1 file changed, 26 insertions(+), 24 deletions(-) diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 6a7b9313ee8c..16556a6db628 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -569,7 +569,7 @@ static const struct file_operations zs_stat_size_ops = { .release= single_release, }; -static int zs_pool_stat_create(const char *name, struct zs_pool *pool) +static int zs_pool_stat_create(struct zs_pool *pool, const char *name) { struct dentry *entry; @@ -609,7 +609,7 @@ static void __exit zs_stat_exit(void) { } -static inline int zs_pool_stat_create(const char *name, struct zs_pool *pool) +static inline int zs_pool_stat_create(struct zs_pool *pool, const char *name) { return 0; } @@ -655,8 +655,9 @@ static enum fullness_group get_fullness_group(struct page *first_page) * have. This functions inserts the given zspage into the freelist * identified by . */ -static void insert_zspage(struct page *first_page, struct size_class *class, - enum fullness_group fullness) +static void insert_zspage(struct size_class *class, + enum fullness_group fullness, + struct page *first_page) { struct page **head; @@ -687,8 +688,9 @@ static void insert_zspage(struct page *first_page, struct size_class *class, * This function removes the given zspage from the freelist identified * by . 
*/ -static void remove_zspage(struct page *first_page, struct size_class *class, - enum fullness_group fullness) +static void remove_zspage(struct size_class *class, + enum fullness_group fullness, + struct page *first_page) { struct page **head; @@ -730,8 +732,8 @@ static enum fullness_group fix_fullness_group(struct size_class *class, if (newfg == currfg) goto out; - remove_zspage(first_page, class, currfg); - insert_zspage(first_page, class, newfg); + remove_zspage(class, currfg, first_page); + insert_zspage(class, newfg, first_page); set_zspage_mapping(first_page, class_idx, newfg); out: @@ -915,7 +917,7 @@ static void free_zspage(struct page *first_page) } /* Initialize a newly allocated zspage */ -static void init_zspage(struct page *first_page, struct size_class *class) +static void init_zspage(struct size_class *class, struct page *first_page) { unsigned long off = 0; struct page *page = first_page; @@ -1003,7 +1005,7 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags) prev_page = page; } - init_zspage(first_page, class); + init_zspage(class, first_page); first_page->freelist = location_to_obj(first_page, 0); /* Maximum number of objects we can store in this zspage */ @@ -1348,8 +1350,8 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle) } EXPORT_SYMBOL_GPL(zs_unmap_object); -static unsigned long obj_malloc(struct page *first_page, - struct size_class *class, unsigned long handle) +static unsigned long obj_malloc(struct size_class *class, + struct page *first_page, unsigned long handle) { unsigned long obj; struct link_free *link; @@ -1426,7 +1428,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size) class->size, class->pages_per_zspage)); } - obj = obj_malloc(first_page, class, handle); + obj = obj_malloc(class, first_page, handle); /* Now move the zspage to another fullness group, if required */ fix_fullness_group(class, first_page); record_obj(handle, obj); @@ -1499,8 +1501,8 @@ void zs_free(struct zs_pool 
*pool, unsigned long handle) } EXPORT_SYMBOL_GPL(zs_free); -static void zs_object_copy(unsigned long dst, unsigned long src, - struct size_class *class) +static void zs_object_copy(struct size_class *class, unsigned long dst, + unsigned long src) { struct page *s_page, *d_page; unsigned long s_objidx, d_objidx; @@ -1566,8 +1568,8 @@ static void zs_object_copy(unsigned long dst, unsigned long src, * Find alloced object in zspage from index object and * return handle. */ -static unsigned long find_alloced_obj(struct page *page, int index, - struct size_class *class) +static unsigned long find_alloced_obj(struct size_class *class, + struct page *page, int index) { unsigned long head; int offset = 0; @@ -1617,7 +1619,7 @@ static int migrate_zspage(struct zs_pool *p
[PATCH v2 07/18] zsmalloc: squeeze inuse into page->mapping
Currently, we store the class:fullness pair in page->mapping. The number of classes we can support is 255 and the number of fullness groups is 4, so 8 + 2 = 10 bits are enough to represent them. Meanwhile, 11 bits are enough to store the number of in-use objects in a zspage. For example, if we assume a 64K PAGE_SIZE and class_size 32, which is the worst case, class->pages_per_zspage becomes 1, so the number of objects in a zspage is 2048 and 11 bits are enough. For the next class, 32 + 256 (i.e., ZS_SIZE_CLASS_DELTA), even with the worst-case ZS_MAX_PAGES_PER_ZSPAGE, 64K * 4 / (32 + 256) = 910, so 11 bits are still enough. So we can squeeze the in-use object count into page->mapping.

Signed-off-by: Minchan Kim
---
 mm/zsmalloc.c | 103 --
 1 file changed, 71 insertions(+), 32 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 8649d0243e6c..4dd72a803568 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -34,8 +34,7 @@
  *	metadata.
  *	page->lru: links together first pages of various zspages.
  *		Basically forming list of zspages in a fullness group.
- *	page->mapping: class index and fullness group of the zspage
- *	page->inuse: the number of objects that are used in this zspage
+ *	page->mapping: override by struct zs_meta
  *
  * Usage of struct page flags:
  *	PG_private: identifies the first component page
@@ -132,6 +131,13 @@
 /* each chunk includes extra space to keep handle */
 #define ZS_MAX_ALLOC_SIZE	PAGE_SIZE
 
+#define CLASS_BITS	8
+#define CLASS_MASK	((1 << CLASS_BITS) - 1)
+#define FULLNESS_BITS	2
+#define FULLNESS_MASK	((1 << FULLNESS_BITS) - 1)
+#define INUSE_BITS	11
+#define INUSE_MASK	((1 << INUSE_BITS) - 1)
+
 /*
  * On systems with 4K page size, this gives 255 size classes!
There is a * trader-off here: @@ -145,7 +151,7 @@ * ZS_MIN_ALLOC_SIZE and ZS_SIZE_CLASS_DELTA must be multiple of ZS_ALIGN * (reason above) */ -#define ZS_SIZE_CLASS_DELTA(PAGE_SIZE >> 8) +#define ZS_SIZE_CLASS_DELTA(PAGE_SIZE >> CLASS_BITS) /* * We do not maintain any list for completely empty or full pages @@ -155,7 +161,7 @@ enum fullness_group { ZS_ALMOST_EMPTY, _ZS_NR_FULLNESS_GROUPS, - ZS_EMPTY, + ZS_EMPTY = _ZS_NR_FULLNESS_GROUPS, ZS_FULL }; @@ -263,14 +269,11 @@ struct zs_pool { #endif }; -/* - * A zspage's class index and fullness group - * are encoded in its (first)page->mapping - */ -#define CLASS_IDX_BITS 28 -#define FULLNESS_BITS 4 -#define CLASS_IDX_MASK ((1 << CLASS_IDX_BITS) - 1) -#define FULLNESS_MASK ((1 << FULLNESS_BITS) - 1) +struct zs_meta { + unsigned long class:CLASS_BITS; + unsigned long fullness:FULLNESS_BITS; + unsigned long inuse:INUSE_BITS; +}; struct mapping_area { #ifdef CONFIG_PGTABLE_MAPPING @@ -412,28 +415,61 @@ static int is_last_page(struct page *page) return PagePrivate2(page); } +static int get_zspage_inuse(struct page *first_page) +{ + struct zs_meta *m; + + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); + + m = (struct zs_meta *)&first_page->mapping; + + return m->inuse; +} + +static void set_zspage_inuse(struct page *first_page, int val) +{ + struct zs_meta *m; + + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); + + m = (struct zs_meta *)&first_page->mapping; + m->inuse = val; +} + +static void mod_zspage_inuse(struct page *first_page, int val) +{ + struct zs_meta *m; + + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); + + m = (struct zs_meta *)&first_page->mapping; + m->inuse += val; +} + static void get_zspage_mapping(struct page *first_page, unsigned int *class_idx, enum fullness_group *fullness) { - unsigned long m; + struct zs_meta *m; + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); - m = (unsigned long)first_page->mapping; - *fullness = m & FULLNESS_MASK; - *class_idx = (m >> 
FULLNESS_BITS) & CLASS_IDX_MASK; + m = (struct zs_meta *)&first_page->mapping; + *fullness = m->fullness; + *class_idx = m->class; } static void set_zspage_mapping(struct page *first_page, unsigned int class_idx, enum fullness_group fullness) { - unsigned long m; + struct zs_meta *m; + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); - m = ((class_idx & CLASS_IDX_MASK) << FULLNESS_BITS) | - (fullness & FULLNESS_MASK); - first_page->mapping = (struct address_space *)m; + m = (struct zs_meta *)&first_page->mapping; + m->fullness = fullness; + m->class = class_idx; } /* @@ -632,9 +668,7 @@ static enum fullness_group get_fullness_group(struct size_class *class, int inuse, objs_per_zspage; enum fullness_group fg; - VM_BU
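The trick in this patch is that 8 + 2 + 11 = 21 bits fit in the single unsigned long that page->mapping provides, so a bitfield overlay can carry the class, fullness group, and in-use count at once. A userspace sketch of the overlay-by-cast the patch uses on &page->mapping (toy code, not the kernel's struct page):

```c
#include <assert.h>

/* Toy mirror of the patch's struct zs_meta: class(8) + fullness(2) +
 * inuse(11) = 21 bits, which fits in one unsigned long. */
struct zs_meta {
	unsigned long class:8;
	unsigned long fullness:2;
	unsigned long inuse:11;
};

/* Pack the three fields into a bare word, the way the kernel code
 * overlays zs_meta on first_page->mapping. */
static unsigned long meta_pack(unsigned int class, unsigned int fullness,
			       unsigned int inuse)
{
	unsigned long word = 0;
	struct zs_meta *m = (struct zs_meta *)&word;

	m->class = class;
	m->fullness = fullness;
	m->inuse = inuse;
	return word;
}

static unsigned int meta_class(unsigned long word)
{
	return ((struct zs_meta *)&word)->class;
}

static unsigned int meta_inuse(unsigned long word)
{
	return ((struct zs_meta *)&word)->inuse;
}
```

The maximum field values (class 255, fullness 3, inuse 2047) round-trip without interfering with each other, which is exactly the property the patch relies on.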
[PATCH v2 17/18] zsmalloc: migrate tail pages in zspage
This patch enables tail page migration of zspage. In this point, I tested zsmalloc regression with micro-benchmark which does zs_malloc/map/unmap/zs_free for all size class in every CPU(my system is 12) during 20 sec. It shows 1% regression which is really small when we consider the benefit of this feature and realworkload overhead(i.e., most overhead comes from compression). Signed-off-by: Minchan Kim --- mm/zsmalloc.c | 131 +++--- 1 file changed, 115 insertions(+), 16 deletions(-) diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 9b4b03d8f993..35bafa0bc3f1 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -551,6 +551,19 @@ static void set_zspage_mapping(struct page *first_page, m->class = class_idx; } +static bool check_isolated_page(struct page *first_page) +{ + struct page *cursor; + + for (cursor = first_page; cursor != NULL; cursor = + get_next_page(cursor)) { + if (PageIsolated(cursor)) + return true; + } + + return false; +} + /* * zsmalloc divides the pool into various size classes where each * class maintains a list of zspages where each zspage is divided @@ -1052,6 +1065,44 @@ void lock_zspage(struct page *first_page) } while ((cursor = get_next_page(cursor)) != NULL); } +int trylock_zspage(struct page *first_page, struct page *locked_page) +{ + struct page *cursor, *fail; + + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); + + for (cursor = first_page; cursor != NULL; cursor = + get_next_page(cursor)) { + if (cursor != locked_page) { + if (!trylock_page(cursor)) { + fail = cursor; + goto unlock; + } + } + } + + return 1; +unlock: + for (cursor = first_page; cursor != fail; cursor = + get_next_page(cursor)) { + if (cursor != locked_page) + unlock_page(cursor); + } + + return 0; +} + +void unlock_zspage(struct page *first_page, struct page *locked_page) +{ + struct page *cursor = first_page; + + for (; cursor != NULL; cursor = get_next_page(cursor)) { + VM_BUG_ON_PAGE(!PageLocked(cursor), cursor); + if (cursor != locked_page) + unlock_page(cursor); 
+ }; +} + static void free_zspage(struct zs_pool *pool, struct page *first_page) { struct page *nextp, *tmp; @@ -1090,16 +1141,17 @@ static void init_zspage(struct size_class *class, struct page *first_page, first_page->freelist = NULL; INIT_LIST_HEAD(&first_page->lru); set_zspage_inuse(first_page, 0); - BUG_ON(!trylock_page(first_page)); - first_page->mapping = mapping; - __SetPageMovable(first_page); - unlock_page(first_page); while (page) { struct page *next_page; struct link_free *link; void *vaddr; + BUG_ON(!trylock_page(page)); + page->mapping = mapping; + __SetPageMovable(page); + unlock_page(page); + vaddr = kmap_atomic(page); link = (struct link_free *)vaddr + off / sizeof(*link); @@ -1850,6 +1902,7 @@ static enum fullness_group putback_zspage(struct size_class *class, VM_BUG_ON_PAGE(!list_empty(&first_page->lru), first_page); VM_BUG_ON_PAGE(ZsPageIsolate(first_page), first_page); + VM_BUG_ON_PAGE(check_isolated_page(first_page), first_page); fullness = get_fullness_group(class, first_page); insert_zspage(class, fullness, first_page); @@ -1956,6 +2009,12 @@ static struct page *isolate_source_page(struct size_class *class) if (!page) continue; + /* To prevent race between object and page migration */ + if (!trylock_zspage(page, NULL)) { + page = NULL; + continue; + } + remove_zspage(class, i, page); inuse = get_zspage_inuse(page); @@ -1964,6 +2023,7 @@ static struct page *isolate_source_page(struct size_class *class) if (inuse != freezed) { unfreeze_zspage(class, page, freezed); putback_zspage(class, page); + unlock_zspage(page, NULL); page = NULL; continue; } @@ -1995,6 +2055,12 @@ static struct page *isolate_target_page(struct size_class *class) if (!page) continue; + /* To prevent race between object and page migration */ + if (!trylock_zspage(page, NULL)) { + page = NULL; +
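trylock_zspage() in the patch above is a classic all-or-nothing trylock: take each page lock in chain order and, on the first failure, release everything taken so far so the caller backs off instead of deadlocking. A toy model with plain flags standing in for page locks (hypothetical helper names):

```c
#include <assert.h>

/* Model of trylock_zspage()'s all-or-nothing locking: try to take every
 * lock in the chain; on the first failure, drop the ones already taken
 * and report failure. locks[i] == 1 means "held". */
static int trylock_all(int *locks, int n)
{
	int i, j;

	for (i = 0; i < n; i++) {
		if (locks[i]) {               /* trylock failed */
			for (j = 0; j < i; j++)
				locks[j] = 0; /* roll back our locks */
			return 0;
		}
		locks[i] = 1;                 /* trylock succeeded */
	}
	return 1;
}

/* Test helper: a chain of n locks with one pre-held slot (-1 = none).
 * On failure, verify the rollback left only the contended lock held. */
static int chain_trylock(int n, int busy)
{
	int locks[8] = {0};
	int i, ret;

	if (busy >= 0)
		locks[busy] = 1;
	ret = trylock_all(locks, n);
	if (!ret)
		for (i = 0; i < n; i++)
			assert(locks[i] == (i == busy));
	return ret;
}
```

The rollback is what makes the pattern safe to call with other locks already held, which is why isolate_source_page()/isolate_target_page() can use it under the class lock.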
[PATCH v2 08/18] zsmalloc: squeeze freelist into page->mapping
Zsmalloc stores first free object's position into first_page->freelist in each zspage. If we change it with object index from first_page instead of location, we could squeeze it into page->mapping because the number of bit we need to store offset is at most 11bit. Signed-off-by: Minchan Kim --- mm/zsmalloc.c | 159 +++--- 1 file changed, 96 insertions(+), 63 deletions(-) diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 4dd72a803568..0c8ccd87c084 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -18,9 +18,7 @@ * Usage of struct page fields: * page->private: points to the first component (0-order) page * page->index (union with page->freelist): offset of the first object - * starting in this page. For the first page, this is - * always 0, so we use this field (aka freelist) to point - * to the first free object in zspage. + * starting in this page. * page->lru: links together all component pages (except the first page) * of a zspage * @@ -29,9 +27,6 @@ * page->private: refers to the component page after the first page * If the page is first_page for huge object, it stores handle. * Look at size_class->huge. - * page->freelist: points to the first free object in zspage. - * Free objects are linked together using in-place - * metadata. * page->lru: links together first pages of various zspages. * Basically forming list of zspages in a fullness group. * page->mapping: override by struct zs_meta @@ -131,6 +126,7 @@ /* each chunk includes extra space to keep handle */ #define ZS_MAX_ALLOC_SIZE PAGE_SIZE +#define FREEOBJ_BITS 11 #define CLASS_BITS 8 #define CLASS_MASK ((1 << CLASS_BITS) - 1) #define FULLNESS_BITS 2 @@ -228,17 +224,17 @@ struct size_class { /* * Placed within free objects to form a singly linked list. - * For every zspage, first_page->freelist gives head of this list. + * For every zspage, first_page->freeobj gives head of this list. 
* * This must be power of 2 and less than or equal to ZS_ALIGN */ struct link_free { union { /* -* Position of next free chunk (encodes ) +* free object list * It's valid for non-allocated object */ - void *next; + unsigned long next; /* * Handle of allocated object. */ @@ -270,6 +266,7 @@ struct zs_pool { }; struct zs_meta { + unsigned long freeobj:FREEOBJ_BITS; unsigned long class:CLASS_BITS; unsigned long fullness:FULLNESS_BITS; unsigned long inuse:INUSE_BITS; @@ -446,6 +443,26 @@ static void mod_zspage_inuse(struct page *first_page, int val) m->inuse += val; } +static void set_freeobj(struct page *first_page, int idx) +{ + struct zs_meta *m; + + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); + + m = (struct zs_meta *)&first_page->mapping; + m->freeobj = idx; +} + +static unsigned long get_freeobj(struct page *first_page) +{ + struct zs_meta *m; + + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); + + m = (struct zs_meta *)&first_page->mapping; + return m->freeobj; +} + static void get_zspage_mapping(struct page *first_page, unsigned int *class_idx, enum fullness_group *fullness) @@ -837,30 +854,33 @@ static struct page *get_next_page(struct page *page) return next; } -/* - * Encode as a single handle value. - * We use the least bit of handle for tagging. 
- */ -static void *location_to_obj(struct page *page, unsigned long obj_idx) +static void objidx_to_page_and_offset(struct size_class *class, + struct page *first_page, + unsigned long obj_idx, + struct page **obj_page, + unsigned long *offset_in_page) { - unsigned long obj; + int i; + unsigned long offset; + struct page *cursor; + int nr_page; - if (!page) { - VM_BUG_ON(obj_idx); - return NULL; - } + offset = obj_idx * class->size; + cursor = first_page; + nr_page = offset >> PAGE_SHIFT; - obj = page_to_pfn(page) << OBJ_INDEX_BITS; - obj |= ((obj_idx) & OBJ_INDEX_MASK); - obj <<= OBJ_TAG_BITS; + *offset_in_page = offset & ~PAGE_MASK; + + for (i = 0; i < nr_page; i++) + cursor = get_next_page(cursor); - return (void *)obj; + *obj_page = cursor; } -/* - * Decode pair from the given object handle. We adjust the - * decoded obj_idx back to its original value since it was adjusted in - * location_to_obj(). +/** + * obj_to_location - get (, ) fro
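Once only a free-object index is stored in the metadata, the (page, offset) pair must be derived at run time, as objidx_to_page_and_offset() above does: the byte offset is obj_idx * class->size, the number of get_next_page() steps is that offset divided by PAGE_SIZE, and the remainder is the offset within the final page. A sketch of the arithmetic (the chain walk is elided; a page index stands in for it):

```c
#include <assert.h>

#define PG_SHIFT 12                 /* stand-in for PAGE_SHIFT (4K pages) */
#define PG_SIZE  (1ul << PG_SHIFT)

/* How many links of the page chain to follow for object obj_idx. */
static unsigned long objidx_page(unsigned long size, unsigned long obj_idx)
{
	return (obj_idx * size) >> PG_SHIFT;
}

/* Offset of object obj_idx within that page. */
static unsigned long objidx_offset(unsigned long size, unsigned long obj_idx)
{
	return (obj_idx * size) & (PG_SIZE - 1);
}
```

For 48-byte objects, object 100 sits at byte 4800, i.e. one page down the chain at offset 704; the shift and mask replace a stored pointer with two cheap operations.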
[PATCH v2 15/18] zsmalloc: migrate head page of zspage
This patch introduces a run-time migration feature for zspages. To begin with, it supports only head page migration for easy review (later patches will support tail page migration). For migration, it provides three functions:

* zs_page_isolate
  It isolates from its class a zspage that includes a subpage the VM wants to migrate, so no one can allocate a new object from the zspage. IOW, an allocation freeze.

* zs_page_migrate
  First of all, it freezes the zspage to prevent zspage destruction, so no one can free an object. Then it copies content from the old page to the new page and creates a new page chain with the new page. If that was successful, it drops the refcount of the old page to free it and puts the new zspage back into the right zsmalloc data structure. Lastly, it unfreezes the zspage so object allocation/free is allowed again.

* zs_page_putback
  It returns an isolated zspage to the right fullness_group list if migrating a page fails.

NOTE: A hurdle to supporting migration is a zspage being destroyed while migration is going on. Once a zspage is isolated, no one can allocate an object from it, but objects can still be deallocated freely, so the zspage could be destroyed until all objects in it are frozen to prevent deallocation. The problem is the large window between zs_page_isolate and freeze_zspage in zs_page_migrate, during which the zspage could be destroyed. An easy approach would be to freeze objects in zs_page_isolate, but it has the drawback that no object can be deallocated between isolation and a failed migration. Since there is a large time gap between isolation and migration, any object free on another CPU would have to spin on pin_tag, which would cause big latency. So this patch introduces lock_zspage, which holds the PG_lock of all pages in a zspage right before freeing the zspage. VM migration locks the page, too, right before calling ->migratepage, so such a race doesn't exist any more.
Signed-off-by: Minchan Kim --- include/uapi/linux/magic.h | 1 + mm/zsmalloc.c | 329 +++-- 2 files changed, 317 insertions(+), 13 deletions(-) diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h index e1fbe72c39c0..93b1affe4801 100644 --- a/include/uapi/linux/magic.h +++ b/include/uapi/linux/magic.h @@ -79,5 +79,6 @@ #define NSFS_MAGIC 0x6e736673 #define BPF_FS_MAGIC 0xcafe4a11 #define BALLOON_KVM_MAGIC 0x13661366 +#define ZSMALLOC_MAGIC 0x58295829 #endif /* __LINUX_MAGIC_H__ */ diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 990d752fb65b..b3b31fdfea0f 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -56,6 +56,8 @@ #include #include #include +#include +#include /* * This must be power of 2 and greater than of equal to sizeof(link_free). @@ -182,6 +184,8 @@ struct zs_size_stat { static struct dentry *zs_stat_root; #endif +static struct vfsmount *zsmalloc_mnt; + /* * number of size_classes */ @@ -263,6 +267,7 @@ struct zs_pool { #ifdef CONFIG_ZSMALLOC_STAT struct dentry *stat_dentry; #endif + struct inode *inode; }; struct zs_meta { @@ -412,6 +417,29 @@ static int is_last_page(struct page *page) return PagePrivate2(page); } +/* + * Indicate that whether zspage is isolated for page migration. 
+ * Protected by size_class lock + */ +static void SetZsPageIsolate(struct page *first_page) +{ + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); + SetPageUptodate(first_page); +} + +static int ZsPageIsolate(struct page *first_page) +{ + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); + + return PageUptodate(first_page); +} + +static void ClearZsPageIsolate(struct page *first_page) +{ + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); + ClearPageUptodate(first_page); +} + static int get_zspage_inuse(struct page *first_page) { struct zs_meta *m; @@ -783,8 +811,11 @@ static enum fullness_group fix_fullness_group(struct size_class *class, if (newfg == currfg) goto out; - remove_zspage(class, currfg, first_page); - insert_zspage(class, newfg, first_page); + /* Later, putback will insert page to right list */ + if (!ZsPageIsolate(first_page)) { + remove_zspage(class, currfg, first_page); + insert_zspage(class, newfg, first_page); + } set_zspage_mapping(first_page, class_idx, newfg); out: @@ -950,13 +981,31 @@ static void unpin_tag(unsigned long handle) static void reset_page(struct page *page) { + __ClearPageMovable(page); clear_bit(PG_private, &page->flags); clear_bit(PG_private_2, &page->flags); set_page_private(page, 0); page->freelist = NULL; + page->mapping = NULL; page_mapcount_reset(page); } +/** + * lock_zspage - lock all pages in the zspage + * @first_page: head page of the zspage + * + * To prevent destroy during migration, zspage freeing should + * hold locks of all pages in a zspage + */
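The fix_fullness_group() hunk above skips the freelist move while the zspage is isolated; zs_page_putback() is what re-inserts it later. A toy state model of that guard (hypothetical names, not kernel code; the isolated flag stands in for ZsPageIsolate()):

```c
#include <assert.h>

enum fullness { ZS_ALMOST_FULL, ZS_ALMOST_EMPTY };

struct zspage_state {
	int isolated;           /* models ZsPageIsolate() */
	enum fullness listed;   /* which class freelist the zspage sits on */
};

/* Models the patched fix_fullness_group(): while a zspage is isolated
 * for migration, do not move it between freelists -- "Later, putback
 * will insert page to right list". */
static void fix_fullness(struct zspage_state *z, enum fullness newfg)
{
	if (!z->isolated)
		z->listed = newfg;
}

/* Models zs_page_putback(): isolation ends, re-insert into the right list. */
static void putback(struct zspage_state *z, enum fullness fg)
{
	z->isolated = 0;
	z->listed = fg;
}

/* scenario helpers for the assertions below */
static enum fullness after_fix(int isolated)
{
	struct zspage_state z = { isolated, ZS_ALMOST_FULL };

	fix_fullness(&z, ZS_ALMOST_EMPTY);
	return z.listed;
}

static enum fullness after_putback(void)
{
	struct zspage_state z = { 1, ZS_ALMOST_FULL };

	putback(&z, ZS_ALMOST_EMPTY);
	return z.listed;
}
```

The invariant being modeled: an isolated zspage is on no freelist the compaction path may touch, so only putback is allowed to (re)insert it.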
[PATCH v2 14/18] mm/balloon: use general movable page feature into balloon
Now, VM has a feature to migrate non-lru movable pages so balloon doesn't need custom migration hooks in migrate.c and compact.c. Instead, this patch implements page->mapping ->{isolate|migrate|putback} functions. With that, we could remove hooks for ballooning in general migration functions and make balloon compaction simple. Cc: virtualizat...@lists.linux-foundation.org Cc: Rafael Aquini Cc: Konstantin Khlebnikov Signed-off-by: Gioh Kim Signed-off-by: Minchan Kim --- drivers/virtio/virtio_balloon.c| 45 - include/linux/balloon_compaction.h | 47 - include/linux/page-flags.h | 52 +++ include/uapi/linux/magic.h | 1 + mm/balloon_compaction.c| 101 - mm/compaction.c| 7 --- mm/migrate.c | 22 ++-- mm/vmscan.c| 2 +- 8 files changed, 113 insertions(+), 164 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 7b6d74f0c72f..46a69b6a0c4f 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -30,6 +30,7 @@ #include #include #include +#include /* * Balloon device works in 4K page units. 
So each page is pointed to by @@ -45,6 +46,10 @@ static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES; module_param(oom_pages, int, S_IRUSR | S_IWUSR); MODULE_PARM_DESC(oom_pages, "pages to free on OOM"); +#ifdef CONFIG_BALLOON_COMPACTION +static struct vfsmount *balloon_mnt; +#endif + struct virtio_balloon { struct virtio_device *vdev; struct virtqueue *inflate_vq, *deflate_vq, *stats_vq; @@ -482,10 +487,29 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info, mutex_unlock(&vb->balloon_lock); + ClearPageIsolated(page); put_page(page); /* balloon reference */ return MIGRATEPAGE_SUCCESS; } + +static struct dentry *balloon_mount(struct file_system_type *fs_type, + int flags, const char *dev_name, void *data) +{ + static const struct dentry_operations ops = { + .d_dname = simple_dname, + }; + + return mount_pseudo(fs_type, "balloon-kvm:", NULL, &ops, + BALLOON_KVM_MAGIC); +} + +static struct file_system_type balloon_fs = { + .name = "balloon-kvm", + .mount = balloon_mount, + .kill_sb= kill_anon_super, +}; + #endif /* CONFIG_BALLOON_COMPACTION */ static int virtballoon_probe(struct virtio_device *vdev) @@ -516,12 +540,25 @@ static int virtballoon_probe(struct virtio_device *vdev) balloon_devinfo_init(&vb->vb_dev_info); #ifdef CONFIG_BALLOON_COMPACTION + balloon_mnt = kern_mount(&balloon_fs); + if (IS_ERR(balloon_mnt)) { + err = PTR_ERR(balloon_mnt); + goto out_free_vb; + } + vb->vb_dev_info.migratepage = virtballoon_migratepage; + vb->vb_dev_info.inode = alloc_anon_inode(balloon_mnt->mnt_sb); + if (IS_ERR(vb->vb_dev_info.inode)) { + err = PTR_ERR(vb->vb_dev_info.inode); + vb->vb_dev_info.inode = NULL; + goto out_unmount; + } + vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops; #endif err = init_vqs(vb); if (err) - goto out_free_vb; + goto out_unmount; vb->nb.notifier_call = virtballoon_oom_notify; vb->nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY; @@ -535,6 +572,10 @@ static int virtballoon_probe(struct virtio_device *vdev) out_oom_notify: 
vdev->config->del_vqs(vdev); +out_unmount: + if (vb->vb_dev_info.inode) + iput(vb->vb_dev_info.inode); + kern_unmount(balloon_mnt); out_free_vb: kfree(vb); out: @@ -567,6 +608,8 @@ static void virtballoon_remove(struct virtio_device *vdev) cancel_work_sync(&vb->update_balloon_stats_work); remove_common(vb); + if (vb->vb_dev_info.inode) + iput(vb->vb_dev_info.inode); kfree(vb); } diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h index 9b0a15d06a4f..43a858545844 100644 --- a/include/linux/balloon_compaction.h +++ b/include/linux/balloon_compaction.h @@ -48,6 +48,7 @@ #include #include #include +#include /* * Balloon device information descriptor. @@ -62,6 +63,7 @@ struct balloon_dev_info { struct list_head pages; /* Pages enqueued & handled to Host */ int (*migratepage)(struct balloon_dev_info *, struct page *newpage, struct page *page, enum migrate_mode mode); + struct inode *inode; }; extern struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info); @@ -73,45 +75,19 @@ static inline void balloon_devinfo_init(struct balloon_dev_info *balloon) spin_lock_init(&balloon->pages_lock); INIT_L
[PATCH v2 13/18] mm/compaction: support non-lru movable page migration
Until now we have allowed migration only for LRU pages, and that was enough to build high-order pages. But recently, embedded systems (e.g., webOS, Android) use lots of non-movable pages (e.g., zram, GPU memory), so we have seen several reports about failures of small high-order allocations. There were several efforts to fix the problem (e.g., enhancing the compaction algorithm, SLUB fallback to 0-order pages, reserved memory, vmalloc and so on), but if there are lots of non-movable pages in the system, those solutions are void in the long run.

So this patch adds a facility to make non-movable pages movable. It introduces migration-related functions in address_space_operations as well as some page flags. Basically, this patch supports two page flags and two functions related to page migration. The flag and page->mapping stability are protected by PG_lock.

PG_movable
PG_isolated

bool (*isolate_page) (struct page *, isolate_mode_t);
void (*putback_page) (struct page *);

The duties of a subsystem that wants to make its pages migratable are as follows:

1. It should register an address_space to page->mapping, then mark the page as PG_movable via __SetPageMovable.
2. It should mark the page as PG_isolated via SetPageIsolated if isolation is successful, and return true.
3. If migration is successful, it should clear PG_isolated and PG_movable of the page in preparation for freeing, then release its reference to the page to free it.
4. If migration fails, the subsystem's putback function should clear PG_isolated via ClearPageIsolated.
Cc: Vlastimil Babka Cc: Mel Gorman Cc: Hugh Dickins Cc: dri-de...@lists.freedesktop.org Cc: virtualizat...@lists.linux-foundation.org Signed-off-by: Gioh Kim Signed-off-by: Minchan Kim --- Documentation/filesystems/Locking | 4 + Documentation/filesystems/vfs.txt | 5 ++ fs/proc/page.c | 3 + include/linux/fs.h | 2 + include/linux/migrate.h| 2 + include/linux/page-flags.h | 29 include/uapi/linux/kernel-page-flags.h | 1 + mm/compaction.c| 14 +++- mm/migrate.c | 132 + 9 files changed, 177 insertions(+), 15 deletions(-) diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 619af9bfdcb3..0bb79560abb3 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -195,7 +195,9 @@ unlocks and drops the reference. int (*releasepage) (struct page *, int); void (*freepage)(struct page *); int (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset); + bool (*isolate_page) (struct page *, isolate_mode_t); int (*migratepage)(struct address_space *, struct page *, struct page *); + void (*putback_page) (struct page *); int (*launder_page)(struct page *); int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long); int (*error_remove_page)(struct address_space *, struct page *); @@ -219,7 +221,9 @@ invalidatepage: yes releasepage: yes freepage: yes direct_IO: +isolate_page: yes migratepage: yes (both) +putback_page: yes launder_page: yes is_partially_uptodate: yes error_remove_page: yes diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index b02a7d598258..4c1b6c3b4bc8 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -592,9 +592,14 @@ struct address_space_operations { int (*releasepage) (struct page *, int); void (*freepage)(struct page *); ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset); + /* isolate a page for migration */ + bool (*isolate_page) (struct page *, isolate_mode_t); /* migrate the 
contents of a page to the specified target */ int (*migratepage) (struct page *, struct page *); + /* put the page back to right list */ + void (*putback_page) (struct page *); int (*launder_page) (struct page *); + int (*is_partially_uptodate) (struct page *, unsigned long, unsigned long); void (*is_dirty_writeback) (struct page *, bool *, bool *); diff --git a/fs/proc/page.c b/fs/proc/page.c index 712f1b9992cc..e2066e73a9b8 100644 --- a/fs/proc/page.c +++ b/fs/proc/page.c @@ -157,6 +157,9 @@ u64 stable_page_flags(struct page *page) if (page_is_idle(page)) u |= 1 << KPF_IDLE; + if (PageMovable(page)) + u |= 1 << KPF_MOVABLE; + u |= kpf_copy_bit(k, KPF_LOCKED,PG_locked); u |= kpf_copy_bit(k, KPF_SLAB, PG_slab); diff --git a/include/linux/fs.h b/include/linux/fs.h index 14a97194b34b..b7ef2e41fa4a 100644 --- a/include/linux/fs.h +
[PATCH v2 02/18] zsmalloc: use first_page rather than page
This patch cleans up the "struct page" function parameter. Many zsmalloc functions expect that the page parameter is a "first_page", so use "first_page" rather than "page" for code readability. Reviewed-by: Sergey Senozhatsky Signed-off-by: Minchan Kim --- mm/zsmalloc.c | 62 ++- 1 file changed, 32 insertions(+), 30 deletions(-) diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index e72efb109fde..b09a80d398c9 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -413,26 +413,28 @@ static int is_last_page(struct page *page) return PagePrivate2(page); } -static void get_zspage_mapping(struct page *page, unsigned int *class_idx, +static void get_zspage_mapping(struct page *first_page, + unsigned int *class_idx, enum fullness_group *fullness) { unsigned long m; - BUG_ON(!is_first_page(page)); + BUG_ON(!is_first_page(first_page)); - m = (unsigned long)page->mapping; + m = (unsigned long)first_page->mapping; *fullness = m & FULLNESS_MASK; *class_idx = (m >> FULLNESS_BITS) & CLASS_IDX_MASK; } -static void set_zspage_mapping(struct page *page, unsigned int class_idx, +static void set_zspage_mapping(struct page *first_page, + unsigned int class_idx, enum fullness_group fullness) { unsigned long m; - BUG_ON(!is_first_page(page)); + BUG_ON(!is_first_page(first_page)); m = ((class_idx & CLASS_IDX_MASK) << FULLNESS_BITS) | (fullness & FULLNESS_MASK); - page->mapping = (struct address_space *)m; + first_page->mapping = (struct address_space *)m; } /* @@ -625,14 +627,14 @@ static inline void zs_pool_stat_destroy(struct zs_pool *pool) * the pool (not yet implemented). This function returns fullness * status of the given page.
*/ -static enum fullness_group get_fullness_group(struct page *page) +static enum fullness_group get_fullness_group(struct page *first_page) { int inuse, max_objects; enum fullness_group fg; - BUG_ON(!is_first_page(page)); + BUG_ON(!is_first_page(first_page)); - inuse = page->inuse; - max_objects = page->objects; + inuse = first_page->inuse; + max_objects = first_page->objects; if (inuse == 0) fg = ZS_EMPTY; @@ -652,12 +654,12 @@ static enum fullness_group get_fullness_group(struct page *page) * have. This functions inserts the given zspage into the freelist * identified by . */ -static void insert_zspage(struct page *page, struct size_class *class, +static void insert_zspage(struct page *first_page, struct size_class *class, enum fullness_group fullness) { struct page **head; - BUG_ON(!is_first_page(page)); + BUG_ON(!is_first_page(first_page)); if (fullness >= _ZS_NR_FULLNESS_GROUPS) return; @@ -667,7 +669,7 @@ static void insert_zspage(struct page *page, struct size_class *class, head = &class->fullness_list[fullness]; if (!*head) { - *head = page; + *head = first_page; return; } @@ -675,21 +677,21 @@ static void insert_zspage(struct page *page, struct size_class *class, * We want to see more ZS_FULL pages and less almost * empty/full. Put pages with higher ->inuse first. */ - list_add_tail(&page->lru, &(*head)->lru); - if (page->inuse >= (*head)->inuse) - *head = page; + list_add_tail(&first_page->lru, &(*head)->lru); + if (first_page->inuse >= (*head)->inuse) + *head = first_page; } /* * This function removes the given zspage from the freelist identified * by . 
*/ -static void remove_zspage(struct page *page, struct size_class *class, +static void remove_zspage(struct page *first_page, struct size_class *class, enum fullness_group fullness) { struct page **head; - BUG_ON(!is_first_page(page)); + BUG_ON(!is_first_page(first_page)); if (fullness >= _ZS_NR_FULLNESS_GROUPS) return; @@ -698,11 +700,11 @@ static void remove_zspage(struct page *page, struct size_class *class, BUG_ON(!*head); if (list_empty(&(*head)->lru)) *head = NULL; - else if (*head == page) + else if (*head == first_page) *head = (struct page *)list_entry((*head)->lru.next, struct page, lru); - list_del_init(&page->lru); + list_del_init(&first_page->lru); zs_stat_dec(class, fullness == ZS_ALMOST_EMPTY ? CLASS_ALMOST_EMPTY : CLASS_ALMOST_FULL, 1); } @@ -717,21 +719,21 @@ static void remove_zspage(struct page *page, struct size_class *class, * fullness group. */ static enum
[PATCH v2 00/18] Support non-lru page migration
Recently, I got many reports about performance degradation in embedded systems (Android mobile phones, webOS TVs and so on) and about fork failing easily. The problem was fragmentation caused by zram and GPU driver pages. Those pages cannot be migrated, so compaction cannot work well either, and the reclaimer ends up shrinking all of the working-set pages. It made systems very slow and even made fork fail easily.

Another pain point is that they cannot work with CMA. Most of the CMA memory space could be idle (i.e., it could be used for movable pages while no driver is using it), but if a driver (e.g., zram) cannot migrate its pages, that memory space is wasted. In our product, which has a big CMA area, zones were reclaimed too excessively although there was lots of free space in CMA, so the system easily became very slow.

To solve these problems, this patchset adds a facility to migrate non-LRU pages by introducing new companion functions to migratepage in address_space_operations and new page flags: (isolate_page, putback_page) and (PG_movable, PG_isolated). For details, please read the description in "mm/compaction: support non-lru movable page migration".

Originally, Gioh Kim tried to support this feature but he moved on, so I took over the work. I took much code from his work and changed it a little. Thanks, Gioh! And I should mention Konstantin Khlebnikov. He really helped Gioh at that time, so he deserves much credit, too. Thanks, Konstantin!

This patchset consists of five parts:

1. clean up migration
   mm: use put_page to free page instead of putback_lru_page

2.
zsmalloc clean-up for preparing page migration zsmalloc: use first_page rather than page zsmalloc: clean up many BUG_ON zsmalloc: reordering function parameter zsmalloc: remove unused pool param in obj_free zsmalloc: keep max_object in size_class zsmalloc: squeeze inuse into page->mapping zsmalloc: squeeze freelist into page->mapping zsmalloc: move struct zs_meta from mapping to freelist zsmalloc: factor page chain functionality out zsmalloc: separate free_zspage from putback_zspage zsmalloc: zs_compact refactoring 3. add non-lru page migration feature mm/compaction: support non-lru movable page migration 4. rework KVM memory-ballooning mm/balloon: use general movable page feature into balloon 5. add zsmalloc page migration zsmalloc: migrate head page of zspage zsmalloc: use single linked list for page chain zsmalloc: migrate tail pages in zspage zram: use __GFP_MOVABLE for memory allocation * From v1 * rebase on v4.5-mmotm-2016-03-17-15-04 * reordering patches to merge clean-up patches first * add Acked-by/Reviewed-by from Vlastimil and Sergey * use each own mount model instead of reusing anon_inode_fs - Al Viro * small changes - YiPing, Gioh Minchan Kim (18): mm: use put_page to free page instead of putback_lru_page zsmalloc: use first_page rather than page zsmalloc: clean up many BUG_ON zsmalloc: reordering function parameter zsmalloc: remove unused pool param in obj_free zsmalloc: keep max_object in size_class zsmalloc: squeeze inuse into page->mapping zsmalloc: squeeze freelist into page->mapping zsmalloc: move struct zs_meta from mapping to freelist zsmalloc: factor page chain functionality out zsmalloc: separate free_zspage from putback_zspage zsmalloc: zs_compact refactoring mm/compaction: support non-lru movable page migration mm/balloon: use general movable page feature into balloon zsmalloc: migrate head page of zspage zsmalloc: use single linked list for page chain zsmalloc: migrate tail pages in zspage zram: use __GFP_MOVABLE for memory allocation 
Documentation/filesystems/Locking |4 + Documentation/filesystems/vfs.txt |5 + drivers/block/zram/zram_drv.c |3 +- drivers/virtio/virtio_balloon.c| 45 +- fs/proc/page.c |3 + include/linux/balloon_compaction.h | 47 +- include/linux/fs.h |2 + include/linux/migrate.h|2 + include/linux/page-flags.h | 41 +- include/uapi/linux/kernel-page-flags.h |1 + include/uapi/linux/magic.h |2 + mm/balloon_compaction.c| 101 +-- mm/compaction.c| 15 +- mm/migrate.c | 198 +++-- mm/vmscan.c|2 +- mm/zsmalloc.c | 1338 +++- 16 files changed, 1284 insertions(+), 525 deletions(-) -- 1.9.1
[PATCH v2 11/18] zsmalloc: separate free_zspage from putback_zspage
Currently, putback_zspage frees the zspage under class->lock if its fullness becomes ZS_EMPTY, but that makes it hard to implement the locking scheme needed for the upcoming zspage migration. So this patch separates free_zspage from putback_zspage and frees the zspage outside class->lock, in preparation for zspage migration. Signed-off-by: Minchan Kim --- mm/zsmalloc.c | 46 +++--- 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 833da8f4ffc9..9c0ab1e92e9b 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -950,7 +950,8 @@ static void reset_page(struct page *page) page_mapcount_reset(page); } -static void free_zspage(struct page *first_page) +static void free_zspage(struct zs_pool *pool, struct size_class *class, + struct page *first_page) { struct page *nextp, *tmp, *head_extra; @@ -973,6 +974,11 @@ static void free_zspage(struct page *first_page) } reset_page(head_extra); __free_page(head_extra); + + zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage( + class->size, class->pages_per_zspage)); + atomic_long_sub(class->pages_per_zspage, + &pool->pages_allocated); } /* Initialize a newly allocated zspage */ @@ -1560,13 +1566,8 @@ void zs_free(struct zs_pool *pool, unsigned long handle) spin_lock(&class->lock); obj_free(class, obj); fullness = fix_fullness_group(class, first_page); - if (fullness == ZS_EMPTY) { - zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage( - class->size, class->pages_per_zspage)); - atomic_long_sub(class->pages_per_zspage, - &pool->pages_allocated); - free_zspage(first_page); - } + if (fullness == ZS_EMPTY) + free_zspage(pool, class, first_page); spin_unlock(&class->lock); unpin_tag(handle); @@ -1753,7 +1754,7 @@ static struct page *isolate_target_page(struct size_class *class) * @class: destination class * @first_page: target page * - * Return @fist_page's fullness_group + * Return @first_page's updated fullness_group */ static enum fullness_group putback_zspage(struct zs_pool *pool, struct size_class
*class, @@ -1765,15 +1766,6 @@ static enum fullness_group putback_zspage(struct zs_pool *pool, insert_zspage(class, fullness, first_page); set_zspage_mapping(first_page, class->index, fullness); - if (fullness == ZS_EMPTY) { - zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage( - class->size, class->pages_per_zspage)); - atomic_long_sub(class->pages_per_zspage, - &pool->pages_allocated); - - free_zspage(first_page); - } - return fullness; } @@ -1836,23 +1828,31 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class) if (!migrate_zspage(pool, class, &cc)) break; - putback_zspage(pool, class, dst_page); + VM_BUG_ON_PAGE(putback_zspage(pool, class, + dst_page) == ZS_EMPTY, dst_page); } /* Stop if we couldn't find slot */ if (dst_page == NULL) break; - putback_zspage(pool, class, dst_page); - if (putback_zspage(pool, class, src_page) == ZS_EMPTY) + VM_BUG_ON_PAGE(putback_zspage(pool, class, + dst_page) == ZS_EMPTY, dst_page); + if (putback_zspage(pool, class, src_page) == ZS_EMPTY) { pool->stats.pages_compacted += class->pages_per_zspage; - spin_unlock(&class->lock); + spin_unlock(&class->lock); + free_zspage(pool, class, src_page); + } else { + spin_unlock(&class->lock); + } + cond_resched(); spin_lock(&class->lock); } if (src_page) - putback_zspage(pool, class, src_page); + VM_BUG_ON_PAGE(putback_zspage(pool, class, + src_page) == ZS_EMPTY, src_page); spin_unlock(&class->lock); } -- 1.9.1
[PATCH v2 03/18] zsmalloc: clean up many BUG_ON
There are many BUG_ONs in zsmalloc.c, which is not recommended, so change them to the alternatives. The normal rules are as follows: 1. Avoid BUG_ON if possible; instead, use VM_BUG_ON or VM_BUG_ON_PAGE. 2. Use VM_BUG_ON_PAGE if we need to see a struct page's fields. 3. Put the assertions in primitive functions so that higher-level functions can rely on the assertion in the primitive function. 4. Don't use an assertion if the following instruction would trigger an Oops anyway. Reviewed-by: Sergey Senozhatsky Signed-off-by: Minchan Kim --- mm/zsmalloc.c | 42 +++--- 1 file changed, 15 insertions(+), 27 deletions(-) diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index b09a80d398c9..6a7b9313ee8c 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -418,7 +418,7 @@ static void get_zspage_mapping(struct page *first_page, enum fullness_group *fullness) { unsigned long m; - BUG_ON(!is_first_page(first_page)); + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); m = (unsigned long)first_page->mapping; *fullness = m & FULLNESS_MASK; @@ -430,7 +430,7 @@ static void set_zspage_mapping(struct page *first_page, enum fullness_group fullness) { unsigned long m; - BUG_ON(!is_first_page(first_page)); + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); m = ((class_idx & CLASS_IDX_MASK) << FULLNESS_BITS) | (fullness & FULLNESS_MASK); @@ -631,7 +631,8 @@ static enum fullness_group get_fullness_group(struct page *first_page) { int inuse, max_objects; enum fullness_group fg; - BUG_ON(!is_first_page(first_page)); + + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); inuse = first_page->inuse; max_objects = first_page->objects; @@ -659,7 +660,7 @@ static void insert_zspage(struct page *first_page, struct size_class *class, { struct page **head; - BUG_ON(!is_first_page(first_page)); + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); if (fullness >= _ZS_NR_FULLNESS_GROUPS) return; @@ -691,13 +692,13 @@ static void remove_zspage(struct page *first_page, struct size_class *class, { struct page **head; -
BUG_ON(!is_first_page(first_page)); + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); if (fullness >= _ZS_NR_FULLNESS_GROUPS) return; head = &class->fullness_list[fullness]; - BUG_ON(!*head); + VM_BUG_ON_PAGE(!*head, first_page); if (list_empty(&(*head)->lru)) *head = NULL; else if (*head == first_page) @@ -724,8 +725,6 @@ static enum fullness_group fix_fullness_group(struct size_class *class, int class_idx; enum fullness_group currfg, newfg; - BUG_ON(!is_first_page(first_page)); - get_zspage_mapping(first_page, &class_idx, &currfg); newfg = get_fullness_group(first_page); if (newfg == currfg) @@ -811,7 +810,7 @@ static void *location_to_obj(struct page *page, unsigned long obj_idx) unsigned long obj; if (!page) { - BUG_ON(obj_idx); + VM_BUG_ON(obj_idx); return NULL; } @@ -844,7 +843,7 @@ static unsigned long obj_to_head(struct size_class *class, struct page *page, void *obj) { if (class->huge) { - VM_BUG_ON(!is_first_page(page)); + VM_BUG_ON_PAGE(!is_first_page(page), page); return page_private(page); } else return *(unsigned long *)obj; @@ -894,8 +893,8 @@ static void free_zspage(struct page *first_page) { struct page *nextp, *tmp, *head_extra; - BUG_ON(!is_first_page(first_page)); - BUG_ON(first_page->inuse); + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); + VM_BUG_ON_PAGE(first_page->inuse, first_page); head_extra = (struct page *)page_private(first_page); @@ -921,7 +920,8 @@ static void init_zspage(struct page *first_page, struct size_class *class) unsigned long off = 0; struct page *page = first_page; - BUG_ON(!is_first_page(first_page)); + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); + while (page) { struct page *next_page; struct link_free *link; @@ -1238,7 +1238,7 @@ static bool can_merge(struct size_class *prev, int size, int pages_per_zspage) static bool zspage_full(struct page *first_page) { - BUG_ON(!is_first_page(first_page)); + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); return first_page->inuse == 
first_page->objects; } @@ -1276,14 +1276,12 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle, struct page *pages[2]; void *ret; - BUG_ON(!handle); - /* * Because we use per-cpu mapping areas shared a
[PATCH v2 01/18] mm: use put_page to free page instead of putback_lru_page
The procedure of page migration is as follows: first isolate a page from the LRU and try to migrate it. If migration succeeds, release the page for freeing; otherwise, put the page back on the LRU list.

For LRU pages, we have used putback_lru_page for both freeing and putback to the LRU list. That is okay because put_page is aware of the LRU list, so if it releases the last refcount of the page, it removes the page from the LRU list. However, it performs unnecessary operations (e.g., lru_cache_add, pagevec and flags operations; not significant, but not worth doing) and makes it harder to support the new non-LRU page migration, because put_page isn't aware of a non-LRU page's data structures.

To solve the problem, we could add a new hook in put_page with a PageMovable flag check, but that would increase overhead in a hot path and would need a new locking scheme to stabilize the flag check against put_page. So this patch cleans things up by separating the two semantics (i.e., put and putback): if migration is successful, use put_page instead of putback_lru_page, and use putback_lru_page only on failure. That makes the code more readable and adds no overhead to put_page.

Comment from Vlastimil: "Yeah, and compaction (perhaps also other migration users) has to drain the lru pvec... Getting rid of this stuff is worth even by itself."
Cc: Mel Gorman Cc: Hugh Dickins Cc: Naoya Horiguchi Acked-by: Vlastimil Babka Signed-off-by: Minchan Kim --- mm/migrate.c | 50 +++--- 1 file changed, 31 insertions(+), 19 deletions(-) diff --git a/mm/migrate.c b/mm/migrate.c index 6c822a7b27e0..b65c84267ce0 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -913,6 +913,14 @@ static int __unmap_and_move(struct page *page, struct page *newpage, put_anon_vma(anon_vma); unlock_page(page); out: + /* If migration is scucessful, move newpage to right list */ + if (rc == MIGRATEPAGE_SUCCESS) { + if (unlikely(__is_movable_balloon_page(newpage))) + put_page(newpage); + else + putback_lru_page(newpage); + } + return rc; } @@ -946,6 +954,12 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page, if (page_count(page) == 1) { /* page was freed from under us. So we are done. */ + ClearPageActive(page); + ClearPageUnevictable(page); + if (put_new_page) + put_new_page(newpage, private); + else + put_page(newpage); goto out; } @@ -958,10 +972,8 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page, } rc = __unmap_and_move(page, newpage, force, mode); - if (rc == MIGRATEPAGE_SUCCESS) { - put_new_page = NULL; + if (rc == MIGRATEPAGE_SUCCESS) set_page_owner_migrate_reason(newpage, reason); - } out: if (rc != -EAGAIN) { @@ -974,28 +986,28 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page, list_del(&page->lru); dec_zone_page_state(page, NR_ISOLATED_ANON + page_is_file_cache(page)); - /* Soft-offlined page shouldn't go through lru cache list */ + } + + /* +* If migration is successful, drop the reference grabbed during +* isolation. Otherwise, restore the page to LRU list unless we +* want to retry. 
+*/ + if (rc == MIGRATEPAGE_SUCCESS) { + put_page(page); if (reason == MR_MEMORY_FAILURE) { - put_page(page); if (!test_set_page_hwpoison(page)) num_poisoned_pages_inc(); - } else + } + } else { + if (rc != -EAGAIN) putback_lru_page(page); + if (put_new_page) + put_new_page(newpage, private); + else + put_page(newpage); } - /* -* If migration was not successful and there's a freeing callback, use -* it. Otherwise, putback_lru_page() will drop the reference grabbed -* during isolation. -*/ - if (put_new_page) - put_new_page(newpage, private); - else if (unlikely(__is_movable_balloon_page(newpage))) { - /* drop our reference, page already in the balloon */ - put_page(newpage); - } else - putback_lru_page(newpage); - if (result) { if (rc) *result = rc; -- 1.9.1
Re: [PATCH v2 3/3] Make core_pattern support namespace
Zhao Lei writes: > Currently, each container shares one copy of the coredump setting > with the host system; if the host system changes the setting, all > running containers will be affected. > > Moreover, it is not easy to let each container keep its own > coredump setting. > > We can use workarounds such as a pipe program to make the second > requirement possible, but it is not simple, and both host and > container are limited to one fixed pipe program. > In short, for a host running containers, we can't change core_pattern > anymore. > To make the problem harder, if a host runs more than one > container product, each product will try to snatch the global > coredump setting to fit its own requirements. > > For containers based on the namespace design, it is good to allow > each container to keep its own coredump setting. > > It brings us the following benefits: > 1: Each container can change its own coredump setting >based on operations on /proc/sys/kernel/core_pattern > 2: Coredump settings changed in the host will not affect >running containers. > 3: Support both the "putting coredump in guest" and >"putting coredump in host" cases. > > Each piece of namespace-based software (lxc, docker, ...) can use this function > to customize its dump setting. > > And this function makes each container work as a separate system, > which fits the design goal of namespaces There are a lot of questionable things with this patchset. > @@ -183,7 +182,7 @@ put_exe_file: > static int format_corename(struct core_name *cn, struct coredump_params > *cprm) > { > const struct cred *cred = current_cred(); > - const char *pat_ptr = core_pattern; > + const char *pat_ptr = > current->nsproxy->pid_ns_for_children->core_pattern; current->nsproxy->pid_ns_for_children, as the name implies, is completely inappropriate for getting the pid namespace of the current task. This should use task_active_pid_namespace.
> int ispipe = (*pat_ptr == '|'); > int pid_in_pattern = 0; > int err = 0; > diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h > index 918b117..a5af1e9 100644 > --- a/include/linux/pid_namespace.h > +++ b/include/linux/pid_namespace.h > @@ -9,6 +9,7 @@ > #include > #include > #include > +#include > > struct pidmap { > atomic_t nr_free; > @@ -45,6 +46,7 @@ struct pid_namespace { > int hide_pid; > int reboot; /* group exit code if this pidns was rebooted */ > struct ns_common ns; > + char core_pattern[CORENAME_MAX_SIZE]; > }; > > extern struct pid_namespace init_pid_ns; > diff --git a/kernel/pid.c b/kernel/pid.c > index 4d73a83..c79c1d5 100644 > --- a/kernel/pid.c > +++ b/kernel/pid.c > @@ -83,6 +83,7 @@ struct pid_namespace init_pid_ns = { > #ifdef CONFIG_PID_NS > .ns.ops = &pidns_operations, > #endif > + .core_pattern = "core", > }; > EXPORT_SYMBOL_GPL(init_pid_ns); > > diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c > index a65ba13..16d6d21 100644 > --- a/kernel/pid_namespace.c > +++ b/kernel/pid_namespace.c > @@ -123,6 +123,9 @@ static struct pid_namespace *create_pid_namespace(struct > user_namespace *user_ns > for (i = 1; i < PIDMAP_ENTRIES; i++) > atomic_set(&ns->pidmap[i].nr_free, BITS_PER_PAGE); > > + strncpy(ns->core_pattern, parent_pid_ns->core_pattern, > + sizeof(ns->core_pattern)); > + This is pretty horrible. You are giving unprivileged processes the ability to run an already specified core dump helper in a pid namespace of their choosing. That is not backwards compatible, and it is possible this can lead to privilege escalation by tricking a privileged dump process into doing something silly because it is running in the wrong pid namespace. Similarly the entire concept of forking from the program dumping core suffers from the same problem but for all other namespaces. I was hoping that I would see a justification somewhere in the patch descriptions describing why this set of decisions could be safe.
I do not, and so I assume this case was not considered. If you had managed to fork from the child_reaper of the pid_namespace that set the core pattern (as has been suggested), there would be some chance that things would work correctly. As you are forking from the program actually dumping core, I see no chance that this patchset is either safe or backwards compatible as currently written. Eric
[PATCH] firewire: nosy: Replace timeval with timespec64
'struct timeval' uses a 32-bit field for its 'seconds' value, which will overflow in the year 2038 and beyond. This patch replaces the use of timeval in nosy.c with timespec64, which doesn't suffer from the y2038 issue. The code is correct as is, since it only uses the microseconds portion of timeval. However, this patch does the replacement as part of a larger effort to remove all instances of 'struct timeval' from the kernel (that would help identify cases where the code is actually broken). Signed-off-by: Tina Ruchandani --- drivers/firewire/nosy.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/firewire/nosy.c b/drivers/firewire/nosy.c index 8a46077..631c977 100644 --- a/drivers/firewire/nosy.c +++ b/drivers/firewire/nosy.c @@ -446,14 +446,16 @@ static void bus_reset_irq_handler(struct pcilynx *lynx) { struct client *client; - struct timeval tv; + struct timespec64 ts64; + u32 timestamp; - do_gettimeofday(&tv); + ktime_get_real_ts64(&ts64); + timestamp = ts64.tv_nsec / NSEC_PER_USEC; spin_lock(&lynx->client_list_lock); list_for_each_entry(client, &lynx->client_list, link) - packet_buffer_put(&client->buffer, &tv.tv_usec, 4); + packet_buffer_put(&client->buffer, &timestamp, 4); spin_unlock(&lynx->client_list_lock); } -- 2.8.0.rc3.226.g39d4020
Re: [PATCH 1/4] vfs: add file_dentry()
On Thu, Mar 17, 2016 at 10:02:00AM +0100, Miklos Szeredi wrote: > Add a new helper, file_dentry() [*], to get the filesystem's own dentry > from the file. This simply compares file_inode(file->f_path.dentry) to > file_inode(file) and if they are equal returns file->f_path.dentry (this is > the common, non-overlayfs case). > > In the uncommon case (regular file on overlayfs) it will call into > overlayfs's ->d_native_dentry() to get the underlying dentry matching > file_inode(file). What's wrong with making ovl_dentry_real() an instance of optional ->d_real() method and having a flag (DCACHE_OP_REAL) controlling its calls? With d_real(dentry) returning either that or dentry itself, and file_dentry(file) being simply d_real(file->f_path.dentry)... Why do we need to look at the inode at all? d_set_d_op() dereferences ->d_op anyway, as well as setting ->d_flags, so there's no extra cost there, and "test bit in ->d_flags + branch not taken" is all it would cost in normal case...
Re: [PATCH 1/4] vfs: add file_dentry()
On Mon, Mar 21, 2016 at 01:02:15AM -0400, Theodore Ts'o wrote: > I have this patch in the ext4.git tree, but I'd like to get an > Acked-by from Al before I send a pull request to Linus. > > Al? Any objections to my sending in this change via the ext4 tree? > - Ted FWIW, I would rather add DCACHE_OP_REAL (set at d_set_d_op() time) and turn that into static inline struct dentry *d_real(const struct dentry *dentry) { if (unlikely(dentry->d_flags & DCACHE_OP_REAL)) return dentry->d_op->d_real(dentry); else return dentry; } static inline struct dentry *file_dentry(const struct file *file) { return d_real(file->f_path.dentry); } and use ovl_dentry_real as ->d_real for overlayfs. Miklos, do you see any problems with that variant?
Re: [PATCH v11 3/9] arm64: add copy_to/from_user to kprobes blacklist
Hi James, On 18/03/2016:06:12:20 PM, James Morse wrote: > Hi Pratyush, > > On 18/03/16 14:43, Pratyush Anand wrote: > > On 18/03/2016:02:02:49 PM, James Morse wrote: > >> In kernel/entry.S when entered from EL0 we test for TIF_SINGLESTEP in the > >> thread_info flags, and use disable_step_tsk/enable_step_tsk to > >> save/restore the > >> single-step state. > >> > >> Could we do this regardless of which EL we came from? > > > > Thanks for another idea. I think, we can not do this as it is, because > > TIF_SINGLESTEP will not be set for kprobe events. > > Hmmm, I see kernel_enable_single_step() doesn't set it, but setup_singlestep() > in patch 5 could... > > There is probably a good reason its never set for a kernel thread, I will > have a > look at where else it is used. > > > > But, we can introduce a > > variant disable_step_kernel and enable_step_kernel, which can be called in > > el1_da. > > What about sp/pc misalignment, or undefined instructions? > Or worse... an irq occurs during your el1_da call (el1_da may re-enable irqs). > el1_irq doesn't know you were careful not to unmask debug exceptions, it > blindly > turns them back on. > > The problem is the 'single step me' bit is still set, save/restoring it will > save us having to consider every interaction, (and then missing some!). > > It would also mean you don't have to disable interrupts while single stepping > in > patch 5 (comment above kprobes_save_local_irqflag()). I see. kernel_enable_single_step() is called from watchpoint and kgdb handler. It seems to me that, similar issue may arise there as well. So, it would be a good idea to set TIF_SINGLESTEP in kernel_enable_single_step() and clear in kernel_disable_single_step(). Meanwhile, I prepared a test case to reproduce the issue without this patch. Instrumented a kprobe at an instruction of __copy_to_user() which stores in user space memory. I can see a sea of messages "Unexpected kernel single-step exception at EL1" within few seconds. 
With patch [1] applied, I do not see any such messages. Maybe I can send [1] as an RFC and seek feedback. ~Pratyush [1] https://github.com/pratyushanand/linux/commit/7623c8099ac22eaa00e7e0f52430f7a4bd154652
[GIT PULL] MD for 4.6
Hi Linus, Could you please pull the MD update for 4.6? This update mainly fixes bugs. - A raid5 discard related fix from Jes - A MD multipath bio clone fix from Ming - raid1 error handling deadlock fix from Nate and corresponding raid10 fix from myself - A raid5 stripe batch fix from Neil - A patch from Sebastian to avoid unnecessary uevent - Several cleanup/debug patches Thanks, Shaohua The following changes since commit 6dc390ad61ac8dfca5fa9b0823981fb6f7ec17a0: Merge tag 'arc-4.5-rc6-fixes-upd' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc (2016-02-24 14:06:17 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/shli/md.git tags/md/4.6-rc1 for you to fetch changes up to 1d034e68e2c256640eb1f44bd7dcd89f90806ccf: md/raid5: Cleanup cpu hotplug notifier (2016-03-17 14:30:15 -0700) Anna-Maria Gleixner (1): md/raid5: Cleanup cpu hotplug notifier Eric Engestrom (1): md/bitmap: remove redundant check Guoqing Jiang (3): md/raid1: remove unnecessary BUG_ON md/bitmap: remove redundant return in bitmap_checkpage md: fix typos for stipe Jes Sorensen (1): md/raid5: Compare apples to apples (or sectors to sectors) Ming Lei (1): md: multipath: don't hardcopy bio in .make_request path Nate Dailey (1): raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang NeilBrown (1): md/raid5: preserve STRIPE_PREREAD_ACTIVE in break_stripe_batch_list Sebastian Parschauer (1): md: Drop sending a change uevent when stopping Shaohua Li (6): RAID5: check_reshape() shouldn't call mddev_suspend RAID5: revert e9e4c377e2f563 to fix a livelock MD: warn for potential deadlock Update MD git tree URL md/raid5: output stripe state for debug raid10: include bio_end_io_list in nr_queued to prevent freeze_array hang MAINTAINERS| 2 +- drivers/md/bitmap.c| 4 +--- drivers/md/bitmap.h| 4 ++-- drivers/md/md.c| 2 +- drivers/md/multipath.c | 4 +++- drivers/md/raid1.c | 8 --- drivers/md/raid10.c| 7 -- drivers/md/raid5.c | 63 +- drivers/md/raid5.h 
| 4 +++- 9 files changed, 58 insertions(+), 40 deletions(-)
Re: [PATCH 1/4] vfs: add file_dentry()
On Thu, Mar 17, 2016 at 10:02:00AM +0100, Miklos Szeredi wrote: > From: Miklos Szeredi > > This series fixes bugs in nfs and ext4 due to 4bacc9c9234c ("overlayfs: Make > f_path always point to the overlay and f_inode to the underlay"). > > Regular files opened on overlayfs will result in the file being opened on > the underlying filesystem, while f_path points to the overlayfs > mount/dentry. > > This confuses filesystems which get the dentry from struct file and assume > it's theirs. > > Add a new helper, file_dentry() [*], to get the filesystem's own dentry > from the file. This simply compares file_inode(file->f_path.dentry) to > file_inode(file) and if they are equal returns file->f_path.dentry (this is > the common, non-overlayfs case). > > In the uncommon case (regular file on overlayfs) it will call into > overlayfs's ->d_native_dentry() to get the underlying dentry matching > file_inode(file). > > [*] If possible, it's better simply to use file_inode() instead. > > Signed-off-by: Miklos Szeredi > Tested-by: Goldwyn Rodrigues > Reviewed-by: Trond Myklebust > Cc: # v4.2 > Cc: David Howells > Cc: Al Viro > Cc: Theodore Ts'o > Cc: Daniel Axtens > --- > fs/open.c | 11 +++ > fs/overlayfs/super.c | 16 > include/linux/dcache.h | 1 + > include/linux/fs.h | 2 ++ > 4 files changed, 30 insertions(+) I have this patch in the ext4.git tree, but I'd like to get an Acked-by from Al before I send a pull request to Linus. Al? Any objections to my sending in this change via the ext4 tree? 
- Ted > > diff --git a/fs/open.c b/fs/open.c > index 55bdc75e2172..6326c11eda78 100644 > --- a/fs/open.c > +++ b/fs/open.c > @@ -831,6 +831,17 @@ char *file_path(struct file *filp, char *buf, int buflen) > } > EXPORT_SYMBOL(file_path); > > +struct dentry *file_dentry(const struct file *file) > +{ > + struct dentry *dentry = file->f_path.dentry; > + > + if (likely(d_inode(dentry) == file_inode(file))) > + return dentry; > + else > + return dentry->d_op->d_native_dentry(dentry, file_inode(file)); > +} > +EXPORT_SYMBOL(file_dentry); > + > /** > * vfs_open - open the file at the given path > * @path: path to open > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c > index 619ad4b016d2..5142aa2034c4 100644 > --- a/fs/overlayfs/super.c > +++ b/fs/overlayfs/super.c > @@ -336,14 +336,30 @@ static int ovl_dentry_weak_revalidate(struct dentry > *dentry, unsigned int flags) > return ret; > } > > +static struct dentry *ovl_d_native_dentry(struct dentry *dentry, > + struct inode *inode) > +{ > + struct ovl_entry *oe = dentry->d_fsdata; > + struct dentry *realentry = ovl_upperdentry_dereference(oe); > + > + if (realentry && inode == d_inode(realentry)) > + return realentry; > + realentry = __ovl_dentry_lower(oe); > + if (realentry && inode == d_inode(realentry)) > + return realentry; > + BUG(); > +} > + > static const struct dentry_operations ovl_dentry_operations = { > .d_release = ovl_dentry_release, > .d_select_inode = ovl_d_select_inode, > + .d_native_dentry = ovl_d_native_dentry, > }; > > static const struct dentry_operations ovl_reval_dentry_operations = { > .d_release = ovl_dentry_release, > .d_select_inode = ovl_d_select_inode, > + .d_native_dentry = ovl_d_native_dentry, > .d_revalidate = ovl_dentry_revalidate, > .d_weak_revalidate = ovl_dentry_weak_revalidate, > }; > diff --git a/include/linux/dcache.h b/include/linux/dcache.h > index c4b5f4b3f8f8..99ecb6de636c 100644 > --- a/include/linux/dcache.h > +++ b/include/linux/dcache.h > @@ -161,6 +161,7 @@ struct 
dentry_operations { > struct vfsmount *(*d_automount)(struct path *); > int (*d_manage)(struct dentry *, bool); > struct inode *(*d_select_inode)(struct dentry *, unsigned); > + struct dentry *(*d_native_dentry)(struct dentry *, struct inode *); > } cacheline_aligned; > > /* > diff --git a/include/linux/fs.h b/include/linux/fs.h > index ae681002100a..1091d9f43271 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -1234,6 +1234,8 @@ static inline struct inode *file_inode(const struct > file *f) > return f->f_inode; > } > > +extern struct dentry *file_dentry(const struct file *file); > + > static inline int locks_lock_file_wait(struct file *filp, struct file_lock > *fl) > { > return locks_lock_inode_wait(file_inode(filp), fl); > -- > 2.1.4 >
Re: [PATCH v3] ARC: [dts] Introduce Timer bindings
On Sunday 20 March 2016 06:12 AM, Rob Herring wrote: > On Fri, Mar 18, 2016 at 10:56:29AM +0530, Vineet Gupta wrote: >> ARC Timers have historically been probed directly. >> As precursor to start probing Timers thru DT introduce these bindings >> Note that to keep series bisectable, these bindings are not yet used in >> code. >> >> Cc: Daniel Lezcano >> Cc: Rob Herring >> Cc: devicet...@vger.kernel.org >> Signed-off-by: Vineet Gupta >> --- >> v3: >> - Renamed Node name to avoid new warnings when unit address used w/o regs >> [Rob] >> v2: >> - http://lists.infradead.org/pipermail/linux-snps-arc/2016-March/000653.html >> - snps,arc-timer[0-1] folded into single snps-arc-timer [Rob] >> - Node name in DT example fixed:[Rob] >> "timer1: timer_clksrc {" -> timer@1 { >> - Introduced 64bit RTC in skeleton_hs.dtsi [Vineet] >> v1: >> - >> http://lists.infradead.org/pipermail/linux-snps-arc/2016-February/000447.html >> --- >> .../devicetree/bindings/timer/snps,arc-timer.txt | 32 >> ++ >> .../devicetree/bindings/timer/snps,archs-gfrc.txt | 14 ++ >> .../devicetree/bindings/timer/snps,archs-rtc.txt | 14 ++ >> arch/arc/boot/dts/abilis_tb10x.dtsi| 14 ++ >> arch/arc/boot/dts/skeleton.dtsi| 14 ++ >> arch/arc/boot/dts/skeleton_hs.dtsi | 20 ++ >> arch/arc/boot/dts/skeleton_hs_idu.dtsi | 14 ++ >> 7 files changed, 122 insertions(+) >> create mode 100644 >> Documentation/devicetree/bindings/timer/snps,arc-timer.txt >> create mode 100644 >> Documentation/devicetree/bindings/timer/snps,archs-gfrc.txt >> create mode 100644 >> Documentation/devicetree/bindings/timer/snps,archs-rtc.txt > Acked-by: Rob Herring Thx a bunch Rob ! -Vineet
[GIT PULL] ARC changes for 4.6-rc1
Hi Linus, ARC changes for 4.6-rc1. Nothing too exciting here although diffstat shows more files touched than usual due to some sweeping defconfig / DT updates. Please pull ! Thx, -Vineet -> The following changes since commit fc77dbd34c5c99bce46d40a2491937c3bcbd10af: Linux 4.5-rc6 (2016-02-28 08:41:20 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc.git/ tags/arc-4.6-rc1 for you to fetch changes up to deaf7565eb618a80534844300aeacffa14125182: ARCv2: ioremap: Support dynamic peripheral address space (2016-03-19 14:34:10 +0530) ARC updates for 4.6-rc1 - Big Endian io accessors fix [Lada] - Spellos fixes [Adam] - Fix for DW GMAC breakage [Alexey] - Making DMA API 64-bit ready - Shutting up -Wmaybe-uninitialized noise for ARC - Other minor fixes here and there, comments update Adam Buchbinder (1): ARC: Fix misspellings in comments. Alexey Brodkin (1): ARC: [plat-axs10x] add Ethernet PHY description in .dts Kefeng Wang (1): arc: use of_platform_default_populate() to populate default bus Lada Trimasova (2): ARC: [BE] readl()/writel() to work in Big Endian CPU configuration arc: [plat-nsimosci*] use ezchip network driver Vineet Gupta (16): ARC: bitops: Remove non relevant comments ARC: [BE] Select correct CROSS_COMPILE prefix ARC: [*defconfig] No need to specify CONFIG_CROSS_COMPILE ARCv2: Allow enabling PAE40 w/o HIGHMEM ARC: build: Better way to detect ISA compatible toolchain ARC: [plat-nsim] document ranges ARC: mm: Use virt_to_pfn() for addr >> PAGE_SHIFT pattern ARCv2: LLSC: software backoff is NOT needed starting HS2.1c ARC: thp: unbork !CONFIG_TRANSPARENT_HUGEPAGE build ARC: build: Turn off -Wmaybe-uninitialized for ARC gcc 4.8 ARC: dma: Use struct page based page allocator helpers ARC: dma: non-coherent pages need V-P mapping if in HIGHMEM ARC: dma: pass_phys() not sg_virt() to cache ops ARC: dma: ioremap: use phys_addr_t consistenctly in code paths ARC: dma: reintroduce platform specific dma<->phys ARCv2:
ioremap: Support dynamic peripheral address space arch/arc/Kconfig | 6 ++- arch/arc/Makefile | 22 - arch/arc/boot/dts/axs10x_mb.dtsi | 8 arch/arc/boot/dts/nsim_hs.dts | 3 +- arch/arc/boot/dts/nsimosci.dts | 5 +- arch/arc/boot/dts/nsimosci_hs.dts | 5 +- arch/arc/boot/dts/nsimosci_hs_idu.dts | 5 +- arch/arc/configs/axs101_defconfig | 1 - arch/arc/configs/axs103_defconfig | 1 - arch/arc/configs/axs103_smp_defconfig | 1 - arch/arc/configs/nsim_700_defconfig| 1 - arch/arc/configs/nsim_hs_defconfig | 1 - arch/arc/configs/nsim_hs_smp_defconfig | 1 - arch/arc/configs/nsimosci_defconfig| 2 +- arch/arc/configs/nsimosci_hs_defconfig | 2 +- arch/arc/configs/nsimosci_hs_smp_defconfig | 3 +- arch/arc/configs/tb10x_defconfig | 1 - arch/arc/configs/vdk_hs38_defconfig| 1 - arch/arc/configs/vdk_hs38_smp_defconfig| 1 - arch/arc/include/asm/arcregs.h | 6 --- arch/arc/include/asm/bitops.h | 15 -- arch/arc/include/asm/cache.h | 1 + arch/arc/include/asm/cacheflush.h | 6 +-- arch/arc/include/asm/cmpxchg.h | 2 +- arch/arc/include/asm/dma-mapping.h | 7 +++ arch/arc/include/asm/entry-compact.h | 2 +- arch/arc/include/asm/io.h | 22 ++--- arch/arc/include/asm/page.h| 19 +++- arch/arc/include/asm/pgtable.h | 11 ++--- arch/arc/include/asm/tlbflush.h| 7 ++- arch/arc/kernel/setup.c| 21 ++--- arch/arc/kernel/stacktrace.c | 2 +- arch/arc/kernel/time.c | 4 +- arch/arc/mm/cache.c| 39 +--- arch/arc/mm/dma.c | 75 -- arch/arc/mm/highmem.c | 2 +- arch/arc/mm/ioremap.c | 37 ++- arch/arc/mm/tlb.c | 8 ++-- 38 files changed, 201 insertions(+), 155 deletions(-)
Re: Suspicious error for CMA stress test
On Fri, Mar 18, 2016 at 02:32:35PM +0100, Lucas Stach wrote: > Hi Vlastimil, Joonsoo, > > Am Freitag, den 18.03.2016, 00:52 +0900 schrieb Joonsoo Kim: > > 2016-03-18 0:43 GMT+09:00 Vlastimil Babka : > > > On 03/17/2016 10:24 AM, Hanjun Guo wrote: > > >> > > >> On 2016/3/17 14:54, Joonsoo Kim wrote: > > >>> > > >>> On Wed, Mar 16, 2016 at 05:44:28PM +0800, Hanjun Guo wrote: > > > > On 2016/3/14 15:18, Joonsoo Kim wrote: > > > > > > On Mon, Mar 14, 2016 at 08:06:16AM +0100, Vlastimil Babka wrote: > > >> > > >> On 03/14/2016 07:49 AM, Joonsoo Kim wrote: > > >>> > > >>> On Fri, Mar 11, 2016 at 06:07:40PM +0100, Vlastimil Babka wrote: > > > > On 03/11/2016 04:00 PM, Joonsoo Kim wrote: > > > > How about something like this? Just and idea, probably buggy > > (off-by-one etc.). > > Should keep away cost from > expense of the > > relatively fewer >pageblock_order iterations. > > >>> > > >>> Hmm... I tested this and found that it's code size is a little bit > > >>> larger than mine. I'm not sure why this happens exactly but I guess > > >>> it would be > > >>> related to compiler optimization. In this case, I'm in favor of my > > >>> implementation because it looks like well abstraction. It adds one > > >>> unlikely branch to the merge loop but compiler would optimize it to > > >>> check it once. > > >> > > >> I would be surprised if compiler optimized that to check it once, as > > >> order increases with each loop iteration. But maybe it's smart > > >> enough to do something like I did by hand? Guess I'll check the > > >> disassembly. > > > > > > Okay. I used following slightly optimized version and I need to > > > add 'max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1)' > > > to yours. Please consider it, too. > > > > Hmm, this one is not work, I still can see the bug is there after > > applying > > this patch, did I miss something? > > >>> > > >>> I may find that there is a bug which was introduced by me some time > > >>> ago. 
Could you test following change in __free_one_page() on top of > > >>> Vlastimil's patch? > > >>> > > >>> -page_idx = pfn & ((1 << max_order) - 1); > > >>> +page_idx = pfn & ((1 << MAX_ORDER) - 1); > > >> > > >> > > >> I tested Vlastimil's patch + your change with stress for more than half > > >> hour, the bug > > >> I reported is gone :) > > > > > > > > > Oh, ok, will try to send proper patch, once I figure out what to write in > > > the changelog :) > > > > Thanks in advance! > > After digging into the "PFN busy" race in CMA (see [1]), I believe we > should just prevent any buddy merging in isolated ranges. This fixes the > race I'm seeing without the need to hold the zone lock for extend > periods of time. "PFNs busy" can be caused by other types of race, too. I guess those other cases happen more often than buddy merging. Do you have a test case for your problem? If it is indeed a problem, you can avoid it with a simple retry of alloc_contig_range() up to MAX_ORDER times. This is rather dirty, but the reason I suggest it is that there are other types of race in __alloc_contig_range(), and a retry could help with them, too. For example, if some of the pages in the requested range aren't attached to the LRU yet, or are detached from the LRU but not yet freed to buddy, test_pages_isolated() can fail. Thanks.
RE: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI commmand timeout
> -Original Message- > From: Alan Stern [mailto:st...@rowland.harvard.edu] > Sent: Friday, March 18, 2016 7:51 PM > To: Rajesh Bhagat > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; > gre...@linuxfoundation.org; mathias.ny...@intel.com; Sriram Dash > > Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI > commmand timeout > > On Fri, 18 Mar 2016, Rajesh Bhagat wrote: > > > --- a/drivers/usb/core/hub.c > > +++ b/drivers/usb/core/hub.c > > @@ -2897,10 +2897,14 @@ done: > > /* The xHC may think the device is already reset, > > * so ignore the status. > > */ > > - if (hcd->driver->reset_device) > > - hcd->driver->reset_device(hcd, udev); > > - > > - usb_set_device_state(udev, USB_STATE_DEFAULT); > > + if (hcd->driver->reset_device) { > > + status = hcd->driver->reset_device(hcd, udev); > > + if (status == 0) > > + usb_set_device_state(udev, > > USB_STATE_DEFAULT); > > + else > > + usb_set_device_state(udev, > USB_STATE_NOTATTACHED); > > + } else > > + usb_set_device_state(udev, USB_STATE_DEFAULT); > > This is a really bad patch: > > You left in the comment about ignoring the status, but then you changed the code so > that it doesn't ignore the status! > My apologies, I completely missed the above comment, which was added earlier. > You also called usb_set_device_state() more times than necessary. You could > have > done it like this: > > if (hcd->driver->reset_device) > status = hcd->driver->reset_device(hcd, udev); > if (status == 0) > usb_set_device_state(udev, USB_STATE_DEFAULT); > else > usb_set_device_state(udev, > USB_STATE_NOTATTACHED); > > (Even that could be simplified further, by combining it with the code that > follows.) > > Finally, you violated the 80-column limit. > I agree with your point. The intent was to check the return status of reset_device, which is currently being ignored.
I encountered the reset_device failure case in a resume operation (STR), which increases the resume time and causes unexpected crashes if the return value is not checked. Do you agree it should be checked here? If yes, I can rework this patch. > Alan Stern
Re: [PATCHv4 08/25] thp: support file pages in zap_huge_pmd()
"Kirill A. Shutemov" writes: > [ text/plain ] > On Fri, Mar 18, 2016 at 07:23:41PM +0530, Aneesh Kumar K.V wrote: >> "Kirill A. Shutemov" writes: >> >> > [ text/plain ] >> > split_huge_pmd() for file mappings (and DAX too) is implemented by just >> > clearing pmd entry as we can re-fill this area from page cache on pte >> > level later. >> > >> > This means we don't need deposit page tables when file THP is mapped. >> > Therefore we shouldn't try to withdraw a page table on zap_huge_pmd() >> > file THP PMD. >> >> Archs like ppc64 use deposited page table to track the hardware page >> table slot information. We probably may want to add hooks which arch can >> use to achieve the same even with file THP > > Could you describe more on what kind of information you're talking about? > Hardware page table in ppc64 requires us to map each subpage of the huge page. This is needed because at low level we use segment base page size to find the hash slot and on TLB miss, we use the faulting address and base page size (which is 64k even with THP) to find whether we have the page mapped in hash page table. Since we use base page size of 64K, we need to make sure that subpages are mapped (on demand) in hash page table. If we have then mapped we also need to track their hash table slot information so that we can clear it on invalidate of hugepage. With THP we used the deposited page table to store the hash slot information. -aneesh
RE: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI commmand timeout
> -Original Message- > From: Mathias Nyman [mailto:mathias.ny...@linux.intel.com] > Sent: Friday, March 18, 2016 4:51 PM > To: Rajesh Bhagat ; linux-...@vger.kernel.org; linux- > ker...@vger.kernel.org > Cc: gre...@linuxfoundation.org; mathias.ny...@intel.com; Sriram Dash > > Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI > commmand timeout > > On 18.03.2016 09:01, Rajesh Bhagat wrote: > > We are facing issue while performing the system resume operation from > > STR where XHCI is going to indefinite hang/sleep state due to > > wait_for_completion API called in function xhci_alloc_dev for command > > TRB_ENABLE_SLOT which never completes. > > > > Now, xhci_handle_command_timeout function is called and prints > > "Command timeout" message but never calls complete API for above > > TRB_ENABLE_SLOT command as xhci_abort_cmd_ring is successful. > > > > Solution to above problem is: > > 1. calling xhci_cleanup_command_queue API even if xhci_abort_cmd_ring > > is successful or not. > > 2. checking the status of reset_device in usb core code. > > > Hi > > I think clearing the whole command ring is a bit too much in this case. > It may cause issues for all attached devices when one command times out. > Hi Mathias, I understand your point, but I want to understand how the completion handler would be called if a command times out and xhci_abort_cmd_ring is successful. In that case, the code would be waiting on the completion handler forever. > We need to look in more detail why we fail to call completion for that one > aborted > command. > I checked the code below; please correct me if I am wrong. Code waiting on wait_for_completion: int xhci_alloc_dev(struct usb_hcd *hcd, struct usb_device *udev) { ... ret = xhci_queue_slot_control(xhci, command, TRB_ENABLE_SLOT, 0); ... wait_for_completion(command->completion); <=== waiting for command to complete Code calling the completion handler: 1. handle_cmd_completion -> xhci_complete_del_and_free_cmd 2.
xhci_handle_command_timeout -> xhci_abort_cmd_ring(failure) -> xhci_cleanup_command_queue -> xhci_complete_del_and_free_cmd In our case the command timed out, hence we hit case #2, but xhci_abort_cmd_ring succeeds, which does not call complete. > The bigger question is why the timeout happens in the first place? > We are doing a suspend/resume operation; it might be a controller issue :(. IMO software should not hang/stop if hardware is not behaving correctly. > What kernel version, and what xhci vendor was this triggered on? > We are using the 4.1.8 kernel > It's possible that the timeout is related either to the locking issue found > by Chris > Bainbridge: > http://marc.info/?l=linux-usb&m=145493945408601&w=2 > > or the resume issues in this thread, (see full thread) > http://marc.info/?l=linux-usb&m=145477850706552&w=2 > > Does any of those proposed solutions fix the command timeout for you? > I will check the above patches and share status. > -Mathias
Re: [PATCH] sparc: Convert naked unsigned uses to unsigned int
From: Joe Perches Date: Thu, 10 Mar 2016 15:21:43 -0800 > Use the more normal kernel definition/declaration style. > > Done via: > > $ git ls-files arch/sparc | \ > xargs ./scripts/checkpatch.pl -f --fix-inplace --types=unspecified_int > > Signed-off-by: Joe Perches Applied.
linux-next: Tree for Mar 21
Hi all, Please do not add any v4.7 related material to your linux-next included trees until after v4.6-rc1 is released. Changes since 20160318: The ext4 tree gained a conflict against Linus' tree. The drm tree still had its build failure for which I applied a fix patch. Non-merge commits (relative to Linus' tree): 3025 2386 files changed, 114129 insertions(+), 49973 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc and an allmodconfig (with CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig (this fails its final link) and pseries_le_defconfig and i386, sparc and sparc64 defconfig. Below is a summary of the state of the merge. I am currently merging 231 trees (counting Linus' and 35 trees of patches pending for Linus' tree). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to adding more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes.
-- Cheers, Stephen Rothwell $ git checkout master $ git reset --hard stable Merging origin/master (46e595a17dcf Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc) Merging fixes/master (36f90b0a2ddd Linux 4.5-rc2) Merging kbuild-current/rc-fixes (3d1450d54a4f Makefile: Force gzip and xz on module install) Merging arc-current/for-curr (deaf7565eb61 ARCv2: ioremap: Support dynamic peripheral address space) Merging arm-current/fixes (f474c8c857d9 ARM: 8544/1: set_memory_xx fixes) Merging m68k-current/for-linus (efbec135f11d m68k: Fix misspellings in comments.) Merging metag-fixes/fixes (0164a711c97b metag: Fix ioremap_wc/ioremap_cached build errors) Merging powerpc-fixes/fixes (b562e44f507e Linux 4.5) Merging powerpc-merge-mpe/fixes (bc0195aad0da Linux 4.2-rc2) Merging sparc/master (142b9e6c9de0 x86/kallsyms: fix GOLD link failure with new relative kallsyms table format) Merging net/master (1c191307af5f Revert "lan78xx: add ndo_get_stats64") Merging ipsec/master (215276c0147e xfrm: Reset encapsulation field of the skb before transformation) Merging ipvs/master (7617a24f83b5 ipvs: correct initial offset of Call-ID header search in SIP persistence engine) Merging wireless-drivers/master (10da848f67a7 ssb: host_soc depends on sprom) Merging mac80211/master (ad8ec957f693 wext: unregister_pernet_subsys() on notifier registration failure) Merging sound-current/for-linus (0ef21100ae91 ALSA: usb-audio: add Microsoft HD-5001 to quirks) Merging pci-current/for-linus (54c6e2dd00c3 PCI: Allow a NULL "parent" pointer in pci_bus_assign_domain_nr()) Merging driver-core.current/driver-core-linus (18558cae0272 Linux 4.5-rc4) Merging tty.current/tty-linus (18558cae0272 Linux 4.5-rc4) Merging usb.current/usb-linus (6b5f04b6cf8e Merge branch 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup) Merging usb-gadget-fixes/fixes (3b2435192fe9 MAINTAINERS: drop OMAP USB and MUSB maintainership) Merging usb-serial-fixes/usb-linus 
(f6cede5b49e8 Linux 4.5-rc7) Merging usb-chipidea-fixes/ci-for-usb-stable (d144dfea8af7 usb: chipidea: otg: change workqueue ci_otg as freezable) Merging staging.current/staging-linus (1200b6809dfd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next) Merging char-misc.current/char-misc-linus (5cd0911a9e0e Merge tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux) Merging input-current/for-linus (4d2508a55990 ARM: pxa/raumfeld: use PROPERTY_ENTRY_INTEGER to define props) Merging crypto-current/master (dfe97ad30e8c crypto: marvell/cesa - forward devm_ioremap_resource() error code) Merging ide/master (0d7ef45cdeeb ide: palm_bk3710: test clock rate to avoid division by 0) Merging devicetree-current/devicetree/merge (f76502aa9140 of/dynamic: Fix test for PPC_PSERIES) Merging rr-fixes/fixes (8244062ef1e5 modules: fix longstanding /proc/kallsyms vs module insertion race.) Me
[PATCH v3 07/23] ncr5380: Remove BOARD_REQUIRES_NO_DELAY macro
The io_recovery_delay macro is intended to insert a microsecond delay between the chip register accesses that begin a DMA operation. This is reportedly needed for some ISA boards. Reverse the sense of the macro test so that in the common case, where no delay is required, drivers need not define the macro. Signed-off-by: Finn Thain Reviewed-by: Hannes Reinecke Tested-by: Michael Schmitz --- drivers/scsi/NCR5380.c | 18 -- drivers/scsi/dtc.h |2 ++ drivers/scsi/g_NCR5380.h |2 ++ drivers/scsi/t128.h |2 ++ 4 files changed, 14 insertions(+), 10 deletions(-) Index: linux/drivers/scsi/NCR5380.c === --- linux.orig/drivers/scsi/NCR5380.c 2016-03-21 13:31:16.0 +1100 +++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:20.0 +1100 @@ -39,12 +39,6 @@ * tagged queueing) */ -#ifdef BOARD_REQUIRES_NO_DELAY -#define io_recovery_delay(x) -#else -#define io_recovery_delay(x) udelay(x) -#endif - /* * Design * @@ -150,6 +144,10 @@ * possible) function may be used. */ +#ifndef NCR5380_io_delay +#define NCR5380_io_delay(x) +#endif + static int do_abort(struct Scsi_Host *); static void do_reset(struct Scsi_Host *); @@ -1468,14 +1466,14 @@ static int NCR5380_transfer_dma(struct S */ if (p & SR_IO) { - io_recovery_delay(1); + NCR5380_io_delay(1); NCR5380_write(START_DMA_INITIATOR_RECEIVE_REG, 0); } else { - io_recovery_delay(1); + NCR5380_io_delay(1); NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE | ICR_ASSERT_DATA); - io_recovery_delay(1); + NCR5380_io_delay(1); NCR5380_write(START_DMA_SEND_REG, 0); - io_recovery_delay(1); + NCR5380_io_delay(1); } /* Index: linux/drivers/scsi/dtc.h === --- linux.orig/drivers/scsi/dtc.h 2016-03-21 13:31:16.0 +1100 +++ linux/drivers/scsi/dtc.h2016-03-21 13:31:20.0 +1100 @@ -28,6 +28,8 @@ #define NCR5380_bus_reset dtc_bus_reset #define NCR5380_info dtc_info +#define NCR5380_io_delay(x)udelay(x) + /* 15 12 11 10 1001 1100 */ Index: linux/drivers/scsi/g_NCR5380.h === --- linux.orig/drivers/scsi/g_NCR5380.h 2016-03-21 13:31:16.0 +1100 +++ linux/drivers/scsi/g_NCR5380.h 
2016-03-21 13:31:20.0 +1100 @@ -71,6 +71,8 @@ #define NCR5380_pwrite generic_NCR5380_pwrite #define NCR5380_info generic_NCR5380_info +#define NCR5380_io_delay(x)udelay(x) + #define BOARD_NCR5380 0 #define BOARD_NCR53C4001 #define BOARD_NCR53C400A 2 Index: linux/drivers/scsi/t128.h === --- linux.orig/drivers/scsi/t128.h 2016-03-21 13:31:16.0 +1100 +++ linux/drivers/scsi/t128.h 2016-03-21 13:31:20.0 +1100 @@ -84,6 +84,8 @@ #define NCR5380_bus_reset t128_bus_reset #define NCR5380_info t128_info +#define NCR5380_io_delay(x)udelay(x) + /* 15 14 12 10 7 5 3 1101 0100 1010 1000 */
[PATCH v3 10/23] ncr5380: Merge DMA implementation from atari_NCR5380 core driver
Adopt the DMA implementation from atari_NCR5380.c. This means that atari_scsi and sun3_scsi can make use of the NCR5380.c core driver and the atari_NCR5380.c driver fork can be made redundant. Signed-off-by: Finn Thain Reviewed-by: Hannes Reinecke Tested-by: Michael Schmitz --- drivers/scsi/NCR5380.c | 170 +++- drivers/scsi/arm/cumana_1.c |3 drivers/scsi/arm/oak.c |3 drivers/scsi/dmx3191d.c |1 drivers/scsi/dtc.c |2 drivers/scsi/dtc.h |1 drivers/scsi/g_NCR5380.c|2 drivers/scsi/g_NCR5380.h|1 drivers/scsi/mac_scsi.c |3 drivers/scsi/pas16.c|2 drivers/scsi/pas16.h|1 drivers/scsi/t128.c |2 drivers/scsi/t128.h |1 13 files changed, 152 insertions(+), 40 deletions(-) Index: linux/drivers/scsi/NCR5380.c === --- linux.orig/drivers/scsi/NCR5380.c 2016-03-21 13:31:25.0 +1100 +++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:27.0 +1100 @@ -31,9 +31,6 @@ /* * Further development / testing that should be done : - * 1. Cleanup the NCR5380_transfer_dma function and DMA operation complete - * code so that everything does the same thing that's done at the - * end of a pseudo-DMA read operation. * * 4. Test SCSI-II tagged queueing (I have no devices which support * tagged queueing) @@ -117,6 +114,8 @@ * * PSEUDO_DMA - if defined, PSEUDO DMA is used during the data transfer phases. * + * REAL_DMA - if defined, REAL DMA is used during the data transfer phases. + * * These macros MUST be defined : * * NCR5380_read(register) - read from the specified register @@ -801,6 +800,72 @@ static void NCR5380_main(struct work_str } while (!done); } +/* + * NCR5380_dma_complete - finish DMA transfer + * @instance: the scsi host instance + * + * Called by the interrupt handler when DMA finishes or a phase + * mismatch occurs (which would end the DMA transfer). 
+ */ + +static void NCR5380_dma_complete(struct Scsi_Host *instance) +{ + struct NCR5380_hostdata *hostdata = shost_priv(instance); + int transferred; + unsigned char **data; + int *count; + int saved_data = 0, overrun = 0; + unsigned char p; + + if (hostdata->read_overruns) { + p = hostdata->connected->SCp.phase; + if (p & SR_IO) { + udelay(10); + if ((NCR5380_read(BUS_AND_STATUS_REG) & +(BASR_PHASE_MATCH | BASR_ACK)) == + (BASR_PHASE_MATCH | BASR_ACK)) { + saved_data = NCR5380_read(INPUT_DATA_REG); + overrun = 1; + dsprintk(NDEBUG_DMA, instance, "read overrun handled\n"); + } + } + } + + NCR5380_write(MODE_REG, MR_BASE); + NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE); + NCR5380_read(RESET_PARITY_INTERRUPT_REG); + + transferred = hostdata->dma_len - NCR5380_dma_residual(instance); + hostdata->dma_len = 0; + + data = (unsigned char **)&hostdata->connected->SCp.ptr; + count = &hostdata->connected->SCp.this_residual; + *data += transferred; + *count -= transferred; + + if (hostdata->read_overruns) { + int cnt, toPIO; + + if ((NCR5380_read(STATUS_REG) & PHASE_MASK) == p && (p & SR_IO)) { + cnt = toPIO = hostdata->read_overruns; + if (overrun) { + dsprintk(NDEBUG_DMA, instance, +"Got an input overrun, using saved byte\n"); + *(*data)++ = saved_data; + (*count)--; + cnt--; + toPIO--; + } + if (toPIO > 0) { + dsprintk(NDEBUG_DMA, instance, +"Doing %d byte PIO to 0x%p\n", cnt, *data); + NCR5380_transfer_pio(instance, &p, &cnt, data); + *count -= toPIO - cnt; + } + } + } +} + #ifndef DONT_USE_INTR /** @@ -855,7 +920,22 @@ static irqreturn_t NCR5380_intr(int irq, dsprintk(NDEBUG_INTR, instance, "IRQ %d, BASR 0x%02x, SR 0x%02x, MR 0x%02x\n", irq, basr, sr, mr); - if ((NCR5380_read(CURRENT_SCSI_DATA_REG) & hostdata->id_mask) && + if ((mr & MR_DMA_MODE) || (mr & MR_MONITOR_BSY)) { + /* Probably End of DMA, Phase Mismatch or Loss of BSY. +* We ack IRQ after clearing Mode Register. Workarounds +* for End of DMA e
[PATCH v3 09/23] ncr5380: Adopt uniform DMA setup convention
Standardize the DMA setup hooks so that the DMA implementation in atari_NCR5380.c can be reconciled with pseudo DMA implementation in NCR5380.c. Calls to NCR5380_dma_recv_setup() and NCR5380_dma_send_setup() return a negative value on failure, zero on PDMA transfer success and a positive byte count for DMA setup success. This convention is not entirely new, but is now applied consistently. Also remove a pointless Status Register access: the *phase assignment is redundant because after NCR5380_transfer_dma() returns control to NCR5380_information_transfer(), that routine then returns control to NCR5380_main(), which means *phase is dead. Signed-off-by: Finn Thain Reviewed-by: Hannes Reinecke Tested-by: Michael Schmitz --- drivers/scsi/NCR5380.c | 21 ++--- drivers/scsi/arm/cumana_1.c | 10 -- drivers/scsi/arm/oak.c |4 ++-- drivers/scsi/atari_scsi.c |3 --- 4 files changed, 20 insertions(+), 18 deletions(-) Index: linux/drivers/scsi/NCR5380.c === --- linux.orig/drivers/scsi/NCR5380.c 2016-03-21 13:31:21.0 +1100 +++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:25.0 +1100 @@ -1431,7 +1431,7 @@ static int NCR5380_transfer_dma(struct S register unsigned char p = *phase; register unsigned char *d = *data; unsigned char tmp; - int foo; + int result; if ((tmp = (NCR5380_read(STATUS_REG) & PHASE_MASK)) != p) { *phase = tmp; @@ -1505,9 +1505,9 @@ static int NCR5380_transfer_dma(struct S */ if (p & SR_IO) { - foo = NCR5380_dma_recv_setup(instance, d, + result = NCR5380_dma_recv_setup(instance, d, hostdata->flags & FLAG_DMA_FIXUP ? 
c - 1 : c); - if (!foo && (hostdata->flags & FLAG_DMA_FIXUP)) { + if (!result && (hostdata->flags & FLAG_DMA_FIXUP)) { /* * The workaround was to transfer fewer bytes than we * intended to with the pseudo-DMA read function, wait for @@ -1525,19 +1525,19 @@ static int NCR5380_transfer_dma(struct S if (NCR5380_poll_politely(instance, BUS_AND_STATUS_REG, BASR_DRQ, BASR_DRQ, HZ) < 0) { - foo = -1; + result = -1; shost_printk(KERN_ERR, instance, "PDMA read: DRQ timeout\n"); } if (NCR5380_poll_politely(instance, STATUS_REG, SR_REQ, 0, HZ) < 0) { - foo = -1; + result = -1; shost_printk(KERN_ERR, instance, "PDMA read: !REQ timeout\n"); } d[c - 1] = NCR5380_read(INPUT_DATA_REG); } } else { - foo = NCR5380_dma_send_setup(instance, d, c); - if (!foo && (hostdata->flags & FLAG_DMA_FIXUP)) { + result = NCR5380_dma_send_setup(instance, d, c); + if (!result && (hostdata->flags & FLAG_DMA_FIXUP)) { /* * Wait for the last byte to be sent. If REQ is being asserted for * the byte we're interested, we'll ACK it and it will go false. @@ -1545,7 +1545,7 @@ static int NCR5380_transfer_dma(struct S if (NCR5380_poll_politely2(instance, BUS_AND_STATUS_REG, BASR_DRQ, BASR_DRQ, BUS_AND_STATUS_REG, BASR_PHASE_MATCH, 0, HZ) < 0) { - foo = -1; + result = -1; shost_printk(KERN_ERR, instance, "PDMA write: DRQ and phase timeout\n"); } } @@ -1555,8 +1555,7 @@ static int NCR5380_transfer_dma(struct S NCR5380_read(RESET_PARITY_INTERRUPT_REG); *data = d + c; *count = 0; - *phase = NCR5380_read(STATUS_REG) & PHASE_MASK; - return foo; + return result; } /* @@ -1652,7 +1651,7 @@ static void NCR5380_information_transfer if (!cmd->device->borken) transfersize = NCR5380_dma_xfer_len(instance, cmd, phase); - if (transfersize) { + if (transfersize > 0) { len = transfersize; if (NCR5380_transfer_dma(instance, &phase, &len, (unsigned char **)&cmd->SCp.ptr)) { Index: linux/drivers/scsi/arm/cumana_1.c === --- linux.orig/drivers/scsi/arm/cuma
[PATCH v3 14/23] ncr5380: Reduce max_lun limit
The driver has a limit of eight logical units because a byte-sized bitfield
is used for the busy flags. That means the maximum LUN is 7, whereas the
default for shost->max_lun is 8.

Signed-off-by: Finn Thain
Tested-by: Michael Schmitz

---
Changed since v1:
- Reduce shost->max_lun limit instead of adding 'MAX_LUN' limit.
---

 drivers/scsi/NCR5380.c | 2 ++
 1 file changed, 2 insertions(+)

Index: linux/drivers/scsi/NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/NCR5380.c	2016-03-21 13:31:33.0 +1100
+++ linux/drivers/scsi/NCR5380.c	2016-03-21 13:31:36.0 +1100
@@ -488,6 +488,8 @@ static int NCR5380_init(struct Scsi_Host
 	int i;
 	unsigned long deadline;
 
+	instance->max_lun = 7;
+
 	hostdata->host = instance;
 	hostdata->id_mask = 1 << instance->this_id;
 	hostdata->id_higher_mask = 0;
[PATCH v3 03/23] ncr5380: Remove REAL_DMA and REAL_DMA_POLL macros
For the NCR5380.c core driver, these macros are never used. If REAL_DMA were to be defined, compilation would fail. For the atari_NCR5380.c core driver, REAL_DMA is always defined. Hence these macros are pointless. Signed-off-by: Finn Thain Reviewed-by: Hannes Reinecke Tested-by: Michael Schmitz --- drivers/scsi/NCR5380.c | 218 +-- drivers/scsi/NCR5380.h | 112 -- drivers/scsi/atari_NCR5380.c | 62 +--- drivers/scsi/atari_scsi.c| 32 -- drivers/scsi/sun3_scsi.c | 13 -- 5 files changed, 22 insertions(+), 415 deletions(-) Index: linux/drivers/scsi/NCR5380.c === --- linux.orig/drivers/scsi/NCR5380.c 2016-03-21 13:31:09.0 +1100 +++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:10.0 +1100 @@ -35,18 +35,10 @@ * code so that everything does the same thing that's done at the * end of a pseudo-DMA read operation. * - * 2. Fix REAL_DMA (interrupt driven, polled works fine) - - * basically, transfer size needs to be reduced by one - * and the last byte read as is done with PSEUDO_DMA. - * * 4. Test SCSI-II tagged queueing (I have no devices which support * tagged queueing) */ -#ifndef notyet -#undef REAL_DMA -#endif - #ifdef BOARD_REQUIRES_NO_DELAY #define io_recovery_delay(x) #else @@ -131,12 +123,6 @@ * * PSEUDO_DMA - if defined, PSEUDO DMA is used during the data transfer phases. * - * REAL_DMA - if defined, REAL DMA is used during the data transfer phases. - * - * REAL_DMA_POLL - if defined, REAL DMA is used but the driver doesn't - * rely on phase mismatch and EOP interrupts to determine end - * of phase. - * * These macros MUST be defined : * * NCR5380_read(register) - read from the specified register @@ -147,15 +133,9 @@ * specific implementation of the NCR5380 * * Either real DMA *or* pseudo DMA may be implemented - * REAL functions : - * NCR5380_REAL_DMA should be defined if real DMA is to be used. * Note that the DMA setup functions should return the number of bytes * that they were able to program the controller for. 
* - * Also note that generic i386/PC versions of these macros are - * available as NCR5380_i386_dma_write_setup, - * NCR5380_i386_dma_read_setup, and NCR5380_i386_dma_residual. - * * NCR5380_dma_write_setup(instance, src, count) - initialize * NCR5380_dma_read_setup(instance, dst, count) - initialize * NCR5380_dma_residual(instance); - residual count @@ -486,12 +466,6 @@ static void prepare_info(struct Scsi_Hos #ifdef DIFFERENTIAL "DIFFERENTIAL " #endif -#ifdef REAL_DMA -"REAL_DMA " -#endif -#ifdef REAL_DMA_POLL -"REAL_DMA_POLL " -#endif #ifdef PARITY "PARITY " #endif @@ -551,9 +525,8 @@ static int NCR5380_init(struct Scsi_Host hostdata->id_higher_mask |= i; for (i = 0; i < 8; ++i) hostdata->busy[i] = 0; -#ifdef REAL_DMA - hostdata->dmalen = 0; -#endif + hostdata->dma_len = 0; + spin_lock_init(&hostdata->lock); hostdata->connected = NULL; hostdata->sensing = NULL; @@ -850,11 +823,7 @@ static void NCR5380_main(struct work_str requeue_cmd(instance, cmd); } } - if (hostdata->connected -#ifdef REAL_DMA - && !hostdata->dmalen -#endif - ) { + if (hostdata->connected && !hostdata->dma_len) { dsprintk(NDEBUG_MAIN, instance, "main: performing information transfer\n"); NCR5380_information_transfer(instance); done = 0; @@ -919,34 +888,6 @@ static irqreturn_t NCR5380_intr(int irq, dsprintk(NDEBUG_INTR, instance, "IRQ %d, BASR 0x%02x, SR 0x%02x, MR 0x%02x\n", irq, basr, sr, mr); -#if defined(REAL_DMA) - if ((mr & MR_DMA_MODE) || (mr & MR_MONITOR_BSY)) { - /* Probably End of DMA, Phase Mismatch or Loss of BSY. -* We ack IRQ after clearing Mode Register. Workarounds -* for End of DMA errata need to happen in DMA Mode. 
-*/ - - dsprintk(NDEBUG_INTR, instance, "interrupt in DMA mode\n"); - - int transferred; - - if (!hostdata->connected) - panic("scsi%d : DMA interrupt with no connected cmd\n", - instance->hostno); - - transferred = hostdata->dmalen - NCR5380_dma_residual(instance); - hostdata->connected->SCp.this_residual -= transferred; - hostdata->connected->SCp.ptr += transferred; - hostdata->dmalen = 0; - - /* FIXME
[PATCH v3 12/23] sun3_scsi: Adopt NCR5380.c core driver
Add support for the custom Sun 3 DMA logic to the NCR5380.c core driver. This code is copied from atari_NCR5380.c. Signed-off-by: Finn Thain Reviewed-by: Hannes Reinecke Tested-by: Michael Schmitz --- The Sun 3 DMA code is still configured by macros. I have simplified things slightly but I have avoided more ambitious refactoring. It's not clear to me what that should look like and I can't test sun3_scsi anyway. At least this permits the removal of atari_NCR5380.c. --- drivers/scsi/NCR5380.c | 131 +++ drivers/scsi/sun3_scsi.c |8 +- 2 files changed, 124 insertions(+), 15 deletions(-) Index: linux/drivers/scsi/sun3_scsi.c === --- linux.orig/drivers/scsi/sun3_scsi.c 2016-03-21 13:31:13.0 +1100 +++ linux/drivers/scsi/sun3_scsi.c 2016-03-21 13:31:32.0 +1100 @@ -54,10 +54,8 @@ #define NCR5380_abort sun3scsi_abort #define NCR5380_infosun3scsi_info -#define NCR5380_dma_read_setup(instance, data, count) \ -sun3scsi_dma_setup(instance, data, count, 0) -#define NCR5380_dma_write_setup(instance, data, count) \ -sun3scsi_dma_setup(instance, data, count, 1) +#define NCR5380_dma_recv_setup(instance, data, count) (count) +#define NCR5380_dma_send_setup(instance, data, count) (count) #define NCR5380_dma_residual(instance) \ sun3scsi_dma_residual(instance) #define NCR5380_dma_xfer_len(instance, cmd, phase) \ @@ -406,7 +404,7 @@ static int sun3scsi_dma_finish(int write } -#include "atari_NCR5380.c" +#include "NCR5380.c" #ifdef SUN3_SCSI_VME #define SUN3_SCSI_NAME "Sun3 NCR5380 VME SCSI" Index: linux/drivers/scsi/NCR5380.c === --- linux.orig/drivers/scsi/NCR5380.c 2016-03-21 13:31:31.0 +1100 +++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:32.0 +1100 @@ -31,6 +31,8 @@ /* Ported to Atari by Roman Hodek and others. */ +/* Adapted for the Sun 3 by Sam Creasey. 
*/ + /* * Further development / testing that should be done : * @@ -858,6 +860,23 @@ static void NCR5380_dma_complete(struct } } +#ifdef CONFIG_SUN3 + if ((sun3scsi_dma_finish(rq_data_dir(hostdata->connected->request { + pr_err("scsi%d: overrun in UDC counter -- not prepared to deal with this!\n", + instance->host_no); + BUG(); + } + + if ((NCR5380_read(BUS_AND_STATUS_REG) & (BASR_PHASE_MATCH | BASR_ACK)) == + (BASR_PHASE_MATCH | BASR_ACK)) { + pr_err("scsi%d: BASR %02x\n", instance->host_no, + NCR5380_read(BUS_AND_STATUS_REG)); + pr_err("scsi%d: bus stuck in data phase -- probably a single byte overrun!\n", + instance->host_no); + BUG(); + } +#endif + NCR5380_write(MODE_REG, MR_BASE); NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE); NCR5380_read(RESET_PARITY_INTERRUPT_REG); @@ -981,10 +1000,16 @@ static irqreturn_t NCR5380_intr(int irq, NCR5380_read(RESET_PARITY_INTERRUPT_REG); dsprintk(NDEBUG_INTR, instance, "unknown interrupt\n"); +#ifdef SUN3_SCSI_VME + dregs->csr |= CSR_DMA_ENABLE; +#endif } handled = 1; } else { shost_printk(KERN_NOTICE, instance, "interrupt without IRQ bit\n"); +#ifdef SUN3_SCSI_VME + dregs->csr |= CSR_DMA_ENABLE; +#endif } spin_unlock_irqrestore(&hostdata->lock, flags); @@ -1274,6 +1299,10 @@ static struct scsi_cmnd *NCR5380_select( hostdata->connected = cmd; hostdata->busy[cmd->device->id] |= 1 << cmd->device->lun; +#ifdef SUN3_SCSI_VME + dregs->csr |= CSR_INTR; +#endif + initialize_SCp(cmd); cmd = NULL; @@ -1557,6 +1586,11 @@ static int NCR5380_transfer_dma(struct S dsprintk(NDEBUG_DMA, instance, "initializing DMA %s: length %d, address %p\n", (p & SR_IO) ? 
"receive" : "send", c, d); +#ifdef CONFIG_SUN3 + /* send start chain */ + sun3scsi_dma_start(c, *data); +#endif + NCR5380_write(TARGET_COMMAND_REG, PHASE_SR_TO_TCR(p)); NCR5380_write(MODE_REG, MR_BASE | MR_DMA_MODE | MR_MONITOR_BSY | MR_ENABLE_EOP_INTR); @@ -1577,6 +1611,7 @@ static int NCR5380_transfer_dma(struct S */ if (p & SR_IO) { + NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE); NCR5380_io_delay(1); NCR5380_write(START_DMA_INITIATOR_RECEIVE_REG, 0); } else { @@ -1587,6 +1622,13 @@ static int NCR5380_transfer_dma(struct S NCR5380_io_delay(1); } +#ifdef CONFIG_SUN3 +#ifdef SUN3_SCSI_VME + dregs->csr |= CSR_DMA_ENABLE; +#endif + sun3_dma_active
[PATCH v3 11/23] atari_scsi: Adopt NCR5380.c core driver
Add support for the Atari ST DMA chip to the NCR5380.c core driver. This code is copied from atari_NCR5380.c. Signed-off-by: Finn Thain Reviewed-by: Hannes Reinecke Tested-by: Michael Schmitz --- drivers/scsi/NCR5380.c| 32 drivers/scsi/atari_scsi.c |6 +++--- 2 files changed, 35 insertions(+), 3 deletions(-) Index: linux/drivers/scsi/NCR5380.c === --- linux.orig/drivers/scsi/NCR5380.c 2016-03-21 13:31:27.0 +1100 +++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:31.0 +1100 @@ -29,6 +29,8 @@ * Ronald van Cuijlenborg, Alan Cox and others. */ +/* Ported to Atari by Roman Hodek and others. */ + /* * Further development / testing that should be done : * @@ -141,6 +143,14 @@ #define NCR5380_io_delay(x) #endif +#ifndef NCR5380_acquire_dma_irq +#define NCR5380_acquire_dma_irq(x) (1) +#endif + +#ifndef NCR5380_release_dma_irq +#define NCR5380_release_dma_irq(x) +#endif + static int do_abort(struct Scsi_Host *); static void do_reset(struct Scsi_Host *); @@ -658,6 +668,9 @@ static int NCR5380_queue_command(struct cmd->result = 0; + if (!NCR5380_acquire_dma_irq(instance)) + return SCSI_MLQUEUE_HOST_BUSY; + spin_lock_irqsave(&hostdata->lock, flags); /* @@ -682,6 +695,19 @@ static int NCR5380_queue_command(struct return 0; } +static inline void maybe_release_dma_irq(struct Scsi_Host *instance) +{ + struct NCR5380_hostdata *hostdata = shost_priv(instance); + + /* Caller does the locking needed to set & test these data atomically */ + if (list_empty(&hostdata->disconnected) && + list_empty(&hostdata->unissued) && + list_empty(&hostdata->autosense) && + !hostdata->connected && + !hostdata->selecting) + NCR5380_release_dma_irq(instance); +} + /** * dequeue_next_cmd - dequeue a command for processing * @instance: the scsi host instance @@ -783,6 +809,7 @@ static void NCR5380_main(struct work_str if (!NCR5380_select(instance, cmd)) { dsprintk(NDEBUG_MAIN, instance, "main: select complete\n"); + maybe_release_dma_irq(instance); } else { dsprintk(NDEBUG_MAIN | NDEBUG_QUEUES, instance, 
"main: select failed, returning %p to queue\n", cmd); @@ -1828,6 +1855,8 @@ static void NCR5380_information_transfer /* Enable reselect interrupts */ NCR5380_write(SELECT_ENABLE_REG, hostdata->id_mask); + + maybe_release_dma_irq(instance); return; case MESSAGE_REJECT: /* Accept message by clearing ACK */ @@ -1963,6 +1992,7 @@ static void NCR5380_information_transfer hostdata->connected = NULL; cmd->result = DID_ERROR << 16; complete_cmd(instance, cmd); + maybe_release_dma_irq(instance); NCR5380_write(SELECT_ENABLE_REG, hostdata->id_mask); return; } @@ -2256,6 +2286,7 @@ out: dsprintk(NDEBUG_ABORT, instance, "abort: successfully aborted %p\n", cmd); queue_work(hostdata->work_q, &hostdata->main_task); + maybe_release_dma_irq(instance); spin_unlock_irqrestore(&hostdata->lock, flags); return result; @@ -2336,6 +2367,7 @@ static int NCR5380_bus_reset(struct scsi hostdata->dma_len = 0; queue_work(hostdata->work_q, &hostdata->main_task); + maybe_release_dma_irq(instance); spin_unlock_irqrestore(&hostdata->lock, flags); return SUCCESS; Index: linux/drivers/scsi/atari_scsi.c === --- linux.orig/drivers/scsi/atari_scsi.c2016-03-21 13:31:25.0 +1100 +++ linux/drivers/scsi/atari_scsi.c 2016-03-21 13:31:31.0 +1100 @@ -99,9 +99,9 @@ #define NCR5380_abort atari_scsi_abort #define NCR5380_infoatari_scsi_info -#define NCR5380_dma_read_setup(instance, data, count) \ +#define NCR5380_dma_recv_setup(instance, data, count) \ atari_scsi_dma_setup(instance, data, count, 0) -#define NCR5380_dma_write_setup(instance, data, count) \ +#define NCR5380_dma_send_setup(instance, data, count) \ atari_scsi_dma_setup(instance, data, count, 1) #define NCR5380_dma_residual(instance) \ atari_scsi_dma_residual(instance) @@ -715,7 +715,7 @@ static void atari_scsi_f
[PATCH v3 01/23] g_ncr5380: Remove CONFIG_SCSI_GENERIC_NCR53C400
This change brings a number of improvements: fewer macros, better test
coverage, simpler code and sane Kconfig options. The downside is a small
chance of incompatibility (which seems unavoidable).

CONFIG_SCSI_GENERIC_NCR53C400 exists to enable or inhibit pseudo DMA
transfers when the driver is used with 53C400-compatible cards. Thanks to
Ondrej Zary's patches, PDMA now works, which means it can be enabled
unconditionally.

Due to bad design, CONFIG_SCSI_GENERIC_NCR53C400 ties together unrelated
functionality, as it sets both the PSEUDO_DMA and BIOSPARAM macros. This
patch effectively enables PSEUDO_DMA and disables BIOSPARAM.

The defconfigs and the Kconfig default leave CONFIG_SCSI_GENERIC_NCR53C400
undefined. Red Hat 9 and CentOS 2.1 did the same. This leaves both
PSEUDO_DMA and BIOSPARAM disabled. The effect of this patch should be
better performance from enabling PSEUDO_DMA.

On the other hand, Debian 4 and SLES 10 had CONFIG_SCSI_GENERIC_NCR53C400
enabled, so both PSEUDO_DMA and BIOSPARAM were enabled. This patch might
affect configurations like this by disabling BIOSPARAM. My best guess is
that this could be a problem only in the vanishingly rare case that 1) the
CHS values stored in the boot device partition table are wrong and 2) a
5380 card is in use (because PDMA on 53C400 used to be broken).
Signed-off-by: Finn Thain Reviewed-by: Hannes Reinecke --- Here are the distro kernel versions I looked at: CentOS 2.1: $ strings kernel-2.4.9-e.40.i686/lib/modules/2.4.9-e.40/kernel/drivers/scsi/g_NCR5380.o | grep extension NO NCR53C400 driver extensions Red Hat 7: $ strings kernel-2.4.18-3.i386/lib/modules/2.4.18-3/kernel/drivers/scsi/g_NCR5380.o | grep extension NO NCR53C400 driver extensions Red Hat 9: $ strings kernel-2.4.20-8.i586/lib/modules/2.4.20-8/kernel/drivers/scsi/g_NCR5380.o | grep extension NO NCR53C400 driver extensions Debian 4: $ strings linux-image-2.6.24-etchnhalf.1-486_2.6.24-6-etchnhalf.9etch3_i386/lib/modules/2.6.24-etchnhalf.1-486/kernel/drivers/scsi/g_NCR5380_mmio.ko | grep extension NCR53C400 extension version %d $ strings kernel-image-2.6.8-2-386_2.6.8-13_i386/lib/modules/2.6.8-2-386/kernel/drivers/scsi/g_NCR5380_mmio.ko | grep extension NCR53C400 extension version %d SLES 10.2: $ strings kernel-default-2.6.18.2-34.i586/lib/modules/2.6.18.2-34-default/kernel/drivers/scsi/g_NCR5380_mmio.ko | grep extension NCR53C400 extension version %d --- drivers/scsi/Kconfig | 11 -- drivers/scsi/g_NCR5380.c | 75 ++- drivers/scsi/g_NCR5380.h | 16 +- 3 files changed, 25 insertions(+), 77 deletions(-) Index: linux/drivers/scsi/Kconfig === --- linux.orig/drivers/scsi/Kconfig 2016-03-21 13:31:07.0 +1100 +++ linux/drivers/scsi/Kconfig 2016-03-21 13:31:07.0 +1100 @@ -812,17 +812,6 @@ config SCSI_GENERIC_NCR5380_MMIO To compile this driver as a module, choose M here: the module will be called g_NCR5380_mmio. -config SCSI_GENERIC_NCR53C400 - bool "Enable NCR53c400 extensions" - depends on SCSI_GENERIC_NCR5380 - help - This enables certain optimizations for the NCR53c400 SCSI cards. - You might as well try it out. Note that this driver will only probe - for the Trantor T130B in its default configuration; you might have - to pass a command line option to the kernel at boot time if it does - not detect your card. See the file - for details. 
- config SCSI_IPS tristate "IBM ServeRAID support" depends on PCI && SCSI Index: linux/drivers/scsi/g_NCR5380.c === --- linux.orig/drivers/scsi/g_NCR5380.c 2016-03-21 13:31:07.0 +1100 +++ linux/drivers/scsi/g_NCR5380.c 2016-03-21 13:31:07.0 +1100 @@ -57,10 +57,7 @@ */ #define AUTOPROBE_IRQ - -#ifdef CONFIG_SCSI_GENERIC_NCR53C400 #define PSEUDO_DMA -#endif #include #include @@ -270,7 +267,7 @@ static int __init generic_NCR5380_detect #ifndef SCSI_G_NCR5380_MEM int i; int port_idx = -1; - unsigned long region_size = 16; + unsigned long region_size; #endif static unsigned int __initdata ncr_53c400a_ports[] = { 0x280, 0x290, 0x300, 0x310, 0x330, 0x340, 0x348, 0x350, 0 @@ -290,6 +287,7 @@ static int __init generic_NCR5380_detect #ifdef SCSI_G_NCR5380_MEM unsigned long base; void __iomem *iomem; + resource_size_t iomem_size; #endif if (ncr_irq) @@ -353,9 +351,7 @@ static int __init generic_NCR5380_detect flags = FLAG_NO_PSEUDO_DMA; break; case BOARD_NCR53C400: -#ifdef PSEUDO_DMA flags = FLAG_NO_DMA_FIXUP; -#endif break; case BOARD_NCR53C400A: flags = FLAG_NO_DM
[PATCH v3 08/23] ncr5380: Use DMA hooks for PDMA
Those wrapper drivers which use DMA define the REAL_DMA macro and those which use pseudo DMA define PSEUDO_DMA. These macros need to be removed for a number of reasons, not least of which is to have drivers share more code. Redefine the PDMA send and receive hooks as DMA setup hooks, so that the DMA code can be shared by all 5380 wrapper drivers. This will help to reunify the forked core driver. Signed-off-by: Finn Thain Reviewed-by: Hannes Reinecke Tested-by: Michael Schmitz --- drivers/scsi/NCR5380.c | 10 ++ drivers/scsi/arm/cumana_1.c | 10 ++ drivers/scsi/arm/oak.c | 10 ++ drivers/scsi/dmx3191d.c |4 ++-- drivers/scsi/dtc.c |6 -- drivers/scsi/dtc.h |2 ++ drivers/scsi/g_NCR5380.c| 10 ++ drivers/scsi/g_NCR5380.h|4 ++-- drivers/scsi/mac_scsi.c |5 ++--- drivers/scsi/pas16.c| 14 -- drivers/scsi/pas16.h|2 ++ drivers/scsi/t128.c | 12 ++-- drivers/scsi/t128.h |2 ++ 13 files changed, 50 insertions(+), 41 deletions(-) Index: linux/drivers/scsi/NCR5380.c === --- linux.orig/drivers/scsi/NCR5380.c 2016-03-21 13:31:20.0 +1100 +++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:21.0 +1100 @@ -127,17 +127,11 @@ * specific implementation of the NCR5380 * * Either real DMA *or* pseudo DMA may be implemented - * Note that the DMA setup functions should return the number of bytes - * that they were able to program the controller for. * * NCR5380_dma_write_setup(instance, src, count) - initialize * NCR5380_dma_read_setup(instance, dst, count) - initialize * NCR5380_dma_residual(instance); - residual count * - * PSEUDO functions : - * NCR5380_pwrite(instance, src, count) - * NCR5380_pread(instance, dst, count); - * * The generic driver is initialized by calling NCR5380_init(instance), * after setting the appropriate host specific fields and ID. 
If the * driver wishes to autoprobe for an IRQ line, the NCR5380_probe_irq(instance, @@ -1511,7 +1505,7 @@ static int NCR5380_transfer_dma(struct S */ if (p & SR_IO) { - foo = NCR5380_pread(instance, d, + foo = NCR5380_dma_recv_setup(instance, d, hostdata->flags & FLAG_DMA_FIXUP ? c - 1 : c); if (!foo && (hostdata->flags & FLAG_DMA_FIXUP)) { /* @@ -1542,7 +1536,7 @@ static int NCR5380_transfer_dma(struct S d[c - 1] = NCR5380_read(INPUT_DATA_REG); } } else { - foo = NCR5380_pwrite(instance, d, c); + foo = NCR5380_dma_send_setup(instance, d, c); if (!foo && (hostdata->flags & FLAG_DMA_FIXUP)) { /* * Wait for the last byte to be sent. If REQ is being asserted for Index: linux/drivers/scsi/arm/cumana_1.c === --- linux.orig/drivers/scsi/arm/cumana_1.c 2016-03-21 13:31:16.0 +1100 +++ linux/drivers/scsi/arm/cumana_1.c 2016-03-21 13:31:21.0 +1100 @@ -18,6 +18,8 @@ #define NCR5380_write(reg, value) cumanascsi_write(instance, reg, value) #define NCR5380_dma_xfer_len(instance, cmd, phase) (cmd->transfersize) +#define NCR5380_dma_recv_setup cumanascsi_pread +#define NCR5380_dma_send_setup cumanascsi_pwrite #define NCR5380_intr cumanascsi_intr #define NCR5380_queue_command cumanascsi_queue_command @@ -39,8 +41,8 @@ void cumanascsi_setup(char *str, int *in #define L(v) (((v)<<16)|((v) & 0x)) #define H(v) (((v)>>16)|((v) & 0x)) -static inline int -NCR5380_pwrite(struct Scsi_Host *host, unsigned char *addr, int len) +static inline int cumanascsi_pwrite(struct Scsi_Host *host, +unsigned char *addr, int len) { unsigned long *laddr; void __iomem *dma = priv(host)->dma + 0x2000; @@ -102,8 +104,8 @@ end: return len; } -static inline int -NCR5380_pread(struct Scsi_Host *host, unsigned char *addr, int len) +static inline int cumanascsi_pread(struct Scsi_Host *host, + unsigned char *addr, int len) { unsigned long *laddr; void __iomem *dma = priv(host)->dma + 0x2000; Index: linux/drivers/scsi/arm/oak.c === --- linux.orig/drivers/scsi/arm/oak.c 2016-03-21 13:31:16.0 +1100 +++ 
linux/drivers/scsi/arm/oak.c2016-03-21 13:31:21.0 +1100 @@ -24,6 +24,8 @@ writeb(value, priv(instance)->base + ((reg) << 2)) #define NCR5380_dma_xfer_len(instance, cmd, phase) (0) +#define NCR5380_dma_recv_setup oakscsi_pread +#define NCR5380_dma_send_setup oakscsi_pwrite #define NCR5380_queue_command oakscsi_que
[PATCH v3 18/23] ncr5380: Remove DONT_USE_INTR and AUTOPROBE_IRQ macros
Signed-off-by: Finn Thain Reviewed-by: Hannes Reinecke Tested-by: Michael Schmitz --- drivers/scsi/NCR5380.c | 12 +--- drivers/scsi/NCR5380.h |4 drivers/scsi/arm/oak.c |2 -- drivers/scsi/dmx3191d.c |2 -- drivers/scsi/dtc.c | 12 +++- drivers/scsi/g_NCR5380.c |2 -- drivers/scsi/pas16.c |1 - drivers/scsi/t128.c |1 - 8 files changed, 4 insertions(+), 32 deletions(-) Index: linux/drivers/scsi/NCR5380.c === --- linux.orig/drivers/scsi/NCR5380.c 2016-03-21 13:31:39.0 +1100 +++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:40.0 +1100 @@ -106,9 +106,6 @@ * DIFFERENTIAL - if defined, NCR53c81 chips will use external differential * transceivers. * - * DONT_USE_INTR - if defined, never use interrupts, even if we probe or - * override-configure an IRQ. - * * PSEUDO_DMA - if defined, PSEUDO DMA is used during the data transfer phases. * * REAL_DMA - if defined, REAL DMA is used during the data transfer phases. @@ -464,9 +461,6 @@ static void prepare_info(struct Scsi_Hos hostdata->flags & FLAG_DMA_FIXUP ? "DMA_FIXUP " : "", hostdata->flags & FLAG_NO_PSEUDO_DMA ? "NO_PSEUDO_DMA " : "", hostdata->flags & FLAG_TOSHIBA_DELAY ? "TOSHIBA_DELAY " : "", -#ifdef AUTOPROBE_IRQ -"AUTOPROBE_IRQ " -#endif #ifdef DIFFERENTIAL "DIFFERENTIAL " #endif @@ -915,8 +909,6 @@ static void NCR5380_dma_complete(struct } } -#ifndef DONT_USE_INTR - /** * NCR5380_intr - generic NCR5380 irq handler * @irq: interrupt number @@ -951,7 +943,7 @@ static void NCR5380_dma_complete(struct * the Busy Monitor interrupt is enabled together with DMA Mode. 
*/ -static irqreturn_t NCR5380_intr(int irq, void *dev_id) +static irqreturn_t __maybe_unused NCR5380_intr(int irq, void *dev_id) { struct Scsi_Host *instance = dev_id; struct NCR5380_hostdata *hostdata = shost_priv(instance); @@ -1020,8 +1012,6 @@ static irqreturn_t NCR5380_intr(int irq, return IRQ_RETVAL(handled); } -#endif - /* * Function : int NCR5380_select(struct Scsi_Host *instance, * struct scsi_cmnd *cmd) Index: linux/drivers/scsi/NCR5380.h === --- linux.orig/drivers/scsi/NCR5380.h 2016-03-21 13:31:33.0 +1100 +++ linux/drivers/scsi/NCR5380.h2016-03-21 13:31:40.0 +1100 @@ -280,16 +280,12 @@ static void NCR5380_print(struct Scsi_Ho #define NCR5380_dprint_phase(flg, arg) do {} while (0) #endif -#if defined(AUTOPROBE_IRQ) static int NCR5380_probe_irq(struct Scsi_Host *instance, int possible); -#endif static int NCR5380_init(struct Scsi_Host *instance, int flags); static int NCR5380_maybe_reset_bus(struct Scsi_Host *); static void NCR5380_exit(struct Scsi_Host *instance); static void NCR5380_information_transfer(struct Scsi_Host *instance); -#ifndef DONT_USE_INTR static irqreturn_t NCR5380_intr(int irq, void *dev_id); -#endif static void NCR5380_main(struct work_struct *work); static const char *NCR5380_info(struct Scsi_Host *instance); static void NCR5380_reselect(struct Scsi_Host *instance); Index: linux/drivers/scsi/arm/oak.c === --- linux.orig/drivers/scsi/arm/oak.c 2016-03-21 13:31:27.0 +1100 +++ linux/drivers/scsi/arm/oak.c2016-03-21 13:31:40.0 +1100 @@ -14,8 +14,6 @@ #include -#define DONT_USE_INTR - #define priv(host) ((struct NCR5380_hostdata *)(host)->hostdata) #define NCR5380_read(reg) \ Index: linux/drivers/scsi/dmx3191d.c === --- linux.orig/drivers/scsi/dmx3191d.c 2016-03-21 13:31:37.0 +1100 +++ linux/drivers/scsi/dmx3191d.c 2016-03-21 13:31:40.0 +1100 @@ -34,8 +34,6 @@ * Definitions for the generic 5380 driver. 
*/ -#define DONT_USE_INTR - #define NCR5380_read(reg) inb(instance->io_port + reg) #define NCR5380_write(reg, value) outb(value, instance->io_port + reg) Index: linux/drivers/scsi/dtc.c === --- linux.orig/drivers/scsi/dtc.c 2016-03-21 13:31:27.0 +1100 +++ linux/drivers/scsi/dtc.c2016-03-21 13:31:40.0 +1100 @@ -1,5 +1,3 @@ -#define DONT_USE_INTR - /* * DTC 3180/3280 driver, by * Ray Van Tassle ra...@comm.mot.com @@ -53,7 +51,6 @@ #include #include "dtc.h" -#define AUTOPROBE_IRQ #include "NCR5380.h" /* @@ -243,9 +240,10 @@ found: if (instance->irq == 255) instance->irq = NO_IRQ; -#ifndef DONT_USE_INTR /* With interrupts enabled, it will sometimes hang when doing heavy * reads. So better not enable them until I finger it out. */ +
[PATCH v3 20/23] atari_scsi: Set a reasonable default for cmd_per_lun
This setting does not need to be conditional on Atari ST or TT.

Signed-off-by: Finn Thain
Tested-by: Michael Schmitz

---
Changed since v1:
- Set the default cmd_per_lun to 4 based on test results.

Changed since v2:
- Revert the default cmd_per_lun to 2, like in the v1 patch, because a
  uniform default across all ten 5380 wrapper drivers is worth more than
  a tiny improvement in one particular microbenchmark on one system.
  Michael tells me that 2 is also the best setting for his Atari Falcon.
---

 drivers/scsi/atari_scsi.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: linux/drivers/scsi/atari_scsi.c
===================================================================
--- linux.orig/drivers/scsi/atari_scsi.c	2016-03-21 13:31:33.0 +1100
+++ linux/drivers/scsi/atari_scsi.c	2016-03-21 13:31:43.0 +1100
@@ -752,6 +752,7 @@ static struct scsi_host_template atari_s
 	.eh_abort_handler	= atari_scsi_abort,
 	.eh_bus_reset_handler	= atari_scsi_bus_reset,
 	.this_id		= 7,
+	.cmd_per_lun		= 2,
 	.use_clustering		= DISABLE_CLUSTERING,
 	.cmd_size		= NCR5380_CMD_SIZE,
 };
@@ -788,11 +789,9 @@ static int __init atari_scsi_probe(struc
 	 */
 	if (ATARIHW_PRESENT(TT_SCSI)) {
 		atari_scsi_template.can_queue    = 16;
-		atari_scsi_template.cmd_per_lun  = 8;
 		atari_scsi_template.sg_tablesize = SG_ALL;
 	} else {
 		atari_scsi_template.can_queue    = 8;
-		atari_scsi_template.cmd_per_lun  = 1;
 		atari_scsi_template.sg_tablesize = SG_NONE;
 	}
[PATCH v3 13/23] ncr5380: Remove disused atari_NCR5380.c core driver
Now that atari_scsi and sun3_scsi have been converted to use the NCR5380.c
core driver, remove atari_NCR5380.c. Also remove the last vestiges of its
Tagged Command Queueing implementation from the wrapper drivers.

The TCQ support in atari_NCR5380.c is abandoned by this patch. It is not
merged into the remaining core driver because,

1) atari_scsi defines SUPPORT_TAGS but leaves FLAG_TAGGED_QUEUING disabled
   by default, which indicates that it is mostly undesirable.
2) I'm told that it doesn't work correctly when enabled.
3) The algorithm does not make use of block layer tags, which it will have
   to do because scmd->tag is deprecated.
4) sun3_scsi doesn't define SUPPORT_TAGS at all, yet the SUPPORT_TAGS
   macro interacts with the CONFIG_SUN3 macro in 'interesting' ways.
5) Compile-time configuration with macros like SUPPORT_TAGS caused the
   configuration space to explode, leading to untestable and
   unmaintainable code that is too hard to reason about.

The merge_contiguous_buffers() code is also abandoned. This was unused by
sun3_scsi. Only atari_scsi used it, and then only on TT, because only TT
supports scatter/gather. I suspect that the TT would work fine with
ENABLE_CLUSTERING instead. If someone can benchmark the difference then
perhaps the merge_contiguous_buffers() code can be justified. Until then
we are better off without the extra complexity.

Signed-off-by: Finn Thain
Reviewed-by: Hannes Reinecke
Tested-by: Michael Schmitz

---
 drivers/scsi/NCR5380.c       |   22
 drivers/scsi/NCR5380.h       |   19
 drivers/scsi/atari_NCR5380.c | 2632 ---
 drivers/scsi/atari_scsi.c    |   11
 drivers/scsi/mac_scsi.c      |    8
 drivers/scsi/sun3_scsi.c     |   11
 6 files changed, 4 insertions(+), 2699 deletions(-)

Index: linux/drivers/scsi/atari_scsi.c
===================================================================
--- linux.orig/drivers/scsi/atari_scsi.c	2016-03-21 13:31:31.0 +1100
+++ linux/drivers/scsi/atari_scsi.c	2016-03-21 13:31:33.0 +1100
@@ -87,9 +87,6 @@
 
 /* Definitions for the core NCR5380 driver.
*/ -#define SUPPORT_TAGS -#define MAX_TAGS32 - #define NCR5380_implementation_fields /* none */ #define NCR5380_read(reg) atari_scsi_reg_read(reg) @@ -189,8 +186,6 @@ static int setup_cmd_per_lun = -1; module_param(setup_cmd_per_lun, int, 0); static int setup_sg_tablesize = -1; module_param(setup_sg_tablesize, int, 0); -static int setup_use_tagged_queuing = -1; -module_param(setup_use_tagged_queuing, int, 0); static int setup_hostid = -1; module_param(setup_hostid, int, 0); static int setup_toshiba_delay = -1; @@ -479,8 +474,7 @@ static int __init atari_scsi_setup(char setup_sg_tablesize = ints[3]; if (ints[0] >= 4) setup_hostid = ints[4]; - if (ints[0] >= 5) - setup_use_tagged_queuing = ints[5]; + /* ints[5] (use_tagged_queuing) is ignored */ /* ints[6] (use_pdma) is ignored */ if (ints[0] >= 7) setup_toshiba_delay = ints[7]; @@ -853,9 +847,6 @@ static int __init atari_scsi_probe(struc instance->irq = irq->start; host_flags |= IS_A_TT() ? 0 : FLAG_LATE_DMA_SETUP; -#ifdef SUPPORT_TAGS - host_flags |= setup_use_tagged_queuing > 0 ? FLAG_TAGGED_QUEUING : 0; -#endif host_flags |= setup_toshiba_delay > 0 ? FLAG_TOSHIBA_DELAY : 0; error = NCR5380_init(instance, host_flags); Index: linux/drivers/scsi/sun3_scsi.c === --- linux.orig/drivers/scsi/sun3_scsi.c 2016-03-21 13:31:32.0 +1100 +++ linux/drivers/scsi/sun3_scsi.c 2016-03-21 13:31:33.0 +1100 @@ -41,9 +41,6 @@ /* Definitions for the core NCR5380 driver. 
*/ -/* #define SUPPORT_TAGS */ -/* #define MAX_TAGS 32 */ - #define NCR5380_implementation_fields /* none */ #define NCR5380_read(reg) sun3scsi_read(reg) @@ -75,10 +72,6 @@ static int setup_cmd_per_lun = -1; module_param(setup_cmd_per_lun, int, 0); static int setup_sg_tablesize = -1; module_param(setup_sg_tablesize, int, 0); -#ifdef SUPPORT_TAGS -static int setup_use_tagged_queuing = -1; -module_param(setup_use_tagged_queuing, int, 0); -#endif static int setup_hostid = -1; module_param(setup_hostid, int, 0); @@ -512,10 +505,6 @@ static int __init sun3_scsi_probe(struct instance->io_port = (unsigned long)ioaddr; instance->irq = irq->start; -#ifdef SUPPORT_TAGS - host_flags |= setup_use_tagged_queuing > 0 ? FLAG_TAGGED_QUEUING : 0; -#endif - error = NCR5380_init(instance, host_flags); if (error) goto fail_init; Index: linux/drivers/scsi/mac_scsi.c === --- linux.orig/drivers/scsi/mac_scsi.c 2016-03-21 13:31:27.0 +1100 +++ linux/drivers/scsi/mac_scsi.c 2016-03-21 13:31:33.00
[PATCH v3 17/23] ncr5380: Remove remaining register storage qualifiers
Signed-off-by: Finn Thain Reviewed-by: Hannes Reinecke Tested-by: Michael Schmitz --- drivers/scsi/NCR5380.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) Index: linux/drivers/scsi/NCR5380.c === --- linux.orig/drivers/scsi/NCR5380.c 2016-03-21 13:31:38.0 +1100 +++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:39.0 +1100 @@ -1555,9 +1555,9 @@ static int NCR5380_transfer_dma(struct S unsigned char **data) { struct NCR5380_hostdata *hostdata = shost_priv(instance); - register int c = *count; - register unsigned char p = *phase; - register unsigned char *d = *data; + int c = *count; + unsigned char p = *phase; + unsigned char *d = *data; unsigned char tmp; int result = 0;
[PATCH v3 15/23] dmx3191d: Drop max_sectors limit
The dmx3191d driver is not capable of DMA or PDMA so all transfers use PIO. Now that large slow PIO transfers periodically stop and call cond_resched(), the max_sectors limit can go away. Signed-off-by: Finn Thain Reviewed-by: Hannes Reinecke --- drivers/scsi/dmx3191d.c |1 - 1 file changed, 1 deletion(-) Index: linux/drivers/scsi/dmx3191d.c === --- linux.orig/drivers/scsi/dmx3191d.c 2016-03-21 13:31:27.0 +1100 +++ linux/drivers/scsi/dmx3191d.c 2016-03-21 13:31:37.0 +1100 @@ -67,7 +67,6 @@ static struct scsi_host_template dmx3191 .cmd_per_lun= 2, .use_clustering = DISABLE_CLUSTERING, .cmd_size = NCR5380_CMD_SIZE, - .max_sectors= 128, }; static int dmx3191d_probe_one(struct pci_dev *pdev,
[PATCH v3 16/23] ncr5380: Fix register decoding for debugging
Decode all bits in the chip registers. They are all useful at times. Fix printk severity so that this output can be suppressed along with the other debugging output. Signed-off-by: Finn Thain Reviewed-by: Hannes Reinecke Tested-by: Michael Schmitz --- drivers/scsi/NCR5380.c | 42 +- 1 file changed, 25 insertions(+), 17 deletions(-) Index: linux/drivers/scsi/NCR5380.c === --- linux.orig/drivers/scsi/NCR5380.c 2016-03-21 13:31:36.0 +1100 +++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:38.0 +1100 @@ -256,12 +256,20 @@ static struct { {0, NULL} }, basrs[] = { + {BASR_END_DMA_TRANSFER, "END OF DMA"}, + {BASR_DRQ, "DRQ"}, + {BASR_PARITY_ERROR, "PARITY ERROR"}, + {BASR_IRQ, "IRQ"}, + {BASR_PHASE_MATCH, "PHASE MATCH"}, + {BASR_BUSY_ERROR, "BUSY ERROR"}, {BASR_ATN, "ATN"}, {BASR_ACK, "ACK"}, {0, NULL} }, icrs[] = { {ICR_ASSERT_RST, "ASSERT RST"}, + {ICR_ARBITRATION_PROGRESS, "ARB. IN PROGRESS"}, + {ICR_ARBITRATION_LOST, "LOST ARB."}, {ICR_ASSERT_ACK, "ASSERT ACK"}, {ICR_ASSERT_BSY, "ASSERT BSY"}, {ICR_ASSERT_SEL, "ASSERT SEL"}, @@ -270,14 +278,14 @@ icrs[] = { {0, NULL} }, mrs[] = { - {MR_BLOCK_DMA_MODE, "MODE BLOCK DMA"}, - {MR_TARGET, "MODE TARGET"}, - {MR_ENABLE_PAR_CHECK, "MODE PARITY CHECK"}, - {MR_ENABLE_PAR_INTR, "MODE PARITY INTR"}, - {MR_ENABLE_EOP_INTR, "MODE EOP INTR"}, - {MR_MONITOR_BSY, "MODE MONITOR BSY"}, - {MR_DMA_MODE, "MODE DMA"}, - {MR_ARBITRATE, "MODE ARBITRATION"}, + {MR_BLOCK_DMA_MODE, "BLOCK DMA MODE"}, + {MR_TARGET, "TARGET"}, + {MR_ENABLE_PAR_CHECK, "PARITY CHECK"}, + {MR_ENABLE_PAR_INTR, "PARITY INTR"}, + {MR_ENABLE_EOP_INTR, "EOP INTR"}, + {MR_MONITOR_BSY, "MONITOR BSY"}, + {MR_DMA_MODE, "DMA MODE"}, + {MR_ARBITRATE, "ARBITRATE"}, {0, NULL} }; @@ -298,23 +306,23 @@ static void NCR5380_print(struct Scsi_Ho icr = NCR5380_read(INITIATOR_COMMAND_REG); basr = NCR5380_read(BUS_AND_STATUS_REG); - printk("STATUS_REG: %02x ", status); + printk(KERN_DEBUG "SR = 0x%02x : ", status); for (i = 0; signals[i].mask; ++i) if (status & signals[i].mask) - 
printk(",%s", signals[i].name); - printk("\nBASR: %02x ", basr); + printk(KERN_CONT "%s, ", signals[i].name); + printk(KERN_CONT "\nBASR = 0x%02x : ", basr); for (i = 0; basrs[i].mask; ++i) if (basr & basrs[i].mask) - printk(",%s", basrs[i].name); - printk("\nICR: %02x ", icr); + printk(KERN_CONT "%s, ", basrs[i].name); + printk(KERN_CONT "\nICR = 0x%02x : ", icr); for (i = 0; icrs[i].mask; ++i) if (icr & icrs[i].mask) - printk(",%s", icrs[i].name); - printk("\nMODE: %02x ", mr); + printk(KERN_CONT "%s, ", icrs[i].name); + printk(KERN_CONT "\nMR = 0x%02x : ", mr); for (i = 0; mrs[i].mask; ++i) if (mr & mrs[i].mask) - printk(",%s", mrs[i].name); - printk("\n"); + printk(KERN_CONT "%s, ", mrs[i].name); + printk(KERN_CONT "\n"); } static struct {
[PATCH v3 22/23] mac_scsi: Fix pseudo DMA implementation
Fix various issues: Comments about bus errors are incorrect. The PDMA asm must return the size of the memory access that faulted so the transfer count can be adjusted accordingly. A phase change may cause a bus error but should not be treated as failure. A bus error does not always imply a phase change and generally the transfer may continue. Scatter/gather doesn't seem to work with PDMA due to overruns. This is a pity because peak throughput seems to double with SG_ALL. Tested on a Mac LC III. Signed-off-by: Finn Thain Reviewed-by: Hannes Reinecke --- Changed since v1: - Set the default cmd_per_lun to 4 based on test results. Changed since v2: - Revert the default cmd_per_lun to 2, like in the v1 patch, because a uniform default across all ten 5380 wrapper drivers is worth more than a tiny improvement in one particular microbenchmark on one system. - Add 'reviewed-by' tag. --- drivers/scsi/NCR5380.h |2 drivers/scsi/mac_scsi.c | 210 ++-- 2 files changed, 118 insertions(+), 94 deletions(-) Index: linux/drivers/scsi/mac_scsi.c === --- linux.orig/drivers/scsi/mac_scsi.c 2016-03-21 13:31:33.0 +1100 +++ linux/drivers/scsi/mac_scsi.c 2016-03-21 13:31:45.0 +1100 @@ -28,7 +28,8 @@ /* Definitions for the core NCR5380 driver. 
*/ -#define NCR5380_implementation_fields unsigned char *pdma_base +#define NCR5380_implementation_fields unsigned char *pdma_base; \ +int pdma_residual #define NCR5380_read(reg) macscsi_read(instance, reg) #define NCR5380_write(reg, value) macscsi_write(instance, reg, value) @@ -37,7 +38,7 @@ macscsi_dma_xfer_len(instance, cmd) #define NCR5380_dma_recv_setup macscsi_pread #define NCR5380_dma_send_setup macscsi_pwrite -#define NCR5380_dma_residual(instance) (0) +#define NCR5380_dma_residual(instance) (hostdata->pdma_residual) #define NCR5380_intrmacscsi_intr #define NCR5380_queue_command macscsi_queue_command @@ -104,18 +105,9 @@ static int __init mac_scsi_setup(char *s __setup("mac5380=", mac_scsi_setup); #endif /* !MODULE */ -/* - Pseudo-DMA: (Ove Edlund) - The code attempts to catch bus errors that occur if one for example - "trips over the cable". - XXX: Since bus errors in the PDMA routines never happen on my - computer, the bus error code is untested. - If the code works as intended, a bus error results in Pseudo-DMA - being disabled, meaning that the driver switches to slow handshake. - If bus errors are NOT extremely rare, this has to be changed. 
-*/ +/* Pseudo DMA asm originally by Ove Edlund */ -#define CP_IO_TO_MEM(s,d,len) \ +#define CP_IO_TO_MEM(s,d,n)\ __asm__ __volatile__ \ ("cmp.w #4,%2\n" \ "bls8f\n" \ @@ -152,61 +144,73 @@ __asm__ __volatile__ \ " 9: \n" \ ".section .fixup,\"ax\"\n"\ ".even\n" \ - "90: moveq.l #1, %2\n"\ + "91: moveq.l #1, %2\n"\ + "jra 9b\n"\ + "94: moveq.l #4, %2\n"\ "jra 9b\n"\ ".previous\n" \ ".section __ex_table,\"a\"\n" \ " .align 4\n" \ - " .long 1b,90b\n" \ - " .long 3b,90b\n" \ - " .long 31b,90b\n" \ - " .long 32b,90b\n" \ - " .long 33b,90b\n" \ - " .long 34b,90b\n" \ - " .long 35b,90b\n" \ - " .long 36b,90b\n" \ - " .long 37b,90b\n" \ - " .long 5b,90b\n" \ - " .long 7b,90b\n" \ + " .long 1b,91b\n" \ + " .long 3b,94b\n" \ + " .long 31b,94b\n" \ + " .long 32b,94b\n" \ + " .long 33b,94b\n" \ + " .long 34b,94b\n" \ + " .long 35b,94b\n" \ + " .long 36b,94b\n" \ + " .long 37b,94b\n" \ + " .long 5b,94b\n"
[PATCH v3 23/23] ncr5380: Call complete_cmd() for disconnected commands on bus reset
I'm told that some targets are liable to disconnect a REQUEST SENSE command. Theoretically this would cause a command undergoing autosense to be moved onto the disconnected list. The bus reset handler must call complete_cmd() for these commands, otherwise the hostdata->sensing pointer will not get cleared. That would cause autosense processing to stall and a timeout or an incorrect scsi_eh_restore_cmnd() would eventually follow. Signed-off-by: Finn Thain --- drivers/scsi/NCR5380.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux/drivers/scsi/NCR5380.c === --- linux.orig/drivers/scsi/NCR5380.c 2016-03-21 13:31:40.0 +1100 +++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:47.0 +1100 @@ -2437,7 +2437,7 @@ static int NCR5380_bus_reset(struct scsi struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd); set_host_byte(cmd, DID_RESET); - cmd->scsi_done(cmd); + complete_cmd(instance, cmd); } INIT_LIST_HEAD(&hostdata->disconnected);
[PATCH v3 19/23] ncr5380: Update usage documentation
Update kernel parameter documentation for atari_scsi, mac_scsi and g_NCR5380 drivers. Remove duplication. Signed-off-by: Finn Thain Reviewed-by: Hannes Reinecke --- Documentation/scsi/g_NCR5380.txt | 17 ++- Documentation/scsi/scsi-parameters.txt | 11 +++--- drivers/scsi/g_NCR5380.c | 36 - 3 files changed, 16 insertions(+), 48 deletions(-) Index: linux/Documentation/scsi/scsi-parameters.txt === --- linux.orig/Documentation/scsi/scsi-parameters.txt 2016-03-21 13:31:06.0 +1100 +++ linux/Documentation/scsi/scsi-parameters.txt2016-03-21 13:31:42.0 +1100 @@ -27,13 +27,15 @@ parameters may be changed at runtime by aic79xx=[HW,SCSI] See Documentation/scsi/aic79xx.txt. - atascsi=[HW,SCSI] Atari SCSI + atascsi=[HW,SCSI] + See drivers/scsi/atari_scsi.c. BusLogic= [HW,SCSI] See drivers/scsi/BusLogic.c, comment before function BusLogic_ParseDriverOptions(). dtc3181e= [HW,SCSI] + See Documentation/scsi/g_NCR5380.txt. eata= [HW,SCSI] @@ -51,8 +53,8 @@ parameters may be changed at runtime by ips=[HW,SCSI] Adaptec / IBM ServeRAID controller See header of drivers/scsi/ips.c. - mac5380=[HW,SCSI] Format: - + mac5380=[HW,SCSI] + See drivers/scsi/mac_scsi.c. max_luns= [SCSI] Maximum number of LUNs to probe. Should be between 1 and 2^32-1. @@ -65,10 +67,13 @@ parameters may be changed at runtime by See header of drivers/scsi/NCR_D700.c. ncr5380=[HW,SCSI] + See Documentation/scsi/g_NCR5380.txt. ncr53c400= [HW,SCSI] + See Documentation/scsi/g_NCR5380.txt. ncr53c400a= [HW,SCSI] + See Documentation/scsi/g_NCR5380.txt. ncr53c406a= [HW,SCSI] Index: linux/Documentation/scsi/g_NCR5380.txt === --- linux.orig/Documentation/scsi/g_NCR5380.txt 2016-03-21 13:31:06.0 +1100 +++ linux/Documentation/scsi/g_NCR5380.txt 2016-03-21 13:31:42.0 +1100 @@ -23,11 +23,10 @@ supported by the driver. 
If the default configuration does not work for you, you can use the kernel command lines (eg using the lilo append command): - ncr5380=port,irq,dma - ncr53c400=port,irq -or - ncr5380=base,irq,dma - ncr53c400=base,irq + ncr5380=addr,irq + ncr53c400=addr,irq + ncr53c400a=addr,irq + dtc3181e=addr,irq The driver does not probe for any addresses or ports other than those in the OVERRIDE or given to the kernel as above. @@ -36,19 +35,17 @@ This driver provides some information on /proc/scsi/g_NCR5380/x where x is the scsi card number as detected at boot time. More info to come in the future. -When NCR53c400 support is compiled in, BIOS parameters will be returned by -the driver (the raw 5380 driver does not and I don't plan to fiddle with -it!). - This driver works as a module. When included as a module, parameters can be passed on the insmod/modprobe command line: ncr_irq=xx the interrupt ncr_addr=xx the port or base address (for port or memory mapped, resp.) - ncr_dma=xx the DMA ncr_5380=1 to set up for a NCR5380 board ncr_53c400=1 to set up for a NCR53C400 board + ncr_53c400a=1 to set up for a NCR53C400A board + dtc_3181e=1 to set up for a Domex Technology Corp 3181E board + hp_c2502=1 to set up for a Hewlett Packard C2502 board e.g. modprobe g_NCR5380 ncr_irq=5 ncr_addr=0x350 ncr_5380=1 for a port mapped NCR5380 board or Index: linux/drivers/scsi/g_NCR5380.c === --- linux.orig/drivers/scsi/g_NCR5380.c 2016-03-21 13:31:40.0 +1100 +++ linux/drivers/scsi/g_NCR5380.c 2016-03-21 13:31:42.0 +1100 @@ -18,42 +18,8 @@ * * Added ISAPNP support for DTC436 adapters, * Thomas Sailer, sai...@ife.ee.ethz.ch - */ - -/* - * TODO : flesh out DMA support, find some one actually using this (I have - * a memory mapped Trantor board that works fine) - */ - -/* - * The card is detected and initialized in one of several ways : - * 1. With command line overrides - NCR5380=port,irq may be - * used on the LILO command line to override the defaults. - * - * 2. 
With the GENERIC_NCR5380_OVERRIDE compile time define. This is - * specified as an array of address, irq, dma, board tuples. Ie, for - * one board at 0x350, IRQ5, no dma, I could say - * -DGENERIC_NCR5380_OVERRIDE={{0xcc000, 5,
[PATCH v3 02/23] ncr5380: Remove FLAG_NO_PSEUDO_DMA where possible
Drivers that define PSEUDO_DMA also define NCR5380_dma_xfer_len. The core driver must call NCR5380_dma_xfer_len which means FLAG_NO_PSEUDO_DMA can be eradicated from the core driver. dmx3191d doesn't define PSEUDO_DMA and has no use for FLAG_NO_PSEUDO_DMA, so remove it there also. Signed-off-by: Finn Thain Reviewed-by: Hannes Reinecke Tested-by: Michael Schmitz --- drivers/scsi/NCR5380.c |3 +-- drivers/scsi/dmx3191d.c |2 +- drivers/scsi/g_NCR5380.c |7 ++- drivers/scsi/g_NCR5380.h |2 +- drivers/scsi/mac_scsi.c | 15 ++- 5 files changed, 23 insertions(+), 6 deletions(-) Index: linux/drivers/scsi/dmx3191d.c === --- linux.orig/drivers/scsi/dmx3191d.c 2016-03-21 13:31:07.0 +1100 +++ linux/drivers/scsi/dmx3191d.c 2016-03-21 13:31:09.0 +1100 @@ -93,7 +93,7 @@ static int dmx3191d_probe_one(struct pci */ shost->irq = NO_IRQ; - error = NCR5380_init(shost, FLAG_NO_PSEUDO_DMA); + error = NCR5380_init(shost, 0); if (error) goto out_host_put; Index: linux/drivers/scsi/mac_scsi.c === --- linux.orig/drivers/scsi/mac_scsi.c 2016-03-21 13:31:07.0 +1100 +++ linux/drivers/scsi/mac_scsi.c 2016-03-21 13:31:09.0 +1100 @@ -37,7 +37,9 @@ #define NCR5380_pread macscsi_pread #define NCR5380_pwrite macscsi_pwrite -#define NCR5380_dma_xfer_len(instance, cmd, phase) (cmd->transfersize) + +#define NCR5380_dma_xfer_len(instance, cmd, phase) \ +macscsi_dma_xfer_len(instance, cmd) #define NCR5380_intrmacscsi_intr #define NCR5380_queue_command macscsi_queue_command @@ -303,6 +305,17 @@ static int macscsi_pwrite(struct Scsi_Ho } #endif +static int macscsi_dma_xfer_len(struct Scsi_Host *instance, +struct scsi_cmnd *cmd) +{ + struct NCR5380_hostdata *hostdata = shost_priv(instance); + + if (hostdata->flags & FLAG_NO_PSEUDO_DMA) + return 0; + + return cmd->transfersize; +} + #include "NCR5380.c" #define DRV_MODULE_NAME "mac_scsi" Index: linux/drivers/scsi/g_NCR5380.c === --- linux.orig/drivers/scsi/g_NCR5380.c 2016-03-21 13:31:07.0 +1100 +++ linux/drivers/scsi/g_NCR5380.c 2016-03-21 13:31:09.0 +1100 @@ 
-712,10 +712,15 @@ static inline int NCR5380_pwrite(struct return 0; } -static int generic_NCR5380_dma_xfer_len(struct scsi_cmnd *cmd) +static int generic_NCR5380_dma_xfer_len(struct Scsi_Host *instance, +struct scsi_cmnd *cmd) { + struct NCR5380_hostdata *hostdata = shost_priv(instance); int transfersize = cmd->transfersize; + if (hostdata->flags & FLAG_NO_PSEUDO_DMA) + return 0; + /* Limit transfers to 32K, for xx400 & xx406 * pseudoDMA that transfers in 128 bytes blocks. */ Index: linux/drivers/scsi/NCR5380.c === --- linux.orig/drivers/scsi/NCR5380.c 2016-03-21 13:31:07.0 +1100 +++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:09.0 +1100 @@ -1833,8 +1833,7 @@ static void NCR5380_information_transfer #if defined(PSEUDO_DMA) || defined(REAL_DMA_POLL) transfersize = 0; - if (!cmd->device->borken && - !(hostdata->flags & FLAG_NO_PSEUDO_DMA)) + if (!cmd->device->borken) transfersize = NCR5380_dma_xfer_len(instance, cmd, phase); if (transfersize) { Index: linux/drivers/scsi/g_NCR5380.h === --- linux.orig/drivers/scsi/g_NCR5380.h 2016-03-21 13:31:07.0 +1100 +++ linux/drivers/scsi/g_NCR5380.h 2016-03-21 13:31:09.0 +1100 @@ -61,7 +61,7 @@ #endif #define NCR5380_dma_xfer_len(instance, cmd, phase) \ -generic_NCR5380_dma_xfer_len(cmd) +generic_NCR5380_dma_xfer_len(instance, cmd) #define NCR5380_intr generic_NCR5380_intr #define NCR5380_queue_command generic_NCR5380_queue_command
[PATCH v3 06/23] ncr5380: Remove PSEUDO_DMA macro
For those wrapper drivers which only implement Programmed IO, have NCR5380_dma_xfer_len() evaluate to zero. That allows PDMA to be easily disabled at run-time and so the PSEUDO_DMA macro is no longer needed. Also remove the spin counters used for debugging pseudo DMA drivers. Signed-off-by: Finn Thain Reviewed-by: Hannes Reinecke Tested-by: Michael Schmitz --- drivers/scsi/NCR5380.c | 32 +--- drivers/scsi/NCR5380.h |4 drivers/scsi/arm/cumana_1.c |2 -- drivers/scsi/arm/oak.c |3 +-- drivers/scsi/dmx3191d.c |4 drivers/scsi/dtc.c |7 --- drivers/scsi/dtc.h |2 -- drivers/scsi/g_NCR5380.c|1 - drivers/scsi/g_NCR5380.h|1 - drivers/scsi/mac_scsi.c | 10 -- drivers/scsi/pas16.c| 10 -- drivers/scsi/pas16.h|2 -- drivers/scsi/t128.c |4 drivers/scsi/t128.h |2 -- 14 files changed, 6 insertions(+), 78 deletions(-) Index: linux/drivers/scsi/NCR5380.c === --- linux.orig/drivers/scsi/NCR5380.c 2016-03-21 13:31:14.0 +1100 +++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:16.0 +1100 @@ -469,34 +469,9 @@ static void prepare_info(struct Scsi_Hos #ifdef PARITY "PARITY " #endif -#ifdef PSEUDO_DMA -"PSEUDO_DMA " -#endif ""); } -#ifdef PSEUDO_DMA -static int __maybe_unused NCR5380_write_info(struct Scsi_Host *instance, - char *buffer, int length) -{ - struct NCR5380_hostdata *hostdata = shost_priv(instance); - - hostdata->spin_max_r = 0; - hostdata->spin_max_w = 0; - return 0; -} - -static int __maybe_unused NCR5380_show_info(struct seq_file *m, -struct Scsi_Host *instance) -{ - struct NCR5380_hostdata *hostdata = shost_priv(instance); - - seq_printf(m, "Highwater I/O busy spin counts: write %d, read %d\n", - hostdata->spin_max_w, hostdata->spin_max_r); - return 0; -} -#endif - /** * NCR5380_init - initialise an NCR5380 * @instance: adapter to configure @@ -1436,7 +1411,6 @@ timeout: return -1; } -#if defined(PSEUDO_DMA) /* * Function : int NCR5380_transfer_dma (struct Scsi_Host *instance, * unsigned char *phase, int *count, unsigned char **data) @@ -1592,7 +1566,6 @@ static int 
NCR5380_transfer_dma(struct S *phase = NCR5380_read(STATUS_REG) & PHASE_MASK; return foo; } -#endif /* PSEUDO_DMA */ /* * Function : NCR5380_information_transfer (struct Scsi_Host *instance) @@ -1683,7 +1656,6 @@ static void NCR5380_information_transfer * in an unconditional loop. */ -#if defined(PSEUDO_DMA) transfersize = 0; if (!cmd->device->borken) transfersize = NCR5380_dma_xfer_len(instance, cmd, phase); @@ -1706,9 +1678,7 @@ static void NCR5380_information_transfer /* XXX - need to source or sink data here, as appropriate */ } else cmd->SCp.this_residual -= transfersize - len; - } else -#endif /* PSEUDO_DMA */ - { + } else { /* Break up transfer into 3 ms chunks, * presuming 6 accesses per handshake. */ Index: linux/drivers/scsi/NCR5380.h === --- linux.orig/drivers/scsi/NCR5380.h 2016-03-21 13:31:14.0 +1100 +++ linux/drivers/scsi/NCR5380.h2016-03-21 13:31:16.0 +1100 @@ -257,10 +257,6 @@ struct NCR5380_hostdata { #ifdef SUPPORT_TAGS struct tag_alloc TagAlloc[8][8];/* 8 targets and 8 LUNs */ #endif -#ifdef PSEUDO_DMA - unsigned spin_max_r; - unsigned spin_max_w; -#endif struct workqueue_struct *work_q; unsigned long accesses_per_ms; /* chip register accesses per ms */ }; Index: linux/drivers/scsi/arm/cumana_1.c === --- linux.orig/drivers/scsi/arm/cumana_1.c 2016-03-21 13:31:14.0 +1100 +++ linux/drivers/scsi/arm/cumana_1.c 2016-03-21 13:31:16.0 +1100 @@ -13,8 +13,6 @@ #include -#define PSEUDO_DMA - #define priv(host) ((struct NCR5380_hostdata *)(host)->hostdata) #define NCR5380_read(reg) cumanascsi_read(instance, reg) #define NCR5380_write(reg, value) cumanascsi_write(instance, reg, value) Index: linux/drivers/scsi/arm/oak.c
[PATCH v3 21/23] atari_scsi: Allow can_queue to be increased for Falcon
The benefit of limiting can_queue to 1 is that atari_scsi shares the ST DMA chip more fairly with other drivers (e.g. falcon-ide). Unfortunately, this can limit SCSI bus utilization. On systems without IDE, atari_scsi should issue SCSI commands whenever it can arbitrate for the bus. Make that possible by making can_queue configurable. Signed-off-by: Finn Thain Reviewed-by: Hannes Reinecke Tested-by: Michael Schmitz --- drivers/scsi/atari_scsi.c | 83 -- 1 file changed, 22 insertions(+), 61 deletions(-) Index: linux/drivers/scsi/atari_scsi.c === --- linux.orig/drivers/scsi/atari_scsi.c2016-03-21 13:31:43.0 +1100 +++ linux/drivers/scsi/atari_scsi.c 2016-03-21 13:31:44.0 +1100 @@ -14,55 +14,23 @@ * */ - -/**/ -/**/ -/* Notes for Falcon SCSI: */ -/* -- */ -/**/ -/* Since the Falcon SCSI uses the ST-DMA chip, that is shared among */ -/* several device drivers, locking and unlocking the access to this */ -/* chip is required. But locking is not possible from an interrupt, */ -/* since it puts the process to sleep if the lock is not available. */ -/* This prevents "late" locking of the DMA chip, i.e. locking it just */ -/* before using it, since in case of disconnection-reconnection */ -/* commands, the DMA is started from the reselection interrupt. */ -/**/ -/* Two possible schemes for ST-DMA-locking would be: */ -/* 1) The lock is taken for each command separately and disconnecting*/ -/* is forbidden (i.e. can_queue = 1). */ -/* 2) The DMA chip is locked when the first command comes in and */ -/* released when the last command is finished and all queues are */ -/* empty. */ -/* The first alternative would result in bad performance, since the */ -/* interleaving of commands would not be used. The second is unfair to*/ -/* other drivers using the ST-DMA, because the queues will seldom be */ -/* totally empty if there is a lot of disk traffic. 
*/ -/**/ -/* For this reasons I decided to employ a more elaborate scheme: */ -/* - First, we give up the lock every time we can (for fairness), this*/ -/*means every time a command finishes and there are no other commands */ -/*on the disconnected queue. */ -/* - If there are others waiting to lock the DMA chip, we stop */ -/*issuing commands, i.e. moving them onto the issue queue. */ -/*Because of that, the disconnected queue will run empty in a */ -/*while. Instead we go to sleep on a 'fairness_queue'.*/ -/* - If the lock is released, all processes waiting on the fairness */ -/*queue will be woken. The first of them tries to re-lock the DMA, */ -/*the others wait for the first to finish this task. After that, */ -/*they can all run on and do their commands...*/ -/* This sounds complicated (and it is it :-(), but it seems to be a */ -/* good compromise between fairness and performance: As long as no one */ -/* else wants to work with the ST-DMA chip, SCSI can go along as */ -/* usual. If now someone else comes, this behaviour is changed to a */ -/* "fairness mode": just already initiated commands are finished and */ -/* then the lock is released. The other one waiting will probably win */ -/* the race for locking the DMA, since it was waiting for longer. And */ -/* after it has finished, SCSI can go ahead again. Finally: I hope I */ -/* have not produced any deadlock possibilities! */ -/**/ -/**/ - +/* + * Notes for Falcon SCSI DMA + * + * The 5380 device is one of several that all share the DMA chip. Hence + * "locking" and "unlocking" access to this chip is required. + * + * Two possible schemes for ST DMA acquisition by atari_scsi are: + * 1) The lock is taken for each command separately (i.e. can_queue == 1). + * 2) The lock is taken when the first command arrives and released + * when the last command is finished (i.e. can_queue > 1). + * + * The firs
[PATCH v3 04/23] atari_NCR5380: Remove DMA_MIN_SIZE macro
Only the atari_scsi and sun3_scsi drivers define DMA_MIN_SIZE. Both drivers also define NCR5380_dma_xfer_len, which means DMA_MIN_SIZE can be removed from the core driver. This removes another discrepancy between the two core drivers. Signed-off-by: Finn Thain Tested-by: Michael Schmitz --- Changes since v1: - Retain MIN_DMA_SIZE macro in wrapper drivers. --- drivers/scsi/atari_NCR5380.c | 16 drivers/scsi/atari_scsi.c|6 +- drivers/scsi/sun3_scsi.c | 19 +-- 3 files changed, 22 insertions(+), 19 deletions(-) Index: linux/drivers/scsi/atari_NCR5380.c === --- linux.orig/drivers/scsi/atari_NCR5380.c 2016-03-21 13:31:10.0 +1100 +++ linux/drivers/scsi/atari_NCR5380.c 2016-03-21 13:31:13.0 +1100 @@ -1857,12 +1857,11 @@ static void NCR5380_information_transfer d = cmd->SCp.ptr; } /* this command setup for dma yet? */ - if ((count >= DMA_MIN_SIZE) && (sun3_dma_setup_done != cmd)) { - if (cmd->request->cmd_type == REQ_TYPE_FS) { - sun3scsi_dma_setup(instance, d, count, - rq_data_dir(cmd->request)); - sun3_dma_setup_done = cmd; - } + if (sun3_dma_setup_done != cmd && + sun3scsi_dma_xfer_len(count, cmd) > 0) { + sun3scsi_dma_setup(instance, d, count, + rq_data_dir(cmd->request)); + sun3_dma_setup_done = cmd; } #ifdef SUN3_SCSI_VME dregs->csr |= CSR_INTR; @@ -1927,7 +1926,7 @@ static void NCR5380_information_transfer #endif transfersize = NCR5380_dma_xfer_len(instance, cmd, phase); - if (transfersize >= DMA_MIN_SIZE) { + if (transfersize > 0) { len = transfersize; cmd->SCp.phase = phase; if (NCR5380_transfer_dma(instance, &phase, @@ -2366,7 +2365,8 @@ static void NCR5380_reselect(struct Scsi d = tmp->SCp.ptr; } /* setup this command for dma if not already */ - if ((count >= DMA_MIN_SIZE) && (sun3_dma_setup_done != tmp)) { + if (sun3_dma_setup_done != tmp && + sun3scsi_dma_xfer_len(count, tmp) > 0) { sun3scsi_dma_setup(instance, d, count, rq_data_dir(tmp->request)); sun3_dma_setup_done = tmp; Index: linux/drivers/scsi/atari_scsi.c === --- 
linux.orig/drivers/scsi/atari_scsi.c2016-03-21 13:31:10.0 +1100 +++ linux/drivers/scsi/atari_scsi.c 2016-03-21 13:31:13.0 +1100 @@ -83,11 +83,12 @@ #include +#define DMA_MIN_SIZE32 + /* Definitions for the core NCR5380 driver. */ #define SUPPORT_TAGS #define MAX_TAGS32 -#define DMA_MIN_SIZE32 #define NCR5380_implementation_fields /* none */ @@ -605,6 +606,9 @@ static unsigned long atari_dma_xfer_len( { unsigned long possible_len, limit; + if (wanted_len < DMA_MIN_SIZE) + return 0; + if (IS_A_TT()) /* TT SCSI DMA can transfer arbitrary #bytes */ return wanted_len; Index: linux/drivers/scsi/sun3_scsi.c === --- linux.orig/drivers/scsi/sun3_scsi.c 2016-03-21 13:31:10.0 +1100 +++ linux/drivers/scsi/sun3_scsi.c 2016-03-21 13:31:13.0 +1100 @@ -36,12 +36,12 @@ #include #include "sun3_scsi.h" -/* Definitions for the core NCR5380 driver. */ - -/* #define SUPPORT_TAGS */ /* minimum number of bytes to do dma on */ #define DMA_MIN_SIZE129 +/* Definitions for the core NCR5380 driver. */ + +/* #define SUPPORT_TAGS */ /* #define MAX_TAGS 32 */ #define NCR5380_implementation_fields /* none */ @@ -61,7 +61,7 @@ #define NCR5380_dma_residual(instance) \ sun3scsi_dma_residual(instance) #define NCR5380_dma_xfer_len(instance, cmd, phase) \ -sun3scsi_dma_xfer_len(cmd->SCp.this_residual, cmd, !((phase) & SR_IO)) +sun3scsi_dma_xfer_len(cmd->SCp.this_residual, cmd) #define NCR5380_acquire_dma_irq(instance)(1) #define N
[PATCH v3 00/23] ncr5380: Eliminate macros, reduce code duplication, fix bugs etc
This patch series has more macro elimination and some tweaks to the DMA hooks so that all the wrapper drivers can share the same core DMA algorithm. This resolves the major discrepancies between the two core drivers, which relate to code conditional on the REAL_DMA and PSEUDO_DMA macros. Once all the wrapper drivers agree on the DMA hook API, the core driver fork gets resolved: NCR5380.c is adopted by atari_scsi and sun3_scsi, and atari_NCR5380.c is then deleted.

Historically, the 5380 drivers suffered from over-use of conditional compilation, which caused the compile-time configuration space to explode, leading to core driver code that was practically untestable, unmaintainable and difficult to reason about. It also prevented driver modules from sharing object code. Along with REAL_DMA, REAL_DMA_POLL and PSEUDO_DMA, most of the remaining macros are also eradicated, such as CONFIG_SCSI_GENERIC_NCR53C400, SUPPORT_TAGS, DONT_USE_INTR, AUTOPROBE_IRQ and BIOSPARAM.

Also in this patch series, some duplicated documentation is removed and the PDMA implementation in mac_scsi finally gets fixed.

This patch series was tested by exercising the dmx3191d and mac_scsi modules on suitable hardware. Michael has tested atari_scsi on an Atari Falcon. Help with driver testing on ISA cards is sought, as I don't have such hardware; likewise RiscPC ecards and Sun 3.

Changes since v1:
- Patch 4: don't remove the DMA_MIN_SIZE macro from wrapper drivers.
- Patch 9: improve commit log entry and add 'Reviewed-by' tag.
- Patch 14: reduce shost->max_lun limit instead of adding MAX_LUN limit.
- Patches 20 and 22: set the default cmd_per_lun to 4.
- For the rest: add 'Reviewed-by' tag.

Changes since v2:
- Patches 20 and 22: revert the default cmd_per_lun to 2, like the v1 patch series.
- Add patch 23 to fix a theoretical bus reset/autosense issue.
--- Documentation/scsi/g_NCR5380.txt | 17 Documentation/scsi/scsi-parameters.txt | 11 drivers/scsi/Kconfig | 11 drivers/scsi/NCR5380.c | 659 drivers/scsi/NCR5380.h | 143 - drivers/scsi/arm/cumana_1.c| 25 drivers/scsi/arm/oak.c | 22 drivers/scsi/atari_NCR5380.c | 2676 - drivers/scsi/atari_scsi.c | 144 - drivers/scsi/dmx3191d.c| 10 drivers/scsi/dtc.c | 27 drivers/scsi/dtc.h |7 drivers/scsi/g_NCR5380.c | 143 - drivers/scsi/g_NCR5380.h | 26 drivers/scsi/mac_scsi.c| 239 +- drivers/scsi/pas16.c | 27 drivers/scsi/pas16.h |5 drivers/scsi/sun3_scsi.c | 47 drivers/scsi/t128.c| 19 drivers/scsi/t128.h|7 20 files changed, 634 insertions(+), 3631 deletions(-)
[PATCH v3 05/23] ncr5380: Disable the DMA errata workaround flag by default
The only chip that needs the workarounds enabled is an early NMOS device. That means that the common case is to disable them. Unfortunately the sense of the flag is such that it has to be set for the common case. Rename the flag so that zero can be used to mean "no errata workarounds needed". This simplifies the code. Signed-off-by: Finn Thain Reviewed-by: Hannes Reinecke Tested-by: Michael Schmitz --- drivers/scsi/NCR5380.c | 14 +++--- drivers/scsi/NCR5380.h |2 +- drivers/scsi/arm/cumana_1.c |2 +- drivers/scsi/arm/oak.c |2 +- drivers/scsi/dtc.c |2 +- drivers/scsi/g_NCR5380.c|8 +--- drivers/scsi/pas16.c|2 +- drivers/scsi/t128.c |2 +- 8 files changed, 14 insertions(+), 20 deletions(-) Index: linux/drivers/scsi/NCR5380.c === --- linux.orig/drivers/scsi/NCR5380.c 2016-03-21 13:31:10.0 +1100 +++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:14.0 +1100 @@ -457,7 +457,7 @@ static void prepare_info(struct Scsi_Hos instance->base, instance->irq, instance->can_queue, instance->cmd_per_lun, instance->sg_tablesize, instance->this_id, -hostdata->flags & FLAG_NO_DMA_FIXUP ? "NO_DMA_FIXUP " : "", +hostdata->flags & FLAG_DMA_FIXUP ? "DMA_FIXUP " : "", hostdata->flags & FLAG_NO_PSEUDO_DMA ? "NO_PSEUDO_DMA " : "", hostdata->flags & FLAG_TOSHIBA_DELAY ? "TOSHIBA_DELAY " : "", #ifdef AUTOPROBE_IRQ @@ -1480,11 +1480,11 @@ static int NCR5380_transfer_dma(struct S * before the setting of DMA mode to after transfer of the last byte. */ - if (hostdata->flags & FLAG_NO_DMA_FIXUP) + if (hostdata->flags & FLAG_DMA_FIXUP) + NCR5380_write(MODE_REG, MR_BASE | MR_DMA_MODE | MR_MONITOR_BSY); + else NCR5380_write(MODE_REG, MR_BASE | MR_DMA_MODE | MR_MONITOR_BSY | MR_ENABLE_EOP_INTR); - else - NCR5380_write(MODE_REG, MR_BASE | MR_DMA_MODE | MR_MONITOR_BSY); dprintk(NDEBUG_DMA, "scsi%d : mode reg = 0x%X\n", instance->host_no, NCR5380_read(MODE_REG)); @@ -1540,8 +1540,8 @@ static int NCR5380_transfer_dma(struct S if (p & SR_IO) { foo = NCR5380_pread(instance, d, - hostdata->flags & FLAG_NO_DMA_FIXUP ? 
c : c - 1); - if (!foo && !(hostdata->flags & FLAG_NO_DMA_FIXUP)) { + hostdata->flags & FLAG_DMA_FIXUP ? c - 1 : c); + if (!foo && (hostdata->flags & FLAG_DMA_FIXUP)) { /* * The workaround was to transfer fewer bytes than we * intended to with the pseudo-DMA read function, wait for @@ -1571,7 +1571,7 @@ static int NCR5380_transfer_dma(struct S } } else { foo = NCR5380_pwrite(instance, d, c); - if (!foo && !(hostdata->flags & FLAG_NO_DMA_FIXUP)) { + if (!foo && (hostdata->flags & FLAG_DMA_FIXUP)) { /* * Wait for the last byte to be sent. If REQ is being asserted for * the byte we're interested, we'll ACK it and it will go false. Index: linux/drivers/scsi/NCR5380.h === --- linux.orig/drivers/scsi/NCR5380.h 2016-03-21 13:31:10.0 +1100 +++ linux/drivers/scsi/NCR5380.h2016-03-21 13:31:14.0 +1100 @@ -220,7 +220,7 @@ #define NO_IRQ 0 #endif -#define FLAG_NO_DMA_FIXUP 1 /* No DMA errata workarounds */ +#define FLAG_DMA_FIXUP 1 /* Use DMA errata workarounds */ #define FLAG_NO_PSEUDO_DMA 8 /* Inhibit DMA */ #define FLAG_LATE_DMA_SETUP32 /* Setup NCR before DMA H/W */ #define FLAG_TAGGED_QUEUING64 /* as X3T9.2 spelled it */ Index: linux/drivers/scsi/dtc.c === --- linux.orig/drivers/scsi/dtc.c 2016-03-21 13:31:07.0 +1100 +++ linux/drivers/scsi/dtc.c2016-03-21 13:31:14.0 +1100 @@ -229,7 +229,7 @@ found: instance->base = addr; ((struct NCR5380_hostdata *)(instance)->hostdata)->base = base; - if (NCR5380_init(instance, FLAG_NO_DMA_FIXUP)) + if (NCR5380_init(instance, 0)) goto out_unregister; NCR5380_maybe_reset_bus(instance); Index: linux/drivers/scsi/g_NCR5380.c === --- linux.orig/drivers/scsi/g_NCR5380.c 2016-03-21 13:31:09.0 +1100 +++ linux/drivers/scsi/g_NCR5380.c 2016-03-21 13:31:14.0 +1100 @@ -348,23 +348,17 @@ static in
Re: [PATCH 0/2] ARM: uniphier: UniPhier updates for Linux 4.6-rc1 (2nd round)
Hi Arnd,

2016-03-19 8:49 GMT+09:00 Masahiro Yamada :
> Hi Arnd,
>
> 2016-03-19 1:49 GMT+09:00 Arnd Bergmann :
>> On Tuesday 15 March 2016 11:01:00 Masahiro Yamada wrote:
>>> Olof, Arnd,
>>>
>>> I sent my patches around -rc4 and
>>> took action soon as requested.
>>>
>>> But, my series is still not applied due to the long silence on your side.
>>>
>>> Please respond!
>>
>> Sorry for all the delays, we screwed this one up, and you did everything
>> right. I have put the DT changes into the next/dt2 branch now, and applied
>> the two other patches to next/soc directly.
>>
>> Please check that the for-next branch in arm-soc has everything you need now.
>
> I checked both branches and everything is fine.
>
> Thank you very much!

I thought you'd include DT updates in the pull requests, but you didn't. Why was next/dt2 missed?

-- 
Best Regards
Masahiro Yamada
[git pull] drm pull for 4.6-rc1
Hi Linus,

This is the main drm pull request for the 4.6 kernel. The highlights are below, and there are a few merge conflicts, but I think they should all be simple enough for you to take care of. At least at the moment they are just the writecombine interface changes.

Overall the coolest thing here for me is the nouveau Maxwell signed firmware support from NVidia; it's taken a long while to extract this from them. I also wish the ARM vendors had just designed one set of display IP; ARM display block proliferation is definitely increasing.

Core:
    drm_event cleanups
    Internal API cleanup making mode_fixup optional.
    Apple GMUX vga switcheroo support.
    DP AUX testing interface

Panel:
    Refactoring of DSI core for use over more transports.

New driver:
    ARM hdlcd driver

i915:
    FBC/PSR (framebuffer compression, panel self refresh) enabled by default.
    Ongoing atomic display support work
    Ongoing runtime PM work
    Pixel clock limit checks
    VBT DSI description support
    GEM fixes
    GuC firmware scheduler enhancements

amdkfd:
    Deferred probing fixes to avoid makefile or link ordering issues.

amdgpu/radeon:
    ACP support for i2s audio.
    Command Submission/GPU scheduler/GPUVM optimisations
    Initial GPU reset support for amdgpu

vmwgfx:
    Support for DX10 gen mipmaps
    Pageflipping and other fixes.

exynos:
    Exynos5420 SoC support for FIMD
    Exynos5422 SoC support for MIPI-DSI

nouveau:
    GM20x secure boot support - adds acceleration for Maxwell GPUs.
    GM200 support
    GM20B clock driver support
    Power sensors work

etnaviv:
    Correctness fixes for GPU cache flushing
    Better support for i.MX6 systems.

imx-drm:
    VBlank IRQ support
    Fence support
    OF endpoint support

msm:
    HDMI support for 8996 (snapdragon 820)
    Adreno 430 support
    Timestamp queries support

virtio-gpu:
    Fixes for Android support.

rockchip:
    Add support for Innosilicon HDMI

rcar-du:
    Support for 4 crtcs
    R8A7795 support
    RCar Gen 3 support

omapdrm:
    HDMI interlace output support
    dma-buf import support
    Refactoring to remove a lot of legacy code.
tilcdc:
    Rewrite of pageflipping code
    dma-buf support
    pinctrl support

vc4:
    HDMI modesetting bug fixes
    Significant 3D performance improvement.

fsl-dcu (FreeScale):
    Lots of fixes

tegra:
    Two small fixes

sti:
    Atomic support for planes
    Improved HDMI support

The following changes since commit 2a4fb270daa9c1f1d1b86a53d66ed86cc64ad232:

  Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc (2016-03-11 12:35:54 -0800)

are available in the git repository at:

  git://people.freedesktop.org/~airlied/linux drm-next

for you to fetch changes up to 568d7c764ae01f3706085ac8f0d8a8ac7e826bd7:

  drm/amdgpu: release_pages requires linux/pagemap.h (2016-03-21 13:22:52 +1000)

Abhay Kumar (1):
      drm/i915: edp resume/On time optimization.

Akshay Bhat (1):
      drm/panel: simple: Fix g121x1_l03 hsync/vsync polarity

Alan (1):
      i915: cast before shifting in i915_pte_count

Alan Cox (1):
      gma500: clean up an excessive and confusing helper

Alex Dai (7):
      drm/i915/guc: Move GuC wq_check_space to alloc_request_extras
      drm/i915/guc: Add GuC ADS (Addition Data Structure) - allocation
      drm/i915/guc: Add GuC ADS - scheduler policies
      drm/i915/guc: Add GuC ADS - MMIO reg state
      drm/i915/guc: Add GuC ADS - enabling ADS
      drm/i915/guc: Fix a memory leak where guc->execbuf_client is not freed
      drm/i915/guc: Decouple GuC engine id from ring id

Alex Deucher (33):
      drm/amdgpu: remove some more semaphore leftovers
      drm/amdgpu: clean up asic level reset for CI
      drm/amdgpu: clean up asic level reset for VI
      drm/amdgpu: post card after hard reset
      drm/amdgpu: add a debugfs property to trigger a GPU reset
      drm/amdgpu: drop hard_reset module parameter
      drm/amd: add dce8 enum register header
      drm/amdgpu: remove unused function
      drm/amdgpu: add check for atombios GPU virtualization table
      drm/amdgpu: track whether the asic supports SR-IOV
      drm/amdgpu: always repost cards that support SR-IOV
      drm/amdgpu/gmc8: skip MC ucode loading on SR-IOV capable boards
      drm/amdgpu/smu: skip SMC ucode loading on SR-IOV capable boards (v2)
      drm/amdgpu/gfx: minor code cleanup
      drm/amdgpu/gfx: clean up harvest configuration (v2)
      drm/amdgpu/gfx7: rework gpu_init()
      drm/amdgpu/cik: move sdma tiling config setup into sdma code
      drm/amdgpu/cik: move uvd tiling config setup into uvd code
      drm/amdgpu/vi: move sdma tiling config setup into sdma code
      drm/amdgpu/vi:
Re: [PATCH v3] staging: netlogic: Fixed alignment of parentheseis checkpatch warning
On Sat, 19 Mar 2016 19:22:09 -0700, Joe Perches said:
> On Sun, 2016-03-20 at 07:48 +0530, Parth Sane wrote:
> > Hi,
> > Thanks for pointing out that the changes have been done. Nevertheless
> > this was a good learning exercise. How do I check which changes have
> > already been done?
>
> Use this tree:
>
> http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git

And note that doing a 'git clone' of this won't do what you want. What you *want* to do:

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
$ git remote add linux-next git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
$ git fetch --tags linux-next

This will get you a tree that you can actually work with...

# later on (linux-next is updated most weekdays)
$ git remote update

to find out what the current tree looks like. You do *not* want to use 'git pull' against linux-next, because it rebases every night.
Re: [PATCH 2/3] x86/topology: Fix AMD core count
On Mon, Mar 21, 2016 at 11:07:44AM +0800, Huang Rui wrote: > On Fri, Mar 18, 2016 at 05:41:01PM +0100, Borislav Petkov wrote: > > On Fri, Mar 18, 2016 at 04:03:47PM +0100, Peter Zijlstra wrote: > > > It turns out AMD gets x86_max_cores wrong when there are compute > > > units. > > > > > > The issue is that Linux assumes: > > > > > > nr_logical_cpus = nr_cores * nr_siblings > > > > > > But AMD reports its CU unit as 2 cores, but then sets num_smp_siblings > > > to 2 as well. > > > > > > Cc: Ingo Molnar > > > Cc: Borislav Petkov > > > Cc: Thomas Gleixner > > > Cc: Andreas Herrmann > > > Reported-by: Xiong Zhou > > > Fixes: 1f12e32f4cd5 ("x86/topology: Create logical package id") > > > Signed-off-by: Peter Zijlstra (Intel) > > > Link: > > > http://lkml.kernel.org/r/20160317095220.go6...@twins.programming.kicks-ass.net > > > --- > > > arch/x86/kernel/cpu/amd.c |8 > > > arch/x86/kernel/smpboot.c | 11 ++- > > > 2 files changed, 10 insertions(+), 9 deletions(-) > > > > > > --- a/arch/x86/kernel/cpu/amd.c > > > +++ b/arch/x86/kernel/cpu/amd.c > > > @@ -313,9 +313,9 @@ static void amd_get_topology(struct cpui > > > node_id = ecx & 7; > > > > > > /* get compute unit information */ > > > - smp_num_siblings = ((ebx >> 8) & 3) + 1; > > > + cores_per_cu = smp_num_siblings = ((ebx >> 8) & 3) + 1; > > > + c->x86_max_cores /= smp_num_siblings; > > > c->compute_unit_id = ebx & 0xff; > > > - cores_per_cu += ((ebx >> 8) & 3); > > > } else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) { > > > u64 value; > > > > > > @@ -331,8 +331,8 @@ static void amd_get_topology(struct cpui > > > u32 cus_per_node; > > > > > > set_cpu_cap(c, X86_FEATURE_AMD_DCM); > > > - cores_per_node = c->x86_max_cores / nodes_per_socket; > > > - cus_per_node = cores_per_node / cores_per_cu; > > > + cus_per_node = c->x86_max_cores / nodes_per_socket; > > > + cores_per_node = cus_per_node * cores_per_cu; > > > > > > /* store NodeID, use llc_shared_map to store sibling info */ > > > per_cpu(cpu_llc_id, cpu) = node_id; > > > 
> > Looks ok to me, however it probably would be prudent if AMD tested it on
> > a bunch of machines just to make sure we don't break anything else. I'm
> > thinking F15h and F16h, something big...
> >
> > Rui, can you find some time to run this one please?
> >
> > Look at before/after info in /proc/cpuinfo, topology in sysfs and dmesg
> > before and after might be useful too.
>
> OK, we will find some fam15h, fam16h platforms to verify it. Please
> wait for my feedback.
>
> But I am confused with c->x86_max_cores /= smp_num_siblings, what is
> the real meaning of c->x86_max_cores here for AMD, the whole compute
> unit numbers per socket?
>
> + Sherry, for her awareness.

I quickly applied this patch on tip/master on a fam15h machine. The issue still exists; only one core can be detected.

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 2
model name      : AMD Opteron(tm) Processor 6386 SE
stepping        : 0
microcode       : 0x6000822
cpu MHz         : 2792.882
cache size      : 2048 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs            : fxsave_leak sysret_ss_attrs
bogomips        : 5585.76
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                1
On-line CPU(s) list:   0
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             1
Vendor ID:             AuthenticAMD
CPU family:            21
Model:                 2
Stepping:              0
CPU MHz:               2792.882
BogoMIPS:              5585.76
Virtualization:        AMD-V
L1d cache:             16K
L1i cache:             64K
L2 cache:              2048K
L3 cache:              6144K

Thanks,
Rui
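The double-counting being fixed in this thread can be sketched numerically. This is an editorial illustration, not code from the patch; the figures assume a 16-core Opteron-class package with 8 compute units (CUs), two cores per CU, as described in Peter's commit message:

```python
# Hypothetical figures: a Piledriver-style socket with 8 compute units,
# 2 integer cores per CU -> 16 logical CPUs in total.
cores_per_cu = 2          # what CPUID reports as siblings per CU
cpuid_core_count = 16     # AMD counts each CU as 2 "cores"

# What the kernel assumed before the patch: the CU members are counted
# once as cores AND once as siblings, so the product double-counts.
smp_num_siblings = cores_per_cu
x86_max_cores = cpuid_core_count
assert x86_max_cores * smp_num_siblings == 32  # twice the real CPU count

# What the patch does (c->x86_max_cores /= smp_num_siblings): treat a
# CU as one "core" with cores_per_cu siblings.
x86_max_cores //= smp_num_siblings
assert x86_max_cores * smp_num_siblings == 16  # matches the hardware
```

In other words, after the division `x86_max_cores` means "compute units per package", which is also what Rui is asking about above.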
Re: [PATCH 2/3] x86/topology: Fix AMD core count
On Fri, Mar 18, 2016 at 05:41:01PM +0100, Borislav Petkov wrote: > On Fri, Mar 18, 2016 at 04:03:47PM +0100, Peter Zijlstra wrote: > > It turns out AMD gets x86_max_cores wrong when there are compute > > units. > > > > The issue is that Linux assumes: > > > > nr_logical_cpus = nr_cores * nr_siblings > > > > But AMD reports its CU unit as 2 cores, but then sets num_smp_siblings > > to 2 as well. > > > > Cc: Ingo Molnar > > Cc: Borislav Petkov > > Cc: Thomas Gleixner > > Cc: Andreas Herrmann > > Reported-by: Xiong Zhou > > Fixes: 1f12e32f4cd5 ("x86/topology: Create logical package id") > > Signed-off-by: Peter Zijlstra (Intel) > > Link: > > http://lkml.kernel.org/r/20160317095220.go6...@twins.programming.kicks-ass.net > > --- > > arch/x86/kernel/cpu/amd.c |8 > > arch/x86/kernel/smpboot.c | 11 ++- > > 2 files changed, 10 insertions(+), 9 deletions(-) > > > > --- a/arch/x86/kernel/cpu/amd.c > > +++ b/arch/x86/kernel/cpu/amd.c > > @@ -313,9 +313,9 @@ static void amd_get_topology(struct cpui > > node_id = ecx & 7; > > > > /* get compute unit information */ > > - smp_num_siblings = ((ebx >> 8) & 3) + 1; > > + cores_per_cu = smp_num_siblings = ((ebx >> 8) & 3) + 1; > > + c->x86_max_cores /= smp_num_siblings; > > c->compute_unit_id = ebx & 0xff; > > - cores_per_cu += ((ebx >> 8) & 3); > > } else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) { > > u64 value; > > > > @@ -331,8 +331,8 @@ static void amd_get_topology(struct cpui > > u32 cus_per_node; > > > > set_cpu_cap(c, X86_FEATURE_AMD_DCM); > > - cores_per_node = c->x86_max_cores / nodes_per_socket; > > - cus_per_node = cores_per_node / cores_per_cu; > > + cus_per_node = c->x86_max_cores / nodes_per_socket; > > + cores_per_node = cus_per_node * cores_per_cu; > > > > /* store NodeID, use llc_shared_map to store sibling info */ > > per_cpu(cpu_llc_id, cpu) = node_id; > > Looks ok to me, however it probably would be prudent if AMD tested it on > a bunch of machines just to make sure we don't break anything else. 
I'm > thinking F15h and F16h, something big... > > Rui, can you find some time to run this one please? > > Look at before/after info in /proc/cpuinfo, topology in sysfs and dmesg > before and after might be useful too. > OK, we will find some fam15h, fam16h platforms to verify it. Please wait for my feedback. But I am confused with c->x86_max_cores /= smp_num_siblings, what is the real meaning of c->x86_max_cores here for AMD, the whole compute unit numbers per socket? + Sherry, for her awareness. Thanks, Rui
RE: [PATCH v3 4/4] mtd: spi-nor: Disable Micron flash HW protection
Hi Yunhui,

You mean that EVCR bit 7 cannot be cleared (to enable quad mode) unless SR bit 7 is first written to 0? Those two bits don't have any connection to each other.

> -----Original Message-----
> From: Yunhui Cui [mailto:yunhui@nxp.com]
> Sent: Friday, March 18, 2016 6:09 PM
> To: Bean Huo 霍斌斌 (beanhuo); Yunhui Cui
> Cc: linux-...@lists.infradead.org; dw...@infradead.org;
> computersforpe...@gmail.com; han...@freescale.com;
> linux-kernel@vger.kernel.org; linux-...@lists.infradead.org;
> linux-arm-ker...@lists.infradead.org; Yao Yuan
> Subject: RE: [PATCH v3 4/4] mtd: spi-nor: Disable Micron flash HW protection
>
> Hi Bean,
>
> Thanks very much for your suggestions.
> Yes, the flash N25Q128A status register write enable/disable bit is disabled in
> the initial state.
> But this patch aims to clear status register bit[7] (the write enable/disable
> bit) to 0, which enables the bit.
> Frankly speaking, I also don't want to add this patch.
> The reason for this is that clearing status register bit[7] to 0 is a must to
> set quad mode in the Enhanced Volatile Configuration Register using command
> SPINOR_OP_WD_EVCR. Otherwise it will output "Micron EVCR Quad bit not
> clear" in spi-nor.c. I looked up the datasheet, but I really can't find any
> connection between status register bit[7] (write enable/disable bit) equaling 0
> and setting quad mode in the Enhanced Volatile Configuration Register.
>
> Just as I want to send the issue to the Micron team, could you give me some
> solutions?
> > > Thanks > Yunhui > > -Original Message- > From: Bean Huo 霍斌斌 (beanhuo) [mailto:bean...@micron.com] > Sent: Thursday, March 03, 2016 9:39 PM > To: Yunhui Cui > Cc: linux-...@lists.infradead.org; dw...@infradead.org; > computersforpe...@gmail.com; han...@freescale.com; > linux-kernel@vger.kernel.org; linux-...@lists.infradead.org; > linux-arm-ker...@lists.infradead.org; Yao Yuan; Yunhui Cui > Subject: Re: [PATCH v3 4/4] mtd: spi-nor: Disable Micron flash HW protection > > > From: Yunhui Cui > > To: , , > > > > Cc: , , > > , , Yunhui > > Cui > > > > Subject: [PATCH v3 4/4] mtd: spi-nor: Disable Micron flash HW > > protection > > Message-ID: > <1456988044-37061-4-git-send-email-b56...@freescale.com> > > Content-Type: text/plain > > > > From: Yunhui Cui > > > > For Micron family ,The status register write enable/disable bit, > > provides hardware data protection for the device. > > When the enable/disable bit is set to 1, the status register > > nonvolatile bits become read-only and the WRITE STATUS REGISTER > > operation will not execute. > > > > Signed-off-by: Yunhui Cui > > --- > > drivers/mtd/spi-nor/spi-nor.c | 9 + > > 1 file changed, 9 insertions(+) > > > > diff --git a/drivers/mtd/spi-nor/spi-nor.c > > b/drivers/mtd/spi-nor/spi-nor.c index ed0c19c..917f814 100644 > > --- a/drivers/mtd/spi-nor/spi-nor.c > > +++ b/drivers/mtd/spi-nor/spi-nor.c > > @@ -39,6 +39,7 @@ > > > > #define SPI_NOR_MAX_ID_LEN 6 > > #define SPI_NOR_MAX_ADDR_WIDTH 4 > > +#define SPI_NOR_MICRON_WRITE_ENABLE0x7f > > > > struct flash_info { > > char*name; > > @@ -1238,6 +1239,14 @@ int spi_nor_scan(struct spi_nor *nor, const > > char *name, enum read_mode mode) > > write_sr(nor, 0); > > } > > > > + if (JEDEC_MFR(info) == SNOR_MFR_MICRON) { > > + ret = read_sr(nor); > > + ret &= SPI_NOR_MICRON_WRITE_ENABLE; > > + > For Micron the status register write enable/disable bit, its default/factory > value is disable. > Can here first check ,then program? 
> > + write_enable(nor); > > + write_sr(nor, ret); > > + } > > + > > if (!mtd->name) > > mtd->name = dev_name(dev); > > mtd->priv = nor;
Re: [LKP] [lkp] [futex] 65d8fc777f: +25.6% will-it-scale.per_process_ops
Hi, Thomas, Thanks a lot for your valuable input! Thomas Gleixner writes: > On Fri, 18 Mar 2016, Huang, Ying wrote: >> Usually we will put most important change we think in the subject of the >> mail, for this email, it is, >> >> +25.6% will-it-scale.per_process_ops > > That is confusing on it's own, because the reader does not know at all whether > this is an improvement or a regression. > > So something like this might be useful: > > Subject: subsystem: 12digitsha1: 25% performance improvement > > or in some other case > > Subject: subsystem: 12digitsha1: 25% performance regression > > So in the latter case I will look into that mail immediately. The improvement > one can wait until I have cared about urgent stuff. > > In the subject line it is pretty much irrelevant which foo-bla-ops test has > produced that result. It really does not matter. If it's a regression, it's > urgent. If it's an improvement it's informal and it can wait to be read. > > So in that case it would be: > > futex: 65d8fc777f6d: 25% performance improvement > > You can grab the subsystem prefix from the commit. We will include regression/improvement information in subject at least. >> and, we try to put most important changes at the top of the comparison >> result below. That is the will-it-scale.xxx below. >> >> We are thinking about how to improve this. You input is valuable for >> us. We are thinking change the "below changes" line to something like >> below. >> >> FYI, we noticed the +25.6% will-it-scale.per_process_ops improvement on >> ... >> >> Does this looks better? > > A bit, but it still does not tell me much. It's completely non obvious what > 'will-it-scale.per_process_ops' means. will-it-scale is a test suite, per_process_ops is one of its results. That is the convention used in original report. 
> Let me give you an example how a useful
> and easy to understand summary of the change could look like:
>
> FYI, we noticed 25.6% performance improvement due to commit
>
>    65d8fc777f6d "futex: Remove requirement for lock_page() in get_futex_key()"
>
> in the will-it-scale.per_process_ops test.
>
> will-it-scale.per_process_ops tests the futex operations for process shared
> futexes (Or whatever that test really does).

There is a futex sub-test in the will-it-scale test suite. But I get your point: we need some description of the test case. If email is too limited for the full description, we will put it on a web site and include a short description plus a link to the full description in the email.

> The commit has no significant impact on any other test in the test suite.

Sorry, we don't have enough machine power to test all test cases for each bisect result, so we won't have such information until we find a way to do that.

> So those few lines tell precisely what this is about. It's something I already
> expected, so I really can skip the rest of the mail unless I'm interested in
> reproducing the result.

We will put the important information first in the email and the details later, with a clear mark between them, so people can get the important information and ignore the details if they don't want them.

> Now lets look at a performance regression.
>
> Subject: futex: 65d8fc777f6d: 25% performance regression
>
> FYI, we noticed a 25.2% performance regression due to commit
>
>    65d8fc777f6d "futex: Remove requirement for lock_page() in get_futex_key()"
>
> in the will-it-scale.per_process_ops test.
>
> will-it-scale.per_process_ops tests the futex operations for process shared
> futexes (Or whatever that test really does).
>
> The commit has no significant impact on any other test in the test suite.
>
> In that case I will certainly be interested how to reproduce that test.
> So I
> need the following information:
>
> Machine description: Intel IvyBridge 2 sockets, 32 cores, 64G RAM
> Config file: http://wherever.you.store/your/results/test-nr/config

We have some of this information already, but it is not organized well enough; we will improve it.

> Test:
> http://wherever.you.store/your/tests/will-it-scale.per_process_ops.tar.bz2
>
> That tarball should contain:
>
> README
> test_script.sh
> test_binary
>
> README should tell:
>
>    will-it-scale.per_process_ops
>
>    Short explanation of the test
>
>    Preliminaries:
>      - perf
>      - whatever
>
> So that allows me to reproduce that test more or less with no effort. And
> that's the really important part.

For reproducing, we currently use the lkp-tests tool, which includes scripts to build the test case, run the test, collect various information, and compare the test results, with the job file attached to the report email. That is not the easiest way; we will continuously improve it.

> You can provide nice charts and full comparison tables for all tests on a web
> site for those who are interested in large stats and pretty charts.
>
> Full results: http://wherever.you.store/your/results/test-nr/results

Before we have a website for detaile
[PATCH] regulator: Lookup unresolved parent supplies before regulators cleanup
Commit 6261b06de565 ("regulator: Defer lookup of supply to regulator_get") moved the regulator supply lookup logic from regulator registration time to regulator get time. Unfortunately, that changed the behavior of the regulator core: now a parent supply whose child regulator is marked as always-on won't be enabled unless a client driver attempts to get the child regulator during boot.

This patch makes unresolved parent supplies be looked up before the regulators' late cleanup, so those with a child marked as always-on will be enabled regardless of whether a driver attempted to get the child regulator. That was the behavior before the mentioned commit, since parent supplies were looked up at regulator registration time instead of during the child's get.

Cc:  # 4.3+
Fixes: 6261b06de565 ("regulator: Defer lookup of supply to regulator_get")
Signed-off-by: Javier Martinez Canillas
---

Hello,

The commit that caused this issue landed in v4.1, but $SUBJECT can't be cherry-picked to kernel versions older than v4.3 without causing a merge conflict. So I added v4.3+ to stable; please let me know if that isn't right.
Best regards,
Javier

 drivers/regulator/core.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 6dd63523bcfe..15dbb771e1d8 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -4376,6 +4376,11 @@ static int __init regulator_init(void)
 /* init early to allow our consumers to complete system booting */
 core_initcall(regulator_init);
 
+static int __init regulator_late_resolve_supply(struct device *dev, void *data)
+{
+	return regulator_resolve_supply(dev_to_rdev(dev));
+}
+
 static int __init regulator_late_cleanup(struct device *dev, void *data)
 {
 	struct regulator_dev *rdev = dev_to_rdev(dev);
@@ -4436,6 +4441,14 @@ static int __init regulator_init_complete(void)
 	if (of_have_populated_dt())
 		has_full_constraints = true;
 
+	/* At this point there may be regulators that were not looked
+	 * up by a client driver, so their parent supplies were not
+	 * resolved and could be wrongly disabled when they need to
+	 * remain enabled to meet their child constraints.
+	 */
+	class_for_each_device(&regulator_class, NULL, NULL,
+			      regulator_late_resolve_supply);
+
 	/* If we have a full configuration then disable any regulators
 	 * we have permission to change the status for and which are
 	 * not in use or always_on. This is effectively the default
-- 
2.5.0
[PATCH] regulator: Remove unneeded check for regulator supply
The regulator_resolve_supply() function checks whether a supply has been associated with a regulator, to avoid enabling it if that is not the case. But the supply was already looked up with regulator_dev_lookup() and set with set_supply() before the check, and both return on error. So the fact that this statement has been reached means that neither of them failed, and a supply must be associated with the regulator.

Signed-off-by: Javier Martinez Canillas
---
 drivers/regulator/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index e0b764284773..6dd63523bcfe 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1532,7 +1532,7 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 	}
 
 	/* Cascade always-on state to supply */
-	if (_regulator_is_enabled(rdev) && rdev->supply) {
+	if (_regulator_is_enabled(rdev)) {
 		ret = regulator_enable(rdev->supply);
 		if (ret < 0) {
 			_regulator_put(rdev->supply);
-- 
2.5.0
RE: [PATCH v5 3/5] ARM: at91: pm: configure PMC fast startup signals
Hi Alexandre,

> -----Original Message-----
> From: Alexandre Belloni [mailto:alexandre.bell...@free-electrons.com]
> Sent: March 18, 2016 1:15
> To: Yang, Wenyou
> Cc: Ferre, Nicolas ; Jean-Christophe Plagniol-Villard ; Russell King ;
> linux-ker...@vger.kernel.org; devicet...@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; linux-...@vger.kernel.org;
> Rob Herring ; Pawel Moll ; Mark Brown ; Ian Campbell ; Kumar Gala
> Subject: Re: [PATCH v5 3/5] ARM: at91: pm: configure PMC fast startup signals
>
> On 16/03/2016 at 14:58:07 +0800, Wenyou Yang wrote :
> > The fast startup signal is used as wake up sources for ULP1 mode.
> > As soon as a fast startup signal is asserted, the embedded 12 MHz RC
> > oscillator restarts automatically.
> >
> > This patch is to configure the fast startup signals, which signal is
> > enabled to trigger the PMC to wake up the system from ULP1 mode should
> > be configured via the DT.
> >
> > Signed-off-by: Wenyou Yang
>
> I would actually avoid doing that from the PMC driver and do that configuration
> from the aic5 driver. It has all the information you need, it knows what kind of
> level or edge is needed to wake up and what are the wakeup interrupts to enable.
> This will allow you to stop introducing a new binding. Also, this will avoid
> discrepancies between what is configured in the DT and what the user really wants
> (for example, differences between the edge direction configured for the PIOBu in
> userspace versus what is in the device tree, or wakeonlan activation/deactivation).

Thank you for your feedback. But some wake-up sources, such as the WKUP pin, ACC_CE and RXLP_MCE, don't have a corresponding interrupt number.

Moreover, I think ULP1 is very different from ULP0; it is not woken up by an interrupt. It falls asleep and is woken up by some mechanism in the PMC. Maybe I am wrong. I still think the aic5 driver should be devoted to the AIC5's behaviors.
> > You can get the PMC syscon from irq-atmel-aic5.c and then use a table to map > the hwirq to the offset in PMC_FSMR. Use it in aic5_set_type to set the > polarity > and then in aic5_suspend to enable the wakeup. > > Maybe we could even go further and avoid ulp1 if no ulp1 compatbile wakeup > sources are defined but there are ulp0 wakeup sources. > > > -- > Alexandre Belloni, Free Electrons > Embedded Linux, Kernel and Android engineering http://free-electrons.com Best Regards, Wenyou Yang
[PATCH 1/1] x86/perf/intel/uncore: remove ev_sel_ext bit support for PCU
From: Kan Liang

The ev_sel_ext bit in PCU_MSR_PMON_CTL is locked, so there could be a #GP if that bit is written to 1. Also, there are no public events which use the bit. This patch removes ev_sel_ext bit support for PCU.

Signed-off-by: Kan Liang
---
 arch/x86/events/intel/uncore_snbep.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index 93f6bd9..ab2bcaa 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -46,7 +46,6 @@
 				(SNBEP_PMON_CTL_EV_SEL_MASK | \
 				 SNBEP_PCU_MSR_PMON_CTL_OCC_SEL_MASK | \
 				 SNBEP_PMON_CTL_EDGE_DET | \
-				 SNBEP_PMON_CTL_EV_SEL_EXT | \
 				 SNBEP_PMON_CTL_INVERT | \
 				 SNBEP_PCU_MSR_PMON_CTL_TRESH_MASK | \
 				 SNBEP_PCU_MSR_PMON_CTL_OCC_INVERT | \
@@ -148,7 +147,6 @@
 /* IVBEP PCU */
 #define IVBEP_PCU_MSR_PMON_RAW_EVENT_MASK \
 				(SNBEP_PMON_CTL_EV_SEL_MASK | \
-				 SNBEP_PMON_CTL_EV_SEL_EXT | \
 				 SNBEP_PCU_MSR_PMON_CTL_OCC_SEL_MASK | \
 				 SNBEP_PMON_CTL_EDGE_DET | \
 				 SNBEP_PCU_MSR_PMON_CTL_TRESH_MASK | \
@@ -258,7 +256,6 @@
 				 SNBEP_PCU_MSR_PMON_CTL_OCC_SEL_MASK | \
 				 SNBEP_PMON_CTL_EDGE_DET | \
 				 SNBEP_CBO_PMON_CTL_TID_EN | \
-				 SNBEP_PMON_CTL_EV_SEL_EXT | \
 				 SNBEP_PMON_CTL_INVERT | \
 				 KNL_PCU_MSR_PMON_CTL_TRESH_MASK | \
 				 SNBEP_PCU_MSR_PMON_CTL_OCC_INVERT | \
@@ -472,7 +469,7 @@ static struct attribute *snbep_uncore_cbox_formats_attr[] = {
 };
 
 static struct attribute *snbep_uncore_pcu_formats_attr[] = {
-	&format_attr_event_ext.attr,
+	&format_attr_event.attr,
 	&format_attr_occ_sel.attr,
 	&format_attr_edge.attr,
 	&format_attr_inv.attr,
@@ -1313,7 +1310,7 @@ static struct attribute *ivbep_uncore_cbox_formats_attr[] = {
 };
 
 static struct attribute *ivbep_uncore_pcu_formats_attr[] = {
-	&format_attr_event_ext.attr,
+	&format_attr_event.attr,
 	&format_attr_occ_sel.attr,
 	&format_attr_edge.attr,
 	&format_attr_thresh5.attr,
-- 
2.5.0
Re: [GIT PULL] Protection Keys (pkeys) support
So I finally got around to this one and the objtool pull request, and note that there's a conflict in the arch/x86/Kconfig file. And I'm not sending this email because the conflict would have been hard to resolve - it was completely trivial. But the conflict does show that once again people are starting to add the new options to the end of the list, even though that list is supposedly sorted. HOWEVER. I didn't actually fix that up in the merge, because I think that those options should be done differently anyway. So all of these are under the "X86" config options as "select" statements that are true for x86. However, all the new ones (and an alarming number of old ones) aren't actually really "these are true for x86". No, they are *conditionally* true for x86. For example, if we were to sort those things, the two PKEY-related options would have to be split up: select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS which would actually make it really nasty to see that they are related. There's also a *lot* of those X86 selects that are "if X86_64". So they really aren't x86 options, they are x86-64 options. So instead of having a _huge_ list of select statements under the X86 option, why aren't those split up, with the select statements closer to the thing that actually controls them? I realize that for many *common* options that really are "this architecture uses the generic XYZ feature", the current "select" model is really good. But it's starting to look really quite nasty for some of these more specialized options, and I really think it would be better to move (for example) the select for ARCH_HAS_PKEYS and ARCH_USES_HIGH_VMA_FLAGS to actually be under the X86_INTEL_MEMORY_PROTECTION_KEYS config option, rather than try to lie and make it look like this is somehow some "x86 feature". It's much more specific than that.
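Concretely, the suggestion amounts to dropping the two selects from the big X86 list and attaching them to the option that actually needs them, roughly like this (a sketch only — final placement and dependencies are up to the x86 maintainers):

```
config X86_INTEL_MEMORY_PROTECTION_KEYS
	prompt "Intel Memory Protection Keys"
	def_bool y
	depends on CPU_SUP_INTEL && X86_64
	select ARCH_USES_HIGH_VMA_FLAGS
	select ARCH_HAS_PKEYS
```

That way the conditional nature of the selects is visible right next to X86_INTEL_MEMORY_PROTECTION_KEYS instead of being hidden behind "if" clauses at the end of the X86 list.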
Anyway, it's all merged in my tree, but is going through the build tests and I'll do a boot test too before pushing out. So no need to do anything wrt these pull requests, this was more of a "Hmm, I really think the x86 Kconfig file is getting pretty nasty". Linus
[GIT PULL] xfs: updates for 4.6-rc1
Hi Linus, Can you please pull the XFS update from the location below? There's quite a lot in this request, and there's some cross-over with ext4, dax and quota code due to the nature of the changes being made. There are conflicts with the ext4 code that has already been merged this cycle. Ted didn't pull the stable xfs-dio-fixes-4.6 branch with the DIO completion unwritten extent error handling fixes before merging a rework of the ext4 unwritten extent code, so there's a bunch of non-trivial conflicts in that. The quota changes don't appear to have created any conflicts at this point - I think Jan pulled the stable xfs-get-next-dquot-4.6 branch to base his further work on that, so I don't expect merge problems here. Finally, there's a merge conflict between the XFS writepages rework and the DAX flushing fixes that were merged in 4.5-rc6. That's a trivial conflict to resolve, though. I've attached the merge resolution diff from my local test merge at the end after the pull-req output - the XFS part is correct, but I'm not sure about the ext4 parts of it. If you need confirmation as to whether that is the correct resolution, then Ted and/or Jan (cc'd) will need to look at it. As for the rest of the XFS changes, there are lots of little things all over the place, which add up to a lot of changes in the end. The major changes are that we've reduced the size of the struct xfs_inode by ~100 bytes (gives an inode cache footprint reduction of >10%), the writepage code now only does a single set of mapping tree lookups so uses less CPU, delayed allocation reservations won't overrun under random write loads anymore, and we added compile time verification for on-disk structure sizes so we find out when a commit or platform/compiler change breaks the on disk structure as early as possible. Cheers, Dave.
The following changes since commit 7f6aff3a29b08fc4234c8136eb1ac31b4897522c: xfs: only run torn log write detection on dirty logs (2016-03-07 08:22:22 +1100) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git tags/xfs-for-linus-4.6-rc1 for you to fetch changes up to 2cdb958aba6afbced5bc563167813b972b6acbfe: Merge branch 'xfs-misc-fixes-4.6-4' into for-next (2016-03-15 11:44:35 +1100) xfs: Changes for 4.6-rc1 Change summary: o error propagation for direct IO failures fixes for both XFS and ext4 o new quota interfaces and XFS implementation for iterating all the quota IDs in the filesystem o locking fixes for real-time device extent allocation o reduction of duplicate information in the xfs and vfs inode, saving roughly 100 bytes of memory per cached inode. o buffer flag cleanup o rework of the writepage code to use the generic write clustering mechanisms o several fixes for inode flag based DAX enablement o rework of remount option parsing o compile time verification of on-disk format structure sizes o delayed allocation reservation overrun fixes o lots of little error handling fixes o small memory leak fixes o enable xfsaild freezing again Brian Foster (6): xfs: clean up unwritten buffers on write failure xfs: fix xfs_log_ticket leak in xfs_end_io() after fs shutdown xfs: debug mode forced buffered write failure xfs: update freeblocks counter after extent deletion xfs: refactor delalloc indlen reservation split into helper xfs: borrow indirect blocks from freed extent when available Carlos Maiolino (1): xfs: Split default quota limits by quota type Christoph Hellwig (8): direct-io: always call ->end_io if non-NULL xfs: don't use ioends for direct write completions xfs: fold xfs_vm_do_dio into xfs_vm_direct_IO xfs: handle errors from ->free_blocks in xfs_btree_kill_iroot xfs: factor btree block freeing into a helper xfs: move buffer invalidation to xfs_btree_free_block xfs: remove xfs_trans_get_block_res xfs: 
always set rvalp in xfs_dir2_node_trim_free Colin Ian King (1): xfs: fix format specifier , should be %llx and not %llu Darrick J. Wong (5): xfs: move struct xfs_attr_shortform to xfs_da_format.h xfs: fix computation of inode btree maxlevels xfs: use named array initializers for log item dumping xfs: ioends require logically contiguous file offsets xfs: check sizes of XFS on-disk structures at compile time Dave Chinner (41): xfs: lock rt summary inode on allocation xfs: RT bitmap and summary buffers are not typed xfs: RT bitmap and summary buffers need verifiers xfs: introduce inode log format object xfs: remove timestamps from incore inode xfs: cull unnecessary icdinode fields xfs: move v1 inode conversion to xfs_inode_from_disk xfs: reinitialise recycled VFS inode correctly xfs: use vfs inode nlink field everywhere
Re: [PATCH] Revert "arm64: Increase the max granular size"
Hello, Tirumalesh: 2016-03-19 5:05 GMT+08:00 Chalamarla, Tirumalesh : > > > > > > On 3/16/16, 2:32 AM, "linux-arm-kernel on behalf of Ganesh Mahendran" > opensource.gan...@gmail.com> wrote: > >>Reverts commit 97303480753e ("arm64: Increase the max granular size"). >> >>The commit 97303480753e ("arm64: Increase the max granular size") will >>degrade system performance on some CPUs. >> >>We test wifi network throughput with iperf on Qualcomm msm8996 CPU: >> >>run on host: >> # iperf -s >>run on device: >> # iperf -c -t 100 -i 1 >> >> >>Test result: >> >>with commit 97303480753e ("arm64: Increase the max granular size"): >>172MBits/sec >> >>without commit 97303480753e ("arm64: Increase the max granular size"): >>230MBits/sec >> >> >>Some modules like slab/net will use the L1_CACHE_SHIFT, so if we do not >>set the parameter correctly, it may affect the system performance. >> >>So revert the commit. > > Is there any explanation why is this so? May be there is an alternative to > this, apart from reverting the commit. > I just think the commit 97303480753e ("arm64: Increase the max granular size") introduced a new problem for other SoCs whose L1 cache line size is not 128 bytes. So I wanted to revert this commit. > Until now it seems L1_CACHE_SHIFT is the max of supported chips. But now we > are making it 64 byte, is there any reason why not 32. > We could not simply set the L1_CACHE_SHIFT to max. There are other places which use L1 cache line size. If we just set the L1 cache line size to the max, the memory footprint and the system performance will be affected. For example: -- #define SMP_CACHE_BYTES L1_CACHE_BYTES #define SKB_DATA_ALIGN(X) ALIGN(X, SMP_CACHE_BYTES) -- Thanks. > Thanks, > Tirumalesh. 
>> >>Cc: sta...@vger.kernel.org >>Signed-off-by: Ganesh Mahendran >>--- >> arch/arm64/include/asm/cache.h | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >>diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h >>index 5082b30..bde4499 100644 >>--- a/arch/arm64/include/asm/cache.h >>+++ b/arch/arm64/include/asm/cache.h >>@@ -18,7 +18,7 @@ >> >> #include >> >>-#define L1_CACHE_SHIFT 7 >>+#define L1_CACHE_SHIFT 6 >> #define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT) >> >> /* >>-- >>1.7.9.5 >> >> >>___ >>linux-arm-kernel mailing list >>linux-arm-ker...@lists.infradead.org >>http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
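To make the footprint argument in the thread above concrete: SKB_DATA_ALIGN() rounds the skb data area up to SMP_CACHE_BYTES, so for example a 130-byte area costs 192 bytes with 64-byte cache lines but 256 bytes with 128-byte lines. A userspace sketch of the same rounding (illustrative arithmetic, not kernel code):

```c
#include <assert.h>

/* Mirrors the rounding done by the kernel's ALIGN()/SKB_DATA_ALIGN(). */
static unsigned long align_to(unsigned long len, unsigned long cache_bytes)
{
	/* cache_bytes must be a power of two, as L1_CACHE_BYTES always is */
	return (len + cache_bytes - 1) & ~(cache_bytes - 1);
}
```

This is why doubling L1_CACHE_BYTES inflates every cache-line-aligned allocation, not just ones that care about the real hardware line size.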
RE: [PATCH v2 0/4] ARM64:SoC add a new platform, LG Electronics's lg1k
> Subject: [PATCH v2 0/4] ARM64:SoC add a new platform, LG Electronics's lg1k > > This is an initial series for supporting LG Electronics's lg1k SoCs, based on > ARM Cortex-A53, mainly used for digital TVs. > > Chanho Min (4): > arm64: add Kconfig entry for LG1K SoC family > arm64: defconfig: enable ARCH_LG1K > arm64: dts: Add dts files for LG Electronics's lg1312 SoC > MAINTAINERS: add myself as ARM/LG1K maintainer > > MAINTAINERS |6 + > arch/arm64/Kconfig.platforms |4 + > arch/arm64/boot/dts/Makefile |1 + > arch/arm64/boot/dts/lg/Makefile |5 + > arch/arm64/boot/dts/lg/lg1312-ref.dts | 36 > arch/arm64/boot/dts/lg/lg1312.dtsi| 351 + > arch/arm64/configs/defconfig |1 + > 7 files changed, 404 insertions(+) > create mode 100644 arch/arm64/boot/dts/lg/Makefile create mode 100644 > arch/arm64/boot/dts/lg/lg1312-ref.dts > create mode 100644 arch/arm64/boot/dts/lg/lg1312.dtsi Please review or Ack these patches. Chanho
Re: Nokia N900 - audio TPA6130A2 problems
Hi, On Mon, Mar 21, 2016 at 01:04:18AM +0100, Sebastian Reichel wrote: > On Sun, Mar 20, 2016 at 09:43:11PM +0200, Ivaylo Dimitrov wrote: > > On 20.03.2016 07:17, Sebastian Reichel wrote: > > >On Sat, Mar 19, 2016 at 10:49:57AM +0200, Ivaylo Dimitrov wrote: > > >>On 18.03.2016 17:04, Sebastian Reichel wrote: > > >>>On Fri, Mar 18, 2016 at 03:45:26PM +0200, Ivaylo Dimitrov wrote: > > On 18.03.2016 15:36, Sebastian Reichel wrote: > > Regulator is V28_A, which is always-on, so it is enabled no matter what > > probe does. Anyway, I added a various delays after regulator_enable(), > > to no > > success. > > >> > > >>I guess we're getting closer - I put some printks in various functions in > > >>the twl-regulator.c, here is the result: > > >> > > >>on power-up: > > >> > > >>[2.378601] twl4030ldo_get_voltage_sel VMMC2 vsel 0x0008 > > >>[2.384948] twl4030reg_enable VMMC2 grp 0x0020 > > >>[2.408416] twl4030ldo_get_voltage_sel VMMC2 vsel 0x0008 > > >>[7.196685] twl4030reg_is_enabled VMMC2 state 0x002e > > >>[7.202819] twl4030reg_is_enabled VMMC2 state 0x002e > > >>[7.209777] twl4030reg_is_enabled VMMC2 state 0x002e > > >>[7.215728] twl4030reg_is_enabled VMMC2 state 0x002e > > >>[7.223205] twl4030reg_is_enabled VMMC2 state 0x002e > > > > > >Ok, so normal power up results in running VMMC2 (always-on works), > > >but voltage is not configured correctly. 2.6V is default according > > >to the TRM. I think this is a "bug" in the regulator framework. It > > >should setup the minimum allowed voltage before enabling the > > >always-on regulator. > > > > > > > /sys/kernel/debug/regulator/regulator_summary shows 2850mV for V28_A, so I > > would remove the quotes. Also, always-on is because if V28_A regulator is > > turned off, there is a leakage through tlv320aic34 VIO. BTW one of the > > things I did while trying to find the problem, was to remove that always-on > > property from the DTS - it didn't help. 
> > Right, thinking about it, the voltage must also be configured for the > non always-on cases. So it's not a problem with the regulator > framework, but with twl-regulator's probe function, that should take > care of this. > > > >In case of the tpa6130a2/tpa6140a2 driver it may also be nice to add > > >something like this to the driver (Vdd may be between 2.5V and 5.5V > > >according to both datasheets): > > > > > >if (regulator_can_change_voltage(data->supply)) > > > regulator_set_voltage(data->supply, 2500000, 5500000); > > > > > > > and add DT property for that voltage range, as max output power and > > harmonics depend on the supply voltage. > > I guess that's 2nd step. > > > >>after restart from stock kernel: > > >> > > >>[2.388610] twl4030ldo_get_voltage_sel VMMC2 vsel 0x000a > > >>[2.394958] twl4030reg_enable VMMC2 grp 0x0028 > > > > > >I had a quick glance at this. I think stock kernel put VMMC2 > > >into sleep mode. Mainline kernel does not expect sleep mode > > >being set and does not disable it. > > > > > > > Well, one would think that kernel should not have expectations on what would > > be the state of the hardware by the time it takes control over it, but setup > > everything needed instead. > > I thought it's obvious, that this is not the desired behaviour :) > > > >>[2.418426] twl4030ldo_get_voltage_sel VMMC2 vsel 0x000a > > >>[7.186645] twl4030reg_is_enabled VMMC2 state 0x0020 > > >>[7.192718] twl4030reg_is_enabled VMMC2 state 0x0020 > > >>[7.199615] twl4030reg_is_enabled VMMC2 state 0x0020 > > >>[7.205535] twl4030reg_is_enabled VMMC2 state 0x0020 > > >>[7.212951] twl4030reg_is_enabled VMMC2 state 0x0020 > > >> > > >>I don't see twl4030ldo_set_voltage_sel() for VMMC2(V28_A) regulator, > > >>though > > >>there are calls for VMMC1 and VAUX3. > > > > > >I guess that's because the voltage is only configured if at least > > >one regulator consumer requests anything specific. > > > > > > > But then the board DTS is simply ignored. 
Doesn't look good :) > > > > >>So, it seems to me that V28_A is not enabled or correctly set-up > > >>and all devices connected to it does not function. And it looks > > >>like even after power-on VMMC2 is not correctly set-up - it is > > >>supposed to have voltage of 2.85V (10) but kernel leaves it to > > >>2.60V (8). However my twl-fu ends here so any help is appreciated. > > > > > >So in case of reboot from stock kernel voltage is already configured > > >to 2.8V, but it does not work, because of the sleep mode. > > > > > > > Yeah, that sleep is pretty clear, I was rather asking - "any idea how to fix > > that?". Or it is someone else expected to fix it? > > You may have noticed, that I included Mark and Liam. I hope they > can give some feedback. I think there are two bugs: > > 1. twl_probe() should setup a default voltage based on DT >information. I just had a look at the regulator core code. I think the voltage
Re: [RFC][PATCH v5 1/2] printk: Make printk() completely async
On Mon, Mar 21, 2016 at 09:43:47AM +0900, Sergey Senozhatsky wrote: > On (03/21/16 09:06), Byungchul Park wrote: > > On Sun, Mar 20, 2016 at 11:13:10PM +0900, Sergey Senozhatsky wrote: > [..] > > > + if (!sync_print) { > > > + if (in_sched) { > > > + /* > > > + * @in_sched messages may come too early, when we don't > > > + * yet have @printk_kthread. We can't print deferred > > > + * messages directly, because this may deadlock, route > > > + * them via IRQ context. > > > + */ > > > + __this_cpu_or(printk_pending, > > > + PRINTK_PENDING_OUTPUT); > > > + irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); > > > + } else if (printk_kthread && !in_panic) { > > > + /* Offload printing to a schedulable context. */ > > > + wake_up_process(printk_kthread); > > > > It will not print the "lockup suspected" message at all, for e.g. rq->lock, > > p->pi_lock and any locks which are used within wake_up_process(). > > this will switch to old SYNC printk() mode should such a lockup ever > happen, which is a giant advantage over any other implementation; doing > wake_up_process() within the 'we can detect recursive printk() here' > gives us better control. > > why > > printk()->IRQ->wake_up_process()->spin_dump()->printk()->IRQ->wake_up_process()->spin_dump()->printk()->IRQ... > is better? What is IRQ? And I didn't say the recursion is good. I just said it can be avoided without using the last resort. > > > > Furthermore, any printk() within wake_up_process() cannot work at all, as > > well. > > there is printk_deferred() which has LOGLEVEL_SCHED and which must be used > in sched functions. It would be good for all scheduler code to use the printk_deferred() as you said, but it's not true yet. > > -ss
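The fallback behaviour being debated here — detect a recursive printk() fired from inside the wakeup path and switch back to synchronous mode — can be modelled in a few lines of userspace C. The names and flow below are illustrative, not the actual patch; in the kernel the recursion flag would be per-CPU:

```c
#include <assert.h>

static int in_printk;	/* per-CPU in the real kernel */
static int sync_mode;	/* once set, stay on the old synchronous path */
static int wakeups, recursions;

static void fake_printk(const char *msg);

/* Simulate wake_up_process() tripping a spin_dump()-style diagnostic. */
static void fake_wake_up_process(void)
{
	fake_printk("BUG: spinlock lockup suspected");
}

static void fake_printk(const char *msg)
{
	(void)msg;
	if (in_printk) {
		/* Recursive printk detected: fall back to sync printing. */
		sync_mode = 1;
		recursions++;
		return;
	}
	in_printk = 1;
	if (!sync_mode) {
		wakeups++;
		fake_wake_up_process();	/* async offload attempt */
	}
	in_printk = 0;
}
```

In this model the first recursive entry flips sync_mode, and every later message takes the synchronous path instead of attempting another wakeup — which is the "switch to old SYNC printk() mode" behaviour described above.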
Re: [RFC][PATCH v5 1/2] printk: Make printk() completely async
On (03/21/16 09:06), Byungchul Park wrote: > On Sun, Mar 20, 2016 at 11:13:10PM +0900, Sergey Senozhatsky wrote: [..] > > + if (!sync_print) { > > + if (in_sched) { > > + /* > > +* @in_sched messages may come too early, when we don't > > +* yet have @printk_kthread. We can't print deferred > > +* messages directly, because this may deadlock, route > > +* them via IRQ context. > > +*/ > > + __this_cpu_or(printk_pending, > > + PRINTK_PENDING_OUTPUT); > > + irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); > > + } else if (printk_kthread && !in_panic) { > > + /* Offload printing to a schedulable context. */ > > + wake_up_process(printk_kthread); > > It will not print the "lockup suspected" message at all, for e.g. rq->lock, > p->pi_lock and any locks which are used within wake_up_process(). this will switch to old SYNC printk() mode should such a lockup ever happen, which is a giant advantage over any other implementation; doing wake_up_process() within the 'we can detect recursive printk() here' gives us better control. why printk()->IRQ->wake_up_process()->spin_dump()->printk()->IRQ->wake_up_process()->spin_dump()->printk()->IRQ... is better? > Furthermore, any printk() within wake_up_process() cannot work at all, as > well. there is printk_deferred() which has LOGLEVEL_SCHED and which must be used in sched functions. -ss
linux-next: manual merge of the ext4 tree with Linus' tree
Hi Theodore, Today's linux-next merge of the ext4 tree got a conflict in: fs/overlayfs/super.c between commit: b5891cfab08f ("ovl: fix working on distributed fs as lower layer") from Linus' tree and commit: a7f7fb45f728 ("vfs: add file_dentry()") from the ext4 tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc fs/overlayfs/super.c index 619ad4b016d2,10dbdc7da69d.. --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@@ -343,7 -356,7 +358,8 @@@ static const struct dentry_operations o static const struct dentry_operations ovl_reval_dentry_operations = { .d_release = ovl_dentry_release, + .d_select_inode = ovl_d_select_inode, + .d_native_dentry = ovl_d_native_dentry, .d_revalidate = ovl_dentry_revalidate, .d_weak_revalidate = ovl_dentry_weak_revalidate, };
Re: [RFC][PATCH v5 1/2] printk: Make printk() completely async
On Sun, Mar 20, 2016 at 11:13:10PM +0900, Sergey Senozhatsky wrote: > @@ -1748,13 +1872,42 @@ asmlinkage int vprintk_emit(int facility, int level, >dict, dictlen, text, text_len); > } > > + /* > + * By default we print message to console asynchronously so that kernel > + * doesn't get stalled due to slow serial console. That can lead to > + * softlockups, lost interrupts, or userspace timing out under heavy > + * printing load. > + * > + * However we resort to synchronous printing of messages during early > + * boot, when synchronous printing was explicitly requested by > + * kernel parameter, or when console_verbose() was called to print > + * everything during panic / oops. > + */ > + if (!sync_print) { > + if (in_sched) { > + /* > + * @in_sched messages may come too early, when we don't > + * yet have @printk_kthread. We can't print deferred > + * messages directly, because this may deadlock, route > + * them via IRQ context. > + */ > + __this_cpu_or(printk_pending, > + PRINTK_PENDING_OUTPUT); > + irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); > + } else if (printk_kthread && !in_panic) { > + /* Offload printing to a schedulable context. */ > + wake_up_process(printk_kthread); It will not print the "lockup suspected" message at all, for e.g. rq->lock, p->pi_lock and any locks which are used within wake_up_process(). Furthermore, any printk() within wake_up_process() cannot work at all, as well. It's too bad to use any functions potentially including printk() inside of this critical section. > + } else { > + sync_print = true; > + } > + } > + > logbuf_cpu = UINT_MAX; > raw_spin_unlock(&logbuf_lock); > lockdep_on(); > local_irq_restore(flags);
[PATCH] drivers/rtc/rtc-mcp795.c: add devicetree support
Add device tree support to the rtc-mcp795 driver. Signed-off-by: Emil Bartczak --- Documentation/devicetree/bindings/rtc/maxim,mcp795.txt | 11 +++ drivers/rtc/rtc-mcp795.c | 10 ++ 2 files changed, 21 insertions(+) create mode 100644 Documentation/devicetree/bindings/rtc/maxim,mcp795.txt diff --git a/Documentation/devicetree/bindings/rtc/maxim,mcp795.txt b/Documentation/devicetree/bindings/rtc/maxim,mcp795.txt new file mode 100644 index 000..a59fdd8 --- /dev/null +++ b/Documentation/devicetree/bindings/rtc/maxim,mcp795.txt @@ -0,0 +1,11 @@ +* Maxim MCP795 SPI Serial Real-Time Clock + +Required properties: +- compatible: Should contain "maxim,mcp795". +- reg: SPI address for chip + +Example: + mcp795: rtc@0 { + compatible = "maxim,mcp795"; + reg = <0>; + }; diff --git a/drivers/rtc/rtc-mcp795.c b/drivers/rtc/rtc-mcp795.c index 1c91ce8..025bb33 100644 --- a/drivers/rtc/rtc-mcp795.c +++ b/drivers/rtc/rtc-mcp795.c @@ -20,6 +20,7 @@ #include #include #include +#include /* MCP795 Instructions, see datasheet table 3-1 */ #define MCP795_EEREAD 0x03 @@ -183,9 +184,18 @@ static int mcp795_probe(struct spi_device *spi) return 0; } +#ifdef CONFIG_OF +static const struct of_device_id mcp795_of_match[] = { + { .compatible = "maxim,mcp795" }, + { } +}; +MODULE_DEVICE_TABLE(of, mcp795_of_match); +#endif + static struct spi_driver mcp795_driver = { .driver = { .name = "rtc-mcp795", + .of_match_table = of_match_ptr(mcp795_of_match), }, .probe = mcp795_probe, }; -- 1.9.1
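For reviewers, a board-level use of the new binding would sit under an SPI controller node roughly like this — the controller label and the spi-max-frequency value are illustrative, not part of the patch (spi-max-frequency is the generic SPI-slave property consumed by the SPI core, not by this binding):

```
&spi0 {
	status = "okay";

	rtc@0 {
		compatible = "maxim,mcp795";
		reg = <0>;
		spi-max-frequency = <1000000>;
	};
};
```

As an aside worth a second look before merging: the MCP795 is a Microchip part, so the "maxim" vendor prefix in the compatible string may deserve review.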
Re: Nokia N900 - audio TPA6130A2 problems
Hi, On Sun, Mar 20, 2016 at 09:43:11PM +0200, Ivaylo Dimitrov wrote: > On 20.03.2016 07:17, Sebastian Reichel wrote: > >On Sat, Mar 19, 2016 at 10:49:57AM +0200, Ivaylo Dimitrov wrote: > >>On 18.03.2016 17:04, Sebastian Reichel wrote: > >>>On Fri, Mar 18, 2016 at 03:45:26PM +0200, Ivaylo Dimitrov wrote: > On 18.03.2016 15:36, Sebastian Reichel wrote: > Regulator is V28_A, which is always-on, so it is enabled no matter what > probe does. Anyway, I added a various delays after regulator_enable(), to > no > success. > >> > >>I guess we're getting closer - I put some printks in various functions in > >>the twl-regulator.c, here is the result: > >> > >>on power-up: > >> > >>[2.378601] twl4030ldo_get_voltage_sel VMMC2 vsel 0x0008 > >>[2.384948] twl4030reg_enable VMMC2 grp 0x0020 > >>[2.408416] twl4030ldo_get_voltage_sel VMMC2 vsel 0x0008 > >>[7.196685] twl4030reg_is_enabled VMMC2 state 0x002e > >>[7.202819] twl4030reg_is_enabled VMMC2 state 0x002e > >>[7.209777] twl4030reg_is_enabled VMMC2 state 0x002e > >>[7.215728] twl4030reg_is_enabled VMMC2 state 0x002e > >>[7.223205] twl4030reg_is_enabled VMMC2 state 0x002e > > > >Ok, so normal power up results in running VMMC2 (always-on works), > >but voltage is not configured correctly. 2.6V is default according > >to the TRM. I think this is a "bug" in the regulator framework. It > >should setup the minimum allowed voltage before enabling the > >always-on regulator. > > > > /sys/kernel/debug/regulator/regulator_summary shows 2850mV for V28_A, so I > would remove the quotes. Also, always-on is because if V28_A regulator is > turned off, there is a leakage through tlv320aic34 VIO. BTW one of the > things I did while trying to find the problem, was to remove that always-on > property from the DTS - it didn't help. Right thinking about it, the voltage must also be configured for the non always-on cases. So it's not a problem with the regulator framework, but with twl-regulator's probe function, that should take care of this. 
> >In case of the tpa6130a2/tpa6140a2 driver it may also be nice to add > >something like this to the driver (Vdd may be between 2.5V and 5.5V > >according to both datasheets): > > > >if (regulator_can_change_voltage(data->supply)) > > regulator_set_voltage(data->supply, 250, 550); > > > > and add DT property for that voltage range, as max output power and > harmonics depend on the supply voltage. I guess that's 2nd step. > >>after restart from stock kernel: > >> > >>[2.388610] twl4030ldo_get_voltage_sel VMMC2 vsel 0x000a > >>[2.394958] twl4030reg_enable VMMC2 grp 0x0028 > > > >I had a quick glance at this. I think stock kernel put VMMC2 > >into sleep mode. Mainline kernel does not expect sleep mode > >being set and does not disable it. > > > > Well, one would think that kernel should not have expectations on what would > be the state of the hardware by the time it takes control over it, but setup > everything needed instead. I thought it's obvious, that this is not the desired behaviour :) > >>[2.418426] twl4030ldo_get_voltage_sel VMMC2 vsel 0x000a > >>[7.186645] twl4030reg_is_enabled VMMC2 state 0x0020 > >>[7.192718] twl4030reg_is_enabled VMMC2 state 0x0020 > >>[7.199615] twl4030reg_is_enabled VMMC2 state 0x0020 > >>[7.205535] twl4030reg_is_enabled VMMC2 state 0x0020 > >>[7.212951] twl4030reg_is_enabled VMMC2 state 0x0020 > >> > >>I don't see twl4030ldo_set_voltage_sel() for VMMC2(V28_A) regulator, though > >>there are calls for VMMC1 and VAUX3. > > > >I guess that's because the voltage is only configured if at least > >one regulator consumer requests anything specific. > > > > But then the board DTS is simply ignored. Doesn't look good :) > > >>So, it seems to me that V28_A is not enabled or correctly set-up > >>and all devices connected to it does not function. And it looks > >>like even after power-on VMMC2 is not correctly set-up - it is > >>supposed to have voltage of 2.85V (10) but kernel leaves it to > >>2.60V (8). 
However my twl-fu ends here so any help is appreciated. > > > >So in case of reboot from stock kernel voltage is already configured > >to 2.8V, but it does not work, because of the sleep mode. > > > > Yeah, that sleep is pretty clear, I was rather asking - "any idea how to fix > that?". Or it is someone else expected to fix it? You may have noticed, that I included Mark and Liam. I hope they can give some feedback. I think there are two bugs: 1. twl_probe() should setup a default voltage based on DT information. 2. if regulator is in sleep mode, regulator enable should disable sleep mode. -- Sebastian signature.asc Description: PGP signature
Re: [PATCH 70/71] mm: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
On Sun, Mar 20, 2016 at 09:41:17PM +0300, Kirill A. Shutemov wrote: > PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago > with promise that one day it will be possible to implement page cache with > bigger chunks than PAGE_SIZE. > > This promise never materialized. And unlikely will. > > We have many places where PAGE_CACHE_SIZE assumed to be equal to > PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* > or PAGE_* constant should be used in a particular case, especially on the > border between fs and mm. > > Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much > breakage to be doable. > > Let's stop pretending that pages in page cache are special. They are not. > > The changes are pretty straight-forward: > > - << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ; > > - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; > > - page_cache_get() -> get_page(); > > - page_cache_release() -> put_page(); > > Signed-off-by: Kirill A. Shutemov > --- ... > extern pgoff_t linear_hugepage_index(struct vm_area_struct *vma, > @@ -425,7 +425,7 @@ static inline pgoff_t linear_page_index(struct > vm_area_struct *vma, > return linear_hugepage_index(vma, address); > pgoff = (address - vma->vm_start) >> PAGE_SHIFT; > pgoff += vma->vm_pgoff; > - return pgoff >> (PAGE_CACHE_SHIFT - PAGE_SHIFT); > + return pgoff >> (PAGE_SHIFT - PAGE_SHIFT); ^ Guenter
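The hunk flagged above is a mechanical artifact of the conversion: since PAGE_CACHE_SHIFT was #defined equal to PAGE_SHIFT, the substituted expression shifts by `(PAGE_SHIFT - PAGE_SHIFT)`, i.e. zero — a no-op, so the whole `>>` should simply be deleted rather than kept. A quick check of that claim:

```c
#include <assert.h>

/* How the macros were related before the conversion. */
#define PAGE_SHIFT		12
#define PAGE_CACHE_SHIFT	PAGE_SHIFT

/* Old expression: shifts by (PAGE_CACHE_SHIFT - PAGE_SHIFT) == 0. */
static unsigned long idx_old(unsigned long pgoff)
{
	return pgoff >> (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

/* Converted expression from the flagged hunk: still a shift by zero. */
static unsigned long idx_converted(unsigned long pgoff)
{
	return pgoff >> (PAGE_SHIFT - PAGE_SHIFT);
}
```

Both forms return pgoff unchanged, so the conversion is behaviour-preserving but leaves dead code behind.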
[PATCH] perf/x86/intel/rapl: Add missing Broadwell models
Added Broadwell-H and Broadwell-Server. Signed-off-by: Srinivas Pandruvada --- arch/x86/events/intel/rapl.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c index 69904e7..6196f41 100644 --- a/arch/x86/events/intel/rapl.c +++ b/arch/x86/events/intel/rapl.c @@ -753,6 +753,7 @@ static int __init rapl_pmu_init(void) rapl_pmu_events_group.attrs = rapl_events_cln_attr; break; case 63: /* Haswell-Server */ + case 79: /* Broadwell-Server */ apply_quirk = true; rapl_cntr_mask = RAPL_IDX_SRV; rapl_pmu_events_group.attrs = rapl_events_srv_attr; @@ -760,6 +761,7 @@ static int __init rapl_pmu_init(void) case 60: /* Haswell */ case 69: /* Haswell-Celeron */ case 61: /* Broadwell */ + case 71: /* Broadwell-H */ rapl_cntr_mask = RAPL_IDX_HSW; rapl_pmu_events_group.attrs = rapl_events_hsw_attr; break; -- 2.5.0
Re: [PATCH 69/71] vfs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
Spotted an oops: - length is PAGE_CACHE_SIZE, then the private data should be released, + length is PAG__SIZE, then the private data should be released,
Re: [PATCH 0/5] staging: rtl8712: Fixed Multiple FSF address checkpatch warnings
On Sunday, March 20, 2016 11:12:32 PM Parth Sane wrote: > > Fixed Multiple FSF address checkpatch warnings to conform to kernel coding > > style. > > > > Parth Sane (5): > > staging: rtl8712: Fixed FSF address warning in basic_types.h > > staging: rtl8712: Fixed FSF address warning in drv_types.h > > staging: rtl9712: Fixed FSF address warning in ethernet.h > > staging: rtl9712: Fixed FSF address warning in hal_init.c > > staging: rtl9712: Fixed FSF address warning in ieee80211.c > > > > drivers/staging/rtl8712/basic_types.h | 4 > > drivers/staging/rtl8712/drv_types.h | 4 > > drivers/staging/rtl8712/ethernet.h| 4 > > drivers/staging/rtl8712/hal_init.c| 4 > > drivers/staging/rtl8712/ieee80211.c | 4 > > 5 files changed, 20 deletions(-) > > > > -- > > 1.9.1 > Hi, > The thing is all these patches are related and have a cover letter explaining > changes. But this seems to be a trivial change which is self explanatory. > This should possibly suffice. What do you think? > Regards, > Parth Sane The cover letter does not end up in the repository. A cover letter can be helpful, but is not required. You MUST, however, add a descriptive commit message to each patch for them to be accepted into the kernel. See Documentation/SubmittingPatches in the kernel sources.
[PATCH v3 2/2] powercap: intel_rapl: PSys support
Skylake processors support a new set of RAPL registers for controlling the entire SoC instead of just the CPU package. This is useful for thermal and power control when the source of power/thermal is not just the CPU/GPU. This change adds a new platform domain (AKA PSys) to the current power capping Intel RAPL driver. PSys also supports PL1 (long term) and PL2 (short term) control like the package domain. This also follows the same MSRs for energy and time units as the package domain. Unlike the package domain, PSys support requires more than just a processor level implementation. The other parts in the system need additional implementation, which OEMs need to support. So not all Skylake systems will support PSys. Signed-off-by: Srinivas Pandruvada --- drivers/powercap/intel_rapl.c | 66 +++ 1 file changed, 66 insertions(+) diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c index cdfd01f0..2c0235d 100644 --- a/drivers/powercap/intel_rapl.c +++ b/drivers/powercap/intel_rapl.c @@ -34,6 +34,9 @@ #include #include +/* Local defines */ +#define MSR_PLATFORM_POWER_LIMIT 0x065c + /* bitmasks for RAPL MSRs, used by primitive access functions */ #define ENERGY_STATUS_MASK 0x @@ -86,6 +89,7 @@ enum rapl_domain_type { RAPL_DOMAIN_PP0, /* core power plane */ RAPL_DOMAIN_PP1, /* graphics uncore */ RAPL_DOMAIN_DRAM,/* DRAM control_type */ + RAPL_DOMAIN_PLATFORM, /* PSys control_type */ RAPL_DOMAIN_MAX, }; @@ -251,9 +255,11 @@ static const char * const rapl_domain_names[] = { "core", "uncore", "dram", + "psys", }; static struct powercap_control_type *control_type; /* PowerCap Controller */ +static struct rapl_domain *platform_rapl_domain; /* Platform (PSys) domain */ /* caller to ensure CPU hotplug lock is held */ static struct rapl_package *find_package_by_id(int id) @@ -409,6 +415,14 @@ static const struct powercap_zone_ops zone_ops[] = { .set_enable = set_domain_enable, .get_enable = get_domain_enable, }, + /* RAPL_DOMAIN_PLATFORM */ + { + .get_energy_uj = get_energy_counter, + 
.get_max_energy_range_uj = get_max_energy_counter, + .release = release_zone, + .set_enable = set_domain_enable, + .get_enable = get_domain_enable, + }, }; static int set_power_limit(struct powercap_zone *power_zone, int id, @@ -1159,6 +1173,13 @@ static int rapl_unregister_powercap(void) powercap_unregister_zone(control_type, &rd_package->power_zone); } + + if (platform_rapl_domain) { + powercap_unregister_zone(control_type, +&platform_rapl_domain->power_zone); + kfree(platform_rapl_domain); + } + powercap_unregister_control_type(control_type); return 0; @@ -1238,6 +1259,47 @@ err_cleanup: return ret; } +static int rapl_register_psys(void) +{ + struct rapl_domain *rd; + struct powercap_zone *power_zone; + u64 val; + + if (rdmsrl_safe_on_cpu(0, MSR_PLATFORM_ENERGY_STATUS, &val) || !val) + return -ENODEV; + + if (rdmsrl_safe_on_cpu(0, MSR_PLATFORM_POWER_LIMIT, &val) || !val) + return -ENODEV; + + rd = kzalloc(sizeof(*rd), GFP_KERNEL); + if (!rd) + return -ENOMEM; + + rd->name = rapl_domain_names[RAPL_DOMAIN_PLATFORM]; + rd->id = RAPL_DOMAIN_PLATFORM; + rd->msrs[0] = MSR_PLATFORM_POWER_LIMIT; + rd->msrs[1] = MSR_PLATFORM_ENERGY_STATUS; + rd->rpl[0].prim_id = PL1_ENABLE; + rd->rpl[0].name = pl1_name; + rd->rpl[1].prim_id = PL2_ENABLE; + rd->rpl[1].name = pl2_name; + rd->rp = find_package_by_id(0); + + power_zone = powercap_register_zone(&rd->power_zone, control_type, + "psys", NULL, + &zone_ops[RAPL_DOMAIN_PLATFORM], + 2, &constraint_ops); + + if (IS_ERR(power_zone)) { + kfree(rd); + return PTR_ERR(power_zone); + } + + platform_rapl_domain = rd; + + return 0; +} + static int rapl_register_powercap(void) { struct rapl_domain *rd; @@ -1254,6 +1316,10 @@ static int rapl_register_powercap(void) list_for_each_entry(rp, &rapl_packages, plist) if (rapl_package_register_powercap(rp)) goto err_cleanup_package; + + /* Don't bail out if PSys is not supported */ + rapl_register_psys(); + return ret; err_cleanup_package: -- 2.5.0
[PATCH v3 1/2] perf/x86/intel/rapl: support Skylake RAPL domains
Add Skylake support for RAPL domains. In addition to the RAPL domains present in Broadwell clients, Skylake has support for the platform domain (aka PSys). Also fix an error in the comment for the gpu counter, which previously said dram counter.

Signed-off-by: Srinivas Pandruvada
---
 arch/x86/events/intel/rapl.c     | 50 ++--
 arch/x86/include/asm/msr-index.h |  2 ++
 2 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index b834a3f..69904e7 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -27,10 +27,14 @@
  * event: rapl_energy_dram
  *    perf code: 0x3
  *
- * dram counter: consumption of the builtin-gpu domain (client only)
+ * gpu counter: consumption of the builtin-gpu domain (client only)
  * event: rapl_energy_gpu
  *    perf code: 0x4
  *
+ * psys counter: consumption of the builtin-psys domain (client only)
+ * event: rapl_energy_psys
+ *    perf code: 0x5
+ *
  * We manage those counters as free running (read-only). They may be
  * use simultaneously by other tools, such as turbostat.
  *
@@ -64,13 +68,16 @@
 #define INTEL_RAPL_RAM		0x3	/* pseudo-encoding */
 #define RAPL_IDX_PP1_NRG_STAT	3	/* gpu */
 #define INTEL_RAPL_PP1		0x4	/* pseudo-encoding */
+#define RAPL_IDX_PSYS_NRG_STAT	4	/* psys */
+#define INTEL_RAPL_PSYS		0x5	/* pseudo-encoding */

-#define NR_RAPL_DOMAINS         0x4
+#define NR_RAPL_DOMAINS         0x5
 static const char *const rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
 	"pp0-core",
 	"package",
 	"dram",
 	"pp1-gpu",
+	"psys",
 };

 /* Clients have PP0, PKG */
@@ -89,6 +96,13 @@ static const char *const rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
 				1<
[PATCH v3 0/2][Resend] Skylake PSys support
Sorry, I had a typo in Mingo's email address, so I am resending.

v3:
- As suggested by tglx, add support first in perf RAPL. Perf RAPL was missing RAPL support for Skylake; added support including PSys.
v2:
- Moved PSYS MSR defines to intel_rapl.c, as suggested by Boris.

Srinivas Pandruvada (2):
  perf/x86/intel/rapl: support Skylake RAPL domains
  powercap: intel_rapl: PSys support

 arch/x86/events/intel/rapl.c     | 50 --
 arch/x86/include/asm/msr-index.h |  2 ++
 drivers/powercap/intel_rapl.c    | 66
 3 files changed, 116 insertions(+), 2 deletions(-)

--
2.5.0
Re: [PATCH] lan78xx: Protect runtime_auto check by #ifdef CONFIG_PM
On 03/20/2016 03:43 AM, Geert Uytterhoeven wrote:
> If CONFIG_PM=n:
>
> drivers/net/usb/lan78xx.c: In function ‘lan78xx_get_stats64’:
> drivers/net/usb/lan78xx.c:3274: error: ‘struct dev_pm_info’ has no member named ‘runtime_auto’
>
> If PM is disabled, the runtime_auto flag is not available, but auto
> suspend is not enabled anyway. Hence protect the check for runtime_auto
> by #ifdef CONFIG_PM to fix this.
>
> Fixes: a59f8c5b048dc938 ("lan78xx: add ndo_get_stats64")
> Reported-by: Guenter Roeck
> Signed-off-by: Geert Uytterhoeven
> ---
> Alternatively, we can add a dev_pm_runtime_auto_is_enabled() wrapper to
> include/linux/pm.h, which always returns false if CONFIG_PM is disabled.
> The only other user in non-core code (drivers/usb/core/sysfs.c) has a
> big #ifdef CONFIG_PM check around all PM-related code.
>
> Thoughts?

Not that it matters anymore since David reverted the original patch, but my reason for not sending a similar patch was that I wasn't sure if .runtime_auto should be accessed from drivers in the first place, or if there is some logical problem with the code.

Guenter

> ---
>  drivers/net/usb/lan78xx.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
> index d36d5ebf37f355f2..7b9ac47b2ecf9905 100644
> --- a/drivers/net/usb/lan78xx.c
> +++ b/drivers/net/usb/lan78xx.c
> @@ -3271,7 +3271,9 @@ struct rtnl_link_stats64 *lan78xx_get_stats64(struct net_device *netdev,
>  	 * periodic reading from HW will prevent from entering USB auto suspend.
>  	 * if autosuspend is disabled, read from HW.
>  	 */
> +#ifdef CONFIG_PM
>  	if (!dev->udev->dev.power.runtime_auto)
> +#endif
>  		lan78xx_update_stats(dev);
>
>  	mutex_lock(&dev->stats.access_lock);
[GIT PULL] f2fs updates for v4.6
Hi Linus,

I made another pull request which removes the previous wrong commits and adds a single commit to migrate the f2fs crypto into fs/crypto. Could you please consider pulling this?

Thanks,

The following changes since commit 4de8ebeff8ddefaceeb7fc6a9b1a514fc9624509:

  Merge tag 'trace-fixes-v4.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace (2016-02-22 14:09:18 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git tags/for-f2fs-4.6

for you to fetch changes up to 12bb0a8fd47e6020a7b52dc283a2d855f03d6ef5:

  f2fs: submit node page write bios when really required (2016-03-17 21:19:47 -0700)

= New Features =
 - uplift filesystem encryption into fs/crypto/
 - give sysfs entries to control memory consumption

= Enhancements =
 - aio performance by preallocating blocks in ->write_iter
 - use the writepages lock only for WB_SYNC_ALL
 - avoid redundant inline_data conversion
 - enhance foreground GC
 - use wait_for_stable_page where possible
 - speed up SEEK_DATA and fiemap

= Bug Fixes =
 - corner case in terms of -ENOSPC for inline_data
 - hung task caused by long latency in the shrinker
 - corruption between atomic write and f2fs_trace_pid
 - avoid garbage lengths in dentries
 - revoke atomically written pages if an error occurs

In addition, there are various minor bug fixes and clean-ups.
Arnd Bergmann (1):
      f2fs: add missing argument to f2fs_setxattr stub

Chao Yu (33):
      f2fs: relocate is_merged_page
      f2fs: flush dirty nat entries when exceeding threshold
      f2fs: export dirty_nats_ratio in sysfs
      f2fs: correct search area in get_new_segment
      f2fs: enhance foreground GC
      f2fs: simplify f2fs_map_blocks
      f2fs: simplify __allocate_data_blocks
      f2fs: remove unneeded pointer conversion
      f2fs: introduce get_next_page_offset to speed up SEEK_DATA
      f2fs: speed up handling holes in fiemap
      f2fs: introduce f2fs_submit_merged_bio_cond
      f2fs: split drop_inmem_pages from commit_inmem_pages
      f2fs: support revoking atomic written pages
      f2fs crypto: make sure the encryption info is initialized on opendir(2)
      f2fs crypto: handle unexpected lack of encryption keys
      f2fs crypto: avoid unneeded memory allocation when {en/de}crypting symlink
      f2fs: introduce f2fs_journal struct to wrap journal info
      f2fs: enhance IO path with block plug
      f2fs: split journal cache from curseg cache
      f2fs: reorder nat cache lock in cache_nat_entry
      f2fs: detect error of update_dent_inode in ->rename
      f2fs: fix to delete old dirent in converted inline directory in ->rename
      f2fs: fix the wrong stat count of calling gc
      f2fs: show more info about superblock recovery
      f2fs: try to flush inode after merging inline data
      f2fs: trace old block address for CoWed page
      f2fs: fix incorrect upper bound when iterating inode mapping tree
      f2fs crypto: fix incorrect positioning for GCing encrypted data page
      f2fs: introduce f2fs_update_data_blkaddr for cleanup
      f2fs: introduce f2fs_flush_merged_bios for cleanup
      f2fs: fix to avoid deadlock when merging inline data
      f2fs: clean up opened code with f2fs_update_dentry
      f2fs: fix to avoid unneeded unlock_new_inode

Fan Li (2):
      f2fs: avoid unnecessary search while finding victim in gc
      f2fs: modify the readahead method in ra_node_page()

Hou Pengyang (2):
      f2fs: reconstruct the code to free an extent_node
      f2fs: improve shrink performance of extent nodes

Jaegeuk Kim (32):
      f2fs: remove needless condition check
      f2fs: use writepages->lock for WB_SYNC_ALL
      f2fs: fix to overcome inline_data floods
      f2fs: do f2fs_balance_fs when block is allocated
      f2fs: avoid multiple node page writes due to inline_data
      f2fs: don't need to sync node page at every time
      f2fs: avoid needless sync_inode_page when reading inline_data
      f2fs: don't need to call set_page_dirty for io error
      f2fs: use wait_for_stable_page to avoid contention
      f2fs: use wq_has_sleeper for cp_wait wait_queue
      f2fs: move extent_node list operations being coupled with rbtree operation
      f2fs: don't set cached_en if it will be freed
      f2fs: give scheduling point in shrinking path
      f2fs: wait on page's writeback in writepages path
      f2fs: flush bios to handle cp_error in put_super
      f2fs: fix conflict on page->private usage
      f2fs: move dio preallocation into f2fs_file_write_iter
      f2fs: preallocate blocks for buffered aio writes
      f2fs: increase i_size to avoid missing data
      f2fs crypto: replace some BUG_ON()'s with error checks
      f2fs crypto: fix spelling typo in comment
      f2fs crypto: f2fs_page_crypto() doesn't need a encryption context
      f2fs crypto: ch
[PATCH] Staging: wlan-ng: remove "goto" statements where they are not necessary
This patch removes "goto" statements which only perform a return. In this way, additional instructions were removed.

Signed-off-by: Claudiu Beznea
---
 drivers/staging/wlan-ng/cfg80211.c | 112 +
 1 file changed, 39 insertions(+), 73 deletions(-)

diff --git a/drivers/staging/wlan-ng/cfg80211.c b/drivers/staging/wlan-ng/cfg80211.c
index 8bad018..63d7c99 100644
--- a/drivers/staging/wlan-ng/cfg80211.c
+++ b/drivers/staging/wlan-ng/cfg80211.c
@@ -62,7 +62,6 @@ static int prism2_result2err(int prism2_result)
 		err = -EOPNOTSUPP;
 		break;
 	default:
-		err = 0;
 		break;
 	}

@@ -111,13 +110,13 @@ static int prism2_change_virtual_intf(struct wiphy *wiphy,
 	switch (type) {
 	case NL80211_IFTYPE_ADHOC:
 		if (wlandev->macmode == WLAN_MACMODE_IBSS_STA)
-			goto exit;
+			return err;
 		wlandev->macmode = WLAN_MACMODE_IBSS_STA;
 		data = 0;
 		break;
 	case NL80211_IFTYPE_STATION:
 		if (wlandev->macmode == WLAN_MACMODE_ESS_STA)
-			goto exit;
+			return err;
 		wlandev->macmode = WLAN_MACMODE_ESS_STA;
 		data = 1;
 		break;
@@ -136,7 +135,6 @@ static int prism2_change_virtual_intf(struct wiphy *wiphy,
 	dev->ieee80211_ptr->iftype = type;

-exit:
 	return err;
 }

@@ -146,9 +144,7 @@ static int prism2_add_key(struct wiphy *wiphy, struct net_device *dev,
 {
 	wlandevice_t *wlandev = dev->ml_priv;
 	u32 did;
-
-	int err = 0;
-	int result = 0;
+	int result;

 	switch (params->cipher) {
 	case WLAN_CIPHER_SUITE_WEP40:
@@ -157,7 +153,7 @@ static int prism2_add_key(struct wiphy *wiphy, struct net_device *dev,
 			DIDmib_dot11smt_dot11PrivacyTable_dot11WEPDefaultKeyID,
 			key_index);
 		if (result)
-			goto exit;
+			return -EFAULT;

 		/* send key to driver */
 		switch (key_index) {
@@ -178,26 +174,22 @@ static int prism2_add_key(struct wiphy *wiphy, struct net_device *dev,
 			break;

 		default:
-			err = -EINVAL;
-			goto exit;
+			return -EINVAL;
 		}

 		result = prism2_domibset_pstr32(wlandev, did,
 						params->key_len, params->key);
 		if (result)
-			goto exit;
+			return -EFAULT;
+
 		break;

 	default:
 		pr_debug("Unsupported cipher suite\n");
-		result = 1;
+		return -EFAULT;
 	}

-exit:
-	if (result)
-		err = -EFAULT;
-
-	return err;
+	return 0;
 }

 static int prism2_get_key(struct wiphy *wiphy, struct net_device *dev,
@@ -235,8 +227,7 @@ static int prism2_del_key(struct wiphy *wiphy, struct net_device *dev,
 {
 	wlandevice_t *wlandev = dev->ml_priv;
 	u32 did;
-	int err = 0;
-	int result = 0;
+	int result;

 	/* There is no direct way in the hardware (AFAIK) of removing
 	 * a key, so we will cheat by setting the key to a bogus value
@@ -265,35 +256,30 @@ static int prism2_del_key(struct wiphy *wiphy, struct net_device *dev,
 		break;

 	default:
-		err = -EINVAL;
-		goto exit;
+		return -EINVAL;
 	}

 	result = prism2_domibset_pstr32(wlandev, did, 13, "0");
-
-exit:
 	if (result)
-		err = -EFAULT;
+		return -EFAULT;

-	return err;
+	return 0;
 }

 static int prism2_set_default_key(struct wiphy *wiphy, struct net_device *dev,
				  u8 key_index, bool unicast, bool multicast)
 {
 	wlandevice_t *wlandev = dev->ml_priv;
-
-	int err = 0;
-	int result = 0;
+	int result;

 	result = prism2_domibset_uint32(wlandev,
		DIDmib_dot11smt_dot11PrivacyTable_dot11WEPDefaultKeyID,
		key_index);
 	if (result)
-		err = -EFAULT;
+		return -EFAULT;

-	return err;
+	return 0;
 }

 static int prism2_get_station(struct wiphy *wiphy, struct net_device *dev,
@@ -451,7 +437,6 @@ static int prism2_set_wiphy_params(struct wiphy *wiphy, u32 changed)
 	wlandevice_t *wlandev = priv->wlandev;
 	u32 data;
 	int result;
-	int err = 0;

 	if (changed & WIPHY_PARAM_RTS_THRESHOLD) {
 		if (wiphy->rts_threshold == -1)
@@ -462,10 +447,8 @@ static int prism2_set_wiphy_params(struct wiphy *wiphy, u32 changed)
 		result = prism2_domibset_uint32(wla
[PATCH] PKCS#7: pkcs7_validate_trust(): initialize the _trusted output argument
Despite what the DocBook comment to pkcs7_validate_trust() says, the *_trusted argument is never set to false. pkcs7_validate_trust() only positively sets *_trusted upon encountering a trusted PKCS#7 SignedInfo block.

This is quite unfortunate since its callers, system_verify_data() for example, depend on pkcs7_validate_trust() clearing *_trusted on non-trust.

Indeed, UBSAN splats when attempting to load the uninitialized local variable 'trusted' from system_verify_data() in pkcs7_validate_trust():

  UBSAN: Undefined behaviour in crypto/asymmetric_keys/pkcs7_trust.c:194:14
  load of value 82 is not a valid value for type '_Bool'
  [...]
  Call Trace:
   [] dump_stack+0xbc/0x117
   [] ? _atomic_dec_and_lock+0x169/0x169
   [] ubsan_epilogue+0xd/0x4e
   [] __ubsan_handle_load_invalid_value+0x111/0x158
   [] ? val_to_string.constprop.12+0xcf/0xcf
   [] ? x509_request_asymmetric_key+0x114/0x370
   [] ? kfree+0x220/0x370
   [] ? public_key_verify_signature_2+0x32/0x50
   [] pkcs7_validate_trust+0x524/0x5f0
   [] system_verify_data+0xca/0x170
   [] ? top_trace_array+0x9b/0x9b
   [] ? __vfs_read+0x279/0x3d0
   [] mod_verify_sig+0x1ff/0x290
   [...]

The implication is that pkcs7_validate_trust() effectively grants trust when it really shouldn't have.

Fix this by explicitly setting *_trusted to false at the very beginning of pkcs7_validate_trust().

Signed-off-by: Nicolai Stange
---
Applicable to linux-next-20160318

 crypto/asymmetric_keys/pkcs7_trust.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/crypto/asymmetric_keys/pkcs7_trust.c b/crypto/asymmetric_keys/pkcs7_trust.c
index 3bbdcc7..7d7a39b4 100644
--- a/crypto/asymmetric_keys/pkcs7_trust.c
+++ b/crypto/asymmetric_keys/pkcs7_trust.c
@@ -178,6 +178,8 @@ int pkcs7_validate_trust(struct pkcs7_message *pkcs7,
 	int cached_ret = -ENOKEY;
 	int ret;

+	*_trusted = false;
+
 	for (p = pkcs7->certs; p; p = p->next)
 		p->seen = false;

--
2.7.3
Re: [PATCH 01/71] arc: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
On Sun, 20 Mar 2016, Linus Torvalds wrote:
> On Sun, Mar 20, 2016 at 12:34 PM, Kirill A. Shutemov wrote:
> >
> > Hm. Okay. Re-split this way would take some time. I'll post updated
> > patchset tomorrow.
>
> Oh, I was assuming this was automated with coccinelle or at least some
> simple shell scripting..
>
> Generally, for things like this, automation really is great.
>
> In fact, I like it when people attach the scripts to the commit
> message, further clarifying exactly what they did (even if the end
> result then often includes manual fixups for patterns that didn't
> _quite_ match, or where the automated script just generated ugly
> indentation or similar).

Fine by me to make these changes - once upon a time I had a better grip than most on when and how to use PAGE_CACHE_blah; but have long lost it, and agree with all those who find the imaginary distinction now a drag.

Just a plea, which I expect you already intend, to apply these changes either just before 4.6-rc1 or just before 4.7-rc1 (I think I'd opt for 4.6-rc1 myself), without any interim of days or months in linux-next, where a period of divergence would be quite tiresome. Holding back Kirill's 71/71 until the coast is clear just a little later.

Thanks,
Hugh