Re: [PATCH RFC 1/2] scatterlist: add mempool based chained SG alloc/free api

2016-03-20 Thread Ming Lin
On Wed, 2016-03-16 at 09:23 +0100, Christoph Hellwig wrote:
> > 
> We can definitely kill this one.

We want to support pools of different sizes.
How can we kill this one?

Or did you mean we just create a single pool with size SG_CHUNK_SIZE?

> 
> > +static __init int sg_mempool_init(void)
> > +{
> > +   int i;
> > +
> > +   for (i = 0; i < SG_MEMPOOL_NR; i++) {
> > +   struct sg_mempool *sgp = sg_pools + i;
> > +   int size = sgp->size * sizeof(struct scatterlist);
> > +
> > +   sgp->slab = kmem_cache_create(sgp->name, size, 0,
> > +   SLAB_HWCACHE_ALIGN, NULL);
> 
> Having these mempools around in every kernel will make some embedded
> developers rather unhappy.  We could either not create them at
> runtime, which would require either a check in the fast path or
> an init call in every driver, or just move the functions you
> added into a separate file, which will be compiled only based on a
> Kconfig symbol and could even be potentially modular.  I think that
> second option might be easier.

I created lib/sg_pool.c with CONFIG_SG_POOL.
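
For reference, here is a minimal sketch of what such a split-out
lib/sg_pool.c could look like: one kmem_cache plus mempool per chunk size,
created from a module_init and built only when CONFIG_SG_POOL is selected.
The pool count, the sgpool-* sizes and the reserve depth below are
illustrative assumptions, not the final code; only the sg_pools /
SG_MEMPOOL_NR / sg_mempool naming is taken from the RFC hunk quoted above.

/* Hedged sketch, not the actual lib/sg_pool.c. */
#include <linux/module.h>
#include <linux/mempool.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

#define SG_MEMPOOL_NR		4	/* illustrative pool count */
#define SG_MEMPOOL_RESERVE	2	/* minimum chunks kept per mempool */

struct sg_mempool {
	size_t			size;	/* scatterlist entries per chunk */
	char			*name;
	struct kmem_cache	*slab;
	mempool_t		*pool;
};

#define SP(x) { .size = x, .name = "sgpool-" #x }
static struct sg_mempool sg_pools[SG_MEMPOOL_NR] = {
	SP(8), SP(16), SP(32), SP(64),	/* largest assumed to be SG_CHUNK_SIZE */
};
#undef SP

static __init int sg_mempool_init(void)
{
	int i;

	for (i = 0; i < SG_MEMPOOL_NR; i++) {
		struct sg_mempool *sgp = sg_pools + i;
		int size = sgp->size * sizeof(struct scatterlist);

		/* backing slab cache for chunks of this size */
		sgp->slab = kmem_cache_create(sgp->name, size, 0,
				SLAB_HWCACHE_ALIGN, NULL);
		if (!sgp->slab)
			goto cleanup;

		/* guaranteed minimum of preallocated chunks */
		sgp->pool = mempool_create_slab_pool(SG_MEMPOOL_RESERVE,
				sgp->slab);
		if (!sgp->pool) {
			kmem_cache_destroy(sgp->slab);
			goto cleanup;
		}
	}
	return 0;

cleanup:
	while (--i >= 0) {
		mempool_destroy(sg_pools[i].pool);
		kmem_cache_destroy(sg_pools[i].slab);
	}
	return -ENOMEM;
}
module_init(sg_mempool_init);

With that, lib/Makefile only needs obj-$(CONFIG_SG_POOL) += sg_pool.o, so
kernels that don't select the symbol pay nothing, which addresses the
embedded concern above.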


[lkp] [drm/dp_helper] 31f8862c6e: No primary result change, 128.2% piglit.time.voluntary_context_switches

2016-03-20 Thread kernel test robot
FYI, we noticed a +917.2% change in piglit.time.voluntary_context_switches with
your commit.

https://github.com/0day-ci/linux Lyude/drm-dp_helper-retry-on-ETIMEDOUT-in-drm_dp_dpcd_access/20160317-234351
commit 31f8862c6e6303223e946e6fcbdfa7f87274baef ("drm/dp_helper: retry on -ETIMEDOUT in drm_dp_dpcd_access()")


=========================================================================================
compiler/group/kconfig/rootfs/tbox_group/testcase:
  gcc-4.9/igt-071/x86_64-rhel/debian-x86_64-2015-02-07.cgz/snb-black/piglit

commit:
  cf481068cdd430a22425d7712c8deeb25efdedc1
  31f8862c6e6303223e946e6fcbdfa7f87274baef

cf481068cdd430a2 31f8862c6e6303223e946e6fcb
---------------- --------------------------
         %stddev     %change         %stddev
             \          |                \
    111.96 ±  0%    +128.2%     255.52 ±  0%  piglit.time.elapsed_time
    111.96 ±  0%    +128.2%     255.52 ±  0%  piglit.time.elapsed_time.max
      8.25 ±  5%     -39.4%       5.00 ±  0%  piglit.time.percent_of_cpu_this_job_got
     31676 ±  0%    +917.2%         32 ±  0%  piglit.time.voluntary_context_switches
    111.96 ±  0%    +128.2%     255.52 ±  0%  time.elapsed_time
    111.96 ±  0%    +128.2%     255.52 ±  0%  time.elapsed_time.max
    115.50 ±  1%     +31.2%     151.50 ± 13%  time.involuntary_context_switches
      8.25 ±  5%     -39.4%       5.00 ±  0%  time.percent_of_cpu_this_job_got
      8.31 ±  0%     +45.7%      12.12 ±  0%  time.system_time
     31676 ±  0%    +917.2%         32 ±  0%  time.voluntary_context_switches


snb-black: Sandy Bridge
Memory: 8G


                    piglit.time.voluntary_context_switches

    [ASCII time-series plot omitted: bisect-good (*) vs bisect-bad (O) samples]


[*] bisect-good sample
[O] bisect-bad  sample

To reproduce:

git clone 
git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml  # job file is attached in this email
bin/lkp run job.yaml


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Xiaolong Ye.
---
LKP_SERVER: inn
LKP_CGI_PORT: 80
LKP_CIFS_PORT: 139
testcase: piglit
default-monitors:
  wait: activate-monitor
  kmsg: 
  heartbeat:
interval: 10
default-watchdogs:
  oom-killer: 
  watchdog: 
commit: 31f8862c6e6303223e946e6fcbdfa7f87274baef
model: Sandy Bridge
memory: 8G
nr_cpu: 8
hdd_partitions: 
swap_partitions: 
rootfs_partition: 
category: functional
timeout: 30m
piglit:
  group: igt-071
queue: bisect
testbox: snb-black
tbox_group: snb-black
kconfig: x86_64-rhel
enqueue_time: 2016-03-21 06:44:51.728433840 +08:00
compiler: gcc-4.9
rootfs: debian-x86_64-2015-02-07.cgz
id: 40af15dd8c1739d33ffe090f8e6d07650c5fbf07
user: lkp
head_commit: ff90403f45704fa7ad73121525559f3567886c53
base_commit: b562e44f507e863c6792946e4e1b1449fbbac85d
branch: linux-devel/devel-hourly-2016032017
result_root: 
"/result/piglit/igt-071/snb-black/debian-x86_64-2015-02-07.cgz/x86_64-rhel/gcc-4.9/31f8862c6e6303223e946e6fcbdfa7f87274baef/0"
job_file: 
"/lkp/scheduled/snb-black/bisect_piglit-igt-071-debian-x86_64-2015-02-07.cgz-x86_64-rhel-31f8862c6e6303223e946e6fcbdfa7f87274baef-20160321-39265-rnyf53-0.yaml"
max_uptime: 1800
initrd: "/osimage/debian/debian-x86_64-2015-02-07.cgz"
bootloader_append:
- root=/dev/ram0
- user=lkp
- 
job=/lkp/scheduled/snb-black/bisect_piglit-igt-071-debian-x86_64-2015-02-07.cgz-x86_64-rhel-31f8862c6e6303223e946e6fcbdfa7f87274baef-20160321-39265-rnyf53-0.yaml
- ARCH=x86_64
- kconfig=x86_64-rhel
- branch=linux-devel/devel-hourly-2016032017
- commit=31f8862c6e6303223e946e6fcbdfa7f87274baef
- 
BOOT_IMAGE=/pkg/linux/x86_64-rhel/gcc-4.9/31f8862c6e6303223e946e6fcbdfa7f87274baef/vmlinuz-4.5.0-rc7-

[lkp] [rcutorture] 5b3e3964db: torture_init_begin: refusing rcu init: spin_lock running

2016-03-20 Thread kernel test robot
FYI, we noticed the below changes on

https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/dev
commit 5b3e3964dba5f5a3210ca931d523c1e1f3119b31 ("rcutorture: Add RCU 
grace-period performance tests")

As shown below, the message "torture_init_begin: refusing rcu init: spin_lock running"
appeared with your commit.

[3.310757] spin_lock-torture:--- Start of test [debug]: nwriters_stress=4 nreaders_stress=0 stat_interval=60 verbose=1 shuffle_interval=3 stutter=5 shutdown_secs=0 onoff_interval=0 onoff_holdoff=0
[3.318722] spin_lock-torture: Creating torture_shuffle task
[3.350213] spin_lock-torture: Creating torture_stutter task
[3.353000] spin_lock-torture: torture_shuffle task started
[3.355562] spin_lock-torture: Creating lock_torture_writer task
[3.358373] spin_lock-torture: torture_stutter task started
[3.361060] spin_lock-torture: lock_torture_writer task started
[3.370011] spin_lock-torture: Creating lock_torture_writer task
[3.372856] spin_lock-torture: Creating lock_torture_writer task
[3.375817] spin_lock-torture: lock_torture_writer task started
[3.378697] spin_lock-torture: Creating lock_torture_writer task
[3.380049] spin_lock-torture: lock_torture_writer task started
[3.410169] spin_lock-torture: Creating lock_torture_stats task
[3.413129] spin_lock-torture: lock_torture_writer task started
[3.420137] torture_init_begin: refusing rcu init: spin_lock running

[3.430064] spin_lock-torture: lock_torture_stats task started
[3.441101] futex hash table entries: 16 (order: -1, 2048 bytes)
[3.443791] audit: initializing netlink subsys (disabled)
[3.446329] audit: type=2000 audit(1458435960.381:1): initialized
[3.470185] zbud: loaded


FYI, raw QEMU command line is:

qemu-system-x86_64 -enable-kvm -cpu Nehalem -kernel 
/pkg/linux/x86_64-randconfig-i0-201612/gcc-5/5b3e3964dba5f5a3210ca931d523c1e1f3119b31/vmlinuz-4.5.0-rc1-00035-g5b3e396
 -append 'root=/dev/ram0 user=lkp 
job=/lkp/scheduled/vm-intel12-yocto-x86_64-6/bisect_boot-1-yocto-minimal-x86_64.cgz-x86_64-randconfig-i0-201612-5b3e3964dba5f5a3210ca931d523c1e1f3119b31-20160320-8459-1gazcic-1.yaml
 ARCH=x86_64 kconfig=x86_64-randconfig-i0-201612 
branch=linux-devel/devel-spot-201603200631 
commit=5b3e3964dba5f5a3210ca931d523c1e1f3119b31 
BOOT_IMAGE=/pkg/linux/x86_64-randconfig-i0-201612/gcc-5/5b3e3964dba5f5a3210ca931d523c1e1f3119b31/vmlinuz-4.5.0-rc1-00035-g5b3e396
 max_uptime=600 
RESULT_ROOT=/result/boot/1/vm-intel12-yocto-x86_64/yocto-minimal-x86_64.cgz/x86_64-randconfig-i0-201612/gcc-5/5b3e3964dba5f5a3210ca931d523c1e1f3119b31/0
 LKP_SERVER=inn earlyprintk=ttyS0,115200 systemd.log_level=err debug apic=debug 
sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 
softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 
prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal rw 
ip=vm-intel12-yocto-x86_64-6::dhcp drbd.minor_count=8'  -initrd 
/fs/KVM/initrd-vm-intel12-yocto-x86_64-6 -m 320 -smp 2 -device 
e1000,netdev=net0 -netdev user,id=net0 -boot order=nc -no-reboot -watchdog 
i6300esb -rtc base=localtime -drive 
file=/fs/KVM/disk0-vm-intel12-yocto-x86_64-6,media=disk,if=virtio -drive 
file=/fs/KVM/disk1-vm-intel12-yocto-x86_64-6,media=disk,if=virtio -pidfile 
/dev/shm/kboot/pid-vm-intel12-yocto-x86_64-6 -serial 
file:/dev/shm/kboot/serial-vm-intel12-yocto-x86_64-6 -daemonize -display none 
-monitor null 

Thanks,
Xiaolong Ye.


dmesg.xz
Description: Binary data


Re: [PATCH 1/2] media/dvb-core: fix inverted check

2016-03-20 Thread Olli Salonen
Hi Max,

Already in the tree:
http://git.linuxtv.org/media_tree.git/commit/drivers/media/dvb-core?id=711f3fba6ffd3914fd1b5ed9faf8d22bab6f2203

Cheers,
-olli

On 18 March 2016 at 23:31, Max Kellermann  wrote:
> Breakage caused by commit f50d51661a
>
> Signed-off-by: Max Kellermann 
> ---
>  drivers/media/dvb-core/dvbdev.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/media/dvb-core/dvbdev.c b/drivers/media/dvb-core/dvbdev.c
> index 560450a..c756d4b 100644
> --- a/drivers/media/dvb-core/dvbdev.c
> +++ b/drivers/media/dvb-core/dvbdev.c
> @@ -682,7 +682,7 @@ int dvb_create_media_graph(struct dvb_adapter *adap,
> if (demux && ca) {
> ret = media_create_pad_link(demux, 1, ca,
> 0, MEDIA_LNK_FL_ENABLED);
> -   if (!ret)
> +   if (ret)
> return -ENOMEM;
> }
>
>
> --


[PATCH v2 05/18] zsmalloc: remove unused pool param in obj_free

2016-03-20 Thread Minchan Kim
Let's remove the unused pool parameter from obj_free().

Reviewed-by: Sergey Senozhatsky 
Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 16556a6db628..a0890e9003e2 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1438,8 +1438,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 }
 EXPORT_SYMBOL_GPL(zs_malloc);
 
-static void obj_free(struct zs_pool *pool, struct size_class *class,
-   unsigned long obj)
+static void obj_free(struct size_class *class, unsigned long obj)
 {
struct link_free *link;
struct page *first_page, *f_page;
@@ -1485,7 +1484,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
class = pool->size_class[class_idx];
 
spin_lock(&class->lock);
-   obj_free(pool, class, obj);
+   obj_free(class, obj);
fullness = fix_fullness_group(class, first_page);
if (fullness == ZS_EMPTY) {
zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
@@ -1648,7 +1647,7 @@ static int migrate_zspage(struct zs_pool *pool, struct 
size_class *class,
free_obj |= BIT(HANDLE_PIN_BIT);
record_obj(handle, free_obj);
unpin_tag(handle);
-   obj_free(pool, class, used_obj);
+   obj_free(class, used_obj);
}
 
/* Remember last position in this iteration */
-- 
1.9.1



[lkp] [cpufreq] 9be4fd2c77: No primary result change, 56.4% fsmark.time.involuntary_context_switches

2016-03-20 Thread kernel test robot
   [truncated ASCII time-series plot omitted; only bisect-good (*) samples are shown]


[*] bisect-good sample
[O] bisect-bad  sample

To reproduce:

git clone 
git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml  # job file is attached in this email
bin/lkp run job.yaml


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Xiaolong Ye.
---
LKP_SERVER: inn
LKP_CGI_PORT: 80
LKP_CIFS_PORT: 139
testcase: fsmark
default-monitors:
  wait: activate-monitor
  kmsg: 
  uptime: 
  iostat: 
  heartbeat: 
  vmstat: 
  numa-numastat: 
  numa-vmstat: 
  numa-meminfo: 
  proc-vmstat: 
  proc-stat:
interval: 10
  meminfo: 
  slabinfo: 
  interrupts: 
  lock_stat: 
  latency_stats: 
  softirqs: 
  bdi_dev_mapping: 
  diskstats: 
  nfsstat: 
  cpuidle: 
  cpufreq-stats: 
  turbostat: 
  pmeter: 
  sched_debug:
interval: 60
cpufreq_governor: 
default-watchdogs:
  oom-killer: 
  watchdog: 
commit: 9be4fd2c7723a3057b0b39676fe4c8d5fd7118a4
model: Westmere-EP
memory: 16G
nr_hdd_partitions: 10
hdd_partitions: "/dev/disk/by-id/scsi-35000c500*-part1"
swap_partitions: 
rootfs_partition: 
"/dev/disk/by-id/ata-WDC_WD1002FAEX-00Z3A0_WD-WCATR5408564-part3"
category: benchmark
iterations: 1x
nr_threads: 32t
disk: 1HDD
fs: ext4
fs2: nfsv4
fsmark:
  filesize: 8K
  test_size: 400M
  sync_method: fsyncBeforeClose
  nr_directories: 16d
  nr_files_per_directory: 256fpd
queue: bisect
testbox: lkp-ws02
tbox_group: lkp-ws02
kconfig: x86_64-rhel
enqueue_time: 2016-03-20 10:09:24.525219588 +08:00
compiler: gcc-4.9
rootfs: debian-x86_64-2015-02-07.cgz
id: fdd404daccaee5e0c96a90d1c6f11354ed761f51
user: lkp
head_commit: 6c01e36f36861235cc151706b1bcb674e965c5a5
base_commit: b562e44f507e863c6792946e4e1b1449fbbac85d
branch: linux-devel/devel-hourly-2016031901
result_root: 
"/result/fsmark/1x-32t-1HDD-ext4-nfsv4-8K-400M-fsyncBeforeClose-16d-256fpd/lkp-ws02/debian-x86_64-2015-02-07.cgz/x86_64-rhel/gcc-4.9/9be4fd2c7723a3057b0b39676fe4c8d5fd7118a4/0"
job_file: 
"/lkp/scheduled/lkp-ws02/bisect_fsmark-1x-32t-1HDD-ext4-nfsv4-8K-400M-fsyncBeforeClose-16d-256fpd-debian-x86_64-2015-02-07.cgz-x86_64-rhel-9be4fd2c7723a3057b0b39676fe4c8d5fd7118a4-20160320-39429-bq8oj6-0.yaml"
nr_cpu: "$(nproc)"
max_uptime: 1756.70003
initrd: "/osimage/debian/debian-x86_64-2015-02-07.cgz"
bootloader_append:
- root=/dev/ram0
- user=lkp
- 
job=/lkp/scheduled/lkp-ws02/bisect_fsmark-1x-32t-1HDD-ext4-nfsv4-8K-400M-fsyncBeforeClose-16d-256fpd-debian-x86_64-2015-02-07.cgz-x86_64-rhel-9be4fd2c7723a3057b0b39676fe4c8d5fd7118a4-20160320-39429-bq8oj6-0.yaml
- ARCH=x86_64
- kconfig=x86_64-rhel
- branch=linux-devel/devel-hourly-2016031901
- commit=9be4fd2c7723a3057b0b39676fe4c8d5fd7118a4
- 
BOOT_IMAGE=/pkg/linux/x86_64-rhel/gcc-4.9/9be4fd2c7723a3057b0b39676fe4c8d5fd7118a4/vmlinuz-4.5.0-rc2-4-g9be4fd2
- max_uptime=1756
- 
RESULT_ROOT=/result/fsmark/1x-32t-1HDD-ext4-nfsv4-8K-400M-fsyncBeforeClose-16d-256fpd/lkp-ws02/debian-x86_64-2015-02-07.cgz/x86_64-rhel/gcc-4.9/9be4fd2c7723a3057b0b39676fe4c8d5fd7118a4/0
- LKP_SERVER=inn
- |-
  ipmi_watchdog.start_now=1

  earlyprintk=ttyS0,115200 systemd.log_level=err
  debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100
  panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 
prompt_ramdisk=0
  console=ttyS0,115200 console=tty0 vga=normal

  rw
lkp_initrd: "/lkp/lkp/lkp-x86_64.cgz"
modules_initrd: 
"/pkg/linux/x86_64-rhel/gcc-4.9/9be4fd2c7723a3057b0b39676fe4c8d5fd7118a4/modules.cgz"
bm_initrd: 
"/osimage/deps/debian-x86_64-2015-02-07.cgz/lkp.cgz,/osimage/deps/debian-x86_64-2015-02-07.cgz/run-ipconfig.cgz,/osimage/deps/debian-x86_64-2015-02-07.cgz/turbostat.cgz,/lkp/benchmarks/turbostat.cgz,/osimage/deps/debian-x86_64-2015-02-07.cgz/fs.cgz,/osimage/deps/debian-x86_64-2015-02-07.cgz/fs2.cgz,/lkp/benchmarks/fsmark.cgz"
linux_headers_initrd: 
"/pkg/linux/x86_64-rhel/gcc-4.9/9be4fd2c7723a3057b0b39676fe4c8d5fd7118a4/linux-headers.cgz"
repeat_to: 2
kernel: 
"/pkg/linux/x86_64-rhel/gcc-4.9/9be4fd2c7723a3057b0b39676fe4c8d5fd7118a4/vmlinuz-4.5.0-rc2-4-g9be4fd2"
dequeue_time: 2016-03-20 10:16:38.380437312 +08:00
job_state: finished
loadavg: 37.39 31.57 16.10 3/373 13706
start_time: '1458440258'
end_time: '1458440752'
version: "/lkp/lkp/.src-20160318-155012"

[PATCH v2 10/18] zsmalloc: factor page chain functionality out

2016-03-20 Thread Minchan Kim
For migration, we need to create the sub-page chain of a zspage
dynamically, so this patch factors that logic out of alloc_zspage.

As a minor refactoring, it also makes the OBJ_ALLOCATED_TAG assignment
clearer in obj_malloc (it could be a separate patch, but it's
trivial, so I want to put it together in this patch).

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 80 ++-
 1 file changed, 46 insertions(+), 34 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 958f27a9079d..833da8f4ffc9 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -982,7 +982,9 @@ static void init_zspage(struct size_class *class, struct 
page *first_page)
unsigned long off = 0;
struct page *page = first_page;
 
-   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+   first_page->freelist = NULL;
+   INIT_LIST_HEAD(&first_page->lru);
+   set_zspage_inuse(first_page, 0);
 
while (page) {
struct page *next_page;
@@ -1027,13 +1029,44 @@ static void init_zspage(struct size_class *class, 
struct page *first_page)
set_freeobj(first_page, 0);
 }
 
+static void create_page_chain(struct page *pages[], int nr_pages)
+{
+   int i;
+   struct page *page;
+   struct page *prev_page = NULL;
+   struct page *first_page = NULL;
+
+   for (i = 0; i < nr_pages; i++) {
+   page = pages[i];
+
+   INIT_LIST_HEAD(&page->lru);
+   if (i == 0) {
+   SetPagePrivate(page);
+   set_page_private(page, 0);
+   first_page = page;
+   }
+
+   if (i == 1)
+   set_page_private(first_page, (unsigned long)page);
+   if (i >= 1)
+   set_page_private(page, (unsigned long)first_page);
+   if (i >= 2)
+   list_add(&page->lru, &prev_page->lru);
+   if (i == nr_pages - 1)
+   SetPagePrivate2(page);
+
+   prev_page = page;
+   }
+}
+
 /*
  * Allocate a zspage for the given size class
  */
 static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
 {
-   int i, error;
-   struct page *first_page = NULL, *uninitialized_var(prev_page);
+   int i;
+   struct page *first_page = NULL;
+   struct page *pages[ZS_MAX_PAGES_PER_ZSPAGE];
 
/*
 * Allocate individual pages and link them together as:
@@ -1046,43 +1079,23 @@ static struct page *alloc_zspage(struct size_class 
*class, gfp_t flags)
 * (i.e. no other sub-page has this flag set) and PG_private_2 to
 * identify the last page.
 */
-   error = -ENOMEM;
for (i = 0; i < class->pages_per_zspage; i++) {
struct page *page;
 
page = alloc_page(flags);
-   if (!page)
-   goto cleanup;
-
-   INIT_LIST_HEAD(&page->lru);
-   if (i == 0) {   /* first page */
-   page->freelist = NULL;
-   SetPagePrivate(page);
-   set_page_private(page, 0);
-   first_page = page;
-   set_zspage_inuse(page, 0);
+   if (!page) {
+   while (--i >= 0)
+   __free_page(pages[i]);
+   return NULL;
}
-   if (i == 1)
-   set_page_private(first_page, (unsigned long)page);
-   if (i >= 1)
-   set_page_private(page, (unsigned long)first_page);
-   if (i >= 2)
-   list_add(&page->lru, &prev_page->lru);
-   if (i == class->pages_per_zspage - 1)   /* last page */
-   SetPagePrivate2(page);
-   prev_page = page;
+
+   pages[i] = page;
}
 
+   create_page_chain(pages, class->pages_per_zspage);
+   first_page = pages[0];
init_zspage(class, first_page);
 
-   error = 0; /* Success */
-
-cleanup:
-   if (unlikely(error) && first_page) {
-   free_zspage(first_page);
-   first_page = NULL;
-   }
-
return first_page;
 }
 
@@ -1422,7 +1435,6 @@ static unsigned long obj_malloc(struct size_class *class,
unsigned long m_offset;
void *vaddr;
 
-   handle |= OBJ_ALLOCATED_TAG;
obj = get_freeobj(first_page);
objidx_to_page_and_offset(class, first_page, obj,
&m_page, &m_offset);
@@ -1432,10 +1444,10 @@ static unsigned long obj_malloc(struct size_class 
*class,
set_freeobj(first_page, link->next >> OBJ_ALLOCATED_TAG);
if (!class->huge)
/* record handle in the header of allocated chunk */
-   link->handle = handle;
+   link->handle = handle | OBJ_ALLOCATED_TAG;
else
/* record handle in first_page

[PATCH v2 09/18] zsmalloc: move struct zs_meta from mapping to freelist

2016-03-20 Thread Minchan Kim
To support migration from the VM, we need to have an address_space
on every page, so zsmalloc shouldn't use page->mapping. So,
this patch moves zs_meta from page->mapping to page->freelist.

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 0c8ccd87c084..958f27a9079d 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -29,7 +29,7 @@
  * Look at size_class->huge.
  * page->lru: links together first pages of various zspages.
  * Basically forming list of zspages in a fullness group.
- * page->mapping: override by struct zs_meta
+ * page->freelist: override by struct zs_meta
  *
  * Usage of struct page flags:
  * PG_private: identifies the first component page
@@ -418,7 +418,7 @@ static int get_zspage_inuse(struct page *first_page)
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)&first_page->mapping;
+   m = (struct zs_meta *)&first_page->freelist;
 
return m->inuse;
 }
@@ -429,7 +429,7 @@ static void set_zspage_inuse(struct page *first_page, int 
val)
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)&first_page->mapping;
+   m = (struct zs_meta *)&first_page->freelist;
m->inuse = val;
 }
 
@@ -439,7 +439,7 @@ static void mod_zspage_inuse(struct page *first_page, int 
val)
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)&first_page->mapping;
+   m = (struct zs_meta *)&first_page->freelist;
m->inuse += val;
 }
 
@@ -449,7 +449,7 @@ static void set_freeobj(struct page *first_page, int idx)
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)&first_page->mapping;
+   m = (struct zs_meta *)&first_page->freelist;
m->freeobj = idx;
 }
 
@@ -459,7 +459,7 @@ static unsigned long get_freeobj(struct page *first_page)
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)&first_page->mapping;
+   m = (struct zs_meta *)&first_page->freelist;
return m->freeobj;
 }
 
@@ -471,7 +471,7 @@ static void get_zspage_mapping(struct page *first_page,
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)&first_page->mapping;
+   m = (struct zs_meta *)&first_page->freelist;
*fullness = m->fullness;
*class_idx = m->class;
 }
@@ -484,7 +484,7 @@ static void set_zspage_mapping(struct page *first_page,
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)&first_page->mapping;
+   m = (struct zs_meta *)&first_page->freelist;
m->fullness = fullness;
m->class = class_idx;
 }
@@ -946,7 +946,7 @@ static void reset_page(struct page *page)
clear_bit(PG_private, &page->flags);
clear_bit(PG_private_2, &page->flags);
set_page_private(page, 0);
-   page->mapping = NULL;
+   page->freelist = NULL;
page_mapcount_reset(page);
 }
 
@@ -1056,6 +1056,7 @@ static struct page *alloc_zspage(struct size_class 
*class, gfp_t flags)
 
INIT_LIST_HEAD(&page->lru);
if (i == 0) {   /* first page */
+   page->freelist = NULL;
SetPagePrivate(page);
set_page_private(page, 0);
first_page = page;
@@ -2068,9 +2069,9 @@ static int __init zs_init(void)
 
/*
 * A zspage's a free object index, class index, fullness group,
-* inuse object count are encoded in its (first)page->mapping
+* inuse object count are encoded in its (first)page->freelist
 * so sizeof(struct zs_meta) should be less than
-* sizeof(page->mapping(i.e., unsigned long)).
+* sizeof(page->freelist(i.e., void *)).
 */
BUILD_BUG_ON(sizeof(struct zs_meta) > sizeof(unsigned long));
 
-- 
1.9.1



[PATCH v2 06/18] zsmalloc: keep max_object in size_class

2016-03-20 Thread Minchan Kim
Every zspage in a size_class has the same maximum number of objects, so
we can move that value into the size_class.

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 32 +++-
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index a0890e9003e2..8649d0243e6c 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -32,8 +32,6 @@
  * page->freelist: points to the first free object in zspage.
  * Free objects are linked together using in-place
  * metadata.
- * page->objects: maximum number of objects we can store in this
- * zspage (class->zspage_order * PAGE_SIZE / class->size)
  * page->lru: links together first pages of various zspages.
  * Basically forming list of zspages in a fullness group.
  * page->mapping: class index and fullness group of the zspage
@@ -211,6 +209,7 @@ struct size_class {
 * of ZS_ALIGN.
 */
int size;
+   int objs_per_zspage;
unsigned int index;
 
struct zs_size_stat stats;
@@ -627,21 +626,22 @@ static inline void zs_pool_stat_destroy(struct zs_pool 
*pool)
  * the pool (not yet implemented). This function returns fullness
  * status of the given page.
  */
-static enum fullness_group get_fullness_group(struct page *first_page)
+static enum fullness_group get_fullness_group(struct size_class *class,
+   struct page *first_page)
 {
-   int inuse, max_objects;
+   int inuse, objs_per_zspage;
enum fullness_group fg;
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
inuse = first_page->inuse;
-   max_objects = first_page->objects;
+   objs_per_zspage = class->objs_per_zspage;
 
if (inuse == 0)
fg = ZS_EMPTY;
-   else if (inuse == max_objects)
+   else if (inuse == objs_per_zspage)
fg = ZS_FULL;
-   else if (inuse <= 3 * max_objects / fullness_threshold_frac)
+   else if (inuse <= 3 * objs_per_zspage / fullness_threshold_frac)
fg = ZS_ALMOST_EMPTY;
else
fg = ZS_ALMOST_FULL;
@@ -728,7 +728,7 @@ static enum fullness_group fix_fullness_group(struct 
size_class *class,
enum fullness_group currfg, newfg;
 
get_zspage_mapping(first_page, &class_idx, &currfg);
-   newfg = get_fullness_group(first_page);
+   newfg = get_fullness_group(class, first_page);
if (newfg == currfg)
goto out;
 
@@ -1008,9 +1008,6 @@ static struct page *alloc_zspage(struct size_class 
*class, gfp_t flags)
init_zspage(class, first_page);
 
first_page->freelist = location_to_obj(first_page, 0);
-   /* Maximum number of objects we can store in this zspage */
-   first_page->objects = class->pages_per_zspage * PAGE_SIZE / class->size;
-
error = 0; /* Success */
 
 cleanup:
@@ -1238,11 +1235,11 @@ static bool can_merge(struct size_class *prev, int 
size, int pages_per_zspage)
return true;
 }
 
-static bool zspage_full(struct page *first_page)
+static bool zspage_full(struct size_class *class, struct page *first_page)
 {
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   return first_page->inuse == first_page->objects;
+   return first_page->inuse == class->objs_per_zspage;
 }
 
 unsigned long zs_get_total_pages(struct zs_pool *pool)
@@ -1628,7 +1625,7 @@ static int migrate_zspage(struct zs_pool *pool, struct 
size_class *class,
}
 
/* Stop if there is no more space */
-   if (zspage_full(d_page)) {
+   if (zspage_full(class, d_page)) {
unpin_tag(handle);
ret = -ENOMEM;
break;
@@ -1687,7 +1684,7 @@ static enum fullness_group putback_zspage(struct zs_pool 
*pool,
 {
enum fullness_group fullness;
 
-   fullness = get_fullness_group(first_page);
+   fullness = get_fullness_group(class, first_page);
insert_zspage(class, fullness, first_page);
set_zspage_mapping(first_page, class->index, fullness);
 
@@ -1936,8 +1933,9 @@ struct zs_pool *zs_create_pool(const char *name, gfp_t 
flags)
class->size = size;
class->index = i;
class->pages_per_zspage = pages_per_zspage;
-   if (pages_per_zspage == 1 &&
-   get_maxobj_per_zspage(size, pages_per_zspage) == 1)
+   class->objs_per_zspage = class->pages_per_zspage *
+   PAGE_SIZE / class->size;
+   if (pages_per_zspage == 1 && class->objs_per_zspage == 1)
class->huge = true;
spin_lock_init(&class->lock);
pool->size_class[i] = class;
-- 
1.9.1



[PATCH v2 12/18] zsmalloc: zs_compact refactoring

2016-03-20 Thread Minchan Kim
Currently, we rely on class->lock to prevent zspage destruction.
That was okay until now because the critical section is short, but
with run-time migration it could become long, so class->lock is no
longer a good approach.

So, this patch introduces the [un]freeze_zspage functions, which
freeze the allocated objects in a zspage by pinning their tags so
a user cannot free an in-use object. With those functions, this patch
redesigns compaction.

Those functions will be used to implement zspage runtime
migration, too.

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 393 ++
 1 file changed, 257 insertions(+), 136 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 9c0ab1e92e9b..990d752fb65b 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -922,6 +922,13 @@ static unsigned long obj_to_head(struct size_class *class, 
struct page *page,
return *(unsigned long *)obj;
 }
 
+static inline int testpin_tag(unsigned long handle)
+{
+   unsigned long *ptr = (unsigned long *)handle;
+
+   return test_bit(HANDLE_PIN_BIT, ptr);
+}
+
 static inline int trypin_tag(unsigned long handle)
 {
unsigned long *ptr = (unsigned long *)handle;
@@ -950,8 +957,7 @@ static void reset_page(struct page *page)
page_mapcount_reset(page);
 }
 
-static void free_zspage(struct zs_pool *pool, struct size_class *class,
-   struct page *first_page)
+static void free_zspage(struct zs_pool *pool, struct page *first_page)
 {
struct page *nextp, *tmp, *head_extra;
 
@@ -974,11 +980,6 @@ static void free_zspage(struct zs_pool *pool, struct 
size_class *class,
}
reset_page(head_extra);
__free_page(head_extra);
-
-   zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
-   class->size, class->pages_per_zspage));
-   atomic_long_sub(class->pages_per_zspage,
-   &pool->pages_allocated);
 }
 
 /* Initialize a newly allocated zspage */
@@ -1326,6 +1327,11 @@ static bool zspage_full(struct size_class *class, struct 
page *first_page)
return get_zspage_inuse(first_page) == class->objs_per_zspage;
 }
 
+static bool zspage_empty(struct size_class *class, struct page *first_page)
+{
+   return get_zspage_inuse(first_page) == 0;
+}
+
 unsigned long zs_get_total_pages(struct zs_pool *pool)
 {
return atomic_long_read(&pool->pages_allocated);
@@ -1456,7 +1462,6 @@ static unsigned long obj_malloc(struct size_class *class,
set_page_private(first_page, handle | OBJ_ALLOCATED_TAG);
kunmap_atomic(vaddr);
mod_zspage_inuse(first_page, 1);
-   zs_stat_inc(class, OBJ_USED, 1);
 
obj = location_to_obj(m_page, obj);
 
@@ -1511,6 +1516,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
}
 
obj = obj_malloc(class, first_page, handle);
+   zs_stat_inc(class, OBJ_USED, 1);
/* Now move the zspage to another fullness group, if required */
fix_fullness_group(class, first_page);
record_obj(handle, obj);
@@ -1541,7 +1547,6 @@ static void obj_free(struct size_class *class, unsigned 
long obj)
kunmap_atomic(vaddr);
set_freeobj(first_page, f_objidx);
mod_zspage_inuse(first_page, -1);
-   zs_stat_dec(class, OBJ_USED, 1);
 }
 
 void zs_free(struct zs_pool *pool, unsigned long handle)
@@ -1565,10 +1570,19 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 
spin_lock(&class->lock);
obj_free(class, obj);
+   zs_stat_dec(class, OBJ_USED, 1);
fullness = fix_fullness_group(class, first_page);
-   if (fullness == ZS_EMPTY)
-   free_zspage(pool, class, first_page);
+   if (fullness == ZS_EMPTY) {
+   zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
+   class->size, class->pages_per_zspage));
+   spin_unlock(&class->lock);
+   atomic_long_sub(class->pages_per_zspage,
+   &pool->pages_allocated);
+   free_zspage(pool, first_page);
+   goto out;
+   }
spin_unlock(&class->lock);
+out:
unpin_tag(handle);
 
free_handle(pool, handle);
@@ -1638,127 +1652,66 @@ static void zs_object_copy(struct size_class *class, 
unsigned long dst,
kunmap_atomic(s_addr);
 }
 
-/*
- * Find alloced object in zspage from index object and
- * return handle.
- */
-static unsigned long find_alloced_obj(struct size_class *class,
-   struct page *page, int index)
+static unsigned long handle_from_obj(struct size_class *class,
+   struct page *first_page, int obj_idx)
 {
-   unsigned long head;
-   int offset = 0;
-   unsigned long handle = 0;
-   void *addr = kmap_atomic(page);
-
-   if (!is_first_page(page))
-   offset = page->index;
-   offset += class->size * inde

[PATCH v2 18/18] zram: use __GFP_MOVABLE for memory allocation

2016-03-20 Thread Minchan Kim
Zsmalloc is ready for page migration so zram can use __GFP_MOVABLE
from now on.

I ran a test to see how it helps create higher-order pages.
The test scenario is as follows.

KVM guest, 1G memory, ext4-formatted zram block device,

for i in `seq 1 8`;
do
dd if=/dev/vda1 of=mnt/test$i.txt bs=128M count=1 &
done

wait `pidof dd`

for i in `seq 1 2 8`;
do
rm -rf mnt/test$i.txt
done
fstrim -v mnt

echo "init"
cat /proc/buddyinfo

echo "compaction"
echo 1 > /proc/sys/vm/compact_memory
cat /proc/buddyinfo

old:

init
Node 0, zone      DMA    208    120     51     41     11      0      0      0      0      0      0
Node 0, zone    DMA32  16380  13777   9184   3805    789     54      3      0      0      0      0
compaction
Node 0, zone      DMA    132     82     40     39     16      2      1      0      0      0      0
Node 0, zone    DMA32   5219   5526   4969   3455   1831    677    139     15      0      0      0

new:

init
Node 0, zone      DMA    379    115     97     19      2      0      0      0      0      0      0
Node 0, zone    DMA32  18891  16774  10862   3947    637     21      0      0      0      0      0
compaction
Node 0, zone      DMA    214     66     87     29     10      3      0      0      0      0      0
Node 0, zone    DMA32   1612   3139   3154   2469   1745    990    384     94      7      0      0

As you can see, compaction made so many high-order pages. Yay!
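
To put a number on "high-order pages", a small userspace helper (not part
of the patch; the order-3 threshold and the program itself are just an
illustration) can sum the free pages sitting in larger buddies per zone of
/proc/buddyinfo:

/* Sum free pages held in blocks of order >= MIN_ORDER per buddyinfo zone. */
#include <stdio.h>

#define MIN_ORDER 3	/* assumption: "high order" means order-3 and up */

int main(void)
{
	char line[512];
	FILE *f = fopen("/proc/buddyinfo", "r");

	if (!f)
		return 1;

	while (fgets(line, sizeof(line), f)) {
		char zone[32];
		int node, order = 0, used;
		long count, pages = 0;
		char *p = line;

		if (sscanf(p, "Node %d, zone %31s%n", &node, zone, &used) != 2)
			continue;
		p += used;
		while (sscanf(p, "%ld%n", &count, &used) == 1) {
			if (order >= MIN_ORDER)
				pages += count << order; /* order-N block = 2^N pages */
			order++;
			p += used;
		}
		printf("node %d zone %-6s: %ld free pages in >=order-%d blocks\n",
		       node, zone, pages, MIN_ORDER);
	}
	fclose(f);
	return 0;
}

Running it before and after "echo 1 > /proc/sys/vm/compact_memory" gives the
same comparison as the buddyinfo columns above, just as a single number per
zone.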

Reviewed-by: Sergey Senozhatsky 
Signed-off-by: Minchan Kim 
---
 drivers/block/zram/zram_drv.c | 3 ++-
 mm/zsmalloc.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 46055dbc4095..da8298b9f05e 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -517,7 +517,8 @@ static struct zram_meta *zram_meta_alloc(char *pool_name, 
u64 disksize)
goto out_error;
}
 
-   meta->mem_pool = zs_create_pool(pool_name, GFP_NOIO | __GFP_HIGHMEM);
+   meta->mem_pool = zs_create_pool(pool_name, GFP_NOIO|__GFP_HIGHMEM
+   |__GFP_MOVABLE);
if (!meta->mem_pool) {
pr_err("Error creating memory pool\n");
goto out_error;
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 35bafa0bc3f1..8557da6dbaf2 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -308,7 +308,7 @@ static void destroy_handle_cache(struct zs_pool *pool)
 static unsigned long alloc_handle(struct zs_pool *pool)
 {
return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
-   pool->flags & ~__GFP_HIGHMEM);
+   pool->flags & ~(__GFP_HIGHMEM|__GFP_MOVABLE));
 }
 
 static void free_handle(struct zs_pool *pool, unsigned long handle)
-- 
1.9.1



[PATCH v2 16/18] zsmalloc: use single linked list for page chain

2016-03-20 Thread Minchan Kim
For tail page migration, we shouldn't use page->lru, which was used
for page chaining, because the VM will use it for its own
purposes, so we need another field for chaining.
For chaining, a singly linked list is enough, and the tail page's
page->index, which pointed to the first object offset in the page,
can be replaced by a run-time calculation.

So, this patch changes the page->lru chaining to a singly
linked list squeezed into page->freelist and introduces
get_first_obj_ofs to get the first object offset in a page.

With that, it can maintain the page chain without using
page->lru.

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 119 ++
 1 file changed, 78 insertions(+), 41 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index b3b31fdfea0f..9b4b03d8f993 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -17,10 +17,7 @@
  *
  * Usage of struct page fields:
  * page->private: points to the first component (0-order) page
- * page->index (union with page->freelist): offset of the first object
- * starting in this page.
- * page->lru: links together all component pages (except the first page)
- * of a zspage
+ * page->index (union with page->freelist): override by struct zs_meta
  *
  * For _first_ page only:
  *
@@ -271,10 +268,19 @@ struct zs_pool {
 };
 
 struct zs_meta {
-   unsigned long freeobj:FREEOBJ_BITS;
-   unsigned long class:CLASS_BITS;
-   unsigned long fullness:FULLNESS_BITS;
-   unsigned long inuse:INUSE_BITS;
+   union {
+   /* first page */
+   struct {
+   unsigned long freeobj:FREEOBJ_BITS;
+   unsigned long class:CLASS_BITS;
+   unsigned long fullness:FULLNESS_BITS;
+   unsigned long inuse:INUSE_BITS;
+   };
+   /* tail pages */
+   struct {
+   struct page *next;
+   };
+   };
 };
 
 struct mapping_area {
@@ -491,6 +497,34 @@ static unsigned long get_freeobj(struct page *first_page)
return m->freeobj;
 }
 
+static void set_next_page(struct page *page, struct page *next)
+{
+   struct zs_meta *m;
+
+   VM_BUG_ON_PAGE(is_first_page(page), page);
+
+   m = (struct zs_meta *)&page->index;
+   m->next = next;
+}
+
+static struct page *get_next_page(struct page *page)
+{
+   struct page *next;
+
+   if (is_last_page(page))
+   next = NULL;
+   else if (is_first_page(page))
+   next = (struct page *)page_private(page);
+   else {
+   struct zs_meta *m = (struct zs_meta *)&page->index;
+
+   VM_BUG_ON(!m->next);
+   next = m->next;
+   }
+
+   return next;
+}
+
 static void get_zspage_mapping(struct page *first_page,
unsigned int *class_idx,
enum fullness_group *fullness)
@@ -871,18 +905,30 @@ static struct page *get_first_page(struct page *page)
return (struct page *)page_private(page);
 }
 
-static struct page *get_next_page(struct page *page)
+int get_first_obj_ofs(struct size_class *class, struct page *first_page,
+   struct page *page)
 {
-   struct page *next;
+   int pos, bound;
+   int page_idx = 0;
+   int ofs = 0;
+   struct page *cursor = first_page;
 
-   if (is_last_page(page))
-   next = NULL;
-   else if (is_first_page(page))
-   next = (struct page *)page_private(page);
-   else
-   next = list_entry(page->lru.next, struct page, lru);
+   if (first_page == page)
+   goto out;
 
-   return next;
+   while (page != cursor) {
+   page_idx++;
+   cursor = get_next_page(cursor);
+   }
+
+   bound = PAGE_SIZE * page_idx;
+   pos = (((class->objs_per_zspage * class->size) *
+   page_idx / class->pages_per_zspage) / class->size
+   ) * class->size;
+
+   ofs = (pos + class->size) % PAGE_SIZE;
+out:
+   return ofs;
 }
 
 static void objidx_to_page_and_offset(struct size_class *class,
@@ -1008,27 +1054,25 @@ void lock_zspage(struct page *first_page)
 
 static void free_zspage(struct zs_pool *pool, struct page *first_page)
 {
-   struct page *nextp, *tmp, *head_extra;
+   struct page *nextp, *tmp;
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
VM_BUG_ON_PAGE(get_zspage_inuse(first_page), first_page);
 
lock_zspage(first_page);
-   head_extra = (struct page *)page_private(first_page);
+   nextp = (struct page *)page_private(first_page);
 
/* zspage with only 1 system page */
-   if (!head_extra)
+   if (!nextp)
goto out;
 
-   list_for_each_entry_safe(nextp, tmp, &head_extra->lru, lru) {
-   list_del(&nextp->lru);
-   reset_page(nextp);
- 

[PATCH v2 04/18] zsmalloc: reordering function parameter

2016-03-20 Thread Minchan Kim
This patch cleans up the function parameter ordering so that the
higher-level data structure comes first.

Reviewed-by: Sergey Senozhatsky 
Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 50 ++
 1 file changed, 26 insertions(+), 24 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 6a7b9313ee8c..16556a6db628 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -569,7 +569,7 @@ static const struct file_operations zs_stat_size_ops = {
.release= single_release,
 };
 
-static int zs_pool_stat_create(const char *name, struct zs_pool *pool)
+static int zs_pool_stat_create(struct zs_pool *pool, const char *name)
 {
struct dentry *entry;
 
@@ -609,7 +609,7 @@ static void __exit zs_stat_exit(void)
 {
 }
 
-static inline int zs_pool_stat_create(const char *name, struct zs_pool *pool)
+static inline int zs_pool_stat_create(struct zs_pool *pool, const char *name)
 {
return 0;
 }
@@ -655,8 +655,9 @@ static enum fullness_group get_fullness_group(struct page 
*first_page)
  * have. This functions inserts the given zspage into the freelist
  * identified by <class, fullness_group>.
  */
-static void insert_zspage(struct page *first_page, struct size_class *class,
-   enum fullness_group fullness)
+static void insert_zspage(struct size_class *class,
+   enum fullness_group fullness,
+   struct page *first_page)
 {
struct page **head;
 
@@ -687,8 +688,9 @@ static void insert_zspage(struct page *first_page, struct 
size_class *class,
  * This function removes the given zspage from the freelist identified
  * by <class, fullness_group>.
  */
-static void remove_zspage(struct page *first_page, struct size_class *class,
-   enum fullness_group fullness)
+static void remove_zspage(struct size_class *class,
+   enum fullness_group fullness,
+   struct page *first_page)
 {
struct page **head;
 
@@ -730,8 +732,8 @@ static enum fullness_group fix_fullness_group(struct 
size_class *class,
if (newfg == currfg)
goto out;
 
-   remove_zspage(first_page, class, currfg);
-   insert_zspage(first_page, class, newfg);
+   remove_zspage(class, currfg, first_page);
+   insert_zspage(class, newfg, first_page);
set_zspage_mapping(first_page, class_idx, newfg);
 
 out:
@@ -915,7 +917,7 @@ static void free_zspage(struct page *first_page)
 }
 
 /* Initialize a newly allocated zspage */
-static void init_zspage(struct page *first_page, struct size_class *class)
+static void init_zspage(struct size_class *class, struct page *first_page)
 {
unsigned long off = 0;
struct page *page = first_page;
@@ -1003,7 +1005,7 @@ static struct page *alloc_zspage(struct size_class 
*class, gfp_t flags)
prev_page = page;
}
 
-   init_zspage(first_page, class);
+   init_zspage(class, first_page);
 
first_page->freelist = location_to_obj(first_page, 0);
/* Maximum number of objects we can store in this zspage */
@@ -1348,8 +1350,8 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long 
handle)
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
 
-static unsigned long obj_malloc(struct page *first_page,
-   struct size_class *class, unsigned long handle)
+static unsigned long obj_malloc(struct size_class *class,
+   struct page *first_page, unsigned long handle)
 {
unsigned long obj;
struct link_free *link;
@@ -1426,7 +1428,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
class->size, class->pages_per_zspage));
}
 
-   obj = obj_malloc(first_page, class, handle);
+   obj = obj_malloc(class, first_page, handle);
/* Now move the zspage to another fullness group, if required */
fix_fullness_group(class, first_page);
record_obj(handle, obj);
@@ -1499,8 +1501,8 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 }
 EXPORT_SYMBOL_GPL(zs_free);
 
-static void zs_object_copy(unsigned long dst, unsigned long src,
-   struct size_class *class)
+static void zs_object_copy(struct size_class *class, unsigned long dst,
+   unsigned long src)
 {
struct page *s_page, *d_page;
unsigned long s_objidx, d_objidx;
@@ -1566,8 +1568,8 @@ static void zs_object_copy(unsigned long dst, unsigned 
long src,
  * Find alloced object in zspage from index object and
  * return handle.
  */
-static unsigned long find_alloced_obj(struct page *page, int index,
-   struct size_class *class)
+static unsigned long find_alloced_obj(struct size_class *class,
+   struct page *page, int index)
 {
unsigned long head;
int offset = 0;
@@ -1617,7 +1619,7 @@ static int migrate_zspage(struct zs_pool *p

[PATCH v2 07/18] zsmalloc: squeeze inuse into page->mapping

2016-03-20 Thread Minchan Kim
Currently, we store class:fullness in page->mapping.
The number of classes we can support is 255 and there are 4 fullness
groups, so 8 + 2 = 10 bits are enough to represent them.
Meanwhile, 11 bits are enough to store the number of in-use objects
in a zspage.

For example, if we assume a 64K PAGE_SIZE and class_size 32,
which is the worst case, class->pages_per_zspage becomes 1, so
the number of objects in a zspage is 2048 and 11 bits are enough.
The next class is 32 + 256 (i.e., ZS_SIZE_CLASS_DELTA).
With the ZS_MAX_PAGES_PER_ZSPAGE worst case, 64K * 4 /
(32 + 256) = 910, so 11 bits are still enough.

So, we can squeeze the in-use object count into page->mapping.
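
A quick userspace sketch of that bit budget, assuming the 64K PAGE_SIZE
worst case described above; the struct simply mirrors the zs_meta layout
introduced below and is only meant to check the arithmetic:

#include <stdio.h>

/* mirrors the bitfield layout introduced by this patch */
struct zs_meta {
	unsigned long class:8;		/* up to 255 size classes */
	unsigned long fullness:2;	/* 4 fullness groups */
	unsigned long inuse:11;		/* in-use object count */
};

int main(void)
{
	unsigned long page_size = 64 * 1024;	/* worst-case PAGE_SIZE */

	/* class_size 32, pages_per_zspage == 1 */
	printf("objects in worst-case class: %lu\n", page_size / 32);
	/* next class: 32 + 256 (ZS_SIZE_CLASS_DELTA), 4 pages per zspage */
	printf("objects in next class:       %lu\n",
	       page_size * 4 / (32 + 256));
	printf("zs_meta fits in a long:      %s\n",
	       sizeof(struct zs_meta) <= sizeof(unsigned long) ? "yes" : "no");
	return 0;
}

It prints 2048 and 910, matching the numbers above, and confirms the three
bitfields (8 + 2 + 11 = 21 bits) still fit in a single unsigned long.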

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 103 --
 1 file changed, 71 insertions(+), 32 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 8649d0243e6c..4dd72a803568 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -34,8 +34,7 @@
  * metadata.
  * page->lru: links together first pages of various zspages.
  * Basically forming list of zspages in a fullness group.
- * page->mapping: class index and fullness group of the zspage
- * page->inuse: the number of objects that are used in this zspage
+ * page->mapping: override by struct zs_meta
  *
  * Usage of struct page flags:
  * PG_private: identifies the first component page
@@ -132,6 +131,13 @@
 /* each chunk includes extra space to keep handle */
 #define ZS_MAX_ALLOC_SIZE  PAGE_SIZE
 
+#define CLASS_BITS 8
+#define CLASS_MASK ((1 << CLASS_BITS) - 1)
+#define FULLNESS_BITS  2
+#define FULLNESS_MASK  ((1 << FULLNESS_BITS) - 1)
+#define INUSE_BITS 11
+#define INUSE_MASK ((1 << INUSE_BITS) - 1)
+
 /*
  * On systems with 4K page size, this gives 255 size classes! There is a
  * trader-off here:
@@ -145,7 +151,7 @@
  *  ZS_MIN_ALLOC_SIZE and ZS_SIZE_CLASS_DELTA must be multiple of ZS_ALIGN
  *  (reason above)
  */
-#define ZS_SIZE_CLASS_DELTA(PAGE_SIZE >> 8)
+#define ZS_SIZE_CLASS_DELTA(PAGE_SIZE >> CLASS_BITS)
 
 /*
  * We do not maintain any list for completely empty or full pages
@@ -155,7 +161,7 @@ enum fullness_group {
ZS_ALMOST_EMPTY,
_ZS_NR_FULLNESS_GROUPS,
 
-   ZS_EMPTY,
+   ZS_EMPTY = _ZS_NR_FULLNESS_GROUPS,
ZS_FULL
 };
 
@@ -263,14 +269,11 @@ struct zs_pool {
 #endif
 };
 
-/*
- * A zspage's class index and fullness group
- * are encoded in its (first)page->mapping
- */
-#define CLASS_IDX_BITS 28
-#define FULLNESS_BITS  4
-#define CLASS_IDX_MASK ((1 << CLASS_IDX_BITS) - 1)
-#define FULLNESS_MASK  ((1 << FULLNESS_BITS) - 1)
+struct zs_meta {
+   unsigned long class:CLASS_BITS;
+   unsigned long fullness:FULLNESS_BITS;
+   unsigned long inuse:INUSE_BITS;
+};
 
 struct mapping_area {
 #ifdef CONFIG_PGTABLE_MAPPING
@@ -412,28 +415,61 @@ static int is_last_page(struct page *page)
return PagePrivate2(page);
 }
 
+static int get_zspage_inuse(struct page *first_page)
+{
+   struct zs_meta *m;
+
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+   m = (struct zs_meta *)&first_page->mapping;
+
+   return m->inuse;
+}
+
+static void set_zspage_inuse(struct page *first_page, int val)
+{
+   struct zs_meta *m;
+
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+   m = (struct zs_meta *)&first_page->mapping;
+   m->inuse = val;
+}
+
+static void mod_zspage_inuse(struct page *first_page, int val)
+{
+   struct zs_meta *m;
+
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+   m = (struct zs_meta *)&first_page->mapping;
+   m->inuse += val;
+}
+
 static void get_zspage_mapping(struct page *first_page,
unsigned int *class_idx,
enum fullness_group *fullness)
 {
-   unsigned long m;
+   struct zs_meta *m;
+
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (unsigned long)first_page->mapping;
-   *fullness = m & FULLNESS_MASK;
-   *class_idx = (m >> FULLNESS_BITS) & CLASS_IDX_MASK;
+   m = (struct zs_meta *)&first_page->mapping;
+   *fullness = m->fullness;
+   *class_idx = m->class;
 }
 
 static void set_zspage_mapping(struct page *first_page,
unsigned int class_idx,
enum fullness_group fullness)
 {
-   unsigned long m;
+   struct zs_meta *m;
+
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = ((class_idx & CLASS_IDX_MASK) << FULLNESS_BITS) |
-   (fullness & FULLNESS_MASK);
-   first_page->mapping = (struct address_space *)m;
+   m = (struct zs_meta *)&first_page->mapping;
+   m->fullness = fullness;
+   m->class = class_idx;
 }
 
 /*
@@ -632,9 +668,7 @@ static enum fullness_group get_fullness_group(struct 
size_class *class,
int inuse, objs_per_zspage;
enum fullness_group fg;
 
-   VM_BU

[PATCH v2 17/18] zsmalloc: migrate tail pages in zspage

2016-03-20 Thread Minchan Kim
This patch enables tail page migration of zspages.

At this point, I tested for zsmalloc regressions with a micro-benchmark
which does zs_malloc/map/unmap/zs_free for every size class
on every CPU (my system has 12) for 20 seconds.

It shows a 1% regression, which is really small when we consider
the benefit of this feature and real-workload overhead (i.e.,
most of the overhead comes from compression).

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 131 +++---
 1 file changed, 115 insertions(+), 16 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 9b4b03d8f993..35bafa0bc3f1 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -551,6 +551,19 @@ static void set_zspage_mapping(struct page *first_page,
m->class = class_idx;
 }
 
+static bool check_isolated_page(struct page *first_page)
+{
+   struct page *cursor;
+
+   for (cursor = first_page; cursor != NULL; cursor =
+   get_next_page(cursor)) {
+   if (PageIsolated(cursor))
+   return true;
+   }
+
+   return false;
+}
+
 /*
  * zsmalloc divides the pool into various size classes where each
  * class maintains a list of zspages where each zspage is divided
@@ -1052,6 +1065,44 @@ void lock_zspage(struct page *first_page)
} while ((cursor = get_next_page(cursor)) != NULL);
 }
 
+int trylock_zspage(struct page *first_page, struct page *locked_page)
+{
+   struct page *cursor, *fail;
+
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+   for (cursor = first_page; cursor != NULL; cursor =
+   get_next_page(cursor)) {
+   if (cursor != locked_page) {
+   if (!trylock_page(cursor)) {
+   fail = cursor;
+   goto unlock;
+   }
+   }
+   }
+
+   return 1;
+unlock:
+   for (cursor = first_page; cursor != fail; cursor =
+   get_next_page(cursor)) {
+   if (cursor != locked_page)
+   unlock_page(cursor);
+   }
+
+   return 0;
+}
+
+void unlock_zspage(struct page *first_page, struct page *locked_page)
+{
+   struct page *cursor = first_page;
+
+   for (; cursor != NULL; cursor = get_next_page(cursor)) {
+   VM_BUG_ON_PAGE(!PageLocked(cursor), cursor);
+   if (cursor != locked_page)
+   unlock_page(cursor);
+   };
+}
+
 static void free_zspage(struct zs_pool *pool, struct page *first_page)
 {
struct page *nextp, *tmp;
@@ -1090,16 +1141,17 @@ static void init_zspage(struct size_class *class, 
struct page *first_page,
first_page->freelist = NULL;
INIT_LIST_HEAD(&first_page->lru);
set_zspage_inuse(first_page, 0);
-   BUG_ON(!trylock_page(first_page));
-   first_page->mapping = mapping;
-   __SetPageMovable(first_page);
-   unlock_page(first_page);
 
while (page) {
struct page *next_page;
struct link_free *link;
void *vaddr;
 
+   BUG_ON(!trylock_page(page));
+   page->mapping = mapping;
+   __SetPageMovable(page);
+   unlock_page(page);
+
vaddr = kmap_atomic(page);
link = (struct link_free *)vaddr + off / sizeof(*link);
 
@@ -1850,6 +1902,7 @@ static enum fullness_group putback_zspage(struct 
size_class *class,
 
VM_BUG_ON_PAGE(!list_empty(&first_page->lru), first_page);
VM_BUG_ON_PAGE(ZsPageIsolate(first_page), first_page);
+   VM_BUG_ON_PAGE(check_isolated_page(first_page), first_page);
 
fullness = get_fullness_group(class, first_page);
insert_zspage(class, fullness, first_page);
@@ -1956,6 +2009,12 @@ static struct page *isolate_source_page(struct 
size_class *class)
if (!page)
continue;
 
+   /* To prevent race between object and page migration */
+   if (!trylock_zspage(page, NULL)) {
+   page = NULL;
+   continue;
+   }
+
remove_zspage(class, i, page);
 
inuse = get_zspage_inuse(page);
@@ -1964,6 +2023,7 @@ static struct page *isolate_source_page(struct size_class 
*class)
if (inuse != freezed) {
unfreeze_zspage(class, page, freezed);
putback_zspage(class, page);
+   unlock_zspage(page, NULL);
page = NULL;
continue;
}
@@ -1995,6 +2055,12 @@ static struct page *isolate_target_page(struct 
size_class *class)
if (!page)
continue;
 
+   /* To prevent race between object and page migration */
+   if (!trylock_zspage(page, NULL)) {
+   page = NULL;
+

[PATCH v2 08/18] zsmalloc: squeeze freelist into page->mapping

2016-03-20 Thread Minchan Kim
Zsmalloc stores the first free object's position in first_page->freelist
for each zspage. If we change it to an object index relative to first_page
instead of a location, we can squeeze it into page->mapping because
the number of bits we need to store the offset is at most 11.

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 159 +++---
 1 file changed, 96 insertions(+), 63 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 4dd72a803568..0c8ccd87c084 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -18,9 +18,7 @@
  * Usage of struct page fields:
  * page->private: points to the first component (0-order) page
  * page->index (union with page->freelist): offset of the first object
- * starting in this page. For the first page, this is
- * always 0, so we use this field (aka freelist) to point
- * to the first free object in zspage.
+ * starting in this page.
  * page->lru: links together all component pages (except the first page)
  * of a zspage
  *
@@ -29,9 +27,6 @@
  * page->private: refers to the component page after the first page
  * If the page is first_page for huge object, it stores handle.
  * Look at size_class->huge.
- * page->freelist: points to the first free object in zspage.
- * Free objects are linked together using in-place
- * metadata.
  * page->lru: links together first pages of various zspages.
  * Basically forming list of zspages in a fullness group.
  * page->mapping: override by struct zs_meta
@@ -131,6 +126,7 @@
 /* each chunk includes extra space to keep handle */
 #define ZS_MAX_ALLOC_SIZE  PAGE_SIZE
 
+#define FREEOBJ_BITS 11
 #define CLASS_BITS 8
 #define CLASS_MASK ((1 << CLASS_BITS) - 1)
 #define FULLNESS_BITS  2
@@ -228,17 +224,17 @@ struct size_class {
 
 /*
  * Placed within free objects to form a singly linked list.
- * For every zspage, first_page->freelist gives head of this list.
+ * For every zspage, first_page->freeobj gives head of this list.
  *
  * This must be power of 2 and less than or equal to ZS_ALIGN
  */
 struct link_free {
union {
/*
-* Position of next free chunk (encodes <PFN, obj_idx>)
+* free object list
 * It's valid for non-allocated object
 */
-   void *next;
+   unsigned long next;
/*
 * Handle of allocated object.
 */
@@ -270,6 +266,7 @@ struct zs_pool {
 };
 
 struct zs_meta {
+   unsigned long freeobj:FREEOBJ_BITS;
unsigned long class:CLASS_BITS;
unsigned long fullness:FULLNESS_BITS;
unsigned long inuse:INUSE_BITS;
@@ -446,6 +443,26 @@ static void mod_zspage_inuse(struct page *first_page, int 
val)
m->inuse += val;
 }
 
+static void set_freeobj(struct page *first_page, int idx)
+{
+   struct zs_meta *m;
+
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+   m = (struct zs_meta *)&first_page->mapping;
+   m->freeobj = idx;
+}
+
+static unsigned long get_freeobj(struct page *first_page)
+{
+   struct zs_meta *m;
+
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+   m = (struct zs_meta *)&first_page->mapping;
+   return m->freeobj;
+}
+
 static void get_zspage_mapping(struct page *first_page,
unsigned int *class_idx,
enum fullness_group *fullness)
@@ -837,30 +854,33 @@ static struct page *get_next_page(struct page *page)
return next;
 }
 
-/*
- * Encode <page, obj_idx> as a single handle value.
- * We use the least bit of handle for tagging.
- */
-static void *location_to_obj(struct page *page, unsigned long obj_idx)
+static void objidx_to_page_and_offset(struct size_class *class,
+   struct page *first_page,
+   unsigned long obj_idx,
+   struct page **obj_page,
+   unsigned long *offset_in_page)
 {
-   unsigned long obj;
+   int i;
+   unsigned long offset;
+   struct page *cursor;
+   int nr_page;
 
-   if (!page) {
-   VM_BUG_ON(obj_idx);
-   return NULL;
-   }
+   offset = obj_idx * class->size;
+   cursor = first_page;
+   nr_page = offset >> PAGE_SHIFT;
 
-   obj = page_to_pfn(page) << OBJ_INDEX_BITS;
-   obj |= ((obj_idx) & OBJ_INDEX_MASK);
-   obj <<= OBJ_TAG_BITS;
+   *offset_in_page = offset & ~PAGE_MASK;
+
+   for (i = 0; i < nr_page; i++)
+   cursor = get_next_page(cursor);
 
-   return (void *)obj;
+   *obj_page = cursor;
 }
 
-/*
- * Decode <page, obj_idx> pair from the given object handle. We adjust the
- * decoded obj_idx back to its original value since it was adjusted in
- * location_to_obj().
+/**
+ * obj_to_location - get (<page>, <obj_idx>) fro

[PATCH v2 15/18] zsmalloc: migrate head page of zspage

2016-03-20 Thread Minchan Kim
This patch introduces a run-time migration feature for zspage.
To begin with, it supports only head page migration, to keep the
review easy (later patches will add tail page migration).

For migration, it provides three functions (a sketch of how they are
wired up follows the list):

* zs_page_isolate

It isolates a zspage containing the subpage the VM wants to migrate
and takes it off its class, so nobody can allocate new objects from
that zspage. IOW, it freezes allocation.

* zs_page_migrate

First of all, it freezes the zspage so nobody can free objects from
it. Then it copies the contents of the old page to the new page and
builds a new page chain with the new page. If that succeeds, it drops
the refcount of the old page to free it and puts the new zspage back
on the right zsmalloc data structures. Lastly, it unfreezes the
zspage so object allocation/free is allowed again.

* zs_page_putback

It returns an isolated zspage to the right fullness_group list when
migration of a page fails.

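How these hooks would typically be wired into zsmalloc's own inode is
sketched below (the address_space_operations fields come from the
non-lru migration patch earlier in this series; treat this as an
illustration, not the exact hunk):

static const struct address_space_operations zsmalloc_aops = {
        .isolate_page   = zs_page_isolate,
        .migratepage    = zs_page_migrate,
        .putback_page   = zs_page_putback,
};
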
NOTE: One hurdle for migration support is that a zspage can be
destroyed while migration is going on. Once a zspage is isolated,
nobody can allocate objects from it, but objects can still be freed,
so the zspage could be destroyed at any point until all of its
objects are frozen to prevent deallocation. The problem is the large
window between zs_page_isolate and freeze_zspage in zs_page_migrate,
during which the zspage could be destroyed.

An easy way to solve this would be to freeze the objects already in
zs_page_isolate, but then no object could be deallocated until the
migration attempt finishes after isolation. Since there is a large
time gap between isolation and migration, any object free on another
CPU would have to spin on pin_tag, which would cause big latency. So,
this patch introduces lock_zspage, which takes the PG_locked bit of
every page in a zspage right before freeing the zspage. VM migration
locks the page, too, right before calling ->migratepage, so that race
no longer exists.
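
A minimal sketch of the lock_zspage idea described above, for
illustration only (get_next_page() is zsmalloc's existing page-chain
walker; lock ordering details are ignored here):

static void lock_zspage(struct page *first_page)
{
        struct page *cursor = first_page;

        do {
                lock_page(cursor);
        } while ((cursor = get_next_page(cursor)) != NULL);
}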

Signed-off-by: Minchan Kim 
---
 include/uapi/linux/magic.h |   1 +
 mm/zsmalloc.c  | 329 +++--
 2 files changed, 317 insertions(+), 13 deletions(-)

diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index e1fbe72c39c0..93b1affe4801 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -79,5 +79,6 @@
 #define NSFS_MAGIC 0x6e736673
 #define BPF_FS_MAGIC   0xcafe4a11
 #define BALLOON_KVM_MAGIC  0x13661366
+#define ZSMALLOC_MAGIC 0x58295829
 
 #endif /* __LINUX_MAGIC_H__ */
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 990d752fb65b..b3b31fdfea0f 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -56,6 +56,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
  * This must be power of 2 and greater than of equal to sizeof(link_free).
@@ -182,6 +184,8 @@ struct zs_size_stat {
 static struct dentry *zs_stat_root;
 #endif
 
+static struct vfsmount *zsmalloc_mnt;
+
 /*
  * number of size_classes
  */
@@ -263,6 +267,7 @@ struct zs_pool {
 #ifdef CONFIG_ZSMALLOC_STAT
struct dentry *stat_dentry;
 #endif
+   struct inode *inode;
 };
 
 struct zs_meta {
@@ -412,6 +417,29 @@ static int is_last_page(struct page *page)
return PagePrivate2(page);
 }
 
+/*
+ * Indicate that whether zspage is isolated for page migration.
+ * Protected by size_class lock
+ */
+static void SetZsPageIsolate(struct page *first_page)
+{
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+   SetPageUptodate(first_page);
+}
+
+static int ZsPageIsolate(struct page *first_page)
+{
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+   return PageUptodate(first_page);
+}
+
+static void ClearZsPageIsolate(struct page *first_page)
+{
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+   ClearPageUptodate(first_page);
+}
+
 static int get_zspage_inuse(struct page *first_page)
 {
struct zs_meta *m;
@@ -783,8 +811,11 @@ static enum fullness_group fix_fullness_group(struct 
size_class *class,
if (newfg == currfg)
goto out;
 
-   remove_zspage(class, currfg, first_page);
-   insert_zspage(class, newfg, first_page);
+   /* Later, putback will insert page to right list */
+   if (!ZsPageIsolate(first_page)) {
+   remove_zspage(class, currfg, first_page);
+   insert_zspage(class, newfg, first_page);
+   }
set_zspage_mapping(first_page, class_idx, newfg);
 
 out:
@@ -950,13 +981,31 @@ static void unpin_tag(unsigned long handle)
 
 static void reset_page(struct page *page)
 {
+   __ClearPageMovable(page);
clear_bit(PG_private, &page->flags);
clear_bit(PG_private_2, &page->flags);
set_page_private(page, 0);
page->freelist = NULL;
+   page->mapping = NULL;
page_mapcount_reset(page);
 }
 
+/**
+ * lock_zspage - lock all pages in the zspage
+ * @first_page: head page of the zspage
+ *
+ * To prevent destroy during migration, zspage freeing should
+ * hold locks of all pages in a zspage
+ */

[PATCH v2 14/18] mm/balloon: use general movable page feature into balloon

2016-03-20 Thread Minchan Kim
Now that the VM can migrate non-lru movable pages, the balloon driver
no longer needs custom migration hooks in migrate.c and compaction.c.
Instead, this patch implements the page->mapping
->{isolate|migrate|putback} functions for it.

With that, we can remove the ballooning hooks from the generic
migration code and make balloon compaction simple.

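How the hooks end up attached is sketched below. The hook names are
assumptions made for illustration; the diff wires the real
balloon_aops into the balloon inode's i_mapping at probe time, so the
balloon pages carry these operations through their ->mapping:

static const struct address_space_operations balloon_aops = {
        .migratepage    = balloon_page_migrate,
        .isolate_page   = balloon_page_isolate,
        .putback_page   = balloon_page_putback,
};
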
Cc: virtualizat...@lists.linux-foundation.org
Cc: Rafael Aquini 
Cc: Konstantin Khlebnikov 
Signed-off-by: Gioh Kim 
Signed-off-by: Minchan Kim 
---
 drivers/virtio/virtio_balloon.c|  45 -
 include/linux/balloon_compaction.h |  47 -
 include/linux/page-flags.h |  52 +++
 include/uapi/linux/magic.h |   1 +
 mm/balloon_compaction.c| 101 -
 mm/compaction.c|   7 ---
 mm/migrate.c   |  22 ++--
 mm/vmscan.c|   2 +-
 8 files changed, 113 insertions(+), 164 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 7b6d74f0c72f..46a69b6a0c4f 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -45,6 +46,10 @@ static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
 module_param(oom_pages, int, S_IRUSR | S_IWUSR);
 MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 
+#ifdef CONFIG_BALLOON_COMPACTION
+static struct vfsmount *balloon_mnt;
+#endif
+
 struct virtio_balloon {
struct virtio_device *vdev;
struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
@@ -482,10 +487,29 @@ static int virtballoon_migratepage(struct 
balloon_dev_info *vb_dev_info,
 
mutex_unlock(&vb->balloon_lock);
 
+   ClearPageIsolated(page);
put_page(page); /* balloon reference */
 
return MIGRATEPAGE_SUCCESS;
 }
+
+static struct dentry *balloon_mount(struct file_system_type *fs_type,
+   int flags, const char *dev_name, void *data)
+{
+   static const struct dentry_operations ops = {
+   .d_dname = simple_dname,
+   };
+
+   return mount_pseudo(fs_type, "balloon-kvm:", NULL, &ops,
+   BALLOON_KVM_MAGIC);
+}
+
+static struct file_system_type balloon_fs = {
+   .name   = "balloon-kvm",
+   .mount  = balloon_mount,
+   .kill_sb= kill_anon_super,
+};
+
 #endif /* CONFIG_BALLOON_COMPACTION */
 
 static int virtballoon_probe(struct virtio_device *vdev)
@@ -516,12 +540,25 @@ static int virtballoon_probe(struct virtio_device *vdev)
 
balloon_devinfo_init(&vb->vb_dev_info);
 #ifdef CONFIG_BALLOON_COMPACTION
+   balloon_mnt = kern_mount(&balloon_fs);
+   if (IS_ERR(balloon_mnt)) {
+   err = PTR_ERR(balloon_mnt);
+   goto out_free_vb;
+   }
+
vb->vb_dev_info.migratepage = virtballoon_migratepage;
+   vb->vb_dev_info.inode = alloc_anon_inode(balloon_mnt->mnt_sb);
+   if (IS_ERR(vb->vb_dev_info.inode)) {
+   err = PTR_ERR(vb->vb_dev_info.inode);
+   vb->vb_dev_info.inode = NULL;
+   goto out_unmount;
+   }
+   vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
 #endif
 
err = init_vqs(vb);
if (err)
-   goto out_free_vb;
+   goto out_unmount;
 
vb->nb.notifier_call = virtballoon_oom_notify;
vb->nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
@@ -535,6 +572,10 @@ static int virtballoon_probe(struct virtio_device *vdev)
 
 out_oom_notify:
vdev->config->del_vqs(vdev);
+out_unmount:
+   if (vb->vb_dev_info.inode)
+   iput(vb->vb_dev_info.inode);
+   kern_unmount(balloon_mnt);
 out_free_vb:
kfree(vb);
 out:
@@ -567,6 +608,8 @@ static void virtballoon_remove(struct virtio_device *vdev)
cancel_work_sync(&vb->update_balloon_stats_work);
 
remove_common(vb);
+   if (vb->vb_dev_info.inode)
+   iput(vb->vb_dev_info.inode);
kfree(vb);
 }
 
diff --git a/include/linux/balloon_compaction.h 
b/include/linux/balloon_compaction.h
index 9b0a15d06a4f..43a858545844 100644
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Balloon device information descriptor.
@@ -62,6 +63,7 @@ struct balloon_dev_info {
struct list_head pages; /* Pages enqueued & handled to Host */
int (*migratepage)(struct balloon_dev_info *, struct page *newpage,
struct page *page, enum migrate_mode mode);
+   struct inode *inode;
 };
 
 extern struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info);
@@ -73,45 +75,19 @@ static inline void balloon_devinfo_init(struct 
balloon_dev_info *balloon)
spin_lock_init(&balloon->pages_lock);
INIT_L

[PATCH v2 13/18] mm/compaction: support non-lru movable page migration

2016-03-20 Thread Minchan Kim
Until now we have allowed migration only for LRU pages, and that was
enough to build high-order pages. But recently, embedded systems
(e.g., webOS, Android) use lots of non-movable pages (e.g., zram, GPU
memory), so we have seen several reports about failing small
high-order allocations. There have been several efforts to fix this
(e.g., enhancing the compaction algorithm, SLUB fallback to 0-order
pages, reserved memory, vmalloc and so on), but if the system has
lots of non-movable pages, those solutions are void in the long run.

So, this patch adds a facility to turn non-movable pages into movable
ones. For that, it introduces migration-related functions in
address_space_operations as well as some page flags.

Basically, this patch supports two page-flags and two functions related
to page migration. The flag and page->mapping stability are protected
by PG_lock.

PG_movable
PG_isolated

bool (*isolate_page) (struct page *, isolate_mode_t);
void (*putback_page) (struct page *);

The duties of a subsystem that wants to make its pages migratable are
as follows (a minimal sketch follows the list):

1. It should register an address_space in page->mapping and then mark
the page as PG_movable via __SetPageMovable.

2. It should mark the page as PG_isolated via SetPageIsolated and
return true if isolation was successful.

3. If migration succeeds, it should clear PG_isolated and PG_movable
of the page to prepare it for freeing, then release its reference to
the page to free it.

4. If migration fails, the subsystem's putback function should clear
PG_isolated via ClearPageIsolated.
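
A minimal sketch of what a hypothetical driver would provide; the
mydrv_* names, mydrv_page_busy() and mydrv_migratepage() are made up
for illustration only:

static bool mydrv_isolate_page(struct page *page, isolate_mode_t mode)
{
        /* the caller already holds PG_locked on @page */
        if (mydrv_page_busy(page))
                return false;
        SetPageIsolated(page);
        return true;
}

static void mydrv_putback_page(struct page *page)
{
        /* migration failed: make the page usable by the driver again */
        ClearPageIsolated(page);
}

static const struct address_space_operations mydrv_aops = {
        .isolate_page   = mydrv_isolate_page,
        .migratepage    = mydrv_migratepage,
        .putback_page   = mydrv_putback_page,
};

static void mydrv_mark_movable(struct page *page, struct inode *inode)
{
        /* inode->i_mapping->a_ops must point at mydrv_aops */
        page->mapping = inode->i_mapping;
        __SetPageMovable(page);
}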

Cc: Vlastimil Babka 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: dri-de...@lists.freedesktop.org
Cc: virtualizat...@lists.linux-foundation.org
Signed-off-by: Gioh Kim 
Signed-off-by: Minchan Kim 
---
 Documentation/filesystems/Locking  |   4 +
 Documentation/filesystems/vfs.txt  |   5 ++
 fs/proc/page.c |   3 +
 include/linux/fs.h |   2 +
 include/linux/migrate.h|   2 +
 include/linux/page-flags.h |  29 
 include/uapi/linux/kernel-page-flags.h |   1 +
 mm/compaction.c|  14 +++-
 mm/migrate.c   | 132 +
 9 files changed, 177 insertions(+), 15 deletions(-)

diff --git a/Documentation/filesystems/Locking 
b/Documentation/filesystems/Locking
index 619af9bfdcb3..0bb79560abb3 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -195,7 +195,9 @@ unlocks and drops the reference.
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
int (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
+   bool (*isolate_page) (struct page *, isolate_mode_t);
int (*migratepage)(struct address_space *, struct page *, struct page 
*);
+   void (*putback_page) (struct page *);
int (*launder_page)(struct page *);
int (*is_partially_uptodate)(struct page *, unsigned long, unsigned 
long);
int (*error_remove_page)(struct address_space *, struct page *);
@@ -219,7 +221,9 @@ invalidatepage: yes
 releasepage:   yes
 freepage:  yes
 direct_IO:
+isolate_page:  yes
 migratepage:   yes (both)
+putback_page:  yes
 launder_page:  yes
 is_partially_uptodate: yes
 error_remove_page: yes
diff --git a/Documentation/filesystems/vfs.txt 
b/Documentation/filesystems/vfs.txt
index b02a7d598258..4c1b6c3b4bc8 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -592,9 +592,14 @@ struct address_space_operations {
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t 
offset);
+   /* isolate a page for migration */
+   bool (*isolate_page) (struct page *, isolate_mode_t);
/* migrate the contents of a page to the specified target */
int (*migratepage) (struct page *, struct page *);
+   /* put the page back to right list */
+   void (*putback_page) (struct page *);
int (*launder_page) (struct page *);
+
int (*is_partially_uptodate) (struct page *, unsigned long,
unsigned long);
void (*is_dirty_writeback) (struct page *, bool *, bool *);
diff --git a/fs/proc/page.c b/fs/proc/page.c
index 712f1b9992cc..e2066e73a9b8 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -157,6 +157,9 @@ u64 stable_page_flags(struct page *page)
if (page_is_idle(page))
u |= 1 << KPF_IDLE;
 
+   if (PageMovable(page))
+   u |= 1 << KPF_MOVABLE;
+
u |= kpf_copy_bit(k, KPF_LOCKED,PG_locked);
 
u |= kpf_copy_bit(k, KPF_SLAB,  PG_slab);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 14a97194b34b..b7ef2e41fa4a 100644
--- a/include/linux/fs.h
+

[PATCH v2 02/18] zsmalloc: use first_page rather than page

2016-03-20 Thread Minchan Kim
This patch cleans up the "struct page" function parameters.
Many zsmalloc functions expect the page parameter to be "first_page",
so use "first_page" rather than "page" for code readability.

Reviewed-by: Sergey Senozhatsky 
Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 62 ++-
 1 file changed, 32 insertions(+), 30 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index e72efb109fde..b09a80d398c9 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -413,26 +413,28 @@ static int is_last_page(struct page *page)
return PagePrivate2(page);
 }
 
-static void get_zspage_mapping(struct page *page, unsigned int *class_idx,
+static void get_zspage_mapping(struct page *first_page,
+   unsigned int *class_idx,
enum fullness_group *fullness)
 {
unsigned long m;
-   BUG_ON(!is_first_page(page));
+   BUG_ON(!is_first_page(first_page));
 
-   m = (unsigned long)page->mapping;
+   m = (unsigned long)first_page->mapping;
*fullness = m & FULLNESS_MASK;
*class_idx = (m >> FULLNESS_BITS) & CLASS_IDX_MASK;
 }
 
-static void set_zspage_mapping(struct page *page, unsigned int class_idx,
+static void set_zspage_mapping(struct page *first_page,
+   unsigned int class_idx,
enum fullness_group fullness)
 {
unsigned long m;
-   BUG_ON(!is_first_page(page));
+   BUG_ON(!is_first_page(first_page));
 
m = ((class_idx & CLASS_IDX_MASK) << FULLNESS_BITS) |
(fullness & FULLNESS_MASK);
-   page->mapping = (struct address_space *)m;
+   first_page->mapping = (struct address_space *)m;
 }
 
 /*
@@ -625,14 +627,14 @@ static inline void zs_pool_stat_destroy(struct zs_pool 
*pool)
  * the pool (not yet implemented). This function returns fullness
  * status of the given page.
  */
-static enum fullness_group get_fullness_group(struct page *page)
+static enum fullness_group get_fullness_group(struct page *first_page)
 {
int inuse, max_objects;
enum fullness_group fg;
-   BUG_ON(!is_first_page(page));
+   BUG_ON(!is_first_page(first_page));
 
-   inuse = page->inuse;
-   max_objects = page->objects;
+   inuse = first_page->inuse;
+   max_objects = first_page->objects;
 
if (inuse == 0)
fg = ZS_EMPTY;
@@ -652,12 +654,12 @@ static enum fullness_group get_fullness_group(struct page 
*page)
  * have. This functions inserts the given zspage into the freelist
 * identified by <class, fullness_group>.
  */
-static void insert_zspage(struct page *page, struct size_class *class,
+static void insert_zspage(struct page *first_page, struct size_class *class,
enum fullness_group fullness)
 {
struct page **head;
 
-   BUG_ON(!is_first_page(page));
+   BUG_ON(!is_first_page(first_page));
 
if (fullness >= _ZS_NR_FULLNESS_GROUPS)
return;
@@ -667,7 +669,7 @@ static void insert_zspage(struct page *page, struct 
size_class *class,
 
head = &class->fullness_list[fullness];
if (!*head) {
-   *head = page;
+   *head = first_page;
return;
}
 
@@ -675,21 +677,21 @@ static void insert_zspage(struct page *page, struct 
size_class *class,
 * We want to see more ZS_FULL pages and less almost
 * empty/full. Put pages with higher ->inuse first.
 */
-   list_add_tail(&page->lru, &(*head)->lru);
-   if (page->inuse >= (*head)->inuse)
-   *head = page;
+   list_add_tail(&first_page->lru, &(*head)->lru);
+   if (first_page->inuse >= (*head)->inuse)
+   *head = first_page;
 }
 
 /*
  * This function removes the given zspage from the freelist identified
 * by <class, fullness_group>.
  */
-static void remove_zspage(struct page *page, struct size_class *class,
+static void remove_zspage(struct page *first_page, struct size_class *class,
enum fullness_group fullness)
 {
struct page **head;
 
-   BUG_ON(!is_first_page(page));
+   BUG_ON(!is_first_page(first_page));
 
if (fullness >= _ZS_NR_FULLNESS_GROUPS)
return;
@@ -698,11 +700,11 @@ static void remove_zspage(struct page *page, struct 
size_class *class,
BUG_ON(!*head);
if (list_empty(&(*head)->lru))
*head = NULL;
-   else if (*head == page)
+   else if (*head == first_page)
*head = (struct page *)list_entry((*head)->lru.next,
struct page, lru);
 
-   list_del_init(&page->lru);
+   list_del_init(&first_page->lru);
zs_stat_dec(class, fullness == ZS_ALMOST_EMPTY ?
CLASS_ALMOST_EMPTY : CLASS_ALMOST_FULL, 1);
 }
@@ -717,21 +719,21 @@ static void remove_zspage(struct page *page, struct 
size_class *class,
  * fullness group.
  */
 static enum 

[PATCH v2 00/18] Support non-lru page migration

2016-03-20 Thread Minchan Kim
Recently, I got many reports about performance degradation in
embedded systems (Android mobile phones, webOS TVs and so on) and
about fork failing easily.

The problem was fragmentation caused by zram and GPU driver pages.
Those pages cannot be migrated, so compaction cannot work well
either, and the reclaimer ends up shrinking all of the working set.
That made the system very slow and even made fork fail easily.

The other pain point is that such pages cannot work with CMA.
Most of the CMA memory space could be idle (i.e., usable for movable
pages while no driver is using it), but if a driver (e.g., zram)
cannot migrate its pages, that memory space is wasted. In our
product, which has a big CMA area, zones are reclaimed too
excessively even though there is lots of free space in CMA, so the
system easily became very slow.

To solve these problems, this patchset adds a facility to migrate
non-lru pages by introducing new companion functions for migratepage
in address_space_operations plus new page flags.

(isolate_page, putback_page)
(PG_movable, PG_isolated)

For details, please read the description in
"mm/compaction: support non-lru movable page migration".

Originally, Gioh Kim tried to support this feature, but he moved on,
so I took over the work. I took a lot of code from his work and
changed it a little bit.
Thanks, Gioh!

And I should mention Konstantin Khlebnikov. He helped Gioh a lot at
that time, so he deserves a lot of credit, too.
Thanks, Konstantin!

This patchset consists of five parts

1. clean up migration
  mm: use put_page to free page instead of putback_lru_page

2. zsmalloc clean-up for preparing page migration
  zsmalloc: use first_page rather than page
  zsmalloc: clean up many BUG_ON
  zsmalloc: reordering function parameter
  zsmalloc: remove unused pool param in obj_free
  zsmalloc: keep max_object in size_class
  zsmalloc: squeeze inuse into page->mapping
  zsmalloc: squeeze freelist into page->mapping
  zsmalloc: move struct zs_meta from mapping to freelist
  zsmalloc: factor page chain functionality out
  zsmalloc: separate free_zspage from putback_zspage
  zsmalloc: zs_compact refactoring

3. add non-lru page migration feature
  mm/compaction: support non-lru movable page migration

4. rework KVM memory-ballooning
  mm/balloon: use general movable page feature into balloon

5. add zsmalloc page migration
  zsmalloc: migrate head page of zspage
  zsmalloc: use single linked list for page chain
  zsmalloc: migrate tail pages in zspage
  zram: use __GFP_MOVABLE for memory allocation

* From v1
  * rebase on v4.5-mmotm-2016-03-17-15-04
  * reordering patches to merge clean-up patches first
  * add Acked-by/Reviewed-by from Vlastimil and Sergey
  * use each own mount model instead of reusing anon_inode_fs - Al Viro
  * small changes - YiPing, Gioh

Minchan Kim (18):
  mm: use put_page to free page instead of putback_lru_page
  zsmalloc: use first_page rather than page
  zsmalloc: clean up many BUG_ON
  zsmalloc: reordering function parameter
  zsmalloc: remove unused pool param in obj_free
  zsmalloc: keep max_object in size_class
  zsmalloc: squeeze inuse into page->mapping
  zsmalloc: squeeze freelist into page->mapping
  zsmalloc: move struct zs_meta from mapping to freelist
  zsmalloc: factor page chain functionality out
  zsmalloc: separate free_zspage from putback_zspage
  zsmalloc: zs_compact refactoring
  mm/compaction: support non-lru movable page migration
  mm/balloon: use general movable page feature into balloon
  zsmalloc: migrate head page of zspage
  zsmalloc: use single linked list for page chain
  zsmalloc: migrate tail pages in zspage
  zram: use __GFP_MOVABLE for memory allocation

 Documentation/filesystems/Locking  |4 +
 Documentation/filesystems/vfs.txt  |5 +
 drivers/block/zram/zram_drv.c  |3 +-
 drivers/virtio/virtio_balloon.c|   45 +-
 fs/proc/page.c |3 +
 include/linux/balloon_compaction.h |   47 +-
 include/linux/fs.h |2 +
 include/linux/migrate.h|2 +
 include/linux/page-flags.h |   41 +-
 include/uapi/linux/kernel-page-flags.h |1 +
 include/uapi/linux/magic.h |2 +
 mm/balloon_compaction.c|  101 +--
 mm/compaction.c|   15 +-
 mm/migrate.c   |  198 +++--
 mm/vmscan.c|2 +-
 mm/zsmalloc.c  | 1338 +++-
 16 files changed, 1284 insertions(+), 525 deletions(-)

-- 
1.9.1



[PATCH v2 11/18] zsmalloc: separate free_zspage from putback_zspage

2016-03-20 Thread Minchan Kim
Currently, putback_zspage frees the zspage under class->lock if its
fullness becomes ZS_EMPTY, but that makes it hard to implement the
locking scheme for the new zspage migration.
So, this patch separates free_zspage from putback_zspage and frees
the zspage outside class->lock, as preparation for zspage migration.

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 46 +++---
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 833da8f4ffc9..9c0ab1e92e9b 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -950,7 +950,8 @@ static void reset_page(struct page *page)
page_mapcount_reset(page);
 }
 
-static void free_zspage(struct page *first_page)
+static void free_zspage(struct zs_pool *pool, struct size_class *class,
+   struct page *first_page)
 {
struct page *nextp, *tmp, *head_extra;
 
@@ -973,6 +974,11 @@ static void free_zspage(struct page *first_page)
}
reset_page(head_extra);
__free_page(head_extra);
+
+   zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
+   class->size, class->pages_per_zspage));
+   atomic_long_sub(class->pages_per_zspage,
+   &pool->pages_allocated);
 }
 
 /* Initialize a newly allocated zspage */
@@ -1560,13 +1566,8 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
spin_lock(&class->lock);
obj_free(class, obj);
fullness = fix_fullness_group(class, first_page);
-   if (fullness == ZS_EMPTY) {
-   zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
-   class->size, class->pages_per_zspage));
-   atomic_long_sub(class->pages_per_zspage,
-   &pool->pages_allocated);
-   free_zspage(first_page);
-   }
+   if (fullness == ZS_EMPTY)
+   free_zspage(pool, class, first_page);
spin_unlock(&class->lock);
unpin_tag(handle);
 
@@ -1753,7 +1754,7 @@ static struct page *isolate_target_page(struct size_class 
*class)
  * @class: destination class
  * @first_page: target page
  *
- * Return @fist_page's fullness_group
+ * Return @first_page's updated fullness_group
  */
 static enum fullness_group putback_zspage(struct zs_pool *pool,
struct size_class *class,
@@ -1765,15 +1766,6 @@ static enum fullness_group putback_zspage(struct zs_pool 
*pool,
insert_zspage(class, fullness, first_page);
set_zspage_mapping(first_page, class->index, fullness);
 
-   if (fullness == ZS_EMPTY) {
-   zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
-   class->size, class->pages_per_zspage));
-   atomic_long_sub(class->pages_per_zspage,
-   &pool->pages_allocated);
-
-   free_zspage(first_page);
-   }
-
return fullness;
 }
 
@@ -1836,23 +1828,31 @@ static void __zs_compact(struct zs_pool *pool, struct 
size_class *class)
if (!migrate_zspage(pool, class, &cc))
break;
 
-   putback_zspage(pool, class, dst_page);
+   VM_BUG_ON_PAGE(putback_zspage(pool, class,
+   dst_page) == ZS_EMPTY, dst_page);
}
 
/* Stop if we couldn't find slot */
if (dst_page == NULL)
break;
 
-   putback_zspage(pool, class, dst_page);
-   if (putback_zspage(pool, class, src_page) == ZS_EMPTY)
+   VM_BUG_ON_PAGE(putback_zspage(pool, class,
+   dst_page) == ZS_EMPTY, dst_page);
+   if (putback_zspage(pool, class, src_page) == ZS_EMPTY) {
pool->stats.pages_compacted += class->pages_per_zspage;
-   spin_unlock(&class->lock);
+   spin_unlock(&class->lock);
+   free_zspage(pool, class, src_page);
+   } else {
+   spin_unlock(&class->lock);
+   }
+
cond_resched();
spin_lock(&class->lock);
}
 
if (src_page)
-   putback_zspage(pool, class, src_page);
+   VM_BUG_ON_PAGE(putback_zspage(pool, class,
+   src_page) == ZS_EMPTY, src_page);
 
spin_unlock(&class->lock);
 }
-- 
1.9.1



[PATCH v2 03/18] zsmalloc: clean up many BUG_ON

2016-03-20 Thread Minchan Kim
There are many BUG_ONs in zsmalloc.c, which is not recommended, so
change them to the preferred alternatives (a small before/after
example follows the rules).

The normal rules are as follows:

1. Avoid BUG_ON if possible. Instead, use VM_BUG_ON or VM_BUG_ON_PAGE.
2. Use VM_BUG_ON_PAGE if we need to see struct page's fields.
3. Use those assertions in primitive functions so that higher-level
functions can rely on the assertion in the primitive function.
4. Don't use an assertion if the following instruction would trigger
an Oops anyway.
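
For instance, a typical conversion following rules 1 and 2 (the diff
below does this repeatedly):

-       BUG_ON(!is_first_page(first_page));
+       VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);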

Reviewed-by: Sergey Senozhatsky 
Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 42 +++---
 1 file changed, 15 insertions(+), 27 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index b09a80d398c9..6a7b9313ee8c 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -418,7 +418,7 @@ static void get_zspage_mapping(struct page *first_page,
enum fullness_group *fullness)
 {
unsigned long m;
-   BUG_ON(!is_first_page(first_page));
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
m = (unsigned long)first_page->mapping;
*fullness = m & FULLNESS_MASK;
@@ -430,7 +430,7 @@ static void set_zspage_mapping(struct page *first_page,
enum fullness_group fullness)
 {
unsigned long m;
-   BUG_ON(!is_first_page(first_page));
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
m = ((class_idx & CLASS_IDX_MASK) << FULLNESS_BITS) |
(fullness & FULLNESS_MASK);
@@ -631,7 +631,8 @@ static enum fullness_group get_fullness_group(struct page 
*first_page)
 {
int inuse, max_objects;
enum fullness_group fg;
-   BUG_ON(!is_first_page(first_page));
+
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
inuse = first_page->inuse;
max_objects = first_page->objects;
@@ -659,7 +660,7 @@ static void insert_zspage(struct page *first_page, struct 
size_class *class,
 {
struct page **head;
 
-   BUG_ON(!is_first_page(first_page));
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
if (fullness >= _ZS_NR_FULLNESS_GROUPS)
return;
@@ -691,13 +692,13 @@ static void remove_zspage(struct page *first_page, struct 
size_class *class,
 {
struct page **head;
 
-   BUG_ON(!is_first_page(first_page));
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
if (fullness >= _ZS_NR_FULLNESS_GROUPS)
return;
 
head = &class->fullness_list[fullness];
-   BUG_ON(!*head);
+   VM_BUG_ON_PAGE(!*head, first_page);
if (list_empty(&(*head)->lru))
*head = NULL;
else if (*head == first_page)
@@ -724,8 +725,6 @@ static enum fullness_group fix_fullness_group(struct 
size_class *class,
int class_idx;
enum fullness_group currfg, newfg;
 
-   BUG_ON(!is_first_page(first_page));
-
get_zspage_mapping(first_page, &class_idx, &currfg);
newfg = get_fullness_group(first_page);
if (newfg == currfg)
@@ -811,7 +810,7 @@ static void *location_to_obj(struct page *page, unsigned 
long obj_idx)
unsigned long obj;
 
if (!page) {
-   BUG_ON(obj_idx);
+   VM_BUG_ON(obj_idx);
return NULL;
}
 
@@ -844,7 +843,7 @@ static unsigned long obj_to_head(struct size_class *class, 
struct page *page,
void *obj)
 {
if (class->huge) {
-   VM_BUG_ON(!is_first_page(page));
+   VM_BUG_ON_PAGE(!is_first_page(page), page);
return page_private(page);
} else
return *(unsigned long *)obj;
@@ -894,8 +893,8 @@ static void free_zspage(struct page *first_page)
 {
struct page *nextp, *tmp, *head_extra;
 
-   BUG_ON(!is_first_page(first_page));
-   BUG_ON(first_page->inuse);
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+   VM_BUG_ON_PAGE(first_page->inuse, first_page);
 
head_extra = (struct page *)page_private(first_page);
 
@@ -921,7 +920,8 @@ static void init_zspage(struct page *first_page, struct 
size_class *class)
unsigned long off = 0;
struct page *page = first_page;
 
-   BUG_ON(!is_first_page(first_page));
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
while (page) {
struct page *next_page;
struct link_free *link;
@@ -1238,7 +1238,7 @@ static bool can_merge(struct size_class *prev, int size, 
int pages_per_zspage)
 
 static bool zspage_full(struct page *first_page)
 {
-   BUG_ON(!is_first_page(first_page));
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
return first_page->inuse == first_page->objects;
 }
@@ -1276,14 +1276,12 @@ void *zs_map_object(struct zs_pool *pool, unsigned long 
handle,
struct page *pages[2];
void *ret;
 
-   BUG_ON(!handle);
-
/*
 * Because we use per-cpu mapping areas shared a

[PATCH v2 01/18] mm: use put_page to free page instead of putback_lru_page

2016-03-20 Thread Minchan Kim
The procedure for page migration is as follows:

First of all, it should isolate a page from the LRU and try to
migrate the page. If that is successful, it releases the page for
freeing. Otherwise, it should put the page back on the LRU list.

For LRU pages, we have used putback_lru_page for both freeing and
putting back to the LRU list. That works because put_page is aware of
the LRU list, so when it releases the last refcount of the page it
removes the page from the LRU list. However, it performs unnecessary
operations (e.g., lru_cache_add, pagevec and flag operations; not
significant, but not worth doing either) and makes it harder to
support the new non-lru page migration, because put_page isn't aware
of a non-lru page's data structure.

To solve the problem, we could add a new hook in put_page with a
PageMovable flag check, but that would add overhead in a hot path and
need a new locking scheme to stabilize the flag check against
put_page.

So, this patch cleans things up by separating the two semantics
(i.e., put and putback). If migration is successful, use put_page
instead of putback_lru_page, and use putback_lru_page only on
failure. That makes the code more readable and adds no overhead to
put_page.

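The resulting call pattern, simplified for illustration (the full
diff below also handles the hwpoison and newpage cases):

if (rc == MIGRATEPAGE_SUCCESS) {
        /* drop the reference grabbed during isolation */
        put_page(page);
} else if (rc != -EAGAIN) {
        /* restore the page to the LRU list */
        putback_lru_page(page);
}
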
Comment from Vlastimil
"Yeah, and compaction (perhaps also other migration users) has to drain
the lru pvec... Getting rid of this stuff is worth even by itself."

Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: Naoya Horiguchi 
Acked-by: Vlastimil Babka 
Signed-off-by: Minchan Kim 
---
 mm/migrate.c | 50 +++---
 1 file changed, 31 insertions(+), 19 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 6c822a7b27e0..b65c84267ce0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -913,6 +913,14 @@ static int __unmap_and_move(struct page *page, struct page 
*newpage,
put_anon_vma(anon_vma);
unlock_page(page);
 out:
+   /* If migration is scucessful, move newpage to right list */
+   if (rc == MIGRATEPAGE_SUCCESS) {
+   if (unlikely(__is_movable_balloon_page(newpage)))
+   put_page(newpage);
+   else
+   putback_lru_page(newpage);
+   }
+
return rc;
 }
 
@@ -946,6 +954,12 @@ static ICE_noinline int unmap_and_move(new_page_t 
get_new_page,
 
if (page_count(page) == 1) {
/* page was freed from under us. So we are done. */
+   ClearPageActive(page);
+   ClearPageUnevictable(page);
+   if (put_new_page)
+   put_new_page(newpage, private);
+   else
+   put_page(newpage);
goto out;
}
 
@@ -958,10 +972,8 @@ static ICE_noinline int unmap_and_move(new_page_t 
get_new_page,
}
 
rc = __unmap_and_move(page, newpage, force, mode);
-   if (rc == MIGRATEPAGE_SUCCESS) {
-   put_new_page = NULL;
+   if (rc == MIGRATEPAGE_SUCCESS)
set_page_owner_migrate_reason(newpage, reason);
-   }
 
 out:
if (rc != -EAGAIN) {
@@ -974,28 +986,28 @@ static ICE_noinline int unmap_and_move(new_page_t 
get_new_page,
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
-   /* Soft-offlined page shouldn't go through lru cache list */
+   }
+
+   /*
+* If migration is successful, drop the reference grabbed during
+* isolation. Otherwise, restore the page to LRU list unless we
+* want to retry.
+*/
+   if (rc == MIGRATEPAGE_SUCCESS) {
+   put_page(page);
if (reason == MR_MEMORY_FAILURE) {
-   put_page(page);
if (!test_set_page_hwpoison(page))
num_poisoned_pages_inc();
-   } else
+   }
+   } else {
+   if (rc != -EAGAIN)
putback_lru_page(page);
+   if (put_new_page)
+   put_new_page(newpage, private);
+   else
+   put_page(newpage);
}
 
-   /*
-* If migration was not successful and there's a freeing callback, use
-* it.  Otherwise, putback_lru_page() will drop the reference grabbed
-* during isolation.
-*/
-   if (put_new_page)
-   put_new_page(newpage, private);
-   else if (unlikely(__is_movable_balloon_page(newpage))) {
-   /* drop our reference, page already in the balloon */
-   put_page(newpage);
-   } else
-   putback_lru_page(newpage);
-
if (result) {
if (rc)
*result = rc;
-- 
1.9.1



Re: [PATCH v2 3/3] Make core_pattern support namespace

2016-03-20 Thread Eric W. Biederman
Zhao Lei  writes:

> Currently, each container shared one copy of coredump setting
> with the host system, if host system changed the setting, each
> running containers will be affected.
>
> Moreover, it is not easy to let each container keeping their own
> coredump setting.
>
> We can use some workaround as pipe program to make the second
> requirement possible, but it is not simple, and both host and
> container are limited to set to fixed pipe program.
> In one word, for a host running containers, we can't change core_pattern
> anymore.
> To make the problem more hard, if a host running more than one
> container product, each product will try to snatch the global
> coredump setting to fit their own requirement.
>
> For container based on namespace design, it is good to allow
> each container keeping their own coredump setting.
>
> It will bring us following benefit:
> 1: Each container can change their own coredump setting
>based on operation on /proc/sys/kernel/core_pattern
> 2: Coredump setting changed in host will not affect
>running containers.
> 3: Support both cases of "putting coredump in guest" and
>"putting coredump in host".
>
> Each namespace-based software(lxc, docker, ..) can use this function
> to custom their dump setting.
>
> And this function makes each container work as a separate system,
> which fits the design goal of namespaces.

There are a lot of questionable things with this patchset.

> @@ -183,7 +182,7 @@ put_exe_file:
>  static int format_corename(struct core_name *cn, struct coredump_params 
> *cprm)
>  {
>   const struct cred *cred = current_cred();
> - const char *pat_ptr = core_pattern;
> + const char *pat_ptr = 
> current->nsproxy->pid_ns_for_children->core_pattern;

current->nsproxy->pid_ns_for_children as the name implies is completely
inappropriate for getting the pid namespace of the current task.

This should use task_active_pid_namespace.

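A minimal sketch of that suggestion (illustrative only, using the
core_pattern field this patch adds to struct pid_namespace):

        const char *pat_ptr = task_active_pid_ns(current)->core_pattern;
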
>   int ispipe = (*pat_ptr == '|');
>   int pid_in_pattern = 0;
>   int err = 0;
> diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
> index 918b117..a5af1e9 100644
> --- a/include/linux/pid_namespace.h
> +++ b/include/linux/pid_namespace.h
> @@ -9,6 +9,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  struct pidmap {
> atomic_t nr_free;
> @@ -45,6 +46,7 @@ struct pid_namespace {
>   int hide_pid;
>   int reboot; /* group exit code if this pidns was rebooted */
>   struct ns_common ns;
> + char core_pattern[CORENAME_MAX_SIZE];
>  };
>  
>  extern struct pid_namespace init_pid_ns;
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 4d73a83..c79c1d5 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -83,6 +83,7 @@ struct pid_namespace init_pid_ns = {
>  #ifdef CONFIG_PID_NS
>   .ns.ops = &pidns_operations,
>  #endif
> + .core_pattern = "core",
>  };
>  EXPORT_SYMBOL_GPL(init_pid_ns);
>  
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index a65ba13..16d6d21 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -123,6 +123,9 @@ static struct pid_namespace *create_pid_namespace(struct 
> user_namespace *user_ns
>   for (i = 1; i < PIDMAP_ENTRIES; i++)
>   atomic_set(&ns->pidmap[i].nr_free, BITS_PER_PAGE);
>  
> + strncpy(ns->core_pattern, parent_pid_ns->core_pattern,
> + sizeof(ns->core_pattern));
> +

This is pretty horrible.  You are giving unprivileged processes the
ability to run an already specified core dump helper in a pid namespace
of their choosing.

That is not backwards compatible, and it is possible this can lead to
privilege escalation by tricking a privileged dump process into doing
something silly because it is running in the wrong pid namespace.

Similarly the entire concept of forking from the program dumping core
suffers from the same problem but for all other namespaces.

I was hoping to see a justification somewhere in the patch
descriptions explaining why this set of decisions could be safe.  I
do not see one, so I assume this case was not considered.

If you had managed to fork from the child_reaper of the pid_namespace
that set the core pattern (as has been suggested) there would be some
chance that things would work correctly.  As you are forking from the
program actually dumping core, I see no chance that this patchset is
either safe or backwards compatible as currently written.

Eric


[PATCH] firewire: nosy: Replace timeval with timespec64

2016-03-20 Thread Tina Ruchandani
'struct timeval' uses a 32-bit field for its 'seconds' value, which
will overflow in the year 2038 and beyond. This patch replaces the
use of timeval in nosy.c with timespec64, which doesn't suffer from
the y2038 issue. The code is correct as is, since it only uses the
microseconds portion of timeval. However, this patch does the
replacement as part of a larger effort to remove all instances of
'struct timeval' from the kernel (that will help identify cases
where the code actually is broken).

Signed-off-by: Tina Ruchandani 
---
 drivers/firewire/nosy.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/firewire/nosy.c b/drivers/firewire/nosy.c
index 8a46077..631c977 100644
--- a/drivers/firewire/nosy.c
+++ b/drivers/firewire/nosy.c
@@ -446,14 +446,16 @@ static void
 bus_reset_irq_handler(struct pcilynx *lynx)
 {
struct client *client;
-   struct timeval tv;
+   struct timespec64 ts64;
+   u32 timestamp;

-   do_gettimeofday(&tv);
+   ktime_get_real_ts64(&ts64);
+   timestamp = ts64.tv_nsec / NSEC_PER_USEC;

spin_lock(&lynx->client_list_lock);

list_for_each_entry(client, &lynx->client_list, link)
-   packet_buffer_put(&client->buffer, &tv.tv_usec, 4);
+   packet_buffer_put(&client->buffer, &timestamp, 4);

spin_unlock(&lynx->client_list_lock);
 }
--
2.8.0.rc3.226.g39d4020



Re: [PATCH 1/4] vfs: add file_dentry()

2016-03-20 Thread Al Viro
On Thu, Mar 17, 2016 at 10:02:00AM +0100, Miklos Szeredi wrote:
> Add a new helper, file_dentry() [*], to get the filesystem's own dentry
> from the file.  This simply compares file_inode(file->f_path.dentry) to
> file_inode(file) and if they are equal returns file->f_path.dentry (this is
> the common, non-overlayfs case).
> 
> In the uncommon case (regular file on overlayfs) it will call into
> overlayfs's ->d_native_dentry() to get the underlying dentry matching
> file_inode(file).

What's wrong with making ovl_dentry_real() an instance of optional
->d_real() method and having a flag (DCACHE_OP_REAL) controlling its
calls?  With d_real(dentry) returning either that or dentry itself,
and file_dentry(file) being simply d_real(file->f_path.dentry)...

Why do we need to look at the inode at all?  d_set_d_op() dereferences
->d_op anyway, as well as setting ->d_flags, so there's no extra cost
there, and "test bit in ->d_flags + branch not taken" is all it would
cost in normal case...


Re: [PATCH 1/4] vfs: add file_dentry()

2016-03-20 Thread Al Viro
On Mon, Mar 21, 2016 at 01:02:15AM -0400, Theodore Ts'o wrote:

> I have this patch in the ext4.git tree, but I'd like to get an
> Acked-by from Al before I send a pull request to Linus.
> 
> Al?  Any objections to my sending in this change via the ext4 tree?
>   - Ted

FWIW, I would rather add DCACHE_OP_REAL (set at d_set_d_op()
time) and turned that into

static inline struct dentry *d_real(const struct dentry *dentry)
{
        if (unlikely(dentry->d_flags & DCACHE_OP_REAL))
                return dentry->d_op->d_real(dentry);
        else
                return dentry;
}

static inline struct dentry *file_dentry(const struct file *file)
{
        return d_real(file->f_path.dentry);
}

and used ovl_dentry_real as ->d_real for overlayfs.  Miklos, do you
see any problems with that variant?
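
For completeness, the d_set_d_op() side could look roughly like this
(a sketch under the naming above, not a tested change):

        if (op->d_real)
                dentry->d_flags |= DCACHE_OP_REAL;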


Re: [PATCH v11 3/9] arm64: add copy_to/from_user to kprobes blacklist

2016-03-20 Thread Pratyush Anand
Hi James,

On 18/03/2016:06:12:20 PM, James Morse wrote:
> Hi Pratyush,
> 
> On 18/03/16 14:43, Pratyush Anand wrote:
> > On 18/03/2016:02:02:49 PM, James Morse wrote:
> >> In kernel/entry.S when entered from EL0 we test for TIF_SINGLESTEP in the
> >> thread_info flags, and use disable_step_tsk/enable_step_tsk to 
> >> save/restore the
> >> single-step state.
> >>
> >> Could we do this regardless of which EL we came from?
> > 
> > Thanks for another idea. I think, we can not do this as it is, because
> > TIF_SINGLESTEP will not be set for kprobe events.
> 
> Hmmm, I see kernel_enable_single_step() doesn't set it, but setup_singlestep()
> in patch 5 could...
> 
> There is probably a good reason its never set for a kernel thread, I will 
> have a
> look at where else it is used.
> 
> 
> > But, we can introduce a
> > variant disable_step_kernel and enable_step_kernel, which can be called in
> > el1_da. 
> 
> What about sp/pc misalignment, or undefined instructions?
> Or worse... an irq occurs during your el1_da call (el1_da may re-enable irqs).
> el1_irq doesn't know you were careful not to unmask debug exceptions, it 
> blindly
> turns them back on.
> 
> The problem is the 'single step me' bit is still set, save/restoring it will
> save us having to consider every interaction, (and then missing some!).
> 
> It would also mean you don't have to disable interrupts while single stepping 
> in
> patch 5 (comment above kprobes_save_local_irqflag()).

I see.
kernel_enable_single_step() is called from the watchpoint and kgdb
handlers. It seems to me that a similar issue may arise there as
well, so it would be a good idea to set TIF_SINGLESTEP in
kernel_enable_single_step() and clear it in
kernel_disable_single_step().

Meanwhile, I prepared a test case to reproduce the issue without this
patch. I instrumented a kprobe at an instruction of __copy_to_user()
that stores to user space memory. I can see a sea of "Unexpected
kernel single-step exception at EL1" messages within a few seconds,
while with patch [1] applied I do not see any such messages.

Maybe I can send [1] as an RFC and seek feedback.

~Pratyush

[1] 
https://github.com/pratyushanand/linux/commit/7623c8099ac22eaa00e7e0f52430f7a4bd154652


[GIT PULL] MD for 4.6

2016-03-20 Thread Shaohua Li
Hi Linus,

Could you please pull the MD update for 4.6? This update mainly fixes bugs.
- A raid5 discard related fix from Jes
- A MD multipath bio clone fix from Ming
- raid1 error handling deadlock fix from Nate and corresponding raid10 fix from
  myself
- A raid5 stripe batch fix from Neil
- A patch from Sebastian to avoid unnecessary uevent
- Several cleanup/debug patches

Thanks,
Shaohua

The following changes since commit 6dc390ad61ac8dfca5fa9b0823981fb6f7ec17a0:

  Merge tag 'arc-4.5-rc6-fixes-upd' of 
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc (2016-02-24 14:06:17 
-0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/shli/md.git tags/md/4.6-rc1

for you to fetch changes up to 1d034e68e2c256640eb1f44bd7dcd89f90806ccf:

  md/raid5: Cleanup cpu hotplug notifier (2016-03-17 14:30:15 -0700)


Anna-Maria Gleixner (1):
  md/raid5: Cleanup cpu hotplug notifier

Eric Engestrom (1):
  md/bitmap: remove redundant check

Guoqing Jiang (3):
  md/raid1: remove unnecessary BUG_ON
  md/bitmap: remove redundant return in bitmap_checkpage
  md: fix typos for stipe

Jes Sorensen (1):
  md/raid5: Compare apples to apples (or sectors to sectors)

Ming Lei (1):
  md: multipath: don't hardcopy bio in .make_request path

Nate Dailey (1):
  raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang

NeilBrown (1):
  md/raid5: preserve STRIPE_PREREAD_ACTIVE in break_stripe_batch_list

Sebastian Parschauer (1):
  md: Drop sending a change uevent when stopping

Shaohua Li (6):
  RAID5: check_reshape() shouldn't call mddev_suspend
  RAID5: revert e9e4c377e2f563 to fix a livelock
  MD: warn for potential deadlock
  Update MD git tree URL
  md/raid5: output stripe state for debug
  raid10: include bio_end_io_list in nr_queued to prevent freeze_array hang

 MAINTAINERS|  2 +-
 drivers/md/bitmap.c|  4 +---
 drivers/md/bitmap.h|  4 ++--
 drivers/md/md.c|  2 +-
 drivers/md/multipath.c |  4 +++-
 drivers/md/raid1.c |  8 ---
 drivers/md/raid10.c|  7 --
 drivers/md/raid5.c | 63 +-
 drivers/md/raid5.h |  4 +++-
 9 files changed, 58 insertions(+), 40 deletions(-)


Re: [PATCH 1/4] vfs: add file_dentry()

2016-03-20 Thread Theodore Ts'o
On Thu, Mar 17, 2016 at 10:02:00AM +0100, Miklos Szeredi wrote:
> From: Miklos Szeredi 
> 
> This series fixes bugs in nfs and ext4 due to 4bacc9c9234c ("overlayfs: Make
> f_path always point to the overlay and f_inode to the underlay").
> 
> Regular files opened on overlayfs will result in the file being opened on
> the underlying filesystem, while f_path points to the overlayfs
> mount/dentry.
> 
> This confuses filesystems which get the dentry from struct file and assume
> it's theirs.
> 
> Add a new helper, file_dentry() [*], to get the filesystem's own dentry
> from the file.  This simply compares file_inode(file->f_path.dentry) to
> file_inode(file) and if they are equal returns file->f_path.dentry (this is
> the common, non-overlayfs case).
> 
> In the uncommon case (regular file on overlayfs) it will call into
> overlayfs's ->d_native_dentry() to get the underlying dentry matching
> file_inode(file).
> 
> [*] If possible, it's better simply to use file_inode() instead.
> 
> Signed-off-by: Miklos Szeredi 
> Tested-by: Goldwyn Rodrigues 
> Reviewed-by: Trond Myklebust 
> Cc:  # v4.2
> Cc: David Howells 
> Cc: Al Viro 
> Cc: Theodore Ts'o 
> Cc: Daniel Axtens 
> ---
>  fs/open.c  | 11 +++
>  fs/overlayfs/super.c   | 16 
>  include/linux/dcache.h |  1 +
>  include/linux/fs.h |  2 ++
>  4 files changed, 30 insertions(+)

I have this patch in the ext4.git tree, but I'd like to get an
Acked-by from Al before I send a pull request to Linus.

Al?  Any objections to my sending in this change via the ext4 tree?

- Ted

> 
> diff --git a/fs/open.c b/fs/open.c
> index 55bdc75e2172..6326c11eda78 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -831,6 +831,17 @@ char *file_path(struct file *filp, char *buf, int buflen)
>  }
>  EXPORT_SYMBOL(file_path);
>  
> +struct dentry *file_dentry(const struct file *file)
> +{
> + struct dentry *dentry = file->f_path.dentry;
> +
> + if (likely(d_inode(dentry) == file_inode(file)))
> + return dentry;
> + else
> + return dentry->d_op->d_native_dentry(dentry, file_inode(file));
> +}
> +EXPORT_SYMBOL(file_dentry);
> +
>  /**
>   * vfs_open - open the file at the given path
>   * @path: path to open
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 619ad4b016d2..5142aa2034c4 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -336,14 +336,30 @@ static int ovl_dentry_weak_revalidate(struct dentry 
> *dentry, unsigned int flags)
>   return ret;
>  }
>  
> +static struct dentry *ovl_d_native_dentry(struct dentry *dentry,
> +  struct inode *inode)
> +{
> + struct ovl_entry *oe = dentry->d_fsdata;
> + struct dentry *realentry = ovl_upperdentry_dereference(oe);
> +
> + if (realentry && inode == d_inode(realentry))
> + return realentry;
> + realentry = __ovl_dentry_lower(oe);
> + if (realentry && inode == d_inode(realentry))
> + return realentry;
> + BUG();
> +}
> +
>  static const struct dentry_operations ovl_dentry_operations = {
>   .d_release = ovl_dentry_release,
>   .d_select_inode = ovl_d_select_inode,
> + .d_native_dentry = ovl_d_native_dentry,
>  };
>  
>  static const struct dentry_operations ovl_reval_dentry_operations = {
>   .d_release = ovl_dentry_release,
>   .d_select_inode = ovl_d_select_inode,
> + .d_native_dentry = ovl_d_native_dentry,
>   .d_revalidate = ovl_dentry_revalidate,
>   .d_weak_revalidate = ovl_dentry_weak_revalidate,
>  };
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index c4b5f4b3f8f8..99ecb6de636c 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -161,6 +161,7 @@ struct dentry_operations {
>   struct vfsmount *(*d_automount)(struct path *);
>   int (*d_manage)(struct dentry *, bool);
>   struct inode *(*d_select_inode)(struct dentry *, unsigned);
> + struct dentry *(*d_native_dentry)(struct dentry *, struct inode *);
>  } cacheline_aligned;
>  
>  /*
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index ae681002100a..1091d9f43271 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1234,6 +1234,8 @@ static inline struct inode *file_inode(const struct 
> file *f)
>   return f->f_inode;
>  }
>  
> +extern struct dentry *file_dentry(const struct file *file);
> +
>  static inline int locks_lock_file_wait(struct file *filp, struct file_lock 
> *fl)
>  {
>   return locks_lock_inode_wait(file_inode(filp), fl);
> -- 
> 2.1.4
> 


Re: [PATCH v3] ARC: [dts] Introduce Timer bindings

2016-03-20 Thread Vineet Gupta
On Sunday 20 March 2016 06:12 AM, Rob Herring wrote:
> On Fri, Mar 18, 2016 at 10:56:29AM +0530, Vineet Gupta wrote:
>> ARC Timers have historically been probed directly.
>> As precursor to start probing Timers thru DT introduce these bindings
>> Note that to keep series bisectable, these bindings are not yet used in
>> code.
>>
>> Cc: Daniel Lezcano 
>> Cc: Rob Herring 
>> Cc: devicet...@vger.kernel.org
>> Signed-off-by: Vineet Gupta 
>> ---
>> v3:
>>  - Renamed Node name to avoid new warnings when unit address used w/o regs 
>> [Rob]
>> v2:
>>  - http://lists.infradead.org/pipermail/linux-snps-arc/2016-March/000653.html
>>  - snps,arc-timer[0-1] folded into single snps-arc-timer [Rob]
>>  - Node name in DT example fixed:[Rob]
>>  "timer1: timer_clksrc {" -> timer@1 {
>>  - Introduced 64bit RTC in skeleton_hs.dtsi  [Vineet]
>> v1:
>>  - 
>> http://lists.infradead.org/pipermail/linux-snps-arc/2016-February/000447.html
>> ---
>>  .../devicetree/bindings/timer/snps,arc-timer.txt   | 32 
>> ++
>>  .../devicetree/bindings/timer/snps,archs-gfrc.txt  | 14 ++
>>  .../devicetree/bindings/timer/snps,archs-rtc.txt   | 14 ++
>>  arch/arc/boot/dts/abilis_tb10x.dtsi| 14 ++
>>  arch/arc/boot/dts/skeleton.dtsi| 14 ++
>>  arch/arc/boot/dts/skeleton_hs.dtsi | 20 ++
>>  arch/arc/boot/dts/skeleton_hs_idu.dtsi | 14 ++
>>  7 files changed, 122 insertions(+)
>>  create mode 100644 
>> Documentation/devicetree/bindings/timer/snps,arc-timer.txt
>>  create mode 100644 
>> Documentation/devicetree/bindings/timer/snps,archs-gfrc.txt
>>  create mode 100644 
>> Documentation/devicetree/bindings/timer/snps,archs-rtc.txt
> Acked-by: Rob Herring 

Thx a bunch Rob !

-Vineet


[GIT PULL] ARC changes for 4.6-rc1

2016-03-20 Thread Vineet Gupta
Hi Linus,

ARC changes for 4.6-rc1. Nothing too exciting here, although the diffstat shows more
files touched than usual due to some sweeping defconfig / DT updates.

Please pull !

Thx,
-Vineet
->
The following changes since commit fc77dbd34c5c99bce46d40a2491937c3bcbd10af:

  Linux 4.5-rc6 (2016-02-28 08:41:20 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc.git/ tags/arc-4.6-rc1

for you to fetch changes up to deaf7565eb618a80534844300aeacffa14125182:

  ARCv2: ioremap: Support dynamic peripheral address space (2016-03-19 14:34:10 
+0530)


ARC updates for 4.6-rc1
- Big Endian io accessors fix [Lada]
- Spellos fixes [Adam]
- Fix for DW GMAC breakage [Alexey]
- Making DMA API 64-bit ready
- Shutting up -Wmaybe-uninitialized noise for ARC
- Other minor fixes here and there, comments update


Adam Buchbinder (1):
  ARC: Fix misspellings in comments.

Alexey Brodkin (1):
  ARC: [plat-axs10x] add Ethernet PHY description in .dts

Kefeng Wang (1):
  arc: use of_platform_default_populate() to populate default bus

Lada Trimasova (2):
  ARC: [BE] readl()/writel() to work in Big Endian CPU configuration
  arc: [plat-nsimosci*] use ezchip network driver

Vineet Gupta (16):
  ARC: bitops: Remove non relevant comments
  ARC: [BE] Select correct CROSS_COMPILE prefix
  ARC: [*defconfig] No need to specify CONFIG_CROSS_COMPILE
  ARCv2: Allow enabling PAE40 w/o HIGHMEM
  ARC: build: Better way to detect ISA compatible toolchain
  ARC: [plat-nsim] document ranges
  ARC: mm: Use virt_to_pfn() for addr >> PAGE_SHIFT pattern
  ARCv2: LLSC: software backoff is NOT needed starting HS2.1c
  ARC: thp: unbork !CONFIG_TRANSPARENT_HUGEPAGE build
  ARC: build: Turn off -Wmaybe-uninitialized for ARC gcc 4.8
  ARC: dma: Use struct page based page allocator helpers
  ARC: dma: non-coherent pages need V-P mapping if in HIGHMEM
  ARC: dma: pass_phys() not sg_virt() to cache ops
  ARC: dma: ioremap: use phys_addr_t consistenctly in code paths
  ARC: dma: reintroduce platform specific dma<->phys
  ARCv2: ioremap: Support dynamic peripheral address space

 arch/arc/Kconfig   |  6 ++-
 arch/arc/Makefile  | 22 -
 arch/arc/boot/dts/axs10x_mb.dtsi   |  8 
 arch/arc/boot/dts/nsim_hs.dts  |  3 +-
 arch/arc/boot/dts/nsimosci.dts |  5 +-
 arch/arc/boot/dts/nsimosci_hs.dts  |  5 +-
 arch/arc/boot/dts/nsimosci_hs_idu.dts  |  5 +-
 arch/arc/configs/axs101_defconfig  |  1 -
 arch/arc/configs/axs103_defconfig  |  1 -
 arch/arc/configs/axs103_smp_defconfig  |  1 -
 arch/arc/configs/nsim_700_defconfig|  1 -
 arch/arc/configs/nsim_hs_defconfig |  1 -
 arch/arc/configs/nsim_hs_smp_defconfig |  1 -
 arch/arc/configs/nsimosci_defconfig|  2 +-
 arch/arc/configs/nsimosci_hs_defconfig |  2 +-
 arch/arc/configs/nsimosci_hs_smp_defconfig |  3 +-
 arch/arc/configs/tb10x_defconfig   |  1 -
 arch/arc/configs/vdk_hs38_defconfig|  1 -
 arch/arc/configs/vdk_hs38_smp_defconfig|  1 -
 arch/arc/include/asm/arcregs.h |  6 ---
 arch/arc/include/asm/bitops.h  | 15 --
 arch/arc/include/asm/cache.h   |  1 +
 arch/arc/include/asm/cacheflush.h  |  6 +--
 arch/arc/include/asm/cmpxchg.h |  2 +-
 arch/arc/include/asm/dma-mapping.h |  7 +++
 arch/arc/include/asm/entry-compact.h   |  2 +-
 arch/arc/include/asm/io.h  | 22 ++---
 arch/arc/include/asm/page.h| 19 +++-
 arch/arc/include/asm/pgtable.h | 11 ++---
 arch/arc/include/asm/tlbflush.h|  7 ++-
 arch/arc/kernel/setup.c| 21 ++---
 arch/arc/kernel/stacktrace.c   |  2 +-
 arch/arc/kernel/time.c |  4 +-
 arch/arc/mm/cache.c| 39 +---
 arch/arc/mm/dma.c  | 75 --
 arch/arc/mm/highmem.c  |  2 +-
 arch/arc/mm/ioremap.c  | 37 ++-
 arch/arc/mm/tlb.c  |  8 ++--
 38 files changed, 201 insertions(+), 155 deletions(-)


Re: Suspicious error for CMA stress test

2016-03-20 Thread Joonsoo Kim
On Fri, Mar 18, 2016 at 02:32:35PM +0100, Lucas Stach wrote:
> Hi Vlastimil, Joonsoo,
> 
> Am Freitag, den 18.03.2016, 00:52 +0900 schrieb Joonsoo Kim:
> > 2016-03-18 0:43 GMT+09:00 Vlastimil Babka :
> > > On 03/17/2016 10:24 AM, Hanjun Guo wrote:
> > >>
> > >> On 2016/3/17 14:54, Joonsoo Kim wrote:
> > >>>
> > >>> On Wed, Mar 16, 2016 at 05:44:28PM +0800, Hanjun Guo wrote:
> > 
> >  On 2016/3/14 15:18, Joonsoo Kim wrote:
> > >
> > > On Mon, Mar 14, 2016 at 08:06:16AM +0100, Vlastimil Babka wrote:
> > >>
> > >> On 03/14/2016 07:49 AM, Joonsoo Kim wrote:
> > >>>
> > >>> On Fri, Mar 11, 2016 at 06:07:40PM +0100, Vlastimil Babka wrote:
> > 
> >  On 03/11/2016 04:00 PM, Joonsoo Kim wrote:
> > 
> >  How about something like this? Just an idea, probably buggy
> >  (off-by-one etc.).
> >  Should keep the added cost away from the order < pageblock_order
> >  iterations, at the expense of the relatively fewer >= pageblock_order
> >  iterations.
> > >>>
> > >>> Hmm... I tested this and found that its code size is a little bit
> > >>> larger than mine. I'm not sure why this happens exactly but I guess
> > >>> it would be
> > >>> related to compiler optimization. In this case, I'm in favor of my
> > >>> implementation because it looks like a good abstraction. It adds one
> > >>> unlikely branch to the merge loop but compiler would optimize it to
> > >>> check it once.
> > >>
> > >> I would be surprised if compiler optimized that to check it once, as
> > >> order increases with each loop iteration. But maybe it's smart
> > >> enough to do something like I did by hand? Guess I'll check the
> > >> disassembly.
> > >
> > > Okay. I used the following slightly optimized version and I need to
> > > add 'max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1)'
> > > to yours. Please consider it, too.
> > 
> >  Hmm, this one does not work; I can still see the bug after applying
> >  this patch. Did I miss something?
> > >>>
> > >>> I may have found a bug that I introduced some time ago. Could you
> > >>> test the following change in __free_one_page() on top of
> > >>> Vlastimil's patch?
> > >>>
> > >>> -page_idx = pfn & ((1 << max_order) - 1);
> > >>> +page_idx = pfn & ((1 << MAX_ORDER) - 1);
> > >>
> > >>
> > >> I tested Vlastimil's patch + your change with stress for more than
> > >> half an hour, and the bug
> > >> I reported is gone :)
> > >
> > >
> > > Oh, ok, will try to send a proper patch, once I figure out what to
> > > write in the changelog :)
> > 
> > Thanks in advance!
> 
> After digging into the "PFN busy" race in CMA (see [1]), I believe we
> should just prevent any buddy merging in isolated ranges. This fixes the
> race I'm seeing without the need to hold the zone lock for extended
> periods of time.

"PFNs busy" can be caused by other type of race, too. I guess that
other cases happens more than buddy merging. Do you have any test case for
your problem?

If it is indeed a problem, you can avoid it with simple retry
MAX_ORDER times on alloc_contig_range(). This is a rather dirty but
the reason I suggest it is that there are other type of race in
__alloc_contig_range() and retry could help them, too. For example,
if some of pages in the requested range isn't attached to the LRU yet
or detached from the LRU but not freed to buddy,
test_pages_isolated() can be failed.
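
To make the retry idea concrete, here is a minimal caller-side sketch. The
wrapper name and the -EBUSY-only filter are assumptions for illustration and
not code from any posted patch; alloc_contig_range() and MAX_ORDER are the
real kernel symbols.

#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/mmzone.h>

/*
 * Hypothetical wrapper: retry alloc_contig_range() a bounded number of
 * times, since the "PFNs busy" (-EBUSY) failures discussed above are
 * transient.
 */
static int cma_alloc_range_retry(unsigned long start, unsigned long end,
                                 unsigned migratetype)
{
        int tries = 0;
        int ret;

        do {
                ret = alloc_contig_range(start, end, migratetype);
        } while (ret == -EBUSY && ++tries < MAX_ORDER);

        return ret;
}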

Thanks.


RE: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI commmand timeout

2016-03-20 Thread Rajesh Bhagat


> -Original Message-
> From: Alan Stern [mailto:st...@rowland.harvard.edu]
> Sent: Friday, March 18, 2016 7:51 PM
> To: Rajesh Bhagat 
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org;
> gre...@linuxfoundation.org; mathias.ny...@intel.com; Sriram Dash
> 
> Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI
> commmand timeout
> 
> On Fri, 18 Mar 2016, Rajesh Bhagat wrote:
> 
> > --- a/drivers/usb/core/hub.c
> > +++ b/drivers/usb/core/hub.c
> > @@ -2897,10 +2897,14 @@ done:
> > /* The xHC may think the device is already reset,
> >  * so ignore the status.
> >  */
> > -   if (hcd->driver->reset_device)
> > -   hcd->driver->reset_device(hcd, udev);
> > -
> > -   usb_set_device_state(udev, USB_STATE_DEFAULT);
> > +   if (hcd->driver->reset_device) {
> > +   status = hcd->driver->reset_device(hcd, udev);
> > +   if (status == 0)
> > +   usb_set_device_state(udev, 
> > USB_STATE_DEFAULT);
> > +   else
> > +   usb_set_device_state(udev,
> USB_STATE_NOTATTACHED);
> > +   } else
> > +   usb_set_device_state(udev, USB_STATE_DEFAULT);
> 
> This is a really bad patch:
> 
> You left in the comment about ignoring the status, but then you changed the 
> code so
> that it doesn't ignore the status!
> 

My apologies, I completely missed the above comment, which was already there.

> You also called usb_set_device_state() more times than necessary.  You could 
> have
> done it like this:
> 
>   if (hcd->driver->reset_device)
>   status = hcd->driver->reset_device(hcd, udev);
>   if (status == 0)
>   usb_set_device_state(udev, USB_STATE_DEFAULT);
>   else
>   usb_set_device_state(udev, 
> USB_STATE_NOTATTACHED);
> 
> (Even that could be simplified further, by combining it with the code that 
> follows.)
> 
> Finally, you violated the 80-column limit.
> 

I agree with your point. The intent was to check the return status of
reset_device, which is currently being ignored. I encountered a reset_device
failure during the resume operation (STR), which increases the resume time
and causes unexpected crashes if the return value is not checked.

Do you agree it should be checked here? If yes, I can rework this patch. 

> Alan Stern



Re: [PATCHv4 08/25] thp: support file pages in zap_huge_pmd()

2016-03-20 Thread Aneesh Kumar K.V
"Kirill A. Shutemov"  writes:

> On Fri, Mar 18, 2016 at 07:23:41PM +0530, Aneesh Kumar K.V wrote:
>> "Kirill A. Shutemov"  writes:
>> 
>> > split_huge_pmd() for file mappings (and DAX too) is implemented by just
>> > clearing pmd entry as we can re-fill this area from page cache on pte
>> > level later.
>> >
>> > This means we don't need deposit page tables when file THP is mapped.
>> > Therefore we shouldn't try to withdraw a page table on zap_huge_pmd()
>> > file THP PMD.
>> 
>> Archs like ppc64 use deposited page table to track the hardware page
>> table slot information. We probably may want to add hooks which arch can
>> use to achieve the same even with file THP 
>
> Could you describe more on what kind of information you're talking about?
>

The hardware page table on ppc64 requires us to map each subpage of the huge
page. This is needed because, at the low level, we use the segment base page
size to find the hash slot, and on a TLB miss we use the faulting address and
the base page size (which is 64K even with THP) to check whether the page is
mapped in the hash page table. Since we use a base page size of 64K, we need
to make sure that the subpages are mapped (on demand) in the hash page table.
Once we have them mapped, we also need to track their hash table slot
information so that we can clear it when the hugepage is invalidated.

With THP we used the deposited page table to store the hash slot
information.

-aneesh



RE: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI commmand timeout

2016-03-20 Thread Rajesh Bhagat


> -Original Message-
> From: Mathias Nyman [mailto:mathias.ny...@linux.intel.com]
> Sent: Friday, March 18, 2016 4:51 PM
> To: Rajesh Bhagat ; linux-...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Cc: gre...@linuxfoundation.org; mathias.ny...@intel.com; Sriram Dash
> 
> Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI
> commmand timeout
> 
> On 18.03.2016 09:01, Rajesh Bhagat wrote:
> > We are facing an issue while performing the system resume operation from
> > STR, where XHCI goes into an indefinite hang/sleep state because the
> > wait_for_completion call in xhci_alloc_dev for the TRB_ENABLE_SLOT
> > command never completes.
> >
> > Now, the xhci_handle_command_timeout function is called and prints the
> > "Command timeout" message, but it never calls the complete API for the
> > above TRB_ENABLE_SLOT command because xhci_abort_cmd_ring is successful.
> >
> > The solution to the above problem is:
> > 1. call the xhci_cleanup_command_queue API regardless of whether
> > xhci_abort_cmd_ring is successful.
> > 2. check the status of reset_device in the USB core code.
> 
> 
> Hi
> 
> I think clearing the whole command ring is a bit too much in this case.
> It may cause issues for all attached devices when one command times out.
> 


Hi Mathias, 

I understand your point, but I want to understand how the completion handler
would be called if a command times out and xhci_abort_cmd_ring is successful.
In that case, the code would wait on the completion handler forever.


> We need to look in more detail why we fail to call completion for that one 
> aborted
> command.
> 

I checked the code below; please correct me if I am wrong:

code waiting on wait_for_completion: 
int xhci_alloc_dev(struct usb_hcd *hcd, struct usb_device *udev)
{
...
ret = xhci_queue_slot_control(xhci, command, TRB_ENABLE_SLOT, 0);
...

wait_for_completion(command->completion); <=== waiting for command to 
complete 


code calling completion handler:
1. handle_cmd_completion -> xhci_complete_del_and_free_cmd
2. xhci_handle_command_timeout -> xhci_abort_cmd_ring(failure) -> 
xhci_cleanup_command_queue -> xhci_complete_del_and_free_cmd

In our case the command times out, so we hit case #2, but
xhci_abort_cmd_ring succeeds, and that path never calls complete.
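
To make this concrete, here is a rough sketch of option 1 from the patch
description (flush the command queue even when the abort succeeds), so every
pending command->completion is completed and no caller of
wait_for_completion() is left hanging. This is an illustrative sketch, not
the actual 4.1.8 handler; locking and the timer-callback plumbing are
omitted, and whether this is the right fix is exactly what is being
discussed in this thread.

/* Illustrative only -- not the real xhci_handle_command_timeout(). */
static void sketch_command_timeout(struct xhci_hcd *xhci)
{
        xhci_dbg(xhci, "Command timeout\n");

        if (xhci_abort_cmd_ring(xhci))
                xhci_err(xhci, "Abort command ring failed\n");

        /* Proposed: clean up unconditionally, not only when the abort fails. */
        xhci_cleanup_command_queue(xhci);
}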


> The bigger question is why the timeout happens in the first place?
> 

We are doing a suspend/resume operation, so it might be a controller issue :(
IMO, software should not hang or stop if the hardware is misbehaving.

> What kernel version, and what xhci vendor was this triggered on?
> 

We are using 4.1.8 kernel

> It's possible that the timeout is related either to the locking issue found 
> by Chris
> Bainbridge:
> http://marc.info/?l=linux-usb&m=145493945408601&w=2
> 
> or the resume issues in this thread, (see full thread)
> http://marc.info/?l=linux-usb&m=145477850706552&w=2
> 
> Does any of those proposed solutions fix the command timeout for you?
> 

I will check the above patches and share status.

> -Mathias


Re: [PATCH] sparc: Convert naked unsigned uses to unsigned int

2016-03-20 Thread David Miller
From: Joe Perches 
Date: Thu, 10 Mar 2016 15:21:43 -0800

> Use the more normal kernel definition/declaration style.
> 
> Done via:
> 
> $ git ls-files arch/sparc | \
>   xargs ./scripts/checkpatch.pl -f --fix-inplace --types=unspecified_int
> 
> Signed-off-by: Joe Perches 

Applied.


linux-next: Tree for Mar 21

2016-03-20 Thread Stephen Rothwell
Hi all,

Please do not add any v4.7 related material to your linux-next included
trees until after v4.6-rc1 is released.

Changes since 20160318:

The ext4 tree gained a conflict against Linus' tree.

The drm tree still had its build failure for which I applied a fix patch.

Non-merge commits (relative to Linus' tree): 3025
 2386 files changed, 114129 insertions(+), 49973 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
(this fails its final link) and pseries_le_defconfig and i386, sparc
and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 231 trees (counting Linus' and 35 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (46e595a17dcf Merge tag 'armsoc-drivers' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc)
Merging fixes/master (36f90b0a2ddd Linux 4.5-rc2)
Merging kbuild-current/rc-fixes (3d1450d54a4f Makefile: Force gzip and xz on 
module install)
Merging arc-current/for-curr (deaf7565eb61 ARCv2: ioremap: Support dynamic 
peripheral address space)
Merging arm-current/fixes (f474c8c857d9 ARM: 8544/1: set_memory_xx fixes)
Merging m68k-current/for-linus (efbec135f11d m68k: Fix misspellings in 
comments.)
Merging metag-fixes/fixes (0164a711c97b metag: Fix ioremap_wc/ioremap_cached 
build errors)
Merging powerpc-fixes/fixes (b562e44f507e Linux 4.5)
Merging powerpc-merge-mpe/fixes (bc0195aad0da Linux 4.2-rc2)
Merging sparc/master (142b9e6c9de0 x86/kallsyms: fix GOLD link failure with new 
relative kallsyms table format)
Merging net/master (1c191307af5f Revert "lan78xx: add ndo_get_stats64")
Merging ipsec/master (215276c0147e xfrm: Reset encapsulation field of the skb 
before transformation)
Merging ipvs/master (7617a24f83b5 ipvs: correct initial offset of Call-ID 
header search in SIP persistence engine)
Merging wireless-drivers/master (10da848f67a7 ssb: host_soc depends on sprom)
Merging mac80211/master (ad8ec957f693 wext: unregister_pernet_subsys() on 
notifier registration failure)
Merging sound-current/for-linus (0ef21100ae91 ALSA: usb-audio: add Microsoft 
HD-5001 to quirks)
Merging pci-current/for-linus (54c6e2dd00c3 PCI: Allow a NULL "parent" pointer 
in pci_bus_assign_domain_nr())
Merging driver-core.current/driver-core-linus (18558cae0272 Linux 4.5-rc4)
Merging tty.current/tty-linus (18558cae0272 Linux 4.5-rc4)
Merging usb.current/usb-linus (6b5f04b6cf8e Merge branch 'for-4.6' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup)
Merging usb-gadget-fixes/fixes (3b2435192fe9 MAINTAINERS: drop OMAP USB and 
MUSB maintainership)
Merging usb-serial-fixes/usb-linus (f6cede5b49e8 Linux 4.5-rc7)
Merging usb-chipidea-fixes/ci-for-usb-stable (d144dfea8af7 usb: chipidea: otg: 
change workqueue ci_otg as freezable)
Merging staging.current/staging-linus (1200b6809dfd Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next)
Merging char-misc.current/char-misc-linus (5cd0911a9e0e Merge tag 
'please-pull-pstore' of 
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux)
Merging input-current/for-linus (4d2508a55990 ARM: pxa/raumfeld: use 
PROPERTY_ENTRY_INTEGER to define props)
Merging crypto-current/master (dfe97ad30e8c crypto: marvell/cesa - forward 
devm_ioremap_resource() error code)
Merging ide/master (0d7ef45cdeeb ide: palm_bk3710: test clock rate to avoid 
division by 0)
Merging devicetree-current/devicetree/merge (f76502aa9140 of/dynamic: Fix test 
for PPC_PSERIES)
Merging rr-fixes/fixes (8244062ef1e5 modules: fix longstanding /proc/kallsyms 
vs module insertion race.)
Me

[PATCH v3 07/23] ncr5380: Remove BOARD_REQUIRES_NO_DELAY macro

2016-03-20 Thread Finn Thain
The io_recovery_delay macro is intended to insert a microsecond delay
between the chip register accesses that begin a DMA operation. This
is reportedly needed for some ISA boards.

Reverse the sense of the macro test so that in the common case,
where no delay is required, drivers need not define the macro.

Signed-off-by: Finn Thain 
Reviewed-by: Hannes Reinecke 
Tested-by: Michael Schmitz 

---
 drivers/scsi/NCR5380.c   |   18 --
 drivers/scsi/dtc.h   |2 ++
 drivers/scsi/g_NCR5380.h |2 ++
 drivers/scsi/t128.h  |2 ++
 4 files changed, 14 insertions(+), 10 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-03-21 13:31:16.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:20.0 +1100
@@ -39,12 +39,6 @@
  * tagged queueing)
  */
 
-#ifdef BOARD_REQUIRES_NO_DELAY
-#define io_recovery_delay(x)
-#else
-#define io_recovery_delay(x)   udelay(x)
-#endif
-
 /*
  * Design
  *
@@ -150,6 +144,10 @@
  * possible) function may be used.
  */
 
+#ifndef NCR5380_io_delay
+#define NCR5380_io_delay(x)
+#endif
+
 static int do_abort(struct Scsi_Host *);
 static void do_reset(struct Scsi_Host *);
 
@@ -1468,14 +1466,14 @@ static int NCR5380_transfer_dma(struct S
 */
 
if (p & SR_IO) {
-   io_recovery_delay(1);
+   NCR5380_io_delay(1);
NCR5380_write(START_DMA_INITIATOR_RECEIVE_REG, 0);
} else {
-   io_recovery_delay(1);
+   NCR5380_io_delay(1);
NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE | 
ICR_ASSERT_DATA);
-   io_recovery_delay(1);
+   NCR5380_io_delay(1);
NCR5380_write(START_DMA_SEND_REG, 0);
-   io_recovery_delay(1);
+   NCR5380_io_delay(1);
}
 
 /*
Index: linux/drivers/scsi/dtc.h
===
--- linux.orig/drivers/scsi/dtc.h   2016-03-21 13:31:16.0 +1100
+++ linux/drivers/scsi/dtc.h2016-03-21 13:31:20.0 +1100
@@ -28,6 +28,8 @@
 #define NCR5380_bus_reset  dtc_bus_reset
 #define NCR5380_info   dtc_info
 
+#define NCR5380_io_delay(x)udelay(x)
+
 /* 15 12 11 10
1001 1100   */
 
Index: linux/drivers/scsi/g_NCR5380.h
===
--- linux.orig/drivers/scsi/g_NCR5380.h 2016-03-21 13:31:16.0 +1100
+++ linux/drivers/scsi/g_NCR5380.h  2016-03-21 13:31:20.0 +1100
@@ -71,6 +71,8 @@
 #define NCR5380_pwrite generic_NCR5380_pwrite
 #define NCR5380_info generic_NCR5380_info
 
+#define NCR5380_io_delay(x)udelay(x)
+
 #define BOARD_NCR5380  0
 #define BOARD_NCR53C4001
 #define BOARD_NCR53C400A 2
Index: linux/drivers/scsi/t128.h
===
--- linux.orig/drivers/scsi/t128.h  2016-03-21 13:31:16.0 +1100
+++ linux/drivers/scsi/t128.h   2016-03-21 13:31:20.0 +1100
@@ -84,6 +84,8 @@
 #define NCR5380_bus_reset t128_bus_reset
 #define NCR5380_info t128_info
 
+#define NCR5380_io_delay(x)udelay(x)
+
 /* 15 14 12 10 7 5 3
1101 0100 1010 1000 */
 




[PATCH v3 10/23] ncr5380: Merge DMA implementation from atari_NCR5380 core driver

2016-03-20 Thread Finn Thain
Adopt the DMA implementation from atari_NCR5380.c. This means that
atari_scsi and sun3_scsi can make use of the NCR5380.c core driver
and the atari_NCR5380.c driver fork can be made redundant.

Signed-off-by: Finn Thain 
Reviewed-by: Hannes Reinecke 
Tested-by: Michael Schmitz 

---
 drivers/scsi/NCR5380.c  |  170 +++-
 drivers/scsi/arm/cumana_1.c |3 
 drivers/scsi/arm/oak.c  |3 
 drivers/scsi/dmx3191d.c |1 
 drivers/scsi/dtc.c  |2 
 drivers/scsi/dtc.h  |1 
 drivers/scsi/g_NCR5380.c|2 
 drivers/scsi/g_NCR5380.h|1 
 drivers/scsi/mac_scsi.c |3 
 drivers/scsi/pas16.c|2 
 drivers/scsi/pas16.h|1 
 drivers/scsi/t128.c |2 
 drivers/scsi/t128.h |1 
 13 files changed, 152 insertions(+), 40 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-03-21 13:31:25.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:27.0 +1100
@@ -31,9 +31,6 @@
 
 /*
  * Further development / testing that should be done :
- * 1.  Cleanup the NCR5380_transfer_dma function and DMA operation complete
- * code so that everything does the same thing that's done at the
- * end of a pseudo-DMA read operation.
  *
  * 4.  Test SCSI-II tagged queueing (I have no devices which support
  * tagged queueing)
@@ -117,6 +114,8 @@
  *
  * PSEUDO_DMA - if defined, PSEUDO DMA is used during the data transfer phases.
  *
+ * REAL_DMA - if defined, REAL DMA is used during the data transfer phases.
+ *
  * These macros MUST be defined :
  *
  * NCR5380_read(register)  - read from the specified register
@@ -801,6 +800,72 @@ static void NCR5380_main(struct work_str
} while (!done);
 }
 
+/*
+ * NCR5380_dma_complete - finish DMA transfer
+ * @instance: the scsi host instance
+ *
+ * Called by the interrupt handler when DMA finishes or a phase
+ * mismatch occurs (which would end the DMA transfer).
+ */
+
+static void NCR5380_dma_complete(struct Scsi_Host *instance)
+{
+   struct NCR5380_hostdata *hostdata = shost_priv(instance);
+   int transferred;
+   unsigned char **data;
+   int *count;
+   int saved_data = 0, overrun = 0;
+   unsigned char p;
+
+   if (hostdata->read_overruns) {
+   p = hostdata->connected->SCp.phase;
+   if (p & SR_IO) {
+   udelay(10);
+   if ((NCR5380_read(BUS_AND_STATUS_REG) &
+(BASR_PHASE_MATCH | BASR_ACK)) ==
+   (BASR_PHASE_MATCH | BASR_ACK)) {
+   saved_data = NCR5380_read(INPUT_DATA_REG);
+   overrun = 1;
+   dsprintk(NDEBUG_DMA, instance, "read overrun 
handled\n");
+   }
+   }
+   }
+
+   NCR5380_write(MODE_REG, MR_BASE);
+   NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE);
+   NCR5380_read(RESET_PARITY_INTERRUPT_REG);
+
+   transferred = hostdata->dma_len - NCR5380_dma_residual(instance);
+   hostdata->dma_len = 0;
+
+   data = (unsigned char **)&hostdata->connected->SCp.ptr;
+   count = &hostdata->connected->SCp.this_residual;
+   *data += transferred;
+   *count -= transferred;
+
+   if (hostdata->read_overruns) {
+   int cnt, toPIO;
+
+   if ((NCR5380_read(STATUS_REG) & PHASE_MASK) == p && (p & 
SR_IO)) {
+   cnt = toPIO = hostdata->read_overruns;
+   if (overrun) {
+   dsprintk(NDEBUG_DMA, instance,
+"Got an input overrun, using saved 
byte\n");
+   *(*data)++ = saved_data;
+   (*count)--;
+   cnt--;
+   toPIO--;
+   }
+   if (toPIO > 0) {
+   dsprintk(NDEBUG_DMA, instance,
+"Doing %d byte PIO to 0x%p\n", cnt, 
*data);
+   NCR5380_transfer_pio(instance, &p, &cnt, data);
+   *count -= toPIO - cnt;
+   }
+   }
+   }
+}
+
 #ifndef DONT_USE_INTR
 
 /**
@@ -855,7 +920,22 @@ static irqreturn_t NCR5380_intr(int irq,
dsprintk(NDEBUG_INTR, instance, "IRQ %d, BASR 0x%02x, SR 
0x%02x, MR 0x%02x\n",
 irq, basr, sr, mr);
 
-   if ((NCR5380_read(CURRENT_SCSI_DATA_REG) & hostdata->id_mask) &&
+   if ((mr & MR_DMA_MODE) || (mr & MR_MONITOR_BSY)) {
+   /* Probably End of DMA, Phase Mismatch or Loss of BSY.
+* We ack IRQ after clearing Mode Register. Workarounds
+* for End of DMA e

[PATCH v3 09/23] ncr5380: Adopt uniform DMA setup convention

2016-03-20 Thread Finn Thain
Standardize the DMA setup hooks so that the DMA implementation in
atari_NCR5380.c can be reconciled with the pseudo DMA implementation in
NCR5380.c.

Calls to NCR5380_dma_recv_setup() and NCR5380_dma_send_setup() return
a negative value on failure, zero on PDMA transfer success and a positive
byte count for DMA setup success.

This convention is not entirely new, but is now applied consistently.
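
As a small illustration of how a caller distinguishes the three outcomes
under this convention, here is a standalone toy in C; the stub stands in for
NCR5380_dma_recv_setup()/NCR5380_dma_send_setup() and is not a driver symbol.

#include <stdio.h>

/* Toy stand-in for a DMA setup hook that "programs" real DMA. */
static int dma_setup_stub(int count)
{
        return count;   /* pretend real DMA was set up for 'count' bytes */
}

int main(void)
{
        int result = dma_setup_stub(512);

        if (result < 0)
                printf("setup failed (%d): abort or fall back\n", result);
        else if (result == 0)
                printf("PDMA transfer already completed by the hook\n");
        else
                printf("real DMA programmed for %d bytes; wait for completion\n",
                       result);
        return 0;
}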

Also remove a pointless Status Register access: the *phase assignment is
redundant because after NCR5380_transfer_dma() returns control to
NCR5380_information_transfer(), that routine then returns control
to NCR5380_main(), which means *phase is dead.

Signed-off-by: Finn Thain 
Reviewed-by: Hannes Reinecke 
Tested-by: Michael Schmitz 

---
 drivers/scsi/NCR5380.c  |   21 ++---
 drivers/scsi/arm/cumana_1.c |   10 --
 drivers/scsi/arm/oak.c  |4 ++--
 drivers/scsi/atari_scsi.c   |3 ---
 4 files changed, 20 insertions(+), 18 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-03-21 13:31:21.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:25.0 +1100
@@ -1431,7 +1431,7 @@ static int NCR5380_transfer_dma(struct S
register unsigned char p = *phase;
register unsigned char *d = *data;
unsigned char tmp;
-   int foo;
+   int result;
 
if ((tmp = (NCR5380_read(STATUS_REG) & PHASE_MASK)) != p) {
*phase = tmp;
@@ -1505,9 +1505,9 @@ static int NCR5380_transfer_dma(struct S
  */
 
if (p & SR_IO) {
-   foo = NCR5380_dma_recv_setup(instance, d,
+   result = NCR5380_dma_recv_setup(instance, d,
hostdata->flags & FLAG_DMA_FIXUP ? c - 1 : c);
-   if (!foo && (hostdata->flags & FLAG_DMA_FIXUP)) {
+   if (!result && (hostdata->flags & FLAG_DMA_FIXUP)) {
/*
 * The workaround was to transfer fewer bytes than we
 * intended to with the pseudo-DMA read function, wait 
for
@@ -1525,19 +1525,19 @@ static int NCR5380_transfer_dma(struct S
 
if (NCR5380_poll_politely(instance, BUS_AND_STATUS_REG,
  BASR_DRQ, BASR_DRQ, HZ) < 0) {
-   foo = -1;
+   result = -1;
shost_printk(KERN_ERR, instance, "PDMA read: 
DRQ timeout\n");
}
if (NCR5380_poll_politely(instance, STATUS_REG,
  SR_REQ, 0, HZ) < 0) {
-   foo = -1;
+   result = -1;
shost_printk(KERN_ERR, instance, "PDMA read: 
!REQ timeout\n");
}
d[c - 1] = NCR5380_read(INPUT_DATA_REG);
}
} else {
-   foo = NCR5380_dma_send_setup(instance, d, c);
-   if (!foo && (hostdata->flags & FLAG_DMA_FIXUP)) {
+   result = NCR5380_dma_send_setup(instance, d, c);
+   if (!result && (hostdata->flags & FLAG_DMA_FIXUP)) {
/*
 * Wait for the last byte to be sent.  If REQ is being 
asserted for
 * the byte we're interested, we'll ACK it and it will 
go false.
@@ -1545,7 +1545,7 @@ static int NCR5380_transfer_dma(struct S
if (NCR5380_poll_politely2(instance,
 BUS_AND_STATUS_REG, BASR_DRQ, BASR_DRQ,
 BUS_AND_STATUS_REG, BASR_PHASE_MATCH, 0, HZ) < 0) {
-   foo = -1;
+   result = -1;
shost_printk(KERN_ERR, instance, "PDMA write: 
DRQ and phase timeout\n");
}
}
@@ -1555,8 +1555,7 @@ static int NCR5380_transfer_dma(struct S
NCR5380_read(RESET_PARITY_INTERRUPT_REG);
*data = d + c;
*count = 0;
-   *phase = NCR5380_read(STATUS_REG) & PHASE_MASK;
-   return foo;
+   return result;
 }
 
 /*
@@ -1652,7 +1651,7 @@ static void NCR5380_information_transfer
if (!cmd->device->borken)
transfersize = 
NCR5380_dma_xfer_len(instance, cmd, phase);
 
-   if (transfersize) {
+   if (transfersize > 0) {
len = transfersize;
if (NCR5380_transfer_dma(instance, 
&phase,
&len, (unsigned char 
**)&cmd->SCp.ptr)) {
Index: linux/drivers/scsi/arm/cumana_1.c
===
--- linux.orig/drivers/scsi/arm/cuma

[PATCH v3 14/23] ncr5380: Reduce max_lun limit

2016-03-20 Thread Finn Thain
The driver has a limit of eight logical units because of the byte-sized
bitfield that is used for busy flags. That means the maximum LUN is 7,
whereas the Scsi_Host default for max_lun is 8.
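
A tiny standalone illustration of why one byte of busy flags per target caps
the LUN at 7; the array and loop are hypothetical but mirror the driver's
hostdata->busy[target] |= 1 << lun pattern.

#include <stdio.h>

int main(void)
{
        unsigned char busy[8] = { 0 };  /* one byte of LU busy flags per target */
        unsigned int target = 3, lun;

        for (lun = 0; lun <= 7; lun++)  /* LUN 8 would need a ninth bit */
                busy[target] |= 1u << lun;

        printf("busy[%u] = 0x%02x\n", target, busy[target]);   /* prints 0xff */
        return 0;
}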

Signed-off-by: Finn Thain 
Tested-by: Michael Schmitz 

---

Changed since v1:
- Reduce shost->max_lun limit instead of adding 'MAX_LUN' limit.

---
 drivers/scsi/NCR5380.c |2 ++
 1 file changed, 2 insertions(+)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-03-21 13:31:33.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:36.0 +1100
@@ -488,6 +488,8 @@ static int NCR5380_init(struct Scsi_Host
int i;
unsigned long deadline;
 
+   instance->max_lun = 7;
+
hostdata->host = instance;
hostdata->id_mask = 1 << instance->this_id;
hostdata->id_higher_mask = 0;




[PATCH v3 03/23] ncr5380: Remove REAL_DMA and REAL_DMA_POLL macros

2016-03-20 Thread Finn Thain
For the NCR5380.c core driver, these macros are never used.
If REAL_DMA were to be defined, compilation would fail.

For the atari_NCR5380.c core driver, REAL_DMA is always defined.

Hence these macros are pointless.

Signed-off-by: Finn Thain 
Reviewed-by: Hannes Reinecke 
Tested-by: Michael Schmitz 

---
 drivers/scsi/NCR5380.c   |  218 +--
 drivers/scsi/NCR5380.h   |  112 --
 drivers/scsi/atari_NCR5380.c |   62 +---
 drivers/scsi/atari_scsi.c|   32 --
 drivers/scsi/sun3_scsi.c |   13 --
 5 files changed, 22 insertions(+), 415 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-03-21 13:31:09.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:10.0 +1100
@@ -35,18 +35,10 @@
  * code so that everything does the same thing that's done at the
  * end of a pseudo-DMA read operation.
  *
- * 2.  Fix REAL_DMA (interrupt driven, polled works fine) -
- * basically, transfer size needs to be reduced by one
- * and the last byte read as is done with PSEUDO_DMA.
- *
  * 4.  Test SCSI-II tagged queueing (I have no devices which support
  * tagged queueing)
  */
 
-#ifndef notyet
-#undef REAL_DMA
-#endif
-
 #ifdef BOARD_REQUIRES_NO_DELAY
 #define io_recovery_delay(x)
 #else
@@ -131,12 +123,6 @@
  *
  * PSEUDO_DMA - if defined, PSEUDO DMA is used during the data transfer phases.
  *
- * REAL_DMA - if defined, REAL DMA is used during the data transfer phases.
- *
- * REAL_DMA_POLL - if defined, REAL DMA is used but the driver doesn't
- * rely on phase mismatch and EOP interrupts to determine end
- * of phase.
- *
  * These macros MUST be defined :
  *
  * NCR5380_read(register)  - read from the specified register
@@ -147,15 +133,9 @@
  * specific implementation of the NCR5380
  *
  * Either real DMA *or* pseudo DMA may be implemented
- * REAL functions :
- * NCR5380_REAL_DMA should be defined if real DMA is to be used.
  * Note that the DMA setup functions should return the number of bytes
  * that they were able to program the controller for.
  *
- * Also note that generic i386/PC versions of these macros are
- * available as NCR5380_i386_dma_write_setup,
- * NCR5380_i386_dma_read_setup, and NCR5380_i386_dma_residual.
- *
  * NCR5380_dma_write_setup(instance, src, count) - initialize
  * NCR5380_dma_read_setup(instance, dst, count) - initialize
  * NCR5380_dma_residual(instance); - residual count
@@ -486,12 +466,6 @@ static void prepare_info(struct Scsi_Hos
 #ifdef DIFFERENTIAL
 "DIFFERENTIAL "
 #endif
-#ifdef REAL_DMA
-"REAL_DMA "
-#endif
-#ifdef REAL_DMA_POLL
-"REAL_DMA_POLL "
-#endif
 #ifdef PARITY
 "PARITY "
 #endif
@@ -551,9 +525,8 @@ static int NCR5380_init(struct Scsi_Host
hostdata->id_higher_mask |= i;
for (i = 0; i < 8; ++i)
hostdata->busy[i] = 0;
-#ifdef REAL_DMA
-   hostdata->dmalen = 0;
-#endif
+   hostdata->dma_len = 0;
+
spin_lock_init(&hostdata->lock);
hostdata->connected = NULL;
hostdata->sensing = NULL;
@@ -850,11 +823,7 @@ static void NCR5380_main(struct work_str
requeue_cmd(instance, cmd);
}
}
-   if (hostdata->connected
-#ifdef REAL_DMA
-   && !hostdata->dmalen
-#endif
-   ) {
+   if (hostdata->connected && !hostdata->dma_len) {
dsprintk(NDEBUG_MAIN, instance, "main: performing 
information transfer\n");
NCR5380_information_transfer(instance);
done = 0;
@@ -919,34 +888,6 @@ static irqreturn_t NCR5380_intr(int irq,
dsprintk(NDEBUG_INTR, instance, "IRQ %d, BASR 0x%02x, SR 
0x%02x, MR 0x%02x\n",
 irq, basr, sr, mr);
 
-#if defined(REAL_DMA)
-   if ((mr & MR_DMA_MODE) || (mr & MR_MONITOR_BSY)) {
-   /* Probably End of DMA, Phase Mismatch or Loss of BSY.
-* We ack IRQ after clearing Mode Register. Workarounds
-* for End of DMA errata need to happen in DMA Mode.
-*/
-
-   dsprintk(NDEBUG_INTR, instance, "interrupt in DMA 
mode\n");
-
-   int transferred;
-
-   if (!hostdata->connected)
-   panic("scsi%d : DMA interrupt with no connected 
cmd\n",
- instance->hostno);
-
-   transferred = hostdata->dmalen - 
NCR5380_dma_residual(instance);
-   hostdata->connected->SCp.this_residual -= transferred;
-   hostdata->connected->SCp.ptr += transferred;
-   hostdata->dmalen = 0;
-
-   /* FIXME

[PATCH v3 12/23] sun3_scsi: Adopt NCR5380.c core driver

2016-03-20 Thread Finn Thain
Add support for the custom Sun 3 DMA logic to the NCR5380.c core driver.
This code is copied from atari_NCR5380.c.

Signed-off-by: Finn Thain 
Reviewed-by: Hannes Reinecke 
Tested-by: Michael Schmitz 

---

The Sun 3 DMA code is still configured by macros. I have simplified things
slightly but I have avoided more ambitious refactoring. It's not clear to
me what that should look like and I can't test sun3_scsi anyway. At least
this permits the removal of atari_NCR5380.c.

---
 drivers/scsi/NCR5380.c   |  131 +++
 drivers/scsi/sun3_scsi.c |8 +-
 2 files changed, 124 insertions(+), 15 deletions(-)

Index: linux/drivers/scsi/sun3_scsi.c
===
--- linux.orig/drivers/scsi/sun3_scsi.c 2016-03-21 13:31:13.0 +1100
+++ linux/drivers/scsi/sun3_scsi.c  2016-03-21 13:31:32.0 +1100
@@ -54,10 +54,8 @@
 #define NCR5380_abort   sun3scsi_abort
 #define NCR5380_infosun3scsi_info
 
-#define NCR5380_dma_read_setup(instance, data, count) \
-sun3scsi_dma_setup(instance, data, count, 0)
-#define NCR5380_dma_write_setup(instance, data, count) \
-sun3scsi_dma_setup(instance, data, count, 1)
+#define NCR5380_dma_recv_setup(instance, data, count) (count)
+#define NCR5380_dma_send_setup(instance, data, count) (count)
 #define NCR5380_dma_residual(instance) \
 sun3scsi_dma_residual(instance)
 #define NCR5380_dma_xfer_len(instance, cmd, phase) \
@@ -406,7 +404,7 @@ static int sun3scsi_dma_finish(int write
 
 }

-#include "atari_NCR5380.c"
+#include "NCR5380.c"
 
 #ifdef SUN3_SCSI_VME
 #define SUN3_SCSI_NAME  "Sun3 NCR5380 VME SCSI"
Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-03-21 13:31:31.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:32.0 +1100
@@ -31,6 +31,8 @@
 
 /* Ported to Atari by Roman Hodek and others. */
 
+/* Adapted for the Sun 3 by Sam Creasey. */
+
 /*
  * Further development / testing that should be done :
  *
@@ -858,6 +860,23 @@ static void NCR5380_dma_complete(struct
}
}
 
+#ifdef CONFIG_SUN3
+   if ((sun3scsi_dma_finish(rq_data_dir(hostdata->connected->request {
+   pr_err("scsi%d: overrun in UDC counter -- not prepared to deal 
with this!\n",
+  instance->host_no);
+   BUG();
+   }
+
+   if ((NCR5380_read(BUS_AND_STATUS_REG) & (BASR_PHASE_MATCH | BASR_ACK)) 
==
+   (BASR_PHASE_MATCH | BASR_ACK)) {
+   pr_err("scsi%d: BASR %02x\n", instance->host_no,
+  NCR5380_read(BUS_AND_STATUS_REG));
+   pr_err("scsi%d: bus stuck in data phase -- probably a single 
byte overrun!\n",
+  instance->host_no);
+   BUG();
+   }
+#endif
+
NCR5380_write(MODE_REG, MR_BASE);
NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE);
NCR5380_read(RESET_PARITY_INTERRUPT_REG);
@@ -981,10 +1000,16 @@ static irqreturn_t NCR5380_intr(int irq,
NCR5380_read(RESET_PARITY_INTERRUPT_REG);
 
dsprintk(NDEBUG_INTR, instance, "unknown interrupt\n");
+#ifdef SUN3_SCSI_VME
+   dregs->csr |= CSR_DMA_ENABLE;
+#endif
}
handled = 1;
} else {
shost_printk(KERN_NOTICE, instance, "interrupt without IRQ 
bit\n");
+#ifdef SUN3_SCSI_VME
+   dregs->csr |= CSR_DMA_ENABLE;
+#endif
}
 
spin_unlock_irqrestore(&hostdata->lock, flags);
@@ -1274,6 +1299,10 @@ static struct scsi_cmnd *NCR5380_select(
hostdata->connected = cmd;
hostdata->busy[cmd->device->id] |= 1 << cmd->device->lun;
 
+#ifdef SUN3_SCSI_VME
+   dregs->csr |= CSR_INTR;
+#endif
+
initialize_SCp(cmd);
 
cmd = NULL;
@@ -1557,6 +1586,11 @@ static int NCR5380_transfer_dma(struct S
dsprintk(NDEBUG_DMA, instance, "initializing DMA %s: length %d, address 
%p\n",
 (p & SR_IO) ? "receive" : "send", c, d);
 
+#ifdef CONFIG_SUN3
+   /* send start chain */
+   sun3scsi_dma_start(c, *data);
+#endif
+
NCR5380_write(TARGET_COMMAND_REG, PHASE_SR_TO_TCR(p));
NCR5380_write(MODE_REG, MR_BASE | MR_DMA_MODE | MR_MONITOR_BSY |
MR_ENABLE_EOP_INTR);
@@ -1577,6 +1611,7 @@ static int NCR5380_transfer_dma(struct S
 */
 
if (p & SR_IO) {
+   NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE);
NCR5380_io_delay(1);
NCR5380_write(START_DMA_INITIATOR_RECEIVE_REG, 0);
} else {
@@ -1587,6 +1622,13 @@ static int NCR5380_transfer_dma(struct S
NCR5380_io_delay(1);
}
 
+#ifdef CONFIG_SUN3
+#ifdef SUN3_SCSI_VME
+   dregs->csr |= CSR_DMA_ENABLE;
+#endif
+   sun3_dma_active

[PATCH v3 11/23] atari_scsi: Adopt NCR5380.c core driver

2016-03-20 Thread Finn Thain
Add support for the Atari ST DMA chip to the NCR5380.c core driver.
This code is copied from atari_NCR5380.c.

Signed-off-by: Finn Thain 
Reviewed-by: Hannes Reinecke 
Tested-by: Michael Schmitz 

---
 drivers/scsi/NCR5380.c|   32 
 drivers/scsi/atari_scsi.c |6 +++---
 2 files changed, 35 insertions(+), 3 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-03-21 13:31:27.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:31.0 +1100
@@ -29,6 +29,8 @@
  * Ronald van Cuijlenborg, Alan Cox and others.
  */
 
+/* Ported to Atari by Roman Hodek and others. */
+
 /*
  * Further development / testing that should be done :
  *
@@ -141,6 +143,14 @@
 #define NCR5380_io_delay(x)
 #endif
 
+#ifndef NCR5380_acquire_dma_irq
+#define NCR5380_acquire_dma_irq(x) (1)
+#endif
+
+#ifndef NCR5380_release_dma_irq
+#define NCR5380_release_dma_irq(x)
+#endif
+
 static int do_abort(struct Scsi_Host *);
 static void do_reset(struct Scsi_Host *);
 
@@ -658,6 +668,9 @@ static int NCR5380_queue_command(struct
 
cmd->result = 0;
 
+   if (!NCR5380_acquire_dma_irq(instance))
+   return SCSI_MLQUEUE_HOST_BUSY;
+
spin_lock_irqsave(&hostdata->lock, flags);
 
/*
@@ -682,6 +695,19 @@ static int NCR5380_queue_command(struct
return 0;
 }
 
+static inline void maybe_release_dma_irq(struct Scsi_Host *instance)
+{
+   struct NCR5380_hostdata *hostdata = shost_priv(instance);
+
+   /* Caller does the locking needed to set & test these data atomically */
+   if (list_empty(&hostdata->disconnected) &&
+   list_empty(&hostdata->unissued) &&
+   list_empty(&hostdata->autosense) &&
+   !hostdata->connected &&
+   !hostdata->selecting)
+   NCR5380_release_dma_irq(instance);
+}
+
 /**
  * dequeue_next_cmd - dequeue a command for processing
  * @instance: the scsi host instance
@@ -783,6 +809,7 @@ static void NCR5380_main(struct work_str
 
if (!NCR5380_select(instance, cmd)) {
dsprintk(NDEBUG_MAIN, instance, "main: select 
complete\n");
+   maybe_release_dma_irq(instance);
} else {
dsprintk(NDEBUG_MAIN | NDEBUG_QUEUES, instance,
 "main: select failed, returning %p to 
queue\n", cmd);
@@ -1828,6 +1855,8 @@ static void NCR5380_information_transfer
 
/* Enable reselect interrupts */
NCR5380_write(SELECT_ENABLE_REG, 
hostdata->id_mask);
+
+   maybe_release_dma_irq(instance);
return;
case MESSAGE_REJECT:
/* Accept message by clearing ACK */
@@ -1963,6 +1992,7 @@ static void NCR5380_information_transfer
hostdata->connected = NULL;
cmd->result = DID_ERROR << 16;
complete_cmd(instance, cmd);
+   maybe_release_dma_irq(instance);
NCR5380_write(SELECT_ENABLE_REG, 
hostdata->id_mask);
return;
}
@@ -2256,6 +2286,7 @@ out:
dsprintk(NDEBUG_ABORT, instance, "abort: successfully aborted 
%p\n", cmd);
 
queue_work(hostdata->work_q, &hostdata->main_task);
+   maybe_release_dma_irq(instance);
spin_unlock_irqrestore(&hostdata->lock, flags);
 
return result;
@@ -2336,6 +2367,7 @@ static int NCR5380_bus_reset(struct scsi
hostdata->dma_len = 0;
 
queue_work(hostdata->work_q, &hostdata->main_task);
+   maybe_release_dma_irq(instance);
spin_unlock_irqrestore(&hostdata->lock, flags);
 
return SUCCESS;
Index: linux/drivers/scsi/atari_scsi.c
===
--- linux.orig/drivers/scsi/atari_scsi.c2016-03-21 13:31:25.0 
+1100
+++ linux/drivers/scsi/atari_scsi.c 2016-03-21 13:31:31.0 +1100
@@ -99,9 +99,9 @@
 #define NCR5380_abort   atari_scsi_abort
 #define NCR5380_infoatari_scsi_info
 
-#define NCR5380_dma_read_setup(instance, data, count) \
+#define NCR5380_dma_recv_setup(instance, data, count) \
 atari_scsi_dma_setup(instance, data, count, 0)
-#define NCR5380_dma_write_setup(instance, data, count) \
+#define NCR5380_dma_send_setup(instance, data, count) \
 atari_scsi_dma_setup(instance, data, count, 1)
 #define NCR5380_dma_residual(instance) \
 atari_scsi_dma_residual(instance)
@@ -715,7 +715,7 @@ static void atari_scsi_f

[PATCH v3 01/23] g_ncr5380: Remove CONFIG_SCSI_GENERIC_NCR53C400

2016-03-20 Thread Finn Thain
This change brings a number of improvements: fewer macros, better test
coverage, simpler code and sane Kconfig options. The downside is a small
chance of incompatibility (which seems unavoidable).

CONFIG_SCSI_GENERIC_NCR53C400 exists to enable or inhibit pseudo DMA
transfers when the driver is used with 53C400-compatible cards. Thanks to
Ondrej Zary's patches, PDMA now works which means it can be enabled
unconditionally.

Due to bad design, CONFIG_SCSI_GENERIC_NCR53C400 ties together unrelated
functionality as it sets both PSEUDO_DMA and BIOSPARAM macros. This patch
effectively enables PSEUDO_DMA and disables BIOSPARAM.

The defconfigs and the Kconfig default leave CONFIG_SCSI_GENERIC_NCR53C400
undefined. Red Hat 9 and CentOS 2.1 were the same. This leaves both
PSEUDO_DMA and BIOSPARAM disabled. The effect of this patch should be
better performance from enabling PSEUDO_DMA.

On the other hand, Debian 4 and SLES 10 had CONFIG_SCSI_GENERIC_NCR53C400
enabled, so both PSEUDO_DMA and BIOSPARAM were enabled. This patch might
affect configurations like this by disabling BIOSPARAM. My best guess is
that this could be a problem only in the vanishingly rare case that
1) the CHS values stored in the boot device partition table are wrong and
2) a 5380 card is in use (because PDMA on 53C400 used to be broken).

Signed-off-by: Finn Thain 
Reviewed-by: Hannes Reinecke 

---

Here are the distro kernel versions I looked at:

CentOS 2.1:

$ strings 
kernel-2.4.9-e.40.i686/lib/modules/2.4.9-e.40/kernel/drivers/scsi/g_NCR5380.o | 
grep extension
NO NCR53C400 driver extensions


Red Hat 7:

$ strings 
kernel-2.4.18-3.i386/lib/modules/2.4.18-3/kernel/drivers/scsi/g_NCR5380.o | 
grep extension
NO NCR53C400 driver extensions


Red Hat 9:

$ strings 
kernel-2.4.20-8.i586/lib/modules/2.4.20-8/kernel/drivers/scsi/g_NCR5380.o | 
grep extension
NO NCR53C400 driver extensions


Debian 4:

$ strings 
linux-image-2.6.24-etchnhalf.1-486_2.6.24-6-etchnhalf.9etch3_i386/lib/modules/2.6.24-etchnhalf.1-486/kernel/drivers/scsi/g_NCR5380_mmio.ko
 | grep extension
NCR53C400 extension version %d
$ strings 
kernel-image-2.6.8-2-386_2.6.8-13_i386/lib/modules/2.6.8-2-386/kernel/drivers/scsi/g_NCR5380_mmio.ko
 | grep extension
NCR53C400 extension version %d


SLES 10.2:

$ strings 
kernel-default-2.6.18.2-34.i586/lib/modules/2.6.18.2-34-default/kernel/drivers/scsi/g_NCR5380_mmio.ko
 | grep extension
NCR53C400 extension version %d

---
 drivers/scsi/Kconfig |   11 --
 drivers/scsi/g_NCR5380.c |   75 ++-
 drivers/scsi/g_NCR5380.h |   16 +-
 3 files changed, 25 insertions(+), 77 deletions(-)

Index: linux/drivers/scsi/Kconfig
===
--- linux.orig/drivers/scsi/Kconfig 2016-03-21 13:31:07.0 +1100
+++ linux/drivers/scsi/Kconfig  2016-03-21 13:31:07.0 +1100
@@ -812,17 +812,6 @@ config SCSI_GENERIC_NCR5380_MMIO
  To compile this driver as a module, choose M here: the
  module will be called g_NCR5380_mmio.
 
-config SCSI_GENERIC_NCR53C400
-   bool "Enable NCR53c400 extensions"
-   depends on SCSI_GENERIC_NCR5380
-   help
- This enables certain optimizations for the NCR53c400 SCSI cards.
- You might as well try it out.  Note that this driver will only probe
- for the Trantor T130B in its default configuration; you might have
- to pass a command line option to the kernel at boot time if it does
- not detect your card.  See the file
-  for details.
-
 config SCSI_IPS
tristate "IBM ServeRAID support"
depends on PCI && SCSI
Index: linux/drivers/scsi/g_NCR5380.c
===
--- linux.orig/drivers/scsi/g_NCR5380.c 2016-03-21 13:31:07.0 +1100
+++ linux/drivers/scsi/g_NCR5380.c  2016-03-21 13:31:07.0 +1100
@@ -57,10 +57,7 @@
  */
 
 #define AUTOPROBE_IRQ
-
-#ifdef CONFIG_SCSI_GENERIC_NCR53C400
 #define PSEUDO_DMA
-#endif
 
 #include 
 #include 
@@ -270,7 +267,7 @@ static int __init generic_NCR5380_detect
 #ifndef SCSI_G_NCR5380_MEM
int i;
int port_idx = -1;
-   unsigned long region_size = 16;
+   unsigned long region_size;
 #endif
static unsigned int __initdata ncr_53c400a_ports[] = {
0x280, 0x290, 0x300, 0x310, 0x330, 0x340, 0x348, 0x350, 0
@@ -290,6 +287,7 @@ static int __init generic_NCR5380_detect
 #ifdef SCSI_G_NCR5380_MEM
unsigned long base;
void __iomem *iomem;
+   resource_size_t iomem_size;
 #endif
 
if (ncr_irq)
@@ -353,9 +351,7 @@ static int __init generic_NCR5380_detect
flags = FLAG_NO_PSEUDO_DMA;
break;
case BOARD_NCR53C400:
-#ifdef PSEUDO_DMA
flags = FLAG_NO_DMA_FIXUP;
-#endif
break;
case BOARD_NCR53C400A:
flags = FLAG_NO_DM

[PATCH v3 08/23] ncr5380: Use DMA hooks for PDMA

2016-03-20 Thread Finn Thain
Those wrapper drivers which use DMA define the REAL_DMA macro and
those which use pseudo DMA define PSEUDO_DMA. These macros need to be
removed for a number of reasons, not least of which is to have drivers
share more code.

Redefine the PDMA send and receive hooks as DMA setup hooks, so that the
DMA code can be shared by all 5380 wrapper drivers. This will help to
reunify the forked core driver.

Signed-off-by: Finn Thain 
Reviewed-by: Hannes Reinecke 
Tested-by: Michael Schmitz 

---
 drivers/scsi/NCR5380.c  |   10 ++
 drivers/scsi/arm/cumana_1.c |   10 ++
 drivers/scsi/arm/oak.c  |   10 ++
 drivers/scsi/dmx3191d.c |4 ++--
 drivers/scsi/dtc.c  |6 --
 drivers/scsi/dtc.h  |2 ++
 drivers/scsi/g_NCR5380.c|   10 ++
 drivers/scsi/g_NCR5380.h|4 ++--
 drivers/scsi/mac_scsi.c |5 ++---
 drivers/scsi/pas16.c|   14 --
 drivers/scsi/pas16.h|2 ++
 drivers/scsi/t128.c |   12 ++--
 drivers/scsi/t128.h |2 ++
 13 files changed, 50 insertions(+), 41 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-03-21 13:31:20.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:21.0 +1100
@@ -127,17 +127,11 @@
  * specific implementation of the NCR5380
  *
  * Either real DMA *or* pseudo DMA may be implemented
- * Note that the DMA setup functions should return the number of bytes
- * that they were able to program the controller for.
  *
  * NCR5380_dma_write_setup(instance, src, count) - initialize
  * NCR5380_dma_read_setup(instance, dst, count) - initialize
  * NCR5380_dma_residual(instance); - residual count
  *
- * PSEUDO functions :
- * NCR5380_pwrite(instance, src, count)
- * NCR5380_pread(instance, dst, count);
- *
  * The generic driver is initialized by calling NCR5380_init(instance),
  * after setting the appropriate host specific fields and ID.  If the
  * driver wishes to autoprobe for an IRQ line, the NCR5380_probe_irq(instance,
@@ -1511,7 +1505,7 @@ static int NCR5380_transfer_dma(struct S
  */
 
if (p & SR_IO) {
-   foo = NCR5380_pread(instance, d,
+   foo = NCR5380_dma_recv_setup(instance, d,
hostdata->flags & FLAG_DMA_FIXUP ? c - 1 : c);
if (!foo && (hostdata->flags & FLAG_DMA_FIXUP)) {
/*
@@ -1542,7 +1536,7 @@ static int NCR5380_transfer_dma(struct S
d[c - 1] = NCR5380_read(INPUT_DATA_REG);
}
} else {
-   foo = NCR5380_pwrite(instance, d, c);
+   foo = NCR5380_dma_send_setup(instance, d, c);
if (!foo && (hostdata->flags & FLAG_DMA_FIXUP)) {
/*
 * Wait for the last byte to be sent.  If REQ is being 
asserted for
Index: linux/drivers/scsi/arm/cumana_1.c
===
--- linux.orig/drivers/scsi/arm/cumana_1.c  2016-03-21 13:31:16.0 
+1100
+++ linux/drivers/scsi/arm/cumana_1.c   2016-03-21 13:31:21.0 +1100
@@ -18,6 +18,8 @@
 #define NCR5380_write(reg, value)  cumanascsi_write(instance, reg, value)
 
 #define NCR5380_dma_xfer_len(instance, cmd, phase) (cmd->transfersize)
+#define NCR5380_dma_recv_setup cumanascsi_pread
+#define NCR5380_dma_send_setup cumanascsi_pwrite
 
 #define NCR5380_intr   cumanascsi_intr
 #define NCR5380_queue_command  cumanascsi_queue_command
@@ -39,8 +41,8 @@ void cumanascsi_setup(char *str, int *in
 #define L(v)   (((v)<<16)|((v) & 0x))
 #define H(v)   (((v)>>16)|((v) & 0x))
 
-static inline int
-NCR5380_pwrite(struct Scsi_Host *host, unsigned char *addr, int len)
+static inline int cumanascsi_pwrite(struct Scsi_Host *host,
+unsigned char *addr, int len)
 {
   unsigned long *laddr;
   void __iomem *dma = priv(host)->dma + 0x2000;
@@ -102,8 +104,8 @@ end:
   return len;
 }
 
-static inline int
-NCR5380_pread(struct Scsi_Host *host, unsigned char *addr, int len)
+static inline int cumanascsi_pread(struct Scsi_Host *host,
+   unsigned char *addr, int len)
 {
   unsigned long *laddr;
   void __iomem *dma = priv(host)->dma + 0x2000;
Index: linux/drivers/scsi/arm/oak.c
===
--- linux.orig/drivers/scsi/arm/oak.c   2016-03-21 13:31:16.0 +1100
+++ linux/drivers/scsi/arm/oak.c2016-03-21 13:31:21.0 +1100
@@ -24,6 +24,8 @@
writeb(value, priv(instance)->base + ((reg) << 2))
 
 #define NCR5380_dma_xfer_len(instance, cmd, phase) (0)
+#define NCR5380_dma_recv_setup oakscsi_pread
+#define NCR5380_dma_send_setup oakscsi_pwrite
 
 #define NCR5380_queue_command  oakscsi_que

[PATCH v3 18/23] ncr5380: Remove DONT_USE_INTR and AUTOPROBE_IRQ macros

2016-03-20 Thread Finn Thain
Signed-off-by: Finn Thain 
Reviewed-by: Hannes Reinecke 
Tested-by: Michael Schmitz 

---
 drivers/scsi/NCR5380.c   |   12 +---
 drivers/scsi/NCR5380.h   |4 
 drivers/scsi/arm/oak.c   |2 --
 drivers/scsi/dmx3191d.c  |2 --
 drivers/scsi/dtc.c   |   12 +++-
 drivers/scsi/g_NCR5380.c |2 --
 drivers/scsi/pas16.c |1 -
 drivers/scsi/t128.c  |1 -
 8 files changed, 4 insertions(+), 32 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-03-21 13:31:39.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:40.0 +1100
@@ -106,9 +106,6 @@
  * DIFFERENTIAL - if defined, NCR53c81 chips will use external differential
  * transceivers.
  *
- * DONT_USE_INTR - if defined, never use interrupts, even if we probe or
- * override-configure an IRQ.
- *
  * PSEUDO_DMA - if defined, PSEUDO DMA is used during the data transfer phases.
  *
  * REAL_DMA - if defined, REAL DMA is used during the data transfer phases.
@@ -464,9 +461,6 @@ static void prepare_info(struct Scsi_Hos
 hostdata->flags & FLAG_DMA_FIXUP ? "DMA_FIXUP " : "",
 hostdata->flags & FLAG_NO_PSEUDO_DMA ? "NO_PSEUDO_DMA " : "",
 hostdata->flags & FLAG_TOSHIBA_DELAY ? "TOSHIBA_DELAY "  : "",
-#ifdef AUTOPROBE_IRQ
-"AUTOPROBE_IRQ "
-#endif
 #ifdef DIFFERENTIAL
 "DIFFERENTIAL "
 #endif
@@ -915,8 +909,6 @@ static void NCR5380_dma_complete(struct
}
 }
 
-#ifndef DONT_USE_INTR
-
 /**
  * NCR5380_intr - generic NCR5380 irq handler
  * @irq: interrupt number
@@ -951,7 +943,7 @@ static void NCR5380_dma_complete(struct
  * the Busy Monitor interrupt is enabled together with DMA Mode.
  */
 
-static irqreturn_t NCR5380_intr(int irq, void *dev_id)
+static irqreturn_t __maybe_unused NCR5380_intr(int irq, void *dev_id)
 {
struct Scsi_Host *instance = dev_id;
struct NCR5380_hostdata *hostdata = shost_priv(instance);
@@ -1020,8 +1012,6 @@ static irqreturn_t NCR5380_intr(int irq,
return IRQ_RETVAL(handled);
 }
 
-#endif
-
 /*
  * Function : int NCR5380_select(struct Scsi_Host *instance,
  * struct scsi_cmnd *cmd)
Index: linux/drivers/scsi/NCR5380.h
===
--- linux.orig/drivers/scsi/NCR5380.h   2016-03-21 13:31:33.0 +1100
+++ linux/drivers/scsi/NCR5380.h2016-03-21 13:31:40.0 +1100
@@ -280,16 +280,12 @@ static void NCR5380_print(struct Scsi_Ho
 #define NCR5380_dprint_phase(flg, arg) do {} while (0)
 #endif
 
-#if defined(AUTOPROBE_IRQ)
 static int NCR5380_probe_irq(struct Scsi_Host *instance, int possible);
-#endif
 static int NCR5380_init(struct Scsi_Host *instance, int flags);
 static int NCR5380_maybe_reset_bus(struct Scsi_Host *);
 static void NCR5380_exit(struct Scsi_Host *instance);
 static void NCR5380_information_transfer(struct Scsi_Host *instance);
-#ifndef DONT_USE_INTR
 static irqreturn_t NCR5380_intr(int irq, void *dev_id);
-#endif
 static void NCR5380_main(struct work_struct *work);
 static const char *NCR5380_info(struct Scsi_Host *instance);
 static void NCR5380_reselect(struct Scsi_Host *instance);
Index: linux/drivers/scsi/arm/oak.c
===
--- linux.orig/drivers/scsi/arm/oak.c   2016-03-21 13:31:27.0 +1100
+++ linux/drivers/scsi/arm/oak.c2016-03-21 13:31:40.0 +1100
@@ -14,8 +14,6 @@
 
 #include 
 
-#define DONT_USE_INTR
-
 #define priv(host) ((struct NCR5380_hostdata 
*)(host)->hostdata)
 
 #define NCR5380_read(reg) \
Index: linux/drivers/scsi/dmx3191d.c
===
--- linux.orig/drivers/scsi/dmx3191d.c  2016-03-21 13:31:37.0 +1100
+++ linux/drivers/scsi/dmx3191d.c   2016-03-21 13:31:40.0 +1100
@@ -34,8 +34,6 @@
  * Definitions for the generic 5380 driver.
  */
 
-#define DONT_USE_INTR
-
 #define NCR5380_read(reg)  inb(instance->io_port + reg)
 #define NCR5380_write(reg, value)  outb(value, instance->io_port + reg)
 
Index: linux/drivers/scsi/dtc.c
===
--- linux.orig/drivers/scsi/dtc.c   2016-03-21 13:31:27.0 +1100
+++ linux/drivers/scsi/dtc.c2016-03-21 13:31:40.0 +1100
@@ -1,5 +1,3 @@
-#define DONT_USE_INTR
-
 /*
  * DTC 3180/3280 driver, by
  * Ray Van Tassle  ra...@comm.mot.com
@@ -53,7 +51,6 @@
 #include 
 
 #include "dtc.h"
-#define AUTOPROBE_IRQ
 #include "NCR5380.h"
 
 /*
@@ -243,9 +240,10 @@ found:
if (instance->irq == 255)
instance->irq = NO_IRQ;
 
-#ifndef DONT_USE_INTR
/* With interrupts enabled, it will sometimes hang when doing 
heavy
 * reads. So better not enable them until I finger it out. */
+  

[PATCH v3 20/23] atari_scsi: Set a reasonable default for cmd_per_lun

2016-03-20 Thread Finn Thain
This setting does not need to be conditional on Atari ST or TT.

Signed-off-by: Finn Thain 
Tested-by: Michael Schmitz 

---

Changed since v1:
- Set the default cmd_per_lun to 4 based on test results.

Changed since v2:
- Revert the default cmd_per_lun to 2, like in the v1 patch, because
a uniform default across all ten 5380 wrapper drivers is worth more
than a tiny improvement in one particular microbenchmark on one system.
Michael tells me that 2 is also the best setting for his Atari Falcon.

---
 drivers/scsi/atari_scsi.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: linux/drivers/scsi/atari_scsi.c
===
--- linux.orig/drivers/scsi/atari_scsi.c2016-03-21 13:31:33.0 
+1100
+++ linux/drivers/scsi/atari_scsi.c 2016-03-21 13:31:43.0 +1100
@@ -752,6 +752,7 @@ static struct scsi_host_template atari_s
.eh_abort_handler   = atari_scsi_abort,
.eh_bus_reset_handler   = atari_scsi_bus_reset,
.this_id= 7,
+   .cmd_per_lun= 2,
.use_clustering = DISABLE_CLUSTERING,
.cmd_size   = NCR5380_CMD_SIZE,
 };
@@ -788,11 +789,9 @@ static int __init atari_scsi_probe(struc
 */
if (ATARIHW_PRESENT(TT_SCSI)) {
atari_scsi_template.can_queue= 16;
-   atari_scsi_template.cmd_per_lun  = 8;
atari_scsi_template.sg_tablesize = SG_ALL;
} else {
atari_scsi_template.can_queue= 8;
-   atari_scsi_template.cmd_per_lun  = 1;
atari_scsi_template.sg_tablesize = SG_NONE;
}
 




[PATCH v3 13/23] ncr5380: Remove disused atari_NCR5380.c core driver

2016-03-20 Thread Finn Thain
Now that atari_scsi and sun3_scsi have been converted to use the NCR5380.c
core driver, remove atari_NCR5380.c. Also remove the last vestiges of its
Tagged Command Queueing implementation from the wrapper drivers.

The TCQ support in atari_NCR5380.c is abandoned by this patch. It is not
merged into the remaining core driver because:

1) atari_scsi defines SUPPORT_TAGS but leaves FLAG_TAGGED_QUEUING disabled
by default, which indicates that it is mostly undesirable.

2) I'm told that it doesn't work correctly when enabled.

3) The algorithm does not make use of block layer tags which it will have
to do because scmd->tag is deprecated.

4) sun3_scsi doesn't define SUPPORT_TAGS at all, yet the SUPPORT_TAGS
macro interacts with the CONFIG_SUN3 macro in 'interesting' ways.

5) Compile-time configuration with macros like SUPPORT_TAGS caused the
configuration space to explode, leading to untestable and unmaintainable
code that is too hard to reason about.

The merge_contiguous_buffers() code is also abandoned. This was unused
by sun3_scsi. Only atari_scsi used it and then only on TT, because only TT
supports scatter/gather. I suspect that the TT would work fine with
ENABLE_CLUSTERING instead. If someone can benchmark the difference then
perhaps the merge_contiguous_buffers() code can be be justified. Until
then we are better off without the extra complexity.

Signed-off-by: Finn Thain 
Reviewed-by: Hannes Reinecke 
Tested-by: Michael Schmitz 

---
 drivers/scsi/NCR5380.c   |   22 
 drivers/scsi/NCR5380.h   |   19 
 drivers/scsi/atari_NCR5380.c | 2632 ---
 drivers/scsi/atari_scsi.c|   11 
 drivers/scsi/mac_scsi.c  |8 
 drivers/scsi/sun3_scsi.c |   11 
 6 files changed, 4 insertions(+), 2699 deletions(-)

Index: linux/drivers/scsi/atari_scsi.c
===
--- linux.orig/drivers/scsi/atari_scsi.c2016-03-21 13:31:31.0 
+1100
+++ linux/drivers/scsi/atari_scsi.c 2016-03-21 13:31:33.0 +1100
@@ -87,9 +87,6 @@
 
 /* Definitions for the core NCR5380 driver. */
 
-#define SUPPORT_TAGS
-#define MAX_TAGS32
-
 #define NCR5380_implementation_fields   /* none */
 
 #define NCR5380_read(reg)   atari_scsi_reg_read(reg)
@@ -189,8 +186,6 @@ static int setup_cmd_per_lun = -1;
 module_param(setup_cmd_per_lun, int, 0);
 static int setup_sg_tablesize = -1;
 module_param(setup_sg_tablesize, int, 0);
-static int setup_use_tagged_queuing = -1;
-module_param(setup_use_tagged_queuing, int, 0);
 static int setup_hostid = -1;
 module_param(setup_hostid, int, 0);
 static int setup_toshiba_delay = -1;
@@ -479,8 +474,7 @@ static int __init atari_scsi_setup(char
setup_sg_tablesize = ints[3];
if (ints[0] >= 4)
setup_hostid = ints[4];
-   if (ints[0] >= 5)
-   setup_use_tagged_queuing = ints[5];
+   /* ints[5] (use_tagged_queuing) is ignored */
/* ints[6] (use_pdma) is ignored */
if (ints[0] >= 7)
setup_toshiba_delay = ints[7];
@@ -853,9 +847,6 @@ static int __init atari_scsi_probe(struc
instance->irq = irq->start;
 
host_flags |= IS_A_TT() ? 0 : FLAG_LATE_DMA_SETUP;
-#ifdef SUPPORT_TAGS
-   host_flags |= setup_use_tagged_queuing > 0 ? FLAG_TAGGED_QUEUING : 0;
-#endif
host_flags |= setup_toshiba_delay > 0 ? FLAG_TOSHIBA_DELAY : 0;
 
error = NCR5380_init(instance, host_flags);
Index: linux/drivers/scsi/sun3_scsi.c
===
--- linux.orig/drivers/scsi/sun3_scsi.c 2016-03-21 13:31:32.0 +1100
+++ linux/drivers/scsi/sun3_scsi.c  2016-03-21 13:31:33.0 +1100
@@ -41,9 +41,6 @@
 
 /* Definitions for the core NCR5380 driver. */
 
-/* #define SUPPORT_TAGS */
-/* #define MAX_TAGS 32 */
-
 #define NCR5380_implementation_fields   /* none */
 
 #define NCR5380_read(reg)   sun3scsi_read(reg)
@@ -75,10 +72,6 @@ static int setup_cmd_per_lun = -1;
 module_param(setup_cmd_per_lun, int, 0);
 static int setup_sg_tablesize = -1;
 module_param(setup_sg_tablesize, int, 0);
-#ifdef SUPPORT_TAGS
-static int setup_use_tagged_queuing = -1;
-module_param(setup_use_tagged_queuing, int, 0);
-#endif
 static int setup_hostid = -1;
 module_param(setup_hostid, int, 0);
 
@@ -512,10 +505,6 @@ static int __init sun3_scsi_probe(struct
instance->io_port = (unsigned long)ioaddr;
instance->irq = irq->start;
 
-#ifdef SUPPORT_TAGS
-   host_flags |= setup_use_tagged_queuing > 0 ? FLAG_TAGGED_QUEUING : 0;
-#endif
-
error = NCR5380_init(instance, host_flags);
if (error)
goto fail_init;
Index: linux/drivers/scsi/mac_scsi.c
===
--- linux.orig/drivers/scsi/mac_scsi.c  2016-03-21 13:31:27.0 +1100
+++ linux/drivers/scsi/mac_scsi.c   2016-03-21 13:31:33.00

[PATCH v3 17/23] ncr5380: Remove remaining register storage qualifiers

2016-03-20 Thread Finn Thain
Signed-off-by: Finn Thain 
Reviewed-by: Hannes Reinecke 
Tested-by: Michael Schmitz 

---
 drivers/scsi/NCR5380.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-03-21 13:31:38.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:39.0 +1100
@@ -1555,9 +1555,9 @@ static int NCR5380_transfer_dma(struct S
unsigned char **data)
 {
struct NCR5380_hostdata *hostdata = shost_priv(instance);
-   register int c = *count;
-   register unsigned char p = *phase;
-   register unsigned char *d = *data;
+   int c = *count;
+   unsigned char p = *phase;
+   unsigned char *d = *data;
unsigned char tmp;
int result = 0;
 




[PATCH v3 15/23] dmx3191d: Drop max_sectors limit

2016-03-20 Thread Finn Thain
The dmx3191d driver is not capable of DMA or PDMA so all transfers
use PIO. Now that large slow PIO transfers periodically stop and call
cond_resched(), the max_sectors limit can go away.
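
(For context: the core-driver change this refers to, earlier in the series,
bounds each burst of PIO and reschedules in between. The function below is
only an illustrative sketch with a made-up helper, not the in-tree code.)

static void pio_transfer_chunked(struct Scsi_Host *instance,
                                 unsigned char *buf, int len)
{
	while (len > 0) {
		int chunk = min(len, 512);	/* bound each burst of slow PIO */

		pio_transfer_burst(instance, buf, chunk);	/* hypothetical helper */
		buf += chunk;
		len -= chunk;
		cond_resched();	/* let other tasks run between bursts */
	}
}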

Signed-off-by: Finn Thain 
Reviewed-by: Hannes Reinecke 

---
 drivers/scsi/dmx3191d.c |1 -
 1 file changed, 1 deletion(-)

Index: linux/drivers/scsi/dmx3191d.c
===
--- linux.orig/drivers/scsi/dmx3191d.c  2016-03-21 13:31:27.0 +1100
+++ linux/drivers/scsi/dmx3191d.c   2016-03-21 13:31:37.0 +1100
@@ -67,7 +67,6 @@ static struct scsi_host_template dmx3191
.cmd_per_lun= 2,
.use_clustering = DISABLE_CLUSTERING,
.cmd_size   = NCR5380_CMD_SIZE,
-   .max_sectors= 128,
 };
 
 static int dmx3191d_probe_one(struct pci_dev *pdev,




[PATCH v3 16/23] ncr5380: Fix register decoding for debugging

2016-03-20 Thread Finn Thain
Decode all bits in the chip registers. They are all useful at times.
Fix printk severity so that this output can be suppressed along with
the other debugging output.

Signed-off-by: Finn Thain 
Reviewed-by: Hannes Reinecke 
Tested-by: Michael Schmitz 

---
 drivers/scsi/NCR5380.c |   42 +-
 1 file changed, 25 insertions(+), 17 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-03-21 13:31:36.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:38.0 +1100
@@ -256,12 +256,20 @@ static struct {
{0, NULL}
 },
 basrs[] = {
+   {BASR_END_DMA_TRANSFER, "END OF DMA"},
+   {BASR_DRQ, "DRQ"},
+   {BASR_PARITY_ERROR, "PARITY ERROR"},
+   {BASR_IRQ, "IRQ"},
+   {BASR_PHASE_MATCH, "PHASE MATCH"},
+   {BASR_BUSY_ERROR, "BUSY ERROR"},
{BASR_ATN, "ATN"},
{BASR_ACK, "ACK"},
{0, NULL}
 },
 icrs[] = {
{ICR_ASSERT_RST, "ASSERT RST"},
+   {ICR_ARBITRATION_PROGRESS, "ARB. IN PROGRESS"},
+   {ICR_ARBITRATION_LOST, "LOST ARB."},
{ICR_ASSERT_ACK, "ASSERT ACK"},
{ICR_ASSERT_BSY, "ASSERT BSY"},
{ICR_ASSERT_SEL, "ASSERT SEL"},
@@ -270,14 +278,14 @@ icrs[] = {
{0, NULL}
 },
 mrs[] = {
-   {MR_BLOCK_DMA_MODE, "MODE BLOCK DMA"},
-   {MR_TARGET, "MODE TARGET"},
-   {MR_ENABLE_PAR_CHECK, "MODE PARITY CHECK"},
-   {MR_ENABLE_PAR_INTR, "MODE PARITY INTR"},
-   {MR_ENABLE_EOP_INTR, "MODE EOP INTR"},
-   {MR_MONITOR_BSY, "MODE MONITOR BSY"},
-   {MR_DMA_MODE, "MODE DMA"},
-   {MR_ARBITRATE, "MODE ARBITRATION"},
+   {MR_BLOCK_DMA_MODE, "BLOCK DMA MODE"},
+   {MR_TARGET, "TARGET"},
+   {MR_ENABLE_PAR_CHECK, "PARITY CHECK"},
+   {MR_ENABLE_PAR_INTR, "PARITY INTR"},
+   {MR_ENABLE_EOP_INTR, "EOP INTR"},
+   {MR_MONITOR_BSY, "MONITOR BSY"},
+   {MR_DMA_MODE, "DMA MODE"},
+   {MR_ARBITRATE, "ARBITRATE"},
{0, NULL}
 };
 
@@ -298,23 +306,23 @@ static void NCR5380_print(struct Scsi_Ho
icr = NCR5380_read(INITIATOR_COMMAND_REG);
basr = NCR5380_read(BUS_AND_STATUS_REG);
 
-   printk("STATUS_REG: %02x ", status);
+   printk(KERN_DEBUG "SR =   0x%02x : ", status);
for (i = 0; signals[i].mask; ++i)
if (status & signals[i].mask)
-   printk(",%s", signals[i].name);
-   printk("\nBASR: %02x ", basr);
+   printk(KERN_CONT "%s, ", signals[i].name);
+   printk(KERN_CONT "\nBASR = 0x%02x : ", basr);
for (i = 0; basrs[i].mask; ++i)
if (basr & basrs[i].mask)
-   printk(",%s", basrs[i].name);
-   printk("\nICR: %02x ", icr);
+   printk(KERN_CONT "%s, ", basrs[i].name);
+   printk(KERN_CONT "\nICR =  0x%02x : ", icr);
for (i = 0; icrs[i].mask; ++i)
if (icr & icrs[i].mask)
-   printk(",%s", icrs[i].name);
-   printk("\nMODE: %02x ", mr);
+   printk(KERN_CONT "%s, ", icrs[i].name);
+   printk(KERN_CONT "\nMR =   0x%02x : ", mr);
for (i = 0; mrs[i].mask; ++i)
if (mr & mrs[i].mask)
-   printk(",%s", mrs[i].name);
-   printk("\n");
+   printk(KERN_CONT "%s, ", mrs[i].name);
+   printk(KERN_CONT "\n");
 }
 
 static struct {




[PATCH v3 22/23] mac_scsi: Fix pseudo DMA implementation

2016-03-20 Thread Finn Thain
Fix various issues: Comments about bus errors are incorrect. The
PDMA asm must return the size of the memory access that faulted so the
transfer count can be adjusted accordingly. A phase change may cause a
bus error but should not be treated as failure. A bus error does not
always imply a phase change and generally the transfer may continue.
Scatter/gather doesn't seem to work with PDMA due to overruns. This is
a pity because peak throughput seems to double with SG_ALL.
Tested on a Mac LC III.
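
(Roughly, the shape of the fix: the copy routine reports how much data
actually moved, the shortfall is stored in the new pdma_residual field, and
the core driver picks it up through NCR5380_dma_residual(). The helper below
is hypothetical and only illustrates that flow; it is not the patch itself.)

static int macscsi_pread_sketch(struct Scsi_Host *instance,
                                unsigned char *dst, int len)
{
	struct NCR5380_hostdata *hostdata = shost_priv(instance);
	int done = pdma_copy_in(dst, len);	/* hypothetical: bytes moved before any fault */

	hostdata->pdma_residual = len - done;	/* core driver adjusts the transfer count */
	return 0;	/* a bus error is not treated as failure */
}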

Signed-off-by: Finn Thain 
Reviewed-by: Hannes Reinecke 

---

Changed since v1:
- Set the default cmd_per_lun to 4 based on test results.

Changed since v2:
- Revert the default cmd_per_lun to 2, like in the v1 patch, because
a uniform default across all ten 5380 wrapper drivers is worth more
than a tiny improvement in one particular microbenchmark on one system.
- Add 'reviewed-by' tag.

---
 drivers/scsi/NCR5380.h  |2 
 drivers/scsi/mac_scsi.c |  210 ++--
 2 files changed, 118 insertions(+), 94 deletions(-)

Index: linux/drivers/scsi/mac_scsi.c
===
--- linux.orig/drivers/scsi/mac_scsi.c  2016-03-21 13:31:33.0 +1100
+++ linux/drivers/scsi/mac_scsi.c   2016-03-21 13:31:45.0 +1100
@@ -28,7 +28,8 @@
 
 /* Definitions for the core NCR5380 driver. */
 
-#define NCR5380_implementation_fields   unsigned char *pdma_base
+#define NCR5380_implementation_fields   unsigned char *pdma_base; \
+int pdma_residual
 
 #define NCR5380_read(reg)   macscsi_read(instance, reg)
 #define NCR5380_write(reg, value)   macscsi_write(instance, reg, value)
@@ -37,7 +38,7 @@
 macscsi_dma_xfer_len(instance, cmd)
 #define NCR5380_dma_recv_setup  macscsi_pread
 #define NCR5380_dma_send_setup  macscsi_pwrite
-#define NCR5380_dma_residual(instance)  (0)
+#define NCR5380_dma_residual(instance)  (hostdata->pdma_residual)
 
 #define NCR5380_intrmacscsi_intr
 #define NCR5380_queue_command   macscsi_queue_command
@@ -104,18 +105,9 @@ static int __init mac_scsi_setup(char *s
 __setup("mac5380=", mac_scsi_setup);
 #endif /* !MODULE */
 
-/* 
-   Pseudo-DMA: (Ove Edlund)
-   The code attempts to catch bus errors that occur if one for example
-   "trips over the cable".
-   XXX: Since bus errors in the PDMA routines never happen on my 
-   computer, the bus error code is untested. 
-   If the code works as intended, a bus error results in Pseudo-DMA 
-   being disabled, meaning that the driver switches to slow handshake.
-   If bus errors are NOT extremely rare, this has to be changed. 
-*/
+/* Pseudo DMA asm originally by Ove Edlund */
 
-#define CP_IO_TO_MEM(s,d,len)  \
+#define CP_IO_TO_MEM(s,d,n)\
 __asm__ __volatile__   \
 ("cmp.w  #4,%2\n"  \
  "bls8f\n" \
@@ -152,61 +144,73 @@ __asm__ __volatile__  
\
  " 9: \n"  \
  ".section .fixup,\"ax\"\n"\
  ".even\n" \
- "90: moveq.l #1, %2\n"\
+ "91: moveq.l #1, %2\n"\
+ "jra 9b\n"\
+ "94: moveq.l #4, %2\n"\
  "jra 9b\n"\
  ".previous\n" \
  ".section __ex_table,\"a\"\n" \
  "   .align 4\n"   \
- "   .long  1b,90b\n"  \
- "   .long  3b,90b\n"  \
- "   .long 31b,90b\n"  \
- "   .long 32b,90b\n"  \
- "   .long 33b,90b\n"  \
- "   .long 34b,90b\n"  \
- "   .long 35b,90b\n"  \
- "   .long 36b,90b\n"  \
- "   .long 37b,90b\n"  \
- "   .long  5b,90b\n"  \
- "   .long  7b,90b\n"  \
+ "   .long  1b,91b\n"  \
+ "   .long  3b,94b\n"  \
+ "   .long 31b,94b\n"  \
+ "   .long 32b,94b\n"  \
+ "   .long 33b,94b\n"  \
+ "   .long 34b,94b\n"  \
+ "   .long 35b,94b\n"  \
+ "   .long 36b,94b\n"  \
+ "   .long 37b,94b\n"  \
+ "   .long  5b,94b\n"

[PATCH v3 23/23] ncr5380: Call complete_cmd() for disconnected commands on bus reset

2016-03-20 Thread Finn Thain
I'm told that some targets are liable to disconnect a REQUEST SENSE
command. Theoretically this would cause a command undergoing autosense to
be moved onto the disconnected list. The bus reset handler must call
complete_cmd() for these commands, otherwise the hostdata->sensing pointer
will not get cleared. That would cause autosense processing to stall and
a timeout or an incorrect scsi_eh_restore_cmnd() would eventually follow.
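
(The distinction matters because complete_cmd() also unwinds the autosense
state. A rough sketch of the relevant part, not a verbatim quote:)

static void complete_cmd(struct Scsi_Host *instance, struct scsi_cmnd *cmd)
{
	struct NCR5380_hostdata *hostdata = shost_priv(instance);

	if (hostdata->sensing == cmd) {
		/* autosense is over: put the original command back together */
		scsi_eh_restore_cmnd(cmd, &hostdata->ses);
		hostdata->sensing = NULL;
	}
	cmd->scsi_done(cmd);
}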

Signed-off-by: Finn Thain 

---
 drivers/scsi/NCR5380.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-03-21 13:31:40.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:47.0 +1100
@@ -2437,7 +2437,7 @@ static int NCR5380_bus_reset(struct scsi
struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd);
 
set_host_byte(cmd, DID_RESET);
-   cmd->scsi_done(cmd);
+   complete_cmd(instance, cmd);
}
INIT_LIST_HEAD(&hostdata->disconnected);
 




[PATCH v3 19/23] ncr5380: Update usage documentation

2016-03-20 Thread Finn Thain
Update kernel parameter documentation for atari_scsi, mac_scsi and
g_NCR5380 drivers. Remove duplication.

Signed-off-by: Finn Thain 
Reviewed-by: Hannes Reinecke 

---
 Documentation/scsi/g_NCR5380.txt   |   17 ++-
 Documentation/scsi/scsi-parameters.txt |   11 +++---
 drivers/scsi/g_NCR5380.c   |   36 -
 3 files changed, 16 insertions(+), 48 deletions(-)

Index: linux/Documentation/scsi/scsi-parameters.txt
===
--- linux.orig/Documentation/scsi/scsi-parameters.txt   2016-03-21 
13:31:06.0 +1100
+++ linux/Documentation/scsi/scsi-parameters.txt2016-03-21 
13:31:42.0 +1100
@@ -27,13 +27,15 @@ parameters may be changed at runtime by
aic79xx=[HW,SCSI]
See Documentation/scsi/aic79xx.txt.
 
-   atascsi=[HW,SCSI] Atari SCSI
+   atascsi=[HW,SCSI]
+   See drivers/scsi/atari_scsi.c.
 
BusLogic=   [HW,SCSI]
See drivers/scsi/BusLogic.c, comment before function
BusLogic_ParseDriverOptions().
 
dtc3181e=   [HW,SCSI]
+   See Documentation/scsi/g_NCR5380.txt.
 
eata=   [HW,SCSI]
 
@@ -51,8 +53,8 @@ parameters may be changed at runtime by
ips=[HW,SCSI] Adaptec / IBM ServeRAID controller
See header of drivers/scsi/ips.c.
 
-   mac5380=[HW,SCSI] Format:
-   

+   mac5380=[HW,SCSI]
+   See drivers/scsi/mac_scsi.c.
 
max_luns=   [SCSI] Maximum number of LUNs to probe.
Should be between 1 and 2^32-1.
@@ -65,10 +67,13 @@ parameters may be changed at runtime by
See header of drivers/scsi/NCR_D700.c.
 
ncr5380=[HW,SCSI]
+   See Documentation/scsi/g_NCR5380.txt.
 
ncr53c400=  [HW,SCSI]
+   See Documentation/scsi/g_NCR5380.txt.
 
ncr53c400a= [HW,SCSI]
+   See Documentation/scsi/g_NCR5380.txt.
 
ncr53c406a= [HW,SCSI]
 
Index: linux/Documentation/scsi/g_NCR5380.txt
===
--- linux.orig/Documentation/scsi/g_NCR5380.txt 2016-03-21 13:31:06.0 
+1100
+++ linux/Documentation/scsi/g_NCR5380.txt  2016-03-21 13:31:42.0 
+1100
@@ -23,11 +23,10 @@ supported by the driver.
 
 If the default configuration does not work for you, you can use the kernel
 command lines (eg using the lilo append command):
-   ncr5380=port,irq,dma
-   ncr53c400=port,irq
-or
-   ncr5380=base,irq,dma
-   ncr53c400=base,irq
+   ncr5380=addr,irq
+   ncr53c400=addr,irq
+   ncr53c400a=addr,irq
+   dtc3181e=addr,irq
 
 The driver does not probe for any addresses or ports other than those in
 the OVERRIDE or given to the kernel as above.
@@ -36,19 +35,17 @@ This driver provides some information on
 /proc/scsi/g_NCR5380/x where x is the scsi card number as detected at boot
 time. More info to come in the future.
 
-When NCR53c400 support is compiled in, BIOS parameters will be returned by
-the driver (the raw 5380 driver does not and I don't plan to fiddle with
-it!).
-
 This driver works as a module.
 When included as a module, parameters can be passed on the insmod/modprobe
 command line:
   ncr_irq=xx   the interrupt
   ncr_addr=xx  the port or base address (for port or memory
mapped, resp.)
-  ncr_dma=xx   the DMA
   ncr_5380=1   to set up for a NCR5380 board
   ncr_53c400=1 to set up for a NCR53C400 board
+  ncr_53c400a=1 to set up for a NCR53C400A board
+  dtc_3181e=1  to set up for a Domex Technology Corp 3181E board
+  hp_c2502=1   to set up for a Hewlett Packard C2502 board
 e.g.
 modprobe g_NCR5380 ncr_irq=5 ncr_addr=0x350 ncr_5380=1
   for a port mapped NCR5380 board or
Index: linux/drivers/scsi/g_NCR5380.c
===
--- linux.orig/drivers/scsi/g_NCR5380.c 2016-03-21 13:31:40.0 +1100
+++ linux/drivers/scsi/g_NCR5380.c  2016-03-21 13:31:42.0 +1100
@@ -18,42 +18,8 @@
  *
  * Added ISAPNP support for DTC436 adapters,
  * Thomas Sailer, sai...@ife.ee.ethz.ch
- */
-
-/* 
- * TODO : flesh out DMA support, find some one actually using this (I have
- * a memory mapped Trantor board that works fine)
- */
-
-/*
- * The card is detected and initialized in one of several ways : 
- * 1.  With command line overrides - NCR5380=port,irq may be 
- * used on the LILO command line to override the defaults.
- *
- * 2.  With the GENERIC_NCR5380_OVERRIDE compile time define.  This is 
- * specified as an array of address, irq, dma, board tuples.  Ie, for
- * one board at 0x350, IRQ5, no dma, I could say  
- * -DGENERIC_NCR5380_OVERRIDE={{0xcc000, 5,

[PATCH v3 02/23] ncr5380: Remove FLAG_NO_PSEUDO_DMA where possible

2016-03-20 Thread Finn Thain
Drivers that define PSEUDO_DMA also define NCR5380_dma_xfer_len.
The core driver must call NCR5380_dma_xfer_len which means
FLAG_NO_PSEUDO_DMA can be eradicated from the core driver.

dmx3191d doesn't define PSEUDO_DMA and has no use for FLAG_NO_PSEUDO_DMA,
so remove it there also.

Signed-off-by: Finn Thain 
Reviewed-by: Hannes Reinecke 
Tested-by: Michael Schmitz 

---
 drivers/scsi/NCR5380.c   |3 +--
 drivers/scsi/dmx3191d.c  |2 +-
 drivers/scsi/g_NCR5380.c |7 ++-
 drivers/scsi/g_NCR5380.h |2 +-
 drivers/scsi/mac_scsi.c  |   15 ++-
 5 files changed, 23 insertions(+), 6 deletions(-)

Index: linux/drivers/scsi/dmx3191d.c
===
--- linux.orig/drivers/scsi/dmx3191d.c  2016-03-21 13:31:07.0 +1100
+++ linux/drivers/scsi/dmx3191d.c   2016-03-21 13:31:09.0 +1100
@@ -93,7 +93,7 @@ static int dmx3191d_probe_one(struct pci
 */
shost->irq = NO_IRQ;
 
-   error = NCR5380_init(shost, FLAG_NO_PSEUDO_DMA);
+   error = NCR5380_init(shost, 0);
if (error)
goto out_host_put;
 
Index: linux/drivers/scsi/mac_scsi.c
===
--- linux.orig/drivers/scsi/mac_scsi.c  2016-03-21 13:31:07.0 +1100
+++ linux/drivers/scsi/mac_scsi.c   2016-03-21 13:31:09.0 +1100
@@ -37,7 +37,9 @@
 
 #define NCR5380_pread   macscsi_pread
 #define NCR5380_pwrite  macscsi_pwrite
-#define NCR5380_dma_xfer_len(instance, cmd, phase) (cmd->transfersize)
+
+#define NCR5380_dma_xfer_len(instance, cmd, phase) \
+macscsi_dma_xfer_len(instance, cmd)
 
 #define NCR5380_intrmacscsi_intr
 #define NCR5380_queue_command   macscsi_queue_command
@@ -303,6 +305,17 @@ static int macscsi_pwrite(struct Scsi_Ho
 }
 #endif
 
+static int macscsi_dma_xfer_len(struct Scsi_Host *instance,
+struct scsi_cmnd *cmd)
+{
+   struct NCR5380_hostdata *hostdata = shost_priv(instance);
+
+   if (hostdata->flags & FLAG_NO_PSEUDO_DMA)
+   return 0;
+
+   return cmd->transfersize;
+}
+
 #include "NCR5380.c"
 
 #define DRV_MODULE_NAME "mac_scsi"
Index: linux/drivers/scsi/g_NCR5380.c
===
--- linux.orig/drivers/scsi/g_NCR5380.c 2016-03-21 13:31:07.0 +1100
+++ linux/drivers/scsi/g_NCR5380.c  2016-03-21 13:31:09.0 +1100
@@ -712,10 +712,15 @@ static inline int NCR5380_pwrite(struct
return 0;
 }
 
-static int generic_NCR5380_dma_xfer_len(struct scsi_cmnd *cmd)
+static int generic_NCR5380_dma_xfer_len(struct Scsi_Host *instance,
+struct scsi_cmnd *cmd)
 {
+   struct NCR5380_hostdata *hostdata = shost_priv(instance);
int transfersize = cmd->transfersize;
 
+   if (hostdata->flags & FLAG_NO_PSEUDO_DMA)
+   return 0;
+
/* Limit transfers to 32K, for xx400 & xx406
 * pseudoDMA that transfers in 128 bytes blocks.
 */
Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-03-21 13:31:07.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:09.0 +1100
@@ -1833,8 +1833,7 @@ static void NCR5380_information_transfer
 
 #if defined(PSEUDO_DMA) || defined(REAL_DMA_POLL)
transfersize = 0;
-   if (!cmd->device->borken &&
-   !(hostdata->flags & FLAG_NO_PSEUDO_DMA))
+   if (!cmd->device->borken)
transfersize = 
NCR5380_dma_xfer_len(instance, cmd, phase);
 
if (transfersize) {
Index: linux/drivers/scsi/g_NCR5380.h
===
--- linux.orig/drivers/scsi/g_NCR5380.h 2016-03-21 13:31:07.0 +1100
+++ linux/drivers/scsi/g_NCR5380.h  2016-03-21 13:31:09.0 +1100
@@ -61,7 +61,7 @@
 #endif
 
 #define NCR5380_dma_xfer_len(instance, cmd, phase) \
-generic_NCR5380_dma_xfer_len(cmd)
+generic_NCR5380_dma_xfer_len(instance, cmd)
 
 #define NCR5380_intr generic_NCR5380_intr
 #define NCR5380_queue_command generic_NCR5380_queue_command




[PATCH v3 06/23] ncr5380: Remove PSEUDO_DMA macro

2016-03-20 Thread Finn Thain
For those wrapper drivers which only implement Programmed IO, have
NCR5380_dma_xfer_len() evaluate to zero. That allows PDMA to be easily
disabled at run-time and so the PSEUDO_DMA macro is no longer needed.

Also remove the spin counters used for debugging pseudo DMA drivers.
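
For a PIO-only wrapper this boils down to a definition along these lines, so
the core driver never attempts a DMA transfer:

#define NCR5380_dma_xfer_len(instance, cmd, phase)	(0)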

Signed-off-by: Finn Thain 
Reviewed-by: Hannes Reinecke 
Tested-by: Michael Schmitz 

---
 drivers/scsi/NCR5380.c  |   32 +---
 drivers/scsi/NCR5380.h  |4 
 drivers/scsi/arm/cumana_1.c |2 --
 drivers/scsi/arm/oak.c  |3 +--
 drivers/scsi/dmx3191d.c |4 
 drivers/scsi/dtc.c  |7 ---
 drivers/scsi/dtc.h  |2 --
 drivers/scsi/g_NCR5380.c|1 -
 drivers/scsi/g_NCR5380.h|1 -
 drivers/scsi/mac_scsi.c |   10 --
 drivers/scsi/pas16.c|   10 --
 drivers/scsi/pas16.h|2 --
 drivers/scsi/t128.c |4 
 drivers/scsi/t128.h |2 --
 14 files changed, 6 insertions(+), 78 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-03-21 13:31:14.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:16.0 +1100
@@ -469,34 +469,9 @@ static void prepare_info(struct Scsi_Hos
 #ifdef PARITY
 "PARITY "
 #endif
-#ifdef PSEUDO_DMA
-"PSEUDO_DMA "
-#endif
 "");
 }
 
-#ifdef PSEUDO_DMA
-static int __maybe_unused NCR5380_write_info(struct Scsi_Host *instance,
-   char *buffer, int length)
-{
-   struct NCR5380_hostdata *hostdata = shost_priv(instance);
-
-   hostdata->spin_max_r = 0;
-   hostdata->spin_max_w = 0;
-   return 0;
-}
-
-static int __maybe_unused NCR5380_show_info(struct seq_file *m,
-struct Scsi_Host *instance)
-{
-   struct NCR5380_hostdata *hostdata = shost_priv(instance);
-
-   seq_printf(m, "Highwater I/O busy spin counts: write %d, read %d\n",
-   hostdata->spin_max_w, hostdata->spin_max_r);
-   return 0;
-}
-#endif
-
 /**
  * NCR5380_init - initialise an NCR5380
  * @instance: adapter to configure
@@ -1436,7 +1411,6 @@ timeout:
return -1;
 }
 
-#if defined(PSEUDO_DMA)
 /*
  * Function : int NCR5380_transfer_dma (struct Scsi_Host *instance,
  * unsigned char *phase, int *count, unsigned char **data)
@@ -1592,7 +1566,6 @@ static int NCR5380_transfer_dma(struct S
*phase = NCR5380_read(STATUS_REG) & PHASE_MASK;
return foo;
 }
-#endif /* PSEUDO_DMA */
 
 /*
  * Function : NCR5380_information_transfer (struct Scsi_Host *instance)
@@ -1683,7 +1656,6 @@ static void NCR5380_information_transfer
 * in an unconditional loop.
 */
 
-#if defined(PSEUDO_DMA)
transfersize = 0;
if (!cmd->device->borken)
transfersize = 
NCR5380_dma_xfer_len(instance, cmd, phase);
@@ -1706,9 +1678,7 @@ static void NCR5380_information_transfer
/* XXX - need to source or sink 
data here, as appropriate */
} else
cmd->SCp.this_residual -= 
transfersize - len;
-   } else
-#endif /* PSEUDO_DMA */
-   {
+   } else {
/* Break up transfer into 3 ms chunks,
 * presuming 6 accesses per handshake.
 */
Index: linux/drivers/scsi/NCR5380.h
===
--- linux.orig/drivers/scsi/NCR5380.h   2016-03-21 13:31:14.0 +1100
+++ linux/drivers/scsi/NCR5380.h2016-03-21 13:31:16.0 +1100
@@ -257,10 +257,6 @@ struct NCR5380_hostdata {
 #ifdef SUPPORT_TAGS
struct tag_alloc TagAlloc[8][8];/* 8 targets and 8 LUNs */
 #endif
-#ifdef PSEUDO_DMA
-   unsigned spin_max_r;
-   unsigned spin_max_w;
-#endif
struct workqueue_struct *work_q;
unsigned long accesses_per_ms;  /* chip register accesses per ms */
 };
Index: linux/drivers/scsi/arm/cumana_1.c
===
--- linux.orig/drivers/scsi/arm/cumana_1.c  2016-03-21 13:31:14.0 
+1100
+++ linux/drivers/scsi/arm/cumana_1.c   2016-03-21 13:31:16.0 +1100
@@ -13,8 +13,6 @@
 
 #include 
 
-#define PSEUDO_DMA
-
 #define priv(host) ((struct NCR5380_hostdata 
*)(host)->hostdata)
 #define NCR5380_read(reg)  cumanascsi_read(instance, reg)
 #define NCR5380_write(reg, value)  cumanascsi_write(instance, reg, value)
Index: linux/drivers/scsi/arm/oak.c

[PATCH v3 21/23] atari_scsi: Allow can_queue to be increased for Falcon

2016-03-20 Thread Finn Thain
The benefit of limiting can_queue to 1 is that atari_scsi shares the
ST DMA chip more fairly with other drivers (e.g. falcon-ide).

Unfortunately, this can limit SCSI bus utilization. On systems without
IDE, atari_scsi should issue SCSI commands whenever it can arbitrate for
the bus. Make that possible by making can_queue configurable.
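
(On an IDE-less system the limit can then be raised at module load time, for
example with "modprobe atari_scsi setup_can_queue=16"; the parameter name is
assumed here from the driver's existing setup_* convention and is given only
as an illustration.)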

Signed-off-by: Finn Thain 
Reviewed-by: Hannes Reinecke 
Tested-by: Michael Schmitz 

---
 drivers/scsi/atari_scsi.c |   83 --
 1 file changed, 22 insertions(+), 61 deletions(-)

Index: linux/drivers/scsi/atari_scsi.c
===
--- linux.orig/drivers/scsi/atari_scsi.c2016-03-21 13:31:43.0 
+1100
+++ linux/drivers/scsi/atari_scsi.c 2016-03-21 13:31:44.0 +1100
@@ -14,55 +14,23 @@
  *
  */
 
-
-/**/
-/**/
-/* Notes for Falcon SCSI: */
-/* -- */
-/**/
-/* Since the Falcon SCSI uses the ST-DMA chip, that is shared among   */
-/* several device drivers, locking and unlocking the access to this   */
-/* chip is required. But locking is not possible from an interrupt,   */
-/* since it puts the process to sleep if the lock is not available.   */
-/* This prevents "late" locking of the DMA chip, i.e. locking it just */
-/* before using it, since in case of disconnection-reconnection   */
-/* commands, the DMA is started from the reselection interrupt.   */
-/**/
-/* Two possible schemes for ST-DMA-locking would be:  */
-/*  1) The lock is taken for each command separately and disconnecting*/
-/* is forbidden (i.e. can_queue = 1). */
-/*  2) The DMA chip is locked when the first command comes in and */
-/* released when the last command is finished and all queues are  */
-/* empty. */
-/* The first alternative would result in bad performance, since the   */
-/* interleaving of commands would not be used. The second is unfair to*/
-/* other drivers using the ST-DMA, because the queues will seldom be  */
-/* totally empty if there is a lot of disk traffic.   */
-/**/
-/* For this reasons I decided to employ a more elaborate scheme:  */
-/*  - First, we give up the lock every time we can (for fairness), this*/
-/*means every time a command finishes and there are no other commands */
-/*on the disconnected queue.  */
-/*  - If there are others waiting to lock the DMA chip, we stop   */
-/*issuing commands, i.e. moving them onto the issue queue.   */
-/*Because of that, the disconnected queue will run empty in a */
-/*while. Instead we go to sleep on a 'fairness_queue'.*/
-/*  - If the lock is released, all processes waiting on the fairness  */
-/*queue will be woken. The first of them tries to re-lock the DMA, */
-/*the others wait for the first to finish this task. After that,  */
-/*they can all run on and do their commands...*/
-/* This sounds complicated (and it is it :-(), but it seems to be a   */
-/* good compromise between fairness and performance: As long as no one */
-/* else wants to work with the ST-DMA chip, SCSI can go along as  */
-/* usual. If now someone else comes, this behaviour is changed to a   */
-/* "fairness mode": just already initiated commands are finished and  */
-/* then the lock is released. The other one waiting will probably win */
-/* the race for locking the DMA, since it was waiting for longer. And */
-/* after it has finished, SCSI can go ahead again. Finally: I hope I  */
-/* have not produced any deadlock possibilities!  */
-/**/
-/**/
-
+/*
+ * Notes for Falcon SCSI DMA
+ *
+ * The 5380 device is one of several that all share the DMA chip. Hence
+ * "locking" and "unlocking" access to this chip is required.
+ *
+ * Two possible schemes for ST DMA acquisition by atari_scsi are:
+ * 1) The lock is taken for each command separately (i.e. can_queue == 1).
+ * 2) The lock is taken when the first command arrives and released
+ * when the last command is finished (i.e. can_queue > 1).
+ *
+ * The firs

[PATCH v3 04/23] atari_NCR5380: Remove DMA_MIN_SIZE macro

2016-03-20 Thread Finn Thain
Only the atari_scsi and sun3_scsi drivers define DMA_MIN_SIZE.
Both drivers also define NCR5380_dma_xfer_len, which means
DMA_MIN_SIZE can be removed from the core driver.

This removes another discrepancy between the two core drivers.

Signed-off-by: Finn Thain 
Tested-by: Michael Schmitz 

---

Changes since v1:
- Retain MIN_DMA_SIZE macro in wrapper drivers.

---
 drivers/scsi/atari_NCR5380.c |   16 
 drivers/scsi/atari_scsi.c|6 +-
 drivers/scsi/sun3_scsi.c |   19 +--
 3 files changed, 22 insertions(+), 19 deletions(-)

Index: linux/drivers/scsi/atari_NCR5380.c
===
--- linux.orig/drivers/scsi/atari_NCR5380.c 2016-03-21 13:31:10.0 
+1100
+++ linux/drivers/scsi/atari_NCR5380.c  2016-03-21 13:31:13.0 +1100
@@ -1857,12 +1857,11 @@ static void NCR5380_information_transfer
d = cmd->SCp.ptr;
}
/* this command setup for dma yet? */
-   if ((count >= DMA_MIN_SIZE) && 
(sun3_dma_setup_done != cmd)) {
-   if (cmd->request->cmd_type == 
REQ_TYPE_FS) {
-   sun3scsi_dma_setup(instance, d, 
count,
-  
rq_data_dir(cmd->request));
-   sun3_dma_setup_done = cmd;
-   }
+   if (sun3_dma_setup_done != cmd &&
+   sun3scsi_dma_xfer_len(count, cmd) > 0) {
+   sun3scsi_dma_setup(instance, d, count,
+  
rq_data_dir(cmd->request));
+   sun3_dma_setup_done = cmd;
}
 #ifdef SUN3_SCSI_VME
dregs->csr |= CSR_INTR;
@@ -1927,7 +1926,7 @@ static void NCR5380_information_transfer
 #endif
transfersize = 
NCR5380_dma_xfer_len(instance, cmd, phase);
 
-   if (transfersize >= DMA_MIN_SIZE) {
+   if (transfersize > 0) {
len = transfersize;
cmd->SCp.phase = phase;
if (NCR5380_transfer_dma(instance, 
&phase,
@@ -2366,7 +2365,8 @@ static void NCR5380_reselect(struct Scsi
d = tmp->SCp.ptr;
}
/* setup this command for dma if not already */
-   if ((count >= DMA_MIN_SIZE) && (sun3_dma_setup_done != tmp)) {
+   if (sun3_dma_setup_done != tmp &&
+   sun3scsi_dma_xfer_len(count, tmp) > 0) {
sun3scsi_dma_setup(instance, d, count,
   rq_data_dir(tmp->request));
sun3_dma_setup_done = tmp;
Index: linux/drivers/scsi/atari_scsi.c
===
--- linux.orig/drivers/scsi/atari_scsi.c2016-03-21 13:31:10.0 
+1100
+++ linux/drivers/scsi/atari_scsi.c 2016-03-21 13:31:13.0 +1100
@@ -83,11 +83,12 @@
 
 #include 
 
+#define DMA_MIN_SIZE32
+
 /* Definitions for the core NCR5380 driver. */
 
 #define SUPPORT_TAGS
 #define MAX_TAGS32
-#define DMA_MIN_SIZE32
 
 #define NCR5380_implementation_fields   /* none */
 
@@ -605,6 +606,9 @@ static unsigned long atari_dma_xfer_len(
 {
unsigned long   possible_len, limit;
 
+   if (wanted_len < DMA_MIN_SIZE)
+   return 0;
+
if (IS_A_TT())
/* TT SCSI DMA can transfer arbitrary #bytes */
return wanted_len;
Index: linux/drivers/scsi/sun3_scsi.c
===
--- linux.orig/drivers/scsi/sun3_scsi.c 2016-03-21 13:31:10.0 +1100
+++ linux/drivers/scsi/sun3_scsi.c  2016-03-21 13:31:13.0 +1100
@@ -36,12 +36,12 @@
 #include 
 #include "sun3_scsi.h"
 
-/* Definitions for the core NCR5380 driver. */
-
-/* #define SUPPORT_TAGS */
 /* minimum number of bytes to do dma on */
 #define DMA_MIN_SIZE129
 
+/* Definitions for the core NCR5380 driver. */
+
+/* #define SUPPORT_TAGS */
 /* #define MAX_TAGS 32 */
 
 #define NCR5380_implementation_fields   /* none */
@@ -61,7 +61,7 @@
 #define NCR5380_dma_residual(instance) \
 sun3scsi_dma_residual(instance)
 #define NCR5380_dma_xfer_len(instance, cmd, phase) \
-sun3scsi_dma_xfer_len(cmd->SCp.this_residual, cmd, !((phase) & SR_IO))
+sun3scsi_dma_xfer_len(cmd->SCp.this_residual, cmd)
 
 #define NCR5380_acquire_dma_irq(instance)(1)
 #define N

[PATCH v3 00/23] ncr5380: Eliminate macros, reduce code duplication, fix bugs etc

2016-03-20 Thread Finn Thain

This patch series has more macro elimination and some tweaks to the
DMA hooks so that all the wrapper drivers can share the same core
DMA algorithm. This resolves the major discrepancies between the two
core drivers, which relate to code conditional on the REAL_DMA and
PSEUDO_DMA macros.

After all the wrapper drivers agree on the DMA hook api, the core driver
fork gets resolved. NCR5380.c is adopted by atari_scsi and sun3_scsi and
atari_NCR5380.c is then deleted.

Historically, the 5380 drivers suffered from over-use of conditional
compilation, which caused the compile-time configuration space to explode,
leading to core driver code that was practically untestable, unmaintainable
and difficult to reason about. It also prevented driver modules from
sharing object code.

Along with REAL_DMA, REAL_DMA_POLL and PSEUDO_DMA, most of the remaining
macros are also eradicated, such as CONFIG_SCSI_GENERIC_NCR53C400,
SUPPORT_TAGS, DONT_USE_INTR, AUTOPROBE_IRQ and BIOSPARAM.

Also in this patch series, some duplicated documentation is removed and
the PDMA implementation in mac_scsi finally gets fixed.

This patch series was tested by exercising the dmx3191d and mac_scsi modules
on suitable hardware. Michael has tested atari_scsi on an Atari Falcon.
Help with driver testing on ISA cards is sought as I don't have such
hardware. Likewise RiscPC ecards and Sun 3.

Changes since v1:
- Patch 4: don't remove MIN_DMA_SIZE macro from wrapper drivers.
- Patch 9: improve commit log entry and add 'Reviewed-by' tag.
- Patch 14: reduce shost->max_lun limit instead of adding MAX_LUN limit.
- Patches 20 and 22: set the default cmd_per_lun to 4.
- For the rest: add 'Reviewed-by' tag.

Changes since v2:
- Patches 20 and 22: revert the default cmd_per_lun to 2, like the v1 patch
series.
- Add patch 23 to fix a theoretical bus reset/autosense issue.

---
 Documentation/scsi/g_NCR5380.txt   |   17 
 Documentation/scsi/scsi-parameters.txt |   11 
 drivers/scsi/Kconfig   |   11 
 drivers/scsi/NCR5380.c |  659 
 drivers/scsi/NCR5380.h |  143 -
 drivers/scsi/arm/cumana_1.c|   25 
 drivers/scsi/arm/oak.c |   22 
 drivers/scsi/atari_NCR5380.c   | 2676 -
 drivers/scsi/atari_scsi.c  |  144 -
 drivers/scsi/dmx3191d.c|   10 
 drivers/scsi/dtc.c |   27 
 drivers/scsi/dtc.h |7 
 drivers/scsi/g_NCR5380.c   |  143 -
 drivers/scsi/g_NCR5380.h   |   26 
 drivers/scsi/mac_scsi.c|  239 +-
 drivers/scsi/pas16.c   |   27 
 drivers/scsi/pas16.h   |5 
 drivers/scsi/sun3_scsi.c   |   47 
 drivers/scsi/t128.c|   19 
 drivers/scsi/t128.h|7 
 20 files changed, 634 insertions(+), 3631 deletions(-)






[PATCH v3 05/23] ncr5380: Disable the DMA errata workaround flag by default

2016-03-20 Thread Finn Thain
The only chip that needs the workarounds enabled is an early NMOS
device. That means that the common case is to disable them.

Unfortunately the sense of the flag is such that it has to be set
for the common case.

Rename the flag so that zero can be used to mean "no errata workarounds
needed". This simplifies the code.

Signed-off-by: Finn Thain 
Reviewed-by: Hannes Reinecke 
Tested-by: Michael Schmitz 

---
 drivers/scsi/NCR5380.c  |   14 +++---
 drivers/scsi/NCR5380.h  |2 +-
 drivers/scsi/arm/cumana_1.c |2 +-
 drivers/scsi/arm/oak.c  |2 +-
 drivers/scsi/dtc.c  |2 +-
 drivers/scsi/g_NCR5380.c|8 +---
 drivers/scsi/pas16.c|2 +-
 drivers/scsi/t128.c |2 +-
 8 files changed, 14 insertions(+), 20 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-03-21 13:31:10.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-03-21 13:31:14.0 +1100
@@ -457,7 +457,7 @@ static void prepare_info(struct Scsi_Hos
 instance->base, instance->irq,
 instance->can_queue, instance->cmd_per_lun,
 instance->sg_tablesize, instance->this_id,
-hostdata->flags & FLAG_NO_DMA_FIXUP  ? "NO_DMA_FIXUP "  : "",
+hostdata->flags & FLAG_DMA_FIXUP ? "DMA_FIXUP " : "",
 hostdata->flags & FLAG_NO_PSEUDO_DMA ? "NO_PSEUDO_DMA " : "",
 hostdata->flags & FLAG_TOSHIBA_DELAY ? "TOSHIBA_DELAY "  : "",
 #ifdef AUTOPROBE_IRQ
@@ -1480,11 +1480,11 @@ static int NCR5380_transfer_dma(struct S
 * before the setting of DMA mode to after transfer of the last byte.
 */
 
-   if (hostdata->flags & FLAG_NO_DMA_FIXUP)
+   if (hostdata->flags & FLAG_DMA_FIXUP)
+   NCR5380_write(MODE_REG, MR_BASE | MR_DMA_MODE | MR_MONITOR_BSY);
+   else
NCR5380_write(MODE_REG, MR_BASE | MR_DMA_MODE | MR_MONITOR_BSY |
MR_ENABLE_EOP_INTR);
-   else
-   NCR5380_write(MODE_REG, MR_BASE | MR_DMA_MODE | MR_MONITOR_BSY);
 
dprintk(NDEBUG_DMA, "scsi%d : mode reg = 0x%X\n", instance->host_no, 
NCR5380_read(MODE_REG));
 
@@ -1540,8 +1540,8 @@ static int NCR5380_transfer_dma(struct S
 
if (p & SR_IO) {
foo = NCR5380_pread(instance, d,
-   hostdata->flags & FLAG_NO_DMA_FIXUP ? c : c - 1);
-   if (!foo && !(hostdata->flags & FLAG_NO_DMA_FIXUP)) {
+   hostdata->flags & FLAG_DMA_FIXUP ? c - 1 : c);
+   if (!foo && (hostdata->flags & FLAG_DMA_FIXUP)) {
/*
 * The workaround was to transfer fewer bytes than we
 * intended to with the pseudo-DMA read function, wait 
for
@@ -1571,7 +1571,7 @@ static int NCR5380_transfer_dma(struct S
}
} else {
foo = NCR5380_pwrite(instance, d, c);
-   if (!foo && !(hostdata->flags & FLAG_NO_DMA_FIXUP)) {
+   if (!foo && (hostdata->flags & FLAG_DMA_FIXUP)) {
/*
 * Wait for the last byte to be sent.  If REQ is being 
asserted for
 * the byte we're interested, we'll ACK it and it will 
go false.
Index: linux/drivers/scsi/NCR5380.h
===
--- linux.orig/drivers/scsi/NCR5380.h   2016-03-21 13:31:10.0 +1100
+++ linux/drivers/scsi/NCR5380.h2016-03-21 13:31:14.0 +1100
@@ -220,7 +220,7 @@
 #define NO_IRQ 0
 #endif
 
-#define FLAG_NO_DMA_FIXUP  1   /* No DMA errata workarounds */
+#define FLAG_DMA_FIXUP 1   /* Use DMA errata workarounds */
 #define FLAG_NO_PSEUDO_DMA 8   /* Inhibit DMA */
 #define FLAG_LATE_DMA_SETUP32  /* Setup NCR before DMA H/W */
 #define FLAG_TAGGED_QUEUING64  /* as X3T9.2 spelled it */
Index: linux/drivers/scsi/dtc.c
===
--- linux.orig/drivers/scsi/dtc.c   2016-03-21 13:31:07.0 +1100
+++ linux/drivers/scsi/dtc.c2016-03-21 13:31:14.0 +1100
@@ -229,7 +229,7 @@ found:
instance->base = addr;
((struct NCR5380_hostdata *)(instance)->hostdata)->base = base;
 
-   if (NCR5380_init(instance, FLAG_NO_DMA_FIXUP))
+   if (NCR5380_init(instance, 0))
goto out_unregister;
 
NCR5380_maybe_reset_bus(instance);
Index: linux/drivers/scsi/g_NCR5380.c
===
--- linux.orig/drivers/scsi/g_NCR5380.c 2016-03-21 13:31:09.0 +1100
+++ linux/drivers/scsi/g_NCR5380.c  2016-03-21 13:31:14.0 +1100
@@ -348,23 +348,17 @@ static in

Re: [PATCH 0/2] ARM: uniphier: UniPhier updates for Linux 4.6-rc1 (2nd round)

2016-03-20 Thread Masahiro Yamada
Hi Arnd,


2016-03-19 8:49 GMT+09:00 Masahiro Yamada :
> Hi Arnd,
>
> 2016-03-19 1:49 GMT+09:00 Arnd Bergmann :
>> On Tuesday 15 March 2016 11:01:00 Masahiro Yamada wrote:
>>> Olof, Arnd,
>>>
>>>
>>> I sent my patches around -rc4 and
>>> took action soon as requested.
>>>
>>> But, my series is still not applied due to the long silence on your side.
>>>
>>> Please respond!
>>
>> Sorry for all the delays, we screwed this one up, and you did everything
>> right. I have put the DT changes into the next/dt2 branch now, and applied
>> the two other patches to next/soc directly.
>>
>> Please check that the for-next branch in arm-soc has everything you need now.
>>
>
> I checked both branch and everything is fine.
>
> Thanks you very much!


I thought you'd include DT updates in the pull requests,
but you didn't.

Why was next/dt2 missed?



-- 
Best Regards
Masahiro Yamada


[git pull] drm pull for 4.6-rc1

2016-03-20 Thread Dave Airlie
Hi Linus,

This is the main drm pull request for 4.6 kernel.

The highlights are below, and there are a few merge conflicts,
but I think they should all be simple enough for you to take
care off. At least at the moment they are just the writecombine
interface changes.

Overall the coolest thing here for me is the nouveau maxwell
signed firmware support from NVidia, it's taken a long while
to extract this from them.

I also wish the ARM vendors just designed one set of display IP,
ARM display block proliferation is definitely increasing.

Core:
drm_event cleanups
Internal API cleanup making mode_fixup optional.
Apple GMUX vga switcheroo support.
DP AUX testing interface

Panel:
Refactoring of DSI core for use over more transports.

New driver:
ARM hdlcd driver

i915:
FBC/PSR (framebuffer compression, panel self refresh) enabled by 
default.
Ongoing atomic display support work
Ongoing runtime PM work
Pixel clock limit checks
VBT DSI description support
GEM fixes
GuC firmware scheduler enhancements

amdkfd:
Deferred probing fixes to avoid make file or link ordering.

amdgpu/radeon:
ACP support for i2s audio support.
Command Submission/GPU scheduler/GPUVM optimisations
Initial GPU reset support for amdgpu

vmwgfx:
Support for DX10 gen mipmaps
Pageflipping and other fixes.

exynos:
Exynos5420 SoC support for FIMD
Exynos5422 SoC support for MIPI-DSI

nouveau:
GM20x secure boot support - adds acceleration for Maxwell GPUs.
GM200 support
GM20B clock driver support
Power sensors work

etnaviv:
Correctness fixes for GPU cache flushing
Better support for i.MX6 systems.

imx-drm:
VBlank IRQ support
Fence support
OF endpoint support

msm:
HDMI support for 8996 (snapdragon 820)
Adreno 430 support
Timestamp queries support

virtio-gpu:
Fixes for Android support.

rockchip:
Add support for Innosilicion HDMI

rcar-du:
Support for 4 crtcs
R8A7795 support
RCar Gen 3 support

omapdrm:
HDMI interlace output support
dma-buf import support
Refactoring to remove a lot of legacy code.

tilcdc:
Rewrite of pageflipping code
dma-buf support
pinctrl support

vc4:
HDMI modesetting bug fixes
Significant 3D performance improvement.

fsl-dcu (FreeScale):
Lots of fixes

tegra:
Two small fixes

sti:
Atomic support for planes
Improved HDMI support

The following changes since commit 2a4fb270daa9c1f1d1b86a53d66ed86cc64ad232:

  Merge tag 'armsoc-fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc (2016-03-11 12:35:54 
-0800)

are available in the git repository at:

  git://people.freedesktop.org/~airlied/linux drm-next

for you to fetch changes up to 568d7c764ae01f3706085ac8f0d8a8ac7e826bd7:

  drm/amdgpu: release_pages requires linux/pagemap.h (2016-03-21 13:22:52 +1000)


Abhay Kumar (1):
  drm/i915: edp resume/On time optimization.

Akshay Bhat (1):
  drm/panel: simple: Fix g121x1_l03 hsync/vsync polarity

Alan (1):
  i915: cast before shifting in i915_pte_count

Alan Cox (1):
  gma500: clean up an excessive and confusing helper

Alex Dai (7):
  drm/i915/guc: Move GuC wq_check_space to alloc_request_extras
  drm/i915/guc: Add GuC ADS (Addition Data Structure) - allocation
  drm/i915/guc: Add GuC ADS - scheduler policies
  drm/i915/guc: Add GuC ADS - MMIO reg state
  drm/i915/guc: Add GuC ADS - enabling ADS
  drm/i915/guc: Fix a memory leak where guc->execbuf_client is not freed
  drm/i915/guc: Decouple GuC engine id from ring id

Alex Deucher (33):
  drm/amdgpu: remove some more semaphore leftovers
  drm/amdgpu: clean up asic level reset for CI
  drm/amdgpu: clean up asic level reset for VI
  drm/amdgpu: post card after hard reset
  drm/amdgpu: add a debugfs property to trigger a GPU reset
  drm/amdgpu: drop hard_reset module parameter
  drm/amd: add dce8 enum register header
  drm/amdgpu: remove unused function
  drm/amdgpu: add check for atombios GPU virtualization table
  drm/amdgpu: track whether the asic supports SR-IOV
  drm/amdgpu: always repost cards that support SR-IOV
  drm/amdgpu/gmc8: skip MC ucode loading on SR-IOV capable boards
  drm/amdgpu/smu: skip SMC ucode loading on SR-IOV capable boards (v2)
  drm/amdgpu/gfx: minor code cleanup
  drm/amdgpu/gfx: clean up harvest configuration (v2)
  drm/amdgpu/gfx7: rework gpu_init()
  drm/amdgpu/cik: move sdma tiling config setup into sdma code
  drm/amdgpu/cik: move uvd tiling config setup into uvd code
  drm/amdgpu/vi: move sdma tiling config setup into sdma code
  drm/amdgpu/vi: 

Re: [PATCH v3] staging: netlogic: Fixed alignment of parentheseis checkpatch warning

2016-03-20 Thread Valdis . Kletnieks
On Sat, 19 Mar 2016 19:22:09 -0700, Joe Perches said:
> On Sun, 2016-03-20 at 07:48 +0530, Parth Sane wrote:
> > Hi,
> > Thanks for pointing out that the changes have been done. Nevertheless
> > this was a good learning exercise. How do I check which changes have
> > already been done?
>
> Use this tree:
>
> http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git

And note that doing a 'git clone' of this won't do what you want..

What you *want* to do:

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
$ git remote add linux-next 
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
$ git fetch --tags linux-next

This will get you a tree that you can actually work with...
... # later on (linux-next is updated most weekdays)
$ git remote update
to find out what the current tree looks like.

You do *not* want to use 'git pull' against linux-next because it rebases
every night





Re: [PATCH 2/3] x86/topology: Fix AMD core count

2016-03-20 Thread Huang Rui
On Mon, Mar 21, 2016 at 11:07:44AM +0800, Huang Rui wrote:
> On Fri, Mar 18, 2016 at 05:41:01PM +0100, Borislav Petkov wrote:
> > On Fri, Mar 18, 2016 at 04:03:47PM +0100, Peter Zijlstra wrote:
> > > It turns out AMD gets x86_max_cores wrong when there are compute
> > > units.
> > > 
> > > The issue is that Linux assumes:
> > > 
> > >   nr_logical_cpus = nr_cores * nr_siblings
> > > 
> > > But AMD reports its CU unit as 2 cores, but then sets num_smp_siblings
> > > to 2 as well.
> > > 
> > > Cc: Ingo Molnar 
> > > Cc: Borislav Petkov 
> > > Cc: Thomas Gleixner 
> > > Cc: Andreas Herrmann 
> > > Reported-by: Xiong Zhou 
> > > Fixes: 1f12e32f4cd5 ("x86/topology: Create logical package id")
> > > Signed-off-by: Peter Zijlstra (Intel) 
> > > Link: 
> > > http://lkml.kernel.org/r/20160317095220.go6...@twins.programming.kicks-ass.net
> > > ---
> > >  arch/x86/kernel/cpu/amd.c |8 
> > >  arch/x86/kernel/smpboot.c |   11 ++-
> > >  2 files changed, 10 insertions(+), 9 deletions(-)
> > > 
> > > --- a/arch/x86/kernel/cpu/amd.c
> > > +++ b/arch/x86/kernel/cpu/amd.c
> > > @@ -313,9 +313,9 @@ static void amd_get_topology(struct cpui
> > >   node_id = ecx & 7;
> > >  
> > >   /* get compute unit information */
> > > - smp_num_siblings = ((ebx >> 8) & 3) + 1;
> > > + cores_per_cu = smp_num_siblings = ((ebx >> 8) & 3) + 1;
> > > + c->x86_max_cores /= smp_num_siblings;
> > >   c->compute_unit_id = ebx & 0xff;
> > > - cores_per_cu += ((ebx >> 8) & 3);
> > >   } else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
> > >   u64 value;
> > >  
> > > @@ -331,8 +331,8 @@ static void amd_get_topology(struct cpui
> > >   u32 cus_per_node;
> > >  
> > >   set_cpu_cap(c, X86_FEATURE_AMD_DCM);
> > > - cores_per_node = c->x86_max_cores / nodes_per_socket;
> > > - cus_per_node = cores_per_node / cores_per_cu;
> > > + cus_per_node = c->x86_max_cores / nodes_per_socket;
> > > + cores_per_node = cus_per_node * cores_per_cu;
> > >  
> > >   /* store NodeID, use llc_shared_map to store sibling info */
> > >   per_cpu(cpu_llc_id, cpu) = node_id;
> > 
> > Looks ok to me, however it probably would be prudent if AMD tested it on
> > a bunch of machines just to make sure we don't break anything else. I'm
> > thinking F15h and F16h, something big...
> > 
> > Rui, can you find some time to run this one please?
> > 
> > Look at before/after info in /proc/cpuinfo, topology in sysfs and dmesg
> > before and after might be useful too.
> > 
> 
> OK, we will find some fam15h, fam16h platforms to verify it. Please
> wait for my feedback.
> 
> But I am confused with c->x86_max_cores /= smp_num_siblings, what is
> the real meaning of c->x86_max_cores here for AMD, the whole compute
> unit numbers per socket?
> 
> + Sherry, for her awareness.
> 

I quickly applied this patch on tip/master with on a fam15h machine.
The issue is still existed, only one core can be detected.

processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 21
model   : 2
model name  : AMD Opteron(tm) Processor 6386 SE
stepping: 0
microcode   : 0x6000822
cpu MHz : 2792.882
cache size  : 2048 KB
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf 
eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave 
avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 
3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext 
perfctr_core perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock 
nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs: fxsave_leak sysret_ss_attrs
bogomips: 5585.76
TLB size: 1536 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro


Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):1
On-line CPU(s) list:   0
Thread(s) per core:1
Core(s) per socket:1
Socket(s): 1
Vendor ID: AuthenticAMD
CPU family:21
Model: 2
Stepping:  0
CPU MHz:   2792.882
BogoMIPS:  5585.76
Virtualization:AMD-V
L1d cache: 16K
L1i cache: 64K
L2 cache:  2048K
L3 cache:  6144K

Thanks,
Rui


Re: [PATCH 2/3] x86/topology: Fix AMD core count

2016-03-20 Thread Huang Rui
On Fri, Mar 18, 2016 at 05:41:01PM +0100, Borislav Petkov wrote:
> On Fri, Mar 18, 2016 at 04:03:47PM +0100, Peter Zijlstra wrote:
> > It turns out AMD gets x86_max_cores wrong when there are compute
> > units.
> > 
> > The issue is that Linux assumes:
> > 
> > nr_logical_cpus = nr_cores * nr_siblings
> > 
> > But AMD reports its CU unit as 2 cores, but then sets num_smp_siblings
> > to 2 as well.
> > 
> > Cc: Ingo Molnar 
> > Cc: Borislav Petkov 
> > Cc: Thomas Gleixner 
> > Cc: Andreas Herrmann 
> > Reported-by: Xiong Zhou 
> > Fixes: 1f12e32f4cd5 ("x86/topology: Create logical package id")
> > Signed-off-by: Peter Zijlstra (Intel) 
> > Link: 
> > http://lkml.kernel.org/r/20160317095220.go6...@twins.programming.kicks-ass.net
> > ---
> >  arch/x86/kernel/cpu/amd.c |8 
> >  arch/x86/kernel/smpboot.c |   11 ++-
> >  2 files changed, 10 insertions(+), 9 deletions(-)
> > 
> > --- a/arch/x86/kernel/cpu/amd.c
> > +++ b/arch/x86/kernel/cpu/amd.c
> > @@ -313,9 +313,9 @@ static void amd_get_topology(struct cpui
> > node_id = ecx & 7;
> >  
> > /* get compute unit information */
> > -   smp_num_siblings = ((ebx >> 8) & 3) + 1;
> > +   cores_per_cu = smp_num_siblings = ((ebx >> 8) & 3) + 1;
> > +   c->x86_max_cores /= smp_num_siblings;
> > c->compute_unit_id = ebx & 0xff;
> > -   cores_per_cu += ((ebx >> 8) & 3);
> > } else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
> > u64 value;
> >  
> > @@ -331,8 +331,8 @@ static void amd_get_topology(struct cpui
> > u32 cus_per_node;
> >  
> > set_cpu_cap(c, X86_FEATURE_AMD_DCM);
> > -   cores_per_node = c->x86_max_cores / nodes_per_socket;
> > -   cus_per_node = cores_per_node / cores_per_cu;
> > +   cus_per_node = c->x86_max_cores / nodes_per_socket;
> > +   cores_per_node = cus_per_node * cores_per_cu;
> >  
> > /* store NodeID, use llc_shared_map to store sibling info */
> > per_cpu(cpu_llc_id, cpu) = node_id;
> 
> Looks ok to me, however it probably would be prudent if AMD tested it on
> a bunch of machines just to make sure we don't break anything else. I'm
> thinking F15h and F16h, something big...
> 
> Rui, can you find some time to run this one please?
> 
> Look at before/after info in /proc/cpuinfo, topology in sysfs and dmesg
> before and after might be useful too.
> 

OK, we will find some fam15h, fam16h platforms to verify it. Please
wait for my feedback.

But I am confused with c->x86_max_cores /= smp_num_siblings, what is
the real meaning of c->x86_max_cores here for AMD, the whole compute
unit numbers per socket?

+ Sherry, for her awareness.

Thanks,
Rui


RE: [PATCH v3 4/4] mtd: spi-nor: Disable Micron flash HW protection

2016-03-20 Thread beanhuo
Hi, Yunhai
You mean that EVCR bit7 cannot be cleared (to enable quad mode) unless SR bit7
is written to 0?
They have no connection to each other.

> -Original Message-
> From: Yunhui Cui [mailto:yunhui@nxp.com]
> Sent: Friday, March 18, 2016 6:09 PM
> To: Bean Huo 霍斌斌 (beanhuo); Yunhui Cui
> Cc: linux-...@lists.infradead.org; dw...@infradead.org;
> computersforpe...@gmail.com; han...@freescale.com;
> linux-kernel@vger.kernel.org; linux-...@lists.infradead.org;
> linux-arm-ker...@lists.infradead.org; Yao Yuan
> Subject: RE: [PATCH v3 4/4] mtd: spi-nor: Disable Micron flash HW protection
> 
> Hi Bean,
> 
> Thanks for your suggestions very much.
> Yes, the flash N25Q128A status register write enable/disable bit is disable in
> initial state.
> But, This patch aims to clear status registerV bit[7](write enable/disable 
> bit) to
> 0, which enables the bit.
> Frankly speaking, I also don't want to add this patch.
> The reason for this is that clear status register bit[7] to 0 is a must to 
> set quad
> mode to Enhanced Volatile Configuration Register using command
> SPINOR_OP_WD_EVCR. Otherwise it will output "Micron EVCR Quad bit not
> clear" in spi-nor.c I looked up the datasheet, but I really don't find out any
> connection between status register bit[7](write enable/disable bit) equals 0
> and seting quad mode to Enhanced Volatile Configuration Register.
> 
> Just as I want to send the issue to Micron team , could you give me some
> solutions ?
> 
> 
> Thanks
> Yunhui
> 
> -Original Message-
> From: Bean Huo 霍斌斌 (beanhuo) [mailto:bean...@micron.com]
> Sent: Thursday, March 03, 2016 9:39 PM
> To: Yunhui Cui
> Cc: linux-...@lists.infradead.org; dw...@infradead.org;
> computersforpe...@gmail.com; han...@freescale.com;
> linux-kernel@vger.kernel.org; linux-...@lists.infradead.org;
> linux-arm-ker...@lists.infradead.org; Yao Yuan; Yunhui Cui
> Subject: Re: [PATCH v3 4/4] mtd: spi-nor: Disable Micron flash HW protection
> 
> > From: Yunhui Cui 
> > To: , ,
> > 
> > Cc: , ,
> > , , Yunhui
> > Cui
> > 
> > Subject: [PATCH v3 4/4] mtd: spi-nor: Disable Micron flash HW
> > protection
> > Message-ID:
> <1456988044-37061-4-git-send-email-b56...@freescale.com>
> > Content-Type: text/plain
> >
> > From: Yunhui Cui 
> >
> > For Micron family ,The status register write enable/disable bit,
> > provides hardware data protection for the device.
> > When the enable/disable bit is set to 1, the status register
> > nonvolatile bits become read-only and the WRITE STATUS REGISTER
> > operation will not execute.
> >
> > Signed-off-by: Yunhui Cui 
> > ---
> >  drivers/mtd/spi-nor/spi-nor.c | 9 +
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/mtd/spi-nor/spi-nor.c
> > b/drivers/mtd/spi-nor/spi-nor.c index ed0c19c..917f814 100644
> > --- a/drivers/mtd/spi-nor/spi-nor.c
> > +++ b/drivers/mtd/spi-nor/spi-nor.c
> > @@ -39,6 +39,7 @@
> >
> >  #define SPI_NOR_MAX_ID_LEN 6
> >  #define SPI_NOR_MAX_ADDR_WIDTH 4
> > +#define SPI_NOR_MICRON_WRITE_ENABLE0x7f
> >
> >  struct flash_info {
> > char*name;
> > @@ -1238,6 +1239,14 @@ int spi_nor_scan(struct spi_nor *nor, const
> > char *name, enum read_mode mode)
> > write_sr(nor, 0);
> > }
> >
> > +   if (JEDEC_MFR(info) == SNOR_MFR_MICRON) {
> > +   ret = read_sr(nor);
> > +   ret &= SPI_NOR_MICRON_WRITE_ENABLE;
> > +
> For Micron the status register write enable/disable bit, its default/factory
> value is disable.
> Can here first check ,then program?
> > +   write_enable(nor);
> > +   write_sr(nor, ret);
> > +   }
> > +
> > if (!mtd->name)
> > mtd->name = dev_name(dev);
> > mtd->priv = nor;


Re: [LKP] [lkp] [futex] 65d8fc777f: +25.6% will-it-scale.per_process_ops

2016-03-20 Thread Huang, Ying
Hi, Thomas,

Thanks a lot for your valuable input!

Thomas Gleixner  writes:

> On Fri, 18 Mar 2016, Huang, Ying wrote:
>> Usually we will put most important change we think in the subject of the
>> mail, for this email, it is,
>> 
>> +25.6% will-it-scale.per_process_ops
>
> That is confusing on it's own, because the reader does not know at all whether
> this is an improvement or a regression.
>
> So something like this might be useful:
>
> Subject: subsystem: 12digitsha1: 25% performance improvement
>
> or in some other case
>
> Subject: subsystem: 12digitsha1: 25% performance regression
>
> So in the latter case I will look into that mail immediately. The improvement
> one can wait until I have cared about urgent stuff.
>
> In the subject line it is pretty much irrelevant which foo-bla-ops test has
> produced that result. It really does not matter. If it's a regression, it's
> urgent. If it's an improvement it's informal and it can wait to be read.
>
> So in that case it would be:
>
> futex: 65d8fc777f6d: 25% performance improvement
>
> You can grab the subsystem prefix from the commit.

We will include regression/improvement information in subject at least.

>> and, we try to put most important changes at the top of the comparison
>> result below.  That is the will-it-scale.xxx below.
>> 
>> We are thinking about how to improve this.  You input is valuable for
>> us.  We are thinking change the "below changes" line to something like
>> below.
>> 
>> FYI, we noticed the +25.6% will-it-scale.per_process_ops improvement on
>> ...
>> 
>> Does this looks better?
>
> A bit, but it still does not tell me much. It's completely non obvious what
> 'will-it-scale.per_process_ops' means.

will-it-scale is a test suite; per_process_ops is one of its results.
That is the convention used in the original report.

> Let me give you an example how a useful
> and easy to understand summary of the change could look like:
>
>
>  FYI, we noticed 25.6% performance improvement due to commit
>
>65d8fc777f6d "futex: Remove requirement for lock_page() in get_futex_key()"
>
>  in the will-it-scale.per_process_ops test.
>
>  will-it-scale.per_process_ops tests the futex operations for process shared
>  futexes (Or whatever that test really does).

There is a futex sub-test case in the will-it-scale test suite.  But I got your
point: we need some description of the test case.  If the full description is
too long for email, we will put it on a web site and include a short
description plus a link to the full description in the email.

>  The commit has no significant impact on any other test in the test suite.

Sorry, we do not have enough machine capacity to test every test case for each
bisect result, so we will not have that information until we find a way
to do so.

> So those few lines tell precisely what this is about. It's something I already
> expected, so I really can skip the rest of the mail unless I'm interested in
> reproducing the result.

We will put the important information at the start of the email and the
details later, with a clear separation, so people can read the important
information and ignore the details if they want.

> Now lets look at a performance regression.
>
> Subject: futex: 65d8fc777f6d: 25% performance regression
>
>  FYI, we noticed a 25.2% performance regression due to commit
>
>  65d8fc777f6d "futex: Remove requirement for lock_page() in get_futex_key()"
>
>  in the will-it-scale.per_process_ops test.
>
>  will-it-scale.per_process_ops tests the futex operations for process shared
>  futexes (Or whatever that test really does).
>
>  The commit has no significant impact on any other test in the test suite.
>
> In that case I will certainly be interested how to reproduce that test. So I
> need the following information:
>
> Machine description: Intel IvyBridge 2 sockets, 32 cores, 64G RAM
> Config file: http://wherever.you.store/your/results/test-nr/config

We already provide some of this information, but it is not organized well
enough; we will improve it.

> Test: 
> http://wherever.you.store/your/tests/will-it-scale.per_process_ops.tar.bz2
>
> That tarball should contain:
>
>  README
>  test_script.sh
>  test_binary
>
> README should tell:
>
>will-it-scale.per_process_ops
>
>Short explanation of the test
>
>Preliminaries: 
>   - perf
>   - whatever
>
> So that allows me to reproduce that test more or less with no effort. And
> that's the really important part.

For reproduction, we currently use the lkp-tests tool, which includes scripts
to build the test case, run the test, collect various information, and compare
the results, driven by the job file attached to the report email.  That
is not the easiest way, and we will keep improving it.

> You can provide nice charts and full comparison tables for all tests on a web
> site for those who are interested in large stats and pretty charts.
>
> Full results: http://wherever.you.store/your/results/test-nr/results

Before we have a website for detaile

[PATCH] regulator: Lookup unresolved parent supplies before regulators cleanup

2016-03-20 Thread Javier Martinez Canillas
Commit 6261b06de565 ("regulator: Defer lookup of supply to regulator_get")
moved the regulator supplies lookup logic from the regulators registration
to the regulators get time.

Unfortunately, that changed the behavior of the regulator core: now a
parent supply whose child regulator is marked as always-on won't be enabled
unless a client driver attempts to get the child regulator during boot.

This patch makes the unresolved parent supplies be looked up before the
regulators' late cleanup, so those with a child marked as always-on will be
enabled regardless of whether a driver attempted to get the child regulator.

That was the behavior before the mentioned commit, since parent supplies
were looked up at regulator registration time instead of during child get.

Cc:  # 4.3+
Fixes: 6261b06de565 ("regulator: Defer lookup of supply to regulator_get")
Signed-off-by: Javier Martinez Canillas 

---
Hello,

The commit that caused this issue landed into v4.1 but $SUBJECT can't be
cherry-picked to older kernel versions than v4.3 without causing a merge
conflict. So I added v4.3+ to stable, please let me know if that isn't right.

Best regards,
Javier

 drivers/regulator/core.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 6dd63523bcfe..15dbb771e1d8 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -4376,6 +4376,11 @@ static int __init regulator_init(void)
 /* init early to allow our consumers to complete system booting */
 core_initcall(regulator_init);
 
+static int __init regulator_late_resolve_supply(struct device *dev, void *data)
+{
+   return regulator_resolve_supply(dev_to_rdev(dev));
+}
+
 static int __init regulator_late_cleanup(struct device *dev, void *data)
 {
struct regulator_dev *rdev = dev_to_rdev(dev);
@@ -4436,6 +4441,14 @@ static int __init regulator_init_complete(void)
if (of_have_populated_dt())
has_full_constraints = true;
 
+   /* At this point there may be regulators that were not looked
+* up by a client driver, so its parent supply was not resolved
+* and could be wrongly disabled when needed to remain enabled
+* to meet their child constraints.
+*/
+   class_for_each_device(&regulator_class, NULL, NULL,
+ regulator_late_resolve_supply);
+
/* If we have a full configuration then disable any regulators
 * we have permission to change the status for and which are
 * not in use or always_on.  This is effectively the default
-- 
2.5.0



[PATCH] regulator: Remove unneeded check for regulator supply

2016-03-20 Thread Javier Martinez Canillas
The regulator_resolve_supply() function checks if a supply has been
associated with a regulator to avoid enabling it if that is not the
case.

But the supply was already looked up with regulator_resolve_supply()
and set with set_supply() before the check and both return on error.

So the fact that this statement has been reached means that neither
of them failed and a supply must be associated with the regulator.

Signed-off-by: Javier Martinez Canillas 

---

 drivers/regulator/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index e0b764284773..6dd63523bcfe 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1532,7 +1532,7 @@ static int regulator_resolve_supply(struct regulator_dev 
*rdev)
}
 
/* Cascade always-on state to supply */
-   if (_regulator_is_enabled(rdev) && rdev->supply) {
+   if (_regulator_is_enabled(rdev)) {
ret = regulator_enable(rdev->supply);
if (ret < 0) {
_regulator_put(rdev->supply);
-- 
2.5.0



RE: [PATCH v5 3/5] ARM: at91: pm: configure PMC fast startup signals

2016-03-20 Thread Yang, Wenyou
Hi Alexandre,

> -Original Message-
> From: Alexandre Belloni [mailto:alexandre.bell...@free-electrons.com]
> Sent: 18 March 2016 1:15
> To: Yang, Wenyou 
> Cc: Ferre, Nicolas ; Jean-Christophe Plagniol-
> Villard ; Russell King ; linux-
> ker...@vger.kernel.org; devicet...@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; linux-...@vger.kernel.org; Rob Herring
> ; Pawel Moll ; Mark Brown
> ; Ian Campbell ; Kumar
> Gala 
> Subject: Re: [PATCH v5 3/5] ARM: at91: pm: configure PMC fast startup signals
> 
> On 16/03/2016 at 14:58:07 +0800, Wenyou Yang wrote :
> > The fast startup signal is used as wake up sources for ULP1 mode.
> > As soon as a fast startup signal is asserted, the embedded 12 MHz RC
> > oscillator restarts automatically.
> >
> > This patch configures the fast startup signals; which signal is enabled
> > to trigger the PMC to wake up the system from ULP1 mode should be
> > configured via the DT.
> >
> > Signed-off-by: Wenyou Yang 
> 
> I would actually avoid doing that from the PMC driver and do that
> configuration from the aic5 driver. It has all the information you need:
> it knows what kind of level or edge is needed to wake up and what the
> wakeup interrupts to enable are. This will allow you to stop introducing
> a new binding. Also, this will avoid discrepancies between what is
> configured in the DT and what the user really wants (for example,
> differences between the edge direction configured for the PIOBu in
> userspace versus what is in the device tree or wakeonlan
> activation/deactivation).

Thank you for your feedback.

But some wake-up sources, such as the WKUP pin, ACC_CE and RXLP_MCE, don't
have a corresponding interrupt number. Moreover, I think ULP1 is very
different from ULP0: the system is not woken up by an interrupt; it is put
to sleep and woken up by a mechanism in the PMC.

Maybe I am wrong, but I still think the aic5 driver should stay focused on
the AIC5's own behaviour.

> 
> You can get the PMC syscon from irq-atmel-aic5.c and then use a table to map
> the hwirq to the offset in PMC_FSMR. Use it in aic5_set_type to set the 
> polarity
> and then in aic5_suspend to enable the wakeup.
> 
> Maybe we could even go further and avoid ulp1 if no ulp1 compatible wakeup
> sources are defined but there are ulp0 wakeup sources.
> 
> 
> --
> Alexandre Belloni, Free Electrons
> Embedded Linux, Kernel and Android engineering http://free-electrons.com


Best Regards,
Wenyou Yang


[PATCH 1/1] x86/perf/intel/uncore: remove ev_sel_ext bit support for PCU

2016-03-20 Thread kan . liang
From: Kan Liang 

The ev_sel_ext bit in PCU_MSR_PMON_CTL is locked, so writing that bit as 1
can trigger a #GP fault. Also, there are no public events which use the bit.
This patch removes ev_sel_ext bit support for the PCU.

Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/uncore_snbep.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/intel/uncore_snbep.c 
b/arch/x86/events/intel/uncore_snbep.c
index 93f6bd9..ab2bcaa 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -46,7 +46,6 @@
(SNBEP_PMON_CTL_EV_SEL_MASK | \
 SNBEP_PCU_MSR_PMON_CTL_OCC_SEL_MASK | \
 SNBEP_PMON_CTL_EDGE_DET | \
-SNBEP_PMON_CTL_EV_SEL_EXT | \
 SNBEP_PMON_CTL_INVERT | \
 SNBEP_PCU_MSR_PMON_CTL_TRESH_MASK | \
 SNBEP_PCU_MSR_PMON_CTL_OCC_INVERT | \
@@ -148,7 +147,6 @@
 /* IVBEP PCU */
 #define IVBEP_PCU_MSR_PMON_RAW_EVENT_MASK  \
(SNBEP_PMON_CTL_EV_SEL_MASK | \
-SNBEP_PMON_CTL_EV_SEL_EXT | \
 SNBEP_PCU_MSR_PMON_CTL_OCC_SEL_MASK | \
 SNBEP_PMON_CTL_EDGE_DET | \
 SNBEP_PCU_MSR_PMON_CTL_TRESH_MASK | \
@@ -258,7 +256,6 @@
 SNBEP_PCU_MSR_PMON_CTL_OCC_SEL_MASK | \
 SNBEP_PMON_CTL_EDGE_DET | \
 SNBEP_CBO_PMON_CTL_TID_EN | \
-SNBEP_PMON_CTL_EV_SEL_EXT | \
 SNBEP_PMON_CTL_INVERT | \
 KNL_PCU_MSR_PMON_CTL_TRESH_MASK | \
 SNBEP_PCU_MSR_PMON_CTL_OCC_INVERT | \
@@ -472,7 +469,7 @@ static struct attribute *snbep_uncore_cbox_formats_attr[] = 
{
 };
 
 static struct attribute *snbep_uncore_pcu_formats_attr[] = {
-   &format_attr_event_ext.attr,
+   &format_attr_event.attr,
&format_attr_occ_sel.attr,
&format_attr_edge.attr,
&format_attr_inv.attr,
@@ -1313,7 +1310,7 @@ static struct attribute *ivbep_uncore_cbox_formats_attr[] 
= {
 };
 
 static struct attribute *ivbep_uncore_pcu_formats_attr[] = {
-   &format_attr_event_ext.attr,
+   &format_attr_event.attr,
&format_attr_occ_sel.attr,
&format_attr_edge.attr,
&format_attr_thresh5.attr,
-- 
2.5.0



Re: [GIT PULL] Protection Keys (pkeys) support

2016-03-20 Thread Linus Torvalds
So I finally got around to this one and the objtool pull request, and
note that there's a conflict in the arch/x86/Kconfig file.

And I'm not sending this email because the conflict would have been
hard to resolve - it was completely trivial. But the conflict does
show that once again people are starting to add the new options to the
end of the list, even though that list is supposedly sorted.

HOWEVER.

I didn't actually fix that up in the merge, because I think that those
options should be done differently anyway.

So all of these are under the "X86" config options as "select"
statements that are true for x86. However, all the new ones (and an
alarming number of old ones) aren't actually really "these are true
for x86". No, they are *conditionally* true for x86.

For example, if we were to sort those thing, the two PKEY-related
options would have to be split up:

select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS
select ARCH_HAS_PKEYS           if X86_INTEL_MEMORY_PROTECTION_KEYS

which would actually make it really nasty to see that they are related.

There's also a *lot* of those X86 selects that are "if X86_64". So
they really aren't x86 options, they are x86-64 options.

So instead of having a _huge_ list of select statements under the X86
option, why aren't those split up, and the select statements are
closer to the thing that actually controls them.

I realize that for many *common* options that really are "this
architecture uses the generic XYZ feature", the current "select" model
is really good. But it's starting to look really quite nasty for some
of these more specialized options, and I really think it would be
better to move (for example) the select for ARCH_HAS_PKEYS and
ARCH_USES_HIGH_VMA_FLAGS to actually be under the
X86_INTEL_MEMORY_PROTECTION_KEYS config option, rather than try to lie
and make it look like this is somehow some "x86 feature". It's much
more specific than that.
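For concreteness, a sketch of the arrangement being suggested (illustrative
only, not the actual merged Kconfig):

	config X86_INTEL_MEMORY_PROTECTION_KEYS
		prompt "Intel Memory Protection Keys"
		def_bool y
		depends on CPU_SUP_INTEL && X86_64
		select ARCH_USES_HIGH_VMA_FLAGS
		select ARCH_HAS_PKEYS

so the selects sit next to the option that actually needs them, instead of as
"select ... if X86_INTEL_MEMORY_PROTECTION_KEYS" lines under the top-level X86
entry.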

Anyway, it's all merged in my tree, but is going through the built
tests and I'll do a boot test too before pushing out. So no need to do
anything wrt these pull requests, this was more of a "Hmm, I really
think the x86 Kconfig file is getting pretty nasty".

   Linus


[GIT PULL] xfs: updates for 4.6-rc1

2016-03-20 Thread Dave Chinner
Hi Linus,

Can you please pull the XFS update from the location below? There's
quite a lot in this request, and there's some cross-over with ext4,
dax and quota code due to the nature of the changes being made.

There are conflicts with the ext4 code that has already been merged
this cycle. Ted didn't pull the stable xfs-dio-fixes-4.6 branch with
the DIO completion unwritten extent error handling fixes before
merging a rework of the ext4 unwritten extent code, so there's a
bunch of non-trivial conflicts in that.

The quota changes don't appear to have created any conflicts at this
point - I think Jan pulled the stable xfs-get-next-dquot-4.6 branch
to base his further work on that, so I don't expect merge problems
here.

Finally, there's a merge conflict between the XFS writepages rework
and the DAX flushing fixes that were merged in 4.5-rc6. That's a
trivial conflict to resolve, though.

I've attached the merge resolution diff from my local test merge at
the end after the pull-req output - the XFS part is correct, but I'm
not sure about the ext4 parts of it. If you need confirmation as to
whether that is the correct resolution, then Ted and/or Jan (cc'd)
will need to look at it.

As for the rest of the XFS changes, there are lots of little things
all over the place, which add up to a lot of changes in the end.
The major changes are that we've reduced the size of the struct
xfs_inode by ~100 bytes (gives an inode cache footprint reduction of
>10%), the writepage code now only does a single set of mapping tree
lookups so uses less CPU, delayed allocation reservations won't
overrun under random write loads anymore, and we added compile time
verification for on-disk structure sizes so we find out when a
commit or platform/compiler change breaks the on disk structure as
early as possible.
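(For readers unfamiliar with that last item: the usual technique is a set of
BUILD_BUG_ON() size assertions, roughly like the sketch below. Structure names
and sizes here are placeholders; see fs/xfs for the real checks added by this
series.)

	#include <linux/bug.h>	/* BUILD_BUG_ON() */

	#define XFS_CHECK_STRUCT_SIZE(structname, size) \
		BUILD_BUG_ON(sizeof(structname) != (size))

	static inline void xfs_check_ondisk_structs(void)
	{
		XFS_CHECK_STRUCT_SIZE(struct xfs_dsb,		264);
		XFS_CHECK_STRUCT_SIZE(struct xfs_dinode,	176);
	}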

Cheers,

Dave.


The following changes since commit 7f6aff3a29b08fc4234c8136eb1ac31b4897522c:

  xfs: only run torn log write detection on dirty logs (2016-03-07 08:22:22 
+1100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git 
tags/xfs-for-linus-4.6-rc1

for you to fetch changes up to 2cdb958aba6afbced5bc563167813b972b6acbfe:

  Merge branch 'xfs-misc-fixes-4.6-4' into for-next (2016-03-15 11:44:35 +1100)



xfs: Changes for 4.6-rc1

Change summary:
o error propagation for direct IO failures fixes for both XFS and ext4
o new quota interfaces and XFS implementation for iterating all the quota IDs
  in the filesystem
o locking fixes for real-time device extent allocation
o reduction of duplicate information in the xfs and vfs inode, saving roughly
  100 bytes of memory per cached inode.
o buffer flag cleanup
o rework of the writepage code to use the generic write clustering mechanisms
o several fixes for inode flag based DAX enablement
o rework of remount option parsing
o compile time verification of on-disk format structure sizes
o delayed allocation reservation overrun fixes
o lots of little error handling fixes
o small memory leak fixes
o enable xfsaild freezing again


Brian Foster (6):
  xfs: clean up unwritten buffers on write failure
  xfs: fix xfs_log_ticket leak in xfs_end_io() after fs shutdown
  xfs: debug mode forced buffered write failure
  xfs: update freeblocks counter after extent deletion
  xfs: refactor delalloc indlen reservation split into helper
  xfs: borrow indirect blocks from freed extent when available

Carlos Maiolino (1):
  xfs: Split default quota limits by quota type

Christoph Hellwig (8):
  direct-io: always call ->end_io if non-NULL
  xfs: don't use ioends for direct write completions
  xfs: fold xfs_vm_do_dio into xfs_vm_direct_IO
  xfs: handle errors from ->free_blocks in xfs_btree_kill_iroot
  xfs: factor btree block freeing into a helper
  xfs: move buffer invalidation to xfs_btree_free_block
  xfs: remove xfs_trans_get_block_res
  xfs: always set rvalp in xfs_dir2_node_trim_free

Colin Ian King (1):
  xfs: fix format specifier , should be %llx and not %llu

Darrick J. Wong (5):
  xfs: move struct xfs_attr_shortform to xfs_da_format.h
  xfs: fix computation of inode btree maxlevels
  xfs: use named array initializers for log item dumping
  xfs: ioends require logically contiguous file offsets
  xfs: check sizes of XFS on-disk structures at compile time

Dave Chinner (41):
  xfs: lock rt summary inode on allocation
  xfs: RT bitmap and summary buffers are not typed
  xfs: RT bitmap and summary buffers need verifiers
  xfs: introduce inode log format object
  xfs: remove timestamps from incore inode
  xfs: cull unnecessary icdinode fields
  xfs: move v1 inode conversion to xfs_inode_from_disk
  xfs: reinitialise recycled VFS inode correctly
  xfs: use vfs inode nlink field everywhere
   

Re: [PATCH] Revert "arm64: Increase the max granular size"

2016-03-20 Thread Ganesh Mahendran
Hello, Tirumalesh:

2016-03-19 5:05 GMT+08:00 Chalamarla, Tirumalesh
:
>
>
>
>
>
> On 3/16/16, 2:32 AM, "linux-arm-kernel on behalf of Ganesh Mahendran" 
>  opensource.gan...@gmail.com> wrote:
>
>>Reverts commit 97303480753e ("arm64: Increase the max granular size").
>>
>>The commit 97303480753e ("arm64: Increase the max granular size") will
>>degrade system performance on some CPUs.
>>
>>We test wifi network throughput with iperf on Qualcomm msm8996 CPU:
>>
>>run on host:
>>  # iperf -s
>>run on device:
>>  # iperf -c  -t 100 -i 1
>>
>>
>>Test result:
>>
>>with commit 97303480753e ("arm64: Increase the max granular size"):
>>172MBits/sec
>>
>>without commit 97303480753e ("arm64: Increase the max granular size"):
>>230MBits/sec
>>
>>
>>Some module like slab/net will use the L1_CACHE_SHIFT, so if we do not
>>set the parameter correctly, it may affect the system performance.
>>
>>So revert the commit.
>
> Is there any explanation why is this so? May be there is an alternative to 
> this, apart from reverting the commit.
>

I just think that commit 97303480753e ("arm64: Increase the max
granular size") introduced a new problem for other SoCs whose
L1 cache line size is not 128 bytes. So I wanted to revert this commit.

> Until now it seems L1_CACHE_SHIFT has been the max of the supported chips. But now we
> are making it 64 bytes; is there any reason why not 32?
>

We cannot simply set L1_CACHE_SHIFT to the max. There are other
places which use the L1 cache line size.
If we just set the L1 cache line size to the max, the memory footprint
and the system performance will be affected.
For example:
--
#define SMP_CACHE_BYTES L1_CACHE_BYTES
#define SKB_DATA_ALIGN(X) ALIGN(X, SMP_CACHE_BYTES)
--
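As a rough worked example of the footprint cost (illustrative numbers only):

	/* ALIGN(x, a) rounds x up to a multiple of a, as in the kernel */
	#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))
	/*
	 * For a 130-byte skb data area:
	 *   L1_CACHE_SHIFT = 6 -> SMP_CACHE_BYTES = 64  -> ALIGN(130, 64)  = 192
	 *   L1_CACHE_SHIFT = 7 -> SMP_CACHE_BYTES = 128 -> ALIGN(130, 128) = 256
	 * i.e. 64 extra bytes of padding per skb with the larger cache line.
	 */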

Thanks.

> Thanks,
> Tirumalesh.
>>
>>Cc: sta...@vger.kernel.org
>>Signed-off-by: Ganesh Mahendran 
>>---
>> arch/arm64/include/asm/cache.h |2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>>diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
>>index 5082b30..bde4499 100644
>>--- a/arch/arm64/include/asm/cache.h
>>+++ b/arch/arm64/include/asm/cache.h
>>@@ -18,7 +18,7 @@
>>
>> #include 
>>
>>-#define L1_CACHE_SHIFT7
>>+#define L1_CACHE_SHIFT6
>> #define L1_CACHE_BYTES(1 << L1_CACHE_SHIFT)
>>
>> /*
>>--
>>1.7.9.5
>>
>>
>>___
>>linux-arm-kernel mailing list
>>linux-arm-ker...@lists.infradead.org
>>http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


RE: [PATCH v2 0/4] ARM64:SoC add a new platform, LG Electronics's lg1k

2016-03-20 Thread Chanho Min
> Subject: [PATCH v2 0/4] ARM64:SoC add a new platform, LG Electronics's lg1k
> 
> This is an initial series for supporting LG Electronics's lg1k SoCs, based on
> ARM Cortex-A53, mainly used for digital TVs.
> 
> Chanho Min (4):
>   arm64: add Kconfig entry for LG1K SoC family
>   arm64: defconfig: enable ARCH_LG1K
>   arm64: dts: Add dts files for LG Electronics's lg1312 SoC
>   MAINTAINERS: add myself as ARM/LG1K maintainer
> 
>  MAINTAINERS   |6 +
>  arch/arm64/Kconfig.platforms  |4 +
>  arch/arm64/boot/dts/Makefile  |1 +
>  arch/arm64/boot/dts/lg/Makefile   |5 +
>  arch/arm64/boot/dts/lg/lg1312-ref.dts |   36 
>  arch/arm64/boot/dts/lg/lg1312.dtsi|  351
+
>  arch/arm64/configs/defconfig  |1 +
>  7 files changed, 404 insertions(+)
>  create mode 100644 arch/arm64/boot/dts/lg/Makefile  create mode 100644
> arch/arm64/boot/dts/lg/lg1312-ref.dts
>  create mode 100644 arch/arm64/boot/dts/lg/lg1312.dtsi

Please review or Ack these patches.

Chanho




Re: Nokia N900 - audio TPA6130A2 problems

2016-03-20 Thread Sebastian Reichel
Hi,

On Mon, Mar 21, 2016 at 01:04:18AM +0100, Sebastian Reichel wrote:
> On Sun, Mar 20, 2016 at 09:43:11PM +0200, Ivaylo Dimitrov wrote:
> > On 20.03.2016 07:17, Sebastian Reichel wrote:
> > >On Sat, Mar 19, 2016 at 10:49:57AM +0200, Ivaylo Dimitrov wrote:
> > >>On 18.03.2016 17:04, Sebastian Reichel wrote:
> > >>>On Fri, Mar 18, 2016 at 03:45:26PM +0200, Ivaylo Dimitrov wrote:
> > On 18.03.2016 15:36, Sebastian Reichel wrote:
> > Regulator is V28_A, which is always-on, so it is enabled no matter what
> > probe does. Anyway, I added a various delays after regulator_enable(), 
> > to no
> > success.
> > >>
> > >>I guess we're getting closer - I put some printks in various functions in
> > >>the twl-regulator.c, here is the result:
> > >>
> > >>on power-up:
> > >>
> > >>[2.378601] twl4030ldo_get_voltage_sel VMMC2 vsel 0x0008
> > >>[2.384948] twl4030reg_enable VMMC2 grp 0x0020
> > >>[2.408416] twl4030ldo_get_voltage_sel VMMC2 vsel 0x0008
> > >>[7.196685] twl4030reg_is_enabled VMMC2 state 0x002e
> > >>[7.202819] twl4030reg_is_enabled VMMC2 state 0x002e
> > >>[7.209777] twl4030reg_is_enabled VMMC2 state 0x002e
> > >>[7.215728] twl4030reg_is_enabled VMMC2 state 0x002e
> > >>[7.223205] twl4030reg_is_enabled VMMC2 state 0x002e
> > >
> > >Ok, so normal power up results in running VMMC2 (always-on works),
> > >but voltage is not configured correctly. 2.6V is default according
> > >to the TRM. I think this is a "bug" in the regulator framework. It
> > >should setup the minimum allowed voltage before enabling the
> > >always-on regulator.
> > >
> > 
> > /sys/kernel/debug/regulator/regulator_summary shows 2850mV for V28_A, so I
> > would remove the quotes. Also, always-on is because if V28_A regulator is
> > turned off, there is a leakage through tlv320aic34 VIO. BTW one of the
> > things I did while trying to find the problem, was to remove that always-on
> > property from the DTS - it didn't help.
> 
> Right thinking about it, the voltage must also be configured for the
> non always-on cases. So it's not a problem with the regulator
> framework, but with twl-regulator's probe function, that should take
> care of this.
> 
> > >In case of the tpa6130a2/tpa6140a2 driver it may also be nice to add
> > >something like this to the driver (Vdd may be between 2.5V and 5.5V
> > >according to both datasheets):
> > >
> > >if (regulator_can_change_voltage(data->supply))
> > > regulator_set_voltage(data->supply, 250, 550);
> > >
> > 
> > and add DT property for that voltage range, as max output power and
> > harmonics depend on the supply voltage.
> 
> I guess that's 2nd step.
> 
> > >>after restart from stock kernel:
> > >>
> > >>[2.388610] twl4030ldo_get_voltage_sel VMMC2 vsel 0x000a
> > >>[2.394958] twl4030reg_enable VMMC2 grp 0x0028
> > >
> > >I had a quick glance at this. I think stock kernel put VMMC2
> > >into sleep mode. Mainline kernel does not expect sleep mode
> > >being set and does not disable it.
> > >
> > 
> > Well, one would think that kernel should not have expectations on what would
> > be the state of the hardware by the time it takes control over it, but setup
> > everything needed instead.
> 
> I thought it's obvious, that this is not the desired behaviour :)
> 
> > >>[2.418426] twl4030ldo_get_voltage_sel VMMC2 vsel 0x000a
> > >>[7.186645] twl4030reg_is_enabled VMMC2 state 0x0020
> > >>[7.192718] twl4030reg_is_enabled VMMC2 state 0x0020
> > >>[7.199615] twl4030reg_is_enabled VMMC2 state 0x0020
> > >>[7.205535] twl4030reg_is_enabled VMMC2 state 0x0020
> > >>[7.212951] twl4030reg_is_enabled VMMC2 state 0x0020
> > >>
> > >>I don't see twl4030ldo_set_voltage_sel() for VMMC2(V28_A) regulator, 
> > >>though
> > >>there are calls for VMMC1 and VAUX3.
> > >
> > >I guess that's because the voltage is only configured if at least
> > >one regulator consumer requests anything specific.
> > >
> > 
> > But then the board DTS is simply ignored. Doesn't look good :)
> >
> > >>So, it seems to me that V28_A is not enabled or correctly set-up
> > >>and all devices connected to it does not function. And it looks
> > >>like even after power-on VMMC2 is not correctly set-up - it is
> > >>supposed to have voltage of 2.85V (10) but kernel leaves it to
> > >>2.60V (8). However my twl-fu ends here so any help is appreciated.
> > >
> > >So in case of reboot from stock kernel voltage is already configured
> > >to 2.8V, but it does not work, because of the sleep mode.
> > >
> > 
> > Yeah, that sleep is pretty clear, I was rather asking - "any idea how to fix
> > that?". Or it is someone else expected to fix it?
> 
> You may have noticed, that I included Mark and Liam. I hope they
> can give some feedback. I think there are two bugs:
> 
> 1. twl_probe() should setup a default voltage based on DT
>information.

I just had a look at the regulator core code. I think the voltage

Re: [RFC][PATCH v5 1/2] printk: Make printk() completely async

2016-03-20 Thread Byungchul Park
On Mon, Mar 21, 2016 at 09:43:47AM +0900, Sergey Senozhatsky wrote:
> On (03/21/16 09:06), Byungchul Park wrote:
> > On Sun, Mar 20, 2016 at 11:13:10PM +0900, Sergey Senozhatsky wrote:
> [..]
> > > + if (!sync_print) {
> > > + if (in_sched) {
> > > + /*
> > > +  * @in_sched messages may come too early, when we don't
> > > +  * yet have @printk_kthread. We can't print deferred
> > > +  * messages directly, because this may deadlock, route
> > > +  * them via IRQ context.
> > > +  */
> > > + __this_cpu_or(printk_pending,
> > > + PRINTK_PENDING_OUTPUT);
> > > + irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > > + } else if (printk_kthread && !in_panic) {
> > > + /* Offload printing to a schedulable context. */
> > > + wake_up_process(printk_kthread);
> > 
> > It will not print the "lockup suspected" message at all, for e.g. rq->lock,
> > p->pi_lock and any locks which are used within wake_up_process().
> 
> this will switch to old SYNC printk() mode should such a lockup ever
> happen, which is a giant advantage over any other implementation; doing
> wake_up_process() within the 'we can detect recursive printk() here'
> gives us better control.
> 
> why
>   
> printk()->IRQ->wake_up_process()->spin_dump()->printk()->IRQ->wake_up_process()->spin_dump()->printk()->IRQ...
> is better?

What is IRQ? And I didn't say the recursion is good. I just said it can be
avoided without using the last resort.

> 
> 
> > Furthermore, any printk() within wake_up_process() cannot work at all, as
> > well.
> 
> there is printk_deferred() which has LOGLEVEL_SCHED and which must be used
> in sched functions.

It would be good for all scheduler code to use printk_deferred() as you
said, but that's not the case yet.
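(For reference, a minimal sketch of what the deferred variant looks like in
scheduler context; illustrative only, not part of this patch set:)

	#include <linux/printk.h>
	#include <linux/sched.h>

	/* called with rq->lock or p->pi_lock held; a plain printk() here
	 * could recurse into the console/wakeup path and deadlock */
	static void warn_bad_nice(struct task_struct *p, long nice)
	{
		printk_deferred(KERN_WARNING "sched: clamping nice %ld for %s/%d\n",
				nice, p->comm, p->pid);
	}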

> 
>   -ss


Re: [RFC][PATCH v5 1/2] printk: Make printk() completely async

2016-03-20 Thread Sergey Senozhatsky
On (03/21/16 09:06), Byungchul Park wrote:
> On Sun, Mar 20, 2016 at 11:13:10PM +0900, Sergey Senozhatsky wrote:
[..]
> > +   if (!sync_print) {
> > +   if (in_sched) {
> > +   /*
> > +* @in_sched messages may come too early, when we don't
> > +* yet have @printk_kthread. We can't print deferred
> > +* messages directly, because this may deadlock, route
> > +* them via IRQ context.
> > +*/
> > +   __this_cpu_or(printk_pending,
> > +   PRINTK_PENDING_OUTPUT);
> > +   irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > +   } else if (printk_kthread && !in_panic) {
> > +   /* Offload printing to a schedulable context. */
> > +   wake_up_process(printk_kthread);
> 
> It will not print the "lockup suspected" message at all, for e.g. rq->lock,
> p->pi_lock and any locks which are used within wake_up_process().

this will switch to old SYNC printk() mode should such a lockup ever
happen, which is a giant advantage over any other implementation; doing
wake_up_process() within the 'we can detect recursive printk() here'
gives us better control.

why
  
printk()->IRQ->wake_up_process()->spin_dump()->printk()->IRQ->wake_up_process()->spin_dump()->printk()->IRQ...
is better?


> Furthermore, any printk() within wake_up_process() cannot work at all, as
> well.

there is printk_deferred() which has LOGLEVEL_SCHED and which must be used
in sched functions.

-ss


linux-next: manual merge of the ext4 tree with Linus' tree

2016-03-20 Thread Stephen Rothwell
Hi Theodore,

Today's linux-next merge of the ext4 tree got a conflict in:

  fs/overlayfs/super.c

between commit:

  b5891cfab08f ("ovl: fix working on distributed fs as lower layer")

from Linus' tree and commit:

  a7f7fb45f728 ("vfs: add file_dentry()")

from the ext4 tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc fs/overlayfs/super.c
index 619ad4b016d2,10dbdc7da69d..
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@@ -343,7 -356,7 +358,8 @@@ static const struct dentry_operations o
  
  static const struct dentry_operations ovl_reval_dentry_operations = {
.d_release = ovl_dentry_release,
 +  .d_select_inode = ovl_d_select_inode,
+   .d_native_dentry = ovl_d_native_dentry,
.d_revalidate = ovl_dentry_revalidate,
.d_weak_revalidate = ovl_dentry_weak_revalidate,
  };


Re: [RFC][PATCH v5 1/2] printk: Make printk() completely async

2016-03-20 Thread Byungchul Park
On Sun, Mar 20, 2016 at 11:13:10PM +0900, Sergey Senozhatsky wrote:
> @@ -1748,13 +1872,42 @@ asmlinkage int vprintk_emit(int facility, int level,
>dict, dictlen, text, text_len);
>   }
>  
> + /*
> +  * By default we print message to console asynchronously so that kernel
> +  * doesn't get stalled due to slow serial console. That can lead to
> +  * softlockups, lost interrupts, or userspace timing out under heavy
> +  * printing load.
> +  *
> +  * However we resort to synchronous printing of messages during early
> +  * boot, when synchronous printing was explicitly requested by
> +  * kernel parameter, or when console_verbose() was called to print
> +  * everything during panic / oops.
> +  */
> + if (!sync_print) {
> + if (in_sched) {
> + /*
> +  * @in_sched messages may come too early, when we don't
> +  * yet have @printk_kthread. We can't print deferred
> +  * messages directly, because this may deadlock, route
> +  * them via IRQ context.
> +  */
> + __this_cpu_or(printk_pending,
> + PRINTK_PENDING_OUTPUT);
> + irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> + } else if (printk_kthread && !in_panic) {
> + /* Offload printing to a schedulable context. */
> + wake_up_process(printk_kthread);

It will not print the "lockup suspected" message at all, for e.g. rq->lock,
p->pi_lock and any locks which are used within wake_up_process().
Furthermore, any printk() within wake_up_process() cannot work at all, as
well. It is problematic to use any function that may itself call printk()
inside this critical section.

> + } else {
> + sync_print = true;
> + }
> + }
> +
>   logbuf_cpu = UINT_MAX;
>   raw_spin_unlock(&logbuf_lock);
>   lockdep_on();
>   local_irq_restore(flags);


[PATCH] drivers/rtc/rtc-mcp795.c: add devicetree support

2016-03-20 Thread Emil Bartczak
Add device tree support to the rtc-mcp795 driver.

Signed-off-by: Emil Bartczak 
---
 Documentation/devicetree/bindings/rtc/maxim,mcp795.txt | 11 +++
 drivers/rtc/rtc-mcp795.c   | 10 ++
 2 files changed, 21 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/rtc/maxim,mcp795.txt

diff --git a/Documentation/devicetree/bindings/rtc/maxim,mcp795.txt 
b/Documentation/devicetree/bindings/rtc/maxim,mcp795.txt
new file mode 100644
index 000..a59fdd8
--- /dev/null
+++ b/Documentation/devicetree/bindings/rtc/maxim,mcp795.txt
@@ -0,0 +1,11 @@
+* Maxim MCP795 SPI Serial Real-Time Clock
+
+Required properties:
+- compatible: Should contain "maxim,mcp795".
+- reg: SPI address for chip
+
+Example:
+   mcp795: rtc@0 {
+   compatible = "maxim,mcp795";
+   reg = <0>;
+   };
diff --git a/drivers/rtc/rtc-mcp795.c b/drivers/rtc/rtc-mcp795.c
index 1c91ce8..025bb33 100644
--- a/drivers/rtc/rtc-mcp795.c
+++ b/drivers/rtc/rtc-mcp795.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* MCP795 Instructions, see datasheet table 3-1 */
 #define MCP795_EEREAD  0x03
@@ -183,9 +184,18 @@ static int mcp795_probe(struct spi_device *spi)
return 0;
 }
 
+#ifdef CONFIG_OF
+static const struct of_device_id mcp795_of_match[] = {
+   { .compatible = "maxim,mcp795" },
+   { }
+};
+MODULE_DEVICE_TABLE(of, mcp795_of_match);
+#endif
+
 static struct spi_driver mcp795_driver = {
.driver = {
.name = "rtc-mcp795",
+   .of_match_table = of_match_ptr(mcp795_of_match),
},
.probe = mcp795_probe,
 };
-- 
1.9.1



Re: Nokia N900 - audio TPA6130A2 problems

2016-03-20 Thread Sebastian Reichel
Hi,

On Sun, Mar 20, 2016 at 09:43:11PM +0200, Ivaylo Dimitrov wrote:
> On 20.03.2016 07:17, Sebastian Reichel wrote:
> >On Sat, Mar 19, 2016 at 10:49:57AM +0200, Ivaylo Dimitrov wrote:
> >>On 18.03.2016 17:04, Sebastian Reichel wrote:
> >>>On Fri, Mar 18, 2016 at 03:45:26PM +0200, Ivaylo Dimitrov wrote:
> On 18.03.2016 15:36, Sebastian Reichel wrote:
> Regulator is V28_A, which is always-on, so it is enabled no matter what
> probe does. Anyway, I added a various delays after regulator_enable(), to 
> no
> success.
> >>
> >>I guess we're getting closer - I put some printks in various functions in
> >>the twl-regulator.c, here is the result:
> >>
> >>on power-up:
> >>
> >>[2.378601] twl4030ldo_get_voltage_sel VMMC2 vsel 0x0008
> >>[2.384948] twl4030reg_enable VMMC2 grp 0x0020
> >>[2.408416] twl4030ldo_get_voltage_sel VMMC2 vsel 0x0008
> >>[7.196685] twl4030reg_is_enabled VMMC2 state 0x002e
> >>[7.202819] twl4030reg_is_enabled VMMC2 state 0x002e
> >>[7.209777] twl4030reg_is_enabled VMMC2 state 0x002e
> >>[7.215728] twl4030reg_is_enabled VMMC2 state 0x002e
> >>[7.223205] twl4030reg_is_enabled VMMC2 state 0x002e
> >
> >Ok, so normal power up results in running VMMC2 (always-on works),
> >but voltage is not configured correctly. 2.6V is default according
> >to the TRM. I think this is a "bug" in the regulator framework. It
> >should setup the minimum allowed voltage before enabling the
> >always-on regulator.
> >
> 
> /sys/kernel/debug/regulator/regulator_summary shows 2850mV for V28_A, so I
> would remove the quotes. Also, always-on is because if V28_A regulator is
> turned off, there is a leakage through tlv320aic34 VIO. BTW one of the
> things I did while trying to find the problem, was to remove that always-on
> property from the DTS - it didn't help.

Right thinking about it, the voltage must also be configured for the
non always-on cases. So it's not a problem with the regulator
framework, but with twl-regulator's probe function, that should take
care of this.

> >In case of the tpa6130a2/tpa6140a2 driver it may also be nice to add
> >something like this to the driver (Vdd may be between 2.5V and 5.5V
> >according to both datasheets):
> >
> >if (regulator_can_change_voltage(data->supply))
> > regulator_set_voltage(data->supply, 250, 550);
> >
> 
> and add DT property for that voltage range, as max output power and
> harmonics depend on the supply voltage.

I guess that's 2nd step.

> >>after restart from stock kernel:
> >>
> >>[2.388610] twl4030ldo_get_voltage_sel VMMC2 vsel 0x000a
> >>[2.394958] twl4030reg_enable VMMC2 grp 0x0028
> >
> >I had a quick glance at this. I think stock kernel put VMMC2
> >into sleep mode. Mainline kernel does not expect sleep mode
> >being set and does not disable it.
> >
> 
> Well, one would think that kernel should not have expectations on what would
> be the state of the hardware by the time it takes control over it, but setup
> everything needed instead.

I thought it's obvious, that this is not the desired behaviour :)

> >>[2.418426] twl4030ldo_get_voltage_sel VMMC2 vsel 0x000a
> >>[7.186645] twl4030reg_is_enabled VMMC2 state 0x0020
> >>[7.192718] twl4030reg_is_enabled VMMC2 state 0x0020
> >>[7.199615] twl4030reg_is_enabled VMMC2 state 0x0020
> >>[7.205535] twl4030reg_is_enabled VMMC2 state 0x0020
> >>[7.212951] twl4030reg_is_enabled VMMC2 state 0x0020
> >>
> >>I don't see twl4030ldo_set_voltage_sel() for VMMC2(V28_A) regulator, though
> >>there are calls for VMMC1 and VAUX3.
> >
> >I guess that's because the voltage is only configured if at least
> >one regulator consumer requests anything specific.
> >
> 
> But then the board DTS is simply ignored. Doesn't look good :)
>
> >>So, it seems to me that V28_A is not enabled or correctly set-up
> >>and all devices connected to it does not function. And it looks
> >>like even after power-on VMMC2 is not correctly set-up - it is
> >>supposed to have voltage of 2.85V (10) but kernel leaves it to
> >>2.60V (8). However my twl-fu ends here so any help is appreciated.
> >
> >So in case of reboot from stock kernel voltage is already configured
> >to 2.8V, but it does not work, because of the sleep mode.
> >
> 
> Yeah, that sleep is pretty clear, I was rather asking - "any idea how to fix
> that?". Or it is someone else expected to fix it?

You may have noticed, that I included Mark and Liam. I hope they
can give some feedback. I think there are two bugs:

1. twl_probe() should setup a default voltage based on DT
   information.
2. if regulator is in sleep mode, regulator enable should
   disable sleep mode.

-- Sebastian


signature.asc
Description: PGP signature


Re: [PATCH 70/71] mm: get rid of PAGE_CACHE_* and page_cache_{get,release} macros

2016-03-20 Thread Guenter Roeck
On Sun, Mar 20, 2016 at 09:41:17PM +0300, Kirill A. Shutemov wrote:
> PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago
> with promise that one day it will be possible to implement page cache with
> bigger chunks than PAGE_SIZE.
> 
> This promise never materialized. And unlikely will.
> 
> We have many places where PAGE_CACHE_SIZE assumed to be equal to
> PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_*
> or PAGE_* constant should be used in a particular case, especially on the
> border between fs and mm.
> 
> Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
> breakage to be doable.
> 
> Let's stop pretending that pages in page cache are special. They are not.
> 
> The changes are pretty straight-forward:
> 
>  -  << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;
> 
>  - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
> 
>  - page_cache_get() -> get_page();
> 
>  - page_cache_release() -> put_page();
> 
> Signed-off-by: Kirill A. Shutemov 
> ---

...

>  extern pgoff_t linear_hugepage_index(struct vm_area_struct *vma,
> @@ -425,7 +425,7 @@ static inline pgoff_t linear_page_index(struct 
> vm_area_struct *vma,
>   return linear_hugepage_index(vma, address);
>   pgoff = (address - vma->vm_start) >> PAGE_SHIFT;
>   pgoff += vma->vm_pgoff;
> - return pgoff >> (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> + return pgoff >> (PAGE_SHIFT - PAGE_SHIFT);
^

Guenter


[PATCH] perf/x86/intel/rapl: Add missing Broadwell models

2016-03-20 Thread Srinivas Pandruvada
Added Broadwell-H and Broadwell-Server.

Signed-off-by: Srinivas Pandruvada 
---
 arch/x86/events/intel/rapl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index 69904e7..6196f41 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -753,6 +753,7 @@ static int __init rapl_pmu_init(void)
rapl_pmu_events_group.attrs = rapl_events_cln_attr;
break;
case 63: /* Haswell-Server */
+   case 79: /* Broadwell-Server */
apply_quirk = true;
rapl_cntr_mask = RAPL_IDX_SRV;
rapl_pmu_events_group.attrs = rapl_events_srv_attr;
@@ -760,6 +761,7 @@ static int __init rapl_pmu_init(void)
case 60: /* Haswell */
case 69: /* Haswell-Celeron */
case 61: /* Broadwell */
+   case 71: /* Broadwell-H */
rapl_cntr_mask = RAPL_IDX_HSW;
rapl_pmu_events_group.attrs = rapl_events_hsw_attr;
break;
-- 
2.5.0



Re: [PATCH 69/71] vfs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros

2016-03-20 Thread Matthew Wilcox

Spotted an oops:

-   length is PAGE_CACHE_SIZE, then the private data should be released,
+   length is PAG__SIZE, then the private data should be released,



Re: [PATCH 0/5] staging: rtl8712: Fixed Multiple FSF address checkpatch warnings

2016-03-20 Thread Joshua Clayton
On Sunday, March 20, 2016 11:12:32 PM Parth Sane wrote:
> > Fixed Multiple FSF address checkpatch warnings to conform to kernel coding 
> > style.
> > 
> > Parth Sane (5):
> >  staging: rtl8712: Fixed FSF address warning in basic_types.h
> >  staging: rtl8712: Fixed FSF address warning in drv_types.h
> >  staging: rtl9712: Fixed FSF address warning in ethernet.h
> >  staging: rtl9712: Fixed FSF address warning in hal_init.c
> >  staging: rtl9712: Fixed FSF address warning in ieee80211.c
> > 
> > drivers/staging/rtl8712/basic_types.h | 4 
> > drivers/staging/rtl8712/drv_types.h   | 4 
> > drivers/staging/rtl8712/ethernet.h| 4 
> > drivers/staging/rtl8712/hal_init.c| 4 
> > drivers/staging/rtl8712/ieee80211.c   | 4 
> > 5 files changed, 20 deletions(-)
> > 
> > --
> > 1.9.1
> Hi,
> The thing is all these patches are related and have a cover letter explaining 
> changes. But this seems to be a trivial change which is self explanatory.
> This should possibly suffice. What do you think?
> Regards,
> Parth Sane
The cover letter does not end up in the repository.
A cover letter can be helpful, but is not required.
You MUST, however add a descriptive commit message to
each patch for them to be accepted into the kernel.
See Documentation/SubmittingPatches in the kernel sources.


[PATCH v3 2/2] powercap: intel_rapl: PSys support

2016-03-20 Thread Srinivas Pandruvada
The Skylake processor supports a new set of RAPL registers for controlling
the entire SoC instead of just the CPU package. This is useful for thermal
and power control when source of power/thermal is not just CPU/GPU.
This change adds a new platform domain (AKA PSys) to the current
power capping Intel RAPL driver.
PSys also supports PL1 (long term) and PL2 (short term) control like
package domain. This also follows same MSRs for energy and time
units as package domain.
Unlike package domain, PSys support requires more than just processor
level implementation. The other parts in the system need additional
implementation, which OEMs need to support. So not all Skylake
systems will support PSys.

Signed-off-by: Srinivas Pandruvada 
---
 drivers/powercap/intel_rapl.c | 66 +++
 1 file changed, 66 insertions(+)

diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
index cdfd01f0..2c0235d 100644
--- a/drivers/powercap/intel_rapl.c
+++ b/drivers/powercap/intel_rapl.c
@@ -34,6 +34,9 @@
 #include 
 #include 
 
+/* Local defines */
+#define MSR_PLATFORM_POWER_LIMIT   0x065c
+
 /* bitmasks for RAPL MSRs, used by primitive access functions */
 #define ENERGY_STATUS_MASK  0x
 
@@ -86,6 +89,7 @@ enum rapl_domain_type {
RAPL_DOMAIN_PP0, /* core power plane */
RAPL_DOMAIN_PP1, /* graphics uncore */
RAPL_DOMAIN_DRAM,/* DRAM control_type */
+   RAPL_DOMAIN_PLATFORM, /* PSys control_type */
RAPL_DOMAIN_MAX,
 };
 
@@ -251,9 +255,11 @@ static const char * const rapl_domain_names[] = {
"core",
"uncore",
"dram",
+   "psys",
 };
 
 static struct powercap_control_type *control_type; /* PowerCap Controller */
+static struct rapl_domain *platform_rapl_domain; /* Platform (PSys) domain */
 
 /* caller to ensure CPU hotplug lock is held */
 static struct rapl_package *find_package_by_id(int id)
@@ -409,6 +415,14 @@ static const struct powercap_zone_ops zone_ops[] = {
.set_enable = set_domain_enable,
.get_enable = get_domain_enable,
},
+   /* RAPL_DOMAIN_PLATFORM */
+   {
+   .get_energy_uj = get_energy_counter,
+   .get_max_energy_range_uj = get_max_energy_counter,
+   .release = release_zone,
+   .set_enable = set_domain_enable,
+   .get_enable = get_domain_enable,
+   },
 };
 
 static int set_power_limit(struct powercap_zone *power_zone, int id,
@@ -1159,6 +1173,13 @@ static int rapl_unregister_powercap(void)
powercap_unregister_zone(control_type,
&rd_package->power_zone);
}
+
+   if (platform_rapl_domain) {
+   powercap_unregister_zone(control_type,
+&platform_rapl_domain->power_zone);
+   kfree(platform_rapl_domain);
+   }
+
powercap_unregister_control_type(control_type);
 
return 0;
@@ -1238,6 +1259,47 @@ err_cleanup:
return ret;
 }
 
+static int rapl_register_psys(void)
+{
+   struct rapl_domain *rd;
+   struct powercap_zone *power_zone;
+   u64 val;
+
+   if (rdmsrl_safe_on_cpu(0, MSR_PLATFORM_ENERGY_STATUS, &val) || !val)
+   return -ENODEV;
+
+   if (rdmsrl_safe_on_cpu(0, MSR_PLATFORM_POWER_LIMIT, &val) || !val)
+   return -ENODEV;
+
+   rd = kzalloc(sizeof(*rd), GFP_KERNEL);
+   if (!rd)
+   return -ENOMEM;
+
+   rd->name = rapl_domain_names[RAPL_DOMAIN_PLATFORM];
+   rd->id = RAPL_DOMAIN_PLATFORM;
+   rd->msrs[0] = MSR_PLATFORM_POWER_LIMIT;
+   rd->msrs[1] = MSR_PLATFORM_ENERGY_STATUS;
+   rd->rpl[0].prim_id = PL1_ENABLE;
+   rd->rpl[0].name = pl1_name;
+   rd->rpl[1].prim_id = PL2_ENABLE;
+   rd->rpl[1].name = pl2_name;
+   rd->rp = find_package_by_id(0);
+
+   power_zone = powercap_register_zone(&rd->power_zone, control_type,
+   "psys", NULL,
+   &zone_ops[RAPL_DOMAIN_PLATFORM],
+   2, &constraint_ops);
+
+   if (IS_ERR(power_zone)) {
+   kfree(rd);
+   return PTR_ERR(power_zone);
+   }
+
+   platform_rapl_domain = rd;
+
+   return 0;
+}
+
 static int rapl_register_powercap(void)
 {
struct rapl_domain *rd;
@@ -1254,6 +1316,10 @@ static int rapl_register_powercap(void)
list_for_each_entry(rp, &rapl_packages, plist)
if (rapl_package_register_powercap(rp))
goto err_cleanup_package;
+
+   /* Don't bail out if PSys is not supported */
+   rapl_register_psys();
+
return ret;
 
 err_cleanup_package:
-- 
2.5.0



[PATCH v3 1/2] perf/x86/intel/rapl: support Skylake RAPL domains

2016-03-20 Thread Srinivas Pandruvada
Added Skylake support for RAPL domains. In addition to RAPL domains in
Broadwell clients, it has support for platform domain (aka PSys).

Also fixed error in comment for gpu counter, which previously was dram
counter.

Signed-off-by: Srinivas Pandruvada 
---
 arch/x86/events/intel/rapl.c | 50 ++--
 arch/x86/include/asm/msr-index.h |  2 ++
 2 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index b834a3f..69904e7 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -27,10 +27,14 @@
  *   event: rapl_energy_dram
  *perf code: 0x3
  *
- * dram counter: consumption of the builtin-gpu domain (client only)
+ * gpu counter: consumption of the builtin-gpu domain (client only)
  *   event: rapl_energy_gpu
  *perf code: 0x4
  *
+ *  psys counter: consumption of the builtin-psys domain (client only)
+ *   event: rapl_energy_psys
+ *perf code: 0x5
+ *
  * We manage those counters as free running (read-only). They may be
  * use simultaneously by other tools, such as turbostat.
  *
@@ -64,13 +68,16 @@
 #define INTEL_RAPL_RAM 0x3 /* pseudo-encoding */
 #define RAPL_IDX_PP1_NRG_STAT  3   /* gpu */
 #define INTEL_RAPL_PP1 0x4 /* pseudo-encoding */
+#define RAPL_IDX_PSYS_NRG_STAT 4   /* psys */
+#define INTEL_RAPL_PSYS0x5 /* pseudo-encoding */
 
-#define NR_RAPL_DOMAINS 0x4
+#define NR_RAPL_DOMAINS 0x5
 static const char *const rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
"pp0-core",
"package",
"dram",
"pp1-gpu",
+   "psys",
 };
 
 /* Clients have PP0, PKG */
@@ -89,6 +96,13 @@ static const char *const rapl_domain_names[NR_RAPL_DOMAINS] 
__initconst = {
 1<

[PATCH v3 0/2][Resend] Skylake PSys support

2016-03-20 Thread Srinivas Pandruvada
Sorry, I had a typo in Mingo's email address, so I am resending.

v3:
As suggested by tglx adding support first in perf-rapl.
Perf RAPL was missing RAPL support for Skylake
Added support including Psys

v2:
Moved PSYS MSR defines to intel_rapl.c as suggested by Boris

Srinivas Pandruvada (2):
  perf/x86/intel/rapl: support Skylake RAPL domains
  powercap: intel_rapl: PSys support

 arch/x86/events/intel/rapl.c | 50 --
 arch/x86/include/asm/msr-index.h |  2 ++
 drivers/powercap/intel_rapl.c| 66 
 3 files changed, 116 insertions(+), 2 deletions(-)

-- 
2.5.0



[PATCH v3 2/2] powercap: intel_rapl: PSys support

2016-03-20 Thread Srinivas Pandruvada
The Skylake processor supports a new set of RAPL registers for controlling
the entire SoC instead of just the CPU package. This is useful for thermal
and power control when source of power/thermal is not just CPU/GPU.
This change adds a new platform domain (AKA PSys) to the current
power capping Intel RAPL driver.
PSys also supports PL1 (long term) and PL2 (short term) control like
package domain. This also follows same MSRs for energy and time
units as package domain.
Unlike package domain, PSys support requires more than just processor
level implementation. The other parts in the system need additional
implementation, which OEMs need to support. So not all Skylake
systems will support PSys.

Signed-off-by: Srinivas Pandruvada 
---
 drivers/powercap/intel_rapl.c | 66 +++
 1 file changed, 66 insertions(+)

diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
index cdfd01f0..2c0235d 100644
--- a/drivers/powercap/intel_rapl.c
+++ b/drivers/powercap/intel_rapl.c
@@ -34,6 +34,9 @@
 #include 
 #include 
 
+/* Local defines */
+#define MSR_PLATFORM_POWER_LIMIT   0x065c
+
 /* bitmasks for RAPL MSRs, used by primitive access functions */
 #define ENERGY_STATUS_MASK  0x
 
@@ -86,6 +89,7 @@ enum rapl_domain_type {
RAPL_DOMAIN_PP0, /* core power plane */
RAPL_DOMAIN_PP1, /* graphics uncore */
RAPL_DOMAIN_DRAM,/* DRAM control_type */
+   RAPL_DOMAIN_PLATFORM, /* PSys control_type */
RAPL_DOMAIN_MAX,
 };
 
@@ -251,9 +255,11 @@ static const char * const rapl_domain_names[] = {
"core",
"uncore",
"dram",
+   "psys",
 };
 
 static struct powercap_control_type *control_type; /* PowerCap Controller */
+static struct rapl_domain *platform_rapl_domain; /* Platform (PSys) domain */
 
 /* caller to ensure CPU hotplug lock is held */
 static struct rapl_package *find_package_by_id(int id)
@@ -409,6 +415,14 @@ static const struct powercap_zone_ops zone_ops[] = {
.set_enable = set_domain_enable,
.get_enable = get_domain_enable,
},
+   /* RAPL_DOMAIN_PLATFORM */
+   {
+   .get_energy_uj = get_energy_counter,
+   .get_max_energy_range_uj = get_max_energy_counter,
+   .release = release_zone,
+   .set_enable = set_domain_enable,
+   .get_enable = get_domain_enable,
+   },
 };
 
 static int set_power_limit(struct powercap_zone *power_zone, int id,
@@ -1159,6 +1173,13 @@ static int rapl_unregister_powercap(void)
powercap_unregister_zone(control_type,
&rd_package->power_zone);
}
+
+   if (platform_rapl_domain) {
+   powercap_unregister_zone(control_type,
+&platform_rapl_domain->power_zone);
+   kfree(platform_rapl_domain);
+   }
+
powercap_unregister_control_type(control_type);
 
return 0;
@@ -1238,6 +1259,47 @@ err_cleanup:
return ret;
 }
 
+static int rapl_register_psys(void)
+{
+   struct rapl_domain *rd;
+   struct powercap_zone *power_zone;
+   u64 val;
+
+   if (rdmsrl_safe_on_cpu(0, MSR_PLATFORM_ENERGY_STATUS, &val) || !val)
+   return -ENODEV;
+
+   if (rdmsrl_safe_on_cpu(0, MSR_PLATFORM_POWER_LIMIT, &val) || !val)
+   return -ENODEV;
+
+   rd = kzalloc(sizeof(*rd), GFP_KERNEL);
+   if (!rd)
+   return -ENOMEM;
+
+   rd->name = rapl_domain_names[RAPL_DOMAIN_PLATFORM];
+   rd->id = RAPL_DOMAIN_PLATFORM;
+   rd->msrs[0] = MSR_PLATFORM_POWER_LIMIT;
+   rd->msrs[1] = MSR_PLATFORM_ENERGY_STATUS;
+   rd->rpl[0].prim_id = PL1_ENABLE;
+   rd->rpl[0].name = pl1_name;
+   rd->rpl[1].prim_id = PL2_ENABLE;
+   rd->rpl[1].name = pl2_name;
+   rd->rp = find_package_by_id(0);
+
+   power_zone = powercap_register_zone(&rd->power_zone, control_type,
+   "psys", NULL,
+   &zone_ops[RAPL_DOMAIN_PLATFORM],
+   2, &constraint_ops);
+
+   if (IS_ERR(power_zone)) {
+   kfree(rd);
+   return PTR_ERR(power_zone);
+   }
+
+   platform_rapl_domain = rd;
+
+   return 0;
+}
+
 static int rapl_register_powercap(void)
 {
struct rapl_domain *rd;
@@ -1254,6 +1316,10 @@ static int rapl_register_powercap(void)
list_for_each_entry(rp, &rapl_packages, plist)
if (rapl_package_register_powercap(rp))
goto err_cleanup_package;
+
+   /* Don't bail out if PSys is not supported */
+   rapl_register_psys();
+
return ret;
 
 err_cleanup_package:
-- 
2.5.0



[PATCH v3 0/2] Skylake PSys support

2016-03-20 Thread Srinivas Pandruvada
v3:
As suggested by tglx adding support first in perf-rapl.
Perf RAPL was missing RAPL support for Skylake
Added support including Psys

v2:
Moved PSYS MSR defines to intel_rapl.c as suggested by Boris

Srinivas Pandruvada (2):
  perf/x86/intel/rapl: support Skylake RAPL domains
  powercap: intel_rapl: PSys support

 arch/x86/events/intel/rapl.c | 50 --
 arch/x86/include/asm/msr-index.h |  2 ++
 drivers/powercap/intel_rapl.c| 66 
 3 files changed, 116 insertions(+), 2 deletions(-)

-- 
2.5.0



[PATCH v3 1/2] perf/x86/intel/rapl: support Skylake RAPL domains

2016-03-20 Thread Srinivas Pandruvada
Added Skylake support for RAPL domains. In addition to RAPL domains in
Broadwell clients, it has support for platform domain (aka PSys).

Also fixed error in comment for gpu counter, which previously was dram
counter.

Signed-off-by: Srinivas Pandruvada 
---
 arch/x86/events/intel/rapl.c | 50 ++--
 arch/x86/include/asm/msr-index.h |  2 ++
 2 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index b834a3f..69904e7 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -27,10 +27,14 @@
  *   event: rapl_energy_dram
  *perf code: 0x3
  *
- * dram counter: consumption of the builtin-gpu domain (client only)
+ * gpu counter: consumption of the builtin-gpu domain (client only)
  *   event: rapl_energy_gpu
  *perf code: 0x4
  *
+ *  psys counter: consumption of the builtin-psys domain (client only)
+ *   event: rapl_energy_psys
+ *perf code: 0x5
+ *
  * We manage those counters as free running (read-only). They may be
  * use simultaneously by other tools, such as turbostat.
  *
@@ -64,13 +68,16 @@
 #define INTEL_RAPL_RAM 0x3 /* pseudo-encoding */
 #define RAPL_IDX_PP1_NRG_STAT  3   /* gpu */
 #define INTEL_RAPL_PP1 0x4 /* pseudo-encoding */
+#define RAPL_IDX_PSYS_NRG_STAT 4   /* psys */
+#define INTEL_RAPL_PSYS0x5 /* pseudo-encoding */
 
-#define NR_RAPL_DOMAINS 0x4
+#define NR_RAPL_DOMAINS 0x5
 static const char *const rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
"pp0-core",
"package",
"dram",
"pp1-gpu",
+   "psys",
 };
 
 /* Clients have PP0, PKG */
@@ -89,6 +96,13 @@ static const char *const rapl_domain_names[NR_RAPL_DOMAINS] 
__initconst = {
 1<

Re: [PATCH] lan78xx: Protect runtime_auto check by #ifdef CONFIG_PM

2016-03-20 Thread Guenter Roeck

On 03/20/2016 03:43 AM, Geert Uytterhoeven wrote:

If CONFIG_PM=n:

 drivers/net/usb/lan78xx.c: In function ‘lan78xx_get_stats64’:
 drivers/net/usb/lan78xx.c:3274: error: ‘struct dev_pm_info’ has no member 
named ‘runtime_auto’

If PM is disabled, the runtime_auto flag is not available, but auto
suspend is not enabled anyway.  Hence protect the check for runtime_auto
by #ifdef CONFIG_PM to fix this.

Fixes: a59f8c5b048dc938 ("lan78xx: add ndo_get_stats64")
Reported-by: Guenter Roeck 
Signed-off-by: Geert Uytterhoeven 
---
Alternatively, we can add a dev_pm_runtime_auto_is_enabled() wrapper to
include/linux/pm.h, which always returns false if CONFIG_PM is disabled.
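
A minimal sketch of such a wrapper (hypothetical, not in the current tree;
it would have to live where struct device is visible, e.g. pm_runtime.h):

static inline bool dev_pm_runtime_auto_is_enabled(struct device *dev)
{
#ifdef CONFIG_PM
        /* true when userspace set power/control to "auto" */
        return dev->power.runtime_auto;
#else
        return false;
#endif
}

The check in lan78xx_get_stats64() would then simply become:

        if (!dev_pm_runtime_auto_is_enabled(&dev->udev->dev))
                lan78xx_update_stats(dev);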

The only other user in non-core code (drivers/usb/core/sysfs.c) has a
big #ifdef CONFIG_PM check around all PM-related code.

Thoughts?


Not that it matters anymore since David reverted the original patch,
but my reason for not sending a similar patch was that I wasn't sure
if .runtime_auto should be accessed from drivers in the first place,
or if there is some logical problem with the code.

Guenter


---
  drivers/net/usb/lan78xx.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index d36d5ebf37f355f2..7b9ac47b2ecf9905 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -3271,7 +3271,9 @@ struct rtnl_link_stats64 *lan78xx_get_stats64(struct 
net_device *netdev,
 * periodic reading from HW will prevent from entering USB auto suspend.
 * if autosuspend is disabled, read from HW.
 */
+#ifdef CONFIG_PM
if (!dev->udev->dev.power.runtime_auto)
+#endif
lan78xx_update_stats(dev);

mutex_lock(&dev->stats.access_lock);





[GIT PULL] f2fs updates for v4.6

2016-03-20 Thread Jaegeuk Kim
Hi Linus,

I made another pull request which removes the previous wrong commits and adds
a single commit to migrate the f2fs crypto into fs/crypto.

Could you please consider pulling this?

Thanks,

The following changes since commit 4de8ebeff8ddefaceeb7fc6a9b1a514fc9624509:

  Merge tag 'trace-fixes-v4.5-rc5' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace (2016-02-22 
14:09:18 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git 
tags/for-f2fs-4.6

for you to fetch changes up to 12bb0a8fd47e6020a7b52dc283a2d855f03d6ef5:

  f2fs: submit node page write bios when really required (2016-03-17 21:19:47 
-0700)


= New Features =
 - uplift filesystem encryption into fs/crypto/
 - give sysfs entries to control memory consumption

= Enhancements =
 - improve aio performance by preallocating blocks in ->write_iter
 - use writepages lock only for WB_SYNC_ALL
 - avoid redundant inline_data conversion
 - enhance foreground GC
 - use wait_for_stable_page where possible
 - speed up SEEK_DATA and fiemap

= Bug Fixes =
 - corner case with -ENOSPC for inline_data
 - hung task caused by long latency in shrinker
 - corruption between atomic write and f2fs_trace_pid
 - avoid garbage lengths in dentries
 - revoke atomically written pages if an error occurs

In addition, there are various minor bug fixes and clean-ups.


Arnd Bergmann (1):
  f2fs: add missing argument to f2fs_setxattr stub

Chao Yu (33):
  f2fs: relocate is_merged_page
  f2fs: flush dirty nat entries when exceeding threshold
  f2fs: export dirty_nats_ratio in sysfs
  f2fs: correct search area in get_new_segment
  f2fs: enhance foreground GC
  f2fs: simplify f2fs_map_blocks
  f2fs: simplify __allocate_data_blocks
  f2fs: remove unneeded pointer conversion
  f2fs: introduce get_next_page_offset to speed up SEEK_DATA
  f2fs: speed up handling holes in fiemap
  f2fs: introduce f2fs_submit_merged_bio_cond
  f2fs: split drop_inmem_pages from commit_inmem_pages
  f2fs: support revoking atomic written pages
  f2fs crypto: make sure the encryption info is initialized on opendir(2)
  f2fs crypto: handle unexpected lack of encryption keys
  f2fs crypto: avoid unneeded memory allocation when {en/de}crypting symlink
  f2fs: introduce f2fs_journal struct to wrap journal info
  f2fs: enhance IO path with block plug
  f2fs: split journal cache from curseg cache
  f2fs: reorder nat cache lock in cache_nat_entry
  f2fs: detect error of update_dent_inode in ->rename
  f2fs: fix to delete old dirent in converted inline directory in ->rename
  f2fs: fix the wrong stat count of calling gc
  f2fs: show more info about superblock recovery
  f2fs: try to flush inode after merging inline data
  f2fs: trace old block address for CoWed page
  f2fs: fix incorrect upper bound when iterating inode mapping tree
  f2fs crypto: fix incorrect positioning for GCing encrypted data page
  f2fs: introduce f2fs_update_data_blkaddr for cleanup
  f2fs: introduce f2fs_flush_merged_bios for cleanup
  f2fs: fix to avoid deadlock when merging inline data
  f2fs: clean up opened code with f2fs_update_dentry
  f2fs: fix to avoid unneeded unlock_new_inode

Fan Li (2):
  f2fs: avoid unnecessary search while finding victim in gc
  f2fs: modify the readahead method in ra_node_page()

Hou Pengyang (2):
  f2fs: reconstruct the code to free an extent_node
  f2fs: improve shrink performance of extent nodes

Jaegeuk Kim (32):
  f2fs: remove needless condition check
  f2fs: use writepages->lock for WB_SYNC_ALL
  f2fs: fix to overcome inline_data floods
  f2fs: do f2fs_balance_fs when block is allocated
  f2fs: avoid multiple node page writes due to inline_data
  f2fs: don't need to sync node page at every time
  f2fs: avoid needless sync_inode_page when reading inline_data
  f2fs: don't need to call set_page_dirty for io error
  f2fs: use wait_for_stable_page to avoid contention
  f2fs: use wq_has_sleeper for cp_wait wait_queue
  f2fs: move extent_node list operations being coupled with rbtree operation
  f2fs: don't set cached_en if it will be freed
  f2fs: give scheduling point in shrinking path
  f2fs: wait on page's writeback in writepages path
  f2fs: flush bios to handle cp_error in put_super
  f2fs: fix conflict on page->private usage
  f2fs: move dio preallocation into f2fs_file_write_iter
  f2fs: preallocate blocks for buffered aio writes
  f2fs: increase i_size to avoid missing data
  f2fs crypto: replace some BUG_ON()'s with error checks
  f2fs crypto: fix spelling typo in comment
  f2fs crypto: f2fs_page_crypto() doesn't need a encryption context
  f2fs crypto: ch

[PATCH] Staging: wlan-ng: removed "goto " instructions where this is not necessary.

2016-03-20 Thread Claudiu Beznea
This patch removes "goto" instructions which do nothing but return. In this
way, additional instructions and labels were removed as well.
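
Schematically, the shape of the change is (a sketch of the pattern, not a
verbatim hunk):

/* before: the label does nothing but translate result into a return code */
        if (result)
                goto exit;
        ...
exit:
        if (result)
                err = -EFAULT;
        return err;

/* after: return directly; the label and the err variable go away */
        if (result)
                return -EFAULT;
        ...
        return 0;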

Signed-off-by: Claudiu Beznea 
---
 drivers/staging/wlan-ng/cfg80211.c | 112 +
 1 file changed, 39 insertions(+), 73 deletions(-)

diff --git a/drivers/staging/wlan-ng/cfg80211.c 
b/drivers/staging/wlan-ng/cfg80211.c
index 8bad018..63d7c99 100644
--- a/drivers/staging/wlan-ng/cfg80211.c
+++ b/drivers/staging/wlan-ng/cfg80211.c
@@ -62,7 +62,6 @@ static int prism2_result2err(int prism2_result)
err = -EOPNOTSUPP;
break;
default:
-   err = 0;
break;
}
 
@@ -111,13 +110,13 @@ static int prism2_change_virtual_intf(struct wiphy *wiphy,
switch (type) {
case NL80211_IFTYPE_ADHOC:
if (wlandev->macmode == WLAN_MACMODE_IBSS_STA)
-   goto exit;
+   return err;
wlandev->macmode = WLAN_MACMODE_IBSS_STA;
data = 0;
break;
case NL80211_IFTYPE_STATION:
if (wlandev->macmode == WLAN_MACMODE_ESS_STA)
-   goto exit;
+   return err;
wlandev->macmode = WLAN_MACMODE_ESS_STA;
data = 1;
break;
@@ -136,7 +135,6 @@ static int prism2_change_virtual_intf(struct wiphy *wiphy,
 
dev->ieee80211_ptr->iftype = type;
 
-exit:
return err;
 }
 
@@ -146,9 +144,7 @@ static int prism2_add_key(struct wiphy *wiphy, struct 
net_device *dev,
 {
wlandevice_t *wlandev = dev->ml_priv;
u32 did;
-
-   int err = 0;
-   int result = 0;
+   int result;
 
switch (params->cipher) {
case WLAN_CIPHER_SUITE_WEP40:
@@ -157,7 +153,7 @@ static int prism2_add_key(struct wiphy *wiphy, struct 
net_device *dev,

DIDmib_dot11smt_dot11PrivacyTable_dot11WEPDefaultKeyID,
key_index);
if (result)
-   goto exit;
+   return -EFAULT;
 
/* send key to driver */
switch (key_index) {
@@ -178,26 +174,22 @@ static int prism2_add_key(struct wiphy *wiphy, struct 
net_device *dev,
break;
 
default:
-   err = -EINVAL;
-   goto exit;
+   return -EINVAL;
}
 
result = prism2_domibset_pstr32(wlandev, did,
params->key_len, params->key);
if (result)
-   goto exit;
+   return -EFAULT;
+
break;
 
default:
pr_debug("Unsupported cipher suite\n");
-   result = 1;
+   return -EFAULT;
}
 
-exit:
-   if (result)
-   err = -EFAULT;
-
-   return err;
+   return 0;
 }
 
 static int prism2_get_key(struct wiphy *wiphy, struct net_device *dev,
@@ -235,8 +227,7 @@ static int prism2_del_key(struct wiphy *wiphy, struct 
net_device *dev,
 {
wlandevice_t *wlandev = dev->ml_priv;
u32 did;
-   int err = 0;
-   int result = 0;
+   int result;
 
/* There is no direct way in the hardware (AFAIK) of removing
 * a key, so we will cheat by setting the key to a bogus value
@@ -265,35 +256,30 @@ static int prism2_del_key(struct wiphy *wiphy, struct 
net_device *dev,
break;
 
default:
-   err = -EINVAL;
-   goto exit;
+   return -EINVAL;
}
 
result = prism2_domibset_pstr32(wlandev, did, 13, "0");
-
-exit:
if (result)
-   err = -EFAULT;
+   return -EFAULT;
 
-   return err;
+   return 0;
 }
 
 static int prism2_set_default_key(struct wiphy *wiphy, struct net_device *dev,
  u8 key_index, bool unicast, bool multicast)
 {
wlandevice_t *wlandev = dev->ml_priv;
-
-   int err = 0;
-   int result = 0;
+   int result;
 
result = prism2_domibset_uint32(wlandev,
DIDmib_dot11smt_dot11PrivacyTable_dot11WEPDefaultKeyID,
key_index);
 
if (result)
-   err = -EFAULT;
+   return -EFAULT;
 
-   return err;
+   return 0;
 }
 
 static int prism2_get_station(struct wiphy *wiphy, struct net_device *dev,
@@ -451,7 +437,6 @@ static int prism2_set_wiphy_params(struct wiphy *wiphy, u32 
changed)
wlandevice_t *wlandev = priv->wlandev;
u32 data;
int result;
-   int err = 0;
 
if (changed & WIPHY_PARAM_RTS_THRESHOLD) {
if (wiphy->rts_threshold == -1)
@@ -462,10 +447,8 @@ static int prism2_set_wiphy_params(struct wiphy *wiphy, 
u32 changed)
result = prism2_domibset_uint32(wla

[PATCH] PKCS#7: pkcs7_validate_trust(): initialize the _trusted output argument

2016-03-20 Thread Nicolai Stange
Despite what the DocBook comment to pkcs7_validate_trust() says, the
*_trusted argument is never set to false.

pkcs7_validate_trust() only positively sets *_trusted upon encountering
a trusted PKCS#7 SignedInfo block.

This is quite unfortunate since its callers, system_verify_data() for
example, depend on pkcs7_validate_trust() clearing *_trusted on non-trust.
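
The calling pattern in question is roughly the following (paraphrased, not
the exact system_verify_data() code):

        bool trusted;   /* stack garbage, e.g. 0x82, unless the callee writes it */

        ret = pkcs7_validate_trust(pkcs7, trust_keyring, &trusted);
        if (ret < 0)
                goto error;
        if (!trusted)   /* may wrongly pass when reading uninitialized memory */
                goto error;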

Indeed, UBSAN splats when attempting to load the uninitialized local
variable 'trusted' from system_verify_data() in pkcs7_validate_trust():

  UBSAN: Undefined behaviour in crypto/asymmetric_keys/pkcs7_trust.c:194:14
  load of value 82 is not a valid value for type '_Bool'
  [...]
  Call Trace:
[] dump_stack+0xbc/0x117
[] ? _atomic_dec_and_lock+0x169/0x169
[] ubsan_epilogue+0xd/0x4e
[] __ubsan_handle_load_invalid_value+0x111/0x158
[] ? val_to_string.constprop.12+0xcf/0xcf
[] ? x509_request_asymmetric_key+0x114/0x370
[] ? kfree+0x220/0x370
[] ? public_key_verify_signature_2+0x32/0x50
[] pkcs7_validate_trust+0x524/0x5f0
[] system_verify_data+0xca/0x170
[] ? top_trace_array+0x9b/0x9b
[] ? __vfs_read+0x279/0x3d0
[] mod_verify_sig+0x1ff/0x290
[...]

The implication is that pkcs7_validate_trust() effectively grants trust
when it really shouldn't.

Fix this by explicitly setting *_trusted to false at the very beginning
of pkcs7_validate_trust().

Signed-off-by: Nicolai Stange 
---
 Applicable to linux-next-20160318

 crypto/asymmetric_keys/pkcs7_trust.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/crypto/asymmetric_keys/pkcs7_trust.c 
b/crypto/asymmetric_keys/pkcs7_trust.c
index 3bbdcc7..7d7a39b4 100644
--- a/crypto/asymmetric_keys/pkcs7_trust.c
+++ b/crypto/asymmetric_keys/pkcs7_trust.c
@@ -178,6 +178,8 @@ int pkcs7_validate_trust(struct pkcs7_message *pkcs7,
int cached_ret = -ENOKEY;
int ret;
 
+   *_trusted = false;
+
for (p = pkcs7->certs; p; p = p->next)
p->seen = false;
 
-- 
2.7.3



Re: [PATCH 01/71] arc: get rid of PAGE_CACHE_* and page_cache_{get,release} macros

2016-03-20 Thread Hugh Dickins
On Sun, 20 Mar 2016, Linus Torvalds wrote:
> On Sun, Mar 20, 2016 at 12:34 PM, Kirill A. Shutemov
>  wrote:
> >
> > Hm. Okay. Re-split this way would take some time. I'll post updated
> > patchset tomorrow.
> 
> Oh, I was assuming this was automated with coccinelle or at least some
> simple shell scripting..
> 
> Generally, for things like this, automation really is great.
> 
> In fact, I like it when people attach the scripts to the commit
> message, further clarifying exactly what they did (even if the end
> result then often includes manual fixups for patterns that didn't
> _quite_ match, or where the automated script just generated ugly
> indentation or similar).

Fine by me to make these changes - once upon a time I had a better
grip than most of when and how to use PAGE_CACHE_blah; but have long
lost it, and agree with all those who find the imaginary distinction
now a drag.

Just a plea, which I expect you already intend, to apply these changes
either just before 4.6-rc1 or just before 4.7-rc1 (I think I'd opt for
4.6-rc1 myself), without any interim of days or months in linux-next,
where a period of divergence would be quite tiresome.  Holding back
Kirill's 71/71 until the coast is clear just a little later.

Thanks,
Hugh

