Re: [PATCH] badblocks: fix overlapping check for clearing

2016-10-06 Thread NeilBrown
On Tue, Sep 06 2016, Tomasz Majchrzak wrote:

> Current bad block clear implementation assumes the range to clear
> overlaps with at least one bad block already stored. If given range to
> clear precedes first bad block in a list, the first entry is incorrectly
> updated.

In the original md context, it would only ever be called on a block that
was already in the list.
But you are right that it is best not to assume this, and to code more
safely.



>
> Check not only if stored block end is past clear block end but also if
> stored block start is before clear block end.
>
> Signed-off-by: Tomasz Majchrzak 

Dan Williams seems to have taken responsibility for this code through
his nvdimm tree, so I've added him to 'cc' in the hope that he looks at
this (I wonder if he is even on linux-block )


> ---
>  block/badblocks.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/block/badblocks.c b/block/badblocks.c
> index 7be53cb..b2ffcc7 100644
> --- a/block/badblocks.c
> +++ b/block/badblocks.c
> @@ -354,7 +354,8 @@ int badblocks_clear(struct badblocks *bb, sector_t s, int sectors)
>* current range.  Earlier ranges could also overlap,
>* but only this one can overlap the end of the range.
>*/
> - if (BB_OFFSET(p[lo]) + BB_LEN(p[lo]) > target) {
> + if ((BB_OFFSET(p[lo]) + BB_LEN(p[lo]) > target) &&
> + (BB_OFFSET(p[lo]) <= target)) {

hmmm..
'target' is the sector just beyond the set of sectors to remove from the
list.
BB_OFFSET(p[lo]) is the first sector in a range that was found in the
list.
If these are equal, then we aren't clearing anything in this range.
So I would have '<', not '<='.

I don't think this makes the code wrong as we end up assigning to p[lo]
the value that is already there.  But it might be confusing.
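
For illustration, the test NeilBrown suggests can be written as a standalone
predicate (a sketch with plain integers rather than the kernel BB_* macros):

    #include <stdbool.h>

    /* A stored bad-block range [off, off+len) overlaps the end of a
     * clear request finishing at 'target' (exclusive) only if it ends
     * strictly after 'target' and starts strictly before it. */
    static bool overlaps_clear_end(unsigned long long off,
                                   unsigned long long len,
                                   unsigned long long target)
    {
            return off + len > target && off < target; /* '<', not '<=' */
    }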


>   /* Partial overlap, leave the tail of this range */
>   int ack = BB_ACK(p[lo]);
>   sector_t a = BB_OFFSET(p[lo]);
> @@ -377,7 +378,8 @@ int badblocks_clear(struct badblocks *bb, sector_t s, int sectors)
>   lo--;
>   }
>   while (lo >= 0 &&
> -BB_OFFSET(p[lo]) + BB_LEN(p[lo]) > s) {
> +(BB_OFFSET(p[lo]) + BB_LEN(p[lo]) > s) &&
> +(BB_OFFSET(p[lo]) <= target)) {

Ditto.

But the code is, I think, correct. Just not how I would have written it.
So

 Acked-by: NeilBrown 

Thanks,
NeilBrown


>   /* This range does overlap */
>   if (BB_OFFSET(p[lo]) < s) {
>   /* Keep the early parts of this range. */
> -- 
> 1.8.3.1




loop mount: kernel BUG at lib/percpu-refcount.c:231

2016-10-06 Thread Dave Young
Hi,

The bug below happened to me while loop-mounting a file image after stopping a
kvm guest. But it has only happened once so far.
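
A minimal sequence that exercises the same code path (illustrative only; the
image name and mount point are not from the report):

    dd if=/dev/zero of=disk.img bs=1M count=64
    mkfs.ext4 disk.img
    mount -o loop disk.img /mnt   # autoloads loop.ko: loop_init -> loop_add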

[ 4761.031686] [ cut here ]
[ 4761.075984] kernel BUG at lib/percpu-refcount.c:231!
[ 4761.120184] invalid opcode:  [#1] SMP
[ 4761.164307] Modules linked in: loop(+) macvtap macvlan tun ccm rfcomm fuse 
snd_hda_codec_hdmi cmac bnep vfat fat kvm_intel kvm irqbypass arc4 i915 
rtsx_pci_sdmmc intel_gtt drm_kms_helper iwlmvm syscopyarea sysfillrect 
sysimgblt fb_sys_fops mac80211 drm snd_hda_codec_realtek snd_hda_codec_generic 
snd_hda_intel snd_hda_codec btusb snd_hwdep iwlwifi snd_hda_core input_leds 
btrtl snd_seq pcspkr serio_raw btbcm snd_seq_device i2c_i801 btintel cfg80211 
bluetooth snd_pcm i2c_smbus rtsx_pci mfd_core e1000e ptp pps_core snd_timer 
thinkpad_acpi wmi snd soundcore rfkill video nfsd auth_rpcgss nfs_acl lockd 
grace sunrpc
[ 4761.323045] CPU: 1 PID: 25890 Comm: modprobe Not tainted 4.8.0+ #168
[ 4761.377791] Hardware name: LENOVO 20ARS1BJ02/20ARS1BJ02, BIOS GJET86WW (2.36 
) 12/04/2015
[ 4761.433704] task: 986fd1b7d780 task.stack: a85842528000
[ 4761.490120] RIP: 0010:[]  [] 
__percpu_ref_switch_to_percpu+0xf8/0x100
[ 4761.548138] RSP: 0018:a8584252bb38  EFLAGS: 00010246
[ 4761.604673] RAX:  RBX: 986fbdca3200 RCX: 
[ 4761.662416] RDX: 00983288 RSI: 0001 RDI: 986fbdca3958
[ 4761.720473] RBP: a8584252bb80 R08: 0008 R09: 0008
[ 4761.779270] R10:  R11:  R12: 
[ 4761.837603] R13: 9870fa22c800 R14: 9870fa22c80c R15: 986fbdca3200
[ 4761.895870] FS:  7fc286eb4640() GS:98711f24() 
knlGS:
[ 4761.954596] CS:  0010 DS:  ES:  CR0: 80050033
[ 4762.012978] CR2: 555c3a20ee78 CR3: 000212988000 CR4: 001406e0
[ 4762.072454] Stack:
[ 4762.131283]  9870f2f37800 9870c8e46000 9870fa22c880 
a8584252bbb8
[ 4762.190776]  ae2a147c ba169577 986fbdca3200 
9870fa22c870
[ 4762.251149]  9870fa22c800 a8584252bb90 ae2b3294 
a8584252bbc8
[ 4762.311657] Call Trace:
[ 4762.371157]  [] ? kobject_uevent_env+0xfc/0x3b0
[ 4762.431483]  [] percpu_ref_switch_to_percpu+0x14/0x20
[ 4762.492093]  [] blk_register_queue+0xbe/0x120
[ 4762.552727]  [] device_add_disk+0x1c4/0x470
[ 4762.614155]  [] loop_add+0x1d9/0x260 [loop]
[ 4762.674042]  [] loop_init+0x119/0x16c [loop]
[ 4762.733949]  [] ? 0xc02ff000
[ 4762.793563]  [] do_one_initcall+0x4b/0x180
[ 4762.853068]  [] ? free_vmap_area_noflush+0x43/0xb0
[ 4762.913665]  [] do_init_module+0x55/0x1c4
[ 4762.973400]  [] load_module+0x1fc4/0x23e0
[ 4763.033545]  [] ? __symbol_put+0x60/0x60
[ 4763.094281]  [] SYSC_init_module+0x138/0x150
[ 4763.154985]  [] SyS_init_module+0x9/0x10
[ 4763.215577]  [] entry_SYSCALL_64_fastpath+0x1e/0xad
[ 4763.277044] Code: 00 48 c7 c7 20 c7 a8 ae 48 63 d2 e8 63 ef ff ff 3b 05 81 
a9 7d 00 89 c2 7c cd 48 8b 43 08 48 83 e0 fe 48 89 43 08 e9 3c ff ff ff <0f> 0b 
e8 81 b6 d9 ff 90 55 48 89 e5 41 54 4c 8d 67 d8 53 48 89 
[ 4763.342964] RIP  [] 
__percpu_ref_switch_to_percpu+0xf8/0x100
[ 4763.407151]  RSP 

Thanks
Dave


Re: blockdev kernel regression (bugzilla 173031)

2016-10-06 Thread NeilBrown
On Thu, Oct 06 2016, Francesco Dolcini wrote:

> On Thu, Oct 06, 2016 at 04:42:52PM +1100, NeilBrown wrote:
> cc
>> Maybe there is a race, but that seems unlikely.
>
> Consider that just hot removal while writing is not enough to
> reproduce the bug systematically.
>
> while true; do [ ! -f /media/usb/.not_mounted ] \
>   && dd if=/dev/zero of=/media/usb/aaa bs=1k \
>   count=1 2>/dev/null && echo -n '*' ; done
>
> with lazy umount by mdev on USB flash drive removal
>
> reproduces the problem almost every time
>
>> The vfat issue is different, and is only a warning.
> Why do you say it is only a warning? Here is the Oops with vfat on ARM/i.MX6:

I looked at:

x86 Oops information (with vfat):

in the bugzilla and didn't realize there was another one further down.
That first vfat one is just a warning.
>
>> > Regression is on commit 6cd18e7 ("block: destroy bdi before blockdev is 
>> > unregistered.")
>> >
>> > Commit: bdfe0cbd746a ("Revert "ext4: remove block_device_ejected") is 
>> > already present on 4.1 stable I am currently working on (2a6f417 on 4.1 
>> > branch)
>> >
>> > I wonder if commit b02176f ("block: don't release bdi while request_queue 
>> > has live references") is the correct fix for this also in kernel 4.1.
>> 
>> Maybe.  It is worth a try.
>> 
>> Below is a backport to 4.1.33.  It compiles, but I haven't tested.
>> If it works for you, I can recommend it for -stable.
>
> I confirm that it works!

Thanks.

NeilBrown




Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-06 Thread Paolo Valente

> Il giorno 06 ott 2016, alle ore 21:57, Shaohua Li  ha scritto:
> 
> On Thu, Oct 06, 2016 at 09:58:44AM +0200, Paolo Valente wrote:
>> 
>>> Il giorno 05 ott 2016, alle ore 22:46, Shaohua Li  ha scritto:
>>> 
>>> On Wed, Oct 05, 2016 at 09:47:19PM +0200, Paolo Valente wrote:
 
> Il giorno 05 ott 2016, alle ore 20:30, Shaohua Li  ha 
> scritto:
> 
> On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun Heo wrote:
>> Hello, Paolo.
>> 
>> On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote:
>>> In this respect, for your generic, unpredictable scenario to make
>>> sense, there must exist at least one real system that meets the
>>> requirements of such a scenario.  Or, if such a real system does not
>>> yet exist, it must be possible to emulate it.  If it is impossible to
>>> achieve this last goal either, then I miss the usefulness
>>> of looking for solutions for such a scenario.
>>> 
>>> That said, let's define the instance(s) of the scenario that you find
>>> most representative, and let's test BFQ on it/them.  Numbers will give
>>> us the answers.  For example, what about all or part of the following
>>> groups:
>>> . one cyclically doing random I/O for some seconds and then sequential I/O
>>> for the next seconds
>>> . one doing, say, quasi-sequential I/O in ON/OFF cycles
>>> . one starting an application cyclically
>>> . one playing back or streaming a movie
>>> 
>>> For each group, we could then measure the time needed to complete each
>>> phase of I/O in each cycle, plus the responsiveness in the group
>>> starting an application, plus the frame drop in the group streaming
>>> the movie.  In addition, we can measure the bandwidth/iops enjoyed by
>>> each group, plus, of course, the aggregate throughput of the whole
>>> system.  In particular we could compare results with throttling, BFQ,
>>> and CFQ.
>>> 
>>> Then we could write resulting numbers on the stone, and stick to them
>>> until something proves them wrong.
>>> 
>>> What do you (or others) think about it?
>> 
>> That sounds great and yeah it's lame that we didn't start with that.
>> Shaohua, would it be difficult to compare how bfq performs against
>> blk-throttle?
> 
> I had a test of BFQ.
 
 Thank you very much for testing BFQ!
 
> I'm using BFQ found at
> http://algogroup.unimore.it/people/paolo/disk_sched/sources.php . version is
> 4.7.0-v8r3.
 
 That's the latest stable version.  The development version [1] already
 contains further improvements for fairness, latency and throughput.
 It is however still a release candidate.
 
 [1] https://github.com/linusw/linux-bfq/tree/bfq-v8
 
> It's a LSI SSD, queue depth 32. I use default setting. fio script
> is:
> 
> [global]
> ioengine=libaio
> direct=1
> readwrite=randread
> bs=4k
> runtime=60
> time_based=1
> file_service_type=random:36
> overwrite=1
> thread=0
> group_reporting=1
> filename=/dev/sdb
> iodepth=1
> numjobs=8
> 
> [groupA]
> prio=2
> 
> [groupB]
> new_group
> prio=6
> 
> I'll change iodepth, numjobs and prio in different tests. Result unit is MB/s.
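
For reference, a comparison like this is typically driven by switching the
elevator between runs (a sketch, assuming a BFQ-patched kernel and the job
file above saved as job.fio):

    echo bfq > /sys/block/sdb/queue/scheduler && fio job.fio
    echo cfq > /sys/block/sdb/queue/scheduler && fio job.fio
    echo deadline > /sys/block/sdb/queue/scheduler && fio job.fio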
> 
> iodepth=1 numjobs=1 prio 4:4
> CFQ: 28:28 BFQ: 21:21 deadline: 29:29
> 
> iodepth=8 numjobs=1 prio 4:4
> CFQ: 162:162 BFQ: 102:98 deadline: 205:205
> 
> iodepth=1 numjobs=8 prio 4:4
> CFQ: 157:157 BFQ: 81:92 deadline: 196:197
> 
> iodepth=1 numjobs=1 prio 2:6
> CFQ: 26.7:27.6 BFQ: 20:6 deadline: 29:29
> 
> iodepth=8 numjobs=1 prio 2:6
> CFQ: 166:174 BFQ: 139:72  deadline: 202:202
> 
> iodepth=1 numjobs=8 prio 2:6
> CFQ: 148:150 BFQ: 90:77 deadline: 198:197
> 
> CFQ isn't fair at all. BFQ is very good on this side, but has poor throughput
> even when prio is the default value.
> 
 
 Throughput is lower with BFQ for two reasons.
 
 First, you certainly left the low_latency in its default state, i.e.,
 on.  As explained, e.g., here [2], low_latency mode is totally geared
 towards maximum responsiveness and minimum latency for soft real-time
 applications (e.g., video players).  To achieve this goal, BFQ is
 willing to perform more idling, when necessary.  This lowers
 throughput (I'll get back on this at the end of the discussion of the
 second reason).
>>> 
>>> changing low_latency to 0 seems not to change anything, at least for the test:
>>> iodepth=1 numjobs=1 prio 2:6 A bs 4k:64k
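
For reference, the knob being discussed lives under the active scheduler's
iosched directory (a sketch, assuming BFQ is the elevator for sdb):

    cat /sys/block/sdb/queue/iosched/low_latency    # 1 by default
    echo 0 > /sys/block/sdb/queue/iosched/low_latency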
>>> 
 The second, most 

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-06 Thread Paolo Valente

> Il giorno 06 ott 2016, alle ore 20:32, Vivek Goyal  ha 
> scritto:
> 
> On Thu, Oct 06, 2016 at 08:01:42PM +0200, Paolo Valente wrote:
>> 
>>> Il giorno 06 ott 2016, alle ore 19:49, Vivek Goyal  ha 
>>> scritto:
>>> 
>>> On Thu, Oct 06, 2016 at 03:15:50PM +0200, Paolo Valente wrote:
>>> 
>>> [..]
 Shaohua, I have just realized that I have unconsciously defended a
 wrong argument.  Although all the facts that I have reported are
 evidently true, I have argued as if the question was: "do we need to
 throw away throttling because there is proportional, or do we need to
 throw away proportional share because there is throttling?".  This
 question is simply wrong, as I think consciously (sorry for my
 dissociated behavior :) ).
>>> 
>>> I was wondering about the same. We need both and both should be able 
>>> to work with fast devices of today using blk-mq interfaces without
>>> much overhead.
>>> 
 
 The best goal to achieve is to have both a good throttling mechanism,
 and a good proportional share scheduler.  This goal would be valid if
 even if there was just one important scenario for each of the two
 approaches.  The vulnus here is that you guys are constantly, and
 rightly, working on solutions to achieve and consolidate reasonable
 QoS guarantees, but an apparently very good proportional-share
 scheduler has been kept off for years.  If you (or others) have good
 arguments to support this state of affairs, then this would probably
 be an important point to discuss.
>>> 
>>> Paolo, CFQ is legacy now and if we can come up with a proportional
>>> IO mechanism which works reasonably well with fast devices using
>>> blk-mq interfaces, that will be much more interesting.
>>> 
>> 
>> That's absolutely true.  But, why do we pretend not to know that, for
>> (at least) hundreds of thousands of users Linux will go on giving bad
>> responsiveness, starvation, high latency and unfairness, until the legacy
>> blk path is no longer used (assuming that these problems will somehow
>> disappear with blk-mq).  Many of these users are fully aware of these
>> Linux long-standing problems.  We could solve these problems by just
>> adding a scheduler that has already been adopted, and thus extensively
>> tested, by thousands of users.  And more and more people are aware of
>> this fact too.  Are we doing the right thing?
> 
> Hi Paolo,
> 

Hi

> People have been using CFQ for many years.

Yes, but allow me just to add that a lot of people have also been
unhappy with CFQ for many years.

> I am not sure if the benefits offered by BFQ over CFQ are significant enough
> to justify taking in completely new code and getting rid of CFQ. Or are the
> benefits significant enough that one feels like putting time and effort into
> this and taking chances with new code.
> 

Although I think that BFQ's benefits are relevant (but I'm a little
bit of an interested party :) ), I do agree that abruptly replacing the
most used I/O scheduler (AFAIK) with such a different one is at least a
little risky.

> At this point of time replacing CFQ with something better is not a
> priority for me.

ok

> But if something better and stable goes upstream, I
> will gladly use it.
> 

Then, in case of success, I will be glad to receive some feedback from
you, and possibly use it to improve the set of ideas that we have put
into BFQ.

Thank you,
Paolo

> Vivek


--
Paolo Valente
Algogroup
Dipartimento di Scienze Fisiche, Informatiche e Matematiche
Via Campi 213/B
41125 Modena - Italy
http://algogroup.unimore.it/people/paolo/







Re: [PATCH] blk-mq: Return invalid cookie if bio was split

2016-10-06 Thread Ming Lei
Hi Keith,

On Thu, Oct 6, 2016 at 12:51 AM, Keith Busch  wrote:
> On Wed, Oct 05, 2016 at 11:19:39AM +0800, Ming Lei wrote:
>> But .poll() need to check if the specific request is completed or not,
>> then blk_poll() can set 'current' as RUNNING if it is completed.
>>
>> blk_poll():
>> ...
>> ret = q->mq_ops->poll(hctx, blk_qc_t_to_tag(cookie));
>> if (ret > 0) {
>> hctx->poll_success++;
>> set_current_state(TASK_RUNNING);
>> return true;
>> }
>
>
> Right, but the task could be waiting on a whole lot more than just that
> one tag, so setting the task to running before knowing those all complete
> doesn't sound right.
>
>> I am glad to take a look the patch if you post it out.
>
> Here's what I got for block + nvme. It relies on all the requests to
> complete to set the task back to running.

Yeah, but your patch doesn't add that change, and it looks like a
'task_struct *' from the submission path needs to be stored in the request or
somewhere else.

There are some issues with the current polling approach:
- in dio, one dio may include lots of bios, but only the last submitted bio
is polled
- one bio can be split into several bios, but submit_bio() only returns
one cookie for polling

It looks like your approach of polling the current task state can fix these
issues.
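
For context, the cookie returned by submit_bio() packs the hardware queue
number and the tag together; the 4.8-era helpers in blk_types.h look
roughly like this (paraphrased sketch):

    #define BLK_QC_T_SHIFT 16

    static inline unsigned int blk_qc_t_to_queue_num(blk_qc_t cookie)
    {
            return cookie >> BLK_QC_T_SHIFT;
    }

    static inline unsigned int blk_qc_t_to_tag(blk_qc_t cookie)
    {
            return cookie & ((1u << BLK_QC_T_SHIFT) - 1);
    }

A single returned cookie can therefore name only one (queue, tag) pair,
which is why the earlier fragments of a split bio cannot be polled for.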

>
> ---
> diff --git a/block/blk-core.c b/block/blk-core.c
> index b993f88..3c1cfbf 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -3342,6 +3342,8 @@ EXPORT_SYMBOL(blk_finish_plug);
>  bool blk_poll(struct request_queue *q, blk_qc_t cookie)
>  {
> struct blk_plug *plug;
> +   struct blk_mq_hw_ctx *hctx;
> +   unsigned int queue_num;
> long state;
>
> if (!q->mq_ops || !q->mq_ops->poll || !blk_qc_t_valid(cookie) ||
> @@ -3353,27 +3355,15 @@ bool blk_poll(struct request_queue *q, blk_qc_t cookie)
> blk_flush_plug_list(plug, false);
>
> state = current->state;
> +   queue_num = blk_qc_t_to_queue_num(cookie);
> +   hctx = q->queue_hw_ctx[queue_num];
> while (!need_resched()) {
> -   unsigned int queue_num = blk_qc_t_to_queue_num(cookie);
> -   struct blk_mq_hw_ctx *hctx = q->queue_hw_ctx[queue_num];
> -   int ret;
> -
> -   hctx->poll_invoked++;
> -
> -   ret = q->mq_ops->poll(hctx, blk_qc_t_to_tag(cookie));
> -   if (ret > 0) {
> -   hctx->poll_success++;
> -   set_current_state(TASK_RUNNING);
> -   return true;
> -   }
> +   q->mq_ops->poll(hctx);
>
> if (signal_pending_state(state, current))
> set_current_state(TASK_RUNNING);
> -
> if (current->state == TASK_RUNNING)
> return true;
> -   if (ret < 0)
> -   break;
> cpu_relax();
> }

Then it looks like the whole hw queue is polled and only the queue number
in the cookie matters.

In theory, there might be one race:

- one dio needs to submit several bios (suppose there are two bios: A and B)
- A is submitted to hardware queue M
- B is submitted to hardware queue N because the current task is migrated
to another CPU
- then only hardware queue N is polled

>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index befac5b..2e359e0 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -649,7 +651,7 @@ static inline bool nvme_cqe_valid(struct nvme_queue *nvmeq, u16 head,
> return (le16_to_cpu(nvmeq->cqes[head].status) & 1) == phase;
>  }
>
> -static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
> +static void nvme_process_cq(struct nvme_queue *nvmeq)
>  {
> u16 head, phase;
>
> @@ -665,9 +667,6 @@ static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
> phase = !phase;
> }
>
> -   if (tag && *tag == cqe.command_id)
> -   *tag = -1;
> -
> if (unlikely(cqe.command_id >= nvmeq->q_depth)) {
> dev_warn(nvmeq->dev->ctrl.device,
> "invalid id %d completed on queue %d\n",
> @@ -711,11 +710,6 @@ static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
> nvmeq->cqe_seen = 1;
>  }
>
> -static void nvme_process_cq(struct nvme_queue *nvmeq)
> -{
> -   __nvme_process_cq(nvmeq, NULL);
> -}
> -
>  static irqreturn_t nvme_irq(int irq, void *data)
>  {
> irqreturn_t result;
> @@ -736,20 +730,15 @@ static irqreturn_t nvme_irq_check(int irq, void *data)
> return IRQ_NONE;
>  }
>
> -static int nvme_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
> +static void nvme_poll(struct blk_mq_hw_ctx *hctx)
>  {
> struct nvme_queue *nvmeq = 

Re: [Nbd] [PATCH][V3] nbd: add multi-connection support

2016-10-06 Thread Wouter Verhelst
On Thu, Oct 06, 2016 at 06:16:30AM -0700, Christoph Hellwig wrote:
> On Thu, Oct 06, 2016 at 03:09:49PM +0200, Wouter Verhelst wrote:
> > Okay, I've updated the proto.md file then, to clarify that in the case
> > of multiple connections, a client MUST NOT send a flush request until it
> > has seen the replies to the write requests that it cares about. That
> > should be enough for now.
> 
> How do you guarantee that nothing has been reordered or even lost even for
> a single connection?

In the case of a single connection, we already stated that the flush
covers the write requests for which a reply has already been sent out by
the time the flush reply is sent out. On a single connection, there is
no way an implementation can comply with the old requirement but not the
new one.

We do not guarantee any ordering beyond that; and lost requests would be
a bug in the server.
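
A worked single-connection example of that rule (an illustrative timeline,
not text from the spec):

    C -> S : WRITE A
    S -> C : reply A          # completed before the flush was issued
    C -> S : WRITE B
    C -> S : FLUSH
    S -> C : reply B          # completed before the flush reply...
    S -> C : reply FLUSH      # ...so the flush must cover both A and B

A write whose reply is sent only after the flush reply carries no such
guarantee.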

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-06 Thread Austin S. Hemmelgarn

On 2016-10-06 08:50, Paolo Valente wrote:
>> Il giorno 06 ott 2016, alle ore 13:57, Austin S. Hemmelgarn
>>  ha scritto:
>>
>> On 2016-10-06 07:03, Mark Brown wrote:
>>> On Thu, Oct 06, 2016 at 10:04:41AM +0200, Linus Walleij wrote:
>>>> On Tue, Oct 4, 2016 at 9:14 PM, Tejun Heo  wrote:
>>>
>>>>> I get that bfq can be a good compromise on most desktop workloads and
>>>>> behave reasonably well for some server workloads with the slice
>>>>> expiration mechanism but it really isn't an IO resource partitioning
>>>>> mechanism.
>>>
>>>> Not just desktops, also Android phones.
>>>
>>>> So why not have BFQ as a separate scheduling policy upstream,
>>>> alongside CFQ, deadline and noop?
>>>
>>> Right.
>>>
>>>> We're already doing the per-usecase Kconfig thing for preemption.
>>>> But maybe somebody already hates that and want to get rid of it,
>>>> I don't know.
>>>
>>> Hannes also suggested going back to making BFQ a separate scheduler
>>> rather than replacing CFQ earlier, pointing out that it mitigates
>>> against the risks of changing CFQ substantially at this point (which
>>> seems to be the biggest issue here).
>>>
>> ISTR that the original argument for this approach essentially amounted to:
>> 'If it's so much better, why do we need both?'.
>>
>> Such an argument is valid only if the new design is better in all respects
>> (which there isn't sufficient information to decide in this case), or the
>> negative aspects are worth the improvements (which is too workload specific
>> to decide for something like this).
>>
> All correct, apart from the workload-specific issue, which is not very clear
> to me. Over the last five years I have not found a single workload for which
> CFQ is better than BFQ, and none has been suggested.
My point is that whether or not BFQ is better depends on the workload.
You can't test for every workload, so you can't say definitively that
BFQ is better for every workload.  At a minimum, there are workloads
where the deadline and noop schedulers are better, but they're very
domain-specific workloads.  Based on the numbers from Shaohua, it looks
like CFQ has better throughput than BFQ, and that will affect some
workloads (for most, the improved fairness is worth the reduced
throughput, but there probably are some cases where it isn't).

> Anyway, leaving aside this fact, IMO the real problem here is that we are in
> a catch-22: "we want BFQ to replace CFQ, but, since CFQ is legacy code, then
> you cannot change, and thus replace, CFQ"
I agree that that's part of the issue, but I also don't entirely agree
with the reasoning on it.  Until blk-mq has proper I/O scheduling,
people will continue to use CFQ, and based on the way things are going,
it will be multiple months before that happens, whereas BFQ exists and
is working now.



Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-06 Thread Paolo Valente

> Il giorno 06 ott 2016, alle ore 09:58, Paolo Valente 
>  ha scritto:
> 
>> 
>> Il giorno 05 ott 2016, alle ore 22:46, Shaohua Li  ha scritto:
>> 
>> On Wed, Oct 05, 2016 at 09:47:19PM +0200, Paolo Valente wrote:
>>> 
 Il giorno 05 ott 2016, alle ore 20:30, Shaohua Li  ha scritto:
 
 On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun Heo wrote:
> Hello, Paolo.
> 
> On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote:
>> In this respect, for your generic, unpredictable scenario to make
>> sense, there must exist at least one real system that meets the
>> requirements of such a scenario.  Or, if such a real system does not
>> yet exist, it must be possible to emulate it.  If it is impossible to
>> achieve this last goal either, then I miss the usefulness
>> of looking for solutions for such a scenario.
>> 
>> That said, let's define the instance(s) of the scenario that you find
>> most representative, and let's test BFQ on it/them.  Numbers will give
>> us the answers.  For example, what about all or part of the following
>> groups:
>> . one cyclically doing random I/O for some seconds and then sequential I/O
>> for the next seconds
>> . one doing, say, quasi-sequential I/O in ON/OFF cycles
>> . one starting an application cyclically
>> . one playing back or streaming a movie
>> 
>> For each group, we could then measure the time needed to complete each
>> phase of I/O in each cycle, plus the responsiveness in the group
>> starting an application, plus the frame drop in the group streaming
>> the movie.  In addition, we can measure the bandwidth/iops enjoyed by
>> each group, plus, of course, the aggregate throughput of the whole
>> system.  In particular we could compare results with throttling, BFQ,
>> and CFQ.
>> 
>> Then we could write resulting numbers on the stone, and stick to them
>> until something proves them wrong.
>> 
>> What do you (or others) think about it?
> 
> That sounds great and yeah it's lame that we didn't start with that.
> Shaohua, would it be difficult to compare how bfq performs against
> blk-throttle?
 
 I had a test of BFQ.
>>> 
>>> Thank you very much for testing BFQ!
>>> 
 I'm using BFQ found at
 http://algogroup.unimore.it/people/paolo/disk_sched/sources.php . version is
 4.7.0-v8r3.
>>> 
>>> That's the latest stable version.  The development version [1] already
>>> contains further improvements for fairness, latency and throughput.
>>> It is however still a release candidate.
>>> 
>>> [1] https://github.com/linusw/linux-bfq/tree/bfq-v8
>>> 
 It's a LSI SSD, queue depth 32. I use default setting. fio script
 is:
 
 [global]
 ioengine=libaio
 direct=1
 readwrite=randread
 bs=4k
 runtime=60
 time_based=1
 file_service_type=random:36
 overwrite=1
 thread=0
 group_reporting=1
 filename=/dev/sdb
 iodepth=1
 numjobs=8
 
 [groupA]
 prio=2
 
 [groupB]
 new_group
 prio=6
 
 I'll change iodepth, numjobs and prio in different tests. Result unit is MB/s.
 
 iodepth=1 numjobs=1 prio 4:4
 CFQ: 28:28 BFQ: 21:21 deadline: 29:29
 
 iodepth=8 numjobs=1 prio 4:4
 CFQ: 162:162 BFQ: 102:98 deadline: 205:205
 
 iodepth=1 numjobs=8 prio 4:4
 CFQ: 157:157 BFQ: 81:92 deadline: 196:197
 
 iodepth=1 numjobs=1 prio 2:6
 CFQ: 26.7:27.6 BFQ: 20:6 deadline: 29:29
 
 iodepth=8 numjobs=1 prio 2:6
 CFQ: 166:174 BFQ: 139:72  deadline: 202:202
 
 iodepth=1 numjobs=8 prio 2:6
 CFQ: 148:150 BFQ: 90:77 deadline: 198:197
 
 CFQ isn't fair at all. BFQ is very good on this side, but has poor throughput
 even when prio is the default value.
 
>>> 
>>> Throughput is lower with BFQ for two reasons.
>>> 
>>> First, you certainly left the low_latency in its default state, i.e.,
>>> on.  As explained, e.g., here [2], low_latency mode is totally geared
>>> towards maximum responsiveness and minimum latency for soft real-time
>>> applications (e.g., video players).  To achieve this goal, BFQ is
>>> willing to perform more idling, when necessary.  This lowers
>>> throughput (I'll get back on this at the end of the discussion of the
>>> second reason).
>> 
>> changing low_latency to 0 seems not to change anything, at least for the test:
>> iodepth=1 numjobs=1 prio 2:6 A bs 4k:64k
>> 
>>> The second, most important reason, is that a minimum of idling is the
>>> *only* way to achieve differentiated bandwidth distribution, as you
>>> requested by setting different ioprios.  I stress that this 

Re: [Nbd] [PATCH][V3] nbd: add multi-connection support

2016-10-06 Thread Christoph Hellwig
On Thu, Oct 06, 2016 at 03:09:49PM +0200, Wouter Verhelst wrote:
> Okay, I've updated the proto.md file then, to clarify that in the case
> of multiple connections, a client MUST NOT send a flush request until it
> has seen the replies to the write requests that it cares about. That
> should be enough for now.

How do you guarantee that nothing has been reordered or even lost even for
a single connection?


Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-06 Thread Paolo Valente

> Il giorno 06 ott 2016, alle ore 13:57, Austin S. Hemmelgarn 
>  ha scritto:
> 
> On 2016-10-06 07:03, Mark Brown wrote:
>> On Thu, Oct 06, 2016 at 10:04:41AM +0200, Linus Walleij wrote:
>>> On Tue, Oct 4, 2016 at 9:14 PM, Tejun Heo  wrote:
>> 
 I get that bfq can be a good compromise on most desktop workloads and
 behave reasonably well for some server workloads with the slice
 expiration mechanism but it really isn't an IO resource partitioning
 mechanism.
>> 
>>> Not just desktops, also Android phones.
>> 
>>> So why not have BFQ as a separate scheduling policy upstream,
>>> alongside CFQ, deadline and noop?
>> 
>> Right.
>> 
>>> We're already doing the per-usecase Kconfig thing for preemption.
>>> But maybe somebody already hates that and want to get rid of it,
>>> I don't know.
>> 
>> Hannes also suggested going back to making BFQ a separate scheduler
>> rather than replacing CFQ earlier, pointing out that it mitigates
>> against the risks of changing CFQ substantially at this point (which
>> seems to be the biggest issue here).
>> 
> ISTR that the original argument for this approach essentially amounted to: 
> 'If it's so much better, why do we need both?'.
> 
> Such an argument is valid only if the new design is better in all respects 
> (which there isn't sufficient information to decide in this case), or the 
> negative aspects are worth the improvements (which is too workload specific 
> to decide for something like this).

All correct, apart from the workload-specific issue, which is not very clear to 
me. Over the last five years I have not found a single workload for which CFQ 
is better than BFQ, and none has been suggested.

Anyway, leaving aside this fact, IMO the real problem here is that we are in a 
catch-22: "we want BFQ to replace CFQ, but, since CFQ is legacy code, then you 
cannot change, and thus replace, CFQ"

Thanks,
Paolo

--
Paolo Valente
Algogroup
Dipartimento di Scienze Fisiche, Informatiche e Matematiche
Via Campi 213/B
41125 Modena - Italy
http://algogroup.unimore.it/people/paolo/







Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-06 Thread Mark Brown
On Thu, Oct 06, 2016 at 10:04:41AM +0200, Linus Walleij wrote:
> On Tue, Oct 4, 2016 at 9:14 PM, Tejun Heo  wrote:

> > I get that bfq can be a good compromise on most desktop workloads and
> > behave reasonably well for some server workloads with the slice
> > expiration mechanism but it really isn't an IO resource partitioning
> > mechanism.

> Not just desktops, also Android phones.

> So why not have BFQ as a separate scheduling policy upstream,
> alongside CFQ, deadline and noop?

Right.

> We're already doing the per-usecase Kconfig thing for preemption.
> But maybe somebody already hates that and want to get rid of it,
> I don't know.

Hannes also suggested going back to making BFQ a separate scheduler
rather than replacing CFQ earlier, pointing out that it mitigates
against the risks of changing CFQ substantially at this point (which
seems to be the biggest issue here).




Re: [Nbd] [PATCH][V3] nbd: add multi-connection support

2016-10-06 Thread Wouter Verhelst
On Thu, Oct 06, 2016 at 10:41:36AM +0100, Alex Bligh wrote:
> Wouter,
[...]
> > Given that, given the issue in the previous
> > paragraph, and given the uncertainty introduced with multiple
> > connections, I think it is reasonable to say that a client should just
> > not assume a flush touches anything except for the writes for which it
> > has already received a reply by the time the flush request is sent out.
> 
> OK. So you are proposing weakening the semantic for flush (saying that
> it is only guaranteed to cover those writes for which the client has
> actually received a reply prior to sending the flush, as opposed to
> prior to receiving the flush reply). This is based on the view that
> the Linux kernel client wouldn't be affected, and if other clients
> were affected, their behaviour would be 'somewhat unusual'.

Right.

> We do have one significant other client out there that uses flush
> which is Qemu. I think we should get a view on whether they would be
> affected.

That's certainly something to consider, yes.

> > Those are semantics that are actually useful and can be guaranteed in
> > the face of multiple connections. Other semantics can not.
> 
> Well there is another semantic which would work just fine, and also
> cures the other problem (synchronisation between channels) which would
> be simply that flush is only guaranteed to affect writes issued on the
> same channel. Then flush would do the natural thing, i.e. flush
> all the writes that had been done *on that channel*.

That is an option, yes, but the natural result will be that you issue N
flush requests, rather than one, which I'm guessing will kill
performance. Therefore, I'd prefer not to go down that route.

[...]
> > It is indeed impossible for a server to know what has been received by
> > the client by the time it (the client) sent out the flush request.
> > However, the server doesn't need that information, at all. The flush
> > request's semantics do not say that any request not covered by the flush
> > request itself MUST NOT have hit disk; instead, it just says that there
> > is no guarantee on whether or not that is the case. That's fine; all a
> > server needs to know is that when it receives a flush, it needs to
> > fsync() or some such, and then send the reply. All a *client* needs to
> > know is which requests have most definitely hit the disk. In my
> > proposal, those are the requests that finished before the flush request
> > was sent, and not the requests that finished between that and when the
> > flush reply is received. Those are *likely* to also be covered
> > (especially on single-connection NBD setups), but in my proposal,
> > they're no longer *guaranteed* to be.
> 
> I think my objection was more that you were writing mandatory language
> for a server's behaviour based on what the client perceives.
> 
> What you are saying, from the client's point of view, is that under
> your proposed change it can only rely on writes for which the reply
> has been received prior to issuing the flush being persisted
> to disk (more might be persisted, but the client can't rely on it).

Exactly.

[...]
> IE I don't actually think the wording from the server side needs changing
> now I see what you are trying to do. We just need a new paragraph saying
> what the client can and cannot rely on.

That's obviously also a valid option. I'm looking forward to your
proposed wording then :-)

[...]
> >> I suppose that's fine in that we can at least shorten the CC: line,
> >> but I still think it would be helpful if the protocol
> > 
> > unfinished sentence here...
> 
>  but I still think it would be helpful if the protocol helped out
> the end user of the client and refused to negotiate multichannel
> connections when they are unsafe. How is the end client meant to know
> whether the back end is not on Linux, not on a block device, done
> via a Ceph driver etc?

Well, it isn't. The server, if it provides certain functionality, should
also provide particular guarantees. If it can't provide those
guarantees, it should not provide that functionality.

e.g., if a server runs on a backend with cache coherency issues, it
should not allow multiple connections to the same device, etc.

> I still think it's pretty damn awkward that with a ceph back end
> (for instance) which would be one of the backends to benefit the
> most from multichannel connections (as it's inherently parallel),
> no one has explained how flush could be done safely.

If ceph doesn't have any way to guarantee that a write is available to
all readers of a particular device, then it *cannot* be used to map
block device semantics with multiple channels. Therefore, it should not
allow writing to the device from multiple clients, period, unless the
filesystem (or other thing) making use of the nbd device above the ceph
layer actually understands how things may go wrong and can take care of
it.

As such, I don't think that the problems inherent in using 

Re: [PATCH 2/3] zram: support page-based parallel write

2016-10-06 Thread Sergey Senozhatsky
Hello Minchan,

On (10/05/16 11:01), Minchan Kim wrote:
[..]
> 1. just changed ordering of test execution - hope to reduce testing time due to
>    block population before the first reading or reading just zero pages
> 2. used sync_on_close instead of direct io
> 3. Don't use perf to avoid noise
> 4. echo 0 > /sys/block/zram0/use_aio to test synchronous IO for old behavior

ok, will use it in the tests below.
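
For readers following along, the setup implied by those knobs is roughly the
following (a sketch; use_aio is the attribute added by the patch under
discussion, not a mainline knob):

    modprobe zram num_devices=1
    echo lzo > /sys/block/zram0/comp_algorithm   # or deflate; set before disksize
    echo 3G > /sys/block/zram0/disksize
    echo 0 > /sys/block/zram0/use_aio            # synchronous IO, the old behavior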

> 1. ZRAM_SIZE=3G ZRAM_COMP_ALG=lzo LOG_SUFFIX=async FIO_LOOPS=2 MAX_ITER=1 ./zram-fio-test.sh
> 2. modify script to disable aio via /sys/block/zram0/use_aio
>    ZRAM_SIZE=3G ZRAM_COMP_ALG=lzo LOG_SUFFIX=sync FIO_LOOPS=2 MAX_ITER=1 ./zram-fio-test.sh
>
>   seq-write 380930 474325 124.52%
>  rand-write 286183 357469 124.91%
>seq-read 266813 265731  99.59%
>   rand-read 211747 210670  99.49%
>mixed-seq(R) 145750 171232 117.48%
>mixed-seq(W) 145736 171215 117.48%
>   mixed-rand(R) 115355 125239 108.57%
>   mixed-rand(W) 115371 125256 108.57%

no_aio   use_aio

WRITE:  1432.9MB/s   1511.5MB/s
WRITE:  1173.9MB/s   1186.9MB/s
READ:   912699KB/s   912170KB/s
WRITE:  912497KB/s   911968KB/s
READ:   725658KB/s   726747KB/s
READ:   579003KB/s   594543KB/s
READ:   373276KB/s   373719KB/s
WRITE:  373572KB/s   374016KB/s

seconds elapsed 45.399702511 44.280199716

> LZO compression is fast, so with one CPU queueing while 3 CPUs compress,
> it cannot saturate the full CPU bandwidth. Nonetheless, it shows a 24% enhancement.
> It could be more on a slow CPU, like embedded.
> 
> I tested it with deflate. The result is 300% enhancement.
> 
>   seq-write  33598 109882 327.05%
>  rand-write  32815 102293 311.73%
>seq-read 154323 153765  99.64%
>   rand-read 129978 129241  99.43%
>mixed-seq(R)  15887  44995 283.22%
>mixed-seq(W)  15885  44990 283.22%
>   mixed-rand(R)  25074  55491 221.31%
>   mixed-rand(W)  25078  55499 221.31%
>
> So, I'm curious about your test.
> Is my test in sync with yours? If you cannot see an enhancement in job1, could
> you test with deflate? It seems your CPU is really fast.

interesting observation.

no_aio   use_aio
WRITE:  47882KB/s158931KB/s
WRITE:  47714KB/s156484KB/s
READ:   42914KB/s137997KB/s
WRITE:  42904KB/s137967KB/s
READ:   333764KB/s   332828KB/s
READ:   293883KB/s   294709KB/s
READ:   51243KB/s129701KB/s
WRITE:  51284KB/s129804KB/s

seconds elapsed 480.869169882    181.678431855

yes, it looks like with lzo the CPU manages to process bdi writeback fast enough
to keep the fio-template-static-buffer workers active.

to prove this theory: direct=1 cures zram-deflate.

no_aio   use_aio
WRITE:  41873KB/s34257KB/s
WRITE:  41455KB/s34087KB/s
READ:   36705KB/s28960KB/s
WRITE:  36697KB/s28954KB/s
READ:   327902KB/s   327270KB/s
READ:   316217KB/s   316886KB/s
READ:   35980KB/s28131KB/s
WRITE:  36008KB/s28153KB/s

seconds elapsed 515.575252170    629.114626795



As soon as the wb flush kworker can't keep up anymore, things go off
the rails. Most of the time the fio-template-static-buffer workers are in D state,
while the biggest bdi flush kworker is doing the job (a lot of work):

  PID USER  PR  NIVIRTRES  %CPU %MEM TIME+ S COMMAND
 6274 root  20   00.0m   0.0m 100.0  0.0   1:15.60 R [kworker/u8:1]
11169 root  20   0  718.1m   1.6m  16.6  0.0   0:01.88 D fio 
././conf/fio-template-static-buffer
11171 root  20   0  718.1m   1.6m   3.3  0.0   0:01.15 D fio 
././conf/fio-template-static-buffer
11170 root  20   0  718.1m   3.3m   2.6  0.1   0:00.98 D fio 
././conf/fio-template-static-buffer


and still working...

 6274 root  20   00.0m   0.0m 100.0  0.0   3:05.49 R [kworker/u8:1]
12048 root  20   0  718.1m   1.6m  16.7  0.0   0:01.80 R fio 
././conf/fio-template-static-buffer
12047 root  20   0  718.1m   1.6m   3.3  0.0   0:01.12 D fio 
././conf/fio-template-static-buffer
12049 root  20   0  718.1m   1.6m   3.3  0.0   0:01.12 D fio 
././conf/fio-template-static-buffer
12050 root  20   0  718.1m   1.6m   2.0  0.0   0:00.98 D fio 
././conf/fio-template-static-buffer

and working...


[ 4159.338731] CPU: 0 PID: 105 Comm: kworker/u8:4
[ 4159.338734] Workqueue: writeback wb_workfn (flush-254:0)
[ 4159.338746]  [] zram_make_request+0x4a3/0x67b [zram]
[ 4159.338748]  [] ? try_to_wake_up+0x201/0x213
[ 4159.338750]  [] ? mempool_alloc+0x5e/0x124
[ 4159.338752]  [] generic_make_request+0xb8/0x156
[ 

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-06 Thread Linus Walleij
On Tue, Oct 4, 2016 at 9:14 PM, Tejun Heo  wrote:

> I get that bfq can be a good compromise on most desktop workloads and
> behave reasonably well for some server workloads with the slice
> expiration mechanism but it really isn't an IO resource partitioning
> mechanism.

Not just desktops, also Android phones.

So why not have BFQ as a separate scheduling policy upstream,
alongside CFQ, deadline and noop?

I understand the CPU scheduler people's position that they want
one scheduler for everyone's everyday loads (except RT and
SCHED_DEADLINE) and I guess that is the source of the highlander
"there can be only one" argument, but note this:

kernel/Kconfig.preempt:

config PREEMPT_NONE
bool "No Forced Preemption (Server)"
config PREEMPT_VOLUNTARY
bool "Voluntary Kernel Preemption (Desktop)"
config PREEMPT
bool "Preemptible Kernel (Low-Latency Desktop)"

We're already doing the per-usecase Kconfig thing for preemption.
But maybe somebody already hates that and want to get rid of it,
I don't know.

Yours,
Linus Walleij


Re: blockdev kernel regression (bugzilla 173031)

2016-10-06 Thread Francesco Dolcini
On Thu, Oct 06, 2016 at 04:42:52PM +1100, NeilBrown wrote:
cc
> Maybe there is a race, but that seems unlikely.

Consider that just hot removal while writing is not enough to
reproduce the bug systematically.

while true; do [ ! -f /media/usb/.not_mounted ] \
&& dd if=/dev/zero of=/media/usb/aaa bs=1k \
count=1 2>/dev/null && echo -n '*' ; done

with lazy umount by mdev on USB flash drive removal

reproduces the problem almost every time

> The vfat issue is different, and is only a warning.
Why do you say it is only a warning? Here is the Oops with vfat on ARM/i.MX6:

[  103.493761] Unable to handle kernel paging request at virtual address 
50886000
[  103.500996] pgd = cecec000
[  103.503709] [50886000] *pgd=
[  103.507310] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[  103.512626] Modules linked in:
[  103.515707] CPU: 3 PID: 2071 Comm: umount Tainted: GW   
4.1.33-01808-gab8d223 #4
[  103.524150] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  103.530684] task: ce67cc00 ti: cecd8000 task.ti: cecd8000
[  103.536096] PC is at __percpu_counter_add+0x2c/0x104
[  103.541068] LR is at __percpu_counter_add+0x24/0x104
[  103.546044] pc : [<801dcca0>]lr : [<801dcc98>]psr: 200c0093
[  103.546044] sp : cecd9e08  ip :   fp : 
[  103.557525] r10: d1970ba0  r9 : 0001  r8 : 
[  103.562755] r7 :   r6 :   r5 : 0018  r4 : ce411150
[  103.569288] r3 :   r2 : 50886000  r1 : 805eb7d0  r0 : 0003
[  103.575821] Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment 
user
[  103.583049] Control: 10c53c7d  Table: 5ecec04a  DAC: 0015
[  103.588800] Process umount (pid: 2071, stack limit = 0xcecd8210)
[  103.594812] Stack: (0xcecd9e08 to 0xcecda000)
[  103.599180] 9e00:   200c0013 cc024b5c  cc024b5c 
 0001
[  103.607365] 9e20: d1970ba0 8009599c 0018 d1970ba0 00040831 cecd9f00 
 80095bc4
[  103.615550] 9e40: 000e ce06b900 d0f22640 0004 0001  
0002 cecd9e78
[  103.623736] 9e60: 800954d0 cc024b5c e000 03c6 0002  
d1970ba0 d1960620
[  103.631922] 9e80: cecd8000 800b5b90 e000 cc36bee0 a00c0013 0001 
e000 0001
[  103.640107] 9ea0: cecd9eb4 80043108 cecd9f00 cecd9efc 000ac998 cc024b5c 
cecd9f00 
[  103.648293] 9ec0:  8000ebc4 cecd8000  000ac998 80095cf8 
cecd9ed8 cecd9ed8
[  103.656478] 9ee0: cecd9ee0 cecd9ee0 cecd9ee8 cecd9ee8 cc024b5c cc024b5c 
 8008e7cc
[  103.664662] 9f00: 7fff     7fff 
 
[  103.672847] 9f20: ce73b800 804aaf00 0034 8008e868  7fff 
 cc024a90
[  103.681033] 9f40: ce73b800 800e0090 ce73b800 ce73b864 804aaf00 800bcec8 
cc024a00 0083
[  103.689218] 9f60: 806c95a0 800bd184 ce73b800 806aac0c 806c95a0 800bd448 
cebf39c0 
[  103.697404] 9f80: 806c95a0 800d44c4 ce67cc00 8003c6c8 8000ebc4 cecd8000 
cecd9fb0 800116bc
[  103.705589] 9fa0: 011fd408 011fd428 011fd408 8000ea8c  0002 
 
[  103.713774] 9fc0: 011fd408 011fd428 011fd408 0034 0002 011fd438 
011fd408 000ac998
[  103.721959] 9fe0: 76e0a441 7ef6abac 00050fe0 76e0a446 800c0030 011fd428 
 
[  103.730163] [<801dcca0>] (__percpu_counter_add) from [<8009599c>] 
(clear_page_dirty_for_io+0xac/0xd8)
[  103.739401] [<8009599c>] (clear_page_dirty_for_io) from [<80095bc4>] 
(write_cache_pages+0x1fc/0x2f4)
[  103.748550] [<80095bc4>] (write_cache_pages) from [<80095cf8>] 
(generic_writepages+0x3c/0x60)
[  103.757090] [<80095cf8>] (generic_writepages) from [<8008e7cc>] 
(__filemap_fdatawrite_range+0x64/0x6c)
[  103.766412] [<8008e7cc>] (__filemap_fdatawrite_range) from [<8008e868>] 
(filemap_flush+0x24/0x2c)
[  103.775306] [<8008e868>] (filemap_flush) from [<800e0090>] 
(sync_filesystem+0x60/0xa8)
[  103.783240] [<800e0090>] (sync_filesystem) from [<800bcec8>] 
(generic_shutdown_super+0x28/0xd4)
[  103.791953] [<800bcec8>] (generic_shutdown_super) from [<800bd184>] 
(kill_block_super+0x18/0x64)
[  103.800750] [<800bd184>] (kill_block_super) from [<800bd448>] 
(deactivate_locked_super+0x4c/0x7c)
[  103.809638] [<800bd448>] (deactivate_locked_super) from [<800d44c4>] 
(cleanup_mnt+0x4c/0x6c)
[  103.818097] [<800d44c4>] (cleanup_mnt) from [<8003c6c8>] 
(task_work_run+0xb4/0xc8)
[  103.825688] [<8003c6c8>] (task_work_run) from [<800116bc>] 
(do_work_pending+0x90/0xa4)
[  103.833623] [<800116bc>] (do_work_pending) from [<8000ea8c>] 
(work_pending+0xc/0x20)
[  103.841378] Code: e59f00d8 ebfff186 e5943018 ee1d2f90 (e7933002) 
[  103.847477] ---[ end trace 5b641bdc50ddcfe7 ]---
[  103.852101] Kernel panic - not syncing: Fatal exception
[  103.857337] CPU1: stopping
[  103.860059] CPU: 1 PID: 277 Comm: sh Tainted: G  D W   
4.1.33-01808-gab8d223 #4
[  103.868068] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  103.874623] [<80014bc4>] (unwind_backtrace) from [<80011a60>] 

Re: [PATCH v2 1/2] ata: Enabling ATA Command Priorities

2016-10-06 Thread Hannes Reinecke

On 10/05/2016 09:00 PM, Adam Manzanares wrote:

This patch checks to see if an ATA device supports NCQ command priorities.
If so and the user has specified an iocontext that indicates IO_PRIO_CLASS_RT
and also enables request priorities in the block queue then we build a tf
with a high priority command.

This patch depends on patch block-Add-iocontext-priority-to-request

Signed-off-by: Adam Manzanares 
---
 drivers/ata/libata-core.c | 35 ++-
 drivers/ata/libata-scsi.c | 10 +-
 drivers/ata/libata.h  |  2 +-
 include/linux/ata.h   |  6 ++
 include/linux/libata.h| 18 ++
 5 files changed, 68 insertions(+), 3 deletions(-)
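
The core of the check the description implies presumably looks something like
this (a sketch using mainline ioprio helpers; the actual libata plumbing in
the patch differs):

    #include <linux/blkdev.h>
    #include <linux/ioprio.h>

    /* Only requests from an RT io-context qualify for a high-priority
     * NCQ command; the taskfile then gets the high-priority bit set. */
    static bool ata_request_wants_high_prio(struct request *rq)
    {
            return IOPRIO_PRIO_CLASS(req_get_ioprio(rq)) == IOPRIO_CLASS_RT;
    }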


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)