Re: [PATCH] badblocks: fix overlapping check for clearing
On Tue, Sep 06 2016, Tomasz Majchrzak wrote:

> Current bad block clear implementation assumes the range to clear
> overlaps with at least one bad block already stored. If given range to
> clear precedes first bad block in a list, the first entry is incorrectly
> updated.

In the original md context, it would only ever be called on a block that
was already in the list.  But you are right that it is best not to assume
this, and to code more safely.

> Check not only if stored block end is past clear block end but also if
> stored block start is before clear block end.
>
> Signed-off-by: Tomasz Majchrzak

Dan Williams seems to have taken responsibility for this code through his
nvdimm tree, so I've added him to 'cc' in the hope that he looks at this
(I wonder if he is even on linux-block).

> ---
>  block/badblocks.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/block/badblocks.c b/block/badblocks.c
> index 7be53cb..b2ffcc7 100644
> --- a/block/badblocks.c
> +++ b/block/badblocks.c
> @@ -354,7 +354,8 @@ int badblocks_clear(struct badblocks *bb, sector_t s, int sectors)
>  		 * current range.  Earlier ranges could also overlap,
>  		 * but only this one can overlap the end of the range.
>  		 */
> -		if (BB_OFFSET(p[lo]) + BB_LEN(p[lo]) > target) {
> +		if ((BB_OFFSET(p[lo]) + BB_LEN(p[lo]) > target) &&
> +		    (BB_OFFSET(p[lo]) <= target)) {

hmmm.. 'target' is the sector just beyond the set of sectors to remove
from the list.  BB_OFFSET(p[lo]) is the first sector in a range that was
found in the list.  If these are equal, then we aren't clearing anything
in this range.  So I would have '<', not '<='.
I don't think this makes the code wrong, as we end up assigning to p[lo]
the value that is already there.  But it might be confusing.

>  			/* Partial overlap, leave the tail of this range */
>  			int ack = BB_ACK(p[lo]);
>  			sector_t a = BB_OFFSET(p[lo]);
> @@ -377,7 +378,8 @@ int badblocks_clear(struct badblocks *bb, sector_t s, int sectors)
>  			lo--;
>  		}
>  		while (lo >= 0 &&
> -		       BB_OFFSET(p[lo]) + BB_LEN(p[lo]) > s) {
> +		       (BB_OFFSET(p[lo]) + BB_LEN(p[lo]) > s) &&
> +		       (BB_OFFSET(p[lo]) <= target)) {

Ditto.  But the code is, I think, correct.  Just not how I would have
written it.  So

Acked-by: NeilBrown

Thanks,
NeilBrown

>  			/* This range does overlap */
>  			if (BB_OFFSET(p[lo]) < s) {
>  				/* Keep the early parts of this range. */
> --
> 1.8.3.1
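The '<' versus '<=' point in the review can be illustrated with a small userspace model. The BB_MAKE/BB_OFFSET/BB_LEN macros below are a sketch mirroring the kernel's bad-block entry encoding (length-1 in the low 9 bits, start sector above that, ack flag in the top bit); overlaps_end() is a standalone restatement of the predicate from the first hunk, written with '<' as suggested:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Userspace model of the bad-block entry encoding (a sketch of the
 * kernel's badblocks macros: bit 63 = "acknowledged" flag,
 * bits 9..62 = start sector, bits 0..8 = length - 1). */
#define BB_MAKE(a, l, ack) (((u64)(a) << 9) | ((u64)(l) - 1) | ((u64)(!!(ack)) << 63))
#define BB_OFFSET(x)       (((x) << 1) >> 10)   /* drop ack bit, shift out length */
#define BB_LEN(x)          (((x) & 0x1ff) + 1)

/* The predicate from the first hunk, with '<' as suggested in the
 * review: does entry e overlap the end of the clear range, where
 * 'target' is the first sector just beyond the sectors being cleared? */
static int overlaps_end(u64 e, u64 target)
{
	return (BB_OFFSET(e) + BB_LEN(e) > target) && /* extends past the range end */
	       (BB_OFFSET(e) < target);               /* starts before the range end */
}
```

With '<', an entry starting exactly at 'target' is correctly treated as not overlapping (nothing in it is being cleared), which is the confusing case '<=' lets through.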
loop mount: kernel BUG at lib/percpu-refcount.c:231
Hi,

The bug below happened while loop-mounting a file image after stopping a
KVM guest, but it has only happened once so far.

[ 4761.031686] [ cut here ]
[ 4761.075984] kernel BUG at lib/percpu-refcount.c:231!
[ 4761.120184] invalid opcode: [#1] SMP
[ 4761.164307] Modules linked in: loop(+) macvtap macvlan tun ccm rfcomm fuse snd_hda_codec_hdmi cmac bnep vfat fat kvm_intel kvm irqbypass arc4 i915 rtsx_pci_sdmmc intel_gtt drm_kms_helper iwlmvm syscopyarea sysfillrect sysimgblt fb_sys_fops mac80211 drm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec btusb snd_hwdep iwlwifi snd_hda_core input_leds btrtl snd_seq pcspkr serio_raw btbcm snd_seq_device i2c_i801 btintel cfg80211 bluetooth snd_pcm i2c_smbus rtsx_pci mfd_core e1000e ptp pps_core snd_timer thinkpad_acpi wmi snd soundcore rfkill video nfsd auth_rpcgss nfs_acl lockd grace sunrpc
[ 4761.323045] CPU: 1 PID: 25890 Comm: modprobe Not tainted 4.8.0+ #168
[ 4761.377791] Hardware name: LENOVO 20ARS1BJ02/20ARS1BJ02, BIOS GJET86WW (2.36 ) 12/04/2015
[ 4761.433704] task: 986fd1b7d780 task.stack: a85842528000
[ 4761.490120] RIP: 0010:[] [] __percpu_ref_switch_to_percpu+0xf8/0x100
[ 4761.548138] RSP: 0018:a8584252bb38 EFLAGS: 00010246
[ 4761.604673] RAX: RBX: 986fbdca3200 RCX:
[ 4761.662416] RDX: 00983288 RSI: 0001 RDI: 986fbdca3958
[ 4761.720473] RBP: a8584252bb80 R08: 0008 R09: 0008
[ 4761.779270] R10: R11: R12:
[ 4761.837603] R13: 9870fa22c800 R14: 9870fa22c80c R15: 986fbdca3200
[ 4761.895870] FS: 7fc286eb4640() GS:98711f24() knlGS:
[ 4761.954596] CS: 0010 DS: ES: CR0: 80050033
[ 4762.012978] CR2: 555c3a20ee78 CR3: 000212988000 CR4: 001406e0
[ 4762.072454] Stack:
[ 4762.131283]  9870f2f37800 9870c8e46000 9870fa22c880 a8584252bbb8
[ 4762.190776]  ae2a147c ba169577 986fbdca3200 9870fa22c870
[ 4762.251149]  9870fa22c800 a8584252bb90 ae2b3294 a8584252bbc8
[ 4762.311657] Call Trace:
[ 4762.371157]  [] ? kobject_uevent_env+0xfc/0x3b0
[ 4762.431483]  [] percpu_ref_switch_to_percpu+0x14/0x20
[ 4762.492093]  [] blk_register_queue+0xbe/0x120
[ 4762.552727]  [] device_add_disk+0x1c4/0x470
[ 4762.614155]  [] loop_add+0x1d9/0x260 [loop]
[ 4762.674042]  [] loop_init+0x119/0x16c [loop]
[ 4762.733949]  [] ? 0xc02ff000
[ 4762.793563]  [] do_one_initcall+0x4b/0x180
[ 4762.853068]  [] ? free_vmap_area_noflush+0x43/0xb0
[ 4762.913665]  [] do_init_module+0x55/0x1c4
[ 4762.973400]  [] load_module+0x1fc4/0x23e0
[ 4763.033545]  [] ? __symbol_put+0x60/0x60
[ 4763.094281]  [] SYSC_init_module+0x138/0x150
[ 4763.154985]  [] SyS_init_module+0x9/0x10
[ 4763.215577]  [] entry_SYSCALL_64_fastpath+0x1e/0xad
[ 4763.277044] Code: 00 48 c7 c7 20 c7 a8 ae 48 63 d2 e8 63 ef ff ff 3b 05 81 a9 7d 00 89 c2 7c cd 48 8b 43 08 48 83 e0 fe 48 89 43 08 e9 3c ff ff ff <0f> 0b e8 81 b6 d9 ff 90 55 48 89 e5 41 54 4c 8d 67 d8 53 48 89
[ 4763.342964] RIP  [] __percpu_ref_switch_to_percpu+0xf8/0x100
[ 4763.407151]  RSP

Thanks,
Dave
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: blockdev kernel regression (bugzilla 173031)
On Thu, Oct 06 2016, Francesco Dolcini wrote:

> On Thu, Oct 06, 2016 at 04:42:52PM +1100, NeilBrown wrote:
>> Maybe there is a race, but that seems unlikely.
>
> Consider that just hot removal while writing is not enough to
> reproduce systematically the bug.
>
>   while true; do [ ! -f /media/usb/.not_mounted ] \
>     && dd if=/dev/zero of=/media/usb/aaa bs=1k \
>        count=1 2>/dev/null && echo -n '*' ; done
>
> with lazy umount by mdev on USB flash drive removal
> reproduce the problem pretty always
>
>> The vfat issue is different, and is only a warning.
>
> Why do you say it is only a warning? Here the Oops with vfat on ARM/i.MX6:

I looked at "x86 Oops information (with vfat)" in the bugzilla and didn't
realize there was another one further down.  That first vfat one is just
a warning.

>>> Regression is on commit 6cd18e7 ("block: destroy bdi before blockdev is
>>> unregistered.")
>>>
>>> Commit: bdfe0cbd746a ("Revert "ext4: remove block_device_ejected") is
>>> already present on 4.1 stable I am currently working on (2a6f417 on 4.1
>>> branch)
>>>
>>> I wonder if commit b02176f ("block: don't release bdi while request_queue
>>> has live references") is the correct fix for this also in kernel 4.1.
>>
>> Maybe.  It is worth a try.
>>
>> Below is a backport to 4.1.33.  It compiles, but I haven't tested.
>> If it works for you, I can recommend it for -stable.
>
> I confirm that it works!

Thanks.
NeilBrown
Re: [PATCH V3 00/11] block-throttle: add .high limit
> On 6 Oct 2016, at 21:57, Shaohua Li wrote:
>
> On Thu, Oct 06, 2016 at 09:58:44AM +0200, Paolo Valente wrote:
>>
>>> On 5 Oct 2016, at 22:46, Shaohua Li wrote:
>>>
>>> On Wed, Oct 05, 2016 at 09:47:19PM +0200, Paolo Valente wrote:
>>>>> On 5 Oct 2016, at 20:30, Shaohua Li wrote:
>>>>>
>>>>> On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun Heo wrote:
>>>>>> Hello, Paolo.
>>>>>>
>>>>>> On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote:
>>>>>>> In this respect, for your generic, unpredictable scenario to make
>>>>>>> sense, there must exist at least one real system that meets the
>>>>>>> requirements of such a scenario.  Or, if such a real system does not
>>>>>>> yet exist, it must be possible to emulate it.  If it is impossible to
>>>>>>> achieve this last goal either, then I miss the usefulness
>>>>>>> of looking for solutions for such a scenario.
>>>>>>>
>>>>>>> That said, let's define the instance(s) of the scenario that you find
>>>>>>> most representative, and let's test BFQ on it/them.  Numbers will give
>>>>>>> us the answers.  For example, what about all or part of the following
>>>>>>> groups:
>>>>>>> . one cyclically doing random I/O for some second and then sequential I/O
>>>>>>>   for the next seconds
>>>>>>> . one doing, say, quasi-sequential I/O in ON/OFF cycles
>>>>>>> . one starting an application cyclically
>>>>>>> . one playing back or streaming a movie
>>>>>>>
>>>>>>> For each group, we could then measure the time needed to complete each
>>>>>>> phase of I/O in each cycle, plus the responsiveness in the group
>>>>>>> starting an application, plus the frame drop in the group streaming
>>>>>>> the movie.  In addition, we can measure the bandwidth/iops enjoyed by
>>>>>>> each group, plus, of course, the aggregate throughput of the whole
>>>>>>> system.  In particular we could compare results with throttling, BFQ,
>>>>>>> and CFQ.
>>>>>>>
>>>>>>> Then we could write resulting numbers on the stone, and stick to them
>>>>>>> until something proves them wrong.
>>>>>>>
>>>>>>> What do you (or others) think about it?
>>>>>>
>>>>>> That sounds great and yeah it's lame that we didn't start with that.
>>>>>> Shaohua, would it be difficult to compare how bfq performs against
>>>>>> blk-throttle?
>>>>>
>>>>> I had a test of BFQ.
>>>>
>>>> Thank you very much for testing BFQ!
>>>>
>>>>> I'm using BFQ found at
>>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__algogroup.unimore.it_people_paolo_disk-5Fsched_sources.php=DQIFAg=5VD0RTtNlTh3ycd41b3MUw=i6WobKxbeG3slzHSIOxTVtYIJw7qjCE6S0spDTKL-J4=2pG8KEx5tRymExa_K0ddKH_YvhH3qvJxELBd1_lw0-w=FZKEAOu2sw95y9jZio2k012cQWoLzlBWDl0NiGPVW78=
>>>>> . version is 4.7.0-v8r3.
>>>>
>>>> That's the latest stable version.  The development version [1] already
>>>> contains further improvements for fairness, latency and throughput.
>>>> It is however still a release candidate.
>>>>
>>>> [1] https://github.com/linusw/linux-bfq/tree/bfq-v8
>>>>
>>>>> It's a LSI SSD, queue depth 32.  I use default setting.  fio script
>>>>> is:
>>>>>
>>>>> [global]
>>>>> ioengine=libaio
>>>>> direct=1
>>>>> readwrite=randread
>>>>> bs=4k
>>>>> runtime=60
>>>>> time_based=1
>>>>> file_service_type=random:36
>>>>> overwrite=1
>>>>> thread=0
>>>>> group_reporting=1
>>>>> filename=/dev/sdb
>>>>> iodepth=1
>>>>> numjobs=8
>>>>>
>>>>> [groupA]
>>>>> prio=2
>>>>>
>>>>> [groupB]
>>>>> new_group
>>>>> prio=6
>>>>>
>>>>> I'll change iodepth, numjobs and prio in different tests.  result unit
>>>>> is MB/s.
>>>>>
>>>>> iodepth=1 numjobs=1 prio 4:4
>>>>> CFQ: 28:28 BFQ: 21:21 deadline: 29:29
>>>>>
>>>>> iodepth=8 numjobs=1 prio 4:4
>>>>> CFQ: 162:162 BFQ: 102:98 deadline: 205:205
>>>>>
>>>>> iodepth=1 numjobs=8 prio 4:4
>>>>> CFQ: 157:157 BFQ: 81:92 deadline: 196:197
>>>>>
>>>>> iodepth=1 numjobs=1 prio 2:6
>>>>> CFQ: 26.7:27.6 BFQ: 20:6 deadline: 29:29
>>>>>
>>>>> iodepth=8 numjobs=1 prio 2:6
>>>>> CFQ: 166:174 BFQ: 139:72 deadline: 202:202
>>>>>
>>>>> iodepth=1 numjobs=8 prio 2:6
>>>>> CFQ: 148:150 BFQ: 90:77 deadline: 198:197
>>>>>
>>>>> CFQ isn't fair at all.  BFQ is very good in this side, but has poor
>>>>> throughput even prio is the default value.
>>>>
>>>> Throughput is lower with BFQ for two reasons.
>>>>
>>>> First, you certainly left the low_latency in its default state, i.e.,
>>>> on.  As explained, e.g., here [2], low_latency mode is totally geared
>>>> towards maximum responsiveness and minimum latency for soft real-time
>>>> applications (e.g., video players).  To achieve this goal, BFQ is
>>>> willing to perform more idling, when necessary.  This lowers
>>>> throughput (I'll get back on this at the end of the discussion of the
>>>> second reason).
>>>
>>> changing low_latency to 0 seems not change anything, at least for the test:
>>> iodepth=1 numjobs=1 prio 2:6 A bs 4k:64k
>>>
>>>> The second, most
Re: [PATCH V3 00/11] block-throttle: add .high limit
> On 6 Oct 2016, at 20:32, Vivek Goyal wrote:
>
> On Thu, Oct 06, 2016 at 08:01:42PM +0200, Paolo Valente wrote:
>>
>>> On 6 Oct 2016, at 19:49, Vivek Goyal wrote:
>>>
>>> On Thu, Oct 06, 2016 at 03:15:50PM +0200, Paolo Valente wrote:
>>>
>>> [..]
>>>> Shaohua, I have just realized that I have unconsciously defended a
>>>> wrong argument.  Although all the facts that I have reported are
>>>> evidently true, I have argued as if the question was: "do we need to
>>>> throw away throttling because there is proportional share, or do we
>>>> need to throw away proportional share because there is throttling?".
>>>> This question is simply wrong, as I now see consciously (sorry for my
>>>> dissociated behavior :) ).
>>>
>>> I was wondering about the same.  We need both, and both should be able
>>> to work with fast devices of today using blk-mq interfaces without
>>> much overhead.
>>>
>>>> The best goal to achieve is to have both a good throttling mechanism
>>>> and a good proportional-share scheduler.  This goal would be valid
>>>> even if there were just one important scenario for each of the two
>>>> approaches.  The vulnus here is that you guys are constantly, and
>>>> rightly, working on solutions to achieve and consolidate reasonable
>>>> QoS guarantees, but an apparently very good proportional-share
>>>> scheduler has been kept off for years.  If you (or others) have good
>>>> arguments to support this state of affairs, then this would probably
>>>> be an important point to discuss.
>>>
>>> Paolo, CFQ is legacy now and if we can come up with a proportional
>>> IO mechanism which works reasonably well with fast devices using
>>> blk-mq interfaces, that will be much more interesting.
>>
>> That's absolutely true.  But why do we pretend not to know that, for
>> (at least) hundreds of thousands of users, Linux will go on giving bad
>> responsiveness, starvation, high latency and unfairness until legacy
>> blk is not used any more (assuming that these problems will somehow
>> disappear with blk-mq)?  Many of these users are fully aware of these
>> Linux long-standing problems.  We could solve these problems by just
>> adding a scheduler that has already been adopted, and thus extensively
>> tested, by thousands of users.  And more and more people are aware of
>> this fact too.  Are we doing the right thing?
>
> Hi Paolo,

Hi

> People have been using CFQ for many years.

Yes, but allow me just to add that a lot of people have also been
unhappy with CFQ for many years.

> I am not sure if benefits
> offered by BFQ over CFQ are significant enough to justify taking a
> completely new code and get rid of CFQ.  Or are the benefits significant
> enough that one feels like putting time and effort into this and
> take chances with new code.

Although I think that BFQ's benefits are relevant (but I'm a little bit
an interested party :) ), I do agree that abruptly replacing the most
used I/O scheduler (AFAIK) with a so different one is at least a little
risky.

> At this point of time replacing CFQ with something better is not a
> priority for me.

ok

> But if something better and stable goes upstream, I
> will gladly use it.

Then, in case of success, I will be glad to receive some feedback from
you, and possibly use it to improve the set of ideas that we have put
into BFQ.

Thank you,
Paolo

> Vivek

--
Paolo Valente
Algogroup
Dipartimento di Scienze Fisiche, Informatiche e Matematiche
Via Campi 213/B
41125 Modena - Italy
http://algogroup.unimore.it/people/paolo/
Re: [PATCH] blk-mq: Return invalid cookie if bio was split
Hi Keith,

On Thu, Oct 6, 2016 at 12:51 AM, Keith Busch wrote:
> On Wed, Oct 05, 2016 at 11:19:39AM +0800, Ming Lei wrote:
>> But .poll() needs to check if the specific request is completed or not,
>> then blk_poll() can set 'current' as RUNNING if it is completed.
>>
>> blk_poll():
>> 	...
>> 	ret = q->mq_ops->poll(hctx, blk_qc_t_to_tag(cookie));
>> 	if (ret > 0) {
>> 		hctx->poll_success++;
>> 		set_current_state(TASK_RUNNING);
>> 		return true;
>> 	}
>
> Right, but the task could be waiting on a whole lot more than just that
> one tag, so setting the task to running before knowing those all complete
> doesn't sound right.
>
>> I am glad to take a look at the patch if you post it out.
>
> Here's what I got for block + nvme. It relies on all the requests to
> complete to set the task back to running.

Yeah, but your patch doesn't add that change, and it looks like the
'task_struct *' in the submission path needs to be stored in the request
or somewhere else.

There are some issues with the current polling approach:

- in dio, one dio may include lots of bios, but only the last submitted
  bio is polled
- one bio can be split into several bios, but submit_bio() only returns
  one cookie for polling

It looks like your approach of polling the current task's state can fix
this issue.

> ---
> diff --git a/block/blk-core.c b/block/blk-core.c
> index b993f88..3c1cfbf 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -3342,6 +3342,8 @@ EXPORT_SYMBOL(blk_finish_plug);
>  bool blk_poll(struct request_queue *q, blk_qc_t cookie)
>  {
>  	struct blk_plug *plug;
> +	struct blk_mq_hw_ctx *hctx;
> +	unsigned int queue_num;
>  	long state;
>
>  	if (!q->mq_ops || !q->mq_ops->poll || !blk_qc_t_valid(cookie) ||
> @@ -3353,27 +3355,15 @@ bool blk_poll(struct request_queue *q, blk_qc_t cookie)
>  		blk_flush_plug_list(plug, false);
>
>  	state = current->state;
> +	queue_num = blk_qc_t_to_queue_num(cookie);
> +	hctx = q->queue_hw_ctx[queue_num];
>  	while (!need_resched()) {
> -		unsigned int queue_num = blk_qc_t_to_queue_num(cookie);
> -		struct blk_mq_hw_ctx *hctx = q->queue_hw_ctx[queue_num];
> -		int ret;
> -
> -		hctx->poll_invoked++;
> -
> -		ret = q->mq_ops->poll(hctx, blk_qc_t_to_tag(cookie));
> -		if (ret > 0) {
> -			hctx->poll_success++;
> -			set_current_state(TASK_RUNNING);
> -			return true;
> -		}
> +		q->mq_ops->poll(hctx);
>
>  		if (signal_pending_state(state, current))
>  			set_current_state(TASK_RUNNING);
>
>  		if (current->state == TASK_RUNNING)
>  			return true;
> -		if (ret < 0)
> -			break;
>  		cpu_relax();
>  	}

Then it looks like the whole hw queue is polled, and only the queue num
in the cookie matters.

In theory, there might be one race:

- one dio needs to submit several bios (suppose there are two bios: A and B)
- A is submitted to hardware queue M
- B is submitted to hardware queue N because the current task is
  migrated to another CPU
- then only hardware queue N is polled

> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index befac5b..2e359e0 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -649,7 +651,7 @@ static inline bool nvme_cqe_valid(struct nvme_queue *nvmeq, u16 head,
>  	return (le16_to_cpu(nvmeq->cqes[head].status) & 1) == phase;
>  }
>
> -static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
> +static void nvme_process_cq(struct nvme_queue *nvmeq)
>  {
>  	u16 head, phase;
>
> @@ -665,9 +667,6 @@ static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
>  		phase = !phase;
>  	}
>
> -	if (tag && *tag == cqe.command_id)
> -		*tag = -1;
> -
>  	if (unlikely(cqe.command_id >= nvmeq->q_depth)) {
>  		dev_warn(nvmeq->dev->ctrl.device,
>  			"invalid id %d completed on queue %d\n",
> @@ -711,11 +710,6 @@ static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
>  	nvmeq->cqe_seen = 1;
>  }
>
> -static void nvme_process_cq(struct nvme_queue *nvmeq)
> -{
> -	__nvme_process_cq(nvmeq, NULL);
> -}
> -
>  static irqreturn_t nvme_irq(int irq, void *data)
>  {
>  	irqreturn_t result;
> @@ -736,20 +730,15 @@ static irqreturn_t nvme_irq_check(int irq, void *data)
>  	return IRQ_NONE;
>  }
>
> -static int nvme_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
> +static void nvme_poll(struct blk_mq_hw_ctx *hctx)
>  {
>  	struct nvme_queue *nvmeq =
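The cookie limitation discussed above (submit_bio() returning a single cookie, so a split or multi-bio dio can only identify its last bio) comes from how the polling cookie is packed. The helpers below are a userspace sketch modeled on the blk_qc_t helpers of that era's blk_types.h: the hardware queue number lives in the high bits and the request tag in the low 16 bits, so one cookie can name exactly one (queue, tag) pair:

```c
#include <assert.h>

/* Userspace model of the blk_qc_t polling cookie (sketch of the helpers
 * in blk_types.h of that era): high bits = hardware queue number,
 * low 16 bits = request tag.  submit_bio() returns exactly one such
 * cookie, which is why only the last submitted bio can be polled for. */
typedef unsigned int blk_qc_t;

#define BLK_QC_T_NONE   (~0U)
#define BLK_QC_T_SHIFT  16

static inline blk_qc_t blk_tag_to_qc_t(unsigned int tag, unsigned int queue_num)
{
	return tag | (queue_num << BLK_QC_T_SHIFT);
}

static inline unsigned int blk_qc_t_to_queue_num(blk_qc_t cookie)
{
	return cookie >> BLK_QC_T_SHIFT;
}

static inline unsigned int blk_qc_t_to_tag(blk_qc_t cookie)
{
	return cookie & ((1U << BLK_QC_T_SHIFT) - 1);
}
```

With Keith's patch the tag half of the cookie becomes irrelevant: only blk_qc_t_to_queue_num() is used to pick the hctx to poll, which is exactly why the race above (bios landing on two different hardware queues) leaves one queue unpolled.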
Re: [Nbd] [PATCH][V3] nbd: add multi-connection support
On Thu, Oct 06, 2016 at 06:16:30AM -0700, Christoph Hellwig wrote:
> On Thu, Oct 06, 2016 at 03:09:49PM +0200, Wouter Verhelst wrote:
>> Okay, I've updated the proto.md file then, to clarify that in the case
>> of multiple connections, a client MUST NOT send a flush request until it
>> has seen the replies to the write requests that it cares about. That
>> should be enough for now.
>
> How do you guarantee that nothing has been reordered or even lost even for
> a single connection?

In the case of a single connection, we already stated that the flush
covers the write requests for which a reply has already been sent out by
the time the flush reply is sent out. On a single connection, there is
no way an implementation can comply with the old requirement but not the
new one.

We do not guarantee any ordering beyond that; and lost requests would be
a bug in the server.

--
< ron> I mean, the main *practical* problem with C++, is there's like a
       dozen people in the world who think they really understand all of
       its rules, and pretty much all of them are just lying to
       themselves too.
  -- #debian-devel, OFTC, 2016-02-12
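The clarified rule ("a client MUST NOT send a flush until it has seen the replies to the writes it cares about") boils down to simple client-side bookkeeping. The sketch below is purely illustrative: the struct, counters, and function names are hypothetical and not part of the NBD protocol or any implementation:

```c
#include <stdint.h>

/* Hypothetical client-side bookkeeping for the rule discussed above:
 * with multiple NBD connections, a flush may only be relied upon to
 * cover writes whose replies had already been received (on any
 * connection) when the flush request was sent.  All names here are
 * illustrative, not part of the NBD protocol itself. */
struct nbd_client_state {
	uint64_t writes_sent;     /* write requests issued, all connections */
	uint64_t writes_replied;  /* write replies received */
};

static void write_sent(struct nbd_client_state *s)    { s->writes_sent++; }
static void write_replied(struct nbd_client_state *s) { s->writes_replied++; }

/* A flush sent now is only guaranteed (under the clarified wording) to
 * cover writes already replied to; a client that cares about all the
 * writes it issued must wait until every reply has arrived. */
static int flush_covers_all(const struct nbd_client_state *s)
{
	return s->writes_replied == s->writes_sent;
}

/* Walk one scenario: a write with no reply yet is not covered; once the
 * reply arrives, a subsequent flush covers it. */
static int demo_scenario(void)
{
	struct nbd_client_state s = { 0, 0 };
	int unsafe, safe;

	write_sent(&s);
	unsafe = !flush_covers_all(&s);  /* reply not yet seen */

	write_replied(&s);
	safe = flush_covers_all(&s);     /* now a flush covers it */

	return unsafe && safe;
}
```

On a single connection this degenerates to the old behaviour, which is why no compliant single-connection implementation is affected.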
Re: [PATCH V3 00/11] block-throttle: add .high limit
On 2016-10-06 08:50, Paolo Valente wrote:
>> On 6 Oct 2016, at 13:57, Austin S. Hemmelgarn wrote:
>>
>> On 2016-10-06 07:03, Mark Brown wrote:
>>> On Thu, Oct 06, 2016 at 10:04:41AM +0200, Linus Walleij wrote:
>>>> On Tue, Oct 4, 2016 at 9:14 PM, Tejun Heo wrote:
>>>>> I get that bfq can be a good compromise on most desktop workloads and
>>>>> behave reasonably well for some server workloads with the slice
>>>>> expiration mechanism but it really isn't an IO resource partitioning
>>>>> mechanism.
>>>>
>>>> Not just desktops, also Android phones.
>>>>
>>>> So why not have BFQ as a separate scheduling policy upstream,
>>>> alongside CFQ, deadline and noop?
>>>
>>> Right.
>>>
>>>> We're already doing the per-usecase Kconfig thing for preemption.
>>>> But maybe somebody already hates that and want to get rid of it,
>>>> I don't know.
>>>
>>> Hannes also suggested going back to making BFQ a separate scheduler
>>> rather than replacing CFQ earlier, pointing out that it mitigates
>>> against the risks of changing CFQ substantially at this point (which
>>> seems to be the biggest issue here).
>>
>> ISTR that the original argument for this approach essentially amounted
>> to: 'If it's so much better, why do we need both?'.
>>
>> Such an argument is valid only if the new design is better in all
>> respects (which there isn't sufficient information to decide in this
>> case), or the negative aspects are worth the improvements (which is too
>> workload specific to decide for something like this).
>
> All correct, apart from the workload-specific issue, which is not very
> clear to me.  Over the last five years I have not found a single
> workload for which CFQ is better than BFQ, and none has been suggested.

My point is that whether or not BFQ is better depends on the workload.
You can't test for every workload, so you can't say definitively that
BFQ is better for every workload. At a minimum, there are workloads
where the deadline and noop schedulers are better, but they're very
domain specific workloads.

Based on the numbers from Shaohua, it looks like CFQ has better
throughput than BFQ, and that will affect some workloads (for most, the
improved fairness is worth the reduced throughput, but there probably
are some cases where it isn't).

> Anyway, leaving aside this fact, IMO the real problem here is that we
> are in a catch-22: "we want BFQ to replace CFQ, but, since CFQ is
> legacy code, then you cannot change, and thus replace, CFQ"

I agree that that's part of the issue, but I also don't entirely agree
with the reasoning on it. Until blk-mq has proper I/O scheduling, people
will continue to use CFQ, and based on the way things are going, it will
be multiple months before that happens, whereas BFQ exists and is
working now.
Re: [PATCH V3 00/11] block-throttle: add .high limit
> On 6 Oct 2016, at 09:58, Paolo Valente wrote:
>
>> On 5 Oct 2016, at 22:46, Shaohua Li wrote:
>>
>> On Wed, Oct 05, 2016 at 09:47:19PM +0200, Paolo Valente wrote:
>>>> On 5 Oct 2016, at 20:30, Shaohua Li wrote:
>>>>
>>>> On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun Heo wrote:
>>>>> Hello, Paolo.
>>>>>
>>>>> On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote:
>>>>>> In this respect, for your generic, unpredictable scenario to make
>>>>>> sense, there must exist at least one real system that meets the
>>>>>> requirements of such a scenario.  Or, if such a real system does not
>>>>>> yet exist, it must be possible to emulate it.  If it is impossible to
>>>>>> achieve this last goal either, then I miss the usefulness
>>>>>> of looking for solutions for such a scenario.
>>>>>>
>>>>>> That said, let's define the instance(s) of the scenario that you find
>>>>>> most representative, and let's test BFQ on it/them.  Numbers will give
>>>>>> us the answers.  For example, what about all or part of the following
>>>>>> groups:
>>>>>> . one cyclically doing random I/O for some second and then sequential I/O
>>>>>>   for the next seconds
>>>>>> . one doing, say, quasi-sequential I/O in ON/OFF cycles
>>>>>> . one starting an application cyclically
>>>>>> . one playing back or streaming a movie
>>>>>>
>>>>>> For each group, we could then measure the time needed to complete each
>>>>>> phase of I/O in each cycle, plus the responsiveness in the group
>>>>>> starting an application, plus the frame drop in the group streaming
>>>>>> the movie.  In addition, we can measure the bandwidth/iops enjoyed by
>>>>>> each group, plus, of course, the aggregate throughput of the whole
>>>>>> system.  In particular we could compare results with throttling, BFQ,
>>>>>> and CFQ.
>>>>>>
>>>>>> Then we could write resulting numbers on the stone, and stick to them
>>>>>> until something proves them wrong.
>>>>>>
>>>>>> What do you (or others) think about it?
>>>>>
>>>>> That sounds great and yeah it's lame that we didn't start with that.
>>>>> Shaohua, would it be difficult to compare how bfq performs against
>>>>> blk-throttle?
>>>>
>>>> I had a test of BFQ.
>>>
>>> Thank you very much for testing BFQ!
>>>
>>>> I'm using BFQ found at
>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__algogroup.unimore.it_people_paolo_disk-5Fsched_sources.php=DQIFAg=5VD0RTtNlTh3ycd41b3MUw=i6WobKxbeG3slzHSIOxTVtYIJw7qjCE6S0spDTKL-J4=2pG8KEx5tRymExa_K0ddKH_YvhH3qvJxELBd1_lw0-w=FZKEAOu2sw95y9jZio2k012cQWoLzlBWDl0NiGPVW78=
>>>> . version is 4.7.0-v8r3.
>>>
>>> That's the latest stable version.  The development version [1] already
>>> contains further improvements for fairness, latency and throughput.
>>> It is however still a release candidate.
>>>
>>> [1] https://github.com/linusw/linux-bfq/tree/bfq-v8
>>>
>>>> It's a LSI SSD, queue depth 32.  I use default setting.  fio script
>>>> is:
>>>>
>>>> [global]
>>>> ioengine=libaio
>>>> direct=1
>>>> readwrite=randread
>>>> bs=4k
>>>> runtime=60
>>>> time_based=1
>>>> file_service_type=random:36
>>>> overwrite=1
>>>> thread=0
>>>> group_reporting=1
>>>> filename=/dev/sdb
>>>> iodepth=1
>>>> numjobs=8
>>>>
>>>> [groupA]
>>>> prio=2
>>>>
>>>> [groupB]
>>>> new_group
>>>> prio=6
>>>>
>>>> I'll change iodepth, numjobs and prio in different tests.  result unit
>>>> is MB/s.
>>>>
>>>> iodepth=1 numjobs=1 prio 4:4
>>>> CFQ: 28:28 BFQ: 21:21 deadline: 29:29
>>>>
>>>> iodepth=8 numjobs=1 prio 4:4
>>>> CFQ: 162:162 BFQ: 102:98 deadline: 205:205
>>>>
>>>> iodepth=1 numjobs=8 prio 4:4
>>>> CFQ: 157:157 BFQ: 81:92 deadline: 196:197
>>>>
>>>> iodepth=1 numjobs=1 prio 2:6
>>>> CFQ: 26.7:27.6 BFQ: 20:6 deadline: 29:29
>>>>
>>>> iodepth=8 numjobs=1 prio 2:6
>>>> CFQ: 166:174 BFQ: 139:72 deadline: 202:202
>>>>
>>>> iodepth=1 numjobs=8 prio 2:6
>>>> CFQ: 148:150 BFQ: 90:77 deadline: 198:197
>>>>
>>>> CFQ isn't fair at all.  BFQ is very good in this side, but has poor
>>>> throughput even prio is the default value.
>>>
>>> Throughput is lower with BFQ for two reasons.
>>>
>>> First, you certainly left the low_latency in its default state, i.e.,
>>> on.  As explained, e.g., here [2], low_latency mode is totally geared
>>> towards maximum responsiveness and minimum latency for soft real-time
>>> applications (e.g., video players).  To achieve this goal, BFQ is
>>> willing to perform more idling, when necessary.  This lowers
>>> throughput (I'll get back on this at the end of the discussion of the
>>> second reason).
>>
>> changing low_latency to 0 seems not change anything, at least for the test:
>> iodepth=1 numjobs=1 prio 2:6 A bs 4k:64k
>>
>>> The second, most important reason, is that a minimum of idling is the
>>> *only* way to achieve differentiated bandwidth distribution, as you
>>> requested by setting different ioprios.  I stress that this
Re: [Nbd] [PATCH][V3] nbd: add multi-connection support
On Thu, Oct 06, 2016 at 03:09:49PM +0200, Wouter Verhelst wrote:
> Okay, I've updated the proto.md file then, to clarify that in the case
> of multiple connections, a client MUST NOT send a flush request until it
> has seen the replies to the write requests that it cares about. That
> should be enough for now.

How do you guarantee that nothing has been reordered or even lost even
for a single connection?
Re: [PATCH V3 00/11] block-throttle: add .high limit
> On 6 Oct 2016, at 13:57, Austin S. Hemmelgarn wrote:
>
> On 2016-10-06 07:03, Mark Brown wrote:
>> On Thu, Oct 06, 2016 at 10:04:41AM +0200, Linus Walleij wrote:
>>> On Tue, Oct 4, 2016 at 9:14 PM, Tejun Heo wrote:
>>>> I get that bfq can be a good compromise on most desktop workloads and
>>>> behave reasonably well for some server workloads with the slice
>>>> expiration mechanism but it really isn't an IO resource partitioning
>>>> mechanism.
>>>
>>> Not just desktops, also Android phones.
>>>
>>> So why not have BFQ as a separate scheduling policy upstream,
>>> alongside CFQ, deadline and noop?
>>
>> Right.
>>
>>> We're already doing the per-usecase Kconfig thing for preemption.
>>> But maybe somebody already hates that and want to get rid of it,
>>> I don't know.
>>
>> Hannes also suggested going back to making BFQ a separate scheduler
>> rather than replacing CFQ earlier, pointing out that it mitigates
>> against the risks of changing CFQ substantially at this point (which
>> seems to be the biggest issue here).
>
> ISTR that the original argument for this approach essentially amounted
> to: 'If it's so much better, why do we need both?'.
>
> Such an argument is valid only if the new design is better in all
> respects (which there isn't sufficient information to decide in this
> case), or the negative aspects are worth the improvements (which is too
> workload specific to decide for something like this).

All correct, apart from the workload-specific issue, which is not very
clear to me.  Over the last five years I have not found a single
workload for which CFQ is better than BFQ, and none has been suggested.

Anyway, leaving aside this fact, IMO the real problem here is that we
are in a catch-22: "we want BFQ to replace CFQ, but, since CFQ is legacy
code, then you cannot change, and thus replace, CFQ".

Thanks,
Paolo

--
Paolo Valente
Algogroup
Dipartimento di Scienze Fisiche, Informatiche e Matematiche
Via Campi 213/B
41125 Modena - Italy
http://algogroup.unimore.it/people/paolo/
Re: [PATCH V3 00/11] block-throttle: add .high limit
On Thu, Oct 06, 2016 at 10:04:41AM +0200, Linus Walleij wrote:
> On Tue, Oct 4, 2016 at 9:14 PM, Tejun Heo wrote:
>> I get that bfq can be a good compromise on most desktop workloads and
>> behave reasonably well for some server workloads with the slice
>> expiration mechanism but it really isn't an IO resource partitioning
>> mechanism.
>
> Not just desktops, also Android phones.
>
> So why not have BFQ as a separate scheduling policy upstream,
> alongside CFQ, deadline and noop?

Right.

> We're already doing the per-usecase Kconfig thing for preemption.
> But maybe somebody already hates that and want to get rid of it,
> I don't know.

Hannes also suggested going back to making BFQ a separate scheduler
rather than replacing CFQ earlier, pointing out that it mitigates
against the risks of changing CFQ substantially at this point (which
seems to be the biggest issue here).
Re: [Nbd] [PATCH][V3] nbd: add multi-connection support
On Thu, Oct 06, 2016 at 10:41:36AM +0100, Alex Bligh wrote:
> Wouter,

[...]

> > Given that, given the issue in the previous paragraph, and given the
> > uncertainty introduced with multiple connections, I think it is
> > reasonable to say that a client should just not assume a flush touches
> > anything except for the writes for which it has already received a
> > reply by the time the flush request is sent out.
>
> OK. So you are proposing weakening the semantic for flush (saying that
> it is only guaranteed to cover those writes for which the client has
> actually received a reply prior to sending the flush, as opposed to
> prior to receiving the flush reply). This is based on the view that
> the Linux kernel client wouldn't be affected, and if other clients
> were affected, their behaviour would be 'somewhat unusual'.

Right.

> We do have one significant other client out there that uses flush,
> which is Qemu. I think we should get a view on whether they would be
> affected.

That's certainly something to consider, yes.

> > Those are semantics that are actually useful and can be guaranteed in
> > the face of multiple connections. Other semantics can not.
>
> Well there is another semantic which would work just fine, and also
> cures the other problem (synchronisation between channels), which would
> be simply that flush is only guaranteed to affect writes issued on the
> same channel. Then flush would do the natural thing, i.e. flush
> all the writes that had been done *on that channel*.

That is an option, yes, but the natural result will be that you issue N
flush requests, rather than one, which I'm guessing will kill
performance. Therefore, I'd prefer not to go down that route.

[...]

> > It is indeed impossible for a server to know what has been received by
> > the client by the time it (the client) sent out the flush request.
> > However, the server doesn't need that information, at all.
> > The flush request's semantics do not say that any request not covered
> > by the flush request itself MUST NOT have hit disk; instead, it just
> > says that there is no guarantee on whether or not that is the case.
> > That's fine; all a server needs to know is that when it receives a
> > flush, it needs to fsync() or some such, and then send the reply. All
> > a *client* needs to know is which requests have most definitely hit
> > the disk. In my proposal, those are the requests that finished before
> > the flush request was sent, and not the requests that finished between
> > that and when the flush reply is received. Those are *likely* to also
> > be covered (especially on single-connection NBD setups), but in my
> > proposal, they're no longer *guaranteed* to be.
>
> I think my objection was more that you were writing mandatory language
> for a server's behaviour based on what the client perceives.
>
> What you are saying from the client's point of view is that under your
> proposed change it can only rely on writes for which the reply has been
> received prior to issuing the flush being persisted to disk (more might
> be persisted, but the client can't rely on it).

Exactly.

[...]

> IE I don't actually think the wording from the server side needs changing
> now I see what you are trying to do. Just we need a new paragraph saying
> what the client can and cannot rely on.

That's obviously also a valid option. I'm looking forward to your
proposed wording then :-)

[...]

> >> I suppose that's fine in that we can at least shorten the CC: line,
> >> but I still think it would be helpful if the protocol
> >
> > unfinished sentence here...
>
> but I still think it would be helpful if the protocol helped out
> the end user of the client and refused to negotiate multichannel
> connections when they are unsafe. How is the end client meant to know
> whether the back end is not on Linux, not on a block device, done
> via a Ceph driver etc?
Well, it isn't. The server, if it provides certain functionality,
should also provide particular guarantees. If it can't provide those
guarantees, it should not provide that functionality. E.g., if a server
runs on a backend with cache coherency issues, it should not allow
multiple connections to the same device, etc.

> I still think it's pretty damn awkward that with a ceph back end
> (for instance), which would be one of the backends to benefit the
> most from multichannel connections (as it's inherently parallel),
> no one has explained how flush could be done safely.

If ceph doesn't have any way to guarantee that a write is available to
all readers of a particular device, then it *cannot* be used to map
block device semantics with multiple channels. Therefore, it should not
allow writing to the device from multiple clients, period, unless the
filesystem (or other thing) making use of the nbd device above the ceph
layer actually understands how things may go wrong and can take care of
it. As such, I don't think that the problems inherent in using
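To make the proposed weakening concrete, the client-side rule can be
modelled in a few lines (a toy sketch, not actual nbd client code; the
struct and function names here are illustrative): once a flush reply
arrives, only the writes whose own replies were received *before the
flush request was sent* are guaranteed durable.

```c
#include <assert.h>

/* Toy model of the proposed flush semantics.  A write is guaranteed
 * durable by a completed flush only if its reply arrived before the
 * flush request was sent out; a write whose reply races with the flush
 * may well be durable, but the client can no longer rely on it. */
struct write_req {
    int id;
    int reply_time;   /* tick the reply arrived at, -1 if no reply yet */
};

/* Collect the ids the client may treat as durable after a flush that
 * was sent at flush_sent_time completes.  Returns how many there are. */
static int guaranteed_durable(const struct write_req *w, int n,
                              int flush_sent_time, int *out)
{
    int count = 0;

    for (int i = 0; i < n; i++)
        if (w[i].reply_time >= 0 && w[i].reply_time < flush_sent_time)
            out[count++] = w[i].id;
    return count;
}
```

On a single-connection setup the racing writes are *likely* covered
too, which is exactly why the change is invisible to the Linux kernel
client; the model only captures what a client may *rely* on.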
Re: [PATCH 2/3] zram: support page-based parallel write
Hello Minchan,

On (10/05/16 11:01), Minchan Kim wrote:
[..]
> 1. just changed ordering of test execution - hope to reduce testing time
>    due to block population before the first reading or reading just zero pages
> 2. used sync_on_close instead of direct io
> 3. Don't use perf to avoid noise
> 4. echo 0 > /sys/block/zram0/use_aio to test synchronous IO for old behavior

ok, will use it in the tests below.

> 1. ZRAM_SIZE=3G ZRAM_COMP_ALG=lzo LOG_SUFFIX=async FIO_LOOPS=2 MAX_ITER=1 ./zram-fio-test.sh
> 2. modify script to disable aio via /sys/block/zram0/use_aio
>    ZRAM_SIZE=3G ZRAM_COMP_ALG=lzo LOG_SUFFIX=sync FIO_LOOPS=2 MAX_ITER=1 ./zram-fio-test.sh
>
>       seq-write     380930     474325   124.52%
>      rand-write     286183     357469   124.91%
>        seq-read     266813     265731    99.59%
>       rand-read     211747     210670    99.49%
>    mixed-seq(R)     145750     171232   117.48%
>    mixed-seq(W)     145736     171215   117.48%
>   mixed-rand(R)     115355     125239   108.57%
>   mixed-rand(W)     115371     125256   108.57%

                   no_aio         use_aio
WRITE:         1432.9MB/s      1511.5MB/s
WRITE:         1173.9MB/s      1186.9MB/s
READ:          912699KB/s      912170KB/s
WRITE:         912497KB/s      911968KB/s
READ:          725658KB/s      726747KB/s
READ:          579003KB/s      594543KB/s
READ:          373276KB/s      373719KB/s
WRITE:         373572KB/s      374016KB/s
seconds elapsed  45.399702511   44.280199716

> LZO compression is fast, and one CPU for queueing while 3 CPUs compress
> cannot saturate full CPU bandwidth. Nonetheless, it shows 24% enhancement.
> It could be more on a slow CPU, e.g. embedded.
>
> I tested it with deflate. The result is 300% enhancement.
>
>       seq-write      33598     109882   327.05%
>      rand-write      32815     102293   311.73%
>        seq-read     154323     153765    99.64%
>       rand-read     129978     129241    99.43%
>    mixed-seq(R)      15887      44995   283.22%
>    mixed-seq(W)      15885      44990   283.22%
>   mixed-rand(R)      25074      55491   221.31%
>   mixed-rand(W)      25078      55499   221.31%
>
> So, curious with your test.
> Is my test in sync with yours? If you cannot see enhancement in job1, could
> you test with deflate? It seems your CPU is really fast.

interesting observation.
                   no_aio         use_aio
WRITE:          47882KB/s      158931KB/s
WRITE:          47714KB/s      156484KB/s
READ:           42914KB/s      137997KB/s
WRITE:          42904KB/s      137967KB/s
READ:          333764KB/s      332828KB/s
READ:          293883KB/s      294709KB/s
READ:           51243KB/s      129701KB/s
WRITE:          51284KB/s      129804KB/s
seconds elapsed  480.869169882  181.678431855

yes, looks like with lzo the CPU manages to process bdi writeback fast
enough to keep the fio-template-static-buffer workers active. to prove
this theory: direct=1 cures zram-deflate.

                   no_aio         use_aio
WRITE:          41873KB/s       34257KB/s
WRITE:          41455KB/s       34087KB/s
READ:           36705KB/s       28960KB/s
WRITE:          36697KB/s       28954KB/s
READ:          327902KB/s      327270KB/s
READ:          316217KB/s      316886KB/s
READ:           35980KB/s       28131KB/s
WRITE:          36008KB/s       28153KB/s
seconds elapsed  515.575252170  629.114626795

as soon as the wb flush kworker can't keep up anymore, things go off the
rails. most of the time the fio-template-static-buffer processes are in
D state, while the biggest bdi flush kworker is doing the job (a lot of
job):

  PID USER  PR  NI   VIRT   RES  %CPU %MEM    TIME+ S COMMAND
 6274 root  20   0   0.0m  0.0m 100.0  0.0  1:15.60 R [kworker/u8:1]
11169 root  20   0 718.1m  1.6m  16.6  0.0  0:01.88 D fio ././conf/fio-template-static-buffer
11171 root  20   0 718.1m  1.6m   3.3  0.0  0:01.15 D fio ././conf/fio-template-static-buffer
11170 root  20   0 718.1m  3.3m   2.6  0.1  0:00.98 D fio ././conf/fio-template-static-buffer

and still working...

 6274 root  20   0   0.0m  0.0m 100.0  0.0  3:05.49 R [kworker/u8:1]
12048 root  20   0 718.1m  1.6m  16.7  0.0  0:01.80 R fio ././conf/fio-template-static-buffer
12047 root  20   0 718.1m  1.6m   3.3  0.0  0:01.12 D fio ././conf/fio-template-static-buffer
12049 root  20   0 718.1m  1.6m   3.3  0.0  0:01.12 D fio ././conf/fio-template-static-buffer
12050 root  20   0 718.1m  1.6m   2.0  0.0  0:00.98 D fio ././conf/fio-template-static-buffer

and working...

[ 4159.338731] CPU: 0 PID: 105 Comm: kworker/u8:4
[ 4159.338734] Workqueue: writeback wb_workfn (flush-254:0)
[ 4159.338746] [] zram_make_request+0x4a3/0x67b [zram]
[ 4159.338748] [] ? try_to_wake_up+0x201/0x213
[ 4159.338750] [] ? mempool_alloc+0x5e/0x124
[ 4159.338752] [] generic_make_request+0xb8/0x156
[
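The failure mode above — fio workers parked in D state while a single
flush kworker runs flat out — is ordinary writeback backpressure. A toy
simulation (plain C, nothing zram-specific; all numbers are
illustrative) shows how stalls appear as soon as the producers outpace
the lone consumer:

```c
#include <assert.h>

/* Toy backpressure model: `producers` workers each try to queue one
 * page per tick; a single consumer (the flush kworker analogue) drains
 * at most `drain` pages per tick.  When the queue is full a producer
 * stalls for that tick -- the analogue of fio sitting in D state. */
static int count_stalls(int producers, int drain, int capacity, int ticks)
{
    int queued = 0, stalls = 0;

    for (int t = 0; t < ticks; t++) {
        for (int p = 0; p < producers; p++) {
            if (queued < capacity)
                queued++;       /* page accepted for writeback */
            else
                stalls++;       /* producer blocks this tick */
        }
        queued -= (queued < drain) ? queued : drain;  /* consumer drains */
    }
    return stalls;
}
```

With three producers and a consumer draining one page per tick, the
queue pins at capacity and stalls accumulate every tick; with a
fast-enough consumer (as with lzo, or with direct=1 taking the page
cache out of the loop) stalls never occur.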
Re: [PATCH V3 00/11] block-throttle: add .high limit
On Tue, Oct 4, 2016 at 9:14 PM, Tejun Heo wrote:
> I get that bfq can be a good compromise on most desktop workloads and
> behave reasonably well for some server workloads with the slice
> expiration mechanism but it really isn't an IO resource partitioning
> mechanism.

Not just desktops, also Android phones.

So why not have BFQ as a separate scheduling policy upstream,
alongside CFQ, deadline and noop?

I understand the CPU scheduler people's position that they want one
scheduler for everyone's everyday loads (except RT and SCHED_DEADLINE)
and I guess that is the source of the highlander "there can be only
one" argument, but note this:

kernel/Kconfig.preempt:

config PREEMPT_NONE
        bool "No Forced Preemption (Server)"

config PREEMPT_VOLUNTARY
        bool "Voluntary Kernel Preemption (Desktop)"

config PREEMPT
        bool "Preemptible Kernel (Low-Latency Desktop)"

We're already doing the per-usecase Kconfig thing for preemption.
But maybe somebody already hates that and want to get rid of it,
I don't know.

Yours,
Linus Walleij
Re: blockdev kernel regression (bugzilla 173031)
On Thu, Oct 06, 2016 at 04:42:52PM +1100, NeilBrown wrote:

> Maybe there is a race, but that seems unlikely.

Consider that just hot removal while writing is not enough to reproduce
the bug systematically. This loop:

while true; do
    [ ! -f /media/usb/.not_mounted ] \
        && dd if=/dev/zero of=/media/usb/aaa bs=1k count=1 2>/dev/null \
        && echo -n '*'
done

with lazy umount by mdev on USB flash drive removal reproduces the
problem pretty much always.

> The vfat issue is different, and is only a warning.

Why do you say it is only a warning? Here is the oops with vfat on
ARM/i.MX6:

[  103.493761] Unable to handle kernel paging request at virtual address 50886000
[  103.500996] pgd = cecec000
[  103.503709] [50886000] *pgd=
[  103.507310] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[  103.512626] Modules linked in:
[  103.515707] CPU: 3 PID: 2071 Comm: umount Tainted: G        W  4.1.33-01808-gab8d223 #4
[  103.524150] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  103.530684] task: ce67cc00 ti: cecd8000 task.ti: cecd8000
[  103.536096] PC is at __percpu_counter_add+0x2c/0x104
[  103.541068] LR is at __percpu_counter_add+0x24/0x104
[  103.546044] pc : [<801dcca0>]  lr : [<801dcc98>]  psr: 200c0093
[  103.546044] sp : cecd9e08  ip :  fp :
[  103.557525] r10: d1970ba0  r9 : 0001  r8 :
[  103.562755] r7 :  r6 :  r5 : 0018  r4 : ce411150
[  103.569288] r3 :  r2 : 50886000  r1 : 805eb7d0  r0 : 0003
[  103.575821] Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  103.583049] Control: 10c53c7d  Table: 5ecec04a  DAC: 0015
[  103.588800] Process umount (pid: 2071, stack limit = 0xcecd8210)
[  103.594812] Stack: (0xcecd9e08 to 0xcecda000)
[  103.599180] 9e00: 200c0013 cc024b5c cc024b5c 0001
[  103.607365] 9e20: d1970ba0 8009599c 0018 d1970ba0 00040831 cecd9f00 80095bc4
[  103.615550] 9e40: 000e ce06b900 d0f22640 0004 0001 0002 cecd9e78
[  103.623736] 9e60: 800954d0 cc024b5c e000 03c6 0002 d1970ba0 d1960620
[  103.631922] 9e80: cecd8000 800b5b90 e000 cc36bee0 a00c0013 0001 e000 0001
[  103.640107] 9ea0: cecd9eb4 80043108 cecd9f00 cecd9efc 000ac998 cc024b5c cecd9f00
[  103.648293] 9ec0: 8000ebc4 cecd8000 000ac998 80095cf8 cecd9ed8 cecd9ed8
[  103.656478] 9ee0: cecd9ee0 cecd9ee0 cecd9ee8 cecd9ee8 cc024b5c cc024b5c 8008e7cc
[  103.664662] 9f00: 7fff 7fff
[  103.672847] 9f20: ce73b800 804aaf00 0034 8008e868 7fff cc024a90
[  103.681033] 9f40: ce73b800 800e0090 ce73b800 ce73b864 804aaf00 800bcec8 cc024a00 0083
[  103.689218] 9f60: 806c95a0 800bd184 ce73b800 806aac0c 806c95a0 800bd448 cebf39c0
[  103.697404] 9f80: 806c95a0 800d44c4 ce67cc00 8003c6c8 8000ebc4 cecd8000 cecd9fb0 800116bc
[  103.705589] 9fa0: 011fd408 011fd428 011fd408 8000ea8c 0002
[  103.713774] 9fc0: 011fd408 011fd428 011fd408 0034 0002 011fd438 011fd408 000ac998
[  103.721959] 9fe0: 76e0a441 7ef6abac 00050fe0 76e0a446 800c0030 011fd428
[  103.730163] [<801dcca0>] (__percpu_counter_add) from [<8009599c>] (clear_page_dirty_for_io+0xac/0xd8)
[  103.739401] [<8009599c>] (clear_page_dirty_for_io) from [<80095bc4>] (write_cache_pages+0x1fc/0x2f4)
[  103.748550] [<80095bc4>] (write_cache_pages) from [<80095cf8>] (generic_writepages+0x3c/0x60)
[  103.757090] [<80095cf8>] (generic_writepages) from [<8008e7cc>] (__filemap_fdatawrite_range+0x64/0x6c)
[  103.766412] [<8008e7cc>] (__filemap_fdatawrite_range) from [<8008e868>] (filemap_flush+0x24/0x2c)
[  103.775306] [<8008e868>] (filemap_flush) from [<800e0090>] (sync_filesystem+0x60/0xa8)
[  103.783240] [<800e0090>] (sync_filesystem) from [<800bcec8>] (generic_shutdown_super+0x28/0xd4)
[  103.791953] [<800bcec8>] (generic_shutdown_super) from [<800bd184>] (kill_block_super+0x18/0x64)
[  103.800750] [<800bd184>] (kill_block_super) from [<800bd448>] (deactivate_locked_super+0x4c/0x7c)
[  103.809638] [<800bd448>] (deactivate_locked_super) from [<800d44c4>] (cleanup_mnt+0x4c/0x6c)
[  103.818097] [<800d44c4>] (cleanup_mnt) from [<8003c6c8>] (task_work_run+0xb4/0xc8)
[  103.825688] [<8003c6c8>] (task_work_run) from [<800116bc>] (do_work_pending+0x90/0xa4)
[  103.833623] [<800116bc>] (do_work_pending) from [<8000ea8c>] (work_pending+0xc/0x20)
[  103.841378] Code: e59f00d8 ebfff186 e5943018 ee1d2f90 (e7933002)
[  103.847477] ---[ end trace 5b641bdc50ddcfe7 ]---
[  103.852101] Kernel panic - not syncing: Fatal exception
[  103.857337] CPU1: stopping
[  103.860059] CPU: 1 PID: 277 Comm: sh Tainted: G      D W  4.1.33-01808-gab8d223 #4
[  103.868068] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  103.874623] [<80014bc4>] (unwind_backtrace) from [<80011a60>]
Re: [PATCH v2 1/2] ata: Enabling ATA Command Priorities
On 10/05/2016 09:00 PM, Adam Manzanares wrote:
> This patch checks to see if an ATA device supports NCQ command
> priorities. If so, and the user has specified an iocontext that
> indicates IO_PRIO_CLASS_RT, and also enables request priorities in the
> block queue, then we build a tf with a high priority command.
>
> This patch depends on patch block-Add-iocontext-priority-to-request
>
> Signed-off-by: Adam Manzanares
> ---
>  drivers/ata/libata-core.c | 35 ++-
>  drivers/ata/libata-scsi.c | 10 +-
>  drivers/ata/libata.h      |  2 +-
>  include/linux/ata.h       |  6 ++
>  include/linux/libata.h    | 18 ++
>  5 files changed, 68 insertions(+), 3 deletions(-)

Reviewed-by: Hannes Reinecke

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
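As the changelog describes it, the gating condition reduces to a
three-way AND; a condensed sketch (illustrative names, not the patch's
actual symbols — the real code lives in the libata taskfile build path):

```c
#include <assert.h>
#include <stdbool.h>

/* RT is class 1 in the kernel's ioprio scheme; used here illustratively. */
#define IOPRIO_CLASS_RT 1

/* Sketch of the decision described in the changelog: flag the taskfile
 * high-priority only when the device advertises NCQ priority support,
 * the block queue has request priorities enabled, and the submitting
 * iocontext is in the RT class. */
static bool build_high_prio_tf(bool dev_ncq_prio_supported,
                               bool queue_prio_enabled,
                               int ioprio_class)
{
    return dev_ncq_prio_supported &&
           queue_prio_enabled &&
           ioprio_class == IOPRIO_CLASS_RT;
}
```

If any of the three conditions fails, the command is built at normal
priority, so the feature is strictly opt-in from both the admin's and
the application's side.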