Re: Unrecoverable fs corruption?

2016-01-05 Thread Duncan
Christoph Anton Mitterer posted on Mon, 04 Jan 2016 01:05:02 +0100 as
excerpted:

> On Sun, 2016-01-03 at 15:00 +, Duncan wrote:
>> But now that I think about it, balance does read the chunk in order
>> to rewrite its contents, and that read, like all reads, should normally
>> be checksum verified
> That was my idea :)
> 
>>  (except of course in the case of nodatasum, which nocow
>> of course implies).
> Though I haven't had the time so far to reply to the most recent posts
> in that thread,... I still haven't given up on the quest for
> checksumming of nodatacow'ed data ;-)

Following the lines of the btrfs-convert discussion elsewhere, I don't 
believe the current devs are much interested in this at the current 
time, tho maybe in the "bluesky" timeframe, beyond five years out, likely 
more like ten, because most of them believe it to be cost/benefit 
impractical to work on.  However, much like btrfs-convert, if a (probably 
new) developer finds this is his particular itch to scratch, puts in the 
seriously high level of effort needed to get it working, and keeps it all 
up to code standard, perhaps.  But it's going to have to pass a pretty 
high level of skepticism, and in general it's simply not considered worth 
the incredible level of effort that would be necessary.  So it's going to 
take a developer with a pretty intense itch to scratch, very likely over 
a period of some years, by the time the code can be demonstrated 
theoretically correct and can pass both regression tests and that 
skepticism, to get it to the level where it could be properly included.

IOW, not impossible, but as close as it gets.  I'd say the chances of 
seeing this in mainline (not just a series of patches carried by someone 
else) in anything under say 7 years are well under 5%, probably under 2%.  
The chances at say 15 years... maybe 15%.  (That said, if you look at ext4 
as an example, it has grown a bunch of exotic options over time that 
most people will never use but that scratched someone's itch.  Btrfs 
could go a similar way at 7+ years out, so it's possible, and from that 
viewpoint, some may even consider the chances near 50% at the 10-year 
mark.  I'm skeptical, but I wouldn't have considered all those weird 
things now possible in ext4 likely to ever reach mainline ext4, either, 
so...)

But I honestly don't expect current devs to spend much time on the 
proposal, at least not in the 7-year timeframe.

> Especially on large filesystems all these operations tend to take large
> amounts of time and may even impact the lifetime of the storage
> device(s)... so it would be clever if certain such operations could be
> kinda "merged", at least for the purposes of getting the results.
> As in the above example, if one would anyway run a full balance, the
> next scrub could be skipped because one has effectively just been done.
> Similar for defrag.

Well, balance definitely doesn't do defrag.  By analogy, balance is at 
the UN, nation to nation, level, while defrag is at the city precinct 
level.  They're simply out of each other's scope.

Which isn't to say that at some point in the future, there won't be some 
btrfs doitall command that does scrub and balance and defrag and 
recompression and ... all in a single pass, taking parameters from all 
the individual functions.  But as you say, that's likely to be at least 
intermediate future, 3-5 years out, maybe 5-7 years out or more.

And like btrfs-convert, I'd consider it in the "not a core tool, but nice 
to have" category.

>> And even if balance works to verify no checksum errors, I don't believe
>> it would correct them or give you the detail on them that a scrub
>> would.
> I'd have expected that read errors are (if possible, because of
> block copies) repaired as soon as they're encountered... isn't that
> the case?

(My understanding is that...) At the balance level, checksum corruption 
errors aren't going to be fixed from the other copy or from parity, 
because unlike normal file usage, the other copy isn't read -- balance 
isn't worried about file or extent level corruption, and any corruption 
it does find is simply a byproduct of the normal read-time checksum 
verification; balance itself is simply moving chunks around.  Such errors 
would thus simply cause the balance to abort, with some balance-time 
error that wouldn't even necessarily reflect that the underlying problem 
was a checksum failure.

Assuming that's correct, a completed balance could additionally be taken 
to mean that a scrub would have completed without any errors, but a 
failed balance could have failed for any number of reasons, with one of 
various balance-level errors, and such a failure would yield little or no 
clue as to scrub status.
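
(To make the distinction concrete, a minimal sketch of the two separate 
passes as they stand today, assuming a hypothetical mountpoint of /mnt:)

  # scrub reads and verifies every copy, repairs from a good copy where
  # the redundancy profile allows, and reports per-device error counts
  btrfs scrub start -B /mnt
  btrfs scrub status /mnt

  # balance just rewrites/relocates chunks; a csum error it happens to
  # hit surfaces as a balance failure, not as a scrub-style report
  btrfs balance start /mnt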

>> And if there is an error, it'd be a balance error, which might or might
>> not actually be a scrub error.
> Sure, but it shouldn't be difficult to collect e.g. scrub stats during
> balance as well.

Given that as of now they're still struggling to manage balance's me

Re: Add big device, remove small device, read-only

2016-01-05 Thread Duncan
Rasmus Abrahamsen posted on Fri, 01 Jan 2016 21:20:13 +0100 as excerpted:

> I accidentically sent my messages directly to Duncan, I am copying them
> in here.
> 
> Hello Duncan,
> 
> Thank you for the amazing response. Wow, you are awesome.

Just a note to mention that real life (TM) got in the way, and I'm a few 
days and a couple hundred posts behind on the list now.  Sounds like you 
have a backup tho, and if worse comes to worst, you can simply blow away 
the filesystem and start over.  Between that and Chris Murphy helping you 
now, I've read the thread to date but am simply marking it read without 
further replies as it exists ATM, tho I might reply to new posts to the 
thread from now on, if I think I can be helpful.

(Which is why I try to discourage direct replies, too.  With direct 
replies to a single person, if that person doesn't get back...  While if 
it's to the list, there are more people who can take up the thread; it's 
not on just one person.  Of course the direct-to-me reply was an 
accident, but...)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



floating point exception (core dumped) - btrfs rescue chunk-recover

2016-01-05 Thread P R Shah

Hello,

TL;DR ==

btrfs 3x500GB RAID 5 - One device failed. Added a new device (btrfs device add) 
and tried to remove the failed device (btrfs device delete).

I tried to mount the array in degraded mode, but that didn't work either. After 
multiple attempts (including adding back the failed HDD), I finally ran the 
btrfs rescue chunk-recover command on the primary member /dev/sdb.

This ran for about 4 hours, and then failed with "floating point exception (core 
dumped)".
==

I am testing out btrfs to gain familiarity with it. I am quite amazed at its 
capabilities and performance. However, I am either not able to understand or 
implement RAID5 fault tolerance.

I understand from the wiki that RAID56 is experimental. The data I am working 
with is backed up elsewhere and is, for all intents and purposes, discardable.

I have set up a btrfs RAID5 with 3x500GB Seagate HDDs, with a mount point of 
/storage. Booting is off a fourth HDD (ext4, lubuntu 64bit) that is not 
involved in the RAID.

Everything was working amazingly well, until one HDD failed and was quietly 
offlined. For a couple of days, the RAID was running off 2 HDDs and I didn't 
notice.

When I DID realize, I shut down the system and bought a new HDD (2TB), which 
took a couple of days to arrive.

When I powered up the system again, the failed 500GB was back. Everything 
loaded fine, and looked good. To be on the safe side, I ran a badblocks test 
(ro) on the failing HDD.

Halfway through the test, the HDD disappeared again. After a cold reboot, it 
showed up fine again.

At this point, I decided to replace the failed HDD. I shut down, plugged in the 
new HDD in place of the boot HDD, booted up with Lubuntu live, mounted 
(/storage) and added the device to the RAID.

After adding the device successfully, I gave a device delete command for the 
failed HDD. Partway through the process, the failing HDD (/dev/sdc) disappeared 
again, and after waiting a couple of hours, I hard-reset the system and 
removed the failing HDD, assuming that the RAID would rebuild on the existing 
devices.

Now, the RAID (/storage) refused to mount. I got an open_ctree error (please 
see enclosed logs below).

I tried to mount the array in degraded mode, but that didn't work either. After 
multiple attempts (including adding back the failed HDD), I finally ran the 
btrfs rescue chunk-recover command on the primary member /dev/sdb.

This ran for about 4 hours, and then failed with "floating point exception (core 
dumped)".

Can I recover the array, or should I start again? The data is not important, but 
I would like to understand the recovery process, and whether there are any 
misconceptions in my thinking that RAID5 with 3 devices is enough for SOHO-level 
fault tolerance.

Any advice, pointers, etc, much appreciated. Tech level: medium-high (RHCE).

Relevant system information:
=== uname -a
Linux lubuntu 4.2.0-16-generic #19-Ubuntu SMP Thu Oct 8 15:35:06 UTC 2015 
x86_64 x86_64 x86_64 GNU/Linux


== btrfs --version
btrfs-progs v4.0

== btrfs fi show
warning, device 2 is missing
Label: 'storage'  uuid: 5a3d6590-df08-4520-b61b-802d350849c7
Total devices 4 FS bytes used 176.91GiB
devid1 size 465.76GiB used 90.03GiB path /dev/sdb
devid3 size 465.76GiB used 90.01GiB path /dev/sdc
devid4 size 1.82TiB used 10.00GiB path /dev/sda
*** Some devices missing

== dmesg info
...
Jan  5 01:45:22 lubuntu kernel: [   10.338295] Btrfs loaded
Jan  5 01:45:22 lubuntu kernel: [   10.338899] BTRFS: device label storage 
devid 4 transid 969 /dev/sda
Jan  5 01:45:22 lubuntu kernel: [   10.340448] BTRFS info (device sda): disk 
space caching is enabled
Jan  5 01:45:22 lubuntu kernel: [   10.340454] BTRFS: has skinny extents
Jan  5 01:45:22 lubuntu kernel: [   10.343395] BTRFS: failed to read the system 
array on sda
Jan  5 01:45:22 lubuntu kernel: [   10.352137] BTRFS: open_ctree failed
Jan  5 01:45:22 lubuntu kernel: [   10.382199] BTRFS: device label storage 
devid 1 transid 969 /dev/sdb
Jan  5 01:45:22 lubuntu kernel: [   10.383740] BTRFS info (device sdb): disk 
space caching is enabled
Jan  5 01:45:22 lubuntu kernel: [   10.383744] BTRFS: has skinny extents
Jan  5 01:45:22 lubuntu kernel: [   10.384469] BTRFS: failed to read the system 
array on sdb
Jan  5 01:45:22 lubuntu kernel: [   10.392116] BTRFS: open_ctree failed
Jan  5 01:45:22 lubuntu kernel: [   10.423075] BTRFS: device label storage 
devid 3 transid

... // after btrfs rescue chunk for about 4 hours
Jan  5 06:01:45 lubuntu kernel: [15404.828156] traps: btrfs[3016] trap divide 
error ip:4211a0 sp:7ffd7dbb03a8 error:0 in btrfs[40+73000]
...

== some output from btrfs rescue chunk-recover -vv
...
Stripes list:
[ 0] Stripe: devid = 3, offset = 21484273664
[ 1] Stripe: devid = 2, offset = 21484273664
[ 2] Stripe: devid = 1, offset = 21504196608
Chunk: start = 45134905344, len = 2147483648, type = 81, num_stripes = 3
Stripes list:
[ 0] Str

Re: [PATCH] Btrfs: fix transaction handle leak on failure to create hard link

2016-01-05 Thread Liu Bo
On Tue, Jan 05, 2016 at 04:33:02PM +, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> If we failed to create a hard link we were not always releasing the
> transaction handle we got before, resulting in a memory leak and
> preventing any other tasks from being able to commit the current
> transaction.
> Fix this by always releasing our transaction handle.

Reviewed-by: Liu Bo 

Thanks,

-liubo
> 
> Signed-off-by: Filipe Manana 
> ---
>  fs/btrfs/inode.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 5dbc07a..018c2a6 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6488,7 +6488,7 @@ out_unlock_inode:
>  static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
> struct dentry *dentry)
>  {
> - struct btrfs_trans_handle *trans;
> + struct btrfs_trans_handle *trans = NULL;
>   struct btrfs_root *root = BTRFS_I(dir)->root;
>   struct inode *inode = d_inode(old_dentry);
>   u64 index;
> @@ -6514,6 +6514,7 @@ static int btrfs_link(struct dentry *old_dentry, struct 
> inode *dir,
>   trans = btrfs_start_transaction(root, 5);
>   if (IS_ERR(trans)) {
>   err = PTR_ERR(trans);
> + trans = NULL;
>   goto fail;
>   }
>  
> @@ -6547,9 +6548,10 @@ static int btrfs_link(struct dentry *old_dentry, 
> struct inode *dir,
>   btrfs_log_new_name(trans, inode, NULL, parent);
>   }
>  
> - btrfs_end_transaction(trans, root);
>   btrfs_balance_delayed_items(root);
>  fail:
> + if (trans)
> + btrfs_end_transaction(trans, root);
>   if (drop_inode) {
>   inode_dec_link_count(inode);
>   iput(inode);
> -- 
> 2.1.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix output of compression message in btrfs_parse_options()

2016-01-05 Thread Tsutomu Itoh

Hi, David,

On 2016/01/05 23:12, David Sterba wrote:

On Wed, Dec 16, 2015 at 11:57:38AM +0900, Tsutomu Itoh wrote:

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 974be09..dcc1f15 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2709,7 +2709,7 @@ int open_ctree(struct super_block *sb,
 * In the long term, we'll store the compression type in the super
 * block, and it'll be used for per file compression control.
 */
-   fs_info->compress_type = BTRFS_COMPRESS_ZLIB;
+   fs_info->compress_type = BTRFS_COMPRESS_NONE;


This would change the default compression type, e.g. when compression
is turned on via chattr +c. This would break the applications out
there, so the fix has to avoid changing that.


Thanks for pointing that out. I had forgotten about chattr +c.
I'll post a V2 patch later.

Thanks,
Tsutomu



Re: Crash, boot mount failure: "corrupt leaf, slot offset bad"

2016-01-05 Thread Chris Bainbridge
On Wed, Jan 06, 2016 at 08:57:28AM +0800, Qu Wenruo wrote:
> 
> Since you took the image of the corrupted fs, would you please try the
> following commands on the corrupted fs?
> 
> $ btrfs-debug-tree -b 67239936 

Command runs then segfaults:

leaf 67239936 items 92 free space 9138 generation 276688 owner 2
fs uuid b1103526-98a3-4b40-a782-cf66721ed600
chunk uuid 16e767e3-a321-4d0f-9c72-6ebac9d305c4
item 0 key (61513990144 EXTENT_ITEM 16384) itemoff 16232 itemsize 51
extent refs 1 gen 276685 flags TREE_BLOCK
tree block key (61617864704 EXTENT_ITEM 16384) level 0
tree block backref root 2
item 1 key (61514006528 EXTENT_ITEM 16384) itemoff 16181 itemsize 51
extent refs 1 gen 276285 flags TREE_BLOCK
tree block key (27627 DIR_INDEX 3576) level 0
tree block backref root 260
item 2 key (61514022912 EXTENT_ITEM 16384) itemoff 16130 itemsize 51
extent refs 1 gen 266026 flags TREE_BLOCK
tree block key (EXTENT_CSUM EXTENT_CSUM 99424354304) level 0
tree block backref root 7
item 3 key (61514039296 EXTENT_ITEM 16384) itemoff 16079 itemsize 51
extent refs 1 gen 275904 flags TREE_BLOCK
tree block key (118 INODE_ITEM 0) level 0
tree block backref root 260
item 4 key (61514055680 EXTENT_ITEM 16384) itemoff 16028 itemsize 51
extent refs 1 gen 276685 flags TREE_BLOCK
tree block key (61702111232 EXTENT_ITEM 16384) level 0
tree block backref root 2
item 5 key (61514088448 EXTENT_ITEM 16384) itemoff 15977 itemsize 51
extent refs 1 gen 276285 flags TREE_BLOCK
tree block key (1906872 INODE_REF 34185) level 0
tree block backref root 260
item 6 key (61514104832 EXTENT_ITEM 16384) itemoff 15926 itemsize 51
extent refs 1 gen 276685 flags TREE_BLOCK
tree block key (61741957120 EXTENT_ITEM 16384) level 0
tree block backref root 2
item 7 key (61514121216 EXTENT_ITEM 16384) itemoff 15875 itemsize 51
extent refs 1 gen 266026 flags TREE_BLOCK
tree block key (EXTENT_CSUM EXTENT_CSUM 99654656000) level 0
tree block backref root 7
item 8 key (61514137600 EXTENT_ITEM 16384) itemoff 15824 itemsize 51
extent refs 1 gen 266026 flags TREE_BLOCK
tree block key (EXTENT_CSUM EXTENT_CSUM 99620417536) level 0
tree block backref root 7
item 9 key (61514153984 EXTENT_ITEM 16384) itemoff 15773 itemsize 51
extent refs 1 gen 266026 flags TREE_BLOCK
tree block key (EXTENT_CSUM EXTENT_CSUM 99669962752) level 0
tree block backref root 7
item 10 key (61514170368 EXTENT_ITEM 16384) itemoff 15722 itemsize 51
extent refs 1 gen 266026 flags TREE_BLOCK
tree block key (EXTENT_CSUM EXTENT_CSUM 99639615488) level 0
tree block backref root 7
item 11 key (61514186752 EXTENT_ITEM 16384) itemoff 15671 itemsize 51
extent refs 1 gen 266026 flags TREE_BLOCK
tree block key (EXTENT_CSUM EXTENT_CSUM 99681320960) level 0
tree block backref root 7
item 12 key (61514203136 EXTENT_ITEM 16384) itemoff 15620 itemsize 51
extent refs 1 gen 276285 flags TREE_BLOCK
tree block key (882130 INODE_ITEM 0) level 0
tree block backref root 260
item 13 key (61514219520 EXTENT_ITEM 16384) itemoff 15569 itemsize 51
extent refs 1 gen 276685 flags TREE_BLOCK
tree block key (61831168000 EXTENT_ITEM 16384) level 0
tree block backref root 2
item 14 key (61514268672 EXTENT_ITEM 16384) itemoff 15518 itemsize 51
extent refs 1 gen 275904 flags TREE_BLOCK
tree block key (1553336 INODE_ITEM 0) level 0
tree block backref root 260
item 15 key (61514285056 EXTENT_ITEM 16384) itemoff 15467 itemsize 51
extent refs 1 gen 276685 flags TREE_BLOCK
tree block key (62053400576 EXTENT_ITEM 16384) level 0
tree block backref root 2
item 16 key (61514334208 EXTENT_ITEM 16384) itemoff 15416 itemsize 51
extent refs 1 gen 266026 flags TREE_BLOCK
tree block key (EXTENT_CSUM EXTENT_CSUM 99928444928) level 0
tree block backref root 7
item 17 key (61514350592 EXTENT_ITEM 16384) itemoff 15365 itemsize 51
extent refs 1 gen 266026 flags TREE_BLOCK
tree block key (EXTENT_CSUM EXTENT_CSUM 99940794368) level 0
tree block backref root 7
item 18 key (61514366976 EXTENT_ITEM 16384) itemoff 15314 itemsize 51
extent refs 1 g

Re: Crash, boot mount failure: "corrupt leaf, slot offset bad"

2016-01-05 Thread Qu Wenruo



Chris Bainbridge wrote on 2016/01/05 13:41 +:

On 5 January 2016 at 01:57, Qu Wenruo  wrote:


Data, single: total=106.79GiB, used=82.01GiB
System, single: total=4.00MiB, used=16.00KiB
Metadata, single: total=2.01GiB, used=1.51GiB
GlobalReserve, single: total=512.00MiB, used=0.00B



That's the misleading btrfs fi df output confusing you.

In fact, your metadata is already used up without available space.
GlobalReserve should also be counted as Metadata *used* space.


Thanks for the explanation - the FAQ[1] misleads when it describes
GlobalReserve as "The block reserve is only virtual and is not stored
on the devices." - which sounds like the reserve is literally not
stored on the drive.


In fact the FAQ description is not wrong either.

GlobalReserve is not stored anywhere, that's true.
Since it doesn't take space (unless its used value is not 0), it is stored 
nowhere and the FAQ is right.


The metadata allocation algorithm will try its best to keep enough free 
space for the GlobalReserve.
So for the end user, space you can't directly use is no different from 
used space.




The FAQ[2] also suggests that the free space in metadata can be less
than the block reserve total:

"If the free space in metadata is less than or equal to the block
reserve value (typically 512 MiB, but might be something else on a
particularly small or large filesystem), then it's close to full."

But what you are saying is that this is wrong and the free space in
metadata can never be less than the block reserve, because the block
reserve includes the metadata free space?


Sorry for the confusion.
Yes, it's possible for the available metadata space to be less than the 
global reserve space.


But when that happens, your used space in GlobalReserve is not 0, and 
unfortunately you are already super short of space.

Meaning you are unable even to touch an empty file.

And in that case, if your kernel is not new enough, you can't even 
delete a file, thanks to the metadata COW.


So for the common case, one can just treat the global reserve as used 
metadata, unless the used global reserve is not 0.
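
(To put numbers on it using your fi df output above: Metadata total 
2.01GiB minus used 1.51GiB leaves only ~0.50GiB nominally free, which is 
almost exactly the 512MiB GlobalReserve, so the metadata space you can 
actually allocate from is effectively zero.)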




[1] 
https://btrfs.wiki.kernel.org/index.php/FAQ#What_is_the_GlobalReserve_and_why_does_.27btrfs_fi_df.27_show_it_as_single_even_on_RAID_filesystems.3F
[2] 
https://btrfs.wiki.kernel.org/index.php/FAQ#if_your_device_is_large_.28.3E16GiB.29


Good, 5GiB of freed space; it can be allocated for metadata to slightly reduce
the metadata pressure.

But not for long.
The real solution will be to add more space to this btrfs.


Yes but this is a 128GB SSD and metadata could have been reallocated
from some of the 25GB of free space allocated to data.


This can only happen when:
1) All data chunks are balanced into the most compact layout, freeing the 25G
   Since btrfs stores data and metadata in different chunks, one needs
   to use balance to free space from allocated data/metadata chunks.

   And in your case, you only tried dlimit=1, 2 and 5, which will
   free at most 8 chunks (and at most 8G of space).

   If you want to free all 25G of free space from the data chunks, then use
   no dlimit at all (see the command sketch just after this list).

2) Mixed block groups.
   This is the most straightforward case.
   All data and metadata can be stored in the same chunks, so no
   such problem at all.

   But developers tend to avoid that layout though.
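
(A minimal sketch of that full data balance from 1), assuming the fs is 
mounted at /mnt; substitute the real mountpoint:)

   # rewrite every data chunk; freed chunks return to the unallocated pool
   btrfs balance start -d /mnt

   # or, much cheaper: only rewrite data chunks that are less than half full
   btrfs balance start -dusage=50 /mnt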

Even with a
bigger drive, it is possible that chunks could be allocated to data,
and then later operations requiring more metadata will still run out
(running out of metadata space seems to be a reasonably common
occurrence judging by the number of "why is btrfs reporting no space
when I have space free" questions).


This is true, and that's a long-existing btrfs problem.

Apart from balance and adding more devices, there are no really good ideas 
so far.
Maybe one day we can improve it in the allocation algorithm.


The file system shouldn't be corrupted when that happens.



I'm sorry I went off topic with the GlobalReserve and unbalanced 
data/metadata chunks.


But I don't consider the corruption to be caused by unbalanced 
data/metadata chunks.


So let's go back to the corruption case.

Since you took the image of the corrupted fs, would you please try the 
following commands on the corrupted fs?


$ btrfs-debug-tree -b 67239936 

And, what were the kernel mount options for the fs before the crash?

The kernel messages show that your tree root is corrupted.
This is common after a power loss.

But the problem is, btrfs uses barriers to ensure the superblock is 
written to disk *after* all other metadata has been committed.
Otherwise the superblock is not updated and still points to the old 
metadata, which keeps everything fine.


So, either the barrier is broken, or you specified nobarrier, or the 
power loss directly corrupted the new tree root and magically left the 
csum still matching.


Thanks,
Qu




Re: [PATCH 35/35] block, drivers, fs: rename REQ_FLUSH to REQ_PREFLUSH

2016-01-05 Thread kbuild test robot
Hi Mike,

[auto build test ERROR on next-20160105]
[cannot apply to dm/for-next v4.4-rc8 v4.4-rc7 v4.4-rc6 v4.4-rc8]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improving the system]

url:
https://github.com/0day-ci/linux/commits/mchristi-redhat-com/separate-operations-from-flags-in-the-bio-request-structs/20160106-052858
config: um-x86_64_defconfig (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=um SUBARCH=x86_64

All errors (new ones prefixed by >>):

   arch/um/drivers/ubd_kern.c: In function 'ubd_add':
   arch/um/drivers/ubd_kern.c:869:43: error: macro "blk_queue_flush" passed 2 
arguments, but takes just 1
 blk_queue_flush(ubd_dev->queue, REQ_FLUSH);
  ^
   arch/um/drivers/ubd_kern.c:869:2: error: 'blk_queue_flush' undeclared (first 
use in this function)
 blk_queue_flush(ubd_dev->queue, REQ_FLUSH);
 ^
   arch/um/drivers/ubd_kern.c:869:2: note: each undeclared identifier is 
reported only once for each function it appears in
   arch/um/drivers/ubd_kern.c: In function 'do_ubd_request':
>> arch/um/drivers/ubd_kern.c:1293:24: error: 'REQ_FLUSH' undeclared (first use 
>> in this function)
  if (req->cmd_flags & REQ_FLUSH) {
   ^

vim +/REQ_FLUSH +1293 arch/um/drivers/ubd_kern.c

a0044bdf Jeff Dike  2007-05-06  1287dev->start_sg = 
0;
a0044bdf Jeff Dike  2007-05-06  1288dev->end_sg = 
blk_rq_map_sg(q, req, dev->sg);
a0044bdf Jeff Dike  2007-05-06  1289}
a0044bdf Jeff Dike  2007-05-06  1290  
a0044bdf Jeff Dike  2007-05-06  1291req = dev->request;
805f11a0 Richard Weinberger 2013-08-18  1292  
805f11a0 Richard Weinberger 2013-08-18 @1293if (req->cmd_flags & 
REQ_FLUSH) {
805f11a0 Richard Weinberger 2013-08-18  1294io_req = 
kmalloc(sizeof(struct io_thread_req),
805f11a0 Richard Weinberger 2013-08-18  1295
 GFP_ATOMIC);
805f11a0 Richard Weinberger 2013-08-18  1296if (io_req == 
NULL) {

:: The code at line 1293 was first introduced by commit
:: 805f11a0d515658106bfbfadceff0eb30bd90ad2 um: ubd: Add REQ_FLUSH suppport

:: TO: Richard Weinberger 
:: CC: Richard Weinberger 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [PATCH 34/35] block: add QUEUE_FLAGs for flush and fua

2016-01-05 Thread kbuild test robot
Hi Mike,

[auto build test ERROR on next-20160105]
[cannot apply to dm/for-next v4.4-rc8 v4.4-rc7 v4.4-rc6 v4.4-rc8]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improving the system]

url:
https://github.com/0day-ci/linux/commits/mchristi-redhat-com/separate-operations-from-flags-in-the-bio-request-structs/20160106-052858
config: um-x86_64_defconfig (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=um SUBARCH=x86_64

All errors (new ones prefixed by >>):

   arch/um/drivers/ubd_kern.c: In function 'ubd_add':
>> arch/um/drivers/ubd_kern.c:869:43: error: macro "blk_queue_flush" passed 2 
>> arguments, but takes just 1
 blk_queue_flush(ubd_dev->queue, REQ_FLUSH);
  ^
>> arch/um/drivers/ubd_kern.c:869:2: error: 'blk_queue_flush' undeclared (first 
>> use in this function)
 blk_queue_flush(ubd_dev->queue, REQ_FLUSH);
 ^
   arch/um/drivers/ubd_kern.c:869:2: note: each undeclared identifier is 
reported only once for each function it appears in

vim +/blk_queue_flush +869 arch/um/drivers/ubd_kern.c

62f96cb0 Jeff Dike  2007-02-10  863 ubd_dev->queue = 
blk_init_queue(do_ubd_request, &ubd_dev->lock);
62f96cb0 Jeff Dike  2007-02-10  864 if (ubd_dev->queue == NULL) {
62f96cb0 Jeff Dike  2007-02-10  865 *error_out = "Failed to 
initialize device queue";
80c13749 Jeff Dike  2006-09-29  866 goto out;
62f96cb0 Jeff Dike  2007-02-10  867 }
62f96cb0 Jeff Dike  2007-02-10  868 ubd_dev->queue->queuedata = 
ubd_dev;
805f11a0 Richard Weinberger 2013-08-18 @869 blk_queue_flush(ubd_dev->queue, 
REQ_FLUSH);
62f96cb0 Jeff Dike  2007-02-10  870  
8a78362c Martin K. Petersen 2010-02-26  871 
blk_queue_max_segments(ubd_dev->queue, MAX_SG);
792dd4fc Christoph Hellwig  2009-03-31  872 err = 
ubd_disk_register(UBD_MAJOR, ubd_dev->size, n, &ubd_gendisk[n]);

:: The code at line 869 was first introduced by commit
:: 805f11a0d515658106bfbfadceff0eb30bd90ad2 um: ubd: Add REQ_FLUSH suppport

:: TO: Richard Weinberger 
:: CC: Richard Weinberger 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [PATCH 05/35] fs: have ll_rw_block users pass in op and flags separately

2016-01-05 Thread kbuild test robot
Hi Mike,

[auto build test WARNING on next-20160105]
[cannot apply to dm/for-next v4.4-rc8 v4.4-rc7 v4.4-rc6 v4.4-rc8]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improving the system]

url:
https://github.com/0day-ci/linux/commits/mchristi-redhat-com/separate-operations-from-flags-in-the-bio-request-structs/20160106-052858
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

>> fs/buffer.c:3091: warning: No description found for parameter 'op_flags'
   include/linux/jbd2.h:439: warning: No description found for parameter 
'i_transaction'
   include/linux/jbd2.h:439: warning: No description found for parameter 
'i_next_transaction'
   include/linux/jbd2.h:439: warning: No description found for parameter 
'i_list'
   include/linux/jbd2.h:439: warning: No description found for parameter 
'i_vfs_inode'
   include/linux/jbd2.h:439: warning: No description found for parameter 
'i_flags'
   include/linux/jbd2.h:495: warning: No description found for parameter 
'h_rsv_handle'
   include/linux/jbd2.h:495: warning: No description found for parameter 
'h_reserved'
   include/linux/jbd2.h:495: warning: No description found for parameter 
'h_type'
   include/linux/jbd2.h:495: warning: No description found for parameter 
'h_line_no'
   include/linux/jbd2.h:495: warning: No description found for parameter 
'h_start_jiffies'
   include/linux/jbd2.h:495: warning: No description found for parameter 
'h_requested_credits'
   include/linux/jbd2.h:495: warning: No description found for parameter 
'h_lockdep_map'
   include/linux/jbd2.h:1038: warning: No description found for parameter 
'j_chkpt_bhs[JBD2_NR_BATCH]'
   include/linux/jbd2.h:1038: warning: No description found for parameter 
'j_devname[BDEVNAME_SIZE+24]'
   include/linux/jbd2.h:1038: warning: No description found for parameter 
'j_average_commit_time'
   include/linux/jbd2.h:1038: warning: No description found for parameter 
'j_min_batch_time'
   include/linux/jbd2.h:1038: warning: No description found for parameter 
'j_max_batch_time'
   include/linux/jbd2.h:1038: warning: No description found for parameter 
'j_commit_callback'
   include/linux/jbd2.h:1038: warning: No description found for parameter 
'j_failed_commit'
   include/linux/jbd2.h:1038: warning: No description found for parameter 
'j_chksum_driver'
   include/linux/jbd2.h:1038: warning: No description found for parameter 
'j_csum_seed'
   include/linux/jbd2.h:1038: warning: Excess struct/union/enum/typedef member 
'j_history' description in 'journal_s'
   include/linux/jbd2.h:1038: warning: Excess struct/union/enum/typedef member 
'j_history_max' description in 'journal_s'
   include/linux/jbd2.h:1038: warning: Excess struct/union/enum/typedef member 
'j_history_cur' description in 'journal_s'
   fs/jbd2/transaction.c:429: warning: No description found for parameter 
'rsv_blocks'
   fs/jbd2/transaction.c:429: warning: No description found for parameter 
'gfp_mask'
   fs/jbd2/transaction.c:429: warning: No description found for parameter 'type'
   fs/jbd2/transaction.c:429: warning: No description found for parameter 
'line_no'
   fs/jbd2/transaction.c:505: warning: No description found for parameter 'type'
   fs/jbd2/transaction.c:505: warning: No description found for parameter 
'line_no'
   fs/jbd2/transaction.c:635: warning: No description found for parameter 
'gfp_mask'

vim +/op_flags +3091 fs/buffer.c

^1da177e Linus Torvalds2005-04-16  3075   *
^1da177e Linus Torvalds2005-04-16  3076   * This function drops any buffer 
that it cannot get a lock on (with the
9cb569d6 Christoph Hellwig 2010-08-11  3077   * BH_Lock state bit), any buffer 
that appears to be clean when doing a write
9cb569d6 Christoph Hellwig 2010-08-11  3078   * request, and any buffer that 
appears to be up-to-date when doing read
9cb569d6 Christoph Hellwig 2010-08-11  3079   * request.  Further it marks as 
clean buffers that are processed for
9cb569d6 Christoph Hellwig 2010-08-11  3080   * writing (the buffer cache won't 
assume that they are actually clean
9cb569d6 Christoph Hellwig 2010-08-11  3081   * until the buffer gets unlocked).
^1da177e Linus Torvalds2005-04-16  3082   *
^1da177e Linus Torvalds2005-04-16  3083   * ll_rw_block sets b_end_io to 
simple completion handler that marks
e227867f Masanari Iida 2014-02-18  3084   * the buffer up-to-date (if 
appropriate), unlocks the buffer and wakes
^1da177e Linus Torvalds2005-04-16  3085   * any waiters. 
^1da177e Linus Torvalds2005-04-16  3086   *
^1da177e Linus Torvalds2005-04-16  3087   * All of the buffers must be for 
the

Re: evidence of persistent state, despite device disconnects

2016-01-05 Thread Chris Murphy
On Tue, Jan 5, 2016 at 7:50 AM, Duncan <1i5t5.dun...@cox.net> wrote:

>
> If however you mounted it degraded,rw at some point, then I'd say the bug
> is in wetware, as in that case, based on my understanding, it's working
> as intended.  I was inclined to believe that was what happened based on
> the obviously partial sequence in the earlier post, but if you say you
> didn't... then it's all down to duplication and finding why it's suddenly
> reverting to single mode on non-degraded mounts, which indeed /is/ a bug.

Clearly I will have to retest.

But even as rw,degraded, it doesn't matter; that'd still be a huge
bug. There's no possible way you'll convince me this is a user
misunderstanding. Nowhere is this documented.

I made the fs using mkfs.btrfs -draid1 -mraid1. There is no way the
fs, under any circumstance, legitimately creates and uses any other
profile for any chunk type, ever. Let alone silently.

-- 
Chris Murphy


[PATCH 07/35] btrfs: have submit_one_bio users setup bio bi_op

2016-01-05 Thread mchristi
From: Mike Christie 

This patch has btrfs's submit_one_bio callers set
the bio->bi_op to a REQ_OP and the bi_rw to rq_flag_bits.

The next patches will continue to convert btrfs,
so submit_bio_hook and merge_bio_hook
related code will be modified to take only the bio. I did
not do it in this patch to try and keep it smaller.
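
In short, callers end up doing something like this (an illustrative
sketch of the convention, not a hunk from this patch):

	bio->bi_op = REQ_OP_READ;	/* the operation */
	bio->bi_rw |= read_mode;	/* the rq_flag_bits */
	ret = submit_one_bio(bio, mirror_num, bio_flags);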

Note:
I have run xfs tests on these btrfs patches. There were some failures
with and without the patches. I have not had time to track down why
xfstest fails without the patches.

Signed-off-by: Mike Christie 
---
 fs/btrfs/extent_io.c | 92 +++-
 1 file changed, 47 insertions(+), 45 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7bcc729..b6c281a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2382,7 +2382,7 @@ static int bio_readpage_error(struct bio *failed_bio, u64 
phy_offset,
int read_mode;
int ret;
 
-   BUG_ON(failed_bio->bi_rw & REQ_WRITE);
+   BUG_ON(failed_bio->bi_op == REQ_OP_WRITE);
 
ret = btrfs_get_io_failure_record(inode, start, end, &failrec);
if (ret)
@@ -2408,6 +2408,8 @@ static int bio_readpage_error(struct bio *failed_bio, u64 
phy_offset,
free_io_failure(inode, failrec);
return -EIO;
}
+   bio->bi_op = REQ_OP_READ;
+   bio->bi_rw |= read_mode;
 
pr_debug("Repair Read Error: submitting new read[%#x] to 
this_mirror=%d, in_validation=%d\n",
 read_mode, failrec->this_mirror, failrec->in_validation);
@@ -2719,8 +2721,8 @@ struct bio *btrfs_io_bio_alloc(gfp_t gfp_mask, unsigned 
int nr_iovecs)
 }
 
 
-static int __must_check submit_one_bio(int rw, struct bio *bio,
-  int mirror_num, unsigned long bio_flags)
+static int __must_check submit_one_bio(struct bio *bio, int mirror_num,
+  unsigned long bio_flags)
 {
int ret = 0;
struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
@@ -2731,12 +2733,12 @@ static int __must_check submit_one_bio(int rw, struct 
bio *bio,
start = page_offset(page) + bvec->bv_offset;
 
bio->bi_private = NULL;
-   bio->bi_rw |= rw;
bio_get(bio);
 
if (tree->ops && tree->ops->submit_bio_hook)
-   ret = tree->ops->submit_bio_hook(page->mapping->host, rw, bio,
-  mirror_num, bio_flags, start);
+   ret = tree->ops->submit_bio_hook(page->mapping->host,
+bio->bi_rw, bio, mirror_num,
+bio_flags, start);
else
btrfsic_submit_bio(bio);
 
@@ -2744,20 +2746,20 @@ static int __must_check submit_one_bio(int rw, struct 
bio *bio,
return ret;
 }
 
-static int merge_bio(int rw, struct extent_io_tree *tree, struct page *page,
+static int merge_bio(struct extent_io_tree *tree, struct page *page,
 unsigned long offset, size_t size, struct bio *bio,
 unsigned long bio_flags)
 {
int ret = 0;
if (tree->ops && tree->ops->merge_bio_hook)
-   ret = tree->ops->merge_bio_hook(rw, page, offset, size, bio,
-   bio_flags);
+   ret = tree->ops->merge_bio_hook(bio->bi_op, page, offset, size,
+   bio, bio_flags);
BUG_ON(ret < 0);
return ret;
 
 }
 
-static int submit_extent_page(int rw, struct extent_io_tree *tree,
+static int submit_extent_page(int op, int op_flags, struct extent_io_tree 
*tree,
  struct writeback_control *wbc,
  struct page *page, sector_t sector,
  size_t size, unsigned long offset,
@@ -2785,10 +2787,9 @@ static int submit_extent_page(int rw, struct 
extent_io_tree *tree,
 
if (prev_bio_flags != bio_flags || !contig ||
force_bio_submit ||
-   merge_bio(rw, tree, page, offset, page_size, bio, 
bio_flags) ||
+   merge_bio(tree, page, offset, page_size, bio, bio_flags) ||
bio_add_page(bio, page, page_size, offset) < page_size) {
-   ret = submit_one_bio(rw, bio, mirror_num,
-prev_bio_flags);
+   ret = submit_one_bio(bio, mirror_num, prev_bio_flags);
if (ret < 0) {
*bio_ret = NULL;
return ret;
@@ -2809,6 +2810,8 @@ static int submit_extent_page(int rw, struct 
extent_io_tree *tree,
bio_add_page(bio, page, page_size, offset);
bio->bi_end_io = end_io_func;
bio->bi_private = tree;
+   bio->bi_op = op;
+   bio->bi_rw |= op_flags;
if (wbc) {
wbc_init_bio(wbc, bio);
   

[PATCH 06/35] direct-io: set bi_op to REQ_OP

2016-01-05 Thread mchristi
From: Mike Christie 

This patch has the dio code set the bio bi_op to a REQ_OP.

It also begins to convert btrfs's dio_submit_t related code,
because of the submit_io callout use. In the btrfs_submit_direct
change, I OR'd the op and flag back together. It is only temporary.
The next patch will completely convert all the btrfs code paths.

Signed-off-by: Mike Christie 
---
 fs/btrfs/inode.c   |  9 +
 fs/direct-io.c | 35 +--
 include/linux/fs.h |  2 +-
 3 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 00f27eb..06f88bf 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8300,14 +8300,14 @@ out_err:
return 0;
 }
 
-static void btrfs_submit_direct(int rw, struct bio *dio_bio,
-   struct inode *inode, loff_t file_offset)
+static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
+   loff_t file_offset)
 {
struct btrfs_dio_private *dip = NULL;
struct bio *io_bio = NULL;
struct btrfs_io_bio *btrfs_bio;
int skip_sum;
-   int write = rw & REQ_WRITE;
+   bool write = (dio_bio->bi_op == REQ_OP_WRITE);
int ret = 0;
 
skip_sum = BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM;
@@ -8358,7 +8358,8 @@ static void btrfs_submit_direct(int rw, struct bio 
*dio_bio,
dio_data->unsubmitted_oe_range_end;
}
 
-   ret = btrfs_submit_direct_hook(rw, dip, skip_sum);
+   ret = btrfs_submit_direct_hook(dio_bio->bi_op | dio_bio->bi_rw, dip,
+  skip_sum);
if (!ret)
return;
 
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 66b1d3eb..aa12742 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -108,7 +108,8 @@ struct dio_submit {
 /* dio_state communicated between submission path and end_io */
 struct dio {
int flags;  /* doesn't change */
-   int rw;
+   int op;
+   int op_flags;
blk_qc_t bio_cookie;
struct block_device *bio_bdev;
struct inode *inode;
@@ -163,7 +164,7 @@ static inline int dio_refill_pages(struct dio *dio, struct 
dio_submit *sdio)
ret = iov_iter_get_pages(sdio->iter, dio->pages, LONG_MAX, DIO_PAGES,
&sdio->from);
 
-   if (ret < 0 && sdio->blocks_available && (dio->rw & WRITE)) {
+   if (ret < 0 && sdio->blocks_available && (dio->op == REQ_OP_WRITE)) {
struct page *page = ZERO_PAGE(0);
/*
 * A memory fault, but the filesystem has some outstanding
@@ -242,7 +243,8 @@ static ssize_t dio_complete(struct dio *dio, loff_t offset, 
ssize_t ret,
transferred = dio->result;
 
/* Check for short read case */
-   if ((dio->rw == READ) && ((offset + transferred) > dio->i_size))
+   if ((dio->op == REQ_OP_READ) &&
+   ((offset + transferred) > dio->i_size))
transferred = dio->i_size - offset;
}
 
@@ -260,7 +262,7 @@ static ssize_t dio_complete(struct dio *dio, loff_t offset, 
ssize_t ret,
inode_dio_end(dio->inode);
 
if (is_async) {
-   if (dio->rw & WRITE) {
+   if (dio->op == REQ_OP_WRITE) {
int err;
 
err = generic_write_sync(dio->iocb->ki_filp, offset,
@@ -369,7 +371,8 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 
bio->bi_bdev = bdev;
bio->bi_iter.bi_sector = first_sector;
-   bio->bi_rw |= dio->rw;
+   bio->bi_op = dio->op;
+   bio->bi_rw |= dio->op_flags;
if (dio->is_async)
bio->bi_end_io = dio_bio_end_aio;
else
@@ -397,14 +400,13 @@ static inline void dio_bio_submit(struct dio *dio, struct 
dio_submit *sdio)
dio->refcount++;
spin_unlock_irqrestore(&dio->bio_lock, flags);
 
-   if (dio->is_async && dio->rw == READ && dio->should_dirty)
+   if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty)
bio_set_pages_dirty(bio);
 
dio->bio_bdev = bio->bi_bdev;
 
if (sdio->submit_io) {
-   sdio->submit_io(dio->rw, bio, dio->inode,
-  sdio->logical_offset_in_bio);
+   sdio->submit_io(bio, dio->inode, sdio->logical_offset_in_bio);
dio->bio_cookie = BLK_QC_T_NONE;
} else
dio->bio_cookie = submit_bio(bio);
@@ -472,14 +474,14 @@ static int dio_bio_complete(struct dio *dio, struct bio 
*bio)
if (bio->bi_error)
dio->io_error = -EIO;
 
-   if (dio->is_async && dio->rw == READ && dio->should_dirty) {
+   if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) {
bio_check_pages_dirty(bio); /* transfers ownership */
err = bio->bi_error;
} else {

[PATCH 01/35] block/fs/drivers: remove rw argument from submit_bio

2016-01-05 Thread mchristi
From: Mike Christie 

This has callers of submit_bio/submit_bio_wait set the bio->bi_rw
instead of passing it in. This makes its use the same as
generic_make_request's, and matches how we set the other bio fields.
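
For example (an illustrative before/after of the convention, not a hunk
from the diff):

	/* before */
	submit_bio(WRITE | REQ_SYNC, bio);

	/* after: the caller sets bi_rw, then submits */
	bio->bi_rw |= WRITE | REQ_SYNC;
	submit_bio(bio);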

Signed-off-by: Mike Christie 
---
 block/bio.c |  7 +++
 block/blk-core.c| 11 ---
 block/blk-flush.c   |  3 ++-
 block/blk-lib.c |  9 ++---
 drivers/block/drbd/drbd_actlog.c|  2 +-
 drivers/block/drbd/drbd_bitmap.c|  4 ++--
 drivers/block/floppy.c  |  3 ++-
 drivers/block/xen-blkback/blkback.c |  4 +++-
 drivers/block/xen-blkfront.c|  4 ++--
 drivers/md/bcache/debug.c   |  6 --
 drivers/md/bcache/journal.c |  2 +-
 drivers/md/bcache/super.c   |  4 ++--
 drivers/md/dm-bufio.c   |  3 ++-
 drivers/md/dm-io.c  |  3 ++-
 drivers/md/dm-log-writes.c  |  9 ++---
 drivers/md/dm-thin.c|  3 ++-
 drivers/md/md.c | 10 +++---
 drivers/md/raid1.c  |  3 ++-
 drivers/md/raid10.c |  4 +++-
 drivers/md/raid5-cache.c|  7 ---
 drivers/target/target_core_iblock.c | 24 +---
 fs/btrfs/check-integrity.c  | 18 ++
 fs/btrfs/check-integrity.h  |  4 ++--
 fs/btrfs/disk-io.c  |  3 ++-
 fs/btrfs/extent_io.c|  7 ---
 fs/btrfs/raid56.c   | 16 +++-
 fs/btrfs/scrub.c| 16 +++-
 fs/btrfs/volumes.c  | 14 +++---
 fs/buffer.c |  3 ++-
 fs/direct-io.c  |  3 ++-
 fs/ext4/crypto.c|  3 ++-
 fs/ext4/page-io.c   |  3 ++-
 fs/ext4/readpage.c  |  9 +
 fs/f2fs/data.c  | 13 -
 fs/f2fs/segment.c   |  6 --
 fs/gfs2/lops.c  |  3 ++-
 fs/gfs2/meta_io.c   |  3 ++-
 fs/gfs2/ops_fstype.c|  3 ++-
 fs/hfsplus/wrapper.c|  3 ++-
 fs/jfs/jfs_logmgr.c |  6 --
 fs/jfs/jfs_metapage.c   | 10 ++
 fs/logfs/dev_bdev.c | 15 ++-
 fs/mpage.c  |  3 ++-
 fs/nfs/blocklayout/blocklayout.c| 22 --
 fs/nilfs2/segbuf.c  |  3 ++-
 fs/ocfs2/cluster/heartbeat.c| 12 +++-
 fs/xfs/xfs_aops.c   |  3 ++-
 fs/xfs/xfs_buf.c|  4 ++--
 include/linux/bio.h |  2 +-
 include/linux/fs.h  |  2 +-
 kernel/power/swap.c |  5 +++--
 mm/page_io.c| 10 ++
 52 files changed, 211 insertions(+), 141 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index dbabd48..921112b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -859,21 +859,20 @@ static void submit_bio_wait_endio(struct bio *bio)
 
 /**
  * submit_bio_wait - submit a bio, and wait until it completes
- * @rw: whether to %READ or %WRITE, or maybe to %READA (read ahead)
  * @bio: The &struct bio which describes the I/O
  *
  * Simple wrapper around submit_bio(). Returns 0 on success, or the error from
  * bio_endio() on failure.
  */
-int submit_bio_wait(int rw, struct bio *bio)
+int submit_bio_wait(struct bio *bio)
 {
struct submit_bio_ret ret;
 
-   rw |= REQ_SYNC;
init_completion(&ret.event);
bio->bi_private = &ret;
bio->bi_end_io = submit_bio_wait_endio;
-   submit_bio(rw, bio);
+   bio->bi_rw |= REQ_SYNC;
+   submit_bio(bio);
wait_for_completion(&ret.event);
 
return ret.error;
diff --git a/block/blk-core.c b/block/blk-core.c
index ab51685..9b887e3 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2092,7 +2092,6 @@ EXPORT_SYMBOL(generic_make_request);
 
 /**
  * submit_bio - submit a bio to the block device layer for I/O
- * @rw: whether to %READ or %WRITE, or maybe to %READA (read ahead)
  * @bio: The &struct bio which describes the I/O
  *
  * submit_bio() is very similar in purpose to generic_make_request(), and
@@ -2100,10 +2099,8 @@ EXPORT_SYMBOL(generic_make_request);
  * interfaces; @bio must be presetup and ready for I/O.
  *
  */
-blk_qc_t submit_bio(int rw, struct bio *bio)
+blk_qc_t submit_bio(struct bio *bio)
 {
-   bio->bi_rw |= rw;
-
/*
 * If it's a regular read/write or a barrier with data attached,
 * go through the normal accounting stuff before submission.
@@ -2111,12 +2108,12 @@ blk_qc_t submit_bio(int rw, struct bio *bio)
if (bio_has_data(bio)) {
unsigned int count;
 
-   if (unlikely(rw & REQ_WRITE_SAME))
+   if (unlikely(bio->bi_rw & REQ_WRITE_SAME))
count = bdev_logical_block_size(bio->bi_bdev) >> 9;
else
 

[PATCH 04/35] fs: have submit_bh users pass in op and flags separately

2016-01-05 Thread mchristi
From: Mike Christie 

This has submit_bh users pass in the operation and flags separately,
so we can set up the bio->bi_op and bio->bi_rw flags.
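
For example (an illustrative before/after, mirroring the bitmap.c hunk
below):

	/* before */
	submit_bh(WRITE | REQ_SYNC, bh);

	/* after: the operation and the flags are now separate arguments */
	submit_bh(REQ_OP_WRITE, REQ_SYNC, bh);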

Signed-off-by: Mike Christie 
---
 drivers/md/bitmap.c |  4 ++--
 fs/btrfs/check-integrity.c  | 24 ++--
 fs/btrfs/check-integrity.h  |  2 +-
 fs/btrfs/disk-io.c  |  4 ++--
 fs/buffer.c | 54 +++--
 fs/ext4/balloc.c|  2 +-
 fs/ext4/ialloc.c|  2 +-
 fs/ext4/inode.c |  2 +-
 fs/ext4/mmp.c   |  4 ++--
 fs/fat/misc.c   |  2 +-
 fs/gfs2/bmap.c  |  2 +-
 fs/gfs2/dir.c   |  2 +-
 fs/gfs2/meta_io.c   |  6 ++---
 fs/jbd2/commit.c|  6 ++---
 fs/jbd2/journal.c   |  8 +++
 fs/nilfs2/btnode.c  |  6 ++---
 fs/nilfs2/btnode.h  |  2 +-
 fs/nilfs2/btree.c   |  6 +++--
 fs/nilfs2/gcinode.c |  5 +++--
 fs/nilfs2/mdt.c | 11 -
 fs/ntfs/aops.c  |  6 ++---
 fs/ntfs/compress.c  |  2 +-
 fs/ntfs/file.c  |  2 +-
 fs/ntfs/logfile.c   |  2 +-
 fs/ntfs/mft.c   |  4 ++--
 fs/ocfs2/buffer_head_io.c   |  8 +++
 fs/reiserfs/inode.c |  4 ++--
 fs/reiserfs/journal.c   |  6 ++---
 fs/ufs/util.c   |  2 +-
 include/linux/buffer_head.h |  9 
 30 files changed, 103 insertions(+), 96 deletions(-)

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 4f22e91..13811fc 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -299,7 +299,7 @@ static void write_page(struct bitmap *bitmap, struct page 
*page, int wait)
atomic_inc(&bitmap->pending_writes);
set_buffer_locked(bh);
set_buffer_mapped(bh);
-   submit_bh(WRITE | REQ_SYNC, bh);
+   submit_bh(REQ_OP_WRITE, REQ_SYNC, bh);
bh = bh->b_this_page;
}
 
@@ -394,7 +394,7 @@ static int read_page(struct file *file, unsigned long index,
atomic_inc(&bitmap->pending_writes);
set_buffer_locked(bh);
set_buffer_mapped(bh);
-   submit_bh(READ, bh);
+   submit_bh(REQ_OP_READ, 0, bh);
}
block++;
bh = bh->b_this_page;
diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 7717043..e3fd86b 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2898,12 +2898,12 @@ static struct btrfsic_dev_state 
*btrfsic_dev_state_lookup(
return ds;
 }
 
-int btrfsic_submit_bh(int rw, struct buffer_head *bh)
+int btrfsic_submit_bh(int op, int op_flags, struct buffer_head *bh)
 {
struct btrfsic_dev_state *dev_state;
 
if (!btrfsic_is_initialized)
-   return submit_bh(rw, bh);
+   return submit_bh(op, op_flags, bh);
 
mutex_lock(&btrfsic_mutex);
/* since btrfsic_submit_bh() might also be called before
@@ -2912,26 +2912,26 @@ int btrfsic_submit_bh(int rw, struct buffer_head *bh)
 
/* Only called to write the superblock (incl. FLUSH/FUA) */
if (NULL != dev_state &&
-   (rw & WRITE) && bh->b_size > 0) {
+   (op == REQ_OP_WRITE) && bh->b_size > 0) {
u64 dev_bytenr;
 
dev_bytenr = 4096 * bh->b_blocknr;
if (dev_state->state->print_mask &
BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
printk(KERN_INFO
-  "submit_bh(rw=0x%x, blocknr=%llu (bytenr %llu),"
-  " size=%zu, data=%p, bdev=%p)\n",
-  rw, (unsigned long long)bh->b_blocknr,
+  "submit_bh(op=0x%x,0x%x, blocknr=%llu "
+  "(bytenr %llu), size=%zu, data=%p, bdev=%p)\n",
+  op, op_flags, (unsigned long long)bh->b_blocknr,
   dev_bytenr, bh->b_size, bh->b_data, bh->b_bdev);
btrfsic_process_written_block(dev_state, dev_bytenr,
  &bh->b_data, 1, NULL,
- NULL, bh, rw);
-   } else if (NULL != dev_state && (rw & REQ_FLUSH)) {
+ NULL, bh, op_flags);
+   } else if (NULL != dev_state && (op_flags & REQ_FLUSH)) {
if (dev_state->state->print_mask &
BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
printk(KERN_INFO
-  "submit_bh(rw=0x%x FLUSH, bdev=%p)\n",
-  rw, bh->b_bdev);
+  "submit_bh(op=0x%x,0x%x FLUSH, bdev=%p)\n",
+  op, op_flags, bh->b_bdev);
if (!dev_state->dummy_block_

[PATCH 05/35] fs: have ll_rw_block users pass in op and flags separately

2016-01-05 Thread mchristi
From: Mike Christie 

This has ll_rw_block users pass in the operation and flags separately,
so we can set up the bio->bi_op and bio->bi_rw flags.

Signed-off-by: Mike Christie 
---
 fs/buffer.c | 19 ++-
 fs/ext4/inode.c |  6 +++---
 fs/ext4/namei.c |  2 +-
 fs/ext4/super.c |  2 +-
 fs/gfs2/bmap.c  |  2 +-
 fs/gfs2/meta_io.c   |  4 ++--
 fs/gfs2/quota.c |  2 +-
 fs/isofs/compress.c |  2 +-
 fs/jbd2/journal.c   |  2 +-
 fs/jbd2/recovery.c  |  4 ++--
 fs/ocfs2/aops.c |  2 +-
 fs/ocfs2/super.c|  2 +-
 fs/reiserfs/journal.c   |  8 
 fs/reiserfs/stree.c |  4 ++--
 fs/reiserfs/super.c |  2 +-
 fs/squashfs/block.c |  4 ++--
 fs/udf/dir.c|  2 +-
 fs/udf/directory.c  |  2 +-
 fs/udf/inode.c  |  2 +-
 fs/ufs/balloc.c |  2 +-
 include/linux/buffer_head.h |  2 +-
 21 files changed, 39 insertions(+), 38 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 0843964..1a14bf2 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -595,7 +595,7 @@ void write_boundary_block(struct block_device *bdev,
struct buffer_head *bh = __find_get_block(bdev, bblock + 1, blocksize);
if (bh) {
if (buffer_dirty(bh))
-   ll_rw_block(WRITE, 1, &bh);
+   ll_rw_block(REQ_OP_WRITE, 0, 1, &bh);
put_bh(bh);
}
 }
@@ -1406,7 +1406,7 @@ void __breadahead(struct block_device *bdev, sector_t 
block, unsigned size)
 {
struct buffer_head *bh = __getblk(bdev, block, size);
if (likely(bh)) {
-   ll_rw_block(READA, 1, &bh);
+   ll_rw_block(REQ_OP_READ, READA, 1, &bh);
brelse(bh);
}
 }
@@ -1966,7 +1966,7 @@ int __block_write_begin(struct page *page, loff_t pos, 
unsigned len,
if (!buffer_uptodate(bh) && !buffer_delay(bh) &&
!buffer_unwritten(bh) &&
 (block_start < from || block_end > to)) {
-   ll_rw_block(READ, 1, &bh);
+   ll_rw_block(REQ_OP_READ, 0, 1, &bh);
*wait_bh++=bh;
}
}
@@ -2863,7 +2863,7 @@ int block_truncate_page(struct address_space *mapping,
 
if (!buffer_uptodate(bh) && !buffer_delay(bh) && !buffer_unwritten(bh)) 
{
err = -EIO;
-   ll_rw_block(READ, 1, &bh);
+   ll_rw_block(REQ_OP_READ, 0, 1, &bh);
wait_on_buffer(bh);
/* Uhhuh. Read error. Complain and punt. */
if (!buffer_uptodate(bh))
@@ -3063,7 +3063,8 @@ EXPORT_SYMBOL(submit_bh);
 
 /**
  * ll_rw_block: low-level access to block devices (DEPRECATED)
- * @rw: whether to %READ or %WRITE or maybe %READA (readahead)
+ * @op: whether to %READ or %WRITE
+ * op_flags: rq_flag_bits or %READA (readahead)
  * @nr: number of &struct buffer_heads in the array
  * @bhs: array of pointers to &struct buffer_head
  *
@@ -3086,7 +3087,7 @@ EXPORT_SYMBOL(submit_bh);
  * All of the buffers must be for the same device, and must also be a
  * multiple of the current approved size for the device.
  */
-void ll_rw_block(int rw, int nr, struct buffer_head *bhs[])
+void ll_rw_block(int op, int op_flags,  int nr, struct buffer_head *bhs[])
 {
int i;
 
@@ -3095,18 +3096,18 @@ void ll_rw_block(int rw, int nr, struct buffer_head 
*bhs[])
 
if (!trylock_buffer(bh))
continue;
-   if (rw == WRITE) {
+   if (op == WRITE) {
if (test_clear_buffer_dirty(bh)) {
bh->b_end_io = end_buffer_write_sync;
get_bh(bh);
-   submit_bh(rw, 0, bh);
+   submit_bh(op, op_flags, bh);
continue;
}
} else {
if (!buffer_uptodate(bh)) {
bh->b_end_io = end_buffer_read_sync;
get_bh(bh);
-   submit_bh(rw, 0, bh);
+   submit_bh(op, op_flags, bh);
continue;
}
}
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 4fc178a..26a07cb 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -810,7 +810,7 @@ struct buffer_head *ext4_bread(handle_t *handle, struct 
inode *inode,
return bh;
if (!bh || buffer_uptodate(bh))
return bh;
-   ll_rw_block(READ | REQ_META | REQ_PRIO, 1, &bh);
+   ll_rw_block(REQ_OP_READ, REQ_META | REQ_PRIO, 1, &bh);
wait_on_buffer(bh);
if (buffer_uptodate(bh))
return bh;
@@ -964,7 +964,7 @@ static int ext4_block_write_begin(struct page *page, lof

[PATCH 08/35] btrfs: set bi_op to REQ_OP

2016-01-05 Thread mchristi
From: Mike Christie 

This patch has btrfs set the bio bi_op to a REQ_OP, and rq_flag_bits
to bi_rw.

Signed-off-by: Mike Christie 
---
 fs/btrfs/check-integrity.c | 19 +--
 fs/btrfs/compression.c |  4 
 fs/btrfs/disk-io.c |  7 ---
 fs/btrfs/inode.c   | 20 +---
 fs/btrfs/raid56.c  | 10 +-
 fs/btrfs/scrub.c   |  9 +
 fs/btrfs/volumes.c | 18 +-
 7 files changed, 49 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index e3fd86b..e409d1f 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -1683,7 +1683,7 @@ static int btrfsic_read_block(struct btrfsic_state *state,
}
bio->bi_bdev = block_ctx->dev->bdev;
bio->bi_iter.bi_sector = dev_bytenr >> 9;
-   bio->bi_rw |= READ;
+   bio->bi_op = REQ_OP_READ;
 
for (j = i; j < num_pages; j++) {
ret = bio_add_page(bio, block_ctx->pagev[j],
@@ -2964,7 +2964,6 @@ int btrfsic_submit_bh(int op, int op_flags, struct 
buffer_head *bh)
 static void __btrfsic_submit_bio(struct bio *bio)
 {
struct btrfsic_dev_state *dev_state;
-   int rw = bio->bi_rw;
 
if (!btrfsic_is_initialized)
return;
@@ -2974,7 +2973,7 @@ static void __btrfsic_submit_bio(struct bio *bio)
 * btrfsic_mount(), this might return NULL */
dev_state = btrfsic_dev_state_lookup(bio->bi_bdev);
if (NULL != dev_state &&
-   (rw & WRITE) && NULL != bio->bi_io_vec) {
+   (bio->bi_op == REQ_OP_WRITE) && NULL != bio->bi_io_vec) {
unsigned int i;
u64 dev_bytenr;
u64 cur_bytenr;
@@ -2986,9 +2985,9 @@ static void __btrfsic_submit_bio(struct bio *bio)
if (dev_state->state->print_mask &
BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
printk(KERN_INFO
-  "submit_bio(rw=0x%x, bi_vcnt=%u,"
+  "submit_bio(rw=%d,0x%lx, bi_vcnt=%u,"
   " bi_sector=%llu (bytenr %llu), bi_bdev=%p)\n",
-  rw, bio->bi_vcnt,
+  bio->bi_op, bio->bi_rw, bio->bi_vcnt,
   (unsigned long long)bio->bi_iter.bi_sector,
   dev_bytenr, bio->bi_bdev);
 
@@ -3019,18 +3018,18 @@ static void __btrfsic_submit_bio(struct bio *bio)
btrfsic_process_written_block(dev_state, dev_bytenr,
  mapped_datav, bio->bi_vcnt,
  bio, &bio_is_patched,
- NULL, rw);
+ NULL, bio->bi_rw);
while (i > 0) {
i--;
kunmap(bio->bi_io_vec[i].bv_page);
}
kfree(mapped_datav);
-   } else if (NULL != dev_state && (rw & REQ_FLUSH)) {
+   } else if (NULL != dev_state && (bio->bi_rw & REQ_FLUSH)) {
if (dev_state->state->print_mask &
BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
printk(KERN_INFO
-  "submit_bio(rw=0x%x FLUSH, bdev=%p)\n",
-  rw, bio->bi_bdev);
+  "submit_bio(rw=%d,0x%lx FLUSH, bdev=%p)\n",
+  bio->bi_op, bio->bi_rw, bio->bi_bdev);
if (!dev_state->dummy_block_for_bio_bh_flush.is_iodone) {
if ((dev_state->state->print_mask &
 (BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH |
@@ -3048,7 +3047,7 @@ static void __btrfsic_submit_bio(struct bio *bio)
block->never_written = 0;
block->iodone_w_error = 0;
block->flush_gen = dev_state->last_flush_gen + 1;
-   block->submit_bio_bh_rw = rw;
+   block->submit_bio_bh_rw = bio->bi_rw;
block->orig_bio_bh_private = bio->bi_private;
block->orig_bio_bh_end_io.bio = bio->bi_end_io;
block->next_in_same_bio = NULL;
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index c473c42..25bf179 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -363,6 +363,7 @@ int btrfs_submit_compressed_write(struct inode *inode, u64 
start,
kfree(cb);
return -ENOMEM;
}
+   bio->bi_op = REQ_OP_WRITE;
bio->bi_private = cb;
bio->bi_end_io = end_compressed_bio_write;
atomic_inc(&cb->pending_bios);
@@ -408,6 +409,7 @@ int btrfs_submit_compressed_write(struct inode *inode, u64 
start,
 
bio = compressed_bio_alloc(bdev, first_byte, GFP_NO

[PATCH 10/35] btrfs: don't pass rq_flag_bits if there is a bio

2016-01-05 Thread mchristi
From: Mike Christie 

The bio bi_op and bi_rw are now set up, so there is no need
to pass the rq_flag_bits around separately.
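
Distilled, the call sites below change shape like this (a sketch restating
the compression.c hunks, not new API):

    /* before: direction passed alongside a bio that already records it */
    ret = btrfs_map_bio(root, WRITE, bio, 0, 1);

    /* after: btrfs_map_bio() takes the op and flags from the bio itself */
    bio->bi_op = REQ_OP_WRITE;
    ret = btrfs_map_bio(root, bio, 0, 1);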

Signed-off-by: Mike Christie 
---
 fs/btrfs/compression.c |  9 -
 fs/btrfs/disk-io.c | 30 --
 fs/btrfs/disk-io.h |  2 +-
 fs/btrfs/extent_io.c   | 16 +++-
 fs/btrfs/extent_io.h   |  6 +++---
 fs/btrfs/inode.c   | 40 ++--
 fs/btrfs/volumes.c |  6 +++---
 fs/btrfs/volumes.h |  2 +-
 8 files changed, 49 insertions(+), 62 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 25bf179..3112cc3 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -402,7 +402,7 @@ int btrfs_submit_compressed_write(struct inode *inode, u64 
start,
BUG_ON(ret); /* -ENOMEM */
}
 
-   ret = btrfs_map_bio(root, WRITE, bio, 0, 1);
+   ret = btrfs_map_bio(root, bio, 0, 1);
BUG_ON(ret); /* -ENOMEM */
 
bio_put(bio);
@@ -433,7 +433,7 @@ int btrfs_submit_compressed_write(struct inode *inode, u64 
start,
BUG_ON(ret); /* -ENOMEM */
}
 
-   ret = btrfs_map_bio(root, WRITE, bio, 0, 1);
+   ret = btrfs_map_bio(root, bio, 0, 1);
BUG_ON(ret); /* -ENOMEM */
 
bio_put(bio);
@@ -694,8 +694,7 @@ int btrfs_submit_compressed_read(struct inode *inode, 
struct bio *bio,
sums += DIV_ROUND_UP(comp_bio->bi_iter.bi_size,
 root->sectorsize);
 
-   ret = btrfs_map_bio(root, READ, comp_bio,
-   mirror_num, 0);
+   ret = btrfs_map_bio(root, comp_bio, mirror_num, 0);
if (ret) {
bio->bi_error = ret;
bio_endio(comp_bio);
@@ -725,7 +724,7 @@ int btrfs_submit_compressed_read(struct inode *inode, 
struct bio *bio,
BUG_ON(ret); /* -ENOMEM */
}
 
-   ret = btrfs_map_bio(root, READ, comp_bio, mirror_num, 0);
+   ret = btrfs_map_bio(root, comp_bio, mirror_num, 0);
if (ret) {
bio->bi_error = ret;
bio_endio(comp_bio);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index cd152e2..d344231 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -119,7 +119,6 @@ struct async_submit_bio {
struct list_head list;
extent_submit_bio_hook_t *submit_bio_start;
extent_submit_bio_hook_t *submit_bio_done;
-   int rw;
int mirror_num;
unsigned long bio_flags;
/*
@@ -783,7 +782,7 @@ static void run_one_async_start(struct btrfs_work *work)
int ret;
 
async = container_of(work, struct  async_submit_bio, work);
-   ret = async->submit_bio_start(async->inode, async->rw, async->bio,
+   ret = async->submit_bio_start(async->inode, async->bio,
  async->mirror_num, async->bio_flags,
  async->bio_offset);
if (ret)
@@ -816,9 +815,8 @@ static void run_one_async_done(struct btrfs_work *work)
return;
}
 
-   async->submit_bio_done(async->inode, async->rw, async->bio,
-  async->mirror_num, async->bio_flags,
-  async->bio_offset);
+   async->submit_bio_done(async->inode, async->bio, async->mirror_num,
+  async->bio_flags, async->bio_offset);
 }
 
 static void run_one_async_free(struct btrfs_work *work)
@@ -830,7 +828,7 @@ static void run_one_async_free(struct btrfs_work *work)
 }
 
 int btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct inode *inode,
-   int rw, struct bio *bio, int mirror_num,
+   struct bio *bio, int mirror_num,
unsigned long bio_flags,
u64 bio_offset,
extent_submit_bio_hook_t *submit_bio_start,
@@ -843,7 +841,6 @@ int btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, 
struct inode *inode,
return -ENOMEM;
 
async->inode = inode;
-   async->rw = rw;
async->bio = bio;
async->mirror_num = mirror_num;
async->submit_bio_start = submit_bio_start;
@@ -889,9 +886,8 @@ static int btree_csum_one_bio(struct bio *bio)
return ret;
 }
 
-static int __btree_submit_bio_start(struct inode *inode, int rw,
-   struct bio *bio, int mirror_num,
-   unsigned long bio_flags,
+static int __btree_submit_bio_start(struct inode *inode, struct bio *bio,
+   int mirror_num, unsigned long bio_flags,
u64 bio_offset)
 {
/*
@@ -901,7 +897,7 @@ static int __btree_submit_bio_st

[PATCH 03/35] block, fs, mm, drivers: set bi_op to REQ_OP

2016-01-05 Thread mchristi
From: Mike Christie 

This patch converts the simple bi_rw use cases in the block,
drivers, mm and fs code to use bi_op for a REQ_OP and bi_rw
for rq_flag_bits.

These should be simple one-liner cases, so I just did them
in one patch. The next patches handle the more complicated
cases, one module per patch.
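
The typical one-liner, as in the bio.c hunks below (sketch):

    /* before: OR the legacy flag into the bitmap */
    if (iter->type & WRITE)
        bio->bi_rw |= REQ_WRITE;

    /* after: assign the matching operation */
    if (iter->type & WRITE)
        bio->bi_op = REQ_OP_WRITE;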

Signed-off-by: Mike Christie 
---
 block/bio.c  |  8 +---
 block/blk-flush.c|  1 +
 block/blk-lib.c  |  7 ---
 block/blk-map.c  |  2 +-
 drivers/block/floppy.c   |  2 +-
 drivers/block/pktcdvd.c  |  4 ++--
 drivers/lightnvm/rrpc.c  |  4 ++--
 drivers/scsi/osd/osd_initiator.c |  8 
 fs/exofs/ore.c   |  2 +-
 fs/ext4/crypto.c |  2 +-
 fs/ext4/page-io.c|  7 ---
 fs/ext4/readpage.c   |  2 +-
 fs/jfs/jfs_logmgr.c  |  2 ++
 fs/jfs/jfs_metapage.c|  4 ++--
 fs/logfs/dev_bdev.c  | 12 ++--
 fs/nfs/blocklayout/blocklayout.c |  2 +-
 mm/page_io.c |  4 ++--
 17 files changed, 40 insertions(+), 33 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 921112b..3b8e970 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -581,6 +581,7 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
 */
bio->bi_bdev = bio_src->bi_bdev;
bio_set_flag(bio, BIO_CLONED);
+   bio->bi_op = bio_src->bi_op;
bio->bi_rw = bio_src->bi_rw;
bio->bi_iter = bio_src->bi_iter;
bio->bi_io_vec = bio_src->bi_io_vec;
@@ -663,6 +664,7 @@ struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t 
gfp_mask,
return NULL;
 
bio->bi_bdev= bio_src->bi_bdev;
+   bio->bi_op  = bio_src->bi_op;
bio->bi_rw  = bio_src->bi_rw;
bio->bi_iter.bi_sector  = bio_src->bi_iter.bi_sector;
bio->bi_iter.bi_size= bio_src->bi_iter.bi_size;
@@ -1168,7 +1170,7 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
goto out_bmd;
 
if (iter->type & WRITE)
-   bio->bi_rw |= REQ_WRITE;
+   bio->bi_op = REQ_OP_WRITE;
 
ret = 0;
 
@@ -1338,7 +1340,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
 * set data direction, and check if mapped pages need bouncing
 */
if (iter->type & WRITE)
-   bio->bi_rw |= REQ_WRITE;
+   bio->bi_op = REQ_OP_WRITE;
 
bio_set_flag(bio, BIO_USER_MAPPED);
 
@@ -1531,7 +1533,7 @@ struct bio *bio_copy_kern(struct request_queue *q, void 
*data, unsigned int len,
bio->bi_private = data;
} else {
bio->bi_end_io = bio_copy_kern_endio;
-   bio->bi_rw |= REQ_WRITE;
+   bio->bi_op = REQ_OP_WRITE;
}
 
return bio;
diff --git a/block/blk-flush.c b/block/blk-flush.c
index e092e13..386f57a 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -484,6 +484,7 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t 
gfp_mask,
 
bio = bio_alloc(gfp_mask, 0);
bio->bi_bdev = bdev;
+   bio->bi_op = REQ_OP_WRITE;
bio->bi_rw |= WRITE_FLUSH;
 
ret = submit_bio_wait(bio);
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 5292e30..5c55817 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -42,7 +42,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
 {
DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q = bdev_get_queue(bdev);
-   int type = REQ_WRITE | REQ_DISCARD;
+   int type = 0;
unsigned int granularity;
int alignment;
struct bio_batch bb;
@@ -102,6 +102,7 @@ int blkdev_issue_discard(struct block_device *bdev, 
sector_t sector,
bio->bi_end_io = bio_batch_end_io;
bio->bi_bdev = bdev;
bio->bi_private = &bb;
+   bio->bi_op = REQ_OP_DISCARD;
bio->bi_rw |= type;
 
bio->bi_iter.bi_size = req_sects << 9;
@@ -178,7 +179,7 @@ int blkdev_issue_write_same(struct block_device *bdev, 
sector_t sector,
bio->bi_io_vec->bv_page = page;
bio->bi_io_vec->bv_offset = 0;
bio->bi_io_vec->bv_len = bdev_logical_block_size(bdev);
-   bio->bi_rw |= REQ_WRITE | REQ_WRITE_SAME;
+   bio->bi_op = REQ_OP_WRITE_SAME;
 
if (nr_sects > max_write_same_sectors) {
bio->bi_iter.bi_size = max_write_same_sectors << 9;
@@ -240,7 +241,7 @@ static int __blkdev_issue_zeroout(struct block_device 
*bdev, sector_t sector,
bio->bi_bdev   = bdev;
bio->bi_end_io = bio_batch_end_io;
bio->bi_private = &bb;
-   bio->bi_rw |= WRITE;
+   bio->bi_op = REQ_OP_WRITE;
 
while (nr_sects != 0) {
sz = min((sector_t) PAGE_SIZE >> 9

[PATCH 15/35] mpage: set bi_op to REQ_OP

2016-01-05 Thread mchristi
From: Mike Christie 

This patch has the mpage.c code set the bio bi_op to a REQ_OP, and
rq_flag_bits to bi_rw.

I have run xfstests with xfs, but I am not sure
whether I have stressed these code paths well.

Signed-off-by: Mike Christie 
---
 fs/mpage.c | 41 +
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/fs/mpage.c b/fs/mpage.c
index 9fec67f..3f7d221 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -55,11 +55,12 @@ static void mpage_end_io(struct bio *bio)
bio_put(bio);
 }
 
-static struct bio *mpage_bio_submit(int rw, struct bio *bio)
+static struct bio *mpage_bio_submit(int op, int op_flags, struct bio *bio)
 {
bio->bi_end_io = mpage_end_io;
-   bio->bi_rw |= rw;
-   guard_bio_eod(rw, bio);
+   bio->bi_op = op;
+   bio->bi_rw |= op_flags;
+   guard_bio_eod(op, bio);
submit_bio(bio);
return NULL;
 }
@@ -269,7 +270,7 @@ do_mpage_readpage(struct bio *bio, struct page *page, 
unsigned nr_pages,
 * This page will go to BIO.  Do we need to send this BIO off first?
 */
if (bio && (*last_block_in_bio != blocks[0] - 1))
-   bio = mpage_bio_submit(READ, bio);
+   bio = mpage_bio_submit(REQ_OP_READ, 0, bio);
 
 alloc_new:
if (bio == NULL) {
@@ -286,7 +287,7 @@ alloc_new:
 
length = first_hole << blkbits;
if (bio_add_page(bio, page, length, 0) < length) {
-   bio = mpage_bio_submit(READ, bio);
+   bio = mpage_bio_submit(REQ_OP_READ, 0, bio);
goto alloc_new;
}
 
@@ -294,7 +295,7 @@ alloc_new:
nblocks = map_bh->b_size >> blkbits;
if ((buffer_boundary(map_bh) && relative_block == nblocks) ||
(first_hole != blocks_per_page))
-   bio = mpage_bio_submit(READ, bio);
+   bio = mpage_bio_submit(REQ_OP_READ, 0, bio);
else
*last_block_in_bio = blocks[blocks_per_page - 1];
 out:
@@ -302,7 +303,7 @@ out:
 
 confused:
if (bio)
-   bio = mpage_bio_submit(READ, bio);
+   bio = mpage_bio_submit(REQ_OP_READ, 0, bio);
if (!PageUptodate(page))
block_read_full_page(page, get_block);
else
@@ -384,7 +385,7 @@ mpage_readpages(struct address_space *mapping, struct 
list_head *pages,
}
BUG_ON(!list_empty(pages));
if (bio)
-   mpage_bio_submit(READ, bio);
+   mpage_bio_submit(REQ_OP_READ, 0, bio);
return 0;
 }
 EXPORT_SYMBOL(mpage_readpages);
@@ -405,7 +406,7 @@ int mpage_readpage(struct page *page, get_block_t get_block)
bio = do_mpage_readpage(bio, page, 1, &last_block_in_bio,
&map_bh, &first_logical_block, get_block, gfp);
if (bio)
-   mpage_bio_submit(READ, bio);
+   mpage_bio_submit(REQ_OP_READ, 0, bio);
return 0;
 }
 EXPORT_SYMBOL(mpage_readpage);
@@ -486,7 +487,7 @@ static int __mpage_writepage(struct page *page, struct 
writeback_control *wbc,
struct buffer_head map_bh;
loff_t i_size = i_size_read(inode);
int ret = 0;
-   int wr = (wbc->sync_mode == WB_SYNC_ALL ?  WRITE_SYNC : WRITE);
+   int op_flags = (wbc->sync_mode == WB_SYNC_ALL ?  WRITE_SYNC : 0);
 
if (page_has_buffers(page)) {
struct buffer_head *head = page_buffers(page);
@@ -595,7 +596,7 @@ page_is_mapped:
 * This page will go to BIO.  Do we need to send this BIO off first?
 */
if (bio && mpd->last_block_in_bio != blocks[0] - 1)
-   bio = mpage_bio_submit(wr, bio);
+   bio = mpage_bio_submit(REQ_OP_WRITE, op_flags, bio);
 
 alloc_new:
if (bio == NULL) {
@@ -622,7 +623,7 @@ alloc_new:
wbc_account_io(wbc, page, PAGE_SIZE);
length = first_unmapped << blkbits;
if (bio_add_page(bio, page, length, 0) < length) {
-   bio = mpage_bio_submit(wr, bio);
+   bio = mpage_bio_submit(REQ_OP_WRITE, op_flags, bio);
goto alloc_new;
}
 
@@ -632,7 +633,7 @@ alloc_new:
set_page_writeback(page);
unlock_page(page);
if (boundary || (first_unmapped != blocks_per_page)) {
-   bio = mpage_bio_submit(wr, bio);
+   bio = mpage_bio_submit(REQ_OP_WRITE, op_flags, bio);
if (boundary_block) {
write_boundary_block(boundary_bdev,
boundary_block, 1 << blkbits);
@@ -644,7 +645,7 @@ alloc_new:
 
 confused:
if (bio)
-   bio = mpage_bio_submit(wr, bio);
+   bio = mpage_bio_submit(REQ_OP_WRITE, op_flags, bio);
 
if (mpd->use_writepage) {
ret = mapping->a_ops->writepage(page, wbc);
@@ -701,9 +702,9 @@ mpage_writepages(struct address_space *mapping,
 
ret = write_cache_pages(mapping, wbc, __mpage_writepage, &mpd);
if (mpd.bio) {
-  

[PATCH 13/35] xfs: set bi_op to REQ_OP

2016-01-05 Thread mchristi
From: Mike Christie 

This patch has xfs set the bio bi_op to a REQ_OP, and
rq_flag_bits to bi_rw.

Note:
I have run xfstests with these patches. There were some failures
with and without the patches applied. I have not had time to track
down why xfstests fails without the patches.

Signed-off-by: Mike Christie 
---
 fs/xfs/xfs_aops.c |  3 ++-
 fs/xfs/xfs_buf.c  | 27 +++
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index a1052d2..3a00935 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -382,7 +382,8 @@ xfs_submit_ioend_bio(
atomic_inc(&ioend->io_remaining);
bio->bi_private = ioend;
bio->bi_end_io = xfs_end_bio;
-   bio->bi_rw |= (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE);
+   bio->bi_op = REQ_OP_WRITE;
+   bio->bi_rw |= (wbc->sync_mode == WB_SYNC_ALL) ? WRITE_SYNC : 0;
submit_bio(bio);
 }
 
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 644e676..4cfba72 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1124,7 +1124,8 @@ xfs_buf_ioapply_map(
int map,
int *buf_offset,
int *count,
-   int rw)
+   int op,
+   int op_flags)
 {
int page_index;
int total_nr_pages = bp->b_page_count;
@@ -1163,7 +1164,8 @@ next_chunk:
bio->bi_iter.bi_sector = sector;
bio->bi_end_io = xfs_buf_bio_end_io;
bio->bi_private = bp;
-   bio->bi_rw |= rw;
+   bio->bi_op = op;
+   bio->bi_rw |= op_flags;
 
for (; size && nr_pages; nr_pages--, page_index++) {
int rbytes, nbytes = PAGE_SIZE - offset;
@@ -1207,7 +1209,8 @@ _xfs_buf_ioapply(
struct xfs_buf  *bp)
 {
struct blk_plug plug;
-   int rw;
+   int op;
+   int op_flags = 0;
int offset;
int size;
int i;
@@ -1226,14 +1229,13 @@ _xfs_buf_ioapply(
bp->b_ioend_wq = bp->b_target->bt_mount->m_buf_workqueue;
 
if (bp->b_flags & XBF_WRITE) {
+   op = REQ_OP_WRITE;
if (bp->b_flags & XBF_SYNCIO)
-   rw = WRITE_SYNC;
-   else
-   rw = WRITE;
+   op_flags = WRITE_SYNC;
if (bp->b_flags & XBF_FUA)
-   rw |= REQ_FUA;
+   op_flags |= REQ_FUA;
if (bp->b_flags & XBF_FLUSH)
-   rw |= REQ_FLUSH;
+   op_flags |= REQ_FLUSH;
 
/*
 * Run the write verifier callback function if it exists. If
@@ -1263,13 +1265,14 @@ _xfs_buf_ioapply(
}
}
} else if (bp->b_flags & XBF_READ_AHEAD) {
-   rw = READA;
+   op = REQ_OP_READ;
+   op_flags = REQ_RAHEAD;
} else {
-   rw = READ;
+   op = REQ_OP_READ;
}
 
/* we only use the buffer cache for meta-data */
-   rw |= REQ_META;
+   op_flags |= REQ_META;
 
/*
 * Walk all the vectors issuing IO on them. Set up the initial offset
@@ -1281,7 +1284,7 @@ _xfs_buf_ioapply(
size = BBTOB(bp->b_io_length);
blk_start_plug(&plug);
for (i = 0; i < bp->b_map_count; i++) {
-   xfs_buf_ioapply_map(bp, i, &offset, &size, rw);
+   xfs_buf_ioapply_map(bp, i, &offset, &size, op, op_flags);
if (bp->b_error)
break;
if (size <= 0)
-- 
1.8.3.1



[PATCH 09/35] btrfs: update __btrfs_map_block for bi_op transition

2016-01-05 Thread mchristi
From: Mike Christie 

We no longer pass in a bitmap of rq_flag_bits
to __btrfs_map_block. It will always be a REQ_OP,
or the btrfs-specific REQ_GET_READ_MIRRORS,
so this drops the bit tests.
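
Put differently, bitmap membership tests become equality checks against a
single op value; a sketch taken from the volumes.c hunks below:

    /* before: rw is a bitmap, so test membership with & */
    if (rw & (REQ_WRITE | REQ_DISCARD | REQ_GET_READ_MIRRORS))
        num_stripes = map->num_stripes;

    /* after: op holds exactly one value, so compare directly */
    if (op == REQ_OP_WRITE || op == REQ_OP_DISCARD ||
        op == REQ_GET_READ_MIRRORS)
        num_stripes = map->num_stripes;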

Signed-off-by: Mike Christie 
---
 fs/btrfs/extent-tree.c |  2 +-
 fs/btrfs/inode.c   |  2 +-
 fs/btrfs/volumes.c | 55 +++---
 fs/btrfs/volumes.h |  4 ++--
 4 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index add4af6..4d503d0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2043,7 +2043,7 @@ int btrfs_discard_extent(struct btrfs_root *root, u64 
bytenr,
 
 
/* Tell the block device(s) that the sectors can be discarded */
-   ret = btrfs_map_block(root->fs_info, REQ_DISCARD,
+   ret = btrfs_map_block(root->fs_info, REQ_OP_DISCARD,
  bytenr, &num_bytes, &bbio, 0);
/* Error condition is -ENOMEM */
if (!ret) {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7a830c7..b1e88ec 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8206,7 +8206,7 @@ static int btrfs_submit_direct_hook(int rw, struct 
btrfs_dio_private *dip,
int async_submit = 0;
 
map_length = orig_bio->bi_iter.bi_size;
-   ret = btrfs_map_block(root->fs_info, rw, start_sector << 9,
+   ret = btrfs_map_block(root->fs_info, orig_bio->bi_op, start_sector << 9,
  &map_length, NULL, 0);
if (ret)
return -EIO;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0da1d32..bf1e9af 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5218,7 +5218,7 @@ void btrfs_put_bbio(struct btrfs_bio *bbio)
kfree(bbio);
 }
 
-static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
+static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int op,
 u64 logical, u64 *length,
 struct btrfs_bio **bbio_ret,
 int mirror_num, int need_raid_map)
@@ -5296,7 +5296,7 @@ static int __btrfs_map_block(struct btrfs_fs_info 
*fs_info, int rw,
raid56_full_stripe_start *= full_stripe_len;
}
 
-   if (rw & REQ_DISCARD) {
+   if (op == REQ_OP_DISCARD) {
/* we don't discard raid56 yet */
if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
ret = -EOPNOTSUPP;
@@ -5309,7 +5309,7 @@ static int __btrfs_map_block(struct btrfs_fs_info 
*fs_info, int rw,
   For other RAID types and for RAID[56] reads, just allow a 
single
   stripe (on a single disk). */
if ((map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) &&
-   (rw & REQ_WRITE)) {
+   (op == REQ_OP_WRITE)) {
max_len = stripe_len * nr_data_stripes(map) -
(offset - raid56_full_stripe_start);
} else {
@@ -5332,8 +5332,8 @@ static int __btrfs_map_block(struct btrfs_fs_info 
*fs_info, int rw,
btrfs_dev_replace_unlock(dev_replace);
 
if (dev_replace_is_ongoing && mirror_num == map->num_stripes + 1 &&
-   !(rw & (REQ_WRITE | REQ_DISCARD | REQ_GET_READ_MIRRORS)) &&
-   dev_replace->tgtdev != NULL) {
+   op != REQ_OP_WRITE && op != REQ_OP_DISCARD &&
+   op != REQ_GET_READ_MIRRORS && dev_replace->tgtdev != NULL) {
/*
 * in dev-replace case, for repair case (that's the only
 * case where the mirror is selected explicitly when
@@ -5422,15 +5422,17 @@ static int __btrfs_map_block(struct btrfs_fs_info 
*fs_info, int rw,
(offset + *length);
 
if (map->type & BTRFS_BLOCK_GROUP_RAID0) {
-   if (rw & REQ_DISCARD)
+   if (op == REQ_OP_DISCARD)
num_stripes = min_t(u64, map->num_stripes,
stripe_nr_end - stripe_nr_orig);
stripe_nr = div_u64_rem(stripe_nr, map->num_stripes,
&stripe_index);
-   if (!(rw & (REQ_WRITE | REQ_DISCARD | REQ_GET_READ_MIRRORS)))
+   if (op != REQ_OP_WRITE && op != REQ_OP_DISCARD &&
+   op != REQ_GET_READ_MIRRORS)
mirror_num = 1;
} else if (map->type & BTRFS_BLOCK_GROUP_RAID1) {
-   if (rw & (REQ_WRITE | REQ_DISCARD | REQ_GET_READ_MIRRORS))
+   if (op == REQ_OP_WRITE || op == REQ_OP_DISCARD ||
+   op == REQ_GET_READ_MIRRORS)
num_stripes = map->num_stripes;
else if (mirror_num)
stripe_index = mirror_num - 1;
@@ -5443,7 +5445,8 @@ static int __btrfs_map_block(struct btrfs_fs_info 
*fs_info, int rw,
}
 
} else if (map->type & BTRFS_BLOCK_GROUP_DUP

[PATCH 16/35] nilfs: set bi_op to REQ_OP

2016-01-05 Thread mchristi
From: Mike Christie 

This patch has nilfs set the bio bi_op to a REQ_OP, and
rq_flag_bits to bi_rw.

This patch is compile tested only.

Signed-off-by: Mike Christie 
---
 fs/nilfs2/segbuf.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c
index 428ece8..8784272 100644
--- a/fs/nilfs2/segbuf.c
+++ b/fs/nilfs2/segbuf.c
@@ -350,7 +350,8 @@ static void nilfs_end_bio_write(struct bio *bio)
 }
 
 static int nilfs_segbuf_submit_bio(struct nilfs_segment_buffer *segbuf,
-  struct nilfs_write_info *wi, int mode)
+  struct nilfs_write_info *wi, int mode,
+  int mode_flags)
 {
struct bio *bio = wi->bio;
int err;
@@ -368,7 +369,8 @@ static int nilfs_segbuf_submit_bio(struct 
nilfs_segment_buffer *segbuf,
 
bio->bi_end_io = nilfs_end_bio_write;
bio->bi_private = segbuf;
-   bio->bi_rw |= mode;
+   bio->bi_op = mode;
+   bio->bi_rw |= mode_flags;
submit_bio(bio);
segbuf->sb_nbio++;
 
@@ -442,7 +444,7 @@ static int nilfs_segbuf_submit_bh(struct 
nilfs_segment_buffer *segbuf,
return 0;
}
/* bio is FULL */
-   err = nilfs_segbuf_submit_bio(segbuf, wi, mode);
+   err = nilfs_segbuf_submit_bio(segbuf, wi, mode, 0);
/* never submit current bh */
if (likely(!err))
goto repeat;
@@ -466,19 +468,19 @@ static int nilfs_segbuf_write(struct nilfs_segment_buffer 
*segbuf,
 {
struct nilfs_write_info wi;
struct buffer_head *bh;
-   int res = 0, rw = WRITE;
+   int res = 0;
 
wi.nilfs = nilfs;
nilfs_segbuf_prepare_write(segbuf, &wi);
 
list_for_each_entry(bh, &segbuf->sb_segsum_buffers, b_assoc_buffers) {
-   res = nilfs_segbuf_submit_bh(segbuf, &wi, bh, rw);
+   res = nilfs_segbuf_submit_bh(segbuf, &wi, bh, REQ_OP_WRITE);
if (unlikely(res))
goto failed_bio;
}
 
list_for_each_entry(bh, &segbuf->sb_payload_buffers, b_assoc_buffers) {
-   res = nilfs_segbuf_submit_bh(segbuf, &wi, bh, rw);
+   res = nilfs_segbuf_submit_bh(segbuf, &wi, bh, REQ_OP_WRITE);
if (unlikely(res))
goto failed_bio;
}
@@ -488,8 +490,8 @@ static int nilfs_segbuf_write(struct nilfs_segment_buffer 
*segbuf,
 * Last BIO is always sent through the following
 * submission.
 */
-   rw |= REQ_SYNC;
-   res = nilfs_segbuf_submit_bio(segbuf, &wi, rw);
+   res = nilfs_segbuf_submit_bio(segbuf, &wi, REQ_OP_WRITE,
+ REQ_SYNC);
}
 
  failed_bio:
-- 
1.8.3.1



[PATCH 14/35] hfsplus: set bi_op to REQ_OP

2016-01-05 Thread mchristi
From: Mike Christie 

This patch has hfsplus set the bio bi_op to a REQ_OP, and
rq_flag_bits to bi_rw.

This patch is compile tested only.

Signed-off-by: Mike Christie 
---
 fs/hfsplus/hfsplus_fs.h |  2 +-
 fs/hfsplus/part_tbl.c   |  5 +++--
 fs/hfsplus/super.c  |  6 --
 fs/hfsplus/wrapper.c| 15 +--
 4 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/fs/hfsplus/hfsplus_fs.h b/fs/hfsplus/hfsplus_fs.h
index f91a1fa..80154aa 100644
--- a/fs/hfsplus/hfsplus_fs.h
+++ b/fs/hfsplus/hfsplus_fs.h
@@ -525,7 +525,7 @@ int hfsplus_compare_dentry(const struct dentry *parent,
 
 /* wrapper.c */
 int hfsplus_submit_bio(struct super_block *sb, sector_t sector, void *buf,
-  void **data, int rw);
+  void **data, int op, int op_flags);
 int hfsplus_read_wrapper(struct super_block *sb);
 
 /* time macros */
diff --git a/fs/hfsplus/part_tbl.c b/fs/hfsplus/part_tbl.c
index eb355d8..63164eb 100644
--- a/fs/hfsplus/part_tbl.c
+++ b/fs/hfsplus/part_tbl.c
@@ -112,7 +112,8 @@ static int hfs_parse_new_pmap(struct super_block *sb, void 
*buf,
if ((u8 *)pm - (u8 *)buf >= buf_size) {
res = hfsplus_submit_bio(sb,
 *part_start + HFS_PMAP_BLK + i,
-buf, (void **)&pm, READ);
+buf, (void **)&pm, REQ_OP_READ,
+0);
if (res)
return res;
}
@@ -136,7 +137,7 @@ int hfs_part_find(struct super_block *sb,
return -ENOMEM;
 
res = hfsplus_submit_bio(sb, *part_start + HFS_PMAP_BLK,
-buf, &data, READ);
+buf, &data, REQ_OP_READ, 0);
if (res)
goto out;
 
diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c
index 5d54490..01cf313 100644
--- a/fs/hfsplus/super.c
+++ b/fs/hfsplus/super.c
@@ -219,7 +219,8 @@ static int hfsplus_sync_fs(struct super_block *sb, int wait)
 
error2 = hfsplus_submit_bio(sb,
   sbi->part_start + HFSPLUS_VOLHEAD_SECTOR,
-  sbi->s_vhdr_buf, NULL, WRITE_SYNC);
+  sbi->s_vhdr_buf, NULL, REQ_OP_WRITE,
+  WRITE_SYNC);
if (!error)
error = error2;
if (!write_backup)
@@ -227,7 +228,8 @@ static int hfsplus_sync_fs(struct super_block *sb, int wait)
 
error2 = hfsplus_submit_bio(sb,
  sbi->part_start + sbi->sect_count - 2,
- sbi->s_backup_vhdr_buf, NULL, WRITE_SYNC);
+ sbi->s_backup_vhdr_buf, NULL, REQ_OP_WRITE,
+ WRITE_SYNC);
if (!error)
error2 = error;
 out:
diff --git a/fs/hfsplus/wrapper.c b/fs/hfsplus/wrapper.c
index 7e605b5..d09b726 100644
--- a/fs/hfsplus/wrapper.c
+++ b/fs/hfsplus/wrapper.c
@@ -30,7 +30,8 @@ struct hfsplus_wd {
  * @sector: block to read or write, for blocks of HFSPLUS_SECTOR_SIZE bytes
  * @buf: buffer for I/O
  * @data: output pointer for location of requested data
- * @rw: direction of I/O
+ * @op: direction of I/O
+ * @op_flags: request op flags
  *
  * The unit of I/O is hfsplus_min_io_size(sb), which may be bigger than
  * HFSPLUS_SECTOR_SIZE, and @buf must be sized accordingly. On reads
@@ -44,7 +45,7 @@ struct hfsplus_wd {
  * will work correctly.
  */
 int hfsplus_submit_bio(struct super_block *sb, sector_t sector,
-   void *buf, void **data, int rw)
+  void *buf, void **data, int op, int op_flags)
 {
struct bio *bio;
int ret = 0;
@@ -65,9 +66,10 @@ int hfsplus_submit_bio(struct super_block *sb, sector_t 
sector,
bio = bio_alloc(GFP_NOIO, 1);
bio->bi_iter.bi_sector = sector;
bio->bi_bdev = sb->s_bdev;
-   bio->bi_rw |= rw;
+   bio->bi_op = op;
+   bio->bi_rw |= op_flags;
 
-   if (!(rw & WRITE) && data)
+   if (op != WRITE && data)
*data = (u8 *)buf + offset;
 
while (io_size > 0) {
@@ -182,7 +184,7 @@ int hfsplus_read_wrapper(struct super_block *sb)
 reread:
error = hfsplus_submit_bio(sb, part_start + HFSPLUS_VOLHEAD_SECTOR,
   sbi->s_vhdr_buf, (void **)&sbi->s_vhdr,
-  READ);
+  REQ_OP_READ, 0);
if (error)
goto out_free_backup_vhdr;
 
@@ -214,7 +216,8 @@ reread:
 
error = hfsplus_submit_bio(sb, part_start + part_size - 2,
   sbi->s_backup_vhdr_buf,
-  (void **)&sbi->s_backup_vhdr, READ);
+  (void **)&sbi->s_backup_vhdr, REQ_OP_READ,
+   

[PATCH 11/35] f2fs: set bi_op to REQ_OP

2016-01-05 Thread mchristi
From: Mike Christie 

This patch has f2fs set the bio bi_op to a REQ_OP, and
rq_flag_bits to bi_rw.
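
In f2fs the split also lands in struct f2fs_io_info, which now carries the
op and its flags as separate members; distilled from the checkpoint.c hunk
below:

    struct f2fs_io_info fio = {
        .sbi = sbi,
        .type = META,
        .op = REQ_OP_READ,                            /* the operation */
        .op_flags = READ_SYNC | REQ_META | REQ_PRIO,  /* rq_flag_bits */
        .blk_addr = index,
        .encrypted_page = NULL,
    };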

This patch is compile tested only.

Signed-off-by: Mike Christie 
---
 fs/f2fs/checkpoint.c| 10 ++
 fs/f2fs/data.c  | 33 -
 fs/f2fs/f2fs.h  |  5 +++--
 fs/f2fs/gc.c|  9 ++---
 fs/f2fs/inline.c|  3 ++-
 fs/f2fs/node.c  |  8 +---
 fs/f2fs/segment.c   | 10 +++---
 fs/f2fs/trace.c |  8 +---
 include/trace/events/f2fs.h | 34 +-
 9 files changed, 75 insertions(+), 45 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index fdd43f7..92d05d8 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -55,13 +55,14 @@ static struct page *__get_meta_page(struct f2fs_sb_info 
*sbi, pgoff_t index,
struct f2fs_io_info fio = {
.sbi = sbi,
.type = META,
-   .rw = READ_SYNC | REQ_META | REQ_PRIO,
+   .op = REQ_OP_READ,
+   .op_flags = READ_SYNC | REQ_META | REQ_PRIO,
.blk_addr = index,
.encrypted_page = NULL,
};
 
if (unlikely(!is_meta))
-   fio.rw &= ~REQ_META;
+   fio.op_flags &= ~REQ_META;
 repeat:
page = grab_cache_page(mapping, index);
if (!page) {
@@ -149,12 +150,13 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, block_t 
start, int nrpages,
struct f2fs_io_info fio = {
.sbi = sbi,
.type = META,
-   .rw = sync ? (READ_SYNC | REQ_META | REQ_PRIO) : READA,
+   .op = REQ_OP_READ,
+   .op_flags = sync ? (READ_SYNC | REQ_META | REQ_PRIO) : READA,
.encrypted_page = NULL,
};
 
if (unlikely(type == META_POR))
-   fio.rw &= ~REQ_META;
+   fio.op_flags &= ~REQ_META;
 
for (; nrpages-- > 0; blkno++) {
 
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 5325408..14757cb 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -107,11 +107,12 @@ static void __submit_merged_bio(struct f2fs_bio_info *io)
if (!io->bio)
return;
 
-   if (is_read_io(fio->rw))
+   if (is_read_io(fio->op))
trace_f2fs_submit_read_bio(io->sbi->sb, fio, io->bio);
else
trace_f2fs_submit_write_bio(io->sbi->sb, fio, io->bio);
-   io->bio->bi_rw |= fio->rw;
+   io->bio->bi_op = fio->op;
+   io->bio->bi_rw |= fio->op_flags;
 
submit_bio(io->bio);
io->bio = NULL;
@@ -130,10 +131,12 @@ void f2fs_submit_merged_bio(struct f2fs_sb_info *sbi,
/* change META to META_FLUSH in the checkpoint procedure */
if (type >= META_FLUSH) {
io->fio.type = META_FLUSH;
+   io->fio.op = REQ_OP_WRITE;
if (test_opt(sbi, NOBARRIER))
-   io->fio.rw = WRITE_FLUSH | REQ_META | REQ_PRIO;
+   io->fio.op_flags = WRITE_FLUSH | REQ_META | REQ_PRIO;
else
-   io->fio.rw = WRITE_FLUSH_FUA | REQ_META | REQ_PRIO;
+   io->fio.op_flags = WRITE_FLUSH_FUA | REQ_META |
+   REQ_PRIO;
}
__submit_merged_bio(io);
up_write(&io->io_rwsem);
@@ -152,13 +155,14 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
f2fs_trace_ios(fio, 0);
 
/* Allocate a new bio */
-   bio = __bio_alloc(fio->sbi, fio->blk_addr, 1, is_read_io(fio->rw));
+   bio = __bio_alloc(fio->sbi, fio->blk_addr, 1, is_read_io(fio->op));
 
if (bio_add_page(bio, page, PAGE_CACHE_SIZE, 0) < PAGE_CACHE_SIZE) {
bio_put(bio);
return -EFAULT;
}
-   bio->bi_rw |= fio->rw;
+   bio->bi_op = fio->op;
+   bio->bi_rw |= fio->op_flags;
 
submit_bio(bio);
return 0;
@@ -169,7 +173,7 @@ void f2fs_submit_page_mbio(struct f2fs_io_info *fio)
struct f2fs_sb_info *sbi = fio->sbi;
enum page_type btype = PAGE_TYPE_OF_BIO(fio->type);
struct f2fs_bio_info *io;
-   bool is_read = is_read_io(fio->rw);
+   bool is_read = is_read_io(fio->op);
struct page *bio_page;
 
io = is_read ? &sbi->read_io : &sbi->write_io[btype];
@@ -182,7 +186,7 @@ void f2fs_submit_page_mbio(struct f2fs_io_info *fio)
inc_page_count(sbi, F2FS_WRITEBACK);
 
if (io->bio && (io->last_block_in_bio != fio->blk_addr - 1 ||
-   io->fio.rw != fio->rw))
+   (io->fio.op != fio->op || io->fio.op_flags != fio->op_flags)))
__submit_merged_bio(io);
 alloc_new:
if (io->bio == NULL) {
@@ -278,7 +282,7 @@ int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index)
 }
 
 struct page *get_read_data_page(struct inode *inode, pgoff_t index,
-  

[PATCH 12/35] gfs2: set bi_op to REQ_OP

2016-01-05 Thread mchristi
From: Mike Christie 

This patch has gfs2 set the bio bi_op to a REQ_OP, and
rq_flag_bits to bi_rw.

This patch is compile tested only.

v2:

Bob, I did not add your Signed-off-by, because there were
gfs2_submit_bhs changes since the last time you reviewed
it.

Signed-off-by: Mike Christie 
---
 fs/gfs2/log.c|  8 
 fs/gfs2/lops.c   | 12 +++-
 fs/gfs2/lops.h   |  2 +-
 fs/gfs2/meta_io.c|  8 +---
 fs/gfs2/ops_fstype.c |  1 +
 5 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 0ff028c..e58ccef0 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -657,7 +657,7 @@ static void log_write_header(struct gfs2_sbd *sdp, u32 
flags)
struct gfs2_log_header *lh;
unsigned int tail;
u32 hash;
-   int rw = WRITE_FLUSH_FUA | REQ_META;
+   int op_flags = WRITE_FLUSH_FUA | REQ_META;
struct page *page = mempool_alloc(gfs2_page_pool, GFP_NOIO);
enum gfs2_freeze_state state = atomic_read(&sdp->sd_freeze_state);
lh = page_address(page);
@@ -682,12 +682,12 @@ static void log_write_header(struct gfs2_sbd *sdp, u32 
flags)
if (test_bit(SDF_NOBARRIERS, &sdp->sd_flags)) {
gfs2_ordered_wait(sdp);
log_flush_wait(sdp);
-   rw = WRITE_SYNC | REQ_META | REQ_PRIO;
+   op_flags = WRITE_SYNC | REQ_META | REQ_PRIO;
}
 
sdp->sd_log_idle = (tail == sdp->sd_log_flush_head);
gfs2_log_write_page(sdp, page);
-   gfs2_log_flush_bio(sdp, rw);
+   gfs2_log_flush_bio(sdp, REQ_OP_WRITE, op_flags);
log_flush_wait(sdp);
 
if (sdp->sd_log_tail != tail)
@@ -738,7 +738,7 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock 
*gl,
 
gfs2_ordered_write(sdp);
lops_before_commit(sdp, tr);
-   gfs2_log_flush_bio(sdp, WRITE);
+   gfs2_log_flush_bio(sdp, REQ_OP_WRITE, 0);
 
if (sdp->sd_log_head != sdp->sd_log_flush_head) {
log_flush_wait(sdp);
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index acc5ccb..11980f6 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -230,17 +230,19 @@ static void gfs2_end_log_write(struct bio *bio)
 /**
  * gfs2_log_flush_bio - Submit any pending log bio
  * @sdp: The superblock
- * @rw: The rw flags
+ * @op: REQ_OP
+ * @op_flags: rq_flag_bits
  *
  * Submit any pending part-built or full bio to the block device. If
  * there is no pending bio, then this is a no-op.
  */
 
-void gfs2_log_flush_bio(struct gfs2_sbd *sdp, int rw)
+void gfs2_log_flush_bio(struct gfs2_sbd *sdp, int op, int op_flags)
 {
if (sdp->sd_log_bio) {
atomic_inc(&sdp->sd_log_in_flight);
-   sdp->sd_log_bio->bi_rw |= rw;
+   sdp->sd_log_bio->bi_op = op;
+   sdp->sd_log_bio->bi_rw |= op_flags;
submit_bio(sdp->sd_log_bio);
sdp->sd_log_bio = NULL;
}
@@ -300,7 +302,7 @@ static struct bio *gfs2_log_get_bio(struct gfs2_sbd *sdp, 
u64 blkno)
nblk >>= sdp->sd_fsb2bb_shift;
if (blkno == nblk)
return bio;
-   gfs2_log_flush_bio(sdp, WRITE);
+   gfs2_log_flush_bio(sdp, REQ_OP_WRITE, 0);
}
 
return gfs2_log_alloc_bio(sdp, blkno);
@@ -329,7 +331,7 @@ static void gfs2_log_write(struct gfs2_sbd *sdp, struct 
page *page,
bio = gfs2_log_get_bio(sdp, blkno);
ret = bio_add_page(bio, page, size, offset);
if (ret == 0) {
-   gfs2_log_flush_bio(sdp, WRITE);
+   gfs2_log_flush_bio(sdp, REQ_OP_WRITE, 0);
bio = gfs2_log_alloc_bio(sdp, blkno);
ret = bio_add_page(bio, page, size, offset);
WARN_ON(ret == 0);
diff --git a/fs/gfs2/lops.h b/fs/gfs2/lops.h
index a65a7ba..e529f53 100644
--- a/fs/gfs2/lops.h
+++ b/fs/gfs2/lops.h
@@ -27,7 +27,7 @@ extern const struct gfs2_log_operations gfs2_databuf_lops;
 
 extern const struct gfs2_log_operations *gfs2_log_ops[];
 extern void gfs2_log_write_page(struct gfs2_sbd *sdp, struct page *page);
-extern void gfs2_log_flush_bio(struct gfs2_sbd *sdp, int rw);
+extern void gfs2_log_flush_bio(struct gfs2_sbd *sdp, int op, int op_flags);
 extern void gfs2_pin(struct gfs2_sbd *sdp, struct buffer_head *bh);
 
 static inline unsigned int buf_limit(struct gfs2_sbd *sdp)
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index e2fc2b9..3996c1d 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -213,7 +213,8 @@ static void gfs2_meta_read_endio(struct bio *bio)
  * Submit several consecutive buffer head I/O requests as a single bio I/O
  * request.  (See submit_bh_wbc.)
  */
-static void gfs2_submit_bhs(int rw, struct buffer_head *bhs[], int num)
+static void gfs2_submit_bhs(int op, int op_flags, struct buffer_head *bhs[],
+   int num)
 {
struct buffer_head *bh = bhs[0];
struct bio *bio;
@@ -230,7 +231,8 @@ static void gfs2_submit_bhs(int

[PATCH 22/35] drbd: set bi_op to REQ_OP

2016-01-05 Thread mchristi
From: Mike Christie 

This patch has drbd set the bio bi_op to a REQ_OP, and rq_flag_bits
to bi_rw.

Lars and Philip, I might have split this patch up a little oddly.
This patch handles setting up the bio, and then patch 30
(0030-block-fs-drivers-do-not-test-bi_rw-for-REQ_OPs.patch) handles
where we check/read bio->bi_rw.
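
Roughly: this patch records the op/flags at submission time, and patch 30
converts the places that read them back. A hedged sketch of the split
(fault_type below is an illustrative local, not a field from the driver):

    /* setup (this patch): op and flags carried separately */
    bio->bi_op = op;
    bio->bi_rw = op_flags;

    /* checks (patch 30): compare the op instead of masking REQ_WRITE */
    if (bio->bi_op == REQ_OP_WRITE)
        fault_type = DRBD_FAULT_MD_WR;
    else
        fault_type = DRBD_FAULT_MD_RD;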

This patch is compile tested only.

Signed-off-by: Mike Christie 
---
 drivers/block/drbd/drbd_actlog.c   | 29 -
 drivers/block/drbd/drbd_bitmap.c   |  6 +++---
 drivers/block/drbd/drbd_int.h  |  4 ++--
 drivers/block/drbd/drbd_main.c |  5 +++--
 drivers/block/drbd/drbd_receiver.c | 37 +
 drivers/block/drbd/drbd_worker.c   |  3 ++-
 6 files changed, 51 insertions(+), 33 deletions(-)

diff --git a/drivers/block/drbd/drbd_actlog.c b/drivers/block/drbd/drbd_actlog.c
index 6069e15..2fa8534 100644
--- a/drivers/block/drbd/drbd_actlog.c
+++ b/drivers/block/drbd/drbd_actlog.c
@@ -137,19 +137,19 @@ void wait_until_done_or_force_detached(struct drbd_device 
*device, struct drbd_b
 
 static int _drbd_md_sync_page_io(struct drbd_device *device,
 struct drbd_backing_dev *bdev,
-sector_t sector, int rw)
+sector_t sector, int op)
 {
struct bio *bio;
/* we do all our meta data IO in aligned 4k blocks. */
const int size = 4096;
-   int err;
+   int err, op_flags = 0;
 
device->md_io.done = 0;
device->md_io.error = -ENODEV;
 
-   if ((rw & WRITE) && !test_bit(MD_NO_FUA, &device->flags))
-   rw |= REQ_FUA | REQ_FLUSH;
-   rw |= REQ_SYNC | REQ_NOIDLE;
+   if ((op == REQ_OP_WRITE) && !test_bit(MD_NO_FUA, &device->flags))
+   op_flags |= REQ_FUA | REQ_FLUSH;
+   op_flags |= REQ_SYNC | REQ_NOIDLE;
 
bio = bio_alloc_drbd(GFP_NOIO);
bio->bi_bdev = bdev->md_bdev;
@@ -159,9 +159,10 @@ static int _drbd_md_sync_page_io(struct drbd_device 
*device,
goto out;
bio->bi_private = device;
bio->bi_end_io = drbd_md_endio;
-   bio->bi_rw = rw;
+   bio->bi_op = op;
+   bio->bi_rw = op_flags;
 
-   if (!(rw & WRITE) && device->state.disk == D_DISKLESS && device->ldev 
== NULL)
+   if (op != REQ_OP_WRITE && device->state.disk == D_DISKLESS && 
device->ldev == NULL)
/* special case, drbd_md_read() during drbd_adm_attach(): no 
get_ldev */
;
else if (!get_ldev_if_state(device, D_ATTACHING)) {
@@ -174,7 +175,7 @@ static int _drbd_md_sync_page_io(struct drbd_device *device,
bio_get(bio); /* one bio_put() is in the completion handler */
atomic_inc(&device->md_io.in_use); /* drbd_md_put_buffer() is in the 
completion handler */
device->md_io.submit_jif = jiffies;
-   if (drbd_insert_fault(device, (rw & WRITE) ? DRBD_FAULT_MD_WR : 
DRBD_FAULT_MD_RD))
+   if (drbd_insert_fault(device, (op == REQ_OP_WRITE) ? DRBD_FAULT_MD_WR : 
DRBD_FAULT_MD_RD))
bio_io_error(bio);
else
submit_bio(bio);
@@ -188,7 +189,7 @@ static int _drbd_md_sync_page_io(struct drbd_device *device,
 }
 
 int drbd_md_sync_page_io(struct drbd_device *device, struct drbd_backing_dev 
*bdev,
-sector_t sector, int rw)
+sector_t sector, int op)
 {
int err;
D_ASSERT(device, atomic_read(&device->md_io.in_use) == 1);
@@ -197,19 +198,21 @@ int drbd_md_sync_page_io(struct drbd_device *device, 
struct drbd_backing_dev *bd
 
dynamic_drbd_dbg(device, "meta_data io: %s [%d]:%s(,%llus,%s) %pS\n",
 current->comm, current->pid, __func__,
-(unsigned long long)sector, (rw & WRITE) ? "WRITE" : "READ",
+(unsigned long long)sector, (op == REQ_OP_WRITE) ? "WRITE" : 
"READ",
 (void*)_RET_IP_ );
 
if (sector < drbd_md_first_sector(bdev) ||
sector + 7 > drbd_md_last_sector(bdev))
drbd_alert(device, "%s [%d]:%s(,%llus,%s) out of range md 
access!\n",
 current->comm, current->pid, __func__,
-(unsigned long long)sector, (rw & WRITE) ? "WRITE" : 
"READ");
+(unsigned long long)sector,
+(op == REQ_OP_WRITE) ? "WRITE" : "READ");
 
-   err = _drbd_md_sync_page_io(device, bdev, sector, rw);
+   err = _drbd_md_sync_page_io(device, bdev, sector, op);
if (err) {
drbd_err(device, "drbd_md_sync_page_io(,%llus,%s) failed with 
error %d\n",
-   (unsigned long long)sector, (rw & WRITE) ? "WRITE" : 
"READ", err);
+   (unsigned long long)sector,
+   (op == REQ_OP_WRITE) ? "WRITE" : "READ", err);
}
return err;
 }
diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c
index ef44adb..126bf4a 100644
--- a/drivers/block/drbd/drbd_bitmap

[PATCH 18/35] pm: set bi_op to REQ_OP

2016-01-05 Thread mchristi
From: Mike Christie 

This patch has the pm swap code set the bio bi_op to a REQ_OP, and
rq_flag_bits to bi_rw.

This patch is compile tested only.

Signed-off-by: Mike Christie 
---
 kernel/power/swap.c | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 649dfc7..197076b 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -250,7 +250,7 @@ static void hib_end_io(struct bio *bio)
bio_put(bio);
 }
 
-static int hib_submit_io(int rw, pgoff_t page_off, void *addr,
+static int hib_submit_io(int op, int op_flags, pgoff_t page_off, void *addr,
struct hib_bio_batch *hb)
 {
struct page *page = virt_to_page(addr);
@@ -260,7 +260,8 @@ static int hib_submit_io(int rw, pgoff_t page_off, void 
*addr,
bio = bio_alloc(__GFP_RECLAIM | __GFP_HIGH, 1);
bio->bi_iter.bi_sector = page_off * (PAGE_SIZE >> 9);
bio->bi_bdev = hib_resume_bdev;
-   bio->bi_rw |= rw;
+   bio->bi_op = op;
+   bio->bi_rw |= op_flags;
 
if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
printk(KERN_ERR "PM: Adding page to bio failed at %llu\n",
@@ -296,7 +297,8 @@ static int mark_swapfiles(struct swap_map_handle *handle, 
unsigned int flags)
 {
int error;
 
-   hib_submit_io(READ_SYNC, swsusp_resume_block, swsusp_header, NULL);
+   hib_submit_io(REQ_OP_READ, READ_SYNC, swsusp_resume_block,
+ swsusp_header, NULL);
if (!memcmp("SWAP-SPACE",swsusp_header->sig, 10) ||
!memcmp("SWAPSPACE2",swsusp_header->sig, 10)) {
memcpy(swsusp_header->orig_sig,swsusp_header->sig, 10);
@@ -305,8 +307,8 @@ static int mark_swapfiles(struct swap_map_handle *handle, 
unsigned int flags)
swsusp_header->flags = flags;
if (flags & SF_CRC32_MODE)
swsusp_header->crc32 = handle->crc32;
-   error = hib_submit_io(WRITE_SYNC, swsusp_resume_block,
-   swsusp_header, NULL);
+   error = hib_submit_io(REQ_OP_WRITE, WRITE_SYNC,
+ swsusp_resume_block, swsusp_header, NULL);
} else {
printk(KERN_ERR "PM: Swap header not found!\n");
error = -ENODEV;
@@ -379,7 +381,7 @@ static int write_page(void *buf, sector_t offset, struct 
hib_bio_batch *hb)
} else {
src = buf;
}
-   return hib_submit_io(WRITE_SYNC, offset, src, hb);
+   return hib_submit_io(REQ_OP_WRITE, WRITE_SYNC, offset, src, hb);
 }
 
 static void release_swap_writer(struct swap_map_handle *handle)
@@ -982,7 +984,8 @@ static int get_swap_reader(struct swap_map_handle *handle,
return -ENOMEM;
}
 
-   error = hib_submit_io(READ_SYNC, offset, tmp->map, NULL);
+   error = hib_submit_io(REQ_OP_READ, READ_SYNC, offset,
+ tmp->map, NULL);
if (error) {
release_swap_reader(handle);
return error;
@@ -1006,7 +1009,7 @@ static int swap_read_page(struct swap_map_handle *handle, 
void *buf,
offset = handle->cur->entries[handle->k];
if (!offset)
return -EFAULT;
-   error = hib_submit_io(READ_SYNC, offset, buf, hb);
+   error = hib_submit_io(REQ_OP_READ, READ_SYNC, offset, buf, hb);
if (error)
return error;
if (++handle->k >= MAP_PAGE_ENTRIES) {
@@ -1508,7 +1511,8 @@ int swsusp_check(void)
if (!IS_ERR(hib_resume_bdev)) {
set_blocksize(hib_resume_bdev, PAGE_SIZE);
clear_page(swsusp_header);
-   error = hib_submit_io(READ_SYNC, swsusp_resume_block,
+   error = hib_submit_io(REQ_OP_READ, READ_SYNC,
+   swsusp_resume_block,
swsusp_header, NULL);
if (error)
goto put;
@@ -1516,7 +1520,8 @@ int swsusp_check(void)
if (!memcmp(HIBERNATE_SIG, swsusp_header->sig, 10)) {
memcpy(swsusp_header->sig, swsusp_header->orig_sig, 10);
/* Reset swap signature now */
-   error = hib_submit_io(WRITE_SYNC, swsusp_resume_block,
+   error = hib_submit_io(REQ_OP_WRITE, WRITE_SYNC,
+   swsusp_resume_block,
swsusp_header, NULL);
} else {
error = -EINVAL;
@@ -1560,10 +1565,12 @@ int swsusp_unmark(void)
 {
int error;
 
-   hib_submit_io(READ_SYNC, swsusp_resume_block, swsusp_header, NULL);
+   hib_submit_io(REQ_OP_READ, READ_SYNC, swsusp_resume_block,
+ swsusp_header, NULL);
if (!memcmp(HIBERNATE_SIG

[PATCH 21/35] bcache: set bi_op to REQ_OP

2016-01-05 Thread mchristi
From: Mike Christie 

This patch has bcache set the bio bi_op to a REQ_OP, and rq_flag_bits
to bi_rw.

This patch is compile tested only.

Signed-off-by: Mike Christie 
---
 drivers/md/bcache/btree.c |  2 ++
 drivers/md/bcache/debug.c |  2 ++
 drivers/md/bcache/io.c|  2 +-
 drivers/md/bcache/journal.c   |  7 ---
 drivers/md/bcache/movinggc.c  |  2 +-
 drivers/md/bcache/request.c   |  9 +
 drivers/md/bcache/super.c | 26 +++---
 drivers/md/bcache/writeback.c |  4 ++--
 8 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 22b9e34..752a44f 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -295,6 +295,7 @@ static void bch_btree_node_read(struct btree *b)
closure_init_stack(&cl);
 
bio = bch_bbio_alloc(b->c);
+   bio->bi_op  = REQ_OP_READ;
bio->bi_rw  = REQ_META|READ_SYNC;
bio->bi_iter.bi_size = KEY_SIZE(&b->key) << 9;
bio->bi_end_io  = btree_node_read_endio;
@@ -397,6 +398,7 @@ static void do_btree_node_write(struct btree *b)
 
b->bio->bi_end_io   = btree_node_write_endio;
b->bio->bi_private  = cl;
+   b->bio->bi_op   = REQ_OP_WRITE;
b->bio->bi_rw   = REQ_META|WRITE_SYNC|REQ_FUA;
b->bio->bi_iter.bi_size = roundup(set_bytes(i), block_bytes(b->c));
bch_bio_map(b->bio, i);
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index db68562..4c48783 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -52,6 +52,7 @@ void bch_btree_verify(struct btree *b)
bio->bi_bdev= PTR_CACHE(b->c, &b->key, 0)->bdev;
bio->bi_iter.bi_sector  = PTR_OFFSET(&b->key, 0);
bio->bi_iter.bi_size= KEY_SIZE(&v->key) << 9;
+   bio->bi_op  = REQ_OP_READ;
bio->bi_rw  |= REQ_META|READ_SYNC;
bch_bio_map(bio, sorted);
 
@@ -114,6 +115,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
check = bio_clone(bio, GFP_NOIO);
if (!check)
return;
+   check->bi_op = REQ_OP_READ;
check->bi_rw |= READ_SYNC;
 
if (bio_alloc_pages(check, GFP_NOIO))
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 86a0bb8..f10a9a0 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -111,7 +111,7 @@ void bch_bbio_count_io_errors(struct cache_set *c, struct 
bio *bio,
struct bbio *b = container_of(bio, struct bbio, bio);
struct cache *ca = PTR_CACHE(c, &b->key, 0);
 
-   unsigned threshold = bio->bi_rw & REQ_WRITE
+   unsigned threshold = op_is_write(bio->bi_op)
? c->congested_write_threshold_us
: c->congested_read_threshold_us;
 
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index af3f9f7..68fa0f0 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -54,7 +54,7 @@ reread:   left = ca->sb.bucket_size - offset;
bio_reset(bio);
bio->bi_iter.bi_sector  = bucket + offset;
bio->bi_bdev= ca->bdev;
-   bio->bi_rw  = READ;
+   bio->bi_op  = REQ_OP_READ;
bio->bi_iter.bi_size= len << 9;
 
bio->bi_end_io  = journal_read_endio;
@@ -452,7 +452,7 @@ static void do_journal_discard(struct cache *ca)
bio->bi_iter.bi_sector  = bucket_to_sector(ca->set,
ca->sb.d[ja->discard_idx]);
bio->bi_bdev= ca->bdev;
-   bio->bi_rw  = REQ_WRITE|REQ_DISCARD;
+   bio->bi_op  = REQ_OP_DISCARD;
bio->bi_max_vecs= 1;
bio->bi_io_vec  = bio->bi_inline_vecs;
bio->bi_iter.bi_size= bucket_bytes(ca);
@@ -626,7 +626,8 @@ static void journal_write_unlocked(struct closure *cl)
bio_reset(bio);
bio->bi_iter.bi_sector  = PTR_OFFSET(k, i);
bio->bi_bdev= ca->bdev;
-   bio->bi_rw  = REQ_WRITE|REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA;
+   bio->bi_op  = REQ_OP_WRITE;
+   bio->bi_rw  = REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA;
bio->bi_iter.bi_size = sectors << 9;
 
bio->bi_end_io  = journal_write_endio;
diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c
index b929fc9..f33860a 100644
--- a/drivers/md/bcache/movinggc.c
+++ b/drivers/md/bcache/movinggc.c
@@ -163,7 +163,7 @@ static void read_moving(struct cache_set *c)
moving_init(io);
bio = &io->bio.bio;
 
-   bio->bi_rw  = READ;
+   bio->bi_op  = REQ_OP_READ;
bio->bi_end_io  = read_moving_endio;
 
if (bio_alloc_pages(bio, GFP_KERNEL))
d

[PATCH 19/35] dm: set bi_op to REQ_OP

2016-01-05 Thread mchristi
From: Mike Christie 

This patch has dm set the bio bi_op to a REQ_OP, and rq_flag_bits
to bi_rw.
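
The dm_io interface follows the same split: struct dm_io_request now takes
bi_op plus bi_op_flags instead of a single bi_rw, as in the dm-bufio hunk
below (sketch):

    struct dm_io_request io_req = {
        .bi_op = REQ_OP_WRITE,
        .bi_op_flags = WRITE_FLUSH,
        .mem.type = DM_IO_KMEM,
        .mem.ptr.addr = NULL,
        .client = c->dm_io,
    };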

I did some basic dm tests, but I think this patch should
be considered compile tested only. I have not tested all the
dm targets and I did not stress every code path I have touched.

Signed-off-by: Mike Christie 
---
 drivers/md/dm-bufio.c   |  8 +++---
 drivers/md/dm-crypt.c   |  1 +
 drivers/md/dm-io.c  | 57 ++---
 drivers/md/dm-kcopyd.c  | 25 +-
 drivers/md/dm-log-writes.c  |  6 ++---
 drivers/md/dm-log.c |  5 ++--
 drivers/md/dm-raid1.c   | 11 +---
 drivers/md/dm-snap-persistent.c | 24 +
 drivers/md/dm-thin.c|  7 ++---
 drivers/md/dm.c |  1 +
 include/linux/dm-io.h   |  3 ++-
 11 files changed, 82 insertions(+), 66 deletions(-)

diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 1fd25bf..b6055f2 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -574,7 +574,8 @@ static void use_dmio(struct dm_buffer *b, int rw, sector_t 
block,
 {
int r;
struct dm_io_request io_req = {
-   .bi_rw = rw,
+   .bi_op = rw,
+   .bi_op_flags = 0,
.notify.fn = dmio_complete,
.notify.context = b,
.client = b->c->dm_io,
@@ -634,7 +635,7 @@ static void use_inline_bio(struct dm_buffer *b, int rw, 
sector_t block,
 * the dm_buffer's inline bio is local to bufio.
 */
b->bio.bi_private = end_io;
-   b->bio.bi_rw |= rw;
+   b->bio.bi_op = rw;
 
/*
 * We assume that if len >= PAGE_SIZE ptr is page-aligned.
@@ -1327,7 +1328,8 @@ EXPORT_SYMBOL_GPL(dm_bufio_write_dirty_buffers);
 int dm_bufio_issue_flush(struct dm_bufio_client *c)
 {
struct dm_io_request io_req = {
-   .bi_rw = WRITE_FLUSH,
+   .bi_op = REQ_OP_WRITE,
+   .bi_op_flags = WRITE_FLUSH,
.mem.type = DM_IO_KMEM,
.mem.ptr.addr = NULL,
.client = c->dm_io,
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 3147c8d..f466fec 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1133,6 +1133,7 @@ static void clone_init(struct dm_crypt_io *io, struct bio 
*clone)
clone->bi_private = io;
clone->bi_end_io  = crypt_endio;
clone->bi_bdev= cc->dev->bdev;
+   clone->bi_op  = io->base_bio->bi_op;
clone->bi_rw  = io->base_bio->bi_rw;
 }
 
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 107d445..790b185 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -278,8 +278,9 @@ static void km_dp_init(struct dpages *dp, void *data)
 /*-
  * IO routines that accept a list of pages.
  *---*/
-static void do_region(int rw, unsigned region, struct dm_io_region *where,
- struct dpages *dp, struct io *io)
+static void do_region(int op, int op_flags, unsigned region,
+ struct dm_io_region *where, struct dpages *dp,
+ struct io *io)
 {
struct bio *bio;
struct page *page;
@@ -295,24 +296,25 @@ static void do_region(int rw, unsigned region, struct 
dm_io_region *where,
/*
 * Reject unsupported discard and write same requests.
 */
-   if (rw & REQ_DISCARD)
+   if (op == REQ_OP_DISCARD)
special_cmd_max_sectors = q->limits.max_discard_sectors;
-   else if (rw & REQ_WRITE_SAME)
+   else if (op == REQ_OP_WRITE_SAME)
special_cmd_max_sectors = q->limits.max_write_same_sectors;
-   if ((rw & (REQ_DISCARD | REQ_WRITE_SAME)) && special_cmd_max_sectors == 
0) {
+   if ((op == REQ_OP_DISCARD || op == REQ_OP_WRITE_SAME) &&
+   special_cmd_max_sectors == 0) {
dec_count(io, region, -EOPNOTSUPP);
return;
}
 
/*
-* where->count may be zero if rw holds a flush and we need to
+* where->count may be zero if op holds a flush and we need to
 * send a zero-sized flush.
 */
do {
/*
 * Allocate a suitably sized-bio.
 */
-   if ((rw & REQ_DISCARD) || (rw & REQ_WRITE_SAME))
+   if ((op == REQ_OP_DISCARD) || (op == REQ_OP_WRITE_SAME))
num_bvecs = 1;
else
num_bvecs = min_t(int, BIO_MAX_PAGES,
@@ -322,14 +324,15 @@ static void do_region(int rw, unsigned region, struct 
dm_io_region *where,
bio->bi_iter.bi_sector = where->sector + (where->count - 
remaining);
bio->bi_bdev = where->bdev;
bio->bi_end_io = endio;
-   bio->bi_rw |= rw;
+   bio->bi_op = op

[PATCH 17/35] ocfs2: set bi_op to REQ_OP

2016-01-05 Thread mchristi
From: Mike Christie 

This patch has ocfs2 set the bio bi_op to a REQ_OP, and
rq_flag_bits to bi_rw.

This patch is compile tested only.

Signed-off-by: Mike Christie 
---
 fs/ocfs2/cluster/heartbeat.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 41039c2..38596aa 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -391,7 +391,8 @@ static void o2hb_bio_end_io(struct bio *bio)
 static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
  struct o2hb_bio_wait_ctxt *wc,
  unsigned int *current_slot,
- unsigned int max_slots, int rw)
+ unsigned int max_slots, int op,
+ int op_flags)
 {
int len, current_page;
unsigned int vec_len, vec_start;
@@ -417,7 +418,8 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region 
*reg,
bio->bi_bdev = reg->hr_bdev;
bio->bi_private = wc;
bio->bi_end_io = o2hb_bio_end_io;
-   bio->bi_rw |= rw;
+   bio->bi_op = op;
+   bio->bi_rw |= op_flags;
 
vec_start = (cs << bits) % PAGE_CACHE_SIZE;
while(cs < max_slots) {
@@ -454,7 +456,7 @@ static int o2hb_read_slots(struct o2hb_region *reg,
 
while(current_slot < max_slots) {
bio = o2hb_setup_one_bio(reg, &wc, ¤t_slot, max_slots,
-READ);
+REQ_OP_READ, 0);
if (IS_ERR(bio)) {
status = PTR_ERR(bio);
mlog_errno(status);
@@ -486,7 +488,8 @@ static int o2hb_issue_node_write(struct o2hb_region *reg,
 
slot = o2nm_this_node();
 
-   bio = o2hb_setup_one_bio(reg, write_wc, &slot, slot+1, WRITE_SYNC);
+   bio = o2hb_setup_one_bio(reg, write_wc, &slot, slot+1, REQ_OP_WRITE,
+WRITE_SYNC);
if (IS_ERR(bio)) {
status = PTR_ERR(bio);
mlog_errno(status);
-- 
1.8.3.1



[PATCH 29/35] ide cd: do not set REQ_WRITE on requests.

2016-01-05 Thread mchristi
From: Mike Christie 

The block layer will set the correct READ/WRITE operation flags/fields
when creating a request, so there is no need for drivers to set the
REQ_WRITE flag.
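
A driver that still needs the direction can ask the request itself; the
long-standing rq_data_dir() helper from linux/blkdev.h reports what the
block layer recorded when it built the request. A hedged sketch (the
wrapper function is hypothetical, only for illustration):

    #include <linux/blkdev.h>

    /* illustrative: the direction comes from the request, not from a
     * REQ_WRITE flag the driver set by hand */
    static bool example_rq_is_write(struct request *rq)
    {
        return rq_data_dir(rq) == WRITE;
    }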

This patch is compile tested only.

Signed-off-by: Mike Christie 
---
 drivers/ide/ide-cd_ioctl.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/ide/ide-cd_ioctl.c b/drivers/ide/ide-cd_ioctl.c
index 474173e..5887a7a 100644
--- a/drivers/ide/ide-cd_ioctl.c
+++ b/drivers/ide/ide-cd_ioctl.c
@@ -459,9 +459,6 @@ int ide_cdrom_packet(struct cdrom_device_info *cdi,
   layer. the packet must be complete, as we do not
   touch it at all. */
 
-   if (cgc->data_direction == CGC_DATA_WRITE)
-   flags |= REQ_WRITE;
-
if (cgc->sense)
memset(cgc->sense, 0, sizeof(struct request_sense));
 
-- 
1.8.3.1



[PATCH 26/35] block: set op to REQ_OP

2016-01-05 Thread mchristi
From: Mike Christie 

This patch converts the request related block layer code to set
request->op to a REQ_OP and cmd_flags to rq_flag_bits.

There is some tmp compat code when setting up cmd_flags so it
still carries both the op and flags. It will be removed in
later patches in this set once I have converted all drivers.
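
The transitional shape is visible in the __get_request() hunk below;
distilled (sketch):

    /* tmp compat: cmd_flags still includes the op, so unconverted code
     * that masks cmd_flags keeps working for now */
    rq->cmd_flags = op | op_flags | REQ_ALLOCED;
    /* converted code reads the operation from here instead */
    rq->op = op;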

I have not been able to test the mq paths with real mq hardware.

Signed-off-by: Mike Christie 
---
 block/blk-core.c   | 60 ++
 block/blk-flush.c  |  1 +
 block/blk-merge.c  | 10 
 block/blk-mq.c | 38 -
 block/cfq-iosched.c| 53 +++-
 block/elevator.c   |  8 +++
 include/linux/blk-cgroup.h | 13 +-
 include/linux/blkdev.h | 28 +++---
 include/linux/elevator.h   |  4 ++--
 9 files changed, 120 insertions(+), 95 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 954a450..dacbd68 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -959,10 +959,10 @@ static void __freed_request(struct request_list *rl, int 
sync)
  * A request has just been released.  Account for it, update the full and
  * congestion status, wake up any waiters.   Called under q->queue_lock.
  */
-static void freed_request(struct request_list *rl, unsigned int flags)
+static void freed_request(struct request_list *rl, int op, unsigned int flags)
 {
struct request_queue *q = rl->q;
-   int sync = rw_is_sync(flags);
+   int sync = rw_is_sync(op, flags);
 
q->nr_rqs[sync]--;
rl->count[sync]--;
@@ -1054,7 +1054,8 @@ static struct io_context *rq_ioc(struct bio *bio)
 /**
  * __get_request - get a free request
  * @rl: request list to allocate from
- * @rw_flags: RW and SYNC flags
+ * @op: REQ_OP_READ/REQ_OP_WRITE
+ * @op_flags: rq_flag_bits
  * @bio: bio to allocate request for (can be %NULL)
  * @gfp_mask: allocation mask
  *
@@ -1065,21 +1066,22 @@ static struct io_context *rq_ioc(struct bio *bio)
  * Returns ERR_PTR on failure, with @q->queue_lock held.
  * Returns request pointer on success, with @q->queue_lock *not held*.
  */
-static struct request *__get_request(struct request_list *rl, int rw_flags,
-struct bio *bio, gfp_t gfp_mask)
+static struct request *__get_request(struct request_list *rl, int op,
+int op_flags, struct bio *bio,
+gfp_t gfp_mask)
 {
struct request_queue *q = rl->q;
struct request *rq;
struct elevator_type *et = q->elevator->type;
struct io_context *ioc = rq_ioc(bio);
struct io_cq *icq = NULL;
-   const bool is_sync = rw_is_sync(rw_flags) != 0;
+   const bool is_sync = rw_is_sync(op, op_flags) != 0;
int may_queue;
 
if (unlikely(blk_queue_dying(q)))
return ERR_PTR(-ENODEV);
 
-   may_queue = elv_may_queue(q, rw_flags);
+   may_queue = elv_may_queue(q, op, op_flags);
if (may_queue == ELV_MQUEUE_NO)
goto rq_starved;
 
@@ -1123,7 +1125,7 @@ static struct request *__get_request(struct request_list 
*rl, int rw_flags,
 
/*
 * Decide whether the new request will be managed by elevator.  If
-* so, mark @rw_flags and increment elvpriv.  Non-zero elvpriv will
+* so, mark @op_flags and increment elvpriv.  Non-zero elvpriv will
 * prevent the current elevator from being destroyed until the new
 * request is freed.  This guarantees icq's won't be destroyed and
 * makes creating new ones safe.
@@ -1132,14 +1134,14 @@ static struct request *__get_request(struct 
request_list *rl, int rw_flags,
 * it will be created after releasing queue_lock.
 */
if (blk_rq_should_init_elevator(bio) && !blk_queue_bypass(q)) {
-   rw_flags |= REQ_ELVPRIV;
+   op_flags |= REQ_ELVPRIV;
q->nr_rqs_elvpriv++;
if (et->icq_cache && ioc)
icq = ioc_lookup_icq(ioc, q);
}
 
if (blk_queue_io_stat(q))
-   rw_flags |= REQ_IO_STAT;
+   op_flags |= REQ_IO_STAT;
spin_unlock_irq(q->queue_lock);
 
/* allocate and init request */
@@ -1149,10 +1151,12 @@ static struct request *__get_request(struct 
request_list *rl, int rw_flags,
 
blk_rq_init(q, rq);
blk_rq_set_rl(rq, rl);
-   rq->cmd_flags = rw_flags | REQ_ALLOCED;
+   /* tmp compat - allow users to check either one for the op */
+   rq->cmd_flags = op | op_flags | REQ_ALLOCED;
+   rq->op = op;
 
/* init elvpriv */
-   if (rw_flags & REQ_ELVPRIV) {
+   if (op_flags & REQ_ELVPRIV) {
if (unlikely(et->icq_cache && !icq)) {
if (ioc)
icq = ioc_create_icq(ioc, q, gfp_mask);
@@ -1178,7 +1182,7 @@ out:
if

[PATCH 20/35] dm: pass dm stats data dir instead of bi_rw

2016-01-05 Thread mchristi
From: Mike Christie 

It looks like dm stats cares about the data direction
(READ vs WRITE) and does not need the bio/request flags.
Commands like REQ_FLUSH, REQ_DISCARD and REQ_WRITE_SAME
currently always carry REQ_WRITE as well, so the extra check for
REQ_DISCARD in dm_stats_account_io is not needed.

This patch has it use the bio and request data_dir helpers
instead of accessing the bi_rw/cmd_flags directly. This makes
the next patches that remove the operation from the cmd_flags
and bi_rw easier, because we will no longer have the REQ_WRITE
bit set for operations like discards.
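
In other words (a sketch of the call pattern, matching the hunks below),
callers hand dm_stats_account_io() only the data direction via the
existing helpers:

        /* bio path: READ or WRITE only, no flag bits */
        dm_stats_account_io(&md->stats, bio_data_dir(bio),
                            bio->bi_iter.bi_sector, bio_sectors(bio),
                            false, 0, &io->stats_aux);
        /* request path */
        dm_stats_account_io(&md->stats, rq_data_dir(orig), blk_rq_pos(orig),
                            tio->n_sectors, false, 0, &tio->stats_aux);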

This patch is compile tested only.

Signed-off-by: Mike Christie 
---
 drivers/md/dm-stats.c | 6 +++---
 drivers/md/dm.c   | 8 
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/md/dm-stats.c b/drivers/md/dm-stats.c
index 8289804..96b5c1b 100644
--- a/drivers/md/dm-stats.c
+++ b/drivers/md/dm-stats.c
@@ -518,7 +518,7 @@ static void dm_stat_for_entry(struct dm_stat *s, size_t 
entry,
  struct dm_stats_aux *stats_aux, bool end,
  unsigned long duration_jiffies)
 {
-   unsigned long idx = bi_rw & REQ_WRITE;
+   unsigned long idx = bi_rw;
struct dm_stat_shared *shared = &s->stat_shared[entry];
struct dm_stat_percpu *p;
 
@@ -645,8 +645,8 @@ void dm_stats_account_io(struct dm_stats *stats, unsigned 
long bi_rw,
last = raw_cpu_ptr(stats->last);
stats_aux->merged =
(bi_sector == (ACCESS_ONCE(last->last_sector) &&
-  ((bi_rw & (REQ_WRITE | REQ_DISCARD)) ==
-   (ACCESS_ONCE(last->last_rw) & 
(REQ_WRITE | REQ_DISCARD)))
+  ((bi_rw == WRITE) ==
+   (ACCESS_ONCE(last->last_rw) == WRITE))
   ));
ACCESS_ONCE(last->last_sector) = end_sector;
ACCESS_ONCE(last->last_rw) = bi_rw;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 1777c9c..5dbdae7 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -686,7 +686,7 @@ static void start_io_acct(struct dm_io *io)
atomic_inc_return(&md->pending[rw]));
 
if (unlikely(dm_stats_used(&md->stats)))
-   dm_stats_account_io(&md->stats, bio->bi_rw, 
bio->bi_iter.bi_sector,
+   dm_stats_account_io(&md->stats, bio_data_dir(bio), 
bio->bi_iter.bi_sector,
bio_sectors(bio), false, 0, &io->stats_aux);
 }
 
@@ -701,7 +701,7 @@ static void end_io_acct(struct dm_io *io)
generic_end_io_acct(rw, &dm_disk(md)->part0, io->start_time);
 
if (unlikely(dm_stats_used(&md->stats)))
-   dm_stats_account_io(&md->stats, bio->bi_rw, 
bio->bi_iter.bi_sector,
+   dm_stats_account_io(&md->stats, bio_data_dir(bio), 
bio->bi_iter.bi_sector,
bio_sectors(bio), true, duration, 
&io->stats_aux);
 
/*
@@ -1084,7 +1084,7 @@ static void rq_end_stats(struct mapped_device *md, struct 
request *orig)
if (unlikely(dm_stats_used(&md->stats))) {
struct dm_rq_target_io *tio = tio_from_request(orig);
tio->duration_jiffies = jiffies - tio->duration_jiffies;
-   dm_stats_account_io(&md->stats, orig->cmd_flags, 
blk_rq_pos(orig),
+   dm_stats_account_io(&md->stats, rq_data_dir(orig), 
blk_rq_pos(orig),
tio->n_sectors, true, tio->duration_jiffies,
&tio->stats_aux);
}
@@ -2017,7 +2017,7 @@ static void dm_start_request(struct mapped_device *md, 
struct request *orig)
struct dm_rq_target_io *tio = tio_from_request(orig);
tio->duration_jiffies = jiffies;
tio->n_sectors = blk_rq_sectors(orig);
-   dm_stats_account_io(&md->stats, orig->cmd_flags, 
blk_rq_pos(orig),
+   dm_stats_account_io(&md->stats, rq_data_dir(orig), 
blk_rq_pos(orig),
tio->n_sectors, false, 0, &tio->stats_aux);
}
 
-- 
1.8.3.1



[PATCH 27/35] drivers: set request op to REQ_OP

2016-01-05 Thread mchristi
From: Mike Christie 

This patch has the block drivers use request->op for REQ_OP
operations and cmd_flags for rq_flag_bits.
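
The driver-side pattern, as a short sketch (handle_trim()/handle_flush()
are hypothetical names, not from the patch): the operation is read from
req->op, while modifier bits such as REQ_FLUSH/REQ_FUA stay in cmd_flags.

        if (req->op == REQ_OP_DISCARD)
                handle_trim(req);       /* was: cmd_flags & REQ_DISCARD */
        else if (req->cmd_flags & REQ_FLUSH)
                handle_flush(req);      /* modifier bits remain flags */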

I have only tested scsi and rbd.

Signed-off-by: Mike Christie 
---
 drivers/block/loop.c  |  6 +++---
 drivers/block/mtip32xx/mtip32xx.c |  2 +-
 drivers/block/nbd.c   |  2 +-
 drivers/block/rbd.c   |  2 +-
 drivers/block/skd_main.c  | 11 ---
 drivers/block/xen-blkfront.c  |  8 +---
 drivers/md/dm.c   |  2 +-
 drivers/mmc/card/block.c  |  7 +++
 drivers/mmc/card/queue.c  |  6 ++
 drivers/mmc/card/queue.h  |  5 -
 drivers/mtd/mtd_blkdevs.c |  2 +-
 drivers/nvme/host/pci.c   |  4 ++--
 drivers/scsi/sd.c | 25 -
 13 files changed, 44 insertions(+), 38 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 423f4ca..e771bab 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -538,7 +538,7 @@ static int do_req_filebacked(struct loop_device *lo, struct 
request *rq)
if (rq->cmd_flags & REQ_WRITE) {
if (rq->cmd_flags & REQ_FLUSH)
ret = lo_req_flush(lo, rq);
-   else if (rq->cmd_flags & REQ_DISCARD)
+   else if (rq->op == REQ_OP_DISCARD)
ret = lo_discard(lo, rq, pos);
else if (lo->transfer)
ret = lo_write_transfer(lo, rq, pos);
@@ -1653,8 +1653,8 @@ static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
if (lo->lo_state != Lo_bound)
return -EIO;
 
-   if (lo->use_dio && !(cmd->rq->cmd_flags & (REQ_FLUSH |
-   REQ_DISCARD)))
+   if (lo->use_dio && (!(cmd->rq->cmd_flags & REQ_FLUSH) ||
+cmd->rq->op == REQ_OP_DISCARD))
cmd->use_aio = true;
else
cmd->use_aio = false;
diff --git a/drivers/block/mtip32xx/mtip32xx.c 
b/drivers/block/mtip32xx/mtip32xx.c
index 618c24f..8751caa 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -3668,7 +3668,7 @@ static int mtip_submit_request(struct blk_mq_hw_ctx 
*hctx, struct request *rq)
return -ENXIO;
}
 
-   if (rq->cmd_flags & REQ_DISCARD) {
+   if (rq->op == REQ_OP_DISCARD) {
int err;
 
err = mtip_send_trim(dd, blk_rq_pos(rq), blk_rq_sectors(rq));
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 93b3f99..8e8f7e3 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -242,7 +242,7 @@ static int nbd_send_req(struct nbd_device *nbd, struct 
request *req)
 
if (req->cmd_type == REQ_TYPE_DRV_PRIV)
type = NBD_CMD_DISC;
-   else if (req->cmd_flags & REQ_DISCARD)
+   else if (req->op == REQ_OP_DISCARD)
type = NBD_CMD_TRIM;
else if (req->cmd_flags & REQ_FLUSH)
type = NBD_CMD_FLUSH;
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 81ea69f..ea326ef 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -3373,7 +3373,7 @@ static void rbd_queue_workfn(struct work_struct *work)
goto err;
}
 
-   if (rq->cmd_flags & REQ_DISCARD)
+   if (rq->op == REQ_OP_DISCARD)
op_type = OBJ_OP_DISCARD;
else if (rq->cmd_flags & REQ_WRITE)
op_type = OBJ_OP_WRITE;
diff --git a/drivers/block/skd_main.c b/drivers/block/skd_main.c
index 586f916..f89a0c8 100644
--- a/drivers/block/skd_main.c
+++ b/drivers/block/skd_main.c
@@ -576,7 +576,6 @@ static void skd_request_fn(struct request_queue *q)
struct request *req = NULL;
struct skd_scsi_request *scsi_req;
struct page *page;
-   unsigned long io_flags;
int error;
u32 lba;
u32 count;
@@ -624,12 +623,11 @@ static void skd_request_fn(struct request_queue *q)
lba = (u32)blk_rq_pos(req);
count = blk_rq_sectors(req);
data_dir = rq_data_dir(req);
-   io_flags = req->cmd_flags;
 
-   if (io_flags & REQ_FLUSH)
+   if (req->cmd_flags & REQ_FLUSH)
flush++;
 
-   if (io_flags & REQ_FUA)
+   if (req->cmd_flags & REQ_FUA)
fua++;
 
pr_debug("%s:%s:%d new req=%p lba=%u(0x%x) "
@@ -735,7 +733,7 @@ static void skd_request_fn(struct request_queue *q)
else
skreq->sg_data_dir = SKD_DATA_DIR_HOST_TO_CARD;
 
-   if (io_flags & REQ_DISCARD) {
+   if (req->op == REQ_OP_DISCARD) {
page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
if (!page) {
pr_err("request_fn:Page allocation failed.\n");
@@ -852,9 +850,8 @@ static void skd_end_request(struct skd_device *skdev,
   

[PATCH 25/35] target: set bi_op to REQ_OP

2016-01-05 Thread mchristi
From: Mike Christie 

This patch has the target modules set the bio bi_op to a REQ_OP, and
rq_flag_bits to bi_rw.
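
A minimal sketch of the new bio setup (mirroring the iblock_get_bio()
change below): the operation goes into bi_op and only rq_flag_bits remain
in bi_rw.

        bio->bi_op = REQ_OP_WRITE;
        bio->bi_rw |= WRITE_FUA;        /* modifier bits only */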

This patch is compile tested only.

Signed-off-by: Mike Christie 
---
 drivers/target/target_core_iblock.c | 38 ++---
 drivers/target/target_core_pscsi.c  |  2 +-
 2 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/drivers/target/target_core_iblock.c 
b/drivers/target/target_core_iblock.c
index bfc3e45..b83195b 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -330,7 +330,8 @@ static void iblock_bio_done(struct bio *bio)
 }
 
 static struct bio *
-iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 sg_num, int rw)
+iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 sg_num, int op,
+  int op_flags)
 {
struct iblock_dev *ib_dev = IBLOCK_DEV(cmd->se_dev);
struct bio *bio;
@@ -352,7 +353,8 @@ iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 
sg_num, int rw)
bio->bi_private = cmd;
bio->bi_end_io = &iblock_bio_done;
bio->bi_iter.bi_sector = lba;
-   bio->bi_rw |= rw;
+   bio->bi_op = op;
+   bio->bi_rw |= op_flags;
 
return bio;
 }
@@ -458,7 +460,7 @@ iblock_execute_write_same(struct se_cmd *cmd)
goto fail;
cmd->priv = ibr;
 
-   bio = iblock_get_bio(cmd, block_lba, 1, WRITE);
+   bio = iblock_get_bio(cmd, block_lba, 1, REQ_OP_WRITE, 0);
if (!bio)
goto fail_free_ibr;
 
@@ -471,7 +473,8 @@ iblock_execute_write_same(struct se_cmd *cmd)
while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset)
!= sg->length) {
 
-   bio = iblock_get_bio(cmd, block_lba, 1, WRITE);
+   bio = iblock_get_bio(cmd, block_lba, 1, REQ_OP_WRITE,
+0);
if (!bio)
goto fail_put_bios;
 
@@ -657,7 +660,8 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist 
*sgl, u32 sgl_nents,
u32 sg_num = sgl_nents;
sector_t block_lba;
unsigned bio_cnt;
-   int rw = 0;
+   int op_flags = 0;
+   int op = 0;
int i;
 
if (data_direction == DMA_TO_DEVICE) {
@@ -668,17 +672,20 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist 
*sgl, u32 sgl_nents,
 * is not enabled, or if initiator set the Force Unit Access 
bit.
 */
if (q->flush_flags & REQ_FUA) {
-   if (cmd->se_cmd_flags & SCF_FUA)
-   rw = WRITE_FUA;
-   else if (!(q->flush_flags & REQ_FLUSH))
-   rw = WRITE_FUA;
-   else
-   rw = WRITE;
+   if (cmd->se_cmd_flags & SCF_FUA) {
+   op = REQ_OP_WRITE;
+   op_flags = WRITE_FUA;
+   } else if (!(q->flush_flags & REQ_FLUSH)) {
+   op = REQ_OP_WRITE;
+   op_flags = WRITE_FUA;
+   } else {
+   op = REQ_OP_WRITE;
+   }
} else {
-   rw = WRITE;
+   op = REQ_OP_WRITE;
}
} else {
-   rw = READ;
+   op = REQ_OP_READ;
}
 
/*
@@ -710,7 +717,7 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist 
*sgl, u32 sgl_nents,
return 0;
}
 
-   bio = iblock_get_bio(cmd, block_lba, sgl_nents, rw);
+   bio = iblock_get_bio(cmd, block_lba, sgl_nents, op, op_flags);
if (!bio)
goto fail_free_ibr;
 
@@ -734,7 +741,8 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist 
*sgl, u32 sgl_nents,
bio_cnt = 0;
}
 
-   bio = iblock_get_bio(cmd, block_lba, sg_num, rw);
+   bio = iblock_get_bio(cmd, block_lba, sg_num, op,
+op_flags);
if (!bio)
goto fail_put_bios;
 
diff --git a/drivers/target/target_core_pscsi.c 
b/drivers/target/target_core_pscsi.c
index de18790..2cf915c 100644
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -922,7 +922,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, 
u32 sgl_nents,
goto fail;
 
if (rw)
-   bio->bi_rw |= REQ_WRITE;
+   bio->bi_op = REQ_OP_WRITE;
 
pr_debug("PSCSI: Allocated bio: %p,"
" dir: %s nr_vecs: %d\n", bio,
-- 
1.8.3.1


[PATCH 28/35] blktrace: get op from req->op/bio->bi_op

2016-01-05 Thread mchristi
From: Mike Christie 

The bio and request struct now store the operation in
bio->bi_op/request->op. This patch has blktrace read the operation
from those fields instead of checking bi_rw/cmd_flags for it.
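
A sketch of the resulting tracing call (matching the hunks below):
blk_fill_rwbs() now takes the op and the remaining flags as separate
arguments.

        blk_fill_rwbs(__entry->rwbs, bio->bi_op, bio->bi_rw,
                      bio->bi_iter.bi_size);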

This patch is only compile tested.

Signed-off-by: Mike Christie 
---
 include/linux/blktrace_api.h  |  2 +-
 include/trace/events/bcache.h | 12 ++
 include/trace/events/block.h  | 31 +-
 kernel/trace/blktrace.c   | 52 +++
 4 files changed, 57 insertions(+), 40 deletions(-)

diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h
index afc1343..ee25ba4 100644
--- a/include/linux/blktrace_api.h
+++ b/include/linux/blktrace_api.h
@@ -109,7 +109,7 @@ static inline int blk_cmd_buf_len(struct request *rq)
 }
 
 extern void blk_dump_cmd(char *buf, struct request *rq);
-extern void blk_fill_rwbs(char *rwbs, u32 rw, int bytes);
+extern void blk_fill_rwbs(char *rwbs, int op, u32 rw, int bytes);
 
 #endif /* CONFIG_EVENT_TRACING && CONFIG_BLOCK */
 
diff --git a/include/trace/events/bcache.h b/include/trace/events/bcache.h
index 981acf7..8abe564 100644
--- a/include/trace/events/bcache.h
+++ b/include/trace/events/bcache.h
@@ -27,7 +27,8 @@ DECLARE_EVENT_CLASS(bcache_request,
__entry->sector = bio->bi_iter.bi_sector;
__entry->orig_sector= bio->bi_iter.bi_sector - 16;
__entry->nr_sector  = bio->bi_iter.bi_size >> 9;
-   blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_iter.bi_size);
+   blk_fill_rwbs(__entry->rwbs, bio->bi_op, bio->bi_rw,
+ bio->bi_iter.bi_size);
),
 
TP_printk("%d,%d %s %llu + %u (from %d,%d @ %llu)",
@@ -101,7 +102,8 @@ DECLARE_EVENT_CLASS(bcache_bio,
__entry->dev= bio->bi_bdev->bd_dev;
__entry->sector = bio->bi_iter.bi_sector;
__entry->nr_sector  = bio->bi_iter.bi_size >> 9;
-   blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_iter.bi_size);
+   blk_fill_rwbs(__entry->rwbs, bio->bi_op, bio->bi_rw,
+ bio->bi_iter.bi_size);
),
 
TP_printk("%d,%d  %s %llu + %u",
@@ -136,7 +138,8 @@ TRACE_EVENT(bcache_read,
__entry->dev= bio->bi_bdev->bd_dev;
__entry->sector = bio->bi_iter.bi_sector;
__entry->nr_sector  = bio->bi_iter.bi_size >> 9;
-   blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_iter.bi_size);
+   blk_fill_rwbs(__entry->rwbs, bio->bi_op, bio->bi_rw,
+ bio->bi_iter.bi_size);
__entry->cache_hit = hit;
__entry->bypass = bypass;
),
@@ -167,7 +170,8 @@ TRACE_EVENT(bcache_write,
__entry->inode  = inode;
__entry->sector = bio->bi_iter.bi_sector;
__entry->nr_sector  = bio->bi_iter.bi_size >> 9;
-   blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_iter.bi_size);
+   blk_fill_rwbs(__entry->rwbs, bio->bi_op, bio->bi_rw,
+ bio->bi_iter.bi_size);
__entry->writeback = writeback;
__entry->bypass = bypass;
),
diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index e8a5eca..4416dcd 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -84,7 +84,8 @@ DECLARE_EVENT_CLASS(block_rq_with_error,
0 : blk_rq_sectors(rq);
__entry->errors= rq->errors;
 
-   blk_fill_rwbs(__entry->rwbs, rq->cmd_flags, blk_rq_bytes(rq));
+   blk_fill_rwbs(__entry->rwbs, rq->op, rq->cmd_flags,
+ blk_rq_bytes(rq));
blk_dump_cmd(__get_str(cmd), rq);
),
 
@@ -162,7 +163,7 @@ TRACE_EVENT(block_rq_complete,
__entry->nr_sector = nr_bytes >> 9;
__entry->errors= rq->errors;
 
-   blk_fill_rwbs(__entry->rwbs, rq->cmd_flags, nr_bytes);
+   blk_fill_rwbs(__entry->rwbs, rq->op, rq->cmd_flags, nr_bytes);
blk_dump_cmd(__get_str(cmd), rq);
),
 
@@ -198,7 +199,8 @@ DECLARE_EVENT_CLASS(block_rq,
__entry->bytes = (rq->cmd_type == REQ_TYPE_BLOCK_PC) ?
blk_rq_bytes(rq) : 0;
 
-   blk_fill_rwbs(__entry->rwbs, rq->cmd_flags, blk_rq_bytes(rq));
+   blk_fill_rwbs(__entry->rwbs, rq->op, rq->cmd_flags,
+ blk_rq_bytes(rq));
blk_dump_cmd(__get_str(cmd), rq);
memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
),
@@ -272,7 +274,8 @@ TRACE_EVENT(block_bio_bounce,
  bio->bi_bdev->bd_dev : 0;
__entry->sector = bio->bi_iter.bi_sector;
__entry->nr_sector 

[PATCH 23/35] md/raid: set bi_op to REQ_OP

2016-01-05 Thread mchristi
From: Mike Christie 

This patch has md/raid set the bio bi_op to a REQ_OP, and
rq_flag_bits to bi_rw.
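
A sketch of the changed helper signature (as used in the hunks below):
sync_page_io() now takes an op and separate op_flags instead of a single
rw word.

        if (!sync_page_io(rdev, 0, size, rdev->sb_page, REQ_OP_READ, 0, true))
                goto fail;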

This patch is compile tested only.

Signed-off-by: Mike Christie 
---
 drivers/md/bitmap.c  |  2 +-
 drivers/md/dm-raid.c |  5 +++--
 drivers/md/md.c  | 11 +++
 drivers/md/md.h  |  3 ++-
 drivers/md/raid1.c   | 34 
 drivers/md/raid10.c  | 50 ++--
 drivers/md/raid5-cache.c | 25 +++-
 drivers/md/raid5.c   | 48 ++
 8 files changed, 101 insertions(+), 77 deletions(-)

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 13811fc..18458f2 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -160,7 +160,7 @@ static int read_sb_page(struct mddev *mddev, loff_t offset,
 
if (sync_page_io(rdev, target,
 roundup(size, 
bdev_logical_block_size(rdev->bdev)),
-page, READ, true)) {
+page, REQ_OP_READ, 0, true)) {
page->index = index;
return 0;
}
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index a090121..43a749c 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -792,7 +792,7 @@ static int read_disk_sb(struct md_rdev *rdev, int size)
if (rdev->sb_loaded)
return 0;
 
-   if (!sync_page_io(rdev, 0, size, rdev->sb_page, READ, 1)) {
+   if (!sync_page_io(rdev, 0, size, rdev->sb_page, REQ_OP_READ, 0, 1)) {
DMERR("Failed to read superblock of device at position %d",
  rdev->raid_disk);
md_error(rdev->mddev, rdev);
@@ -1646,7 +1646,8 @@ static void attempt_restore_of_faulty_devices(struct 
raid_set *rs)
for (i = 0; i < rs->md.raid_disks; i++) {
r = &rs->dev[i].rdev;
if (test_bit(Faulty, &r->flags) && r->sb_page &&
-   sync_page_io(r, 0, r->sb_size, r->sb_page, READ, 1)) {
+   sync_page_io(r, 0, r->sb_size, r->sb_page, REQ_OP_READ, 0,
+1)) {
DMINFO("Faulty %s device #%d has readable super block."
   "  Attempting to revive it.",
   rs->raid_type->name, i);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index e25ef97..ee1ef20 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -401,6 +401,7 @@ static void submit_flushes(struct work_struct *ws)
bi->bi_end_io = md_end_flush;
bi->bi_private = rdev;
bi->bi_bdev = rdev->bdev;
+   bi->bi_op = REQ_OP_WRITE;
bi->bi_rw |= WRITE_FLUSH;
atomic_inc(&mddev->flush_pending);
submit_bio(bi);
@@ -747,6 +748,7 @@ void md_super_write(struct mddev *mddev, struct md_rdev 
*rdev,
bio_add_page(bio, page, size, 0);
bio->bi_private = rdev;
bio->bi_end_io = super_written;
+   bio->bi_op = REQ_OP_WRITE;
bio->bi_rw |= WRITE_FLUSH_FUA;
 
atomic_inc(&mddev->pending_writes);
@@ -760,14 +762,15 @@ void md_super_wait(struct mddev *mddev)
 }
 
 int sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
-struct page *page, int rw, bool metadata_op)
+struct page *page, int op, int op_flags, bool metadata_op)
 {
struct bio *bio = bio_alloc_mddev(GFP_NOIO, 1, rdev->mddev);
int ret;
 
bio->bi_bdev = (metadata_op && rdev->meta_bdev) ?
rdev->meta_bdev : rdev->bdev;
-   bio->bi_rw |= rw;
+   bio->bi_op = op;
+   bio->bi_rw |= op_flags;
if (metadata_op)
bio->bi_iter.bi_sector = sector + rdev->sb_start;
else if (rdev->mddev->reshape_position != MaxSector &&
@@ -793,7 +796,7 @@ static int read_disk_sb(struct md_rdev *rdev, int size)
if (rdev->sb_loaded)
return 0;
 
-   if (!sync_page_io(rdev, 0, size, rdev->sb_page, READ, true))
+   if (!sync_page_io(rdev, 0, size, rdev->sb_page, REQ_OP_READ, 0, true))
goto fail;
rdev->sb_loaded = 1;
return 0;
@@ -1479,7 +1482,7 @@ static int super_1_load(struct md_rdev *rdev, struct 
md_rdev *refdev, int minor_
return -EINVAL;
bb_sector = (long long)offset;
if (!sync_page_io(rdev, bb_sector, sectors << 9,
- rdev->bb_page, READ, true))
+ rdev->bb_page, REQ_OP_READ, 0, true))
return -EIO;
bbp = (u64 *)page_address(rdev->bb_page);
rdev->badblocks.shift = sb->bblog_shift;
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 5eb0475..d542059 100644
--- a/drivers/md/md.h
+++

[PATCH 30/35] block, fs, drivers: do not test bi_rw for REQ_OPs

2016-01-05 Thread mchristi
From: Mike Christie 

We no longer use the bio->bi_rw field for REQ_OPs: REQ_WRITE,
REQ_DISCARD, REQ_WRITE_SAME, so this patch stops checking
for them in bi_rw and also removes the related compat code.
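
The conversion pattern, in short (a sketch taken from the bio.c hunk
below): flag tests on bi_rw become equality checks on bi_op.

        if (bio->bi_op == REQ_OP_DISCARD)   /* was: bio->bi_rw & REQ_DISCARD */
                split = bio_clone_bioset(bio, gfp, bs);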

Signed-off-by: Mike Christie 
---
 block/bio.c |  6 +++---
 block/blk-core.c| 31 -
 block/blk-merge.c   | 14 ++---
 block/blk-mq.c  |  3 +--
 drivers/ata/libata-scsi.c   |  2 +-
 drivers/block/brd.c |  2 +-
 drivers/block/drbd/drbd_main.c  | 14 ++---
 drivers/block/drbd/drbd_worker.c|  4 ++--
 drivers/block/loop.c|  6 +++---
 drivers/block/rbd.c |  2 +-
 drivers/block/rsxx/dma.c|  2 +-
 drivers/block/umem.c|  2 +-
 drivers/block/zram/zram_drv.c   |  2 +-
 drivers/ide/ide-floppy.c|  2 +-
 drivers/lightnvm/rrpc.c |  2 +-
 drivers/md/bcache/request.c | 10 +-
 drivers/md/dm-cache-target.c|  9 +
 drivers/md/dm-crypt.c   |  2 +-
 drivers/md/dm-log-writes.c  |  2 +-
 drivers/md/dm-raid1.c   |  8 
 drivers/md/dm-region-hash.c |  4 ++--
 drivers/md/dm-stripe.c  |  4 ++--
 drivers/md/dm-thin.c| 15 --
 drivers/md/dm.c |  6 +++---
 drivers/md/linear.c |  2 +-
 drivers/md/raid0.c  |  2 +-
 drivers/scsi/osd/osd_initiator.c|  4 ++--
 drivers/staging/lustre/lustre/llite/lloop.c |  8 
 include/linux/bio.h | 15 +-
 include/linux/fs.h  | 25 ---
 30 files changed, 97 insertions(+), 113 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 3b8e970..ca0c52d 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -669,10 +669,10 @@ struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t 
gfp_mask,
bio->bi_iter.bi_sector  = bio_src->bi_iter.bi_sector;
bio->bi_iter.bi_size= bio_src->bi_iter.bi_size;
 
-   if (bio->bi_rw & REQ_DISCARD)
+   if (bio->bi_op == REQ_OP_DISCARD)
goto integrity_clone;
 
-   if (bio->bi_rw & REQ_WRITE_SAME) {
+   if (bio->bi_op == REQ_OP_WRITE_SAME) {
bio->bi_io_vec[bio->bi_vcnt++] = bio_src->bi_io_vec[0];
goto integrity_clone;
}
@@ -1792,7 +1792,7 @@ struct bio *bio_split(struct bio *bio, int sectors,
 * Discards need a mutable bio_vec to accommodate the payload
 * required by the DSM TRIM and UNMAP commands.
 */
-   if (bio->bi_rw & REQ_DISCARD)
+   if (bio->bi_op == REQ_OP_DISCARD)
split = bio_clone_bioset(bio, gfp, bs);
else
split = bio_clone_fast(bio, gfp, bs);
diff --git a/block/blk-core.c b/block/blk-core.c
index dacbd68..0f6cb5c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1704,8 +1704,7 @@ void init_request_from_bio(struct request *req, struct 
bio *bio)
 {
req->cmd_type = REQ_TYPE_FS;
 
-   /* tmp compat. Allow users to set bi_op or bi_rw */
-   req->cmd_flags |= (bio->bi_rw | bio->bi_op) & REQ_COMMON_MASK;
+   req->cmd_flags |= bio->bi_rw & REQ_COMMON_MASK;
if (bio->bi_rw & REQ_RAHEAD)
req->cmd_flags |= REQ_FAILFAST_MASK;
 
@@ -1855,9 +1854,9 @@ static void handle_bad_sector(struct bio *bio)
char b[BDEVNAME_SIZE];
 
printk(KERN_INFO "attempt to access beyond end of device\n");
-   printk(KERN_INFO "%s: rw=%ld, want=%Lu, limit=%Lu\n",
+   printk(KERN_INFO "%s: rw=%d,%ld, want=%Lu, limit=%Lu\n",
bdevname(bio->bi_bdev, b),
-   bio->bi_rw,
+   bio->bi_op, bio->bi_rw,
(unsigned long long)bio_end_sector(bio),
(long long)(i_size_read(bio->bi_bdev->bd_inode) >> 9));
 }
@@ -1978,14 +1977,14 @@ generic_make_request_checks(struct bio *bio)
}
}
 
-   if ((bio->bi_rw & REQ_DISCARD) &&
+   if ((bio->bi_op == REQ_OP_DISCARD) &&
(!blk_queue_discard(q) ||
 ((bio->bi_rw & REQ_SECURE) && !blk_queue_secdiscard(q {
err = -EOPNOTSUPP;
goto end_io;
}
 
-   if (bio->bi_rw & REQ_WRITE_SAME && !bdev_write_same(bio->bi_bdev)) {
+   if (bio->bi_op == REQ_OP_WRITE_SAME && !bdev_write_same(bio->bi_bdev)) {
err = -EOPNOTSUPP;
goto end_io;
}
@@ -2039,12 +2038,6 @@ blk_qc_t generic_make_request(struct bio *bio)
struct bio_list bio_list_on_stack;
blk_qc_t ret = BLK_QC_T_NONE;
 
-   /* tmp compat. Allow users to

[PATCH 24/35] xen: set bi_op to REQ_OP

2016-01-05 Thread mchristi
From: Mike Christie 

This patch has xen set the bio bi_op to a REQ_OP, and rq_flag_bits
to bi_rw.

This patch is compile tested only.

Signed-off-by: Mike Christie 
---
 drivers/block/xen-blkback/blkback.c | 29 +
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index c7e89af..c7d9643 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -488,7 +488,7 @@ static int xen_vbd_translate(struct phys_req *req, struct 
xen_blkif *blkif,
struct xen_vbd *vbd = &blkif->vbd;
int rc = -EACCES;
 
-   if ((operation != READ) && vbd->readonly)
+   if ((operation != REQ_OP_READ) && vbd->readonly)
goto out;
 
if (likely(req->nr_sects)) {
@@ -995,7 +995,7 @@ static int dispatch_discard_io(struct xen_blkif *blkif,
preq.sector_number = req->u.discard.sector_number;
preq.nr_sects  = req->u.discard.nr_sectors;
 
-   err = xen_vbd_translate(&preq, blkif, WRITE);
+   err = xen_vbd_translate(&preq, blkif, REQ_OP_WRITE);
if (err) {
pr_warn("access denied: DISCARD [%llu->%llu] on dev=%04x\n",
preq.sector_number,
@@ -1208,6 +1208,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
struct bio **biolist = pending_req->biolist;
int i, nbio = 0;
int operation;
+   int operation_flags = 0;
struct blk_plug plug;
bool drain = false;
struct grant_page **pages = pending_req->segments;
@@ -1226,17 +1227,19 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
switch (req_operation) {
case BLKIF_OP_READ:
blkif->st_rd_req++;
-   operation = READ;
+   operation = REQ_OP_READ;
break;
case BLKIF_OP_WRITE:
blkif->st_wr_req++;
-   operation = WRITE_ODIRECT;
+   operation = REQ_OP_WRITE;
+   operation_flags = WRITE_ODIRECT;
break;
case BLKIF_OP_WRITE_BARRIER:
drain = true;
case BLKIF_OP_FLUSH_DISKCACHE:
blkif->st_f_req++;
-   operation = WRITE_FLUSH;
+   operation = REQ_OP_WRITE;
+   operation_flags = WRITE_FLUSH;
break;
default:
operation = 0; /* make gcc happy */
@@ -1248,7 +1251,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
nseg = req->operation == BLKIF_OP_INDIRECT ?
   req->u.indirect.nr_segments : req->u.rw.nr_segments;
 
-   if (unlikely(nseg == 0 && operation != WRITE_FLUSH) ||
+   if (unlikely(nseg == 0 && operation_flags != WRITE_FLUSH) ||
unlikely((req->operation != BLKIF_OP_INDIRECT) &&
 (nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST)) ||
unlikely((req->operation == BLKIF_OP_INDIRECT) &&
@@ -1289,7 +1292,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 
if (xen_vbd_translate(&preq, blkif, operation) != 0) {
pr_debug("access denied: %s of [%llu,%llu] on dev=%04x\n",
-operation == READ ? "read" : "write",
+operation == REQ_OP_READ ? "read" : "write",
 preq.sector_number,
 preq.sector_number + preq.nr_sects,
 blkif->vbd.pdevice);
@@ -1348,7 +1351,8 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
bio->bi_private = pending_req;
bio->bi_end_io  = end_block_io_op;
bio->bi_iter.bi_sector  = preq.sector_number;
-   bio->bi_rw  |= operation;
+   bio->bi_op  = operation;
+   bio->bi_rw  |= operation_flags;
}
 
preq.sector_number += seg[i].nsec;
@@ -1356,7 +1360,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 
/* This will be hit if the operation was a flush or discard. */
if (!bio) {
-   BUG_ON(operation != WRITE_FLUSH);
+   BUG_ON(operation_flags != WRITE_FLUSH);
 
bio = bio_alloc(GFP_KERNEL, 0);
if (unlikely(bio == NULL))
@@ -1366,7 +1370,8 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
bio->bi_bdev= preq.bdev;
bio->bi_private = pending_req;
bio->bi_end_io  = end_block_io_op;
-   bio->bi_rw  |= operation;
+   bio->bi_op  = operation;
+   bio->bi_rw  |= operation_flags;
}
 
atomic_set(&pending_req->pendcnt, nbio);
@@ -1378,9 +1383,9 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
/* Let the I/Os go.. */
blk_finish_plug(&plug);
 
-   if (operation == READ)
+   if (operation 

[PATCH 31/35] block, fs: remove old REQ definitions.

2016-01-05 Thread mchristi
From: Mike Christie 

We no longer use REQ_WRITE, REQ_WRITE_SAME and REQ_DISCARD,
so this patch removes them.

Signed-off-by: Mike Christie 
---
 include/linux/blk_types.h   | 19 +--
 include/linux/fs.h  | 21 +++--
 include/trace/events/f2fs.h |  1 -
 3 files changed, 16 insertions(+), 25 deletions(-)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 6e49c91..bb30c2b 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -151,7 +151,6 @@ struct bio {
  */
 enum rq_flag_bits {
/* common flags */
-   __REQ_WRITE,/* not set, read. set, write */
__REQ_FAILFAST_DEV, /* no driver retries of device errors */
__REQ_FAILFAST_TRANSPORT, /* no driver retries of transport errors */
__REQ_FAILFAST_DRIVER,  /* no driver retries of driver errors */
@@ -159,9 +158,7 @@ enum rq_flag_bits {
__REQ_SYNC, /* request is sync (sync write or read) */
__REQ_META, /* metadata io request */
__REQ_PRIO, /* boost priority in cfq */
-   __REQ_DISCARD,  /* request to discard sectors */
__REQ_SECURE,   /* secure discard (used with __REQ_DISCARD) */
-   __REQ_WRITE_SAME,   /* write same block many times */
 
__REQ_NOIDLE,   /* don't anticipate more IO after this one */
__REQ_INTEGRITY,/* I/O includes block integrity payload */
@@ -197,28 +194,22 @@ enum rq_flag_bits {
__REQ_NR_BITS,  /* stops here */
 };
 
-#define REQ_WRITE  (1ULL << __REQ_WRITE)
 #define REQ_FAILFAST_DEV   (1ULL << __REQ_FAILFAST_DEV)
 #define REQ_FAILFAST_TRANSPORT (1ULL << __REQ_FAILFAST_TRANSPORT)
 #define REQ_FAILFAST_DRIVER(1ULL << __REQ_FAILFAST_DRIVER)
 #define REQ_SYNC   (1ULL << __REQ_SYNC)
 #define REQ_META   (1ULL << __REQ_META)
 #define REQ_PRIO   (1ULL << __REQ_PRIO)
-#define REQ_DISCARD(1ULL << __REQ_DISCARD)
-#define REQ_WRITE_SAME (1ULL << __REQ_WRITE_SAME)
 #define REQ_NOIDLE (1ULL << __REQ_NOIDLE)
 #define REQ_INTEGRITY  (1ULL << __REQ_INTEGRITY)
 
 #define REQ_FAILFAST_MASK \
(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
 #define REQ_COMMON_MASK \
-   (REQ_WRITE | REQ_FAILFAST_MASK | REQ_SYNC | REQ_META | REQ_PRIO | \
-REQ_DISCARD | REQ_WRITE_SAME | REQ_NOIDLE | REQ_FLUSH | REQ_FUA | \
-REQ_SECURE | REQ_INTEGRITY)
+   (REQ_FAILFAST_MASK | REQ_SYNC | REQ_META | REQ_PRIO | REQ_NOIDLE | \
+REQ_FLUSH | REQ_FUA | REQ_SECURE | REQ_INTEGRITY)
 #define REQ_CLONE_MASK REQ_COMMON_MASK
 
-#define BIO_NO_ADVANCE_ITER_MASK   (REQ_DISCARD|REQ_WRITE_SAME)
-
 /* This mask is used for both bio and request merge checking */
 #define REQ_NOMERGE_FLAGS \
(REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | REQ_FUA | 
REQ_FLUSH_SEQ)
@@ -250,9 +241,9 @@ enum rq_flag_bits {
 
 enum req_op {
REQ_OP_READ,
-   REQ_OP_WRITE= REQ_WRITE,
-   REQ_OP_DISCARD  = REQ_DISCARD,
-   REQ_OP_WRITE_SAME   = REQ_WRITE_SAME,
+   REQ_OP_WRITE,
+   REQ_OP_DISCARD, /* request to discard sectors */
+   REQ_OP_WRITE_SAME,  /* write same block many times */
 };
 
 typedef unsigned int blk_qc_t;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 85007cd..d57a5b5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -151,9 +151,10 @@ typedef void (dax_iodone_t)(struct buffer_head *bh_map, 
int uptodate);
 #define CHECK_IOVEC_ONLY -1
 
 /*
- * The below are the various read and write types that we support. Some of
+ * The below are the various read and write flags that we support. Some of
  * them include behavioral modifiers that send information down to the
- * block layer and IO scheduler. Terminology:
+ * block layer and IO scheduler. They should be used along with a req_op.
+ * Terminology:
  *
  * The block layer uses device plugging to defer IO a little bit, in
  * the hope that we will see more IO very shortly. This increases
@@ -192,19 +193,19 @@ typedef void (dax_iodone_t)(struct buffer_head *bh_map, 
int uptodate);
  * non-volatile media on completion.
  *
  */
-#define RW_MASKREQ_WRITE
+#define RW_MASKREQ_OP_WRITE
 #define RWA_MASK   REQ_RAHEAD
 
-#define READ   0
+#define READ   REQ_OP_READ
 #define WRITE  RW_MASK
 #define READA  RWA_MASK
 
-#define READ_SYNC  (READ | REQ_SYNC)
-#define WRITE_SYNC (WRITE | REQ_SYNC | REQ_NOIDLE)
-#define WRITE_ODIRECT  (WRITE | REQ_SYNC)
-#define WRITE_FLUSH(WRITE | REQ_SYNC | REQ_NOIDLE | REQ_FLUSH)
-#define WRITE_FUA  (WRITE | REQ_SYNC | REQ_NOIDLE | REQ_FUA)
-#define WRITE_FLUSH_FUA(WRITE | REQ_SYNC

[PATCH 34/35] block: add QUEUE_FLAGs for flush and fua

2016-01-05 Thread mchristi
From: Mike Christie 

The last patch added a REQ_OP_FLUSH for request_fn drivers
and the next patch renames REQ_FLUSH to REQ_PREFLUSH which
will be used by file systems and make_request_fn drivers.

This leaves REQ_FLUSH/REQ_FUA defined for drivers to tell
the block layer if flush/fua is supported. The names are
confusing and I bet they will accidentally be used by
people to request flushes. To avoid that, this patch adds
QUEUE_FLAGs for flush and fua which drivers will use to
indicate what they support.
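
A sketch of the consumer side, using the blk_queue_flush()/blk_queue_fua()
helpers as they appear in the blk-flush.c hunk below (driver-side setup is
in the per-driver hunks): the flush policy now asks the queue flags
instead of q->flush_flags.

        if (blk_queue_flush(q)) {
                if (rq->cmd_flags & REQ_FLUSH)
                        policy |= REQ_FSEQ_PREFLUSH;
                if (!blk_queue_fua(q) && (rq->cmd_flags & REQ_FUA))
                        policy |= REQ_FSEQ_POSTFLUSH;
        }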

Signed-off-by: Mike Christie 
---
 block/blk-core.c|  3 +-
 block/blk-flush.c   | 12 
 block/blk-settings.c| 20 --
 drivers/block/drbd/drbd_main.c  |  3 +-
 drivers/block/loop.c|  2 +-
 drivers/block/mtip32xx/mtip32xx.c   |  3 +-
 drivers/block/nbd.c |  6 ++--
 drivers/block/osdblk.c  |  2 +-
 drivers/block/ps3disk.c |  2 +-
 drivers/block/skd_main.c|  3 +-
 drivers/block/virtio_blk.c  |  4 +--
 drivers/block/xen-blkback/xenbus.c  |  2 +-
 drivers/block/xen-blkfront.c| 55 ++---
 drivers/ide/ide-disk.c  |  6 ++--
 drivers/md/bcache/super.c   |  4 +--
 drivers/md/dm-table.c   | 32 +
 drivers/md/md.c |  3 +-
 drivers/md/raid5-cache.c|  3 +-
 drivers/mmc/card/block.c|  3 +-
 drivers/mtd/mtd_blkdevs.c   |  2 +-
 drivers/nvme/host/core.c|  6 ++--
 drivers/scsi/sd.c   | 13 +
 drivers/target/target_core_iblock.c |  6 ++--
 include/linux/blkdev.h  |  6 ++--
 24 files changed, 107 insertions(+), 94 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ae2afab..bb29230 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1969,7 +1969,8 @@ generic_make_request_checks(struct bio *bio)
 * drivers without flush support don't have to worry
 * about them.
 */
-   if ((bio->bi_rw & (REQ_FLUSH | REQ_FUA)) && !q->flush_flags) {
+   if ((bio->bi_rw & (REQ_FLUSH | REQ_FUA)) &&
+   !(blk_queue_flush(q) || blk_queue_fua(q))) {
bio->bi_rw &= ~(REQ_FLUSH | REQ_FUA);
if (!nr_sectors) {
err = 0;
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 0e5561e..633f9b3 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -95,17 +95,18 @@ enum {
 static bool blk_kick_flush(struct request_queue *q,
   struct blk_flush_queue *fq);
 
-static unsigned int blk_flush_policy(unsigned int fflags, struct request *rq)
+static unsigned int blk_flush_policy(struct request *rq)
 {
+   struct request_queue *q = rq->q;
unsigned int policy = 0;
 
if (blk_rq_sectors(rq))
policy |= REQ_FSEQ_DATA;
 
-   if (fflags & REQ_FLUSH) {
+   if (blk_queue_flush(q)) {
if (rq->cmd_flags & REQ_FLUSH)
policy |= REQ_FSEQ_PREFLUSH;
-   if (!(fflags & REQ_FUA) && (rq->cmd_flags & REQ_FUA))
+   if (!blk_queue_fua(q) && (rq->cmd_flags & REQ_FUA))
policy |= REQ_FSEQ_POSTFLUSH;
}
return policy;
@@ -385,8 +386,7 @@ static void mq_flush_data_end_io(struct request *rq, int 
error)
 void blk_insert_flush(struct request *rq)
 {
struct request_queue *q = rq->q;
-   unsigned int fflags = q->flush_flags;   /* may change, cache */
-   unsigned int policy = blk_flush_policy(fflags, rq);
+   unsigned int policy = blk_flush_policy(rq);
struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
 
/*
@@ -394,7 +394,7 @@ void blk_insert_flush(struct request *rq)
 * REQ_FLUSH and FUA for the driver.
 */
rq->cmd_flags &= ~REQ_FLUSH;
-   if (!(fflags & REQ_FUA))
+   if (!blk_queue_fua(q))
rq->cmd_flags &= ~REQ_FUA;
 
/*
diff --git a/block/blk-settings.c b/block/blk-settings.c
index dd49735..3cef016 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -820,26 +820,6 @@ void blk_queue_update_dma_alignment(struct request_queue 
*q, int mask)
 }
 EXPORT_SYMBOL(blk_queue_update_dma_alignment);
 
-/**
- * blk_queue_flush - configure queue's cache flush capability
- * @q: the request queue for the device
- * @flush: 0, REQ_FLUSH or REQ_FLUSH | REQ_FUA
- *
- * Tell block layer cache flush capability of @q.  If it supports
- * flushing, REQ_FLUSH should be set.  If it supports bypassing
- * write cache for individual writes, REQ_FUA should be set.
- */
-void blk_queue_flush(struct request_queue *q, unsigned int flush)
-{
-   WARN_ON_ONCE(flush & ~(REQ_FLUSH | REQ_FUA));
-
-   if (WARN_ON_ONCE(!(flush & REQ_FLUSH) && (flush & REQ_FUA)))
-   flush &= ~REQ_FUA;
-
-   q->flush_flags = flush & (REQ_FLUSH | REQ_FUA);
-}
-EXPORT_SYMBOL_GPL(

[PATCH 33/35] block, drivers: add REQ_OP_FLUSH operation

2016-01-05 Thread mchristi
From: Mike Christie 

This adds a REQ_OP_FLUSH operation that is sent to request_fn
based drivers by the block layer's flush code, instead of
sending requests with the request->cmd_flags REQ_FLUSH bit set.
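
A sketch of what a request_fn based driver sees after this change (taken
from the loop.c hunk below): a cache flush arrives as its own operation
rather than as a flag on a write.

        if (rq->op == REQ_OP_FLUSH)         /* was: cmd_flags & REQ_FLUSH */
                ret = lo_req_flush(lo, rq);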

For the following 3 flush related patches, I have not tested
every driver. I have only tested scsi with xfs and btrfs.

Signed-off-by: Mike Christie 
---
 Documentation/block/writeback_cache_control.txt | 6 +++---
 block/blk-flush.c   | 6 +++---
 drivers/block/loop.c| 4 ++--
 drivers/block/nbd.c | 2 +-
 drivers/block/osdblk.c  | 2 +-
 drivers/block/ps3disk.c | 4 ++--
 drivers/block/skd_main.c| 2 +-
 drivers/block/virtio_blk.c  | 2 +-
 drivers/block/xen-blkfront.c| 8 
 drivers/ide/ide-disk.c  | 2 +-
 drivers/md/dm.c | 4 ++--
 drivers/mmc/card/block.c| 5 ++---
 drivers/mmc/card/queue.h| 2 +-
 drivers/mtd/mtd_blkdevs.c   | 2 +-
 drivers/nvme/host/pci.c | 2 +-
 drivers/scsi/sd.c   | 7 +++
 include/linux/blk_types.h   | 1 +
 include/linux/blkdev.h  | 3 +++
 kernel/trace/blktrace.c | 5 -
 19 files changed, 37 insertions(+), 32 deletions(-)

diff --git a/Documentation/block/writeback_cache_control.txt 
b/Documentation/block/writeback_cache_control.txt
index 83407d3..ea5550f 100644
--- a/Documentation/block/writeback_cache_control.txt
+++ b/Documentation/block/writeback_cache_control.txt
@@ -73,9 +73,9 @@ doing:
 
blk_queue_flush(sdkp->disk->queue, REQ_FLUSH);
 
-and handle empty REQ_FLUSH requests in its prep_fn/request_fn.  Note that
+and handle empty REQ_OP_FLUSH requests in its prep_fn/request_fn.  Note that
 REQ_FLUSH requests with a payload are automatically turned into a sequence
-of an empty REQ_FLUSH request followed by the actual write by the block
+of an empty REQ_OP_FLUSH request followed by the actual write by the block
 layer.  For devices that also support the FUA bit the block layer needs
 to be told to pass through the REQ_FUA bit using:
 
@@ -83,4 +83,4 @@ to be told to pass through the REQ_FUA bit using:
 
 and the driver must handle write requests that have the REQ_FUA bit set
 in prep_fn/request_fn.  If the FUA bit is not natively supported the block
-layer turns it into an empty REQ_FLUSH request after the actual write.
+layer turns it into an empty REQ_OP_FLUSH request after the actual write.
diff --git a/block/blk-flush.c b/block/blk-flush.c
index b4eb0e8..0e5561e 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -29,7 +29,7 @@
  * The actual execution of flush is double buffered.  Whenever a request
  * needs to execute PRE or POSTFLUSH, it queues at
  * fq->flush_queue[fq->flush_pending_idx].  Once certain criteria are met, a
- * flush is issued and the pending_idx is toggled.  When the flush
+ * REQ_OP_FLUSH is issued and the pending_idx is toggled.  When the flush
  * completes, all the requests which were pending are proceeded to the next
  * step.  This allows arbitrary merging of different types of FLUSH/FUA
  * requests.
@@ -329,8 +329,8 @@ static bool blk_kick_flush(struct request_queue *q, struct 
blk_flush_queue *fq)
}
 
flush_rq->cmd_type = REQ_TYPE_FS;
-   flush_rq->cmd_flags = WRITE_FLUSH | REQ_FLUSH_SEQ;
-   flush_rq->op = REQ_OP_WRITE;
+   flush_rq->cmd_flags = REQ_SYNC | REQ_NOIDLE | REQ_FLUSH_SEQ;
+   flush_rq->op = REQ_OP_FLUSH;
flush_rq->rq_disk = first_rq->rq_disk;
flush_rq->end_io = flush_end_io;
 
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 1afc03c..a3d1293 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -536,7 +536,7 @@ static int do_req_filebacked(struct loop_device *lo, struct 
request *rq)
pos = ((loff_t) blk_rq_pos(rq) << 9) + lo->lo_offset;
 
if (op_is_write(rq->op)) {
-   if (rq->cmd_flags & REQ_FLUSH)
+   if (rq->op == REQ_OP_FLUSH)
ret = lo_req_flush(lo, rq);
else if (rq->op == REQ_OP_DISCARD)
ret = lo_discard(lo, rq, pos);
@@ -1653,7 +1653,7 @@ static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
if (lo->lo_state != Lo_bound)
return -EIO;
 
-   if (lo->use_dio && (!(cmd->rq->cmd_flags & REQ_FLUSH) ||
+   if (lo->use_dio && (cmd->rq->op != REQ_OP_FLUSH ||
 cmd->rq->op == REQ_OP_DISCARD))
cmd->use_aio = true;
else
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 8e8f7e3..ced3382 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -244,7 +244,7 @@ static int nbd_send_req(struct nbd_device *nbd, struct 
request *req)
   

[PATCH 35/35] block, drivers, fs: rename REQ_FLUSH to REQ_PREFLUSH

2016-01-05 Thread mchristi
From: Mike Christie 

To avoid confusion between REQ_OP_FLUSH, which is handled by
request_fn drivers, and upper layers requesting the block layer
perform a flush sequence along with possibly a WRITE, this patch
renames REQ_FLUSH to REQ_PREFLUSH.
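
A sketch of the filesystem-facing side after the rename (per the
documentation hunk below, the two bits may be set together on one bio):
submitters OR the new bit, possibly with REQ_FUA, into the bio's rw flags.

        bio->bi_rw |= REQ_PREFLUSH | REQ_FUA;
        submit_bio(bio);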

Signed-off-by: Mike Christie 
---
 Documentation/block/writeback_cache_control.txt | 31 +
 block/blk-core.c| 12 +-
 block/blk-flush.c   |  4 ++--
 block/blk-mq.c  |  4 ++--
 drivers/block/drbd/drbd_actlog.c|  2 +-
 drivers/block/drbd/drbd_main.c  |  2 +-
 drivers/block/drbd/drbd_receiver.c  |  2 +-
 drivers/block/drbd/drbd_req.c   |  2 +-
 drivers/md/bcache/journal.c |  2 +-
 drivers/md/bcache/request.c |  8 +++
 drivers/md/dm-cache-target.c|  9 +++
 drivers/md/dm-crypt.c   |  7 +++---
 drivers/md/dm-era-target.c  |  4 ++--
 drivers/md/dm-io.c  |  2 +-
 drivers/md/dm-log-writes.c  |  2 +-
 drivers/md/dm-raid1.c   |  5 ++--
 drivers/md/dm-region-hash.c |  4 ++--
 drivers/md/dm-snap.c|  6 ++---
 drivers/md/dm-stripe.c  |  2 +-
 drivers/md/dm-thin.c|  8 +++
 drivers/md/dm.c | 12 +-
 drivers/md/linear.c |  2 +-
 drivers/md/md.c |  2 +-
 drivers/md/multipath.c  |  2 +-
 drivers/md/raid0.c  |  2 +-
 drivers/md/raid1.c  |  3 ++-
 drivers/md/raid10.c |  2 +-
 drivers/md/raid5-cache.c|  2 +-
 drivers/md/raid5.c  |  2 +-
 fs/btrfs/check-integrity.c  |  8 +++
 fs/jbd2/journal.c   |  2 +-
 fs/xfs/xfs_buf.c|  2 +-
 include/linux/blk_types.h   |  8 +++
 include/linux/fs.h  |  4 ++--
 include/trace/events/f2fs.h |  2 +-
 kernel/trace/blktrace.c |  5 ++--
 36 files changed, 92 insertions(+), 86 deletions(-)

diff --git a/Documentation/block/writeback_cache_control.txt 
b/Documentation/block/writeback_cache_control.txt
index ea5550f..9869f18 100644
--- a/Documentation/block/writeback_cache_control.txt
+++ b/Documentation/block/writeback_cache_control.txt
@@ -20,11 +20,11 @@ a forced cache flush, and the Force Unit Access (FUA) flag 
for requests.
 Explicit cache flushes
 --
 
-The REQ_FLUSH flag can be OR ed into the r/w flags of a bio submitted from
+The REQ_PREFLUSH flag can be OR ed into the r/w flags of a bio submitted from
 the filesystem and will make sure the volatile cache of the storage device
 has been flushed before the actual I/O operation is started.  This explicitly
 guarantees that previously completed write requests are on non-volatile
-storage before the flagged bio starts. In addition the REQ_FLUSH flag can be
+storage before the flagged bio starts. In addition the REQ_PREFLUSH flag can be
 set on an otherwise empty bio structure, which causes only an explicit cache
 flush without any dependent I/O.  It is recommend to use
 the blkdev_issue_flush() helper for a pure cache flush.
@@ -41,21 +41,21 @@ signaled after the data has been committed to non-volatile 
storage.
 Implementation details for filesystems
 --
 
-Filesystems can simply set the REQ_FLUSH and REQ_FUA bits and do not have to
+Filesystems can simply set the REQ_PREFLUSH and REQ_FUA bits and do not have to
 worry if the underlying devices need any explicit cache flushing and how
-the Forced Unit Access is implemented.  The REQ_FLUSH and REQ_FUA flags
+the Forced Unit Access is implemented.  The REQ_PREFLUSH and REQ_FUA flags
 may both be set on a single bio.
 
 
 Implementation details for make_request_fn based block drivers
 --
 
-These drivers will always see the REQ_FLUSH and REQ_FUA bits as they sit
+These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit
 directly below the submit_bio interface.  For remapping drivers the REQ_FUA
 bits need to be propagated to underlying devices, and a global flush needs
-to be implemented for bios with the REQ_FLUSH bit set.  For real device
-drivers that do not have a volatile cache the REQ_FLUSH and REQ_FUA bits
-on non-empty bios can simply be ignored, and REQ_FLUSH requests without
+to be implemented for bios with the REQ_PREFLUSH bit set.  For real device
+drivers that do not have a volatile cache the REQ_PREFLUSH and REQ_FUA bit

[PATCH 32/35] block: shrink bi_rw and bi_op

2016-01-05 Thread mchristi
From: Mike Christie 

There is no need for bi_op/op and bi_rw to be so large
now, so this patch shrinks them.

Signed-off-by: Mike Christie 
---
 block/blk-core.c   |  2 +-
 drivers/md/dm-flakey.c |  2 +-
 drivers/md/raid5.c | 13 +++--
 fs/btrfs/check-integrity.c |  4 ++--
 fs/btrfs/inode.c   |  2 +-
 include/linux/bio.h| 13 ++---
 include/linux/blk_types.h  | 11 +++
 include/linux/blkdev.h |  2 +-
 8 files changed, 18 insertions(+), 31 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 0f6cb5c..ae2afab 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1854,7 +1854,7 @@ static void handle_bad_sector(struct bio *bio)
char b[BDEVNAME_SIZE];
 
printk(KERN_INFO "attempt to access beyond end of device\n");
-   printk(KERN_INFO "%s: rw=%d,%ld, want=%Lu, limit=%Lu\n",
+   printk(KERN_INFO "%s: rw=%d,%u, want=%Lu, limit=%Lu\n",
bdevname(bio->bi_bdev, b),
bio->bi_op, bio->bi_rw,
(unsigned long long)bio_end_sector(bio),
diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index 09e2afc..b831226 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -266,7 +266,7 @@ static void corrupt_bio_data(struct bio *bio, struct 
flakey_c *fc)
data[fc->corrupt_bio_byte - 1] = fc->corrupt_bio_value;
 
DMDEBUG("Corrupting data bio=%p by writing %u to byte %u "
-   "(rw=%c bi_rw=%lu bi_sector=%llu cur_bytes=%u)\n",
+   "(rw=%c bi_rw=%u bi_sector=%llu cur_bytes=%u)\n",
bio, fc->corrupt_bio_value, fc->corrupt_bio_byte,
(bio_data_dir(bio) == WRITE) ? 'w' : 'r', bio->bi_rw,
(unsigned long long)bio->bi_iter.bi_sector, bio_bytes);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index fa4fe95..aafd49e 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1015,9 +1015,9 @@ again:
: raid5_end_read_request;
bi->bi_private = sh;
 
-   pr_debug("%s: for %llu schedule op %ld on disc %d\n",
+   pr_debug("%s: for %llu schedule op %d,%u on disc %d\n",
__func__, (unsigned long long)sh->sector,
-   bi->bi_rw, i);
+   bi->bi_op, bi->bi_rw, i);
atomic_inc(&sh->count);
if (sh != head_sh)
atomic_inc(&head_sh->count);
@@ -1067,10 +1067,10 @@ again:
rbi->bi_end_io = raid5_end_write_request;
rbi->bi_private = sh;
 
-   pr_debug("%s: for %llu schedule op %ld on "
+   pr_debug("%s: for %llu schedule op %d,%u on "
 "replacement disc %d\n",
__func__, (unsigned long long)sh->sector,
-   rbi->bi_rw, i);
+   rbi->bi_op, rbi->bi_rw, i);
atomic_inc(&sh->count);
if (sh != head_sh)
atomic_inc(&head_sh->count);
@@ -1102,8 +1102,9 @@ again:
if (!rdev && !rrdev) {
if (op_is_write(op))
set_bit(STRIPE_DEGRADED, &sh->state);
-   pr_debug("skip op %ld on disc %d for sector %llu\n",
-   bi->bi_rw, i, (unsigned long long)sh->sector);
+   pr_debug("skip op %d,%u on disc %d for sector %llu\n",
+bi->bi_op, bi->bi_rw, i,
+(unsigned long long)sh->sector);
clear_bit(R5_LOCKED, &sh->dev[i].flags);
set_bit(STRIPE_HANDLE, &sh->state);
}
diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index e409d1f..1623d11 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2985,7 +2985,7 @@ static void __btrfsic_submit_bio(struct bio *bio)
if (dev_state->state->print_mask &
BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
printk(KERN_INFO
-  "submit_bio(rw=%d,0x%lx, bi_vcnt=%u,"
+  "submit_bio(rw=%d,0x%x, bi_vcnt=%u,"
   " bi_sector=%llu (bytenr %llu), bi_bdev=%p)\n",
   bio->bi_op, bio->bi_rw, bio->bi_vcnt,
   (unsigned long long)bio->bi_iter.bi_sector,
@@ -3028,7 +3028,7 @@ static void __btrfsic_submit_bio(struct bio *bio)
if (dev_state->state->print_mask &
BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
printk(KERN_INFO
-  "submit_bio(rw=

[PATCH 02/35] block: add REQ_OP definitions and bi_op/op fields

2016-01-05 Thread mchristi
From: Mike Christie 

The following patches separate the operation (write, read, discard,
etc) from the flags in bi_rw/cmd_flags. This patch adds definitions
for request/bio operations, adds fields to the request/bio to set
them, and some temporary compat code so the kernel/modules can use
either one. In the final patches this compat code will be removed
when everything is converted.

Also, in this patch the REQ_OPs match the REQ rq_flag_bits ones
for compat reasons while all the code is converted in this set. In the
last patches that will also be removed.
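
A minimal sketch of bio submission during the transition (one-argument
submit_bio() as used in this series; the values are chosen only for
illustration): callers set the new bi_op and keep modifier bits in bi_rw,
and the temporary compat code keeps the old readers working.

        bio->bi_op = REQ_OP_WRITE;      /* the operation */
        bio->bi_rw |= REQ_SYNC;         /* rq_flag_bits modifiers only */
        submit_bio(bio);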

Signed-off-by: Mike Christie 
---
 block/blk-core.c  | 19 ---
 include/linux/blk_types.h | 15 ++-
 include/linux/blkdev.h|  1 +
 include/linux/fs.h| 37 +++--
 4 files changed, 66 insertions(+), 6 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 9b887e3..954a450 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1697,7 +1697,8 @@ void init_request_from_bio(struct request *req, struct 
bio *bio)
 {
req->cmd_type = REQ_TYPE_FS;
 
-   req->cmd_flags |= bio->bi_rw & REQ_COMMON_MASK;
+   /* tmp compat. Allow users to set bi_op or bi_rw */
+   req->cmd_flags |= (bio->bi_rw | bio->bi_op) & REQ_COMMON_MASK;
if (bio->bi_rw & REQ_RAHEAD)
req->cmd_flags |= REQ_FAILFAST_MASK;
 
@@ -2032,6 +2033,12 @@ blk_qc_t generic_make_request(struct bio *bio)
struct bio_list bio_list_on_stack;
blk_qc_t ret = BLK_QC_T_NONE;
 
+   /* tmp compat. Allow users to set either one or both.
+* This will be removed when we have converted
+* everyone in the next patches.
+*/
+   bio->bi_rw |= bio->bi_op;
+
if (!generic_make_request_checks(bio))
goto out;
 
@@ -2101,6 +2108,12 @@ EXPORT_SYMBOL(generic_make_request);
  */
 blk_qc_t submit_bio(struct bio *bio)
 {
+   /* tmp compat. Allow users to set either one or both.
+* This will be removed when we have converted
+* everyone in the next patches.
+*/
+   bio->bi_rw |= bio->bi_op;
+
/*
 * If it's a regular read/write or a barrier with data attached,
 * go through the normal accounting stuff before submission.
@@ -2972,8 +2985,8 @@ EXPORT_SYMBOL_GPL(__blk_end_request_err);
 void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
 struct bio *bio)
 {
-   /* Bit 0 (R/W) is identical in rq->cmd_flags and bio->bi_rw */
-   rq->cmd_flags |= bio->bi_rw & REQ_WRITE;
+   /* tmp compat. Allow users to set bi_op or bi_rw */
+   rq->cmd_flags |= bio_data_dir(bio);
 
if (bio_has_data(bio))
rq->nr_phys_segments = bio_phys_segments(q, bio);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 86a38ea..6e49c91 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -48,9 +48,15 @@ struct bio {
struct block_device *bi_bdev;
unsigned intbi_flags;   /* status, command, etc */
int bi_error;
-   unsigned long   bi_rw;  /* bottom bits READ/WRITE,
+   unsigned long   bi_rw;  /* bottom bits rq_flags_bits
 * top bits priority
 */
+   /*
+* this will be a u8 in the next patches and bi_rw can be shrunk to
	 * a u32. For compat in these transitional patches op is an int here.
+*/
+   int bi_op;  /* REQ_OP */
+
 
struct bvec_iterbi_iter;
 
@@ -242,6 +248,13 @@ enum rq_flag_bits {
 #define REQ_HASHED (1ULL << __REQ_HASHED)
 #define REQ_MQ_INFLIGHT(1ULL << __REQ_MQ_INFLIGHT)
 
+enum req_op {
+   REQ_OP_READ,
+   REQ_OP_WRITE= REQ_WRITE,
+   REQ_OP_DISCARD  = REQ_DISCARD,
+   REQ_OP_WRITE_SAME   = REQ_WRITE_SAME,
+};
+
 typedef unsigned int blk_qc_t;
 #define BLK_QC_T_NONE  -1U
 #define BLK_QC_T_SHIFT 16
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 29189ae..35b9eb3 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -96,6 +96,7 @@ struct request {
struct request_queue *q;
struct blk_mq_ctx *mq_ctx;
 
+   int op;
u64 cmd_flags;
unsigned cmd_type;
unsigned long atomic_flags;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3b4e751..fb9e516 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2405,15 +2405,48 @@ extern void make_bad_inode(struct inode *);
 extern bool is_bad_inode(struct inode *);
 
 #ifdef CONFIG_BLOCK
+
+static inline bool op_is_write(int op)
+{
+   switch (op) {
+   case REQ_OP_WRITE:
+   case REQ_OP_WRITE_SAME:
+   case REQ_OP_DISCARD:
+   return true;
+   default:
+   return false;
+   }
+}
+
 

[PATCH 00/35 v2] separate operations from flags in the bio/request structs

2016-01-05 Thread mchristi
The following patches begin to cleanup the request->cmd_flags and
bio->bi_rw mess. We currently use cmd_flags to specify the operation,
attributes and state of the request. For bi_rw we use it for similar
info and also the priority but then also have another bi_flags field
for state. At some point, we abused them so much we just made cmd_flags
64 bits, so we could add more.

The following patches separate the operation (read, write, discard,
flush, etc) from cmd_flags/bi_rw.

This patchset was made against linux-next from today Jan 5 2016.
(git tag next-20160105).

v2.

1. Dropped arguments from submit_bio, and had callers set up the
bio.
2. Add REQ_OP_FLUSH for request_fn users and renamed REQ_FLUSH
to REQ_PREFLUSH for make_request_fn users.
3. Dropped bio/rq_data_dir functions, and added a op_is_write
function instead.
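
As a small sketch of item 3 (account_write()/account_read() are
hypothetical names, shown only for illustration): direction checks go
through the new helper instead of bio/rq_data_dir().

        if (op_is_write(bio->bi_op))
                account_write(bio);
        else
                account_read(bio);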




Re: Btrfs Check - "type mismatch with chunk"

2016-01-05 Thread Martin Steigerwald
Am Dienstag, 5. Januar 2016, 15:34:35 CET schrieb Duncan:
> Christoph Anton Mitterer posted on Sat, 02 Jan 2016 06:12:46 +0100 as
> 
> excerpted:
> > On Fri, 2015-12-25 at 08:06 +, Duncan wrote:
> >> I wasn't personally sure if 4.1 itself was affected or not, but the
> >> wiki says don't use 4.1.1 as it's broken with this bug, with the
> >> quick-fix in 4.1.2, so I /think/ 4.1 itself is fine.  A scan with a
> >> current btrfs check should tell you for sure.  But if you meant 4.1.1
> >> and only typed 4.1, then yes, better redo.
> > 
> > What exactly was that bug in 4.1.1 mkfs and how would one notice that
> > one suffers from it?
> > I created a number of personal filesystems that I use "productively" and
> > I'm not 100% sure during which version I've created them... :/
> >
> > 
> >
> > Is there some easy way to find out, like a fs creation time stamp??
> 
> I believe a current btrfs check will flag the errors, but can't fix them, 
> as the problem was in the filesystem creation and is simply too deep to 
> fix, so the bad filesystems must be wiped and recreated with a mkfs.btrfs 
> without the bug, to fix.

btrfs check from btrfs tools 4.3.1 on kernel 4.4-rc6 has not been able to fix 
these errors and I recreated the filesystem that had the errors. I think I 
mentioned it also in this thread.

Thanks,
-- 
Martin


Re: Btrfs Check - "type mismatch with chunk"

2016-01-05 Thread Christoph Anton Mitterer
On Tue, 2016-01-05 at 15:34 +, Duncan wrote:
> >What exactly was that bug in 4.1.1 mkfs and how would one notice
> > that
> > one suffers from it?
> > I created a number of personal filesystems that I use
> > "productively" and
> > I'm not 100% sure during which version I've created them... :/
> > 
> > Is there some easy way to find out, like a fs creation time stamp??
> 
> I believe a current btrfs check will flag the errors, but can't fix
> them, 
> as the problem was in the filesystem creation and is simply too deep
> to 
> fix, so the bad filesystems must be wiped and recreated with a
> mkfs.btrfs 
> without the bug, to fix.
If I didn't mix things up, there was a post by someone just a few days
ago, which showed the error that would pop up on fsck.


> the 
> people volunteering (directly or indirectly) to do that coding
> scratch, 
> or choose not to scratch by spending their time and/or resources 
> elsewhere, their own itches in the priority they choose.
Sure, that's all clear.
And obviously I didn't want to distract anyone from working on it. It's
just that if those people didn't care which part of btrfs they're
working on,... then I'd have considered btrfs-convert rather just a
nice-to-have.


> It's the same reason that I as a kde user who finds the gnome "dumb-
> down" 
> approach horribly frustrating, remain extremely glad there's a gnome 
> project for those who approve of that sort of approach to work on -- 
I'd rather have wished that all those guys get hired by Apple or MS ;-)


Cheers,
Chris.

smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH] BTRFS: Adds an option to select RAID Stripe size

2016-01-05 Thread Christoph Anton Mitterer
On Tue, 2016-01-05 at 11:44 +0100, David Sterba wrote:
> We have a full 32 bit number space, so multiples of power of 2 are
> also
> possible if that makes sense.
Hmm that would make a maximum of 4GiB RAID chunks...
perhaps we should reserve some of the higher bits for a multiplier, in
case 4GiB would ever become too little O;-)


>  In general we don't need to set additional
> limitations besides minimum, maximum and "minimal step".
And that can/should be done in the userland.


> > Are there any concerns/constraints with too small/too big chunks
> > when
> > these play together with lower block layers (I'd guess not).
> 
> I don't think so.

Well, I was mainly thinking about dm-crypt, which uses 512B blocks, and
in fact that size wouldn't be easy to change, as (IIRC) larger block
sizes make XTS less secure.
Obviously *this* isn't anything that btrfs would have to worry about,
especially as we're anyway at a higher block layer level,... but it just
reminded me that there can be cases where too large / too small may
actually cause issues.


Cheers,
Chris.

smime.p7s
Description: S/MIME cryptographic signature


Re: Purposely using btrfs RAID1 in degraded mode ?

2016-01-05 Thread Psalle

Hello Alphazo,

I am a mere btrfs user, but given the discussions I regularly see here 
about difficulties with degraded filesystems I wouldn't rely on this 
(yet?) as a regular work strategy, even if it's supposed to work.


If you're familiar with git, perhaps git-annex could be an alternative.

-Psalle.

On 04/01/16 18:00, Alphazo wrote:

Hello,

My picture library today lies on an external hard drive that I sync on
a regular basis with a couple of servers and other external drives.
I'm interested by the on-the-fly checksum brought by btrfs and would
like to get your opinion on the following unusual use case that I have
tested:
- Create a btrfs filesystem across the two drives with RAID1.
- When at home I can work with both drives connected, so I can enjoy
the self-healing feature if a bit goes bad, and only back up perfect
copies to my backup servers.
- When not at home I only bring one external drive and manually mount
it in degraded mode, so I can continue working on my pictures while
still having checksum error detection (but not correction).
- When coming back home I can plug the second drive back in and initiate
a scrub or balance to get the second drive duplicated.

I have tested the above use case with a couple of USB flash drives, and
even used btrfs over dm-crypt partitions, and it seemed to work fine,
but I wanted to get some advice from the community on whether this is
really a bad practice that should not be used in the long run. Is there
any limitation/risk in reading/writing to/from a degraded filesystem,
knowing it will be re-synced later?

Thanks
alphazo

PS: I have also investigated RAID1 on a single drive with two
partitions, but I cannot afford the halved capacity resulting from that
approach.


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: device removal seems to be very slow (kernel 4.1.15)

2016-01-05 Thread Lionel Bouton
Le 05/01/2016 14:04, David Goodwin a écrit :
> Using btrfs progs 4.3.1 on a Vanilla kernel.org 4.1.15 kernel.
>
> time btrfs device delete /dev/xvdh /backups
>
> real    13936m56.796s
> user    0m0.000s
> sys     1351m48.280s
>
>
> (which is about 9 days).
>
> Where :
>
> /dev/xvdh was 120gb in size.
>

That's very slow. Last week, with a 4.1.12 kernel, I deleted a 3TB
SATA 7200rpm device with ~1.5TB used on a RAID10 filesystem (reduced
from 6 3TB devices to 5 devices in the process) in approximately 38
hours. This was without virtualisation, though there were some
damaged sectors to handle along the way, which should have slowed the
delete a bit, and it had more than 10 times the data to move compared
to your /dev/xvdh.

Note about the damaged sectors: we use 7 disks for this BTRFS RAID10
array, but to reduce the risk of having to restore huge backups (see the
recent discussion about BTRFS RAID10 not protecting against 2-device
failures at all), as soon as numerous damaged sectors appear on a drive
we delete it from the RAID10 and add it to an MD RAID1 array, which is
itself one of the devices in the BTRFS RAID10. Right now we have 5
devices in the RAID10, one of them being a 3-way md RAID1 built from
disks with these numerous reallocated sectors. So the reads from the
deleted device had some errors to handle, and the writes on the md RAID1
device triggered some sector relocations too. Ideally I would replace at
least 2 of the disks in the md RAID1, because I know from experience
that they will fail in the near future (my estimate is between right now
and 6 months at best, given the current rate of reallocated sectors),
but replacing a working drive with damaged sectors costs us some
downtime and a one-time fee (unlike a drive which is either unreadable
or no longer passes SMART tests). We can live with both the occasional
slowdowns (SATA errors generated when the drives detect new damaged
sectors usually block IO for a handful of seconds) and the minor risk
this causes: until now this has worked OK for this server. The md RAID1
array acts as a buffer for disks that are slowly dying, and the monthly
BTRFS scrub plus md raid check helps push the worst ones to the point
where they fail fast enough to avoid accumulating too many bad drives in
this array for long periods of time.

>
> /backups is a single / "raid 0" volume that now looks like :
>
> Label: 'BACKUP_BTRFS_SNAPS'  uuid: 6ee08c31-f310-4890-8424-b88bb77186ed
> Total devices 3 FS bytes used 301.09GiB
> devid    1 size 100.00GiB used 90.00GiB path /dev/xvdg
> devid    3 size 220.00GiB used 196.06GiB path /dev/xvdi
> devid    4 size 221.00GiB used 59.06GiB path /dev/xvdj
>
>
> There are about 400 snapshots on it.

I'm not sure if the number of snapshots can impact the device delete
operation: the slow part of device delete is relocating block groups
which (AFAIK) seems to be one level down in the stack and shouldn't even
know about snapshots. If however you create or delete snapshots during
the delete operation you could probably slow down the delete.

Best regards,

Lionel
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix transaction handle leak on failure to create hard link

2016-01-05 Thread fdmanana
From: Filipe Manana 

If we failed to create a hard link we were not always releasing the
transaction handle we got before, resulting in a memory leak and
preventing any other tasks from being able to commit the current
transaction.
Fix this by always releasing our transaction handle.

Signed-off-by: Filipe Manana 
---
 fs/btrfs/inode.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5dbc07a..018c2a6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6488,7 +6488,7 @@ out_unlock_inode:
 static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
  struct dentry *dentry)
 {
-   struct btrfs_trans_handle *trans;
+   struct btrfs_trans_handle *trans = NULL;
struct btrfs_root *root = BTRFS_I(dir)->root;
struct inode *inode = d_inode(old_dentry);
u64 index;
@@ -6514,6 +6514,7 @@ static int btrfs_link(struct dentry *old_dentry, struct 
inode *dir,
trans = btrfs_start_transaction(root, 5);
if (IS_ERR(trans)) {
err = PTR_ERR(trans);
+   trans = NULL;
goto fail;
}
 
@@ -6547,9 +6548,10 @@ static int btrfs_link(struct dentry *old_dentry, struct 
inode *dir,
btrfs_log_new_name(trans, inode, NULL, parent);
}
 
-   btrfs_end_transaction(trans, root);
btrfs_balance_delayed_items(root);
 fail:
+   if (trans)
+   btrfs_end_transaction(trans, root);
if (drop_inode) {
inode_dec_link_count(inode);
iput(inode);
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


raid1 vs raid5

2016-01-05 Thread Psalle
Hello all and excuse me if this is a silly question. I looked around in 
the wiki and list archives but couldn't find any in-depth discussion 
about this:


I just realized that, since raid1 in btrfs is special (meaning only two 
copies in different devices), the effect in terms of resilience achieved 
with raid1 and raid5 is the same: you can lose one drive and not lose data.


So!, presuming that raid5 were at the same level of maturity, what would 
be the pros/cons of each mode?


As a corollary, I guess that if raid1 is considered a good compromise, 
then functional equivalents to raid6 and beyond could simply be 
implemented as "storing n copies in different devices", dropping any 
complex parity computations and making this mode entirely generic. Since 
this seems pretty obvious, I'd welcome your insights on what I'm 
missing, since it doesn't exist (and it isn't planned to be this way, 
AFAIK). I can foresee consistency difficulties, but that seems hardly 
insurmountable if it's being done for raid1?


Thanks in advance,
Psalle.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Btrfs Check - "type mismatch with chunk"

2016-01-05 Thread Duncan
Christoph Anton Mitterer posted on Sat, 02 Jan 2016 06:12:46 +0100 as
excerpted:

> On Fri, 2015-12-25 at 08:06 +, Duncan wrote:
>> I wasn't personally sure if 4.1 itself was affected or not, but the
>> wiki says don't use 4.1.1 as it's broken with this bug, with the
>> quick-fix in 4.1.2, so I /think/ 4.1 itself is fine.  A scan with a
>> current btrfs check should tell you for sure.  But if you meant 4.1.1
>> and only typed 4.1, then yes, better redo.

> What exactly was that bug in 4.1.1 mkfs and how would one notice that
> one suffers from it?
> I created a number of personal filesystems that I use "productively" and
> I'm not 100% sure during which version I've created them... :/
> 
> Is there some easy way to find out, like a fs creation time stamp??

I believe a current btrfs check will flag the errors, but can't fix them, 
as the problem was in the filesystem creation and is simply too deep to 
fix, so the bad filesystems must be wiped and recreated with a mkfs.btrfs 
without the bug, to fix.

There's a current or near-current thread (say from Dec 28 or newer) that 
seems to have the specific errors btrfs check reports, based on others' 
replies.  I'm getting behind on this list so won't go attempting to find 
it ATM, and haven't seen the problem myself, so...

>> Unfortunately, the btrfs-
>> convert bug isn't as nailed down, but btrfs-convert has a warning up on
>> the wiki anyway, as currently being buggy and not reliable.

> I hope I don't step on the toes of anyone who puts effort into this, but
> with all due respect,... I think the whole convert thing is at least in
> parts a waste of manpower - or perhaps better said: it would be nice to
> have, but given the lack of manpower at btrfs development and the
> numerous areas[0] that would need some urgent and probably lots of
> care,... having a convert from other fs to btrfs seems like luxury that
> isn't really needed.

Were btrfs development a zero-sum game, I'd tend to agree with you.  
However, in a free/libre and open source software environment where
many/most contributions are from volunteers, either directly or because 
someone with resources who can't themselves do the coding has taken an 
interest and is effectively volunteering to exchange some of those 
resources for code, things are a bit different.  Code talks, and the 
people volunteering (directly or indirectly) to do that coding scratch, 
or choose not to scratch by spending their time and/or resources 
elsewhere, their own itches in the priority they choose.  That 
environment -- our environment as a FLOSS project -- is no longer zero-
sum as you can't force volunteers to work on something they're not 
interested in -- they walk away and do something else instead.

And obviously, here we have someone with a particular itch to do btrfs-
convert.  To the extent that their code works, as you said, it's nice 
to have, particularly if the alternative isn't that some other part 
of btrfs works better, but that they walk away and do their volunteering 
on some other project instead.

Of course right now the code isn't working so well, thus the big warning 
on the wiki and the various recommendations here not to use it.  But to 
the extent that someone takes an interest and fixes it to work again and 
their code is to project standard, particularly if their work doesn't 
come at the expense of other btrfs code, as it may well not if it's 
something a dev takes as a personal challenge and thus puts more hours 
into it than he would into anything else btrfs related, who are we to say 
no, we aren't going to take that unarguably useful code?


It's the same reason that I as a kde user who finds the gnome "dumb-down" 
approach horribly frustrating, remain extremely glad there's a gnome 
project for those who approve of that sort of approach to work on -- by 
definition, they'd find working on kde, with its plethora of options 
approach, extremely frustrating, and would either walk away, or worse yet 
for those who prefer kde's approach, muck things up with fights that 
either take away options or at minimum cause less actual kde work to get 
done.  People who complain about gnome and kde and xfce and ... all 
taking away effort from each other simply don't understand how freedomware 
works, and that it's not a question of wasting effort that could be 
better put to use elsewhere, but of either not having that effort spent 
at all or worse yet, of starting fights so less gets done and more people 
simply quit in frustration.


So it's not a zero-sum game at all, and if the choice is in having 
someone that's interested in btrfs-convert spend time on it, or having 
them not working on btrfs at all, as is very possibly the case, I, and I 
guess most on the project, would prefer they spend the time on
btrfs-convert, even if it's not the absolutely most critical tool in the 
btrfs-tools package. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, 

Re: evidence of persistent state, despite device disconnects

2016-01-05 Thread Duncan
Chris Murphy posted on Sun, 03 Jan 2016 14:33:40 -0700 as excerpted:

> kernel-4.4.0-0.rc6.git0.1.fc24.x86_64 btrfs-progs 4.3.1
> 
> There was some copy pasting, hence /mnt/brick vs /mnt/brick2 confusion,
> but the volume was always cleanly mounted and umounted.
> 
> The biggest problem I have with all of this is the completely silent
> addition of single chunks. That made the volume, in effect, no longer
> completely raid1. No other details matter, except to try and reproduce
> the problem, and find its source so it can be fixed. It is a bug,
> because it's definitely not sane or expected behavior at all.

If there's no way you mounted it degraded,rw at any point, I agree, 
single mode chunks are unexpected on a raid1 for both data and metadata, 
and it's a bug -- possibly actually related to that new code that allows 
degraded,rw recovery via per-chunk checks.

If however you mounted it degraded,rw at some point, then I'd say the bug 
is in wetware, as in that case, based on my understanding, it's working 
as intended.  I was inclined to believe that was what happened based on 
the obviously partial sequence in the earlier post, but if you say you 
didn't... then it's all down to duplication and finding why it's suddenly 
reverting to single mode on non-degraded mounts, which indeed /is/ a bug.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Btrfs: Check metadata redundancy on balance

2016-01-05 Thread David Sterba
On Tue, Dec 08, 2015 at 09:25:03AM +, sam tygier wrote:
> Signed-off-by: Sam Tygier 
> 
> From: Sam Tygier 
> Date: Sat, 3 Oct 2015 16:43:48 +0100
> Subject: [PATCH] Btrfs: Check metadata redundancy on balance
> 
> When converting a filesystem via balance check that metadata mode
> is at least as redundant as the data mode. For example give warning
> when:
> -dconvert=raid1 -mconvert=single

Missing signed-off-by

> ---
>  fs/btrfs/volumes.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 6fc73586..40247e9 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3584,6 +3584,12 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
>   }
>   } while (read_seqretry(&fs_info->profiles_lock, seq));
>  
> + if (btrfs_get_num_tolerated_disk_barrier_failures(bctl->meta.target) <
> + 
> btrfs_get_num_tolerated_disk_barrier_failures(bctl->data.target)) {
> + btrfs_info(fs_info,
> + "Warning: metatdata has lower redundancy than data\n");

If it's a warning then please use btrfs_warn. The message gets a higher
priority and would be caught by log scanners as an issue worth attention.

Also, explicitly mentioning the profiles for data and metadata would be
better.
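
Roughly, the adjusted check could then read like this (just a sketch of
the suggestion, untested, and the message wording is only illustrative):

	if (btrfs_get_num_tolerated_disk_barrier_failures(bctl->meta.target) <
	    btrfs_get_num_tolerated_disk_barrier_failures(bctl->data.target)) {
		btrfs_warn(fs_info,
	"metadata profile 0x%llx has lower redundancy than data profile 0x%llx",
			   bctl->meta.target, bctl->data.target);
	}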

And sorry that it takes so long to get this patch merged.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: only free root_path if it was allocated from the heap

2016-01-05 Thread Neil Horman
On Tue, Jan 05, 2016 at 03:01:38PM +0100, David Sterba wrote:
> On Thu, Dec 03, 2015 at 01:45:44PM -0500, Neil Horman wrote:
> > Noticed this while doing some snapshots in a chroot environment
> > 
> > btrfs receive can set root_path to either realmnt, which is passed in from 
> > the
> > command line, or to a heap allocated via find_mount_root  in do_receive.  We
> > should only free the latter, not the former, as the former results in an 
> > invalid
> > pointer warning from glibc during free.
> > 
> > Signed-off-by: Neil Horman 
> 
> Sorry, I missed this patch, now applied, thanks.
> 

Thanks!
Neil

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix output of compression message in btrfs_parse_options()

2016-01-05 Thread David Sterba
On Wed, Dec 16, 2015 at 11:57:38AM +0900, Tsutomu Itoh wrote:
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 974be09..dcc1f15 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2709,7 +2709,7 @@ int open_ctree(struct super_block *sb,
>* In the long term, we'll store the compression type in the super
>* block, and it'll be used for per file compression control.
>*/
> - fs_info->compress_type = BTRFS_COMPRESS_ZLIB;
> + fs_info->compress_type = BTRFS_COMPRESS_NONE;

This would change the default compression type, e.g. when compression
is turned on via chattr +c. This would break applications out there;
the fix has to avoid changing that.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] BTRFS: Runs the xor function if a Block has failed

2016-01-05 Thread Sanidhya Solanki
On Tue, 5 Jan 2016 10:22:36 +0100
David Sterba  wrote:

> If the data are recovered, why is -EIO still returned? 

In the other places in the file where the code appears, the submitted
patch is all that is required to do the xor. I think we also need to
include the following line:
memcpy(pointers[nr_data], pointers[0], PAGE_SIZE);
and delete the "return EIO" statement which I believe appears to be a
placeholder for the xor function.
In the end the final patch should look something like this:
> -  * TODO, we should redo the xor here.
>*/
> + memcpy(pointers[nr_data], pointers[0], PAGE_SIZE);
> + run_xor(pointers, rbio->nr_data - 1, PAGE_CACHE_SIZE);
> - err = -EIO;

So, I was just going to send you the mail as it is written above, but I
decided to investigate. The commit in question that added the todo was
53b381b3abeb86f12787a6c40fee9b2f71edc23b. Unfortunately, it was not
submitted  by the original author, nor was it by any means a small
dedicated patch. It adds the entire file, without much comment or
explanation and has not been touched since. 

However, we can get some idea of what is expected by looking at line
2398 in raid56.c, where a similar case of raid 5 recovery is handled.

So what the patch described above does is deal with a scenario where
no q stripe or bad data block exists and we can only rebuild from the
p-stripe, in effect like a raid 5 recovery.

So, if you are satisfied with the above retouched change, I can modify
my original patch with your suggestions and my changes and forward it
to the list again.

> Also, I see some post-recovery steps eg. for the damaged P stripes
> (at label pstripes) and I'd expect something similar for the case
> you're fixing.

I believe that is the case because the other cases still have the q-
stripe available to rebuild from, which requires cleanup afterwards,
but the raid5-like scenario above does not. Let me know if anything
else is needed.

> I'm not familiar with the raid56 implementation but the fix looks
> suspiciously trivial and I doubt that the xor was omitted out of
> laziness.

I guess we will never know.

Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: only free root_path if it was allocated from the heap

2016-01-05 Thread David Sterba
On Thu, Dec 03, 2015 at 01:45:44PM -0500, Neil Horman wrote:
> Noticed this while doing some snapshots in a chroot environment
> 
> btrfs receive can set root_path to either realmnt, which is passed in from the
> command line, or to a heap allocated via find_mount_root  in do_receive.  We
> should only free the latter, not the former, as the former results in an 
> invalid
> pointer warning from glibc during free.
> 
> Signed-off-by: Neil Horman 

Sorry, I missed this patch, now applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2] Btrfs: disable online scrub repair on ro cases

2016-01-05 Thread David Sterba
On Mon, Dec 07, 2015 at 10:26:05AM -0800, Liu Bo wrote:
> On Mon, Dec 07, 2015 at 03:37:43PM +0100, David Sterba wrote:
> > On Fri, Dec 04, 2015 at 09:58:04AM -0800, Liu Bo wrote:
> > > This disables the repair process in ro cases, as it can cause the system
> > > to be unresponsive on the ASSERT() in repair_io_failure().
> > > 
> > > This can happen when scrub is running and a hardware error pops up;
> > > we should fall back to ro mounts gracefully instead of being unresponsive.
> > 
> > So this will also report the error as uncorrectable. This might be a bit
> > misleading, if a device error happens first and then some potentially
> > correctable errors are detected. This could be accounted as an 'unverified'
> > error, which has the closest meaning.
> 
> Makes sense, we can do
> if (ret < 0 && ret == -EROFS) {
>   spin_lock();
>   unverified++;
>   spin_unlock();
> }
> 
> However, in scrub_fixup_nodatasum() all errors, including ENOMEM from path
> allocation and failure to get a transaction, are interpreted as
> 'uncorrectable', so I wonder whether this 'uncorrectable' is only valid
> within this scrub process?

I'm not sure we have a proper definition of the various stats. My user
expectation is that 'uncorrectable' refers to permanent errors, so we
should try to match the type of error everywhere.
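
For reference, the accounting sketched above would presumably end up
looking like the existing stat updates in scrub.c, something along these
lines (assuming the usual sctx->stat_lock and sctx->stat.unverified_errors
fields; the exact call site is not checked here):

	if (ret == -EROFS) {
		spin_lock(&sctx->stat_lock);
		sctx->stat.unverified_errors++;
		spin_unlock(&sctx->stat_lock);
	}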
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3 v2] btrfs: use list_for_each_entry* in backref.c

2016-01-05 Thread David Sterba
On Mon, Dec 21, 2015 at 11:50:23PM +0800, Geliang Tang wrote:
> Use list_for_each_entry*() to simplify the code.
> 
> Signed-off-by: Geliang Tang 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] btrfs: use list_for_each_entry_safe in free-space-cache.c

2016-01-05 Thread David Sterba
On Fri, Dec 18, 2015 at 10:17:00PM +0800, Geliang Tang wrote:
> Use list_for_each_entry_safe() instead of list_for_each_safe() to
> simplify the code.
> 
> Signed-off-by: Geliang Tang 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] btrfs: use list_for_each_entry* in check-integrity.c

2016-01-05 Thread David Sterba
On Fri, Dec 18, 2015 at 10:16:59PM +0800, Geliang Tang wrote:
> Use list_for_each_entry*() instead of list_for_each*() to simplify
> the code.
> 
> Signed-off-by: Geliang Tang 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Crash, boot mount failure: "corrupt leaf, slot offset bad"

2016-01-05 Thread Chris Bainbridge
On 5 January 2016 at 01:57, Qu Wenruo  wrote:
>>
>> Data, single: total=106.79GiB, used=82.01GiB
>> System, single: total=4.00MiB, used=16.00KiB
>> Metadata, single: total=2.01GiB, used=1.51GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>
>
> That's the misleading btrfs fi df output confusing you.
>
> In fact, your metadata is already used up without available space.
> GlobalReserve should also be counted as Metadata *used* space.

Thanks for the explanation - the FAQ[1] misleads when it describes
GlobalReserve as "The block reserve is only virtual and is not stored
on the devices." - which sounds like the reserve is literally not
stored on the drive.

The FAQ[2] also suggests that the free space in metadata can be less
than the block reserve total:

"If the free space in metadata is less than or equal to the block
reserve value (typically 512 MiB, but might be something else on a
particularly small or large filesystem), then it's close to full."

But what you are saying is that this is wrong and the free space in
metadata can never be less than the block reserve, because the block
reserve includes the metadata free space? (With the numbers above:
2.01 GiB - 1.51 GiB = 0.50 GiB of free metadata, which is exactly the
512 MiB reserve, i.e. no usable headroom left.)

[1] 
https://btrfs.wiki.kernel.org/index.php/FAQ#What_is_the_GlobalReserve_and_why_does_.27btrfs_fi_df.27_show_it_as_single_even_on_RAID_filesystems.3F
[2] 
https://btrfs.wiki.kernel.org/index.php/FAQ#if_your_device_is_large_.28.3E16GiB.29

> Good, 5GiB freed space, it can be allocated for metadata to slightly reduce
> the metadata pressure.
>
> But not for long.
> The real resolution will be to add more space into this btrfs.

Yes but this is a 128GB SSD and metadata could have been reallocated
from some of the 25GB of free space allocated to data. Even with a
bigger drive, it is possible that chunks could be allocated to data,
and then later operations requiring more metadata will still run out
(running out of metadata space seems to be a reasonably common
occurrence judging by the number of "why is btrfs reporting no space
when I have space free" questions). The file system shouldn't be
corrupted when that happens.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: device removal seems to be very slow (kernel 4.1.15)

2016-01-05 Thread Austin S. Hemmelgarn

On 2016-01-05 08:04, David Goodwin wrote:

Using btrfs progs 4.3.1 on a Vanilla kernel.org 4.1.15 kernel.

time btrfs device delete /dev/xvdh /backups

real    13936m56.796s
user    0m0.000s
sys     1351m48.280s


(which is about 9 days).


Where :

/dev/xvdh was 120gb in size.
OK, based on the device names, you're running this inside a Xen instance 
with para-virtualized storage drivers (or Amazon EC2, which is the same 
thing at its core), and that will have at least some impact on 
performance (although less of an impact than if you were using full 
virtualization). If you have administrative access to Domain 0, and can 
afford to have the VM down, I would suggest checking how long the 
equivalent operation takes from Domain 0 (note that to properly check 
this, you would need to re-add the device to the FS, re-balance the FS, 
and then delete the device).  If you get similar results in Domain 0 and 
in the VM, then that rules out virtualization as the bottleneck (for 
para-virtualized storage backed by physical block devices on the local 
system (as opposed to files or networked block devices), you should see 
at most a 10% performance gain running it in Domain 0, assuming both the 
VM and Domain 0 have the same number of VCPUs and the same amount of RAM).



/backups is a single / "raid 0" volume that now looks like :

Label: 'BACKUP_BTRFS_SNAPS'  uuid: 6ee08c31-f310-4890-8424-b88bb77186ed
 Total devices 3 FS bytes used 301.09GiB
 devid    1 size 100.00GiB used 90.00GiB path /dev/xvdg
 devid    3 size 220.00GiB used 196.06GiB path /dev/xvdi
 devid    4 size 221.00GiB used 59.06GiB path /dev/xvdj


There are about 400 snapshots on it.
This may be part of the issue.  Assuming that /dev/xvdh was mostly full 
like /dev/xvdg and /dev/xvdi are now, then that would mean it would take 
longer to remove from the filesystem, because all the chunks that are 
partially on the device being removed need to be moved to another 
device. On top of that, whenever a chunk moves, metadata needs to be 
updated, which means a lot of updates if you have a lot of shared 
extents, which I'm assuming is the case based on the number of snapshots.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


device removal seems to be very slow (kernel 4.1.15)

2016-01-05 Thread David Goodwin

Using btrfs progs 4.3.1 on a Vanilla kernel.org 4.1.15 kernel.

time btrfs device delete /dev/xvdh /backups

real    13936m56.796s
user    0m0.000s
sys     1351m48.280s


(which is about 9 days).


Where :

/dev/xvdh was 120gb in size.


/backups is a single / "raid 0" volume that now looks like :

Label: 'BACKUP_BTRFS_SNAPS'  uuid: 6ee08c31-f310-4890-8424-b88bb77186ed
Total devices 3 FS bytes used 301.09GiB
devid    1 size 100.00GiB used 90.00GiB path /dev/xvdg
devid    3 size 220.00GiB used 196.06GiB path /dev/xvdi
devid    4 size 221.00GiB used 59.06GiB path /dev/xvdj


There are about 400 snapshots on it.


thanks
David.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Replace a corrupted block using a known-good file from another filesystem ?

2016-01-05 Thread Austin S. Hemmelgarn

On 2016-01-05 07:25, Sylvain Joyeux wrote:

In the course of the few btrfs crashes I had on my USB backup drive
(NOT the drive from my other bug report, which is an internal SATA
drive) - in the last 6 months or so - I ended up having a 4 to 5 bad
checksums reported by scrub.

This drive is used to synchronize snapshots from my main machine, and
the corrupted files are system files that are still present on the
main machine.

I could obviously reset the filesystem on the USB drive, but since the
goal is to keep a backup history (which the main machine does not
keep), I would rather avoid that.

Would there be a way to replace the bad blocks using the good file on
the main filesystem ? Replacing it in each snapshot separately does
not look very appealing as the file is present on most of them.
Short of manually modifying the underlying block device directly, there 
really isn't much you can do.  However, if the file is in the same place 
in every snapshot, it should be really easy to script from the command 
line with a simple for loop.  Assuming that the file is /bin/bash, the 
following should work if run from the directory containing the snapshots 
(assuming of course that the snapshots are read-only; if not, you can 
just remove both of the btrfs property set lines):


for snapshot in * ; do
btrfs property set ${snapshot} ro false
cp /bin/bash ${snapshot}/bin/bash
btrfs property set ${snapshot} ro true
done

You can of course replace /bin/bash in that with any file.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Replace a corrupted block using a known-good file from another filesystem ?

2016-01-05 Thread Sylvain Joyeux
In the course of the few btrfs crashes I had on my USB backup drive
(NOT the drive from my other bug report, which is an internal SATA
drive) - in the last 6 months or so - I ended up having a 4 to 5 bad
checksums reported by scrub.

This drive is used to synchronize snapshots from my main machine, and
the corrupted files are system files that are still present on the
main machine.

I could obviously reset the filesystem on the USB drive, but since the
goal is to keep a backup history (which the main machine does not
keep), I would rather avoid that.

Would there be a way to replace the bad blocks using the good file on
the main filesystem ? Replacing it in each snapshot separately does
not look very appealing as the file is present on most of them.

Sylvain
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Computer freezes, btrfs check reports bad extent ... type mismatch with chunk

2016-01-05 Thread Sylvain Joyeux
> What does btrfs-show-super  /dev/sda5 list as output for incompat_flags  ?

incompat_flags  0x161
   ( MIXED_BACKREF |
 BIG_METADATA |
 EXTENDED_IREF |
 SKINNY_METADATA )

> It would be much better if you can provide some info about the operations you 
> are doing when the bug is triggered. This should help us to reproduce the bug.

I use the partition as a target for snapsync, a tool that basically
synchronizes snapshots made by snapper into a different (local) drive.
snapsync only does three operations on /dev/sda5:
 - btrfs receive
 - btrfs subvolume delete
 - btrfs subvolume sync (done just after either a receive, or after
possibly many deletes)

These operations are done serially. The only concurrency would be on
the sending filesystem, as snapper and snapsync are not synchronized.
It is therefore possible that snapper creates or deletes snapshots
while snapsync is sending some, including deleting the very snapshot
that snapsync is currently sending.

> Would you please try 4.4-rc kernel and see if it still triggers the bug?

I tried again this morning and I could not trigger the bug on 4.1.15.
The only thing that involved /dev/sda5 I did in the meantime was to
boot on a 4.3.3 kernel and mount /dev/sda5 - not doing any other
operations - and *then* reboot on 4.1.15 to try to trigger the bug
again. The partition had been unmounted and unused since yesterday. The
only difference I can see from the situation yesterday is that I did
not run snapsync at boot, but manually (I obviously disabled snapsync
on boot since it was making my kernel freeze).

Even though I don't get any kernel message, btrfs check still reports
the 'type mismatch' errors.

Sylvain

2016-01-04 23:18 GMT-02:00 Qu Wenruo :
>
>
> Sylvain Joyeux wrote on 2016/01/04 17:25 -0200:
>>
>> I currently have a btrfs partition that, when mounted, ends up
>> freezing my system. The only error I get when running btrfs check on
>> it is a whole lot of "bad extent ... type mismatch with chunk"
>>
>> I've added the patch https://patchwork.kernel.org/patch/7687611/ on
>> top of btrfs-progs v4.3.1 (
>> 7c3394ed9ef2063a7256d4bc078a485b6f826bc5), with no change.
>>
>> dmesg reports first the following message twice
>>
>> 13:29:09 kernel: [ cut here ]
>> 13:29:09 kernel: WARNING: CPU: 0 PID: 6 at
>> /home/kernel/COD/linux/fs/btrfs/extent-tree.c:6226
>> __btrfs_free_extent+0x83b/0xc40 [btrfs]()
>> 13:29:09 kernel: Modules linked in: binfmt_misc cmac rfcomm bnep
>> bbswitch(OE) nls_iso8859_1 arc4 intel_rapl iosf_mbi
>> x86_pkg_temp_thermal intel_powerclamp coretemp dm_multipath
>> snd_hda_codec_realtek snd_hda_codec_hdmi scsi_dh snd_hda_codec_generic
>> kvm_intel kvm snd_hda_intel snd_hda_controller crct10dif_pclmul
>> crc32_pclmul iwlmvm snd_hda_codec snd_hda_core aesni_intel mac80211
>> aes_x86_64 lrw gf128mul glue_helper ablk_helper uvcvideo cryptd
>> snd_hwdep videobuf2_vmalloc videobuf2_memops snd_seq_midi snd_pcm
>> videobuf2_core snd_seq_midi_event v4l2_common videodev snd_rawmidi
>> serio_raw joydev media iwlwifi snd_seq thinkpad_acpi snd_seq_device
>> nvram btusb rtsx_pci_ms snd_timer cfg80211 btbcm btintel memstick
>> lpc_ich bluetooth snd soundcore mei_me mei shpchp ie31200_edac
>> edac_core mac_hid intel_rst parport_pc
>> 13:29:09 kernel:  ppdev lp parport autofs4 btrfs raid10 raid456
>> async_raid6_recov async_memcpy async_pq async_xor async_tx xor
>> raid6_pq raid1 raid0 multipath linear hid_generic usbhid hid ses
>> enclosure uas usb_storage rtsx_pci_sdmmc i915 i2c_algo_bit
>> drm_kms_helper e1000e psmouse ahci drm ptp rtsx_pci libahci pps_core
>> wmi video
>> 13:29:09 kernel: CPU: 0 PID: 6 Comm: kworker/u16:0 Tainted: G
>>   OE   4.1.15-040115-generic #201512150136
>> 13:29:09 kernel: Hardware name: LENOVO 20ANCTO1WW/20ANCTO1WW, BIOS
>> GLET81WW (2.35 ) 09/10/2015
>> 13:29:09 kernel: Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
>> [btrfs]
>> 13:29:09 kernel:  c03e1478 88042beaba68 817e3f7c
>> 0007
>> 13:29:09 kernel:   88042beabaa8 81079b8a
>> 1c098000
>> 13:29:09 kernel:  000c0a108000 88042bfb6000 8803fc0aa1b0
>> fffe
>> 13:29:09 kernel: Call Trace:
>> 13:29:09 kernel:  [] dump_stack+0x45/0x57
>> 13:29:09 kernel:  [] warn_slowpath_common+0x8a/0xc0
>> 13:29:09 kernel:  [] warn_slowpath_null+0x1a/0x20
>> 13:29:09 kernel:  [] __btrfs_free_extent+0x83b/0xc40
>> [btrfs]
>> 13:29:09 kernel:  [] ? __slab_free+0xa7/0x2b0
>> 13:29:09 kernel:  []
>> __btrfs_run_delayed_refs+0x9c7/0x1260 [btrfs]
>> 13:29:09 kernel:  [] ? __slab_free+0xa7/0x2b0
>> 13:29:09 kernel:  [] ?
>> join_transaction.isra.13+0x129/0x420 [btrfs]
>> 13:29:09 kernel:  []
>> btrfs_run_delayed_refs.part.68+0x73/0x270 [btrfs]
>> 13:29:09 kernel:  [] delayed_ref_async_start+0x8c/0xb0
>> [btrfs]
>> 13:29:09 kernel:  [] btrfs_scrubnc_helper+0xca/0x2b0
>> [btrfs]
>
>
> Seem

Re: [PATCH] Btrfs: Intialize btrfs_root->highest_objectid when loading tree root and subvolume roots

2016-01-05 Thread David Sterba
On Wed, Oct 07, 2015 at 07:40:46PM +0530, Chandan Rajendra wrote:
> On Wednesday 07 Oct 2015 11:25:03 David Sterba wrote:
> > On Mon, Oct 05, 2015 at 10:14:24PM +0530, Chandan Rajendra wrote:
> > > + if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
> > > + mutex_unlock(&root->objectid_mutex);
> > > + ret = -ENOSPC;
> > 
> > ENOSPC ... I don't think it's right as this could be with a normal
> > enospc during subvolume creation. The problem is that theh inode number
> > space is exhausted, the closest error code I see is EOVERFLOW. As this
> > is an ioctl we can afford to define the meaning of this return value as
> > such (unlike for eg. creat()/open()).
> > 
> > > + goto free_root_dev;
> > > + }
> > > +
> > > + mutex_unlock(&root->objectid_mutex);
> > > +
> > > 
> > >   return 0;
> 
> David, Are you suggesting that we return -EOVERFLOW from within
> btrfs_init_fs_root() and continue returning -ENOSPC in case of error
> (i.e. tree_root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID) from
> open_ctree()?
> 
> If yes, btrfs_init_fs_root() gets invoked from open_ctree() via
> btrfs_read_fs_root_no_name() and hence we may end up returning -EOVERFLOW when
> servicing the mount() syscall.

Sorry for not answering that. As you're going to resend it, please
use EOVERFLOW in the btrfs_init_fs_root. We should not hit the overflow
error in the mount path.
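
I.e. the hunk quoted above would then end up as (a sketch of the
requested change only, not the final patch):

	if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
		mutex_unlock(&root->objectid_mutex);
		ret = -EOVERFLOW;
		goto free_root_dev;
	}

	mutex_unlock(&root->objectid_mutex);

	return 0;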
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] BTRFS: Adds an option to select RAID Stripe size

2016-01-05 Thread David Sterba
On Sun, Jan 03, 2016 at 03:26:25AM +0100, Christoph Anton Mitterer wrote:
> On Sun, 2016-01-03 at 09:37 +0800, Qu Wenruo wrote:
> > And since you are making the stripe size configurable, then user is 
> > responsible for any too large or too small stripe size setting.
> That pops up the questions, which raid chunk sizes the kernel,
> respectively the userland tools should allow for btrfs...
> 
> I'd guess only powers of 2, some minimum, some maximum.

We have a full 32 bit number space, so multiples of power of 2 are also
possible if that makes sense. In general we don't need to set additional
limitations besides minimum, maximum and "minimal step".

> Are there any concerns/constraints with too small/too big chunks when
> these play together with lower block layers (I'd guess not).

I don't think so.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] BTRFS: Adds an option to select RAID Stripe size

2016-01-05 Thread David Sterba
On Wed, Dec 30, 2015 at 04:21:47PM -0500, Sanidhya Solanki wrote:
> On Wed, 30 Dec 2015 17:17:22 +0100
> David Sterba  wrote:
> 
> > Let me note that a good reputation is also built from patch reviews
> > (hint hint).
> 
> Unfortunately, not too many patches coming in for BTRFS presently.
> Mailing list activity is down to 25-35 mails per day. Mostly feature
> and bug requests.
> 
> I will try to pitch in with patch reviews where possible.

It was not meant specifically to you, but I won't discourage you from
doing reviews of course. The period where a review is expected can vary
and is bound to the development cycle of the kernel. At the latest, reviews
should come before the integration branch is put together (before the
merge window), and for the rc's it's before the next schedule (less than
a week).

The reviewed-by tag has a real meaning and weight in the community

http://lxr.free-electrons.com/source/Documentation/SubmittingPatches#L552

and besides that, subscribes the person to the blame game and can cause
bad feelings if the code turns out to be buggy later on.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] BTRFS: Adds an option to select RAID Stripe size

2016-01-05 Thread David Sterba
On Thu, Dec 31, 2015 at 08:46:36AM +0800, Qu Wenruo wrote:
> > Let me note that a good reputation is also built from patch reviews
> > (hint hint).
> 
> I must admit I'm a bad reviewer.
> When I review something, I always have an urge to rewrite part or all of
> the patch to follow my own idea, even if it's just a choice between
> different designs.

Yeah that's natural, but even if one does not completely agree, it's
still possible to verify that the implementation is correct.

The reviews also help to find and share a common style of
implementation, so the maintainers don't scream when they see a patch and
developers are not surprised that the patches take several rounds.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Confining scrub to a subvolume

2016-01-05 Thread David Sterba
On Wed, Dec 30, 2015 at 08:26:00PM +0100, Christoph Anton Mitterer wrote:
> On Wed, 2015-12-30 at 18:39 +0100, David Sterba wrote:
> > The closest would be to read the files and look for any reported
> > errors.
> Doesn't that fail for any multi-device setup, in which case btrfs reads
> the blocks only from one device, and if that verifies, doesn't check
> the other?

That's right, it's not equivalent to the scrub.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] BTRFS: Runs the xor function if a Block has failed

2016-01-05 Thread David Sterba
On Wed, Dec 30, 2015 at 09:15:44PM -0500, Sanidhya Solanki wrote:
> On Wed, 30 Dec 2015 18:18:26 +0100
> David Sterba  wrote:
> 
> > That's just the comment copied, the changelog does not explain why
> > it's ok to do just the run_xor there. It does not seem trivial to me.
> > Please describe that the end result after the code change is expected.
> 
> In the RAID 6 case after a failure, we discover that the failure
> affected the entire P stripe, without any bad data occurring. Hence, we
> xor the previously stored parity data to return the data that was lost
> in the P stripe failure.
> 
> The xor-red data is from the parity blocks. Hence, we are left with 
> recovered data belonging to the P stripe.

If the data are recovered, why is -EIO still returned? Also, I see some
post-recovery steps, e.g. for the damaged P stripes (at label pstripes),
and I'd expect something similar for the case you're fixing.

I'm not familiar with the raid56 implementation but the fix looks
suspiciously trivial and I doubt that the xor was omitted out of
laziness.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html