Re: Incremental backup for a raid1

2014-03-14 Thread Duncan
Michael Schuerig posted on Thu, 13 Mar 2014 20:12:44 +0100 as excerpted:

 My backup use case is different from what has recently been
 discussed in another thread. I'm trying to guard against hardware
 failure and other causes of destruction.
 
 I have a btrfs raid1 filesystem spread over two disks. I want to back up
 this filesystem regularly and efficiently to an external disk (same
 model as the ones in the raid) in such a way that
 
 * when one disk in the raid fails, I can substitute the backup and
 rebalancing from the surviving disk to the substitute only applies the
 missing changes.
 
 * when the entire raid fails, I can re-build a new one from the backup.
 
 The filesystem is mounted at its root and has several nested subvolumes
 and snapshots (in a .snapshots subdir on each subvol).
 
 Is it possible to do what I'm looking for?

AFAICS, as mentioned down the other subthread, the closest thing to this 
would be N-way mirroring, a coming feature on the roadmap for 
introduction after raid5/6 mode[1] gets completed.  The current raid1 
mode is 2-way-mirroring only, regardless of the number of devices.

N-way-mirroring is actually my most hotly anticipated feature for a 
different reason[2], but for you it would work like this:

1) Set up the 3-way (or 4-way if preferred) mirroring and balance to 
ensure copies of all data on all devices.

2) Optionally scrub to ensure the integrity of all copies.

3) Disconnect the backup device(s).  (Don't btrfs device delete, this 
would remove the copy.  Just disconnect.)  

4) Store the backups.

5) Periodically get them out and reconnect.

6) Rebalance to update.  (Since the devices remain members of the mirror, 
simply outdated, the balance should only update, not rewrite the entire 
thing.)

7) Optionally scrub to verify.

8) Repeat steps 3-7 as necessary.

If you went 4-way so two backups and alternated the one you plugged in, 
it'd also protect against mishap that might take out all devices during 
steps 5-7 when the backup is connected as well, since you'd still have 
that other backup available.

Unfortunately, completing raid5/6 support is still an ongoing project, 
and as a result, fully functional and /reasonably/ tested N-way-mirroring 
remains the same 6-months-minimum away that it has been for over a year 
now.  But I sure am anticipating that day!

---
[1] Currently, the raid5/6 support is incomplete: the parity is
calculated and writes are done, but some restore scenarios aren't yet
properly supported and raid5/6-mode scrub isn't complete either, so the
current code is considered testing-only, not for deployment where the
raid5/6 feature would actually be relied on.  That has remained the
raid5/6 status for several kernels now, as the focus has been on bugfixing
other areas, including send/receive and snapshot-aware defrag, which is
currently deactivated due to horrible scaling issues (current defrag COWs
the operational mount only, duplicating previously shared blocks).

[2] In addition to protection against the loss of N-1 devices, I really
love btrfs' data integrity features and the ability to recover from other
copies if one is found to be corrupted, which is why I'm running raid1
mode here.  But currently, there are only the two copies and if both get
corrupted...  My sweet spot would be three copies, allowing corruption of
two and recovery from the third, which is why I personally am so hotly
anticipating N-way-mirroring, but unfortunately, it's looking a bit like
the proverbial carrot on the stick in front of the donkey, these days.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: discard synchronous on most SSDs?

2014-03-14 Thread Chris Samuel
On Thu, 13 Mar 2014 09:39:02 PM Chris Murphy wrote:

 smartctl -a or -x will tell you what SATA revision is in place. The queued
 trim support is in SATA Rev 3.1. I'm not certain if this requires only the
 drive to support that revision level, or both controller and drive.

Both, I'd say, as I believe it's the controller that has to issue it to the 
drive, and the drive needs to understand it.

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC





Re: discard synchronous on most SSDs?

2014-03-14 Thread Chris Samuel
Hi Marc,

On Thu, 13 Mar 2014 10:17:50 PM Marc MERLIN wrote:

 I'm not sure I'm seeing this, which field is that?

I *think* you want smartctl -i instead, and look for the field that says 
something like:

ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3

So if my understanding is correct that says it's just rev. 3.0 so TRIM for 
this is synchronous.
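
As a rough illustration of the field lookup described above, the following Python sketch pulls the `... is:` fields out of text shaped like `smartctl -i` output. The sample line is the one quoted in this message; the helper name `parse_identity_fields` is made up for the example, and real output may differ between smartctl versions and devices.

```python
def parse_identity_fields(text):
    """Return {field name: value} for lines shaped like 'Name is:   value'."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" is:")
        if sep:  # skip lines that do not contain ' is:'
            fields[name.strip()] = value.strip()
    return fields


# Sample text copied from the message above, not captured from a live drive.
sample = "ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3\n"

fields = parse_identity_fields(sample)
print(fields.get("ATA Version"))
# Some drives do not report a SATA field at all, so look it up defensively.
print(fields.get("SATA Version", "not reported"))
```

Looking the field up with a default keeps the sketch usable for drives that only report the ATA line.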

Good luck!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC





Re: [PATCH] Btrfs: remove transaction from send

2014-03-14 Thread Hugo Mills
On Thu, Mar 13, 2014 at 10:16:28PM +, Hugo Mills wrote:
 On Thu, Mar 13, 2014 at 03:42:13PM -0400, Josef Bacik wrote:
  Lets try this again.  We can deadlock the box if we send on a box and
  try to write onto the same fs with the app that is trying to listen to
  the send pipe.  This is because the writer could get stuck waiting for
  a transaction commit which is being blocked by the send.  So fix this
  by making sure looking at the commit roots is always going to be
  consistent.  We do this by keeping track of which roots need to have
  their commit roots swapped during commit, and then taking the
  commit_root_sem and swapping them all at once.  Then make sure we take
  a read lock on the commit_root_sem in cases where we search the commit
  root to make sure we're always looking at a consistent view of the
  commit roots.  Previously we had problems with this because we would
  swap a fs tree commit root and then swap the extent tree commit root
  independently which would cause the backref walking code to screw up
  sometimes.  With this patch we no longer deadlock and pass all the
  weird send/receive corner cases.  Thanks,
 
 There's something still going on here. I managed to get about twice
 as far through my test as I had before, but I again got an unexpected
 EOF in stream, with btrfs send returning 1. As before, I have this in
 syslog:
 
 Mar 13 22:09:12 s_src@amelia kernel: BTRFS error (device sda2): did not find backref in send_root. inode=1786631, offset=825257984, disk_byte=36504023040 found extent=36504023040
 
 So, on the evidence of one data point (I'll have another one when I
 wake up tomorrow morning), this has made the problem harder to trigger
 but it's still possible.

   Data point two has arrived, and it's gone boom at about the same
point. The first failed at:
2014-03-13 22:09:11,749 INFO Read 7247356514 bytes total
and the second at:
2014-03-14 03:53:46,990 INFO Read 7247357071 bytes total
at approximately 1h45 into the process. The boot and home subvols have
been OK, and have been backing up happily all this time, but both are
smaller than the (~10 GiB) root subvol.

   I can add a load of data to /home and see if the problem happens
with a larger send size, or if it's just the process writing to a
subvol that has the snapshot being sent that causes it.

   The interesting thing here is that the error seems to be fairly
reliably in the same place (more or less). Before this patch, I was
seeing lockups (or EOF, with the earlier version of this patch) at
approximately 3.6-3.8 GB. Now it looks like it's going to be 7.2 GB.
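
To put numbers on "the same place (more or less)", here is a small Python calculation using only the two byte counts from the log lines above. The variable names are illustrative.

```python
first_failure = 7247356514   # bytes read before the first failure
second_failure = 7247357071  # bytes read before the second failure

# Distance between the two failure points, in bytes.
gap = second_failure - first_failure
print(gap)

# The same offset expressed in decimal gigabytes and binary gibibytes.
print(round(first_failure / 10**9, 2))
print(round(first_failure / 2**30, 2))
```

The two runs stop within a few hundred bytes of each other, which is what makes the failure look position-dependent rather than time-dependent.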

   At least it's not locking up any more, just dying noisily (which is
marginally preferable).

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Hail and greetings.  We are a flat-pack invasion force from ---   
 Planet Ikea. We come in pieces. 




Re: Incremental backup for a raid1

2014-03-14 Thread Michael Schuerig
On Friday 14 March 2014 06:42:27 Duncan wrote:
 N-way-mirroring is actually my most hotly anticipated feature for a 
 different reason[2], but for you it would work like this:
 
 1) Set up the 3-way (or 4-way if preferred) mirroring and balance to 
 ensure copies of all data on all devices.
 
 2) Optionally scrub to ensure the integrity of all copies.
 
 3) Disconnect the backup device(s).  (Don't btrfs device delete, this 
 would remove the copy.  Just disconnect.)
 
 4) Store the backups.
 
 5) Periodically get them out and reconnect.
 
 6) Rebalance to update.  (Since the devices remain members of the
 mirror,  simply outdated, the balance should only update, not rewrite
 the entire thing.)
 
 7) Optionally scrub to verify.
 
 8) Repeat steps 3-7 as necessary.

Judging from your description, N-way mirroring is (going to be) exactly 
what I was hoping for.

Michael

-- 
Michael Schuerig
mailto:mich...@schuerig.de
http://www.schuerig.de/michael/



Re: How to view transaction log chronologically, human-readable?

2014-03-14 Thread Marcel Partap
[...]
Theoretically, there should be someone on this mailing list capable of
answering this question, no?
Please feel invited to share your insights ;)
#Regards



On 01/03/14 02:21, Marcel Partap wrote:
 Dear BTRFS devs,
 I have a 1TB btrfs volume that has been mounted read-only for two years
 because I deleted a bunch of files and didn't want to give up on them.
 Now, with the latest btrfs-find-root and btrfs restore --dry-run -t in a
 loop, I generated the full list of files contained in the last several
 hundred root trees. However, diffing these, I find that the listing is
 the same as the current one until 94 root trees back, and the ones before
 that contain earlier changes. Maybe that is my own fault... whatever.
 
 Is there a way to just view the transaction history in a human-readable way?
 
 #Regards
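
The comparison described above (how far back the file listing stays identical to the current one) can be sketched in Python as follows. This assumes the per-tree listings have already been collected into lists of paths; the function name and the sample data are hypothetical, and the sketch does not call any btrfs tool.

```python
def first_differing_tree(listings):
    """listings[0] is the current root tree's file list, listings[1] the one
    before it, and so on. Return how many trees back the first differing
    listing is, or None if every listing matches the current one."""
    current = set(listings[0])
    for back, files in enumerate(listings[1:], start=1):
        if set(files) != current:
            return back
    return None


# Made-up listings standing in for the output of the restore dry runs.
listings = [
    ["a.txt", "b.txt"],           # current root tree
    ["a.txt", "b.txt"],           # one tree back: identical
    ["a.txt", "b.txt", "c.txt"],  # two trees back: c.txt still present
]
print(first_differing_tree(listings))
```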
 


Re: Incremental backup for a raid1

2014-03-14 Thread Duncan
Michael Schuerig posted on Fri, 14 Mar 2014 09:56:20 +0100 as excerpted:

[Duncan posted...]

 3) Disconnect the backup device(s).  (Don't btrfs device delete, this
 would remove the copy.  Just disconnect.)

Hmm...  Looking back at what I wrote...

Presumably either have the filesystem unmounted for the disconnect (and 
ideally, the system off, tho with modern drives in theory that's not an 
issue, but still good if it can be done), or at least remounted read-only.

I had guessed that was implicit, but making it explicit is probably best 
all around, just in case.  At least I can rest better with it, having 
made that explicit.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: discard synchronous on most SSDs?

2014-03-14 Thread Duncan
Marc MERLIN posted on Thu, 13 Mar 2014 22:17:50 -0700 as excerpted:

 On Thu, Mar 13, 2014 at 09:39:02PM -0600, Chris Murphy wrote:
 
 On Mar 13, 2014, at 8:11 PM, Marc MERLIN m...@merlins.org wrote:
 
  On Sun, Mar 09, 2014 at 11:33:50AM +, Hugo Mills wrote:
  discard is, except on the very latest hardware, a synchronous
  command (it's a limitation of the SATA standard), and therefore
  results in very very poor performance.
  
  Interesting. How do I know if a given SSD will hang on discard?
  Is a Samsung EVO 840 1TB SSD latest hardware enough, or not? :)
 
 smartctl -a or -x will tell you what SATA revision is in place. The
 queued trim support is in SATA Rev 3.1. I'm not certain if this
 requires only the drive to support that revision level, or both
 controller and drive.
 
 I'm not sure I'm seeing this, which field is that?

 ATA Version is:   8
 ATA Standard is:  ATA-8-ACS revision 4c

Your drive didn't report it, but here, I have SATA fields as well, in 
addition to the ATA fields:

Here's the fields from my Corsair Neutron SSDs:

ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.5, 6.0 Gb/s

Here's the fields from my Seagate 500-gig 2.5-inch spinning rust:

ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s

(More about that below.)

Smartctl version here is 6.2 2013-07-26 r3841, according to the output. 
(I'm running gentoo/~amd64 FWIW so it's a local-build). You snipped that 
bit of your output so I can't compare.

But it may also depend on whether smartctl auto-detected and used the ATA 
or the SCSI (or something else) command set and how your devices are 
actually connected, plus BIOS settings, etc.  See the manpage 
documentation for the -d TYPE (--device=TYPE) option and the ATA/SCSI/SAT 
discussion rather further down the manpage for more.

Here I have direct SATA connections with the BIOS set to AHCI mode and am 
thus using the kernel's AHCI drivers, since that's the most common SATA 
chipset standard these days, thus increasing portability given my 
monolithic kernel build.

smartctl's -d test reports an original guess of scsi, changed to sat 
after detection.

Of course connection via USB bridge or the like complicates things 
considerably.


Meanwhile, SATA 2.5, 6 Gb/s on the SSDs, SATA 2.6, 3 Gb/s on the spinning 
rust?  WTF?  The SSDs have SATA 2.5 but 6 Gb/s while the spinning rust 
has a later 2.6 but only 3 Gb/s (tho of course on a mechanical drive the 
bus speed won't be the bottleneck)?  Now I'm confused.
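
For what it is worth, the revision and the link speed in that field are separate values, which a small Python sketch can make explicit. It parses the two `SATA Version is:` values quoted above and compares the revision with the 3.1 threshold mentioned earlier in the thread; that threshold is taken from the thread and not verified here, and the function name is an assumption for the example.

```python
import re

# Revision claimed earlier in this thread for queued TRIM support.
QUEUED_TRIM_MIN_REV = (3, 1)


def parse_sata_version(value):
    """Split e.g. 'SATA 2.5, 6.0 Gb/s' into ((2, 5), 6.0).

    Returns None if the value does not start with a SATA revision."""
    m = re.match(r"SATA (\d+)\.(\d+)(?:, ([\d.]+) Gb/s)?", value)
    if m is None:
        return None
    revision = (int(m.group(1)), int(m.group(2)))
    speed = float(m.group(3)) if m.group(3) else None
    return revision, speed


for label, value in [("Corsair Neutron SSD", "SATA 2.5, 6.0 Gb/s"),
                     ("Seagate 2.5-inch HDD", "SATA 2.6, 3.0 Gb/s")]:
    revision, speed = parse_sata_version(value)
    # Compare revisions as integer tuples so that 3.10 would sort after 3.2.
    print(label, revision, speed, revision >= QUEUED_TRIM_MIN_REV)
```

Neither reported revision reaches the threshold, regardless of the link speed each drive negotiates.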

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: [PATCH] Btrfs: fix joining same transaction handle more than twice

2014-03-14 Thread Wang Shilong

On 03/13/2014 10:05 PM, Josef Bacik wrote:

On 03/13/2014 01:19 AM, Wang Shilong wrote:

We hit something like the following function call flows:

|->run_delalloc_range()
 |->btrfs_join_transaction()
   |->cow_file_range()
     |->btrfs_join_transaction()
       |->find_free_extent()
         |->btrfs_join_transaction()

Trace information can be seen as:

[ 7411.127040] ------------[ cut here ]------------
[ 7411.127060] WARNING: CPU: 0 PID: 11557 at fs/btrfs/transaction.c:383 start_transaction+0x561/0x580 [btrfs]()
[ 7411.127079] CPU: 0 PID: 11557 Comm: kworker/u8:9 Tainted: G   O 3.13.0+ #4
[ 7411.127080] Hardware name: LENOVO QiTianM4350/ , BIOS F1KT52AUS 05/24/2013
[ 7411.127085] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-5)
[ 7411.127092] Call Trace:
[ 7411.127097]  [815b87b0] dump_stack+0x45/0x56
[ 7411.127101]  [81051ffd] warn_slowpath_common+0x7d/0xa0
[ 7411.127102]  [810520da] warn_slowpath_null+0x1a/0x20
[ 7411.127109]  [a0444fb1] start_transaction+0x561/0x580 [btrfs]
[ 7411.127115]  [a0445027] btrfs_join_transaction+0x17/0x20 [btrfs]
[ 7411.127120]  [a0431c91] find_free_extent+0xa21/0xb50 [btrfs]
[ 7411.127126]  [a0431f68] btrfs_reserve_extent+0xa8/0x1a0 [btrfs]
[ 7411.127131]  [a04322ce] btrfs_alloc_free_block+0xee/0x440 [btrfs]
[ 7411.127137]  [a043bd6e] ? btree_set_page_dirty+0xe/0x10 [btrfs]
[ 7411.127142]  [a041da51] __btrfs_cow_block+0x121/0x530 [btrfs]
[ 7411.127146]  [a041dfff] btrfs_cow_block+0x11f/0x1c0 [btrfs]
[ 7411.127151]  [a0421b74] btrfs_search_slot+0x1d4/0x9c0 [btrfs]
[ 7411.127157]  [a0438567] btrfs_lookup_file_extent+0x37/0x40 [btrfs]
[ 7411.127163]  [a0456bfc] __btrfs_drop_extents+0x16c/0xd90 [btrfs]
[ 7411.127169]  [a0444ae3] ? start_transaction+0x93/0x580 [btrfs]
[ 7411.127171]  [811663e2] ? kmem_cache_alloc+0x132/0x140
[ 7411.127176]  [a041cd9a] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[ 7411.127182]  [a044aa61] cow_file_range_inline+0x181/0x2e0 [btrfs]
[ 7411.127187]  [a044aead] cow_file_range+0x2ed/0x440 [btrfs]
[ 7411.127194]  [a0464d7f] ? free_extent_buffer+0x4f/0xb0 [btrfs]
[ 7411.127200]  [a044b38f] run_delalloc_nocow+0x38f/0xa60 [btrfs]
[ 7411.127207]  [a0461600] ? test_range_bit+0x30/0x180 [btrfs]
[ 7411.127212]  [a044bd48] run_delalloc_range+0x2e8/0x350 [btrfs]
[ 7411.127219]  [a04618f9] ? find_lock_delalloc_range+0x1a9/0x1e0 [btrfs]
[ 7411.127222]  [812a1e71] ? blk_queue_bio+0x2c1/0x330
[ 7411.127228]  [a0462ad4] __extent_writepage+0x2f4/0x760 [btrfs]

Here we fix it by avoiding joining transaction again if we have held
a transaction handle when allocating chunk in find_free_extent().



So I just put that warning there to see if we were ever embedding 3
joins at a time, not because it was an actual problem, I'd say just kill
the warning.  Thanks,

We need to keep @orig_rsv and restore it when ending the transaction, so
we'd better not embed more than 2 joins for now.
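
The concern about the saved reservation can be illustrated with a toy Python model. This is only a sketch under the assumption that a handle has a single slot for the previously active reservation, as the remark above implies; `Handle`, `Task` and `run` are illustrative names, and the real kernel code differs in detail.

```python
import warnings


class Handle:
    """Toy transaction handle with one slot for a saved reservation."""

    def __init__(self, block_rsv):
        self.use_count = 1
        self.block_rsv = block_rsv
        self.orig_rsv = None


class Task:
    """Toy model of one task; journal_info holds its open handle, if any."""

    def __init__(self):
        self.journal_info = None

    def join(self, block_rsv=None):
        h = self.journal_info
        if h is None:
            h = self.journal_info = Handle(block_rsv)
            return h
        h.use_count += 1
        if h.use_count > 2:
            warnings.warn("handle joined more than twice")
        h.orig_rsv = h.block_rsv  # a third join overwrites the saved value
        h.block_rsv = None
        return h

    def end(self, h):
        h.use_count -= 1
        if h.use_count:
            h.block_rsv = h.orig_rsv  # restore the one saved reservation
        else:
            self.journal_info = None


def run(depth):
    """Join `depth` times, unwind to the outermost level, return its rsv."""
    task = Task()
    h = task.join("delalloc_rsv")
    for _ in range(depth - 1):
        task.join()
    for _ in range(depth - 1):
        task.end(h)
    return h.block_rsv


with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    print(run(2))  # the outer reservation survives one level of nesting
    print(run(3))  # the third join clobbered the saved slot
```

With two joins the outer reservation comes back intact; with three, the value saved by the second join is overwritten and the outermost level ends up without its reservation.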

Thanks,
Wang


Josef



Re: [PATCH] Btrfs: remove transaction from send

2014-03-14 Thread Wang Shilong
 Lets try this again.  We can deadlock the box if we send on a box and try
 to write onto the same fs with the app that is trying to listen to the
 send pipe.  This is because the writer could get stuck waiting for a
 transaction commit which is being blocked by the send.  So fix this by
 making sure looking at the commit roots is always going to be consistent.
 We do this by keeping track of which roots need to have their commit
 roots swapped during commit, and then taking the commit_root_sem and
 swapping them all at once.  Then make sure we take a read lock on the
 commit_root_sem in cases where we search the commit root to make sure
 we're always looking at a consistent view of the commit roots.
 Previously we had problems with this because we would swap a fs tree
 commit root and then swap the extent tree commit root independently which
 would cause the backref walking code to screw up sometimes.  With this
 patch we no longer deadlock and pass all the weird send/receive corner
 cases.  Thanks,

Now btrfs send is always searching the commit root! Your code only seems to
protect the backref code; it reduces the time transactions are blocked but
makes it unsafe, as we have discussed before.

-Wang
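
The locking idea in the quoted patch description (collect the roots that need swapping, publish them all in one step under `commit_root_sem`, and have commit-root readers take the same lock) can be sketched in Python. This is a simplified model, not the kernel implementation: a plain `threading.Lock` stands in for the read/write semaphore, generations are plain integers, and the class and method names are assumptions for the example.

```python
import threading


class Root:
    def __init__(self, name):
        self.name = name
        self.node = 0         # generation of the live tree root
        self.commit_root = 0  # generation visible to commit-root searches


class FsInfo:
    def __init__(self):
        self.commit_root_sem = threading.Lock()  # stand-in for the rw semaphore
        self.switch_commits = []                 # roots waiting to be swapped

    def modify(self, root, generation):
        """Change the live tree and remember that its commit root is stale."""
        root.node = generation
        if root not in self.switch_commits:
            self.switch_commits.append(root)

    def switch_commit_roots(self):
        """Commit time: publish every pending root in one locked step."""
        with self.commit_root_sem:
            for root in self.switch_commits:
                root.commit_root = root.node
            self.switch_commits.clear()

    def read_commit_roots(self, *roots):
        """Reader side: hold the lock so all roots come from one commit."""
        with self.commit_root_sem:
            return tuple(r.commit_root for r in roots)


fs = FsInfo()
fs_tree, extent_tree = Root("fs"), Root("extent")
fs.modify(fs_tree, 1)
fs.modify(extent_tree, 1)
before = fs.read_commit_roots(fs_tree, extent_tree)
fs.switch_commit_roots()
after = fs.read_commit_roots(fs_tree, extent_tree)
print(before, after)
```

A reader sees either both old commit roots or both new ones, never a new fs tree root paired with an old extent tree root. Whether that is sufficient for send is exactly what is being disputed in this thread.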
 
 Reported-by: Hugo Mills h...@carfax.org.uk
 Signed-off-by: Josef Bacik jba...@fb.com
 ---
 fs/btrfs/backref.c     | 33 +++
 fs/btrfs/ctree.c       | 88 --
 fs/btrfs/ctree.h       |  3 +-
 fs/btrfs/disk-io.c     |  3 +-
 fs/btrfs/extent-tree.c | 20 ++--
 fs/btrfs/inode-map.c   | 14 
 fs/btrfs/send.c        | 57 ++--
 fs/btrfs/transaction.c | 45 --
 fs/btrfs/transaction.h |  1 +
 9 files changed, 77 insertions(+), 187 deletions(-)
 
 diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
 index 860f4f2..0be0e94 100644
 --- a/fs/btrfs/backref.c
 +++ b/fs/btrfs/backref.c
 @@ -329,7 +329,10 @@ static int __resolve_indirect_ref(struct btrfs_fs_info *fs_info,
                  goto out;
          }
  
 -        root_level = btrfs_old_root_level(root, time_seq);
 +        if (path->search_commit_root)
 +                root_level = btrfs_header_level(root->commit_root);
 +        else
 +                root_level = btrfs_old_root_level(root, time_seq);
  
          if (root_level + 1 == level) {
                  srcu_read_unlock(&fs_info->subvol_srcu, index);
 @@ -1092,9 +1095,9 @@ static int btrfs_find_all_leafs(struct btrfs_trans_handle *trans,
    *
    * returns 0 on success, < 0 on error.
    */
 -int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
 -                                struct btrfs_fs_info *fs_info, u64 bytenr,
 -                                u64 time_seq, struct ulist **roots)
 +static int __btrfs_find_all_roots(struct btrfs_trans_handle *trans,
 +                                  struct btrfs_fs_info *fs_info, u64 bytenr,
 +                                  u64 time_seq, struct ulist **roots)
  {
          struct ulist *tmp;
          struct ulist_node *node = NULL;
 @@ -1130,6 +1133,20 @@ int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
          return 0;
  }
  
 +int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
 +                         struct btrfs_fs_info *fs_info, u64 bytenr,
 +                         u64 time_seq, struct ulist **roots)
 +{
 +        int ret;
 +
 +        if (!trans)
 +                down_read(&fs_info->commit_root_sem);
 +        ret = __btrfs_find_all_roots(trans, fs_info, bytenr, time_seq, roots);
 +        if (!trans)
 +                up_read(&fs_info->commit_root_sem);
 +        return ret;
 +}
 +
  /*
   * this makes the path point to (<inum> INODE_ITEM <ioff>)
   */
 @@ -1509,6 +1526,8 @@ int iterate_extent_inodes(struct btrfs_fs_info *fs_info,
                  if (IS_ERR(trans))
                          return PTR_ERR(trans);
                  btrfs_get_tree_mod_seq(fs_info, &tree_mod_seq_elem);
 +        } else {
 +                down_read(&fs_info->commit_root_sem);
          }
  
          ret = btrfs_find_all_leafs(trans, fs_info, extent_item_objectid,
 @@ -1519,8 +1538,8 @@ int iterate_extent_inodes(struct btrfs_fs_info *fs_info,
  
          ULIST_ITER_INIT(&ref_uiter);
          while (!ret && (ref_node = ulist_next(refs, &ref_uiter))) {
 -                ret = btrfs_find_all_roots(trans, fs_info, ref_node->val,
 -                                           tree_mod_seq_elem.seq, &roots);
 +                ret = __btrfs_find_all_roots(trans, fs_info, ref_node->val,
 +                                             tree_mod_seq_elem.seq, &roots);
                  if (ret)
                          break;
                  ULIST_ITER_INIT(&root_uiter);
 @@ -1542,6 +1561,8 @@ out:
          if (!search_commit_root) {
                  btrfs_put_tree_mod_seq(fs_info, &tree_mod_seq_elem);
                  btrfs_end_transaction(trans, fs_info->extent_root);
 +        } else {
 +                up_read(&fs_info->commit_root_sem);
          }
  
          return ret;
 diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
 index 88d1b1e..9d89c16 100644
 --- a/fs/btrfs/ctree.c
 +++ b/fs/btrfs/ctree.c
 @@ 

Re: [PATCH] Btrfs-progs: fsck: disable --init-extent-tree option when using snapshots

2014-03-14 Thread Wang Shilong
Hi Josef,

Just pinging this again.

Do you have any good ideas for rebuilding the extent tree if the broken
filesystem is filled with snapshots?

I was working on this recently, but I am blocked because I cannot verify
whether an extent is in *FULL BACKREF* mode or not, since a *FULL BACKREF*
extent's refs can be 1 or more than 1.

I am willing to test code or give it a try if you could give me some advice.

-Wang

 On 03/10/2014 11:50 PM, Josef Bacik wrote:
 On 03/10/2014 08:12 AM, Shilong Wang wrote:
 Hi Josef,
 
 I haven't thought of any better idea for rebuilding an extent tree that
 contains extents with the 'FULL BACKREF' flag.
 
 Since an extent's refs can be equal to or greater than 1 when the
 extent has the *FULL BACKREF* flag, we can no longer determine an
 extent's flag by searching only the fs/file tree.
 
 So for now I just disable this option if snapshots exist. Please
 correct me if I missed something here, or if you have any better
 ideas to solve this problem. ~_~
 
 I thought the fsck stuff rebuilds full backref refs properly, does it
 not?  If it doesn't we need to fix that, however I'm fine with
 disabling the option if snapshots exist for the time being.  Thanks,
 
 If there are no snapshots, --init-extent-tree works as expected.
 I just have not thought of a better way to rebuild the extent tree if we
 do have snapshots, which means we may have an extent with the
 *FULL BACKREF* flag.
 
 Thanks,
 Wang
 
 Josef
 



Re: Incremental backup for a raid1

2014-03-14 Thread George Mitchell
Actually, an interesting concept would be to have the initial two drive 
RAID 1 mirrored by 2 additional drives in 4-way configuration on a 
second machine at a remote location on a private high speed network with 
both machines up 24/7.  In that case, if such a configuration would 
work, either machine could be obliterated and the data would survive 
fully intact in full duplex mode.  It would just need to be remounted 
from the backup system and away it goes.  Just thinking of interesting 
possibilities with n-way mirroring.  Oh how I would love to have n-way 
mirroring to play with!




On 03/14/2014 04:24 AM, Duncan wrote:

Michael Schuerig posted on Fri, 14 Mar 2014 09:56:20 +0100 as excerpted:

[Duncan posted...]


3) Disconnect the backup device(s).  (Don't btrfs device delete, this
would remove the copy.  Just disconnect.)

Hmm...  Looking back at what I wrote...

Presumably either have the filesystem unmounted for the disconnect (and
ideally, the system off, tho with modern drives in theory that's not an
issue, but still good if it can be done), or at least remounted read-only.

I had guessed that was implicit, but making it explicit is probably best
all around, just in case.  At least I can rest better with it, having
made that explicit.





Re: [PATCH] Btrfs-progs: fsck: disable --init-extent-tree option when using snapshots

2014-03-14 Thread Josef Bacik
On 03/14/2014 09:36 AM, Wang Shilong wrote:
 Hi Josef,
 
 Just pinging this again.
 
 Do you have any good ideas for rebuilding the extent tree if the broken
 filesystem is filled with snapshots?
 
 I was working on this recently, but I am blocked because I cannot verify
 whether an extent is in *FULL BACKREF* mode or not, since a *FULL BACKREF*
 extent's refs can be 1 or more than 1.
 
 I am willing to test code or give it a try if you could give me some
 advice.
 

Full backrefs aren't too hard.  Basically all you have to do is walk
down the fs tree and keep track of btrfs_header_owner(eb) for
everything we walk into.  If btrfs_header_owner(eb) == root->objectid
for the tree we are walking down then we need a ye olde normal backref
for this block.  If btrfs_header_owner(eb) != root->objectid we _may_
need a full backref, it depends on who owns the parent block.  The
following may be incomplete, I'm kind of sick.

1) We walk down the original tree, every eb we encounter has
btrfs_header_owner(eb) == root->objectid.  We add normal references
(BTRFS_TREE_BLOCK_REF_KEY) for this root.  World peace is achieved.

2) We walk down the snapshotted tree.  Say we didn't change anything
at all, it was just a clean snapshot and then boom.  So the
btrfs_header_owner(root->node) == root->objectid, so normal backref.
We walk down to the next level, where btrfs_header_owner(eb) !=
root->objectid, but the level above matched, so we add normal refs for
all of these blocks.  We go down the next level, now our
btrfs_header_owner(parent) != root->objectid and
btrfs_header_owner(eb) != root->objectid.  This is where we need to
now go back and see if btrfs_header_owner(eb) currently has a ref on
eb.  If it does we are done, move on to the next block in this same
level, we don't have to go further down.

3) Harder case, we snapshotted and then changed things in the original
root.  Do the same thing as in step 2, but now we get down to
btrfs_header_owner(eb) != root->objectid && btrfs_header_owner(parent)
!= root->objectid.  We lookup the references we have for eb and notice
that btrfs_header_owner(eb) no longer refers to eb.  So now we must
set FULL_BACKREF on this extent reference and add a
SHARED_BLOCK_REF_KEY for this eb using the parent->start as the
offset.  And we need to keep walking down and doing the same thing
until we either hit level 0 or btrfs_header_owner(eb) has a ref on the
block.

4) Not really a whole special case, just something to keep in mind, if
btrfs_header_owner(parent) == root->objectid but
btrfs_header_owner(eb) != root->objectid that means we have a normal
TREE_BLOCK_REF on eb, it's only when the parent doesn't match our
current root that it's a problem.
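
The four cases above boil down to a small decision function. The Python sketch below is one reading of that description, not code from btrfs-progs: the owners are passed in as plain integers, `owner_has_ref` stands for the lookup of whether btrfs_header_owner(eb) currently holds a reference on eb, and the names `Ref` and `classify` are made up for the example.

```python
from enum import Enum


class Ref(Enum):
    TREE_BLOCK_REF = "normal backref keyed on the root"
    SHARED_BLOCK_REF = "full backref keyed on the parent block"
    ALREADY_COVERED = "owner still references the block; stop descending"


def classify(root_id, eb_owner, parent_owner, owner_has_ref):
    """Pick the reference type for block eb reached while walking root_id.

    parent_owner is None for the root node itself, which is always owned
    by the root being walked in the cases described above.
    """
    if eb_owner == root_id:
        return Ref.TREE_BLOCK_REF    # case 1: the block belongs to this root
    if parent_owner == root_id:
        return Ref.TREE_BLOCK_REF    # case 4: only the parent is ours
    if owner_has_ref:
        return Ref.ALREADY_COVERED   # case 2: clean snapshot, owner has the ref
    return Ref.SHARED_BLOCK_REF      # case 3: set FULL_BACKREF, keep walking


# Hypothetical ids: 5 is the original root, 257 is its snapshot.
print(classify(257, 257, None, True))
print(classify(257, 5, 257, True))
print(classify(257, 5, 5, True))
print(classify(257, 5, 5, False))
```

Only the last combination, where neither the block nor its parent belongs to the root being walked and the owner no longer references the block, leads to a shared reference with FULL_BACKREF set.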


Does that make sense?  Thanks,

Josef


Re: Incremental backup for a raid1

2014-03-14 Thread Duncan
George Mitchell posted on Fri, 14 Mar 2014 06:46:19 -0700 as excerpted:

 Actually, an interesting concept would be to have the initial two drive
 RAID 1 mirrored by 2 additional drives in 4-way configuration on a
 second machine at a remote location on a private high speed network with
 both machines up 24/7.  In that case, if such a configuration would
 work, either machine could be obliterated and the data would survive
 fully intact in full duplex mode.  It would just need to be remounted
 from the backup system and away it goes.  Just thinking of interesting
 possibilities with n-way mirroring.  Oh how I would love to have n-way
 mirroring to play with!

In terms of raid1, mdraid already supports such a concept with its write 
mostly component device designation.  A component device designated 
write mostly is never read from unless it becomes the only device 
available, so it's perfect for such an over-the-net real-time-online-
backup solution.

The other half of the solution is the various block-device-over-network 
drivers such as BLK_DEV_NBD (see Documentation/blockdev/nbd.txt) for the 
client side, the server-side of which is in userspace.  That lets you 
have what appears to be a block-device routed over the inet to that 
remote location.
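
As a hedged sketch of how the two halves might be combined, the Python snippet below only assembles an mdadm command line for a two-way mirror whose network-backed member is flagged write-mostly; it does not run anything. The device paths are hypothetical placeholders, and the option names are the ones I understand the mdadm man page to document, so check them against your installed version before use.

```python
import shlex


def mdadm_create_argv(md_device, local_device, remote_device):
    """Build (but do not run) an mdadm command for a two-way mirror whose
    remote member is flagged write-mostly."""
    return [
        "mdadm", "--create", md_device,
        "--level=1", "--raid-devices=2",
        local_device,
        # --write-mostly applies to the devices listed after it.
        "--write-mostly", remote_device,
    ]


# Hypothetical paths: a local partition and an NBD-backed remote device.
argv = mdadm_create_argv("/dev/md0", "/dev/sda1", "/dev/nbd0")
print(shlex.join(argv))
```

Placing the flag between the two members matters, since it affects only the devices that follow it on the command line.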

Of course mdraid is lacking btrfs' data integrity features, etc, with its 
raid1 implementation entirely lacking any data integrity or real-time 
cross-checking at all, but unlike btrfs' N-way-mirroring it gets points 
for actually being available right now, so as they say, YMMV.

Of course the other notable issue with your idea is that while it DOES 
address the real-time remote redundancy issue, that doesn't (by itself) 
deal with fat-fingering or similar issues where real-time actually means 
the same problem's duplicated to the backup as well.

But btrfs snapshots address the fat-fingering issue and can be done on 
the partially-remote filesystem solution as well, and local or remote-
local solutions (like periodic btrfs send to a separate local filesystem 
at both ends) can deal with the filesystem damage possibilities.
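
As a rough sketch of the periodic snapshot-plus-send idea, assuming the
previous read-only snapshot already exists on both the source and the
backup filesystem.  The paths and snapshot names are placeholders, the
helper name is hypothetical, and the function only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for one incremental snapshot-and-send
# cycle.  All paths and names are placeholders.
plan_incremental_send() {
    subvol=$1   # subvolume to protect, e.g. /data
    prev=$2     # previous read-only snapshot (already present on the backup)
    new=$3      # name for the new snapshot
    backup=$4   # mounted btrfs filesystem that receives the stream

    snapdir="$subvol/.snapshots"
    # btrfs send only works on read-only snapshots, hence -r.
    echo "btrfs subvolume snapshot -r $subvol $snapdir/$new"
    # -p sends only the difference relative to the parent snapshot.
    echo "btrfs send -p $snapdir/$prev $snapdir/$new | btrfs receive $backup"
}

# Example: plan_incremental_send /data 2014-03-13 2014-03-14 /mnt/backup
```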

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: [PATCH 1/2] btrfs: Cleanup the btrfs_workqueue related function type

2014-03-14 Thread David Sterba
On Thu, Mar 06, 2014 at 04:19:50AM +, quwen...@cn.fujitsu.com wrote:
 @@ -23,11 +23,13 @@
  struct btrfs_workqueue;
  /* Internal use only */
  struct __btrfs_workqueue;
 +struct btrfs_work;
 +typedef void (*btrfs_func_t)(struct btrfs_work *arg);

I don't see what's wrong with the non-typedef type; CodingStyle
discourages using typedefs in general (Chapter 5).

The name btrfs_func_t is too generic. If you really need to use a typedef
here, please change it to something closer to the workqueues, e.g.
btrfs_work_func_t.


Re: 3.14.0-rc3: btrfs send/receive blocks btrfs IO on other devices (near deadlocks)

2014-03-14 Thread Josef Bacik
On 03/12/2014 11:18 AM, Marc MERLIN wrote:
 I have a file server with 4 cpu cores and 5 btrfs devices:
 Label: btrfs_boot  uuid: e4c1daa8-9c39-4a59-b0a9-86297d397f3b
     Total devices 1 FS bytes used 48.92GiB
     devid 1 size 79.93GiB used 73.04GiB path /dev/mapper/cryptroot
 
 Label: varlocalspace  uuid: 9f46dbe2-1344-44c3-b0fb-af2888c34f18
     Total devices 1 FS bytes used 1.10TiB
     devid 1 size 1.63TiB used 1.50TiB path /dev/mapper/cryptraid0
 
 Label: btrfs_pool1  uuid: 6358304a-2234-4243-b02d-4944c9af47d7
     Total devices 1 FS bytes used 7.16TiB
     devid 1 size 14.55TiB used 7.50TiB path /dev/mapper/dshelf1
 
 Label: btrfs_pool2  uuid: cb9df6d3-a528-4afc-9a45-4fed5ec358d6
     Total devices 1 FS bytes used 3.34TiB
     devid 1 size 7.28TiB used 3.42TiB path /dev/mapper/dshelf2
 
 Label: bigbackup  uuid: 024ba4d0-dacb-438d-9f1b-eeb34083fe49
     Total devices 5 FS bytes used 6.02TiB
     devid 1 size 1.82TiB used 1.43TiB path /dev/dm-9
     devid 2 size 1.82TiB used 1.43TiB path /dev/dm-6
     devid 3 size 1.82TiB used 1.43TiB path /dev/dm-5
     devid 4 size 1.82TiB used 1.43TiB path /dev/dm-7
     devid 5 size 1.82TiB used 1.43TiB path /dev/dm-8
 
 
 I have a very long running btrfs send/receive from btrfs_pool1 to
 bigbackup (long running meaning that it's been slowly copying over
 5 days)
 
 The problem is that this is blocking IO to btrfs_pool2 which is
 using totally different drives. By blocking IO I mean that IO to
 pool2 kind of works sometimes, and hangs for very long times at
 other times.
 
 It looks as if one rsync to btrfs_pool2 or one piece of IO hangs on
 a shared lock and once that happens, all IO to btrfs_pool2 stops
 for a long time. It does recover eventually without reboot, but the
 wait times are ridiculous (it could be 1H or more).
 
 As I write this, I have a killall -9 rsync that waited for over
 10mn before these processes would finally die:
 23555       07:36 wait_current_trans.isra.15 rsync -av -SH --delete (...)
 23556       07:36 exit                       [rsync] defunct
 25387  2-04:41:22 wait_current_trans.isra.15 rsync --password-file (...)
 27481       31:26 wait_current_trans.isra.15 rsync --password-file (...)
 29268    04:41:34 wait_current_trans.isra.15 rsync --password-file (...)
 29343    04:41:31 exit                       [rsync] defunct
 29492    04:41:27 wait_current_trans.isra.15 rsync --password-file (...)
 
 14559    07:14:49 wait_current_trans.isra.15 cp -i -al current 20140312-feisty
 
 This is all stuck in btrfs kernel code. If someone wants sysrq-w,
 there it is.
 http://marc.merlins.org/tmp/btrfs_full.txt

 A quick summary:
 SysRq : Show Blocked State
   task PC stack pid father
 btrfs-cleaner   D 8802126b0840 0  3332  2 0x
  8800c5dc9d00 0046 8800c5dc9fd8 8800c69f6310
  000141c0 8800c69f6310 88017574c170 880211e671e8
   880211e67000 8801e5936e20 8800c5dc9d10
 Call Trace:
  [8160b0d9] schedule+0x73/0x75
  [8122a3c7] wait_current_trans.isra.15+0x98/0xf4
  [81085062] ? finish_wait+0x65/0x65
  [8122b86c] start_transaction+0x48e/0x4f2
  [8122bc4f] ? __btrfs_end_transaction+0x2a1/0x2c6
  [8122b8eb] btrfs_start_transaction+0x1b/0x1d
  [8121c5cd] btrfs_drop_snapshot+0x443/0x610
  [8160d7b3] ? _raw_spin_unlock+0x17/0x2a
  [81074efb] ? finish_task_switch+0x51/0xdb
  [8160afbf] ? __schedule+0x537/0x5de
  [8122c08d] btrfs_clean_one_deleted_snapshot+0x103/0x10f
  [81224859] cleaner_kthread+0x103/0x136
  [81224756] ? btrfs_alloc_root+0x26/0x26
  [8106bc1b] kthread+0xae/0xb6
  [8106bb6d] ? __kthread_parkme+0x61/0x61
  [816141bc] ret_from_fork+0x7c/0xb0
  [8106bb6d] ? __kthread_parkme+0x61/0x61
 btrfs-transacti D 88021387eb00 0    2 0x
  8800c5dcb890 0046 8800c5dcbfd8 88021387e5d0
  000141c0 88021387e5d0 88021f2141c0 88021387e5d0
  8800c5dcb930 810fe574 0002 8800c5dcb8a0
 Call Trace:
  [810fe574] ? wait_on_page_read+0x3c/0x3c
  [8160b0d9] schedule+0x73/0x75
  [8160b27e] io_schedule+0x60/0x7a
  [810fe582] sleep_on_page+0xe/0x12
  [8160b510] __wait_on_bit+0x48/0x7a
  [810fe522] wait_on_page_bit+0x7a/0x7c
  [81085096] ? autoremove_wake_function+0x34/0x34
  [81245c70] read_extent_buffer_pages+0x1bf/0x204
  [81223710] ? free_root_pointers+0x5b/0x5b
  [81224412] btree_read_extent_buffer_pages.constprop.45+0x66/0x100
  [81225367] read_tree_block+0x2f/0x47
  [8120e4b6] read_block_for_search.isra.26+0x24a/0x287
  [8120fcf7] btrfs_search_slot+0x4f4/0x6bb
  [81214c3d]
 

Re: Incremental backup for a raid1

2014-03-14 Thread Austin S Hemmelgarn
On 2014-03-14 09:46, George Mitchell wrote:
 Actually, an interesting concept would be to have the initial two drive
 RAID 1 mirrored by 2 additional drives in 4-way configuration on a
 second machine at a remote location on a private high speed network with
 both machines up 24/7.  In that case, if such a configuration would
 work, either machine could be obliterated and the data would survive
 fully intact in full duplex mode.  It would just need to be remounted
 from the backup system and away it goes.  Just thinking of interesting
 possibilities with n-way mirroring.  Oh how I would love to have n-way
 mirroring to play with!
That can already be done, albeit slightly differently, by stacking btrfs
RAID 1 on top of a pair of DRBD devices.  Of course, this doesn't
provide quite the same degree of safety as your suggestion, but it does
work (and DRBD makes the remote copy write-mostly for the local system
automatically).


Re: warn at fs/btrfs/extent-tree.c:5748 __btrfs_free_extent+0x9ce/0xa20

2014-03-14 Thread Josef Bacik
On 03/11/2014 07:44 PM, Sage Weil wrote:
 Hey,
 
 Is this something you guys have seen before?  This is from v3.13-rc2.
 
 kernel: [49432.696440] WARNING: CPU: 3 PID: 26411 at 
 /srv/autobuild-ceph/gitbuilder.git/build/fs/btrfs/extent-tree.c:5748 
 __btrfs_free_extent+0x9ce/0xa20 [btrfs]()
 kernel: [49432.710128] Modules linked in: arc4(F) md4(F) nls_utf8(F) cifs(F) 
 ufs(F) qnx4(F) hfsplus(F) hfs(F) minix(F) ntfs(F) msdos(F) jfs(F) xfs(F) 
 reiserfs(F) ext2(F) kvm_intel(F) kvm(F) ib_iser(F) rdma_cm(F) ib_cm(F) 
 iw_cm(F) ib_sa(F) ib_mad(F) ib_core(F) ib_addr(F) iscsi_tcp(F) 
 libiscsi_tcp(F) libiscsi(F) psmouse(F) ipmi_si(F) serio_raw(F) gpio_ich(F) 
 joydev(F) dcdbas(F) i7core_edac(F) edac_core(F) ipmi_msghandler(F) mac_hid(F) 
 acpi_power_meter(F) lpc_ich(F) tpm_tis(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) 
 scsi_transport_iscsi(F) nfs(F) fscache(F) lockd(F) lp(F) sunrpc(F) parport(F) 
 hid_generic(F) usbhid(F) hid(F) btrfs(F) raid6_pq(F) mptsas(F) ixgbe(F) 
 mptscsih(F) dca(F) mptbase(F) ptp(F) pps_core(F) scsi_transport_sas(F) xor(F) 
 mdio(F) bnx2(F) libcrc32c(F)
 kernel: [49432.777445] CPU: 3 PID: 26411 Comm: ceph-osd Tainted: GF I 
  3.14.0-rc5-ceph-00016-gf31a96a #1
 kernel: [49432.786704] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 
 1.6.3 02/07/2011
 kernel: [49432.794223]  1674 8800bf1cbac8 816e4840 
 88022726ef90
 kernel: [49432.801700]   8800bf1cbb08 810524ac 
 a8b07e50
 kernel: [49432.809176]  880094e74120  b07c9000 
 
 kernel: [49432.816653] Call Trace:
 kernel: [49432.819119]  [816e4840] dump_stack+0x46/0x58
 kernel: [49432.825384]  [810524ac] warn_slowpath_common+0x8c/0xc0
 kernel: [49432.831413]  [810524fa] warn_slowpath_null+0x1a/0x20
 kernel: [49432.837284]  [a010b4be] __btrfs_free_extent+0x9ce/0xa20 
 [btrfs]
 kernel: [49432.844108]  [a01110b8] 
 __btrfs_run_delayed_refs+0x428/0x11e0 [btrfs]
 kernel: [49432.851465]  [a0109458] ? 
 block_rsv_release_bytes+0x108/0x190 [btrfs]
 kernel: [49432.858823]  [a0114066] 
 btrfs_run_delayed_refs+0x76/0x2a0 [btrfs]
 kernel: [49432.865869]  [a01251ff] 
 __btrfs_end_transaction+0x26f/0x370 [btrfs]
 kernel: [49432.873044]  [a0125330] btrfs_end_transaction+0x10/0x20 
 [btrfs]
 kernel: [49432.879872]  [a01327de] btrfs_link+0x13e/0x1d0 [btrfs]
 kernel: [49432.885903]  [811b7571] vfs_link+0x1b1/0x270
 kernel: [49432.891060]  [811b8120] SyS_linkat+0x210/0x2d0
 kernel: [49432.896394]  [811b81fe] SyS_link+0x1e/0x20
 kernel: [49432.901380]  [816f7cd6] system_call_fastpath+0x1a/0x1f
 
 The full dump is at
 
   
 http://tracker.ceph.com/issues/7688
 http://tracker.ceph.com/attachments/download/1141/kern.log.gz
 

Filipe's looking at this.  Sage, you said it happened on v3.13-rc2, but the
kernel line says 3.14.0-rc5; have you had it happen in both places?  Thanks,

Josef



Re: warn at fs/btrfs/extent-tree.c:5748 __btrfs_free_extent+0x9ce/0xa20

2014-03-14 Thread Josef Bacik

On 03/14/2014 11:34 AM, Sage Weil wrote:

On Fri, 14 Mar 2014, Josef Bacik wrote:

On 03/11/2014 07:44 PM, Sage Weil wrote:

Hey,

Is this something you guys have seen before?  This is from v3.13-rc2.


Filipe's looking at this Sage, you said it happend on v3.13-rc2 but the
kernel line says 3.14.0-rc5, have you had it happen in both places?  Thanks,


Whoops, that's my mistake... it's 3.14-rc5.  The exact commit is in
git://github.com/ceph/ceph-client.git, if it matters; it's -rc5 + some
ceph patches.



Cool, not worried about what you guys are doing, just wondering if it 
may be related to me screwing around in delayed ref land recently or if 
you had seen it earlier too.  Thanks,


Josef



Re: warn at fs/btrfs/extent-tree.c:5748 __btrfs_free_extent+0x9ce/0xa20

2014-03-14 Thread Filipe David Manana
On Fri, Mar 14, 2014 at 3:35 PM, Josef Bacik jba...@fb.com wrote:
 On 03/14/2014 11:34 AM, Sage Weil wrote:

 On Fri, 14 Mar 2014, Josef Bacik wrote:

 On 03/11/2014 07:44 PM, Sage Weil wrote:

 Hey,

 Is this something you guys have seen before?  This is from v3.13-rc2.


 Filipe's looking at this Sage, you said it happend on v3.13-rc2 but the
 kernel line says 3.14.0-rc5, have you had it happen in both places?
 Thanks,


 Whoops, that's my mistake... it's 3.14-rc5.  The exact commit is in
 git://github.com/ceph/ceph-client.git, if it matters; it's -rc5 + some
 ceph patches.


 Cool, not worried about what you guys are doing, just wondering if it may be
 related to me screwing around in delayed ref land recently or if you had
 seen it earlier too.  Thanks,

I ran into this a couple of times months ago, definitely well before the
recent changes to the ref merging code added in 3.14. I had balance
running with concurrent snapshot creation and deletion at the time,
but I have been unable to trigger it again so far.


 Josef





-- 
Filipe David Manana,

Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.


Re: [PATCH] Btrfs: remove transaction from send

2014-03-14 Thread Josef Bacik

On 03/13/2014 06:16 PM, Hugo Mills wrote:

On Thu, Mar 13, 2014 at 03:42:13PM -0400, Josef Bacik wrote:

Lets try this again.  We can deadlock the box if we send on a box and try to
write onto the same fs with the app that is trying to listen to the send pipe.
This is because the writer could get stuck waiting for a transaction commit
which is being blocked by the send.  So fix this by making sure looking at the
commit roots is always going to be consistent.  We do this by keeping track of
which roots need to have their commit roots swapped during commit, and then
taking the commit_root_sem and swapping them all at once.  Then make sure we
take a read lock on the commit_root_sem in cases where we search the commit root
to make sure we're always looking at a consistent view of the commit roots.
Previously we had problems with this because we would swap a fs tree commit root
and then swap the extent tree commit root independently which would cause the
backref walking code to screw up sometimes.  With this patch we no longer
deadlock and pass all the weird send/receive corner cases.  Thanks,


There's something still going on here. I managed to get about twice
as far through my test as I had before, but I again got an unexpected
EOF in stream, with btrfs send returning 1. As before, I have this in
syslog:

Mar 13 22:09:12 s_src@amelia kernel: BTRFS error (device sda2): did not find 
backref in send_root. inode=1786631, offset=825257984, disk_byte=36504023040 
found extent=36504023040\x0a



I just noticed that the offset you have there is freaking gigantic, roughly
787 MiB, which is way larger than what an extent should be.  Here is a
newer debug patch; just chuck the old one, apply this instead, and re-run:


http://paste.fedoraproject.org/85486/39482301

thanks,

Josef
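
For scale, the offset in the quoted syslog line can be converted with plain
shell integer arithmetic.  This is only a trivial sketch, and the helper
name is made up for illustration.

```shell
#!/bin/sh
# Convert a byte count to whole MiB (1 MiB = 1048576 bytes), rounding down.
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

# offset=825257984 from the "did not find backref" message
bytes_to_mib 825257984   # prints 787
```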


Re: discard synchronous on most SSDs?

2014-03-14 Thread Marc MERLIN
On Fri, Mar 14, 2014 at 12:07:54PM +, Duncan wrote:
 Marc MERLIN posted on Thu, 13 Mar 2014 22:17:50 -0700 as excerpted:
 
  On Thu, Mar 13, 2014 at 09:39:02PM -0600, Chris Murphy wrote:
  
  On Mar 13, 2014, at 8:11 PM, Marc MERLIN m...@merlins.org wrote:
  
   On Sun, Mar 09, 2014 at 11:33:50AM +, Hugo Mills wrote:
   discard is, except on the very latest hardware, a synchronous
   command (it's a limitation of the SATA standard), and therefore
   results in very very poor performance.
   
   Interesting. How do I know if a given SSD will hang on discard?
   Is a Samsung EVO 840 1TB SSD latest hardware enough, or not? :)
  
  smartctl -a or -x will tell you what SATA revision is in place. The
  queued trim support is in SATA Rev 3.1. I'm not certain if this
  requires only the drive to support that revision level, or both
  controller and drive.
  
  I'm not sure I'm seeing this, which field is that?
 
  ATA Version is:   8
  ATA Standard is:  ATA-8-ACS revision 4c
 
 Your drive didn't report it, but here, I have SATA fields as well, in 
 addition to the ATA fields:
 
 Here's the fields from my Corsair Neutron SSDs:
 
 ATA Version is:   ATA8-ACS (minor revision not indicated)
 SATA Version is:  SATA 2.5, 6.0 Gb/s
 
 Here's the fields from my Seagate 500-gig 2.5-inch spinning rust:
 
 ATA Version is:   ATA8-ACS T13/1699-D revision 4
 SATA Version is:  SATA 2.6, 3.0 Gb/s

Ok, my smartmontools was too old. I got a newer one and now have proper
output:
Device Model:     Samsung SSD 840 EVO 1TB
Serial Number:    S1D9NEAD934600N
LU WWN Device Id: 5 002538 85009a8ff
Firmware Version: EXT0BB0Q
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Mar 14 10:49:39 2014 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

So I have SATA 3.1. That's great news: it means I can keep using discard
without worrying about performance and hangs.
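
A small sketch for pulling that field out of smartctl output in a script,
assuming a smartmontools version recent enough to print a "SATA Version is:"
line at all.  The helper name is hypothetical.  Note that the reported SATA
revision only says which specification the drive claims to follow; it does
not by itself show that queued TRIM is implemented.

```shell
#!/bin/sh
# Read `smartctl -a` output on stdin and print the value of the
# "SATA Version is:" field (prints nothing if the field is absent).
sata_version() {
    sed -n 's/^SATA Version is:[[:space:]]*//p'
}

# Example (needs root and a real device):
#   smartctl -a /dev/sda | sata_version
```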

Thanks,
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: discard synchronous on most SSDs?

2014-03-14 Thread Martin K. Petersen
 Marc == Marc MERLIN m...@merlins.org writes:

Marc,

Marc So I have Sata 3.1, that's great news, it means I can keep using
Marc discard without worrying about performance and hangs

The fact that the drive reports compliance with a certain version of
SATA does not in any way imply that it implements all commands defined
in that specification.

The location where queued TRIM support is reported is somewhat unusual.
And last I looked hdparm -I had no infrastructure in place to report
stuff contained in log pages.

The kernel does look in the right place to determine whether to issue the
queued or the unqueued variant. But the information isn't exported to
userland.

So right now I'm afraid we don't have a good way for a user to determine
whether a device supports queued trims or not.

I guess we could consider either adding an ATA-specific "I don't suck"
flag in sysfs, adding the missing code to hdparm, or both...

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: discard synchronous on most SSDs?

2014-03-14 Thread Holger Hoffstätte
On Fri, 14 Mar 2014 15:57:41 -0400, Martin K. Petersen wrote:

 So right now I'm afraid we don't have a good way for a user to determine
 whether a device supports queued trims or not.

Mount with discard, unpack kernel tree, sync, rm -rf tree.
If it takes several seconds, you have sync discard, no?

This changed somewhere around kernel 3.8.x; before that it used to be 
acceptably fast. Since then I only do batch trims, daily (server) or 
weekly (laptop).
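
A rough sketch of that timing test, assuming the filesystem is already
mounted with -o discard and a kernel tree has been unpacked and synced
beforehand.  The helper name and the example path are made up, and the
function really does delete the directory it is given.

```shell
#!/bin/sh
# Delete a directory tree, sync, and print how many whole seconds that took.
# WARNING: this removes the target directory for real.
time_delete() {
    target=$1
    start=$(date +%s)
    rm -rf -- "$target"
    sync
    end=$(date +%s)
    echo $(( end - start ))
}

# Example: time_delete /mnt/ssd/linux-src
```

One-second resolution is coarse, but it is enough to tell "instant" apart
from "several seconds".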

-h



[PATCH] Btrfs: fix race when updating existing ref head

2014-03-14 Thread Filipe David Borba Manana
While we update an existing ref head's extent_op, we're not holding
its spinlock, so while we're updating its extent_op contents (key,
flags) we can have a task running __btrfs_run_delayed_refs() that
holds the ref head's lock and sets its extent_op to NULL right after
the task updating the ref head just checked its extent_op was not NULL.

Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
---
 fs/btrfs/delayed-ref.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 2502ba5..3129964 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -495,6 +495,7 @@ update_existing_head_ref(struct btrfs_delayed_ref_node *existing,
 	ref = btrfs_delayed_node_to_head(update);
 	BUG_ON(existing_ref->is_data != ref->is_data);
 
+	spin_lock(&existing_ref->lock);
 	if (ref->must_insert_reserved) {
 		/* if the extent was freed and then
 		 * reallocated before the delayed ref
@@ -536,7 +537,6 @@ update_existing_head_ref(struct btrfs_delayed_ref_node *existing,
 	 * only need the lock for this case cause we could be processing it
 	 * currently, for refs we just added we know we're a-ok.
 	 */
-	spin_lock(&existing_ref->lock);
 	existing->ref_mod += update->ref_mod;
 	spin_unlock(&existing_ref->lock);
 }
-- 
1.7.9.5



Re: discard synchronous on most SSDs?

2014-03-14 Thread Chris Murphy

On Mar 13, 2014, at 11:17 PM, Marc MERLIN m...@merlins.org wrote:

 On Thu, Mar 13, 2014 at 09:39:02PM -0600, Chris Murphy wrote:
 
 On Mar 13, 2014, at 8:11 PM, Marc MERLIN m...@merlins.org wrote:
 
 On Sun, Mar 09, 2014 at 11:33:50AM +, Hugo Mills wrote:
 discard is, except on the very latest hardware, a synchronous command
 (it's a limitation of the SATA standard), and therefore results in
 very very poor performance.
 
 Interesting. How do I know if a given SSD will hang on discard?
 Is a Samsung EVO 840 1TB SSD latest hardware enough, or not? :)
 
 smartctl -a or -x will tell you what SATA revision is in place. The queued 
 trim support is in SATA Rev 3.1. I'm not certain if this requires only the 
 drive to support that revision level, or both controller and drive.
 
 I'm not sure I'm seeing this, which field is that?
 
 === START OF INFORMATION SECTION ===
 Device Model:     Samsung SSD 840 EVO 1TB
 Serial Number:    S1D9NEAD934600N
 LU WWN Device Id: 5 002538 85009a8ff
 Firmware Version: EXT0BB0Q
 User Capacity:    1,000,204,886,016 bytes [1.00 TB]
 Sector Size:      512 bytes logical/physical
 Device is:        Not in smartctl database [for details use: -P showall]
 ATA Version is:   8
 ATA Standard is:  ATA-8-ACS revision 4c
 Local Time is:    Thu Mar 13 22:15:14 2014 PDT
 SMART support is: Available - device has SMART capability.
 SMART support is: Enabled


After ATA Version for me.

$ smartctl -a /dev/disk0
smartctl 6.1 2013-03-16 r3800 [x86_64-apple-darwin12.3.0] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     SAMSUNG SSD 830 Series
Serial Number:    S0Z4NEAC933856
LU WWN Device Id: 5 002538 043584d30
Firmware Version: CXM03B1Q
User Capacity:    256,060,514,304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 T13/2015-D revision 2
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Mar 14 15:37:07 2014 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

The Samsung hardware by and large is fairly well behaved with discard in my
experience. But it does really depend a lot on the workload. I'd notice
occasional random freezes for a couple of seconds when I had it enabled in OS X
(a totally different animal from the kernel up), nothing severe. But it was
annoying enough that I disabled it, and the problem went away. Apple still
doesn't enable trim by default on non-Apple SSDs, so the idea that everyone
else is doing this isn't true. The Windows implementation is rather complex,
and also isn't always used, contrary to what's been reported (on the "everybody
panic or get mad NOW" type web sites).

If you want to be conservative about it, I'd say just manually run fstrim when
the system is idle. Do that once every week or two. Cron job it if you want.
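
A minimal sketch of a script that a weekly cron job could call, assuming
fstrim from util-linux is installed and the listed mount points are the
ones you want trimmed.  The function name, the TRIM_CMD override and the
mount points are illustrative only.

```shell
#!/bin/sh
# Run a batch trim on each mount point given as an argument.
# TRIM_CMD can be overridden (e.g. with "echo") for a dry run.
trim_all() {
    trim_cmd=${TRIM_CMD:-fstrim}
    for mnt in "$@"; do
        # -v makes fstrim report how many bytes were trimmed.
        "$trim_cmd" -v "$mnt" || echo "trim failed for $mnt" >&2
    done
}

# Example, e.g. from a script in /etc/cron.weekly (run as root):
#   trim_all / /home
```

Setting TRIM_CMD=echo first shows what would be run without touching the
device.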


Chris Murphy


Re: [PATCH] Btrfs: fix deadlock with nested trans handles

2014-03-14 Thread Rich Freeman
On Wed, Mar 12, 2014 at 12:34 PM, Rich Freeman
r-bt...@thefreemanclan.net wrote:
 On Wed, Mar 12, 2014 at 11:24 AM, Josef Bacik jba...@fb.com wrote:
 On 03/12/2014 08:56 AM, Rich Freeman wrote:

  After a number of reboots the system became stable, presumably
 whatever race condition btrfs was hitting followed a favorable
 path.

 I do have a 2GB btrfs-image pre-dating my application of this
 patch that was causing the issue last week.


 Uhm wow that's pretty epic.  I will talk to chris and figure out how
 we want to deal with that and send you a patch shortly.  Thanks,

 A tiny bit more background.

And some more background.  I had more reboots over the next two days
at the same time each day, just after my crontab successfully
completed.  One of the last things it does is run the snapper cleanups,
which delete a bunch of snapshots.  During a reboot I checked and
there were a bunch of deleted snapshots, which disappeared over the
next 30-60 seconds before the panic, and then they would reappear on
the next reboot.

I disabled the snapper cron job and this morning had no issues at all.
One day isn't much to establish a trend, but I suspect that this is
the cause.  Obviously getting rid of snapshots would be desirable at
some point, but I can wait for a patch.  Snapper would be deleting
about 48 snapshots at the same time, since I create them hourly and
the cleanup occurs daily on two different subvolumes on the same
filesystem.

Rich


Re: [PATCH] Btrfs: remove transaction from send

2014-03-14 Thread Hugo Mills
On Fri, Mar 14, 2014 at 02:51:22PM -0400, Josef Bacik wrote:
 On 03/13/2014 06:16 PM, Hugo Mills wrote:
 On Thu, Mar 13, 2014 at 03:42:13PM -0400, Josef Bacik wrote:
 Let's try this again.  We can deadlock the box if we send on a box and try to
 write onto the same fs with the app that is trying to listen to the send pipe.
 This is because the writer could get stuck waiting for a transaction commit
 which is being blocked by the send.  So fix this by making sure looking at the
 commit roots is always going to be consistent.  We do this by keeping track of
 which roots need to have their commit roots swapped during commit, and then
 taking the commit_root_sem and swapping them all at once.  Then make sure we
 take a read lock on the commit_root_sem in cases where we search the commit root
 to make sure we're always looking at a consistent view of the commit roots.
 Previously we had problems with this because we would swap a fs tree commit root
 and then swap the extent tree commit root independently which would cause the
 backref walking code to screw up sometimes.  With this patch we no longer
 deadlock and pass all the weird send/receive corner cases.  Thanks,
 
 There's something still going on here. I managed to get about twice
 as far through my test as I had before, but I again got an unexpected
 EOF in stream, with btrfs send returning 1. As before, I have this in
 syslog:
 
 Mar 13 22:09:12 s_src@amelia kernel: BTRFS error (device sda2): did not find 
 backref in send_root. inode=1786631, offset=825257984, disk_byte=36504023040 
 found extent=36504023040\x0a
 
 
 I just noticed that the offset you have there is freaking gigantic,
 like 700mb, which is way larger than what an extent should be.  Here
 is a newer debug patch, just chuck the old one and put this instead
 and re-run
 
 http://paste.fedoraproject.org/85486/39482301

   That last run, with the above patch, failed again, at approximately
the same place. The only output in dmesg is:

[ 6488.168469] BTRFS error (device sda2): did not find backref in send_root. 
inode=1786631, offset=825257984, disk_byte=36504023040 found 
extent=36504023040, len=1294336

as before. Definitely no kernel WARN, no backtraces.
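
For scale, the numbers in that message can be converted with plain shell arithmetic. This is only a quick sanity check of the magnitudes; it makes no claim about what the fields mean inside send.

```shell
offset=825257984   # "offset=" from the dmesg line above
len=1294336        # "len=" from the dmesg line above

echo "offset: $((offset / 1024 / 1024)) MiB"             # offset: 787 MiB
echo "offset is 4 KiB aligned: $((offset % 4096 == 0))"  # 1 means yes
echo "len:    $((len / 1024)) KiB"                       # len:    1264 KiB
```

So the offset is roughly the 700-odd MB mentioned above, while the reported length is a little over 1 MiB.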

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- You're never alone with a rubber duck... --- 




btrfs: lock inversion between delayed_node->mutex and found->groups_sem

2014-03-14 Thread Sasha Levin

Hi all,

While fuzzing with trinity inside a KVM tools guest running the latest -next
kernel I've stumbled on the following:

[  788.451695] =========================================================
[  788.452455] [ INFO: possible irq lock inversion dependency detected ]
[  788.453020] 3.14.0-rc6-next-20140313-sasha-00010-gb8c1db1-dirty #217 
Tainted: GW
[  788.453827] ---------------------------------------------------------
[  788.454371] kswapd3/4199 just changed the state of lock:
[  788.454902]  (delayed_node->mutex){+.+.-.}, at: 
__btrfs_release_delayed_node+0x4f/0x140 (fs/btrfs/delayed-inode.c:263)
[  788.455890] but this lock took another, RECLAIM_FS-unsafe lock in the past:
[  788.456543]  (found->groups_sem){+.}

and interrupts could create inverse lock ordering between them.

[  788.457491]
[  788.457491] other info that might help us debug this:
[  788.458115]  Possible interrupt unsafe locking scenario:
[  788.458115]
[  788.458756]        CPU0                    CPU1
[  788.459188]        ----                    ----
[  788.459625]   lock(found->groups_sem);
[  788.460041]                                local_irq_disable();
[  788.460041]                                lock(delayed_node->mutex);
[  788.460041]                                lock(found->groups_sem);
[  788.460041]   <Interrupt>
[  788.460041]     lock(delayed_node->mutex);
[  788.460041]
[  788.460041]  *** DEADLOCK ***
[  788.460041]
[  788.460041] 2 locks held by kswapd3/4199:
[  788.460041]  #0:  (shrinker_rwsem){..}, at: shrink_slab+0x3f/0x160 
(mm/vmscan.c:360)
[  788.460041]  #1:  (type->s_umount_key#108){.+.+..}, at: 
grab_super_passive+0x56/0x90 (fs/super.c:361)
[  788.460041]
[  788.460041] the shortest dependencies between 2nd lock and 1st lock:
[  788.460041]  -> (found->groups_sem){+.} ops: 46 {
[  788.460041] HARDIRQ-ON-W at:
[  788.460041]   mark_irqflags+0xf0/0x170 
(kernel/locking/lockdep.c:2800)
[  788.460041]   __lock_acquire+0x2de/0x5a0 
(kernel/locking/lockdep.c:3138)
[  788.460041]   lock_acquire+0x182/0x1d0 
(arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[  788.460041]   down_write+0x5c/0xc0 
(arch/x86/include/asm/rwsem.h:130 kernel/locking/rwsem.c:50)
[  788.460041]   __link_block_group+0x45/0x110 
(fs/btrfs/extent-tree.c:8348)
[  788.460041]   btrfs_read_block_groups+0x3ae/0x700 
(fs/btrfs/extent-tree.c:8533)
[  788.460041]   open_ctree+0x1abf/0x2210 
(fs/btrfs/disk-io.c:2749)
[  788.460041]   btrfs_fill_super+0x81/0x140 
(fs/btrfs/super.c:958)
[  788.460041]   btrfs_mount+0x26a/0x300 
(fs/btrfs/super.c:1295)
[  788.460041]   mount_fs+0x8d/0x1a0 (fs/super.c:1091)
[  788.460041]   vfs_kern_mount+0x79/0x150 
(fs/namespace.c:813)
[  788.460041]   do_new_mount+0xcd/0x1c0 
(fs/namespace.c:2068)
[  788.460041]   do_mount+0x15d/0x210 
(fs/namespace.c:2392)
[  788.460041]   SyS_mount+0x9d/0xe0 (fs/namespace.c:2589 
fs/namespace.c:2560)
[  788.460041]   tracesys+0xdd/0xe2 
(arch/x86/kernel/entry_64.S:749)
[  788.460041] HARDIRQ-ON-R at:
[  788.460041]   mark_irqflags+0xbc/0x170 
(kernel/locking/lockdep.c:2792)
[  788.460041]   __lock_acquire+0x2de/0x5a0 
(kernel/locking/lockdep.c:3138)
[  788.460041]   lock_acquire+0x182/0x1d0 
(arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[  788.460041]   down_read+0x4c/0xa0 
(arch/x86/include/asm/rwsem.h:83 kernel/locking/rwsem.c:23)
[  788.460041]   
btrfs_calc_num_tolerated_disk_barrier_failures+0x2a7/0x3a0 
(fs/btrfs/disk-io.c:3309)
[  788.460041]   open_ctree+0x1af7/0x2210 
(fs/btrfs/disk-io.c:2755)
[  788.460041]   btrfs_fill_super+0x81/0x140 
(fs/btrfs/super.c:958)
[  788.460041]   btrfs_mount+0x26a/0x300 
(fs/btrfs/super.c:1295)
[  788.460041]   mount_fs+0x8d/0x1a0 (fs/super.c:1091)
[  788.460041]   vfs_kern_mount+0x79/0x150 
(fs/namespace.c:813)
[  788.460041]   do_new_mount+0xcd/0x1c0 
(fs/namespace.c:2068)
[  788.460041]   do_mount+0x15d/0x210 (fs/namespace.c:2392)
[  788.460041]   SyS_mount+0x9d/0xe0 (fs/namespace.c:2589 
fs/namespace.c:2560)
[  788.460041]   tracesys+0xdd/0xe2 
(arch/x86/kernel/entry_64.S:749)
[  788.460041] SOFTIRQ-ON-W at:
[  788.460041]   mark_irqflags+0x110/0x170 
(kernel/locking/lockdep.c:2804)
[  788.460041]   __lock_acquire+0x2de/0x5a0 
(kernel/locking/lockdep.c:3138)
[  788.460041]   lock_acquire+0x182/0x1d0 
(arch/x86/include/asm/current.h:14 

Re: [PATCH] Btrfs-progs: scrub: don't call unlock if pthread_mutex_lock fails

2014-03-14 Thread Rakesh Pandit
Hi,

Forgot to mention the reason for the change. If accepted, this can be
included in the commit message:

On Sat, Mar 15, 2014 at 01:49:45AM +0200, Rakesh Pandit wrote:
 If pthread_mutex_lock fails (rare but fix it anyway), don't call
 pthread_mutex_unlock on mutex.


The rationale is that if pthread_mutex_lock fails, pthread_mutex_unlock
will always fail too and overwrite the actual error value in err.
 
 Signed-off-by: Rakesh Pandit rak...@tuxera.com

regards,


Re: discard synchronous on most SSDs?

2014-03-14 Thread Chris Samuel
On Fri, 14 Mar 2014 06:33:24 PM Chris Samuel wrote:

 I *think* you want smartctl -i instead, and look for the field that says 
 something like:
 
 ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3

Late night, cut and pasted the wrong line of output, mine says:

SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)

Of course that's what the drive is reporting it supports; I'm not sure whether 
that's the result of what has been negotiated between the controller and drive 
or purely what the drive supports.

To get more information from smartctl you can use the --identify=wb option 
instead of -i and that should give you a lot more detail about what the 
drive claims to (and not to) support.   On the version in Kubuntu 13.10 
(6.1+svn3812-1) it only reports 3 things regarding TRIM for my drives.

chris@quad:/tmp$ sudo smartctl --identify=wb -d sat /dev/sdb | egrep -i 'trim|discard'
  69 14  1   Deterministic data after trim supported
  69  5  0   Trimmed LBA range(s) returning zeroed data supported
 169  0  1   Trim bit in DATA SET MANAGEMENT command supported

I'm currently doing a git clone of their SVN repo to see if there's any new 
functionality that will gather any more information.

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC





Re: discard synchronous on most SSDs?

2014-03-14 Thread Marc MERLIN
On Fri, Mar 14, 2014 at 08:46:09PM +, Holger Hoffstätte wrote:
 On Fri, 14 Mar 2014 15:57:41 -0400, Martin K. Petersen wrote:
 
  So right now I'm afraid we don't have a good way for a user to determine
  whether a device supports queued trims or not.
 
 Mount with discard, unpack kernel tree, sync, rm -rf tree.
 If it takes several seconds, you have sync discard, no?

Mmmh, interesting point.

legolas:/usr/src# time rm -rf linux-3.14-rc5
real    0m1.584s
user    0m0.008s
sys     0m1.524s

I remounted my FS with remount,nodiscard, and the time was the same.
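
A rough sketch of that comparison as a reusable helper, assuming GNU date for nanosecond timestamps. time_rm is a made-up name, and the trailing sync is my own addition, on the assumption that the discards may not be issued until the deletion has actually been committed to disk.

```shell
#!/bin/sh
# Print how long it takes to delete a tree, in milliseconds.
# Run it once with the filesystem mounted "discard" and once after
# "mount -o remount,nodiscard", on a freshly unpacked tree each time,
# then compare the two numbers.
time_rm() {
    tree=$1
    sync                           # flush pending writes first
    start=$(date +%s%N)            # GNU date: nanoseconds since the epoch
    rm -rf -- "$tree"
    sync                           # wait for the deletion to hit the disk
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Example (path taken from the run above):
#   time_rm /usr/src/linux-3.14-rc5
```

If both runs come out about the same, as they did here, the delete itself is probably not where any discard cost shows up.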

 This changed somewhere around kernel 3.8.x; before that it used to be 
 acceptably fast. Since then I only do batch trims, daily (server) or 
 weekly (laptop).

I've never really timed this before. Is it supposed to be faster than 1.5s on
a fast SSD?

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  