Re: [PATCH 10/15] btrfs-progs: fix qgroup realloc inheritance

2013-08-18 Thread Arne Jansen
On 08/15/13 01:16, Zach Brown wrote:
 qgroup.c:82:23: warning: memcpy with byte count of 0
 qgroup.c:83:23: warning: memcpy with byte count of 0
 
 The inheritance wasn't copying qgroups[] because a confused sizeof()
 gave 0 byte memcpy()s.  It's been like this for the year since it was
 merged, so I guess this isn't a very important thing to do :).

It only seems to hit if you give -[cx] before -i. I guess only very
few people use these options in the first place. They are primarily
for hosting providers.

Reviewed-by: Arne Jansen sensi...@gmx.net
 
 Signed-off-by: Zach Brown z...@redhat.com
 ---
  qgroup.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/qgroup.c b/qgroup.c
 index 038c4dc..86fe2b2 100644
 --- a/qgroup.c
 +++ b/qgroup.c
 @@ -74,7 +74,7 @@ qgroup_inherit_realloc(struct btrfs_qgroup_inherit **inherit, int n, int pos)
  
  	if (*inherit) {
  		struct btrfs_qgroup_inherit *i = *inherit;
 -		int s = sizeof(out->qgroups);
 +		int s = sizeof(out->qgroups[0]);
  
  		out->num_qgroups = i->num_qgroups;
  		out->num_ref_copies = i->num_ref_copies;
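
For anyone following along, here is a minimal sketch of why the element size matters. It assumes the trailing qgroups member is a zero-length or flexible array (which the "byte count of 0" warnings suggest), so the member itself has no usable size and only sizeof of one element gives a sensible multiplier. The struct and the names inherit_sketch, inherit_alloc and inherit_copy are hypothetical stand-ins, not the real btrfs-progs definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct btrfs_qgroup_inherit. */
struct inherit_sketch {
	uint64_t num_qgroups;
	uint64_t qgroups[];	/* trailing array, sized at allocation time */
};

/* Allocate a zeroed struct with room for n trailing qgroup ids. */
static struct inherit_sketch *inherit_alloc(size_t n)
{
	struct inherit_sketch *p;

	/* sizeof(p->qgroups[0]) is the size of ONE element (8 bytes). */
	p = calloc(1, sizeof(*p) + n * sizeof(p->qgroups[0]));
	return p;
}

/* Copy all qgroup ids from src to dst; dst must have room for them. */
static void inherit_copy(struct inherit_sketch *dst,
			 const struct inherit_sketch *src)
{
	size_t s = sizeof(dst->qgroups[0]);	/* element size, as in the fix */

	assert(dst != NULL && src != NULL);
	dst->num_qgroups = src->num_qgroups;
	memcpy(dst->qgroups, src->qgroups, src->num_qgroups * s);
}
```

With a multiplier of 0 the memcpy above would silently copy nothing, which matches the behaviour described in the commit message.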
 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


uncorrectable errors after btrfs replace

2013-08-18 Thread Stuart Pook

hi all

I moved my btrfs filesystems around using btrfs replace and now I have errors
(lots of errors):

[63724.419779] BTRFS info (device dm-12): csum failed ino 9340 off 8192 csum 717036259 private 94677163

: root; time  btrfs  scrub start -Bd /disks/backups
scrub device /dev/dm-11 (id 1) done
	scrub started at Sun Aug 18 15:17:50 2013 and finished after 4487 seconds
	total bytes scrubbed: 576.46GB with 261883 errors
	error details: csum=261883
	corrected errors: 0, uncorrectable errors: 261883, unverified errors: 0

I had two 2 TB disks whose data I needed to swap (/mnt on a WD-Black and
/disks/backup on a HD204UI). Both had btrfs filesystems, but /disks/backup was
encrypted using luks. I had a spare 640 GB WD-Blue disk that I plugged into a
SATA dock for this operation.

I btrfs resized /disks/backup to fit in 590 GB, then I btrfs replaced
/disks/backup to a new luks partition on the WD-Blue disk. Then I btrfs
replaced /mnt to the HD204UI.  Then I btrfs replaced the backup data to a new
luks partition on the WD-Black. I then got IO errors reading /disks/backup.

I'm using: Linux kooka 3.10-2-amd64 #1 SMP Debian 3.10.5-1 (2013-08-07) x86_64 GNU/Linux
and btrfs-tools 0.19+20130315-5

rsync: write failed on "/disks/backups/snapshot_rsync/stuart/secret/current/.purple/accounts.xml": Input/output error (5)

Lots of files on /disks/backup have errors. smartctl says passed for all the 
drives.

This is a summary of what I did:

    6  btrfs filesystem resize 580g .
    9  time btrfs  balance start -musage=1 -dusage=1 . && time  btrfs filesystem resize 580g .
   10  time  btrfs filesystem resize 590g .
   12  cryptsetup luksOpen /dev/sdd2 640Gb
   13  time btrfs replace start  /dev/dm-11 /dev/dm-12 -B /disks/backups
   14  time btrfs replace start  /dev/dm-11 /dev/dm-12 -B /disks/backups
   18  cryptsetup remove _dev_sdc2
   19  fdisk /dev/sdc
   32  time btrfs replace start  /dev/sdb1  /dev/sdc2 -B /mnt
   34  btrfs filesystem label  /dev/dm-12
   36   btrfs filesystem label /disks/backups backups2Tb
   38   btrfs filesystem label /disks/backups
   39  cryptsetup luksFormat /dev/sdb2
   40  cryptsetup luksAddKey /dev/sdb2
   41  cryptsetup open  /dev/sdb2 newbackups
   43  time btrfs replace start  /dev/dm-12  /dev/dm-11 -B /disks/backups
   44  btrfs filesystem show
   45  cryptsetup status 640Gb
   46  cryptsetup remove 640Gb
   47  btrfs filesystem show
   49  btrfs filesystem resize max /disks/backups/
   54  /etc/local/backups
# errors !
   57  time  btrfs  scrub start -Bd /disks/backups

Lots of errors in /var/log/syslog

Aug 18 12:27:51 kooka kernel: [54113.507151] btrfs: dev_replace from /dev/mapper/640Gb (devid 1) to /dev/dm-11) started
Aug 18 12:27:51 kooka kernel: [54113.601334] device label backups2Tb devid 1 transid 39282 /dev/dm-12
Aug 18 12:28:03 kooka kernel: [54125.020038] ata10.00: exception Emask 0x10 SAct 0x3dfe0ff0 SErr 0x780100 action 0x6
Aug 18 12:28:03 kooka kernel: [54125.020043] ata10.00: irq_stat 0x0800
Aug 18 12:28:03 kooka kernel: [54125.020047] ata10: SError: { UnrecovData 10B8B Dispar BadCRC Handshk }
Aug 18 12:28:03 kooka kernel: [54125.020050] ata10.00: failed command: READ FPDMA QUEUED
Aug 18 12:28:03 kooka kernel: [54125.020056] ata10.00: cmd 60/18:20:c0:18:0b/00:00:00:00:00/40 tag 4 ncq 12288 in
Aug 18 12:28:03 kooka kernel: [54125.020056]  res 40/00:5c:f0:1a:0b/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Aug 18 12:28:03 kooka kernel: [54125.020059] ata10.00: status: { DRDY }
[...]
Aug 18 12:28:03 kooka kernel: [54125.020262] ata10: hard resetting link
Aug 18 12:28:03 kooka kernel: [54125.512032] ata10: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 18 12:28:03 kooka kernel: [54125.523759] ata10.00: configured for UDMA/133
Aug 18 12:28:03 kooka kernel: [54125.536380] ata10: EH complete
Aug 18 12:28:04 kooka kernel: [54125.770176] ata10.00: exception Emask 0x10 SAct 0x7fff SErr 0x780100 action 0x6
Aug 18 12:28:04 kooka kernel: [54125.770181] ata10.00: irq_stat 0x0800
Aug 18 12:28:04 kooka kernel: [54125.770184] ata10: SError: { UnrecovData 10B8B Dispar BadCRC Handshk }
[...]
Aug 18 12:28:17 kooka kernel: [54138.957095] ata10.00: status: { DRDY }
Aug 18 12:28:17 kooka kernel: [54138.957100] ata10: hard resetting link
Aug 18 12:28:17 kooka kernel: [54139.448029] ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 18 12:28:17 kooka kernel: [54139.449972] ata10.00: configured for UDMA/133
Aug 18 12:28:17 kooka kernel: [54139.464065] ata10: EH complete
[...]

Aug 18 12:38:31 kooka kernel: [54753.527070] btrfs: checksum error at logical 52642709504 on dev /dev/dm-12, sector 104931328, root 1281, inode 42152, offset 0, length 4096, links 1 (path: X)
[...]
Aug 18 12:38:31 kooka kernel: [54753.606566] btrfs: bdev /dev/dm-12 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[...]
Aug 18 12:38:32 kooka kernel: [54753.679513] btrfs: bdev /dev/dm-12 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
Aug 18 12:38:36 

Re: uncorrectable errors after btrfs replace

2013-08-18 Thread Chris Murphy

On Aug 18, 2013, at 1:12 PM, Stuart Pook slp644...@pook.it wrote:

6  btrfs filesystem resize 580g .

You first shrank a 2TB btrfs file system on dmcrypt device to 590GB. But then 
you didn't resize the dm device or the partition?

  9  time btrfs  balance start -musage=1 -dusage=1 . && time  btrfs filesystem resize 580g .
 10  time  btrfs filesystem resize 590g .


You followed the resize of the fs, but not the underlying devices, with a 
balance, then resized it two more times? This is weird, but also makes the 
sequence difficult to follow.

   13  time btrfs replace start  /dev/dm-11 /dev/dm-12 -B /disks/backups
   14  time btrfs replace start  /dev/dm-11 /dev/dm-12 -B /disks/backups

Why is this command repeated? What's with the numbering system that skips 
numbers?

 
 
 [...]
 Aug 18 12:28:03 kooka kernel: [54125.020262] ata10: hard resetting link
 Aug 18 12:28:03 kooka kernel: [54125.512032] ata10: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
 Aug 18 12:28:03 kooka kernel: [54125.523759] ata10.00: configured for UDMA/133
 Aug 18 12:28:03 kooka kernel: [54125.536380] ata10: EH complete
 Aug 18 12:28:04 kooka kernel: [54125.770176] ata10.00: exception Emask 0x10 SAct 0x7fff SErr 0x780100 action 0x6
 Aug 18 12:28:04 kooka kernel: [54125.770181] ata10.00: irq_stat 0x0800
 Aug 18 12:28:04 kooka kernel: [54125.770184] ata10: SError: { UnrecovData 10B8B Dispar BadCRC Handshk }
 [...]
 Aug 18 12:28:17 kooka kernel: [54138.957095] ata10.00: status: { DRDY }
 Aug 18 12:28:17 kooka kernel: [54138.957100] ata10: hard resetting link
 Aug 18 12:28:17 kooka kernel: [54139.448029] ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
 Aug 18 12:28:17 kooka kernel: [54139.449972] ata10.00: configured for UDMA/133
 Aug 18 12:28:17 kooka kernel: [54139.464065] ata10: EH complete

Bad connection, so libata is dropping the link from 3 Gbps to 1.5 Gbps.
 

 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       12080

This confirms that both ends of the cable are sensing communication problems 
between drive and controller. The cable needs to be replaced, likely it's the 
connector not the cable itself.


 I guess that /disks/backup is mostly dead and that I should just reformat it. 
  What do you think?

Well I think I'd try to simplify this drastically and see if you've got a 
reproducing bug. The steps you've got I find mostly incoherent, so I can't try 
to do what you did to see if it's reproducible.

 Next time I'll watch /var/log/syslog but I would have preferred that btrfs 
 replace stop when getting errors.

The errors should be self correcting, but the mere fact they're happening means 
that some errors could be occurring but aren't detected. If the data is 
corrupting in-transit, but the drive or controller didn't report a problem, 
then btrfs has no way of knowing it was written incorrectly. There's only so 
much software can do to overcome blatant hardware problems.

But, it seems unlikely such a high percent of errors would go undetected to 
result in so many uncorrectable errors, so there may be user error here along 
with a bug.


Chris Murphy


Re: uncorrectable errors after btrfs replace

2013-08-18 Thread Stuart Pook

hi Chris

thanks for your reply. I was unable to save the filesystem. Even after deleting
all but 4 GB I still had too many errors, so I just reformatted the device.  I'm
glad that it was my backups and not my data.

On 18/08/13 23:43, Chris Murphy wrote:
> On Aug 18, 2013, at 1:12 PM, Stuart Pook slp644...@pook.it wrote:
>
>> 6  btrfs filesystem resize 580g .
>
> You first shrank a 2TB btrfs file system on dmcrypt device to 590GB.
> But then you didn't resize the dm device or the partition?

No, I had no need to resize the dm device or partition.  I had read that when
doing a replace the new device must be no smaller than the old device, so I
shrank the filesystem on the old device using btrfs filesystem resize.  Once
the resize had worked I was able to do the replace; I didn't try to replace
before resizing.

This is what btrfs(1) says on Debian: "The targetdev needs to be same size or
larger than the srcdev."  I may be confused here.

>> 9  time btrfs balance start -musage=1 -dusage=1 . && time btrfs filesystem resize 580g .

I was surprised that the resize to 580 GB didn't work, so I tried a magical
rebalance before doing the resize to 580 again.  It still didn't work (not
enough space) but a resize to 590 GB did.

>> 10  time  btrfs filesystem resize 590g .

This worked.

> You followed the resize of the fs, but not the underlying devices,
> with a balance, then resized it two more times?

The resize to 580 didn't work, so I did a balance.  The resize to 580 still
didn't work, so I resized to 590.

> This is weird, but also makes the sequence difficult to follow.
>
>> 13  time btrfs replace start  /dev/dm-11 /dev/dm-12 -B /disks/backups
>> 14  time btrfs replace start  /dev/dm-11 /dev/dm-12 -B /disks/backups
>
> Why is this command repeated? What's with the numbering system that
> skips numbers?

The command is repeated because I cancelled it by mistake by setting the
filesystem to readonly.  I'm not sure if I restarted it by rerunning the
replace or just by remounting the filesystem readwrite in another window.

I'll put all of the commands at the end of this list.


>> Aug 18 12:28:17 kooka kernel: [54139.448029] ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>
> Bad connection so libata is dropping the link from 3 Gbps to 1.5 Gbps.
>
>> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       12080
>
> This confirms that both ends of the cable are sensing communication
> problems between drive and controller. The cable needs to be
> replaced, likely it's the connector not the cable itself.

I think that I should stop using my SATA dock with the SATA ports on my
motherboard, which are probably not designed to be hot-plugged.

>> I guess that /disks/backup is mostly dead and that I should just
>> reformat it.  What do you think?
>
> Well I think I'd try to simplify this drastically and see if you've
> got a reproducing bug.

I ran a badblocks scan on the raw device (not the luks device) and didn't get
any errors.

> The steps you've got I find mostly incoherent, so I can't try to do
> what you did to see if it's reproducible.

Yes, this was the first time I've tried this.  And just to make this more
difficult, some commands were typed in a different window.

>> Next time I'll watch /var/log/syslog but I would have preferred
>> that btrfs replace stop when getting errors.
>
> The errors should be self correcting, but the mere fact they're
> happening means that some errors could be occurring but aren't
> detected. If the data is corrupting in-transit, but the drive or
> controller didn't report a problem, then btrfs has no way of knowing
> it was written incorrectly.


The data was written to the WD-Blue (640 GB) disk and then copied off it.  The
only errors I saw concerned the WD-Blue.  If the errors were data corruption on
writing or reading the WD-Blue then I would have thought that the checksums
would have told me that there was something wrong.  btrfs didn't give me an IO
error until I started to read the files once the data was on the final disk.

Does btrfs replace check the checksums as it reads the data from the disk
that is being replaced?

Just to be clear, this is the series of btrfs replace operations I did:

backups : HD204UI -> WD-Blue
/mnt : WD-Black -> HD204UI
backups : WD-Blue -> WD-Black

I guess that my backups were corrupted as they were written to or read from
the WD-Blue. Wouldn't the checksums have detected this problem before the data
was written to the WD-Black?

> There's only so much software can do to overcome blatant hardware problems.

I was hoping to be informed of them.

> But, it seems unlikely such a high percent of errors would go
> undetected to result in so many uncorrectable errors, so there may be
> user error here along with a bug.

I'm not sure how I could have done it better. Does btrfs replace check that
the data is correctly written to the new disk before it is removed from the
old disk?  Should I have used the 2 disks to make a RAID-1 and then done a
scrub before removing the old disk?

Here is the complete list of 

Re: uncorrectable errors after btrfs replace

2013-08-18 Thread Chris Murphy

On Aug 18, 2013, at 4:35 PM, Stuart Pook slp644...@pook.it wrote:
 
 You first shrank a 2TB btrfs file system on dmcrypt device to 590GB.
 But then you didn't resize the dm device or the partition?
 
 no, I had no need to resize the dm device or partition.

OK well it's unusual to resize a file system and then not resize the containing 
block device. I don't know if Btrfs cares about this or not.

 
 I ran a badblocks scan on the raw device (not the luks device) and didn't get 
 any errors.

badblocks will depend on the drive determining a persistent read failure with a 
sector, and timing out before the SCSI block layer times out. Since the Linux 
SCSI command timeout is 30 seconds, and most consumer drive ECT is 120 seconds, 
the bus is reset before the drive has a chance to report a bad sector. So I 
think you're better off using smartctl -t long tests to find bad sectors on a 
disk.
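
To spell out the timing argument, here is a tiny sketch that takes the 30 second and 120 second figures above at face value. The function name drive_reports_before_reset is made up for illustration; it models nothing beyond the comparison of the two timeouts.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true if the drive would finish its own error recovery (and so
 * get the chance to report a bad sector) before the host side gives up
 * and resets the link. Both values are in seconds.
 */
static bool drive_reports_before_reset(unsigned drive_recovery_s,
				       unsigned host_timeout_s)
{
	assert(host_timeout_s > 0);
	return drive_recovery_s < host_timeout_s;
}
```

With drive_recovery_s = 120 and host_timeout_s = 30 this returns false, which is the case described above: the reset arrives first and badblocks never sees a clean "bad sector" answer.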

Further a smartctl -x may show SATA Phy Event Counters, which should have 0's 
or very low numbers and if not then that's also an indicator of hardware 
problems.


 The data was written to the WD-Blue (640Gb) disk and then copied off it.  The 
 only errors I saw concerned the WD-Blue.  If the errors were data corruption 
 on writing or reading the WD-Blue then I would have thought that the 
 checksums would have told me that there was something wrong.  btrfs didn't 
 give me an IO error until I started to read the files when the data was on a 
 final disk.

How does Btrfs know there's been a failure during write if the hardware hasn't 
detected it? Btrfs doesn't re-read everything it just wrote to the drive to 
confirm it was written correctly. It assumes it was unless there's a hardware 
error. It wouldn't know this until a Btrfs scrub is done on the written drive. 

What I can't tell you is how Btrfs behaves and if it behaves correctly, when 
writing data to hardware having transient errors. I don't know what it does 
when the hardware reports the error, but presumably if the hardware doesn't 
report an error Btrfs can't do anything about that except on the next read or 
scrub.
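
For illustration only, this is roughly what "verify on read before copying" looks like. It is a sketch, not a description of what btrfs replace actually does (that is the open question in this thread). It uses a plain bitwise CRC-32C, the checksum family btrfs uses for data, and a hypothetical helper copy_block_verified that refuses to copy a block whose contents no longer match the checksum recorded at write time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t crc = 0xFFFFFFFFu;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= p[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}

/*
 * Hypothetical helper: copy one block from src to dst only if src still
 * matches the checksum recorded when it was first written.
 * Returns 0 on success, -1 if the source block fails verification.
 */
static int copy_block_verified(void *dst, const void *src, size_t len,
			       uint32_t recorded_csum)
{
	assert(dst != NULL && src != NULL);
	if (crc32c(src, len) != recorded_csum)
		return -1;	/* damaged in transit or at rest: do not copy */
	memcpy(dst, src, len);
	return 0;
}
```

Note that a check like this only catches damage that is visible when the source is read back. Data mangled on its way to the target after a successful read would still need a later read or scrub of the target to be noticed, as described above.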




 
 Just to be clear. This is the series of btrfs replace I did:
 
 backups : HD204UI - WD-Blue
 /mnt : WD-Black - HD204UI
 backups : WD-Blue - WD-Black
 
 I guess that my backups were corrupted as they were written to or read from 
 the WD-Blue. Wouldn't the checksums have detected this problem before the 
 data was written to the WD-Black?

When you first encountered the btrfs reported csum errors, what operation was 
occurring?

 
 There's only so much software can do to overcome blatant hardware problems.
 
 I was hoping to be informed of them

Well you were informed of them in dmesg, by virtue of the controller having 
problems talking to a SATA rev 2 drive at rev 2 speed, with a negotiated 
fallback to rev 1 speed.
 
 But, it seems unlikely such a high percent of errors would go
 undetected to result in so many uncorrectable errors, so there may be
 user error here along with a bug.
 
 I'm not sure how I could have done it better. Does btrfs replace check that 
 the data is correctly written to the new disk before it is removed from the 
 old disk?

That's a valid question. Hopefully someone more knowledgeable can answer what 
the expected error handling behavior is supposed to be.

  Should I have used the 2 disks to make a RAID-1 and then done a scrub before 
 removing the old disk?

Good question. Possibly it's best practice to use btrfs replace with an 
existing raid1, rather than using it as a way to move a single copy of data 
from one disk to another. I think you'd have been better off using btrfs send 
and receive for this operation.

A full dmesg might also be enlightening even if it is really long. Just put it 
in its own email without comment. I think pasting it out of forum is less 
preferred.


Chris Murphy


Re: uncorrectable errors after btrfs replace

2013-08-18 Thread George Mitchell
This is just a comment from someone following all of this from the 
sidelines.


And that is that I see so much going on here with this procedure that it 
scares me.  Once a single operation reaches a certain degree of 
complexity I get really scared because all it takes is a single misstep 
and my data is gone.  And that happens so easily as complexity increases 
and confusion tends to set in.  In this particular situation, my 
solution would probably have been to create a new btrfs partition from 
scratch on the new drive and simply mount the source partition/drive ro 
and rsync the data across to the target partition/drive rather than 
trying to do the btrfs replace operation.  That way I could have 
verified the target drive before erasing the source drive and I would 
not have had to worry about partition sizes, encryption, etc.


That said, I am certainly thankful that this was backup data and not 
working data.  But I think it serves as a cautionary tale as to not 
assuming that something should be done just because it theoretically can 
be done.  I am not really familiar with btrfs replace but would imagine 
that it is intended for use more in a raid situation than in simply 
moving data from one drive to another.





Re: [PATCH] btrfs-progs: restore passing of super_bytenr to device scan

2013-08-18 Thread Miao Xie
Hi,

On Thu, 15 Aug 2013 20:42:42 -0400, Jeff Mahoney wrote:
 Commit 615f2867 (Btrfs-progs: cleanup similar code in open_ctree_*
 and close_ctree) introduced a regression in btrfs-convert.

Wang has fixed this problem.

[PATCH] Btrfs-progs: fix wrong arg sb_bytenr for btrfs_scan_fs_devices()

Thanks
Miao

 open_ctree takes a sb_bytenr argument to specify where to find the
 superblock. Under normal conditions, this will be at BTRFS_SUPER_INFO_OFFSET,
 and that commit assumed as much under all conditions.
 
 make_btrfs allows the caller to specify which blocks to use for
 certain blocks (including the superblock) and this is used by btrfs-convert
 to avoid overwriting the source file system's superblock until the
 conversion is complete.
 
 When btrfs-convert goes to open the newly initialized file system, it
 fails with: "No valid btrfs found" since its superblock wasn't written
 to the normal location.
 
 This patch restores the passing down of super_bytenr to
 btrfs_scan_one_device.
 
 Signed-off-by: Jeff Mahoney je...@suse.com
 ---
  btrfs-find-root.c |  2 +-
  cmds-chunk.c  |  2 +-
  disk-io.c | 10 +++---
  disk-io.h |  3 ++-
  4 files changed, 11 insertions(+), 6 deletions(-)
 
 diff --git a/btrfs-find-root.c b/btrfs-find-root.c
 index 9b3d7df..374cf81 100644
 --- a/btrfs-find-root.c
 +++ b/btrfs-find-root.c
 @@ -82,7 +82,7 @@ static struct btrfs_root *open_ctree_broken(int fd, const char *device)
  		return NULL;
  	}
  
 -	ret = btrfs_scan_fs_devices(fd, device, &fs_devices);
 +	ret = btrfs_scan_fs_devices(fd, device, &fs_devices, 0);
  	if (ret)
  		goto out;
  
 diff --git a/cmds-chunk.c b/cmds-chunk.c
 index 03314de..6ada328 100644
 --- a/cmds-chunk.c
 +++ b/cmds-chunk.c
 @@ -1291,7 +1291,7 @@ static int recover_prepare(struct recover_control *rc, char *path)
  		goto fail_free_sb;
  	}
  
 -	ret = btrfs_scan_fs_devices(fd, path, &fs_devices);
 +	ret = btrfs_scan_fs_devices(fd, path, &fs_devices, 0);
  	if (ret)
  		goto fail_free_sb;
  
 diff --git a/disk-io.c b/disk-io.c
 index 13dbe27..1b91de6 100644
 --- a/disk-io.c
 +++ b/disk-io.c
 @@ -909,13 +909,17 @@ void btrfs_cleanup_all_caches(struct btrfs_fs_info *fs_info)
  }
  
  int btrfs_scan_fs_devices(int fd, const char *path,
 -			  struct btrfs_fs_devices **fs_devices)
 +			  struct btrfs_fs_devices **fs_devices,
 +			  u64 super_bytenr)
  {
  	u64 total_devs;
  	int ret;
  
 +	if (super_bytenr == 0)
 +		super_bytenr = BTRFS_SUPER_INFO_OFFSET;
 +
  	ret = btrfs_scan_one_device(fd, path, fs_devices,
 -				    &total_devs, BTRFS_SUPER_INFO_OFFSET);
 +				    &total_devs, super_bytenr);
  	if (ret) {
  		fprintf(stderr, "No valid Btrfs found on %s\n", path);
  		return ret;
 @@ -1001,7 +1005,7 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, const char *path,
  	if (restore)
  		fs_info->on_restoring = 1;
  
 -	ret = btrfs_scan_fs_devices(fp, path, &fs_devices);
 +	ret = btrfs_scan_fs_devices(fp, path, &fs_devices, sb_bytenr);
  	if (ret)
  		goto out;
  
 diff --git a/disk-io.h b/disk-io.h
 index effaa9f..d7792e0 100644
 --- a/disk-io.h
 +++ b/disk-io.h
 @@ -59,7 +59,8 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info,
  void btrfs_release_all_roots(struct btrfs_fs_info *fs_info);
  void btrfs_cleanup_all_caches(struct btrfs_fs_info *fs_info);
  int btrfs_scan_fs_devices(int fd, const char *path,
 -   struct btrfs_fs_devices **fs_devices);
 +   struct btrfs_fs_devices **fs_devices,
 +   u64 super_bytenr);
  int btrfs_setup_chunk_tree_and_device_map(struct btrfs_fs_info *fs_info);
  
  struct btrfs_root *open_ctree(const char *filename, u64 sb_bytenr, int writes);
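
As a reading aid, the "0 means default" convention introduced in the disk-io.c hunk can be sketched on its own. This is a standalone illustration, not btrfs-progs code: SKETCH_SUPER_INFO_OFFSET mirrors the 64 KiB value of BTRFS_SUPER_INFO_OFFSET, and effective_super_bytenr is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for BTRFS_SUPER_INFO_OFFSET: primary superblock at 64 KiB. */
#define SKETCH_SUPER_INFO_OFFSET (64 * 1024)

/*
 * Callers that don't care where the superblock lives pass 0 and get the
 * default; callers such as btrfs-convert pass the offset they chose.
 */
static uint64_t effective_super_bytenr(uint64_t super_bytenr)
{
	if (super_bytenr == 0)
		super_bytenr = SKETCH_SUPER_INFO_OFFSET;
	assert(super_bytenr != 0);	/* the scan never runs with offset 0 */
	return super_bytenr;
}
```

One consequence of this convention is that a superblock at byte offset 0 cannot be requested explicitly, since 0 is taken to mean "use the default".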
 



Re: [PATCH] Btrfs: fix heavy delalloc related deadlock

2013-08-18 Thread Miao Xie
On Wed, 14 Aug 2013 11:41:00 -0400, Josef Bacik wrote:
 I added a patch where we started taking the ordered operations mutex when we
 waited on ordered extents.  We need this because we splice the list and process
 it, so if a flusher came in during this scenario it would think the list was
 empty and we'd usually get an early ENOSPC.  The problem with this is that this
 lock is used in transaction committing.  So we end up with something like this
 
 Transaction commit
   -> wait on writers
 
 Delalloc flusher
   -> run_ordered_operations (holds mutex)
   -> wait for filemap-flush to do its thing
 
 flush task
   -> cow_file_range
   -> wait on btrfs_join_transaction because we're committing
 
 some other task
   -> commit_transaction because we notice trans->transaction->flush is set
   -> run_ordered_operations (hang on mutex)

Sorry, I cannot understand this explanation. As far as I know, if the flush
task waits on btrfs_join_transaction(), it means the transaction is under
commit (state = TRANS_STATE_COMMIT_DOING), and all the external writers
(TRANS_START/TRANS_ATTACH/TRANS_USERSPACE) have quit the current transaction,
so no one would try to call run_ordered_operations().

Could you show us the steps to reproduce?

Thanks
Miao

 
 We need to disentangle the ordered operations flushing from the delalloc
 flushing, since they are separate things.  This solves the deadlock issue I was
 seeing.  Thanks,
 
 Signed-off-by: Josef Bacik jba...@fusionio.com
 ---
  fs/btrfs/ctree.h|7 +++
  fs/btrfs/disk-io.c  |1 +
  fs/btrfs/ordered-data.c |4 ++--
  3 files changed, 10 insertions(+), 2 deletions(-)
 
 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
 index ea4cc16..d79e32c 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -1418,6 +1418,13 @@ struct btrfs_fs_info {
* before jumping into the main commit.
*/
   struct mutex ordered_operations_mutex;
 +
 + /*
 +  * Same as ordered_operations_mutex except this is for ordered extents
 +  * and not the operations.
 +  */
 + struct mutex ordered_extent_flush_mutex;
 +
   struct rw_semaphore extent_commit_sem;
  
   struct rw_semaphore cleanup_work_sem;
 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index c82025d..880dcde 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -2288,6 +2288,7 @@ int open_ctree(struct super_block *sb,
  
  
  	mutex_init(&fs_info->ordered_operations_mutex);
 +	mutex_init(&fs_info->ordered_extent_flush_mutex);
  	mutex_init(&fs_info->tree_log_mutex);
  	mutex_init(&fs_info->chunk_mutex);
  	mutex_init(&fs_info->transaction_kthread_mutex);
 diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
 index 8136982..b52b2c4 100644
 --- a/fs/btrfs/ordered-data.c
 +++ b/fs/btrfs/ordered-data.c
 @@ -671,7 +671,7 @@ int btrfs_run_ordered_operations(struct btrfs_trans_handle *trans,
  	INIT_LIST_HEAD(&splice);
  	INIT_LIST_HEAD(&works);
  
 -	mutex_lock(&root->fs_info->ordered_operations_mutex);
 +	mutex_lock(&root->fs_info->ordered_extent_flush_mutex);
  	spin_lock(&root->fs_info->ordered_root_lock);
  	list_splice_init(&cur_trans->ordered_operations, &splice);
  	while (!list_empty(&splice)) {
 @@ -718,7 +718,7 @@ out:
  		list_del_init(&work->list);
  		btrfs_wait_and_free_delalloc_work(work);
  	}
 -	mutex_unlock(&root->fs_info->ordered_operations_mutex);
 +	mutex_unlock(&root->fs_info->ordered_extent_flush_mutex);
  	return ret;
  }
  
 
