Re: Btrfs out of inodes becomes corrupt

2012-02-08 Thread Felix Blanke

On 2/8/12 7:41 AM, Chris Samuel wrote:

On Monday 06 February 2012 06:57:42 Hugo Mills wrote:


This is all under Debian with kernel 2.6.32-5.


Aargh.

You are aware that this is an insanely old version of the btrfs
code, and it has major flaws in it?


As someone who runs his work laptop with a 2.6.32 kernel and btrfs for
/home I'll bite at this one - I've held off updating after reading the
issues people were reporting on the list with newer kernels that did
not appear to be present in 2.6.32 (indeed IIRC a particular problem
from that time could only be solved by remounting the filesystem with
2.6.32, at which point newer kernels could access it again). It's
served me very well, it "just works" (so far at least).

It's now sounding as if 3.2 is probably stable enough for me to
consider updating to the next Kubuntu release when it comes out.

NB:  Yes, I do make nightly backups, and no, I don't run the
filesystem at anything close to even a quarter full (not that that
guarantees anything, or is even particularly deliberate).

cheers,
Chris


Of course you can be one of the lucky users who doesn't hit a bug on .32.
IMHO that isn't an argument for not upgrading.
There are new bugs in 3.2, but on the other hand there are hundreds of
fixed bugs, new features etc.


Especially using alpha releases of software without upgrading whenever 
there is a chance to seems very strange to me :)


Kind regards,
Felix Blanke
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


passing positive numbers to ERR_PTR()

2012-02-08 Thread Dan Carpenter
Hi Jan,

Smatch complains when you pass positive numbers to ERR_PTR().  There
is a warning triggered in iref_to_path().

fs/btrfs/backref.c +920 iref_to_path()
   918
   919          if (ret)
   920                  return ERR_PTR(ret);
                                       ^^^
"ret" can be either a negative error code, zero, or one here.

   921

I looked at the code, but couldn't tell if it was intentional or not.
It really is pretty unusual to do that, so maybe there should be a
comment or something.
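
For anyone following along, here is a minimal userspace sketch of why a
positive value is a problem. It only imitates the kernel helpers in
include/linux/err.h; the demo_ names and DEMO_MAX_ERRNO are made up for
this illustration and are not kernel symbols.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace imitation of the kernel's error-pointer helpers. */
#define DEMO_MAX_ERRNO 4095

/* Encode an error code as a pointer value. */
static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode a pointer value back into the stored code. */
static inline long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True only for the top DEMO_MAX_ERRNO addresses, i.e. codes -4095..-1. */
static inline bool demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}
```

Under this model demo_is_err(demo_err_ptr(-2)) is true, while
demo_is_err(demo_err_ptr(1)) is false, so a caller that checks the result
with an IS_ERR()-style test would treat the value 1 as a valid pointer.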

regards,
dan carpenter


Re: Btrfs out of inodes becomes corrupt

2012-02-08 Thread Hugo Mills
On Mon, Feb 06, 2012 at 03:35:14PM +, Tommy Faasen wrote:
> I rolled a new kernel 3.2.4 and it picked everything up.
> No crashes, my disk was still full, with 40+ gigs free
> but now I can delete files and access them.
> 
> I'm running btrfs fi balance /mountpoint at the moment which I understand 
> frees up the remaining space.

   Well, not quite. It has a side-effect that if you have an
over-allocation of space to one or other of data/metadata, it will
serve to reduce that allocation, giving back the over-allocated space
to the free pool, from where it can be reallocated by the other group
type.

   So if you have, say 5G allocated for metadata, but only 500M used,
you could gain back a large amount (but probably not all) of the
unused 4.5G by running a balance.
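
   As a rough sketch of that arithmetic (the struct and function names
are invented for illustration and are not btrfs structures; sizes are in
megabytes, with 5G taken as 5000M to match the figures above):

```c
#include <stdint.h>

/* Allocation of one block group type, in megabytes (illustrative only). */
struct demo_group_usage {
	uint64_t allocated_mb;
	uint64_t used_mb;
};

/* Upper bound on what a balance could return to the free pool. */
static uint64_t demo_max_reclaimable_mb(const struct demo_group_usage *g)
{
	if (g->used_mb >= g->allocated_mb)
		return 0;
	return g->allocated_mb - g->used_mb;
}
```

   This gives only the upper bound; as noted, a balance will probably
not recover all of it.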

   Hugo.

> Thanks again for your assistance!
> 
> Tommy
> 
> On Mon, Feb 06, 2012 at 11:40:31AM +, Hugo Mills wrote:
> > On Mon, Feb 06, 2012 at 05:33:51AM -0600, cwillu wrote:
> > > >  3.0 is two revisions old -- something on the order of 6
> > > > months. There's been a lot of development since then, too. You really
> > > > should be running 3.2 or 3.2-rc2 (i.e. the latest released or latest
> > > > development version of the kernel).
> > > 
> > > I think you meant 3.2 or 3.3-rc2.
> > 
> >I did indeed. Thanks for the correction.
> > 
> >Hugo.
> > 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Quantum Mechanics: the dreams stuff is made of. --- 




Re: btrfs unmountable after failed suspend

2012-02-08 Thread Chris Mason
On Tue, Feb 07, 2012 at 06:10:15PM -0600, Chester wrote:
> This is dmesg mounted with -o ro,recovery
> [   20.957392] exe used greatest stack depth: 4920 bytes left
> [  145.340317] device label BtrfsLinux devid 1 transid 332442 /dev/sda6
> [  145.341702] btrfs: enabling auto recovery
> [  145.341803] btrfs: disk space caching is enabled
> [  152.457967] btrfs: corrupt leaf, bad key order:
> block=653297209344,root=1, slot=7
> [  152.487933] btrfs: corrupt leaf, bad key order:
> block=653297209344,root=1, slot=7
> [  152.488326] [ cut here ]
> [  152.488549] kernel BUG at fs/btrfs/extent-tree.c:5797!

Well, this isn't good.  If you can run btrfs-zero-log it'll get past
this part, but I'd suggest a fsck run to see if there are other
corrupted blocks.

Bad key ordering is usually from memory corruption, so this block
probably isn't alone.
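
For readers unfamiliar with the check, here is a simplified userspace
sketch of what "bad key order" means. Btrfs keys sort by objectid, then
type, then offset; the demo_ struct and functions below are illustrative
stand-ins, not the kernel's code.

```c
#include <stdint.h>

/* Simplified (objectid, type, offset) triple of a btrfs key. */
struct demo_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Compare two keys: objectid first, then type, then offset. */
static int demo_comp_keys(const struct demo_key *a, const struct demo_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Return the first slot whose key is not strictly below its successor,
 * or -1 if the whole leaf is in order. */
static int demo_first_bad_slot(const struct demo_key *keys, int nritems)
{
	for (int slot = 0; slot + 1 < nritems; slot++)
		if (demo_comp_keys(&keys[slot], &keys[slot + 1]) >= 0)
			return slot;
	return -1;
}
```

Fed the ten EXTENT_ITEM keys from the fsck output posted later in this
thread (items 0 through 9, type 0xa8, offset 4096), this sketch reports
slot 7, because the key of item 7 sorts after the key of item 8.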

-chris


[PATCH] Btrfs: avoid positive number with ERR_PTR

2012-02-08 Thread Jan Schmidt
inode_ref_info() returns 1 when the element wasn't found and < 0 on error,
just like btrfs_search_slot(). In iref_to_path() it's an error when the
inode ref can't be found, thus we return ERR_PTR(ret) in that case. In order
to avoid ERR_PTR(1), we now set ret to -ENOENT in that case.

Signed-off-by: Jan Schmidt 
---
On 08.02.2012 10:18, Dan Carpenter wrote:
> I looked at the code, but couldn't tell if it was intentional or not.

It wasn't :-)

Thank you,
-Jan
---
 fs/btrfs/backref.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)
---

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 633c701..98f6bf10 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -892,6 +892,8 @@ static char *iref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path,
 		if (eb != eb_in)
 			free_extent_buffer(eb);
 		ret = inode_ref_info(parent, 0, fs_root, path, &found_key);
+		if (ret > 0)
+			ret = -ENOENT;
 		if (ret)
 			break;
 		next_inum = found_key.offset;
-- 
1.7.3.4
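
The effect of the two added lines can be modelled in userspace roughly as
follows. This is only a sketch: demo_not_found_to_enoent() is a
hypothetical helper written for illustration, not a function in the btrfs
sources.

```c
#include <errno.h>

/* Map a "not found" result (> 0) to -ENOENT; leave 0 and negative
 * error codes unchanged. */
static int demo_not_found_to_enoent(int ret)
{
	return ret > 0 ? -ENOENT : ret;
}
```

After this mapping, any non-zero value is a negative errno and is safe
to hand to ERR_PTR().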



Re: btrfs unmountable after failed suspend

2012-02-08 Thread Chester
On Wed, Feb 8, 2012 at 6:55 AM, Chris Mason  wrote:
> On Tue, Feb 07, 2012 at 06:10:15PM -0600, Chester wrote:
>> This is dmesg mounted with -o ro,recovery
>> [   20.957392] exe used greatest stack depth: 4920 bytes left
>> [  145.340317] device label BtrfsLinux devid 1 transid 332442 /dev/sda6
>> [  145.341702] btrfs: enabling auto recovery
>> [  145.341803] btrfs: disk space caching is enabled
>> [  152.457967] btrfs: corrupt leaf, bad key order:
>> block=653297209344,root=1, slot=7
>> [  152.487933] btrfs: corrupt leaf, bad key order:
>> block=653297209344,root=1, slot=7
>> [  152.488326] [ cut here ]
>> [  152.488549] kernel BUG at fs/btrfs/extent-tree.c:5797!
>
> Well, this isn't good.  If you can run btrfs-zero-log it'll get past
> this part, but I'd suggest a fsck run to see if there are other
> corrupted blocks.
I've already tried the -o recovery option at mount. I was told it does
the same as btrfs-zero-log (but probably less destructive). Just a
quick question: Will the release of btrfsck later this month be able
to fix these corruptions?
>
> Bad key ordering is usually from memory corruption, so this block
> probably isn't alone.
Yeah. Could be from using zcache. I haven't had a problem with it
until I tried to suspend to RAM though.
>
> -chris


btrfs support for efficient SSD operation (data blocks alignment)

2012-02-08 Thread Martin
My understanding is that for x86 architecture systems, btrfs only allows
a sector size of 4kB for a HDD/SSD. That is fine for the present HDDs
assuming the partitions are aligned to a 4kB boundary for that device.

However for SSDs...

I'm using for example a 60GByte SSD that has:

8kB page size;
16kB logical to physical mapping chunk size;
2MB erase block size;
64MB cache.

And the sector size reported to Linux 3.0 is the default 512 bytes!


My first thought is to try formatting with a sector size of 16kB to
align with the SSD logical mapping chunk size. This is to avoid SSD
write amplification. Also, the data transfer performance for that device
is near maximum for writes with a blocksize of 16kB and above. Yet,
btrfs supports a 4kByte page/sector size only at present...


Is there any control possible over the btrfs filesystem structure to map
metadata and data structures to the underlying device boundaries?

For example to maximise performance, can the data chunks and the data
chunk size be aligned to be sympathetic to the SSD logical mapping chunk
size and the erase block size?
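
To make the alignment question concrete, here is a small sketch of the
check I have in mind for a partition's starting byte offset. The helper
names are my own for illustration; nothing here is a btrfs or kernel
interface.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_KIB 1024ULL
#define DEMO_MIB (1024ULL * DEMO_KIB)

/* True when byte offset `off` is a multiple of `boundary`. */
static bool demo_is_aligned(uint64_t off, uint64_t boundary)
{
	return boundary != 0 && off % boundary == 0;
}

/* Round `off` up to the next multiple of `boundary` (boundary > 0). */
static uint64_t demo_align_up(uint64_t off, uint64_t boundary)
{
	return (off + boundary - 1) / boundary * boundary;
}
```

For example, a partition starting at sector 2048 (512-byte sectors, so
1MB in) passes the 4kB and 16kB checks but not the 2MB erase block check;
rounding it up gives a 2MB start.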

What features other than the trim function does btrfs employ to optimise
for SSD operation?


Regards,
Martin




Re: btrfs unmountable after failed suspend

2012-02-08 Thread Chris Mason
On Wed, Feb 08, 2012 at 01:22:19PM -0600, Chester wrote:
> On Wed, Feb 8, 2012 at 6:55 AM, Chris Mason  wrote:
> > On Tue, Feb 07, 2012 at 06:10:15PM -0600, Chester wrote:
> >> This is dmesg mounted with -o ro,recovery
> >> [   20.957392] exe used greatest stack depth: 4920 bytes left
> >> [  145.340317] device label BtrfsLinux devid 1 transid 332442 /dev/sda6
> >> [  145.341702] btrfs: enabling auto recovery
> >> [  145.341803] btrfs: disk space caching is enabled
> >> [  152.457967] btrfs: corrupt leaf, bad key order:
> >> block=653297209344,root=1, slot=7
> >> [  152.487933] btrfs: corrupt leaf, bad key order:
> >> block=653297209344,root=1, slot=7
> >> [  152.488326] [ cut here ]
> >> [  152.488549] kernel BUG at fs/btrfs/extent-tree.c:5797!
> >
> > Well, this isn't good.  If you can run btrfs-zero-log it'll get past
> > this part, but I'd suggest a fsck run to see if there are other
> > corrupted blocks.
> I've already tried the -o recovery option at mount. I was told it does
> the same as btrfs-zero-log (but probably less destructive). Just a

It does zero the log, but looks like it does so a little too late.  The
mount -o recovery code zeros it if we failed to read some of the tree
roots, but you're hitting the tree log before we fail.  Long story
short, you need to btrfs-zero-log ;)

> quick question: Will the release of btrfsck later this month be able
> to fix these corruptions?

Fixing the key ordering is pretty easy, I can do that here.  But I'll
need to see the fsck output to say if the rest is fixed in the current
code.

> >
> > Bad key ordering is usually from memory corruption, so this block
> > probably isn't alone.
> Yeah. Could be from using zcache. I haven't had a problem with it
> until I tried to suspend to RAM though.

Could be, I'd suggest running with CONFIG_DEBUG_PAGE_ALLOC.  You might
also just have bad ram.

-chris



Re: btrfs unmountable after failed suspend

2012-02-08 Thread Chester
On Wed, Feb 8, 2012 at 2:26 PM, Chris Mason  wrote:
> On Wed, Feb 08, 2012 at 01:22:19PM -0600, Chester wrote:
>> On Wed, Feb 8, 2012 at 6:55 AM, Chris Mason  wrote:
>> > On Tue, Feb 07, 2012 at 06:10:15PM -0600, Chester wrote:
>> >> This is dmesg mounted with -o ro,recovery
>> >> [   20.957392] exe used greatest stack depth: 4920 bytes left
>> >> [  145.340317] device label BtrfsLinux devid 1 transid 332442 /dev/sda6
>> >> [  145.341702] btrfs: enabling auto recovery
>> >> [  145.341803] btrfs: disk space caching is enabled
>> >> [  152.457967] btrfs: corrupt leaf, bad key order:
>> >> block=653297209344,root=1, slot=7
>> >> [  152.487933] btrfs: corrupt leaf, bad key order:
>> >> block=653297209344,root=1, slot=7
>> >> [  152.488326] [ cut here ]
>> >> [  152.488549] kernel BUG at fs/btrfs/extent-tree.c:5797!
>> >
>> > Well, this isn't good.  If you can run btrfs-zero-log it'll get past
>> > this part, but I'd suggest a fsck run to see if there are other
>> > corrupted blocks.
>> I've already tried the -o recovery option at mount. I was told it does
>> the same as btrfs-zero-log (but probably less destructive). Just a
>
> It does zero the log, but looks like it does so a little too late.  The
> mount -o recovery code zeros it if we failed to read some of the tree
> roots, but you're hitting the tree log before we fail.  Long story
> short, you need to btrfs-zero-log ;)
>
>> quick question: Will the release of btrfsck later this month be able
>> to fix these corruptions?
>
> Fixing the key ordering is pretty easy, I can do that here.  But I'll
> need to see the fsck output to say if the rest is fixed in the current
> code.
>
>> >
>> > Bad key ordering is usually from memory corruption, so this block
>> > probably isn't alone.
>> Yeah. Could be from using zcache. I haven't had a problem with it
>> until I tried to suspend to RAM though.
>
> Could be, I'd suggest running with CONFIG_DEBUG_PAGE_ALLOC.  You might
> also just have bad ram.

I certainly hope it's not just bad ram. I just got this laptop half a year ago!
I'll try to get a fsck output when I get home..
>
> -chris
>


Re: Premature ENOSPC only with zlib Compression

2012-02-08 Thread Mitch Harder
On Wed, Jan 18, 2012 at 10:13 AM, Mitch Harder
 wrote:
> I have a Btrfs partition that is reliably reproducing premature ENOSPC
> when restoring the disk from a tar file, but it is only happening with
> zlib compression (lzo or no compression proceeds normally).
>
> I've had the same issue at least back through the 3.1 kernel series,
> and I've been having intermittent issues even further back.
>
> I am currently using a 3.2.1 kernel merged with Chris' latest
> integration branch.
>
> I've performed about 12 trials trying to explore various combinations
> of compress, compress-force, compress[-force]=[zlib,lzo] and
> autodefrag.
>
> If I use no compression, or if I explicitly declare lzo compression, I
> don't receive the premature ENOSPC when untarring my restoration
> archive to the empty partition.
>
> If I don't specify compression (zlib is the default) or specify zlib,
> I get consistent premature ENOSPC errors regardless of other
> combinations.
>
> I apologize if this is already general knowledge, but I couldn't see
> where this has been posted to the list before.
>
> As time allows, I will try to capture exactly where this ENOSPC is
> being issued in btrfs by inserting WARN_ON's in my local version
> where-ever ENOSPC is set.

Some follow-up...

I've injected some debugging code to isolate when the ENOSPC is being
generated when using zlib compression.

When using zlib, I'm getting intermittent ENOSPC in the
may_commit_transaction() function in extent-tree.c at this point:

	if (delayed_rsv->size < bytes) {
		spin_unlock(&delayed_rsv->lock);
		return -ENOSPC;
	}

The typical values for (delayed_rsv->size < bytes) have been:
delayed_rsv->size ( = 0x6) < bytes ( = 0x78000)
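
Stripped of the locking, the comparison that fails can be modelled in
userspace as below. This is a sketch only: struct demo_rsv and
demo_may_commit() are stand-ins I made up, not the kernel's structures,
and the values are the ones quoted above.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for the reservation being tested; not the kernel's struct. */
struct demo_rsv {
	uint64_t size;
};

/* Mirrors only the quoted comparison: fail when the reserve is
 * smaller than the number of bytes requested. */
static int demo_may_commit(const struct demo_rsv *delayed_rsv, uint64_t bytes)
{
	if (delayed_rsv->size < bytes)
		return -ENOSPC;
	return 0;
}
```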

This typically occurs when unzipping a section of my backup that
contains lots of small files, most of which are probably being inlined.

I don't see errors in this section when using lzo or no compression.


Re: [PATCH] mkfs: Handle creation of filesystem larger than the first device

2012-02-08 Thread Phillip Susi
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 1/26/2012 11:03 AM, Jan Kara wrote:
> make_btrfs() function takes a size of filesystem as an argument. It
> uses this value to set the size of the first device as well which
> is wrong for filesystems larger than this device. It results in
> 'attempt to access beyond end of device' messages from the kernel.
> So add size of the first device as an argument to make_btrfs().

I don't think this patch is correct.  Yes, the size switch only
applies to the first device, so it doesn't make any sense to try to
use a value larger than that device.  Attempting to do so probably
should be trapped and it should error out, and the man page should
probably clarify the fact that the size is only for the first device.

It looks like you think the size should somehow apply to multiple
devices or to the total fs size when creating on multiple devices, and
that just doesn't make sense.
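
To illustrate the kind of trap I mean, something along these lines would
do. This is a hypothetical sketch, not code from mkfs.btrfs, and the
function name is invented.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical check: reject a requested size that is larger than the
 * first device, instead of writing it into the device item. */
static int demo_check_requested_size(uint64_t requested, uint64_t first_dev_size)
{
	if (requested > first_dev_size)
		return -EINVAL;
	return 0;
}
```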

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJPMvCrAAoJEJrBOlT6nu75A/IH/3Pn9MFhxXI1kTu1jriA/1ZA
IaCPkZbNFvS1DC5U8E75Ys4Qtn/SkwVOdGG8VCObfJKhhbWXEGKLdtllxV8WUkRM
QYN3rFeb3gLxb9UIcyyRC+RDtJtVzVXClFN7WYgA2QXmCyYdnV3axzvO/tkvADuq
Is28sKnYzV9poKTlghqFmEGmqcnTtfFKg9MC60wGDKEOMuAeijImGaAEp773G7+S
JSOOPcuDj/Lh7ZO+duR2ul+zUI9DWr2IbZM6zUxOoN2fZEJAwJLNPsU7rBDX8w+g
FVHFHrRv6wVGq0I7Dvb2flif5O0wSRA5yhK/7GanaVEMBKSV9A0c5qOE9LakL/s=
=X7QW
-END PGP SIGNATURE-


BTRFS crash during mount

2012-02-08 Thread Daniel Kuhn
After a forced power-off, the filesystem on my primary boot partition
cannot be mounted anymore; btrfs crashes during the mount process. I'm
using openSUSE 12.1, but I've also tried mounting with a newer kernel,
3.2.2 (SystemRescueCd), and with a USB converter connected to another
PC, without success.


The kernel log seems pretty specific about the crash location, see below.

Best regards,
Daniel Kuhn


[   66.476674] [ cut here ]
[   66.476684] kernel BUG at fs/btrfs/free-space-cache.c:1515!
[   66.476691] invalid opcode:  [#1] SMP
[   66.476699] Modules linked in: tpm_tis tpm tpm_bios i2c_nforce2 
serio_raw pcspkr floppy k10temp asus_atk0110 raid10 raid456 
async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx 
raid1 raid0 multipath linear ata_generic nouveau ttm drm_kms_helper drm 
i2c_algo_bit firewire_ohci i2c_core pata_acpi mxm_wmi forcedeth 
pata_marvell firewire_core pata_amd video wmi

[   66.476752]
[   66.476759] Pid: 1844, comm: mount Not tainted 3.2.2-alt250-i586 #2 
System manufacturer System Product Name/M3N-HT DELUXE

[   66.476772] EIP: 0060:[] EFLAGS: 00010206 CPU: 2
[   66.476785] EIP is at remove_from_bitmap+0xa8/0x285
[   66.476792] EAX: 6a92c000 EBX:  ECX: 0005c000 EDX: 0002
[   66.476799] ESI: f2f5baa8 EDI: f2f5ba8c EBP: f2f5ba48 ESP: f2f5b9ec
[   66.476805]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[   66.476813] Process mount (pid: 1844, ti=f2f5a000 task=f2ff7080 
task.ti=f2f5a000)

[   66.476818] Stack:
[   66.476822]  f2f5ba2c 0385  f2f5ba58 f2750370 f2f5ba48 
f2f5ba44 f2f5ba40
[   66.476837]  0019 71bf 0002 71c0 0002 f3159600 
073ba000 
[   66.476851]  0005c000  6a92c000 0002 f2f5baa8  
f2750370 f2f5baa0

[   66.476865] Call Trace:
[   66.476877]  [] btrfs_remove_free_space+0x34c/0x370
[   66.476889]  [] btrfs_alloc_logged_file_extent+0x114/0x211
[   66.476900]  [] ? btrfs_free_path+0x1b/0x1e
[   66.476909]  [] ? btrfs_free_path+0x1b/0x1e
[   66.476919]  [] replay_one_extent+0x470/0x5f2
[   66.476929]  [] ? __fsnotify_inode_delete+0x8/0xa
[   66.476941]  [] replay_one_buffer+0x1d6/0x229
[   66.476950]  [] walk_down_log_tree+0x15b/0x2cd
[   66.476959]  [] walk_log_tree+0x71/0x188
[   66.476968]  [] btrfs_recover_log_trees+0x24a/0x257
[   66.476977]  [] ? add_inode_ref+0x480/0x480
[   66.476987]  [] open_ctree+0x116a/0x1415
[   66.476998]  [] btrfs_mount+0x43b/0x749
[   66.477009]  [] ? cpumask_next+0x12/0x14
[   66.477018]  [] ? pcpu_alloc+0x6ca/0x6ff
[   66.477027]  [] ? ida_get_new_above+0x14a/0x164
[   66.477036]  [] ? alloc_vfsmnt+0x80/0x111
[   66.477046]  [] ? __kmalloc_track_caller+0x134/0x13e
[   66.477055]  [] mount_fs+0x62/0x139
[   66.477064]  [] vfs_kern_mount+0x4f/0x7b
[   66.477073]  [] do_kern_mount+0x38/0xb6
[   66.477082]  [] do_mount+0x60f/0x65c
[   66.477090]  [] ? strndup_user+0x29/0x3a
[   66.477098]  [] sys_mount+0x68/0x94
[   66.477108]  [] syscall_call+0x7/0xb
[   66.477112] Code: e4 8d 55 e4 89 45 e8 8b 45 d8 8d 4d ec 89 14 24 8b 
55 b4 e8 c6 f8 ff ff 85 c0 78 0f 8b 55 f0 3b 57 04 8b 45 ec 75 04 3b 07 
74 04 <0f> 0b eb fe 8b 4d b4 8b 5d b4 8b 49 0c 89 4d dc 8b 4b 10 39 ca
[   66.477179] EIP: [] remove_from_bitmap+0xa8/0x285 SS:ESP 
0068:f2f5b9ec

[   66.477235] ---[ end trace 2e72e8358ee32e95 ]---


Re: BTRFS crash during mount

2012-02-08 Thread cwillu
On Wed, Feb 8, 2012 at 4:19 PM, Daniel Kuhn  wrote:
> After a forced power turn-off the filesystem of my primary boot partition
> cannot be mounted anymore,
> btrfs crashes during the mount process. I'm using OpenSuse 12.1 but I've
> also tried mounting with a newer kernel 3.2.2 (systemrescue cd) and with a
> usb-converter connected to another PC without success.
>
> The kernel log seems pretty specific about the crash location, see below.
>
> Best regards,
> Daniel Kuhn
>
>
> [   66.476674] [ cut here ]
> [   66.476684] kernel BUG at fs/btrfs/free-space-cache.c:1515!
> [   66.476691] invalid opcode:  [#1] SMP
> [   66.476699] Modules linked in: tpm_tis tpm tpm_bios i2c_nforce2 serio_raw
> pcspkr floppy k10temp asus_atk0110 raid10 raid456 async_raid6_recov async_pq
> raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear
> ata_generic nouveau ttm drm_kms_helper drm i2c_algo_bit firewire_ohci
> i2c_core pata_acpi mxm_wmi forcedeth pata_marvell firewire_core pata_amd
> video wmi
> [   66.476752]
> [   66.476759] Pid: 1844, comm: mount Not tainted 3.2.2-alt250-i586 #2
> System manufacturer System Product Name/M3N-HT DELUXE
> [   66.476772] EIP: 0060:[] EFLAGS: 00010206 CPU: 2
> [   66.476785] EIP is at remove_from_bitmap+0xa8/0x285
> [   66.476792] EAX: 6a92c000 EBX:  ECX: 0005c000 EDX: 0002
> [   66.476799] ESI: f2f5baa8 EDI: f2f5ba8c EBP: f2f5ba48 ESP: f2f5b9ec
> [   66.476805]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> [   66.476813] Process mount (pid: 1844, ti=f2f5a000 task=f2ff7080
> task.ti=f2f5a000)
> [   66.476818] Stack:
> [   66.476822]  f2f5ba2c 0385  f2f5ba58 f2750370 f2f5ba48
> f2f5ba44 f2f5ba40
> [   66.476837]  0019 71bf 0002 71c0 0002 f3159600
> 073ba000 
> [   66.476851]  0005c000  6a92c000 0002 f2f5baa8 
> f2750370 f2f5baa0
> [   66.476865] Call Trace:
> [   66.476877]  [] btrfs_remove_free_space+0x34c/0x370
> [   66.476889]  [] btrfs_alloc_logged_file_extent+0x114/0x211
> [   66.476900]  [] ? btrfs_free_path+0x1b/0x1e
> [   66.476909]  [] ? btrfs_free_path+0x1b/0x1e
> [   66.476919]  [] replay_one_extent+0x470/0x5f2
> [   66.476929]  [] ? __fsnotify_inode_delete+0x8/0xa
> [   66.476941]  [] replay_one_buffer+0x1d6/0x229
> [   66.476950]  [] walk_down_log_tree+0x15b/0x2cd
> [   66.476959]  [] walk_log_tree+0x71/0x188
> [   66.476968]  [] btrfs_recover_log_trees+0x24a/0x257
> [snip]

-o recovery under 3.2 or later should fix it up.  You'll want to
remain on 3.2 at that point, and then switch to 3.3 when that's
released, and so on.


Re: [PATCH] mkfs: Handle creation of filesystem larger than the first device

2012-02-08 Thread Jan Kara
On Wed 08-02-12 17:01:15, Phillip Susi wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> On 1/26/2012 11:03 AM, Jan Kara wrote:
> > make_btrfs() function takes a size of filesystem as an argument. It
> > uses this value to set the size of the first device as well which
> > is wrong for filesystems larger than this device. It results in
> > 'attempt to access beyond end of device' messages from the kernel.
> > So add size of the first device as an argument to make_btrfs().
> 
> I don't think this patch is correct.  Yes, the size switch only
> applies to the first device, so it doesn't make any sense to try to
> use a value larger than that device.  Attempting to do so probably
> should be trapped and it should error out, and the man page should
> probably clarify the fact that the size is only for the first device.
> 
> It looks like you think the size should somehow apply to multiple
> devices or to the total fs size when creating on multiple devices, and
> that just doesn't make sense.
  Thanks for your reply. I admit I was not sure what exactly the size
argument should be. So after looking into the code for a while I figured it
should be the total size of the filesystem - or, put differently, the size
of the virtual block address space in the filesystem. Thus when the
filesystem has more devices (or the admin wants to add more devices later),
it can be larger than the first device. But I'm not really a btrfs
developer so I might be wrong, and of course feel free to fix the issue as
you deem fit.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: btrfs unmountable after failed suspend

2012-02-08 Thread Chester
On Wed, Feb 8, 2012 at 8:46 PM, Chester  wrote:
> On Wed, Feb 8, 2012 at 2:26 PM, Chris Mason  wrote:
>> On Wed, Feb 08, 2012 at 01:22:19PM -0600, Chester wrote:
>>> On Wed, Feb 8, 2012 at 6:55 AM, Chris Mason  wrote:
>>> > On Tue, Feb 07, 2012 at 06:10:15PM -0600, Chester wrote:
>>> >> This is dmesg mounted with -o ro,recovery
>>> >> [   20.957392] exe used greatest stack depth: 4920 bytes left
>>> >> [  145.340317] device label BtrfsLinux devid 1 transid 332442 /dev/sda6
>>> >> [  145.341702] btrfs: enabling auto recovery
>>> >> [  145.341803] btrfs: disk space caching is enabled
>>> >> [  152.457967] btrfs: corrupt leaf, bad key order:
>>> >> block=653297209344,root=1, slot=7
>>> >> [  152.487933] btrfs: corrupt leaf, bad key order:
>>> >> block=653297209344,root=1, slot=7
>>> >> [  152.488326] [ cut here ]
>>> >> [  152.488549] kernel BUG at fs/btrfs/extent-tree.c:5797!
>>> >
>>> > Well, this isn't good.  If you can run btrfs-zero-log it'll get past
>>> > this part, but I'd suggest a fsck run to see if there are other
>>> > corrupted blocks.
>>> I've already tried the -o recovery option at mount. I was told it does
>>> the same as btrfs-zero-log (but probably less destructive). Just a
>>
>> It does zero the log, but looks like it does so a little too late.  The
>> mount -o recovery code zeros it if we failed to read some of the tree
>> roots, but you're hitting the tree log before we fail.  Long story
>> short, you need to btrfs-zero-log ;)
>>
>>> quick question: Will the release of btrfsck later this month be able
>>> to fix these corruptions?
>>
>> Fixing the key ordering is pretty easy, I can do that here.  But I'll
>> need to see the fsck output to say if the rest is fixed in the current
>> code.
>>
>>> >
>>> > Bad key ordering is usually from memory corruption, so this block
>>> > probably isn't alone.
>>> Yeah. Could be from using zcache. I haven't had a problem with it
>>> until I tried to suspend to RAM though.
>>
>> Could be, I'd suggest running with CONFIG_DEBUG_PAGE_ALLOC.  You might
>> also just have bad ram.
>
> I certainly hope it's not just bad ram. I just got this laptop half a year 
> ago!
> I'll try to get a fsck output when I get home..
>>
>> -chris
>>

Here's the fsck output.. Looks a little long

failed to find block number 653284814848
 generation 332442 owner 2
fs uuid 0f5b2f4f-1aa0-4e6f-b904-e5b4d4588144
chunk uuid 27536f0d-993b-4da3-85eb-1c9b08c435cb
	item 0 key (653284786176 EXTENT_ITEM 4096) itemoff 3926 itemsize 69
		extent refs 3 gen 330325 flags 2
		tree block key (5587 c 5579) level 0
		tree block backref root 256
		shared block backref parent 742205943808
		shared block backref parent 737977237504
	item 1 key (653284790272 EXTENT_ITEM 4096) itemoff 3875 itemsize 51
		extent refs 1 gen 332218 flags 2
		tree block key (654686625792 a8 4096) level 0
		tree block backref root 2
	item 2 key (653284794368 EXTENT_ITEM 4096) itemoff 3824 itemsize 51
		extent refs 1 gen 332438 flags 2
		tree block key (18446744073709551606 80 390195183616) level 1
		tree block backref root 7
	item 3 key (653284798464 EXTENT_ITEM 4096) itemoff 3773 itemsize 51
		extent refs 1 gen 332438 flags 2
		tree block key (18446744073709551606 80 390443347968) level 0
		tree block backref root 7
	item 4 key (653284802560 EXTENT_ITEM 4096) itemoff 3722 itemsize 51
		extent refs 1 gen 332438 flags 2
		tree block key (400765636608 a8 155648) level 1
		tree block backref root 2
	item 5 key (653284806656 EXTENT_ITEM 4096) itemoff 3671 itemsize 51
		extent refs 1 gen 332438 flags 2
		tree block key (402394804224 a8 12288) level 0
		tree block backref root 2
	item 6 key (653284810752 EXTENT_ITEM 4096) itemoff 3602 itemsize 69
		extent refs 3 gen 329238 flags 2
		tree block key (50522 60 524) level 0
		tree block backref root 256
		shared block backref parent 742140727296
		shared block backref parent 738011779072
	item 7 key (653284831232 EXTENT_ITEM 4096) itemoff 3551 itemsize 51
		extent refs 1 gen 332218 flags 2
		tree block key (654686961664 a8 4096) level 0
		tree block backref root 2
	item 8 key (653284818944 EXTENT_ITEM 4096) itemoff 3500 itemsize 51
		extent refs 1 gen 332218 flags 2
		tree block key (654687121408 a8 4096) level 0
		tree block backref root 2
	item 9 key (653284823040 EXTENT_ITEM 4096) itemoff 3449 itemsize 51
		extent refs 1 gen 332218 flags 2
		tree block key (654688612352 a8 4096) level 0
		tree block backref root 2
item 

Re: btrfs support for efficient SSD operation (data blocks alignment)

2012-02-08 Thread Liu Bo
On 02/09/2012 03:24 AM, Martin wrote:
> My understanding is that for x86 architecture systems, btrfs only allows
> a sector size of 4kB for a HDD/SSD. That is fine for the present HDDs
> assuming the partitions are aligned to a 4kB boundary for that device.
> 
> However for SSDs...
> 
> I'm using for example a 60GByte SSD that has:
> 
> 8kB page size;
> 16kB logical to physical mapping chunk size;
> 2MB erase block size;
> 64MB cache.
> 
> And the sector size reported to Linux 3.0 is the default 512 bytes!
> 
> 
> My first thought is to try formatting with a sector size of 16kB to
> align with the SSD logical mapping chunk size. This is to avoid SSD
> write amplification. Also, the data transfer performance for that device
> is near maximum for writes with a blocksize of 16kB and above. Yet,
> btrfs supports a 4kByte page/sector size only at present...
> 
> 
> Is there any control possible over the btrfs filesystem structure to map
> metadata and data structures to the underlying device boundaries?
> 
> For example to maximise performance, can the data chunks and the data
> chunk size be aligned to be sympathetic to the SSD logical mapping chunk
> size and the erase block size?
> 

At the least, the metadata buffer size will support sizes larger than 4K;
that work is in development.

> What features other than the trim function does btrfs employ to optimise
> for SSD operation?
> 

e.g. COW (avoids writing to one place multiple times) and
delayed allocation (intended to reduce the write frequency).

thanks,
liubo
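
The alignment question raised in the quoted message comes down to modular arithmetic on byte offsets. A minimal sketch, assuming the 16kB mapping chunk and 2MB erase block sizes Martin quotes for his SSD; the helper names `is_aligned` and `align_up` are hypothetical and are not part of btrfs or any partitioning tool:

```c
#include <assert.h>
#include <stdint.h>

/* Sizes taken from the SSD described in the quoted message. */
#define MAPPING_CHUNK_BYTES (16u * 1024u)         /* 16kB logical-to-physical chunk */
#define ERASE_BLOCK_BYTES   (2u * 1024u * 1024u)  /* 2MB erase block */

/* Return 1 if offset is a whole multiple of boundary, else 0. */
int is_aligned(uint64_t offset, uint64_t boundary)
{
        assert(boundary != 0);
        return offset % boundary == 0;
}

/* Round offset up to the next multiple of boundary. */
uint64_t align_up(uint64_t offset, uint64_t boundary)
{
        assert(boundary != 0);
        return (offset + boundary - 1) / boundary * boundary;
}
```

For illustration only: a partition starting at 512-byte sector 2048 (byte offset 1MiB) passes `is_aligned(..., MAPPING_CHUNK_BYTES)` but not `is_aligned(..., ERASE_BLOCK_BYTES)`, and `align_up()` gives the next offset that satisfies the larger boundary. This only checks where a partition starts; it says nothing about how btrfs places chunks inside it.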

> 
> Regards,
> Martin
> 
> 


Re: Premature ENOSPC only with zlib Compression

2012-02-08 Thread Liu Bo
On 02/09/2012 05:01 AM, Mitch Harder wrote:
> On Wed, Jan 18, 2012 at 10:13 AM, Mitch Harder
>  wrote:
>> I have a Btrfs partition that is reliably reproducing premature ENOSPC
>> when restoring the disk from a tar file, but it is only happening with
>> zlib compression (lzo or no compression proceeds normally).
>>
>> I've had the same issue at least back through the 3.1 kernel series,
>> and I've been having intermittent issues even further back.
>>
>> I am currently using a 3.2.1 kernel merged with Chris' latest
>> integration branch.
>>
>> I've performed about 12 trials trying to explore various combinations
>> of compress, compress-force, compress[-force]=[zlib,lzo] and
>> autodefrag.
>>
>> If I use no compression, or if I explicitly declare lzo compression, I
>> don't receive the premature ENOSPC when untarring my restoration
>> archive to the empty partition.
>>
>> If I don't specify compression (zlib is the default) or specify zlib,
>> I get consistent premature ENOSPC errors regardless of other
>> combinations.
>>
>> I apologize if this is already general knowledge, but I couldn't see
>> where this has been posted to the list before.
>>
>> As time allows, I will try to capture exactly where this ENOSPC is
>> being issued in btrfs by inserting WARN_ON's in my local version
>> where-ever ENOSPC is set.
> 
> Some follow-up...
> 
> I've injected some debugging code to isolate when the ENOSPC is being
> generated when using zlib compression.
> 
> When using zlib, I'm getting intermittent ENOSPC in the
> may_commit_transaction() function in extent-tree.c at this point:
> 
>         if (delayed_rsv->size < bytes) {
>                 spin_unlock(&delayed_rsv->lock);
>                 return -ENOSPC;
>         }
> 
> The typical values for (delayed_rsv->size < bytes) have been:
> delayed_rsv->size ( = 0x6) < bytes ( = 0x78000)
> 
> This typically occurs when unzipping a section of my backup that
> contains lots of small files that are probably being mostly in-lined.
> 
> I don't see errors in this section when using lzo or no compression.

Hi Mitch,

Would you like to try this patch? 

thanks,
liubo

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8603ee4..d83b15e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3483,28 +3483,34 @@ static int may_commit_transaction(struct btrfs_root *root,
         if (force)
                 goto commit;
 
-        /* See if there is enough pinned space to make this reservation */
-        spin_lock(&space_info->lock);
-        if (space_info->bytes_pinned >= bytes) {
+        if (space_info != delayed_rsv->space_info) {
+                /*
+                 * For DATA:
+                 * See if there is enough pinned space to make this reservation
+                 */
+                spin_lock(&space_info->lock);
+                if (space_info->bytes_pinned < bytes) {
+                        spin_unlock(&space_info->lock);
+                        return -ENOSPC;
+                }
                 spin_unlock(&space_info->lock);
-                goto commit;
-        }
-        spin_unlock(&space_info->lock);
-
-        /*
-         * See if there is some space in the delayed insertion reservation for
-         * this reservation.
-         */
-        if (space_info != delayed_rsv->space_info)
-                return -ENOSPC;
-
-        spin_lock(&delayed_rsv->lock);
-        if (delayed_rsv->size < bytes) {
+        } else {
+                /*
+                 * For METADATA:
+                 * See if there is enough space(pinned and delayed insertion)
+                 * to make this reservation
+                 */
+                spin_lock(&space_info->lock);
+                spin_lock(&delayed_rsv->lock);
+                if (space_info->bytes_pinned + delayed_rsv->size < bytes) {
+                        spin_unlock(&delayed_rsv->lock);
+                        spin_unlock(&space_info->lock);
+                        return -ENOSPC;
+                }
                 spin_unlock(&delayed_rsv->lock);
-                return -ENOSPC;
-        }
-        spin_unlock(&delayed_rsv->lock);
+                spin_unlock(&space_info->lock);
 
+        }
 commit:
         trans = btrfs_join_transaction(root);
         if (IS_ERR(trans))
-- 
1.6.5.2
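
The difference between the old and the patched check can be modelled in userspace. This is a rough sketch only: the function names and the flattened scalar arguments are hypothetical stand-ins for the `space_info` and `delayed_rsv` fields, locking is omitted, and the pinned value used in the note below is invented for illustration (only `delayed_rsv->size = 0x6` and `bytes = 0x78000` come from Mitch's report).

```c
#include <errno.h>
#include <stdint.h>

/*
 * Model of the check before the patch.
 * pinned      ~ space_info->bytes_pinned
 * delayed     ~ delayed_rsv->size
 * is_metadata ~ (space_info == delayed_rsv->space_info)
 * Returns 0 if the commit would go ahead, -ENOSPC otherwise.
 */
int may_commit_old(uint64_t pinned, uint64_t delayed, int is_metadata,
                   uint64_t bytes)
{
        if (pinned >= bytes)
                return 0;
        if (!is_metadata)
                return -ENOSPC;
        /* Pinned space is ignored here: delayed alone must cover bytes. */
        if (delayed < bytes)
                return -ENOSPC;
        return 0;
}

/* Model of the check after the patch: metadata may use both sources. */
int may_commit_new(uint64_t pinned, uint64_t delayed, int is_metadata,
                   uint64_t bytes)
{
        uint64_t avail = pinned;

        if (is_metadata)
                avail += delayed;
        return avail < bytes ? -ENOSPC : 0;
}
```

With a hypothetical pinned value of 0x77ffa, a metadata request for 0x78000 bytes fails in the old model (neither source is large enough alone) and succeeds in the new one (0x77ffa + 0x6 = 0x78000). Whether this is what actually happens in Mitch's zlib case is not established in the thread.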




Re: WARNING: at fs/btrfs/inode.c:2222

2012-02-08 Thread Sage Weil
I've made a little progress here.  The rmdir succeeds and successfully 
adds the orphan item to the tree.  Shortly after that, we (async) snapshot 
current/ to snap_127294/.  And then the orphan cleanup gets

[87552.240450] ------------[ cut here ]------------
[87552.240477] WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/btrfs/inode.c:2237 btrfs_orphan_cleanup+0x44b/0x4d0 [btrfs]()
[87552.240480] Hardware name: PDSMi
[87552.240482] Modules linked in: radeon ttm drm_kms_helper drm ppdev parport_pc lp serio_raw parport i3000_edac edac_core i2c_algo_bit shpchp ahci libahci floppy e1000e btrfs zlib_deflate crc32c libcrc32c
[87552.240504] Pid: 20605, comm: ceph-osd Not tainted 3.2.0-ceph-00142-g9e98323 #1
[87552.240507] Call Trace:
[87552.240514]  [] warn_slowpath_common+0x7f/0xc0
[87552.240518]  [] warn_slowpath_null+0x1a/0x20
[87552.240531]  [] btrfs_orphan_cleanup+0x44b/0x4d0 [btrfs]
[87552.240536]  [] ? do_raw_spin_unlock+0x5e/0xb0
[87552.240550]  [] btrfs_mksubvol+0x302/0x3d0 [btrfs]
[87552.240564]  [] btrfs_ioctl_snap_create_transid+0xe8/0x180 [btrfs]
[87552.240578]  [] btrfs_ioctl_snap_create_v2+0x89/0x100 [btrfs]
[87552.240582]  [] ? __lock_acquire+0x210/0x15d0
[87552.240596]  [] btrfs_ioctl+0x41c/0x1290 [btrfs]
[87552.240601]  [] ? retint_restore_args+0x13/0x13
[87552.240606]  [] ? fget_light+0x40/0x130
[87552.240610]  [] do_vfs_ioctl+0xa4/0x580
[87552.240614]  [] ? fget_light+0xd2/0x130
[87552.240617]  [] ? fget_light+0x40/0x130
[87552.240620]  [] sys_ioctl+0xa1/0xb0
[87552.240624]  [] system_call_fastpath+0x16/0x1b
[87552.240627] ---[ end trace 3842aca7e75013c8 ]---
[87552.240643] btrfs: Error removing orphan entry, stopping orphan cleanup

causing the ioctl to return -22.  ls -al shows

drwxr-xr-x 1 ubuntu ubuntu   42 2012-02-08 16:12 current
-rw-r--r-- 1 ubuntu ubuntu   37 2012-02-08 15:00 fsid
-rw-r--r-- 1 ubuntu ubuntu   21 2012-02-08 15:00 magic
drwxr-xr-x 1 ubuntu ubuntu  234 2012-02-08 16:12 snap_127219
drwxr-xr-x 1 ubuntu ubuntu  158 2012-02-08 16:12 snap_127249
d? ? ?  ? ?? snap_127294

(presumably just because we errored out of create_snapshot() before 
d_instantiate?)

The inode it processes has ino 28844 uid/gid 1000.1000 mode 040755 size 
60... and nlink=1.

btrfs-debug-tree output includes

item 1 key (28844 INODE_ITEM 0) itemoff 59 itemsize 160
inode generation 13654 size 60 block group 0 mode 40755 links 1

in three subvols: snap_127219 snap_127249 snap_127294.  Not current.

The first two snaps were taken prior to the rmdir, so that's fine.  But 
the last one just after... the sequence of operations was actually

thread 0    start async snap ioctl, current/ to snap_127294/ ...
thread 1    rmdir current/0.17_254 = 0   (this is the bad dir inode)
thread 1    unlink current/foo   (unrelated)
thread 1    unlink current/bar   (unrelated)
thread 1    rmdir current/0.17_head = 0  (unrelated)
thread 0    ...ioctl returns = -EINVAL

It looks like there is some strange leakage between the snapshot and the 
rmdir nlink->0, or some strange interaction between the orphan cleanup and 
the new snapshot.  I'm not quite sure why the orphan cleanup is actually 
done there at all, though, so...

Any suggestions where to look next?

sage


Re: [PATCH] mkfs: Handle creation of filesystem larger than the first device

2012-02-08 Thread Phillip Susi
On 02/08/2012 06:20 PM, Jan Kara wrote:
>   Thanks for your reply. I admit I was not sure what exactly size argument
> should be. So after looking into the code for a while I figured it should
> be a total size of the filesystem - or differently it should be size of
> virtual block address space in the filesystem. Thus when filesystem has
> more devices (or admin wants to add more devices later), it can be larger
> than the first device. But I'm not really a btrfs developer so I might be
> wrong and of course feel free to fix the issue as you deem fit.

The size of the fs is the total size of the individual disks.  When you limit 
the size, you limit the size of a disk, not the whole fs.  IIRC, mkfs 
initializes the fs on the first disk, which is why it was using that size as 
the size of the whole fs, and then adds the other disks after ( which then add 
their size to the total fs size ).  It might be nice if mkfs could take sizes 
for each disk, but it only seems to take one size for the initial disk.




[PATCH] Btrfs: return EUCLEAN rather than ENXIO once internal error has occurred for SEEK_DATA/SEEK_HOLE inquiry

2012-02-08 Thread Jeff Liu
According to http://linux.die.net/man/2/lseek, ENXIO should be returned only
when the offset is beyond EOF for a SEEK_DATA or SEEK_HOLE inquiry.
But we also return it when btrfs_get_extent_fiemap() fails because of an
internal issue.  That will confuse user applications, which expect ENXIO to
mean only "offset beyond EOF" when they search for a data or hole location.

Thanks Dave for pointing that out in XFS thread.

This patch fixes it to return EUCLEAN, or maybe another errno is more
reasonable in Btrfs to indicate this fatal error?

Thanks,
-Jeff


Cc: da...@fromorbit.com
Signed-off-by: Jie Liu 

---
 fs/btrfs/file.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 97fbe93..6693040 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1761,7 +1761,7 @@ static int find_desired_extent(struct inode *inode, loff_t *offset, int origin)
                                                      start - root->sectorsize,
                                                      root->sectorsize, 0);
                 if (IS_ERR(em)) {
-                        ret = -ENXIO;
+                        ret = -EUCLEAN;
                         goto out;
                 }
                 last_end = em->start + em->len;
@@ -1773,7 +1773,7 @@ static int find_desired_extent(struct inode *inode, loff_t *offset, int origin)
         while (1) {
                 em = btrfs_get_extent_fiemap(inode, NULL, 0, start, len, 0);
                 if (IS_ERR(em)) {
-                        ret = -ENXIO;
+                        ret = -EUCLEAN;
                         break;
                 }
 
-- 
1.7.9
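
For readers unfamiliar with the userspace side, the ENXIO case the commit message refers to can be observed directly. A small sketch, assuming a Linux system whose filesystem for temporary files implements SEEK_DATA; the helper name `seek_data_past_eof` is hypothetical, and the fallback define uses the Linux value of SEEK_DATA in case the headers do not expose it:

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3  /* Linux value; normally from <unistd.h> with _GNU_SOURCE */
#endif

/*
 * Write a few bytes to an anonymous temporary file, then ask for the next
 * data region starting well past EOF.  Returns the errno that lseek()
 * reported, 0 if the seek unexpectedly succeeded, or -1 if setup failed.
 */
int seek_data_past_eof(void)
{
        FILE *f = tmpfile();
        int err = 0;

        if (!f)
                return -1;
        if (fwrite("data", 1, 4, f) != 4 || fflush(f) != 0) {
                fclose(f);
                return -1;
        }
        errno = 0;
        if (lseek(fileno(f), 1 << 20, SEEK_DATA) == (off_t)-1)
                err = errno;
        fclose(f);
        return err;
}
```

An application that sees ENXIO here reasonably concludes there is no more data in the file, which is why returning the same value for an unrelated internal failure is misleading.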


Re: [PATCH] Btrfs: return EUCLEAN rather than ENXIO once internal error has occurred for SEEK_DATA/SEEK_HOLE inquiry

2012-02-08 Thread Jeff Liu
On 02/09/2012 11:46 AM, Jeff Liu wrote:

> By referring to http://linux.die.net/man/2/lseek, return ENXIO only
> when "offset beyond EOF" for either SEEK_DATA or SEEK_HOLE inquiry.
> But we return it in case of internal issue too if btrfs_get_extent_fiemap() 
> failed
> due to other issues.  This will confuse the user applications to be expecting 
> ENXIO when
> trying to find a specific data or hole location once it has occurred.
> 
> Thanks Dave for pointing that out in XFS thread.
> 
> This patch fix it to return EUCLEAN, or maybe another particular errno is 
> more reasonable in Btrfs to indicate this fatal error?

Or maybe just return the error that occurred in the internal routine, to
give the user more accurate error info.  Which is better?

Thanks,
-Jeff

> 
> Thanks,
> -Jeff
> 
> 
> Cc: da...@fromorbit.com
> Signed-off-by: Jie Liu 
> 
> ---
>  fs/btrfs/file.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 97fbe93..6693040 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1761,7 +1761,7 @@ static int find_desired_extent(struct inode *inode, 
> loff_t *offset, int origin)
>start - root->sectorsize,
>root->sectorsize, 0);
>   if (IS_ERR(em)) {
> - ret = -ENXIO;
> + ret = -EUCLEAN;
>   goto out;
>   }
>   last_end = em->start + em->len;
> @@ -1773,7 +1773,7 @@ static int find_desired_extent(struct inode *inode, 
> loff_t *offset, int origin)
>   while (1) {
>   em = btrfs_get_extent_fiemap(inode, NULL, 0, start, len, 0);
>   if (IS_ERR(em)) {
> - ret = -ENXIO;
> + ret = -EUCLEAN;
>   break;
>   }
>  




Re: [PATCH] Btrfs: return EUCLEAN rather than ENXIO once internal error has occurred for SEEK_DATA/SEEK_HOLE inquiry

2012-02-08 Thread Dave Chinner
On Thu, Feb 09, 2012 at 12:08:47PM +0800, Jeff Liu wrote:
> On 02/09/2012 11:46 AM, Jeff Liu wrote:
> 
> > By referring to http://linux.die.net/man/2/lseek, return ENXIO only
> > when "offset beyond EOF" for either SEEK_DATA or SEEK_HOLE inquiry.
> > But we return it in case of internal issue too if btrfs_get_extent_fiemap() 
> > failed
> > due to other issues.  This will confuse the user applications to be 
> > expecting ENXIO when
> > trying to find a specific data or hole location once it has occurred.
> > 
> > Thanks Dave for pointing that out in XFS thread.
> > 
> > This patch fix it to return EUCLEAN, or maybe another particular errno is 
> > more reasonable in Btrfs to indicate this fatal error?
> 
> Or maybe just return the error that was happened at internal routine, to
> give user more accurate error info, which is better?

Return the internal error unchanged - a failure to read the extent
list (EIO) is different to a corruption detected in the extent
map read from disk (EUCLEAN). Having a user report the appropriate
error makes our life much simpler when it comes to trying to
understand their problem.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


[PATCH 3/3 v2] xfstests: add btrfs online defragments QA test

2012-02-08 Thread Liu Bo
As the title says, this ports the btrfs online defragmentation QA test into xfstests.

v1->v2:
- place the real tests inside testcases.

Signed-off-by: Liu Bo 
---
 278  |  247 ++
 278.args |   18 +
 278.out  |   75 +++
 group|1 +
 4 files changed, 341 insertions(+), 0 deletions(-)
 create mode 100755 278
 create mode 100644 278.args
 create mode 100644 278.out

diff --git a/278 b/278
new file mode 100755
index 000..71f12e0
--- /dev/null
+++ b/278
@@ -0,0 +1,247 @@
+#! /bin/bash
+# FS QA Test No. 278
+#
+# Btrfs Online defragmentation tests
+#
+#---
+# Copyright (c) 2012 Fujitsu Liu Bo.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+# creator
+owner=liubo2...@cn.fujitsu.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+test_path="`pwd`"
+progs_dir="$test_path/src/btrfs_online_defragment/"
+tmp=/tmp/$$
+defrag_args="$test_path/${seq}.args"
+
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+    cd /
+    rm -f $tmp.*
+}
+
+_create_file()
+{
+        CNT=11999
+        FILESIZE=48000
+        if [ "$DEFRAG_TARGET" = "1" ];then
+                for i in `seq $CNT -1 0`; do
+                        dd if=/dev/zero of=$SCRATCH_MNT/tmp_file bs=4k count=1 \
+                        conv=notrunc seek=$i oflag=sync &>/dev/null
+                done
+                # get md5sum
+                md5sum $SCRATCH_MNT/tmp_file > /tmp/checksum
+        elif [ "$DEFRAG_TARGET" = "2" ];then
+                mkdir $SCRATCH_MNT/tmp_dir
+                for i in `seq $CNT -1 0`; do
+                        dd if=/dev/zero of=$SCRATCH_MNT/tmp_dir/tmp_file bs=4k \
+                        count=1 conv=notrunc seek=$i oflag=sync &>/dev/null
+                done
+                # get md5sum
+                md5sum $SCRATCH_MNT/tmp_dir/tmp_file > /tmp/checksum
+        elif [ "$DEFRAG_TARGET" = "3" ];then
+                for i in `seq $CNT -1 0`; do
+                        dd if=/dev/zero of=$SCRATCH_MNT/tmp_file bs=4k count=1 \
+                        conv=notrunc seek=$i oflag=sync &>/dev/null
+                done
+                # get md5sum
+                md5sum $SCRATCH_MNT/tmp_file > /tmp/checksum
+        fi
+}
+
+_btrfs_online_defrag()
+{
+        str=""
+        if [ "$FILE_RANGE" = "2" ];then
+                str="$str -s -1 -l $((FILESIZE / 2)) "
+        elif [ "$FILE_RANGE" = "3" ];then
+                str="$str -s $((FILESIZE + 1)) -l $((FILESIZE / 2)) "
+                HAVE_DEFRAG=1
+        elif [ "$FILE_RANGE" = "4" ];then
+                str="$str -l -1 "
+        elif [ "$FILE_RANGE" = "5" ];then
+                str="$str -l $((FILESIZE + 1)) "
+        elif [ "$FILE_RANGE" = "6" ];then
+                str="$str -l $((FILESIZE / 2)) "
+        fi
+
+        if [ "$DEFRAG_COMPRESS" = "2" ];then
+                str="$str -c "
+        fi
+
+        if [ "$FLUSH" = "2" ];then
+                str="$str -f "
+        fi
+
+        if [ "$THRESH" = "2" ];then
+                str="$str -t -1 "
+        elif [ "$THRESH" = "3" ];then
+                str="$str -t $PAGESIZE "
+        fi
+
+        if [ "$str" != "" ]; then
+                btrfs filesystem defragment $str $SCRATCH_MNT/tmp_file
+        else
+                if [ "$DEFRAG_TARGET" = "1" ];then
+                        btrfs filesystem defragment $SCRATCH_MNT/tmp_file
+                elif [ "$DEFRAG_TARGET" = "2" ];then
+                        btrfs filesystem defragment $SCRATCH_MNT/tmp_dir
+                elif [ "$DEFRAG_TARGET" = "3" ];then
+                        btrfs filesystem defragment $SCRATCH_MNT
+                fi
+        fi
+        ret_val=$?
+        sync
+        if [ $ret_val -ne 20 ];then
+                echo "btrfs filesystem defragment failed! err is $ret_val"
+        fi
+}
+
+_checksum()
+{
+        md5sum -c /tmp/checksum > /dev/null 2>&1
+        if [ $? -ne 0 ]; then
+                echo "md5 checksum failed!"
+        fi
+}
+
+_fsck()
+{
+        btrfsck $SCRATCH_DEV > /dev/null 2>&1
+        ret_val=$?
+        if [ $ret_val -ne 0 ]; then
+                echo "btrfsck _FAIL_! err is $ret_val"
+        fi
+}
+
+_parse_options()
+{
+   PASS=0
+   if [ "`echo $args | grep "#"`" 

Re: [PATCH] Btrfs: return EUCLEAN rather than ENXIO once internal error has occurred for SEEK_DATA/SEEK_HOLE inquiry

2012-02-08 Thread Jeff Liu
On 02/09/2012 12:51 PM, Dave Chinner wrote:

> On Thu, Feb 09, 2012 at 12:08:47PM +0800, Jeff Liu wrote:
>> On 02/09/2012 11:46 AM, Jeff Liu wrote:
>>
>>> By referring to http://linux.die.net/man/2/lseek, return ENXIO only
>>> when "offset beyond EOF" for either SEEK_DATA or SEEK_HOLE inquiry.
>>> But we return it in case of internal issue too if btrfs_get_extent_fiemap() 
>>> failed
>>> due to other issues.  This will confuse the user applications to be 
>>> expecting ENXIO when
>>> trying to find a specific data or hole location once it has occurred.
>>>
>>> Thanks Dave for pointing that out in XFS thread.
>>>
>>> This patch fix it to return EUCLEAN, or maybe another particular errno is 
>>> more reasonable in Btrfs to indicate this fatal error?
>>
>> Or maybe just return the error that was happened at internal routine, to
>> give user more accurate error info, which is better?
> 
> Return the internal error unchanged - a failure to read the extent
> list (EIO) is different to a corruption detected in the extent
> map read from disk (EUCLEAN). Having a user report the appropriate
> error makes our life much simpler when it comes to trying to
> understand their problem

Definitely. I will repost this patch later.

Thanks,
-Jeff

> 
> Cheers,
> 
> Dave.




[PATCH v2] Btrfs: return the internal error unchanged if btrfs_get_extent_fiemap() call failed for SEEK_DATA/SEEK_HOLE inquiry

2012-02-08 Thread Jeff Liu
Given that ENXIO only means "offset beyond EOF" for a SEEK_DATA or SEEK_HOLE
inquiry in a desired file range, we should return the internal error unchanged
if the btrfs_get_extent_fiemap() call fails, rather than ENXIO.

Cc: Dave Chinner 
Signed-off-by: Jie Liu 

---
 fs/btrfs/file.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 97fbe93..6d9e796 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1761,7 +1761,7 @@ static int find_desired_extent(struct inode *inode, loff_t *offset, int origin)
                                                      start - root->sectorsize,
                                                      root->sectorsize, 0);
                 if (IS_ERR(em)) {
-                        ret = -ENXIO;
+                        ret = PTR_ERR(em);
                         goto out;
                 }
                 last_end = em->start + em->len;
@@ -1773,7 +1773,7 @@ static int find_desired_extent(struct inode *inode, loff_t *offset, int origin)
         while (1) {
                 em = btrfs_get_extent_fiemap(inode, NULL, 0, start, len, 0);
                 if (IS_ERR(em)) {
-                        ret = -ENXIO;
+                        ret = PTR_ERR(em);
                         break;
                 }
 
-- 
1.7.9
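
The effect of switching to PTR_ERR() can be shown with a simplified userspace model of the kernel's error-pointer idiom. This is a sketch under stated assumptions: the three macros are re-implemented here in the spirit of include/linux/err.h rather than copied from it, and `get_extent_model`, `find_extent_v1` and `find_extent_v2` are hypothetical stand-ins, not btrfs functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in a pointer value (userspace re-implementation). */
static inline void *ERR_PTR(intptr_t error) { return (void *)error; }
/* Recover the negative errno from an error pointer. */
static inline intptr_t PTR_ERR(const void *ptr) { return (intptr_t)ptr; }
/* Error pointers occupy the top MAX_ERRNO values of the address space. */
static inline int IS_ERR(const void *ptr)
{
        return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct extent_model { uint64_t start, len; };
static struct extent_model demo_extent = { 0, 4096 };

/* Hypothetical lookup: fails with -fail_errno when fail_errno is nonzero. */
struct extent_model *get_extent_model(int fail_errno)
{
        assert(fail_errno >= 0 && fail_errno <= MAX_ERRNO);
        if (fail_errno)
                return ERR_PTR(-fail_errno);
        return &demo_extent;
}

/* Old behaviour: every lookup failure is reported as -ENXIO. */
int find_extent_v1(int fail_errno)
{
        struct extent_model *em = get_extent_model(fail_errno);

        return IS_ERR(em) ? -ENXIO : 0;
}

/* v2 behaviour: the lookup's own error code is passed through unchanged. */
int find_extent_v2(int fail_errno)
{
        struct extent_model *em = get_extent_model(fail_errno);

        return IS_ERR(em) ? (int)PTR_ERR(em) : 0;
}
```

In this model a simulated EIO comes back as -ENXIO from `find_extent_v1()` but as -EIO from `find_extent_v2()`, which is the distinction Dave asked to preserve.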