How many subvols/snapshots are possible? (limits?)

2013-05-09 Thread Martin
Dear Devs,

This is more a use case question of what is a good idea to do...


Can btrfs support snapshots of the filesystem at very regular intervals,
say minute by minute or even second by second?

Or are there limits that will be hit with metadata overheads or
links/reference limits or CPU overheads if 'too many' snapshots/subvols
are made?

If snapshots were to be taken once a minute and retained, what breaks first?


What are 'reasonable' (maximum) numbers for frequency and number of held
versions?


Thanks,
Martin



Re: Virtual Device Support

2013-05-19 Thread Martin
On 10/05/13 15:03, George Mitchell wrote:
 One of the things that is frustrating me the most at this point from a user
 perspective ...  The current method of simply using a
 random member device or a LABEL or a UUID is just not working well for
 me.  Having a well thought out virtual device infrastructure would...

Sorry, I'm a bit lost by your comments...

What is your use case and what are you hoping/expecting to see?


I've been following the development of btrfs for a while and I'm looking
forward to using it to efficiently replace some of the very useful
features of LVM2, drbd, and md-raid that I'm using at present...

OK, so the way of managing all that is going to be a little different.

How would you want that?


Regards,
Martin



Re: Virtual Device Support

2013-05-19 Thread Martin
OK, so to summarise:


On 19/05/13 15:49, George Mitchell wrote:
 In reply to both of these comments in one message, let me give you an
 example.
 
 I use shell scripts to mount and unmount btrfs volumes for backup
 purposes.  Most of these volumes are not listed in fstab simply because
 I do not want to have to clutter my fstab with volumes that are used
 only for backup.  So the only way I can mount them is either by LABEL or
 by UUID.  But I can't unmount them by either LABEL or UUID because that
 is not supported by util-linux and they have no intention of supporting
 it in the future.  So I have to resort to unmounting by directory ...

Which all comes down to a way of working...

Likewise, I have some old and long-used backup scripts that mount a
one-of-many backup disk pack. My solution is to use filesystem labels
and to use 'sed' to update just the one line in /etc/fstab for the
backups mount point label, so that the backups are then mounted/unmounted
by mount point.

I've never been able to use the /dev/sdXX numbering because the multiple
physical drives can be detected in a different order.

Agreed that, for the sake of consistency, being able to unmount by
filesystem label is a good idea. But is there any interest in that being
picked up? Should I put a bug/feature request onto bugzilla?


I would guess that most developers focus on mount point and let
fstab/mtab sort out the detail...

Regards,
Martin



btrfs (general) raid for other filesystems?

2013-05-19 Thread Martin
Just a random Sunday afternoon thought:

We've got some rather nice variations on the block-level RAID schemes,
but implemented instead at the filesystem level in btrfs...

Could the btrfs RAID be coded to be general so that a filesystem stack
could be set up whereby the filesystem level raids could be used for ANY
filesystem?


So for example, we could have the stack:


filesystem level RAID

 |
 V

filesystem

 |
 V

Block level


So, an interesting variation could be to have filesystem level raid
operating on ext4 or nilfs or whatever... Would that be a sensible idea?



Regards,
Martin



btrfs pseudo-drbd

2013-05-19 Thread Martin
Dear Devs,

Would there be any problem with using nbd (/dev/nbdX) devices to gain
btrfs-raid across multiple physical hosts across a network? (For a sort
of btrfs-drbd! :-) )


Regards,
Martin


http://en.wikipedia.org/wiki/Network_block_device

http://www.drbd.org/



Re: btrfs (general) raid for other filesystems?

2013-05-19 Thread Martin
On 19/05/13 18:39, Clemens Eisserer wrote:
 Hi Martin,
 
 So, an interesting variation could be to have filesystem level raid
 operating on ext4 or nilfs or whatever... Would that be a sensible idea?
 
 That's already supported by using LVM. What do you think you would gain
 from layering on top of btrfs?

md-raid and lvm-raid are raid at the block level.

btrfs-raid offers a greater variety and far greater flexibility of raid
options individually for filedata and metadata at the filesystem level.

RAID at the filesystem level should also gain higher performance than
just blindly replicating blocks of binary data across devices at the
block level.


My thoughts are to take advantage of the btrfs-raid work being done but
for all filesystems. Hence, we can then have a very flexible raid
available for whatever filesystem might be best for any underlying device.

OK... So we make lvm, md-raid, and drbd all redundant!

Regards,
Martin



Re: btrfs (general) raid for other filesystems?

2013-05-20 Thread Martin
On 19/05/13 20:34, Chris Murphy wrote:
 On May 19, 2013, at 12:59 PM, Martin m_bt...@ml1.co.uk wrote:
 
 btrfs-raid offers a greater variety and far greater flexibility of
 raid options individually for filedata and metadata at the
 filesystem level.
 
 Well it really doesn't. The btrfs raid advantages leverage prior work
 that makes btrfs what it is.

Indeed, the btrfs raid, as it evolves, looks to be tightly integrated
into btrfs itself, shaped by what is being done in btrfs...

There is also the work going into how the 'raid' semantics operate for
data and for the filesystem metadata.

Also tied into that is storage balancing and load (io bandwidth)
balancing, with developers most recently looking at how to move 'hot'
data onto preferred physical drives?


 OK... So we make lvm, md-raid, and drbd all redundant!
 
 No they are different things for different use cases. What you seem
 to be asking for is for a ZFS-like feature that allows other file
 systems to exist on ZFS, and thereby gaining some of the advantage of
 the underlying file system.

That's going a little too deep...


My thoughts are much more shallow. Can the raid and load balancing work
being done for btrfs be bundled up so as to permit that to also be used
instead as a filesystem layer that then utilises /any/ underlying
filesystem?

So, instead of btrfs-style file-level raid and load balancing only on
devices which have been formatted with btrfs, the raid and load
balancing operates as a filesystem layer that coordinates storing files
on any motley collection of multiple whatever filesystem-on-device.

Obviously enough, raid1 could simply 'tee' a file out to multiple
filesystems.

For the other raids, filenames would need to be munged to denote their
multiple parts (simply always append a 6-character index?). raid0 would
need a file to be split into parts, with those parts concatenated back
under the original filename upon reading. raid5/6 would similarly need
file splitting but with the data redundancy added as well.
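
Purely to illustrate the idea in userspace terms (not a proposal for the
actual implementation; mount points and file names are invented):

# 'raid1': the same file simply lands on every member filesystem.
tee /mnt/btrfs-hdd1/bigfile /mnt/ext4-hdd2/bigfile < bigfile > /dev/null

# 'raid0': split the file across members with a 6-character index
# appended, and concatenate the parts back together on read.
split -d -a 6 -n 2 bigfile bigfile.part
mv bigfile.part000000 /mnt/btrfs-hdd1/
mv bigfile.part000001 /mnt/ext4-hdd2/
cat /mnt/btrfs-hdd1/bigfile.part000000 \
    /mnt/ext4-hdd2/bigfile.part000001 > bigfile.readback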

For example, for paranoid redundancy and fast operation:


raid1 + load balance

 |
 V

btrfs on HDD1, ext4 on HDD2, NILFS on flash1, nfs to host2


Obviously, doing that loses any features (such as snapshots) that are
not common across the whole group.


As for a use case? Would that be a good idea or not? :-)

One thought is that users could set up funky redundant operation across
networked devices using nfs.

Another thought is that we go to an awful lot of trouble to accommodate
extremely different storage technologies that are only ever going to
physically diverge further. For example, we have HDDs and SSDs. We also
have much cheaper flash with very limited wear levelling ideal for
'cold' data. Or even raw flash without all the proprietary firmware
obscurity... Hence dedicate a particular filesystem to each rather than
one monster for all?


The raid + load balance could be a well defined layer with no or few
special hooks into the lower layers.


All just a thought...

Regards,
Martin



Re: Virtual Device Support

2013-05-21 Thread Martin
On 21/05/13 04:37, Chris Murphy wrote:
 
 On May 20, 2013, at 7:08 PM, Duncan 1i5t5.dun...@cox.net wrote:
 
 Chris Murphy posted on Sun, 19 May 2013 12:18:19 -0600 as
 excerpted:
 
 It seems inconsistent that mount and unmount allows a /dev/
 designation, but only mount honors label and UUID.
 
 Yes.
 
 I'm going to contradict myself and point out that mount with label or
 UUID is made unambiguous via either the default subvolume being
 mounted, or the -o subvol= option being specified. The volume label
 and UUID doesn't apply to umount because it's an ambiguous command.
 You'd have to umount a mountpoint, or possibly a subvolume specific
 UUID.


I'll admit that I prefer working with filesystem labels.


This is getting rather semantic... From man umount, this is what
umount intends:

#
umount [-dflnrv] {dir|device}...

The  umount  command  detaches the file system(s) mentioned from the
file hierarchy.  A file system is specified by giving the directory
where it has been mounted.  Giving the special device on which the file
system lives may also work, but is obsolete, mainly because it will fail
in case this device was mounted on more than one directory.
#


I guess the ideas of labels and UUID and multiple devices came out a few
years later?... For btrfs, umount needs to operate on the default subvol
but with the means for also specifying a specific subvol if needed.

One hook for btrfs to extend what/how 'umount' operates might be to
extend what can be done with a /sbin/(u?)mount.btrfs 'helper'?
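
As a very rough sketch of what such a helper might do (this is not an
existing tool, and it glosses over how util-linux actually dispatches
mount/umount helpers):

#!/bin/sh
# Hypothetical umount.btrfs: accept LABEL=/UUID= as well as a device or
# directory, resolve it to the mounted target, then unmount that.
spec="$1"

case "$spec" in
    LABEL=*|UUID=*) dev=$(blkid -t "$spec" -o device | head -n 1) ;;
    *)              dev="$spec" ;;
esac

target=$(findmnt -n -o TARGET "$dev" | head -n 1)
exec umount "${target:-$spec}"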


Regards,
Martin



Re: Virtual Device Support (N-way mirror code)

2013-05-21 Thread Martin
Duncan,

Thanks for quite a historical summary.

Yep, ReiserFS has stood the test of time very well and I'm still using
and abusing it on various servers, all the way from something like a
decade ago!

More recently I've been putting newer systems on ext4 mainly to take
advantage of extents for large files on all disk types, and also
deferred allocation to hopefully reduce wear on SSDs.

Meanwhile, I've seen no need to change the ReiserFS on the existing
systems, even for the multi-Terabyte backups. The near unlimited file
linking is beautiful for creating in effect incremental backups spanning
years!

All on raid1 or raid5, and all remarkably robust.

Enough waffle! :-)


On 21/05/13 04:59, Duncan wrote:
 And hopefully, now that btrfs raid5/6 is in, in a few cycles the N-way 
 mirrored code will make it as well

I too am waiting for the N-way mirrored code for example to have 3
copies of data/metadata across 4 physical disks.

When might that hit? Or is there a stable patch that can be added into
kernel 3.8.13?


Regards,
Martin



Re: btrfs pseudo-drbd

2013-05-22 Thread Martin
On 19/05/13 18:32, Martin wrote:
 Dear Devs,
 
 Would there be any problem with using nbd (/dev/nbdX) devices to gain
 btrfs-raid across multiple physical hosts across a network? (For a sort
 of btrfs-drbd! :-) )
 
 
 Regards,
 Martin
 
 
 http://en.wikipedia.org/wiki/Network_block_device
 
 http://www.drbd.org/


As a follow-up, both nbd and AoE look to be active.

nbd uses tcp/ip (layer 3) and is network routable;

AoE operates on layer 2 (no IP addressing) and so looks to enjoy a lower
overhead and hence better performance. Ideal for putting together your
own low cost SAN!
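
For the nbd case, the sort of experiment I have in mind is simply this
(hostnames, port, and devices are invented; it assumes nbd-server is
already exporting a device on each remote host):

# attach a block device exported by each remote host
nbd-client host2 10809 /dev/nbd0
nbd-client host3 10809 /dev/nbd1

# then let btrfs raid1 mirror data and metadata across a local disk and
# the two network block devices
mkfs.btrfs -L net-mirror -d raid1 -m raid1 /dev/sdb /dev/nbd0 /dev/nbd1
mount LABEL=net-mirror /mnt/net-mirror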


Network Block Device (TCP version)
http://nbd.sourceforge.net/

ATA Over Ethernet: As an Alternative
http://www.rfxn.com/ata-over-ethernet-as-an-alternative/

EtherDrive® storage and Linux 2.6
http://support.coraid.com/support/linux/EtherDrive-2.6-HOWTO.html




Hope of interest,

Regards,
Martin



btrfs raid1 on 16TB: INFO: task rsync:11022 blocked for more than 180 seconds

2013-06-05 Thread Martin
Dear Devs,

I have x4 4TB HDDs formatted with:

mkfs.btrfs -L bu-16TB_0 -d raid1 -m raid1 /dev/sd[cdef]


/etc/fstab mounts with the options:

noatime,noauto,space_cache,inode_cache


All on kernel 3.8.13.


Upon using rsync to copy some heavily hardlinked backups from ReiserFS,
I've so far had various:

INFO: task rsync:11022 blocked for more than 180 seconds

and one:

INFO: task btrfs-endio-wri:10816 blocked for more than 180 seconds

Further detail listed below.


What's the fix, or is there any debugging worth doing?

Regards,
Martin



x1 of these:

kernel: INFO: task rsync:11022 blocked for more than 180 seconds.
kernel: echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this
message.
kernel: rsync   D  0 11022  11021 0x
kernel: 88012b0ae360 0082 815f1400 000120c0
kernel: 4000 880108a67fd8  810312ac
kernel: 8801115ae748 810e3bad 8801115ae748 0081
kernel: Call Trace:
kernel: [810312ac] ? ns_capable+0x33/0x46
kernel: [810e3bad] ? generic_permission+0x19e/0x1fe
kernel: [810e427d] ? __inode_permission+0x2f/0x6d
kernel: [810e3d63] ? lookup_fast+0x39/0x23c
kernel: [811f464c] ? wait_current_trans.isra.29+0xa9/0xd8
kernel: [810427f0] ? abort_exclusive_wait+0x79/0x79
kernel: [811f5b59] ? start_transaction+0x3de/0x408
kernel: [810f013c] ? setattr_copy+0x8c/0xcb
kernel: [811ff22b] ? btrfs_dirty_inode+0x24/0xa4
kernel: [810effe8] ? notify_change+0x1f0/0x2b8
kernel: [810ff680] ? utimes_common+0x10c/0x135
kernel: [810df445] ? cp_new_stat+0x10d/0x11f
kernel: [810ff79a] ? do_utimes+0xf1/0x129
kernel: [810df7d9] ? sys_newlstat+0x23/0x2b
kernel: [810ff89b] ? sys_utimensat+0x64/0x6b
kernel: [81431652] ? system_call_fastpath+0x16/0x1b


x2 of these:

kernel: INFO: task rsync:11022 blocked for more than 180 seconds.
kernel: echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this
message.
kernel: rsync   D  0 11022  11021 0x
kernel: 88012b0ae360 0082 815f1400 000120c0
kernel: 4000 880108a67fd8  810e3d63
kernel: 88012b0ae360 810e3b65 88001e959ef8 0081
kernel: Call Trace:
kernel: [810e3d63] ? lookup_fast+0x39/0x23c
kernel: [810e3b65] ? generic_permission+0x156/0x1fe
kernel: [810e427d] ? __inode_permission+0x2f/0x6d
kernel: [810e3d63] ? lookup_fast+0x39/0x23c
kernel: [811f464c] ? wait_current_trans.isra.29+0xa9/0xd8
kernel: [810427f0] ? abort_exclusive_wait+0x79/0x79
kernel: [811f5b59] ? start_transaction+0x3de/0x408
kernel: [810f013c] ? setattr_copy+0x8c/0xcb
kernel: [811ff22b] ? btrfs_dirty_inode+0x24/0xa4
kernel: [810effe8] ? notify_change+0x1f0/0x2b8
kernel: [810ff680] ? utimes_common+0x10c/0x135
kernel: [810df445] ? cp_new_stat+0x10d/0x11f
kernel: [810ff79a] ? do_utimes+0xf1/0x129
kernel: [810df7d9] ? sys_newlstat+0x23/0x2b
kernel: [810ff89b] ? sys_utimensat+0x64/0x6b
kernel: [81431652] ? system_call_fastpath+0x16/0x1b


x7 of these:

kernel: INFO: task rsync:11022 blocked for more than 180 seconds.
kernel: echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this
message.
kernel: rsync   D  0 11022  11021 0x
kernel: 88012b0ae360 0082 815f1400 000120c0
kernel: 4000 880108a67fd8 88010d9270c9 810e520b
kernel: 7fffd5adb458 811e2fba 880108a67d88 810e3411
kernel: Call Trace:
kernel: [810e520b] ? path_init+0x1da/0x32c
kernel: [811e2fba] ? reserve_metadata_bytes.isra.59+0x7b/0x741
kernel: [810e3411] ? complete_walk+0x85/0xd6
kernel: [810ecfbc] ? __d_lookup+0x60/0x122
kernel: [811f464c] ? wait_current_trans.isra.29+0xa9/0xd8
kernel: [810427f0] ? abort_exclusive_wait+0x79/0x79
kernel: [811f5b59] ? start_transaction+0x3de/0x408
kernel: [810e5b96] ? kern_path_create+0x78/0x110
kernel: [81200836] ? btrfs_link+0x75/0x185
kernel: [810e4c11] ? vfs_link+0x102/0x184
kernel: [810e7e90] ? sys_linkat+0x16d/0x1c7
kernel: [81431652] ? system_call_fastpath+0x16/0x1b


x1 of these:

kernel: INFO: task btrfs-endio-wri:10816 blocked for more than 180 seconds.
kernel: echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this
message.
kernel: btrfs-endio-wri D  0 10816  2 0x
kernel: 880129bf4f80 0046 815f1400 000120c0
kernel: 4000 88010c635fd8 8801294404ea 88012944
kernel:  0050  880129e85240
kernel: Call Trace:
kernel: [810d2dbf] ? kmem_cache_alloc+0x3e/0xde

btrfs raid1 on 16TB goes read-only after btrfs: block rsv returned -28

2013-06-05 Thread Martin
Dear Devs,

I have x4 4TB HDDs formatted with:

mkfs.btrfs -L bu-16TB_0 -d raid1 -m raid1 /dev/sd[cdef]


/etc/fstab mounts with the options:

noatime,noauto,space_cache,inode_cache


All on kernel 3.8.13.


Upon using rsync to copy some heavily hardlinked backups from ReiserFS,
I've seen:


The following "block rsv returned -28" warning is repeated 7 times until
there is a call trace for:

WARNING: at fs/btrfs/super.c:256 __btrfs_abort_transaction+0x3d/0xad().

Then, the mount is set read-only.


How to fix or debug?

Thanks,
Martin



kernel: [ cut here ]
kernel: WARNING: at fs/btrfs/extent-tree.c:6372
btrfs_alloc_free_block+0xd3/0x29c()
kernel: Hardware name: GA-MA790FX-DS5
kernel: btrfs: block rsv returned -28
kernel: Modules linked in: raid456 async_raid6_recov async_memcpy
async_pq async_xor xor async_tx raid6_pq act_police cls_basic cls_flow
cls_fw cls_u32 sch_tbf sch_prio sch_htb sch_hfsc sch_ingress
sch_sfq xt_CHECKSUM ipt_rpfilter xt_statistic xt_CT xt_LOG xt_time
xt_connlimit xt_realm xt_addrtype xt_comment xt_recent xt_policy xt_nat
ipt_ULOG ipt_REJECT ipt_MASQUERADE ipt_ECN ipt_CLUSTERIP ipt_ah xt_set
ip_set nf_nat
_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp
nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_conntrack_tftp
nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp
nf_conntrack_pptp nf_
conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns
nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323
nf_conntrack_ftp xt_tcpmss xt_pkttype xt_owner xt_NFQUEUE xt_NFLOG
nfnetlink_log xt_multiport xt_mar
k xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP
xt_dscp xt_dccp xt_conntrack xt_connmark xt_CLASSIFY xt_AUDIT xt_tcpudp
xt_state iptable_raw iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
nf_defrag_i
pv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables
x_tables bridge stp llc rtc snd_hda_codec_realtek fbcon bitblit
softcursor font nouveau video mxm_wmi cfbfillrect cfbimgblt cfbcopyarea
i2c_algo_bit evdev d
rm_kms_helper snd_hda_intel ttm snd_hda_codec drm i2c_piix4 pcspkr
snd_pcm serio_raw snd_page_alloc snd_timer k8temp snd i2c_core processor
button thermal_sys sky2 wmi backlight fb fbdev pata_acpi firewire_ohci
firewire_cor
e pata_atiixp usbhid pata_jmicron sata_sil24
kernel: Pid: 10980, comm: btrfs-transacti Not tainted 3.8.13-gentoo #1
kernel: Call Trace:
kernel: [811e6600] ? btrfs_init_new_buffer+0xef/0xf6
kernel: [810289c8] ? warn_slowpath_common+0x78/0x8c
kernel: [81028a74] ? warn_slowpath_fmt+0x45/0x4a
kernel: [81278f2c] ? ___ratelimit+0xc4/0xd0
kernel: [811e66da] ? btrfs_alloc_free_block+0xd3/0x29c
kernel: [811d68e5] ? __btrfs_cow_block+0x136/0x454
kernel: [811f0d47] ? btrfs_buffer_uptodate+0x40/0x56
kernel: [811d6d8c] ? btrfs_cow_block+0x132/0x19d
kernel: [811da606] ? btrfs_search_slot+0x2f5/0x624
kernel: [811dbc5a] ? btrfs_insert_empty_items+0x5c/0xaf
kernel: [811e5089] ? run_clustered_refs+0x852/0x8e6
kernel: [811e4d20] ? run_clustered_refs+0x4e9/0x8e6
kernel: [811e7f6b] ? btrfs_run_delayed_refs+0x10d/0x289
kernel: [811f4ec6] ? btrfs_commit_transaction+0x3a5/0x93c
kernel: [810427f0] ? abort_exclusive_wait+0x79/0x79
kernel: [811f5a8c] ? start_transaction+0x311/0x408
kernel: [811eed7e] ? transaction_kthread+0xd1/0x16d
kernel: [811eecad] ? btrfs_alloc_root+0x34/0x34
kernel: [810420b3] ? kthread+0xad/0xb5
kernel: [81042006] ? __kthread_parkme+0x5e/0x5e
kernel: [814315ac] ? ret_from_fork+0x7c/0xb0
kernel: [81042006] ? __kthread_parkme+0x5e/0x5e
kernel: ---[ end trace b584e8ceb642293f ]---
kernel: [ cut here ]



kernel: [ cut here ]
kernel: WARNING: at fs/btrfs/super.c:256
__btrfs_abort_transaction+0x3d/0xad()
kernel: Hardware name: GA-MA790FX-DS5
kernel: btrfs: Transaction aborted
kernel: Modules linked in: raid456 async_raid6_recov async_memcpy
async_pq async_xor xor async_tx raid6_pq act_police cls_basic cls_flow
cls_fw cls_u32 sch_tbf sch_prio sch_htb sch_hfsc sch_ingress
sch_sfq xt_CHECKSUM ipt_rpfilter xt_statistic xt_CT xt_LOG xt_time
xt_connlimit xt_realm xt_addrtype xt_comment xt_recent xt_policy xt_nat
ipt_ULOG ipt_REJECT ipt_MASQUERADE ipt_ECN ipt_CLUSTERIP ipt_ah xt_set
ip_set nf_nat
_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp
nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_conntrack_tftp
nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp
nf_conntrack_pptp nf_
conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns
nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323
nf_conntrack_ftp xt_tcpmss xt_pkttype xt_owner xt_NFQUEUE xt_NFLOG
nfnetlink_log xt_multiport xt_mar
k xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP
xt_dscp xt_dccp xt_conntrack xt_connmark xt_CLASSIFY xt_AUDIT

Re: btrfs raid1 on 16TB goes read-only after btrfs: block rsv returned -28

2013-06-05 Thread Martin
On 05/06/13 16:05, Hugo Mills wrote:
 On Wed, Jun 05, 2013 at 03:57:42PM +0100, Martin wrote:
 Dear Devs,
 
 I have x4 4TB HDDs formatted with:
 
 mkfs.btrfs -L bu-16TB_0 -d raid1 -m raid1 /dev/sd[cdef]
 
 
 /etc/fstab mounts with the options:
 
 noatime,noauto,space_cache,inode_cache
 
 
 All on kernel 3.8.13.
 
 
 Upon using rsync to copy some heavily hardlinked backups from
 ReiserFS, I've seen:
 
 
 The following block rsv returned -28 is repeated 7 times until
 there is a call trace for:
 
 This is ENOSPC. Can you post the output of btrfs fi df 
 /mountpoint and btrfs fi show, please?


btrfs fi df:

Data, RAID1: total=2.85TB, used=2.84TB
Data: total=8.00MB, used=0.00
System, RAID1: total=8.00MB, used=412.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=27.00GB, used=25.82GB
Metadata: total=8.00MB, used=0.00


btrfs fi show:

Label: 'bu-16TB_0'  uuid: 8fd9a0a8-9109-46db-8da0-396d9c6bc8e9
Total devices 4 FS bytes used 2.87TB
devid4 size 3.64TB used 1.44TB path /dev/sdf
devid3 size 3.64TB used 1.44TB path /dev/sde
devid1 size 3.64TB used 1.44TB path /dev/sdc
devid2 size 3.64TB used 1.44TB path /dev/sdd


And df -h:

Filesystem  Size  Used Avail Use% Mounted on
/dev/sde 15T  5.8T  8.9T  40% /mnt/sata16




 WARNING: at fs/btrfs/super.c:256
 __btrfs_abort_transaction+0x3d/0xad().
 
 Then, the mount is set read-only.
 
 
 How to fix or debug?
 
 Thanks, Martin
 
 
 
 kernel: [ cut here ] kernel: WARNING: at
 fs/btrfs/extent-tree.c:6372 btrfs_alloc_free_block+0xd3/0x29c() 
 kernel: Hardware name: GA-MA790FX-DS5 kernel: btrfs: block rsv
 returned -28 kernel: Modules linked in: raid456 async_raid6_recov
 async_memcpy async_pq async_xor xor async_tx raid6_pq act_police
 cls_basic cls_flow cls_fw cls_u32 sch_tbf sch_prio sch_htb
 sch_hfsc sch_ingress sch_sfq xt_CHECKSUM ipt_rpfilter
 xt_statistic xt_CT xt_LOG xt_time xt_connlimit xt_realm
 xt_addrtype xt_comment xt_recent xt_policy xt_nat ipt_ULOG
 ipt_REJECT ipt_MASQUERADE ipt_ECN ipt_CLUSTERIP ipt_ah xt_set 
 ip_set nf_nat _tftp nf_nat_snmp_basic nf_conntrack_snmp
 nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323
 nf_nat_ftp nf_conntrack_tftp nf_conntrack_sip
 nf_conntrack_proto_udplite nf_conntrack_proto_sctp 
 nf_conntrack_pptp nf_ conntrack_proto_gre nf_conntrack_netlink
 nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc
 nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_owner
 xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_mar k xt_mac
 xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP 
 xt_dscp xt_dccp xt_conntrack xt_connmark xt_CLASSIFY xt_AUDIT
 xt_tcpudp xt_state iptable_raw iptable_nat nf_nat_ipv4 nf_nat
 nf_conntrack_ipv4 nf_defrag_i pv4 nf_conntrack iptable_mangle
 nfnetlink iptable_filter ip_tables x_tables bridge stp llc rtc
 snd_hda_codec_realtek fbcon bitblit softcursor font nouveau video
 mxm_wmi cfbfillrect cfbimgblt cfbcopyarea i2c_algo_bit evdev d 
 rm_kms_helper snd_hda_intel ttm snd_hda_codec drm i2c_piix4
 pcspkr snd_pcm serio_raw snd_page_alloc snd_timer k8temp snd
 i2c_core processor button thermal_sys sky2 wmi backlight fb fbdev
 pata_acpi firewire_ohci firewire_cor e pata_atiixp usbhid
 pata_jmicron sata_sil24 kernel: Pid: 10980, comm: btrfs-transacti
 Not tainted 3.8.13-gentoo #1 kernel: Call Trace: kernel:
 [811e6600] ? btrfs_init_new_buffer+0xef/0xf6 kernel:
 [810289c8] ? warn_slowpath_common+0x78/0x8c kernel:
 [81028a74] ? warn_slowpath_fmt+0x45/0x4a kernel:
 [81278f2c] ? ___ratelimit+0xc4/0xd0 kernel:
 [811e66da] ? btrfs_alloc_free_block+0xd3/0x29c kernel:
 [811d68e5] ? __btrfs_cow_block+0x136/0x454 kernel:
 [811f0d47] ? btrfs_buffer_uptodate+0x40/0x56 kernel:
 [811d6d8c] ? btrfs_cow_block+0x132/0x19d kernel:
 [811da606] ? btrfs_search_slot+0x2f5/0x624 kernel:
 [811dbc5a] ? btrfs_insert_empty_items+0x5c/0xaf kernel:
 [811e5089] ? run_clustered_refs+0x852/0x8e6 kernel:
 [811e4d20] ? run_clustered_refs+0x4e9/0x8e6 kernel:
 [811e7f6b] ? btrfs_run_delayed_refs+0x10d/0x289 kernel:
 [811f4ec6] ? btrfs_commit_transaction+0x3a5/0x93c 
 kernel: [810427f0] ? abort_exclusive_wait+0x79/0x79 
 kernel: [811f5a8c] ? start_transaction+0x311/0x408 
 kernel: [811eed7e] ? transaction_kthread+0xd1/0x16d 
 kernel: [811eecad] ? btrfs_alloc_root+0x34/0x34 kernel:
 [810420b3] ? kthread+0xad/0xb5 kernel:
 [81042006] ? __kthread_parkme+0x5e/0x5e kernel:
 [814315ac] ? ret_from_fork+0x7c/0xb0 kernel:
 [81042006] ? __kthread_parkme+0x5e/0x5e kernel: ---[
 end trace b584e8ceb642293f ]--- kernel: [ cut here
 ]
 
 
 
 kernel: [ cut here ] kernel: WARNING: at
 fs/btrfs/super.c:256 __btrfs_abort_transaction+0x3d/0xad() 
 kernel: Hardware name: GA-MA790FX-DS5 kernel: btrfs: Transaction

Re: btrfs raid1 on 16TB goes read-only after btrfs: block rsv returned -28

2013-06-05 Thread Martin
On 05/06/13 16:43, Hugo Mills wrote:
 On Wed, Jun 05, 2013 at 04:28:33PM +0100, Martin wrote:
 On 05/06/13 16:05, Hugo Mills wrote:
 On Wed, Jun 05, 2013 at 03:57:42PM +0100, Martin wrote:
 Dear Devs,
 
 I have x4 4TB HDDs formatted with:
 
 mkfs.btrfs -L bu-16TB_0 -d raid1 -m raid1 /dev/sd[cdef]
 
 
 /etc/fstab mounts with the options:
 
 noatime,noauto,space_cache,inode_cache
 
 
 All on kernel 3.8.13.
 
 
 Upon using rsync to copy some heavily hardlinked backups
 from ReiserFS, I've seen:
 
 
 The following block rsv returned -28 is repeated 7 times
 until there is a call trace for:
 
 This is ENOSPC. Can you post the output of btrfs fi df 
 /mountpoint and btrfs fi show, please?
 
 
 btrfs fi df:
 
 Data, RAID1: total=2.85TB, used=2.84TB Data: total=8.00MB,
 used=0.00 System, RAID1: total=8.00MB, used=412.00KB System:
 total=4.00MB, used=0.00 Metadata, RAID1: total=27.00GB,
 used=25.82GB Metadata: total=8.00MB, used=0.00
 
 
 btrfs fi show:
 
 Label: 'bu-16TB_0'  uuid: 8fd9a0a8-9109-46db-8da0-396d9c6bc8e9 
 Total devices 4 FS bytes used 2.87TB devid4 size 3.64TB used
 1.44TB path /dev/sdf devid3 size 3.64TB used 1.44TB path
 /dev/sde devid1 size 3.64TB used 1.44TB path /dev/sdc devid
 2 size 3.64TB used 1.44TB path /dev/sdd
 
 OK, so you've got plenty of space to allocate. There were some 
 issues in this area (block reserves and ENOSPC, and I think 
 specifically addressing the issue of ENOSPC when there's space 
 available to allocate) that were fixed between 3.8 and 3.9 (and 
 probably some between 3.9 and 3.10-rc as well), so upgrading your 
 kernel _may_ help here.
 
 Something else that may possibly help as a sticking-plaster is to 
 write metadata more slowly, so that you don't have quite so much of
 it waiting to be written out for the next transaction. Practically,
 this may involve things like running sync on a loop. But it's
 definitely a horrible hack that may help if you're desperate for a
 quick fix until you can finish creating metadata so quickly and
 upgrade your kernel...
 
 Hugo.

Thanks for that. I can give kernel 3.9.4 a try. For a giggle, I'll try
first with nice 19 and syncs in a loop...
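
(The sync loop is literally just something like this left running
alongside the rsync; the interval is a guess:)

while true; do
    sync
    sleep 5
done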


One confusing bit: why the "Data, RAID1: total=2.85TB" from btrfs
fi df?

Thanks,
Martin




Re: btrfs raid1 on 16TB goes read-only after btrfs: block rsv returned -28

2013-06-05 Thread Martin
On 05/06/13 17:24, David Sterba wrote:
 On Wed, Jun 05, 2013 at 04:43:29PM +0100, Hugo Mills wrote:
OK, so you've got plenty of space to allocate. There were some
 issues in this area (block reserves and ENOSPC, and I think
 specifically addressing the issue of ENOSPC when there's space
 available to allocate) that were fixed between 3.8 and 3.9 (and
 probably some between 3.9 and 3.10-rc as well), so upgrading your
 kernel _may_ help here.
 
 This is supposed to be fixed by
 https://patchwork-mail2.kernel.org/patch/2558911/
 
 that went to 3.10-rc with some followup patches, so it might not be
 enough as a standalone fix.
 
 Unless you really need 'inode_cache', remove it from the mount options.

Thanks for that. Remounting without the inode_cache option looks to be
allowing rsync to continue. (No sync loop needed.)


For a 16TB raid1 on kernel 3.8.13, any good mount options to try?

For that size of storage and with many hard links, is there any
advantage formatting with leaf/node size greater than the default 4kBytes?


Thanks,
Martin



Re: btrfs raid1 on 16TB goes read-only after btrfs: block rsv returned -28

2013-06-07 Thread Martin
On 05/06/13 22:12, Martin wrote:
 On 05/06/13 17:24, David Sterba wrote:
 On Wed, Jun 05, 2013 at 04:43:29PM +0100, Hugo Mills wrote:
OK, so you've got plenty of space to allocate. There were some
 issues in this area (block reserves and ENOSPC, and I think
 specifically addressing the issue of ENOSPC when there's space
 available to allocate) that were fixed between 3.8 and 3.9 (and
 probably some between 3.9 and 3.10-rc as well), so upgrading your
 kernel _may_ help here.

 This is supposed to be fixed by
 https://patchwork-mail2.kernel.org/patch/2558911/

 that went to 3.10-rc with some followup patches, so it might not be
 enough as a standalone fix.

 Unless you really need 'inode_cache', remove it from the mount options.
 
 Thanks for that. Remounting without the inode_cache option looks to be
 allowing rsync to continue. (No sync loop needed.)

rsync is still running ok but the data copying is awfully slow... The
copy across is going to take many days at this rate :-(


 For a 16TB raid1 on kernel 3.8.13, any good mount options to try?
 
 For that size of storage and with many hard links, is there any
 advantage formatting with leaf/node size greater than the default 4kBytes?

Any hints/tips? ;-)


Regards,
Martin





raid1 inefficient unbalanced filesystem reads

2013-06-28 Thread Martin
On kernel 3.8.13:

Using two equal performance SATAII HDDs, formatted for btrfs raid1 for
both data and metadata and:

The second disk appears to suffer about x8 the read activity of the
first disk. This causes the second disk to quickly get maxed out whilst
the first disk remains almost idle.

Total writes to the two disks is equal.

This is noticeable for example when running emerge --sync or running
compiles on Gentoo.


Is this a known feature/problem or worth looking/checking further?

Regards,
Martin



Re: raid1 inefficient unbalanced filesystem reads

2013-06-28 Thread Martin
On 28/06/13 16:39, Hugo Mills wrote:
 On Fri, Jun 28, 2013 at 11:34:18AM -0400, Josef Bacik wrote:
 On Fri, Jun 28, 2013 at 02:59:45PM +0100, Martin wrote:
 On kernel 3.8.13:
 
 Using two equal performance SATAII HDDs, formatted for btrfs
 raid1 for both data and metadata and:
 
 The second disk appears to suffer about x8 the read activity of
 the first disk. This causes the second disk to quickly get
 maxed out whilst the first disk remains almost idle.
 
 Total writes to the two disks is equal.
 
 This is noticeable for example when running emerge --sync or
 running compiles on Gentoo.
 
 
 Is this a known feature/problem or worth looking/checking
 further?
 
 So we balance based on pids, so if you have one process that's
 doing a lot of work it will tend to be stuck on one disk, which
 is why you are seeing that kind of imbalance.  Thanks,
 
 The other scenario is if the sequence of processes executed to do 
 each compilation step happens to be an even number, then the 
 heavy-duty file-reading parts will always hit the same parity of
 PID number. If each tool has, say, a small wrapper around it, then
 the wrappers will all run as (say) odd PIDs, and the tools
 themselves will run as even pids...

Ouch! Good find...

To just test with a:

for a in {1..4} ; do ( dd if=/dev/zero of=$a bs=10M count=100 & ) ; done

ps shows:

martin    9776  9.6  0.1  18740 10904 pts/2    D    17:15   0:00 dd
martin    9778  8.5  0.1  18740 10904 pts/2    D    17:15   0:00 dd
martin    9780  8.5  0.1  18740 10904 pts/2    D    17:15   0:00 dd
martin    9782  9.5  0.1  18740 10904 pts/2    D    17:15   0:00 dd


More of the story, from atop, looks to be:

One disk maxed out with x3 dd on one CPU core, the second disk
utilised by one dd on the second CPU core...


Looks like using a simple round-robin is pathological for an even
number of disks, or indeed if you have a mix of disks with different
capabilities. File access will pile up on the slowest of the disks or
on whatever HDD coincides with the process (pid) creation multiple...


So... an immediate work-around is to go all SSD or work in odd
multiples of HDDs?!

Rather than that: Any easy tweaks available please?


Thanks,
Martin



Re: raid1 inefficient unbalanced filesystem reads

2013-06-28 Thread Martin
On 28/06/13 18:04, Josef Bacik wrote:
 On Fri, Jun 28, 2013 at 09:55:31AM -0700, George Mitchell wrote:
 On 06/28/2013 09:25 AM, Martin wrote:
 On 28/06/13 16:39, Hugo Mills wrote:
 On Fri, Jun 28, 2013 at 11:34:18AM -0400, Josef Bacik wrote:
 On Fri, Jun 28, 2013 at 02:59:45PM +0100, Martin wrote:
 On kernel 3.8.13:

 flow of continual reads and writes very balanced across the first four
 drives in this set and then, like a big burp, a huge write on the fifth
 drive.  But absolutely no reads from the fifth drive so far. Very

 Well that is interesting, writes should be relatively balanced across all
 drives.  Granted we try and coalesce all writes to one drive, flush those out,
 and go on to the next drive, but you shouldn't be seeing the kind of activity
 you are currently seeing.  I will take a look at it next week and see whats
 going on.
 
 As for reads we could definitely be much smarter, I would like to do something
 like this (I'm spelling it out in case somebody wants to do it before I get to
 it)
 
 1) Keep a per-device counter of how many read requests have been done.
 2) Make the PID based decision, and then check and see if the device we've
 chosen has many more read requests than the other device.  If so choose the
 other device.
  - EXCEPTION: if we are doing a big sequential read we want to stay on one 
 disk
 since the head will be already in place on the disk we've been pegging, so
 ignore the logic for this.  This means saving the last sector we read from
 and comparing it to the next sector we are going to read from, MD does 
 this.
 - EXCEPTION to the EXCEPTION: if the devices are SSD's then don't bother
doing this work, always maintain evenness amongst the devices.
 
 If somebody were going to do this, they'd just have to find the places where 
 we
 call find_live_mirror in volumes.c and adjust their logic to just hand
 find_live_mirror the entire map and then go through the devices and make their
 decision.  You'd still need to keep the device replace logic.  Thanks,


Mmmm... I'm not sure trying to balance historical read/write counts is
the way to go... What happens for the use case of an SSD paired up with
a HDD? (For example an SSD and a similarly sized Raptor or enterprise
SCSI?...) Or even just JBODs of a mishmash of different speeds?

Rather than trying to balance io counts, can a realtime utilisation
check be made and go for the least busy?

That can be biased secondly to balance IO counts if some
'non-performance' flag/option is set/wanted by the user. Otherwise, go
firstly for what is recognised to be the fastest or least busy?...


Good find and good note!

And thanks greatly for so quickly picking this up.

Thanks,
Martin



btrfsck output: What does it all mean?

2013-06-29 Thread Martin
This is the btrfsck output for a real-world rsync backup onto a btrfs
raid1 mirror across 4 drives (yes, I know at the moment for btrfs raid1
there's only ever two copies of the data...)


checking extents
checking fs roots
root 5 inode 18446744073709551604 errors 2000
root 5 inode 18446744073709551605 errors 1
root 256 inode 18446744073709551604 errors 2000
root 256 inode 18446744073709551605 errors 1
found 3183604633600 bytes used err is 1
total csum bytes: 3080472924
total tree bytes: 28427821056
total fs tree bytes: 23409475584
btree space waste bytes: 4698218231
file data blocks allocated: 3155176812544
 referenced 3155176812544
Btrfs Btrfs v0.19
Command exited with non-zero status 1


So: What does that little lot mean?

The drives were mounted and active during an unexpected power-plug pull :-(


Safe to mount again or are there other checks/fixes needed?

Thanks,
Martin



Re: raid1 inefficient unbalanced filesystem reads

2013-06-29 Thread Martin
On 29/06/13 10:41, Russell Coker wrote:
 On Sat, 29 Jun 2013, Martin wrote:
 Mmmm... I'm not sure trying to balance historical read/write counts is
 the way to go... What happens for the use case of an SSD paired up with
 a HDD? (For example an SSD and a similarly sized Raptor or enterprise
 SCSI?...) Or even just JBODs of a mishmash of different speeds?

 Rather than trying to balance io counts, can a realtime utilisation
 check be made and go for the least busy?
 
 It would also be nice to be able to tune this.  For example I've got a RAID-1 
 array that's mounted noatime, hardly ever written, and accessed via NFS on 
 100baseT.  It would be nice if one disk could be spun down for most of the 
 time and save 7W of system power.  Something like the --write-mostly option 
 of 
 mdadm would be good here.

For that case, a --read-mostly would be more apt ;-)

Hence, add a check to preferentially use last disk used if all are idle?


 Also it should be possible for a RAID-1 array to allow faster reads for a 
 single process reading a single file if the file in question is fragmented.

That sounds good but complicated to gather and sort the fragments into
groups per disk... Or is something like that already done by the block
device elevator for HDDs?

Also, is head seek optimisation turned off for SSD accesses?


(This is sounding like a lot more than just swapping:

current->pid % map->num_stripes

to a

pseudorandom_hash( current->pid ) % map->num_stripes

... ;-) )


Is there any readily accessible present state, such as disk activity,
queue length, or access latency, available for the btrfs process to read?

I suspect a good first guess to cover many conditions would be to
'simply' choose whichever device is powered up and has the lowest
current latency, or if idle has the lowest historical latency...


Regards,
Martin



Which better: rsync or snapshot + rsync --delete

2013-08-02 Thread Martin
Which is 'best' or 'faster'?

Take a snapshot of an existing backup and then rsync --delete into
that to make a backup of some other filesystem?

Or use rsync --link-dest to link a new backup tree against a previous
backup tree for the same "some other filesystem"?
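
To be concrete, the two recipes I'm comparing look roughly like this
(paths and naming are invented):

TODAY=$(date +%Y-%m-%d)

# 1) btrfs snapshot of the previous backup, then rsync --delete into it:
btrfs subvolume snapshot /backup/current /backup/$TODAY
rsync -a --delete /some/other/filesystem/ /backup/$TODAY/

# 2) plain directories, with rsync hard-linking unchanged files against
#    the previous backup tree:
rsync -a --delete --link-dest=/backup/previous /some/other/filesystem/ /backup/$TODAY/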

Which case does btrfs handle better?

Would there be any problems with doing this over an nfs mount of the btrfs?


Both cases can take advantage of the raid and dedup and compression
features of btrfs. Would taking a btrfs snapshot be better than rsync
creating the hard links to unchanged files?

Any other considerations?

(There are perhaps about 5% new or changed files each time.)

Thanks,
Martin



Corrupt btrfs filesystem recovery... (Due to *sata* errors)

2013-09-28 Thread Martin
This may be of interest for the failure cause as well as how to recover...


I have a known good 2TB (4kByte physical sectors) HDD that supports
sata3 (6Gbit/s). Writing data via rsync at the 6Gbit/s sata rate caused
IO errors for just THREE sectors...

Yet btrfsck bombs out with LOTs of errors...

How best to recover from this?

(This is a 'backup' disk so not 'critical' but it would be nice to avoid
rewriting about 1.5TB of data over the network...)


Is there an obvious sequence/recipe to follow for recovery?

Thanks,
Martin



Further details:

Linux  3.10.7-gentoo-r1 #2 SMP Fri Sep 27 23:38:06 BST 2013 x86_64 AMD
E-450 APU with Radeon(tm) HD Graphics AuthenticAMD GNU/Linux

# btrfs version
Btrfs v0.20-rc1-358-g194aa4a

Single 2TB HDD using default mkfs.btrfs options.
Entire disk (/dev/sdc) is btrfs (no partitions).


The IO errors were:

kernel: end_request: I/O error, dev sdc, sector 3215049328
kernel: end_request: I/O error, dev sdc, sector 3215049328
kernel: end_request: I/O error, dev sdc, sector 3215049328
kernel: end_request: I/O error, dev sdc, sector 3215049328
kernel: end_request: I/O error, dev sdc, sector 3215049328
kernel: end_request: I/O error, dev sdc, sector 3206563752
kernel: end_request: I/O error, dev sdc, sector 3206563752
kernel: end_request: I/O error, dev sdc, sector 3206563752
kernel: end_request: I/O error, dev sdc, sector 3206563752
kernel: end_request: I/O error, dev sdc, sector 3206563752
kernel: end_request: I/O error, dev sdc, sector 3213925248
kernel: end_request: I/O error, dev sdc, sector 3213925248
kernel: end_request: I/O error, dev sdc, sector 3213925248
kernel: end_request: I/O error, dev sdc, sector 3213925248
kernel: end_request: I/O error, dev sdc, sector 3213925248

Lots of sata error noise omitted.


The sata problem was fixed by limiting libata to 3Gbit/s:

libata.force=3.0G

added onto the Grub kernel line.
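
(For the record, that just means adding it to the kernel command line in
the bootloader config; for example, with grub2 something like:

# /etc/default/grub
GRUB_CMDLINE_LINUX="libata.force=3.0G"

# then regenerate the config
grub-mkconfig -o /boot/grub/grub.cfg

With grub legacy it is the same parameter appended to the kernel line in
grub.conf.)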

Running badblocks twice in succession (non-destructive data test!)
shows no surface errors and no further errors on the sata interface.

Running btrfsck twice gives the same result, giving a failure with:

Ignoring transid failure
btrfsck: cmds-check.c:1066: process_file_extent: Assertion `!(rec->ino
!= key->objectid || rec->refs > 1)' failed.


An abridged summary is:

checking extents
parent transid verify failed on 907185082368 wanted 15935 found 12264
parent transid verify failed on 907185082368 wanted 15935 found 12264
parent transid verify failed on 907185127424 wanted 15935 found 12264
parent transid verify failed on 907185127424 wanted 15935 found 12264
leaf parent key incorrect 907185135616
bad block 907185135616
parent transid verify failed on 915444707328 wanted 16974 found 13021
parent transid verify failed on 915444707328 wanted 16974 found 13021
parent transid verify failed on 915445092352 wanted 16974 found 13021
parent transid verify failed on 915445092352 wanted 16974 found 13021
leaf parent key incorrect 915444883456
bad block 915444883456
leaf parent key incorrect 915445014528
bad block 915445014528
parent transid verify failed on 907185082368 wanted 15935 found 12264
parent transid verify failed on 907185082368 wanted 15935 found 12264
parent transid verify failed on 907185127424 wanted 15935 found 12264
parent transid verify failed on 907185127424 wanted 15935 found 12264
leaf parent key incorrect 907183771648
bad block 907183771648
leaf parent key incorrect 907183779840
bad block 907183779840
leaf parent key incorrect 907183783936
bad block 907183783936
[...]
leaf parent key incorrect 907185913856
bad block 907185913856
leaf parent key incorrect 907185917952
bad block 907185917952
parent transid verify failed on 915431579648 wanted 16974 found 16972
parent transid verify failed on 915431579648 wanted 16974 found 16972
parent transid verify failed on 915432382464 wanted 16974 found 16972
parent transid verify failed on 915432382464 wanted 16974 found 16972
parent transid verify failed on 915444707328 wanted 16974 found 13021
parent transid verify failed on 915444707328 wanted 16974 found 13021
parent transid verify failed on 915445092352 wanted 16974 found 13021
parent transid verify failed on 915445092352 wanted 16974 found 13021
parent transid verify failed on 915445100544 wanted 16974 found 13021
parent transid verify failed on 915445100544 wanted 16974 found 13021
parent transid verify failed on 915432734720 wanted 16974 found 16972
parent transid verify failed on 915432734720 wanted 16974 found 16972
parent transid verify failed on 915433144320 wanted 16974 found 16972
parent transid verify failed on 915433144320 wanted 16974 found 16972
parent transid verify failed on 915431862272 wanted 16974 found 16972
parent transid verify failed on 915431862272 wanted 16974 found 16972
parent transid verify failed on 915444715520 wanted 16974 found 13021
parent transid verify failed on 915444715520 wanted 16974 found 13021
parent transid verify failed on 915445166080 wanted 16974 found 13021
parent transid verify failed on 915445166080 wanted 16974 found

Re: Corrupt btrfs filesystem recovery... (Due to *sata* errors)

2013-09-28 Thread Martin
Chris,

All agreed. Further comment inlined:

(Should have mentioned more prominently that the hardware problem has
been worked-around by limiting the sata to 3Gbit/s on bootup.)


On 28/09/13 21:51, Chris Murphy wrote:
 
 On Sep 28, 2013, at 1:26 PM, Martin m_bt...@ml1.co.uk wrote:
 
 Writing data via rsync at the 6Gbit/s sata rate caused IO errors
 for just THREE sectors...
 
 Yet btrfsck bombs out with LOTs of errors…
 
 Any fs will bomb out on write errors.

Indeed. However, are not the sata errors reported back to btrfs so that
it knows whatever parts haven't been updated?

Is there not a mechanism to then go read-only?

Also, should not the journal limit the damage?


 How best to recover from this?
 
 Why you're getting I/O errors at SATA 6Gbps link speed needs to be
 understood. Is it a bad cable? Bad SATA port? Drive or controller
 firmware bug? Or libata driver bug?

I systematically eliminated suspects such as the leads, PSU, and NCQ. Limiting libata
to only use 3Gbit/s is the one change that gives a consistent fix. The
HDD and motherboard both support 6Gbit/s, but hey-ho, that's an
experiment I can try again some other time when I have another HDD/SSD
to test in there.

In any case, for the existing HDD - motherboard combination, using sata2
rather than sata3 speeds shouldn't noticeably impact performance. (Other
than sata2 works reliably and so is infinitely better for this case!)


 Lots of sata error noise omitted.
 
 And entire dmesg might still be useful. I don't know if the list will
 handle the whole dmesg in one email, but it's worth a shot (reply to
 an email in the thread, don't change the subject).

I can email directly if of use/interest. Let me know offlist.


 do a smartctl -x on the drive, chances are it's recording PHY Event

(smartctl -x errors shown further down...)

Nothing untoward noticed:

# smartctl -a /dev/sdc

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model: WDC WD20EARX-00PASB0
Serial Number:WD-...
LU WWN Device Id: ...
Firmware Version: 51.0AB51
User Capacity:2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is:In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:Sat Sep 28 23:35:57 2013 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

[...]

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       9
  3 Spin_Up_Time            0x0027   253   159   021    Pre-fail  Always       -       1983
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       55
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       800
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       53
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       31
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3115
194 Temperature_Celsius     0x0022   118   110   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0


# smartctl -x /dev/sdc

... also shows the errors it saw:

(Just the last 4 copied which look timed for when the HDD was last
exposed to 6Gbit/s sata)

Error 46 [21] occurred at disk power-on lifetime: 755 hours (31 days +
11 hours)
  When the command that caused the error occurred, the device was active
or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  01 -- 51 00 08 00 00 6c 1a 4b b0 e0 00  Error: AMNF 8 sectors at LBA =
0x6c1a4bb0 = 1813662640

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time
Command/Feature_Name

Re: Corrupt btrfs filesystem recovery... (Due to *sata* errors)

2013-09-28 Thread Martin
On 28/09/13 20:26, Martin wrote:

 ... btrfsck bombs out with LOTs of errors...
 
 How best to recover from this?
 
 (This is a 'backup' disk so not 'critical' but it would be nice to avoid
 rewriting about 1.5TB of data over the network...)
 
 
 Is there an obvious sequence/recipe to follow for recovery?


I've got the drive reliably working with the sata limited to 3Gbit/s.
What is the best sequence to try to tidy-up and carry on with the 1.5TB
or so of data on there, rather than working from scratch?


So far, I've only run btrfsck since the corruption errors for the three
sectors...

Suggestions for recovery?

Thanks,
Martin





Re: Corrupt btrfs filesystem recovery... What best instructions?

2013-09-28 Thread Martin
On 28/09/13 23:54, Martin wrote:
 On 28/09/13 20:26, Martin wrote:
 
 ... btrfsck bombs out with LOTs of errors...

 How best to recover from this?

 (This is a 'backup' disk so not 'critical' but it would be nice to avoid
 rewriting about 1.5TB of data over the network...)


 Is there an obvious sequence/recipe to follow for recovery?
 
 
 I've got the drive reliably working with the sata limited to 3Gbit/s.
 What is the best sequence to try to tidy-up and carry on with the 1.5TB
 or so of data on there, rather than working from scratch?
 
 
 So far, I've only run btrfsck since the corruption...

So...

Any options for btrfsck to fix things?

Or is anything/everything that is fixable automatically fixed on the
next mount?

Or should:

btrfs scrub start /dev/sdX

be run first?

Or?


What does btrfs do (or can do) for recovery?

Advice welcomed,

Thanks,
Martin






Re: Corrupt btrfs filesystem recovery... (Due to *sata* errors)

2013-09-28 Thread Martin
Chris,

Thanks for good comment/discussion.

On 29/09/13 03:06, Chris Murphy wrote:
 
 On Sep 28, 2013, at 4:51 PM, Martin m_bt...@ml1.co.uk wrote:
 

 Stick with forced 3Gbps, but I think it's worth while to find out
 what the actual problem is. One day you forget about this 3Gbps SATA
 link, upgrade or regress to another kernel and you don't have the
 3Gbps forced speed on the parameter line, and poof - you've got more
 problems again. The hardware shouldn't negotiate a 6Gbps link and
 then do a backwards swan dive at 30,000' with your data as if it's an
 after thought.

I've got an engineer's curiosity so that one is very definitely marked
for revisiting at some time... If only to blog that x-y-z combination is
a tar pit for your data...


 In any case, for the existing HDD - motherboard combination, using
 sata2 rather than sata3 speeds shouldn't noticeably impact
 performance. (Other than sata2 works reliably and so is infinitely
 better for this case!)
 
 It's true.

Well, the IO data rate for badblocks is exactly the same as before,
limited by the speed of the physical rust spinning and data density...


 I would also separately unmount the file system, note the latest
 kernel message, then mount the file system and see if there are any
 kernel messages that might indicate recognition of problems with the
 fs.
 
 I would not use btrfsck --repair until someone says it's a good idea.
 That person would not be me.

It is sat unmounted until some informed opinion is gained...


Thanks again for your notes,

Regards,
Martin






Re: Corrupt btrfs filesystem recovery... What best instructions?

2013-09-29 Thread Martin
On 29/09/13 06:11, Duncan wrote:
 Martin posted on Sun, 29 Sep 2013 03:10:37 +0100 as excerpted:
 
 So...

 Any options for btrfsck to fix things?

 Or is anything/everything that is fixable automatically fixed on the
 next mount?

 Or should:

 btrfs scrub start /dev/sdX

 be run first?

 Or?


 What does btrfs do (or can do) for recovery?
 
 Here's a general-case answer (courtesy gmane) to the order in which to 
 try recovery question, that Hugo posted a few weeks ago:
 
 http://permalink.gmane.org/gmane.comp.file-systems.btrfs/27999

Thanks for that. Very well found!

The instructions from Hugo are:


   Let's assume that you don't have a physical device failure (which
is a different set of tools -- mount -odegraded, btrfs dev del
missing).

   First thing to do is to take a btrfs-image -c9 -t4 of the
filesystem, and keep a copy of the output to show josef. :)

   Then start with -orecovery and -oro,recovery for pretty much
anything.

   If those fail, then look in dmesg for errors relating to the log
tree -- if that's corrupt and can't be read (or causes a crash), use
btrfs-zero-log.

   If there's problems with the chunk tree -- the only one I've seen
recently was reporting something like can't map address -- then
chunk-recover may be of use.

   After that, btrfsck is probably the next thing to try. If options
-s1, -s2, -s3 have any success, then btrfs-select-super will help by
replacing the superblock with one that works. If that's not going to
be useful, fall back to btrfsck --repair.

   Finally, btrfsck --repair --init-extent-tree may be necessary if
there's a damaged extent tree. Finally, if you've got corruption in
the checksums, there's --init-csum-tree.

   Hugo.


Those will be tried next...
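
As a rough command-line sketch of that sequence (device and paths here
are examples, and each later step is only for when the earlier ones fail):

btrfs-image -c9 -t4 /dev/sdc /root/bu_A.img    # keep a metadata image for the developers
mount -o recovery /dev/sdc /mnt/bu_A           # then -o ro,recovery if that fails
btrfs-zero-log /dev/sdc                        # only if dmesg shows log tree errors
btrfsck -s 1 /dev/sdc                          # check against a backup superblock (-s 1, -s 2)
btrfs-select-super -s 1 /dev/sdc               # copy a good backup superblock into place
btrfsck --repair /dev/sdc                      # and, as the last resorts:
btrfsck --repair --init-extent-tree /dev/sdc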



 Note that in specific cases someone who knew what they were doing could 
 omit some steps and focus on others, but I'm not at that level of know 
 what I'm doing, so...
 
 Scrub... would go before this, if it's useful.  But scrub depends on a 
 second, valid copy being available in ordered to fix the bad-checksum 
 one.  On a single device btrfs, btrfs defaults to DUP metadata (unless 
 it's SSD), so you may have a second copy for that, but you won't have a 
 second copy of the data.  This is a very strong reason to go btrfs raid1 
 mode (for both data and metadata) if you can, because that gives you a 
 second copy of everything, thereby actually making use of btrfs' checksum 
 and scrub ability.  (Unfortunately, there is as yet no way to do N-way 
 mirroring, there's only the second copy not a third, no matter how many 
 devices you have in that raid1.)
 
 Finally, if you mentioned your kernel (and btrfs-tools) version(s) I 
 missed it, but [boilerplate recommendation, stressed repeatedly both in 
 the wiki and on-list] btrfs being still labeled experimental and under 
 serious development, there's still lots of bugs fixed every kernel 
 release.  So as Chris Murphy said, if you're not on 3.11-stable or 3.12-
 rcX already, get there.  Not only can the safety of your data depend on 
 it, but by choosing to run experimental we're all testers, and our 
 reports if something does go wrong will be far more usable if we're on a 
 current kernel.  Similarly, btrfs-tools 0.20-rc1 is already somewhat old; 
 you really should be on a git-snapshot beyond that.  (The master branch 
 is kept stable, work is done in other branches and only merged to master 
 when it's considered suitably stable, so a recently updated btrfs-tools 
 master HEAD is at least in theory always the best possible version you 
 can be running.  If that's ever NOT the case, then testers need to be 
 reporting that ASAP so it can be fixed, too.)
 
 Back to the kernel, it's worth noting that 3.12-rcX includes an option 
 that turns off most btrfs bugons by default.  Unless you're a btrfs 
 developer (which it doesn't sound like you are), you'll want to activate 
 that (turning off the bugons), as they're not helpful for ordinary users 
 and just force unnecessary reboots when something minor and otherwise 
 immediately recoverable goes wrong.  That's just one of the latest fixes.

Looking up what's available for Gentoo, the maintainers there look to be
nicely sharp with multiple versions available all the way up to kernel
3.11.2...

There's also the latest available from btrfs tools with
sys-fs/btrfs-progs ...

OK, so onto the cutting edge to compile them in...


Thanks all,
Martin





Re: Corrupt btrfs filesystem recovery... What best instructions?

2013-09-29 Thread Martin
On 29/09/13 22:29, Martin wrote:

 Looking up what's available for Gentoo, the maintainers there look to be
 nicely sharp with multiple versions available all the way up to kernel
 3.11.2...

That is being pulled in now as expected:

sys-kernel/gentoo-sources-3.11.2


 There's also the latest available from btrfs tools with
 sys-fs/btrfs-progs ...

Oddly, that caused emerge to report:

[ebuild UD ] sys-fs/btrfs-progs-0.19.11 [0.20_rc1_p358] 0 kB

which is a *downgrade*. Hence, I'm keeping with the 0.20_rc1_p358.


 OK, so onto the cutting edge to compile them in...

Interesting times as is said in a certain part of the world...
Martin





Re: Corrupt btrfs filesystem recovery... What best instructions?

2013-10-02 Thread Martin
So... The fix:


(

Summary:

Mounting -o recovery,noatime worked well and allowed a diff check to
complete for all but one directory tree. So very nearly all the data is
fine.

Deleting the failed directory tree caused a call stack dump and eventually:

kernel: parent transid verify failed on 915444822016 wanted 16974 found
13021
kernel: BTRFS info (device sdc): failed to delete reference to
eggdrop-1.6.19.ebuild, inode 2096893 parent 5881667
kernel: BTRFS error (device sdc) in __btrfs_unlink_inode:3662: errno=-5
IO failure
kernel: BTRFS info (device sdc): forced readonly


Greater detail listed below.

What next best to try?

Safer to try again but this time with no_space_cache,no_inode_cache?

Thanks,
Martin

)



On 29/09/13 22:29, Martin wrote:
 On 29/09/13 06:11, Duncan wrote:

 What does btrfs do (or can do) for recovery?

 Here's a general-case answer (courtesy gmane) to the order in which to 
 try recovery question, that Hugo posted a few weeks ago:

 http://permalink.gmane.org/gmane.comp.file-systems.btrfs/27999
 
 Thanks for that. Very well found!
 
 The instructions from Hugo are:
 
 
Let's assume that you don't have a physical device failure (which
 is a different set of tools -- mount -odegraded, btrfs dev del
 missing).
 
First thing to do is to take a btrfs-image -c9 -t4 of the
 filesystem, and keep a copy of the output to show josef. :)
 
Then start with -orecovery and -oro,recovery for pretty much
 anything.

For anyone following this, first a health warning:

If your data is in any way critical or important, then you should
already have a backup copy elsewhere. If not, best make a binary image
copy of your disk first!


OK... So with the latest kernel (3.11.2) and btrfs tools
(Btrfs v0.20-rc1-358-g194aa4a) and the sequence went:


mount -v -t btrfs -o recovery LABEL=bu_A /mnt/bu_A

(From syslog:)

kernel: device label bu_A devid 1 transid 17222 /dev/sdc
kernel: btrfs: enabling auto recovery
kernel: btrfs: disk space caching is enabled
kernel: btrfs: bdev /dev/sdc errs: wr 0, rd 27, flush 0, corrupt 0, gen 0

Running through a diff check for part of the backups, syslog reported:

kernel: btrfs read error corrected: ino 1 off 915433144320 (dev /dev/sdc
sector 1813661856)

Also, the HDD was showing quite a few write operations, so... Is noatime
set?... Oops... I didn't include ro either... So I killed the diff
check and remounted:

mount -v -t btrfs -o remount,recovery,noatime /mnt/bu_A
mount: /dev/sdc mounted on /mnt/bu_A

kernel: btrfs: enabling inode map caching
kernel: btrfs: enabling auto recovery
kernel: btrfs: disk space caching is enabled

And running the diff check again... Now zero writes to the HDD :-)


Various syslog messages were given:

kernel: parent transid verify failed on 907185135616 wanted 15935 found
12264
kernel: btrfs read error corrected: ino 1 off 907185135616 (dev /dev/sdc
sector 1781823824)
kernel: parent transid verify failed on 907185143808 wanted 15935 found
12264
kernel: btrfs read error corrected: ino 1 off 907185143808 (dev /dev/sdc
sector 1781823840)
kernel: parent transid verify failed on 907185139712 wanted 15935 found
12264
kernel: btrfs read error corrected: ino 1 off 907185139712 (dev /dev/sdc
sector 1781823832)
kernel: parent transid verify failed on 907185152000 wanted 15935 found
10903
kernel: btrfs read error corrected: ino 1 off 907185152000 (dev /dev/sdc
sector 1781823856)
kernel: parent transid verify failed on 907183783936 wanted 15935 found
12263
kernel: btrfs read error corrected: ino 1 off 907183783936 (dev /dev/sdc
sector 1781821184)
kernel: parent transid verify failed on 907183792128 wanted 15935 found
10903
kernel: btrfs read error corrected: ino 1 off 907183792128 (dev /dev/sdc
sector 1781821200)
kernel: parent transid verify failed on 907183796224 wanted 15935 found
12263
kernel: btrfs read error corrected: ino 1 off 907183796224 (dev /dev/sdc
sector 1781821208)
kernel: parent transid verify failed on 907183841280 wanted 15935 found
10903
kernel: btrfs read error corrected: ino 1 off 907183841280 (dev /dev/sdc
sector 1781821296)
kernel: parent transid verify failed on 907183878144 wanted 15935 found
12263
kernel: btrfs read error corrected: ino 1 off 907183878144 (dev /dev/sdc
sector 1781821368)
kernel: parent transid verify failed on 907183874048 wanted 15935 found
12263
kernel: btrfs read error corrected: ino 1 off 907183874048 (dev /dev/sdc
sector 1781821360)
kernel: verify_parent_transid: 25 callbacks suppressed
kernel: parent transid verify failed on 915431288832 wanted 16974 found
16972
kernel: repair_io_failure: 25 callbacks suppressed
kernel: btrfs read error corrected: ino 1 off 915431288832 (dev /dev/sdc
sector 1813658232)
kernel: parent transid verify failed on 915444523008 wanted 16974 found
13021
kernel: parent transid verify failed on 915444523008 wanted 16974 found
13021
[...]

One directory tree failed the diff checks, so I 'mv'-ed that one tree to
rename it out of the way and then ran an rm -Rf to remove it. That delete
is what triggered the call stack dump and forced read-only state
summarised above.

Re: Corrupt btrfs filesystem recovery... What best instructions?

2013-10-04 Thread Martin
What best to try next?

mount -o recovery,noatime

btrfsck:
--repairtry to repair the filesystem
--init-csum-treecreate a new CRC tree
--init-extent-tree  create a new extent tree

or is a scrub worthwhile?


The failure and switch to read-only occurred whilst trying to delete a known
bad directory tree. No worries about losing the data in that.

But how best to clean up the filesystem errors?


Thanks,
Martin




On 03/10/13 17:56, Martin wrote:
 On 03/10/13 01:49, Martin wrote:
 
 Summary:

 Mounting -o recovery,noatime worked well and allowed a diff check to
 complete for all but one directory tree. So very nearly all the data is
 fine.

 Deleting the failed directory tree caused a call stack dump and eventually:

 kernel: parent transid verify failed on 915444822016 wanted 16974 found
 13021
 kernel: BTRFS info (device sdc): failed to delete reference to
 eggdrop-1.6.19.ebuild, inode 2096893 parent 5881667
 kernel: BTRFS error (device sdc) in __btrfs_unlink_inode:3662: errno=-5
 IO failure
 kernel: BTRFS info (device sdc): forced readonly


 Greater detail listed below.

 What next best to try?

 Safer to try again but this time with no_space_cache,no_inode_cache?

 Thanks,
 Martin
 
 
 Next best step to try?

 Remount -o recovery,noatime again?
 
 
 In the meantime, trying:
 
 btrfsck /dev/sdc
 
 gave the following output + abort:
 
 parent transid verify failed on 915444523008 wanted 16974 found 13021
 Ignoring transid failure
 btrfsck: cmds-check.c:1066: process_file_extent: Assertion `!(rec->ino
 != key->objectid || rec->refs > 1)' failed.
 free space inode generation (0) did not match free space cache
 generation (1625)
 free space inode generation (0) did not match free space cache
 generation (1607)
 free space inode generation (0) did not match free space cache
 generation (1604)
 free space inode generation (0) did not match free space cache
 generation (1606)
 free space inode generation (0) did not match free space cache
 generation (1620)
 free space inode generation (0) did not match free space cache
 generation (1626)
 free space inode generation (0) did not match free space cache
 generation (1609)
 free space inode generation (0) did not match free space cache
 generation (1653)
 free space inode generation (0) did not match free space cache
 generation (1628)
 free space inode generation (0) did not match free space cache
 generation (1628)
 free space inode generation (0) did not match free space cache
 generation (1649)
 
 
 (There was no syslog output.)
 
 Full btrfsck listing attached.
 
 
 Suggestions please?
 
 Thanks,
 Martin





btrfs recovery: What do the commands actually do?

2013-10-04 Thread Martin
There's ad-hoc comment for various commands to recover from filesystem
errors.

But what do they actually do and when should what command be used?

(The wiki gives scant indication other than to 'blindly' try things...)


There's:

mount -o recovery,noatime

btrfsck:
--repairtry to repair the filesystem
--init-csum-treecreate a new CRC tree
--init-extent-tree  create a new extent tree

And there is scrub...


What do they do exactly and what are the indicators to try using them?

Or when should you 'give up' on a filesystem and just retrieve whatever
data can be read and start again?


All that lot sounds good for a wiki page ;-)

Thanks,
Martin




Re: btrfs recovery: What do the commands actually do?

2013-10-04 Thread Martin
On 04/10/13 19:32, Duncan wrote:
 Martin posted on Fri, 04 Oct 2013 16:47:19 +0100 as condensed:
 
 There's ad-hoc comment for various commands to recover from filesystem
 errors.

 But what do they actually do and when should what command be used?
 What do they do exactly and what are the indicators to try using them?
 Or when should you 'give up' on a filesystem and just retrieve whatever
 data can be read and start again?

 All that lot sounds good for a wiki page ;-)
 
 I recognize your name so you're a regular poster and may well have seen

Hail fellow Gentoo-er ;-)


This is a prod from the thread:

http://article.gmane.org/gmane.comp.file-systems.btrfs/28775


 this recover steps/order post from Hugo Mills, but you didn't mention it, 
 so...
 
 http://permalink.gmane.org/gmane.comp.file-systems.btrfs/27999
 
 As you suggest, that should really go in the wiki (maybe it's there 
 already since that post, I haven't actually checked recently, but your 
 post reads as if you looked and couldn't find a recovery list of this 
 nature), but I've not gotten around to creating an account for myself 
 there yet and committing it, and if no one else has either...
 
 But I do have it bookmarked for posting here, and for the day I do create 
 myself that wiki account, if no one else has gotten to it by then...
 
 And while that answers what and what order, it doesn't cover what the 
 commands actually do or why you'd /use/ that order, and that'd be very 
 good to add as well.

Yep.

I'm using this as a bit of a test case as to how best to recover from
whatever inevitable hiccups.

All the more important to gain a good understanding before doing similar
things to 16TB arrays...


Comment/advice welcomed (please).

Thanks,
Martin





Re: Corrupt btrfs filesystem recovery... What best instructions?

2013-10-05 Thread Martin
No comment so blindly trying:

btrfsck --repair /dev/sdc

gave the following abort:

btrfsck: extent-tree.c:2736: alloc_reserved_tree_block: Assertion
`!(ret)' failed.

Full output attached.


All on:

3.11.2-gentoo
Btrfs v0.20-rc1-358-g194aa4a

For a 2TB single HDD formatted with defaults.


What next?

Thanks,
Martin




 In the meantime, trying:

 btrfsck /dev/sdc

 gave the following output + abort:

 parent transid verify failed on 915444523008 wanted 16974 found 13021
 Ignoring transid failure
 btrfsck: cmds-check.c:1066: process_file_extent: Assertion `!(rec->ino
 != key->objectid || rec->refs > 1)' failed.
 free space inode generation (0) did not match free space cache
 generation (1625)
 free space inode generation (0) did not match free space cache
 generation (1607)
 free space inode generation (0) did not match free space cache
 generation (1604)
 free space inode generation (0) did not match free space cache
 generation (1606)
 free space inode generation (0) did not match free space cache
 generation (1620)
 free space inode generation (0) did not match free space cache
 generation (1626)
 free space inode generation (0) did not match free space cache
 generation (1609)
 free space inode generation (0) did not match free space cache
 generation (1653)
 free space inode generation (0) did not match free space cache
 generation (1628)
 free space inode generation (0) did not match free space cache
 generation (1628)
 free space inode generation (0) did not match free space cache
 generation (1649)


 (There was no syslog output.)

 Full btrfsck listing attached.


 Suggestions please?

 Thanks,
 Martin


checking extents
leaf parent key incorrect 907183771648
bad block 907183771648
leaf parent key incorrect 907183779840
bad block 907183779840
leaf parent key incorrect 907183882240
bad block 907183882240
leaf parent key incorrect 907185160192
bad block 907185160192
leaf parent key incorrect 907185201152
bad block 907185201152
leaf parent key incorrect 915432497152
bad block 915432497152
leaf parent key incorrect 915432509440
bad block 915432509440
leaf parent key incorrect 915432513536
bad block 915432513536
leaf parent key incorrect 915432529920
bad block 915432529920
leaf parent key incorrect 915432701952
bad block 915432701952
leaf parent key incorrect 915433058304
bad block 915433058304
leaf parent key incorrect 915437543424
bad block 915437543424
leaf parent key incorrect 915437563904
bad block 915437563904
leaf parent key incorrect 91569760
bad block 91569760
leaf parent key incorrect 91573856
bad block 91573856
leaf parent key incorrect 915444506624
bad block 915444506624
leaf parent key incorrect 915444518912
bad block 915444518912
leaf parent key incorrect 915444523008
bad block 915444523008
leaf parent key incorrect 915444527104
bad block 915444527104
leaf parent key incorrect 915444539392
bad block 915444539392
leaf parent key incorrect 915444543488
bad block 915444543488
leaf parent key incorrect 915444547584
bad block 915444547584
leaf parent key incorrect 915444551680
bad block 915444551680
leaf parent key incorrect 915444555776
bad block 915444555776
leaf parent key incorrect 915444559872
bad block 915444559872
leaf parent key incorrect 915444563968
bad block 915444563968
leaf parent key incorrect 915444572160
bad block 915444572160
leaf parent key incorrect 915444576256
bad block 915444576256
leaf parent key incorrect 915444580352
bad block 915444580352
leaf parent key incorrect 915444584448
bad block 915444584448
leaf parent key incorrect 915444588544
bad block 915444588544
leaf parent key incorrect 915444678656
bad block 915444678656
leaf parent key incorrect 915444682752
bad block 915444682752
leaf parent key incorrect 915444793344
bad block 915444793344
leaf parent key incorrect 915444797440
bad block 915444797440
leaf parent key incorrect 915444813824
bad block 915444813824
leaf parent key incorrect 915444817920
bad block 915444817920
leaf parent key incorrect 915444822016
bad block 915444822016
leaf parent key incorrect 915444826112
bad block 915444826112
leaf parent key incorrect 915444830208
bad block 915444830208
leaf parent key incorrect 915444834304
bad block 915444834304
leaf parent key incorrect 915444924416
bad block 915444924416
leaf parent key incorrect 915444973568
bad block 915444973568
leaf parent key incorrect 915444977664
bad block 915444977664
leaf parent key incorrect 915444981760
bad block 915444981760
parent transid verify failed on 915444973568 wanted 16974 found 13021
parent transid verify failed on 915444973568 wanted 16974 found 13021
parent transid verify failed on 915444977664 wanted 16974 found 13021
parent transid verify failed on 915444977664 wanted 16974 found 13021
parent transid verify failed on 915444981760 wanted 16974 found 13021
parent transid verify failed on 915444981760 wanted 16974 found 13021
parent transid verify failed on 915432701952 wanted 16974 found 16972
parent transid verify failed on 915432701952 wanted 16974 found 16972
parent transid verify

ASM1083 rev01 PCIe to PCI Bridge chip (Was: Corrupt btrfs filesystem recovery... (Due to *sata* errors))

2013-10-05 Thread Martin
On 28/09/13 20:26, Martin wrote:
 AMD
 E-450 APU with Radeon(tm) HD Graphics AuthenticAMD GNU/Linux

Just in case someone else stumbles across this thread due to a related
problem for my particular motherboard...


There appears to be a fatal hardware bug for the interrupt line deassert
for a PCIe to PCI Bridge chip:

ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge (rev 01)

See the thread on https://lkml.org/lkml/2012/1/30/216

For that chip, the interrupt line is not always deasserted for PCI
interrupts. The hardware fault appears to be fixed in ASM1083 rev 03.
Unfortunately, there is no useful OS workaround possible for rev 01.

Hence, the PCI interrupts are unusable for ASM1083 rev01 ? :-(


In brief, this means that the PCI card slots on the motherboard cannot
be used for any hardware that might generate an interrupt. That means
pretty much all normal PCI cards. (The PCIe card slots are fine.)

For my own example, there does not appear to be any other devices using
that bridge chip. The only concern is for the sound chip but I happen to
never use sound on that system and so that is disabled.


The problem is listed in syslog/dmesg by lines such as:

kernel: irq 16: nobody cared (try booting with the irqpoll option)
kernel: Disabling IRQ #16


Unfortunately, the HDDs and network interfaces also use that irq, or irq
17 (which can also be affected). Losing the irq will badly slow down
your system and can cause data corruption under heavy use of the HDD.



Use:
lspci | grep -i ASM1083

to see if you have that chip and if so, what revision.

To see if you have any irqpoll messages, use:
grep -ia irqpoll /var/log/messages

To list what devices use what interrupts, use either of:
grep -ia ' irq ' /var/log/messages
cat /proc/interrupts



Note that there should no longer be any ASM1083 rev01 chips being
supplied by now. (ASM1083 rev03 chips have been seen in products.)

Hope that helps for that bit of obscurity!
Martin



Re: Corrupt btrfs filesystem recovery... What best instructions?

2013-10-05 Thread Martin
So...

The hint there is btrfsck: extent-tree.c:2736, so trying:

btrfsck --repair --init-extent-tree /dev/sdc

That ran for a while until:

kernel: btrfsck[16610]: segfault at cc ip 0041d2a7 sp
7fffd2c2d710 error 4 in btrfsck[40+4d000]

There's no other messages in the syslog.

The output attached.


What next?


Thanks,
Martin



On 05/10/13 12:32, Martin wrote:
 No comment so blindly trying:
 
 btrfsck --repair /dev/sdc
 
 gave the following abort:
 
 btrfsck: extent-tree.c:2736: alloc_reserved_tree_block: Assertion
 `!(ret)' failed.
 
 Full output attached.
 
 
 All on:
 
 3.11.2-gentoo
 Btrfs v0.20-rc1-358-g194aa4a
 
 For a 2TB single HDD formatted with defaults.
 
 
 What next?
 
 Thanks,
 Martin
 
 
 
 
 In the meantime, trying:

 btrfsck /dev/sdc

 gave the following output + abort:

 parent transid verify failed on 915444523008 wanted 16974 found 13021
 Ignoring transid failure
 btrfsck: cmds-check.c:1066: process_file_extent: Assertion `!(rec->ino
 != key->objectid || rec->refs > 1)' failed.
 free space inode generation (0) did not match free space cache
 generation (1625)
 free space inode generation (0) did not match free space cache
 generation (1607)
 free space inode generation (0) did not match free space cache
 generation (1604)
 free space inode generation (0) did not match free space cache
 generation (1606)
 free space inode generation (0) did not match free space cache
 generation (1620)
 free space inode generation (0) did not match free space cache
 generation (1626)
 free space inode generation (0) did not match free space cache
 generation (1609)
 free space inode generation (0) did not match free space cache
 generation (1653)
 free space inode generation (0) did not match free space cache
 generation (1628)
 free space inode generation (0) did not match free space cache
 generation (1628)
 free space inode generation (0) did not match free space cache
 generation (1649)


 (There was no syslog output.)

 Full btrfsck listing attached.


 Suggestions please?

 Thanks,
 Martin


btrfs unable to find ref byte nr 912043257856 parent 0 root 1  owner 2 offset 0
btrfs unable to find ref byte nr 912043343872 parent 0 root 1  owner 1 offset 1
btrfs unable to find ref byte nr 912044331008 parent 0 root 1  owner 0 offset 1
btrfs unable to find ref byte nr 912043261952 parent 0 root 1  owner 1 offset 1
btrfs unable to find ref byte nr 912043266048 parent 0 root 1  owner 0 offset 1
checking extents
leaf parent key incorrect 907183771648
bad block 907183771648
leaf parent key incorrect 907183779840
bad block 907183779840
leaf parent key incorrect 907183882240
bad block 907183882240
leaf parent key incorrect 907185160192
bad block 907185160192
leaf parent key incorrect 907185201152
bad block 907185201152
leaf parent key incorrect 915432497152
bad block 915432497152
leaf parent key incorrect 915432509440
bad block 915432509440
leaf parent key incorrect 915432513536
bad block 915432513536
leaf parent key incorrect 915432529920
bad block 915432529920
leaf parent key incorrect 915433058304
bad block 915433058304
leaf parent key incorrect 915437543424
bad block 915437543424
leaf parent key incorrect 915437563904
bad block 915437563904
leaf parent key incorrect 91569760
bad block 91569760
leaf parent key incorrect 91573856
bad block 91573856
leaf parent key incorrect 915444506624
bad block 915444506624
leaf parent key incorrect 915444518912
bad block 915444518912
leaf parent key incorrect 915444523008
bad block 915444523008
leaf parent key incorrect 915444527104
bad block 915444527104
leaf parent key incorrect 915444539392
bad block 915444539392
leaf parent key incorrect 915444543488
bad block 915444543488
leaf parent key incorrect 915444547584
bad block 915444547584
leaf parent key incorrect 915444551680
bad block 915444551680
leaf parent key incorrect 915444555776
bad block 915444555776
leaf parent key incorrect 915444559872
bad block 915444559872
leaf parent key incorrect 915444563968
bad block 915444563968
leaf parent key incorrect 915444572160
bad block 915444572160
leaf parent key incorrect 915444576256
bad block 915444576256
leaf parent key incorrect 915444580352
bad block 915444580352
leaf parent key incorrect 915444584448
bad block 915444584448
leaf parent key incorrect 915444588544
bad block 915444588544
leaf parent key incorrect 915444793344
bad block 915444793344
leaf parent key incorrect 915444797440
bad block 915444797440
leaf parent key incorrect 915444813824
bad block 915444813824
leaf parent key incorrect 915444817920
bad block 915444817920
leaf parent key incorrect 915444822016
bad block 915444822016
leaf parent key incorrect 915444826112
bad block 915444826112
leaf parent key incorrect 915444830208
bad block 915444830208
leaf parent key incorrect 915444834304
bad block 915444834304
leaf parent key incorrect 915444924416
bad block 915444924416
ref mismatch on [12582912 8065024] extent item 0, found 1
btrfs unable to find ref byte nr 912014393344 parent 0 root 2  owner 0 offset 0
adding

btrfsck --repair --init-extent-tree: segfault error 4

2013-10-07 Thread Martin
Any clues or educated comment please?

Can the corrupt directory tree safely be ignored and left in place? Or
might that cause everything to fall over in a big heap as soon as I try
to write data again?


Could these other tricks work-around or fix the corrupt tree:

Run a scrub?

Make a snapshot and work from the snapshot?

Or try mount -o recovery,noatime again?


Or is it dead?

(The 1.5TB of backup data is replicated elsewhere but it would be good
to rescue this version rather than completely redo from scratch.
Especially so for the sake of just a few MBytes of one corrupt directory
tree.)
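
In command form, the scrub and snapshot options above would be roughly
the following, with the mount point as an example:

btrfs scrub start -B -d /mnt/bu_A              # -B stays in the foreground, -d per-device stats
btrfs subvolume snapshot /mnt/bu_A /mnt/bu_A/rescue

Both need the filesystem mounted, and on a single device scrub can only
repair metadata (where DUP gives it a second copy to work from).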

Thanks,
Martin



On 05/10/13 14:18, Martin wrote:
 So...
 
 The hint there is btrfsck: extent-tree.c:2736, so trying:
 
 btrfsck --repair --init-extent-tree /dev/sdc
 
 That ran for a while until:
 
 kernel: btrfsck[16610]: segfault at cc ip 0041d2a7 sp
 7fffd2c2d710 error 4 in btrfsck[40+4d000]
 
 There's no other messages in the syslog.
 
 The output attached.
 
 
 What next?
 
 
 Thanks,
 Martin
 
 
 
 On 05/10/13 12:32, Martin wrote:
 No comment so blindly trying:

 btrfsck --repair /dev/sdc

 gave the following abort:

 btrfsck: extent-tree.c:2736: alloc_reserved_tree_block: Assertion
 `!(ret)' failed.

 Full output attached.


 All on:

 3.11.2-gentoo
 Btrfs v0.20-rc1-358-g194aa4a

 For a 2TB single HDD formatted with defaults.


 What next?

 Thanks,
 Martin




 In the meantime, trying:

 btrfsck /dev/sdc

 gave the following output + abort:

 parent transid verify failed on 915444523008 wanted 16974 found 13021
 Ignoring transid failure
 btrfsck: cmds-check.c:1066: process_file_extent: Assertion `!(rec->ino
 != key->objectid || rec->refs > 1)' failed.
 free space inode generation (0) did not match free space cache
 generation (1625)
 free space inode generation (0) did not match free space cache
 generation (1607)
 free space inode generation (0) did not match free space cache
 generation (1604)
 free space inode generation (0) did not match free space cache
 generation (1606)
 free space inode generation (0) did not match free space cache
 generation (1620)
 free space inode generation (0) did not match free space cache
 generation (1626)
 free space inode generation (0) did not match free space cache
 generation (1609)
 free space inode generation (0) did not match free space cache
 generation (1653)
 free space inode generation (0) did not match free space cache
 generation (1628)
 free space inode generation (0) did not match free space cache
 generation (1628)
 free space inode generation (0) did not match free space cache
 generation (1649)


 (There was no syslog output.)

 Full btrfsck listing attached.


 Suggestions please?

 Thanks,
 Martin





Re: btrfsck --repair --init-extent-tree: segfault error 4

2013-10-09 Thread Martin
In summary:

Looks like minimal damage remains, yet I'm still getting Input/output
errors from btrfs, and btrfsck appears to have looped...

A diff check suggests the damage to be in one (heavily linked to) tree
of a few MBytes.

Would a scrub clear out the damaged trees?


Worth debugging?

Thanks,
Martin


Further detail:


On 07/10/13 20:03, Chris Murphy wrote:
 
 On Oct 7, 2013, at 8:56 AM, Martin m_bt...@ml1.co.uk wrote:
 
 
 Or try mount -o recovery,noatime again?
 
 Because of this: free space inode generation (0) did not match free
 space cache generation (1607)
 
 Try mount option clear_cache. You could then use iotop to make sure
 the btrfs-freespace process becomes inactive before unmounting the
 file system; I don't think you need to wait in order to use the file
 system, nor do you need to unmount then remount without the option.
 But if it works, it should only be needed once, not as a persistent
 mount option.

Thanks for that.

So, trying:

mount -v -t btrfs -o recovery,noatime,clear_cache /dev/sdc

gave:

kernel: device label bu_A devid 1 transid 17448 /dev/sdc
kernel: btrfs: enabling inode map caching
kernel: btrfs: enabling auto recovery
kernel: btrfs: force clearing of disk cache
kernel: btrfs: disk space caching is enabled
kernel: btrfs: bdev /dev/sdc errs: wr 0, rd 27, flush 0, corrupt 0, gen 0


btrfs-freespace appeared briefly now and then in atop, but there was no
noticeable disk activity. All done very rapidly?

Running a diff check to see if all ok and what might be missing gave the
syslog output:

kernel: verify_parent_transid: 165 callbacks suppressed
kernel: parent transid verify failed on 915444506624 wanted 16974 found
13021
kernel: parent transid verify failed on 915444506624 wanted 16974 found
13021
kernel: parent transid verify failed on 915444506624 wanted 16974 found
13021
kernel: parent transid verify failed on 915444506624 wanted 16974 found
13021
kernel: parent transid verify failed on 915444506624 wanted 16974 found
13021
kernel: parent transid verify failed on 915444506624 wanted 16974 found
13021


The diff eventually failed with Input/output error.

'mv' to move this failed directory tree out of the way worked.
Attempting to use 'ln -s' gave the attached syslog output and the
filesystem was made Read-only.

Remounting:

mount -v -o remount,recovery,noatime,clear_cache,rw /dev/sdc

and the mv looks fine. Trying the 'ln -s' again gives:

ln: creating symbolic link `./portage': Read-only file system

unmounting gave the syslog message:

kernel: btrfs: commit super ret -30


Mounting again:

mount -v -t btrfs -o recovery,noatime,clear_cache /dev/sdc

showed that the symbolic link was put in place ok.

Rerunning the diff check eventually found another Input/output error.


So unmounted and tried again:

btrfsck --repair --init-extent-tree /dev/sdc

Failed with:

btrfs unable to find ref byte nr 911367733248 parent 0 root 1  owner 2
offset 0
btrfs unable to find ref byte nr 911367737344 parent 0 root 1  owner 1
offset 1
btrfs unable to find ref byte nr 911367741440 parent 0 root 1  owner 0
offset 1
leaf free space ret -297791851, leaf data size 3995, used 297795846
nritems 2
checking extents
btrfsck: extent_io.c:606: free_extent_buffer: Assertion `!(eb->refs < 0)' failed.
enabling repair mode
Checking filesystem on /dev/sdc
UUID: 38a60270-f9c6-4ed4-8421-4bf1253ae0b3
Creating a new extent tree
Failed to find [911367733248, 168, 4096]
Failed to find [911367737344, 168, 4096]
Failed to find [911367741440, 168, 4096]



Rerunning again, and this time btrfsck has sat at 100% CPU for the
last 24 hours. The full output so far is:

parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
Ignoring transid failure


Nothing in syslog and no disk activity.

Looped?...




 Or is it dead?
 
 (The 1.5TB of backup data is replicated elsewhere but it would be
 good to rescue this version rather than completely redo from
 scratch. Especially so for the sake of just a few MBytes of one
 corrupt directory tree.)
 
 Right. If you snapshot the subvolume containing the corrupt portion
 of the file system, the snapshot probably inherits that corruption.
 But if you write to only one of them, if those writes make the
 problem worse, should be isolated only to the one you write to. I
 might avoid writing to it, honestly. To save time, get increasingly
 aggressive to get data out of this directory and once you succeed,
 blow away the file system and start from scratch.
 
 You could also then try kernel 3.12 rc4, as there are some btrfs bug
 fixes I'm seeing in there also, but I don't know if any of them will
 help your case. If you try it, mount normally, then try to get your
 data. If that doesn't work, try the recovery option. Maybe you'll get
 different results.

As suspected

Re: Btrfs and raid5 status with kernel 3.14, documentation, and howto

2014-03-24 Thread Martin
On 23/03/14 22:56, Marc MERLIN wrote:
 Ok, thanks to the help I got from you, and my own experiments, I've
 written this:
 http://marc.merlins.org/perso/btrfs/post_2014-03-23_Btrfs-Raid5-Status.html
 
 If someone reminds me how to edit the btrfs wiki, I'm happy to copy that
 there, or give anyone permission to take part of all of what I wrote 
 and use it for any purpose.
 
 
 
 The highlights are if you're coming from the mdadm raid5 world:
[---]
 
 Hope this helps,
 Marc

Thanks for the very good summary.

So... In very brief summary, btrfs raid5 is very much a work in progress.


Question: Is the raid5 going to be seamlessly part of the
error-correcting raids whereby raid5, raid6,
raid-with-n-redundant-drives are all coded as one configurable raid?

Also (second question): What happened to the raid naming scheme that
better described the btrfs-style of raid by explicitly numbering the
number of devices used for mirroring, striping, and error-correction?


Thanks,
Martin




Suggestion: Anti-fragmentation safety catch (RFC)

2014-03-24 Thread Martin
Just an idea:


btrfs Problem:

I've had two systems die with huge load factors (over 100!) in cases
where a user program had, unexpectedly to me, been doing 'database'-like
operations and caused multiple files to become heavily fragmented. The
system eventually dies when data cannot be added to the fragmented files
as fast as it is collected in real time.

My example case is for two systems with btrfs raid1 using two HDDs each.
Normal write speed is about 100MByte/s. After heavy fragmentation, the
cpus are at 100% wait and i/o is a few hundred kByte/s.


Possible fix:

btrfs checks the ratio of filesize versus number of fragments and for a
bad ratio either:

1: Performs a non-cow copy to defragment the file;

2: Turns off cow for that file and gives a syslog warning for that;

3: Automatically defragments the file.



Or?


For my case, I'm not sure option 2 is a good idea, in case the user is
rattling through a gazillion files and the syslog gets swamped.

Unfortunately, I don't know beforehand what files to mark no-cow unless
I no-cow the entire user/applications.
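
As a userspace approximation of option 3, something along these lines
could run from cron until there is an in-kernel answer (the path and the
32-extent threshold are made-up examples):

#!/bin/sh
# Defragment any data file that has grown too many extents.
for f in /srv/logger/*.log; do
    extents=$(filefrag "$f" | awk '{print $(NF-2)}')   # "file: N extents found"
    if [ "$extents" -gt 32 ]; then
        btrfs filesystem defragment "$f"               # rewrite the file contiguously
    fi
done

(Option 2 would be closer to 'chattr +C' on the directory, but that only
takes effect for files created after the flag is set.)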


Thoughts?


Thanks,
Martin



Re: Suggestion: Anti-fragmentation safety catch (RFC)

2014-03-24 Thread Martin
On 24/03/14 20:19, Duncan wrote:
 Martin posted on Mon, 24 Mar 2014 19:47:34 + as excerpted:
 
 Possible fix:

 btrfs checks the ratio of filesize versus number of fragments and for a
 bad ratio either: [...]
 
 3: Automatically defragments the file.
 
 See the autodefrag mount option.
 
 =:^)

Thanks for that!

So...

https://btrfs.wiki.kernel.org/index.php/Mount_options

autodefrag (since [kernel] 3.0)

Will detect random writes into existing files and kick off background
defragging. It is well suited to bdb or sqlite databases, but not
virtualization images or big databases (yet). Once the developers make
sure it doesn't defrag files over and over again, they'll move this
toward the default.


Looks like I might be a good test case :-)
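
If I read that right, trying it on the affected filesystem should just be
a remount away (the mount point is an example):

mount -o remount,autodefrag /srv/logger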


What's the problem for big images or big databases? What is considered
big?

Thanks,
Martin



Re: Btrfs and raid5 status with kernel 3.14, documentation, and howto

2014-03-24 Thread Martin
On 24/03/14 21:52, Marc MERLIN wrote:
 On Mon, Mar 24, 2014 at 07:17:12PM +, Martin wrote:
 Thanks for the very good summary.

 So... In very brief summary, btrfs raid5 is very much a work in progress.
 
 If you know how to use it, which I didn't know do now, it's technically very
 usable as is. The corner cases are in having a failing drive which you can't
 hot remove because you can't write to it.
 It's unfortunate that you can't just kill a drive without umounting,
 making the drive disappear so that btrfs can't see it (dmsetup remove
 cryptname for me, so it's easy to do remotely), and remounting in degraded
 mode.

Yes, looking good, but for my usage I need the option to run ok with a
failed drive. So, that's one to keep a development eye on for continued
progress...


 Question: Is the raid5 going to be seamlessly part of the
 error-correcting raids whereby raid5, raid6,
 raid-with-n-redundant-drives are all coded as one configurable raid?
 
 I'm not sure I parse your question. As far as btrfs is concerned you can
 switch from non raid to raid5 to raid6 by adding a drive and rebalancing
 which effectively reads and re-writes all the blocks in the new format.
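
(For anyone else following along, I take it that route is roughly the
following, with an example device and mount point:

btrfs device add /dev/sde /mnt/array
btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt/array

with every block re-read and re-written during the balance.)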

There was a big thread a short while ago about using parity across
n devices, where the parity is spread such that you can have 1, 2, and up
to 6 redundant devices. That goes well beyond just raid5 and raid6:

http://lwn.net/Articles/579034/


 Also (second question): What happened to the raid naming scheme that
 better described the btrfs-style of raid by explicitly numbering the
 number of devices used for mirroring, striping, and error-correction?
 
 btrfs fi show kind of tells you that if you know how to read it (I didn't
 initially). What's missing for you?

btrfs raid1 at present always means just two copies of data spread
across whatever number of disks you have. A more flexible arrangement
would be to be able to specify, say, 3 copies of data across, say, 4 disks.
There's a new naming scheme proposed somewhere that enumerates all the
permutations possible for numbers of devices, copies and parity that
btrfs can support. For me, that is a 'killer' feature beyond what can be
done with md-raid for example.


Regards,
Martin





Re: Can anyone boot a system using btrfs root with linux 3.14 or newer? - RESOLVED

2014-04-27 Thread Martin
On 27/04/14 13:00, Пламен Петров wrote:
 The problem reported in this thread has been RESOLVED.
 
 It's not BTRFS's fault.
 
 Debugging on my part led to the actual problem in do_mounts.c - some
 filesystems mount routines return error codes other than 0, EACCES
 and EINVAL and such return codes result in the kernel panicking
 without trying to mount root with all of the available filesystems.
 
 Patch is available as attachment to bug 74901 -
 https://bugzilla.kernel.org/show_bug.cgi?id=74901 . The bugentry
 documents how I managed to find the problem.

Well deduced and that looks to be a good natural clean fix.

My only question is: what was the original intent in deliberately failing
if something other than EACCES or EINVAL was reported?


 Also, the patch has been sent to the linux kernel mailing list - see
 http://news.gmane.org/find-root.php?group=gmane.linux.kernelarticle=1691881
 Hopefully, it will find its way into the kernel, and later on - in
 stable releases.

That all looks very good and very thorough.


 Thanks to you all! -- Plamen Petrov

Thanks to you for chasing it through!

AND for posting the Resolved to let everyone know. :-)


Regards,
Martin






Re: ditto blocks on ZFS

2014-05-17 Thread Martin
On 16/05/14 04:07, Russell Coker wrote:
 https://blogs.oracle.com/bill/entry/ditto_blocks_the_amazing_tape
 
 Probably most of you already know about this, but for those of you who 
 haven't 
 the above describes ZFS ditto blocks which is a good feature we need on 
 BTRFS.  The briefest summary is that on top of the RAID redundancy there...
[... are additional copies of metadata ...]


Is that idea not already implemented, in effect, in btrfs by the way the
superblocks are replicated multiple times, with ever more copies kept on
ever larger storage devices?

The one exception is for SSDs whereby there is the excuse that you
cannot know whether your data is usefully replicated across different
erase blocks on a single device, and SSDs are not 'that big' anyhow.


So... Your idea of replicating metadata multiple times in proportion to
assumed 'importance' or 'extent of impact if lost' is an interesting
approach. However, is that appropriate and useful considering the real
world failure mechanisms that are to be guarded against?

Do you see or measure any real advantage?


Regards,
Martin



Re: ditto blocks on ZFS

2014-05-19 Thread Martin
On 18/05/14 17:09, Russell Coker wrote:
 On Sat, 17 May 2014 13:50:52 Martin wrote:
[...]
 Do you see or measure any real advantage?
 
 Imagine that you have a RAID-1 array where both disks get ~14,000 read 
 errors.  
 This could happen due to a design defect common to drives of a particular 
 model or some shared environmental problem.  Most errors would be corrected 
 by 
 RAID-1 but there would be a risk of some data being lost due to both copies 
 being corrupt.  Another possibility is that one disk could entirely die 
 (although total disk death seems rare nowadays) and the other could have 
 corruption.  If metadata was duplicated in addition to being on both disks 
 then the probability of data loss would be reduced.
 
 Another issue is the case where all drive slots are filled with active drives 
 (a very common configuration).  To replace a disk you have to physically 
 remove the old disk before adding the new one.  If the array is a RAID-1 or 
 RAID-5 then ANY error during reconstruction loses data.  Using dup for 
 metadata on top of the RAID protections (IE the ZFS ditto idea) means that 
 case doesn't lose you data.

Your example there is for the case where, in effect, there is no RAID. How
is that case any better than what btrfs already does by duplicating
metadata?



So...


What real-world failure modes do the ditto blocks usefully protect against?

And how does that compare for failure rates and against what is already
done?


For example, we have RAID1 and RAID5 to protect against any one RAID
chunk being corrupted or for the total loss of any one device.

There is a second part to that in that another failure cannot be
tolerated until the RAID is remade.


Hence, we have RAID6 that protects against any two failures for a chunk
or device. Hence with just one failure, you can tolerate a second
failure whilst rebuilding the RAID.


And then we supposedly have safety-by-design where the filesystem itself
is using a journal and barriers/sync to ensure that the filesystem is
always kept in a consistent state, even after an interruption to any writes.


*What other failure modes* should we guard against?


There has been mention of fixing metadata keys from single bit flips...

Should hamming codes be used instead of a crc so that we can have
multiple bit error detect, single bit error correct functionality for
all data both in RAM and on disk for those systems that do not use ECC RAM?

Would that be useful?...


Regards,
Martin



Re: ditto blocks on ZFS

2014-05-21 Thread Martin
Very good comment from Ashford.


Sorry, but I see no advantages from Russell's replies other than for a
feel-good factor or a dangerous false sense of security. At best,
there is a weak justification that for metadata, again going from 2% to
4% isn't going to be a great problem (storage is cheap and fast).

I thought an important idea behind btrfs was to avoid, by design and in
the first place, the very long and vulnerable RAID rebuild scenarios
suffered by block-level RAID...


On 21/05/14 03:51, Russell Coker wrote:
 Absolutely. Hopefully this discussion will inspire the developers to
 consider this an interesting technical challenge and a feature that
 is needed to beat ZFS.

Sorry, but I think that is completely the wrong reasoning. ...Unless
that is you are some proprietary sales droid hyping features and big
numbers! :-P


Personally I'm not convinced we gain anything beyond what btrfs will
eventually offer in any case for the n-way raid or the raid-n Cauchy stuff.

Also note that, usually, data is expected to be 100% reliable and
retrievable, or, failing that, you go to your backups instead. Gambling
on proportions and importance rather than *ensuring* fault/error
tolerance is a very human thing... ;-)


Sorry:

Interesting idea but not convinced there's any advantage for disk/SSD
storage.


Regards,
Martin






Re: Btrfs filesystem freezing during snapshots

2014-05-26 Thread Martin
On 26/05/14 13:28, David Bloquel wrote:
 Hi,
 
 I have a problem with my btrfs filesystem which is freezing when I am
 doing snapshots.
 
 I have a cron that is snapshoting around 70 sub volume every ten
 minutes. The sub volumes that btrfs is snapshoting are containers
 folders that are running through my virtual environment.
 Sub directories that btrfs is snapshoting are not that big (from 500MB
 to 10GB max and usually around 3GB) but there is a lot of IO on the
 filesystem because of the intensive use of the CTs and VMs.
 
 At some point the snapshot process becomes really slow, at first it
 snapshot around one folder per seconds but then after a while it can
 take 30seconds or even few minutes to snapshot one single sub volumes.
 Subvolumes are really similar to each other in size and number of
 files so there is no reason that it takes 1second for one sub volume
 and then 3minutes for another one.
 
 Moreover when my snapshot cron is running all my vms and containers
 are slowing down until the whole filesystem freezes which leads to
 frozen CT and VMs (which is a real problem for me).
 
 Moreover I can see that my CPU load is really high during the process.
 
 when I'm am looking to dmesg there is a lot of messages of this kind:
 
 [96537.686467] BTRFS debug (device drbd0): unlinked 290 orphans
[...]

That looks to be running on top of drbd which will add a network write
overhead (unless you are dangerously running asynchronously!). Hence you
will see IO speed related limits a little sooner...

However, I will guess that your primary problem is likely due to
accumulating fragmentation due to adding ever more snapshots every 10
mins for the VMs/containers.


There are other people far more practised here than I, but some guesses
to try are:


Use nocow for the VM images (and container images), as sketched after
this list;

Try using the btrfs auto defrag (beware your IO speed limit vs file size
to be defragged);

Avoid accumulating too many versions of any one snapshot.
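
A possible sketch for the nocow suggestion (the libvirt path is just an
example; chattr +C only takes effect for files created after the flag is
set, so it goes on an empty directory and the images are copied back in):

mkdir /var/lib/libvirt/images.nocow
chattr +C /var/lib/libvirt/images.nocow
cp --reflink=never /var/lib/libvirt/images/*.img /var/lib/libvirt/images.nocow/
# then point the VM/container definitions at the new directory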


Note also the experimental status for btrfs... I'm sure you will have
noticed the previous race problems for deleting snapshots.

Aside: I've held off from using kernel 3.12 and 3.13 due to curious
happenings on my test system. kernel 3.14.4 is behaving well so far.


Hope that gives a few clues.

Good luck,
Martin




Re: What to do about snapshot-aware defrag

2014-06-03 Thread Martin
On 02/06/14 14:22, Josef Bacik wrote:
 On 05/30/2014 06:00 PM, Martin wrote:
 OK... I'll jump in...

 On 30/05/14 21:43, Josef Bacik wrote:
 Hello,

 TL;DR: I want to only do snapshot-aware defrag on inodes in snapshots
 that haven't changed since the snapshot was taken.  Yay or nay (with a
 reason why for nay)

 [...]

 === Summary and what I need ===

 Option 1: Only relink inodes that haven't changed since the snapshot was
 taken.
[...]
 Obvious way to go for fast KISS.


 One question:

 Will option one mean that we always need to mount with noatime or
 read-only to allow snapshot defragging to do anything?

 
 Yeah atime would screw this up, I hadn't thought of that.  With that
 being the case I think the only option is to keep the old behavior, we
 don't want to screw up stuff like this just because users used a backup
 program on their snapshot and didn't use noatime.  Thanks,

Not so fast into non-KISS!


The *ONLY* application that I know of that uses atime is Mutt and then
*only* for mbox files!...

NOTHING else uses atime as far as I know.

We already have most distros enabling relatime by default as a
just-in-case...


Can we not have noatime as the default for btrfs? And widely note that
default, and the reason for it, in the man page and wiki?...

*And go KISS and move on faster* better?


Myself, I still use Mutt sometimes, but no mbox, and all my filesystems
have been noatime for many years now with good positive results. (Both
home and work servers.)
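
That is, nothing more exotic than an fstab line along these lines (label
and mount point are examples):

LABEL=backups   /mnt/backups   btrfs   noatime   0 0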

Regards,
Martin





Re: What to do about snapshot-aware defrag

2014-06-04 Thread Martin
On 04/06/14 10:19, Erkki Seppala wrote:
 Martin m_bt...@ml1.co.uk writes:
 
 The *ONLY* application that I know of that uses atime is Mutt and then
 *only* for mbox files!...
 
 However, users, such as myself :), can be interested in when a certain
 file has been last accessed. With snapshots I can even get an idea of
 all the times the file has been accessed.
 
 *And go KISS and move on faster* better?
 
 Well, it in uncertain to me if it truly is better that btrfs would after
 that point no longer truly even support atime, if using it results in
 blowing up snapshot sizes. They might at that point even consider just
 using LVM2 snapshots (shudder) ;).

Not quite... My emphasis is:


1:

Go KISS for the defrag and accept that any atime use will render the
defrag ineffective. Give a note that the noatime mount option should be
used.


2:

Consider using noatime as a /default/, since there are no known
'must-use' use cases. Those users still wanting atime can add it as a
mount option, with the note that atime use reduces the snapshot-defrag
effectiveness.


(The for/against atime is a good subject for another thread!)


Go fast KISS!

Regards,
Martin





Re: [systemd-devel] Slow startup of systemd-journal on BTRFS

2014-06-16 Thread Martin
On 16/06/14 17:05, Josef Bacik wrote:
 
 On 06/16/2014 03:14 AM, Lennart Poettering wrote:
 On Mon, 16.06.14 10:17, Russell Coker (russ...@coker.com.au) wrote:

 I am not really following though why this trips up btrfs though. I am
 not sure I understand why this breaks btrfs COW behaviour. I mean,

 I don't believe that fallocate() makes any difference to
 fragmentation on
 BTRFS.  Blocks will be allocated when writes occur so regardless of an
 fallocate() call the usage pattern in systemd-journald will cause
 fragmentation.

 journald's write pattern looks something like this: append something to
 the end, make sure it is written, then update a few offsets stored at
 the beginning of the file to point to the newly appended data. This is
 of course not easy to handle for COW file systems. But then again, it's
 probably not too different from access patterns of other database or
 database-like engines...

Even though this appears to be a problem case for btrfs/COW, is there a
more favourable write/access sequence, easily implemented, that works
well for both ext4-like fs /and/ COW fs?

Database-like writing is known to be 'difficult' for filesystems: can a
data log be a simpler case?


 Was waiting for you to show up before I said anything since most systemd
 related emails always devolve into how evil you are rather than what is
 actually happening.

Ouch! Hope you two know each other!! :-P :-)


[...]
 since we shouldn't be fragmenting this badly.
 
 Like I said what you guys are doing is fine, if btrfs falls on it's face
 then its not your fault.  I'd just like an exact idea of when you guys
 are fsync'ing so I can replicate in a smaller way.  Thanks,

Good if COW can be so resilient. I have about 2GBytes of data logging
files and I must defrag those as part of my backups to stop the system
fragmenting to a crawl (I use cp -a to copy the files to a new area,
which defragments them, and restart the data logging software on that).
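For anyone wanting to try the same workaround, a rough sketch with
made-up paths:

filefrag /data/logger/*.log            # see how bad the extent counts are
cp -a /data/logger /data/logger.new    # a fresh copy comes out largely defragmented
mv /data/logger /data/logger.old
mv /data/logger.new /data/logger       # restart the logger on the new copy

filefrag is only there to check before/after; the cp -a does the actual
work, assuming there is enough contiguous free space for the new copy.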


Random thoughts:

Would using a second small file just for the mmap-ed pointers help avoid
repeated rewriting of random offsets in the log file causing excessive
fragmentation?

Align the data writes to 16kByte or 64kByte boundaries/chunks?

Are mmap-ed files a similar problem to using a swap file and so should
the same btrfs file swap code be used for both?


Not looked over the code so all random guesses...

Regards,
Martin






btrfs support for efficient SSD operation (data blocks alignment)

2012-02-08 Thread Martin
My understanding is that for x86 architecture systems, btrfs only allows
a sector size of 4kB for a HDD/SSD. That is fine for the present HDDs
assuming the partitions are aligned to a 4kB boundary for that device.

However for SSDs...

I'm using for example a 60GByte SSD that has:

8kB page size;
16kB logical to physical mapping chunk size;
2MB erase block size;
64MB cache.

And the sector size reported to Linux 3.0 is the default 512 bytes!


My first thought is to try formatting with a sector size of 16kB to
align with the SSD logical mapping chunk size. This is to avoid SSD
write amplification. Also, the data transfer performance for that device
is near maximum for writes with a blocksize of 16kB and above. Yet,
btrfs supports a 4kByte page/sector size only at present...
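So far the only alignment control I have found is at partitioning time,
for example aligning the partition start to the 2MB erase block (sizes
and device name purely illustrative):

parted -a optimal /dev/sdX mklabel gpt
parted -a optimal /dev/sdX mkpart primary 2MiB 100%
parted /dev/sdX align-check optimal 1

That only aligns the *start* of the filesystem; it says nothing about
the granularity btrfs itself then uses, which is what the questions
below are about.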


Is there any control possible over the btrfs filesystem structure to map
metadata and data structures to the underlying device boundaries?

For example to maximise performance, can the data chunks and the data
chunk size be aligned to be sympathetic to the SSD logical mapping chunk
size and the erase block size?

What features other than the trim function does btrfs employ to optimise
for SSD operation?


Regards,
Martin




Re: btrfs support for efficient SSD operation (data blocks alignment)

2012-02-09 Thread Martin
On 09/02/12 01:42, Liu Bo wrote:
 On 02/09/2012 03:24 AM, Martin wrote:

[ No problem for 4kByte sector HDDs. However, for SSDs... ]

 However for SSDs...

 I'm using for example a 60GByte SSD that has:

 8kB page size;
 16kB logical to physical mapping chunk size;
 2MB erase block size;
 64MB cache.

 And the sector size reported to Linux 3.0 is the default 512 bytes!
[...]
 Is there any control possible over the btrfs filesystem structure to map
 metadata and data structures to the underlying device boundaries?

 For example to maximise performance, can the data chunks and the data
 chunk size be aligned to be sympathetic to the SSD logical mapping chunk
 size and the erase block size?

 
 The metadata buffer size will support size larger than 4K at least, it is on 
 development.

And also for the data? Also pack smaller data chunks in with the
metadata as is done already but with all the present parameters
proportioned according to the sector size?

(For my example, the filesystem may as well use 16kByte sectors because
the SSD firmware will do a read-modify-write for anything smaller.)


 What features other than the trim function does btrfs employ to optimise
 for SSD operation?

 
 e.g COW(avoid writing to one place multi-times),
 delayed allocation(intend to reduce the write frequency)

I'm using ext4 on a SSD web server and have formatted with (for ext4):

mke2fs -v -T ext4 -L fs_label_name -b 4096 \
  -E stride=4,stripe-width=4,lazy_itable_init=0 \
  -O none,dir_index,extent,filetype,flex_bg,has_journal,sparse_super,uninit_bg \
  /dev/sdX

and mounted with the mount options:
journal_checksum,barrier,stripe=4,delalloc,commit=300,max_batch_time=15000,min_batch_time=200,discard,noatime,nouser_xattr,noacl,errors=remount-ro

The main bits for the SSD are the:
stripe=4,delalloc,commit=300,max_batch_time=15000,min_batch_time=200,discard,noatime

The -b 4096 is the maximum value allowed. The stride and stripe-width
then take that up to 16kBytes (hopefully...).

(Make sure you're on a good UPS with a reliable shutdown mechanism for
power fail!)
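For comparison, my first guess at a btrfs near-equivalent would only be
(a sketch, not a tested recommendation, and the values are assumptions
on my part):

mkfs.btrfs -L fs_label_name -l 16k -n 16k /dev/sdX
mount -o ssd,noatime,discard /dev/sdX /mnt/point

which is exactly why I am asking what the data (not just metadata)
alignment ends up being.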


A further thought is:

For my one SSD example, the erase state appears to be all 0xFF... Can
the fs easily check the erase state value and leave any blank space
unchanged to minimise the bit flipping?

Reasonable to be included?


All unnecessary for HDDs but possibly of use for maintaining the
lifespan of SSDs...

Hope of interest,

Regards,
Martin




Re: btrfs support for efficient SSD operation (data blocks alignment)

2012-05-01 Thread Martin
Looking at this again from some time ago...

Brief summary:

There is a LOT of nefarious cleverness being attempted by SSD
manufacturers to accommodate a 4kByte block size. Get that wrong, or
just be unsympathetic to that 'cleverness', and you suffer performance
degradation and/or premature device wear.

Is that significant? Very likely it will be for the new three-bit FLASH
devices that have a PE (program-erase) lifespan of only 1000 or so
cycles per cell.

A better question is whether the filesystem can be easily made to be
more sympathetic to all SSDs?


From my investigating, there appears to be a sweet spot for performance
for writing (aligned) 16kByte blocks.

TRIM and keeping the device non-full also helps greatly.

I suspect that consecutive writes, as with HDDs, also help performance,
though to a lesser degree.


The erased state for SSDs appears to be either all 0xFF or all 0x00
(I've got examples of both). Can that be automatically detected and used
by btrfs so as to minimise write cycling the bits for (unused) padded areas?

Are 16kByte blocks/sectors useful to btrfs?

Or rather, can btrfs usefully use 16kByte blocks?

Can that be supported?



Further detail...

Some good comments:

On 10/02/12 18:18, Martin Steigerwald wrote:
 Hi Martin,
 
 Am Mittwoch, 8. Februar 2012 schrieb Martin:
 My understanding is that for x86 architecture systems, btrfs only
 allows a sector size of 4kB for a HDD/SSD. That is fine for the
 present HDDs assuming the partitions are aligned to a 4kB boundary for
 that device.

 However for SSDs...

 I'm using for example a 60GByte SSD that has:

 8kB page size;
 16kB logical to physical mapping chunk size;
 2MB erase block size;
 64MB cache.

 And the sector size reported to Linux 3.0 is the default 512 bytes!


 My first thought is to try formatting with a sector size of 16kB to
 align with the SSD logical mapping chunk size. This is to avoid SSD
 write amplification. Also, the data transfer performance for that
 device is near maximum for writes with a blocksize of 16kB and above.
 Yet, btrfs supports a 4kByte page/sector size only at present...
 
 Thing is as far as I know the better SSDs and even the dumber ones have 
 quite some intelligence in the firmware. And at least for me its not clear 
 what the firmware of my Intel SSD 320 all does on its own and whether any 
 of my optimization attempts even matter.

[...]

 The article on write amplication on wikipedia gives me a glimpse of the 
 complexity involved¹. Yes, I set stripe-width as well on my Ext4 
 filesystem, but frankly said I am not even sure whether this has any 
 positive effect except of maybe sparing the SSD controller firmware some 
 reshuffling work.
 
 So from my current point of view most of what you wrote IMHO is more 
 important for really dumb flash. ...

[...]

 grade SSDs just provide a SATA interface and hide the internals. So an 
 optimization for one kind or one brand of SSDs may not be suitable for 
 another one.
 
 There are PCI express models but these probably aren´t dumb either. And 
 then there is the idea of auto commit memory (ACM) by Fusion-IO which just 
 makes a part of the virtual address space persistent.
 
 So its a question on where to put the intelligence. For current SSDs is 
 seems the intelligence is really near the storage medium and then IMHO it 
 makes sense to even reduce the intelligence on the Linux side.
 
 [1] http://en.wikipedia.org/wiki/Write_amplification


As an engineer, I have a deep mistrust of the phrases "Trust me",
"Magic", "Proprietary, secret", or "Proprietary, keep out!".

Anand at Anandtech has produced some good articles on some of what goes
on inside SSDs and some of the consequences. If you want a good long read:

The SSD Relapse: Understanding and Choosing the Best SSD
http://www.anandtech.com/print/2829

Covers block allocation and write amplification and the effect of free
space on the write amplification factor.


... The Fastest MLC SSD We've Ever Tested
http://www.anandtech.com/print/2899

Details the Sandforce controller at that time and its use of data
compression on the controller. The latest Sandforce controllers also
utilise data deduplication on the SSD!


OCZ Agility 3 (240GB) Review
http://www.anandtech.com/print/4346

Shows an example set of Performance vs Transfer Size graphs.


Flashy fists fly as OCZ and DDRdrive row over SSD performance
http://www.theregister.co.uk/2011/01/14/ocz_and_ddrdrive_performance_row/

Shows an old and unfair comparison highlighting SSD performance
degradation due to write amplification for 4kByte random writes on a
full device.



A bit of a Joker in the pack are the SSDs that implement their own
controller-level data compression and data deduplication (all
proprietary and secret...). Of course, that is all useless for encrypted
filesystems... Also, what does the controller-based data compression do
for aligning to the underlying device blocks?


What is apparent from all that lot...

btrfs across a mix of SSDs & HDDs

2012-05-01 Thread Martin
How well does btrfs perform across a mix of:

1 SSD and 1 HDD for 'raid' 1 mirror for both data and metadata?

Similarly so across 2 SSDs and 2 HDDs (4 devices)?

Can multiple (small) SSDs be 'clustered' as one device and then mirrored
with one large HDD with btrfs directly? (Other than using lvm...)


The idea is to gain the random access speed of the SSDs but have the
HDDs as backup in case the SSDs fail due to wear...
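To make the question concrete, the naive setup I have in mind is simply
(device names illustrative):

mkfs.btrfs -m raid1 -d raid1 /dev/sda /dev/sdb /dev/sdc /dev/sdd

with sda/sdb the SSDs and sdc/sdd the HDDs. My understanding (please
correct me) is that raid1 here only guarantees two copies somewhere
across the four devices, not one copy on an SSD and one on a HDD, which
is what I am really after.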

The usage is to support a few hundred Maildirs + imap for users that
often have many thousands of emails in the one folder for their inbox...


(And no, the users cannot be trained to clean out their inboxes or to be
more hierarchically tidy... :-( )

Or is btrfs yet too premature to suffer such use?


Regards,
Martin




Re: btrfs on low end and high end FLASH

2012-05-01 Thread Martin
On 02/05/12 00:18, Martin wrote:
 How well suited is btrfs to low-end and high-end FLASH devices?
 
 
 Paraphrasing from a thread elsewhere:
 
 FLASH can be categorised into two classes, which have extremely
 different characteristics:
 
 (a) the low-end (USB, SDHC, CF, cheap ATA SSD);

A good FYI detailing low-end FLASH devices is given on:

Flash memory card design
https://wiki.linaro.org/WorkingGroups/Kernel/Projects/FlashCardSurvey

For those examples, it looks like write chunks of 32kBytes or more may
well be a good idea...
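(For anyone wanting to probe their own devices, the flashbench tool
referenced from that wiki guesses erase block and page sizes from access
timings, along the lines of the wiki's example:

flashbench -a /dev/mmcblk0 --blocksize=1024

I am quoting their example usage rather than results from my own
devices.)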


 and (b) the high-end (SAS, PCIe, NAS, expensive ATA SSD).
 
 
 My own experience is that the low end (a) can have erase blocks as large
 as 4MBytes or more and they are easily worn out to failure. I've no idea
 what their page sizes might be nor what boundaries their wear levelling
 (if any) operate on.
 
 Their normal mode of operation is to use a FAT32 filesystem and to be
 filled up linearly with large files. I guess the more scattered layout
 of extN is non-too sympathetic to their normal operation.
 
 
 The high-end (b) may well have 4kByte pages or smaller but they will
 typically operate with multiple page chunks that are much larger, where
 16kBytes appear to be the optimum performance size for the devices I've
 seen so far.
 
 
 How well does btrfs fit in with the features for those two categories?

Regards,
Martin





Re: btrfs across a mix of SSDs & HDDs

2012-05-02 Thread Martin
Thanks for good comments.


 Is the OP using Oracle Linux?
 
 He didn't say. But he didn't say he WON'T be using oracle linux (or
 other distro which supports btrfs) either. Plus the kernel can be
 installed on top of RHEL/Centos 5 and 6, so he can easily choose
 either the supported version, or the mainline version, each with its
 own consequences.

For further info:

Nope, not using Oracle Linux. Then again, I'm reasonably distro
agnostic. I'm also happy to compile my own kernels.

And the system in question uses a HDD RAID and looks to be IOPS bound
rather than limited by the actual IO data rate. The large directories
certainly don't help! It's running postfix + courier-imap at the moment
and I'm looking to revamp it for the gradually ever increasing workload.
CPU and RAM usage is low on average. It serves 2x Gbit networks +
internet users (3 NIC ports).

Hence I'm considering the best way for a revamp/upgrade. SSDs would
certainly help with the IOPS but I'm cautious about SSD wear-out for a
system that constantly thrashes through a lot of data. I could just
throw more disks at it to divide up the IO load.

Multiple pairs of HDD paired with SSD on md RAID 1 mirror is a thought
with ext4...

bcache looks ideal to help but also looks too 'experimental'.

And I was hoping that btrfs would help with handling the large
directories and multi-user parallel accesses, especially so for being
'mirrored' by btrfs itself (at the filesystem level) across 4 disks for
example.

Thoughts welcomed.


Is btrfs development at the 'optimising' stage now, or is it all still
very much a 'work in progress'?

Regards,
Martin



Re: btrfs and 1 billion small files

2012-05-08 Thread Martin
On 07/05/12 12:05, viv...@gmail.com wrote:
 Il 07/05/2012 11:28, Alessio Focardi ha scritto:
 Hi,

 I need some help in designing a storage structure for 1 billion of
 small files (512 Bytes), and I was wondering how btrfs will fit in
 this scenario. Keep in mind that I never worked with btrfs - I just
 read some documentation and browsed this mailing list - so forgive me
 if my questions are silly! :X

 Are you *really* sure a database is *not* what are you looking for?

My thought also.


Or:

1 billion 512 byte files... Is that not a 512GByte HDD?

With that, use a database to index your data by sector number and
read/write your data direct to the disk?

For that example, your database just holds filename, size, and sector.


If your 512 byte files are written and accessed sequentially, then just
use a HDD and address them by sector number from a database index. That
then becomes your 'filesystem'.
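As a crude sketch of what I mean, with the sector number coming out of
the database index (numbers made up, and obviously you lose every
filesystem safety net by going direct to the device):

dd if=/dev/sdX of=record.bin bs=512 skip=123456 count=1   # read the record at sector 123456
dd if=record.bin of=/dev/sdX bs=512 seek=123457 count=1   # write a record at sector 123457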

If you need fast random access, then use SSDs.


Plausible?

Regards,
Martin




Re: btrfs and 1 billion small files

2012-05-08 Thread Martin
On 08/05/12 13:31, Chris Mason wrote:

[...]
 A few people have already mentioned how btrfs will pack these small
 files into metadata blocks.  If you're running btrfs on a single disk,

[...]
 But the cost is increased CPU usage.  Btrfs hits memmove and memcpy
 pretty hard when you're using larger blocks.
 
 I suggest using a 16K or 32K block size.  You can go up to 64K, it may
 work well if you have beefy CPUs.  Example for 16K:
 
 mkfs.btrfs -l 16K -n 16K /dev/xxx

Is that still with -s 4K ?


Might that help SSDs that work in 16kByte chunks?

And why are memmove and memcpy more heavily used?

Does that suggest better optimisation of the (meta)data, or just a
greater housekeeping overhead to shuffle data to new offsets?


Regards,
Martin




SSD format/mount parameters questions

2012-05-17 Thread Martin
For using SSDs:

Are there any format/mount parameters that should be set for using btrfs
on SSDs (other than the ssd mount option)?


General questions:

How long is the 'delay' for the delayed alloc?

Are file allocations aligned to 4kiB boundaries, or larger?

What byte value is used to pad unused space?

(Aside: For some, the erased state reads all 0x00, and for others the
erased state reads all 0xff.)


Background:

I've got a mix of various 120/128GB SSDs to newly set up. I will be
using ext4 on the critical ones, but also wish to compare with btrfs...

The mix includes some SSDs with the Sandforce controller that implements
its own data compression and data deduplication. How well does btrfs fit
with those compared to other non-data-compression controllers?


Regards,
Martin



Re: SSD format/mount parameters questions

2012-05-22 Thread Martin
On 19/05/12 18:36, Martin Steigerwald wrote:
 Am Freitag, 18. Mai 2012 schrieb Sander:
 Martin wrote (ao):
 Are there any format/mount parameters that should be set for using
 btrfs on SSDs (other than the ssd mount option)?

 If possible, format the whole device, do not partition the ssd. This
 will guarantee proper allignment.
 
 Current partitioning tools align at 1 MiB unless otherwise specified.
 
 And then thats only the alignment of the start of the filesystem.
 
 Not the granularity that the filesystem itself uses to align its writes.
 
 And then its not clear to me what effect proper alignment will actually 
 have given the intelligent nature of SSD firmwares.

That's what I'm trying to untangle rather than just trusting to magic.
I'm also not so convinced about the SSD firmwares being quite so
intelligent...


So far, the only clear indications are that a number of SSDs have a
performance 'sweet spot' when you use 16kByte blocks for data transfer.

Practicalities for the SSD internal structure strongly suggest that they
work in chunks of data greater than 4kBytes.

4kByte operation is a strong driver for SSD manufacturers, but what
compromises do they make to accommodate that?


And for btrfs:

Extents are aligned to sector size boundaries (4kBytes default).

And there is a comment that setting larger sector sizes increases the
CPU overhead in btrfs due to the larger memory moves needed for making
inserts into the trees.

If the SSD is going to do a read-modify-write on anything smaller than
16kBytes in any case, might btrfs just as well use that chunk size to
good advantage in the first place?

So, what is most significant?


Also:

btrfs has a big advantage of using checksumming and COW. However, ext4
is more mature, similarly uses extents, and also allows specifying a
large delayed allocation time to merge multiple writes if you're happy
your system is safely on a UPS...


I'm not too worried about this for MLC SSDs, but it is something that is
of concern for the yet shorter modify-erase count lifespan of TLC SSDs.


Regards,
Martin



SSD erase state and reducing SSD wear

2012-05-22 Thread Martin
I've got two recent examples of SSDs. Their pristine state from the
manufacturer shows:


Device Model: OCZ-VERTEX3

# hexdump -C /dev/sdd
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
1bf2976000


Device Model: OCZ VERTEX PLUS
(OCZ VERTEX 2E)

# hexdump -C /dev/sdd
00000000  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
df99e6000



What's a good way to test what state they get erased to from a TRIM
operation?
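The only crude test I can think of (destroys data, so scratch devices
only, and assuming the discard actually reaches the flash) is to write a
known pattern, discard it, and read back what the device then reports:

dd if=/dev/urandom of=/dev/sdX bs=1M count=1 oflag=direct
blkdiscard -o 0 -l 1048576 /dev/sdX
hexdump -C -n 4096 /dev/sdX

Though whether that shows the true flash erase state or just whatever
the controller synthesises for unmapped blocks is another question...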

Can btrfs detect the erase state and pad unused space in filesystem
writes with the same value so as to reduce SSD wear?

Regards,
Martin



Re: SSD erase state and reducing SSD wear

2012-05-23 Thread Martin
On 23/05/12 05:19, Calvin Walton wrote:
 On Tue, 2012-05-22 at 22:47 +0100, Martin wrote:
 I've got two recent examples of SSDs. Their pristine state from the
 manufacturer shows:
 
 Device Model: OCZ-VERTEX3
   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 
 Device Model: OCZ VERTEX PLUS
  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 
 What's a good way to test what state they get erased to from a TRIM
 operation?
 
 This pristine state probably matches up with the result of a trim
 command on the drive. In particular, a freshly erased flash block is in
 a state where the bits are all 1, so the Vertex Plus drive is showing
 you the flash contents directly. The Vertex 3 has substantially more
 processing, and the 0s are effectively generated on the fly for unmapped
 flash blocks (similar to how the missing portions of a sparse file
 contains 0s).

So for that example of reading an 'empty' drive, the OCZ-VERTEX3 might
not even be reading the flash chips at all!...


 Can btrfs detect the erase state and pad unused space in filesystem
 writes with the same value so as to reduce SSD wear?
 
 On the Vertex 3, this wouldn't actually do what you'd hope. The firmware
 in that drive actually compresses, deduplicates, and encrypts all the
 data prior to writing it to flash - and as a result the data that hits
 the flash looks nothing like what the filesystem wrote.
 (For best performance, it might make sense to disable btrfs's built-in
 compression on the Vertex 3 drive to allow the drive's compression to
 kick in. Let us know if you benchmark it either way.)

Very good comment, thanks. That leaves a very good question of how the
Sandforce controller uses the flash. Does it implement its own 'virtual
block level' interface to then use the underlying flash using structures
that are not visible externally?

What does that do to concerns about alignment?...

And for what granularity of write chunks?


 The benefit to doing this on the Vertex Plus is probably fairly small,
 since to rewrite a block - even if the block is partially unwritten - is
 still likely to require a read-modify-write cycle with an erase step.
 The granularity of the erase blocks is just too big for the savings to
 be very meaningful.

My understanding is that the 'wear' mechanism in flash is a problem of
charge getting trapped in the insulation material itself that surrounds
the floating gate of a cell. The permanently trapped charge accumulates
further for each change of state until a high enough offset voltage has
accumulated to exceed what can be tolerated for correct operation of the
cell.

Hence, writing the *same value* as that for already stored for a cell
should not cause any wear being as you are not changing the state of a
cell. (No change in charge levels.)

For non-Sandforce controllers, that suggests doing a read-modify-write
to pad out whatever minimum sized write chunk. That would be rather poor
for performance, and the manufacturer's secrecy means we cannot be sure
of the underlying write block size for minimum sized alignment.


Alternatively, padding out writes with the erased state value means that
no further wear should be caused for when that block is eventually
TRIMed/erased for rewriting.

That should also be a 'soft' option for the Sandforce controllers in
that /hopefully/ their compression/deduplication will compress down the
padding so as not to be a problem.

(Damn the Manufacturer's secrecy!)


Regards,
Martin






Re: Will big metadata blocks fix # of hardlinks?

2012-05-29 Thread Martin
Thanks for noting this one. That is one very surprising and unexpected
limit!... And a killer for some not completely rare applications...

On 26/05/12 19:22, Sami Liedes wrote:
 Hi!
 
 I see that Linux 3.4 supports bigger metadata blocks for btrfs.
 
 Will using them allow a bigger number of hardlinks on a single file
 (i.e. the bug that has bitten at least git users on Debian[1,2], and
 BackupPC[3])? As far as I understand correctly, the problem has been
 that the hard links are stored in the same metadata block with some
 other metadata, so the size of the block is an inherent limitation?
 
 If so, I think it would be worth for me to try Btrfs again :)
 
   Sami
 
 
 [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/13603
 [2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=642603
 [3] https://bugzilla.kernel.org/show_bug.cgi?id=15762

One example fail case is just 13 hard links. Even at x4 that (16k
blocks) you only get 52 links.


The brief summary for those are:

* It's a rare corner case that needs a format change to fix, so won't-fix;

* There are real world problem examples noted in those threads for such
as: BackupPC (backups); nnmaildir mail backend in Gnus (an Emacs package
for reading news and email); and a web archiver.

* Also, Bacula (backups) and Mutt (email client) are quoted as problem
examples in:

Btrfs File-System Plans For Ubuntu 12.10
http://www.phoronix.com/scan.php?page=news_item&px=MTEwMDE


For myself, I have a real world example for deduplication of identical
files from a proprietary data capture system where the filenames change
(timestamp and index data stored in the filename) yet there are periods
where the file contents change only occasionally... The 'natural' thing
to do is hardlink together all the identical files to then just have the
unique filenames... And you might have many files in a particular
directory...

Note that for long filenames (surprisingly commonly done!), one fail
case noted above is just 13 hard links.


Looks like I'm stuck on ext4 with an impoverished cp -l for a fast
'snapshot' for the time being still... (Or differently, LVM snapshot and
copy.)
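(For reference, the 'snapshot' I mean on ext4 is nothing more than
something like

cp -al /data /backups/data.$(date +%Y%m%d)

hardlinking everything into a dated tree. The dedup case above is the
nastier one for btrfs, since there the identical files, and hence the
hard links, can all sit in the same directory.)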


For btrfs, rather than a 'break everything' format change, can a neat
and robust 'workaround' be made so that the problem-case hardlinks to a
file within the same directory perhaps spawn their own transparent
subdirectory for the hard links?... Worse case then is that upon a
downgrade to an older kernel, the 'transparent' subdirectory of hard
links becomes visible as a distinct subdirectory? (That is a 'break' but
at least data isn't lost.)

Or am I chasing the wrong bits? ;-)


More seriously: The killer there for me is that running rsync or running
a deduplication script might hit too many hard links that were perfectly
fine when on ext4.

Regards,
Martin




Re: [systemd-devel] Slow startup of systemd-journal on BTRFS

2014-06-17 Thread Martin
On 17/06/14 02:13, cwillu wrote:
 It's not a mmap problem, it's a small writes with an msync or fsync
 after each one problem.

And for logging, that is exactly what is wanted, so that you can see
why whatever it was crashed...

Except...

Whilst logging, hold off on the msync/fsync unless the next log message
to be written is 'critical'?

With that, the mundane logging gets appended just as for any normal file
write. Only the more critical log messages suffer the extra overhead and
fragmentation of an immediate msync/fsync.


 For the case of sequential writes (via write or mmap), padding writes
 to page boundaries would help, if the wasted space isn't an issue.
 Another approach, again assuming all other writes are appends, would
 be to periodically (but frequently enough that the pages are still in
 cache) read a chunk of the file and write it back in-place, with or
 without an fsync. On the other hand, if you can afford to lose some
 logs on a crash, not fsyncing/msyncing after each write will also
 eliminate the fragmentation.
 
 (Worth pointing out that none of that is conjecture, I just spent 30
 minutes testing those cases while composing this ;p)
 
 Josef has mentioned in irc that a piece of Chris' raid5/6 work will
 also fix this when it lands.

Interesting...

The source problem is how the COW fragments under expected normal use...
Is all this unavoidable unless we rethink the semantics?


Regards,
Martin




Re: Putting very big and small files in one subvolume?

2014-08-18 Thread Martin
Good questions and already good comment given.


For another view...

On 17/08/14 13:31, Duncan wrote:
 Shriramana Sharma posted on Sun, 17 Aug 2014 14:26:06 +0530 as excerpted:
 
 Hello. One more Q re generic BTRFS behaviour.
 https://btrfs.wiki.kernel.org/index.php/Main_Page specifically
 advertises BTRFS's Space-efficient packing of small files.

 So far (on ext3/4) I have been using two partitions for small/regular
 files (like my source code repos, home directory with its hidden config
 subdirectories etc) and big files (like downloaded Linux ISOs,
 VMs etc) under some sort of understanding that this will help curb
 fragmentation...

The cases of pathological fragmentation by btrfs (for 'database-style'
files and VM image files especially) have been mentioned, as have the
use of nocow and/or using separate subvolumes to reduce or slow down the
buildup of the fragmentation.

systemd logging even bulldozed blindly into that one spectacularly!...


There is now a defragment option. However, that does not scale well for
large or frequently rewritten files and you gamble how much IO bandwidth
you can afford to lose rewriting *entire* files.

The COW fragmentation problem is not going to go away. Also, there is
quite a high requirement for user awareness to specially mark
directories/files as nocow. And even then, that still does not work well
if multiple snapshots are being taken...!


Could a better and more complete fix be to automatically defragment say
just x4 the size being written for a file segment?

Also, for the file segment being defragged, abandon any links to other
snapshots to in effect deliberately replicate the data where appropriate
so that data segment is fully defragged.



 In any case, since BTRFS effectively discourages usage of separate
 partitions to take advantage of subvolumes etc, and given the above
 claim to the FS automatically handling small files efficiently, I wonder
 if it makes sense any longer to create separate subvolumes for such
 big/small files as I describe in my use case?
 
 It's worth noting that btrfs subvolumes are a reasonably lightweight 
 construct, comparable enough to ordinary subdirectories that they're 
 presented that way when browsing a parent subvolume, and there was 
 actually discussion of making subvolumes and subdirs the exact same 
 thing, effectively turning all subdirs into subvolumes.
 
 As it turns out that wasn't feasible due not to btrfs limitations, but 
 (as I understand it) to assumptions about subdirectories vs. mountable 
 entities (subvolumes) built into the Linux POSIX and VFS levels...

Due to namespaces and inode number spaces?...



 OTOH, I tend to be rather more of an independent partition booster than 
 many.  The biggest reason for that is the too many eggs in one basket 
 problem.  Fully separate filesystems on separate partitions...

I do so similarly myself. A good scheme that I have found to work well
for my cases is to have separate partitions for:

/boot
/var
/var/log
/
/usr
/home
/mnt/data...

And all the better and easy to do using GPT partition tables.
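As an example of how painless that now is with GPT, something along
these lines with sgdisk (the sizes are just what happens to suit my
boxes, not a recommendation):

sgdisk -n 1:0:+256M -c 1:boot \
       -n 2:0:+8G   -c 2:var \
       -n 3:0:+4G   -c 3:var_log \
       -n 4:0:+16G  -c 4:root \
       -n 5:0:+16G  -c 5:usr \
       -n 6:0:+64G  -c 6:home \
       -n 7:0:0     -c 7:data /dev/sdX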

The one aspect to all that is that you can protect your system from
becoming jammed by a full disk, for whatever reason, and all without
needing to resort to quotas. So for example, rogue logging can fill up
/var/log and you can still use the system and easily tidy things up.

However, that scheme does also require that you have a good idea of what
partition sizes you will need right from when first set up.

You can 'cheat' and gain flexibility at the expense of HDD head seek
time by cobbling together LVM volumes as and when needed to resize
whichever filesystem.


Which is where btrfs comes into play: if you can trust not to lose all
your eggs to btrfs corruption, you can implement the same scheme with
subvolumes and quotas instead of partitions, and let the intelligence in
btrfs make everything work well even if you later change what size
(quota) you want for a subvolume. The ENTIRE disk (no partition table)
is then all btrfs.
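The sort of thing I mean, as a sketch with made-up names and sizes:

mkfs.btrfs -L everything /dev/sdX
mount /dev/sdX /mnt
btrfs subvolume create /mnt/home
btrfs subvolume create /mnt/var_log
btrfs quota enable /mnt
btrfs qgroup limit 4G /mnt/var_log

so the 'partition sizes' above become quota limits that can be changed
at any time.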

Special NOTE: Myself, I consider btrfs *quotas* to be still very
experimental at the moment and not to be used with valued data!



Other big plusses for btrfs for me are the raid and snapshots.

The killer though is how robust the filesystem is against corruption
and random data/hardware failure.

btrfsck?


Always keep multiple backups!

Regards,
Martin







Re: Performance Issues

2014-09-20 Thread Martin

On 20/09/14 09:23, Marc Dietrich wrote:
 Am Freitag, 19. September 2014, 13:51:22 schrieb Holger
 Hoffstätte:
 
 On Fri, 19 Sep 2014 13:18:34 +0100, Rob Spanton wrote:
 
 I have a particularly uncomplicated setup (a desktop PC with a
 hard disk) and I'm seeing particularly slow performance from
 btrfs.  A `git status` in the linux source tree takes about 46
 seconds after dropping caches, whereas on other machines using
 ext4 this takes about 13s.  My mail client (evolution) also
 seems to perform particularly poorly on this setup, and my
 hunch is that it's spending a lot of time waiting on the
 filesystem.
 
 This is - unfortunately - a particular btrfs
 oddity/characteristic/flaw, whatever you want to call it. git
 relies a lot on fast stat() calls, and those seem to be
 particularly slow with btrfs esp. on rotational media. I have the
 same problem with rsync on a freshly mounted volume; it gets fast
 (quite so!) after the first run.
 
 my favorite benchmark is ls -l /usr/bin:
 
 ext4: 0.934s btrfs:   21.814s


So... On my old low power slow Atom SSD ext4 system:

time ls -l /usr/bin

real    0m0.369s
user    0m0.048s
sys     0m0.128s

Repeated:

real    0m0.107s
user    0m0.040s
sys     0m0.044s

and that is for:

# ls -l /usr/bin | wc
   1384   13135   88972


On a comparatively super dual core Athlon64 SSD three disk btrfs
raid1 system:

real    0m0.103s
user    0m0.004s
sys     0m0.040s

Repeated:

real    0m0.027s
user    0m0.008s
sys     0m0.012s

For:

# ls -l /usr/bin | wc
   1449   13534   89024


And on an identical comparatively super dual core Athlon64 HDD
'spinning rust' two disk btrfs raid1 system:

real    0m0.101s
user    0m0.008s
sys     0m0.020s

Repeated:

real    0m0.020s
user    0m0.004s
sys     0m0.012s

For:

# ls -l /usr/bin | wc
   1161   10994   79350


So, no untoward concerns there.
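(If anyone wants to repeat the comparison on their own box, the usual
trick for a cold-cache first run is along the lines of:

sync
echo 3 > /proc/sys/vm/drop_caches
time ls -l /usr/bin

with the 'Repeated' figure then simply being an immediate second run
from cache.)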

Marc:

You on something really ancient and hopelessly fragmented into oblivion?



 also mounting large partitons (several 100Gs) takes lot of time on
 btrfs.

I've noticed that too for some 16TB btrfs raid1 mounts: btrfs is not
as fast to mount as ext4, but then again it is very much faster than
mounting ext4 when a fsck count is tripped!...

So, nothing untoward there.


For my usage, controlling fragmentation and having some automatic
mechanism to deal with pathological fragmentation of files such as
sqlite databases are greater concerns!

(Yes, there is the manual fix of NOCOW... I also put such horrors into
tmpfs and snapshot that... All well and good but all unnecessary admin
tasks!)


Regards,
Martin




RAID device nomination (Feature request)

2013-04-18 Thread Martin
Dear Devs,

I have a number of esata disk packs holding 4 physical disks each where
I wish to use the disk packs aggregated for 16TB and up to 64TB backups...

Can btrfs...?

1:

Mirror data such that there is a copy of data on each *disk pack* ?

Note that esata shows just the disks as individual physical disks, 4 per
disk pack. Can physical disks be grouped together to force the RAID data
to be mirrored across all the nominated groups?


2:

Similarly for a mix of different storage technologies such as
manufacturer or type (SSD/HDD), can the disks be grouped to ensure a
copy of the data is replicated across all the groups?

For example, I deliberately buy HDDs from different
batches/manufacturers to try to avoid common mode or similarly timed
failures. Can btrfs be guided to safely spread the RAID data across the
*different* hardware types/batches?


3:

Also, for different speeds of disks, can btrfs tune itself to balance
the read/writes accordingly?


4:

Further thought: For SSDs, is the minimise heads movement 'staircase'
code bypassed so as to speed up allocation for the don't care
addressing (near zero seek time) of SSDs?



And then again: Is 64TBytes of btrfs a good idea in the first place?!

(There's more than one physical set of backups but I'd rather not suffer
weeks to recover from one hiccup in the filesystem... Should I partition
btrfs down to smaller gulps, or does the structure of btrfs in effect
already do that?)

Thanks,
Martin



Re: [RFC] Online dedup for Btrfs

2013-04-18 Thread Martin
Apart from the dates, this sounds highly plausible :-)

If the hashing is done before the compression and the compression is
done for isolated blocks, then this could even work!

Any takers? ;-)


For a performance enhancement, keep a hash tree in memory for the n
most recently used/seen blocks?...


A good writeup! Thanks for a good giggle. :-)

Regards,
Martin



On 01/04/13 15:44, Harald Glatt wrote:
 On Mon, Apr 1, 2013 at 2:50 PM, Josef Bacik jba...@fusionio.com wrote:
 Hello,

 I was bored this weekend so I hacked up online dedup for Btrfs.  It's working
 quite well so I think it can be more widely tested.  There are two ways to 
 use
 it

 1) Compatible mode - this is a bit slower but will handle being used by older
 kernels.  We use the csum tree to find duplicate blocks.  Since it is 
 relatively
 easy to have crc32c collisions this also involves reading the block from disk
 and doing a memcmp with the block we want to write to verify it has the same
 data.  This is way slow but hey, no incompat flag!

 2) Incompatible mode - so this is the way you probably want to use it if you
 don't care about being able to go back to older kernels.  You select your
 hashing function (at the momement I only support sha1 but there is room in 
 the
 format to have different functions).  This creates a btree indexed by the 
 hash
 and the bytenr.  Then we lookup the hash and just link the extent in if it
 matches the hash.  You can use -o paranoid-dedup if you are paranoid about 
 hash
 collisions and this will force it to do the memcmp() dance to make sure that 
 the
 extent we are deduping really matches the extent.

 So performance wise obviously the compat mode sucks.  It's about 50% slower 
 on
 disk and about 20% slower on my Fusion card.  We get pretty good space 
 savings,
 about 10% in my horrible test (just copy a git tree onto the fs), but IMHO 
 not
 worth the performance hit.

 The incompat mode is a bit better, only 15% drop on disk and about 10% on my
 fusion card.  Closer to the crc numbers if we have -o paranoid-dedup.  The 
 space
 savings is better since it uses the original extent sizes, we get about 15%
 space savings.  Please feel free to pull and try it, you can get it here

 git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git dedup

 Thanks!

 Josef
 
 Hey Josef,
 
 that's really cool! Can this be used together with lzo compression for
 example? How high (roughly) is the impact of something like
 force-compress=lzo compared to the 15% hit from this dedup?
 
 Thanks!
 Harald






Re: RAID device nomination (Feature request)

2013-04-18 Thread Martin
On 18/04/13 15:06, Hugo Mills wrote:
 On Thu, Apr 18, 2013 at 02:45:24PM +0100, Martin wrote:
 Dear Devs,
 
 I have a number of esata disk packs holding 4 physical disks each
 where I wish to use the disk packs aggregated for 16TB and up to
 64TB backups...
 
 Can btrfs...?
 
 1:
 
 Mirror data such that there is a copy of data on each *disk pack*
 ?
 
 Note that esata shows just the disks as individual physical
 disks, 4 per disk pack. Can physical disks be grouped together to
 force the RAID data to be mirrored across all the nominated
 groups?
 
 Interesting you should ask this: I realised quite recently that 
 this could probably be done fairly easily with a modification to
 the chunk allocator.

Hey, that sounds good. And easy? ;-)

Possible?...


 2:
 
 Similarly for a mix of different storage technologies such as 
 manufacturer or type (SSD/HDD), can the disks be grouped to
 ensure a copy of the data is replicated across all the groups?
 
 For example, I deliberately buy HDDs from different 
 batches/manufacturers to try to avoid common mode or similarly
 timed failures. Can btrfs be guided to safely spread the RAID
 data across the *different* hardware types/batches?
 
 From the kernel point of view, this is the same question as the 
 previous one.

Indeed so.

The question is how the groups of disks are determined:

Manually by the user for mkfs.btrfs and/or specified when disks are
added/replaced;

Or somehow automatically detected (but with a user override).


Have a disk group UUID for a group of disks similar to that done for
md-raid?



 3:
 
 Also, for different speeds of disks, can btrfs tune itself to
 balance the read/writes accordingly?
 
 Not that I'm aware of.

A 'nice to have' would be some sort of read-access load balancing with
options to balance latency or queue depth... Could btrfs do that
independently of (or complementary to) the block layer schedulers?


 4:
 
 Further thought: For SSDs, is the minimise heads movement
 'staircase' code bypassed so as to speed up allocation for the
 don't care addressing (near zero seek time) of SSDs?
 
 I think this is more to do with the behaviour of the block layer 
 than the FS. There are alternative elevators that can be used, but
 I don't know how to configure them (or whether they need
 configuring at all).

Regardless of the block level io schedulers, does not btrfs determine
the LBA allocation?...

For example, if for an SSD, the next free space allocation for
whatever is to be newly written could become more like a log based
round-robin allocation across the entire SSD (NILFS-like?) rather than
trying to localise data to minimise the physical head movement as for
a HDD.

Or is there no useful gain with that over simply using the same one
lump of allocator code as for HDDs?


 You have backups, which is good. Keep up with the latest kernels 
 from kernel.org. The odds of you hitting something major are
 small, but non-zero. One thing that's probably fairly likely with
 your setup

Healthy paranoia is good ;-)


[...]
 So with light home use on a largeish array, I've had a number of 
 cockups recently that were recoverable, albeit with some swearing.

Thanks for the notes.


 On the other hand, it's entirely possible that something else will 
 go wrong and things will blow up. My guess is that unless you have
[...]

My worry for moving up to spreading a filesystem across multiple disk
packs is for when the disk pack hardware itself fails taking out all
four disks...

(And there's always the worry of the esata lead getting yanked to take
out all four disks...)


Thanks,
Martin



Re: RAID device nomination (Feature request)

2013-04-18 Thread Martin
On 18/04/13 20:44, Hugo Mills wrote:
 On Thu, Apr 18, 2013 at 05:29:10PM +0100, Martin wrote:
 On 18/04/13 15:06, Hugo Mills wrote:
 On Thu, Apr 18, 2013 at 02:45:24PM +0100, Martin wrote:
 Dear Devs,
 
 I have a number of esata disk packs holding 4 physical disks
 each where I wish to use the disk packs aggregated for 16TB
 and up to 64TB backups...
 
 Can btrfs...?
 
 1:
 
 Mirror data such that there is a copy of data on each *disk
 pack* ?
 
 Note that esata shows just the disks as individual physical 
 disks, 4 per disk pack. Can physical disks be grouped
 together to force the RAID data to be mirrored across all the
 nominated groups?
 
 Interesting you should ask this: I realised quite recently that
  this could probably be done fairly easily with a modification
 to the chunk allocator.
 
 Hey, that sounds good. And easy? ;-)
 
 Possible?...
 
 We'll see... I'm a bit busy for the next week or so, but I'll see 
 what I can do.

Thanks greatly. That should nicely let me stay with my plan A and
just let btrfs conveniently expand over multiple disk packs :-)

(I'm playing 'safe' for the moment while I can by putting in bigger
disks into new packs as needed. I've some packs with smaller disks
that are nearly full that I want to continue to use so I'm agonising
over whether to replace all the disks and rewrite all the data or use
multiple disk packs as one. Plan A is good for keeping the existing
disks :-) )


[...]
 The question is how the groups of disks are determined:
 
 Manually by the user for mkfs.btrfs and/or specified when disks
 are added/replaced;
 
 Or somehow automatically detected (but with a user override).
 
 
 Have a disk group UUID for a group of disks similar to that
 done for md-raid?
 
 I was planning on simply having userspace assign a (small) integer 
 to each device. Devices with the same integer are in the same
 group, and won't have more than one copy of any given piece of data
 assigned to them. Note that there's already an unused disk group
 item which is a 32-bit integer in the device structure, which looks
 like it can be repurposed for this; there's no spare space in the
 device structure, so anything more than that will involve some kind
 of disk format change.

Repurposing that field so that there is no format change sounds very
good, and 32 bits should be enough for anyone. (Notwithstanding the
inevitable 640k comments!)

A 32-bit unsigned-int number that the user specifies? Or include a
semi-random automatic numbering to a group of devices listed by the
user?...

Then again, I can't imagine anyone wanting to go beyond 8-bits...
Hence a 16-bit unsigned int is still suitably overkill. That then
offers the other 16-bits for some other repurpose ;-)


For myself, it would be nice to be able to specify a number that is
the same unique number that's stamped on the disk packs so that I can
be sure what has been plugged in! (Assuming there's some option to
list what's been plugged in.)


 3:
 
 Also, for different speeds of disks, can btrfs tune itself
 to balance the read/writes accordingly?
 
 Not that I'm aware of.
 
 A 'nice to have' would be some sort of read-access load balancing
 with options to balance latency or queue depth... Could btrfs do
 that independently (complimentary with) of the block layer
 schedulers?
 
 All things are possible... :) Whether it's something that someone 
 will actually do or not, I don't know. There's an argument for
 getting some policy into that allocation decision for other
 purposes (e.g. trying to ensure that if a disk dies from a
 filesystem with single allocation, you lose the fewest number of
 files).
 
 On the other hand, this is probably going to be one of those
 things that could have really nasty performance effects. It's also
 somewhat beyond my knowledge right now, so someone else will have
 to look at it. :)

Sounds ideal for some university research ;-)


[...]
 For example, if for an SSD, the next free space allocation for 
 whatever is to be newly written could become more like a log
 based round-robin allocation across the entire SSD (NILFS-like?)
 rather than trying to localise data to minimise the physical head
 movement as for a HDD.
 
 Or is there no useful gain with that over simply using the same
 one lump of allocator code as for HDDs?
 
 No idea. It's going to need someone to write the code and
 benchmark the options, I suspect.

A second university project? ;-)


[...]
 (And there's always the worry of the esata lead getting yanked to
 take out all four disks...)
 
 As I said, I've done the latter myself. The array *should* go into

Looks like I'll likely get to find out for myself sometime or other...



Thanks for your help and keep me posted please.

I'll be experimenting with the groupings as soon as they come along.
Also for the dedup work that is being done.

Regards,
Martin







Re: RAID device nomination (Feature request)

2013-04-18 Thread Martin
On 18/04/13 20:48, Alex Elsayed wrote:
 Hugo Mills wrote:
 
 On Thu, Apr 18, 2013 at 02:45:24PM +0100, Martin wrote:
 Dear Devs,
 snip
 Note that esata shows just the disks as individual physical disks, 4 per
 disk pack. Can physical disks be grouped together to force the RAID data
 to be mirrored across all the nominated groups?

Interesting you should ask this: I realised quite recently that
 this could probably be done fairly easily with a modification to the
 chunk allocator.
 snip
 
 One thing that might be an interesting approach:
 
 Ceph is already in mainline, and uses CRUSH in a similar way to what's 
 described (topology-aware placement+replication). Ceph does it by OSD nodes 
 rather than disk, and the units are objects rather than chunks, but it could 
 potentially be a rather good fit.
 
 CRUSH does it by describing a topology hierarchy, and allocating the OSD ids 
 to that hierarchy. It then uses that to map from a key to one-or-more 
 locations. If we use chunk ID as the key, and use UUID_SUB in place of the 
 OSD id, it could do the job.

OK... That was a bit of a crash course (ok, sorry for the pun on crush :-) )

http://www.anchor.com.au/blog/2012/09/a-crash-course-in-ceph/


Interesting that the CRUSH map is written by hand, then compiled and
passed to the cluster.

Hence, it looks like the simple approach is to have the sysadmin specify
what gets grouped into which group. (I certainly know what disk is where
and where I want the data mirrored!)


For my example, the disk packs are plugged into two servers (up to four
at a time at present) so that we have some fail-over if one server dies.
Ceph looks to be a little overkill for just two big storage users.

Or perhaps include the same Ceph code routines into btrfs?...


Regards,
Martin



grub/grub2 boot into btrfs raid root and with no initrd

2013-05-03 Thread Martin
I've made a few attempts to boot into a root filesystem created using:

mkfs.btrfs -d raid1 -m raid1 -L btrfs_root_3 /dev/sda3 /dev/sdb3

Both grub and grub2 pick up a kernel image fine from an ext4 /boot on
/dev/sda1 for example, but then fail to find or assemble the btrfs root.

Setting up an initrd and grub operates fine for the btrfs raid.


What is the special magic to do this without the need for an initrd?
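One approach I have seen suggested (untested by me) is to pass the extra
devices on the kernel command line via the btrfs device= mount option,
roughly:

root=/dev/sda3 rootfstype=btrfs rootflags=device=/dev/sda3,device=/dev/sdb3 ro

added to the kernel line in grub. But that still needs the device nodes
to exist that early, which appears to be exactly what the devtmpfs patch
below provides.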

Is the comment/patch below from last year languishing unknown? Or is
there some problem with that kernel approach?


Thanks,
Martin


See:

http://forums.gentoo.org/viewtopic-t-923554-start-0.html


Below is my patch, which is working fine for me with 3.8.2.
Code:

$ cat /etc/portage/patches/sys-kernel/gentoo-sources/earlydevtmpfs.patch
--- init/do_mounts.c.orig   2013-03-24 20:49:53.446971127 +0100
+++ init/do_mounts.c   2013-03-24 20:51:46.408237541 +0100
@@ -529,6 +529,7 @@
 create_dev("/dev/root", ROOT_DEV);
 if (saved_root_name[0]) {
    create_dev(saved_root_name, ROOT_DEV);
+   devtmpfs_mount("dev");
    mount_block_root(saved_root_name, root_mountflags);
 } else {
    create_dev("/dev/root", ROOT_DEV);




Will btrfs scrub clear corrupt filesystem trees?

2013-10-10 Thread Martin
I have 1.5TB of data on a single disk formatted with defaults. There
appear to be only two directory trees of a few MBytes that have
suffered corruption (caused in the past by too high a SATA link speed).

The filesystem mounts fine. But how to clear out the corrupt trees?

At the moment, I have running:

btrfsck --repair --init-extent-tree /dev/sdc

parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
Ignoring transid failure

... And it is still running after over two days now. Looped?


Would a:

btrfs scrub start

clear out the corrupt trees?

Must I wait for the btrfsck to complete if it is recreating an extents
tree?...


Suggestions welcomed...

Martin



No apparent effect for btrfs device delete missing

2013-10-14 Thread Martin
Trying:

btrfs device delete missing /

appears not to do anything for a / mount where I have swapped out
a HDD:


# btrfs filesystem show
Label: 'test_btrfs_misc_5'  uuid: 7d29d4e6-efdc-41dc-9aa8-e74dfbe13cc9
Total devices 2 FS bytes used 28.00KB
devid1 size 59.74GB used 2.03GB path /dev/sdd5
*** Some devices missing

Label: 'test_btrfs_root_4'  uuid: 269e142c-e561-4227-b2b0-fe2f9fb99391
Total devices 3 FS bytes used 10.55GB
devid4 size 56.00GB used 12.03GB path /dev/sde4
devid1 size 56.00GB used 12.05GB path /dev/sdd4
*** Some devices missing

Btrfs v0.20-rc1-358-g194aa4a
# btrfs device delete missing /
# btrfs filesystem show
Label: 'test_btrfs_misc_5'  uuid: 7d29d4e6-efdc-41dc-9aa8-e74dfbe13cc9
Total devices 2 FS bytes used 28.00KB
devid1 size 59.74GB used 2.03GB path /dev/sdd5
*** Some devices missing

Label: 'test_btrfs_root_4'  uuid: 269e142c-e561-4227-b2b0-fe2f9fb99391
Total devices 3 FS bytes used 10.55GB
devid4 size 56.00GB used 12.03GB path /dev/sde4
devid1 size 56.00GB used 12.05GB path /dev/sdd4
*** Some devices missing

Btrfs v0.20-rc1-358-g194aa4a


All on the latest Linux 3.11.5-gentoo.

# df -h | egrep '/$'
rootfs  112G   22G   89G  20% /
/dev/sdd4   112G   22G   89G  20% /



Aside: Adding the /dev/sde4 device caused no balance action until I
deleted a device to reduce the raid1 mirror (data and metadata) down to
the two devices.

The missing device was an old HDD that had physically failed. No data
was lost for that example failure.
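
For reference, the sequence I had expected to need for swapping out a
failed raid1 member is roughly the following (device names and mount
point illustrative):

  mount -o degraded /dev/sdd4 /mnt    # mount with the dead member absent
  btrfs device add /dev/sde4 /mnt     # add the replacement disk
  btrfs device delete missing /mnt    # drop the dead member, re-mirror chunks
  btrfs balance start /mnt            # optionally even out the chunk layout

...except that, as shown above, the delete missing step appears to be a
no-op here.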


Hope of interest,
Martin


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


8 days looped? (btrfsck --repair --init-extent-tree)

2013-10-22 Thread Martin
Dear list,

I've been trying to recover a 2TB single disk btrfs from a good few days
ago as already commented on the list. btrfsck complained of an error in
the extents and so I tried:

btrfsck --repair --init-extent-tree /dev/sdX


That was 8 days ago.

The btrfs process is still running at 100% cpu but with no disk activity
and no visible change in memory usage.

Looped?

Is there any way to check whether it is usefully doing anything or
whether this is a lost cause?


The only output it has given, within a few seconds of starting, is:


parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
Ignoring transid failure


Any comment/interest before abandoning?

This all started from trying to delete/repair a directory tree of a few
MBytes of files...


Regards,
Martin

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 8 days looped? (btrfsck --repair --init-extent-tree)

2013-10-23 Thread Martin
On 22/10/13 19:17, Josef Bacik wrote:
 On Tue, Oct 22, 2013 at 06:58:48PM +0100, Martin wrote:
 Dear list,

 I've been trying to recover a 2TB single disk btrfs from a good few days
 ago as already commented on the list. btrfsck complained of an error in
 the extents and so I tried:

 btrfsck --repair --init-extent-tree /dev/sdX


 That was 8 days ago.

 The btrfs process is still running at 100% cpu but with no disk activity
 and no visible change in memory usage.

 Looped?

 Is there any way to check whether it is usefully doing anything or
 whether this is a lost cause?


 The only output it has given, within a few seconds of starting, is:


 parent transid verify failed on 911904604160 wanted 17448 found 17449
 parent transid verify failed on 911904604160 wanted 17448 found 17449
 parent transid verify failed on 911904604160 wanted 17448 found 17449
 parent transid verify failed on 911904604160 wanted 17448 found 17449
 Ignoring transid failure


 Any comment/interest before abandoning?

 This all started from trying to delete/repair a directory tree of a few
 MBytes of files...

 
 Sooo it probably is looped, you should be able to attach gdb to it and run bt 
 to
 see where it is stuck and send that back to the list so we can figure out what
 to do.  Thanks,

OK... But I doubt this helps much:

(gdb) bt
#0  0x0042b93f in ?? ()
#1  0x0041cf10 in ?? ()
#2  0x0041e29d in ?? ()
#3  0x0041e8ae in ?? ()
#4  0x00425bf2 in ?? ()
#5  0x00425cae in ?? ()
#6  0x00421e87 in ?? ()
#7  0x00422022 in ?? ()
#8  0x0042210c in ?? ()
#9  0x00416b07 in ?? ()
#10 0x004043ad in ?? ()
#11 0x7f5ba972860d in __libc_start_main () from /lib64/libc.so.6
#12 0x004043dd in ?? ()
#13 0x7fff7ead12a8 in ?? ()
#14 0x in ?? ()
#15 0x0004 in ?? ()
#16 0x0064f4d0 in ?? ()
#17 0x7fff7ead2469 in ?? ()
#18 0x7fff7ead2472 in ?? ()
#19 0x7fff7ead2485 in ?? ()
#20 0x in ?? ()

At least it stays consistent when repeated!


Recompiling with -ggdb for the symbols and rerunning:

# gdb /sbin/btrfsck 17151
GNU gdb (Gentoo 7.5.1 p2) 7.5.1
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type show copying
and show warranty for details.
This GDB was configured as x86_64-pc-linux-gnu.
For bug reporting instructions, please see:
http://bugs.gentoo.org/...
Reading symbols from /sbin/btrfsck...Reading symbols from
/usr/lib64/debug/sbin/btrfsck.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Attaching to program: /sbin/btrfsck, process 17151

warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need set solib-search-path or set sysroot?
Reading symbols from /lib64/libuuid.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libuuid.so.1
Reading symbols from /lib64/libblkid.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libblkid.so.1
Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libz.so.1
Reading symbols from /usr/lib64/liblzo2.so.2...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/liblzo2.so.2
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
found)...done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library /lib64/libthread_db.so.1.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x0041e74f in btrfs_search_slot ()
(gdb) bt
#0  0x0041e74f in btrfs_search_slot ()
#1  0x004259fa in find_first_block_group ()
#2  0x00425ab4 in btrfs_read_block_groups ()
#3  0x00421c15 in btrfs_setup_all_roots ()
#4  0x00421dce in __open_ctree_fd ()
#5  0x00421ea8 in open_ctree_fs_info ()
#6  0x004169b4 in cmd_check ()
#7  0x0040443b in main ()

And over twelve hours later:

(gdb)
#0  0x0041e74f in btrfs_search_slot ()
#1  0x004259fa in find_first_block_group ()
#2  0x00425ab4 in btrfs_read_block_groups ()
#3  0x00421c15 in btrfs_setup_all_roots ()
#4  0x00421dce in __open_ctree_fd ()
#5  0x00421ea8 in open_ctree_fs_info ()
#6  0x004169b4 in cmd_check ()
#7  0x0040443b in main ()


Any further debug useful?

Regards,
Martin




--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 8 days looped? (btrfsck --repair --init-extent-tree)

2013-10-23 Thread Martin
On 23/10/13 17:21, Josef Bacik wrote:
 On Wed, Oct 23, 2013 at 04:32:51PM +0100, Martin wrote:


 Any further debug useful?

 
 Nope I know where it's breaking, I need to fix how we init the extent tree.
 Thanks,

Good stuff.

If of help, I can test new code or a patch for that example. (I'll leave
the disk in place for the time being.)


Thanks,
Martin



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs-progs: allow --init-extent-tree to work when extent tree is borked

2013-10-25 Thread Martin
On 25/10/13 19:01, Josef Bacik wrote:
 Unfortunately you can't run --init-extent-tree if you can't actually read the
 extent root.  Fix this by allowing partial starts with no extent root and then
 have fsck only check to see if the extent root is uptodate _after_ the check 
 to
 see if we are init'ing the extent tree.  Thanks,
 
 Signed-off-by: Josef Bacik jba...@fusionio.com
 ---
  cmds-check.c |  9 ++---
  disk-io.c| 16 ++--
  2 files changed, 20 insertions(+), 5 deletions(-)
 
 diff --git a/cmds-check.c b/cmds-check.c
 index 69b0327..8ed7baa 100644
 --- a/cmds-check.c
 +++ b/cmds-check.c

Hey! Quick work!...

Is that worth patching locally and trying against my example?

Thanks,
Martin


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs-progs: allow --init-extent-tree to work when extent tree is borked

2013-10-26 Thread Martin
On 25/10/13 19:31, Josef Bacik wrote:
 On Fri, Oct 25, 2013 at 07:27:24PM +0100, Martin wrote:
 On 25/10/13 19:01, Josef Bacik wrote:
 Unfortunately you can't run --init-extent-tree if you can't actually read 
 the
 extent root.  Fix this by allowing partial starts with no extent root and 
 then
 have fsck only check to see if the extent root is uptodate _after_ the 
 check to
 see if we are init'ing the extent tree.  Thanks,

 Signed-off-by: Josef Bacik jba...@fusionio.com
 ---
  cmds-check.c |  9 ++---
  disk-io.c| 16 ++--
  2 files changed, 20 insertions(+), 5 deletions(-)

 diff --git a/cmds-check.c b/cmds-check.c
 index 69b0327..8ed7baa 100644
 --- a/cmds-check.c
 +++ b/cmds-check.c

 Hey! Quick work!...

 Is that worth patching locally and trying against my example?

 
 Yes, I'm a little worried about your particular case so I'd like to see if it
 works.  If you don't see a lot of output after say 5 minutes let's assume I
 didn't fix your problem and let me know so I can make the other change I
 considered.  Thanks,

Nope... No-go.

parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
Ignoring transid failure

...And nothing more. Looped.


# gdb /sbin/btrfsck 31887
GNU gdb (Gentoo 7.5.1 p2) 7.5.1
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type show copying
and show warranty for details.
This GDB was configured as x86_64-pc-linux-gnu.
For bug reporting instructions, please see:
http://bugs.gentoo.org/...
Reading symbols from /sbin/btrfsck...Reading symbols from
/usr/lib64/debug/sbin/btrfsck.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Attaching to program: /sbin/btrfsck, process 31887

warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need set solib-search-path or set sysroot?
Reading symbols from /lib64/libuuid.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libuuid.so.1
Reading symbols from /lib64/libblkid.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libblkid.so.1
Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libz.so.1
Reading symbols from /usr/lib64/liblzo2.so.2...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/liblzo2.so.2
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
found)...done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library /lib64/libthread_db.so.1.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x0042b7a9 in read_extent_buffer ()
(gdb)
(gdb) bt
#0  0x0042b7a9 in read_extent_buffer ()
#1  0x0041ccfd in btrfs_check_node ()
#2  0x0041e0a2 in check_block ()
#3  0x0041e69e in btrfs_search_slot ()
#4  0x00425a6e in find_first_block_group ()
#5  0x00425b28 in btrfs_read_block_groups ()
#6  0x00421c40 in btrfs_setup_all_roots ()
#7  0x00421e3f in __open_ctree_fd ()
#8  0x00421f19 in open_ctree_fs_info ()
#9  0x004169b4 in cmd_check ()
#10 0x0040443b in main ()
(gdb)


# btrfs version
Btrfs v0.20-rc1-358-g194aa4a-dirty


 Emerging (1 of 1) sys-fs/btrfs-progs-
 Unpacking source...
GIT update --
   repository:
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
   at the commit:194aa4a1bd6447bb545286d0bcb0b0be8204d79f
   branch:   master
   storage directory:
/usr/portage/distfiles/egit-src/btrfs-progs.git
   checkout type:bare repository
Cloning into
'/var/tmp/portage/sys-fs/btrfs-progs-/work/btrfs-progs-'...
done.
Branch branch-master set up to track remote branch master from origin.
Switched to a new branch 'branch-master'
 Unpacked to
/var/tmp/portage/sys-fs/btrfs-progs-/work/btrfs-progs-
 Source unpacked in /var/tmp/portage/sys-fs/btrfs-progs-/work
 Preparing source in
/var/tmp/portage/sys-fs/btrfs-progs-/work/btrfs-progs- ...
 Source prepared.
 * Applying user patches from
/etc/portage/patches//sys-fs/btrfs-progs- ...
 *   jbpatch2013-10-25-extents-fix.patch ...


[ ok ]
 * Done with patching
 Configuring source in
/var/tmp/portage/sys-fs/btrfs-progs-/work/btrfs-progs- ...
 Source configured.
[...]

Note the compile warnings:


 * QA Notice: Package triggers severe warnings which

Re: [PATCH] Btrfs-progs: allow --init-extent-tree to work when extent tree is borked

2013-11-06 Thread Martin
On 28/10/13 15:11, Josef Bacik wrote:
 On Sun, Oct 27, 2013 at 12:16:12AM +0100, Martin wrote:
 On 25/10/13 19:31, Josef Bacik wrote:
 On Fri, Oct 25, 2013 at 07:27:24PM +0100, Martin wrote:
 On 25/10/13 19:01, Josef Bacik wrote:
 Unfortunately you can't run --init-extent-tree if you can't actually read 
 the
 extent root.  Fix this by allowing partial starts with no extent root and 
 then
 have fsck only check to see if the extent root is uptodate _after_ the 
 check to
 see if we are init'ing the extent tree.  Thanks,

 Signed-off-by: Josef Bacik jba...@fusionio.com
 ---
  cmds-check.c |  9 ++---
  disk-io.c| 16 ++--
  2 files changed, 20 insertions(+), 5 deletions(-)

 diff --git a/cmds-check.c b/cmds-check.c
 index 69b0327..8ed7baa 100644
 --- a/cmds-check.c
 +++ b/cmds-check.c

 Hey! Quick work!...

 Is that worth patching locally and trying against my example?


 Yes, I'm a little worried about your particular case so I'd like to see if 
 it
 works.  If you don't see a lot of output after say 5 minutes let's assume I
 didn't fix your problem and let me know so I can make the other change I
 considered.  Thanks,

 Nope... No-go.

 
 Ok I've sent
 
 [PATCH] Btrfs-progs: rework open_ctree to take flags, add a new one
 
 which should address your situation.  Thanks,


Josef,

Tried your patch:


Signed-off-by: Josef Bacik jba...@fusionio.com

 13 files changed, 75 insertions(+), 113 deletions(-)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index 26c7b5f..ae10eed 100644


And the patching fails due to mismatching code...

I have the Gentoo source for:

Btrfs v0.20-rc1-358-g194aa4a

(On Gentoo 3.11.5, will be on 3.11.6 later today.)


What are the magic incantations to download your version of source code
to try please? (Patched or unpatched?)


Many thanks,
Martin


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: progs integration branch moved to master (new default leafsize)

2013-11-08 Thread Martin
On 08/11/13 22:01, Chris Mason wrote:
 Hi everyone,
 
 This patch is now the tip of the master branch for btrfs-progs, which
 has been updated to include most of the backlogged progs patches.
 Please take a look and give it a shake.  This was based on Dave's
 integration tree (many thanks Dave!) minus the patches for online dedup.
 I've pulled in the coverity fixes and a few others from the list as
 well.
 
 The patch below switches our default mkfs leafsize up to 16K.  This
 should be a better choice in almost every workload, but now is your
 chance to complain if it causes trouble.

Thanks for that and nicely timely!

Compiling on Gentoo (3.11.5-gentoo, sys-fs/btrfs-progs-) gives:


 * QA Notice: Package triggers severe warnings which indicate that it
 *may exhibit random runtime failures.
 * disk-io.c:91:5: warning: dereferencing type-punned pointer will break
strict-aliasing rules [-Wstrict-aliasing]
 * volumes.c:1930:5: warning: dereferencing type-punned pointer will
break strict-aliasing rules [-Wstrict-aliasing]
 * volumes.c:1931:6: warning: dereferencing type-punned pointer will
break strict-aliasing rules [-Wstrict-aliasing]

 * Please do not file a Gentoo bug and instead report the above QA
 * issues directly to the upstream developers of this software.
 * Homepage: https://btrfs.wiki.kernel.org




 
 16KB is faster and leads to less metadata fragmentation in almost all
 workloads.  It does slightly increase lock contention on the root nodes
 in some workloads, but that is best dealt with by adding more subvolumes
 (for now).

Interesting and I was wondering about that. Good update.

Also, hopefully that is a little more friendly for SSDs where often you
see improved performance for 8kByte or 16kByte (aligned) writes...
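
(For anyone wanting to pin the sizes explicitly rather than rely on the
new default, I believe that is just the usual options, for example:

  mkfs.btrfs -l 16384 -n 16384 /dev/sdX

with the caveat from the patch that mixed data/metadata block groups
still need the leafsize to equal the sectorsize.)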


Testing in progress,

Regards,
Martin



 This uses 16KB or the page size, whichever is bigger.  If you're doing a
 mixed block group mkfs, it uses the sectorsize instead.
 
 Since the kernel refuses to mount a mixed block group FS where the
 metadata leaf size doesn't match the data sectorsize, this also adds a
 similar check during mkfs.
 
 Signed-off-by: Chris Mason chris.ma...@fusionio.com
 ---
  mkfs.c | 19 ++-
  1 file changed, 18 insertions(+), 1 deletion(-)
 
 diff --git a/mkfs.c b/mkfs.c
 index bf8a831..cd0af9e 100644
 --- a/mkfs.c
 +++ b/mkfs.c
 @@ -46,6 +46,8 @@
  
  static u64 index_cnt = 2;
  
 +#define DEFAULT_MKFS_LEAF_SIZE 16384
 +
  struct directory_name_entry {
   char *dir_name;
   char *path;
 @@ -1222,7 +1224,7 @@ int main(int ac, char **av)
   u64 alloc_start = 0;
   u64 metadata_profile = 0;
   u64 data_profile = 0;
 - u32 leafsize = sysconf(_SC_PAGESIZE);
 + u32 leafsize = max_t(u32, sysconf(_SC_PAGESIZE), 
 DEFAULT_MKFS_LEAF_SIZE);
   u32 sectorsize = 4096;
   u32 nodesize = leafsize;
   u32 stripesize = 4096;
 @@ -1232,6 +1234,7 @@ int main(int ac, char **av)
   int ret;
   int i;
   int mixed = 0;
 + int leaf_forced = 0;
   int data_profile_opt = 0;
   int metadata_profile_opt = 0;
   int discard = 1;
 @@ -1269,6 +1272,7 @@ int main(int ac, char **av)
   case 'n':
   nodesize = parse_size(optarg);
   leafsize = parse_size(optarg);
 + leaf_forced = 1;
   break;
   case 'L':
   label = parse_label(optarg);
 @@ -1386,8 +1390,21 @@ int main(int ac, char **av)
   BTRFS_BLOCK_GROUP_RAID0 : 0; /* raid0 or single 
 */
   }
   } else {
 + u32 best_leafsize = max_t(u32, sysconf(_SC_PAGESIZE), 
 sectorsize);
   metadata_profile = 0;
   data_profile = 0;
 +
 + if (!leaf_forced) {
 + leafsize = best_leafsize;
 + nodesize = best_leafsize;
 + if (check_leaf_or_node_size(leafsize, sectorsize))
 + exit(1);
 + }
 + if (leafsize != sectorsize) {
 + fprintf(stderr, Error: mixed metadata/data block 
 groups 
 + require metadata blocksizes equal to the 
 sectorsize\n);
 + exit(1);
 + }
   }
  
   ret = test_num_disk_vs_raid(metadata_profile, data_profile,


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs-progs: allow --init-extent-tree to work when extent tree is borked

2013-11-11 Thread Martin
On 07/11/13 01:25, Martin wrote:
 On 28/10/13 15:11, Josef Bacik wrote:

 Ok I've sent

 [PATCH] Btrfs-progs: rework open_ctree to take flags, add a new one

 which should address your situation.  Thanks,
 
 
 Josef,
 
 Tried your patch:
 
 
 Signed-off-by: Josef Bacik jba...@fusionio.com
 
  13 files changed, 75 insertions(+), 113 deletions(-)
 
 diff --git a/btrfs-convert.c b/btrfs-convert.c
 index 26c7b5f..ae10eed 100644
 
 
 And the patching fails due to mismatching code...
 
 I have the Gentoo source for:
 
 Btrfs v0.20-rc1-358-g194aa4a
 
 (On Gentoo 3.11.5, will be on 3.11.6 later today.)
 
 
 What are the magic incantations to download your version of source code
 to try please? (Patched or unpatched?)

OK so Chris Mason and the Gentoo sys-fs/btrfs-progs- came to the
rescue to give:


# btrfs version
Btrfs v0.20-rc1-591-gc652e4e


This time:

# btrfsck --repair --init-extent-tree /dev/sdc

quickly gave:

parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
Ignoring transid failure
btrfs unable to find ref byte nr 910293991424 parent 0 root 1  owner 2
offset 0
btrfs unable to find ref byte nr 910293995520 parent 0 root 1  owner 1
offset 1
btrfs unable to find ref byte nr 910293999616 parent 0 root 1  owner 0
offset 1
leaf free space ret -297791851, leaf data size 3995, used 297795846
nritems 2
checking extents
btrfsck: extent_io.c:609: free_extent_buffer: Assertion `!(eb->refs < 0)'
failed.
enabling repair mode
Checking filesystem on /dev/sdc
UUID: 38a60270-f9c6-4ed4-8421-4bf1253ae0b3
Creating a new extent tree
Failed to find [910293991424, 168, 4096]
Failed to find [910293995520, 168, 4096]
Failed to find [910293999616, 168, 4096]


From that, I've tried running again:

# btrfsck --repair /dev/sdc

giving thus far:

parent transid verify failed on 911904604160 wanted 17448 found 17450
parent transid verify failed on 911904604160 wanted 17448 found 17450
parent transid verify failed on 911904604160 wanted 17448 found 17450
parent transid verify failed on 911904604160 wanted 17448 found 17450
Ignoring transid failure


... And it is still running a couple of days later.

GDB shows:

(gdb) bt
#0  0x0042d576 in read_extent_buffer ()
#1  0x0041ee79 in btrfs_check_node ()
#2  0x00420211 in check_block ()
#3  0x00420813 in btrfs_search_slot ()
#4  0x00427bb4 in btrfs_read_block_groups ()
#5  0x00423e40 in btrfs_setup_all_roots ()
#6  0x0042406d in __open_ctree_fd ()
#7  0x00424126 in open_ctree_fs_info ()
#8  0x0041812e in cmd_check ()
#9  0x00404904 in main ()


So... Has it looped or is it busy? There is no activity on /dev/sdc.


Which comes to a request:

Can the options -v (for verbose) and -s (to continuously show
status) be added to btrfsck to give some indication of progress and what
is happening? The -s should report progress by whatever appropriate
real-time counts as done by such as badblocks -s.
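
(In the meantime, about the only progress view I have is sampling
backtraces in a loop, along the lines of the below, with the pid
illustrative:

  while sleep 300; do
      gdb -batch -ex bt -p 17151
  done

which is rather less friendly than a proper counter would be.)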


I'll leave running for a little while longer before trying a mount.

Hope of interest.

Thanks,
Martin





--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs-progs: allow --init-extent-tree to work when extent tree is borked

2013-11-13 Thread Martin
On 11/11/13 22:52, Martin wrote:
 On 07/11/13 01:25, Martin wrote:

 OK so Chris Mason and the Gentoo sys-fs/btrfs-progs- came to the
 rescue to give:
 
 
 # btrfs version
 Btrfs v0.20-rc1-591-gc652e4e

 From that, I've tried running again:
 
 # btrfsck --repair /dev/sdc
 
 giving thus far:
 
 parent transid verify failed on 911904604160 wanted 17448 found 17450
 parent transid verify failed on 911904604160 wanted 17448 found 17450
 parent transid verify failed on 911904604160 wanted 17448 found 17450
 parent transid verify failed on 911904604160 wanted 17448 found 17450
 Ignoring transid failure
 
 
 ... And it is still running a couple of days later.
 
 GDB shows:
 
 (gdb) bt
 #0  0x0042d576 in read_extent_buffer ()
 #1  0x0041ee79 in btrfs_check_node ()
 #2  0x00420211 in check_block ()
 #3  0x00420813 in btrfs_search_slot ()
 #4  0x00427bb4 in btrfs_read_block_groups ()
 #5  0x00423e40 in btrfs_setup_all_roots ()
 #6  0x0042406d in __open_ctree_fd ()
 #7  0x00424126 in open_ctree_fs_info ()
 #8  0x0041812e in cmd_check ()
 #9  0x00404904 in main ()


Another two days and:

(gdb) bt
#0  0x0042373a in read_tree_block ()
#1  0x00421538 in btrfs_search_slot ()
#2  0x00427bb4 in btrfs_read_block_groups ()
#3  0x00423e40 in btrfs_setup_all_roots ()
#4  0x0042406d in __open_ctree_fd ()
#5  0x00424126 in open_ctree_fs_info ()
#6  0x0041812e in cmd_check ()
#7  0x00404904 in main ()


 So... Has it looped or is it busy? There is no activity on /dev/sdc.

Same btrfs_read_block_groups but different stack above that: So
perhaps something useful is being done?...

No disk activity noticed.


 Which comes to a request:
 
 Can the options -v (for verbose) and -s (to continuously show
 status) be added to btrfsck to give some indication of progress and what
 is happening? The -s should report progress by whatever appropriate
 real-time counts as done by such as badblocks -s.


OK... So I'll leave running for a little while longer before trying a mount.

Some sort of progress indicator would be rather useful... Is this going
to run for a few hours more or might this need to run for weeks to
complete? Any clues to look for?

(All on a 2TByte single disk btrfs, 4k defaults)

Hope of interest.

Regards,
Martin


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs-progs: allow --init-extent-tree to work when extent tree is borked

2013-11-15 Thread Martin
Another two days and a backtrace shows the hope of progress:

#0  0x0041de2f in btrfs_node_key ()
#1  0x0041ee79 in btrfs_check_node ()
#2  0x00420211 in check_block ()
#3  0x00420813 in btrfs_search_slot ()
#4  0x00427bb4 in btrfs_read_block_groups ()
#5  0x00423e40 in btrfs_setup_all_roots ()
#6  0x0042406d in __open_ctree_fd ()
#7  0x00424126 in open_ctree_fs_info ()
#8  0x0041812e in cmd_check ()
#9  0x00404904 in main ()

No other output, 100% CPU, using only a single core, and no apparent
disk activity.

There looks to be a repeating pattern of calls. Is this working though
the same test repeated per btrfs block? Are there any variables that can
be checked with gdb to see how far it has gone so as to guess how long
it might need to run?


Phew?

Hope of interest,

Regards,
Martin




On 13/11/13 12:08, Martin wrote:
 On 11/11/13 22:52, Martin wrote:
 On 07/11/13 01:25, Martin wrote:
 
 OK so Chris Mason and the Gentoo sys-fs/btrfs-progs- came to the
 rescue to give:


 # btrfs version
 Btrfs v0.20-rc1-591-gc652e4e
 
 From that, I've tried running again:

 # btrfsck --repair /dev/sdc

 giving thus far:

 parent transid verify failed on 911904604160 wanted 17448 found 17450
 parent transid verify failed on 911904604160 wanted 17448 found 17450
 parent transid verify failed on 911904604160 wanted 17448 found 17450
 parent transid verify failed on 911904604160 wanted 17448 found 17450
 Ignoring transid failure


 ... And it is still running a couple of days later.

 GDB shows:

 (gdb) bt
 #0  0x0042d576 in read_extent_buffer ()
 #1  0x0041ee79 in btrfs_check_node ()
 #2  0x00420211 in check_block ()
 #3  0x00420813 in btrfs_search_slot ()
 #4  0x00427bb4 in btrfs_read_block_groups ()
 #5  0x00423e40 in btrfs_setup_all_roots ()
 #6  0x0042406d in __open_ctree_fd ()
 #7  0x00424126 in open_ctree_fs_info ()
 #8  0x0041812e in cmd_check ()
 #9  0x00404904 in main ()
 
 
 Another two days and:
 
 (gdb) bt
 #0  0x0042373a in read_tree_block ()
 #1  0x00421538 in btrfs_search_slot ()
 #2  0x00427bb4 in btrfs_read_block_groups ()
 #3  0x00423e40 in btrfs_setup_all_roots ()
 #4  0x0042406d in __open_ctree_fd ()
 #5  0x00424126 in open_ctree_fs_info ()
 #6  0x0041812e in cmd_check ()
 #7  0x00404904 in main ()
 
 
 So... Has it looped or is it busy? There is no activity on /dev/sdc.
 
 Same btrfs_read_block_groups but different stack above that: So
 perhaps something useful is being done?...
 
 No disk activity noticed.
 
 
 Which comes to a request:

 Can the options -v (for verbose) and -s (to continuously show
 status) be added to btrfsck to give some indication of progress and what
 is happening? The -s should report progress by whatever appropriate
 real-time counts as done by such as badblocks -s.



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs-progs: allow --init-extent-tree to work when extent tree is borked

2013-11-18 Thread Martin
On 07/11/13 01:25, Martin wrote:
[...]
 And the patching fails due to mismatching code...
 
 I have the Gentoo source for:
 
 Btrfs v0.20-rc1-358-g194aa4a
 
 (On Gentoo 3.11.5, will be on 3.11.6 later today.)
 
 
 What are the magic incantations to download your version of source code
 to try please? (Patched or unpatched?)

As an FYI for anyone stumbling onto this thread:

See:

https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories

to get to the code!


Cheers,
Martin


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs-progs: allow --init-extent-tree to work when extent tree is borked

2013-11-18 Thread Martin
Continuing:

gdb bt now gives:

#0  0x0042075a in btrfs_search_slot ()
#1  0x00427bb4 in btrfs_read_block_groups ()
#2  0x00423e40 in btrfs_setup_all_roots ()
#3  0x0042406d in __open_ctree_fd ()
#4  0x00424126 in open_ctree_fs_info ()
#5  0x0041812e in cmd_check ()
#6  0x00404904 in main ()

#0  0x004208bc in btrfs_search_slot ()
#1  0x00427bb4 in btrfs_read_block_groups ()
#2  0x00423e40 in btrfs_setup_all_roots ()
#3  0x0042406d in __open_ctree_fd ()
#4  0x00424126 in open_ctree_fs_info ()
#5  0x0041812e in cmd_check ()
#6  0x00404904 in main ()

#0  0x004208d0 in btrfs_search_slot ()
#1  0x00427bb4 in btrfs_read_block_groups ()
#2  0x00423e40 in btrfs_setup_all_roots ()
#3  0x0042406d in __open_ctree_fd ()
#4  0x00424126 in open_ctree_fs_info ()
#5  0x0041812e in cmd_check ()
#6  0x00404904 in main ()


Still no further output. btrfsck running at 100% on a single core and
with no apparent disk activity. All for a 2TB hdd.


Should it take this long?...

Regards,
Martin




On 15/11/13 17:18, Martin wrote:
 Another two days and a backtrace shows the hope of progress:
 
 #0  0x0041de2f in btrfs_node_key ()
 #1  0x0041ee79 in btrfs_check_node ()
 #2  0x00420211 in check_block ()
 #3  0x00420813 in btrfs_search_slot ()
 #4  0x00427bb4 in btrfs_read_block_groups ()
 #5  0x00423e40 in btrfs_setup_all_roots ()
 #6  0x0042406d in __open_ctree_fd ()
 #7  0x00424126 in open_ctree_fs_info ()
 #8  0x0041812e in cmd_check ()
 #9  0x00404904 in main ()
 
 No other output, 100% CPU, using only a single core, and no apparent
 disk activity.
 
 There looks to be a repeating pattern of calls. Is this working though
 the same test repeated per btrfs block? Are there any variables that can
 be checked with gdb to see how far it has gone so as to guess how long
 it might need to run?
 
 
 Phew?
 
 Hope of interest,
 
 Regards,
 Martin
 
 
 
 
 On 13/11/13 12:08, Martin wrote:
 On 11/11/13 22:52, Martin wrote:
 On 07/11/13 01:25, Martin wrote:

 OK so Chris Mason and the Gentoo sys-fs/btrfs-progs- came to the
 rescue to give:


 # btrfs version
 Btrfs v0.20-rc1-591-gc652e4e

 From that, I've tried running again:

 # btrfsck --repair /dev/sdc

 giving thus far:

 parent transid verify failed on 911904604160 wanted 17448 found 17450
 parent transid verify failed on 911904604160 wanted 17448 found 17450
 parent transid verify failed on 911904604160 wanted 17448 found 17450
 parent transid verify failed on 911904604160 wanted 17448 found 17450
 Ignoring transid failure


 ... And it is still running a couple of days later.

 GDB shows:

 (gdb) bt
 #0  0x0042d576 in read_extent_buffer ()
 #1  0x0041ee79 in btrfs_check_node ()
 #2  0x00420211 in check_block ()
 #3  0x00420813 in btrfs_search_slot ()
 #4  0x00427bb4 in btrfs_read_block_groups ()
 #5  0x00423e40 in btrfs_setup_all_roots ()
 #6  0x0042406d in __open_ctree_fd ()
 #7  0x00424126 in open_ctree_fs_info ()
 #8  0x0041812e in cmd_check ()
 #9  0x00404904 in main ()


 Another two days and:

 (gdb) bt
 #0  0x0042373a in read_tree_block ()
 #1  0x00421538 in btrfs_search_slot ()
 #2  0x00427bb4 in btrfs_read_block_groups ()
 #3  0x00423e40 in btrfs_setup_all_roots ()
 #4  0x0042406d in __open_ctree_fd ()
 #5  0x00424126 in open_ctree_fs_info ()
 #6  0x0041812e in cmd_check ()
 #7  0x00404904 in main ()


 So... Has it looped or is it busy? There is no activity on /dev/sdc.

 Same btrfs_read_block_groups but different stack above that: So
 perhaps something useful is being done?...

 No disk activity noticed.


 Which comes to a request:

 Can the options -v (for verbose) and -s (to continuously show
 status) be added to btrfsck to give some indication of progress and what
 is happening? The -s should report progress by whatever appropriate
 real-time counts as done by such as badblocks -s.



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Actual effect of mkfs.btrfs -m raid10 /dev/sdX ... -d raid10 /dev/sdX ...

2013-11-19 Thread Martin
On 19/11/13 23:16, Duncan wrote:

 So we have:
 
 1) raid1 is exactly two copies of data, paired devices.
 
 2) raid0 is a stripe exactly two devices wide (reinforced by to read a 
 stripe takes only two devices), so again paired devices.

Which is fine for some occasions and a very good start point.

However, I'm sure there is a strong wish to be able to specify n-copies
of data/metadata spread across m devices. Or even to specify 'hot spares'.

This would be a great way to overcome the problem of a set of drives
becoming read-only when one btrfs drive fails or is removed.

(Or should we always mount with the degraded option?)
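
(That is, is the expectation to put something like the following in
fstab, with the device and mount point here purely illustrative?

  /dev/sda3  /data  btrfs  degraded,noatime  0 0

That feels more like a workaround than a policy.)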


Regards,
Martin

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Actual effect of mkfs.btrfs -m raid10 /dev/sdX ... -d raid10 /dev/sdX ...

2013-11-19 Thread Martin
On 19/11/13 19:24, deadhorseconsulting wrote:
 Interesting, this confirms what I was observing.
 Given the wording in man pages for -m and -d which states Specify
 how the metadata or data must be spanned across the devices
 specified.
 I took devices specified to literally mean the devices specified
 after the according switch.

That sounds like a hang-over from too many years use of the mdadm
command and more recently such as the sgdisk command...

;-)


Myself, I like the btrfs way to specify the list of parameters and then
they all then get applied as a whole.

The one bugbear at the moment is that for using multiple disks: Any
actions seem to be applied to the list of devices in sequence,
one-by-one. There's no apparent intelligence to consider the move from
the present pool to the new pool of devices as a whole.


More development!

Regards,
Martin


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfsck --repair /dev/sdc (Was: [PATCH] Btrfs-progs: allow --init-extent-tree to work when extent tree is borked)

2013-11-19 Thread Martin
It's now gone back to a pattern from a full week ago:

(gdb) bt
#0  0x0042d576 in read_extent_buffer ()
#1  0x0041ee79 in btrfs_check_node ()
#2  0x00420211 in check_block ()
#3  0x00420813 in btrfs_search_slot ()
#4  0x00427bb4 in btrfs_read_block_groups ()
#5  0x00423e40 in btrfs_setup_all_roots ()
#6  0x0042406d in __open_ctree_fd ()
#7  0x00424126 in open_ctree_fs_info ()
#8  0x0041812e in cmd_check ()
#9  0x00404904 in main ()


I don't know if that has gone through that pattern during the week but
at a-week-a-time, this is not going to finish in reasonable time.

How come so very slow?

Any hints/tips/fixes or abandon the test?


Regards,
Martin




On 19/11/13 06:34, Martin wrote:
 Continuing:
 
 gdb bt now gives:
 
 #0  0x0042075a in btrfs_search_slot ()
 #1  0x00427bb4 in btrfs_read_block_groups ()
 #2  0x00423e40 in btrfs_setup_all_roots ()
 #3  0x0042406d in __open_ctree_fd ()
 #4  0x00424126 in open_ctree_fs_info ()
 #5  0x0041812e in cmd_check ()
 #6  0x00404904 in main ()
 
 #0  0x004208bc in btrfs_search_slot ()
 #1  0x00427bb4 in btrfs_read_block_groups ()
 #2  0x00423e40 in btrfs_setup_all_roots ()
 #3  0x0042406d in __open_ctree_fd ()
 #4  0x00424126 in open_ctree_fs_info ()
 #5  0x0041812e in cmd_check ()
 #6  0x00404904 in main ()
 
 #0  0x004208d0 in btrfs_search_slot ()
 #1  0x00427bb4 in btrfs_read_block_groups ()
 #2  0x00423e40 in btrfs_setup_all_roots ()
 #3  0x0042406d in __open_ctree_fd ()
 #4  0x00424126 in open_ctree_fs_info ()
 #5  0x0041812e in cmd_check ()
 #6  0x00404904 in main ()
 
 
 Still no further output. btrfsck running at 100% on a single core and
 with no apparent disk activity. All for a 2TB hdd.
 
 
 Should it take this long?...
 
 Regards,
 Martin
 
 
 
 
 On 15/11/13 17:18, Martin wrote:
 Another two days and a backtrace shows the hope of progress:

 #0  0x0041de2f in btrfs_node_key ()
 #1  0x0041ee79 in btrfs_check_node ()
 #2  0x00420211 in check_block ()
 #3  0x00420813 in btrfs_search_slot ()
 #4  0x00427bb4 in btrfs_read_block_groups ()
 #5  0x00423e40 in btrfs_setup_all_roots ()
 #6  0x0042406d in __open_ctree_fd ()
 #7  0x00424126 in open_ctree_fs_info ()
 #8  0x0041812e in cmd_check ()
 #9  0x00404904 in main ()

 No other output, 100% CPU, using only a single core, and no apparent
 disk activity.

 There looks to be a repeating pattern of calls. Is this working though
 the same test repeated per btrfs block? Are there any variables that can
 be checked with gdb to see how far it has gone so as to guess how long
 it might need to run?


 Phew?

 Hope of interest,

 Regards,
 Martin




 On 13/11/13 12:08, Martin wrote:
 On 11/11/13 22:52, Martin wrote:
 On 07/11/13 01:25, Martin wrote:

 OK so Chris Mason and the Gentoo sys-fs/btrfs-progs- came to the
 rescue to give:


 # btrfs version
 Btrfs v0.20-rc1-591-gc652e4e

 From that, I've tried running again:

 # btrfsck --repair /dev/sdc

 giving thus far:

 parent transid verify failed on 911904604160 wanted 17448 found 17450
 parent transid verify failed on 911904604160 wanted 17448 found 17450
 parent transid verify failed on 911904604160 wanted 17448 found 17450
 parent transid verify failed on 911904604160 wanted 17448 found 17450
 Ignoring transid failure


 ... And it is still running a couple of days later.

 GDB shows:

 (gdb) bt
 #0  0x0042d576 in read_extent_buffer ()
 #1  0x0041ee79 in btrfs_check_node ()
 #2  0x00420211 in check_block ()
 #3  0x00420813 in btrfs_search_slot ()
 #4  0x00427bb4 in btrfs_read_block_groups ()
 #5  0x00423e40 in btrfs_setup_all_roots ()
 #6  0x0042406d in __open_ctree_fd ()
 #7  0x00424126 in open_ctree_fs_info ()
 #8  0x0041812e in cmd_check ()
 #9  0x00404904 in main ()


 Another two days and:

 (gdb) bt
 #0  0x0042373a in read_tree_block ()
 #1  0x00421538 in btrfs_search_slot ()
 #2  0x00427bb4 in btrfs_read_block_groups ()
 #3  0x00423e40 in btrfs_setup_all_roots ()
 #4  0x0042406d in __open_ctree_fd ()
 #5  0x00424126 in open_ctree_fs_info ()
 #6  0x0041812e in cmd_check ()
 #7  0x00404904 in main ()


 So... Has it looped or is it busy? There is no activity on /dev/sdc.

 Same btrfs_read_block_groups but different stack above that: So
 perhaps something useful is being done?...

 No disk activity noticed.


 Which comes to a request:

 Can the options -v (for verbose) and -s (to continuously show
 status) be added to btrfsck to give some indication of progress and what
 is happening? The -s should report progress by whatever appropriate
 real-time counts as done by such as badblocks -s

Re: btrfsck --repair /dev/sdc (Was: [PATCH] Btrfs-progs: allow --init-extent-tree to work when extent tree is borked)

2013-11-20 Thread Martin
On 20/11/13 17:08, Duncan wrote:
 Martin posted on Wed, 20 Nov 2013 06:51:20 + as excerpted:
 
 It's now gone back to a pattern from a full week ago:

 (gdb) bt #0  0x0042d576 in read_extent_buffer ()
 #1  0x0041ee79 in btrfs_check_node ()
 #2  0x00420211 in check_block ()
 #3  0x00420813 in btrfs_search_slot ()
 #4  0x00427bb4 in btrfs_read_block_groups ()
 #5  0x00423e40 in btrfs_setup_all_roots ()
 #6  0x0042406d in __open_ctree_fd ()
 #7  0x00424126 in open_ctree_fs_info ()
 #8  0x0041812e in cmd_check ()
 #9  0x00404904 in main ()


 I don't know if that has gone through that pattern during the week but
 at a-week-a-time, this is not going to finish in reasonable time.

 How come so very slow?

 Any hints/tips/fixes or abandon the test?
 
 You're a patient man. =:^)

Sort of... I can leave it running in the background until I come to need
to do something else with that machine. So... A bit of an experiment.



 ( https://btrfs.wiki.kernel.org/index.php/FAQ , search on hours. )
 
 OK, so we round that to a day a TB, double for your two TB, and double 
 again in case your drive is much slower than the normal drive the 
 comment might have been considering and because that's for a balance but 
 you're doing a btrfsck --repair, which for all we know takes longer.
 
 That's still only four days, and you've been going well over a week.  
 At this point I think it's reasonably safe to conclude it's in some sort 
 of loop and likely will never finish.


 ... but at a week a shot, there 
 comes a time when it's simply time to declare a loss and move on.

Exactly so...

No idea what btrfsck is so very slowly checking through or if it has
indeed looped. Which is where progress output would be useful.

However, btrfsck is rather too slow to be practical at the moment.

Further development?... Any useful debug to be had from this case before
I move on?


Regards,
Martin


Still at:

parent transid verify failed on 911904604160 wanted 17448 found 17450
parent transid verify failed on 911904604160 wanted 17448 found 17450
parent transid verify failed on 911904604160 wanted 17448 found 17450
parent transid verify failed on 911904604160 wanted 17448 found 17450
Ignoring transid failure

...which is all the output thus far.


And:

(gdb) bt
#0  0x0042d574 in read_extent_buffer ()
#1  0x0041ee79 in btrfs_check_node ()
#2  0x00420211 in check_block ()
#3  0x00420813 in btrfs_search_slot ()
#4  0x00427bb4 in btrfs_read_block_groups ()
#5  0x00423e40 in btrfs_setup_all_roots ()
#6  0x0042406d in __open_ctree_fd ()
#7  0x00424126 in open_ctree_fs_info ()
#8  0x0041812e in cmd_check ()
#9  0x00404904 in main ()





--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfsck --repair /dev/sdc (Was: [PATCH] Btrfs-progs: allow --init-extent-tree to work when extent tree is borked)

2013-11-20 Thread Martin
On 20/11/13 17:08, Duncan wrote:

 Which leads to the question of what to do next.  Obviously, there have 
 been a number of update patches since then, some of which might address 
 your problem.  You could update your kernel and userspace and try 
 again... /if/ you have the patience...


This is on kernel 3.11.5 and Btrfs v0.20-rc1-591-gc652e4e.

Can easily upgrade to the latest kernel at the expense of killing the
existing btrfsck run.

Regards,
Martin


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: progs integration branch moved to master (new default leafsize)

2013-11-22 Thread Martin
On 21/11/13 23:37, Chris Mason wrote:
 Quoting Martin (2013-11-08 18:53:06)
 On 08/11/13 22:01, Chris Mason wrote:
 Hi everyone,

 This patch is now the tip of the master branch for btrfs-progs, which
 has been updated to include most of the backlogged progs patches.
 Please take a look and give it a shake.  This was based on Dave's
 integration tree (many thanks Dave!) minus the patches for online dedup.
 I've pulled in the coverity fixes and a few others from the list as
 well.

 The patch below switches our default mkfs leafsize up to 16K.  This
 should be a better choice in almost every workload, but now is your
 chance to complain if it causes trouble.

 Thanks for that and nicely timely!

 Compiling on Gentoo (3.11.5-gentoo, sys-fs/btrfs-progs-) gives:


  * QA Notice: Package triggers severe warnings which indicate that it
  *may exhibit random runtime failures.
  * disk-io.c:91:5: warning: dereferencing type-punned pointer will break
 strict-aliasing rules [-Wstrict-aliasing]
  * volumes.c:1930:5: warning: dereferencing type-punned pointer will
 break strict-aliasing rules [-Wstrict-aliasing]
  * volumes.c:1931:6: warning: dereferencing type-punned pointer will
 break strict-aliasing rules [-Wstrict-aliasing]
 
 I'm not seeing these warnings with the current master branch, could you
 please rerun?

From just now:


 * QA Notice: Package triggers severe warnings which indicate that it
 *may exhibit random runtime failures.
 * disk-io.c:91:5: warning: dereferencing type-punned pointer will break
strict-aliasing rules [-Wstrict-aliasing]
 * volumes.c:1930:5: warning: dereferencing type-punned pointer will
break strict-aliasing rules [-Wstrict-aliasing]
 * volumes.c:1931:6: warning: dereferencing type-punned pointer will
break strict-aliasing rules [-Wstrict-aliasing]

 * Please do not file a Gentoo bug and instead report the above QA
 * issues directly to the upstream developers of this software.
 * Homepage: https://btrfs.wiki.kernel.org

 Installing (1 of 1) sys-fs/btrfs-progs-


... Which is exactly the same.

This is on Gentoo with:

gcc: x86_64-pc-linux-gnu-4.7.3

# gcc --version
gcc (Gentoo 4.7.3-r1 p1.3, pie-0.5.5) 4.7.3

Kernel: 3.11.9-gentoo

# btrfs version
Btrfs v0.20-rc1-597-g5aff090



And the - pulls the code in from:

From git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs
   9f0c53f..5aff090  master - master
GIT update --
   repository:
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
   updating from commit: 9f0c53f574b242b0d5988db2972c8aac77ef35a9
   to commit:5aff090a3951e7d787b32bb5c49adfec65091385
 cmds-filesystem.c | 79
+++
 mkfs.c| 18 +-
 2 files changed, 88 insertions(+), 9 deletions(-)
   branch:   master
   storage directory:
/usr/portage/distfiles/egit-src/btrfs-progs.git
   checkout type:bare repository
Cloning into
'/var/tmp/portage/sys-fs/btrfs-progs-/work/btrfs-progs-'...
done.
Checking connectivity... done
Branch branch-master set up to track remote branch master from origin.
Switched to a new branch 'branch-master'


Hope that helps,

Regards,
Martin





--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: progs integration branch moved to master (new default leafsize)

2013-11-22 Thread Martin
On 22/11/13 13:40, Chris Mason wrote:
 Quoting Martin (2013-11-22 04:03:41)

  * QA Notice: Package triggers severe warnings which indicate that it
  *may exhibit random runtime failures.
  * disk-io.c:91:5: warning: dereferencing type-punned pointer will break
 strict-aliasing rules [-Wstrict-aliasing]
  * volumes.c:1930:5: warning: dereferencing type-punned pointer will
 break strict-aliasing rules [-Wstrict-aliasing]
  * volumes.c:1931:6: warning: dereferencing type-punned pointer will
 break strict-aliasing rules [-Wstrict-aliasing]
 
 Does gentoo modify the optimizations from the Makefile?  We actually
 have many strict-aliasing warnings, but I didn't think they came up
 until -O2.

For that system, I have -Os set in the Gentoo make.conf.


 At any rate, I'm adding -fno-strict-aliasing just to be sure.

Good to catch to avoid unexpectedness,


Regards,
Martin



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: progs integration branch moved to master (new default leafsize)

2013-11-22 Thread Martin
On 22/11/13 19:57, Chris Mason wrote:
 Quoting Martin (2013-11-22 14:50:17)
 On 22/11/13 13:40, Chris Mason wrote:
 Quoting Martin (2013-11-22 04:03:41)

  * QA Notice: Package triggers severe warnings which indicate that it
  *may exhibit random runtime failures.
  * disk-io.c:91:5: warning: dereferencing type-punned pointer will break
 strict-aliasing rules [-Wstrict-aliasing]
  * volumes.c:1930:5: warning: dereferencing type-punned pointer will
 break strict-aliasing rules [-Wstrict-aliasing]
  * volumes.c:1931:6: warning: dereferencing type-punned pointer will
 break strict-aliasing rules [-Wstrict-aliasing]

 Does gentoo modify the optimizations from the Makefile?  We actually
 have many strict-aliasing warnings, but I didn't think they came up
 until -O2.

 For that system, I have -Os set in the Gentoo make.conf.


 At any rate, I'm adding -fno-strict-aliasing just to be sure.

 Good to catch to avoid unexpectedness,
 
 Ok, please try with the current master to make sure the options are
 being picked up properly.  If you're overriding the
 -fno-strict-aliasing, please don't ;)

No changes my side for that system and...

btrfs-progs now compiles with no warnings given. That looks like it's fixed.


# emerge -vD btrfs-progs

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild   R   *] sys-fs/btrfs-progs-  0 kB

Total: 1 package (1 reinstall), Size of downloads: 0 kB


 Verifying ebuild manifests

 Emerging (1 of 1) sys-fs/btrfs-progs-
 Unpacking source...
GIT update --
   repository:
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
   at the commit:8116550e16628794b76051b6b8ea503055c08d6f
   branch:   master
   storage directory:
/usr/portage/distfiles/egit-src/btrfs-progs.git
   checkout type:bare repository
Cloning into
'/var/tmp/portage/sys-fs/btrfs-progs-/work/btrfs-progs-'...
done.
Checking connectivity... done
Branch branch-master set up to track remote branch master from origin.
Switched to a new branch 'branch-master'

... And then a clean compile. No warnings.


Thanks,
Martin



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Clean crash... (USB memory sticks mount)

2013-11-24 Thread Martin
On 24/11/13 20:50, Kai Krakow wrote:

 something about device mapper and write barriers not working correctly which 
 are needed for btrfs being able to rely on transactions working correctly.

Re USB memory sticks:

I've found write barriers not to work for USB memory sticks (for at
least the ones I have tried) for ext4 and btrfs. You must mount with the
nobarrier option...
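
That is, something like the following, with the device and mount point
illustrative:

  mount -o nobarrier /dev/sdX1 /mnt/usbstick

accepting that a power cut or an early unplug mid-write then risks an
inconsistent filesystem.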


Regards,
Martin



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs-progs tagged as v3.12

2013-11-25 Thread Martin
I'm humbly totally unqualified to comment but that sounds like an
excellent idea. Thanks.

I can't say for others but I was put off by the 0.19 forever eternal
version which pushed me to investigate GIT... I'm sure that has been
putting off many people including distro assemblers.


Just for some positive comment: Good progress, thanks.

Regards,
Martin

(OK, that's the last of the positives for the Christmas present. Back to
bugging! ;-) )



On 25/11/13 21:45, Chris Mason wrote:
 Hi everyone,
 
 I've tagged the current btrfs-progs repo as v3.12.  The new idea is that
 instead of making the poor distros pull from git, I'll be creating
 tagged releases at roughly the same pace as Linus cuts kernels.
 
 Given the volume of btrfs-progs patches, we should have enough new code
 and fixes to justify releases at least as often as the kernel.  Of
 course, if there are issues that need immediate attention, I'll tag a .y
 release (v3.12.1 for example).
 
 If the progs changes slow down, we might skip a version.  But tracking
 kernel version numbers makes it easier for me to line up bug reports,
 mostly because I already devote a fair number of brain cells to
 remembering how old each kernel is.
 
 Just let me know if there are any questions.
 
 -chris



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfsck --repair /dev/sdc (Was: [PATCH] Btrfs-progs: allow --init-extent-tree to work when extent tree is borked)

2013-11-25 Thread Martin
On 20/11/13 20:00, Martin wrote:
 On 20/11/13 17:08, Duncan wrote:
 Martin posted on Wed, 20 Nov 2013 06:51:20 + as excerpted:

 It's now gone back to a pattern from a full week ago:

 (gdb) bt #0  0x0042d576 in read_extent_buffer ()
 #1  0x0041ee79 in btrfs_check_node ()
 #2  0x00420211 in check_block ()
 #3  0x00420813 in btrfs_search_slot ()
 #4  0x00427bb4 in btrfs_read_block_groups ()
 #5  0x00423e40 in btrfs_setup_all_roots ()
 #6  0x0042406d in __open_ctree_fd ()
 #7  0x00424126 in open_ctree_fs_info ()
 #8  0x0041812e in cmd_check ()
 #9  0x00404904 in main ()


 I don't know if that has gone through that pattern during the week but
 at a-week-a-time, this is not going to finish in reasonable time.

 How come so very slow?

 Any hints/tips/fixes or abandon the test?

 You're a patient man. =:^)
 
 Sort of... I can leave it running in the background until I come to need
 to do something else with that machine. So... A bit of an experiment.


Until... No more... And just as the gdb bt shows something a little
different!

(gdb) bt
#0  0x0041ddc4 in btrfs_comp_keys ()
#1  0x004208e9 in btrfs_search_slot ()
#2  0x00427bb4 in btrfs_read_block_groups ()
#3  0x00423e40 in btrfs_setup_all_roots ()
#4  0x0042406d in __open_ctree_fd ()
#5  0x00424126 in open_ctree_fs_info ()
#6  0x0041812e in cmd_check ()
#7  0x00404904 in main ()


Nearly done or weeks yet more to run?

The poor thing gets killed in the morning for new work.


Regards,
Martin




--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfsck --repair /dev/sdc (Was: [PATCH] Btrfs-progs: allow --init-extent-tree to work when extent tree is borked)

2013-11-26 Thread Martin

 I don't know if that has gone through that pattern during the week but
 at a-week-a-time, this is not going to finish in reasonable time.

 How come so very slow?

 Any hints/tips/fixes or abandon the test?

 You're a patient man. =:^)

 Sort of... I can leave it running in the background until I come to need
 to do something else with that machine. So... A bit of an experiment.
 
 
 Until... No more... And just as the gdb bt shows something a little
 different!
 
 (gdb) bt
 #0  0x0041ddc4 in btrfs_comp_keys ()
 #1  0x004208e9 in btrfs_search_slot ()
 #2  0x00427bb4 in btrfs_read_block_groups ()
 #3  0x00423e40 in btrfs_setup_all_roots ()
 #4  0x0042406d in __open_ctree_fd ()
 #5  0x00424126 in open_ctree_fs_info ()
 #6  0x0041812e in cmd_check ()
 #7  0x00404904 in main ()
 
 
 Nearly done or weeks yet more to run?
 
 The poor thing gets killed in the morning for new work.

OK, so that all came to naught and it got killed for a kernel update and
new work.

Just for a giggle, I tried mounting that disk with the 'recovery'
option and it failed with the usual complaint:

btrfs: disabling disk space caching
btrfs: enabling auto recovery
parent transid verify failed on 911904604160 wanted 17448 found 17450
parent transid verify failed on 911904604160 wanted 17448 found 17450
btrfs: open_ctree failed


Trying a wild guess of btrfs-zero-log /dev/sdc gives:

parent transid verify failed on 911904604160 wanted 17448 found 17450
parent transid verify failed on 911904604160 wanted 17448 found 17450
parent transid verify failed on 911904604160 wanted 17448 found 17450
parent transid verify failed on 911904604160 wanted 17448 found 17450
Ignoring transid failure

... and it is sat there at 100% CPU usage, no further output, and no
apparent disk activity... Just like btrfsck was...


So... Looks like time finally for a reformat.



Any chance of some progress output, a speedup, or options for partial
recovery?... Or of a fast 'slash-and-burn' recovery where damaged trees
get cleanly amputated rather than too-painfully-slowly repaired?...

Just a few wild ideas ;-)
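
(Before the reformat I may yet try pulling off whatever can still be read
with restore, something like the below, paths illustrative:

  btrfs restore -v /dev/sdc /mnt/rescue

since that works read-only against a filesystem that will not mount.)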


Regards,
Martin




--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Feature Req: mkfs.btrfs -d dup option on single device

2013-12-11 Thread Martin
On 11/12/13 03:19, Imran Geriskovan wrote:

SSDs:

 What's more (in relation to our long term data integrity aim)
 order of magnitude for their unpowered data retension period is
 1 YEAR. (Read it as 6months to 2-3 years. While powered they
 refresh/shuffle the blocks) This makes SSDs
 unsuitable for mid-to-long tem consumer storage. Hence they are
 out of this discussion. (By the way, the only way for reliable
 duplication on SSDs, is using physically seperate devices.)

Interesting...

Have you any links/quotes/studies/specs for that please?


Does btrfs need to date-stamp each block/chunk to ensure that data is
rewritten before suffering flash memory bitrot?

Is the firmware in SSDs not aware enough to rewrite any data left
unchanged for too long?
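
(The nearest tool I can think of at present is a periodic scrub, for
example:

  btrfs scrub start -B /mnt/ssd

but as I understand it that only verifies checksums and repairs from a
redundant copy where one exists; it does not rewrite data that is still
readable.)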


Regards,
Martin

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


BTRFS extended attributes mounted on a non-extended-attributes compiled kernel

2013-12-11 Thread Martin
What happens if...

I have a btrfs that has utilised posix ACLs / extended attributes and I
then subsequently mount that onto a system that does not have the kernel
modules compiled for those features?


Crash and burn?

Or are the extra filesystem features benignly ignored until remounted on
the original system with all the kernel modules?


Thanks,
Martin

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

