[PATCH 3/3] btrfs-progs: clean up commands.h

2016-05-18 Thread Anand Jain
This function is declared in utils.h, so remove it from commands.h:
int test_issubvolume(const char *path);

This function does not exist, so delete the declaration:
char *get_subvol_name(char *mnt, char *full_path);

Signed-off-by: Anand Jain 
---
 commands.h | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/commands.h b/commands.h
index 2da093bf81a3..94229c112bc0 100644
--- a/commands.h
+++ b/commands.h
@@ -125,10 +125,4 @@ int cmd_dump_super(int argc, char **argv);
 int cmd_debug_tree(int argc, char **argv);
 int cmd_rescue(int argc, char **argv);
 
-/* subvolume exported functions */
-int test_issubvolume(const char *path);
-
-/* send.c */
-char *get_subvol_name(char *mnt, char *full_path);
-
 #endif
-- 
2.7.0



[PATCH 1/3] btrfs-progs: fix make install failure

2016-05-18 Thread Anand Jain
/usr/bin/install -c -m644 -d 64-btrfs-dm.rules /usr/lib/udev/rules.d
/usr/bin/install: cannot create directory ‘64-btrfs-dm.rules’: File exists
Makefile:400: recipe for target 'install' failed
make: *** [install] Error 1
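
The cause is that with -d, install treats every operand as a directory name
to create, so it tried to create a directory called 64-btrfs-dm.rules next to
the existing rules file instead of copying the file.  A quick reproduction
outside the build (the target directory is only an example):

  $ /usr/bin/install -c -m644 -d 64-btrfs-dm.rules /tmp/rules.d
  /usr/bin/install: cannot create directory ‘64-btrfs-dm.rules’: File exists
  $ /usr/bin/install -c -m644 64-btrfs-dm.rules /tmp/rules.d/   # intended behaviour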

fixes: btrfs-progs: udev: add rules for dm devices

Signed-off-by: Anand Jain 
---
 Makefile.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile.in b/Makefile.in
index 47e73c9fbc4e..238dd59badaf 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -408,7 +408,7 @@ install: $(libs) $(progs_install) $(INSTALLDIRS)
$(INSTALL) -m755 -d $(DESTDIR)$(incdir)
$(INSTALL) -m644 $(headers) $(DESTDIR)$(incdir)
 ifneq ($(udevdir),)
-   $(INSTALL) -m644 -d $(udev_rules) $(DESTDIR)$(udevruledir)
+   $(INSTALL) -m644 $(udev_rules) $(DESTDIR)$(udevruledir)
 endif
 
 install-static: $(progs_static) $(INSTALLDIRS)
-- 
2.7.0



[PATCH 2/3] btrfs-progs: add clean-all to the usage

2016-05-18 Thread Anand Jain
Signed-off-by: Anand Jain 
---
 Makefile.in | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Makefile.in b/Makefile.in
index 238dd59badaf..50b2ee5d8eba 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -6,6 +6,7 @@
 #   testrun the full testsuite
 #   install install to default location (/usr/local)
 #   clean   clean built binaries (not the documentation)
+#   clean-all   clean as above, clean docs and generated files
 #
 # Tuning by variables (environment or make arguments):
 #   V=1verbose, print command lines (default: quiet)
-- 
2.7.0



[PATCH 0/3] btrfs-progs: fixes on top of latest integration branch

2016-05-18 Thread Anand Jain
Hi David,

  Your latest branch integration-20160517 is failing with
  make install; 1/3 will fix it.  While you're at it, could you
  also apply 2/3 and 3/3 below, although they aren't related.

Thanks


Anand Jain (3):
  btrfs-progs: fix make install failure
  btrfs-progs: add clean-all to the usage
  btrfs-progs: clean up commands.h

 Makefile.in | 3 ++-
 commands.h  | 6 ------
 2 files changed, 2 insertions(+), 7 deletions(-)

-- 
2.7.0



Re: Reducing impact of periodic btrfs balance

2016-05-18 Thread Duncan
Qu Wenruo posted on Thu, 19 May 2016 09:33:19 +0800 as excerpted:

> Graham Cobb wrote on 2016/05/18 14:29 +0100:
>> Hi,
>>
>> I have a 6TB btrfs filesystem I created last year (about 60% used).  It
>> is my main data disk for my home server so it gets a lot of usage
>> (particularly mail). I do frequent snapshots (using btrbk) so I have a
>> lot of snapshots (about 1500 now, although it was about double that
>> until I cut back the retention times recently).
> 
> Even at 1500, it's still quite large, especially when they are all
> snapshots.
> 
> The biggest problem with a large number of snapshots is that it makes any
> backref walk operation very slow (O(n^3)~O(n^4)).
> This includes btrfs qgroup and balance, and even fiemap (a recently
> submitted patch will solve the fiemap problem, though).
> 
> The btrfs design ensures snapshot creation is fast, but that comes at the
> cost of backref walks.
> 
> 
> So, short of a major rework, I would prefer to keep the number of
> snapshots small, or avoid balance/qgroup.

Qu and Graham,

As you may have seen in my previous posts, my normal snapshot 
recommendation is to keep under 250-300 snapshots per subvolume, and 
definitely under 3000 per filesystem (2000 preferably, 1000 if being 
conservative), due to scaling issues like the above that are directly 
related to the number of snapshots.  That allows snapshotting 6-8 
subvolumes per filesystem before hitting the filesystem cap.  

Also, recognizing that the btrfs quota code dramatically compounds the 
scaling issues, and that btrfs quota functionality has still never worked 
fully correctly, I recommend turning quotas off unless they are definitely 
and specifically known to be needed.  If they are actually needed, I 
recommend giving strong consideration to a more mature filesystem where 
quotas are known to work reliably, without the scaling issues they present 
on btrfs.


So to Graham, are these 1.5K snapshots all of the same subvolume, or 
split into snapshots of several subvolumes?  If it's all of the same 
subvolume or of only 2-3 subvolumes, you still have some work to do in 
terms of getting down to recommended snapshot levels.  Also, if you have 
quotas on and don't specifically need them, try turning them off and see 
if that alone makes it workable.

It's worth noting that a reasonable snapshot thinning program can help 
quite a bit here, letting you still keep a reasonable retention, and that 
250-300 snapshots per subvolume fits very well within that model.  
Consider, if you're starting with say hourly snapshots, a year or even 
three months out, are you really going to care what specific hourly 
snapshot you retrieve a file from, or would daily or weekly snapshots do 
just as well and actually make finding an appropriate snapshot easier as 
there's less to go thru?

Generally speaking, most people starting with hourly snapshots can delete 
every other snapshot, thinning by at least half, within a day or two, and 
those doing snapshots even more frequently can thin down to at least 
hourly within hours even, since if you haven't noticed a mistaken 
deletion or whatever within a few hours, chances are good that recovery 
from hourly snapshots is more than practical, and if you haven't noticed 
it within a day or two, recovery from say two-hourly or six-hourly 
snapshots will be fine.  Similarly, a week out, most people can thin to 
twice-daily or daily snapshots, and by 4 weeks out, perhaps to Monday/
Wednesday/Friday snapshots.  By 13 weeks (one quarter) out, weekly 
snapshots are often fine, and by six months (26 weeks) out, thinning to 
quarterly (13-week) snapshots may be practical.  If not, it certainly 
should be within a year, tho well before a year is out, backups to 
separate media should have taken over, allowing the oldest snapshots to be 
dropped and finally reclaiming the space they were keeping locked up.
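
A thinning pass of that sort boils down to deleting the snapshots that fall 
outside whatever retention schedule you've picked.  A minimal sketch, assuming 
hourly read-only snapshots named by timestamp under /mnt/snapshots (the naming 
scheme and paths here are only an illustration, not btrbk's):

  # keep everything from the last two days; for anything older keep only
  # the midnight snapshot (names assumed to look like data.20160518T0300)
  cutoff=$(date -d '2 days ago' +%Y%m%d)
  for snap in /mnt/snapshots/data.*; do
      stamp=${snap##*.}                      # e.g. 20160518T0300
      day=${stamp%T*}; hour=${stamp#*T}
      [ "$day" -ge "$cutoff" ] && continue   # still inside the keep-all window
      [ "$hour" = "0000" ] && continue       # the daily keeper
      btrfs subvolume delete "$snap"
  done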


And primarily to Qu...

Is that 2K-snapshot overall filesystem cap recommendation still too high, 
even if per-subvolume snapshots are limited to 300-ish?  Or is the real 
problem per-subvolume snapshots?  For people who have gone subvolume-mad 
and have, say, 50 separate subvolumes being snapshotted (perhaps not too 
unreasonable in a VM context with each VM on its own subvolume), if a 
300-ish cap per subvolume is maintained, would the resulting 15K total 
snapshots per filesystem still work reasonably well?  If so, I should be 
able to drop the overall filesystem cap recommendation and simply recommend 
a per-subvolume snapshot cap of a few hundred.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


[PATCH] btrfs-progs: Return earlier for previous item

2016-05-18 Thread Qu Wenruo
Follow the kernel code and return earlier in the btrfs_previous_item() function.

Before this patch, btrfs_previous_item() did not use its min_objectid to
exit early; this forced the caller to check the key in order to stop, and
if the caller didn't check, it would iterate over all previous items.

This patch checks min_objectid and type to return early and save some time.

Signed-off-by: Qu Wenruo 
---
 ctree.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/ctree.c b/ctree.c
index 079696e..a98ad18 100644
--- a/ctree.c
+++ b/ctree.c
@@ -2880,6 +2880,7 @@ int btrfs_previous_item(struct btrfs_root *root,
 {
struct btrfs_key found_key;
struct extent_buffer *leaf;
+   u32 nritems;
int ret;
 
while(1) {
@@ -2891,9 +2892,20 @@ int btrfs_previous_item(struct btrfs_root *root,
path->slots[0]--;
}
leaf = path->nodes[0];
+   nritems = btrfs_header_nritems(leaf);
+   if (nritems == 0)
+   return 1;
+   if (path->slots[0] == nritems)
+   path->slots[0]--;
+
btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
+   if (found_key.objectid < min_objectid)
+   break;
if (found_key.type == type)
return 0;
+   if (found_key.objectid == min_objectid &&
+   found_key.type < type)
+   break;
}
return 1;
 }
-- 
2.8.2





Re: Reducing impact of periodic btrfs balance

2016-05-18 Thread Qu Wenruo



Graham Cobb wrote on 2016/05/18 14:29 +0100:

Hi,

I have a 6TB btrfs filesystem I created last year (about 60% used).  It
is my main data disk for my home server so it gets a lot of usage
(particularly mail). I do frequent snapshots (using btrbk) so I have a
lot of snapshots (about 1500 now, although it was about double that
until I cut back the retention times recently).


Even at 1500, it's still quite large, especially when they are all 
snapshots.


The biggest problem with a large number of snapshots is that it makes any 
backref walk operation very slow (O(n^3)~O(n^4)).
This includes btrfs qgroup and balance, and even fiemap (a recently submitted 
patch will solve the fiemap problem, though).


The btrfs design ensures snapshot creation is fast, but that comes at the 
cost of backref walks.



So, short of a major rework, I would prefer to keep the number of 
snapshots small, or avoid balance/qgroup.




A while ago I had a "no space" problem (despite fi df, fi show and fi
usage all agreeing I had over 1TB free).  But this email isn't about that.

As part of fixing that problem, I tried to do a "balance -dusage=20" on
the disk.  I was expecting it to have system impact, but it was a major
disaster.  The balance didn't just run for a long time, it locked out
all activity on the disk for hours.  A simple "touch" command to create
one file took over an hour.


It seems that balance blocked a transaction for a long time, which made 
your touch operation wait for that transaction to end.




More seriously, because of that, mail was being lost: all mail delivery
timed out and the timeout error was interpreted as a fatal delivery
error causing mail to be discarded, mailing lists to cancel
subscriptions, etc. The balance never completed, of course.  I
eventually got it cancelled.

I have since managed to complete the "balance -dusage=20" by running it
repeatedly with "limit=N" (for small N).  I wrote a script to automate
that process, and rerun it every week.  If anyone is interested, the
script is on GitHub: https://github.com/GrahamCobb/btrfs-balance-slowly

Out of that experience, I have a couple of thoughts about how to
possibly make balance more friendly.

1) It looks like the balance process seems to (effectively) lock all
file (extent?) creation for long periods of time.  Would it be possible
for balance to make more effort to yield locks to allow other
processes/threads to get in to continue to create/write files while it
is running?


Balance doesn't really lock the whole filesystem; in fact it only locks 
(marks read-only) one block group at a time (normally 1G in size).


But unfortunately, balance holds one transaction for each block group, and 
since transactions are filesystem-wide, that may block unrelated write 
operations.




2) btrfs scrub has options to set ionice options.  Could balance have
something similar?  Or would reducing the IO priority make things worse
because locks would be held for longer?


IMHO the problem is not about IO.
Using iotop, you would find that IO activity is not that high, while 
CPU usage is near 100% for one core.




3) My btrfs-balance-slowly script would work better if there was a
time-based limit filter for balance, not just the current count-based
filter.  I would like to be able to say, for example, run balance for no
more than 10 minutes (completing the operation in progress, of course)
then return.


As btrfs balance works in units of block groups, I'm afraid such a thing 
would be a little tricky to implement.




4) My btrfs-balance-slowly script would be more reliable if there was a
way to get an indication of whether there was more work to be done,
instead of parsing the output for the number of relocations.

Any thoughts about these?  Or other things I could be doing to reduce
the impact on my services?


Would you try to remove unneeded snapshots and disable qgroup if you're 
using it?


If it's possible, it's better to remove *ALL* snapshots to minimize the 
backref walk pressure and then retry the balance.
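
In command terms that's roughly the following (the mount point is just an 
example, and only do this if you really can drop the snapshots):

  btrfs subvolume list -s /mnt/data        # list the snapshots
  # delete the ones you can live without, e.g.:
  # btrfs subvolume delete /mnt/data/.snapshots/<name>
  btrfs quota disable /mnt/data            # if quotas/qgroups were enabled
  btrfs balance start -dusage=20 /mnt/data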


Thanks,
Qu



Graham







Re: [PATCH] btrfs: switch to common message helpers in open_ctree, adjust messages

2016-05-18 Thread Anand Jain



> You mean a shorter UUID? The first four bytes (8 hex digits) seem
> unique enough. That can be optional, I think.

 Yes. I said that.
 But I am also thinking FSID is user changeable (which is good),
 and so it should be easy to show that this method fails.


Another way came to my mind: make it a module parameter, so
even the mount option or sysfs settings are not needed and the defaults
are system-wide.



I'm fine with adding the configurable logging
options.


 This should be fine, I can't think of anything better.


Thanks, Anand


Re: incoming merge conflict to linux-next

2016-05-18 Thread Stephen Rothwell
Hi Chris,

On Wed, 18 May 2016 17:10:43 -0400 Chris Mason  wrote:
>
> Dave Sterba's tree in linux-next has a few btrfs patches that we're not
> sending yet into Linus.  We've got an update for Josef's enospc work
> that'll get sent in next week.
> 
> So he prepped a pull for me that merged up a number of his branches but
> didn't include Josef's new code.  It has all been in -next for some
> time, and then I put some fixes from Filipe on top.
> 
> Long story short, you'll get a merge conflict from my next branch:
> 
> https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git next
> 
> I've got the sample resolution in next-merge:
> 
> https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git next-merge
> 
> Please let us know if you have any problems.

A bit of a mess, but I sorted it out, thanks for the test merge.

-- 
Cheers,
Stephen Rothwell


RE: Reducing impact of periodic btrfs balance

2016-05-18 Thread Paul Jones
> -Original Message-
> From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-
> ow...@vger.kernel.org] On Behalf Of Graham Cobb
> Sent: Wednesday, 18 May 2016 11:30 PM
> To: linux-btrfs@vger.kernel.org
> Subject: Reducing impact of periodic btrfs balance
> 
> Hi,
> 
> I have a 6TB btrfs filesystem I created last year (about 60% used).  It is my
> main data disk for my home server so it gets a lot of usage (particularly 
> mail).
> I do frequent snapshots (using btrbk) so I have a lot of snapshots (about 1500
> now, although it was about double that until I cut back the retention times
> recently).
> 
> A while ago I had a "no space" problem (despite fi df, fi show and fi usage 
> all
> agreeing I had over 1TB free).  But this email isn't about that.
> 
> As part of fixing that problem, I tried to do a "balance -dusage=20" on the
> disk.  I was expecting it to have system impact, but it was a major disaster.
> The balance didn't just run for a long time, it locked out all activity on 
> the disk
> for hours.  A simple "touch" command to create one file took over an hour.
> 
> More seriously, because of that, mail was being lost: all mail delivery timed
> out and the timeout error was interpreted as a fatal delivery error causing
> mail to be discarded, mailing lists to cancel subscriptions, etc. The balance
> never completed, of course.  I eventually got it cancelled.
> 
> I have since managed to complete the "balance -dusage=20" by running it
> repeatedly with "limit=N" (for small N).  I wrote a script to automate that
> process, and rerun it every week.  If anyone is interested, the script is on
> GitHub: https://github.com/GrahamCobb/btrfs-balance-slowly


Hi Graham,

I've experienced similar problems from time to time. It seems to be 
fragmentation of the metadata. In my case I have a volume with about 20 million 
smallish (100k) files scattered through around 20,000 directories, and 
originally they were created at random. Updating the files at a data rate of 
around 5 MB/s caused 100% disk utilisation on RAID1 SSD. After a few iterations 
I needed to delete the files and start again; this took 4 days!! I cancelled it 
a few times and tried defrags and balances, but they didn't help. Needless to 
say, the filesystem was basically unusable at the time.
Long story short, I discovered that populating each directory completely, one 
at a time, alleviated the speed issue. I then remembered that if you run defrag 
with the compress option it writes out the files again, which also fixes the 
problem. (Note that there is no option for no compression.)
So if you are OK with using compression, try a defrag with compression. That 
massively fixed my problems.
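
For reference, the compressing rewrite is just a recursive defrag with a 
compression type, something like this (path and algorithm are only an example):

  btrfs filesystem defragment -r -v -clzo /srv/maildirs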

Regards,
Paul.


Re: incoming merge conflict to linux-next

2016-05-18 Thread David Sterba
On Wed, May 18, 2016 at 05:10:43PM -0400, Chris Mason wrote:
> Dave Sterba's tree in linux-next has a few btrfs patches that we're not
> sending yet into Linus.  We've got an update for Josef's enospc work
> that'll get sent in next week.
> 
> So he prepped a pull for me that merged up a number of his branches but
> didn't include Josef's new code.  It has all been in -next for some
> time, and then I put some fixes from Filipe on top.

JFYI, the enospc branch is not on its way to Linus because there were some
unexpected warnings, and the v2 patch update hasn't fixed them. So I won't
redo the for-chris branch yet, but will keep the enospc patchset in my
for-next for further testing.


Re: [PATCH V2] Btrfs: introduce ticketed enospc infrastructure

2016-05-18 Thread David Sterba
On Tue, May 17, 2016 at 01:30:55PM -0400, Josef Bacik wrote:
> V1->V2:
> -fixed a check in space_info_add_old_bytes where we didn't take into account
>  bytes_may_used for the space used.
> -don't count ticket->bytes when checking overcommit.

I still see the warning in generic/333, same as with v1.

2016-05-19T00:44:33.232414+02:00 ben kernel: [ cut here 
]
2016-05-19T00:44:33.232446+02:00 ben kernel: WARNING: CPU: 2 PID: 3559 at 
fs/btrfs/extent-tree.c:2956 btrfs_run_delayed_refs+0x27d/0x2a0 [btrfs]
2016-05-19T00:44:33.232448+02:00 ben kernel: BTRFS: Transaction aborted (error 
-28)
2016-05-19T00:44:33.242936+02:00 ben kernel: Modules linked in: nfsv3 nfs_acl 
rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache 
af_packet bridge stp llc iscsi_ibft iscsi_boot_sysfs msr btrfs radeon 
i2c_algo_bit drm_kms_helper coretemp kvm_intel syscopyarea sysfillrect 
sysimgblt fb_sys_fops ttm xor kvm raid6_pq drm e1000e dm_mod pcspkr ptp 
iTCO_wdt gpio_ich i5000_edac pps_core edac_core i2c_i801 iTCO_vendor_support 
lpc_ich mfd_core ppdev parport_pc intel_rng i5k_amb parport irqbypass serio_raw 
button ata_generic uhci_hcd ehci_pci ehci_hcd ata_piix usbcore aic79xx 
usb_common scsi_transport_spi sg [last unloaded: floppy]
2016-05-19T00:44:33.242960+02:00 ben kernel: CPU: 2 PID: 3559 Comm: 
btrfs-transacti Not tainted 4.6.0-rc5-vanilla+ #14
2016-05-19T00:44:33.242963+02:00 ben kernel: Hardware name: Supermicro 
X7DB8/X7DB8, BIOS 6.00 07/26/2006
2016-05-19T00:44:33.242964+02:00 ben kernel:  88017de1bd20 
8139d86a 88017de1bd70
2016-05-19T00:44:33.242966+02:00 ben kernel:  88017de1bd60 
81080941 0b8c3279c6c0
2016-05-19T00:44:33.242967+02:00 ben kernel: 8802274fb0a0 8800b5f8d000 
88023279c6c0 8800b5f8d000
2016-05-19T00:44:33.242969+02:00 ben kernel: Call Trace:
2016-05-19T00:44:33.242970+02:00 ben kernel: [] 
dump_stack+0x63/0x89
2016-05-19T00:44:33.242972+02:00 ben kernel: [] 
__warn+0xd1/0xf0
2016-05-19T00:44:33.242973+02:00 ben kernel: [] 
warn_slowpath_fmt+0x4f/0x60
2016-05-19T00:44:33.242975+02:00 ben kernel: [] 
btrfs_run_delayed_refs+0x27d/0x2a0 [btrfs]
2016-05-19T00:44:33.242977+02:00 ben kernel: [] ? 
del_timer_sync+0x48/0x50
2016-05-19T00:44:33.242978+02:00 ben kernel: [] 
btrfs_commit_transaction+0x43/0xae0 [btrfs]
2016-05-19T00:44:33.242979+02:00 ben kernel: [] 
transaction_kthread+0x1cc/0x1f0 [btrfs]
2016-05-19T00:44:33.242981+02:00 ben kernel: [] ? 
btrfs_cleanup_transaction+0x580/0x580 [btrfs]
2016-05-19T00:44:33.242983+02:00 ben kernel: [] 
kthread+0xc9/0xe0
2016-05-19T00:44:33.242984+02:00 ben kernel: [] 
ret_from_fork+0x22/0x40
2016-05-19T00:44:33.242986+02:00 ben kernel: [] ? 
kthread_create_on_node+0x180/0x180
2016-05-19T00:44:33.242988+02:00 ben kernel: ---[ end trace a9fa5269514f9444 
]---
2016-05-19T00:44:33.242990+02:00 ben kernel: BTRFS: error (device sdc) in 
btrfs_run_delayed_refs:2956: errno=-28 No space left
2016-05-19T00:44:33.242992+02:00 ben kernel: BTRFS info (device sdc): forced 
readonly


Re: Hot data tracking / hybrid storage

2016-05-18 Thread Ferry Toth
Op Tue, 17 May 2016 20:33:35 +0200, schreef Kai Krakow:

> Am Tue, 17 May 2016 07:32:11 -0400 schrieb "Austin S. Hemmelgarn"
> :
> 
>> On 2016-05-17 02:27, Ferry Toth wrote:
>> > Op Mon, 16 May 2016 01:05:24 +0200, schreef Kai Krakow:
>> >  
>> >> Am Sun, 15 May 2016 21:11:11 + (UTC)
>> >> schrieb Duncan <1i5t5.dun...@cox.net>:
>> >>  
>>  [...]
>> > 
>> >>
>> >> You can go there with only one additional HDD as temporary storage.
>> >> Just connect it, format as bcache, then do a "btrfs dev replace".
>> >> Now wipe that "free" HDD (use wipefs), format as bcache,
>> >> then... well, you get the point. At the last step, remove the
>> >> remaining HDD. Now add your SSDs, format as caching device, and
>> >> attach each individual HDD backing bcache to each SSD caching
>> >> bcache.
>> >>
>> >> Devices don't need to be formatted and created at the same time. I'd
>> >> also recommend to add all SSDs only in the last step to not wear
>> >> them early with writes during device replacement.
>> >>
>> >> If you want, you can add one additional step to get the temporary
>> >> hard disk back. But why not simply replace the oldest hard disk with
>> >> the newest. Take a look at smartctl to see which is the best
>> >> candidate.
>> >>
>> >> I went a similar route but without one extra HDD. I had three HDDs
>> >> in mraid1/draid0 and enough spare space. I just removed one HDD,
>> >> prepared it for bcache, then added it back and removed the next.
>> >>  
>> > That's what I mean, a lot of work. And it's still a cache, with
>> > unnecessary copying from the ssd to the hdd.
>> On the other hand, it's actually possible to do this all online with
>> BTRFS because of the reshaping and device replacement tools.
>> 
>> In fact, I've done even more complex reprovisioning online before (for
>> example, my home server system has 2 SSD's and 4 HDD's, running BTRFS
>> on top of LVM, I've at least twice completely recreated the LVM layer
>> online without any data loss and minimal performance degradation).
>> >
>> > And what happens when either a hdd or ssd starts failing?
>> I have absolutely no idea how bcache handles this, but I doubt it's any
>> better than BTRFS.
> 
> Bcache should in theory fall back to write-through as soon as an error
> counter exceeds a threshold. This is adjustable with sysfs
> io_error_halftime and io_error_limit. Tho I never tried what actually
> happens when either the HDD (in bcache writeback-mode) or the SSD fails.
> Actually, btrfs should be able to handle this (tho, according to list
> reports, it doesn't handle errors very well at this point).
> 
> BTW: Unnecessary copying from SSD to HDD doesn't take place in bcache
> default mode: It only copies from HDD to SSD in writeback mode (data is
> written to the cache first, then persisted to HDD in the background).
> You can also use "write through" (data is written to SSD and persisted
> to HDD at the same time, reporting persistence to the application only
> when both copies were written) and "write around" mode (data is written
> to HDD only, and only reads are written to the SSD cache device).
> 
> If you want bcache behave as a huge IO scheduler for writes, use
> writeback mode. If you have write-intensive applications, you may want
> to choose write-around to not wear out the SSDs early. If you want
> writes to be cached for later reads, you can choose write-through mode.
> The latter two modes will ensure written data is always persisted to HDD
> with the same guaranties you had without bcache. The last mode is
> default and should not change behavior of btrfs if the HDD fails, and if
> the SSD fails bcache would simply turn off and fall back to HDD.
> 

Hello Kai,

Yeah, lots of modes. So does that mean none works well for all cases?

Our server has lots of old files on smb (various sizes), imap (1's 
small, 1000's large), a postgresql server, virtualbox images (large), and 50 or 
so snapshots, and running synaptics for system upgrades is painfully slow. 

We expect the slowness to be caused by fsyncs, which appear to be much 
worse on a raid10 with snapshots. Presumably the whole thing would be 
fast enough with ssd's, but that would not be very cost efficient.

All the overhead of the cache layer could be avoided if btrfs would just 
prefer to write small, hot files to the ssd in the first place and clean 
up while balancing. A combination of 2 ssd's and 4 hdd's would be very 
nice (the mobo has 6 x sata, which is pretty common).

Moreover, increasing the ssd size in the future would then be just as 
simple as replacing a disk with a larger one.

I think many would sign up for such a low-maintenance, efficient setup 
that doesn't require a PhD in IT to think out and configure.

Even at home, I would just throw in a low-cost ssd next to the hdd if it 
were as simple as a device add. But I wouldn't want to store my photo/video 
collection on just an ssd; too expensive.

> Regards,
> Kai
> 
> Replies to list-only preferred.



incoming merge conflict to linux-next

2016-05-18 Thread Chris Mason
Hi Stephen,

Dave Sterba's tree in linux-next has a few btrfs patches that we're not
sending yet into Linus.  We've got an update for Josef's enospc work
that'll get sent in next week.

So he prepped a pull for me that merged up a number of his branches but
didn't include Josef's new code.  It has all been in -next for some
time, and then I put some fixes from Filipe on top.

Long story short, you'll get a merge conflict from my next branch:

https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git next

I've got the sample resolution in next-merge:

https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git next-merge

Please let us know if you have any problems.

-chris



Re: [PATCH 6/7] Btrfs: fix eb memory leak due to readpage failure

2016-05-18 Thread Josef Bacik

On 05/13/2016 08:07 PM, Liu Bo wrote:

eb->io_pages is set in read_extent_buffer_pages().

In case of readpage failure, for pages that have been added to bio,
it calls bio_endio and later readpage_io_failed_hook() does the work.

When this eb's page (couldn't be the 1st page) fails to add itself to bio
due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio,
 and ends up with a memory leak eventually.

This adds the 'atomic_dec(&eb->io_pages)' to the readpage error handling.


Wait why can't this be done in


Signed-off-by: Liu Bo 
---
 fs/btrfs/extent_io.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 99286d1..2327200 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3069,6 +3069,30 @@ static int __do_readpage(struct extent_io_tree *tree,
*bio_flags = this_bio_flag;
} else {
SetPageError(page);
+   /*
+* Only metadata io request has this issue, for data it
+* just unlocks extent and releases page's lock.
+*
+* eb->io_pages is set in read_extent_buffer_pages().
+*
+* When this eb's page fails to add itself to bio,
+* it cannot decrease eb->io_pages via bio_endio, and
+* ends up with extent_buffer_under_io() always being
+* true, because of that, eb won't be freed and we have
+* a memory leak eventually.
+*
+* Here we still hold this page's lock, and other tasks
+* who're also reading this eb are blocked.
+*/
+   if (rw & REQ_META) {
+   struct extent_buffer *eb;
+
+   WARN_ON_ONCE(!PagePrivate(page));
+   eb = (struct extent_buffer *)page->private;
+
+   WARN_ON_ONCE(atomic_read(&eb->io_pages) < 1);
+   atomic_dec(&eb->io_pages);
+   }
unlock_extent(tree, cur, cur + iosize - 1);
}
cur = cur + iosize;



This isn't the right way to do this.  It looks like we don't propagate 
up errors from __do_readpage, which we need to do in order to clean up 
properly.  So do that, and then change the error handling to decrement 
io_pages for the remaining pages; see write_one_eb for how to deal 
with that properly.  Thanks,


Josef


Re: [PATCH] Btrfs: fix unexpected return value of fiemap

2016-05-18 Thread Liu Bo
On Wed, May 18, 2016 at 11:41:05AM +0200, David Sterba wrote:
> On Tue, May 17, 2016 at 05:21:48PM -0700, Liu Bo wrote:
> > btrfs's fiemap is supposed to return 0 on success and
> >  return < 0 on error; however, ret becomes 1 after looking
> > up the last file extent, and if the offset is beyond EOF,
> > we can return 1.
> > 
> > This may confuse applications using ioctl(FS_IOC_FIEMAP).
> > 
> > Signed-off-by: Liu Bo 
> 
> Reviewed-by: David Sterba 
> 
> > ---
> >  fs/btrfs/extent_io.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index d247fc0..16ece52 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -4379,8 +4379,12 @@ int extent_fiemap(struct inode *inode, struct 
> > fiemap_extent_info *fieinfo,
> > if (ret < 0) {
> > btrfs_free_path(path);
> > return ret;
> > +   } else {
> > +   WARN_ON(!ret);
> > +   if (ret == 1)
> > +   ret = 0;
> > }
> 
> So, ret == 1 can end up here from btrfs_lookup_file_extent ->
> btrfs_search_slot(..., ins_len=0, cow=0): when the offset does not exist,
> we'll get the path pointed at the slot where it would be inserted and ret is 1.

Sounds better than the commit log; would you like me to update it?

Thanks,

-liubo


Re: btrfs self tests fail on arm64, kernel 4.6

2016-05-18 Thread Steve Capper
On 18 May 2016 at 17:58, Chandan Rajendra  wrote:
> On Wednesday 18 May 2016 16:46:49 Steve Capper wrote:
>> Hello,
>> I am running into issues running the btrfs self tests from 4.6 on an
>> arm64 system with PAGE_SIZE=64K.
>> Poking around in recalculate_thresholds(), I got the following values:
>> size = 1073741824
>> ctl->unit = 4096
>> bytes_per_bg = 2147483648
>> max_bitmaps = 1
>>
>> I am not sure where the problem is as I'm not familiar with btrfs.
>>
>> A panic log can be found below.
>>
>> Is this a known problem?
>>
>> Are there any more diagnostics I could perform, that would be helpful?
>>
>
> Hi Steve,
>
> The fixes for selftests with respect to 64K block size will be posted soon. I
> am currently testing the changes that have been made.

Ahh okay, thanks Chandan.
Please cc me in the fixes and I can give them a quick test here too.

Cheers,
--
Steve


Re: btrfs self tests fail on arm64, kernel 4.6

2016-05-18 Thread Chandan Rajendra
On Wednesday 18 May 2016 16:46:49 Steve Capper wrote:
> Hello,
> I am running into issues running the btrfs self tests from 4.6 on an
> arm64 system with PAGE_SIZE=64K.
> Poking around in recalculate_thresholds(), I got the following values:
> size = 1073741824
> ctl->unit = 4096
> bytes_per_bg = 2147483648
> max_bitmaps = 1
> 
> I am not sure where the problem is as I'm not familiar with btrfs.
> 
> A panic log can be found below.
> 
> Is this a known problem?
> 
> Are there any more diagnostics I could perform, that would be helpful?
> 

Hi Steve,

The fixes for selftests with respect to 64K block size will be posted soon. I
am currently testing the changes that have been made.

-- 
chandan



btrfs self tests fail on arm64, kernel 4.6

2016-05-18 Thread Steve Capper
Hello,
I am running into issues running the btrfs self tests from 4.6 on an
arm64 system with PAGE_SIZE=64K.
Poking around in recalculate_thresholds(), I got the following values:
size = 1073741824
ctl->unit = 4096
bytes_per_bg = 2147483648
max_bitmaps = 1

I am not sure where the problem is as I'm not familiar with btrfs.

A panic log can be found below.

Is this a known problem?

Are there any more diagnostics I could perform, that would be helpful?

Cheers,
-- 
Steve

[8.151674] Btrfs loaded, debug=on, assert=on
[8.152073] BTRFS: selftest: Running btrfs free space cache tests
[8.152239] BTRFS: selftest: Running extent only tests
[8.152344] BTRFS: selftest: Running bitmap only tests
[8.152635] BTRFS: assertion failed: ctl->total_bitmaps <=
max_bitmaps, file: fs/btrfs/free-space-cache.c, line: 1646
[8.152899] [ cut here ]
[8.153004] kernel BUG at fs/btrfs/ctree.h:4320!
[8.153109] Internal error: Oops - BUG: 0 [#1] SMP
[8.153188] Modules linked in: btrfs(+) xor raid6_pq
[8.153410] CPU: 0 PID: 140 Comm: modprobe Not tainted 4.6.0 #64
[8.153531] Hardware name: linux,dummy-virt (DT)
[8.153630] task: 800020beae00 ti: 800020c3 task.ti:
800020c3
[8.153889] PC is at recalculate_thresholds+0xe0/0xfc [btrfs]
[8.154140] LR is at recalculate_thresholds+0xe0/0xfc [btrfs]
[8.154244] pc : [] lr : []
pstate: 6145
[8.154386] sp : 800020c33b20
[8.154443] x29: 800020c33b20 x28: 00c12a38
[8.154617] x27: 092efda0 x26: 8fbc45a4
[8.154787] x25: 0001 x24: 80002023
[8.154957] x23: 8f9e0048 x22: 
[8.155107] x21: 0020 x20: 8000
[8.155261] x19: 8fbc4580 x18: 0333
[8.155417] x17: 0005 x16: 0027
[8.155560] x15: 030c2a1d0f1e x14: 656c6966202c7370
[8.155707] x13: 616d7469625f7861 x12: 6d203d3c20737061
[8.155844] x11: 6d7469625f6c6174 x10: 
[8.155993] x9 : 00d8 x8 : 800020c33860
[8.156151] x7 :  x6 : 09191e28
[8.156295] x5 :  x4 : 02080020
[8.156459] x3 :  x2 : 
[8.156607] x1 : 0140 x0 : 0069
[8.156749]
[8.156797] Process modprobe (pid: 140, stack limit = 0x800020c30020)
[8.156929] Stack: (0x800020c33b20 to 0x800020c34000)
[8.157046] 3b20: 800020c33b40 00bac344
8fbc4580 8000
[8.157183] 3b40: 800020c33b70 00bb0a48
8fbc4580 8f9e
[8.157314] 3b60: 8000 8f9e0048
800020c33bd0 00be3744
[8.157453] 3b80: 800023ec0400 
80002009b000 00c71f88
[8.157582] 3ba0:  
0030 800020c33de8
[8.157719] 3bc0: 00c16118 00c724c8
800020c33c00 00c72048
[8.157848] 3be0:  00c0a120
08ba22a0 00c0a120
[8.157986] 3c00: 800020c33c20 08090cac
08ba22a0 8fbc1080
[8.158129] 3c20: 800020c33ca0 081b4580
00c12700 8fbc1000
[8.158268] 3c40: 08bda000 08bdaa98
08bda000 081b454c
[8.158410] 3c60: 00c12700 800020c33e68
00c12700 08bdaa98
[8.158566] 3c80: 08bda000 
0030 800020c33de8
[8.158770] 3ca0: 800020c33cd0 08144e60
00c12718 800020c33e68
[8.158898] 3cc0: 00c12700 08bdaa98
800020c33e20 08145330
[8.159065] 3ce0: 800020c33e68 
0005 0041c088
[8.159211] 3d00: 6000 0015
0120 0111
[8.159345] 3d20: 08771000 800020c3
800020c33d80 0064
[8.159482] 3d40: 0072 006e
08976168 003f
[8.159623] 3d60: feff 0018
800020c33dd8 08c9b018
[8.159766] 3d80: 8000 08230060
08c9b298 0a09e030
[8.159926] 3da0: 08bdab00 800020c33ec4
800020c33e78 00c724c8
[8.160070] 3dc0: 0120 080d2f28
800020c33e20 081452f8
[8.160224] 3de0: 800020c33e68 
 
[8.160370] 3e00:  
 
[8.160515] 3e20:  08093af0
 0041c088
[8.160672] 3e40:  b7ab5584
0120 01227977
[8.160820] 3e60: 0974 0974
012276b9 0a09d0f0
[8.160957] 3e80: 0a09ce89 0a956950
000e6118 001106b8

Re: [PATCH 5/7] Btrfs: replace BUG_ON with WARN in merge_bio

2016-05-18 Thread David Sterba
On Tue, May 17, 2016 at 10:30:47AM -0700, Liu Bo wrote:
> > If merge_bio gets rid of the BUG_ON, the callers must explicitly handle
> > 'ret < 0' unless it's provably not a problem.
> 
> If merge_bio() returns < 0, then it must be that __btrfs_map_block() returned < 0,
> so even if we continue with submitting this bio, it'd fail at
> __btrfs_map_block() again because merge_bio and submit_bio are using the
> same bio.  Because of that we don't bother to submit it; instead we can
> just return in case of (ret < 0).  This applies to both
> submit_extent_page() and btrfs_submit_compressed_read/write().
> 
> What do you think?

Makes sense to me. All the partial processing (allocations and bios)
need to be cleaned up, which does not seem trivial from the first look.


Reducing impact of periodic btrfs balance

2016-05-18 Thread Graham Cobb
Hi,

I have a 6TB btrfs filesystem I created last year (about 60% used).  It
is my main data disk for my home server so it gets a lot of usage
(particularly mail). I do frequent snapshots (using btrbk) so I have a
lot of snapshots (about 1500 now, although it was about double that
until I cut back the retention times recently).

A while ago I had a "no space" problem (despite fi df, fi show and fi
usage all agreeing I had over 1TB free).  But this email isn't about that.

As part of fixing that problem, I tried to do a "balance -dusage=20" on
the disk.  I was expecting it to have system impact, but it was a major
disaster.  The balance didn't just run for a long time, it locked out
all activity on the disk for hours.  A simple "touch" command to create
one file took over an hour.

More seriously, because of that, mail was being lost: all mail delivery
timed out and the timeout error was interpreted as a fatal delivery
error causing mail to be discarded, mailing lists to cancel
subscriptions, etc. The balance never completed, of course.  I
eventually got it cancelled.

I have since managed to complete the "balance -dusage=20" by running it
repeatedly with "limit=N" (for small N).  I wrote a script to automate
that process, and rerun it every week.  If anyone is interested, the
script is on GitHub: https://github.com/GrahamCobb/btrfs-balance-slowly
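
A single pass of that incremental approach looks roughly like this (mount 
point and numbers are just an example):

  # relocate at most 5 data chunks that are <=20% full, then stop
  btrfs balance start -dusage=20,limit=5 /srv/data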

Out of that experience, I have a couple of thoughts about how to
possibly make balance more friendly.

1) It looks like the balance process seems to (effectively) lock all
file (extent?) creation for long periods of time.  Would it be possible
for balance to make more effort to yield locks to allow other
processes/threads to get in to continue to create/write files while it
is running?

2) btrfs scrub has options to set ionice options.  Could balance have
something similar?  Or would reducing the IO priority make things worse
because locks would be held for longer?

3) My btrfs-balance-slowly script would work better if there was a
time-based limit filter for balance, not just the current count-based
filter.  I would like to be able to say, for example, run balance for no
more than 10 minutes (completing the operation in progress, of course)
then return.

4) My btrfs-balance-slowly script would be more reliable if there was a
way to get an indication of whether there was more work to be done,
instead of parsing the output for the number of relocations.
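
As a stopgap for points 3 and 4, the time limit and the "is there more to do" 
check can be approximated in a wrapper, along these lines (a sketch only, not 
the actual btrfs-balance-slowly script; the path is an example):

  # run one-chunk passes until ~10 minutes have elapsed or nothing moved
  end=$(( $(date +%s) + 600 ))
  while [ "$(date +%s)" -lt "$end" ]; do
      out=$(btrfs balance start -dusage=20,limit=1 /srv/data) || break
      echo "$out"
      case "$out" in *"had to relocate 0 out of"*) break ;; esac
  done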

Any thoughts about these?  Or other things I could be doing to reduce
the impact on my services?

Graham


Re: [PATCH V2] Btrfs: introduce ticketed enospc infrastructure

2016-05-18 Thread Austin S. Hemmelgarn

On 2016-05-17 13:30, Josef Bacik wrote:

Our enospc flushing sucks.  It is born from a time where we were early
enospc'ing constantly because multiple threads would race in for the same
reservation and randomly starve other ones out.  So I came up with this solution
to block any other reservations from happening while one guy tried to flush
stuff to satisfy his reservation.  This gives us pretty good correctness, but
completely crap latency.

The solution I've come up with is ticketed reservations.  Basically we try to
make our reservation, and if we can't we put a ticket on a list in order and
kick off an async flusher thread.  This async flusher thread does the same old
flushing we always did, just asynchronously.  As space is freed and added back
to the space_info it checks and sees if we have any tickets that need
satisfying, and adds space to the tickets and wakes up anything we've satisfied.

Once the flusher thread stops making progress it wakes up all the current
tickets and tells them to take a hike.

There is a priority list for things that can't flush, since the async flusher
could do anything we need to avoid deadlocks.  These guys get priority for
having their reservation made, and will still do manual flushing themselves in
case the async flusher isn't running.

This patch gives us significantly better latencies.  Thanks,

Signed-off-by: Josef Bacik 
I've had this running on my test system (which is _finally_ working 
again) for about 16 hours now, nothing is breaking, and a number of the 
tests are actually completing marginally faster, so you can add:


Tested-by: Austin S. Hemmelgarn 


Re: [PATCH] Btrfs: fix unexpected return value of fiemap

2016-05-18 Thread David Sterba
On Tue, May 17, 2016 at 05:21:48PM -0700, Liu Bo wrote:
> btrfs's fiemap is supposed to return 0 on success and
>  return < 0 on error; however, ret becomes 1 after looking
> up the last file extent, and if the offset is beyond EOF,
> we can return 1.
> 
> This may confuse applications using ioctl(FS_IOC_FIEMAP).
> 
> Signed-off-by: Liu Bo 

Reviewed-by: David Sterba 

> ---
>  fs/btrfs/extent_io.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index d247fc0..16ece52 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4379,8 +4379,12 @@ int extent_fiemap(struct inode *inode, struct 
> fiemap_extent_info *fieinfo,
>   if (ret < 0) {
>   btrfs_free_path(path);
>   return ret;
> + } else {
> + WARN_ON(!ret);
> + if (ret == 1)
> + ret = 0;
>   }

So, ret == 1 can end up here from btrfs_lookup_file_extent ->
btrfs_search_slot(..., ins_len=0, cow=0): when the offset does not exist,
we'll get the path pointed at the slot where it would be inserted and ret is 1.


Re: [PATCH 0/4] Add support to clear v1 free space cache for btrfs check

2016-05-18 Thread David Sterba
On Tue, May 17, 2016 at 08:12:20PM +0200, Ivan P wrote:
> Thank you, however I can't seem to be able to compile that snapshot, I'm 
> getting
> 
> ===
> /usr/bin/install -c -m644 -d 64-btrfs-dm.rules
> /home/myuser/aur/btrfs-progs-git/pkg/btrfs-progs-git/usr/lib/udev/rules.d
> /usr/bin/install: cannot create directory ‘64-btrfs-dm.rules’: File exists
> Makefile:400: recipe for target 'install' failed
> make: *** [install] Error 1
> ==> ERROR: A failure occurred in package()
> ===
> 
> Just to make sure I wasn't screwing up somewhere, I tried the
> btrfs-progs-git AUR package and I'm getting the same thing.
> It's not only me, however: according to [1] it could be that this
> commit has introduced it: [2]
> 
> Regards,
> Ivan.
> 
> [1] https://aur.archlinux.org/packages/btrfs-progs-git/
> [2] 
> http://repo.or.cz/btrfs-progs-unstable/devel.git?a=commit;h=ebe5b1cc7885027521db3c1d16d84bd54cc1321b

I'll look into it, thanks for the report.


Re: BTRFS RAID 1 broken: Mounted drive(s) basically empty after repair attempt

2016-05-18 Thread Duncan
Quanttek Jonas posted on Tue, 17 May 2016 10:00:41 -0400 as excerpted:

> So, the question is: How can I recover from this? How do I get my data
> back, after foolishly using "btrfsck --repair"?

First, let me note that I'm a list regular and btrfs user, not a dev, and 
that as such, much of your post was beyond my tech understanding level.  
Thus I snipped it above.  For a technical take perhaps one of the devs 
will help, and other users and general (but not btrfs) devs will likely 
post their thoughts as well.

But here I'd probably declare the filesystem beyond full repair and focus 
on getting any files off it that I could using the method described below, 
restoring from backup whatever I couldn't get from the damaged filesystem.

It's worth pausing at this point to note the sysadmin's rule of 
backups, which in its simplest form states that if you don't have at 
least one level of backup, then by choosing not to make that backup you 
are defining your data as worth less than the trouble and resources 
necessary to make it.  Thus, by definition, you /always/ save what was of 
most importance to you: either the data, if you decided it was worth 
making that backup, or, if by your actions you defined the time and 
resources that would otherwise be spent making that backup as more 
valuable than the data, then you saved your valuable time and resources, 
even if you lost what you had defined to be of lower value, that being 
your data.

And that rule applies in normal conditions, using fully mature and long-
term stable filesystems such as ext3/4, xfs, or (the one I still use on 
my spinning rust, I only use btrfs on my ssds) reiserfs.  Btrfs, while 
stabilizing, is not yet fully stable and mature, definitely not to the 
level of the above filesystems, so the rule applies even more strongly 
there (a less simple form of the rule takes into account varying levels 
of risk and varying data value, along with multiple levels of backup; 100 
levels of backup, with some offsite in other locations, may not be enough 
for extremely high value data).

So I'll assume that much like me you keep backups where the data is 
valuable enough to warrant it, but you may not always have /current/ 
backups, because the value of the data in the delta between the last 
backup and current simply doesn't warrant the hassle of refreshing the 
backup, yet, given the limited risk of /future/ loss.  However, once the 
potential loss happens, the question changes.  Now it's a matter of 
whether the hassle of further recovery efforts is justified, vs. the 
known loss of the data in that delta between the last backup and the last 
"good" state before things started going bad.

As it happens, btrfs has this really useful tool called btrfs restore, 
that can often help you recover your data at very close to the last good 
state, or at least to a state beyond that of your last backup.  It has 
certainly helped me recover this from-last-backup-delta data a couple 
times here, allowing me to use it instead of having to fall back to the 
older and more stale backup.  One nice thing about btrfs restore is that 
it's read-only with respect to the damaged filesystem, so you can safely 
use it on a filesystem to restore what you can, before trying more 
dangerous things that might cause even more damage.  Since it's a purely 
read-only operation, it won't cause further damage. =:^)

There's a page on the wiki that describes this process in more detail, 
but be aware, once you get beyond where automatic mode can help and you 
have to try manual, it gets quite technical, and a lot of folks find they 
need some additional help from a human, beyond the wiki.

Before I link the wiki page, here's an introduction...

Btrfs restore works on the /unmounted/ filesystem, writing any files it 
recovers to some other filesystem, which of course means that you need 
enough space on that other filesystem to store whatever you wish to 
recover.  By default it will write them as root, using root's umask, with 
current timestamps, and will skip writing symlinks or restoring extended 
attributes, but there are options that will restore ownership/perms/
timestamps, extended attributes, and symlinks, if desired.

Normally, btrfs restore will use a mechanism similar to the recovery 
mount option to try to find a copy of the root tree of the filesystem 
within a few commits (which are 30-seconds apart by default) of what the 
superblocks say is current.

If that works, great.  If not, you have to use a much more manual mode, 
telling btrfs restore what root to try, while using btrfs-find-root to 
find older roots (by generation, aka transid), then feeding the addresses 
found to btrfs restore -t, first with the -l option to list the other 
trees available from that root, then if it finds all the critical trees, 
using it with --dry-run to see if it seems to find most of the expected 
files, before trying the real restore if things look good.
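
In command form, that manual sequence looks roughly like this (device and 
destination are placeholders, <bytenr> is one of the addresses reported by 
btrfs-find-root, and -x, -m and -S ask for xattrs, ownership/timestamps and 
symlinks respectively):

  btrfs-find-root /dev/sdb1                      # list candidate roots by generation
  btrfs restore -t <bytenr> -l /dev/sdb1         # list the trees under one candidate
  btrfs restore -t <bytenr> -D /dev/sdb1 /mnt/recovery   # dry run first
  btrfs restore -t <bytenr> -x -m -S /dev/sdb1 /mnt/recovery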

With that, here's the wi