Re: [PATCH v3] Btrfs: add readahead for send_write

2014-03-05 Thread David Sterba
On Wed, Mar 05, 2014 at 10:07:35AM +0800, Liu Bo wrote:
> Btrfs send reads data from disk and then writes to a stream via pipe or
> a file via flush.
> 
> Currently we read one page at a time, so every page results in a
> disk read, which is not friendly to disks, especially HDDs.  Given
> that, performance can be gained by adding readahead for those pages.
> 
> Here is a quick test:
> $ btrfs subvolume create send
> $ xfs_io -f -c "pwrite 0 1G" send/foobar
> $ btrfs subvolume snap -r send ro
> $ time btrfs send ro -f /dev/null
> 
>         w/o          w/
> real    1m37.527s    0m9.097s
> user    0m0.122s     0m0.086s
> sys     0m53.191s    0m12.857s
> 
> Signed-off-by: Liu Bo 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 5/5] Btrfs: fix broken free space cache after the system crashed

2014-03-05 Thread Josef Bacik
On 03/05/2014 02:02 AM, Miao Xie wrote:
> On Tue, 4 Mar 2014 10:19:20 -0500, Josef Bacik wrote: On 01/15/2014
> 07:00 AM, Miao Xie wrote:
 When we mounted the filesystem after the crash, we got the 
 following message: BTRFS error (device xxx): block group
 4315938816 has wrong amount of free space BTRFS error (device
 xxx): failed to load free space cache for block group
 4315938816
 
 It is because we didn't update the metadata of the allocated
 space until the file data was written into the disk. During
 this time, there was no information about the allocated
 spaces in either the extent tree or the free space cache.
 When we wrote out the free space cache at this time, those
 spaces were lost.
 
 In order to fix this problem, I use a state tree for every
 block group to record those allocated spaces. We record the
 information when they are allocated, and clean up the
 information after the metadata update. Besides that, we also
 introduce a read-write semaphore to avoid the race between
 the allocation and the free space cache write out.
 
 Only data block groups had this problem, so the above change
 is just for data space allocation.
 
> 
> I didn't like this idea at first but I've come around to it.  The
> only thing is the data_rwsem thing, we don't need it as we are
> protected by the transaction being blocked when we do writeout, so
> nobody can do data allocations during this time.  Thanks,
> 
>> But this protection was removed by the patch
> 
>> Commit ID: 00361589d2eebd90fca022148c763e40d3e90871
> 

Excellent point, then I'm good overall, I'll pull it in once I finish
qgroups.  Thanks,

Josef


patch for ioctl.h

2014-03-05 Thread Arvin Schnell

Hi,

u64 is not known when just including btrfs/ioctl.h. I assume it
should be __u64 as everywhere else in the file.

Regards,
  Arvin

-- 
Arvin Schnell, 
Senior Software Engineer, Research & Development
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
16746 (AG Nürnberg)
Maxfeldstraße 5
90409 Nürnberg
Germany
diff --git a/ioctl.h b/ioctl.h
index a589cd7..7577862 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -539,7 +539,7 @@ struct btrfs_ioctl_clone_range_args {
    struct btrfs_ioctl_search_args)
 #define BTRFS_IOC_INO_LOOKUP _IOWR(BTRFS_IOCTL_MAGIC, 18, \
    struct btrfs_ioctl_ino_lookup_args)
-#define BTRFS_IOC_DEFAULT_SUBVOL _IOW(BTRFS_IOCTL_MAGIC, 19, u64)
+#define BTRFS_IOC_DEFAULT_SUBVOL _IOW(BTRFS_IOCTL_MAGIC, 19, __u64)
 #define BTRFS_IOC_SPACE_INFO _IOWR(BTRFS_IOCTL_MAGIC, 20, \
 struct btrfs_ioctl_space_args)
 #define BTRFS_IOC_SNAP_CREATE_V2 _IOW(BTRFS_IOCTL_MAGIC, 23, \


[Repost] Is BTRFS "bedup" maintained ?

2014-03-05 Thread Swâmi Petaramesh
Hello,

(Not having received a single answer, I repost this...)

I tried to use the "bedup" BTRFS offline deduplication tool on several BTRFS 
machines, with mixed results:

1/ Crashes with messages stating "programming error", sqlite objects must be 
used inside of the thread that created them, blah-blah...

2/ Crashes stating that some files/objects miss a "live" attribute (?)

3/ bedup eating all my system RAM, then swap over 7GB until the system thrashes 
itself.

Well, besides that and restarting over and over, I could deduplicate some files 
and gain some noticeable space, but it looks extremely buggy and unreliable.

I tried to email the maintainer with details, twice, through 2 different 
channels, got no answer...

And it looks like the Git repository has not evolved since August, 2013.

So my questions are:

- Is "bedup" maintained or abandoned?
- Is it supposed to be usable?
- Should I use it?
- Is there a risk that it causes data loss or corruption?

TIA.

Kind regards.

-- 
Swâmi Petaramesh  http://petaramesh.org PGP 9076E32E



Re: [Repost] Is BTRFS "bedup" maintained ?

2014-03-05 Thread Marc MERLIN
On Wed, Mar 05, 2014 at 06:24:40PM +0100, Swâmi Petaramesh wrote:
> Hello,
> 
> (Not having received a single answer, I repost this...)

I got your post, and posted myself about bedup not working at all for me,
and got no answer either.

As far as I can tell, it's entirely unmaintained; it was likely just a proof
of concept until the kernel can do deduplication itself, and that part isn't
entirely finished from what I understand.

It's a bit disappointing, but hopefully it'll get fixed eventually.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: [Repost] Is BTRFS "bedup" maintained ?

2014-03-05 Thread cwillu
Bedup was/is a third-party project; I'm not sure if its developer follows this list.


Might be worth filing a bug or otherwise poking the author on
https://github.com/g2p/bedup

On Wed, Mar 5, 2014 at 2:43 PM, Marc MERLIN  wrote:
> On Wed, Mar 05, 2014 at 06:24:40PM +0100, Swâmi Petaramesh wrote:
>> Hello,
>>
>> (Not having received a single answer, I repost this...)
>
> I got your post, and posted myself about bedup not working at all for me,
> and got no answer either.
>
> As far as I can tell, it's entirely unmaintained and was likely just a proof
> of concept until the kernel can do it itself and that's not entirely
> finished from what I understand.
>
> It's a bit disappointing, but hopefully it'll get fixed eventually.
>
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems 
>    what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/


Re: ENOSPC errors during raid1 rebalance

2014-03-05 Thread Michael Russo
Chris Murphy  colorremedies.com> writes:

> You could also try a full defragment by specifying -r on the mount point
> with a small -t value to effectively cause everything to be subject
> to defragmenting. If this still doesn't permit soft rebalance, then maybe
> filefrag can find files that have more than 1 extent and just copy 
> them (make duplicates, delete the original). Any copy will be
> allocated into chunks with the new profile.

I would think so too.  But it doesn't seem to be happening. 
Here is an example with one file:

root@ossy:/mymedia# filefrag output.wav
output.wav: 2 extents found
root@ossy:/mymedia# /usr/src/btrfs-progs/btrfs fi de -t 1 /mymedia/output.wav 
root@ossy:/mymedia# filefrag output.wav
output.wav: 2 extents found

btrfs does not defrag the file. And copying the file usually
doesn't defrag it either:

root@ossy:/mymedia# cp output.wav output.wav.bak
root@ossy:/mymedia# filefrag output.wav.bak
output.wav.bak: 2 extents found

I even tried copying a large file to another filesystem (/dev/shm),
 removing the original, and copying it back, and more often than not 
it still had more than 1 extent. 

If I copy each file out to another filesystem and then back, will btrfs 
avoid using any of the space in the "single" chunks and re-allocate space 
in the RAID1 chunks like I want it to?




Re: ENOSPC errors during raid1 rebalance

2014-03-05 Thread Michael Russo
Chris Murphy  colorremedies.com> writes:

> Did you do a defrag and balance after the ext4->btrfs conversion, 
> but before data/metadata profile conversion?

No I didn't, as I thought it was only optional and didn't realize 
it might later affect my ability to change profiles. 






[PATCH 2/3] btrfs-progs: cleanup device stat usage prompt

2014-03-05 Thread Gui Hecheng
1. use usage() to replace the fprintf()
2. use check_argc_exact() to replace "argc != ..."

Signed-off-by: Gui Hecheng 
---
 cmds-device.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/cmds-device.c b/cmds-device.c
index ad59600..5009d9a 100644
--- a/cmds-device.c
+++ b/cmds-device.c
@@ -324,18 +324,13 @@ static int cmd_dev_stats(int argc, char **argv)
break;
case '?':
default:
-   fprintf(stderr, "ERROR: device stat args invalid.\n"
-   " device stat [-z] |\n"
-   " -z  to reset stats after reading.\n");
-   return 1;
+   usage(cmd_dev_stats_usage);
}
}
 
-   if (optind + 1 != argc) {
-   fprintf(stderr, "ERROR: device stat needs path|device as single"
-   " argument\n");
-   return 1;
-   }
+   argc = argc - optind;
+   if (check_argc_exact(argc, 1))
+   usage(cmd_dev_stats_usage);
 
dev_path = argv[optind];
 
-- 
1.8.1.4



[PATCH 1/3] btrfs-progs: cleanup dead return after usage() for fi-disk_usage

2014-03-05 Thread Gui Hecheng
The usage() calls exit() internally, so remove the return after it.

Signed-off-by: Gui Hecheng 
---
 cmds-fi-disk_usage.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/cmds-fi-disk_usage.c b/cmds-fi-disk_usage.c
index e4eb72b..a3b06be 100644
--- a/cmds-fi-disk_usage.c
+++ b/cmds-fi-disk_usage.c
@@ -494,10 +494,8 @@ int cmd_filesystem_df(int argc, char **argv)
}
}
 
-   if (check_argc_min(argc - optind, 1)) {
+   if (check_argc_min(argc - optind, 1))
usage(cmd_filesystem_df_usage);
-   return 21;
-   }
 
for (i = optind; i < argc ; i++) {
int r, fd;
@@ -914,10 +912,8 @@ int cmd_filesystem_disk_usage(int argc, char **argv)
}
}
 
-   if (check_argc_min(argc - optind, 1)) {
+   if (check_argc_min(argc - optind, 1))
usage(cmd_filesystem_disk_usage_usage);
-   return 21;
-   }
 
for (i = optind; i < argc ; i++) {
int r, fd;
@@ -1050,10 +1046,8 @@ int cmd_device_disk_usage(int argc, char **argv)
}
}
 
-   if (check_argc_min(argc - optind, 1)) {
+   if (check_argc_min(argc - optind, 1))
usage(cmd_device_disk_usage_usage);
-   return 21;
-   }
 
for (i = optind; i < argc ; i++) {
int r, fd;
-- 
1.8.1.4



[PATCH 3/3] btrfs-progs: make the device scan logic more clear

2014-03-05 Thread Gui Hecheng
1. Use a long option to replace the original strcmp() parsing of
"--all-devices".
2. The "int ret" was defined in 2 places; define it once and convert
the return pattern into "goto + single return".

This does not change the actual scan procedure or the return values;
it just makes the logic clearer, since the original seems a little confusing.

Signed-off-by: Gui Hecheng 
---
 cmds-device.c | 61 ++-
 1 file changed, 39 insertions(+), 22 deletions(-)

diff --git a/cmds-device.c b/cmds-device.c
index 5009d9a..d79732f 100644
--- a/cmds-device.c
+++ b/cmds-device.c
@@ -188,51 +188,67 @@ static int cmd_rm_dev(int argc, char **argv)
 }
 
 static const char * const cmd_scan_dev_usage[] = {
-   "btrfs device scan [<--all-devices>| [...]]",
+   "btrfs device scan [options] [ [...]]",
"Scan devices for a btrfs filesystem",
+   "-d|--all-devicesscan all devices under /dev",
NULL
 };
 
 static int cmd_scan_dev(int argc, char **argv)
 {
-   int i, fd, e;
-   int where = BTRFS_SCAN_LBLKID;
-   int devstart = 1;
+   int i, fd, e;
+   int where = BTRFS_SCAN_LBLKID;
+   int devstart = 1;
+   int all = 0;
+   int ret = 0;
 
-   if( argc > 1 && !strcmp(argv[1],"--all-devices")){
-   if (check_argc_max(argc, 2))
+   optind = 1;
+   while (1) {
+   int long_index;
+   static struct option long_options[] = {
+   { "all-devices", no_argument, NULL, 'd' },
+   { 0, 0, 0, 0 },
+   };
+   int c = getopt_long(argc, argv, "d", long_options,
+   &long_index);
+   if (c < 0)
+   break;
+   switch (c) {
+   case 'd':
+   where = BTRFS_SCAN_DEV;
+   all = 1;
+   break;
+   default:
usage(cmd_scan_dev_usage);
-
-   where = BTRFS_SCAN_DEV;
-   devstart += 1;
+   }
}
 
-   if(argc<=devstart){
-   int ret;
+   if (all && check_argc_max(argc, 2))
+   usage(cmd_scan_dev_usage);
+
+   if (all || argc == 1) {
printf("Scanning for Btrfs filesystems\n");
ret = scan_for_btrfs(where, BTRFS_UPDATE_KERNEL);
-   if (ret){
+   if (ret)
fprintf(stderr, "ERROR: error %d while scanning\n", 
ret);
-   return 1;
-   }
-   return 0;
+   goto out;
}
 
fd = open("/dev/btrfs-control", O_RDWR);
if (fd < 0) {
perror("failed to open /dev/btrfs-control");
-   return 1;
+   ret = 1;
+   goto out;
}
 
for( i = devstart ; i < argc ; i++ ){
struct btrfs_ioctl_vol_args args;
-   int ret;
 
if (!is_block_device(argv[i])) {
fprintf(stderr,
"ERROR: %s is not a block device\n", argv[i]);
-   close(fd);
-   return 1;
+   ret = 1;
+   goto close_out;
}
printf("Scanning for Btrfs filesystems in '%s'\n", argv[i]);
 
@@ -246,15 +262,16 @@ static int cmd_scan_dev(int argc, char **argv)
e = errno;
 
if( ret < 0 ){
-   close(fd);
fprintf(stderr, "ERROR: unable to scan the device '%s' 
- %s\n",
argv[i], strerror(e));
-   return 1;
+   goto close_out;
}
}
 
+close_out:
close(fd);
-   return 0;
+out:
+   return !!ret;
 }
 
 static const char * const cmd_ready_dev_usage[] = {
-- 
1.8.1.4



[PATCH 2/2] btrfs: Add ftrace for btrfs_workqueue

2014-03-05 Thread quwen...@cn.fujitsu.com
Add ftrace support for btrfs_workqueue for further workqueue tuning.
This patch needs to be applied after the workqueue replace patchset.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/async-thread.c  |  7 
 include/trace/events/btrfs.h | 82 
 2 files changed, 89 insertions(+)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index d8c07e5..00623dd 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include "async-thread.h"
+#include "ctree.h"
 
 #define WORK_DONE_BIT 0
 #define WORK_ORDER_DONE_BIT 1
@@ -210,6 +211,7 @@ static void run_ordered_work(struct __btrfs_workqueue *wq)
 */
if (test_and_set_bit(WORK_ORDER_DONE_BIT, &work->flags))
break;
+   trace_btrfs_ordered_sched(work);
spin_unlock_irqrestore(lock, flags);
work->ordered_func(work);
 
@@ -223,6 +225,7 @@ static void run_ordered_work(struct __btrfs_workqueue *wq)
 * with the lock held though
 */
work->ordered_free(work);
+   trace_btrfs_all_work_done(work);
}
spin_unlock_irqrestore(lock, flags);
 }
@@ -246,12 +249,15 @@ static void normal_work_helper(struct work_struct *arg)
need_order = 1;
wq = work->wq;
 
+   trace_btrfs_work_sched(work);
thresh_exec_hook(wq);
work->func(work);
if (need_order) {
set_bit(WORK_DONE_BIT, &work->flags);
run_ordered_work(wq);
}
+   if (!need_order)
+   trace_btrfs_all_work_done(work);
 }
 
 void btrfs_init_work(struct btrfs_work *work,
@@ -280,6 +286,7 @@ static inline void __btrfs_queue_work(struct 
__btrfs_workqueue *wq,
spin_unlock_irqrestore(&wq->list_lock, flags);
}
queue_work(wq->normal_wq, &work->normal_work);
+   trace_btrfs_work_queued(work);
 }
 
 void btrfs_queue_work(struct btrfs_workqueue *wq,
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 3176cdc..c346919 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -21,6 +21,7 @@ struct btrfs_block_group_cache;
 struct btrfs_free_cluster;
 struct map_lookup;
 struct extent_buffer;
+struct btrfs_work;
 
 #define show_ref_type(type)\
__print_symbolic(type,  \
@@ -982,6 +983,87 @@ TRACE_EVENT(free_extent_state,
  (void *)__entry->ip)
 );
 
+DECLARE_EVENT_CLASS(btrfs__work,
+
+   TP_PROTO(struct btrfs_work *work),
+
+   TP_ARGS(work),
+
+   TP_STRUCT__entry(
+   __field(void *, work)
+   __field(void *, wq  )
+   __field(void *, func)
+   __field(void *, ordered_func)
+   __field(void *, ordered_free)
+   ),
+
+   TP_fast_assign(
+   __entry->work   = work;
+   __entry->wq = work->wq;
+   __entry->func   = work->func;
+   __entry->ordered_func   = work->ordered_func;
+   __entry->ordered_free   = work->ordered_free;
+   ),
+
+   TP_printk("work=%p, wq=%p, func=%p, ordered_func=%p, ordered_free=%p",
+ __entry->work, __entry->wq, __entry->func,
+ __entry->ordered_func, __entry->ordered_free)
+);
+
+/* For situations where the work is freed */
+DECLARE_EVENT_CLASS(btrfs__work__done,
+
+   TP_PROTO(struct btrfs_work *work),
+
+   TP_ARGS(work),
+
+   TP_STRUCT__entry(
+   __field(void *, work)
+   ),
+
+   TP_fast_assign(
+   __entry->work   = work;
+   ),
+
+   TP_printk("work->%p", __entry->work)
+);
+
+DEFINE_EVENT(btrfs__work, btrfs_work_queued,
+
+   TP_PROTO(struct btrfs_work *work),
+
+   TP_ARGS(work)
+);
+
+DEFINE_EVENT(btrfs__work, btrfs_work_sched,
+
+   TP_PROTO(struct btrfs_work *work),
+
+   TP_ARGS(work)
+);
+
+DEFINE_EVENT(btrfs__work, btrfs_normal_work_done,
+
+   TP_PROTO(struct btrfs_work *work),
+
+   TP_ARGS(work)
+);
+
+DEFINE_EVENT(btrfs__work__done, btrfs_all_work_done,
+
+   TP_PROTO(struct btrfs_work *work),
+
+   TP_ARGS(work)
+);
+
+DEFINE_EVENT(btrfs__work, btrfs_ordered_sched,
+
+   TP_PROTO(struct btrfs_work *work),
+
+   TP_ARGS(work)
+);
+
+
 #endif /* _TRACE_BTRFS_H */
 
 /* This part must be outside protection */
-- 
1.9.0


[PATCH 1/2] btrfs: Cleanup the btrfs_workqueue related function type

2014-03-05 Thread quwen...@cn.fujitsu.com
The new btrfs_workqueue still uses open-coded function pointer
definitions; this patch changes them into the btrfs_func_t type, much
like the kernel workqueue's work_func_t.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/async-thread.c |  6 +++---
 fs/btrfs/async-thread.h | 20 +++-
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index a709585..d8c07e5 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -255,9 +255,9 @@ static void normal_work_helper(struct work_struct *arg)
 }
 
 void btrfs_init_work(struct btrfs_work *work,
-void (*func)(struct btrfs_work *),
-void (*ordered_func)(struct btrfs_work *),
-void (*ordered_free)(struct btrfs_work *))
+btrfs_func_t func,
+btrfs_func_t ordered_func,
+btrfs_func_t ordered_free)
 {
work->func = func;
work->ordered_func = ordered_func;
diff --git a/fs/btrfs/async-thread.h b/fs/btrfs/async-thread.h
index 08d7174..0a891cd 100644
--- a/fs/btrfs/async-thread.h
+++ b/fs/btrfs/async-thread.h
@@ -23,11 +23,13 @@
 struct btrfs_workqueue;
 /* Internal use only */
 struct __btrfs_workqueue;
+struct btrfs_work;
+typedef void (*btrfs_func_t)(struct btrfs_work *arg);
 
 struct btrfs_work {
-   void (*func)(struct btrfs_work *arg);
-   void (*ordered_func)(struct btrfs_work *arg);
-   void (*ordered_free)(struct btrfs_work *arg);
+   btrfs_func_t func;
+   btrfs_func_t ordered_func;
+   btrfs_func_t ordered_free;
 
/* Don't touch things below */
struct work_struct normal_work;
@@ -37,13 +39,13 @@ struct btrfs_work {
 };
 
 struct btrfs_workqueue *btrfs_alloc_workqueue(char *name,
-int flags,
-int max_active,
-int thresh);
+ int flags,
+ int max_active,
+ int thresh);
 void btrfs_init_work(struct btrfs_work *work,
-void (*func)(struct btrfs_work *),
-void (*ordered_func)(struct btrfs_work *),
-void (*ordered_free)(struct btrfs_work *));
+btrfs_func_t func,
+btrfs_func_t ordered_func,
+btrfs_func_t ordered_free);
 void btrfs_queue_work(struct btrfs_workqueue *wq,
  struct btrfs_work *work);
 void btrfs_destroy_workqueue(struct btrfs_workqueue *wq);
-- 
1.9.0


[PATCH] Btrfs: introduce btrfs_{start, end}_nocow_write() for each subvolume

2014-03-05 Thread Miao Xie
If the snapshot creation happened after the nocow write but before the dirty
data flush, we would fail to flush the dirty data because there was no space.

So we must keep track of when those nocow write operations start and when they
end; if there are nocow writers, the snapshot creators must wait. In order
to implement this, I introduce btrfs_{start, end}_nocow_write(),
which is similar to mnt_{want,drop}_write().

These two functions are only used for nocow file write operations.

Signed-off-by: Miao Xie 
---
This patch is against the patchset:
[PATCH 1/3] Btrfs: don't skip the page flush since the enospc is not brought by 
it
[PATCH 2/3] Btrfs: fix wrong lock range and write size in check_can_nocow()
[PATCH 3/3] Btrfs: fix preallocate vs double nocow write
---
 fs/btrfs/ctree.h   | 10 ++
 fs/btrfs/disk-io.c | 42 +-
 fs/btrfs/extent-tree.c | 35 +++
 fs/btrfs/file.c| 21 +
 fs/btrfs/ioctl.c   | 35 ++-
 5 files changed, 133 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 277f627..7e7cade 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1682,6 +1682,11 @@ struct btrfs_fs_info {
unsigned int update_uuid_tree_gen:1;
 };
 
+struct btrfs_subvolume_writers {
+   struct percpu_counter   counter;
+   wait_queue_head_t   wait;
+};
+
 /*
  * in ram representation of the tree.  extent_root is used for all allocations
  * and for the extent tree extent_root root.
@@ -1826,6 +1831,8 @@ struct btrfs_root {
 * manipulation with the read-only status via SUBVOL_SETFLAGS
 */
int send_in_progress;
+   struct btrfs_subvolume_writers *subv_writers;
+   atomic_t will_be_snapshoted;
 };
 
 struct btrfs_ioctl_defrag_range_args {
@@ -3350,6 +3357,9 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info);
 int btrfs_delayed_refs_qgroup_accounting(struct btrfs_trans_handle *trans,
 struct btrfs_fs_info *fs_info);
 int __get_raid_index(u64 flags);
+
+int btrfs_start_nocow_write(struct btrfs_root *root);
+void btrfs_end_nocow_write(struct btrfs_root *root);
 /* ctree.c */
 int btrfs_bin_search(struct extent_buffer *eb, struct btrfs_key *key,
 int level, int *slot);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index de6a48f..b8f6cbf 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1152,6 +1152,32 @@ void clean_tree_block(struct btrfs_trans_handle *trans, 
struct btrfs_root *root,
}
 }
 
+static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
+{
+   struct btrfs_subvolume_writers *writers;
+   int ret;
+
+   writers = kmalloc(sizeof(*writers), GFP_NOFS);
+   if (!writers)
+   return ERR_PTR(-ENOMEM);
+
+   ret = percpu_counter_init(&writers->counter, 0);
+   if (ret < 0) {
+   kfree(writers);
+   return ERR_PTR(ret);
+   }
+
+   init_waitqueue_head(&writers->wait);
+   return writers;
+}
+
+static void
+btrfs_free_subvolume_writers(struct btrfs_subvolume_writers *writers)
+{
+   percpu_counter_destroy(&writers->counter);
+   kfree(writers);
+}
+
 static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize,
 u32 stripesize, struct btrfs_root *root,
 struct btrfs_fs_info *fs_info,
@@ -1206,6 +1232,7 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 
sectorsize,
atomic_set(&root->log_batch, 0);
atomic_set(&root->orphan_inodes, 0);
atomic_set(&root->refs, 1);
+   atomic_set(&root->will_be_snapshoted, 0);
root->log_transid = 0;
root->last_log_commit = 0;
if (fs_info)
@@ -1501,6 +1528,7 @@ struct btrfs_root *btrfs_read_fs_root(struct btrfs_root 
*tree_root,
 int btrfs_init_fs_root(struct btrfs_root *root)
 {
int ret;
+   struct btrfs_subvolume_writers *writers;
 
root->free_ino_ctl = kzalloc(sizeof(*root->free_ino_ctl), GFP_NOFS);
root->free_ino_pinned = kzalloc(sizeof(*root->free_ino_pinned),
@@ -1510,6 +1538,13 @@ int btrfs_init_fs_root(struct btrfs_root *root)
goto fail;
}
 
+   writers = btrfs_alloc_subvolume_writers();
+   if (IS_ERR(writers)) {
+   ret = PTR_ERR(writers);
+   goto fail;
+   }
+   root->subv_writers = writers;
+
btrfs_init_free_ino_ctl(root);
mutex_init(&root->fs_commit_mutex);
spin_lock_init(&root->cache_lock);
@@ -1517,8 +1552,11 @@ int btrfs_init_fs_root(struct btrfs_root *root)
 
ret = get_anon_bdev(&root->anon_dev);
if (ret)
-   goto fail;
+   goto free_writers;
return 0;
+
+free_writers:
+   btrfs_free_subvolume_writers(root->subv_writers);
 fail:
kfree(root->free_ino_ctl);
kfree(ro

[PATCH V2 04/10] Btrfs: remove the unnecessary flush when preparing the pages

2014-03-05 Thread Miao Xie
Signed-off-by: Miao Xie 
---
Changelog v1 -> v2:
- None.
---
 fs/btrfs/file.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0165b86..76b725e 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1346,11 +1346,11 @@ lock_and_cleanup_extent_if_need(struct inode *inode, 
struct page **pages,
struct btrfs_ordered_extent *ordered;
lock_extent_bits(&BTRFS_I(inode)->io_tree,
 start_pos, last_pos, 0, cached_state);
-   ordered = btrfs_lookup_first_ordered_extent(inode, last_pos);
+   ordered = btrfs_lookup_ordered_range(inode, start_pos,
+last_pos - start_pos + 1);
if (ordered &&
ordered->file_offset + ordered->len > start_pos &&
ordered->file_offset <= last_pos) {
-   btrfs_put_ordered_extent(ordered);
unlock_extent_cached(&BTRFS_I(inode)->io_tree,
 start_pos, last_pos,
 cached_state, GFP_NOFS);
@@ -1358,12 +1358,9 @@ lock_and_cleanup_extent_if_need(struct inode *inode, 
struct page **pages,
unlock_page(pages[i]);
page_cache_release(pages[i]);
}
-   ret = btrfs_wait_ordered_range(inode, start_pos,
-   last_pos - start_pos + 1);
-   if (ret)
-   return ret;
-   else
-   return -EAGAIN;
+   btrfs_start_ordered_extent(inode, ordered, 1);
+   btrfs_put_ordered_extent(ordered);
+   return -EAGAIN;
}
if (ordered)
btrfs_put_ordered_extent(ordered);
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V2 08/10] Btrfs: split the global ordered extents mutex

2014-03-05 Thread Miao Xie
When we create a snapshot, we only need to wait for the ordered extents
in the source fs/file root. But because a single global mutex protected
the ordered extents lists of all roots (to avoid accessing an empty
list), we had to wait whenever someone held that mutex to access the
ordered extents list of any other fs/file root.

This patch splits the global mutex: now every fs/file root has
its own mutex to protect its own list.

Signed-off-by: Miao Xie 
---
Changelog v1 -> v2:
- New patch.
---
 fs/btrfs/ctree.h|  2 ++
 fs/btrfs/disk-io.c  |  1 +
 fs/btrfs/ordered-data.c | 17 -
 3 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2f3949b..7bae97e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1806,6 +1806,8 @@ struct btrfs_root {
struct list_head delalloc_inodes;
struct list_head delalloc_root;
u64 nr_delalloc_inodes;
+
+   struct mutex ordered_extent_mutex;
/*
 * this is used by the balancing code to wait for all the pending
 * ordered extents
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index de6a48f..65fe26e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1197,6 +1197,7 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 
sectorsize,
spin_lock_init(&root->log_extents_lock[1]);
mutex_init(&root->objectid_mutex);
mutex_init(&root->log_mutex);
+   mutex_init(&root->ordered_extent_mutex);
init_waitqueue_head(&root->log_writer_wait);
init_waitqueue_head(&root->log_commit_wait[0]);
init_waitqueue_head(&root->log_commit_wait[1]);
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 067e129..2849485 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -595,7 +595,7 @@ static void btrfs_run_ordered_extent_work(struct btrfs_work 
*work)
  * wait for all the ordered extents in a root.  This is done when balancing
  * space between drives.
  */
-static int __btrfs_wait_ordered_extents(struct btrfs_root *root, int nr)
+int btrfs_wait_ordered_extents(struct btrfs_root *root, int nr)
 {
struct list_head splice, works;
struct btrfs_ordered_extent *ordered, *next;
@@ -604,6 +604,7 @@ static int __btrfs_wait_ordered_extents(struct btrfs_root 
*root, int nr)
INIT_LIST_HEAD(&splice);
INIT_LIST_HEAD(&works);
 
+   mutex_lock(&root->ordered_extent_mutex);
spin_lock(&root->ordered_extent_lock);
list_splice_init(&root->ordered_extents, &splice);
while (!list_empty(&splice) && nr) {
@@ -634,17 +635,7 @@ static int __btrfs_wait_ordered_extents(struct btrfs_root 
*root, int nr)
btrfs_put_ordered_extent(ordered);
cond_resched();
}
-
-   return count;
-}
-
-int btrfs_wait_ordered_extents(struct btrfs_root *root, int nr)
-{
-   int count;
-
-   mutex_lock(&root->fs_info->ordered_operations_mutex);
-   count = __btrfs_wait_ordered_extents(root, nr);
-   mutex_unlock(&root->fs_info->ordered_operations_mutex);
+   mutex_unlock(&root->ordered_extent_mutex);
 
return count;
 }
@@ -669,7 +660,7 @@ void btrfs_wait_ordered_roots(struct btrfs_fs_info 
*fs_info, int nr)
   &fs_info->ordered_roots);
spin_unlock(&fs_info->ordered_root_lock);
 
-   done = __btrfs_wait_ordered_extents(root, nr);
+   done = btrfs_wait_ordered_extents(root, nr);
btrfs_put_fs_root(root);
 
spin_lock(&fs_info->ordered_root_lock);
-- 
1.8.1.4



[PATCH V2 02/10] Btrfs: wake up the tasks that wait for the io earlier

2014-03-05 Thread Miao Xie
The tasks that wait for the IO_DONE flag only care about the I/O of the dirty
pages, so it is better to wake them up as soon as all the pages are written,
rather than when the whole I/O completion process finishes.

Signed-off-by: Miao Xie 
---
Changelog v1 -> v2:
- None.
---
 fs/btrfs/ordered-data.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index a6ba75e..067e129 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -349,10 +349,13 @@ int btrfs_dec_test_first_ordered_pending(struct inode 
*inode,
if (!uptodate)
set_bit(BTRFS_ORDERED_IOERR, &entry->flags);
 
-   if (entry->bytes_left == 0)
+   if (entry->bytes_left == 0) {
ret = test_and_set_bit(BTRFS_ORDERED_IO_DONE, &entry->flags);
-   else
+   if (waitqueue_active(&entry->wait))
+   wake_up(&entry->wait);
+   } else {
ret = 1;
+   }
 out:
if (!ret && cached && entry) {
*cached = entry;
@@ -410,10 +413,13 @@ have_entry:
if (!uptodate)
set_bit(BTRFS_ORDERED_IOERR, &entry->flags);
 
-   if (entry->bytes_left == 0)
+   if (entry->bytes_left == 0) {
ret = test_and_set_bit(BTRFS_ORDERED_IO_DONE, &entry->flags);
-   else
+   if (waitqueue_active(&entry->wait))
+   wake_up(&entry->wait);
+   } else {
ret = 1;
+   }
 out:
if (!ret && cached && entry) {
*cached = entry;
-- 
1.8.1.4



[PATCH V2 10/10] Btrfs: reclaim the reserved metadata space at background

2014-03-05 Thread Miao Xie
Before this patch, a task had to reclaim metadata space by itself when
there was not enough of it. And when the task started the space
reclamation, all the other tasks that wanted to reserve metadata space
were blocked. In some cases they were blocked for a long time, which
made performance fluctuate wildly.

So we introduce background metadata space reclamation: when the space
is about to be exhausted, we insert a reclaim work item into the
workqueue, and the workqueue's worker reclaims the reserved space in the
background. This way, the tasks needn't reclaim the space themselves in
most cases, and even when a task does have to reclaim space or is blocked
waiting for reclamation, it gets enough space more quickly.

We needn't worry about the early ENOSPC problem because all the reclaim work
is serialized by the lock.
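The reclaim-size heuristic in the patch (see btrfs_calc_reclaim_metadata_size below) reduces to pure arithmetic. Here is a standalone sketch of it; the can_overcommit() checks are passed in as booleans since they depend on kernel state, and the function assumes the caller only asks when the space is nearly full (used > expected):

```c
#include <assert.h>

/* Sketch of the heuristic: cap each round of background work at
 * min(ncpus * 1MiB, 16MiB); if even that much still fits, do
 * nothing; otherwise reclaim down to 95% (or 90%, when a small
 * reservation would already fail) of the total space.  The
 * percentage math mirrors div_factor_fine(). */
static unsigned long long calc_reclaim_size(unsigned int ncpus,
					    unsigned long long used,
					    unsigned long long total,
					    int small_resv_fits,
					    int full_resv_fits)
{
	unsigned long long to_reclaim = (unsigned long long)ncpus << 20;

	if (to_reclaim > 16ULL << 20)
		to_reclaim = 16ULL << 20;
	if (full_resv_fits)
		return 0;	/* enough headroom, nothing to do */

	unsigned long long expected = small_resv_fits ? total * 95 / 100
						      : total * 90 / 100;
	return used - expected;	/* assumes used > expected here */
}
```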

Signed-off-by: Miao Xie 
---
I have only done some simple tests so far; I'll run more performance tests
and send out the results.

Changelog v1 -> v2:
- change the reclaim size.
---
 fs/btrfs/ctree.h   |  6 +++
 fs/btrfs/disk-io.c |  3 ++
 fs/btrfs/extent-tree.c | 99 +-
 fs/btrfs/super.c   |  1 +
 4 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ec47aa9..21f156b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "extent_io.h"
 #include "extent_map.h"
 #include "async-thread.h"
@@ -1305,6 +1306,8 @@ struct btrfs_stripe_hash_table {
 
 #define BTRFS_STRIPE_HASH_TABLE_BITS 11
 
+void btrfs_init_async_reclaim_work(struct work_struct *work);
+
 /* fs_info */
 struct reloc_control;
 struct btrfs_device;
@@ -1681,6 +1684,9 @@ struct btrfs_fs_info {
 
struct semaphore uuid_tree_rescan_sem;
unsigned int update_uuid_tree_gen:1;
+
+   /* Used to reclaim the metadata space in the background. */
+   struct work_struct async_reclaim_work;
 };
 
 /*
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2bb0bbd..d77516e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2237,6 +2237,7 @@ int open_ctree(struct super_block *sb,
atomic_set(&fs_info->balance_cancel_req, 0);
fs_info->balance_ctl = NULL;
init_waitqueue_head(&fs_info->balance_wait_q);
+   btrfs_init_async_reclaim_work(&fs_info->async_reclaim_work);
 
sb->s_blocksize = 4096;
sb->s_blocksize_bits = blksize_bits(4096);
@@ -3580,6 +3581,8 @@ int close_ctree(struct btrfs_root *root)
/* clear out the rbtree of defraggable inodes */
btrfs_cleanup_defrag_inodes(fs_info);
 
+   cancel_work_sync(&fs_info->async_reclaim_work);
+
if (!(fs_info->sb->s_flags & MS_RDONLY)) {
ret = btrfs_commit_super(root);
if (ret)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index da43003..6640d28 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4200,6 +4200,98 @@ static int flush_space(struct btrfs_root *root,
 
return ret;
 }
+
+static inline u64
+btrfs_calc_reclaim_metadata_size(struct btrfs_root *root,
+struct btrfs_space_info *space_info)
+{
+   u64 used;
+   u64 expected;
+   u64 to_reclaim;
+
+   to_reclaim = min_t(u64, num_online_cpus() * 1024 * 1024,
+   16 * 1024 * 1024);
+   spin_lock(&space_info->lock);
+   if (can_overcommit(root, space_info, to_reclaim,
+  BTRFS_RESERVE_FLUSH_ALL)) {
+   to_reclaim = 0;
+   goto out;
+   }
+
+   used = space_info->bytes_used + space_info->bytes_reserved +
+  space_info->bytes_pinned + space_info->bytes_readonly +
+  space_info->bytes_may_use;
+   if (can_overcommit(root, space_info, 1024 * 1024,
+  BTRFS_RESERVE_FLUSH_ALL))
+   expected = div_factor_fine(space_info->total_bytes, 95);
+   else
+   expected = div_factor_fine(space_info->total_bytes, 90);
+   to_reclaim = used - expected;
+out:
+   spin_unlock(&space_info->lock);
+
+   return to_reclaim;
+}
+
+static inline int need_do_async_reclaim(struct btrfs_space_info *space_info,
+   struct btrfs_fs_info *fs_info, u64 used)
+{
+   return (used >= div_factor_fine(space_info->total_bytes, 95) &&
+   !btrfs_fs_closing(fs_info) &&
+   !test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state));
+}
+
+static int btrfs_need_do_async_reclaim(struct btrfs_space_info *space_info,
+  struct btrfs_fs_info *fs_info)
+{
+   u64 used;
+
+   spin_lock(&space_info->lock);
+   used = space_info->bytes_used + space_info->bytes_reserved +
+  space_info->bytes_pinned + space_info->bytes_readonly +
+  space_info->bytes_may_use

[PATCH V2 07/10] Btrfs: don't flush all delalloc inodes when we can't get the s_umount lock

2014-03-05 Thread Miao Xie
We needn't flush all the delalloc inodes when we can't get the s_umount lock,
or we would make the tasks wait for a long time.
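The nr limit the patch threads through btrfs_start_delalloc_roots() amounts to a bounded walk. A minimal model (names illustrative; the real code queues async work items per inode):

```c
#include <assert.h>

/* Bounded flush: process at most 'nr' queued delalloc inodes, where
 * nr == -1 keeps the old "flush everything" behaviour.  Callers that
 * only need some space reclaimed no longer stall on the whole
 * backlog. */
static int start_delalloc_inodes(int pending, int nr)
{
	int started = 0;

	while (pending > 0) {
		pending--;
		started++;	/* kernel: queue one flush work item */
		if (nr != -1 && started >= nr)
			break;
	}
	return started;
}
```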

Signed-off-by: Miao Xie 
---
Changelog v1 -> v2:
- New patch.
---
 fs/btrfs/ctree.h   |  3 ++-
 fs/btrfs/dev-replace.c |  2 +-
 fs/btrfs/extent-tree.c |  8 
 fs/btrfs/inode.c   | 34 +++---
 fs/btrfs/ioctl.c   |  2 +-
 fs/btrfs/relocation.c  |  2 +-
 fs/btrfs/transaction.c |  2 +-
 7 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 277f627..2f3949b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3727,7 +3727,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle 
*trans,
   u32 min_type);
 
 int btrfs_start_delalloc_inodes(struct btrfs_root *root, int delay_iput);
-int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int delay_iput);
+int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int delay_iput,
+  int nr);
 int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
  struct extent_state **cached_state);
 int btrfs_create_subvol_root(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index ec1c3f3..9f22905 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -491,7 +491,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info 
*fs_info,
 * flush all outstanding I/O and inode extent mappings before the
 * copy operation is declared as being finished
 */
-   ret = btrfs_start_delalloc_roots(root->fs_info, 0);
+   ret = btrfs_start_delalloc_roots(root->fs_info, 0, -1);
if (ret) {
mutex_unlock(&dev_replace->lock_finishing_cancel_unmount);
return ret;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 07e3d14..da43003 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3971,7 +3971,7 @@ static int can_overcommit(struct btrfs_root *root,
 }
 
 static void btrfs_writeback_inodes_sb_nr(struct btrfs_root *root,
-unsigned long nr_pages)
+unsigned long nr_pages, int nr_items)
 {
struct super_block *sb = root->fs_info->sb;
 
@@ -3986,9 +3986,9 @@ static void btrfs_writeback_inodes_sb_nr(struct 
btrfs_root *root,
 * the filesystem is readonly(all dirty pages are written to
 * the disk).
 */
-   btrfs_start_delalloc_roots(root->fs_info, 0);
+   btrfs_start_delalloc_roots(root->fs_info, 0, nr_items);
if (!current->journal_info)
-   btrfs_wait_ordered_roots(root->fs_info, -1);
+   btrfs_wait_ordered_roots(root->fs_info, nr_items);
}
 }
 
@@ -4045,7 +4045,7 @@ static void shrink_delalloc(struct btrfs_root *root, u64 
to_reclaim, u64 orig,
while (delalloc_bytes && loops < 3) {
max_reclaim = min(delalloc_bytes, to_reclaim);
nr_pages = max_reclaim >> PAGE_CACHE_SHIFT;
-   btrfs_writeback_inodes_sb_nr(root, nr_pages);
+   btrfs_writeback_inodes_sb_nr(root, nr_pages, items);
/*
 * We need to wait for the async pages to actually start before
 * we do anything.
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8c9522b..4f64216 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8416,7 +8416,8 @@ void btrfs_wait_and_free_delalloc_work(struct 
btrfs_delalloc_work *work)
  * some fairly slow code that needs optimization. This walks the list
  * of all the inodes with pending delalloc and forces them to disk.
  */
-static int __start_delalloc_inodes(struct btrfs_root *root, int delay_iput)
+static int __start_delalloc_inodes(struct btrfs_root *root, int delay_iput,
+  int nr)
 {
struct btrfs_inode *binode;
struct inode *inode;
@@ -8450,12 +8451,14 @@ static int __start_delalloc_inodes(struct btrfs_root 
*root, int delay_iput)
else
iput(inode);
ret = -ENOMEM;
-   goto out;
+   break;
}
list_add_tail(&work->list, &works);
btrfs_queue_worker(&root->fs_info->flush_workers,
   &work->work);
-
+   ret++;
+   if (nr != -1 && ret >= nr)
+   break;
cond_resched();
spin_lock(&root->delalloc_lock);
}
@@ -8465,12 +8468,6 @@ static int __start_delalloc_inodes(struct btrfs_root 
*root, int delay_iput)
list_del_init(&work->list);
btrfs_wait_and_free_delalloc_work(work);
}
-   return 0;
-out:
-   list_for_each_ent

[PATCH V2 09/10] Btrfs: fix possible empty list access when flushing the delalloc inodes

2014-03-05 Thread Miao Xie
We didn't have a lock protecting access to the delalloc inodes list; that is,
we might access an empty delalloc inodes list if someone started flushing
delalloc inodes, because the delalloc inodes were moved onto another list
temporarily. Fix it by wrapping the access with a lock.

Signed-off-by: Miao Xie 
---
Changelog v1 -> v2:
- New patch.
---
 fs/btrfs/ctree.h   | 2 ++
 fs/btrfs/disk-io.c | 2 ++
 fs/btrfs/inode.c   | 4 
 3 files changed, 8 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 7bae97e..ec47aa9 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1490,6 +1490,7 @@ struct btrfs_fs_info {
 */
struct list_head ordered_roots;
 
+   struct mutex delalloc_root_mutex;
spinlock_t delalloc_root_lock;
/* all fs/file tree roots that have delalloc inodes. */
struct list_head delalloc_roots;
@@ -1797,6 +1798,7 @@ struct btrfs_root {
spinlock_t root_item_lock;
atomic_t refs;
 
+   struct mutex delalloc_mutex;
spinlock_t delalloc_lock;
/*
 * all of the inodes that have delalloc bytes.  It is possible for
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 65fe26e..2bb0bbd 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1198,6 +1198,7 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 
sectorsize,
mutex_init(&root->objectid_mutex);
mutex_init(&root->log_mutex);
mutex_init(&root->ordered_extent_mutex);
+   mutex_init(&root->delalloc_mutex);
init_waitqueue_head(&root->log_writer_wait);
init_waitqueue_head(&root->log_commit_wait[0]);
init_waitqueue_head(&root->log_commit_wait[1]);
@@ -2169,6 +2170,7 @@ int open_ctree(struct super_block *sb,
spin_lock_init(&fs_info->buffer_lock);
rwlock_init(&fs_info->tree_mod_log_lock);
mutex_init(&fs_info->reloc_mutex);
+   mutex_init(&fs_info->delalloc_root_mutex);
seqlock_init(&fs_info->profiles_lock);
 
init_completion(&fs_info->kobj_unregister);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4f64216..34c484c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8429,6 +8429,7 @@ static int __start_delalloc_inodes(struct btrfs_root 
*root, int delay_iput,
INIT_LIST_HEAD(&works);
INIT_LIST_HEAD(&splice);
 
+   mutex_lock(&root->delalloc_mutex);
spin_lock(&root->delalloc_lock);
list_splice_init(&root->delalloc_inodes, &splice);
while (!list_empty(&splice)) {
@@ -8474,6 +8475,7 @@ static int __start_delalloc_inodes(struct btrfs_root 
*root, int delay_iput,
list_splice_tail(&splice, &root->delalloc_inodes);
spin_unlock(&root->delalloc_lock);
}
+   mutex_unlock(&root->delalloc_mutex);
return ret;
 }
 
@@ -8515,6 +8517,7 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info 
*fs_info, int delay_iput,
 
INIT_LIST_HEAD(&splice);
 
+   mutex_lock(&fs_info->delalloc_root_mutex);
spin_lock(&fs_info->delalloc_root_lock);
list_splice_init(&fs_info->delalloc_roots, &splice);
while (!list_empty(&splice) && nr) {
@@ -8554,6 +8557,7 @@ out:
list_splice_tail(&splice, &fs_info->delalloc_roots);
spin_unlock(&fs_info->delalloc_root_lock);
}
+   mutex_unlock(&fs_info->delalloc_root_mutex);
return ret;
 }
 
-- 
1.8.1.4



[PATCH V2 06/10] Btrfs: reclaim delalloc metadata more aggressively

2014-03-05 Thread Miao Xie
generic/074 in xfstests sometimes failed with an ENOSPC error. The reason
is that we reclaimed just the space we needed from the space reserved for
delalloc and then tried to reserve it, but some task could make a no-flush
reservation between that reclamation and our reservation:
Task1                           Task2
shrink_delalloc()
  reclaim 1 block
  (the space that can be
   reserved now is 1 block)
                                do no-flush reservation
                                  reserve 1 block
                                  (the space that can be
                                   reserved now is 0 blocks)
reserving 1 block failed
Task1's reservation failed, even though there would have been enough space
to reserve if we had reclaimed more beforehand.

Fix this problem by reclaiming the reserved delalloc metadata space more
aggressively.
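The fix is simply the `num_bytes * 2` in the diff below: reclaim twice the needed space so a concurrent no-flush reservation eating part of it still leaves enough. A toy model of the race, with all names illustrative:

```c
#include <assert.h>

/* Shared pool of reclaimed-and-reservable space. */
long reclaimed_space;

/* Old behaviour reclaims exactly 'need'; the patched behaviour
 * overshoots by a factor of two. */
static void shrink(long need, int aggressive)
{
	reclaimed_space += aggressive ? need * 2 : need;
}

/* Returns 1 on success, 0 on the ENOSPC-style failure. */
static int reserve(long need)
{
	if (reclaimed_space < need)
		return 0;
	reclaimed_space -= need;
	return 1;
}
```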

Signed-off-by: Miao Xie 
---
Changelog v1 -> v2:
- New patch.
---
 fs/btrfs/extent-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ddb5c84..07e3d14 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4174,7 +4174,7 @@ static int flush_space(struct btrfs_root *root,
break;
case FLUSH_DELALLOC:
case FLUSH_DELALLOC_WAIT:
-   shrink_delalloc(root, num_bytes, orig_bytes,
+   shrink_delalloc(root, num_bytes * 2, orig_bytes,
state == FLUSH_DELALLOC_WAIT);
break;
case ALLOC_CHUNK:
-- 
1.8.1.4



[PATCH V2 03/10] Btrfs: just do dirty page flush for the inode with compression before direct IO

2014-03-05 Thread Miao Xie
As the comment in btrfs_direct_IO says, only compressed pages need to be
flushed again to make sure they are on disk; common pages do not. So we add
an if statement that checks whether the inode has compressed pages, and if
not, we skip the flush.

And in order to prevent the write ranges from intersecting, we need to wait
for the running ordered extents. But the current code waits for them twice:
once before the direct IO starts (in btrfs_wait_ordered_range()), and once
more before we get the blocks. The first wait is unnecessary: because we can
do the direct IO without holding i_mutex, intersecting ordered extents may
appear during the direct IO anyway, so the first wait cannot avoid the
problem. So we use filemap_fdatawrite_range() instead of
btrfs_wait_ordered_range() to remove the first wait.
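The new condition boils down to a single flag test. A sketch (the flag's bit value here is arbitrary, chosen only for the example; in the kernel it is BTRFS_INODE_HAS_ASYNC_EXTENT in the inode's runtime_flags):

```c
#include <assert.h>

/* Only inodes that ever had compressed (async) extents need the
 * extra dirty-page flush before direct IO. */
#define HAS_ASYNC_EXTENT (1UL << 0)

static int needs_extra_flush(unsigned long runtime_flags)
{
	return (runtime_flags & HAS_ASYNC_EXTENT) != 0;
}
```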

Signed-off-by: Miao Xie 
---
Changelog v1 -> v2:
- None.
---
 fs/btrfs/inode.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1af34d0..8c9522b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7401,15 +7401,15 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb 
*iocb,
smp_mb__after_atomic_inc();
 
/*
-* The generic stuff only does filemap_write_and_wait_range, which isn't
-* enough if we've written compressed pages to this area, so we need to
-* call btrfs_wait_ordered_range to make absolutely sure that any
-* outstanding dirty pages are on disk.
+* The generic stuff only does filemap_write_and_wait_range, which
+* isn't enough if we've written compressed pages to this area, so
+* we need to flush the dirty pages again to make absolutely sure
+* that any outstanding dirty pages are on disk.
 */
count = iov_length(iov, nr_segs);
-   ret = btrfs_wait_ordered_range(inode, offset, count);
-   if (ret)
-   return ret;
+   if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
+&BTRFS_I(inode)->runtime_flags))
+   filemap_fdatawrite_range(inode->i_mapping, offset, count);
 
if (rw & WRITE) {
/*
-- 
1.8.1.4



[PATCH V2 05/10] Btrfs: remove unnecessary lock in may_commit_transaction()

2014-03-05 Thread Miao Xie
The reasons are:
- The per-cpu counter has its own lock to protect itself.
- We needn't get an exact value here.
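Why an exact value was never on offer anyway: a percpu counter folds per-cpu deltas into the shared total only in batches, so any read is approximate by design. A toy model (illustrative; the real type lives in lib/percpu_counter.c):

```c
#include <assert.h>

#define NR_CPUS    4
#define PCPU_BATCH 32

/* Each CPU accumulates locally and only folds into the shared total
 * when its delta reaches the batch size, so reads of 'count' are
 * cheap but may lag the true sum by up to NR_CPUS * PCPU_BATCH.
 * That is why wrapping percpu_counter_compare() in an extra
 * space_info->lock bought nothing. */
struct toy_percpu_counter {
	long count;            /* shared; has its own lock in-kernel */
	long local[NR_CPUS];   /* per-cpu deltas */
};

static void toy_add(struct toy_percpu_counter *c, int cpu, long amount)
{
	c->local[cpu] += amount;
	if (c->local[cpu] >= PCPU_BATCH || c->local[cpu] <= -PCPU_BATCH) {
		c->count += c->local[cpu];  /* fold under the counter's lock */
		c->local[cpu] = 0;
	}
}

/* Approximate read of the total. */
static long toy_read(const struct toy_percpu_counter *c)
{
	return c->count;
}
```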

Signed-off-by: Miao Xie 
---
Changelog v1 -> v2:
- None.
---
 fs/btrfs/extent-tree.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 32312e0..ddb5c84 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4112,13 +4112,9 @@ static int may_commit_transaction(struct btrfs_root 
*root,
goto commit;
 
/* See if there is enough pinned space to make this reservation */
-   spin_lock(&space_info->lock);
if (percpu_counter_compare(&space_info->total_bytes_pinned,
-  bytes) >= 0) {
-   spin_unlock(&space_info->lock);
+  bytes) >= 0)
goto commit;
-   }
-   spin_unlock(&space_info->lock);
 
/*
 * See if there is some space in the delayed insertion reservation for
@@ -4127,16 +4123,13 @@ static int may_commit_transaction(struct btrfs_root 
*root,
if (space_info != delayed_rsv->space_info)
return -ENOSPC;
 
-   spin_lock(&space_info->lock);
spin_lock(&delayed_rsv->lock);
if (percpu_counter_compare(&space_info->total_bytes_pinned,
   bytes - delayed_rsv->size) >= 0) {
spin_unlock(&delayed_rsv->lock);
-   spin_unlock(&space_info->lock);
return -ENOSPC;
}
spin_unlock(&delayed_rsv->lock);
-   spin_unlock(&space_info->lock);
 
 commit:
trans = btrfs_join_transaction(root);
-- 
1.8.1.4



[PATCH V2 01/10] Btrfs: fix early enospc due to the race of the two ordered extent wait

2014-03-05 Thread Miao Xie
btrfs_wait_ordered_roots() moves all the list entries to a new list and
then deals with them one by one. But if another task invokes this function
at that time, it gets an empty list, which makes the ENOSPC error happen
earlier than it should. Fix it.
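The race exists because list_splice_init() empties the source list up front, so a second waiter peeking before the drain finishes sees "nothing to wait for" and bails out early; hence the mutex around the whole drain in the patch. A minimal model of the splice step (illustrative, not the kernel's list.h):

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *next; };
struct list { struct node *head; };

/* Move every entry from src to dst and leave src empty, as
 * list_splice_init() does. */
static void splice_init(struct list *src, struct list *dst)
{
	dst->head = src->head;
	src->head = NULL;	/* src now looks empty to other observers */
}

static int is_empty(const struct list *l)
{
	return l->head == NULL;
}
```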

Signed-off-by: Miao Xie 
---
Changelog v1 -> v2:
- New patch.
---
 fs/btrfs/ordered-data.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 138a7d7..a6ba75e 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -589,7 +589,7 @@ static void btrfs_run_ordered_extent_work(struct btrfs_work 
*work)
  * wait for all the ordered extents in a root.  This is done when balancing
  * space between drives.
  */
-int btrfs_wait_ordered_extents(struct btrfs_root *root, int nr)
+static int __btrfs_wait_ordered_extents(struct btrfs_root *root, int nr)
 {
struct list_head splice, works;
struct btrfs_ordered_extent *ordered, *next;
@@ -598,7 +598,6 @@ int btrfs_wait_ordered_extents(struct btrfs_root *root, int 
nr)
INIT_LIST_HEAD(&splice);
INIT_LIST_HEAD(&works);
 
-   mutex_lock(&root->fs_info->ordered_operations_mutex);
spin_lock(&root->ordered_extent_lock);
list_splice_init(&root->ordered_extents, &splice);
while (!list_empty(&splice) && nr) {
@@ -629,6 +628,16 @@ int btrfs_wait_ordered_extents(struct btrfs_root *root, 
int nr)
btrfs_put_ordered_extent(ordered);
cond_resched();
}
+
+   return count;
+}
+
+int btrfs_wait_ordered_extents(struct btrfs_root *root, int nr)
+{
+   int count;
+
+   mutex_lock(&root->fs_info->ordered_operations_mutex);
+   count = __btrfs_wait_ordered_extents(root, nr);
mutex_unlock(&root->fs_info->ordered_operations_mutex);
 
return count;
@@ -642,6 +651,7 @@ void btrfs_wait_ordered_roots(struct btrfs_fs_info 
*fs_info, int nr)
 
INIT_LIST_HEAD(&splice);
 
+   mutex_lock(&fs_info->ordered_operations_mutex);
spin_lock(&fs_info->ordered_root_lock);
list_splice_init(&fs_info->ordered_roots, &splice);
while (!list_empty(&splice) && nr) {
@@ -653,7 +663,7 @@ void btrfs_wait_ordered_roots(struct btrfs_fs_info 
*fs_info, int nr)
   &fs_info->ordered_roots);
spin_unlock(&fs_info->ordered_root_lock);
 
-   done = btrfs_wait_ordered_extents(root, nr);
+   done = __btrfs_wait_ordered_extents(root, nr);
btrfs_put_fs_root(root);
 
spin_lock(&fs_info->ordered_root_lock);
@@ -664,6 +674,7 @@ void btrfs_wait_ordered_roots(struct btrfs_fs_info 
*fs_info, int nr)
}
list_splice_tail(&splice, &fs_info->ordered_roots);
spin_unlock(&fs_info->ordered_root_lock);
+   mutex_unlock(&fs_info->ordered_operations_mutex);
 }
 
 /*
-- 
1.8.1.4



[PATCH 1/5] Btrfs-progs: fsck: deal with really corrupted extent tree

2014-03-05 Thread Wang Shilong
To reinit the extent root, we need to find a free extent. However,
we may have a really corrupted extent tree, so we can't rely
on the existing extent tree to cache block groups any more.

During testing, we failed to reinit the extent tree because we
could not find a free extent, so let's build the block group cache
ourselves first.

Signed-off-by: Wang Shilong 
---
 cmds-check.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 98199ce..3cf59b6 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -6030,11 +6030,13 @@ static int pin_metadata_blocks(struct btrfs_fs_info 
*fs_info)
 
 static int reset_block_groups(struct btrfs_fs_info *fs_info)
 {
+   struct btrfs_block_group_cache *cache;
struct btrfs_path *path;
struct extent_buffer *leaf;
struct btrfs_chunk *chunk;
struct btrfs_key key;
int ret;
+   u64 start;
 
path = btrfs_alloc_path();
if (!path)
@@ -6085,8 +6087,19 @@ static int reset_block_groups(struct btrfs_fs_info 
*fs_info)
  btrfs_chunk_type(leaf, chunk),
  key.objectid, key.offset,
  btrfs_chunk_length(leaf, chunk));
+   set_extent_dirty(&fs_info->free_space_cache, key.offset,
+key.offset + btrfs_chunk_length(leaf, chunk),
+GFP_NOFS);
path->slots[0]++;
}
+   start = 0;
+   while (1) {
+   cache = btrfs_lookup_first_block_group(fs_info, start);
+   if (!cache)
+   break;
+   cache->cached = 1;
+   start = cache->key.objectid + cache->key.offset;
+   }
 
btrfs_free_path(path);
return 0;
-- 
1.9.0



[PATCH 2/5] Btrfs-progs: fsck: reset balance after reiniting extent root

2014-03-05 Thread Wang Shilong
Resetting balance needs to COW blocks, which inserts extent items into
the extent tree. If we do this before reiniting the extent root, we may
encounter EEXIST.

Signed-off-by: Wang Shilong 
---
 cmds-check.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index 3cf59b6..8a3f2cd 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -6253,12 +6253,6 @@ static int reinit_extent_tree(struct btrfs_trans_handle 
*trans,
return ret;
}
 
-   ret = reset_balance(trans, fs_info);
-   if (ret) {
-   fprintf(stderr, "error reseting the pending balance\n");
-   return ret;
-   }
-
/* Ok we can allocate now, reinit the extent root */
ret = btrfs_fsck_reinit_root(trans, fs_info->extent_root, 0);
if (ret) {
@@ -6293,7 +6287,11 @@ static int reinit_extent_tree(struct btrfs_trans_handle *trans,
btrfs_extent_post_op(trans, fs_info->extent_root);
}
 
-   return 0;
+   ret = reset_balance(trans, fs_info);
+   if (ret)
+   fprintf(stderr, "error reseting the pending balance\n");
+
+   return ret;
 }
 
 static int recow_extent_buffer(struct btrfs_root *root, struct extent_buffer *eb)
-- 
1.9.0



[PATCH 3/5] Btrfs-progs: fsck: insert root dir into reloc data tree when reiniting it

2014-03-05 Thread Wang Shilong
There are two bugs when resetting balance:

 1. We skip reinitializing the reloc data tree if no reloc root is
  found. This is wrong because we did not pin the reloc data tree
  beforehand.

 2. We should insert a root dir into the reloc data tree, otherwise
  fsck will fail.

Fix both problems by forcibly reinitializing the reloc data root and
inserting the root dir.

Signed-off-by: Wang Shilong 
---
 cmds-check.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index 8a3f2cd..4b2a8f0 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -6128,7 +6128,10 @@ static int reset_balance(struct btrfs_trans_handle *trans,
if (ret) {
if (ret > 0)
ret = 0;
-   goto out;
+   if (!ret)
+   goto reinit_data_reloc;
+   else
+   goto out;
}
 
ret = btrfs_del_item(trans, root, path);
@@ -6190,6 +6193,7 @@ static int reset_balance(struct btrfs_trans_handle *trans,
}
btrfs_release_path(path);
 
+reinit_data_reloc:
key.objectid = BTRFS_DATA_RELOC_TREE_OBJECTID;
key.type = BTRFS_ROOT_ITEM_KEY;
key.offset = (u64)-1;
@@ -6205,6 +6209,9 @@ static int reset_balance(struct btrfs_trans_handle *trans,
extent_buffer_get(root->node);
}
ret = btrfs_fsck_reinit_root(trans, root, 0);
+   if (ret)
+   goto out;
+   ret = btrfs_make_root_dir(trans, root, BTRFS_FIRST_FREE_OBJECTID);
 out:
btrfs_free_path(path);
return ret;
-- 
1.9.0



[PATCH 5/5] Btrfs-progs: fsck: handle case that we can not lookup extent info

2014-03-05 Thread Wang Shilong
Previously, --init-extent-tree worked only because
btrfs_lookup_extent_info() blindly returned 0, which happened to work
as long as the broken filesystem contained no *FULL BACKREF* references.

It was a coincidence that the --init-extent-tree option worked at all;
let's do it the right way first.

For now, rebuilding the extent tree is not supported when there are any
*FULL BACKREF* references, which means that if the broken filesystem
has snapshots, avoid using the --init-extent-tree option.

Signed-off-by: Wang Shilong 
---
 cmds-check.c  | 36 ++--
 extent-tree.c |  2 +-
 2 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index ae611d1..d1cafe1 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -52,6 +52,7 @@ static LIST_HEAD(duplicate_extents);
 static LIST_HEAD(delete_items);
 static int repair = 0;
 static int no_holes = 0;
+static int init_extent_tree = 0;
 
 struct extent_backref {
struct list_head list;
@@ -3915,11 +3916,19 @@ static int run_next_block(struct btrfs_trans_handle *trans,
 
nritems = btrfs_header_nritems(buf);
 
-   ret = btrfs_lookup_extent_info(NULL, root, bytenr,
+   /*
+* FIXME, this only works only if we don't have any full
+* backref mode.
+*/
+   if (!init_extent_tree) {
+   ret = btrfs_lookup_extent_info(NULL, root, bytenr,
   btrfs_header_level(buf), 1, NULL,
   &flags);
-   if (ret < 0)
-   flags = BTRFS_BLOCK_FLAG_FULL_BACKREF;
+   if (ret < 0)
+   flags = 0;
+   } else {
+   flags = 0;
+   }
 
if (flags & BTRFS_BLOCK_FLAG_FULL_BACKREF) {
parent = bytenr;
@@ -5102,12 +5111,20 @@ static int fixup_extent_refs(struct btrfs_trans_handle *trans,
int allocated = 0;
u64 flags = 0;
 
-   /* remember our flags for recreating the extent */
-   ret = btrfs_lookup_extent_info(NULL, info->extent_root, rec->start,
-  rec->max_size, rec->metadata, NULL,
-  &flags);
-   if (ret < 0)
-   flags = BTRFS_BLOCK_FLAG_FULL_BACKREF;
+   /*
+* remember our flags for recreating the extent.
+* FIXME, if we have cleared extent tree, we can not
+* lookup extent info in extent tree.
+*/
+   if (!init_extent_tree) {
+   ret = btrfs_lookup_extent_info(NULL, info->extent_root,
+   rec->start, rec->max_size,
+   rec->metadata, NULL, &flags);
+   if (ret < 0)
+   flags = 0;
+   } else {
+   flags = 0;
+   }
 
path = btrfs_alloc_path();
if (!path)
@@ -6438,7 +6455,6 @@ int cmd_check(int argc, char **argv)
u64 num;
int option_index = 0;
int init_csum_tree = 0;
-   int init_extent_tree = 0;
enum btrfs_open_ctree_flags ctree_flags =
OPEN_CTREE_PARTIAL | OPEN_CTREE_EXCLUSIVE;
 
diff --git a/extent-tree.c b/extent-tree.c
index 7860d1d..7979457 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1560,7 +1560,7 @@ again:
*flags = extent_flags;
 out:
btrfs_free_path(path);
-   return 0;
+   return ret;
 }
 
 int btrfs_set_block_flags(struct btrfs_trans_handle *trans,
-- 
1.9.0



[PATCH 4/5] Btrfs-progs: fsck: force to update tree root for some cases

2014-03-05 Thread Wang Shilong
Committing roots won't update the root item in the tree root if the
updated root's bytenr is the same as before.

However, this is not right for fsck; we need to update the tree root in
the following cases:

1. We overwrite the previous root node.

2. We reinitialize the reloc data tree. Because we skipped pinning the
 reloc data tree earlier, we can allocate the same block as before.

Fix this by updating the tree root ourselves in the above cases.

Signed-off-by: Wang Shilong 
---
 cmds-check.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index 4b2a8f0..ae611d1 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -5906,6 +5906,7 @@ static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
struct extent_buffer *c;
struct extent_buffer *old = root->node;
int level;
+   int ret;
struct btrfs_disk_key disk_key = {0,0,0};
 
level = 0;
@@ -5922,6 +5923,7 @@ static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
if (IS_ERR(c)) {
c = old;
extent_buffer_get(c);
+   overwrite = 1;
}
 init:
memset_extent_buffer(c, 0, 0, sizeof(struct btrfs_header));
@@ -5939,7 +5941,26 @@ init:
BTRFS_UUID_SIZE);
 
btrfs_mark_buffer_dirty(c);
-
+   /*
+* this case can happen in the following case:
+*
+* 1.overwrite previous root.
+*
+* 2.reinit reloc data root, this is because we skip pin
+* down reloc data tree before which means we can allocate
+* same block bytenr here.
+*/
+   if (old->start == c->start) {
+   btrfs_set_root_generation(&root->root_item,
+ trans->transid);
+   root->root_item.level = btrfs_header_level(root->node);
+   ret = btrfs_update_root(trans, root->fs_info->tree_root,
+   &root->root_key, &root->root_item);
+   if (ret) {
+   free_extent_buffer(c);
+   return ret;
+   }
+   }
free_extent_buffer(old);
root->node = c;
add_root_to_dirty_list(root);
-- 
1.9.0



Re: ENOSPC errors during raid1 rebalance

2014-03-05 Thread Duncan
Michael Russo posted on Wed, 05 Mar 2014 22:13:10 + as excerpted:

> Chris Murphy  colorremedies.com> writes:
> 
>> You could also try a full defragment by specifying -r on the mount
>> point with a small -t value to effectively cause everything to be
>> subject to defragmenting. If this still doesn't permit soft rebalance,
>> then maybe filefrag can find files that have more than 1 extent and
>> just copy them (make duplicates, delete the original). Any copy will be
>> allocated into chunks with the new profile.
> 
> I would think so too.  But it doesn't seem to be happening.
> Here is an example with one file:
> 
> root@ossy:/mymedia# filefrag output.wav output.wav: 2 extents found
> root@ossy:/mymedia# /usr/src/btrfs-progs/btrfs fi de -t 1
> /mymedia/output.wav root@ossy:/mymedia# filefrag output.wav output.wav:
> 2 extents found
> 
> btrfs does not defrag the file. And copying the file usually doesn't
> defrag it either:
> 
> root@ossy:/mymedia# cp output.wav output.wav.bak root@ossy:/mymedia#
> filefrag output.wav.bak output.wav.bak: 2 extents found
> 
> I even tried copying a large file to another filesystem (/dev/shm),
>  removing the original, and copying it back, and more often than not
> it still had more than 1 extent.

This was covered in one thread recently, but looking back in this one I 
didn't find it covered here, so...

What are your mount options?  Do they include compress and/or 
compress-force?  Because filefrag doesn't understand btrfs compression 
yet, it counts each 128-KiB compression block as a separate extent -- 
that's 128 KiB of pre-compression (uncompressed) data, the unit btrfs 
compresses at a time.  So if the file is large enough and btrfs is 
compressing it, filefrag will false-positively report multiple extents.  
That's a known issue with it ATM.

Meanwhile, there's ongoing work to teach filefrag about btrfs compression 
so it can report fragmentation accurately, but from what I've read 
they're working on a general kernel-VFS-level API for that so the same 
general API can be used by other filesystems, and getting proper 
agreement on that API, and having both the kernel and filefrag implement 
it isn't a simple single-kernel-cycle project.  There's a lot of 
filesystems other than btrfs that could potentially use this sort of 
thing, and getting a solution that will work well for all of them is hard 
work, both technically and politically.  But once it's implemented, /
correctly/, the entire Linux kernel filesystem space will benefit, just 
as btrfs is getting the benefit of the filefrag tool that ships with 
e2fsprogs, and the filesystem testing that ships as xfstests. =:^)

But if you're not using compression, /that/ can't explain it...
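
A quick way to check, assuming the /mymedia mount point and output.wav
file from earlier in the thread (adjust both to your own setup):
filefrag -v prints per-extent detail, and compressed btrfs extents carry
the "encoded" flag, which distinguishes compression blocks from real
on-disk fragmentation.

```shell
# 1) Check whether the filesystem is mounted with compress or
#    compress-force (look for it in the mount options field):
grep ' /mymedia ' /proc/mounts

# 2) Ask filefrag for per-extent detail.  Extents flagged "encoded"
#    are compressed; a high extent count made up of encoded 128-KiB
#    blocks is the filefrag miscount described above, not genuine
#    fragmentation:
filefrag -v /mymedia/output.wav
```

If the listed extents show "encoded", the multiple-extent report is the
compression artifact rather than a file that actually needs defrag.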

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
