Re: btrfs balancing start - and stop?

2011-04-11 Thread Stephane Chazelas
2011-04-06 12:43:50 +0100, Stephane Chazelas:
[...]
 The rate is going down. It's now down to about 14kB/s
 
 [658654.295752] btrfs: relocating block group 3919858106368 flags 20
 [671932.913235] btrfs: relocating block group 3919589670912 flags 20
 [686189.296126] btrfs: relocating block group 3919321235456 flags 20
 [701511.523990] btrfs: relocating block group 3919052800000 flags 20
 [718591.316339] btrfs: relocating block group 3918784364544 flags 20
 [725567.081031] btrfs: relocating block group 3918515929088 flags 20
 [744415.011581] btrfs: relocating block group 3918247493632 flags 20
 [762365.021458] btrfs: relocating block group 3917979058176 flags 20
 [780504.726067] btrfs: relocating block group 3917710622720 flags 20
[...]
 At this rate, the balancing would be over in about 8 years.
[...]

Hurray! The btrfs balance eventually ran through after almost exactly 2 weeks.
It didn't get down to 0:

[1189505.152717] btrfs: found 60527 extents
[1189505.440565] btrfs: relocating block group 3910731300864 flags 20
[1199805.071045] btrfs: found 60235 extents
[1199805.447821] btrfs: relocating block group 3910462865408 flags 20
[1207914.737372] btrfs: found 58039 extents

iostat reckons 9TB have been written to disk over the whole
process (and 4.5TB read from them (!?)).

There hasn't been any change in allocation though:

# df -h /backup
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda4       8.2T  3.5T  3.2T  53% /backup
# btrfs fi df /backup
Data, RAID0: total=3.42TB, used=3.41TB
System, RAID1: total=16.00MB, used=228.00KB
Metadata, RAID1: total=28.00GB, used=20.47GB
# btrfs fi show
Label: none  uuid: a0ae35c4-51f2-405f-a4bb-e4f134b1d193
    Total devices 3 FS bytes used 3.43TB
    devid    4 size 2.73TB used 1.17TB path /dev/sdc
    devid    3 size 2.73TB used 1.17TB path /dev/sdb
    devid    2 size 2.70TB used 1.14TB path /dev/sda4

Btrfs Btrfs v0.19

Still 1.5TB missing.
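
One sanity check that does roughly add up: the per-device "used" figures match the "btrfs fi df" totals once the RAID profiles are taken into account. A minimal sketch of the arithmetic, with figures copied from the output above (the RAID1 doubling is the assumption here):

```shell
# Compare "btrfs fi df" totals with the per-device "used" figures.
# Assumption: RAID0 data occupies its size once in raw device space,
# while RAID1 metadata/system occupy twice their logical size.
data_tb=3.42    # Data, RAID0: total
meta_gb=28.00   # Metadata, RAID1: total (logical)
sys_mb=16.00    # System, RAID1: total (logical)
raw_tb=$(awk -v d="$data_tb" -v m="$meta_gb" -v s="$sys_mb" \
    'BEGIN { printf "%.2f", d + 2 * m / 1024 + 2 * s / 1048576 }')
echo "raw allocation: ${raw_tb}TB"   # close to 1.17 + 1.17 + 1.14 = 3.48TB
```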

-- 
Stephane
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs balancing start - and stop?

2011-04-11 Thread Helmut Hullen
Hello, Stephane,

You wrote on 11.04.11:

 [780504.726067] btrfs: relocating block group 3917710622720 flags 20
 [...]
 At this rate, the balancing would be over in about 8 years.
 [...]

 Hurray! The btrfs balance eventually ran through after almost exactly
 2 weeks. It didn't get down to 0:

Congratulations!

 There hasn't been any change in allocation though:

 # df -h /backup
 FilesystemSize  Used Avail Use% Mounted on
 /dev/sda4 8.2T  3.5T  3.2T  53% /backup

 # btrfs fi df /backup
 Data, RAID0: total=3.42TB, used=3.41TB
 System, RAID1: total=16.00MB, used=228.00KB
 Metadata, RAID1: total=28.00GB, used=20.47GB

 Still 1.5TB missing.

Seems to be the same problem I've just moaned about.

Just expand "available" to "at least available". And ignore the value in
"Data, RAID0: total=3.42TB".

Kind regards!
Helmut


Re: btrfs balancing start - and stop?

2011-04-06 Thread Stephane Chazelas
2011-04-04 20:07:54 +0100, Stephane Chazelas:
[...]
   4.7 more days to go. And I reckon it will have written about 9
   TB to disk by that time (which is the total size of the volume,
   though only 3.8TB are occupied).
  
  Yes - that's the pessimistic estimation. As Hugo has explained it can  
  finish faster - just look to the data tomorrow again.
 [...]
 
 That may be an optimistic estimation actually, as there hasn't
 been much progress in the last 34 hours:
[...]

The rate is going down. It's now down to about 14kB/s:

[658654.295752] btrfs: relocating block group 3919858106368 flags 20
[671932.913235] btrfs: relocating block group 3919589670912 flags 20
[686189.296126] btrfs: relocating block group 3919321235456 flags 20
[701511.523990] btrfs: relocating block group 3919052800000 flags 20
[718591.316339] btrfs: relocating block group 3918784364544 flags 20
[725567.081031] btrfs: relocating block group 3918515929088 flags 20
[744415.011581] btrfs: relocating block group 3918247493632 flags 20
[762365.021458] btrfs: relocating block group 3917979058176 flags 20
[780504.726067] btrfs: relocating block group 3917710622720 flags 20

Even though it is reading from and writing to disk at a much higher
rate. Here are the stats, sampled every second:

--dsk/sda-- --dsk/sdb-- --dsk/sdc--
 read  writ: read  writ: read  writ
   0     0 : 540k    0 :  12k    0
   0     0 : 704k    0 :  20k    0
   0     0 :1068k    0 :  24k    0
   0     0 : 968k    0 :   0     0
   0     0 : 932k    0 :4096B    0
   0     0 : 832k  880k: 152k 1320k
  60k 4096B: 880k  140k:   0    28M
  68k    0 : 308k    0 :4096B 9240k
   0    48k:   0     0 :   0  7852k
   0     0 : 576k 6192k:4096B   26M
   0     0 : 100k   18M:   0     0
   0     0 :  28k   10M:   0     0
   0     0 :   0  7020k:   0     0
   0     0 :  52k   13M:   0     0
   0    12k: 528k   17M:   0    12k
   0     0 : 884k    0 :8192B    0
   0     0 :1068k    0 :  20k    0
   0     0 : 660k    0 :   0     0
   0    40k: 776k    0 :4096B    0
   0     0 : 576k    0 :   0     0
   0     0 : 596k    0 :8192B    0
1096k   28k: 664k    0 :4096B    0
   0     0 : 660k    0 :   0     0
   0     0 : 592k    0 :8192B    0

At this rate, the balancing would be over in about 8 years.
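
For what it's worth, that extrapolation can be reproduced from the first and last quoted "relocating" lines (dmesg timestamp in seconds, block group address in bytes). A minimal sketch, assuming the countdown would have to run all the way to address zero; as Hugo points out elsewhere in the thread, it usually terminates well before that, so this is a pessimistic bound:

```shell
# Rate over the quoted window, then a worst-case ETA down to address 0.
t0=658654; a0=3919858106368   # first quoted "relocating" line
t1=780504; a1=3917710622720   # last quoted "relocating" line
rate=$(( (a0 - a1) / (t1 - t0) ))   # bytes per second, ~17kB/s here
echo "rate: ${rate} B/s"
awk -v r="$rate" -v rem="$a1" \
    'BEGIN { printf "worst-case ETA: %.1f years\n", rem / r / 86400 / 365 }'
```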

Since the start of the balance:
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda              10.04         1.56         1.57    1228286    1237359
sdc             396.24         1.77         3.95    1397015    3115057
sdb             421.17         1.87         3.95    1473759    3115093

I think that's the end of my attempt to transfer that FS to
another machine (see other thread). I'll have to ditch that copy
and try again from scratch with another approach.

Before I do that, is there anything I can do to help investigate
the problem?

regards,
Stephane


Re: btrfs balancing start - and stop?

2011-04-05 Thread Struan Bartlett


On 01/04/11 12:59, Hugo Mills wrote:

On Fri, Apr 01, 2011 at 12:14:50PM +0100, Struan Bartlett wrote:
   

My company is testing btrfs (kernel 2.6.38) on a slave MySQL
database server with a 195GB filesystem (of which about 123GB is
used). So far, we're quite impressed with the performance. Our
database loads are high, and if filesystem performance weren't good,
MySQL replication wouldn't be able to keep up and the slave latency
would begin to climb. This, though, is generally not happening, which
is good.

However, we recently tried running 'btrfs fi balance' on the
filesystem, and found that it degraded performance significantly:
the MySQL replication latency did begin to climb. Several hours
later, the btrfs-cleaner thread was apparently still busy, our
replication latency had grown to a couple of hours, and there was no
sign of the balancing operation finishing, so we decided we needed
to terminate it, which we did by rebooting the server.

That, however, is suboptimal in a production environment, and so
I've some questions.

1) Is the balancing operation expected to take many hours (or days?)
on a filesystem such as this? Or are there known issues with the
algorithm that are yet to be addressed?
 

A balance rewrites all the data on the filesystem, so it can take a
very long time (I think the longest reported time I've seen from
anyone was 48 hours, on several terabytes of data). However, this will
be highly dependent on the amount of I/O bandwidth available to the
FS, and on the size of the data to be written.

   

2) Is it supposed to be desirable to run balancing operations
periodically anyway? Our server is running on hardware mirrored
disks, so our btrfs filesystem is simply created in spare space on
the LVM volume group, using a single LV block device. Does balancing
help improve performance/optimise free space in this setup anyway?
 

Not that I'm aware of, particularly in the light of the recent
patch that frees up unused block groups. Others here may have a more
informed take on this, though.

   

3) If there's an ioctl for launching a balancing operation, would it
be an idea to add one for pausing a balancing operation? If
balancing may take 'significant' lengths of time, and if it's
intended that balancing be done periodically, it might be helpful if
one could start balancing when loads are lower, and make sure one
can stop them when resources are needed (in our case, when slave
latency exceeds acceptable limits).
 

There are patches for a cancel operation on the mailing list.
Further, I've got (as yet) unreleased patches for various forms of
partial balance, at least one of which would allow a balance to be
restarted after it was cancelled. The only reason I've not released
them is because I want to do a final check of what I send to the list
to ensure that I'm not making an idiot of myself (and wasting people's
time) with malformed patches. I hope to have time for this on Sunday.

Hugo.

   
Hugo - thanks very much for your thorough reply. I look forward to being 
able to cancel a balancing operation, but in the meantime we simply 
won't bother setting any going, and see how things go. So far, our btrfs 
slave database has been running two weeks, with a rolling history of 
snapshots taken every ten minutes, without any other apparent issues.


Struan



Re: btrfs balancing start - and stop?

2011-04-04 Thread Stephane Chazelas
2011-04-03 21:35:00 +0200, Helmut Hullen:
 Hello, Stephane,
 
 You wrote on 03.04.11:
 
  balancing about 2 TByte needed about 20 hours.
 
 [...]
 
  Hugo has explained the limits of regarding
 
  dmesg | grep relocating
 
  or (more simple) the last lines of dmesg and looking for the
  relocating lines. But: what do these lines tell now? What is the
  (pessimistic) estimation when you extrapolate the data?
 
 [...]
 
  4.7 more days to go. And I reckon it will have written about 9
  TB to disk by that time (which is the total size of the volume,
  though only 3.8TB are occupied).
 
 Yes - that's the pessimistic estimation. As Hugo has explained it can  
 finish faster - just look to the data tomorrow again.
[...]

That may be an optimistic estimation actually, as there hasn't
been much progress in the last 34 hours:

# dmesg | awk -F '[][ ]+' '/reloc/ && n++%5==0 {x=(n-$7)/($2-t)/1048576; printf 
"%s\t%s\t%.2f\t%*s\n", $2/3600, $7, x, x/3, ""; t=$2; n=$7}' | tr ' ' '*' | tail 
-40
125.629 4170039951360   11.93   ***
125.641 4166818725888   70.99   ***********************
125.699 4157155049472   43.87   **************
125.753 4144270147584   63.34   *********************
125.773 4137827696640   84.98   ****************************
125.786 4134606471168   64.39   *********************
125.823 4124942794752   70.09   ***********************
125.87  4112057892864   71.66   ***********************
125.887 4105615441920   100.60  *********************************
125.898 4102394216448   81.26   ***************************
125.935 4092730540032   69.06   ***********************
126.33  4085751218176   4.69    *
131.904 4072597880832   0.63
132.082 4059712978944   19.20   ******
132.12  4053270528000   45.52   ***************
132.138 4050049302528   45.60   ***************
132.225 4040385626112   29.68   *********
132.267 4027500724224   81.17   ***************************
132.283 4021058273280   106.31  ***********************************
132.29  4017837047808   110.42  ************************************
132.316 4008173371392   100.54  *********************************
132.358 3995288469504   81.18   ***************************
132.475 3988846018560   14.62   ****
132.514 3985624793088   21.55   *******
132.611 3975961116672   26.40   ********
132.663 3963076214784   65.31   *********************
132.678 3956633763840   120.11  ****************************************
132.685 3956365328384   10.26   ***
137.701 3949922877440   0.34
137.709 3946701651968   106.54  ***********************************
137.744 3937037975552   72.10   ************************
137.889 3927105863680   18.18   ******
137.901 3926837428224   5.85    *
141.555 3926300557312   0.04
141.93  3925226815488   0.76
151.227 3924421509120   0.02
151.491 3924153073664   0.27
151.712 3923616202752   0.64
165.301 3922542460928   0.02
174.346 3921737154560   0.02

At this rate (third field expressed in MiB/s), it could take
months to complete.

iostat still reports writes at about 5MiB/s though. Note that
this system is not doing anything else at all.

There definitely seems to be scope for optimisation in the
balancing I'd say.

-- 
Stephane


Re: btrfs balancing start - and stop?

2011-04-03 Thread Helmut Hullen
Hello, Stephane,

You wrote on 03.04.11:

 balancing about 2 TByte needed about 20 hours.

[...]

 Hugo has explained the limits of regarding

 dmesg | grep relocating

 or (more simple) the last lines of dmesg and looking for the
 relocating lines. But: what do these lines tell now? What is the
 (pessimistic) estimation when you extrapolate the data?

[...]

 4.7 more days to go. And I reckon it will have written about 9
 TB to disk by that time (which is the total size of the volume,
 though only 3.8TB are occupied).

Yes - that's the pessimistic estimate. As Hugo has explained, it can
finish faster - just look at the data again tomorrow.

Kind regards!
Helmut


Re: btrfs balancing start - and stop?

2011-04-01 Thread Helmut Hullen
Hello, Struan,

You wrote on 01.04.11:

 1) Is the balancing operation expected to take many hours (or days?)
 on a filesystem such as this? Or are there known issues with the
 algorithm that are yet to be addressed?

Maybe. Balancing about 15 GByte took about 2 hours (or less);
balancing about 2 TByte took about 20 hours.

dmesg counts down the number of remaining jobs.

Viele Gruesse!
Helmut


Re: btrfs balancing start - and stop?

2011-04-01 Thread Konstantinos Skarlatos

On 1/4/2011 3:12 PM, Helmut Hullen wrote:

Hello, Struan,

You wrote on 01.04.11:


1) Is the balancing operation expected to take many hours (or days?)
on a filesystem such as this? Or are there known issues with the
algorithm that are yet to be addressed?

May be. Balancing about 15 GByte needed about 2 hours (or less),
balancing about 2 TByte needed about 20 hours.

dmesg counts down the number of remaining jobs.

Are you sure? Here is a snippet of dmesg from a balance I did yesterday
(2.6.38.1):


btrfs: relocating block group 15338569728 flags 9
btrfs: found 17296 extents
btrfs: found 17296 extents
btrfs: relocating block group 13191086080 flags 9
btrfs: found 21029 extents
btrfs: found 21029 extents
btrfs: relocating block group 11043602432 flags 9
btrfs: found 4728 extents
btrfs: found 4728 extents



Kind regards!
Helmut


Re: btrfs balancing start - and stop?

2011-04-01 Thread Helmut Hullen
Hello, Konstantinos,

You wrote on 01.04.11:

 dmesg counts down the number of remaining jobs.

 are you sure? here is a snippet of dmesg from a balance i did
 yesterday (2.6.38.1)

 btrfs: relocating block group 15338569728 flags 9
 btrfs: found 17296 extents
 btrfs: found 17296 extents
 btrfs: relocating block group 13191086080 flags 9
 btrfs: found 21029 extents
 btrfs: found 21029 extents
 btrfs: relocating block group 11043602432 flags 9
 btrfs: found 4728 extents
 btrfs: found 4728 extents

Yes - that's where I look to see how long the balancing job may still
run. You can see that the relocating line counts down: 15, 13, 11, ...

Kind regards!
Helmut


Re: btrfs balancing start - and stop?

2011-04-01 Thread Hugo Mills
On Fri, Apr 01, 2011 at 03:36:00PM +0200, Helmut Hullen wrote:
 Hello, Konstantinos,
 
 You wrote on 01.04.11:
 
  dmesg counts down the number of remaining jobs.
 
  are you sure? here is a snippet of dmesg from a balance i did
  yesterday (2.6.38.1)
 
  btrfs: relocating block group 15338569728 flags 9
  btrfs: found 17296 extents
  btrfs: found 17296 extents
  btrfs: relocating block group 13191086080 flags 9
  btrfs: found 21029 extents
  btrfs: found 21029 extents
  btrfs: relocating block group 11043602432 flags 9
  btrfs: found 4728 extents
  btrfs: found 4728 extents
 
 Yes - there I look how long the balancing job may still work. You see  
 that the relocating line counts down: 15 13 11 ...

   It's not a good measure of time remaining, as the numbers there are
pretty arbitrary: they're the start of the block group in btrfs's own
internal address space. The balance algorithm will indeed go through
the block groups in reverse order, but they're not guaranteed to
terminate at zero. In fact, if you've balanced before, they're pretty
much guaranteed to terminate a long way before zero.

   New block groups (for example, ones created during a balance) are
always created with virtual addresses larger than any previous block
group. So, if you started with block groups at addresses 1G, 2G, 3G,
4G and balanced, you'd end up with ones at 5G, 6G, 7G, 8G. The next
time you balanced, you'd see the numbers count down: 8G, 7G, 6G, 5G,
complete.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Happiness is mandatory.  Are you happy? --- 




Re: btrfs balancing start - and stop?

2011-04-01 Thread Konstantinos Skarlatos

On 1/4/2011 4:37 PM, Hugo Mills wrote:

On Fri, Apr 01, 2011 at 04:22:39PM +0300, Konstantinos Skarlatos wrote:

On 1/4/2011 3:12 PM, Helmut Hullen wrote:

You wrote on 01.04.11:

dmesg counts down the number of remaining jobs.

are you sure? here is a snippet of dmesg from a balance i did
yesterday (2.6.38.1)

btrfs: relocating block group 15338569728 flags 9
btrfs: found 17296 extents
btrfs: found 17296 extents
btrfs: relocating block group 13191086080 flags 9
btrfs: found 21029 extents
btrfs: found 21029 extents
btrfs: relocating block group 11043602432 flags 9
btrfs: found 4728 extents
btrfs: found 4728 extents

Count the number of block groups in the system (1GiB for data,
256MiB for metadata on a typical filesystem), and subtract the number
of relocating block group messages... Not ideal, but it's possible.
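
A minimal sketch of that calculation, assuming 1GiB data and 256MiB metadata block groups and using the "btrfs fi df" totals Stephane quoted earlier in the thread (actual block group sizes vary by filesystem):

```shell
# Expected number of block groups, from "btrfs fi df" totals
# (Data total=3.41TB, Metadata total=27.75GB). Block group sizes are
# assumptions: 1GiB per data group, 256MiB per metadata group.
data_gib=3492    # 3.41TB expressed in GiB
meta_mib=28416   # 27.75GB expressed in MiB
total_groups=$(( data_gib + meta_mib / 256 ))
echo "expected block groups: ${total_groups}"
# Progress so far would then be roughly:
#   dmesg | grep -c 'relocating block group'
```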

The balance-cancel patch set I mentioned earlier also includes a
patch for monitoring progress, which shows up in the dmesg output (as
well as user-space support for prettier output).

Great, I think it is very important to have a human-readable progress 
monitor for operations like that.

Hugo.





Re: btrfs balancing start - and stop?

2011-04-01 Thread Stephane Chazelas
On Fri, 2011-04-01 at 14:12 +0200, Helmut Hullen wrote:
 Hello, Struan,
 
 You wrote on 01.04.11:
 
  1) Is the balancing operation expected to take many hours (or days?)
  on a filesystem such as this? Or are there known issues with the
  algorithm that are yet to be addressed?
 
 May be. Balancing about 15 GByte needed about 2 hours (or less),  
 balancing about 2 TByte needed about 20 hours.
[...]

I've got a balance running since Monday on a 9TB volume (3.5TB of which
is used, 3.2TB allegedly free), showing no sign of finishing soon. Should
I be worried?

Using /proc/sys/vm/block_dump, I can see it's seeking all over the
place, which is probably why throughput is not high. I can also see it
writing several times to the same sectors.
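
For anyone wanting to reproduce that observation, a rough sketch of the block_dump technique (root required on a 2.6-era kernel; the interface logs every block I/O to the kernel log, so it is noisy and should be switched off again promptly):

```shell
# Toggle kernel block-I/O logging, capture a short sample, switch it off.
# Each I/O then appears in dmesg as e.g. "btrfs-...(pid): WRITE block N on sda".
ctl=/proc/sys/vm/block_dump
if [ -w "$ctl" ]; then
    echo 1 > "$ctl"
    sleep 10
    dmesg | grep -E 'READ block|WRITE block' | tail -n 20
    echo 0 > "$ctl"
else
    echo "skipped: need root (and a kernel that still has $ctl)"
fi
```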

# df -h /backup
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda4       8.2T  3.5T  3.2T  53% /backup
# btrfs fi sh
Label: none  uuid: ...
    Total devices 3 FS bytes used 3.43TB
    devid    4 size 2.73TB used 1.16TB path /dev/sdc
    devid    3 size 2.73TB used 1.16TB path /dev/sdb
    devid    2 size 2.70TB used 1.14TB path /dev/sda4

Btrfs Btrfs v0.19
# ps -eolstart,args | grep balance
Mon Mar 28 11:18:18 2011 sudo btrfs fi balance /backup
Mon Mar 28 11:18:18 2011 btrfs fi balance /backup
# date
Fri Apr  1 19:28:40 BST 2011
# btrfs fi df /backup
Data, RAID0: total=3.41TB, used=3.41TB
System, RAID1: total=16.00MB, used=232.00KB
Metadata, RAID1: total=27.75GB, used=20.47GB
# iostat -md
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda              14.49         2.37         2.39     903123     913112
sdc             501.23         2.68         5.06    1022456    1928462
sdb             477.28         2.58         5.06     982853    1928482

It's already written more than the used space.

Cheers,
Stephane



Re: btrfs balancing start - and stop?

2011-04-01 Thread Helmut Hullen
Hello, Stephane,

You wrote on 01.04.11:

 balancing about 2 TByte needed about 20 hours.
 [...]

 I've got a balance running since Monday on a 9TB volume (3.5 of which
 are used, 3.2 allegedly free), showing no sign of finishing soon.
 Should I be worried?

 Using /proc/sys/vm/block_dump, I can see it's seeking all over the
 place, which is probably why throughput is not high. I can also see
 it writing several times to the same sectors.


Hugo has explained the limits of watching

dmesg | grep relocating

or (simpler) looking at the last lines of dmesg for the
relocating lines. But: what do these lines say now? What is the
(pessimistic) estimate when you extrapolate the data?

(please excuse my gerlish)

Kind regards!
Helmut