re: btrfs: initial readahead code and prototypes

2012-05-17 Thread Dan Carpenter
Hi, I'm working on some new Smatch code and it complains about this
patch from last year. -Dan


This is a semi-automatic email about new static checker warnings.

The patch 7414a03fbf9e: "btrfs: initial readahead code and
prototypes" from May 23, 2011, leads to the following Smatch
complaint:

fs/btrfs/reada.c:147 __readahead_hook()
 error: we previously assumed 'eb' could be null (see line 122)

fs/btrfs/reada.c
   121
   122          if (eb)
                    ^^
Checked here.

   123                  level = btrfs_header_level(eb);
   124
   125          /* find extent */
   126          spin_lock(&fs_info->reada_lock);
   127          re = radix_tree_lookup(&fs_info->reada_tree, index);
   128          if (re)
   129                  kref_get(&re->refcnt);
   130          spin_unlock(&fs_info->reada_lock);
   131
   132          if (!re)
   133                  return -1;
   134
   135          spin_lock(&re->lock);
   136          /*
   137           * just take the full list from the extent. afterwards we
   138           * don't need the lock anymore
   139           */
   140          list_replace_init(&re->extctl, &list);
   141          for_dev = re->scheduled_for;
   142          re->scheduled_for = NULL;
   143          spin_unlock(&re->lock);
   144
   145          if (err == 0) {
   146                  nritems = level ? btrfs_header_nritems(eb) : 0;
                                  ^^^^^
Checked here again indirectly.

   147                  generation = btrfs_header_generation(eb);
                                                             ^^
Dereferenced inside function without checking.

   148                  /*
   149                   * FIXME: currently we just set nritems to 0 if this is a leaf,
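
The shape of the complaint can be reduced to a small standalone C sketch.
This is only an illustration: struct buf, hook_flawed() and hook_guarded()
are hypothetical names, not kernel code, and the sketch does not settle
whether err == 0 can really coincide with a NULL eb in reada.c. The guard
shown is one possible fix, not necessarily the one the maintainers chose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an extent buffer; not the kernel structure. */
struct buf {
	int level;
	long generation;
};

/* Flagged shape: 'b' is tested before one use but not before the other. */
static long hook_flawed(const struct buf *b, int err)
{
	int level = 0;

	if (b)				/* checked here */
		level = b->level;
	if (err == 0)
		return b->generation + level;	/* not checked here */
	return -1;
}

/* One possible guard: treat a missing buffer like a failed read. */
static long hook_guarded(const struct buf *b, int err)
{
	if (!b || err != 0)
		return -1;
	return b->generation + b->level;
}

/* Exercises both variants; the flawed one is only called with a valid buffer. */
void null_check_demo(void)
{
	struct buf b = { .level = 2, .generation = 40 };

	assert(hook_flawed(&b, 0) == 42);
	assert(hook_guarded(&b, 0) == 42);
	assert(hook_guarded(NULL, 0) == -1);
}
```

Calling hook_flawed(NULL, 0) would dereference a NULL pointer, which is
the path the checker is warning about.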

regards,
dan carpenter

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/3] Btrfs-progs: support get/reset device stats via ioctl

2012-05-17 Thread Stefan Behrens

On 05/16/2012 19:03, Andrei Popa wrote:

It would be nice if this function could show the file names affected by
errors, in case of a single, non-redundant drive, btrfs-progs should
show what files are affected by errors.
Then, an admin could restore only those files from backup.

On Wed, 2012-05-16 at 18:50 +0200, Stefan Behrens wrote:

btrfs device stats is used to retrieve and print the device stats.
btrfs device stats -z is used to atomically retrieve, reset and
print the stats.



In case of disk errors, it is recommended to run scrub on that disk. It 
checks the in-use disk contents for errors, repairs errors where 
possible, and the scrub tool does print the paths and filenames of 
errored files into the kernel log.



Re: [PATCH v3 3/3] Btrfs: read device stats on mount, write modified ones during commit

2012-05-17 Thread Stefan Behrens

On 05/17/2012 03:52, Liu Bo wrote:

On 05/17/2012 12:50 AM, Stefan Behrens wrote:


The device statistics are written into the device tree with each
transaction commit. Only modified statistics are written.
When a filesystem is mounted, the device statistics for each involved
device are read from the device tree and used to initialize the
counters.



Hi Stefan,

I just scanned the patch for a while and have a question:

Adding a new key type usually means changing the disk format,
so have you done something for this?



Hi Liu,

New tree entries with new keys are added to the device tree. Old kernels 
do not search for these keys and therefore ignore them. New kernels 
(with this patch) search for these keys and read and write them, or 
create them when required. That works fine.
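
A rough way to picture why this is compatible, as a standalone C sketch:
a reader that only searches for the key types it knows never notices
extra entries. The key numbers, struct item and the helper names below
are made up for illustration and are not the btrfs on-disk format or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical key types for this sketch; not btrfs's on-disk values. */
enum key_type { KEY_DEV_ITEM = 1, KEY_DEV_STATS = 2 };

struct item {
	enum key_type type;
	unsigned long long value;
};

/* Return the first item of the wanted type, or NULL if the tree has none. */
static const struct item *find_item(const struct item *items, size_t n,
				    enum key_type type)
{
	for (size_t i = 0; i < n; i++)
		if (items[i].type == type)
			return &items[i];
	return NULL;
}

/* "Old kernel": only ever looks up the key type it knows. */
static unsigned long long old_read_dev(const struct item *items, size_t n)
{
	const struct item *it = find_item(items, n, KEY_DEV_ITEM);

	return it ? it->value : 0;
}

/* "New kernel": looks up the stats key; a missing entry means zeroed counters. */
static unsigned long long new_read_stats(const struct item *items, size_t n)
{
	const struct item *it = find_item(items, n, KEY_DEV_STATS);

	return it ? it->value : 0;
}

void key_compat_demo(void)
{
	const struct item old_tree[] = { { KEY_DEV_ITEM, 7 } };
	const struct item new_tree[] = { { KEY_DEV_ITEM, 7 }, { KEY_DEV_STATS, 3 } };

	assert(old_read_dev(new_tree, 2) == 7);		/* extra key is ignored */
	assert(new_read_stats(old_tree, 1) == 0);	/* entry not created yet */
	assert(new_read_stats(new_tree, 2) == 3);
}
```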



Re: 3.4.0-rc6: WARNING: at fs/btrfs/super.c:219 __btrfs_abort_transaction+0xae/0xc0 [btrfs]()

2012-05-17 Thread Martin Mailand

Hi,
I got the same warning but triggered it differently: I created a new
cephfs on top of btrfs via mkcephfs, and the command then hangs.


[  100.643838] Btrfs loaded
[  100.644313] device fsid 49b89a47-76a0-45cf-9e4a-a7e1f4c64bb8 devid 1 transid 4 /dev/sdc
[  100.645523] btrfs: setting nodatacow
[  100.645527] btrfs: enabling auto defrag
[  100.645529] btrfs: disk space caching is enabled
[  100.645531] btrfs flagging fs with big metadata feature
...
[ 2501.141664] ------------[ cut here ]------------
[ 2501.141700] WARNING: at fs/btrfs/super.c:219 __btrfs_abort_transaction+0xae/0xc0 [btrfs]()
[ 2501.141714] Hardware name: X9SRi
[ 2501.141721] btrfs: Transaction aborted
[ 2501.141722] Modules linked in: btrfs zlib_deflate libcrc32c ext2 bonding coretemp ghash_clmulni_intel aesni_intel cryptd aes_x86_64 microcode sb_edac psmouse serio_raw edac_core joydev mei(C) ioatdma mac_hid lp parport ses enclosure usbhid isci hid libsas scsi_transport_sas megaraid_sas ixgbe igb mdio dca
[ 2501.141892] Pid: 12129, comm: ceph-osd Tainted: G C 3.4.0-rc7+ #10
[ 2501.141910] Call Trace:
[ 2501.141927]  [810504ef] warn_slowpath_common+0x7f/0xc0
[ 2501.141945]  [810505e6] warn_slowpath_fmt+0x46/0x50
[ 2501.141972]  [a01ffbde] __btrfs_abort_transaction+0xae/0xc0 [btrfs]
[ 2501.142024]  [a026913a] ? btrfs_add_delayed_tree_ref+0x8a/0x1c0 [btrfs]
[ 2501.142090]  [a022b70b] cow_file_range_inline+0x1bb/0x1c0 [btrfs]
[ 2501.142137]  [a022b82f] cow_file_range+0x11f/0x480 [btrfs]
[ 2501.142187]  [a024a31f] ? free_extent_buffer+0x2f/0x70 [btrfs]
[ 2501.142235]  [a022bf77] run_delalloc_nocow+0x3e7/0x8c0 [btrfs]
[ 2501.142281]  [a022c749] run_delalloc_range+0x2f9/0x360 [btrfs]
[ 2501.142331]  [a024919d] __extent_writepage+0x61d/0x760 [btrfs]
[ 2501.142366]  [81165f9f] ? kmem_cache_free+0x2f/0x110
[ 2501.142412]  [a02495aa] extent_write_cache_pages.isra.25.constprop.39+0x2ca/0x3f0 [btrfs]
[ 2501.142477]  [a0249915] extent_writepages+0x45/0x60 [btrfs]
[ 2501.142524]  [a0228980] ? btrfs_submit_direct+0x640/0x640 [btrfs]
[ 2501.142570]  [a0226d08] btrfs_writepages+0x28/0x30 [btrfs]
[ 2501.142604]  [81125b41] do_writepages+0x21/0x40
[ 2501.142635]  [8111b18b] __filemap_fdatawrite_range+0x5b/0x60
[ 2501.142669]  [8111c05c] filemap_flush+0x1c/0x20
[ 2501.142713]  [a02328b9] btrfs_start_delalloc_inodes+0xc9/0x1f0 [btrfs]
[ 2501.142763]  [8107cc13] ? __wake_up+0x53/0x70
[ 2501.142806]  [a02244bd] btrfs_commit_transaction+0x3bd/0xa60 [btrfs]
[ 2501.142856]  [810737c0] ? add_wait_queue+0x60/0x60
[ 2501.142896]  [a020982a] ? block_rsv_migrate_bytes+0x3a/0x50 [btrfs]
[ 2501.142946]  [a0258106] btrfs_mksubvol+0x356/0x3a0 [btrfs]
[ 2501.142991]  [a025827a] btrfs_ioctl_snap_create_transid+0x12a/0x190 [btrfs]
[ 2501.143053]  [a0258336] btrfs_ioctl_snap_create+0x56/0x80 [btrfs]
[ 2501.143099]  [a025a40d] btrfs_ioctl+0x44d/0x1320 [btrfs]
[ 2501.143134]  [81140dd8] ? handle_mm_fault+0x1f8/0x310
[ 2501.143166]  [81189dd2] ? do_filp_open+0x42/0xa0
[ 2501.143197]  [8118be98] do_vfs_ioctl+0x98/0x550
[ 2501.143228]  [81165f9f] ? kmem_cache_free+0x2f/0x110
[ 2501.143259]  [8118c3e1] sys_ioctl+0x91/0xa0
[ 2501.143291]  [8165fd29] system_call_fastpath+0x16/0x1b
[ 2501.143321] ---[ end trace 7d4c76238d6eae30 ]---
[ 2501.143350] BTRFS error (device sdc) in cow_file_range_inline:261: error 28
[ 2501.143381] btrfs is forced readonly
[ 2501.143407] BTRFS error (device sdc) in cow_file_range:871: error 28
[ 2501.143444] BTRFS error (device sdc) in run_delalloc_nocow:1333: error 28

btrfs filesystem df /data/osd.0/
Data: total=112.01GB, used=1.02MB
System, DUP: total=8.00MB, used=32.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=112.00GB, used=288.00KB
Metadata: total=8.00MB, used=0.00
e5:~$ df -h
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/redundant-root   85G  3,9G   77G   5% /
udev                         32G  4,0K   32G   1% /dev
tmpfs                        13G  292K   13G   1% /run
none                        5,0M     0  5,0M   0% /run/lock
none                         32G     0   32G   0% /run/shm
/dev/sda1                   228M   68M  148M  32% /boot
/dev/sdc                    5,5T  1,7M  5,3T   1% /data/osd.0
/dev/sdd                    5,5T  1,7M  5,3T   1% /data/osd.1
/dev/sde                    5,5T  1,6M  5,3T   1% /data/osd.2
/dev/sdf                    5,5T  1,7M  5,3T   1% /data/osd.3


Ceph command.

mkcephfs -c /etc/ceph/ceph.conf -a
temp dir is /tmp/mkcephfs.2kN76CD9ut
preparing monmap in /tmp/mkcephfs.2kN76CD9ut/monmap
/usr/bin/monmaptool --create --clobber --add a 192.168.125.10:6789 --print /tmp/mkcephfs.2kN76CD9ut/monmap

/usr/bin/monmaptool: monmap file /tmp/mkcephfs.2kN76CD9ut/monmap
/usr/bin/monmaptool: 

Re: [PATCH v3 0/3] Btrfs-progs: support get/reset device stats via ioctl

2012-05-17 Thread Andrei Popa
On Thu, 2012-05-17 at 10:44 +0200, Stefan Behrens wrote:
> On 05/16/2012 19:03, Andrei Popa wrote:
> > It would be nice if this function could show the file names affected by
> > errors, in case of a single, non-redundant drive, btrfs-progs should
> > show what files are affected by errors.
> > Then, an admin could restore only those files from backup.
> >
> > On Wed, 2012-05-16 at 18:50 +0200, Stefan Behrens wrote:
> > > btrfs device stats is used to retrieve and print the device stats.
> > > btrfs device stats -z is used to atomically retrieve, reset and
> > > print the stats.
> >
>
> In case of disk errors, it is recommended to run scrub on that disk. It
> checks the in-use disk contents for errors, repairs errors where
> possible, and the scrub tool does print the paths and filenames of
> errored files into the kernel log.

In an automated script, or for the typical user, it would be easier to get
the affected files from the output of the btrfs-progs scrub command
instead of from the kernel log.



How can I create useful bug reports? Here are 50 lines of messages including Call Trace from a system crash.

2012-05-17 Thread Peter Maloney
Hi,

I am using btrfs at home on my root system because I want to be able to
send useful bug reports when things go wrong.

And I have 3 questions:

What kernel should I be using?

And how do I create good bug reports? Is a Call Trace that I find in
/var/log/messages enough, or do I need to install some debug packages
and run some tools?

Can someone also tell me how to find device error counts? (like what
ZFS's zpool status shows under the read and write columns, not
scrub/checksums on data, but the device errors)



I am using version 3.4.0-rc7-1-default which I got using openSUSE KOTD.
Is that a good choice? It would be convenient to use these openSUSE
repositories.

Someone in the #btrfs IRC channel told me to use this:
git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git

I managed to crash my system today with a USB stick that is defective.
(dd to the direct device hangs and causes it to change device names, eg.
from sdb to sdc.) I would like to properly report this so it can be
fixed. A bad disk should not take down the system.

All I did was create a btrfs fs, dd a file onto the disk which caused an
Input/Output error, and then try to run btrfs scrub start and btrfs
filesystem show to try to find an error count like you would find in a
zpool status using the ZFS file system.

Here are 50 lines from /var/log/messages, starting with what I assume is
the first error.

May 17 10:30:01 peterlaptop kernel: [ 3119.537086] lost page write due
to I/O error on sdb1
May 17 10:30:01 peterlaptop kernel: [ 3119.537100] lost page write due
to I/O error on sdb1
May 17 10:30:01 peterlaptop kernel: [ 3119.537105] BTRFS error (device
sdb1) in write_all_supers:2890: IO failure (1 errors while writing supers)
May 17 10:30:01 peterlaptop kernel: [ 3119.537108] btrfs: commit super
ret -5
May 17 10:30:01 peterlaptop kernel: [ 3119.542081] ------------[ cut
here ]------------
May 17 10:30:01 peterlaptop kernel: [ 3119.542135] WARNING: at
/home/abuild/rpmbuild/BUILD/kernel-default-3.4.rc7/linux-3.4-rc7/fs/btrfs/extent-tree.c:124
btrfs_put_block_group+0x5a/0x60 [btrfs]()
May 17 10:30:01 peterlaptop kernel: [ 3119.542139] Hardware name: HP
Compaq nx8220 (PY522ET#ABD)
May 17 10:30:01 peterlaptop kernel: [ 3119.542141] Modules linked in:
loop nls_iso8859_1 nls_cp437 reiserfs minix hfs vfat fat usb_storage uas
xt_tcpudp xt_pkttype xt_LOG xt_limit af_packet ip6t_REJECT
nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT
iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns
nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables
xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables
cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq
mperf fuse snd_intel8x0m snd_intel8x0 snd_ac97_codec sr_mod sg ppdev
firewire_ohci iTCO_wdt iTCO_vendor_support ac97_bus snd_pcm ipw2200
snd_timer firewire_core cdrom hp_wmi tifm_7xx1 tifm_core parport_pc
parport joydev pcmcia snd sparse_keymap libipw tg3 pcspkr irda wmi
sdhci_pci serio_raw yenta_socket pcmcia_rsrc pcmcia_core sdhci microcode
mmc_core video cfg80211 rfkill button ac soundcore crc_ccitt container
battery snd_page_alloc lib80211 crc_itu_t autofs4 btrfs zlib_deflate
usbhid hid a
May 17 10:30:01 peterlaptop kernel: ta_generic uhci_hcd ehci_hcd
ata_piix ahci libahci radeon ttm drm_kms_helper libata rtc_cmos fan
thermal drm i2c_algo_bit i2c_core processor thermal_sys hwmon usbcore
usb_common
May 17 10:30:01 peterlaptop kernel: [ 3119.542221] Pid: 7558, comm:
umount Tainted: G W 3.4.0-rc7-1-default #1
May 17 10:30:01 peterlaptop kernel: [ 3119.542224] Call Trace:
May 17 10:30:01 peterlaptop kernel: [ 3119.542239]  [c0205359]
try_stack_unwind+0x199/0x1b0
May 17 10:30:01 peterlaptop kernel: [ 3119.542248]  [c02041d7]
dump_trace+0x47/0xf0
May 17 10:30:01 peterlaptop kernel: [ 3119.542253]  [c02053bb]
show_trace_log_lvl+0x4b/0x60
May 17 10:30:01 peterlaptop kernel: [ 3119.542258]  [c02053e8]
show_trace+0x18/0x20
May 17 10:30:01 peterlaptop kernel: [ 3119.542264]  [c06952f1]
dump_stack+0x6d/0x72
May 17 10:30:01 peterlaptop kernel: [ 3119.542271]  [c0231058]
warn_slowpath_common+0x78/0xb0
May 17 10:30:01 peterlaptop kernel: [ 3119.542276]  [c02310ab]
warn_slowpath_null+0x1b/0x20
May 17 10:30:01 peterlaptop kernel: [ 3119.542295]  [f82ab0ba]
btrfs_put_block_group+0x5a/0x60 [btrfs]
May 17 10:30:01 peterlaptop kernel: [ 3119.542350]  [f82b4dbe]
btrfs_free_block_groups+0x7e/0x2f0 [btrfs]
May 17 10:30:01 peterlaptop kernel: [ 3119.542412]  [f82c02a4]
close_ctree+0x184/0x390 [btrfs]
May 17 10:30:01 peterlaptop kernel: [ 3119.542475]  [c03236aa]
generic_shutdown_super+0x4a/0xc0
May 17 10:30:01 peterlaptop kernel: [ 3119.542481]  [c0323799]
kill_anon_super+0x9/0x20
May 17 10:30:01 peterlaptop kernel: [ 3119.542498]  [f829d3ac]
btrfs_kill_super+0xc/0x70 [btrfs]
May 17 10:30:01 peterlaptop kernel: [ 3119.542516]  [c0323214]
deactivate_locked_super+0x44/0x70
May 17 10:30:01 peterlaptop kernel: [ 3119.542522]  [c033abd9]

Re: Ceph on btrfs 3.4rc

2012-05-17 Thread Martin Mailand

Hi Josef,

Somehow I still get the kernel BUG messages. I used your patch from the
16th against rc7.


-martin

On 16.05.2012 21:20, Josef Bacik wrote:

Hrm ok so I finally got some time to try and debug it and let the test run a
good long while (5 hours almost) and I couldn't hit either the original bug or
the one you guys were hitting.  So either my extra little bit of locking did the
trick or I get to keep my Worst reproducer ever award.  Can you guys give this
one a whirl and if it panics send the entire dmesg since it should spit out a
WARN_ON() to let me know what I thought was the problem was it.  Thanks,


[ 2868.813236] ------------[ cut here ]------------
[ 2868.813297] kernel BUG at fs/btrfs/inode.c:2220!
[ 2868.813355] invalid opcode:  [#2] SMP
[ 2868.813479] CPU 2
[ 2868.813516] Modules linked in: btrfs zlib_deflate libcrc32c ext2 
bonding coretemp ghash_clmulni_intel aesni_intel cryptd aes_x86_64 
microcode psmouse serio_raw sb_edac edac_core joydev mei(C) ses ioatdma 
enclosure mac_hid lp parport isci libsas scsi_transport_sas usbhid hid 
ixgbe igb megaraid_sas dca mdio

[ 2868.814871]
[ 2868.814925] Pid: 5325, comm: ceph-osd Tainted: G  D  C 
3.4.0-rc7+ #10 Supermicro X9SRi/X9SRi
[ 2868.815108] RIP: 0010:[a02212f2]  [a02212f2] 
btrfs_orphan_del+0xe2/0xf0 [btrfs]

[ 2868.815236] RSP: 0018:880296e89d18  EFLAGS: 00010282
[ 2868.815294] RAX: fffe RBX: 88101ef3c390 RCX: 
00562497
[ 2868.815355] RDX: 00562496 RSI: 88101ef1 RDI: 
ea00407bc400
[ 2868.815416] RBP: 880296e89d58 R08: 60ef8fd0 R09: 
a01f8c6a
[ 2868.815476] R10:  R11: 011d R12: 
880fdf602790
[ 2868.815537] R13: 880fdf602400 R14: 0001 R15: 
0001
[ 2868.815598] FS:  7f07d5512700() GS:88107fc4() 
knlGS:

[ 2868.815675] CS:  0010 DS:  ES:  CR0: 80050033
[ 2868.815734] CR2: 0ab16000 CR3: 00082a6b2000 CR4: 
000407e0
[ 2868.815796] DR0:  DR1:  DR2: 

[ 2868.815858] DR3:  DR6: 0ff0 DR7: 
0400
[ 2868.815920] Process ceph-osd (pid: 5325, threadinfo 880296e88000, 
task 8810170616e0)

[ 2868.815997] Stack:
[ 2868.816049]  0c07 88101ef12960 880296e89d38 
88101ef12960
[ 2868.816262]   880fdf602400 88101ef3c390 
880b4ce2f260
[ 2868.816485]  880296e89e08 a0225628 88101ef3c390 


[ 2868.816694] Call Trace:
[ 2868.816755]  [a0225628] btrfs_truncate+0x4d8/0x650 [btrfs]
[ 2868.816817]  [81188afd] ? path_lookupat+0x6d/0x750
[ 2868.816880]  [a0227021] btrfs_setattr+0xc1/0x1b0 [btrfs]
[ 2868.816940]  [811955c3] notify_change+0x183/0x320
[ 2868.816998]  [8117889e] do_truncate+0x5e/0xa0
[ 2868.817056]  [81178a24] sys_truncate+0x144/0x1b0
[ 2868.817115]  [8165fd29] system_call_fastpath+0x16/0x1b
[ 2868.817173] Code: e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 66 0f 1f 44 00 00 
80 bb 60 fe ff ff 84 75 b4 eb ae 0f 1f 44 00 00 48 89 df e8 50 73 fe ff 
eb b8 0f 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec

[ 2868.819501] RIP  [a02212f2] btrfs_orphan_del+0xe2/0xf0 [btrfs]
[ 2868.819602]  RSP 880296e89d18
[ 2868.819703] ---[ end trace 94d17b770b376c84 ]---
[ 3249.857453] ------------[ cut here ]------------
[ 3249.857481] kernel BUG at fs/btrfs/inode.c:2220!
[ 3249.857506] invalid opcode:  [#3] SMP
[ 3249.857534] CPU 0
[ 3249.857538] Modules linked in: btrfs zlib_deflate libcrc32c ext2 
bonding coretemp ghash_clmulni_intel aesni_intel cryptd aes_x86_64 
microcode psmouse serio_raw sb_edac edac_core joydev mei(C) ses ioatdma 
enclosure mac_hid lp parport isci libsas scsi_transport_sas usbhid hid 
ixgbe igb megaraid_sas dca mdio

[ 3249.857721]
[ 3249.857740] Pid: 5384, comm: ceph-osd Tainted: G  D  C 
3.4.0-rc7+ #10 Supermicro X9SRi/X9SRi
[ 3249.857791] RIP: 0010:[a02212f2]  [a02212f2] 
btrfs_orphan_del+0xe2/0xf0 [btrfs]

[ 3249.857847] RSP: 0018:880abe8b5d18  EFLAGS: 00010282
[ 3249.857873] RAX: fffe RBX: 8807eb8b6670 RCX: 
0077a084
[ 3249.857902] RDX: 0077a083 RSI: 88101ee497e0 RDI: 
ea00407b9240
[ 3249.857931] RBP: 880abe8b5d58 R08: 60ef8fd0 R09: 
a01f8c6a
[ 3249.857959] R10:  R11: 0153 R12: 
880d56825390
[ 3249.857988] R13: 880d56825000 R14: 0001 R15: 
0001
[ 3249.858017] FS:  7f06bd13b700() GS:88107fc0() 
knlGS:

[ 3249.858062] CS:  0010 DS:  ES:  CR0: 80050033
[ 3249.858088] CR2: 043d2000 CR3: 000e7ebe5000 CR4: 
000407f0
[ 3249.858117] DR0:  DR1:  DR2: 

[ 3249.858146] DR3:  DR6: 0ff0 DR7: 

Re: How can I create useful bug reports? Here are 50 lines of messages including Call Trace from a system crash.

2012-05-17 Thread Hugo Mills
On Thu, May 17, 2012 at 12:26:42PM +0200, Peter Maloney wrote:
> I am using btrfs at home on my root system because I want to be able to
> send useful bug reports when things go wrong.
>
> And I have 3 questions:
>
> What kernel should I be using?

   One of:

 - Josef's btrfs-next[1],
 - Chris's main repo[2], or
 - kernel.org mainline -rc kernels[3].

   The latter two will generally be carrying identical btrfs code. The
first one is rather more experimental.

> And how do I create good bug reports? Is a Call Trace that I find in
> /var/log/messages enough, or do I need to install some debug packages
> and run some tools?

   If you have a backtrace in /var/log/messages, yes, that's a good
start. Generally, state what you did to get the error, whether it's
repeatable, what kernel version you're using, and any error messages
you got. If there's extra info needed, whoever picks it up will ask.

> Can someone also tell me how to find device error counts? (like what
> ZFS's zpool status shows under the read and write columns, not
> scrub/checksums on data, but the device errors)

   We don't have those right now -- Stefan Behrens posted a patch here
yesterday to keep track of them. :)

> I am using version 3.4.0-rc7-1-default which I got using openSUSE KOTD.
> Is that a good choice? It would be convenient to use these openSUSE
> repositories.

   Yes, that's reasonable.

> Someone in the #btrfs IRC channel told me to use this:
> git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git
>
> I managed to crash my system today with a USB stick that is defective.
> (dd to the direct device hangs and causes it to change device names, eg.
> from sdb to sdc.) I would like to properly report this so it can be
> fixed. A bad disk should not take down the system.

   Proper error handling is an ongoing work. It's a lot better than it
used to be (back in 2.6.32 days, if you ran out of space, the whole
system could come down :) ), but there's still quite a few things left
to deal with. USB is distinctly unreliable, and seems to cause more
problems than most other block stacks right now.

   Hugo.

[1] git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git
[2] git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git
[3] git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- There's more than one way to do it is not a commandment. It ---  
   is a dire warning.




[PATCH 1/5] Btrfs: stop defrag the files automatically when doing readonly remount or umount

2012-05-17 Thread Miao Xie
If we remount the fs read-only or unmount it, we should not continue
defragmenting files, because
- auto defragment introduces lots of dirty pages, which breaks the rule
  of a read-only file system.
- it makes remount/umount take longer.

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c |   12 +++++++-----
 fs/btrfs/file.c    |    3 ++-
 fs/btrfs/super.c   |    5 +++++
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 20196f4..9a571f7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1529,6 +1529,9 @@ static int cleaner_kthread(void *arg)
 	do {
 		vfs_check_frozen(root->fs_info->sb, SB_FREEZE_WRITE);
 
+		if (!down_read_trylock(&root->fs_info->sb->s_umount))
+			goto skip;
+
 		if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
 		    mutex_trylock(&root->fs_info->cleaner_mutex)) {
 			btrfs_run_delayed_iputs(root);
@@ -1536,7 +1539,8 @@ static int cleaner_kthread(void *arg)
 			mutex_unlock(&root->fs_info->cleaner_mutex);
 			btrfs_run_defrag_inodes(root->fs_info);
 		}
-
+		up_read(&root->fs_info->sb->s_umount);
+skip:
 		if (!try_to_freeze()) {
 			set_current_state(TASK_INTERRUPTIBLE);
 			if (!kthread_should_stop())
@@ -3049,13 +3053,11 @@ int close_ctree(struct btrfs_root *root)
 
 	btrfs_scrub_cancel(root);
 
-	/* wait for any defraggers to finish */
-	wait_event(fs_info->transaction_wait,
-		   (atomic_read(&fs_info->defrag_running) == 0));
-
 	/* clear out the rbtree of defraggable inodes */
 	btrfs_run_defrag_inodes(fs_info);
 
+	BUG_ON(atomic_read(&fs_info->defrag_running));
+
 	/*
 	 * Here come 2 situations when btrfs is broken to flip readonly:
 	 *
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index d83260d..23364c1 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -230,7 +230,8 @@ int btrfs_run_defrag_inodes(struct btrfs_fs_info *fs_info)
 		first_ino = defrag->ino + 1;
 		rb_erase(&defrag->rb_node, &fs_info->defrag_inodes);
 
-		if (btrfs_fs_closing(fs_info))
+		if (btrfs_fs_closing(fs_info) ||
+		    (fs_info->sb->s_flags & MS_RDONLY))
 			goto next_free;
 
 		spin_unlock(&fs_info->defrag_inodes_lock);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 84571d7..7deb00e 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1151,6 +1151,11 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 		ret = btrfs_commit_super(root);
 		if (ret)
 			goto restore;
+
+		/* clear out the rbtree of defraggable inodes */
+		btrfs_run_defrag_inodes(fs_info);
+
+		BUG_ON(atomic_read(&fs_info->defrag_running));
 	} else {
 		if (fs_info->fs_devices->rw_devices == 0)
 			ret = -EACCES;
-- 
1.7.6.5


[PATCH 2/5] Btrfs: count the chunks which will be relocated at first

2012-05-17 Thread Miao Xie
The balance function should first count the chunks that will be relocated,
and then relocate those chunks one by one.

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/volumes.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 759d024..91da8a2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2580,7 +2580,7 @@ again:
 
 		chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk);
 
-		if (!counting) {
+		if (counting) {
 			spin_lock(&fs_info->balance_lock);
 			bctl->stat.considered++;
 			spin_unlock(&fs_info->balance_lock);
-- 
1.7.6.5
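
For readers unfamiliar with the loop this hunk touches, the count-first,
relocate-second order can be pictured with the standalone C sketch below.
It is an assumption-laden model, not the kernel code: the filter
should_relocate(), the usage array and the exact placement of the
expected/completed counters are invented for illustration; only the idea
that "considered" is bumped during the counting pass comes from the patch.

```c
#include <assert.h>
#include <stddef.h>

struct balance_stat {
	unsigned considered;	/* chunks looked at */
	unsigned expected;	/* chunks that passed the filter */
	unsigned completed;	/* chunks relocated */
};

/* Hypothetical filter: relocate chunks that are less than half full. */
static int should_relocate(int usage_percent)
{
	return usage_percent < 50;
}

void balance_two_pass(const int *usage, size_t n, struct balance_stat *st)
{
	int counting = 1;

again:
	for (size_t i = 0; i < n; i++) {
		if (counting)
			st->considered++;	/* counted once, in the first pass */
		if (!should_relocate(usage[i]))
			continue;
		if (counting) {
			st->expected++;
			continue;
		}
		st->completed++;		/* stand-in for the relocation */
	}
	if (counting) {
		counting = 0;
		goto again;
	}
}

void balance_demo(void)
{
	const int usage[] = { 10, 80, 30 };
	struct balance_stat st = { 0, 0, 0 };

	balance_two_pass(usage, 3, &st);
	assert(st.considered == 3);
	assert(st.expected == 2);
	assert(st.completed == 2);
}
```

With the test inverted (incrementing in the second pass instead), the
counter in this model would still end at 3, but only after relocation
has already started, so progress reporting would lag behind.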


[PATCH 3/5] Btrfs: pause/recover the space balance when doing remount

2012-05-17 Thread Miao Xie
Pause the space balance threads when remounting the fs read-only,
and resume them when remounting it from r/o to r/w.

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/super.c   |    9 ++++++++-
 fs/btrfs/volumes.c |    8 +++++++-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 7deb00e..ea17f0a 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1148,6 +1148,9 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 	if (*flags & MS_RDONLY) {
 		sb->s_flags |= MS_RDONLY;
 
+		/* pause restriper - we want to resume on remount to r/w */
+		btrfs_pause_balance(root->fs_info);
+
 		ret = btrfs_commit_super(root);
 		if (ret)
 			goto restore;
@@ -1174,7 +1177,10 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 		if (ret)
 			goto restore;
 
-		sb->s_flags &= ~MS_RDONLY;
+		if (sb->s_flags & MS_RDONLY) {
+			sb->s_flags &= ~MS_RDONLY;
+			btrfs_recover_balance(fs_info->tree_root);
+		}
 	}
 
 	return 0;
@@ -1190,6 +1196,7 @@ restore:
 	fs_info->alloc_start = old_alloc_start;
 	fs_info->thread_pool_size = old_thread_pool_size;
 	fs_info->metadata_ratio = old_metadata_ratio;
+
 	return ret;
 }
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 91da8a2..c536d52 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2833,7 +2833,13 @@ static int balance_kthread(void *data)
 	mutex_lock(&fs_info->volume_mutex);
 	mutex_lock(&fs_info->balance_mutex);
 
-	set_balance_control(bctl);
+	if (fs_info->balance_ctl) {
+		kfree(bctl);
+		bctl = fs_info->balance_ctl;
+		bctl->flags = bctl->flags | BTRFS_BALANCE_RESUME;
+	} else {
+		set_balance_control(bctl);
+	}
 
 	if (btrfs_test_opt(fs_info->tree_root, SKIP_BALANCE)) {
 		printk(KERN_INFO "btrfs: force skipping balance\n");
-- 
1.7.6.5
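
The balance_kthread() hunk above follows a "reuse the existing control
block, otherwise install the fresh one" pattern. A userspace C sketch of
that pattern, under the assumption that a paused balance leaves its
control block attached to the filesystem state, might look like this.
struct fs_state, adopt_balance_ctl() and the BALANCE_RESUME value are
hypothetical and stand in for the kernel structures and flag.

```c
#include <assert.h>
#include <stdlib.h>

#define BALANCE_RESUME 0x1u	/* hypothetical flag value */

struct balance_ctl { unsigned flags; };
struct fs_state { struct balance_ctl *balance_ctl; };

/* Returns the control block the balance thread should run with. */
struct balance_ctl *adopt_balance_ctl(struct fs_state *fs,
				      struct balance_ctl *fresh)
{
	if (fs->balance_ctl) {
		/* A paused balance exists: drop the new block and resume. */
		free(fresh);
		fs->balance_ctl->flags |= BALANCE_RESUME;
		return fs->balance_ctl;
	}
	fs->balance_ctl = fresh;	/* first start: install the new block */
	return fresh;
}

void adopt_demo(void)
{
	struct fs_state fs = { NULL };
	struct balance_ctl *first = calloc(1, sizeof(*first));
	struct balance_ctl *second = calloc(1, sizeof(*second));

	assert(first && second);
	assert(adopt_balance_ctl(&fs, first) == first);
	assert(first->flags == 0);
	assert(adopt_balance_ctl(&fs, second) == first);	/* second was freed */
	assert(first->flags & BALANCE_RESUME);
	free(first);
}
```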


[PATCH 4/5] Btrfs: cancel the scrub when remounting a fs to ro

2012-05-17 Thread Miao Xie
If the filesystem is remounted read-only, we should not run scrub.

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/super.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index ea17f0a..817b3a7 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1151,6 +1151,8 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 		/* pause restriper - we want to resume on remount to r/w */
 		btrfs_pause_balance(root->fs_info);
 
+		btrfs_scrub_cancel(root);
+
 		ret = btrfs_commit_super(root);
 		if (ret)
 			goto restore;
-- 
1.7.6.5


[PATCH 5/5] Btrfs: fix memory leak in btrfs_pause_balance()

2012-05-17 Thread Miao Xie
We forget to free fs_info->balance_ctl in btrfs_pause_balance()
when unmounting the fs.

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/volumes.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c536d52..fd7fe80 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2937,6 +2937,9 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
 		ret = -ENOTCONN;
 	}
 
+	if (btrfs_fs_closing(fs_info) && fs_info->balance_ctl)
+		unset_balance_control(fs_info);
+
 	mutex_unlock(&fs_info->balance_mutex);
 	return ret;
 }
-- 
1.7.6.5


[PATCH 1/2] Btrfs: do not resize a seeding device

2012-05-17 Thread Liu Bo
Seeding devices are not supposed to change any more.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/ioctl.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index f056469..ec2245d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1303,6 +1303,13 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
 		ret = -EINVAL;
 		goto out_free;
 	}
+	if (device->fs_devices && device->fs_devices->seeding) {
+		printk(KERN_INFO "btrfs: resizer unable to apply on "
+		       "seeding device %s\n", device->name);
+		ret = -EACCES;
+		goto out_free;
+	}
+
 	if (!strcmp(sizestr, "max"))
 		new_size = device->bdev->bd_inode->i_size;
 	else {
-- 
1.6.5.2



[PATCH 2/2] Btrfs: resize all devices when we don't assign a specific device id

2012-05-17 Thread Liu Bo
This patch fixes two bugs:

When we do not assign a device id to the resizer,
- it resizes only one device, although the resize is supposed to apply to
  all available devices.
- it uses the 'id 1' device by default, which causes a bug if we
  have removed the 'id 1' device from the filesystem.
After this patch, we can find all available devices by searching the
chunk tree and resize them:

$ mkfs.btrfs /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs/
$ btrfs dev add /dev/sdb8 /mnt/btrfs/

$ btrfs fi resize -100m /mnt/btrfs/
then we can get from dmesg:
btrfs: new size for /dev/sdb7 is 980844544
btrfs: new size for /dev/sdb8 is 980844544

$ btrfs fi resize max /mnt/btrfs
then we can get from dmesg:
btrfs: new size for /dev/sdb7 is 1085702144
btrfs: new size for /dev/sdb8 is 1085702144

$ btrfs fi resize 1:-100m /mnt/btrfs
then we can get from dmesg:
btrfs: resizing devid 1
btrfs: new size for /dev/sdb7 is 980844544

$ btrfs fi resize 2:-100m /mnt/btrfs
then we can get from dmesg:
btrfs: resizing devid 2
btrfs: new size for /dev/sdb8 is 980844544
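
How the "[devid:]size" argument in these examples splits into its parts
can be sketched in userspace C. This is an approximation for
illustration only: struct resize_req, parse_size() and parse_resize_arg()
are hypothetical names, parse_size() is a simplified stand-in for the
kernel's memparse(), and rejecting a size of zero is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct resize_req {
	int scan_all;			/* 1 when no "devid:" prefix was given */
	unsigned long long devid;
	int use_max;			/* 1 for "max" */
	int mod;			/* -1 shrink by, +1 grow by, 0 absolute */
	unsigned long long bytes;
};

/* Simplified stand-in for memparse(): a number plus optional k/m/g suffix. */
static unsigned long long parse_size(const char *s)
{
	char *end;
	unsigned long long v = strtoull(s, &end, 0);

	switch (*end) {
	case 'g': case 'G':
		v <<= 10;	/* fall through */
	case 'm': case 'M':
		v <<= 10;	/* fall through */
	case 'k': case 'K':
		v <<= 10;
		break;
	default:
		break;
	}
	return v;
}

/* Returns 0 on success, -1 when the size part parses to zero (assumed invalid). */
int parse_resize_arg(const char *arg, struct resize_req *req)
{
	const char *sizestr = arg;
	const char *colon = strchr(arg, ':');

	memset(req, 0, sizeof(*req));
	req->scan_all = 1;
	if (colon) {
		req->devid = strtoull(arg, NULL, 10);
		req->scan_all = 0;
		sizestr = colon + 1;
	}
	if (strcmp(sizestr, "max") == 0) {
		req->use_max = 1;
		return 0;
	}
	if (sizestr[0] == '-') {
		req->mod = -1;
		sizestr++;
	} else if (sizestr[0] == '+') {
		req->mod = 1;
		sizestr++;
	}
	req->bytes = parse_size(sizestr);
	return req->bytes == 0 ? -1 : 0;
}

void resize_parse_demo(void)
{
	struct resize_req r;

	assert(parse_resize_arg("-100m", &r) == 0);
	assert(r.scan_all == 1 && r.mod == -1 && r.bytes == 104857600ULL);
	assert(parse_resize_arg("1:-100m", &r) == 0);
	assert(r.scan_all == 0 && r.devid == 1 && r.mod == -1);
	assert(parse_resize_arg("max", &r) == 0);
	assert(r.scan_all == 1 && r.use_max == 1);
}
```

In this model a request without a device id leaves scan_all set, which
corresponds to the "resize every available device" behaviour described
above.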

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/ioctl.c |  101 --
 1 files changed, 83 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index ec2245d..d9a4fa8 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1250,12 +1250,51 @@ out_ra:
return ret;
 }
 
+static struct btrfs_device *get_avail_device(struct btrfs_root *root, u64 devid)
+{
+   struct btrfs_key key;
+   struct btrfs_path *path;
+   struct btrfs_dev_item *dev_item;
+   struct btrfs_device *device = NULL;
+   int ret;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return ERR_PTR(-ENOMEM);
+
+   key.objectid = BTRFS_DEV_ITEMS_OBJECTID;
+   key.offset = devid;
+   key.type = BTRFS_DEV_ITEM_KEY;
+
+   ret = btrfs_search_slot(NULL, root->fs_info->chunk_root, &key,
+   path, 0, 0);
+   if (ret < 0) {
+   device = ERR_PTR(ret);
+   goto out;
+   }
+   btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+   if (key.objectid != BTRFS_DEV_ITEMS_OBJECTID ||
+   key.type != BTRFS_DEV_ITEM_KEY) {
+   device = NULL;
+   goto out;
+   }
+   dev_item = btrfs_item_ptr(path->nodes[0], path->slots[0],
+ struct btrfs_dev_item);
+   devid = btrfs_device_id(path->nodes[0], dev_item);
+
+   device = btrfs_find_device(root, devid, NULL, NULL);
+out:
+   btrfs_free_path(path);
+   return device;
+}
+
 static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
void __user *arg)
 {
-   u64 new_size;
+   u64 new_size = 0;
u64 old_size;
-   u64 devid = 1;
+   u64 orig_new_size = 0;
+   u64 devid = (-1ULL);
struct btrfs_ioctl_vol_args *vol_args;
struct btrfs_trans_handle *trans;
struct btrfs_device *device = NULL;
@@ -1263,6 +1302,8 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
char *devstr = NULL;
int ret = 0;
int mod = 0;
+   int scan_all = 1;
+   int use_max = 0;
 
if (root->fs_info->sb->s_flags & MS_RDONLY)
return -EROFS;
@@ -1295,8 +1336,31 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
devid = simple_strtoull(devstr, &end, 10);
printk(KERN_INFO "btrfs: resizing devid %llu\n",
   (unsigned long long)devid);
+   scan_all = 0;
}
-   device = btrfs_find_device(root, devid, NULL, NULL);
+
+   if (!strcmp(sizestr, "max")) {
+   use_max = 1;
+   } else {
+   if (sizestr[0] == '-') {
+   mod = -1;
+   sizestr++;
+   } else if (sizestr[0] == '+') {
+   mod = 1;
+   sizestr++;
+   }
+   orig_new_size = memparse(sizestr, NULL);
+   if (orig_new_size == 0) {
+   ret = -EINVAL;
+   goto out_free;
+   }
+   }
+
+   if (devid < (-1ULL))
+   device = btrfs_find_device(root, devid, NULL, NULL);
+   else
+   device = get_avail_device(root, 0);
+again:
if (!device) {
printk(KERN_INFO "btrfs: resizer unable to find device %llu\n",
   (unsigned long long)devid);
@@ -1310,22 +1374,10 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
goto out_free;
}
 
-   if (!strcmp(sizestr, "max"))
+   if (use_max)
new_size = device->bdev->bd_inode->i_size;
-   else {
-   if (sizestr[0] == '-') {
-   mod = -1;
-   sizestr++;
-   } else if (sizestr[0] == '+') {
-   

Re: btrfs: initial readahead code and prototypes

2012-05-17 Thread Dan Carpenter
On Thu, May 17, 2012 at 03:31:50PM +0200, Arne Jansen wrote:
> The assumption here is that if err == 0, eb is always != NULL. There's
> even a tiny comment above the function stating this:
> 
>    107  /* in case of err, eb might be NULL */
> 

Ah, right.  I missed the comment.

> This code changes significantly with the patch
> 
> btrfs: extend readahead interface
> 
> Where it is written in a more obvious way.

Cool.

regards,
dan carpenter



Re: [PATCH 1/5] btrfs: extend readahead interface

2012-05-17 Thread Arne Jansen

On 05/09/12 16:48, David Sterba wrote:
> On Thu, Apr 12, 2012 at 05:54:38PM +0200, Arne Jansen wrote:
>> @@ -97,30 +119,87 @@ struct reada_machine_work {
>> +/*
>> + * this is the default callback for readahead. It just descends into the
>> + * tree within the range given at creation. if an error occurs, just cut
>> + * this part of the tree
>> + */
>> +static void readahead_descend(struct btrfs_root *root, struct reada_control *rc,
>> + u64 wanted_generation, struct extent_buffer *eb,
>> + u64 start, int err, struct btrfs_key *top,
>> + void *ctx)
>> +{
>> +   int nritems;
>> +   u64 generation;
>> +   int level;
>> +   int i;
>> +
>> +   BUG_ON(err == -EAGAIN); /* FIXME: not yet implemented, don't cancel
>> +* readahead with default callback */
>> +
>> +   if (err || eb == NULL) {
>> +   /*
>> +* this is the error case, the extent buffer has not been
>> +* read correctly. We won't access anything from it and
>> +* just cleanup our data structures. Effectively this will
>> +* cut the branch below this node from read ahead.
>> +*/
>> +   return;
>> +   }
>> +
>> +   level = btrfs_header_level(eb);
>> +   if (level == 0) {
>> +   /*
>> +* if this is a leaf, ignore the content.
>> +*/
>> +   return;
>> +   }
>> +
>> +   nritems = btrfs_header_nritems(eb);
>> +   generation = btrfs_header_generation(eb);
>> +
>> +   /*
>> +* if the generation doesn't match, just ignore this node.
>> +* This will cut off a branch from prefetch. Alternatively one could
>> +* start a new (sub-) prefetch for this branch, starting again from
>> +* root.
>> +*/
>> +   if (wanted_generation != generation)
>> +   return;
>
> I think I saw passing wanted_generation = 0 somewhere, but cannot find
> it now again. Is it an expected value for the default RA callback,
> meaning eg.  'any generation I find' ?

No. This here is just the default callback. You've seen
wanted_generation = 0 in the droptree code, where a custom
callback is set that doesn't check the generation.

>> +
>> +   for (i = 0; i < nritems; i++) {
>> +   u64 n_gen;
>> +   struct btrfs_key key;
>> +   struct btrfs_key next_key;
>> +   u64 bytenr;
>> +
>> +   btrfs_node_key_to_cpu(eb, &key, i);
>> +   if (i + 1 < nritems)
>> +   btrfs_node_key_to_cpu(eb, &next_key, i + 1);
>> +   else
>> +   next_key = *top;
>> +   bytenr = btrfs_node_blockptr(eb, i);
>> +   n_gen = btrfs_node_ptr_generation(eb, i);
>> +
>> +   if (btrfs_comp_cpu_keys(&key, &rc->key_end) < 0 &&
>> +   btrfs_comp_cpu_keys(&next_key, &rc->key_start) > 0)
>> +   reada_add_block(rc, bytenr, &next_key,
>> +   level - 1, n_gen, ctx);
>> +   }
>> +}
>>
>> @@ -142,65 +221,21 @@ static int __readahead_hook(struct btrfs_root *root, struct extent_buffer *eb,
>> re->scheduled_for = NULL;
>> spin_unlock(&re->lock);
>>
>> -   if (err == 0) {
>> -   nritems = level ? btrfs_header_nritems(eb) : 0;
>> -   generation = btrfs_header_generation(eb);
>> -   /*
>> -* FIXME: currently we just set nritems to 0 if this is a leaf,
>> -* effectively ignoring the content. In a next step we could
>> -* trigger more readahead depending from the content, e.g.
>> -* fetch the checksums for the extents in the leaf.
>> -*/
>> -   } else {
>> +   /*
>> +* call hooks for all registered readaheads
>> +*/
>> +   list_for_each_entry(rec, &list, list) {
>> +   btrfs_tree_read_lock(eb);
>> /*
>> -* this is the error case, the extent buffer has not been
>> -* read correctly. We won't access anything from it and
>> -* just cleanup our data structures. Effectively this will
>> -* cut the branch below this node from read ahead.
>> +* we set the lock to blocking, as the callback might want to
>> +* sleep on allocations.
>
> What about a more finer control given to the callbacks? The blocking
> lock may be unnecessary if the callback does not sleep.

I thought about that, but it would add a bit more complexity. So I
decided for the simpler version in the first run. There is definitely
room for optimization here.

> My idea is to add a field to 'struct reada_uptodate_ctx', preset with
> BTRFS_READ_LOCK by default, but let the RA user to set it to its needs.


The struct is only used in the special case the extent is already
uptodate, to pass the parameters to the worker in this case. The
user has no influence on that. It could either be stored per
request in struct reada_extctl or per reada in struct reada_control.
But this would also not be optimal. There better way 
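
[Archive note] The loop at the end of readahead_descend() in the quoted patch only schedules a child block if the key span it covers overlaps the requested range. A minimal, self-contained sketch of that test follows. struct key, comp_keys() and child_in_range() are hypothetical stand-ins for struct btrfs_key, btrfs_comp_cpu_keys() and the inline condition; they are not the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for a btrfs key in CPU byte order. */
struct key {
    uint64_t objectid;
    uint8_t type;
    uint64_t offset;
};

/* Lexicographic compare on (objectid, type, offset): <0, 0 or >0. */
int comp_keys(const struct key *a, const struct key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/*
 * A child pointer covers the keys from `key` up to `next_key` (the key of
 * the following slot, or the parent's upper bound for the last slot).
 * It is worth reading ahead only if that span overlaps the requested
 * range between `start` and `end`.
 */
int child_in_range(const struct key *key, const struct key *next_key,
                   const struct key *start, const struct key *end)
{
    return comp_keys(key, end) < 0 && comp_keys(next_key, start) > 0;
}
```

A child spanning objectids 100 to 300 overlaps a request for 256 to 512 and would be scheduled; a child spanning 600 to 700 would be skipped.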

Btrfs storage advice

2012-05-17 Thread Jim

Hi btrfs list,

I am looking for some counsel regarding how to best (and most safely)
utilize extra space on my btrfs installation.  I set up a btrfs
installation about 6 months ago.  I wanted to test the system while
waiting for mainline acceptance and support.  The machine being used has
13 1TB drives: 12 as a btrfs collection (stripe data, mirror metadata)
and 1 ext4 as a system drive.  We are running kernel 3.2.0-rc4.  I know
that it is not the latest, but it has been extremely stable for our
needs.

Currently the system holds backup files.  2 other filesystems are nfs
mounts on the machine and backups are created by rsyncing these mounts
onto btrfs.  The btrfs copies are also snapshotted, so 2 copies exist of
backup data.  I have added the output of btrfs fi show and btrfs fi df
below so you can see the layout, as well as a standard df -h.

As will be readily apparent, my nfs disks are approaching storage
limits.  Due to financial constraints I must use the space on the btrfs
system for nfs storage.  My first thought is to take 3 or 4 T as a
subvol and export it as nfs.  I have not heard of anyone else exporting
btrfs, is it possible?

Next idea is to split several drives off the btrfs system.  I have
removed drives and replaced them as experiments with the fs but had much
less data on them when I was trying that.  I have read many times on the
list about size issues with btrfs, and filesystems reporting full when
they were far from it.  As my system has been very stable just r/w data
and creating and removing subvols, I am reluctant to change the disk
layout, but we will do what we have to.  Also, if I split disks out they
could be mirrored, like our other nfs systems.  However, I can stand a
small amount of filesystem downtime.  Therefore to maximize space we may
look at not mirroring the segment but just mount a backup snapshot if a
main fs drive goes out.

Final question is what about backup space.  Regardless of how I
structure the new storage segment, it will need to be backed up with the
rest of the system.  Once again, I am between maximizing available
storage and leaving breathing room for btrfs.  As I currently back up
over 4T on btrfs perhaps I should only allocate 2T for new storage thus
creating 2T storage, 6+T backup and 1+T breathing room.

I am not in a panic situation, but I will need to create the new storage
over the next 2 months.  I would really appreciate any feedback and
comments concerning this operation.  Thanks in advance.

Jim Maloney

[root@btrfs ~]# btrfs fi show
failed to read /dev/sr0
Label: none  uuid: c21f1221-a224-4ba4-92e5-cdea0fa6d0f9
Total devices 12 FS bytes used 4.62TB
devid   12 size 930.99GB used 414.75GB path /dev/sdl
devid   11 size 930.99GB used 414.75GB path /dev/sdk
devid   10 size 930.99GB used 414.99GB path /dev/sdj
devid    9 size 930.99GB used 414.99GB path /dev/sdi
devid    5 size 930.99GB used 414.99GB path /dev/sde
devid    2 size 930.99GB used 414.74GB path /dev/sdb
devid    1 size 930.99GB used 414.76GB path /dev/sda
devid    7 size 930.99GB used 414.99GB path /dev/sdg
devid    3 size 930.99GB used 414.74GB path /dev/sdc
devid    4 size 930.99GB used 414.74GB path /dev/sdd
devid    6 size 930.99GB used 414.99GB path /dev/sdf
devid    8 size 930.99GB used 414.99GB path /dev/sdh

[root@btrfs ~]# btrfs fi df /btrfs
Data, RAID0: total=4.54TB, used=4.50TB
Data: total=8.00MB, used=0.00
System, RAID1: total=8.00MB, used=324.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=164.25GB, used=122.97GB
Metadata: total=8.00MB, used=0.00

[root@btrfs ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdm2             196G   49G  138G  26% /
tmpfs                  16G     0   16G   0% /dev/shm
/dev/sdm1             2.0G  137M  1.8G   8% /boot
/dev/sdm5             1.2T   19G  1.1T   2% /var
/dev/sda               11T  4.8T  6.1T  44% /btrfs
10.2.0.42:/data/sites
                      2.6T  2.1T  388G  85% /nfs2/data/sites
10.2.0.40:/data/sites
                      2.6T  2.3T  218G  92% /nfs1/data/sites
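
[Archive note] As a rough cross-check of the figures above, the sketch below sums the per-device numbers from the btrfs fi show output. It assumes that the "used" column is the space already allocated to chunks on each device, so that size minus used is raw space not yet allocated. The function names are hypothetical, and values are kept in hundredths of a GB to stay in integer arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-device "used" values from the listing, in hundredths of a GB,
 * in the order shown (devid 12, 11, 10, 9, 5, 2, 1, 7, 3, 4, 6, 8). */
const uint64_t dev_used[12] = {
    41475, 41475, 41499, 41499, 41499, 41474,
    41476, 41499, 41474, 41474, 41499, 41499,
};

/* Every device in the listing reports the same size, 930.99GB. */
const uint64_t dev_size = 93099;

/* Sum of n values. */
uint64_t sum(const uint64_t *v, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Raw space not yet allocated to any chunk, in hundredths of a GB. */
uint64_t unallocated(void)
{
    return 12 * dev_size - sum(dev_used, 12);
}
```

Under that assumption about 4978.42GB of 11171.88GB raw is allocated, leaving roughly 6193GB unallocated, which is in line with the 6.1T that df reports as available. That figure is raw space: anything stored with two copies (RAID1) would consume it twice as fast.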




Re: Ceph on btrfs 3.4rc

2012-05-17 Thread Josef Bacik
On Thu, May 17, 2012 at 12:29:32PM +0200, Martin Mailand wrote:
> Hi Josef,
> 
> somehow I still get the kernel Bug messages, I used your patch from
> the 16th against rc7.
> 

Was there anything above those messages?  There should have been a WARN_ON() or
something.  If not that's fine, I just need to know one way or the other so I can
figure out what to do next.  Thanks,

Josef


SSD format/mount parameters questions

2012-05-17 Thread Martin
For using SSDs:

Are there any format/mount parameters that should be set for using btrfs
on SSDs (other than the ssd mount option)?


General questions:

How long is the 'delay' for the delayed alloc?

Are file allocations aligned to 4kiB boundaries, or larger?

What byte value is used to pad unused space?

(Aside: For some, the erased state reads all 0x00, and for others the
erased state reads all 0xff.)


Background:

I've got a mix of various 120/128GB SSDs to newly set up. I will be
using ext4 on the critical ones, but also wish to compare with btrfs...

The mix includes some SSDs with the Sandforce controller that implements
its own data compression and data deduplication. How well does btrfs fit
with those compared to other non-data-compression controllers?


Regards,
Martin



Re: [PATCH 2/5] Btrfs: count the chunks which will be relocated at first

2012-05-17 Thread Ilya Dryomov
On Thu, May 17, 2012 at 07:56:53PM +0800, Miao Xie wrote:
> the balance function should count the chunks which will be relocated at first,
> and then relocate those chunks one by one.
> 
> Signed-off-by: Miao Xie mi...@cn.fujitsu.com
> ---
>  fs/btrfs/volumes.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 759d024..91da8a2 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2580,7 +2580,7 @@ again:
>  
>    chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk);
>  
> - if (!counting) {
> + if (counting) {
>    spin_lock(&fs_info->balance_lock);
>    bctl->stat.considered++;
>    spin_unlock(&fs_info->balance_lock);

__btrfs_balance() already calculates the approximate number of chunks
that will be relocated and stores that value in bctl->stat.expected.
The stat.considered counter OTOH is supposed to reflect the number of
chunks processed through balance filters and it is meant to be updated
at relocation pass, so AFAICS if (!counting) is the right test.
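
[Archive note] The two-pass scheme described in the paragraph above can be sketched as a small userspace model. This is not the __btrfs_balance() code: struct balance_stat, should_balance() and balance() are hypothetical, and a usage threshold stands in for the real balance filters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct balance_stat {
    uint64_t expected;    /* counting pass: chunks we expect to relocate */
    uint64_t considered;  /* relocation pass: chunks run through the filters */
    uint64_t completed;   /* relocation pass: chunks actually relocated */
};

/* Hypothetical filter: relocate only chunks below a usage threshold. */
int should_balance(unsigned usage_percent, unsigned limit)
{
    return usage_percent < limit;
}

void balance(const unsigned *usage, size_t n, unsigned limit,
             struct balance_stat *st)
{
    st->expected = st->considered = st->completed = 0;

    /* Pass 1 (counting = 1) only counts, pass 2 (counting = 0) relocates. */
    for (int counting = 1; counting >= 0; counting--) {
        for (size_t i = 0; i < n; i++) {
            if (!counting)
                st->considered++;
            if (!should_balance(usage[i], limit))
                continue;
            if (counting)
                st->expected++;
            else
                st->completed++;  /* stands in for the relocation itself */
        }
    }
}
```

In this model "considered" is touched only when counting is 0, which is the `if (!counting)` test the reply argues for; flipping the test would make it count chunks during the estimation pass instead.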

What exactly are you trying to fix here ?

Thanks,

Ilya


Re: [PATCH 3/5] Btrfs: pause/recover the space balance when doing remount

2012-05-17 Thread Ilya Dryomov
On Thu, May 17, 2012 at 07:57:40PM +0800, Miao Xie wrote:
> pause the space balance threads when remounting the fs to be readonly,
> and recover it when remounting it from r/o to r/w
> 
> Signed-off-by: Miao Xie mi...@cn.fujitsu.com
> ---
>  fs/btrfs/super.c   |9 -
>  fs/btrfs/volumes.c |8 +++-
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 7deb00e..ea17f0a 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1148,6 +1148,9 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
>    if (*flags & MS_RDONLY) {
>    sb->s_flags |= MS_RDONLY;
>  
> + /* pause restriper - we want to resume on remount to r/w */
> + btrfs_pause_balance(root->fs_info);
> +
>    ret = btrfs_commit_super(root);
>    if (ret)
>    goto restore;
> @@ -1174,7 +1177,10 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
>    if (ret)
>    goto restore;
>  
> - sb->s_flags &= ~MS_RDONLY;
> + if (sb->s_flags & MS_RDONLY) {
> + sb->s_flags &= ~MS_RDONLY;
> + btrfs_recover_balance(fs_info->tree_root);
> + }
>    }
>  
>    return 0;
> @@ -1190,6 +1196,7 @@ restore:
>    fs_info->alloc_start = old_alloc_start;
>    fs_info->thread_pool_size = old_thread_pool_size;
>    fs_info->metadata_ratio = old_metadata_ratio;
> +
>    return ret;
>   }
>  
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 91da8a2..c536d52 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2833,7 +2833,13 @@ static int balance_kthread(void *data)
>    mutex_lock(&fs_info->volume_mutex);
>    mutex_lock(&fs_info->balance_mutex);
>  
> - set_balance_control(bctl);
> + if (fs_info->balance_ctl) {
> + kfree(bctl);
> + bctl = fs_info->balance_ctl;
> + bctl->flags = bctl->flags | BTRFS_BALANCE_RESUME;
> + } else {
> + set_balance_control(bctl);
> + }
>  
>    if (btrfs_test_opt(fs_info->tree_root, SKIP_BALANCE)) {
>    printk(KERN_INFO "btrfs: force skipping balance\n");

This is a known bug.  There is a deeper problem here, related to the
fact that we restore balancing state not early enough and that we don't
restore it on ro mounts at all.  I have a patch in the works to fix that
problem, and it also fixes this one the right way.  I'm backed up with
other things right now, but I'll post it as soon as I get a chance.

Thanks,

Ilya


Re: Ceph on btrfs 3.4rc

2012-05-17 Thread Martin Mailand

Hi Josef,
no, there was nothing above. Here is another dmesg output.

> Was there anything above those messages?  There should have been a WARN_ON() or
> something.  If not thats fine, I just need to know one way or the other so I can
> figure out what to do next.  Thanks,
> 
> Josef


-martin

[   63.027277] Btrfs loaded
[   63.027485] device fsid 266726e1-439f-4d89-a374-7ef92d355daf devid 1 
transid 4 /dev/sdc

[   63.027750] btrfs: setting nodatacow
[   63.027752] btrfs: enabling auto defrag
[   63.027753] btrfs: disk space caching is enabled
[   63.027754] btrfs flagging fs with big metadata feature
[   63.036347] device fsid 070e2c6c-2ea5-478d-bc07-7ce3a954e2e4 devid 1 
transid 4 /dev/sdd

[   63.036624] btrfs: setting nodatacow
[   63.036626] btrfs: enabling auto defrag
[   63.036627] btrfs: disk space caching is enabled
[   63.036628] btrfs flagging fs with big metadata feature
[   63.045628] device fsid 6f7b82a9-a1b7-40c6-8b00-2c2a44481066 devid 1 
transid 4 /dev/sde

[   63.045910] btrfs: setting nodatacow
[   63.045912] btrfs: enabling auto defrag
[   63.045913] btrfs: disk space caching is enabled
[   63.045914] btrfs flagging fs with big metadata feature
[   63.831278] device fsid 46890b76-45c2-4ea2-96ee-2ea88e29628b devid 1 
transid 4 /dev/sdf

[   63.831577] btrfs: setting nodatacow
[   63.831579] btrfs: enabling auto defrag
[   63.831579] btrfs: disk space caching is enabled
[   63.831580] btrfs flagging fs with big metadata feature
[ 1521.820412] [ cut here ]
[ 1521.820424] kernel BUG at fs/btrfs/inode.c:2220!
[ 1521.820433] invalid opcode:  [#1] SMP
[ 1521.820448] CPU 4
[ 1521.820452] Modules linked in: btrfs zlib_deflate libcrc32c ext2 ses 
enclosure bonding coretemp ghash_clmulni_intel aesni_intel cryptd 
aes_x86_64 psmouse microcode serio_raw sb_edac edac_core mei(C) joydev 
ioatdma mac_hid lp parport isci libsas scsi_transport_sas usbhid hid 
ixgbe igb dca megaraid_sas mdio

[ 1521.820562]
[ 1521.820567] Pid: 3095, comm: ceph-osd Tainted: G C 
3.4.0-rc7+ #10 Supermicro X9SRi/X9SRi
[ 1521.820591] RIP: 0010:[a02532f2]  [a02532f2] 
btrfs_orphan_del+0xe2/0xf0 [btrfs]

[ 1521.820616] RSP: 0018:881013da9d18  EFLAGS: 00010282
[ 1521.820626] RAX: fffe RBX: 881013a3b7f0 RCX: 
00395dcf
[ 1521.820640] RDX: 00395dce RSI: 88101df77480 RDI: 
ea004077ddc0
[ 1521.820654] RBP: 881013da9d58 R08: 60ef800010d0 R09: 
a022ac6a
[ 1521.820667] R10:  R11: 010a R12: 
88101e378790
[ 1521.820681] R13: 88101e378400 R14: 0001 R15: 
0001
[ 1521.820695] FS:  7faa45d30700() GS:88107fc8() 
knlGS:

[ 1521.820710] CS:  0010 DS:  ES:  CR0: 80050033
[ 1521.820738] CR2: 7fe0efba6010 CR3: 001016fec000 CR4: 
000407e0
[ 1521.820767] DR0:  DR1:  DR2: 

[ 1521.820796] DR3:  DR6: 0ff0 DR7: 
0400
[ 1521.820825] Process ceph-osd (pid: 3095, threadinfo 881013da8000, 
task 881013da44a0)

[ 1521.820870] Stack:
[ 1521.820889]  0c05 88101df9c230 881013da9d38 
88101df9c230
[ 1521.820939]   88101e378400 881013a3b7f0 
880c6880f840
[ 1521.820988]  881013da9e08 a0257628 881013a3b7f0 


[ 1521.821038] Call Trace:
[ 1521.821066]  [a0257628] btrfs_truncate+0x4d8/0x650 [btrfs]
[ 1521.821096]  [81188afd] ? path_lookupat+0x6d/0x750
[ 1521.821128]  [a0259021] btrfs_setattr+0xc1/0x1b0 [btrfs]
[ 1521.821156]  [811955c3] notify_change+0x183/0x320
[ 1521.821183]  [8117889e] do_truncate+0x5e/0xa0
[ 1521.821209]  [81178a24] sys_truncate+0x144/0x1b0
[ 1521.821237]  [8165fd29] system_call_fastpath+0x16/0x1b
[ 1521.821265] Code: e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 66 0f 1f 44 00 00 
80 bb 60 fe ff ff 84 75 b4 eb ae 0f 1f 44 00 00 48 89 df e8 50 73 fe ff 
eb b8 0f 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec

[ 1521.821458] RIP  [a02532f2] btrfs_orphan_del+0xe2/0xf0 [btrfs]
[ 1521.821492]  RSP 881013da9d18
[ 1521.821758] ---[ end trace aee4c5fe92ee2a67 ]---
[ 6888.637508] btrfs: truncated 1 orphans
[ 7641.701736] [ cut here ]
[ 7641.701764] kernel BUG at fs/btrfs/inode.c:2220!
[ 7641.701789] invalid opcode:  [#2] SMP
[ 7641.701816] CPU 3
[ 7641.701819] Modules linked in: btrfs zlib_deflate libcrc32c ext2 ses 
enclosure bonding coretemp ghash_clmulni_intel aesni_intel cryptd 
aes_x86_64 psmouse microcode serio_raw sb_edac edac_core mei(C) joydev 
ioatdma mac_hid lp parport isci libsas scsi_transport_sas usbhid hid 
ixgbe igb dca megaraid_sas mdio

[ 7641.702000]
[ 7641.702030] Pid: 3064, comm: ceph-osd Tainted: G  D  C 
3.4.0-rc7+ #10 Supermicro X9SRi/X9SRi
[ 7641.702081] RIP: 0010:[a02532f2]  [a02532f2] 

Re: [PATCH 5/5] Btrfs: fix memory leak in btrfs_pause_balance()

2012-05-17 Thread Ilya Dryomov
On Thu, May 17, 2012 at 07:58:53PM +0800, Miao Xie wrote:
> We forget to free fs_info->balance_ctl in the btrfs_pause_balance()
> when umounting the fs.
> 
> Signed-off-by: Miao Xie mi...@cn.fujitsu.com
> ---
>  fs/btrfs/volumes.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index c536d52..fd7fe80 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2937,6 +2937,9 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
>    ret = -ENOTCONN;
>    }
>  
> + if (btrfs_fs_closing(fs_info) && fs_info->balance_ctl)
> + unset_balance_control(fs_info);
> +
>    mutex_unlock(&fs_info->balance_mutex);
>    return ret;
>   }

It is kfree()'d in free_fs_info(), which should be called on unmount.
Am I missing something here ?

Thanks,

Ilya


Re: btrfs: initial readahead code and prototypes

2012-05-17 Thread Arne Jansen

On 05/17/12 15:46, Dan Carpenter wrote:
> On Thu, May 17, 2012 at 03:31:50PM +0200, Arne Jansen wrote:
>> The assumption here is that if err == 0, eb is always != NULL. There's
>> even a tiny comment above the function stating this:
>>
>>    107  /* in case of err, eb might be NULL */
>>
>
> Ah, right.  I missed the comment.

Thanks for doing this kind of sanity checking :)

-Arne

>> This code changes significantly with the patch
>>
>> btrfs: extend readahead interface
>>
>> Where it is written in a more obvious way.
>
> Cool.
>
> regards,
> dan carpenter




trim malfunction in linux 3.3.6

2012-05-17 Thread Sergey E. Kolesnikov

Hello.
I've been running the Ubuntu 12.04 kernel and btrfs on two partitions of
two GPT-partitioned SSDs. Rootfs was btrfs subvol @ and homes were at
@home. When I was batch trimming with fstrim / using Ubuntu's standard
kernel 3.2.0, everything was fine. Then I compiled a vanilla 3.3.6 kernel
and tried to fstrim again, and the fs got severely damaged.

It seems that batch trim miscalculates ranges and trims some occupied
space. Can't say if GPT or other partitioning details matter.

I will try to provide any info possible, but fs is trimmed badly, and I
need this machine to be up and running, so will have to mkfs.btrfs again
and use 3.2.0 kernel.

Steps that caused corruption:
1. Created partitions on two (say /dev/sd[ab]) SSD drives with about 1G
   offset from the beginning (first partition is ext4 for /boot)
2. mkfs.btrfs /dev/sd[ab]2
3. created subvolumes @ and @home for mountpoints / and /home respectively
4. installed xubuntu 12.04
5. fstrim /
6. everything is ok
7. compiled and installed vanilla 3.3.6 kernel
8. reboot into 3.3.6
9. btrfs scrub - ok
10. fstrim /
11. fs got baaadly corrupted


Re: Ceph on btrfs 3.4rc

2012-05-17 Thread Josef Bacik
On Thu, May 17, 2012 at 05:12:55PM +0200, Martin Mailand wrote:
> Hi Josef,
> no, there was nothing above. Here is another dmesg output.
> 

Hrm ok give this a try and hopefully this is it, still couldn't reproduce.
Thanks,

Josef

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 3771b85..559e716 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -57,9 +57,6 @@ struct btrfs_inode {
/* used to order data wrt metadata */
struct btrfs_ordered_inode_tree ordered_tree;
 
-   /* for keeping track of orphaned inodes */
-   struct list_head i_orphan;
-
/* list of all the delalloc inodes in the FS.  There are times we need
 * to write all the delalloc pages to disk, and this list is used
 * to walk them all.
@@ -153,6 +150,7 @@ struct btrfs_inode {
unsigned dummy_inode:1;
unsigned in_defrag:1;
unsigned delalloc_meta_reserved:1;
+   unsigned has_orphan_item:1;
 
/*
 * always compress this one file
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ba8743b..72cdf98 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1375,7 +1375,7 @@ struct btrfs_root {
struct list_head root_list;
 
spinlock_t orphan_lock;
-   struct list_head orphan_list;
+   atomic_t orphan_inodes;
struct btrfs_block_rsv *orphan_block_rsv;
int orphan_item_inserted;
int orphan_cleanup_state;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 19f5b45..25dba7a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1153,7 +1153,6 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize,
root->orphan_block_rsv = NULL;
 
INIT_LIST_HEAD(&root->dirty_list);
-   INIT_LIST_HEAD(&root->orphan_list);
INIT_LIST_HEAD(&root->root_list);
spin_lock_init(&root->orphan_lock);
spin_lock_init(&root->inode_lock);
@@ -1166,6 +1165,7 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize,
atomic_set(&root->log_commit[0], 0);
atomic_set(&root->log_commit[1], 0);
atomic_set(&root->log_writers, 0);
+   atomic_set(&root->orphan_inodes, 0);
root->log_batch = 0;
root->log_transid = 0;
root->last_log_commit = 0;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 54ae3df..7cc1c96 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2104,12 +2104,12 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
struct btrfs_block_rsv *block_rsv;
int ret;
 
-   if (!list_empty(&root->orphan_list) ||
+   if (atomic_read(&root->orphan_inodes) ||
root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE)
return;
 
spin_lock(&root->orphan_lock);
-   if (!list_empty(&root->orphan_list)) {
+   if (atomic_read(&root->orphan_inodes)) {
spin_unlock(&root->orphan_lock);
return;
}
@@ -2166,8 +2166,8 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode)
block_rsv = NULL;
}
 
-   if (list_empty(&BTRFS_I(inode)->i_orphan)) {
-   list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list);
+   if (!BTRFS_I(inode)->has_orphan_item) {
+   BTRFS_I(inode)->has_orphan_item = 1;
 #if 0
/*
 * For proper ENOSPC handling, we should do orphan
@@ -2180,6 +2180,7 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode)
insert = 1;
 #endif
insert = 1;
+   atomic_inc(&root->orphan_inodes);
}
 
if (!BTRFS_I(inode)->orphan_meta_reserved) {
@@ -2198,6 +2199,9 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode)
if (insert >= 1) {
ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
if (ret && ret != -EEXIST) {
+   spin_lock(&root->orphan_lock);
+   BTRFS_I(inode)->has_orphan_item = 0;
+   spin_unlock(&root->orphan_lock);
btrfs_abort_transaction(trans, root, ret);
return ret;
}
@@ -2227,13 +2231,21 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode)
int release_rsv = 0;
int ret = 0;
 
+   /*
+* evict_inode gets called without holding the i_mutex so we need to
+* take the orphan lock to make sure we are safe in messing with these.
+*/
spin_lock(&root->orphan_lock);
-   if (!list_empty(&BTRFS_I(inode)->i_orphan)) {
-   list_del_init(&BTRFS_I(inode)->i_orphan);
-   delete_item = 1;
+   if (BTRFS_I(inode)->has_orphan_item) {
+   if (trans) {
+   BTRFS_I(inode)->has_orphan_item = 0;
+   delete_item = 1;
+   } else {
+   WARN_ON(1);
+   }
}
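
[Archive note] The patch above replaces membership in a per-root orphan list with a per-inode flag plus a per-root atomic counter. The userspace sketch below models only that bookkeeping. All names are hypothetical, locking is omitted (the patch takes root->orphan_lock around the flag), and the decrement in orphan_del() is an assumption, since the part of the patch that would drop the counter is cut off in this archive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the fields the patch touches. */
struct root {
    atomic_int orphan_inodes;   /* replaces the orphan_list head */
};

struct inode {
    bool has_orphan_item;       /* replaces the i_orphan list entry */
};

/* Returns true if the caller still has to insert the on-disk orphan item. */
bool orphan_add(struct root *root, struct inode *inode)
{
    if (inode->has_orphan_item)
        return false;           /* already tracked, nothing to insert */
    inode->has_orphan_item = true;
    atomic_fetch_add(&root->orphan_inodes, 1);
    return true;
}

/* Returns true if the caller has to delete the on-disk orphan item. */
bool orphan_del(struct root *root, struct inode *inode)
{
    if (!inode->has_orphan_item)
        return false;
    inode->has_orphan_item = false;
    atomic_fetch_sub(&root->orphan_inodes, 1);  /* assumed, see note above */
    return true;
}

/* Stand-in for the !list_empty(&root->orphan_list) checks. */
bool root_has_orphans(struct root *root)
{
    return atomic_load(&root->orphan_inodes) != 0;
}
```

The counter lets code such as btrfs_orphan_commit_root() ask whether any orphans exist without walking or locking a list, while the flag keeps add and delete idempotent per inode.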
 

Re: Ceph on btrfs 3.4rc

2012-05-17 Thread Christian Brunner
2012/5/17 Josef Bacik jo...@redhat.com:
> On Thu, May 17, 2012 at 05:12:55PM +0200, Martin Mailand wrote:
>> Hi Josef,
>> no, there was nothing above. Here is another dmesg output.
>>
>
> Hrm ok give this a try and hopefully this is it, still couldn't reproduce.
> Thanks,
>
> Josef

Well, I hate to say it, but the new patch doesn't seem to change much...

Regards,
Christian

[  123.507444] Btrfs loaded
[  202.683630] device fsid 2aa7531c-0e3c-4955-8542-6aed7ab8c1a2 devid
1 transid 4 /dev/sda
[  202.693704] btrfs: use lzo compression
[  202.697999] btrfs: enabling inode map caching
[  202.702989] btrfs: enabling auto defrag
[  202.707190] btrfs: disk space caching is enabled
[  202.712721] btrfs flagging fs with big metadata feature
[  207.839761] device fsid f81ff6a1-c333-4daf-989f-a28139f15f08 devid
1 transid 4 /dev/sdb
[  207.849681] btrfs: use lzo compression
[  207.853987] btrfs: enabling inode map caching
[  207.858970] btrfs: enabling auto defrag
[  207.863173] btrfs: disk space caching is enabled
[  207.868635] btrfs flagging fs with big metadata feature
[  210.857328] device fsid 9b905faa-f4fa-4626-9cae-2cd0287b30f7 devid
1 transid 4 /dev/sdc
[  210.867265] btrfs: use lzo compression
[  210.871560] btrfs: enabling inode map caching
[  210.876550] btrfs: enabling auto defrag
[  210.880757] btrfs: disk space caching is enabled
[  210.886228] btrfs flagging fs with big metadata feature
[  214.296287] device fsid f7990e4c-90b0-4691-9502-92b60538574a devid
1 transid 4 /dev/sdd
[  214.306510] btrfs: use lzo compression
[  214.310855] btrfs: enabling inode map caching
[  214.315905] btrfs: enabling auto defrag
[  214.320174] btrfs: disk space caching is enabled
[  214.325706] btrfs flagging fs with big metadata feature
[ 1337.937379] [ cut here ]
[ 1337.942526] kernel BUG at fs/btrfs/inode.c:2224!
[ 1337.947671] invalid opcode:  [#1] SMP
[ 1337.952255] CPU 5
[ 1337.954300] Modules linked in: btrfs zlib_deflate libcrc32c xfs
exportfs sunrpc bonding ipv6 sg pcspkr serio_raw iTCO_wdt
iTCO_vendor_support iomemory_vsl(PO) ixgbe dca mdio i7core_edac
edac_core hpsa squashfs [last unloaded: scsi_wait_scan]
[ 1337.978570]
[ 1337.980230] Pid: 6812, comm: ceph-osd Tainted: P   O
3.3.5-1.fits.1.el6.x86_64 #1 HP ProLiant DL180 G6
[ 1337.991592] RIP: 0010:[a035675c]  [a035675c]
btrfs_orphan_del+0x14c/0x150 [btrfs]
[ 1338.001897] RSP: 0018:8805e1171d38  EFLAGS: 00010282
[ 1338.007815] RAX: fffe RBX: 88061c3c8400 RCX: 00b37f48
[ 1338.015768] RDX: 00b37f47 RSI: 8805ec2a1cf0 RDI: ea0017b0a840
[ 1338.023724] RBP: 8805e1171d68 R08: 60f9d88028a0 R09: a033016a
[ 1338.031675] R10:  R11: 0004 R12: 8805de7f57a0
[ 1338.039629] R13: 0001 R14: 0001 R15: 8805ec2a5280
[ 1338.047584] FS:  7f4bffc6e700() GS:8806272a()
knlGS:
[ 1338.056600] CS:  0010 DS:  ES:  CR0: 80050033
[ 1338.063003] CR2: ff600400 CR3: 0005e34c3000 CR4: 06e0
[ 1338.070954] DR0:  DR1:  DR2: 
[ 1338.078909] DR3:  DR6: 0ff0 DR7: 0400
[ 1338.086865] Process ceph-osd (pid: 6812, threadinfo
8805e117, task 88060fa81940)
[ 1338.096268] Stack:
[ 1338.098509]  8805e1171d68 8805ec2a5280 88051235b920

[ 1338.106795]  88051235b920 0008 8805e1171e08
a036043c
[ 1338.115082]    
00011000
[ 1338.123367] Call Trace:
[ 1338.126111]  [a036043c] btrfs_truncate+0x5bc/0x640 [btrfs]
[ 1338.133213]  [a03605b6] btrfs_setattr+0xf6/0x1a0 [btrfs]
[ 1338.140105]  [811816fb] notify_change+0x18b/0x2b0
[ 1338.146320]  [81276541] ? selinux_inode_permission+0xd1/0x130
[ 1338.153699]  [81165f44] do_truncate+0x64/0xa0
[ 1338.159527]  [81172669] ? inode_permission+0x49/0x100
[ 1338.166128]  [81166197] sys_truncate+0x137/0x150
[ 1338.172244]  [8158b1e9] system_call_fastpath+0x16/0x1b
[ 1338.178936] Code: 89 e7 e8 88 7d fe ff eb 89 66 0f 1f 44 00 00 be
a4 08 00 00 48 c7 c7 59 49 3b a0 45 31 ed e8 5c 78 cf e0 45 31 f6 e9
30 ff ff ff 0f 0b eb fe 55 48 89 e5 48 83 ec 40 48 89 5d d8 4c 89 65
e0 4c
[ 1338.200623] RIP  [a035675c] btrfs_orphan_del+0x14c/0x150 [btrfs]
[ 1338.208317]  RSP 8805e1171d38
[ 1338.212681] ---[ end trace 86be14f0f863ea79 ]---


Re: Ceph on btrfs 3.4rc

2012-05-17 Thread Martin Mailand

Hi Josef,

I hit exactly the same bug as Christian with your last patch.

-martin


Re: [PATCH 2/5] Btrfs: count the chunks which will be relocated at first

2012-05-17 Thread Miao Xie
On Thu, 17 May 2012 17:58:56 +0300, Ilya Dryomov wrote:
> On Thu, May 17, 2012 at 07:56:53PM +0800, Miao Xie wrote:
>> the balance function should count the chunks which will be relocated at
>> first, and then relocate those chunks one by one.
>>
>> Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
>> ---
>>  fs/btrfs/volumes.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 759d024..91da8a2 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -2580,7 +2580,7 @@ again:
>>  
>>          chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk);
>>  
>> -        if (!counting) {
>> +        if (counting) {
>>                  spin_lock(&fs_info->balance_lock);
>>                  bctl->stat.considered++;
>>                  spin_unlock(&fs_info->balance_lock);
>
> __btrfs_balance() already calculates the approximate number of chunks
> that will be relocated and stores that value in bctl->stat.expected.
> The stat.considered counter OTOH is supposed to reflect the number of
> chunks processed through balance filters and it is meant to be updated
> at relocation pass, so AFAICS if (!counting) is the right test.
>
> What exactly are you trying to fix here ?

In fact, this number reflects the number of all the chunks that may be relocated.
Since we can know the approximate number of chunks that will be relocated
before the relocation starts, why can't we know this number at the beginning too?

Besides that, as a user, I find it very strange that this counter changes
every time I query the status of the balance. It should be a fixed number,
since it reflects the number of all the chunks that may be relocated.
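
For readers following the thread, the two-pass scheme under discussion can be sketched roughly as below. This is a simplified userspace illustration, not the kernel code: struct balance_stat, should_relocate(), balance_pass() and run_balance() are hypothetical names, and the usage-threshold filter is an assumption made only for the example. It follows the behaviour Ilya describes, where the counting pass fills in "expected" and the relocation pass updates "considered".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the balance statistics discussed above. */
struct balance_stat {
	unsigned long expected;   /* estimate produced by the counting pass */
	unsigned long considered; /* chunks run through the filters so far */
	unsigned long completed;  /* chunks actually relocated */
};

/* Hypothetical filter: relocate chunks whose usage is below a threshold. */
static bool should_relocate(int usage_percent, int threshold)
{
	return usage_percent < threshold;
}

/* One walk over all chunks; 'counting' selects which counters are touched. */
static void balance_pass(const int *usage, size_t n, int threshold,
			 bool counting, struct balance_stat *stat)
{
	assert(usage != NULL && stat != NULL);

	for (size_t i = 0; i < n; i++) {
		/* 'considered' only advances during the relocation pass. */
		if (!counting)
			stat->considered++;

		if (!should_relocate(usage[i], threshold))
			continue;

		if (counting)
			stat->expected++;  /* fixed once the first pass ends */
		else
			stat->completed++; /* the real relocation would go here */
	}
}

/* Counting pass first, then the relocation pass. */
static struct balance_stat run_balance(const int *usage, size_t n,
				       int threshold)
{
	struct balance_stat stat = {0};

	balance_pass(usage, n, threshold, true, &stat);
	balance_pass(usage, n, threshold, false, &stat);
	return stat;
}
```

In this sketch "expected" is known before any relocation starts, while "considered" grows as the second pass progresses, which is why a status query made mid-balance sees it change.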

Thanks
Miao


Re: [PATCH 5/5] Btrfs: fix memory leak in btrfs_pause_balance()

2012-05-17 Thread Miao Xie
On Thu, 17 May 2012 18:20:04 +0300, Ilya Dryomov wrote:
> On Thu, May 17, 2012 at 07:58:53PM +0800, Miao Xie wrote:
>> We forget to free fs_info->balance_ctl in the btrfs_pause_balance()
>> when umounting the fs.
>>
>> Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
>> ---
>>  fs/btrfs/volumes.c |    3 +++
>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index c536d52..fd7fe80 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -2937,6 +2937,9 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
>>                  ret = -ENOTCONN;
>>          }
>>  
>> +        if (btrfs_fs_closing(fs_info) && fs_info->balance_ctl)
>> +                unset_balance_control(fs_info);
>> +
>>          mutex_unlock(&fs_info->balance_mutex);
>>          return ret;
>>  }
>
> It is kfree()'d in free_fs_info(), which should be called on unmount.
> Am I missing something here ?

It is my mistake. Sorry.

BTW, I think freeing it in btrfs_pause_balance() is better because it is related
to the balance; otherwise readability suffers.
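
The ownership pattern Ilya points out can be sketched as follows. This is a userspace analogy only, with free() standing in for kfree(); struct balance_control, struct fs_info, pause_balance() and free_fs() here are hypothetical, simplified stand-ins and not the kernel definitions. Pausing leaves the control structure alone, and the teardown path releases it exactly once.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct balance_control {
	int paused;
};

struct fs_info {
	struct balance_control *balance_ctl;
};

/* Pausing does not free anything; it only marks the balance as paused. */
static int pause_balance(struct fs_info *fs)
{
	assert(fs != NULL);

	if (!fs->balance_ctl)
		return -1; /* no balance to pause */

	fs->balance_ctl->paused = 1;
	return 0;
}

/* Teardown path: the single place where the control structure is freed. */
static void free_fs(struct fs_info *fs)
{
	assert(fs != NULL);

	free(fs->balance_ctl); /* free(NULL) is a no-op, so no check needed */
	fs->balance_ctl = NULL;
}
```

Freeing in one teardown function avoids a double free if the pause path also released the structure; the trade-off Miao raises is that the allocation and the release then live in different parts of the code.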

Thanks
Miao