Re: The value displayed by 'ls -s' command is strange.

2010-12-09 Thread Miao Xie
On Wed, 08 Dec 2010 08:53:55 +0900, Tsutomu Itoh wrote:
 I think that the disk allocation size of each file becomes a monotone 
 increase
 when the file is made.
 But, it sometimes return to 0.  Is it correct?


 The # of blocks is:

  stat->blocks = (inode_get_bytes(inode) +
                  BTRFS_I(inode)->delalloc_bytes) >> 9;

 So I think after sub(delalloc_bytes) and before inode_add_bytes(), you may
 see 0 value.
 
 Yes, I think so too.
 But I think that state lasts too long to be explained by update timing alone...

Several months ago, someone posted a patch to get the allocated size of a
compressed file:
http://marc.info/?l=linux-btrfs&m=128109745012238&w=2
This patch may help you implement what you need.

Regards
Miao



 The result of the test at 2.6.37-rc4 is shown below.
 (see inode no. 291)

  # df -T /test14
  Filesystem    Type   1K-blocks      Used Available Use% Mounted on
  /dev/sdd14   btrfs     4162560      8736   3709440   1% /test14
  # dd if=/dev/zero of=/test14/dir/as001.26603 bs=1M count=100
  # dd if=/dev/zero of=/test14/dir/as002.26603 bs=1M count=200
  # dd if=/dev/zero of=/test14/dir/sy001.26603 bs=1M count=300 oflag=direct
  # dd if=/dev/zero of=/test14/dir/as003.26603 bs=1M count=400
  # ls -lis /test14/dir
  total 406528
     288      0 -rw-r--r-- 1 root root 104857600 Dec  7 15:07 as001.26603
     289      0 -rw-r--r-- 1 root root 209715200 Dec  7 15:07 as002.26603
  -> 291  99328 -rw-r--r-- 1 root root 419430400 Dec  7 15:08 as003.26603
     290 307200 -rw-r--r-- 1 root root 314572800 Dec  7 15:08 sy001.26603
  # sleep 3
  # ls -lis /test14/dir
  total 406528
     288      0 -rw-r--r-- 1 root root 104857600 Dec  7 15:07 as001.26603
     289      0 -rw-r--r-- 1 root root 209715200 Dec  7 15:07 as002.26603
  -> 291  99328 -rw-r--r-- 1 root root 419430400 Dec  7 15:08 as003.26603
     290 307200 -rw-r--r-- 1 root root 314572800 Dec  7 15:08 sy001.26603
  # sleep 3
  # ls -lis /test14/dir
  total 307200
     288      0 -rw-r--r-- 1 root root 104857600 Dec  7 15:07 as001.26603
     289      0 -rw-r--r-- 1 root root 209715200 Dec  7 15:07 as002.26603
  -> 291      0 -rw-r--r-- 1 root root 419430400 Dec  7 15:08 as003.26603
     290 307200 -rw-r--r-- 1 root root 314572800 Dec  7 15:08 sy001.26603
  # sleep 3
  # ls -lis /test14/dir
  total 409600
     288 102400 -rw-r--r-- 1 root root 104857600 Dec  7 15:07 as001.26603
     289      0 -rw-r--r-- 1 root root 209715200 Dec  7 15:07 as002.26603
  -> 291      0 -rw-r--r-- 1 root root 419430400 Dec  7 15:08 as003.26603
     290 307200 -rw-r--r-- 1 root root 314572800 Dec  7 15:08 sy001.26603
  # sync
  # ls -lis /test14/dir
  total 1024000
     288 102400 -rw-r--r-- 1 root root 104857600 Dec  7 15:07 as001.26603
     289 204800 -rw-r--r-- 1 root root 209715200 Dec  7 15:07 as002.26603
  -> 291 409600 -rw-r--r-- 1 root root 419430400 Dec  7 15:08 as003.26603
     290 307200 -rw-r--r-- 1 root root 314572800 Dec  7 15:08 sy001.26603

 The trace result of btrfs_getattr() is shown below.

   Dec  7 15:08:03 luna kernel: ino:291 blocks:198656 i_blocks:0 i_bytes:0 delalloc_bytes:101711872
   Dec  7 15:08:06 luna kernel: ino:291 blocks:198656 i_blocks:0 i_bytes:0 delalloc_bytes:101711872
   Dec  7 15:08:09 luna kernel: ino:291 blocks:0 i_blocks:0 i_bytes:0 delalloc_bytes:0
   Dec  7 15:08:12 luna kernel: ino:291 blocks:0 i_blocks:0 i_bytes:0 delalloc_bytes:0
   Dec  7 15:08:18 luna kernel: ino:291 blocks:819200 i_blocks:819200 i_bytes:0 delalloc_bytes:0


 Regards,
 Itoh

 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


 


Re: The value displayed by 'ls -s' command is strange.

2010-12-07 Thread Li Zefan
Tsutomu Itoh wrote:
 Hi,
 
 I think that the disk allocation size of each file becomes a monotone increase
 when the file is made.
 But, it sometimes return to 0.  Is it correct?
 

The # of blocks is:

stat->blocks = (inode_get_bytes(inode) +
                BTRFS_I(inode)->delalloc_bytes) >> 9;

So I think that after delalloc_bytes is subtracted and before
inode_add_bytes() is called, you may see a value of 0.
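
The arithmetic in that expression can be checked against the trace in the
original report. The following is a minimal user-space sketch, not kernel
code: blocks_from_bytes() and ls_kblocks() are hypothetical helper names, and
it assumes that ls -s prints 1K units (the usual GNU default), so the number
shown by ls is half of the 512-byte block count.

```c
#include <stdint.h>

/* stat->blocks counts 512-byte units, hence the right shift by 9. */
static uint64_t blocks_from_bytes(uint64_t inode_bytes, uint64_t delalloc_bytes)
{
    return (inode_bytes + delalloc_bytes) >> 9;
}

/* Assumption: ls -s reports 1K units, i.e. two 512-byte blocks per unit. */
static uint64_t ls_kblocks(uint64_t blocks)
{
    return blocks / 2;
}
```

With the values traced at 15:08:03 (i_bytes:0, delalloc_bytes:101711872),
blocks_from_bytes() gives 198656 and ls_kblocks() gives 99328, matching both
the trace and the ls output. Once both inputs are 0, the result is 0, which
is the value seen at 15:08:09.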

 


Re: The value displayed by 'ls -s' command is strange.

2010-12-07 Thread Chris Mason
Excerpts from Tsutomu Itoh's message of 2010-12-07 02:59:52 -0500:
 Hi,
 
 I think that the disk allocation size of each file becomes a monotone increase
 when the file is made.
 But, it sometimes return to 0.  Is it correct?

Well, there's a window during the processing of delayed allocation where
we don't have the bytes recorded as delalloc and we don't have the bytes
recorded in the inode yet.  That's why they are showing up as zero.

We don't call inode_add_bytes() until after we insert the extent, but we
drop the delalloc byte count on the file before the IO is done.

Fixing it will be a little tricky because all the extent accounting
assumes the inode_add_bytes happens at extent insertion time.

-chris

 
 


Re: The value displayed by 'ls -s' command is strange.

2010-12-07 Thread Mike Fedyk
On Tue, Dec 7, 2010 at 10:44 AM, Chris Mason <chris.ma...@oracle.com> wrote:
 Excerpts from Tsutomu Itoh's message of 2010-12-07 02:59:52 -0500:
 Hi,

 I think that the disk allocation size of each file becomes a monotone 
 increase
 when the file is made.
 But, it sometimes return to 0.  Is it correct?

 Well, there's a window during the processing of delayed allocation where
 we don't have the bytes recorded as delalloc and we don't have the bytes
 recorded in the inode yet.  That's why they are showing up as zero.

 We don't call inode_add_bytes() until after we insert the extent, but we
 drop the delalloc byte count on the file before the IO is done.

 Fixing it will be a little tricky because all the extent accounting
 assumes the inode_add_bytes happens at extent insertion time.


How does opening the inode with O_APPEND during this window know where
to write the bytes?  If it's a pointer/cursor to the EOF then that
size could be used during the window.  Is that right?





Re: The value displayed by 'ls -s' command is strange.

2010-12-07 Thread Chris Mason
Excerpts from Mike Fedyk's message of 2010-12-07 14:16:55 -0500:
 On Tue, Dec 7, 2010 at 10:44 AM, Chris Mason chris.ma...@oracle.com wrote:
  Excerpts from Tsutomu Itoh's message of 2010-12-07 02:59:52 -0500:
  Hi,
 
  I think that the disk allocation size of each file becomes a monotone 
  increase
  when the file is made.
  But, it sometimes return to 0.  Is it correct?
 
  Well, there's a window during the processing of delayed allocation where
  we don't have the bytes recorded as delalloc and we don't have the bytes
  recorded in the inode yet.  That's why they are showing up as zero.
 
  We don't call inode_add_bytes() until after we insert the extent, but we
  drop the delalloc byte count on the file before the IO is done.
 
  Fixing it will be a little tricky because all the extent accounting
  assumes the inode_add_bytes happens at extent insertion time.
 
 
 How does opening the inode with O_APPEND during this window know where
 to write the bytes?  If it's a pointer/cursor to the EOF then that
 size could be used during the window.  Is that right?

This counter records the number of blocks allocated to the file, and
reading it with ls -l or stat is somewhat racy by nature.  Most of the
time it's fine; btrfs just has a really big window where the results from
ls -l seem wrong.

But, the counter really means nothing to the btrfs internals.  When we
do file operations we go based on the extent pointers we find in the
tree and i_size (i_size is strictly maintained).
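
From user space, the two values discussed here arrive in separate fields of
struct stat: st_size carries i_size, and st_blocks carries the block counter.
A minimal sketch of comparing them follows. The helper names are
hypothetical, and it assumes st_blocks is in 512-byte units, as on Linux.
Sparse or compressed files also legitimately report fewer allocated bytes
than their size, so this check alone cannot identify the delalloc window.

```c
#include <stdint.h>
#include <sys/stat.h>

/* Bytes accounted as allocated, assuming 512-byte st_blocks units. */
static int64_t allocated_bytes(const struct stat *st)
{
    return (int64_t)st->st_blocks * 512;
}

/* Nonzero when the block counter covers less than the logical size. */
static int looks_underallocated(const struct stat *st)
{
    return allocated_bytes(st) < (int64_t)st->st_size;
}
```

For as003.26603 in the report, st_size stays at 419430400 throughout while
st_blocks passes through 0, so looks_underallocated() would be true during
the window and false again after sync, when st_blocks is 819200.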

The incorrect results are confusing but they don't hurt the metadata
itself.

-chris


Re: The value displayed by 'ls -s' command is strange.

2010-12-07 Thread Mike Fedyk
On Tue, Dec 7, 2010 at 11:29 AM, Chris Mason <chris.ma...@oracle.com> wrote:
 Excerpts from Mike Fedyk's message of 2010-12-07 14:16:55 -0500:
 On Tue, Dec 7, 2010 at 10:44 AM, Chris Mason chris.ma...@oracle.com wrote:
  Excerpts from Tsutomu Itoh's message of 2010-12-07 02:59:52 -0500:
  Hi,
 
  I think that the disk allocation size of each file becomes a monotone 
  increase
  when the file is made.
  But, it sometimes return to 0.  Is it correct?
 
  Well, there's a window during the processing of delayed allocation where
  we don't have the bytes recorded as delalloc and we don't have the bytes
  recorded in the inode yet.  That's why they are showing up as zero.
 
  We don't call inode_add_bytes() until after we insert the extent, but we
  drop the delalloc byte count on the file before the IO is done.
 
  Fixing it will be a little tricky because all the extent accounting
  assumes the inode_add_bytes happens at extent insertion time.
 

 How does opening the inode with O_APPEND during this window know where
 to write the bytes?  If it's a pointer/cursor to the EOF then that
 size could be used during the window.  Is that right?

 This counter records the number of blocks allocated to the file, and
 reading it with ls -l or stat is somewhat racey by nature.  Most of the
 time its fine, btrfs just has a really big window where the results from
 ls -l seem wrong.


I see.  Is it using per-cpu vars or something similar?

 But, the counter really means nothing to the btrfs internals.  When we
 do file operations we go based on the extent pointers we find in the
 tree and i_size (i_size is strictly maintained).


Would it be too heavy of an operation to have stat walk the btrfs tree
to get its data?

 The incorrect results are confusing but they don't hurt the metadata
 itself.


Re: The value displayed by 'ls -s' command is strange.

2010-12-07 Thread Chris Mason
Excerpts from Mike Fedyk's message of 2010-12-07 15:07:08 -0500:
 On Tue, Dec 7, 2010 at 11:29 AM, Chris Mason chris.ma...@oracle.com wrote:
  Excerpts from Mike Fedyk's message of 2010-12-07 14:16:55 -0500:
  On Tue, Dec 7, 2010 at 10:44 AM, Chris Mason chris.ma...@oracle.com 
  wrote:
   Excerpts from Tsutomu Itoh's message of 2010-12-07 02:59:52 -0500:
   Hi,
  
   I think that the disk allocation size of each file becomes a monotone 
   increase
   when the file is made.
   But, it sometimes return to 0.  Is it correct?
  
   Well, there's a window during the processing of delayed allocation where
   we don't have the bytes recorded as delalloc and we don't have the bytes
   recorded in the inode yet.  That's why they are showing up as zero.
  
   We don't call inode_add_bytes() until after we insert the extent, but we
   drop the delalloc byte count on the file before the IO is done.
  
   Fixing it will be a little tricky because all the extent accounting
   assumes the inode_add_bytes happens at extent insertion time.
  
 
  How does opening the inode with O_APPEND during this window know where
  to write the bytes?  If it's a pointer/cursor to the EOF then that
  size could be used during the window.  Is that right?
 
  This counter records the number of blocks allocated to the file, and
  reading it with ls -l or stat is somewhat racey by nature.  Most of the
  time its fine, btrfs just has a really big window where the results from
  ls -l seem wrong.
 
 
 I see.  Is it using per-cpu vars or something similar?

Our stat function returns the block count in the inode plus the number
of bytes we have accounted as delayed allocation.

As we do writes to the file, the delayed allocation count goes up and
then eventually we decide we need to do some IO.

Before we do the IO, we have to decide where on the disk to write the
extents.  Once that is decided, we decrement the count of delayed
allocation bytes.

This is when stat starts returning the wrong answer.

Then we do the IO, and when the IO is done we actually insert the file
extents into the file metadata.  This is when stat starts returning the
right answer again.
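
The sequence above can be modelled with two counters. This is a toy
user-space sketch, assuming nothing beyond the description in this mail:
struct toy_inode and the toy_* functions are made up for illustration and
are not the kernel's data structures or call paths.

```c
#include <stdint.h>

/* Toy model of the two counters that feed stat; not the kernel structures. */
struct toy_inode {
    uint64_t i_bytes;        /* bytes recorded in the inode */
    uint64_t delalloc_bytes; /* bytes accounted as delayed allocation */
};

/* Same formula as stat->blocks: 512-byte units. */
static uint64_t toy_stat_blocks(const struct toy_inode *ino)
{
    return (ino->i_bytes + ino->delalloc_bytes) >> 9;
}

/* Buffered write: only the delayed allocation count goes up. */
static void toy_write(struct toy_inode *ino, uint64_t n)
{
    ino->delalloc_bytes += n;
}

/* Placement decided: the delalloc count is dropped before the IO runs.
 * Returns the bytes that are now in flight and counted nowhere. */
static uint64_t toy_start_io(struct toy_inode *ino)
{
    uint64_t in_flight = ino->delalloc_bytes;
    ino->delalloc_bytes = 0;
    return in_flight;
}

/* IO done: the extent is inserted and the bytes are added to the inode. */
static void toy_finish_io(struct toy_inode *ino, uint64_t in_flight)
{
    ino->i_bytes += in_flight;
}
```

Writing 419430400 bytes and calling toy_stat_blocks() after each step gives
819200, then 0 between toy_start_io() and toy_finish_io(), then 819200
again. The middle value is the window in which ls -s shows 0.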

The whole setup sounds strange, but this is how btrfs implements the
semantics from data=ordered.  We don't update the file to point to
the new blocks until after the IO is done, so we never have to wait on
the data IO before we can do a transaction commit.  It avoids all kinds
of latencies with fsync and other problems.

One easy solution is to just add another counter in the in-memory inode
for the number of bytes in flight that aren't accounted for in other
places.  But I'd rather not make the inode any bigger, so I'll have to
think if we can solve this another way.
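
For illustration only, the extra-counter idea could look like the sketch
below. It is a user-space toy, not a proposed patch: struct counted_inode,
in_flight_bytes and the counted_* functions are hypothetical names, and the
additional 8-byte field is exactly the inode growth mentioned above as the
drawback.

```c
#include <stdint.h>

/* Toy inode with a third, hypothetical counter for bytes under IO. */
struct counted_inode {
    uint64_t i_bytes;         /* bytes recorded in the inode */
    uint64_t delalloc_bytes;  /* bytes accounted as delayed allocation */
    uint64_t in_flight_bytes; /* hypothetical: placed on disk, IO not done */
};

/* stat sums all three counters, so in-flight bytes stay visible. */
static uint64_t counted_stat_blocks(const struct counted_inode *ino)
{
    return (ino->i_bytes + ino->delalloc_bytes + ino->in_flight_bytes) >> 9;
}

static void counted_write(struct counted_inode *ino, uint64_t n)
{
    ino->delalloc_bytes += n;
}

/* Move the bytes from delalloc to in-flight instead of dropping them. */
static void counted_start_io(struct counted_inode *ino)
{
    ino->in_flight_bytes += ino->delalloc_bytes;
    ino->delalloc_bytes = 0;
}

/* On completion, move n bytes from in-flight into the inode byte count. */
static void counted_finish_io(struct counted_inode *ino, uint64_t n)
{
    ino->in_flight_bytes -= n;
    ino->i_bytes += n;
}
```

In this model the block count no longer dips while IO is in progress,
because every byte is always held by exactly one of the three counters.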

 
  But, the counter really means nothing to the btrfs internals.  When we
  do file operations we go based on the extent pointers we find in the
  tree and i_size (i_size is strictly maintained).
 
 
 Would it be too heavy of an operation to have stat walk the btrfs tree
 to get its data?
 

I'm afraid so, stat is fairly performance critical.

-chris


Re: The value displayed by 'ls -s' command is strange.

2010-12-07 Thread Mike Fedyk
On Tue, Dec 7, 2010 at 12:15 PM, Chris Mason <chris.ma...@oracle.com> wrote:
 Excerpts from Mike Fedyk's message of 2010-12-07 15:07:08 -0500:
 On Tue, Dec 7, 2010 at 11:29 AM, Chris Mason chris.ma...@oracle.com wrote:
  Excerpts from Mike Fedyk's message of 2010-12-07 14:16:55 -0500:
  On Tue, Dec 7, 2010 at 10:44 AM, Chris Mason chris.ma...@oracle.com 
  wrote:
   Excerpts from Tsutomu Itoh's message of 2010-12-07 02:59:52 -0500:
   Hi,
  
   I think that the disk allocation size of each file becomes a monotone 
   increase
   when the file is made.
   But, it sometimes return to 0.  Is it correct?
  
   Well, there's a window during the processing of delayed allocation where
   we don't have the bytes recorded as delalloc and we don't have the bytes
   recorded in the inode yet.  That's why they are showing up as zero.
  
   We don't call inode_add_bytes() until after we insert the extent, but we
   drop the delalloc byte count on the file before the IO is done.
  
   Fixing it will be a little tricky because all the extent accounting
   assumes the inode_add_bytes happens at extent insertion time.
  
 
  How does opening the inode with O_APPEND during this window know where
  to write the bytes?  If it's a pointer/cursor to the EOF then that
  size could be used during the window.  Is that right?
 
  This counter records the number of blocks allocated to the file, and
  reading it with ls -l or stat is somewhat racey by nature.  Most of the
  time its fine, btrfs just has a really big window where the results from
  ls -l seem wrong.
 

 I see.  Is it using per-cpu vars or something similar?


OK, so to make sure I fully understand, I'm going to write some pseudocode
based on your description.

 Our stat function returns the block count in the inode plus the number
 of bytes we have accounted as delayed allocation.


stat = inode_a1.bytes + inode_a1_delayed_allocation_bytes

 As we do writes to the file, the delayed allocation count goes up and
 then eventually we decide we need to do some IO.

 Before we do the IO, we have to decide where on the disk to write the
 extents.

inode_a2 = inode_a1

inode_a1 and inode_a2 are the same inode, but inode_a2 has a different
list of extents and is not written yet (in the case of appending, most
of the extents will be the same in the two extent lists, but inode_a2
will have more extents for the newly appended data)

 Once that is decided, we decrement the count of delayed
 allocation bytes.

 This is when stat starts returning the wrong answer.


inode_a2.bytes += inode_a1_delayed_allocation_bytes
inode_a1_delayed_allocation_bytes -= inode_a1_delayed_allocation_bytes
stat = inode_a1.bytes + inode_a1_delayed_allocation_bytes

Is it possible to have stat read from inode_a2 during this window?

So it would be instead:

stat = inode_a2.bytes

 Then we do the IO, and when the IO is done we actually insert the file
 extents into the file metadata.  This is when stat starts returning the
 right answer again.


/* implicit when write completes */
inode_a1 = inode_a2
kfree(inode_a2)
stat = inode_a1.bytes + inode_a1_delayed_allocation_bytes

 The whole setup sounds strange, but this is how btrfs implements the
 semantics from data=ordered.  We don't update the file to point to
 the new blocks until after the IO is done, so we never have to wait on
 the data IO before we can do a transaction commit.  It avoids all kinds
 of latencies with fsync and other problems.

 One easy solution is to just add another counter in the in-memory inode
 for the number of bytes in flight that aren't accounted for in other
 places.  But I'd rather not make the inode any bigger, so I'll have to
 think if we can solve this another way.



Re: The value displayed by 'ls -s' command is strange.

2010-12-07 Thread Tsutomu Itoh

(2010/12/08 5:15), Chris Mason wrote:
 Excerpts from Mike Fedyk's message of 2010-12-07 15:07:08 -0500:
 On Tue, Dec 7, 2010 at 11:29 AM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Mike Fedyk's message of 2010-12-07 14:16:55 -0500:
 On Tue, Dec 7, 2010 at 10:44 AM, Chris Mason chris.ma...@oracle.com 
 wrote:
 Excerpts from Tsutomu Itoh's message of 2010-12-07 02:59:52 -0500:
 Hi,

 I think that the disk allocation size of each file becomes a monotone 
 increase
 when the file is made.
 But, it sometimes return to 0.  Is it correct?

 Well, there's a window during the processing of delayed allocation where
 we don't have the bytes recorded as delalloc and we don't have the bytes
 recorded in the inode yet.  That's why they are showing up as zero.

 We don't call inode_add_bytes() until after we insert the extent, but we
 drop the delalloc byte count on the file before the IO is done.

 Fixing it will be a little tricky because all the extent accounting
 assumes the inode_add_bytes happens at extent insertion time.


 How does opening the inode with O_APPEND during this window know where
 to write the bytes?  If it's a pointer/cursor to the EOF then that
 size could be used during the window.  Is that right?

 This counter records the number of blocks allocated to the file, and
 reading it with ls -l or stat is somewhat racey by nature.  Most of the
 time its fine, btrfs just has a really big window where the results from
 ls -l seem wrong.


 I see.  Is it using per-cpu vars or something similar?
 
 Our stat function returns the block count in the inode plus the number
 of bytes we have accounted as delayed allocation.
 
 As we do writes to the file, the delayed allocation count goes up and
 then eventually we decide we need to do some IO.
 
 Before we do the IO, we have to decide where on the disk to write the
 extents.  Once that is decided, we decrement the count of delayed
 allocation bytes.
 
 This is when stat starts returning the wrong answer.
 
 Then we do the IO, and when the IO is done we actually insert the file
 extents into the file metadata.  This is when stat starts returning the
 right answer again.

I understand.
However, I worry that users will be confused because the incorrect state
lasts so long.

 
 The whole setup sounds strange, but this is how btrfs implements the
 semantics from data=ordered.  We don't update the file to point to
 the new blocks until after the IO is done, so we never have to wait on
 the data IO before we can do a transaction commit.  It avoids all kinds
 of latencies with fsync and other problems.
 
 One easy solution is to just add another counter in the in-memory inode
 for the number of bytes in flight that aren't accounted for in other
 places.  But I'd rather not make the inode any bigger, so I'll have to
 think if we can solve this another way.
 

 But, the counter really means nothing to the btrfs internals.  When we
 do file operations we go based on the extent pointers we find in the
 tree and i_size (i_size is strictly maintained).


 Would it be too heavy of an operation to have stat walk the btrfs tree
 to get its data?

 
 I'm afraid so, stat is fairly performance critical.
 
 -chris
