Re: [PATCH] xfstests 276: fix error 'FIBMAP: Invalid argument'

2013-05-26 Thread Zhi Yong Wu
HI,

  This issue still exists when xfstests is run on debian although
xfstests has contained this patch.

root@debian-i386:/home/zwu/xfstests# git describe
linux-v3.8-122-gdc75401
root@debian-i386:/home/zwu/xfstests# uname -a
Linux debian-i386 3.10.0-rc2+ #3 SMP Sun May 26 14:04:13 CST 2013
x86_64 GNU/Linux
root@debian-i386:/home/zwu/xfstests#

 - output mismatch (see /home/zwu/xfstests/results/btrfs/276.out.bad)
--- tests/btrfs/276.out2013-05-25 15:09:01.0 +
+++ /home/zwu/xfstests/results/btrfs/276.out.bad2013-05-26
09:31:49.962876905 +
@@ -1,3 +1,32393 @@
 QA output created by 276
 *** test backref walking
+FIBMAP: Invalid argument
+FIBMAP: Invalid argument
+FIBMAP: Invalid argument
+FIBMAP: Invalid argument
+FIBMAP: Invalid argument
 ...
 (Run 'diff -u tests/btrfs/276.out
/home/zwu/xfstests/results/btrfs/276.out.bad' to see the entire diff)
[ 3877.215268] device fsid 8367602a-14fa-49eb-aff0-44da1faa3b45 devid
1 transid 239 /dev/vdb
[ 3877.218417] btrfs: disk space caching is enabled
Ran: btrfs/276
Failures: btrfs/276
Failed 1 of 1 tests

On Wed, Mar 6, 2013 at 2:43 AM, Rich Johnston rjohns...@sgi.com wrote:
 This patch has been committed.

 --Rich

 commit eba3a5206cd7f8df73d6e6dbf2bb0afca677fca8
 Author: Wang Sheng-Hui shh...@gmail.com
 Date:   Thu Feb 28 02:05:56 2013 +

 xfstests 276: fix error 'FIBMAP: Invalid argument'


 Commit 05dadc0 (Btrfs: add fiemap's flag check) added validity
 checks to the fiemap flags and hence invalid flags are now being
 rejected. Test 276 passes an invalid fiemap flag to btrfs, and so
 that test fails since this commit was added.

 Btrfs doesn't support FIEMAP_FLAG_XATTR, which is enabled by -x option
 of filefrag, and will fail with
 'FIBMAP: Invalid argument'
 for 'filefrag -vx'. 'filefrag -vx' fails on btrfs with
  'FIEMAP failed with unsupported flags 2'
 Remove the '-x' option.

 ___
 xfs mailing list
 x...@oss.sgi.com
 http://oss.sgi.com/mailman/listinfo/xfs



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 00/12] VFS hot tracking

2013-05-23 Thread Zhi Yong Wu
HI, Al Viro.

  I have incorporated all comments from all reviewers and waited for
so long time. If you have no comments,  can you merge the patchset?
thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 00/12] VFS hot tracking

2013-05-21 Thread Zhi Yong Wu
Ping...

On Tue, May 14, 2013 at 8:59 AM,  zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   The patchset is trying to introduce hot tracking function in
 VFS layer, which will keep track of real disk I/O in memory.
 By it, you will easily know more details about disk I/O, and
 then detect where disk I/O hot spots are. Also, specific FS
 can take use of it to do accurate defragment, and hot relocation
 support, etc.

   After V1 was sent out, Chandra Seetharaman has reviewed and
 made a lot of comments, thanks a lot to him. Not it's time to
 send out its V2 for external review, any comments or ideas are
 appreciated, thanks.

 NOTE:

   The patchset can be obtained via my kernel dev git on github:
 git://github.com/wuzhy/kernel.git hot_tracking
   If you're interested, you can also review them via
 https://github.com/wuzhy/kernel/commits/hot_tracking

   For how to use and more other info and performance report,
 please check hot_tracking.txt in Documentation and following
 links:
   1.) http://lwn.net/Articles/525651/
   2.) https://lkml.org/lkml/2012/12/20/199

 Changelog from v1:
  - Refactored to be under RCU [Chandra Seetharaman]
  - Merged some code changes [Chandra Seetharaman]
  - Fixed some issues [Chandra Seetharaman]

 v1:
  - Solved 64 bits inode number issue. [David Sterba]
  - Embed struct hot_type in struct file_system_type [Darrick J. Wong]
  - Cleanup Some issues [David Sterba]
  - Use a static hot debugfs root [Greg KH]

 rfcv4:
  - Introduce hot func registering framework [Zhiyong]
  - Remove global variable for hot tracking [Zhiyong]
  - Add btrfs hot tracking support [Zhiyong]

 rfcv3:
  1.) Rewritten debugfs support based seq_file operation. [Dave Chinner]
  2.) Refactored workqueue support. [Dave Chinner]
  3.) Turn some Micro into be tunable [Zhiyong, Liu Zheng]
TIME_TO_KICK, and HEAT_UPDATE_DELAY
  4.) Cleanedup a lot of other issues [Dave Chinner]

 rfcv2:
  1.) Converted to Radix trees, not RB-tree [Zhiyong, Dave Chinner]
  2.) Added memory shrinker [Dave Chinner]
  3.) Converted to one workqueue to update map info periodically [Dave Chinner]
  4.) Cleanedup a lot of other issues [Dave Chinner]

 rfcv1:
  1.) Reduce new files and put all in fs/hot_tracking.[ch] [Dave Chinner]
  2.) The first three patches can probably just be flattened into one.
 [Marco Stornelli , Dave Chinner]

 Zhi Yong Wu (12):
   VFS hot tracking: introduce some data structures
   VFS hot tracking: add i/o freq tracking hooks
   VFS hot tracking: add one workqueue to update hot map
   VFS hot tracking: register one shrinker
   VFS hot tracking, rcu: introduce one rcu macro for list
   VFS hot tracking, seq_file: introduce one set of rcu seq_list
 interfaces
   VFS hot tracking: add debugfs support
   VFS hot tracking: add one ioctl interface
   VFS hot tracking, procfs: add two proc interfaces
   VFS hot tracking, btrfs: add hot tracking support
   VFS hot tracking: add documentation
   VFS hot tracking: add fs hot type support

  Documentation/filesystems/00-INDEX |2 +
  Documentation/filesystems/hot_tracking.txt |  256 ++
  fs/Makefile|2 +-
  fs/btrfs/ctree.h   |1 +
  fs/btrfs/super.c   |   22 +-
  fs/compat_ioctl.c  |5 +
  fs/dcache.c|2 +
  fs/direct-io.c |5 +
  fs/hot_tracking.c  | 1320 
 
  fs/hot_tracking.h  |   67 ++
  fs/ioctl.c |   70 ++
  fs/namei.c |2 +
  fs/seq_file.c  |   37 +
  include/linux/fs.h |5 +
  include/linux/hot_tracking.h   |  175 
  include/linux/rculist.h|5 +
  include/linux/seq_file.h   |7 +
  kernel/sysctl.c|   14 +
  mm/filemap.c   |6 +
  mm/page-writeback.c|   12 +
  mm/readahead.c |6 +
  21 files changed, 2019 insertions(+), 2 deletions(-)
  create mode 100644 Documentation/filesystems/hot_tracking.txt
  create mode 100644 fs/hot_tracking.c
  create mode 100644 fs/hot_tracking.h
  create mode 100644 include/linux/hot_tracking.h

 --
 1.7.11.7

 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 0/5] BTRFS hot relocation support

2013-05-19 Thread Zhi Yong Wu
On Sun, May 19, 2013 at 6:41 PM, Martin Steigerwald mar...@lichtvoll.de wrote:
 Am Donnerstag, 9. Mai 2013, 07:13:56 schrieb Zhi Yong Wu:
 HI, all

 Hi!

I saw that bcache will be merged into kernel upstream soon, so i
 want to know if btrfs hot relocation support is still meanful, if no,
 i will not continue to work on it. can anyone let me know this?
 thanks.

 I really look forward to VFS hot data tracking with BTRFS (and other
 filesystem) support.
I also really want to see this will happen.

 ZFS and BTRFS have shown that RAID support within the filesystem can make a 
 lot
 of sense. I think hot relocation is another area which can be done more
 accurate within the filesystem. I think there are several features only
 possible by going this route:

 1) Mark files as hot via ioctl. MySQL can mark InnoDB journal files for 
 example.
It has been implemented to get its hot status of one specified file
via ioctl. It is not hard to enable its set function. Before it is
done, i more hope that the core of VFS hot tracking can get merged by
kernel upstream. You know, it is harder if too new functions have been
done.
We can put it in to-do list at first, what do you think?

 2) Proper BTRFS RAID support (as noted elsewhere in this thread)

 3) Easier setup. With BTRFS flexibility I would expect that a SSD as hot data
 cache can be added and removed on the fly during filesystem is mounted. Only
 seems supported at mkfs-time as I read the patch docs, but from my basic
 technical understanding of BTRFS it can be extented to be done on the fly with
 a mounted FS as well.
Yes, it is supported only at mkfs time. It shouldn't be hard to enable
adding or removing a nonrotating disk as hot cache on the fly. You
know, before this is done, i would like to make sure that the current
design for BTRFS hot relocation is this RFC patchset is going in the
direction and is appropriate. If it is going in wrong direction, we
will be doing a lot of meanless work. E.g. How about the current idea
on introducing one new block group for nonrotating disks.
When working on this work, i were trying to change the least current
btrfs code as far as possible. You know, if we change too many current
btrfs code, it will introduce regression bug more easily, and so the
patchset will be also harder to get accepted by btrfs upstream.

What do you think?



 Block based caching can make sense for other use cases like raw device with
 Oracle DB on it or maybe a swap device. Or for filesystems not (yet) 
 supporting
 VFS hot data tracking.

 Thanks,
 --
 Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
 GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7



--
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 0/5] BTRFS hot relocation support

2013-05-19 Thread Zhi Yong Wu
On Sun, May 19, 2013 at 6:41 PM, Martin Steigerwald mar...@lichtvoll.de wrote:
 Am Donnerstag, 9. Mai 2013, 07:13:56 schrieb Zhi Yong Wu:
 HI, all

 Hi!

I saw that bcache will be merged into kernel upstream soon, so i
 want to know if btrfs hot relocation support is still meanful, if no,
 i will not continue to work on it. can anyone let me know this?
 thanks.

 I really look forward to VFS hot data tracking with BTRFS (and other
 filesystem) support.

 ZFS and BTRFS have shown that RAID support within the filesystem can make a 
 lot
 of sense. I think hot relocation is another area which can be done more
 accurate within the filesystem. I think there are several features only
 possible by going this route:

 1) Mark files as hot via ioctl. MySQL can mark InnoDB journal files for 
 example.

 2) Proper BTRFS RAID support (as noted elsewhere in this thread)
After the current design draft for BTRFS hot relocation in this RFC
get basic agreement as btrfs community, maintainer or main developers,
i will dive into this further.
By the way, it is in my to-do list, thanks.

 3) Easier setup. With BTRFS flexibility I would expect that a SSD as hot data
 cache can be added and removed on the fly during filesystem is mounted. Only
 seems supported at mkfs-time as I read the patch docs, but from my basic
 technical understanding of BTRFS it can be extented to be done on the fly with
 a mounted FS as well.


 Block based caching can make sense for other use cases like raw device with
 Oracle DB on it or maybe a swap device. Or for filesystems not (yet) 
 supporting
 VFS hot data tracking.

 Thanks,
 --
 Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
 GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7



--
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 0/5] BTRFS hot relocation support

2013-05-17 Thread Zhi Yong Wu
HI,

You should check the patchset about VFS hot tracking
https://lwn.net/Articles/550495/





On Thu, May 16, 2013 at 3:12 PM, Kai Krakow hurikhan77+bt...@gmail.com wrote:
 Hi!

 I think such a solution as part of the filesystem could do much better than
 something outside of it (like bcache). But I'm not sure: What makes data
 hot? I think the most benefit is detecting random read access and mark only
 those data as hot, also writes should go to the SSD first and then should be
 spooled to the harddisks in background. Bcache does a lot regarding this.

 Since this is within the filesystem, users could even mark files as being
 always hot with some attribute or ioctl. This could be used by a boot-
 readahead and preload implementation to automatically make files hot used
 during booting or for preloading when I start an application.

 On the other side hot relocation should be able to reduce writes to the SSD
 as good as possible, for example: Do not defragment files during autodefrag,
 it makes no sense. Also write data in bursts of erase block size etc.

 And also important: What if the SSD dies due to wearing? Will it gracefully
 fall back to harddisk? What does relocation mean? Files (hot data) should
 only be cached in copy to SSD, and not moved there. It should be possible
 for btrfs to just drop a failing SSD from the filesystem without data loss
 because otherwise one should use two SSDs in raid-1 mode to get a safe cache
 storage.

 Altogether I think that a spinning media btrfs raid can outperform a single
 SSD so hot relocation should probably be used to reduce head movements
 because this is where SSD really excels. So everything that involves heavy
 head movement should go to SSD first, then written back to harddisk. And I
 think there's a lot potential to optimize because a COW filesystem like
 btrfs naturally has a lot of head movement.

 What do you think?

 BTW: I have not tried the one or the other yet because I'm still deciding
 which way to go. Your patches are more welcome because I do not need to
 migrate my storage to bcache-provided block devices. OTOH the bcache
 implementation looks a lot more mature (with regard to performance and
 safety) at this point because it provides many of the above mentioned
 features - most importantly gracefully handling failing SSDs.

 Regarding btrfs raid outperforms SSD: During boot my spinning media 3 device
 btrfs raid reads boot files with up to 600 MB/s (from LZ compressed fs),
 boot takes about 7 seconds until the display manager starts (which takes
 another 30 seconds but that's another story), and the system is pretty
 crowded with services I actually wouldn't need if I optimized for boot
 performance. But I think systemd's read-ahead implementation has a lot
 influence on this fast booting: It defragments and relocates boot files on
 btrfs during boot so the harddisks can sequentially read all this stuff. I
 think it also compresses boot files if compression is enabled because
 booting is IO bound, not CPU bound. Benchmarks showed that my btrfs raid
 could technically read up to 450 MB/s, so I think the 600 MB/s counts for
 decompressed data. A single SSD could not do that. For that same reason I
 created a small script to defragment and compress files used by the preload
 daemon. Without benchmarking it, this felt like another small performance
 boost. So I'm eager what could be next with some sort of SSD cache because
 the only problem left seems to be heavy head movement which slows down the
 system.

 Zhi Yong Wu zwu.ker...@gmail.com schrieb:

 HI,

What do you think if its design approach goes correctly? Do you
 have any comments or better design idea for BTRFS hot relocation
 support? any comments are appreciated, thanks.


 On Mon, May 6, 2013 at 4:53 PM,  zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   The patchset as RFC is sent out mainly to see if it goes in the
 correct development direction.

   The patchset is trying to introduce hot relocation support
 for BTRFS. In hybrid storage environment, when the data in
 HDD disk get hot, it can be relocated to SSD disk by BTRFS
 hot relocation support automatically; also, if SSD disk ratio
 exceed its upper threshold, the data which get cold can be
 looked up and relocated to HDD disk to make more space in SSD
 disk at first, and then the data which get hot will be relocated
 to SSD disk automatically.

   BTRFS hot relocation mainly reserve block space from SSD disk
 at first, load the hot data to page cache from HDD, allocate
 block space from SSD disk, and finally write the data to SSD disk.

   If you'd like to play with it, pls pull the patchset from
 my git on github:
   https://github.com/wuzhy/kernel.git hot_reloc

 For how to use, please refer too the example below:

 root@debian-i386:~# echo 0  /sys/block/vdc/queue/rotational
 ^^^ Above command will hack /dev/vdc to be one SSD disk
 root@debian-i386:~# echo 99  /proc/sys/fs/hot-age

Re: [RFC 0/5] BTRFS hot relocation support

2013-05-14 Thread Zhi Yong Wu
HI,

   What do you think if its design approach goes correctly? Do you
have any comments or better design idea for BTRFS hot relocation
support? any comments are appreciated, thanks.


On Mon, May 6, 2013 at 4:53 PM,  zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   The patchset as RFC is sent out mainly to see if it goes in the
 correct development direction.

   The patchset is trying to introduce hot relocation support
 for BTRFS. In hybrid storage environment, when the data in
 HDD disk get hot, it can be relocated to SSD disk by BTRFS
 hot relocation support automatically; also, if SSD disk ratio
 exceed its upper threshold, the data which get cold can be
 looked up and relocated to HDD disk to make more space in SSD
 disk at first, and then the data which get hot will be relocated
 to SSD disk automatically.

   BTRFS hot relocation mainly reserve block space from SSD disk
 at first, load the hot data to page cache from HDD, allocate
 block space from SSD disk, and finally write the data to SSD disk.

   If you'd like to play with it, pls pull the patchset from
 my git on github:
   https://github.com/wuzhy/kernel.git hot_reloc

 For how to use, please refer too the example below:

 root@debian-i386:~# echo 0  /sys/block/vdc/queue/rotational
 ^^^ Above command will hack /dev/vdc to be one SSD disk
 root@debian-i386:~# echo 99  /proc/sys/fs/hot-age-interval
 root@debian-i386:~# echo 10  /proc/sys/fs/hot-update-interval
 root@debian-i386:~# echo 10  /proc/sys/fs/hot-reloc-interval
 root@debian-i386:~# mkfs.btrfs -d single -m single -h /dev/vdb /dev/vdc -f

 WARNING! - Btrfs v0.20-rc1-254-gb0136aa-dirty IS EXPERIMENTAL
 WARNING! - see http://btrfs.wiki.kernel.org before using

 [ 140.279011] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 1 
 transid 16 /dev/vdb
 [ 140.283650] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 2 
 transid 16 /dev/vdc
 [ 140.517089] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1 
 transid 3 /dev/vdb
 [ 140.550759] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1 
 transid 3 /dev/vdb
 [ 140.552473] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 2 
 transid 16 /dev/vdc
 adding device /dev/vdc id 2
 [ 140.636215] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 2 
 transid 3 /dev/vdc
 fs created label (null) on /dev/vdb
 nodesize 4096 leafsize 4096 sectorsize 4096 size 14.65GB
 Btrfs v0.20-rc1-254-gb0136aa-dirty
 root@debian-i386:~# mount -o hot_move /dev/vdb /data2
 [ 144.855471] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1 
 transid 6 /dev/vdb
 [ 144.870444] btrfs: disk space caching is enabled
 [ 144.904214] VFS: Turning on hot data tracking
 root@debian-i386:~# dd if=/dev/zero of=/data2/test1 bs=1M count=2048
 2048+0 records in
 2048+0 records out
 2147483648 bytes (2.1 GB) copied, 23.4948 s, 91.4 MB/s
 root@debian-i386:~# df -h
 Filesystem Size Used Avail Use% Mounted on
 /dev/vda1 16G 13G 2.2G 86% /
 tmpfs 4.8G 0 4.8G 0% /lib/init/rw
 udev 10M 176K 9.9M 2% /dev
 tmpfs 4.8G 0 4.8G 0% /dev/shm
 /dev/vdb 15G 2.0G 13G 14% /data2
 root@debian-i386:~# btrfs fi df /data2
 Data: total=3.01GB, used=2.00GB
 System: total=4.00MB, used=4.00KB
 Metadata: total=8.00MB, used=2.19MB
 Data_SSD: total=8.00MB, used=0.00
 root@debian-i386:~# echo 108  /proc/sys/fs/hot-reloc-threshold
 ^^^ Above command will start HOT RLEOCATE, because The data temperature is 
 currently 109
 root@debian-i386:~# df -h
 Filesystem Size Used Avail Use% Mounted on
 /dev/vda1 16G 13G 2.2G 86% /
 tmpfs 4.8G 0 4.8G 0% /lib/init/rw
 udev 10M 176K 9.9M 2% /dev
 tmpfs 4.8G 0 4.8G 0% /dev/shm
 /dev/vdb 15G 2.1G 13G 14% /data2
 root@debian-i386:~# btrfs fi df /data2
 Data: total=3.01GB, used=6.25MB
 System: total=4.00MB, used=4.00KB
 Metadata: total=8.00MB, used=2.26MB
 Data_SSD: total=2.01GB, used=2.00GB
 root@debian-i386:~#

 Zhi Yong Wu (5):
   vfs: add one list_head field
   btrfs: add one new block group
   btrfs: add one hot relocation kthread
   procfs: add three proc interfaces
   btrfs: add hot relocation support

  fs/btrfs/Makefile|   3 +-
  fs/btrfs/ctree.h |  26 +-
  fs/btrfs/extent-tree.c   | 107 +-
  fs/btrfs/extent_io.c |  31 +-
  fs/btrfs/extent_io.h |   4 +
  fs/btrfs/file.c  |  36 +-
  fs/btrfs/hot_relocate.c  | 802 
 +++
  fs/btrfs/hot_relocate.h  |  48 +++
  fs/btrfs/inode-map.c |  13 +-
  fs/btrfs/inode.c |  92 -
  fs/btrfs/ioctl.c |  23 +-
  fs/btrfs/relocation.c|  14 +-
  fs/btrfs/super.c |  30 +-
  fs/btrfs/volumes.c   |  28 +-
  fs/hot_tracking.c|   1 +
  include/linux/btrfs.h|   4 +
  include/linux/hot_tracking.h |   1 +
  kernel/sysctl.c  |  22 ++
  18 files changed, 1234 insertions(+), 51 deletions(-)
  create mode 100644 fs/btrfs/hot_relocate.c
  create mode 100644 fs/btrfs/hot_relocate.h

Re: [RFC 0/5] BTRFS hot relocation support

2013-05-09 Thread Zhi Yong Wu
btrfs maintainer's opinion is very important, i guess.

On Thu, May 9, 2013 at 2:30 PM, Stefan Behrens
sbehr...@giantdisaster.de wrote:
 On 05/09/2013 01:13, Zhi Yong Wu wrote:

 HI, all

 I saw that bcache will be merged into kernel upstream soon, so i
 want to know if btrfs hot relocation support is still meanful, if no,
 i will not continue to work on it. can anyone let me know this?
 thanks.


 Which one is better?

 Please do some measurements. Select typical file system use cases, and
 publish and compare the measurement results of the two approaches.



 On Mon, May 6, 2013 at 4:53 PM,  zwu.ker...@gmail.com wrote:

 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

The patchset as RFC is sent out mainly to see if it goes in the
 correct development direction.

The patchset is trying to introduce hot relocation support
 for BTRFS. In hybrid storage environment, when the data in
 HDD disk get hot, it can be relocated to SSD disk by BTRFS
 hot relocation support automatically; also, if SSD disk ratio
 exceed its upper threshold, the data which get cold can be
 looked up and relocated to HDD disk to make more space in SSD
 disk at first, and then the data which get hot will be relocated
 to SSD disk automatically.

BTRFS hot relocation mainly reserve block space from SSD disk
 at first, load the hot data to page cache from HDD, allocate
 block space from SSD disk, and finally write the data to SSD disk.

If you'd like to play with it, pls pull the patchset from
 my git on github:
https://github.com/wuzhy/kernel.git hot_reloc

 For how to use, please refer too the example below:

 root@debian-i386:~# echo 0  /sys/block/vdc/queue/rotational
 ^^^ Above command will hack /dev/vdc to be one SSD disk
 root@debian-i386:~# echo 99  /proc/sys/fs/hot-age-interval
 root@debian-i386:~# echo 10  /proc/sys/fs/hot-update-interval
 root@debian-i386:~# echo 10  /proc/sys/fs/hot-reloc-interval
 root@debian-i386:~# mkfs.btrfs -d single -m single -h /dev/vdb /dev/vdc
 -f

 WARNING! - Btrfs v0.20-rc1-254-gb0136aa-dirty IS EXPERIMENTAL
 WARNING! - see http://btrfs.wiki.kernel.org before using

 [ 140.279011] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 1
 transid 16 /dev/vdb
 [ 140.283650] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 2
 transid 16 /dev/vdc
 [ 140.517089] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1
 transid 3 /dev/vdb
 [ 140.550759] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1
 transid 3 /dev/vdb
 [ 140.552473] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 2
 transid 16 /dev/vdc
 adding device /dev/vdc id 2
 [ 140.636215] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 2
 transid 3 /dev/vdc
 fs created label (null) on /dev/vdb
 nodesize 4096 leafsize 4096 sectorsize 4096 size 14.65GB
 Btrfs v0.20-rc1-254-gb0136aa-dirty
 root@debian-i386:~# mount -o hot_move /dev/vdb /data2
 [ 144.855471] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1
 transid 6 /dev/vdb
 [ 144.870444] btrfs: disk space caching is enabled
 [ 144.904214] VFS: Turning on hot data tracking
 root@debian-i386:~# dd if=/dev/zero of=/data2/test1 bs=1M count=2048
 2048+0 records in
 2048+0 records out
 2147483648 bytes (2.1 GB) copied, 23.4948 s, 91.4 MB/s
 root@debian-i386:~# df -h
 Filesystem Size Used Avail Use% Mounted on
 /dev/vda1 16G 13G 2.2G 86% /
 tmpfs 4.8G 0 4.8G 0% /lib/init/rw
 udev 10M 176K 9.9M 2% /dev
 tmpfs 4.8G 0 4.8G 0% /dev/shm
 /dev/vdb 15G 2.0G 13G 14% /data2
 root@debian-i386:~# btrfs fi df /data2
 Data: total=3.01GB, used=2.00GB
 System: total=4.00MB, used=4.00KB
 Metadata: total=8.00MB, used=2.19MB
 Data_SSD: total=8.00MB, used=0.00
 root@debian-i386:~# echo 108  /proc/sys/fs/hot-reloc-threshold
 ^^^ Above command will start HOT RLEOCATE, because The data temperature
 is currently 109
 root@debian-i386:~# df -h
 Filesystem Size Used Avail Use% Mounted on
 /dev/vda1 16G 13G 2.2G 86% /
 tmpfs 4.8G 0 4.8G 0% /lib/init/rw
 udev 10M 176K 9.9M 2% /dev
 tmpfs 4.8G 0 4.8G 0% /dev/shm
 /dev/vdb 15G 2.1G 13G 14% /data2
 root@debian-i386:~# btrfs fi df /data2
 Data: total=3.01GB, used=6.25MB
 System: total=4.00MB, used=4.00KB
 Metadata: total=8.00MB, used=2.26MB
 Data_SSD: total=2.01GB, used=2.00GB
 root@debian-i386:~#

 Zhi Yong Wu (5):
vfs: add one list_head field
btrfs: add one new block group
btrfs: add one hot relocation kthread
procfs: add three proc interfaces
btrfs: add hot relocation support

   fs/btrfs/Makefile|   3 +-
   fs/btrfs/ctree.h |  26 +-
   fs/btrfs/extent-tree.c   | 107 +-
   fs/btrfs/extent_io.c |  31 +-
   fs/btrfs/extent_io.h |   4 +
   fs/btrfs/file.c  |  36 +-
   fs/btrfs/hot_relocate.c  | 802
 +++
   fs/btrfs/hot_relocate.h  |  48 +++
   fs/btrfs/inode-map.c |  13 +-
   fs/btrfs/inode.c |  92 -
   fs/btrfs/ioctl.c |  23 +-
   fs/btrfs/relocation.c

Re: [RFC 0/5] BTRFS hot relocation support

2013-05-08 Thread Zhi Yong Wu
HI, all

   I saw that bcache will be merged into kernel upstream soon, so i
want to know if btrfs hot relocation support is still meanful, if no,
i will not continue to work on it. can anyone let me know this?
thanks.


On Mon, May 6, 2013 at 4:53 PM,  zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   The patchset as RFC is sent out mainly to see if it goes in the
 correct development direction.

   The patchset is trying to introduce hot relocation support
 for BTRFS. In hybrid storage environment, when the data in
 HDD disk get hot, it can be relocated to SSD disk by BTRFS
 hot relocation support automatically; also, if SSD disk ratio
 exceed its upper threshold, the data which get cold can be
 looked up and relocated to HDD disk to make more space in SSD
 disk at first, and then the data which get hot will be relocated
 to SSD disk automatically.

   BTRFS hot relocation mainly reserve block space from SSD disk
 at first, load the hot data to page cache from HDD, allocate
 block space from SSD disk, and finally write the data to SSD disk.

   If you'd like to play with it, pls pull the patchset from
 my git on github:
   https://github.com/wuzhy/kernel.git hot_reloc

 For how to use, please refer too the example below:

 root@debian-i386:~# echo 0  /sys/block/vdc/queue/rotational
 ^^^ Above command will hack /dev/vdc to be one SSD disk
 root@debian-i386:~# echo 99  /proc/sys/fs/hot-age-interval
 root@debian-i386:~# echo 10  /proc/sys/fs/hot-update-interval
 root@debian-i386:~# echo 10  /proc/sys/fs/hot-reloc-interval
 root@debian-i386:~# mkfs.btrfs -d single -m single -h /dev/vdb /dev/vdc -f

 WARNING! - Btrfs v0.20-rc1-254-gb0136aa-dirty IS EXPERIMENTAL
 WARNING! - see http://btrfs.wiki.kernel.org before using

 [ 140.279011] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 1 
 transid 16 /dev/vdb
 [ 140.283650] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 2 
 transid 16 /dev/vdc
 [ 140.517089] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1 
 transid 3 /dev/vdb
 [ 140.550759] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1 
 transid 3 /dev/vdb
 [ 140.552473] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 2 
 transid 16 /dev/vdc
 adding device /dev/vdc id 2
 [ 140.636215] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 2 
 transid 3 /dev/vdc
 fs created label (null) on /dev/vdb
 nodesize 4096 leafsize 4096 sectorsize 4096 size 14.65GB
 Btrfs v0.20-rc1-254-gb0136aa-dirty
 root@debian-i386:~# mount -o hot_move /dev/vdb /data2
 [ 144.855471] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1 
 transid 6 /dev/vdb
 [ 144.870444] btrfs: disk space caching is enabled
 [ 144.904214] VFS: Turning on hot data tracking
 root@debian-i386:~# dd if=/dev/zero of=/data2/test1 bs=1M count=2048
 2048+0 records in
 2048+0 records out
 2147483648 bytes (2.1 GB) copied, 23.4948 s, 91.4 MB/s
 root@debian-i386:~# df -h
 Filesystem Size Used Avail Use% Mounted on
 /dev/vda1 16G 13G 2.2G 86% /
 tmpfs 4.8G 0 4.8G 0% /lib/init/rw
 udev 10M 176K 9.9M 2% /dev
 tmpfs 4.8G 0 4.8G 0% /dev/shm
 /dev/vdb 15G 2.0G 13G 14% /data2
 root@debian-i386:~# btrfs fi df /data2
 Data: total=3.01GB, used=2.00GB
 System: total=4.00MB, used=4.00KB
 Metadata: total=8.00MB, used=2.19MB
 Data_SSD: total=8.00MB, used=0.00
 root@debian-i386:~# echo 108  /proc/sys/fs/hot-reloc-threshold
 ^^^ Above command will start HOT RLEOCATE, because The data temperature is 
 currently 109
 root@debian-i386:~# df -h
 Filesystem Size Used Avail Use% Mounted on
 /dev/vda1 16G 13G 2.2G 86% /
 tmpfs 4.8G 0 4.8G 0% /lib/init/rw
 udev 10M 176K 9.9M 2% /dev
 tmpfs 4.8G 0 4.8G 0% /dev/shm
 /dev/vdb 15G 2.1G 13G 14% /data2
 root@debian-i386:~# btrfs fi df /data2
 Data: total=3.01GB, used=6.25MB
 System: total=4.00MB, used=4.00KB
 Metadata: total=8.00MB, used=2.26MB
 Data_SSD: total=2.01GB, used=2.00GB
 root@debian-i386:~#

 Zhi Yong Wu (5):
   vfs: add one list_head field
   btrfs: add one new block group
   btrfs: add one hot relocation kthread
   procfs: add three proc interfaces
   btrfs: add hot relocation support

  fs/btrfs/Makefile|   3 +-
  fs/btrfs/ctree.h |  26 +-
  fs/btrfs/extent-tree.c   | 107 +-
  fs/btrfs/extent_io.c |  31 +-
  fs/btrfs/extent_io.h |   4 +
  fs/btrfs/file.c  |  36 +-
  fs/btrfs/hot_relocate.c  | 802 
 +++
  fs/btrfs/hot_relocate.h  |  48 +++
  fs/btrfs/inode-map.c |  13 +-
  fs/btrfs/inode.c |  92 -
  fs/btrfs/ioctl.c |  23 +-
  fs/btrfs/relocation.c|  14 +-
  fs/btrfs/super.c |  30 +-
  fs/btrfs/volumes.c   |  28 +-
  fs/hot_tracking.c|   1 +
  include/linux/btrfs.h|   4 +
  include/linux/hot_tracking.h |   1 +
  kernel/sysctl.c  |  22 ++
  18 files changed, 1234 insertions(+), 51 deletions(-)
  create mode 100644 fs/btrfs/hot_relocate.c
  create mode 100644

Re: [PATCH] btrfs-progs: update mkfs.btrfs help info for raid5/6

2013-04-21 Thread Zhi Yong Wu
I found this patch had been merged into David's unstable btrfs-progs
git, but not into its master git, so don't know the reason. If it need
me to send v2 based on your comments, please let me know, thanks.

On Sun, Apr 21, 2013 at 5:33 PM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 I found this patch had been merged into David's unstable btrfs-progs git,
 but not into its master git, so don't know the reason. If it need me to send
 v2 based on your comments, please let me know, thanks.



 On Sun, Apr 21, 2013 at 7:48 AM, Eric Sandeen sand...@redhat.com wrote:

 On 4/20/13 4:13 PM, Frank A. Kingswood wrote:
  On 04/03/13 01:37, zwu.ker...@gmail.com wrote:
  From: Zhi Yong Wuwu...@linux.vnet.ibm.com
 
 Since raid5/6 support was introduced, we should update mkfs.btrfs
  help info.
 
  Signed-off-by: Zhi Yong Wuwu...@linux.vnet.ibm.com
  ---
mkfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
 
  diff --git a/mkfs.c b/mkfs.c
  index 5ece186..f9f26a5 100644
  --- a/mkfs.c
  +++ b/mkfs.c
  @@ -326,7 +326,7 @@ static void print_usage(void)
fprintf(stderr, options:\n);
fprintf(stderr, \t -A --alloc-start the offset to start the
  FS\n);
fprintf(stderr, \t -b --byte-count total number of bytes in the
  FS\n);
  -fprintf(stderr, \t -d --data data profile, raid0, raid1, raid10,
  dup or single\n);
  +fprintf(stderr, \t -d --data data profile, raid0, raid1, raid5,
  raid6, raid10, dup or single\n);
fprintf(stderr, \t -l --leafsize size of btree leaves\n);
fprintf(stderr, \t -L --label set a label\n);
fprintf(stderr, \t -m --metadata metadata profile, values like
  data profile\n);
 
  Nitpick: that line of output is now longer than 80. Perhaps instead of
  \t  have a few spaces at the start of each line.

 or maybe:

 +fprintf(stderr, \t -d --data data profile:
 [raid0|raid1|raid5|raid6|raid10|dup|single]\n);

 I think that fits...

  Frank
 
  --
  To unsubscribe from this list: send the line unsubscribe linux-btrfs
  in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html

 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html




 --
 Regards,

 Zhi Yong Wu



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] How to enable btrfs reserve block space from specific device?

2013-03-27 Thread Zhi Yong Wu
On Wed, Mar 27, 2013 at 7:34 PM, Ilya Dryomov idryo...@gmail.com wrote:
 On Wed, Mar 27, 2013 at 09:45:59AM +0800, Zhi Yong Wu wrote:
 HI,

 When i work on btrfs hot relocation feature, i hit one question
 about block reservation. btrfs hot relocation need to reserve block
 space from specific devices such as SSD, but current btrfs reserving
 code doesn't differentiate if block space is reserved from SSD or HDD.
 In order to make btrfs support this, I thought that we can introduce
 one new block group or flag, but this will maybe make large impact on
 current existing other functions. For this, does any guy have some
 better idea? thanks.

 Hi,

 Sorry for the late reply.  I have a lot on my plate right now, so I
No matter, thanks for you reply. it is definitely very helpful to me, thanks.
 haven't looked at your WIP code.  However, based on my knowledge, I can
 tell you that block reservation problem is completely orthogonal to the
 way you choose to handle hot data storage.  A separate block group is
 one way to do it, and probably the easiest one.  Block groups - chunks
 is a 1:1 mapping, so, for now, it might make sense to have one giant
Yes, bg:chunks is 1:1 mapping.
 block group spanning over the entire hot data device.  OTOH, IIRC the
If so,when hot relocation is enabled, will the other btrfs functions
reserve block space only from HDD? In this case, will SSD seem to be
only used as one cache?
 original implementation from IBM just used the rotation flag to detect
yes, it determine if one device is SSD by one flag in request queue of
that device.
 hot data devices.  You have a lot of choices here, but, if you are
 planning on implementing mirrored/striped hot data devices in future,
This is maybe next item, i will put it and meta relocation support in
my TODO list.
 I'd definitely go with block groups, since that'll give you some of the
 infrastructure for that for free.

 Thanks,

 Ilya



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] How to enable btrfs reserve block space from specific device?

2013-03-26 Thread Zhi Yong Wu
HI,

When i work on btrfs hot relocation feature, i hit one question
about block reservation. btrfs hot relocation need to reserve block
space from specific devices such as SSD, but current btrfs reserving
code doesn't differentiate if block space is reserved from SSD or HDD.
In order to make btrfs support this, I thought that we can introduce
one new block group or flag, but this will maybe make large impact on
current existing other functions. For this, does any guy have some
better idea? thanks.


-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [POSSIBLE SPAM] Re: Hybrid Storage proposal

2013-02-27 Thread Zhi Yong Wu
On Tue, Feb 26, 2013 at 9:08 PM, Matias Bjørling m...@itu.dk wrote:
 On 02/26/2013 12:04 PM, Zhi Yong Wu wrote:

 HI,

 It's a bit long so that i haven't read its whole, but i want to know
 if it has any collision with my ongoing feature btrfs hot
 relocation/migration?


 It will utilize the hot track patch set you're been creating for VFS. I
 think the methods are complementary. Do you have a description of the
 relocation/migration approach?
sorry, no currently. i will post out them after they are completed.




 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Hybrid Storage proposal

2013-02-26 Thread Zhi Yong Wu
 times.

 Implementation details:
   - This can be implemented on a per-tree basis. E.g. specific extent
 trees, checksum tree or other frequently updated tree.

   * Data updates

 Data are placed two places. If small, directly inside the tree leafs or if
 larger, within extents. If an inode is known to be hot, we cache the 
 updates.

  - Write-through cache for user data

If the cache isn't safe, i.e. no duplicate copies. The cache can still be
used using write-through and cache subsequent reads.

This is similar to a pure disk block-based cache approach.

 2.5 Other

  - Warmup cache at mount time

Reread the SSD cache on mount to enjoy a preheated cache of the bulk 
 storage.

This can be archived by storing information about the cache and reconstruct
the cache tree.

  - (By David Sterba) Implement appropriate debugfs/sysfs hooks for monitoring
the cache and get information about the size of trees. This is useful for
deciding if a tree should be cached on an SSD or not. E.g. the checksum 
 tree
might always be in memory, but if it isn't, it should be cached on the SSD
storage to lower checksum tree seeks on the bulk storage.

 2.6 Summary

 The following list of items have to be addressed for the first full patchset:

  - Cache lookup
  - Cache type (write through, write back, hot tracking, etc.)
  - Data structure for lookup cache
  - Allow for prioritized storage (e.g. PCMSSDHDD)
  - Eviction strategy. LRU, LFU, FIFO, Temparature-based (VFS hot track)
  - Disk layout for cache storage

 Here we presented our design space for a hybrid drive solution, as well as
 what it would take to carry it out. We are very much open to any kind of 
 input,
 feedback or new ideas you might have.

 Sincerely,
 Matias  Jesper

 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v4+ hot_track 02/19] vfs: initialize and free data structures

2012-11-15 Thread Zhi Yong Wu
On Wed, Nov 7, 2012 at 6:24 AM, David Sterba d...@jikos.cz wrote:
 On Mon, Oct 29, 2012 at 12:30:44PM +0800, zwu.ker...@gmail.com wrote:
 +/* Frees the entire hot_range_tree. */
 +static void hot_inode_item_free(struct kref *kref)
 +{
 + struct hot_comm_item *comm_item = container_of(kref,
 + struct hot_comm_item, refs);
 + struct hot_inode_item *he = container_of(comm_item,
 + struct hot_inode_item, hot_inode);
 +
 + hot_range_tree_free(he);
 + radix_tree_delete(he-hot_inode_tree, he-i_ino);

 void *radix_tree_delete(struct radix_tree_root *root, unsigned long index)

 and he::i_ino is u64, this will not work when
 sizeof(unsigned long) != sizeof(u64) (iirc this is a known limitation of
 radix tree implementation). This will work on 64bit only, not sure if
 this is intentional.
Fixed, thanks.

 + kmem_cache_free(hot_inode_item_cachep, he);
 +}
 +
 +/* Frees the entire hot_inode_tree. */
 +static void hot_inode_tree_exit(struct hot_info *root)
 +{
 + struct hot_inode_item *hi_nodes[8];
 + u64 ino = 0;
 + int i, n;

 nitpick, put the declarations on separate lines

 +
 + while (1) {
 + spin_lock(root-lock);
 + n = radix_tree_gang_lookup(root-hot_inode_tree,
 +(void **)hi_nodes, ino,
 +ARRAY_SIZE(hi_nodes));
 + if (!n) {
 + spin_unlock(root-lock);
 + break;
 + }
 +
 + ino = hi_nodes[n - 1]-i_ino + 1;
 + for (i = 0; i  n; i++)
 + hot_inode_item_put(hi_nodes[i]);
 + spin_unlock(root-lock);
 + }
 +}
 +
  /*
   * Initialize kmem cache for hot_inode_item and hot_range_item.
   */
 @@ -106,3 +197,36 @@ err:
   kmem_cache_destroy(hot_inode_item_cachep);
  }
  EXPORT_SYMBOL_GPL(hot_cache_init);
 +
 +/*
 + * Initialize the data structures for hot data tracking.
 + */
 +int hot_track_init(struct super_block *sb)
 +{
 + struct hot_info *root;
 + int ret = -ENOMEM;
 +
 + root = kzalloc(sizeof(struct hot_info), GFP_NOFS);
 + if (!root) {
 + printk(KERN_ERR %s: Failed to malloc memory for 
 + hot_info\n, __func__);
 + return ret;

 minor: you can drop the variable ret and just reurn ENOMEM here

 + }
 +
 + sb-s_hot_root = root;
 + hot_inode_tree_init(root);
 +
 + printk(KERN_INFO VFS: Turning on hot data tracking\n);
 +
 + return 0;
 +}
 +EXPORT_SYMBOL_GPL(hot_track_init);

 david



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v4+ hot_track 16/19] btrfs: add hot tracking support

2012-11-07 Thread Zhi Yong Wu
On Wed, Nov 7, 2012 at 8:00 AM, David Sterba d...@jikos.cz wrote:
 On Mon, Oct 29, 2012 at 12:30:58PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   Introduce one new mount option '-o hot_track',
 and add its parsing support.
   Its usage looks like:
mount -o hot_track
mount -o nouser,hot_track
mount -o nouser,hot_track,loop
mount -o hot_track,nouser

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 Reviewed-by: David Sterba dste...@suse.cz
thanks for your review.



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v4+ hot_track 03/19] vfs: add I/O frequency update function

2012-11-07 Thread Zhi Yong Wu
On Wed, Nov 7, 2012 at 6:45 AM, Darrick J. Wong darrick.w...@oracle.com wrote:
 On Mon, Oct 29, 2012 at 12:30:45PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   Add some util helpers to update access frequencies
 for one file or its range.

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  fs/hot_tracking.c|  179 
 ++
  fs/hot_tracking.h|7 ++
  include/linux/hot_tracking.h |2 +
  3 files changed, 188 insertions(+), 0 deletions(-)

 diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
 index 68591f0..0a7d9a3 100644
 --- a/fs/hot_tracking.c
 +++ b/fs/hot_tracking.c
 @@ -172,6 +172,137 @@ static void hot_inode_tree_exit(struct hot_info *root)
   }
  }

 +struct hot_inode_item
 +*hot_inode_item_find(struct hot_info *root, u64 ino)
 +{
 + struct hot_inode_item *he;
 + int ret;
 +
 +again:
 + spin_lock(root-lock);
 + he = radix_tree_lookup(root-hot_inode_tree, ino);
 + if (he) {
 + kref_get(he-hot_inode.refs);
 + spin_unlock(root-lock);
 + return he;
 + }
 + spin_unlock(root-lock);
 +
 + he = kmem_cache_zalloc(hot_inode_item_cachep,
 + GFP_KERNEL | GFP_NOFS);
 + if (!he)
 + return ERR_PTR(-ENOMEM);
 +
 + hot_inode_item_init(he, ino, root-hot_inode_tree);
 +
 + ret = radix_tree_preload(GFP_NOFS  ~__GFP_HIGHMEM);
 + if (ret) {
 + kmem_cache_free(hot_inode_item_cachep, he);
 + return ERR_PTR(ret);
 + }
 +
 + spin_lock(root-lock);
 + ret = radix_tree_insert(root-hot_inode_tree, ino, he);
 + if (ret == -EEXIST) {
 + kmem_cache_free(hot_inode_item_cachep, he);
 + spin_unlock(root-lock);
 + radix_tree_preload_end();
 + goto again;
 + }
 + spin_unlock(root-lock);
 + radix_tree_preload_end();
 +
 + kref_get(he-hot_inode.refs);
 + return he;
 +}
 +EXPORT_SYMBOL_GPL(hot_inode_item_find);
 +
 +static struct hot_range_item
 +*hot_range_item_find(struct hot_inode_item *he,
 + u32 start)
 +{
 + struct hot_range_item *hr;
 + int ret;
 +
 +again:
 + spin_lock(he-lock);
 + hr = radix_tree_lookup(he-hot_range_tree, start);
 + if (hr) {
 + kref_get(hr-hot_range.refs);
 + spin_unlock(he-lock);
 + return hr;
 + }
 + spin_unlock(he-lock);
 +
 + hr = kmem_cache_zalloc(hot_range_item_cachep,
 + GFP_KERNEL | GFP_NOFS);
 + if (!hr)
 + return ERR_PTR(-ENOMEM);
 +
 + hot_range_item_init(hr, start, he);
 +
 + ret = radix_tree_preload(GFP_NOFS  ~__GFP_HIGHMEM);
 + if (ret) {
 + kmem_cache_free(hot_range_item_cachep, hr);
 + return ERR_PTR(ret);
 + }
 +
 + spin_lock(he-lock);
 + ret = radix_tree_insert(he-hot_range_tree, start, hr);
 + if (ret == -EEXIST) {
 + kmem_cache_free(hot_range_item_cachep, hr);
 + spin_unlock(he-lock);
 + radix_tree_preload_end();
 + goto again;
 + }
 + spin_unlock(he-lock);
 + radix_tree_preload_end();
 +
 + kref_get(hr-hot_range.refs);
 + return hr;
 +}
 +
 +/*
 + * This function does the actual work of updating
 + * the frequency numbers, whatever they turn out to be.
 + */
 +static u64 hot_average_update(struct timespec old_atime,
 + struct timespec cur_time, u64 old_avg)
 +{
 + struct timespec delta_ts;
 + u64 new_avg;
 + u64 new_delta;
 +
 + delta_ts = timespec_sub(cur_time, old_atime);
 + new_delta = timespec_to_ns(delta_ts)  FREQ_POWER;
 +
 + new_avg = (old_avg  FREQ_POWER) - old_avg + new_delta;
 + new_avg = new_avg  FREQ_POWER;
 +
 + return new_avg;
 +}
 +
 +static void hot_freq_data_update(struct hot_freq_data *freq_data, bool 
 write)
 +{
 + struct timespec cur_time = current_kernel_time();
 +
 + if (write) {
 + freq_data-nr_writes += 1;
 + freq_data-avg_delta_writes = hot_average_update(
 + freq_data-last_write_time,
 + cur_time,
 + freq_data-avg_delta_writes);
 + freq_data-last_write_time = cur_time;
 + } else {
 + freq_data-nr_reads += 1;
 + freq_data-avg_delta_reads = hot_average_update(
 + freq_data-last_read_time,
 + cur_time,
 + freq_data-avg_delta_reads);

 I think you could just pass in a pointer to
 freq_data-avg_delta_{writes,reads} here...
why?

 + freq_data-last_read_time = cur_time;
 + }
 +}
 +
  /*
   * Initialize kmem cache for hot_inode_item and hot_range_item.
   */
 @@ -199,6 +330,54 @@ err:
  EXPORT_SYMBOL_GPL(hot_cache_init);

  /*
 + * Main function to update access frequency from read/writepage

Re: [RFC v4+ hot_track 10/19] vfs: introduce hot func register framework

2012-11-07 Thread Zhi Yong Wu
On Wed, Nov 7, 2012 at 7:30 AM, Darrick J. Wong darrick.w...@oracle.com wrote:
 On Mon, Oct 29, 2012 at 12:30:52PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   Introduce one framwork to enable that specific FS
 can register its own hot tracking functions.

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  fs/hot_tracking.c|   78 
 ++
  include/linux/hot_tracking.h |   25 +
  2 files changed, 96 insertions(+), 7 deletions(-)

 diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
 index 0ef9cad..c6c6138 100644
 --- a/fs/hot_tracking.c
 +++ b/fs/hot_tracking.c
 @@ -24,6 +24,9 @@
  #include linux/limits.h
  #include hot_tracking.h

 +static DEFINE_SPINLOCK(hot_func_list_lock);
 +static LIST_HEAD(hot_func_list);
 +
  /* kmem_cache pointers for slab caches */
  static struct kmem_cache *hot_inode_item_cachep __read_mostly;
  static struct kmem_cache *hot_range_item_cachep __read_mostly;
 @@ -305,20 +308,23 @@ static u64 hot_average_update(struct timespec 
 old_atime,
   return new_avg;
  }

 -static void hot_freq_data_update(struct hot_freq_data *freq_data, bool 
 write)
 +static void hot_freq_data_update(struct hot_info *root,
 + struct hot_freq_data *freq_data, bool write)
  {
   struct timespec cur_time = current_kernel_time();

   if (write) {
   freq_data-nr_writes += 1;
 - freq_data-avg_delta_writes = hot_average_update(
 + freq_data-avg_delta_writes =
 + root-hot_func_type-ops.hot_rw_freq_calc_fn(
   freq_data-last_write_time,
   cur_time,
   freq_data-avg_delta_writes);
   freq_data-last_write_time = cur_time;
   } else {
   freq_data-nr_reads += 1;
 - freq_data-avg_delta_reads = hot_average_update(
 + freq_data-avg_delta_reads =
 + root-hot_func_type-ops.hot_rw_freq_calc_fn(
   freq_data-last_read_time,
   cur_time,
   freq_data-avg_delta_reads);
 @@ -430,7 +436,7 @@ static void hot_map_array_update(struct hot_freq_data 
 *freq_data,
   struct hot_comm_item *comm_item;
   struct hot_inode_item *he;
   struct hot_range_item *hr;
 - u32 temp = hot_temp_calc(freq_data);
 + u32 temp = root-hot_func_type-ops.hot_temp_calc_fn(freq_data);
   u8 a_temp = temp  (32 - HEAT_MAP_BITS);
   u8 b_temp = freq_data-last_temp  (32 - HEAT_MAP_BITS);

 @@ -511,7 +517,7 @@ static void hot_range_update(struct hot_inode_item *he,
   hr_nodes[i]-hot_range.hot_freq_data, root);

   spin_lock(hr_nodes[i]-hot_range.lock);
 - obsolete = hot_is_obsolete(
 + obsolete = root-hot_func_type-ops.hot_is_obsolete_fn(
   hr_nodes[i]-hot_range.hot_freq_data);
   spin_unlock(hr_nodes[i]-hot_range.lock);

 @@ -668,7 +674,7 @@ void hot_update_freqs(struct inode *inode, u64 start,
   }

   spin_lock(he-hot_inode.lock);
 - hot_freq_data_update(he-hot_inode.hot_freq_data, rw);
 + hot_freq_data_update(root, he-hot_inode.hot_freq_data, rw);
   spin_unlock(he-hot_inode.lock);

   /*
 @@ -685,7 +691,7 @@ void hot_update_freqs(struct inode *inode, u64 start,
   }

   spin_lock(hr-hot_range.lock);
 - hot_freq_data_update(hr-hot_range.hot_freq_data, rw);
 + hot_freq_data_update(root, hr-hot_range.hot_freq_data, rw);
   spin_unlock(hr-hot_range.lock);

   hot_range_item_put(hr);
 @@ -695,6 +701,61 @@ void hot_update_freqs(struct inode *inode, u64 start,
  }
  EXPORT_SYMBOL_GPL(hot_update_freqs);

 +static struct hot_func_type hot_func_def = {
 + .hot_func_name = hot_type_def,
 + .ops = {
 + .hot_rw_freq_calc_fn = hot_average_update,
 + .hot_temp_calc_fn= hot_temp_calc,
 + .hot_is_obsolete_fn  = hot_is_obsolete,
 + },
 +};

 If these hot_ops are per-filesystem, why not just embed a struct hot_func_ops
 inside of struct file_system_type?  That eliminates this _get function,
this _get function is very small, only some loc, if hot_func_ops is
embedded in struct file_system_type, i am afraid to introduce some
regressions
 collision avoidance, etc.  You can fill in NULL function pointers in
fill in NULL func pointer? why?
 hot_track_init (or just code around them).

 --D

 +
 +static struct hot_func_type *hot_func_get(const char *name)
 +{
 + struct hot_func_type *f, *h = hot_func_def;
 +
 + spin_lock(hot_func_list_lock);
 + list_for_each_entry(f, hot_func_list, list) {
 + if (!strcmp(f-hot_func_name, name))
 + h = f;
 + }
 + spin_unlock(hot_func_list_lock

Re: [PATCH] xfs: add hot tracking support.

2012-11-07 Thread Zhi Yong Wu
HI, Dave,

I guess that you should add some hot tracking stuff in some
xfs_show_xxx function, right?

On Tue, Oct 16, 2012 at 8:04 AM, Dave Chinner da...@fromorbit.com wrote:

 From: Dave Chinner dchin...@redhat.com

 Connect up the VFS hot tracking support so XFS filesystems can make
 use of it.

 Signed-off-by: Dave Chinner dchin...@redhat.com
 ---
  fs/xfs/xfs_mount.h |1 +
  fs/xfs/xfs_super.c |9 +
  2 files changed, 10 insertions(+)

 diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
 index a631ca3..d5e7277 100644
 --- a/fs/xfs/xfs_mount.h
 +++ b/fs/xfs/xfs_mount.h
 @@ -215,6 +215,7 @@ typedef struct xfs_mount {
  #define XFS_MOUNT_WSYNC(1ULL  0) /* for nfs - all 
 metadata ops
must be synchronous except
for space allocations */
 +#define XFS_MOUNT_HOTTRACK (1ULL  1) /* hot inode tracking */
  #define XFS_MOUNT_WAS_CLEAN(1ULL  3)
  #define XFS_MOUNT_FS_SHUTDOWN  (1ULL  4) /* atomic stop of all 
 filesystem
operations, typically for
 diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
 index 56c2537..17786ff 100644
 --- a/fs/xfs/xfs_super.c
 +++ b/fs/xfs/xfs_super.c
 @@ -61,6 +61,7 @@
  #include linux/kthread.h
  #include linux/freezer.h
  #include linux/parser.h
 +#include linux/hot_tracking.h

  static const struct super_operations xfs_super_operations;
  static kmem_zone_t *xfs_ioend_zone;
 @@ -114,6 +115,7 @@ mempool_t *xfs_ioend_pool;
  #define MNTOPT_NODELAYLOG  nodelaylog/* Delayed logging disabled */
  #define MNTOPT_DISCARDdiscard/* Discard unused blocks */
  #define MNTOPT_NODISCARD   nodiscard /* Do not discard unused blocks */
 +#define MNTOPT_HOTTRACKhot_track /* hot inode tracking */

  /*
   * Table driven mount option parser.
 @@ -371,6 +373,8 @@ xfs_parseargs(
 mp-m_flags |= XFS_MOUNT_DISCARD;
 } else if (!strcmp(this_char, MNTOPT_NODISCARD)) {
 mp-m_flags = ~XFS_MOUNT_DISCARD;
 +   } else if (!strcmp(this_char, MNTOPT_HOTTRACK)) {
 +   mp-m_flags |= XFS_MOUNT_HOTTRACK;
 } else if (!strcmp(this_char, ihashsize)) {
 xfs_warn(mp,
 ihashsize no longer used, option is deprecated.);
 @@ -1040,6 +1044,9 @@ xfs_fs_put_super(
  {
 struct xfs_mount*mp = XFS_M(sb);

 +   if (mp-m_flags  XFS_MOUNT_HOTTRACK)
 +   hot_track_exit(sb);
 +
 xfs_filestream_unmount(mp);
 xfs_unmountfs(mp);

 @@ -1470,6 +1477,8 @@ xfs_fs_fill_super(
 error = ENOMEM;
 goto out_unmount;
 }
 +   if (mp-m_flags  XFS_MOUNT_HOTTRACK)
 +   hot_track_init(sb);

 return 0;




-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v4+ hot_track 00/19] vfs: hot data tracking

2012-11-07 Thread Zhi Yong Wu
On Mon, Oct 29, 2012 at 12:30 PM,  zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 NOTE:

   The patchset can be obtained via my kernel dev git on github:
 g...@github.com:wuzhy/kernel.git hot_tracking
   If you're interested, you can also can review them via
 https://github.com/wuzhy/kernel/commits/hot_tracking
hi, guys,

The latest code change has been pushed into my above dev git tree. If
no further comments are done, i will post next version soon.


   For more info, please check hot_tracking.txt in Documentation

 TODO List:

  1.) Need to do scalability or performance tests.
  2.) Need one simpler but effective temp calc'ing function
  3.) How to save the file temperature among the umount to be able to
  preserve the file tempreture after reboot

 Ben Chociej, Matt Lupfer and Conor Scott originally wrote this code to
  be very btrfs-specific.  I've taken their code and attempted to
 make it more generic and integrate it at the VFS level.

 Changelog from v3:
  1.) Rewritten debugfs support based seq_file operation. [Dave Chinner]
  2.) Refactored workqueue support. [Dave Chinner]
  3.) Turn some Micro into be tunable   [Zhiyong, Zheng Liu]
TIME_TO_KICK, and HEAT_UPDATE_DELAY
  4.) Introduce hot func registering framework [Zhiyong]
  5.) Remove global variable for hot tracking [Zhiyong]
  6.) Add xfs hot tracking support [Dave Chinner]
  7.) Add ext4 hot tracking support [Zheng Liu]
  8.) Cleanedup a lot of other issues [Dave Chinner]

 v3:
  1.) Converted to Radix trees, not RB-tree [Zhiyong, Dave Chinner]
  2.) Added memory shrinker [Dave Chinner]

 v2:
  1.) Converted to one workqueue to update map info periodically [Dave Chinner]
  2.) Cleanedup a lot of other issues [Dave Chinner]

 v1:
  1.) Reduce new files and put all in fs/hot_tracking.[ch] [Dave Chinner]
  2.) Add btrfs hot tracking support [Zhiyong]
  3.) The first three patches can probably just be flattened into one.
 [Marco Stornelli , Dave Chinner]

 Dave Chinner (1):
   xfs: add hot tracking support

 Zheng Liu (1):
   ext4: add hot tracking support

 Zhi Yong Wu (17):
   vfs: introduce private radix tree structures
   vfs: initialize and free data structures
   vfs: add I/O frequency update function
   vfs: add two map arrays
   vfs: add hooks to enable hot tracking
   vfs: add temp calculation function
   vfs: add map info update function
   vfs: add aging function
   vfs: add one work queue
   vfs: introduce hot func register framework
   vfs: register one shrinker
   vfs: add one ioctl interface
   debugfs: introduce one function
   vfs: add debugfs support
   sysfs: add two hot_track proc files
   btrfs: add hot tracking support
   vfs: add documentation

  Documentation/filesystems/00-INDEX |2 +
  Documentation/filesystems/hot_tracking.txt |  262 ++
  fs/Makefile|2 +-
  fs/btrfs/ctree.h   |1 +
  fs/btrfs/super.c   |   22 +-
  fs/compat_ioctl.c  |5 +
  fs/dcache.c|2 +
  fs/debugfs/inode.c |   26 +
  fs/direct-io.c |6 +
  fs/ext4/ext4.h |3 +
  fs/ext4/super.c|   13 +-
  fs/hot_tracking.c  | 1367 
 
  fs/hot_tracking.h  |   58 ++
  fs/ioctl.c |   78 ++
  fs/xfs/xfs_mount.h |1 +
  fs/xfs/xfs_super.c |   16 +
  include/linux/debugfs.h|9 +
  include/linux/fs.h |4 +
  include/linux/hot_tracking.h   |  149 +++
  kernel/sysctl.c|   14 +
  mm/filemap.c   |6 +
  mm/page-writeback.c|   12 +
  mm/readahead.c |6 +
  23 files changed, 2061 insertions(+), 3 deletions(-)
  create mode 100644 Documentation/filesystems/hot_tracking.txt
  create mode 100644 fs/hot_tracking.c
  create mode 100644 fs/hot_tracking.h
  create mode 100644 include/linux/hot_tracking.h

 --
 1.7.6.5

 --
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v4+ hot_track 03/19] vfs: add I/O frequency update function

2012-11-07 Thread Zhi Yong Wu
On Thu, Nov 8, 2012 at 2:49 AM, Darrick J. Wong darrick.w...@oracle.com wrote:
 On Wed, Nov 07, 2012 at 04:27:05PM +0800, Zhi Yong Wu wrote:
 On Wed, Nov 7, 2012 at 6:45 AM, Darrick J. Wong darrick.w...@oracle.com 
 wrote:
  On Mon, Oct 29, 2012 at 12:30:45PM +0800, zwu.ker...@gmail.com wrote:
  From: Zhi Yong Wu wu...@linux.vnet.ibm.com
 
Add some util helpers to update access frequencies
  for one file or its range.
 
  Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
  ---
   fs/hot_tracking.c|  179 
  ++
   fs/hot_tracking.h|7 ++
   include/linux/hot_tracking.h |2 +
   3 files changed, 188 insertions(+), 0 deletions(-)
 
  diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
  index 68591f0..0a7d9a3 100644
  --- a/fs/hot_tracking.c
  +++ b/fs/hot_tracking.c
  @@ -172,6 +172,137 @@ static void hot_inode_tree_exit(struct hot_info 
  *root)
}
   }
 
  +struct hot_inode_item
  +*hot_inode_item_find(struct hot_info *root, u64 ino)
  +{
  + struct hot_inode_item *he;
  + int ret;
  +
  +again:
  + spin_lock(root-lock);
  + he = radix_tree_lookup(root-hot_inode_tree, ino);
  + if (he) {
  + kref_get(he-hot_inode.refs);
  + spin_unlock(root-lock);
  + return he;
  + }
  + spin_unlock(root-lock);
  +
  + he = kmem_cache_zalloc(hot_inode_item_cachep,
  + GFP_KERNEL | GFP_NOFS);
  + if (!he)
  + return ERR_PTR(-ENOMEM);
  +
  + hot_inode_item_init(he, ino, root-hot_inode_tree);
  +
  + ret = radix_tree_preload(GFP_NOFS  ~__GFP_HIGHMEM);
  + if (ret) {
  + kmem_cache_free(hot_inode_item_cachep, he);
  + return ERR_PTR(ret);
  + }
  +
  + spin_lock(root-lock);
  + ret = radix_tree_insert(root-hot_inode_tree, ino, he);
  + if (ret == -EEXIST) {
  + kmem_cache_free(hot_inode_item_cachep, he);
  + spin_unlock(root-lock);
  + radix_tree_preload_end();
  + goto again;
  + }
  + spin_unlock(root-lock);
  + radix_tree_preload_end();
  +
  + kref_get(he-hot_inode.refs);
  + return he;
  +}
  +EXPORT_SYMBOL_GPL(hot_inode_item_find);
  +
  +static struct hot_range_item
  +*hot_range_item_find(struct hot_inode_item *he,
  + u32 start)
  +{
  + struct hot_range_item *hr;
  + int ret;
  +
  +again:
  + spin_lock(he-lock);
  + hr = radix_tree_lookup(he-hot_range_tree, start);
  + if (hr) {
  + kref_get(hr-hot_range.refs);
  + spin_unlock(he-lock);
  + return hr;
  + }
  + spin_unlock(he-lock);
  +
  + hr = kmem_cache_zalloc(hot_range_item_cachep,
  + GFP_KERNEL | GFP_NOFS);
  + if (!hr)
  + return ERR_PTR(-ENOMEM);
  +
  + hot_range_item_init(hr, start, he);
  +
  + ret = radix_tree_preload(GFP_NOFS  ~__GFP_HIGHMEM);
  + if (ret) {
  + kmem_cache_free(hot_range_item_cachep, hr);
  + return ERR_PTR(ret);
  + }
  +
  + spin_lock(he-lock);
  + ret = radix_tree_insert(he-hot_range_tree, start, hr);
  + if (ret == -EEXIST) {
  + kmem_cache_free(hot_range_item_cachep, hr);
  + spin_unlock(he-lock);
  + radix_tree_preload_end();
  + goto again;
  + }
  + spin_unlock(he-lock);
  + radix_tree_preload_end();
  +
  + kref_get(hr-hot_range.refs);
  + return hr;
  +}
  +
  +/*
  + * This function does the actual work of updating
  + * the frequency numbers, whatever they turn out to be.
  + */
  +static u64 hot_average_update(struct timespec old_atime,
  + struct timespec cur_time, u64 old_avg)
  +{
  + struct timespec delta_ts;
  + u64 new_avg;
  + u64 new_delta;
  +
  + delta_ts = timespec_sub(cur_time, old_atime);
  + new_delta = timespec_to_ns(delta_ts)  FREQ_POWER;
  +
  + new_avg = (old_avg  FREQ_POWER) - old_avg + new_delta;
  + new_avg = new_avg  FREQ_POWER;
  +
  + return new_avg;
  +}
  +
  +static void hot_freq_data_update(struct hot_freq_data *freq_data, bool 
  write)
  +{
  + struct timespec cur_time = current_kernel_time();
  +
  + if (write) {
  + freq_data-nr_writes += 1;
  + freq_data-avg_delta_writes = hot_average_update(
  + freq_data-last_write_time,
  + cur_time,
  + freq_data-avg_delta_writes);
  + freq_data-last_write_time = cur_time;
  + } else {
  + freq_data-nr_reads += 1;
  + freq_data-avg_delta_reads = hot_average_update(
  + freq_data-last_read_time,
  + cur_time,
  + freq_data-avg_delta_reads);
 
  I think you could just pass in a pointer to
  freq_data-avg_delta_{writes

Re: [RFC v4+ hot_track 10/19] vfs: introduce hot func register framework

2012-11-07 Thread Zhi Yong Wu
On Thu, Nov 8, 2012 at 2:58 AM, Darrick J. Wong darrick.w...@oracle.com wrote:
 On Wed, Nov 07, 2012 at 04:34:35PM +0800, Zhi Yong Wu wrote:
 On Wed, Nov 7, 2012 at 7:30 AM, Darrick J. Wong darrick.w...@oracle.com 
 wrote:
  On Mon, Oct 29, 2012 at 12:30:52PM +0800, zwu.ker...@gmail.com wrote:
  From: Zhi Yong Wu wu...@linux.vnet.ibm.com
 
Introduce one framwork to enable that specific FS
  can register its own hot tracking functions.
 
  Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
  ---
   fs/hot_tracking.c|   78 
  ++
   include/linux/hot_tracking.h |   25 +
   2 files changed, 96 insertions(+), 7 deletions(-)
 
  diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
  index 0ef9cad..c6c6138 100644
  --- a/fs/hot_tracking.c
  +++ b/fs/hot_tracking.c
  @@ -24,6 +24,9 @@
   #include linux/limits.h
   #include hot_tracking.h
 
  +static DEFINE_SPINLOCK(hot_func_list_lock);
  +static LIST_HEAD(hot_func_list);
  +
   /* kmem_cache pointers for slab caches */
   static struct kmem_cache *hot_inode_item_cachep __read_mostly;
   static struct kmem_cache *hot_range_item_cachep __read_mostly;
  @@ -305,20 +308,23 @@ static u64 hot_average_update(struct timespec 
  old_atime,
return new_avg;
   }
 
  -static void hot_freq_data_update(struct hot_freq_data *freq_data, bool 
  write)
  +static void hot_freq_data_update(struct hot_info *root,
  + struct hot_freq_data *freq_data, bool write)
   {
struct timespec cur_time = current_kernel_time();
 
if (write) {
freq_data-nr_writes += 1;
  - freq_data-avg_delta_writes = hot_average_update(
  + freq_data-avg_delta_writes =
  + root-hot_func_type-ops.hot_rw_freq_calc_fn(
freq_data-last_write_time,
cur_time,
freq_data-avg_delta_writes);
freq_data-last_write_time = cur_time;
} else {
freq_data-nr_reads += 1;
  - freq_data-avg_delta_reads = hot_average_update(
  + freq_data-avg_delta_reads =
  + root-hot_func_type-ops.hot_rw_freq_calc_fn(
freq_data-last_read_time,
cur_time,
freq_data-avg_delta_reads);
  @@ -430,7 +436,7 @@ static void hot_map_array_update(struct hot_freq_data 
  *freq_data,
struct hot_comm_item *comm_item;
struct hot_inode_item *he;
struct hot_range_item *hr;
  - u32 temp = hot_temp_calc(freq_data);
  + u32 temp = root-hot_func_type-ops.hot_temp_calc_fn(freq_data);
u8 a_temp = temp  (32 - HEAT_MAP_BITS);
u8 b_temp = freq_data-last_temp  (32 - HEAT_MAP_BITS);
 
  @@ -511,7 +517,7 @@ static void hot_range_update(struct hot_inode_item 
  *he,
hr_nodes[i]-hot_range.hot_freq_data, 
  root);
 
spin_lock(hr_nodes[i]-hot_range.lock);
  - obsolete = hot_is_obsolete(
  + obsolete = 
  root-hot_func_type-ops.hot_is_obsolete_fn(

  hr_nodes[i]-hot_range.hot_freq_data);
spin_unlock(hr_nodes[i]-hot_range.lock);
 
  @@ -668,7 +674,7 @@ void hot_update_freqs(struct inode *inode, u64 start,
}
 
spin_lock(he-hot_inode.lock);
  - hot_freq_data_update(he-hot_inode.hot_freq_data, rw);
  + hot_freq_data_update(root, he-hot_inode.hot_freq_data, rw);
spin_unlock(he-hot_inode.lock);
 
/*
  @@ -685,7 +691,7 @@ void hot_update_freqs(struct inode *inode, u64 start,
}
 
spin_lock(hr-hot_range.lock);
  - hot_freq_data_update(hr-hot_range.hot_freq_data, rw);
  + hot_freq_data_update(root, hr-hot_range.hot_freq_data, 
  rw);
spin_unlock(hr-hot_range.lock);
 
hot_range_item_put(hr);
  @@ -695,6 +701,61 @@ void hot_update_freqs(struct inode *inode, u64 start,
   }
   EXPORT_SYMBOL_GPL(hot_update_freqs);
 
  +static struct hot_func_type hot_func_def = {
  + .hot_func_name = hot_type_def,
  + .ops = {
  + .hot_rw_freq_calc_fn = hot_average_update,
  + .hot_temp_calc_fn= hot_temp_calc,
  + .hot_is_obsolete_fn  = hot_is_obsolete,
  + },
  +};
 
  If these hot_ops are per-filesystem, why not just embed a struct 
  hot_func_ops
  inside of struct file_system_type?  That eliminates this _get function,
 this _get function is very small, only some loc, if hot_func_ops is
 embedded in struct file_system_type, i am afraid to introduce some
 regressions

 What kind of regressions are you afraid of, specifically?  I don't think 
 fstype
 is performance-critical enough to worry about wreaking havoc in the caches due
 to adding three function pointers.
done, thanks.

  collision

Re: [RFC v4+ hot_track 02/19] vfs: initialize and free data structures

2012-11-06 Thread Zhi Yong Wu
On Wed, Nov 7, 2012 at 6:24 AM, David Sterba d...@jikos.cz wrote:
 On Mon, Oct 29, 2012 at 12:30:44PM +0800, zwu.ker...@gmail.com wrote:
 +/* Frees the entire hot_range_tree. */
 +static void hot_inode_item_free(struct kref *kref)
 +{
 + struct hot_comm_item *comm_item = container_of(kref,
 + struct hot_comm_item, refs);
 + struct hot_inode_item *he = container_of(comm_item,
 + struct hot_inode_item, hot_inode);
 +
 + hot_range_tree_free(he);
 + radix_tree_delete(he-hot_inode_tree, he-i_ino);

 void *radix_tree_delete(struct radix_tree_root *root, unsigned long index)

 and he::i_ino is u64, this will not work when
 sizeof(unsigned long) != sizeof(u64) (iirc this is a known limitation of
 radix tree implementation). This will work on 64bit only, not sure if
 this is intentional.
i actually also realized this. Do you have a better way to handle this?

 + kmem_cache_free(hot_inode_item_cachep, he);
 +}
 +
 +/* Frees the entire hot_inode_tree. */
 +static void hot_inode_tree_exit(struct hot_info *root)
 +{
 + struct hot_inode_item *hi_nodes[8];
 + u64 ino = 0;
 + int i, n;

 nitpick, put the declarations on separate lines
Will it have any issue? It has passed the check of checkpatch.pl.


 +
 + while (1) {
 + spin_lock(root-lock);
 + n = radix_tree_gang_lookup(root-hot_inode_tree,
 +(void **)hi_nodes, ino,
 +ARRAY_SIZE(hi_nodes));
 + if (!n) {
 + spin_unlock(root-lock);
 + break;
 + }
 +
 + ino = hi_nodes[n - 1]-i_ino + 1;
 + for (i = 0; i  n; i++)
 + hot_inode_item_put(hi_nodes[i]);
 + spin_unlock(root-lock);
 + }
 +}
 +
  /*
   * Initialize kmem cache for hot_inode_item and hot_range_item.
   */
 @@ -106,3 +197,36 @@ err:
   kmem_cache_destroy(hot_inode_item_cachep);
  }
  EXPORT_SYMBOL_GPL(hot_cache_init);
 +
 +/*
 + * Initialize the data structures for hot data tracking.
 + */
 +int hot_track_init(struct super_block *sb)
 +{
 + struct hot_info *root;
 + int ret = -ENOMEM;
 +
 + root = kzalloc(sizeof(struct hot_info), GFP_NOFS);
 + if (!root) {
 + printk(KERN_ERR %s: Failed to malloc memory for 
 + hot_info\n, __func__);
 + return ret;

 minor: you can drop the variable ret and just reurn ENOMEM here
This variable will also be used in the following patches.


 + }
 +
 + sb-s_hot_root = root;
 + hot_inode_tree_init(root);
 +
 + printk(KERN_INFO VFS: Turning on hot data tracking\n);
 +
 + return 0;
 +}
 +EXPORT_SYMBOL_GPL(hot_track_init);

 david



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v4+ hot_track 03/19] vfs: add I/O frequency update function

2012-11-06 Thread Zhi Yong Wu
On Wed, Nov 7, 2012 at 6:37 AM, David Sterba d...@jikos.cz wrote:
 On Mon, Oct 29, 2012 at 12:30:45PM +0800, zwu.ker...@gmail.com wrote:
 --- a/fs/hot_tracking.c
 +++ b/fs/hot_tracking.c
 +struct hot_inode_item
 +*hot_inode_item_find(struct hot_info *root, u64 ino)
 +{
 + struct hot_inode_item *he;
 + int ret;
 +
 +again:
 + spin_lock(root-lock);
 + he = radix_tree_lookup(root-hot_inode_tree, ino);
 + if (he) {
 + kref_get(he-hot_inode.refs);
 + spin_unlock(root-lock);
 + return he;
 + }
 + spin_unlock(root-lock);
 +
 + he = kmem_cache_zalloc(hot_inode_item_cachep,
 + GFP_KERNEL | GFP_NOFS);
 + if (!he)
 + return ERR_PTR(-ENOMEM);
 +
 + hot_inode_item_init(he, ino, root-hot_inode_tree);
 +
 + ret = radix_tree_preload(GFP_NOFS  ~__GFP_HIGHMEM);
 + if (ret) {
 + kmem_cache_free(hot_inode_item_cachep, he);

 radix_tree_preload_end()

 + return ERR_PTR(ret);
 + }
 +
 + spin_lock(root-lock);
 + ret = radix_tree_insert(root-hot_inode_tree, ino, he);
 + if (ret == -EEXIST) {
 + kmem_cache_free(hot_inode_item_cachep, he);
 + spin_unlock(root-lock);
 + radix_tree_preload_end();
 + goto again;
 + }
 + spin_unlock(root-lock);
 + radix_tree_preload_end();
 +
 + kref_get(he-hot_inode.refs);
 + return he;
 +}
 +EXPORT_SYMBOL_GPL(hot_inode_item_find);
 +
 +static struct hot_range_item
 +*hot_range_item_find(struct hot_inode_item *he,
 + u32 start)
 +{
 + struct hot_range_item *hr;
 + int ret;
 +
 +again:
 + spin_lock(he-lock);
 + hr = radix_tree_lookup(he-hot_range_tree, start);
 + if (hr) {
 + kref_get(hr-hot_range.refs);
 + spin_unlock(he-lock);
 + return hr;
 + }
 + spin_unlock(he-lock);
 +
 + hr = kmem_cache_zalloc(hot_range_item_cachep,
 + GFP_KERNEL | GFP_NOFS);
 + if (!hr)
 + return ERR_PTR(-ENOMEM);
 +
 + hot_range_item_init(hr, start, he);
 +
 + ret = radix_tree_preload(GFP_NOFS  ~__GFP_HIGHMEM);
 + if (ret) {
 + kmem_cache_free(hot_range_item_cachep, hr);

 radix_tree_preload_end()
I checked some kernel existing cases about the usage of
radix_tree_preload(), it seems that when radix_tree_preload() fail,
its error handling doesn't need call radix_tree_preload_end() any
more.

 + return ERR_PTR(ret);
 + }
 +
 + spin_lock(he-lock);
 + ret = radix_tree_insert(he-hot_range_tree, start, hr);
 + if (ret == -EEXIST) {
 + kmem_cache_free(hot_range_item_cachep, hr);
 + spin_unlock(he-lock);
 + radix_tree_preload_end();
ditto.
 + goto again;
 + }
 + spin_unlock(he-lock);
 + radix_tree_preload_end();
 +
 + kref_get(hr-hot_range.refs);
 + return hr;
 +}

 david



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v4+ hot_track 05/19] vfs: add hooks to enable hot tracking

2012-11-06 Thread Zhi Yong Wu
On Wed, Nov 7, 2012 at 6:51 AM, David Sterba d...@jikos.cz wrote:
 On Mon, Oct 29, 2012 at 12:30:47PM +0800, zwu.ker...@gmail.com wrote:
 --- a/mm/readahead.c
 +++ b/mm/readahead.c
 @@ -19,6 +19,7 @@
  #include linux/pagemap.h
  #include linux/syscalls.h
  #include linux/file.h
 +#include linux/hot_tracking.h

  /*
   * Initialise a struct file's readahead state.  Assumes that the caller has
 @@ -138,6 +139,11 @@ static int read_pages(struct address_space *mapping, 
 struct file *filp,
  out:
   blk_finish_plug(plug);

 + /* Hot data tracking */
 + hot_update_freqs(mapping-host, (u64)(list_entry(pages-prev,\
 + struct page, lru)-index)  PAGE_CACHE_SHIFT,
 + (u64)nr_pages * PAGE_CACHE_SIZE, 0);

 There's a stale \ at the end of the line, and I find this formatting
 hard to read. Does the following look acceptable?
yes, great, thanks.

 hot_update_freqs(mapping-host,
 (u64)(list_entry(pages-prev, struct page, lru)-index)
  PAGE_CACHE_SHIFT,
 (u64)nr_pages * PAGE_CACHE_SIZE, 0);

 +
   return ret;
  }




-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v4+ hot_track 10/19] vfs: introduce hot func register framework

2012-11-06 Thread Zhi Yong Wu
On Wed, Nov 7, 2012 at 7:14 AM, David Sterba d...@jikos.cz wrote:
 On Mon, Oct 29, 2012 at 12:30:52PM +0800, zwu.ker...@gmail.com wrote:
 +static struct hot_func_type *hot_func_get(const char *name)
 +{
 + struct hot_func_type *f, *h = hot_func_def;
 +
 + spin_lock(hot_func_list_lock);
 + list_for_each_entry(f, hot_func_list, list) {
 + if (!strcmp(f-hot_func_name, name))
 + h = f;

 You probably want to break here
Good catch, done, thanks.

 + }
 + spin_unlock(hot_func_list_lock);
 +
 + return h;
 +}
 +
 +int hot_func_register(struct hot_func_type *h)
 +{
 + struct hot_func_type *f, *t = NULL;
 +
 + /* register, don't allow duplicate names */
 + spin_lock(hot_func_list_lock);
 + list_for_each_entry(f, hot_func_list, list) {
 + if (!strcmp(f-hot_func_name, h-hot_func_name))
 + t = f;

 if duplicate names are not allowed, then a warning may make sense to
 let us know that something is wrong
done, thanks.

 + }
 +
 + if (t) {
 + spin_unlock(hot_func_list_lock);
 + return -EBUSY;
 + }
 +
 + list_add_tail(h-list, hot_func_list);
 + spin_unlock(hot_func_list_lock);
 +
 + return 0;
 +}
 +EXPORT_SYMBOL_GPL(hot_func_register);
 --- a/include/linux/hot_tracking.h
 +++ b/include/linux/hot_tracking.h
 @@ -73,6 +75,25 @@ struct hot_range_item {
   u32 len; /* length in bytes */
  };

 +typedef u64 (hot_rw_freq_calc_fn) (struct timespec old_atime,
 + struct timespec cur_time, u64 old_avg);
 +typedef u32 (hot_temp_calc_fn) (struct hot_freq_data *freq_data);
 +typedef bool (hot_is_obsolete_fn) (struct hot_freq_data *freq_data);

 I'm thinking, whether these typedefs are useful, similar ops structures
 do not introduce them, also when you pick a struct member names exactly
 same as the typedefs:

 +struct hot_func_ops {
 + hot_rw_freq_calc_fn *hot_rw_freq_calc_fn;
 + hot_temp_calc_fn *hot_temp_calc_fn;
 + hot_is_obsolete_fn *hot_is_obsolete_fn;
 +};

 My suggestion is to make the types explicit in the structure.
sorry, i don't get your point, can you elaborate it about how to do this?

 +/* identifies an hot func type */
 +struct hot_func_type {
 + char hot_func_name[HOT_NAME_MAX];

 'name' would be sufficient IMHO
done, thanks.

 + /* fields provided by specific FS */
 + struct hot_func_ops ops;
 + struct list_head list;
 +};

 david



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v4+ hot_track 12/19] vfs: add one ioctl interface

2012-11-06 Thread Zhi Yong Wu
On Wed, Nov 7, 2012 at 7:30 AM, David Sterba d...@jikos.cz wrote:
 On Mon, Oct 29, 2012 at 12:30:54PM +0800, zwu.ker...@gmail.com wrote:
 +static int ioctl_heat_info(struct file *file, void __user *argp)
 +{
 + struct inode *inode = file-f_dentry-d_inode;
 + struct hot_heat_info *heat_info;
 + struct hot_inode_item *he;
 + int ret = 0;
 +
 + heat_info = kmalloc(sizeof(struct hot_heat_info),
 + GFP_KERNEL | GFP_NOFS);

 heat_info is small enough to fit onto the stack, so you can avoid the
 kmalloc, I don't think there are deep callstacks to be expected.
ok, done.
 Nevertheless, if you want to use kmalloc here, then please check the
 return value and use GFP_KERNEL.
thanks for your pointing out.

 +
 + if (copy_from_user((void *) heat_info,
 + argp,
 + sizeof(struct hot_heat_info)) != 0) {
 + ret = -EFAULT;
 + goto err;
 + }
 +
 + he = hot_inode_item_find(inode-i_sb-s_hot_root, inode-i_ino);
 + if (!he) {
 + /* we don't have any info on this file yet */
 + ret = -ENODATA;
 + goto err;
 + }
 +
 + spin_lock(he-hot_inode.lock);
 + heat_info-avg_delta_reads =
 + (__u64) he-hot_inode.hot_freq_data.avg_delta_reads;
 + heat_info-avg_delta_writes =
 + (__u64) he-hot_inode.hot_freq_data.avg_delta_writes;
 + heat_info-last_read_time =
 + (__u64) timespec_to_ns(he-hot_inode.hot_freq_data.last_read_time);
 + heat_info-last_write_time =
 + (__u64) timespec_to_ns(he-hot_inode.hot_freq_data.last_write_time);
 + heat_info-num_reads =
 + (__u32) he-hot_inode.hot_freq_data.nr_reads;
 + heat_info-num_writes =
 + (__u32) he-hot_inode.hot_freq_data.nr_writes;
 +
 + if (heat_info-live  0) {
 + /*
 +  * got a request for live temperature,
 +  * call hot_hash_calc_temperature to recalculate
 +  */
 + heat_info-temp =
 + inode-i_sb-s_hot_root-hot_func_type-ops.hot_temp_calc_fn(
 + he-hot_inode.hot_freq_data);
 + } else {
 + /* not live temperature, get it from the hashlist */
 + heat_info-temp = he-hot_inode.hot_freq_data.last_temp;
 + }
 + spin_unlock(he-hot_inode.lock);
 +
 + hot_inode_item_put(he);
 +
 + if (copy_to_user(argp, (void *) heat_info,
 + sizeof(struct hot_heat_info))) {
 + ret = -EFAULT;
 + goto err;
 + }
 +
 +err:
 + kfree(heat_info);
 + return ret;
 +}

 david



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v4+ hot_track 14/19] vfs: add debugfs support

2012-11-06 Thread Zhi Yong Wu
On Wed, Nov 7, 2012 at 7:45 AM, David Sterba d...@jikos.cz wrote:
 On Mon, Oct 29, 2012 at 12:30:56PM +0800, zwu.ker...@gmail.com wrote:
 +static int hot_range_seq_show(struct seq_file *seq, void *v)
 +{
 + struct hot_range_item *hr = v;
 + struct hot_inode_item *he = hr-hot_inode;
 + struct hot_freq_data *freq_data = hr-hot_range.hot_freq_data;
 +
 + /* Always lock hot_inode_item first */
 + spin_lock(he-hot_inode.lock);
 + spin_lock(hr-hot_range.lock);
 + seq_printf(seq, inode #%llu, range start  \

 the # seems unnecessary to me
OK, removed.

 + %llu (range len %u) reads %u, writes %u, 
 + avg read time %llu, avg write time %llu, temp %u\n,

 compiler will complain if it sees a %llu format and not the expected
 type of 'unsigned long long'
When built, i haven't seen any warning report about this...

 + he-i_ino,

 (unsigned long long)he-i_ino,

 + (u64)hr-start * RANGE_SIZE,
 + hr-len,
 + freq_data-nr_reads,
 + freq_data-nr_writes,
 + freq_data-avg_delta_reads / NSEC_PER_MSEC,
 + freq_data-avg_delta_writes / NSEC_PER_MSEC,
 + freq_data-last_temp  (32 - HEAT_MAP_BITS));
 + spin_unlock(hr-hot_range.lock);
 + spin_unlock(he-hot_inode.lock);
 +
 + return 0;
 +}
 +
 +static int hot_inode_seq_show(struct seq_file *seq, void *v)
 +{
 + struct hot_inode_item *he = v;
 + struct hot_freq_data *freq_data = he-hot_inode.hot_freq_data;
 +
 + spin_lock(he-hot_inode.lock);
 + seq_printf(seq, inode #%llu, reads %u, writes %u,  \
 + avg read time %llu, avg write time %llu, temp %u\n,

 (same here)
ditto.

 + he-i_ino,
 + freq_data-nr_reads,
 + freq_data-nr_writes,
 + freq_data-avg_delta_reads / NSEC_PER_MSEC,
 + freq_data-avg_delta_writes / NSEC_PER_MSEC,
 + freq_data-last_temp  (32 - HEAT_MAP_BITS));
 + spin_unlock(he-hot_inode.lock);
 +
 + return 0;
 +}

 +static void *hot_spot_range_seq_next(struct seq_file *seq, void *v, loff_t 
 *pos)
 +{
 + struct hot_info *root = seq-private;
 + struct hot_range_item *hr_next, *hr = v;
 + struct hot_comm_item *comm_item;
 + struct list_head *n_list;
 + int i =
 +  hr-hot_range.hot_freq_data.last_temp  (32 - HEAT_MAP_BITS);

 now I have noticed that I've seen the ... (32 - HEAT_MAP_BITS)
 expression so many times that it tend to think it deserves a helper
 function
This helper function has existed, hot_raw_shift(), i will replace this with it.

 +
 + n_list = seq_list_next(hr-hot_range.n_list,
 + root-heat_range_map[i].node_list, pos);
 + hot_range_item_put(hr);
 +next:
 + if (n_list) {
 + comm_item = container_of(n_list,
 + struct hot_comm_item, n_list);
 + hr_next = container_of(comm_item,
 + struct hot_range_item, hot_range);
 + kref_get(hr_next-hot_range.refs);
 + return hr_next;
 + } else if (--i = 0) {
 + n_list = seq_list_next(root-heat_range_map[i].node_list,
 + root-heat_range_map[i].node_list, pos);
 + goto next;
 + }
 +
 + return NULL;
 +}
 +
 +static void hot_debugfs_exit(struct super_block *sb)
 +{
 + struct dentry *vol_dentry;
 +
 + vol_dentry = debugfs_get_dentry(sb-s_id,
 + sb-s_hot_root-debugfs_root, 
 strlen(sb-s_id));
 + /* remove all debugfs entries recursively from the volume root */
 + if (vol_dentry)
 + debugfs_remove_recursive(vol_dentry);
 + else
 + BUG_ON(1);

 BUG()
done, thanks.

 +
 + if (list_empty(sb-s_hot_root-debugfs_root-d_subdirs))
 + debugfs_remove(sb-s_hot_root-debugfs_root);
 +}
 +
 +/*

 david



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v4+ hot_track 03/19] vfs: add I/O frequency update function

2012-11-05 Thread Zhi Yong Wu
On Mon, Nov 5, 2012 at 7:07 PM, Steven Whitehouse swhit...@redhat.com wrote:
 Hi,

 On Mon, 2012-10-29 at 12:30 +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   Add some util helpers to update access frequencies
 for one file or its range.

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  fs/hot_tracking.c|  179 
 ++
  fs/hot_tracking.h|7 ++
  include/linux/hot_tracking.h |2 +
  3 files changed, 188 insertions(+), 0 deletions(-)

 diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
 index 68591f0..0a7d9a3 100644
 --- a/fs/hot_tracking.c
 +++ b/fs/hot_tracking.c
 @@ -172,6 +172,137 @@ static void hot_inode_tree_exit(struct hot_info *root)
   }
  }

 +struct hot_inode_item
 +*hot_inode_item_find(struct hot_info *root, u64 ino)
 +{
 + struct hot_inode_item *he;
 + int ret;
 +
 +again:
 + spin_lock(root-lock);
 + he = radix_tree_lookup(root-hot_inode_tree, ino);
 + if (he) {
 + kref_get(he-hot_inode.refs);
 + spin_unlock(root-lock);
 + return he;
 + }
 + spin_unlock(root-lock);
 +
 + he = kmem_cache_zalloc(hot_inode_item_cachep,
 + GFP_KERNEL | GFP_NOFS);
 This doesn't look quite right... which of these two did you mean? I
 assume probably just GFP_NOFS
Yes, good catch, thanks.

 + if (!he)
 + return ERR_PTR(-ENOMEM);
 +
 + hot_inode_item_init(he, ino, root-hot_inode_tree);
 +
 + ret = radix_tree_preload(GFP_NOFS  ~__GFP_HIGHMEM);
 + if (ret) {
 + kmem_cache_free(hot_inode_item_cachep, he);
 + return ERR_PTR(ret);
 + }
 +
 + spin_lock(root-lock);
 + ret = radix_tree_insert(root-hot_inode_tree, ino, he);
 + if (ret == -EEXIST) {
 + kmem_cache_free(hot_inode_item_cachep, he);
 + spin_unlock(root-lock);
 + radix_tree_preload_end();
 + goto again;
 + }
 + spin_unlock(root-lock);
 + radix_tree_preload_end();
 +
 + kref_get(he-hot_inode.refs);
 + return he;
 +}
 +EXPORT_SYMBOL_GPL(hot_inode_item_find);
 +
 +static struct hot_range_item
 +*hot_range_item_find(struct hot_inode_item *he,
 + u32 start)
 +{
 + struct hot_range_item *hr;
 + int ret;
 +
 +again:
 + spin_lock(he-lock);
 + hr = radix_tree_lookup(he-hot_range_tree, start);
 + if (hr) {
 + kref_get(hr-hot_range.refs);
 + spin_unlock(he-lock);
 + return hr;
 + }
 + spin_unlock(he-lock);
 +
 + hr = kmem_cache_zalloc(hot_range_item_cachep,
 + GFP_KERNEL | GFP_NOFS);
 Likewise, here too.
ditto

 Steve.






-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v4+ hot_track 09/19] vfs: add one work queue

2012-11-05 Thread Zhi Yong Wu
On Mon, Nov 5, 2012 at 7:21 PM, Steven Whitehouse swhit...@redhat.com wrote:
 Hi,

 On Mon, 2012-10-29 at 12:30 +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   Add a per-superblock workqueue and a delayed_work
 to run periodic work to update map info on each superblock.

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  fs/hot_tracking.c|   85 
 ++
  fs/hot_tracking.h|3 +
  include/linux/hot_tracking.h |3 +
  3 files changed, 91 insertions(+), 0 deletions(-)

 diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
 index fff0038..0ef9cad 100644
 --- a/fs/hot_tracking.c
 +++ b/fs/hot_tracking.c
 @@ -15,9 +15,12 @@
  #include linux/module.h
  #include linux/spinlock.h
  #include linux/hardirq.h
 +#include linux/kthread.h
 +#include linux/freezer.h
  #include linux/fs.h
  #include linux/blkdev.h
  #include linux/types.h
 +#include linux/list_sort.h
  #include linux/limits.h
  #include hot_tracking.h

 @@ -557,6 +560,67 @@ static void hot_map_array_exit(struct hot_info *root)
   }
  }

 +/* Temperature compare function*/
 +static int hot_temp_cmp(void *priv, struct list_head *a,
 + struct list_head *b)
 +{
 + struct hot_comm_item *ap =
 + container_of(a, struct hot_comm_item, n_list);
 + struct hot_comm_item *bp =
 + container_of(b, struct hot_comm_item, n_list);
 +
 + int diff = ap-hot_freq_data.last_temp
 + - bp-hot_freq_data.last_temp;
 + if (diff  0)
 + return -1;
 + if (diff  0)
 + return 1;
 + return 0;
 +}
 +
 +/*
 + * Every sync period we update temperatures for
 + * each hot inode item and hot range item for aging
 + * purposes.
 + */
 +static void hot_update_worker(struct work_struct *work)
 +{
 + struct hot_info *root = container_of(to_delayed_work(work),
 + struct hot_info, update_work);
 + struct hot_inode_item *hi_nodes[8];
 + u64 ino = 0;
 + int i, n;
 +
 + while (1) {
 + n = radix_tree_gang_lookup(root-hot_inode_tree,
 +(void **)hi_nodes, ino,
 +ARRAY_SIZE(hi_nodes));
 + if (!n)
 + break;
 +
 + ino = hi_nodes[n - 1]-i_ino + 1;
 + for (i = 0; i  n; i++) {
 + kref_get(hi_nodes[i]-hot_inode.refs);
 + hot_map_array_update(
 + hi_nodes[i]-hot_inode.hot_freq_data, root);
 + hot_range_update(hi_nodes[i], root);
 + hot_inode_item_put(hi_nodes[i]);
 + }
 + }
 +
 + /* Sort temperature map info */
 + for (i = 0; i  HEAT_MAP_SIZE; i++) {
 + list_sort(NULL, root-heat_inode_map[i].node_list,
 + hot_temp_cmp);
 + list_sort(NULL, root-heat_range_map[i].node_list,
 + hot_temp_cmp);
 + }
 +

 If this list can potentially have one (or more) entries per inode, then
Only one hot_inode_item per inode, while maybe multiple
hot_range_items per inode.
 filesystems with a lot of inodes (millions) may potentially exceed the
 max size of list which list_sort() can handle. If that happens it still
 works, but you'll get a warning message and it won't be as efficient.
I haven't do so large scale test. If we want to find that issue, we
need to do large scale performance test, before that, i want to make
sure the code change is correct at first.
To be honest, for that issue you pointed to, i also have such
concern.But list_sort() performance looks good from the test result of
the following URL:
https://lkml.org/lkml/2010/1/20/485


 It is something that we've run into with list_sort() and GFS2, but it
 only happens very rarely,
Beside list_sort(), do you have any other way to share? For this
concern, how does GFS2 resolve it?


 Steve.






-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v4+ hot_track 09/19] vfs: add one work queue

2012-11-05 Thread Zhi Yong Wu
On Mon, Nov 5, 2012 at 8:07 PM, Steven Whitehouse swhit...@redhat.com wrote:
 Hi,

 On Mon, 2012-11-05 at 19:55 +0800, Zhi Yong Wu wrote:
 On Mon, Nov 5, 2012 at 7:21 PM, Steven Whitehouse swhit...@redhat.com 
 wrote:
  Hi,
 
  On Mon, 2012-10-29 at 12:30 +0800, zwu.ker...@gmail.com wrote:
  From: Zhi Yong Wu wu...@linux.vnet.ibm.com
 
Add a per-superblock workqueue and a delayed_work
  to run periodic work to update map info on each superblock.
 
  Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
  ---
   fs/hot_tracking.c|   85 
  ++
   fs/hot_tracking.h|3 +
   include/linux/hot_tracking.h |3 +
   3 files changed, 91 insertions(+), 0 deletions(-)
 
  diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
  index fff0038..0ef9cad 100644
  --- a/fs/hot_tracking.c
  +++ b/fs/hot_tracking.c
  @@ -15,9 +15,12 @@
   #include linux/module.h
   #include linux/spinlock.h
   #include linux/hardirq.h
  +#include linux/kthread.h
  +#include linux/freezer.h
   #include linux/fs.h
   #include linux/blkdev.h
   #include linux/types.h
  +#include linux/list_sort.h
   #include linux/limits.h
   #include hot_tracking.h
 
  @@ -557,6 +560,67 @@ static void hot_map_array_exit(struct hot_info *root)
}
   }
 
  +/* Temperature compare function*/
  +static int hot_temp_cmp(void *priv, struct list_head *a,
  + struct list_head *b)
  +{
  + struct hot_comm_item *ap =
  + container_of(a, struct hot_comm_item, n_list);
  + struct hot_comm_item *bp =
  + container_of(b, struct hot_comm_item, n_list);
  +
  + int diff = ap-hot_freq_data.last_temp
  + - bp-hot_freq_data.last_temp;
  + if (diff  0)
  + return -1;
  + if (diff  0)
  + return 1;
  + return 0;
  +}
  +
  +/*
  + * Every sync period we update temperatures for
  + * each hot inode item and hot range item for aging
  + * purposes.
  + */
  +static void hot_update_worker(struct work_struct *work)
  +{
  + struct hot_info *root = container_of(to_delayed_work(work),
  + struct hot_info, update_work);
  + struct hot_inode_item *hi_nodes[8];
  + u64 ino = 0;
  + int i, n;
  +
  + while (1) {
  + n = radix_tree_gang_lookup(root-hot_inode_tree,
  +(void **)hi_nodes, ino,
  +ARRAY_SIZE(hi_nodes));
  + if (!n)
  + break;
  +
  + ino = hi_nodes[n - 1]-i_ino + 1;
  + for (i = 0; i  n; i++) {
  + kref_get(hi_nodes[i]-hot_inode.refs);
  + hot_map_array_update(
  + hi_nodes[i]-hot_inode.hot_freq_data, 
  root);
  + hot_range_update(hi_nodes[i], root);
  + hot_inode_item_put(hi_nodes[i]);
  + }
  + }
  +
  + /* Sort temperature map info */
  + for (i = 0; i  HEAT_MAP_SIZE; i++) {
  + list_sort(NULL, root-heat_inode_map[i].node_list,
  + hot_temp_cmp);
  + list_sort(NULL, root-heat_range_map[i].node_list,
  + hot_temp_cmp);
  + }
  +
 
  If this list can potentially have one (or more) entries per inode, then
 Only one hot_inode_item per inode, while maybe multiple
 hot_range_items per inode.
  filesystems with a lot of inodes (millions) may potentially exceed the
  max size of list which list_sort() can handle. If that happens it still
  works, but you'll get a warning message and it won't be as efficient.
 I haven't do so large scale test. If we want to find that issue, we
 need to do large scale performance test, before that, i want to make
 sure the code change is correct at first.
 To be honest, for that issue you pointed to, i also have such
 concern.But list_sort() performance looks good from the test result of
 the following URL:
 https://lkml.org/lkml/2010/1/20/485

 Yes, I think it is good. Also, even when it says that it's performance
 is poor (via the warning message) it is still much better than the
 alternative (of not sorting) in the GFS2 case. So currently our
 workaround is to ignore the warning. Due to what we using it for
 (sorting the data blocks for ordered writeback) we only see it very
 occasionally when there has been lots of data write activity with little
 journal activity on a node with lots of RAM.
OK.

 
  It is something that we've run into with list_sort() and GFS2, but it
  only happens very rarely,
 Beside list_sort(), do you have any other way to share? For this
 concern, how does GFS2 resolve it?

 That is an ongoing investigation :-)

 I've pondered various options... increase temp variable space in
 list_sort(), not using list_sort() and insertion sorting the blocks
 instead, flushing the ordered write data early if the list gets too
 long, figuring

Re: [RFC v4+ hot_track 00/19] vfs: hot data tracking

2012-10-29 Thread Zhi Yong Wu
On Mon, Oct 29, 2012 at 6:30 PM, Andi Kleen a...@firstfloor.org wrote:
 zwu.ker...@gmail.com writes:

 TODO List:

  1.) Need to do scalability or performance tests.

 You're changing some of the most performance critical code in the
 kernel. This step is absolutely not optional.
ah, i know, but now i need to make sure all the codes are correct at
first, then do these tests.


 -Andi

 --
 a...@linux.intel.com -- Speaking for myself only



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v4+ hot_track 13/19] debugfs: introduce one function

2012-10-29 Thread Zhi Yong Wu
On Tue, Oct 30, 2012 at 2:11 AM, Greg KH gre...@linuxfoundation.org wrote:
 On Mon, Oct 29, 2012 at 12:30:55PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   The debugfs function is used to get expected dentry.

 Huh?  Why do you need this?  Why haven't you added documentation for the
It is used to determine if one sysfs directory has been created. OK, i
will add some doc, thanks for your suggestion.

 function saying what it does?

 confused,

 greg k-h



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v4+ hot_track 13/19] debugfs: introduce one function

2012-10-29 Thread Zhi Yong Wu
On Tue, Oct 30, 2012 at 6:34 AM, Greg KH gre...@linuxfoundation.org wrote:
 On Tue, Oct 30, 2012 at 06:25:50AM +0800, Zhi Yong Wu wrote:
 On Tue, Oct 30, 2012 at 2:11 AM, Greg KH gre...@linuxfoundation.org wrote:
  On Mon, Oct 29, 2012 at 12:30:55PM +0800, zwu.ker...@gmail.com wrote:
  From: Zhi Yong Wu wu...@linux.vnet.ibm.com
 
The debugfs function is used to get expected dentry.
 
  Huh?  Why do you need this?  Why haven't you added documentation for the
 It is used to determine if one sysfs directory has been created. OK, i
 will add some doc, thanks for your suggestion.

 You didn't answer the why part here.  How come you think you need
ah, Let me say its scenario at first. If we do two mount ops as below:
1.) mount -o loop,hot_track image1 /data1
2.) mount -o loop,hot_track image2 /data2

The mount -o hot_track operation will automatically create one sysfs
directory /sys/kernel/debug/hot_track. To prevent this dir being
created again when 2.) is done, we need to know if it has existed at
first. In my patch, i at first get its dentry by this new function,
then determine if its d_inode field is NULL, if no, it means that this
sysfs dir has existed.
This is the reason that i want to add one new function.

 this?  Can't you just save off the dentry you created somewhere so you
 don't need to look it up again?
Because i can't find one appropriate place to save it.

 greg k-h



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v4+ hot_track 13/19] debugfs: introduce one function

2012-10-29 Thread Zhi Yong Wu
On Tue, Oct 30, 2012 at 6:54 AM, Greg KH gre...@linuxfoundation.org wrote:
 On Tue, Oct 30, 2012 at 06:45:19AM +0800, Zhi Yong Wu wrote:
 On Tue, Oct 30, 2012 at 6:34 AM, Greg KH gre...@linuxfoundation.org wrote:
  On Tue, Oct 30, 2012 at 06:25:50AM +0800, Zhi Yong Wu wrote:
  On Tue, Oct 30, 2012 at 2:11 AM, Greg KH gre...@linuxfoundation.org 
  wrote:
   On Mon, Oct 29, 2012 at 12:30:55PM +0800, zwu.ker...@gmail.com wrote:
   From: Zhi Yong Wu wu...@linux.vnet.ibm.com
  
 The debugfs function is used to get expected dentry.
  
   Huh?  Why do you need this?  Why haven't you added documentation for the
  It is used to determine if one sysfs directory has been created. OK, i
  will add some doc, thanks for your suggestion.
 
  You didn't answer the why part here.  How come you think you need
 ah, Let me say its scenario at first. If we do two mount ops as below:
 1.) mount -o loop,hot_track image1 /data1
 2.) mount -o loop,hot_track image2 /data2

 The mount -o hot_track operation will automatically create one sysfs
 directory /sys/kernel/debug/hot_track. To prevent this dir being
 created again when 2.) is done, we need to know if it has existed at
 first. In my patch, i at first get its dentry by this new function,
 then determine if its d_inode field is NULL, if no, it means that this
 sysfs dir has existed.
 This is the reason that i want to add one new function.

 Why not do like the rest of the kernel does and just have a:
 static dentry *hot_track_root;
 and use that as your root debugfs directory dentry:
ah, i'm one newbie, don't get familar with other kernel part, but this
is one good point, i will apply it, thanks.

 if (!hot_track_root) {
 /* Create root directory */
 hot_track_root = debugfs_create(...);
 }

 No need to look anything up :)

 thanks,

 greg k-h



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v4 03/15] vfs,hot_track: add the function for collecting I/O frequency

2012-10-28 Thread Zhi Yong Wu
On Sun, Oct 28, 2012 at 3:55 PM, Zheng Liu gnehzuil@gmail.com wrote:
 Hi Zhiyong,

 On Thu, Oct 25, 2012 at 11:08:55PM +0800, zwu.ker...@gmail.com wrote:
 [snip]
 @@ -199,6 +342,54 @@ err:
  }

  /*
 + * Main function to update access frequency from read/writepage(s) hooks
 + */
 +inline void hot_update_freqs(struct inode *inode, u64 start,
 + u64 len, int rw)

 This function seems too big.  So we really need to inline this function?
As Dave said in his comments, it will add a function call
overhead even when tracking is not enabled. a static inline function
will just result in no extra overhead other than the if
statement


 Regards,
 Zheng

 +{
 + struct hot_info *root = inode-i_sb-s_hot_root;
 + struct hot_inode_item *he;
 + struct hot_range_item *hr;
 + u32 cur, end;
 +
 + if (!root || (len == 0))
 + return;
 +
 + he = hot_inode_item_find(root, inode-i_ino);
 + if (IS_ERR(he)) {
 + WARN_ON(1);
 + return;
 + }
 +
 + spin_lock(he-hot_inode.lock);
 + hot_freq_data_update(he-hot_inode.hot_freq_data, rw);
 + spin_unlock(he-hot_inode.lock);
 +
 + /*
 +  * Align ranges on RANGE_SIZE boundary
 +  * to prevent proliferation of range structs
 +  */
 + end = (start + len + RANGE_SIZE - 1)  RANGE_BITS;
 + for (cur = (start  RANGE_BITS); cur  end; cur++) {
 + hr = hot_range_item_find(he, cur);
 + if (IS_ERR(hr)) {
 + WARN_ON(1);
 + hot_inode_item_put(he);
 + return;
 + }
 +
 + spin_lock(hr-hot_range.lock);
 + hot_freq_data_update(hr-hot_range.hot_freq_data, rw);
 + spin_unlock(hr-hot_range.lock);
 +
 + hot_range_item_put(hr);
 + }
 +
 + hot_inode_item_put(he);
 +}
 +EXPORT_SYMBOL_GPL(hot_update_freqs);



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v4 03/15] vfs,hot_track: add the function for collecting I/O frequency

2012-10-28 Thread Zhi Yong Wu
On Mon, Oct 29, 2012 at 10:01 AM, Dave Chinner da...@fromorbit.com wrote:
 On Sun, Oct 28, 2012 at 09:51:48PM +0800, Zhi Yong Wu wrote:
 On Sun, Oct 28, 2012 at 3:55 PM, Zheng Liu gnehzuil@gmail.com wrote:
  Hi Zhiyong,
 
  On Thu, Oct 25, 2012 at 11:08:55PM +0800, zwu.ker...@gmail.com wrote:
  [snip]
  @@ -199,6 +342,54 @@ err:
   }
 
   /*
  + * Main function to update access frequency from read/writepage(s) hooks
  + */
  +inline void hot_update_freqs(struct inode *inode, u64 start,
  + u64 len, int rw)
 
  This function seems too big.  So we really need to inline this function?
 As Dave said in his comments, it will add a function call
 overhead even when tracking is not enabled. a static inline function
 will just result in no extra overhead other than the if
 statement

 I don't think I said that with respect to this code. I think I said
 it w.r.t. a define or a small wrapper that decides to call
 hot_update_freqs().  A static inline fucntion should only be a
 couple of lines of code at most.

 A static function, OTOH, can be inlined by the compiler if the
 compiler thinks that is a win. But
thanks for your explaination at first.

 .

  +EXPORT_SYMBOL_GPL(hot_update_freqs);

 ... it's an exported function, so it can't be inline or static, so
 using inline is wrong whatever way you look at it. ;)
ah, but i' m surprised by why the compiler find this error.


 Cheers,

 Dave.
 --
 Dave Chinner
 da...@fromorbit.com
 --
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 11/13] vfs: add 3 new ioctl interfaces

2012-10-19 Thread Zhi Yong Wu
On Tue, Oct 16, 2012 at 11:17 AM, Dave Chinner da...@fromorbit.com wrote:
 On Wed, Oct 10, 2012 at 06:07:33PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   FS_IOC_GET_HEAT_INFO: return a struct containing the various
 metrics collected in btrfs_freq_data structs, and also return a

 I think you mean hot_freq_data :P

 calculated data temperature based on those metrics. Optionally, retrieve
 the temperature from the hot data hash list instead of recalculating it.

 To get the heat info for a specific file you have to know what file
 you want to get that info for, right?  I can see the usefulness of
 asking for the heat data on a specific file, but how do you find the
 hot files in the first place? i.e. the big question the user
 interface needs to answer is what files are hot?.

 Once userspace knows what the hottest files are, it can open them
 and query the data via the above ioctl, but expecting userspace to
 iterate millions of inodes in a filesystem to find hot files is very
 inefficient.

 FWIW, if you were to return file handles to the hottest files, then
Good idea. I am not very clear about how to implement it. file handles
mean file_handle??  How to return them to the application? via
debugfs? How many hottest files should be returned?? Top 100?

 the application could open and query them without even needing to
 know the path name to them. This woul dbe exceedingly useful for
 defragmentation programs, especially as that is the way xfs_fsr
 already operates on candidate files.(*)

 IOWs, sometimes the pathname is irrelevant to the operations that
 applications want to perform - all they care about having an
 efficient method of finding the inode they want and getting a file
 descriptor that points to the file. Given the heat map info fits
 right in to the sort of operations defrag and data mover tools
 already do, it kind of makes sense to optimise the interface towards
 those uses

 (*) i.e. finds them via bulkstat which returns handle information
 along with all the other inode data, then opens the file by handle
 to do the defrag work

   FS_IOC_GET_HEAT_OPTS: return an integer representing the current
 state of hot data tracking and migration:

 0 = do nothing
 1 = track frequency of access

   FS_IOC_SET_HEAT_OPTS: change the state of hot data tracking and
 migration, as described above.

 I can't see how this is a manageable interface. It is not
 persistent, so after every filesystem mount you'd have to set the
 flag on all your inodes again. Hence, for the moment, I'd suggest
 that dropping per-inode tracking control until all the core issues
 are sorted out

 Cheers,

 Dave.
 --
 Dave Chinner
 da...@fromorbit.com
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 00/13] vfs: hot data tracking

2012-10-19 Thread Zhi Yong Wu
On Tue, Oct 16, 2012 at 4:42 AM, Dave Chinner da...@fromorbit.com wrote:
 On Wed, Oct 10, 2012 at 06:07:22PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 NOTE:

   The patchset is currently post out mainly to make sure
 it is going in the correct direction and hope to get some
 helpful comments from other guys.
   For more infomation, please check hot_tracking.txt in Documentation

 TODO List:

 1) Fix OOM issues - the hot inode tracking caches grow very large
 and don't get trimmed under memory pressure. From slabtop, after
 creating roughly 24 million single byte files(*) on a machine with
 8GB RAM:

   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
 23859510 23859476  99%0.12K 795317   30   3181268K hot_range_item
 23859441 23859439  99%0.16K 1037367   23   4149468K hot_inode_item
 572530 572530 100%0.55K  817907327160K radix_tree_node
 241706 241406  99%0.22K  14218   17 56872K xfs_ili
 241206 241204  99%1.06K  804023321608K xfs_inode

 The inode tracking is trying to track all 24 million inodes even
 though they have been written only once, and there are only 240,000
 inodes in the cache at this point in time. That was the last update
 that slabtop got, so it is indicative of the impending OOM situation
 that occurred.

 Changelog from v2:
  1.) Converted to Radix trees, not RB-tree [Zhiyong, Dave Chinner]
  2.) Added memory shrinker [Dave Chinner]

 I haven't looked at the shrinker, but clearly it is not working,
HI, Dave,
Some guys suggest that when inode slab cache is shrinked, the
hot_inode[range]_item slab is accordingly also shrinked, this will
make hot tracking don't need to register its own shrinker. Do you
think of it?
If you don't like above idea. Do you have any good suggestion on how
to remove hot_inode_item and hot_range_item?

 otherwise the above OOM situation would not be occurring.

 Cheers,

 Dave.

 (*) Tested on an empty 17TB XFS filesystem with:

 $ sudo mkfs.xfs -f -l size=131072b,sunit=8 /dev/vdc
 meta-data=/dev/vdc   isize=256agcount=17, agsize=268435455 
 blks
  =   sectsz=512   attr=2, projid32bit=0
 data =   bsize=4096   blocks=4563402735, imaxpct=5
  =   sunit=0  swidth=0 blks
 naming   =version 2  bsize=4096   ascii-ci=0
 log  =internal log   bsize=4096   blocks=131072, version=2
  =   sectsz=512   sunit=1 blks, lazy-count=1
 realtime =none   extsz=4096   blocks=0, rtextents=0
 $ sudo mount -o logbsize=256k /dev/vdc /mnt/scratch
 $ sudo chmod 777 /mnt/scratch
 $ fs_mark  -D  1  -S0  -n  10  -s  1  -L  63  -d \
 /mnt/scratch/0  -d  /mnt/scratch/1  -d  /mnt/scratch/2  -d \
 /mnt/scratch/3  -d  /mnt/scratch/4  -d  /mnt/scratch/5  -d \
 /mnt/scratch/6  -d  /mnt/scratch/7
 .
  0 21601  16679.3 12552262
  0 22401  15412.4 12588587
  0 23201  16367.6 14199322
  0 24001  15680.4 15741205
 hangs here w/ OOM

 --
 Dave Chinner
 da...@fromorbit.com



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 09/13] vfs: add one wq to update map info periodically

2012-10-17 Thread Zhi Yong Wu
On Tue, Oct 16, 2012 at 8:27 AM, Dave Chinner da...@fromorbit.com wrote:
 On Wed, Oct 10, 2012 at 06:07:31PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   Add a per-superblock workqueue and a work_struct
  to run periodic work to update map info on each superblock.

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  fs/hot_tracking.c|   94 
 ++
  fs/hot_tracking.h|3 +
  include/linux/hot_tracking.h |2 +
  3 files changed, 99 insertions(+), 0 deletions(-)

 diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
 index a8dc599..f333c47 100644
 --- a/fs/hot_tracking.c
 +++ b/fs/hot_tracking.c
 @@ -15,6 +15,8 @@
  #include linux/module.h
  #include linux/spinlock.h
  #include linux/hardirq.h
 +#include linux/kthread.h
 +#include linux/freezer.h
  #include linux/fs.h
  #include linux/blkdev.h
  #include linux/types.h
 @@ -623,6 +625,88 @@ static void hot_map_array_exit(struct hot_info *root)
  }

  /*
 + * Update temperatures for each hot inode item and
 + * hot range item for aging purposes
 + */
 +static void hot_temperature_update_work(struct work_struct *work)
 +{
 + struct hot_update_work *hot_work =
 + container_of(work, struct hot_update_work, work);
 + struct hot_info *root = hot_work-hot_info;
 + struct hot_inode_item *hi_nodes[8];
 + unsigned long delay = HZ * HEAT_UPDATE_DELAY;
 + u64 ino = 0;
 + int i, n;
 +
 + do {
 + while (1) {
 + spin_lock(root-lock);
 + n = radix_tree_gang_lookup(root-hot_inode_tree,
 +(void **)hi_nodes, ino,
 +ARRAY_SIZE(hi_nodes));
 + if (!n) {
 + spin_unlock(root-lock);
 + break;
 + }
 +
 + ino = hi_nodes[n - 1]-i_ino + 1;
 + for (i = 0; i  n; i++) {
 + kref_get(hi_nodes[i]-hot_inode.refs);
 + hot_map_array_update(
 + hi_nodes[i]-hot_inode.hot_freq_data, 
 root);
 + hot_range_update(hi_nodes[i], root);
 + hot_inode_item_put(hi_nodes[i]);
 + }
 + spin_unlock(root-lock);

 This is a lot of work to do under a spin lock. Perhaps you should
 get a reference on all the nodes, then drop the root-lock and then
 update all the nodes in a separate loop.
OK, done

 + }
 +
 + if (unlikely(freezing(current))) {
 + __refrigerator(true);
 + } else {
 + set_current_state(TASK_INTERRUPTIBLE);
 + if (!kthread_should_stop()) {
 + schedule_timeout(delay);
 + }
 + __set_current_state(TASK_RUNNING);
 + }
 + } while (!kthread_should_stop());

 I don't think you understand workqueues fully. A work queue worker
 function is not something that executes endlessly. It is a
 one-shot function that does the work once, not an endless loop
 that has to delay it's execution for periodic work.
ah, i have done this based on your following suggestions, thanks.

 If you need periodic work, then you should use a struct delayed_work
 and queue the next work iteration to be run a later time. See, for
 example, xfs_syncd_worker() and xfs_syncd_queue_sync() and how that
 reschedules itself for periodic work. It also means you don't have
 to handle kthread freezing, as the WQ infrastructure takes care of
 that for you.
ditto.

 This is why unmount is hanging for me - this work never completes,
 so flush_workqueue() will never return.
got it, thanks.

 +}
 +
 +static int hot_wq_init(struct hot_info *root)
 +{
 + struct hot_update_work *hot_work;
 + int ret = 0;
 +
 + root-update_wq = alloc_workqueue(
 + hot_temperature_update, WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
 + if (!root-update_wq) {
 + printk(KERN_ERR %s: failed to create 
 + temperature update workqueue\n,
 + __func__);
 + return 1;
 + }
 +
 + hot_work = kmalloc(sizeof(*hot_work), GFP_NOFS);
 + if (hot_work) {
 + hot_work-hot_info = root;
 + INIT_WORK(hot_work-work, hot_temperature_update_work);
 + queue_work(root-update_wq, hot_work-work);
 + } else {
 + printk(KERN_ERR %s: failed to create update work\n,
 + __func__);
 + ret = 1;
 + }

 I don't understand why you need a separate hot_work structure.
 just embed a struct delayed_work in the struct hot_info and use
 container_of() to get the struct hot_info from the work structure.
 As such, there's no need for a separate function just

Re: [RFC v3 00/13] vfs: hot data tracking

2012-10-17 Thread Zhi Yong Wu
On Tue, Oct 16, 2012 at 4:42 AM, Dave Chinner da...@fromorbit.com wrote:
 On Wed, Oct 10, 2012 at 06:07:22PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 NOTE:

   The patchset is currently post out mainly to make sure
 it is going in the correct direction and hope to get some
 helpful comments from other guys.
   For more infomation, please check hot_tracking.txt in Documentation

 TODO List:

 1) Fix OOM issues - the hot inode tracking caches grow very large
 and don't get trimmed under memory pressure. From slabtop, after
 creating roughly 24 million single byte files(*) on a machine with
 8GB RAM:

   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
 23859510 23859476  99%0.12K 795317   30   3181268K hot_range_item
 23859441 23859439  99%0.16K 1037367   23   4149468K hot_inode_item
 572530 572530 100%0.55K  817907327160K radix_tree_node
 241706 241406  99%0.22K  14218   17 56872K xfs_ili
 241206 241204  99%1.06K  804023321608K xfs_inode

 The inode tracking is trying to track all 24 million inodes even
 though they have been written only once, and there are only 240,000
 inodes in the cache at this point in time. That was the last update
 that slabtop got, so it is indicative of the impending OOM situation
 that occurred.

 Changelog from v2:
  1.) Converted to Radix trees, not RB-tree [Zhiyong, Dave Chinner]
  2.) Added memory shrinker [Dave Chinner]

 I haven't looked at the shrinker, but clearly it is not working,
 otherwise the above OOM situation would not be occurring.

 Cheers,

 Dave.

 (*) Tested on an empty 17TB XFS filesystem with:

 $ sudo mkfs.xfs -f -l size=131072b,sunit=8 /dev/vdc
 meta-data=/dev/vdc   isize=256agcount=17, agsize=268435455 
 blks
  =   sectsz=512   attr=2, projid32bit=0
 data =   bsize=4096   blocks=4563402735, imaxpct=5
  =   sunit=0  swidth=0 blks
 naming   =version 2  bsize=4096   ascii-ci=0
 log  =internal log   bsize=4096   blocks=131072, version=2
  =   sectsz=512   sunit=1 blks, lazy-count=1
 realtime =none   extsz=4096   blocks=0, rtextents=0
 $ sudo mount -o logbsize=256k /dev/vdc /mnt/scratch
 $ sudo chmod 777 /mnt/scratch
 $ fs_mark  -D  1  -S0  -n  10  -s  1  -L  63  -d \
 /mnt/scratch/0  -d  /mnt/scratch/1  -d  /mnt/scratch/2  -d \
 /mnt/scratch/3  -d  /mnt/scratch/4  -d  /mnt/scratch/5  -d \
 /mnt/scratch/6  -d  /mnt/scratch/7
 .
  0 21601  16679.3 12552262
  0 22401  15412.4 12588587
  0 23201  16367.6 14199322
  0 24001  15680.4 15741205
 hangs here w/ OOM
In this test, i haven't see you enable hot_track function via
mount, why did it meet OOM?

 --
 Dave Chinner
 da...@fromorbit.com



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 09/13] vfs: add one wq to update map info periodically

2012-10-17 Thread Zhi Yong Wu
On Thu, Oct 18, 2012 at 10:25 AM, Zheng Liu gnehzuil@gmail.com wrote:
 On Wed, Oct 17, 2012 at 02:34:15PM +0800, Zhi Yong Wu wrote:
  diff --git a/fs/hot_tracking.h b/fs/hot_tracking.h
  index d19e64a..7a79a6d 100644
  --- a/fs/hot_tracking.h
  +++ b/fs/hot_tracking.h
  @@ -36,6 +36,9 @@
*/
   #define TIME_TO_KICK 400
 
  +/* set how often to update temperatures (seconds) */
  +#define HEAT_UPDATE_DELAY 400
 
  FWIW, 400 seconds is an unusual time period. It's expected that
  periodic work might take place at intervals of 5 minutes, 10
  minutes, etc, not 6m40s. It's much easier to predict and understand
  behaviour if it's at a interval of whole units like minutes,
  especially when looking at timestamped event traces. Hence 300s (5
  minutes) makes a lot more sense as a period for updates...
 got it. thanks.

 Hi Zhi Yong,

 IMHO we'd better to make this value parameterized, and then the user
 can adjust this value dynamically.
Yeah, this has been in my TODO list. But i want to make the core
function can work at first.


 Regards,
 Zheng



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 00/13] vfs: hot data tracking

2012-10-17 Thread Zhi Yong Wu
On Thu, Oct 18, 2012 at 12:29 PM, Dave Chinner da...@fromorbit.com wrote:
 On Wed, Oct 17, 2012 at 04:57:14PM +0800, Zhi Yong Wu wrote:
 On Tue, Oct 16, 2012 at 4:42 AM, Dave Chinner da...@fromorbit.com wrote:
  On Wed, Oct 10, 2012 at 06:07:22PM +0800, zwu.ker...@gmail.com wrote:
  From: Zhi Yong Wu wu...@linux.vnet.ibm.com
 
  (*) Tested on an empty 17TB XFS filesystem with:
 
  $ sudo mkfs.xfs -f -l size=131072b,sunit=8 /dev/vdc
  meta-data=/dev/vdc   isize=256agcount=17, agsize=268435455 
  blks
   =   sectsz=512   attr=2, projid32bit=0
  data =   bsize=4096   blocks=4563402735, imaxpct=5
   =   sunit=0  swidth=0 blks
  naming   =version 2  bsize=4096   ascii-ci=0
  log  =internal log   bsize=4096   blocks=131072, version=2
   =   sectsz=512   sunit=1 blks, lazy-count=1
  realtime =none   extsz=4096   blocks=0, rtextents=0
  $ sudo mount -o logbsize=256k /dev/vdc /mnt/scratch
  $ sudo chmod 777 /mnt/scratch
  $ fs_mark  -D  1  -S0  -n  10  -s  1  -L  63  -d \
  /mnt/scratch/0  -d  /mnt/scratch/1  -d  /mnt/scratch/2  -d \
  /mnt/scratch/3  -d  /mnt/scratch/4  -d  /mnt/scratch/5  -d \
  /mnt/scratch/6  -d  /mnt/scratch/7
  .
   0 21601  16679.3 12552262
   0 22401  15412.4 12588587
   0 23201  16367.6 14199322
   0 24001  15680.4 15741205
  hangs here w/ OOM
 In this test, i haven't see you enable hot_track function via
 mount, why did it meet OOM?

 I copied the wrong mount command. It was definitely enabled.
OK, BTW: fs_mark is the script written by you? After xfsprogs is
installed, i haven't found this command.


 Cheers,

 Dave.

 --
 Dave Chinner
 da...@fromorbit.com



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 00/13] vfs: hot data tracking

2012-10-17 Thread Zhi Yong Wu
On Thu, Oct 18, 2012 at 1:17 PM, Dave Chinner da...@fromorbit.com wrote:
 On Thu, Oct 18, 2012 at 12:44:47PM +0800, Zhi Yong Wu wrote:
 On Thu, Oct 18, 2012 at 12:29 PM, Dave Chinner da...@fromorbit.com wrote:
  On Wed, Oct 17, 2012 at 04:57:14PM +0800, Zhi Yong Wu wrote:
  On Tue, Oct 16, 2012 at 4:42 AM, Dave Chinner da...@fromorbit.com wrote:
   On Wed, Oct 10, 2012 at 06:07:22PM +0800, zwu.ker...@gmail.com wrote:
   From: Zhi Yong Wu wu...@linux.vnet.ibm.com
  
   (*) Tested on an empty 17TB XFS filesystem with:
  
   $ sudo mkfs.xfs -f -l size=131072b,sunit=8 /dev/vdc
   meta-data=/dev/vdc   isize=256agcount=17, 
   agsize=268435455 blks
=   sectsz=512   attr=2, projid32bit=0
   data =   bsize=4096   blocks=4563402735, 
   imaxpct=5
=   sunit=0  swidth=0 blks
   naming   =version 2  bsize=4096   ascii-ci=0
   log  =internal log   bsize=4096   blocks=131072, version=2
=   sectsz=512   sunit=1 blks, lazy-count=1
   realtime =none   extsz=4096   blocks=0, rtextents=0
   $ sudo mount -o logbsize=256k /dev/vdc /mnt/scratch
   $ sudo chmod 777 /mnt/scratch
   $ fs_mark  -D  1  -S0  -n  10  -s  1  -L  63  -d \
   /mnt/scratch/0  -d  /mnt/scratch/1  -d  /mnt/scratch/2  -d \
   /mnt/scratch/3  -d  /mnt/scratch/4  -d  /mnt/scratch/5  -d \
   /mnt/scratch/6  -d  /mnt/scratch/7
   .
0 21601  16679.3 12552262
0 22401  15412.4 12588587
0 23201  16367.6 14199322
0 24001  15680.4 15741205
   hangs here w/ OOM
  In this test, i haven't see you enable hot_track function via
  mount, why did it meet OOM?
 
  I copied the wrong mount command. It was definitely enabled.
 OK, BTW: fs_mark is the script written by you? After xfsprogs is
 installed, i haven't found this command.

 # apt-get install fsmark
thanks. let me try.

 Or get the source here:

 http://sourceforge.net/projects/fsmark/

 Cheers,

 Dave.
 --
 Dave Chinner
 da...@fromorbit.com



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 13/13] vfs: add documentation

2012-10-15 Thread Zhi Yong Wu
On Mon, Oct 15, 2012 at 8:35 AM, Zheng Liu gnehzuil@gmail.com wrote:
 Hi Zhi Yong,

 [cut...]
 +3. The Design
 +
 +These include the following parts:
 +
 +* Hooks in existing vfs functions to track data access frequency
 +
 +* New rbtrees for tracking access frequency of inodes and sub-file
  ^^^ s/rbtrees/radix-trees
 +ranges (hot_rb.c)
 Now it seems that all codes are in the same file.
HI, Zheng,

Good catch, i will update them, thanks.

 Regards,
 Zheng



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 00/13] vfs: hot data tracking

2012-10-15 Thread Zhi Yong Wu
On Mon, Oct 15, 2012 at 8:39 AM, Zheng Liu gnehzuil@gmail.com wrote:
 On Wed, Oct 10, 2012 at 06:07:22PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 NOTE:

   The patchset is currently post out mainly to make sure
 it is going in the correct direction and hope to get some
 helpful comments from other guys.
   For more infomation, please check hot_tracking.txt in Documentation

 Hi Zhi Yong,

 If I want to use this patch set in ext4, could I apply this patch set
 directly or I need to call some functions like in btrfs.  Thanks.
Hi ,Zheng,

It is the latter. If you have any questions, please feel free to catch me.


 Regards,
 Zheng



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 11/13] vfs: add 3 new ioctl interfaces

2012-10-15 Thread Zhi Yong Wu
On Mon, Oct 15, 2012 at 3:48 PM, Dave Chinner da...@fromorbit.com wrote:
 On Wed, Oct 10, 2012 at 06:07:33PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   FS_IOC_GET_HEAT_INFO: return a struct containing the various
 metrics collected in btrfs_freq_data structs, and also return a
 calculated data temperature based on those metrics. Optionally, retrieve
 the temperature from the hot data hash list instead of recalculating it.

   FS_IOC_GET_HEAT_OPTS: return an integer representing the current
 state of hot data tracking and migration:

 0 = do nothing
 1 = track frequency of access

   FS_IOC_SET_HEAT_OPTS: change the state of hot data tracking and
 migration, as described above.
 .
 +struct hot_heat_info {
 + __u64 avg_delta_reads;
 + __u64 avg_delta_writes;
 + __u64 last_read_time;
 + __u64 last_write_time;
 + __u32 num_reads;
 + __u32 num_writes;
 + __u32 temperature;
 + __u8 live;
 + char filename[PATH_MAX];

 Don't put the filename in the ioctl and open the file in the kernel.
 Have userspace open the file directly and issue the ioctl on the fd
 that is returned.
OK, thanks. By the way, do you think that it is necessary to provide
another new ioctl interface to set the temperature value?

 Cheers,

 Dave.
 --
 Dave Chinner
 da...@fromorbit.com



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 12/13] vfs: add debugfs support

2012-10-15 Thread Zhi Yong Wu
On Mon, Oct 15, 2012 at 3:55 PM, Dave Chinner da...@fromorbit.com wrote:
 On Wed, Oct 10, 2012 at 06:07:34PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   Add a /sys/kernel/debug/hot_track/device_name/ directory for each
 volume that contains two files. The first, `inode_data', contains the
 heat information for inodes that have been brought into the hot data map
 structures. The second, `range_data', contains similar information for
 subfile ranges.
 
 + /* create debugfs range_data file */
 + debugfs_range_entry = debugfs_create_file(range_data,
 + S_IFREG | S_IRUSR | S_IWUSR | S_IRUGO,
 + debugfs_volume_entry,
 + (void *) range_data,
 + hot_debugfs_range_fops);

 These should not be world readable. 0600 is probably the correct
 permissions for them as we do not want random users to be able to
 infer what files users are accessing from this information.
Good catch, its mode should be S_IFREG | S_IRUSR | S_IWUSR


 Cheers,

 Dave.
 --
 Dave Chinner
 da...@fromorbit.com



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 12/13] vfs: add debugfs support

2012-10-15 Thread Zhi Yong Wu
On Mon, Oct 15, 2012 at 4:04 PM, Dave Chinner da...@fromorbit.com wrote:
 On Wed, Oct 10, 2012 at 06:07:34PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   Add a /sys/kernel/debug/hot_track/device_name/ directory for each
 volume that contains two files. The first, `inode_data', contains the
 heat information for inodes that have been brought into the hot data map
 structures. The second, `range_data', contains similar information for
 subfile ranges.

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  fs/hot_tracking.c |  462 
 +
  fs/hot_tracking.h |   43 +
  2 files changed, 505 insertions(+), 0 deletions(-)
 .
 +static int hot_debugfs_copy(struct debugfs_vol_data *data, char *msg, int 
 len)
 +{
 + struct lstring *debugfs_log = data-debugfs_log;
 + uint new_log_alloc_size;
 + char *new_log;
 + static char err_msg[] = No more memory!\n;
 +
 + if (len = data-log_alloc_size - debugfs_log-len) {
 ..
 + }
 +
 + memcpy(debugfs_log-str + debugfs_log-len, data-log_work_buff, len);
 + debugfs_log-len += (unsigned long) len;
 +
 + return len;
 +}
 +
 +/* Returns the number of bytes written to the log. */
 +static int hot_debugfs_log(struct debugfs_vol_data *data, const char *fmt, 
 ...)
 +{
 + struct lstring *debugfs_log = data-debugfs_log;
 + va_list args;
 + int len;
 + static char trunc_msg[] =
 + The next message has been truncated.\n;
 +
 + if (debugfs_log-str == NULL)
 + return -1;
 +
 + spin_lock(data-log_lock);
 +
 + va_start(args, fmt);
 + len = vsnprintf(data-log_work_buff,
 + sizeof(data-log_work_buff), fmt, args);
 + va_end(args);
 +
 + if (len = sizeof(data-log_work_buff)) {
 + hot_debugfs_copy(data, trunc_msg, sizeof(trunc_msg));
 + }
 +
 + len = hot_debugfs_copy(data, data-log_work_buff, len);
 + spin_unlock(data-log_lock);
 +
 + return len;
 +}

 Aren't you just recreating seq_printf() here? i.e. can't you replace
 all this complexity with generic seq_file/seq_operations constructs?
It seems to be a good suggestion, let me try it. thanks.


 Cheers,

 Dave.
 --
 Dave Chinner
 da...@fromorbit.com



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 11/13] vfs: add 3 new ioctl interfaces

2012-10-15 Thread Zhi Yong Wu
On Tue, Oct 16, 2012 at 11:17 AM, Dave Chinner da...@fromorbit.com wrote:
 On Wed, Oct 10, 2012 at 06:07:33PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   FS_IOC_GET_HEAT_INFO: return a struct containing the various
 metrics collected in btrfs_freq_data structs, and also return a

 I think you mean hot_freq_data :P
Yeah, sorry.

 calculated data temperature based on those metrics. Optionally, retrieve
 the temperature from the hot data hash list instead of recalculating it.

 To get the heat info for a specific file you have to know what file
 you want to get that info for, right?  I can see the usefulness of
Yes.
 asking for the heat data on a specific file, but how do you find the
 hot files in the first place? i.e. the big question the user
 interface needs to answer is what files are hot?.
We only tell the user what the files' temperatures are, not what files are hot.
Their temperatures are in the output of debugfs.

 Once userspace knows what the hottest files are, it can open them
If the user need to know this type of info, it is easy for us to
provide it. But i don't know what way the user hope to get it via.
 and query the data via the above ioctl, but expecting userspace to
 iterate millions of inodes in a filesystem to find hot files is very
 inefficient.

 FWIW, if you were to return file handles to the hottest files, then
 the application could open and query them without even needing to
 know the path name to them. This woul dbe exceedingly useful for
 defragmentation programs, especially as that is the way xfs_fsr
 already operates on candidate files.(*)
ah.

 IOWs, sometimes the pathname is irrelevant to the operations that
 applications want to perform - all they care about having an
 efficient method of finding the inode they want and getting a file
 descriptor that points to the file. Given the heat map info fits
 right in to the sort of operations defrag and data mover tools
 already do, it kind of makes sense to optimise the interface towards
 those uses

 (*) i.e. finds them via bulkstat which returns handle information
 along with all the other inode data, then opens the file by handle
 to do the defrag work
OK.

   FS_IOC_GET_HEAT_OPTS: return an integer representing the current
 state of hot data tracking and migration:

 0 = do nothing
 1 = track frequency of access

   FS_IOC_SET_HEAT_OPTS: change the state of hot data tracking and
 migration, as described above.

 I can't see how this is a manageable interface. It is not
 persistent, so after every filesystem mount you'd have to set the
 flag on all your inodes again. Hence, for the moment, I'd suggest
 that dropping per-inode tracking control until all the core issues
 are sorted out
OK.

 Cheers,

 Dave.
 --
 Dave Chinner
 da...@fromorbit.com
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 02/13] vfs: introduce private radix tree structures

2012-10-11 Thread Zhi Yong Wu
On Wed, Oct 10, 2012 at 11:34 PM, David Sterba d...@jikos.cz wrote:
 On Wed, Oct 10, 2012 at 06:07:24PM +0800, zwu.ker...@gmail.com wrote:
 +void hot_track_init(struct super_block *sb)
 +{
 ...
 +}

 +void hot_track_exit(struct super_block *sb)
 +{
 + hot_cache_exit();
 +}

 Needs to be exported if btrfs is built as a module, otherwise does not
 link

   LDS arch/x86/boot/compressed/vmlinux.lds
   AS  arch/x86/boot/compressed/head_64.o
   CC  arch/x86/boot/compressed/misc.o
   CC  arch/x86/boot/compressed/string.o
   CC  arch/x86/boot/compressed/cmdline.o
   CC  arch/x86/boot/compressed/early_serial_console.o
   OBJCOPY arch/x86/boot/compressed/vmlinux.bin
 ERROR: hot_track_init [fs/btrfs/btrfs.ko] undefined!
 ERROR: hot_track_exit [fs/btrfs/btrfs.ko] undefined!
 make[1]: *** [__modpost] Error 1
 make: *** [modules] Error 2
 make: *** Waiting for unfinished jobs
Sorry for late response at first. Great, thanks.



 david



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 01/13] btrfs: add one new mount option '-o hot_track'

2012-10-11 Thread Zhi Yong Wu
On Thu, Oct 11, 2012 at 12:28 AM, David Sterba d...@jikos.cz wrote:
 Hi,

 On Wed, Oct 10, 2012 at 06:07:23PM +0800, zwu.ker...@gmail.com wrote:
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -1726,6 +1726,7 @@ struct btrfs_ioctl_defrag_range_args {
  #define BTRFS_MOUNT_CHECK_INTEGRITY  (1  20)
  #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1  21)
  #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR (1  22)
 +#define BTRFS_MOUNT_HOT_TRACK(1  23)

 Please don't forget to add new options to btrfs_show_options(),
 otherwise we can't tell what filesystems have hot tracking enabled.
Great catch, thanks.

 --- a/fs/btrfs/super.c
 +++ b/fs/btrfs/super.c
 @@ -303,7 +304,7 @@ enum {
   Opt_notreelog, Opt_ratio, Opt_flushoncommit, Opt_discard,
   Opt_space_cache, Opt_clear_cache, Opt_user_subvol_rm_allowed,
   Opt_enospc_debug, Opt_subvolrootid, Opt_defrag, Opt_inode_cache,
 - Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
 + Opt_no_space_cache, Opt_recovery, Opt_skip_balance, Opt_hot_track,

 Please add the new option to the end.
OK.

   Opt_check_integrity, Opt_check_integrity_including_extent_data,
   Opt_check_integrity_print_mask, Opt_fatal_errors,
   Opt_err,
 @@ -342,6 +343,7 @@ static match_table_t tokens = {
   {Opt_no_space_cache, nospace_cache},
   {Opt_recovery, recovery},
   {Opt_skip_balance, skip_balance},
 + {Opt_hot_track, hot_track},

 (also this one)
ditto.

   {Opt_check_integrity, check_int},
   {Opt_check_integrity_including_extent_data, check_int_data},
   {Opt_check_integrity_print_mask, check_int_print_mask=%d},
 @@ -553,6 +555,9 @@ int btrfs_parse_options(struct btrfs_root *root, char 
 *options)
   case Opt_skip_balance:
   btrfs_set_opt(info-mount_opt, SKIP_BALANCE);
   break;
 + case Opt_hot_track:

 It's a common patter in the surrounding code that a message is printed
 when enabling options, but the vfs prints its own, so I'm not sure if
 it's needed here as well. Just thinking, leave it as it is now.
OK

 + btrfs_set_opt(info-mount_opt, HOT_TRACK);
 + break;
  #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
   case Opt_check_integrity_including_extent_data:
   printk(KERN_INFO btrfs: enabling check integrity

 david



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 01/13] btrfs: add one new mount option '-o hot_track'

2012-10-11 Thread Zhi Yong Wu
On Thu, Oct 11, 2012 at 12:28 AM, David Sterba d...@jikos.cz wrote:
 Hi,

 On Wed, Oct 10, 2012 at 06:07:23PM +0800, zwu.ker...@gmail.com wrote:
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -1726,6 +1726,7 @@ struct btrfs_ioctl_defrag_range_args {
  #define BTRFS_MOUNT_CHECK_INTEGRITY  (1  20)
  #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1  21)
  #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR (1  22)
 +#define BTRFS_MOUNT_HOT_TRACK(1  23)

 Please don't forget to add new options to btrfs_show_options(),
 otherwise we can't tell what filesystems have hot tracking enabled.

 --- a/fs/btrfs/super.c
 +++ b/fs/btrfs/super.c
 @@ -303,7 +304,7 @@ enum {
   Opt_notreelog, Opt_ratio, Opt_flushoncommit, Opt_discard,
   Opt_space_cache, Opt_clear_cache, Opt_user_subvol_rm_allowed,
   Opt_enospc_debug, Opt_subvolrootid, Opt_defrag, Opt_inode_cache,
 - Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
 + Opt_no_space_cache, Opt_recovery, Opt_skip_balance, Opt_hot_track,

 Please add the new option to the end.
To be honest, it can't be added to the end, if you check Opt_err's
pattern value, you will find it is NULL, it will cause match_one()
return 1. So if we add Opt_hot_track to the end of this array, it will
be covered by match_token(), so i prefer to add it to
Opt_fatal_errors. Do you think of it?

   Opt_check_integrity, Opt_check_integrity_including_extent_data,
   Opt_check_integrity_print_mask, Opt_fatal_errors,
   Opt_err,
 @@ -342,6 +343,7 @@ static match_table_t tokens = {
   {Opt_no_space_cache, nospace_cache},
   {Opt_recovery, recovery},
   {Opt_skip_balance, skip_balance},
 + {Opt_hot_track, hot_track},

 (also this one)

   {Opt_check_integrity, check_int},
   {Opt_check_integrity_including_extent_data, check_int_data},
   {Opt_check_integrity_print_mask, check_int_print_mask=%d},
 @@ -553,6 +555,9 @@ int btrfs_parse_options(struct btrfs_root *root, char 
 *options)
   case Opt_skip_balance:
   btrfs_set_opt(info-mount_opt, SKIP_BALANCE);
   break;
 + case Opt_hot_track:

 It's a common patter in the surrounding code that a message is printed
 when enabling options, but the vfs prints its own, so I'm not sure if
 it's needed here as well. Just thinking, leave it as it is now.

 + btrfs_set_opt(info-mount_opt, HOT_TRACK);
 + break;
  #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
   case Opt_check_integrity_including_extent_data:
   printk(KERN_INFO btrfs: enabling check integrity

 david



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 01/13] btrfs: add one new mount option '-o hot_track'

2012-10-11 Thread Zhi Yong Wu
On Thu, Oct 11, 2012 at 10:41 PM, David Sterba d...@jikos.cz wrote:
 On Thu, Oct 11, 2012 at 10:35:28PM +0800, Zhi Yong Wu wrote:
  --- a/fs/btrfs/super.c
  +++ b/fs/btrfs/super.c
  @@ -303,7 +304,7 @@ enum {
Opt_notreelog, Opt_ratio, Opt_flushoncommit, Opt_discard,
Opt_space_cache, Opt_clear_cache, Opt_user_subvol_rm_allowed,
Opt_enospc_debug, Opt_subvolrootid, Opt_defrag, Opt_inode_cache,
  - Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
  + Opt_no_space_cache, Opt_recovery, Opt_skip_balance, Opt_hot_track,
 
  Please add the new option to the end.
 To be honest, it can't be added to the end, if you check Opt_err's
 pattern value, you will find it is NULL, it will cause match_one()
 return 1. So if we add Opt_hot_track to the end of this array, it will
 be covered by match_token(), so i prefer to add it to
 Opt_fatal_errors. Do you think of it?

 Ah, sorry, I was not clear what the 'end' means here. The Opt_err is a
 end-of-the-list token, so yes please add hot_track between
 Opt_fatal_errors and Opt_err.
Done, thanks. It will be appreciated if you can make comments on other
patches of this series.:)


 david



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 01/13] btrfs: add one new mount option '-o hot_track'

2012-10-10 Thread Zhi Yong Wu
On Wed, Oct 10, 2012 at 7:59 PM, Lukáš Czerner lczer...@redhat.com wrote:
 On Wed, 10 Oct 2012, zwu.ker...@gmail.com wrote:

 Date: Wed, 10 Oct 2012 18:07:23 +0800
 From: zwu.ker...@gmail.com
 To: linux-fsde...@vger.kernel.org
 Cc: linux-e...@vger.kernel.org, linux-btrfs@vger.kernel.org,
 linux-ker...@vger.kernel.org, linux...@linux.vnet.ibm.com,
 v...@zeniv.linux.org.uk, da...@fromorbit.com, d...@jikos.cz,
 ty...@mit.edu, c...@us.ibm.com, Zhi Yong Wu wu...@linux.vnet.ibm.com
 Subject: [RFC v3 01/13] btrfs: add one new mount option '-o hot_track'

 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 Introduce one new mount option '-o hot_track',
 and add its parsing support.
   Its usage looks like:
mount -o hot_track
mount -o nouser,hot_track
mount -o nouser,hot_track,loop
mount -o hot_track,nouser

 This patch should probably be at the end of the series.
Can you let me know your reason? I think that it is not necessary to
be at the end of the series.


 -Lukas


 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  fs/btrfs/ctree.h |1 +
  fs/btrfs/super.c |7 ++-
  2 files changed, 7 insertions(+), 1 deletions(-)

 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
 index 9821b67..094bec6 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -1726,6 +1726,7 @@ struct btrfs_ioctl_defrag_range_args {
  #define BTRFS_MOUNT_CHECK_INTEGRITY  (1  20)
  #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1  21)
  #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR (1  22)
 +#define BTRFS_MOUNT_HOT_TRACK(1  23)

  #define btrfs_clear_opt(o, opt)  ((o) = ~BTRFS_MOUNT_##opt)
  #define btrfs_set_opt(o, opt)((o) |= BTRFS_MOUNT_##opt)
 diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
 index 83d6f9f..00be9e3 100644
 --- a/fs/btrfs/super.c
 +++ b/fs/btrfs/super.c
 @@ -41,6 +41,7 @@
  #include linux/slab.h
  #include linux/cleancache.h
  #include linux/ratelimit.h
 +#include linux/hot_tracking.h
  #include compat.h
  #include delayed-inode.h
  #include ctree.h
 @@ -303,7 +304,7 @@ enum {
   Opt_notreelog, Opt_ratio, Opt_flushoncommit, Opt_discard,
   Opt_space_cache, Opt_clear_cache, Opt_user_subvol_rm_allowed,
   Opt_enospc_debug, Opt_subvolrootid, Opt_defrag, Opt_inode_cache,
 - Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
 + Opt_no_space_cache, Opt_recovery, Opt_skip_balance, Opt_hot_track,
   Opt_check_integrity, Opt_check_integrity_including_extent_data,
   Opt_check_integrity_print_mask, Opt_fatal_errors,
   Opt_err,
 @@ -342,6 +343,7 @@ static match_table_t tokens = {
   {Opt_no_space_cache, nospace_cache},
   {Opt_recovery, recovery},
   {Opt_skip_balance, skip_balance},
 + {Opt_hot_track, hot_track},
   {Opt_check_integrity, check_int},
   {Opt_check_integrity_including_extent_data, check_int_data},
   {Opt_check_integrity_print_mask, check_int_print_mask=%d},
 @@ -553,6 +555,9 @@ int btrfs_parse_options(struct btrfs_root *root, char 
 *options)
   case Opt_skip_balance:
   btrfs_set_opt(info-mount_opt, SKIP_BALANCE);
   break;
 + case Opt_hot_track:
 + btrfs_set_opt(info-mount_opt, HOT_TRACK);
 + break;
  #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
   case Opt_check_integrity_including_extent_data:
   printk(KERN_INFO btrfs: enabling check integrity




-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 01/13] btrfs: add one new mount option '-o hot_track'

2012-10-10 Thread Zhi Yong Wu
On Wed, Oct 10, 2012 at 9:11 PM, Lukáš Czerner lczer...@redhat.com wrote:
 On Wed, 10 Oct 2012, Zhi Yong Wu wrote:

 Date: Wed, 10 Oct 2012 20:21:48 +0800
 From: Zhi Yong Wu zwu.ker...@gmail.com
 To: Lukáš Czerner lczer...@redhat.com
 Cc: linux-fsde...@vger.kernel.org, linux-e...@vger.kernel.org,
 linux-btrfs@vger.kernel.org, linux-ker...@vger.kernel.org,
 linux...@linux.vnet.ibm.com, v...@zeniv.linux.org.uk, 
 da...@fromorbit.com,
 d...@jikos.cz, ty...@mit.edu, c...@us.ibm.com,
 Zhi Yong Wu wu...@linux.vnet.ibm.com
 Subject: Re: [RFC v3 01/13] btrfs: add one new mount option '-o hot_track'

 On Wed, Oct 10, 2012 at 7:59 PM, Lukáš Czerner lczer...@redhat.com wrote:
  On Wed, 10 Oct 2012, zwu.ker...@gmail.com wrote:
 
  Date: Wed, 10 Oct 2012 18:07:23 +0800
  From: zwu.ker...@gmail.com
  To: linux-fsde...@vger.kernel.org
  Cc: linux-e...@vger.kernel.org, linux-btrfs@vger.kernel.org,
  linux-ker...@vger.kernel.org, linux...@linux.vnet.ibm.com,
  v...@zeniv.linux.org.uk, da...@fromorbit.com, d...@jikos.cz,
  ty...@mit.edu, c...@us.ibm.com, Zhi Yong Wu wu...@linux.vnet.ibm.com
  Subject: [RFC v3 01/13] btrfs: add one new mount option '-o hot_track'
 
  From: Zhi Yong Wu wu...@linux.vnet.ibm.com
 
  Introduce one new mount option '-o hot_track',
  and add its parsing support.
Its usage looks like:
 mount -o hot_track
 mount -o nouser,hot_track
 mount -o nouser,hot_track,loop
 mount -o hot_track,nouser
 
  This patch should probably be at the end of the series.
 Can you let me know your reason? I think that it is not necessary to
 be at the end of the series.

 Simply because you're adding the mount option which does not do
 anything yet. Moreover you change the implementation of the hot track
 as you go. You should enable this once it is ready to use, not the other
 way around. So, please move this at the end of the patch set when
 the feature is supposed to be ready to use.
OK, done, thanks. If you have comments on other patches, it will be appreciated.


 Thanks!
 -Lukas


 
  -Lukas
 
 
  Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
  ---
   fs/btrfs/ctree.h |1 +
   fs/btrfs/super.c |7 ++-
   2 files changed, 7 insertions(+), 1 deletions(-)
 
  diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
  index 9821b67..094bec6 100644
  --- a/fs/btrfs/ctree.h
  +++ b/fs/btrfs/ctree.h
  @@ -1726,6 +1726,7 @@ struct btrfs_ioctl_defrag_range_args {
   #define BTRFS_MOUNT_CHECK_INTEGRITY  (1  20)
   #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1  21)
   #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR (1  22)
  +#define BTRFS_MOUNT_HOT_TRACK(1  23)
 
   #define btrfs_clear_opt(o, opt)  ((o) = ~BTRFS_MOUNT_##opt)
   #define btrfs_set_opt(o, opt)((o) |= BTRFS_MOUNT_##opt)
  diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
  index 83d6f9f..00be9e3 100644
  --- a/fs/btrfs/super.c
  +++ b/fs/btrfs/super.c
  @@ -41,6 +41,7 @@
   #include linux/slab.h
   #include linux/cleancache.h
   #include linux/ratelimit.h
  +#include linux/hot_tracking.h
   #include compat.h
   #include delayed-inode.h
   #include ctree.h
  @@ -303,7 +304,7 @@ enum {
Opt_notreelog, Opt_ratio, Opt_flushoncommit, Opt_discard,
Opt_space_cache, Opt_clear_cache, Opt_user_subvol_rm_allowed,
Opt_enospc_debug, Opt_subvolrootid, Opt_defrag, Opt_inode_cache,
  - Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
  + Opt_no_space_cache, Opt_recovery, Opt_skip_balance, Opt_hot_track,
Opt_check_integrity, Opt_check_integrity_including_extent_data,
Opt_check_integrity_print_mask, Opt_fatal_errors,
Opt_err,
  @@ -342,6 +343,7 @@ static match_table_t tokens = {
{Opt_no_space_cache, nospace_cache},
{Opt_recovery, recovery},
{Opt_skip_balance, skip_balance},
  + {Opt_hot_track, hot_track},
{Opt_check_integrity, check_int},
{Opt_check_integrity_including_extent_data, check_int_data},
{Opt_check_integrity_print_mask, check_int_print_mask=%d},
  @@ -553,6 +555,9 @@ int btrfs_parse_options(struct btrfs_root *root, char 
  *options)
case Opt_skip_balance:
btrfs_set_opt(info-mount_opt, SKIP_BALANCE);
break;
  + case Opt_hot_track:
  + btrfs_set_opt(info-mount_opt, HOT_TRACK);
  + break;
   #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
case Opt_check_integrity_including_extent_data:
printk(KERN_INFO btrfs: enabling check integrity
 







-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 05/10] vfs: introduce one hash table

2012-09-27 Thread Zhi Yong Wu
On Thu, Sep 27, 2012 at 11:43 AM, Dave Chinner da...@fromorbit.com wrote:
 On Sun, Sep 23, 2012 at 08:56:30PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   Adds a hash table structure which contains
 a lot of hash list and is used to efficiently
 look up the data temperature of a file or its
 ranges.
   In each hash list of hash table, the hash node
 will keep track of temperature info.

 So, let me see if I've got the relationship straight:

 - sb-s_hot_info.hot_inode_tree indexes hot_inode_items, one per inode

 - hot_inode_item contains access frequency data for that inode

 - hot_inode_item holds a heat hash node to index the access
   frequency data for that inode

 - hot_inode_item.hot_range_tree indexes hot_range_items for that inode

 - hot_range_item contains access frequency data for that range

 - hot_range_item holds a heat hash node to index the access
   frequency data for that range

 - sb-s_hot_info.heat_inode_hl indexes per-inode heat hash nodes

 - sb-s_hot_info.heat_range_hl indexes per-range heat hash nodes
Correct.

 How about some ascii art? :) Just looking at the hot inode item case
 (the range item case is the same pattern, though), we have:


 heat_inode_hl   hot_inode_tree
 | |
 | V
 |   +---hot_inode_item---+
 +---+   |   frequency data   |
 |   V^   V
 | ...--hot_inode_item--... |...--hot_inode_item--
 |   frequency data   |  frequency data
 |   ^|   ^
 |   ||   |
 |   ||   |
 +--hot_hash_node--hot_hash_node--hot_hash_node--
Great, can we put them in hot_tracking.txt in Documentation?


 There's no actual data stored in the hot_hash_node, just pointer
 back to the frequency data, a hlist_node and a pointer to the
 hashlist head. IOWs, I agree with Ram that this does not need to
 exist and just embedding a hlist_node inside the hot_inode_item is
 all that is needed. i.e:

 heat_inode_hl   hot_inode_tree
 | |
 | V
 |   +---hot_inode_item---+
 |   |   frequency data   |
 +---+   |   hlist_node   |
 |   V^ | V
 | ...--hot_inode_item--... | |  ...--hot_inode_item--
 |   frequency data   | |frequency data
 +--hlist_node---+ +---hlist_node---.

 There's no need for separate allocations, initialisations, locks and
 reference counting - all that is already in the hot_inode_item. The
 items have the same lifecycle limitations - a hot_hash_node must be
 torn down before the frequency data it points to is freed. Finally,
 there's no difference in how you move it between lists.
How will you know if one hot_inode_item should be moved between lists
when its freq data is changed?

 Indeed, calling it a hash is wrong - there's not hashing at all
 - it keeping an array of list where each entry corresponds to a
 specific temperature. It is a *heat map*, not a hash list. i.e.
 inode_heat_map, not heat_inode_hl. HEAT_MAP_SIZE, not HASH_SIZE.
OK.

 As it is, there aren't any users of the heat maps that are generated
 in this patch set - it's not even exported to userspace or to
 debugfs, so I'm not sure how it will be used yet. How are these heat
 maps going to be used by filesystems, Zhi?
In hot_hash_calc_temperature(), you can see that one hot_inode or
hot_range's freq data will be distilled into one temperature value,
then it will be inserted to the heat map based on its temperature.
When the file corresponding to the inode or range got hotter or cold,
its location will be changed in the heat map based on its new
temperature in hot_hash_update_hash_table().

And the user will retrieve those freq data and temperature info via
debugfs or ioctl interfaces.

 Cheers,

 Dave.
 --
 Dave Chinner
 da...@fromorbit.com



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 06/10] vfs: enable hot data tracking

2012-09-27 Thread Zhi Yong Wu
On Thu, Sep 27, 2012 at 11:54 AM, Dave Chinner da...@fromorbit.com wrote:
 On Sun, Sep 23, 2012 at 08:56:31PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   Miscellaneous features that implement hot data tracking
 and generally make the hot data functions a bit more friendly.

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  fs/direct-io.c   |   10 ++
  include/linux/hot_tracking.h |   11 +++
  mm/filemap.c |8 
  mm/page-writeback.c  |   21 +
  mm/readahead.c   |9 +
  5 files changed, 59 insertions(+), 0 deletions(-)

 diff --git a/fs/direct-io.c b/fs/direct-io.c
 index f86c720..3773f44 100644
 --- a/fs/direct-io.c
 +++ b/fs/direct-io.c
 @@ -37,6 +37,7 @@
  #include linux/uio.h
  #include linux/atomic.h
  #include linux/prefetch.h
 +#include hot_tracking.h

  /*
   * How many user pages to map in one call to get_user_pages().  This 
 determines
 @@ -1297,6 +1298,15 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, 
 struct inode *inode,
   prefetch(bdev-bd_queue);
   prefetch((char *)bdev-bd_queue + SMP_CACHE_BYTES);

 + /* Hot data tracking */
 + if (TRACK_THIS_INODE(iocb-ki_filp-f_mapping-host)
 +  iov_length(iov, nr_segs)  0) {
 + hot_rb_update_freqs(iocb-ki_filp-f_mapping-host,
 + (u64)offset,
 + (u64)iov_length(iov, nr_segs),
 + rw  WRITE);
 + }

 That's a bit messy. I'd prefer a static inline function that hides
 all this. e.g.
Do you think of moving the condition into hot_inode_udate_freqs(), not
adding another new function?

 track_hot_inode_ranges(inode, offset, length, rw)
 {
 if (inode-i_sb-s_flags  MS_HOT_TRACKING)
 hot_inode_freq_update(inode, offset, length, rw);
 }

 diff --git a/mm/page-writeback.c b/mm/page-writeback.c
 index 5ad5ce2..552c861 100644
 --- a/mm/page-writeback.c
 +++ b/mm/page-writeback.c
 @@ -35,6 +35,7 @@
  #include linux/buffer_head.h /* __set_page_dirty_buffers */
  #include linux/pagevec.h
  #include linux/timer.h
 +#include linux/hot_tracking.h
  #include trace/events/writeback.h

  /*
 @@ -1895,13 +1896,33 @@ EXPORT_SYMBOL(generic_writepages);
  int do_writepages(struct address_space *mapping, struct writeback_control 
 *wbc)
  {
   int ret;
 + pgoff_t start = 0;
 + u64 prev_count = 0, count = 0;

   if (wbc-nr_to_write = 0)
   return 0;
 +
 + /* Hot data tracking */
 + if (TRACK_THIS_INODE(mapping-host)
 +  wbc-range_cyclic) {
 + start = mapping-writeback_index  PAGE_CACHE_SHIFT;
 + prev_count = (u64)wbc-nr_to_write;
 + }

 Why only wbc-range_cyclic? This won't record things like
 synchronous writes or fsync-triggered writes, are are far more
 likely to be to hot ranges in a file...
sorry, i don't undersand what  wbc-range_cyclic means. OK, i will fix
it in next version.


 Cheers,

 Dave.
 --
 Dave Chinner
 da...@fromorbit.com



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 07/10] vfs: fork one kthread to update data temperature

2012-09-27 Thread Zhi Yong Wu
On Thu, Sep 27, 2012 at 12:03 PM, Dave Chinner da...@fromorbit.com wrote:
 On Sun, Sep 23, 2012 at 08:56:32PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   Fork and run one kernel kthread to calculate
 that temperature based on some metrics kept
 in custom frequency data structs, and store
 the info in the hash table.

 No new kthreads, please. Use a per-superblock workqueue and a struct
 delayed_work to run periodic work on each superblock.
If no new kthread is created, which kthread will work on these
delayed_work tasks?


 That will also remove all the nasty, nasty
 !hot_track_temperature_update_kthread checks from the code, too.

 Also, I'd separate the work that the workqueue does from the patch
 that introduces the work queue. That way there is only one new thing
 to comment on in the patch. Further, I'd separate the aging code
 from the code that updates the temperature map into it's own patch
 as well..

 Finally, you're going to need a shrinker to control the amount of
 memory that is used in tracking hot regions - if we are throwing
 inodes out of memory due to memory pressure, we most definitely are
 going to need to reduce the amount of memory the tracking code is
 using, even if it means losing useful information (i.e. the shrinker
 accelerates the aging process).
Great, I agree with you.

 Given the above, and the other comments earlier in the series,
 there's not a lot of point in me spending time commenting on ethe
 code in detail here as it will change significantly as a result of
 all the earlier comments
OK,  i will complete the code change based on all your earlier comments ASAP.

 Cheers,

 Dave.
 --
 Dave Chinner
 da...@fromorbit.com
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 05/10] vfs: introduce one hash table

2012-09-27 Thread Zhi Yong Wu
On Thu, Sep 27, 2012 at 2:57 PM, Dave Chinner da...@fromorbit.com wrote:
 On Thu, Sep 27, 2012 at 02:23:16PM +0800, Zhi Yong Wu wrote:
 On Thu, Sep 27, 2012 at 11:43 AM, Dave Chinner da...@fromorbit.com wrote:
  On Sun, Sep 23, 2012 at 08:56:30PM +0800, zwu.ker...@gmail.com wrote:
  From: Zhi Yong Wu wu...@linux.vnet.ibm.com
 
Adds a hash table structure which contains
  a lot of hash list and is used to efficiently
  look up the data temperature of a file or its
  ranges.
In each hash list of hash table, the hash node
  will keep track of temperature info.
 
  So, let me see if I've got the relationship straight:
 
  - sb-s_hot_info.hot_inode_tree indexes hot_inode_items, one per inode
 
  - hot_inode_item contains access frequency data for that inode
 
  - hot_inode_item holds a heat hash node to index the access
frequency data for that inode
 
  - hot_inode_item.hot_range_tree indexes hot_range_items for that inode
 
  - hot_range_item contains access frequency data for that range
 
  - hot_range_item holds a heat hash node to index the access
frequency data for that range
 
  - sb-s_hot_info.heat_inode_hl indexes per-inode heat hash nodes
 
  - sb-s_hot_info.heat_range_hl indexes per-range heat hash nodes
 Correct.
 
  How about some ascii art? :) Just looking at the hot inode item case
  (the range item case is the same pattern, though), we have:
 
 
  heat_inode_hl   hot_inode_tree
  | |
  | V
  |   +---hot_inode_item---+
  +---+   |   frequency data   |
  |   V^   V
  | ...--hot_inode_item--... |...--hot_inode_item--
  |   frequency data   |  frequency data
  |   ^|   ^
  |   ||   |
  |   ||   |
  +--hot_hash_node--hot_hash_node--hot_hash_node--
 Great, can we put them in hot_tracking.txt in Documentation?
 
 
  There's no actual data stored in the hot_hash_node, just pointer
  back to the frequency data, a hlist_node and a pointer to the
  hashlist head. IOWs, I agree with Ram that this does not need to
  exist and just embedding a hlist_node inside the hot_inode_item is
  all that is needed. i.e:
 
  heat_inode_hl   hot_inode_tree
  | |
  | V
  |   +---hot_inode_item---+
  |   |   frequency data   |
  +---+   |   hlist_node   |
  |   V^ | V
  | ...--hot_inode_item--... | |  ...--hot_inode_item--
  |   frequency data   | |frequency data
  +--hlist_node---+ +---hlist_node---.
 
  There's no need for separate allocations, initialisations, locks and
  reference counting - all that is already in the hot_inode_item. The
  items have the same lifecycle limitations - a hot_hash_node must be
  torn down before the frequency data it points to is freed. Finally,
  there's no difference in how you move it between lists.
 How will you know if one hot_inode_item should be moved between lists
 when its freq data is changed?

 Record the current temperature in the frequency data, and if it
I know how to do it, thanks.
 changes, change the list it is on.

  Indeed, calling it a hash is wrong - there's not hashing at all
  - it keeping an array of list where each entry corresponds to a
  specific temperature. It is a *heat map*, not a hash list. i.e.
  inode_heat_map, not heat_inode_hl. HEAT_MAP_SIZE, not HASH_SIZE.
 OK.
 
  As it is, there aren't any users of the heat maps that are generated
  in this patch set - it's not even exported to userspace or to
  debugfs, so I'm not sure how it will be used yet. How are these heat
  maps going to be used by filesystems, Zhi?
 In hot_hash_calc_temperature(), you can see that one hot_inode or
 hot_range's freq data will be distilled into one temperature value,
 then it will be inserted to the heat map based on its temperature.
 When the file corresponding to the inode or range got hotter or cold,
 its location will be changed in the heat map based on its new
 temperature in hot_hash_update_hash_table().

 Yes, but a hot_inode_item or hot_range_item can only have one
 location in the heat map, right? So it doesn't need external
Yes.
 structure to point to the frequency data to track this
OK.

 And the user will retrieve those freq data and temperature info via
 debugfs or ioctl interfaces.

 Right - but that data is only extracted after an initial
 hot_inode_tree lookup - The heat map itself is never directly used
 for lookups. If it's not used for lookups based on temperature, why
 is it needed?
You mean we don't need hot_inode_tree? You know, after those hook
functions collect the freq data for inode, they will store those raw
info in hot_inode_tree. One private kthread will iterate

Re: [RFC v2 06/10] vfs: enable hot data tracking

2012-09-27 Thread Zhi Yong Wu
On Thu, Sep 27, 2012 at 2:59 PM, Dave Chinner da...@fromorbit.com wrote:
 On Thu, Sep 27, 2012 at 02:28:12PM +0800, Zhi Yong Wu wrote:
 On Thu, Sep 27, 2012 at 11:54 AM, Dave Chinner da...@fromorbit.com wrote:
  On Sun, Sep 23, 2012 at 08:56:31PM +0800, zwu.ker...@gmail.com wrote:
  From: Zhi Yong Wu wu...@linux.vnet.ibm.com
 
Miscellaneous features that implement hot data tracking
  and generally make the hot data functions a bit more friendly.
 
  Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
  ---
   fs/direct-io.c   |   10 ++
   include/linux/hot_tracking.h |   11 +++
   mm/filemap.c |8 
   mm/page-writeback.c  |   21 +
   mm/readahead.c   |9 +
   5 files changed, 59 insertions(+), 0 deletions(-)
 
  diff --git a/fs/direct-io.c b/fs/direct-io.c
  index f86c720..3773f44 100644
  --- a/fs/direct-io.c
  +++ b/fs/direct-io.c
  @@ -37,6 +37,7 @@
   #include linux/uio.h
   #include linux/atomic.h
   #include linux/prefetch.h
  +#include hot_tracking.h
 
   /*
* How many user pages to map in one call to get_user_pages().  This 
  determines
  @@ -1297,6 +1298,15 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, 
  struct inode *inode,
prefetch(bdev-bd_queue);
prefetch((char *)bdev-bd_queue + SMP_CACHE_BYTES);
 
  + /* Hot data tracking */
  + if (TRACK_THIS_INODE(iocb-ki_filp-f_mapping-host)
  +  iov_length(iov, nr_segs)  0) {
  + hot_rb_update_freqs(iocb-ki_filp-f_mapping-host,
  + (u64)offset,
  + (u64)iov_length(iov, nr_segs),
  + rw  WRITE);
  + }
 
  That's a bit messy. I'd prefer a static inline function that hides
  all this. e.g.
 Do you think of moving the condition into hot_inode_udate_freqs(), not
 adding another new function?

 Moving it into hot_inode_udate_freqs() will add a function call
 overhead even when tracking is not enabled. a static inline function
Can we not directly define hot_inode_udate_freqs to be a static inline?:)

 will just result in no extra overhead other than the if
 statement

 Cheers,

 Dave.
 --
 Dave Chinner
 da...@fromorbit.com



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 07/10] vfs: fork one kthread to update data temperature

2012-09-27 Thread Zhi Yong Wu
On Thu, Sep 27, 2012 at 3:01 PM, Dave Chinner da...@fromorbit.com wrote:
 On Thu, Sep 27, 2012 at 02:54:22PM +0800, Zhi Yong Wu wrote:
 On Thu, Sep 27, 2012 at 12:03 PM, Dave Chinner da...@fromorbit.com wrote:
  On Sun, Sep 23, 2012 at 08:56:32PM +0800, zwu.ker...@gmail.com wrote:
  From: Zhi Yong Wu wu...@linux.vnet.ibm.com
 
Fork and run one kernel kthread to calculate
  that temperature based on some metrics kept
  in custom frequency data structs, and store
  the info in the hash table.
 
  No new kthreads, please. Use a per-superblock workqueue and a struct
  delayed_work to run periodic work on each superblock.
 If no new kthread is created, which kthread will work on these
 delayed_work tasks?

 One of the kworker threads that service the workqueue
 infrastructure.
Got it, thanks

 Cheers,

 Dave.
 --
 Dave Chinner
 da...@fromorbit.com



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 03/10] vfs: add one new mount option '-o hottrack'

2012-09-27 Thread Zhi Yong Wu
On Thu, Sep 27, 2012 at 3:05 PM, Dave Chinner da...@fromorbit.com wrote:
 On Thu, Sep 27, 2012 at 01:25:34PM +0800, Zhi Yong Wu wrote:
 On Tue, Sep 25, 2012 at 5:28 PM, Dave Chinner da...@fromorbit.com wrote:
  On Sun, Sep 23, 2012 at 08:56:28PM +0800, zwu.ker...@gmail.com wrote:
  From: Zhi Yong Wu wu...@linux.vnet.ibm.com
 
Introduce one new mount option '-o hottrack',
  and add its parsing support.
Its usage looks like:
 mount -o hottrack
 mount -o nouser,hottrack
 mount -o nouser,hottrack,loop
 mount -o hottrack,nouser
 
  I think that this option parsing should be done by the filesystem,
  even though the tracking functionality is in the VFS. That way ony
  the filesystems that can use the tracking information will turn it
  on, rather than being able to turn it on for everything regardless
  of whether it is useful or not.
 
  Along those lines, just using a normal superblock flag to indicate
  it is active (e.g. MS_HOT_INODE_TRACKING in sb-s_flags) means you
  don't need to allocate the sb-s_hot_info structure just to be able
 If we don't allocate one sb-s_hot_info, where will those hash list
 head and btree roots locate?

 I wrote that thinking (mistakenly) that s-hot)info was dynamically
 allocated rather than being embedded in the struct super_block.

 Indeed, if the mount option is held in s_flags, then it could be
 dynamically allocated, but I don't think that's really necessary...
ah, you prefer allocating it, OK, let me try. thanks for your explaination.


 Cheers,

 Dave.
 --
 Dave Chinner
 da...@fromorbit.com



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 02/10] vfs: add support for updating access frequency

2012-09-26 Thread Zhi Yong Wu
On Thu, Sep 27, 2012 at 10:19 AM, Dave Chinner da...@fromorbit.com wrote:
 On Wed, Sep 26, 2012 at 10:53:07AM +0800, Zhi Yong Wu wrote:
 On Tue, Sep 25, 2012 at 5:17 PM, Dave Chinner da...@fromorbit.com wrote:
  On Sun, Sep 23, 2012 at 08:56:27PM +0800, zwu.ker...@gmail.com wrote:
  I note that the code will always insert range items of a length
  RANGE_SIZE. This means you have a fixed object granularity and hence
  you have no need for a range based search. That is, you could use a
  radix tree where each entry in the radix tree points directly to the
  range object similar to how the page cache uses a radix tree for
  indexing pages. That brings the possibility of lockless range item
  lookups
 Great suggestion, but can we temporarily put it in TODO list? because
 it will bring one big code change.

 Sure. I just wanted to point out that there are better choices for
 indexing fixed size elements than rb-trees and why it might make
 sense to use a different type of tree.
Got it, thanks. Moreover, it should also be better to use radix tree
to hold hot_inode, not only hot_range.


 Cheers,

 Dave.
 --
 Dave Chinner
 da...@fromorbit.com



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 03/10] vfs: add one new mount option '-o hottrack'

2012-09-26 Thread Zhi Yong Wu
On Thu, Sep 27, 2012 at 10:20 AM, Dave Chinner da...@fromorbit.com wrote:
 On Wed, Sep 26, 2012 at 10:56:08AM +0800, Zhi Yong Wu wrote:
 On Tue, Sep 25, 2012 at 5:28 PM, Dave Chinner da...@fromorbit.com wrote:
  On Sun, Sep 23, 2012 at 08:56:28PM +0800, zwu.ker...@gmail.com wrote:
  From: Zhi Yong Wu wu...@linux.vnet.ibm.com
 
Introduce one new mount option '-o hottrack',
  and add its parsing support.
Its usage looks like:
 mount -o hottrack
 mount -o nouser,hottrack
 mount -o nouser,hottrack,loop
 mount -o hottrack,nouser
 
  I think that this option parsing should be done by the filesystem,
  even though the tracking functionality is in the VFS. That way ony
  the filesystems that can use the tracking information will turn it
  on, rather than being able to turn it on for everything regardless
  of whether it is useful or not.
 
  Along those lines, just using a normal superblock flag to indicate
  it is active (e.g. MS_HOT_INODE_TRACKING in sb-s_flags) means you
  don't need to allocate the sb-s_hot_info structure just to be able
  to check whether we are tracking hot inodes or not.
 
  This then means the hot inode tracking for the superblock can be
  initialised by the filesystem as part of it's fill_super method,
  along with the filesystem specific code that will use the hot
  tracking information the VFS gathers
 I can see what you mean, but don't know if other guys also agree with this.
 If so, all FS specific code which use hot tracking feature wll have to add
 the same chunk of code in it fill_super method. Is it good?

 Most filesystems will only need to add 3-4 lines of code to their
 existing parser, so it's not a big deal I think...
OK, i try to do it. thanks a lot.

 Cheers,

 Dave.
 --
 Dave Chinner
 da...@fromorbit.com



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 03/10] vfs: add one new mount option '-o hottrack'

2012-09-26 Thread Zhi Yong Wu
On Tue, Sep 25, 2012 at 5:28 PM, Dave Chinner da...@fromorbit.com wrote:
 On Sun, Sep 23, 2012 at 08:56:28PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   Introduce one new mount option '-o hottrack',
 and add its parsing support.
   Its usage looks like:
mount -o hottrack
mount -o nouser,hottrack
mount -o nouser,hottrack,loop
mount -o hottrack,nouser

 I think that this option parsing should be done by the filesystem,
 even though the tracking functionality is in the VFS. That way ony
 the filesystems that can use the tracking information will turn it
 on, rather than being able to turn it on for everything regardless
 of whether it is useful or not.

 Along those lines, just using a normal superblock flag to indicate
 it is active (e.g. MS_HOT_INODE_TRACKING in sb-s_flags) means you
 don't need to allocate the sb-s_hot_info structure just to be able
If we don't allocate one sb-s_hot_info, where will those hash list
head and btree roots locate?

 to check whether we are tracking hot inodes or not.

 This then means the hot inode tracking for the superblock can be
 initialised by the filesystem as part of it's fill_super method,
 along with the filesystem specific code that will use the hot
 tracking information the VFS gathers

 Cheers,

 Dave.
 --
 Dave Chinner
 da...@fromorbit.com



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 01/10] vfs: introduce private rb structures

2012-09-25 Thread Zhi Yong Wu
On Tue, Sep 25, 2012 at 3:37 PM, Dave Chinner da...@fromorbit.com wrote:
 On Sun, Sep 23, 2012 at 08:56:26PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   One root structure hot_info is defined, is hooked
 up in super_block, and will be used to hold rb trees
 root, hash list root and some other information, etc.
   Adds hot_inode_tree struct to keep track of
 frequently accessed files, and be keyed by {inode, offset}.
 Trees contain hot_inode_items representing those files
 and ranges.
   Having these trees means that vfs can quickly determine the
 temperature of some data by doing some calculations on the
 hot_freq_data struct that hangs off of the tree item.
   Define two items hot_inode_item and hot_range_item,
 one of them represents one tracked file
 to keep track of its access frequency and the tree of
 ranges in this file, while the latter represents
 a file range of one inode.
   Each of the two structures contains a hot_freq_data
 struct with its frequency of access metrics (number of
 {reads, writes}, last {read,write} time, frequency of
 {reads,writes}).
   Also, each hot_inode_item contains one hot_range_tree
 struct which is keyed by {inode, offset, length}
 and used to keep track of all the ranges in this file.

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com

 Just a coupl eof minor formatting things first up - I'll have more
 comments as I get deeper into the series.
All comments are very reasonable, and will be applied.
thanks for your review.


 
 +/*
 + * Initialize the inode tree. Should be called for each new inode
 + * access or other user of the hot_inode interface.
 + */
 +static void hot_rb_inode_tree_init(struct hot_inode_tree *tree)

 The names of these are a bit clunky. You probably don't need the
 _rb_ in the function name. i.e. hot_inode_tree_init() is
 sufficient, and if we every want to change in the tree type we don't
 have to rename every single function...

 .
 +/*
 + * Initialize a new hot_inode_item structure. The new structure is
 + * returned with a reference count of one and needs to be
 + * freed using free_inode_item()
 + */
 +void hot_rb_inode_item_init(void *_item)
 +{

 The usual naming convention for slab initialiser functions is to use
 a suffix of _once to indicate it is only ever called once per
 slab object instantiation, not every time the object is allocated
 fom the slab. See, for example, inode_init_once() and
 inode_init_always().

 so, that would make this function hot_inode_item_init_once().

 
 +/* init hot_inode_item and hot_range_item kmem cache */
 +static int __init hot_rb_item_cache_init(void)
 +{
 + hot_inode_item_cache = kmem_cache_create(hot_inode_item,
 + sizeof(struct hot_inode_item), 0,
 + SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD,
 + hot_rb_inode_item_init);
 + if (!hot_inode_item_cache)
 + goto inode_err;
 +
 + hot_range_item_cache = kmem_cache_create(hot_range_item,
 + sizeof(struct hot_range_item), 0,
 + SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD,
 + hot_rb_range_item_init);
 + if (!hot_range_item_cache)
 + goto range_err;
 +
 + return 0;
 +
 +range_err:
 + kmem_cache_destroy(hot_inode_item_cache);
 +inode_err:
 + return -ENOMEM;
 +}
 +
 +/*
 + * Initialize kmem cache for hot_inode_item
 + * and hot_range_item
 + */
 +void __init hot_track_cache_init(void)
 +{
 + if (hot_rb_item_cache_init())
 + return;

 No real need to have a hot_rb_item_cache_init() function here - just
 open code it all in the hot_track_cache_init() function.

 +}
 diff --git a/fs/hot_tracking.h b/fs/hot_tracking.h
 new file mode 100644
 index 000..269b67a
 --- /dev/null
 +++ b/fs/hot_tracking.h
 @@ -0,0 +1,27 @@
 +/*
 + * fs/hot_tracking.h
 + *
 + * Copyright (C) 2012 IBM Corp. All rights reserved.
 + * Written by Zhi Yong Wu wu...@linux.vnet.ibm.com
 + *Ben Chociej bchoc...@gmail.com
 + *
 + * This program is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU General Public
 + * License v2 as published by the Free Software Foundation.
 + */
 +
 +#ifndef __HOT_TRACKING__
 +#define __HOT_TRACKING__
 +
 +#include linux/rbtree.h
 +#include linux/hot_tracking.h
 +
 +/* values for hot_freq_data flags */
 +/* freq data struct is for an inode */
 +#define FREQ_DATA_TYPE_INODE (1  0)
 +/* freq data struct is for a range */
 +#define FREQ_DATA_TYPE_RANGE (1  1)

 The comments are redundant - the name of the object documents it's
 use sufficiently.  ie.

 /* values for hot_freq_data flags */
 #define FREQ_DATA_TYPE_INODE (1  0)
 #define FREQ_DATA_TYPE_RANGE (1  1)

 is just fine by itself.

 
 +/* A frequency data struct holds values that are used to
 + * determine temperature of files and file ranges. These structs
 + * are members

Re: [RFC v2 01/10] vfs: introduce private rb structures

2012-09-25 Thread Zhi Yong Wu
On Tue, Sep 25, 2012 at 3:37 PM, Dave Chinner da...@fromorbit.com wrote:
 On Sun, Sep 23, 2012 at 08:56:26PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   One root structure hot_info is defined, is hooked
 up in super_block, and will be used to hold rb trees
 root, hash list root and some other information, etc.
   Adds hot_inode_tree struct to keep track of
 frequently accessed files, and be keyed by {inode, offset}.
 Trees contain hot_inode_items representing those files
 and ranges.
   Having these trees means that vfs can quickly determine the
 temperature of some data by doing some calculations on the
 hot_freq_data struct that hangs off of the tree item.
   Define two items hot_inode_item and hot_range_item,
 one of them represents one tracked file
 to keep track of its access frequency and the tree of
 ranges in this file, while the latter represents
 a file range of one inode.
   Each of the two structures contains a hot_freq_data
 struct with its frequency of access metrics (number of
 {reads, writes}, last {read,write} time, frequency of
 {reads,writes}).
   Also, each hot_inode_item contains one hot_range_tree
 struct which is keyed by {inode, offset, length}
 and used to keep track of all the ranges in this file.

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com

 Just a coupl eof minor formatting things first up - I'll have more
 comments as I get deeper into the series.
OK, very look forward to seeing more on other patches, indeed thanks again.


 
 +/*
 + * Initialize the inode tree. Should be called for each new inode
 + * access or other user of the hot_inode interface.
 + */
 +static void hot_rb_inode_tree_init(struct hot_inode_tree *tree)

 The names of these are a bit clunky. You probably don't need the
 _rb_ in the function name. i.e. hot_inode_tree_init() is
 sufficient, and if we every want to change in the tree type we don't
 have to rename every single function...

 .
 +/*
 + * Initialize a new hot_inode_item structure. The new structure is
 + * returned with a reference count of one and needs to be
 + * freed using free_inode_item()
 + */
 +void hot_rb_inode_item_init(void *_item)
 +{

 The usual naming convention for slab initialiser functions is to use
 a suffix of _once to indicate it is only ever called once per
 slab object instantiation, not every time the object is allocated
 fom the slab. See, for example, inode_init_once() and
 inode_init_always().

 so, that would make this function hot_inode_item_init_once().

 
 +/* init hot_inode_item and hot_range_item kmem cache */
 +static int __init hot_rb_item_cache_init(void)
 +{
 + hot_inode_item_cache = kmem_cache_create(hot_inode_item,
 + sizeof(struct hot_inode_item), 0,
 + SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD,
 + hot_rb_inode_item_init);
 + if (!hot_inode_item_cache)
 + goto inode_err;
 +
 + hot_range_item_cache = kmem_cache_create(hot_range_item,
 + sizeof(struct hot_range_item), 0,
 + SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD,
 + hot_rb_range_item_init);
 + if (!hot_range_item_cache)
 + goto range_err;
 +
 + return 0;
 +
 +range_err:
 + kmem_cache_destroy(hot_inode_item_cache);
 +inode_err:
 + return -ENOMEM;
 +}
 +
 +/*
 + * Initialize kmem cache for hot_inode_item
 + * and hot_range_item
 + */
 +void __init hot_track_cache_init(void)
 +{
 + if (hot_rb_item_cache_init())
 + return;

 No real need to have a hot_rb_item_cache_init() function here - just
 open code it all in the hot_track_cache_init() function.

 +}
 diff --git a/fs/hot_tracking.h b/fs/hot_tracking.h
 new file mode 100644
 index 000..269b67a
 --- /dev/null
 +++ b/fs/hot_tracking.h
 @@ -0,0 +1,27 @@
 +/*
 + * fs/hot_tracking.h
 + *
 + * Copyright (C) 2012 IBM Corp. All rights reserved.
 + * Written by Zhi Yong Wu wu...@linux.vnet.ibm.com
 + *Ben Chociej bchoc...@gmail.com
 + *
 + * This program is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU General Public
 + * License v2 as published by the Free Software Foundation.
 + */
 +
 +#ifndef __HOT_TRACKING__
 +#define __HOT_TRACKING__
 +
 +#include linux/rbtree.h
 +#include linux/hot_tracking.h
 +
 +/* values for hot_freq_data flags */
 +/* freq data struct is for an inode */
 +#define FREQ_DATA_TYPE_INODE (1  0)
 +/* freq data struct is for a range */
 +#define FREQ_DATA_TYPE_RANGE (1  1)

 The comments are redundant - the name of the object documents it's
 use sufficiently.  ie.

 /* values for hot_freq_data flags */
 #define FREQ_DATA_TYPE_INODE (1  0)
 #define FREQ_DATA_TYPE_RANGE (1  1)

 is just fine by itself.

 
 +/* A frequency data struct holds values that are used to
 + * determine temperature of files and file ranges. These structs
 + * are members

Re: [PATCH v2 2/2] btrfs-progs: Fix up memory leakage

2012-09-25 Thread Zhi Yong Wu
On Tue, Sep 25, 2012 at 6:14 PM, David Sterba d...@jikos.cz wrote:
 On Tue, Sep 25, 2012 at 10:02:16AM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   Some code pathes forget to free memory on exit.

 Same as with the fd's, kernel will free all memory for us at exit().
hi, can you let me know the pointer to exit() said by you?


 If there's lots of memory allocated, it may be even faster to leave the
 unallocation process to kernel as it will do it in one go, while the
 application would unnecessarily free it chunk by chunk.
got it, thanks.

 david



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 02/10] vfs: add support for updating access frequency

2012-09-25 Thread Zhi Yong Wu
thanks a lot for your review in my heart, Dave. It is very helpful to me.

On Tue, Sep 25, 2012 at 5:17 PM, Dave Chinner da...@fromorbit.com wrote:
 On Sun, Sep 23, 2012 at 08:56:27PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   Add some utils helpers to update access frequencies
 for one file or its range.

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  fs/hot_tracking.c |  359 
 +
  fs/hot_tracking.h |   15 +++
  2 files changed, 374 insertions(+), 0 deletions(-)

 diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
 index 173054b..52ed926 100644
 --- a/fs/hot_tracking.c
 +++ b/fs/hot_tracking.c
 @@ -106,6 +106,365 @@ inode_err:
  }

  /*
 + * Drops the reference out on hot_inode_item by one and free the structure
 + * if the reference count hits zero
 + */
 +void hot_rb_free_hot_inode_item(struct hot_inode_item *he)

 hot_inode_item_put()

 +{
 + if (!he)
 + return;

 It's a bug to call a put function on a kref counted item with a null
 pointer. Let the kernel crash so it is noticed and fixed.
OK, will remove it.

 +
 + if (atomic_dec_and_test(he-refs.refcount)) {
 + WARN_ON(he-in_tree);
 + kmem_cache_free(hot_inode_item_cache, he);
 + }

 Isn't this abusing krefs? i.e. this should be:
Sorry, thanks for your explaination as below:

 hot_inode_item_free()
 {
 WARN_ON(he-in_tree);
 kmem_cache_free(hot_inode_item_cache, he);
 }

 hot_inode_item_put()
 {
 kref_put(he-refs, hot_inode_item_free)
 }

 +/*
 + * Drops the reference out on hot_range_item by one and free the structure
 + * if the reference count hits zero
 + */
 +static void hot_rb_free_hot_range_item(struct hot_range_item *hr)

 same comments as above.
OK, thanks.
 
 +static struct rb_node *hot_rb_insert_hot_inode_item(struct rb_root *root,
 + unsigned long inode_num,
 + struct rb_node *node)

 static struct rb_node *
 hot_inode_item_find(. )
OK, thanks.

 +{
 + struct rb_node **p = root-rb_node;
 + struct rb_node *parent = NULL;
 + struct hot_inode_item *entry;
 +
 + /* walk tree to find insertion point */
 + while (*p) {
 + parent = *p;
 + entry = rb_entry(parent, struct hot_inode_item, rb_node);
 +
 + if (inode_num  entry-i_ino)
 + p = (*p)-rb_left;
 + else if (inode_num  entry-i_ino)
 + p = (*p)-rb_right;
 + else
 + return parent;
 + }
 +
 + entry = rb_entry(node, struct hot_inode_item, rb_node);
 + entry-in_tree = 1;
 + rb_link_node(node, parent, p);
 + rb_insert_color(node, root);

 So the hot inode item search key is the inode number? Why use an
Yes
 rb-tree then? Wouldn't a btree be a more memory efficient way to
 hold a sparse index that has fixed key and pointer sizes?
Yes, i know, but if we don't use btree, what will be better? Radix tree?


 Also, the API seems quite strange. you pass in the the rb_node and
 the inode number which instead of passing in the hot inode item that
 already holds this information. You then convert the rb_node back to
 a hot inode item to set the in_tree variable. So why not just pass
 in the struct hot_inode_item in the first place?
Good catch, thanks for your remider.

 +static u64 hot_rb_range_end(struct hot_range_item *hr)

 hot_range_item_end()
OK

 +{
 + if (hr-start + hr-len  hr-start)
 + return (u64)-1;
 +
 + return hr-start + hr-len - 1;
 +}
 +
 +static struct rb_node *hot_rb_insert_hot_range_item(struct rb_root *root,

 hot_range_item_find()
OK

 + u64 start,
 + struct rb_node *node)
 +{
 + struct rb_node **p = root-rb_node;
 + struct rb_node *parent = NULL;
 + struct hot_range_item *entry;
 +
 + /* ensure start is on a range boundary */
 + start = start  RANGE_SIZE_MASK;
 + /* walk tree to find insertion point */
 + while (*p) {
 + parent = *p;
 + entry = rb_entry(parent, struct hot_range_item, rb_node);
 +
 + if (start  entry-start)
 + p = (*p)-rb_left;
 + else if (start = hot_rb_range_end(entry))

 Shouldn't an aligned end always be one byte short of the start
 offset of the next aligned region? i.e. start ==
 hot_rb_range_end(entry) is an indication of an off-by one bug
 somewhere?
This is really one good catch, thanks.

 + p = (*p)-rb_right;
 + else
 + return parent;
 + }
 +
 + entry = rb_entry(node, struct hot_range_item, rb_node);
 + entry-in_tree = 1;
 + rb_link_node(node, parent, p);
 + rb_insert_color(node, root);

 Same comments as the hot_inode_range.
OK.

 Also, the start offset

Re: [RFC v2 03/10] vfs: add one new mount option '-o hottrack'

2012-09-25 Thread Zhi Yong Wu
On Tue, Sep 25, 2012 at 5:28 PM, Dave Chinner da...@fromorbit.com wrote:
 On Sun, Sep 23, 2012 at 08:56:28PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   Introduce one new mount option '-o hottrack',
 and add its parsing support.
   Its usage looks like:
mount -o hottrack
mount -o nouser,hottrack
mount -o nouser,hottrack,loop
mount -o hottrack,nouser

 I think that this option parsing should be done by the filesystem,
 even though the tracking functionality is in the VFS. That way ony
 the filesystems that can use the tracking information will turn it
 on, rather than being able to turn it on for everything regardless
 of whether it is useful or not.

 Along those lines, just using a normal superblock flag to indicate
 it is active (e.g. MS_HOT_INODE_TRACKING in sb-s_flags) means you
 don't need to allocate the sb-s_hot_info structure just to be able
 to check whether we are tracking hot inodes or not.

 This then means the hot inode tracking for the superblock can be
 initialised by the filesystem as part of it's fill_super method,
 along with the filesystem specific code that will use the hot
 tracking information the VFS gathers
I can see what you mean, but don't know if other guys also agree with this.
If so, all FS specific code which use hot tracking feature wll have to add
the same chunk of code in it fill_super method. Is it good?


 Cheers,

 Dave.
 --
 Dave Chinner
 da...@fromorbit.com



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] btrfs-progs: Fix up memory leakage

2012-09-25 Thread Zhi Yong Wu
On Wed, Sep 26, 2012 at 1:14 AM, Goffredo Baroncelli kreij...@libero.it wrote:
 On 09/25/2012 12:14 PM, David Sterba wrote:

 On Tue, Sep 25, 2012 at 10:02:16AM +0800, zwu.ker...@gmail.com wrote:

 From: Zhi Yong Wuwu...@linux.vnet.ibm.com

Some code pathes forget to free memory on exit.


 Same as with the fd's, kernel will free all memory for us at exit().


 I strongly disagree with this approach. The callee often don't know what
 happen after and before the call. The same is true for the programmer,
 because the code is quite often updated by several people. A clean exit() is
 the right thing to do as general rule. I don't see any valid reason (in the
 btrfs context) to do otherwise.

 Relying on the exit() for a proper clean-up increase the likelihood of bug
 when the code evolves (see my patch   [RESPOST][BTRFS-PROGS][PATCH]
 btrfs_read_dev_super(): uninitialized variable for an example of what means
 an incorrect deallocation of resource).


 If there's lots of memory allocated, it may be even faster to leave the
 unallocation process to kernel as it will do it in one go, while the
 application would unnecessarily free it chunk by chunk.


 May be I am wrong, but I don't think that the increase of speed of the btrfs
 command is even measurable relying on exit instead of free()-ing each
 chunk of memory one at time The same should be true for the
 open()/close()

I fully agree with you. In one same function, i find that some code
path free system sources,
while other code path doesn't. This is one nice way.


 My 2¢

 BR
 G.Baroncelli


 david
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in

 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 .





-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 01/10] vfs: introduce private rb structures

2012-09-25 Thread Zhi Yong Wu
On Tue, Sep 25, 2012 at 6:20 PM, Ram Pai linux...@us.ibm.com wrote:
 On Sun, Sep 23, 2012 at 08:56:26PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   One root structure hot_info is defined, is hooked
 up in super_block, and will be used to hold rb trees
 root, hash list root and some other information, etc.
   Adds hot_inode_tree struct to keep track of
 frequently accessed files, and be keyed by {inode, offset}.
 Trees contain hot_inode_items representing those files
 and ranges.
   Having these trees means that vfs can quickly determine the
 temperature of some data by doing some calculations on the
 hot_freq_data struct that hangs off of the tree item.
   Define two items hot_inode_item and hot_range_item,
 one of them represents one tracked file
 to keep track of its access frequency and the tree of
 ranges in this file, while the latter represents
 a file range of one inode.
   Each of the two structures contains a hot_freq_data
 struct with its frequency of access metrics (number of
 {reads, writes}, last {read,write} time, frequency of
 {reads,writes}).
   Also, each hot_inode_item contains one hot_range_tree
 struct which is keyed by {inode, offset, length}
 and used to keep track of all the ranges in this file.

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
 +
 ..snip..

 +/* A tree that sits on the hot_info */
 +struct hot_inode_tree {
 + struct rb_root map;
 + rwlock_t lock;
 +};
 +
 +/* A tree of ranges for each inode in the hot_inode_tree */
 +struct hot_range_tree {
 + struct rb_root map;
 + rwlock_t lock;
 +};

 Can as well have a generic datastructure called hot_tree instead
 of having two different datastructure which basically are the same.
OK.

 +
 +/* A frequency data struct holds values that are used to
 + * determine temperature of files and file ranges. These structs
 + * are members of hot_inode_item and hot_range_item
 + */
 +struct hot_freq_data {
 + struct timespec last_read_time;
 + struct timespec last_write_time;
 + u32 nr_reads;
 + u32 nr_writes;
 + u64 avg_delta_reads;
 + u64 avg_delta_writes;
 + u8 flags;
 + u32 last_temperature;
 +};
 +
 +/* An item representing an inode and its access frequency */
 +struct hot_inode_item {
 + /* node for hot_inode_tree rb_tree */
 + struct rb_node rb_node;
 + /* tree of ranges in this inode */
 + struct hot_range_tree hot_range_tree;
 + /* frequency data for this inode */
 + struct hot_freq_data hot_freq_data;
 + /* inode number, copied from inode */
 + unsigned long i_ino;
 + /* used to check for errors in ref counting */
 + u8 in_tree;
 + /* protects hot_freq_data, i_no, in_tree */
 + spinlock_t lock;
 + /* prevents kfree */
 + struct kref refs;
 +};
 +
 +/*
 + * An item representing a range inside of an inode whose frequency
 + * is being tracked
 + */
 +struct hot_range_item {
 + /* node for hot_range_tree rb_tree */
 + struct rb_node rb_node;
 + /* frequency data for this range */
 + struct hot_freq_data hot_freq_data;
 + /* the hot_inode_item associated with this hot_range_item */
 + struct hot_inode_item *hot_inode;
 + /* starting offset of this range */
 + u64 start;
 + /* length of this range */
 + u64 len;
 + /* used to check for errors in ref counting */
 + u8 in_tree;
 + /* protects hot_freq_data, start, len, and in_tree */
 + spinlock_t lock;
 + /* prevents kfree */
 + struct kref refs;
 +};

 might as well have just one generic datastructure called hot_item with
 all the common fields and then have

 struct hot_inode_item  {
 struct hot_item hot_inode;
 struct hot_tree hot_range_tree;
 unsigned long i_ino;
 }

 and

 struct hot_range_item {
 struct hot_item hot_range;
 u64 start;
 u64 len;/* length of this range */
 }

 This should help you eliminate some duplicate code as well.
OK, i will try to apply them. thanks.


 RP




-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 05/10] vfs: introduce one hash table

2012-09-25 Thread Zhi Yong Wu
On Tue, Sep 25, 2012 at 5:54 PM, Ram Pai linux...@us.ibm.com wrote:
 On Sun, Sep 23, 2012 at 08:56:30PM +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   Adds a hash table structure which contains
 a lot of hash list and is used to efficiently
 look up the data temperature of a file or its
 ranges.
   In each hash list of hash table, the hash node
 will keep track of temperature info.

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  fs/hot_tracking.c|   77 
 -
  include/linux/hot_tracking.h |   35 +++
  2 files changed, 110 insertions(+), 2 deletions(-)

 diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
 index fa89f70..5f96442 100644
 --- a/fs/hot_tracking.c
 +++ b/fs/hot_tracking.c
 @@ -16,6 +16,7 @@
  #include linux/module.h
  #include linux/spinlock.h
  #include linux/hardirq.h
 +#include linux/hash.h
  #include linux/fs.h
  #include linux/blkdev.h
  #include linux/types.h
 @@ -24,6 +25,9 @@

 ...snip...

 +/* Hash list heads for hot hash table */
 +struct hot_hash_head {
 + struct hlist_head hashhead;
 + rwlock_t rwlock;
 + u32 temperature;
 +};
 +
 +/* Nodes stored in each hash list of hash table */
 +struct hot_hash_node {
 + struct hlist_node hashnode;
 + struct list_head node;
 + struct hot_freq_data *hot_freq_data;
 + struct hot_hash_head *hlist;
 + spinlock_t lock; /* protects hlist */
 +
 + /*
 +  * number of references to this node
 +  * equals 1 (hashlist entry)
 +  */
 + struct kref refs;
 +};

 Dont see why you need yet another datastructure to hash the inode_item
 and the range_item into a hash list.  You can just add another
If there's such one structure, we can rapidly know which hash bucket
one inode_item is currently linking to via its hlist field.
This is useful if one inode_item corresponding to one file need to
move from one bucket to another bucket when this file get cold or
hotter.
Does it make sense to you?

 hlist_node in the inode_item and range_item. This field can be then used
 to link into the corresponding hash list.

 You can use the container_of() get to the inode_item or the range_item
 using the hlist_node field.

 You can thus eliminate a lot of code.
As what i said above, i don't think it can eliminate a lo of code.

 +
  /* An item representing an inode and its access frequency */
  struct hot_inode_item {
   /* node for hot_inode_tree rb_tree */
 @@ -68,6 +93,8 @@ struct hot_inode_item {
   spinlock_t lock;
   /* prevents kfree */
   struct kref refs;
 + /* hashlist node for this inode */
 + struct hot_hash_node *heat_node;

 this can be just
 struct hlist_node head_node; /* lookup hot_inode hash list */

 Use this field to link it into the corresponding hashlist.

  };

 this can be just
  /*
 @@ -91,6 +118,8 @@ struct hot_range_item {
   spinlock_t lock;
   /* prevents kfree */
   struct kref refs;
 + /* hashlist node for this range */
 + struct hot_hash_node *heat_node;

 this can be just
 struct hlist_node head_node; /* lookup hot_range hash list */


  };

  struct hot_info {
 @@ -98,6 +127,12 @@ struct hot_info {

   /* red-black tree that keeps track of fs-wide hot data */
   struct hot_inode_tree hot_inode_tree;
 +
 + /* hash map of inode temperature */
 + struct hot_hash_head heat_inode_hl[HEAT_HASH_SIZE];
 +
 + /* hash map of range temperature */
 + struct hot_hash_head heat_range_hl[HEAT_HASH_SIZE];
  };

  #endif  /* _LINUX_HOTTRACK_H */




-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/2] btrfs-progs: some bugfixes

2012-09-05 Thread Zhi Yong Wu
  Some misc bugs are found when i work on other tasks.
Now send out them for interview, thanks.

Zhi Yong Wu (2):
  btrfs-progs: Close file descriptor on exit
  btrfs-progs: Fix up memory leakage

 cmds-filesystem.c |   16 
 1 files changed, 12 insertions(+), 4 deletions(-)

-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/2] btrfs-progs: Close file descriptor on exit

2012-09-05 Thread Zhi Yong Wu
  Need to close fd on exit.

Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 cmds-filesystem.c |   10 --
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index b1457de..e62c4fd 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -77,18 +77,23 @@ static int cmd_df(int argc, char **argv)
if (ret) {
fprintf(stderr, ERROR: couldn't get space info on '%s' - %s\n,
path, strerror(e));
+   close(fd);
free(sargs);
return ret;
}
-   if (!sargs-total_spaces)
+   if (!sargs-total_spaces) {
+   close(fd);
return 0;
+   }
 
count = sargs-total_spaces;
 
sargs = realloc(sargs, sizeof(struct btrfs_ioctl_space_args) +
(count * sizeof(struct btrfs_ioctl_space_info)));
-   if (!sargs)
+   if (!sargs) {
+   close(fd);
return -ENOMEM;
+   }
 
sargs-space_slots = count;
sargs-total_spaces = 0;
@@ -148,6 +153,7 @@ static int cmd_df(int argc, char **argv)
printf(%s: total=%s, used=%s\n, description, total_bytes,
   used_bytes);
}
+   close(fd);
free(sargs);
 
return 0;
-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/2] btrfs-progs: Fix up memory leakage

2012-09-05 Thread Zhi Yong Wu
  Some code pathes forget to free memory on exit.

Changelog from v1:
  Fix the variable is used uncorrectly. [Ram Pai]

Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 cmds-filesystem.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index e62c4fd..9c43d35 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -47,7 +47,7 @@ static const char * const cmd_df_usage[] = {
 
 static int cmd_df(int argc, char **argv)
 {
-   struct btrfs_ioctl_space_args *sargs;
+   struct btrfs_ioctl_space_args *sargs, *sargs_orig;
u64 count = 0, i;
int ret;
int fd;
@@ -65,7 +65,7 @@ static int cmd_df(int argc, char **argv)
return 12;
}
 
-   sargs = malloc(sizeof(struct btrfs_ioctl_space_args));
+   sargs_orig = sargs = malloc(sizeof(struct btrfs_ioctl_space_args));
if (!sargs)
return -ENOMEM;
 
@@ -83,6 +83,7 @@ static int cmd_df(int argc, char **argv)
}
if (!sargs-total_spaces) {
close(fd);
+   free(sargs);
return 0;
}
 
@@ -92,6 +93,7 @@ static int cmd_df(int argc, char **argv)
(count * sizeof(struct btrfs_ioctl_space_info)));
if (!sargs) {
close(fd);
+   free(sargs_orig);
return -ENOMEM;
}
 
-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: rework the code logic

2012-07-05 Thread Zhi Yong Wu
Sorry, please ignore this patch.

On Thu, Jul 5, 2012 at 3:26 PM,  zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  extent-cache.c |4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)

 diff --git a/extent-cache.c b/extent-cache.c
 index 3dd6434..a5084ee 100644
 --- a/extent-cache.c
 +++ b/extent-cache.c
 @@ -136,10 +136,10 @@ struct cache_extent *find_first_cache_extent(struct 
 cache_tree *tree,
 struct cache_extent *entry;

 ret = __tree_search(tree-root, start, 1, prev);
 -   if (!ret)
 +   if (!ret) {
 ret = prev;
 -   if (!ret)
 return NULL;
 +}
 entry = rb_entry(ret, struct cache_extent, rb_node);
 return entry;
  }
 --
 1.7.6

 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs_print_tree?

2012-07-01 Thread Zhi Yong Wu
On Sun, Jul 1, 2012 at 5:41 PM, Mike Fleetwood
mike.fleetw...@googlemail.com wrote:
 On 1 July 2012 05:53, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 HI,

 Do anyone know where btrfs_print_tree is invoked? thanks.

 --
 Regards,

 Zhi Yong Wu

 Is this the answer you are after?

 $ grep -r btrfs_print_tree fs/btrfs/
 fs/btrfs/print-tree.c:void btrfs_print_tree(struct btrfs_root *root,
 struct extent_buffer *c)
 fs/btrfs/print-tree.c:  btrfs_print_tree(root, next);
 fs/btrfs/print-tree.h:void btrfs_print_tree(struct btrfs_root *root,
 struct extent_buffer *t);
No, i also did as this, but didn't find out who will invoke this
function. From above output, we only saw that it invokes itself one
time.


 Mike



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs_print_tree?

2012-07-01 Thread Zhi Yong Wu
On Sun, Jul 1, 2012 at 6:16 PM, Jeff Liu jeff@oracle.com wrote:
 On 07/01/2012 05:49 PM, Zhi Yong Wu wrote:

 On Sun, Jul 1, 2012 at 5:41 PM, Mike Fleetwood
 mike.fleetw...@googlemail.com wrote:
 On 1 July 2012 05:53, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 HI,

 Do anyone know where btrfs_print_tree is invoked? thanks.

 --
 Regards,

 Zhi Yong Wu

 Is this the answer you are after?

 $ grep -r btrfs_print_tree fs/btrfs/
 fs/btrfs/print-tree.c:void btrfs_print_tree(struct btrfs_root *root,
 struct extent_buffer *c)
 fs/btrfs/print-tree.c:  btrfs_print_tree(root, next);
 fs/btrfs/print-tree.h:void btrfs_print_tree(struct btrfs_root *root,
 struct extent_buffer *t);
 No, i also did as this, but didn't find out who will invoke this
 function. From above output, we only saw that it invokes itself one
 time.

 Looks this is a helper routine exported to btrfs-progs previously, it is
 used by debug-tree, quick-test, etc...

 But this function has been implemented at btrfs-progs now, maybe it
 could be safely removed from kernel, not sure. :)
Great, thanks both.

 Thanks,
 -Jeff



 Mike








-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs_print_tree?

2012-06-30 Thread Zhi Yong Wu
HI,

Do anyone know where btrfs_print_tree is invoked? thanks.

-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Can give some help?

2012-06-29 Thread Zhi Yong Wu
HI,

Can anyone let me know where the funtions are declared or defined,
such as btrfs_header_nritems(), btrfs_header_level(), etc? thanks.

-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Can give some help?

2012-06-29 Thread Zhi Yong Wu
On Fri, Jun 29, 2012 at 9:47 PM, Hugo Mills h...@carfax.org.uk wrote:
 On Fri, Jun 29, 2012 at 09:41:47PM +0800, Zhi Yong Wu wrote:
 HI,

 Can anyone let me know where the funtions are declared or defined,
 such as btrfs_header_nritems(), btrfs_header_level(), etc? thanks.

   ctree.h, somewhere around or after line 1550. They're all accessor
 functions, defined by a set of macros. Look for the *_SETGET_* macros.
 The actual definitions of BTRFS_SETGET_FUNCS are in struct-funcs.h
Great, thanks.


   Hugo.

 --
 === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
       --- Great oxymorons of the world, no. 1: Family Holiday ---



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html