Re: [RFC PATCH] ceph: Write through cache support based on fscache

2013-11-02 Thread Li Wang

Hi Milosz,
  Thanks for your comments.
  We think an SSD- and fscache-based write cache is definitely useful for
Ceph, since write amplification slows down Ceph's write performance to
some extent. Lustre has already introduced an SSD-based write cache. The
SSD can be treated as a large outer cache behind the page cache, reducing
the demand on network and OSD bandwidth. A write-back cache brings the
bigger performance gain, but it is more complicated to implement while
meeting the consistency and other correctness semantics demanded by Ceph
and POSIX, such as sync(). A write-through cache is much simpler and does
not raise those concerns. Our goal is to implement both, and we plan to
submit a blueprint at the upcoming CDS.
  It would be great if you could help review and comment on our code
during development. Again, thanks very much.


Cheers,
Li Wang

On 11/02/2013 12:51 AM, Milosz Tanski wrote:

Li,

I think it would be fantastic to see a write cache. In many workloads
you end up writing out a file and then turning around and reading it
right back in on the same node.

There are a few things that I would like to see. First, a mount option
to turn write-through caching on/off. There are some workloads / user
hardware configurations that will not benefit from this (it might be a
net negative). Also, I think it's nice to have a fallback to disable
it if it's misbehaving.
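
A minimal sketch of what such a gate could look like (CEPH_MOUNT_OPT_WTCACHE
and the helper name are placeholders for illustration, not part of the
posted patch or the existing mount options):

	/* Sketch: skip the fscache write-through path unless it was
	 * enabled at mount time via a hypothetical mount flag. */
	static bool ceph_fscache_wt_enabled(struct inode *inode)
	{
		struct ceph_fs_client *fsc = ceph_inode_to_client(inode);

		return fsc->mount_options->flags & CEPH_MOUNT_OPT_WTCACHE;
	}

ceph_writepage_to_fscache() could then return early when this is false,
much like the existing cache_valid(ci) check.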

Second, for correctness I think you should only do write-through
caching if you have an exclusive cap on the file. Currently, as the
code is written, it only reads from fscache if the file is open in
read-only mode and has the cache cap. This would also have to change.
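
One way to express that check (a sketch only, reading "exclusive cap" as
CEPH_CAP_FILE_EXCL; the helper name is made up):

	/* Sketch: only allow write-through caching while the client
	 * holds the exclusive file cap. */
	static bool ceph_fscache_may_cache_write(struct ceph_inode_info *ci)
	{
		int issued = ceph_caps_issued(ci);

		return (issued & CEPH_CAP_FILE_EXCL) != 0;
	}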

Thanks,
- Milosz

P.S: Sorry for the second message Li, I fail at email and forgot to reply-all.

On Fri, Nov 1, 2013 at 9:49 AM, Li Wang liw...@ubuntukylin.com wrote:

Currently, fscache only acts as a read cache for ceph; this patch
enables it to act as a write-through cache as well.

A small trick to be discussed: if the write to the OSD finishes before
the write to fscache, the fscache write is cancelled to avoid
slowing down the writepages() process.

Signed-off-by: Min Chen minc...@ubuntukylin.com
Signed-off-by: Li Wang liw...@ubuntukylin.com
Signed-off-by: Yunchuan Wen yunchuan...@ubuntukylin.com
---
  fs/ceph/addr.c  |   10 +++---
  fs/ceph/cache.c |   29 +
  fs/ceph/cache.h |   13 +
  3 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 6df8bd4..2465c49 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -506,7 +506,7 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
 CONGESTION_ON_THRESH(fsc->mount_options->congestion_kb))
 set_bdi_congested(&fsc->backing_dev_info, BLK_RW_ASYNC);

-   ceph_readpage_to_fscache(inode, page);
+   ceph_writepage_to_fscache(inode, page);

 set_page_writeback(page);
 err = ceph_osdc_writepages(osdc, ceph_vino(inode),
@@ -634,6 +634,7 @@ static void writepages_finish(struct ceph_osd_request *req,
 if ((issued & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)) == 0)
 generic_error_remove_page(inode->i_mapping, page);

+   ceph_maybe_release_fscache_page(inode, page);
 unlock_page(page);
 }
 dout("%p wrote+cleaned %d pages\n", inode, wrote);
@@ -746,7 +747,7 @@ retry:

 while (!done && index <= end) {
 int num_ops = do_sync ? 2 : 1;
-   unsigned i;
+   unsigned i, j;
 int first;
 pgoff_t next;
 int pvec_pages, locked_pages;
@@ -894,7 +895,6 @@ get_more_pages:
 if (!locked_pages)
 goto release_pvec_pages;
 if (i) {
-   int j;
 BUG_ON(!locked_pages || first < 0);

 if (pvec_pages && i == pvec_pages &&
@@ -924,6 +924,10 @@ get_more_pages:

 osd_req_op_extent_osd_data_pages(req, 0, pages, len, 0,
 !!pool, false);
+   for (j = 0; j < locked_pages; j++) {
+   struct page *page = pages[j];
+   ceph_writepage_to_fscache(inode, page);
+   }

 pages = NULL;   /* request message now owns the pages array */
 pool = NULL;
diff --git a/fs/ceph/cache.c b/fs/ceph/cache.c
index 6bfe65e..6f928c4 100644
--- a/fs/ceph/cache.c
+++ b/fs/ceph/cache.c
@@ -320,6 +320,24 @@ void ceph_readpage_to_fscache(struct inode *inode, struct page *page)
  fscache_uncache_page(ci->fscache, page);
  }

+void ceph_writepage_to_fscache(struct inode *inode, struct page *page)
+{
+   struct ceph_inode_info *ci = ceph_inode(inode);
+   int ret;
+
+   if (!cache_valid(ci))
+   return;
+
+   if (!PageFsCache(page

[RFC PATCH] ceph: Write through cache support based on fscache

2013-11-01 Thread Li Wang
Currently, fscache only acts as a read cache for ceph; this patch
enables it to act as a write-through cache as well.

A small trick to be discussed: if the write to the OSD finishes before
the write to fscache, the fscache write is cancelled to avoid
slowing down the writepages() process.

Signed-off-by: Min Chen minc...@ubuntukylin.com
Signed-off-by: Li Wang liw...@ubuntukylin.com
Signed-off-by: Yunchuan Wen yunchuan...@ubuntukylin.com
---
 fs/ceph/addr.c  |   10 +++---
 fs/ceph/cache.c |   29 +
 fs/ceph/cache.h |   13 +
 3 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 6df8bd4..2465c49 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -506,7 +506,7 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
CONGESTION_ON_THRESH(fsc->mount_options->congestion_kb))
set_bdi_congested(&fsc->backing_dev_info, BLK_RW_ASYNC);
 
-   ceph_readpage_to_fscache(inode, page);
+   ceph_writepage_to_fscache(inode, page);
 
set_page_writeback(page);
err = ceph_osdc_writepages(osdc, ceph_vino(inode),
@@ -634,6 +634,7 @@ static void writepages_finish(struct ceph_osd_request *req,
if ((issued & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)) == 0)
generic_error_remove_page(inode->i_mapping, page);
 
+   ceph_maybe_release_fscache_page(inode, page);
unlock_page(page);
}
dout("%p wrote+cleaned %d pages\n", inode, wrote);
@@ -746,7 +747,7 @@ retry:
 
while (!done && index <= end) {
int num_ops = do_sync ? 2 : 1;
-   unsigned i;
+   unsigned i, j;
int first;
pgoff_t next;
int pvec_pages, locked_pages;
@@ -894,7 +895,6 @@ get_more_pages:
if (!locked_pages)
goto release_pvec_pages;
if (i) {
-   int j;
BUG_ON(!locked_pages || first < 0);
 
if (pvec_pages && i == pvec_pages &&
@@ -924,6 +924,10 @@ get_more_pages:
 
osd_req_op_extent_osd_data_pages(req, 0, pages, len, 0,
!!pool, false);
+   for (j = 0; j < locked_pages; j++) {
+   struct page *page = pages[j];
+   ceph_writepage_to_fscache(inode, page);
+   }
 
pages = NULL;   /* request message now owns the pages array */
pool = NULL;
diff --git a/fs/ceph/cache.c b/fs/ceph/cache.c
index 6bfe65e..6f928c4 100644
--- a/fs/ceph/cache.c
+++ b/fs/ceph/cache.c
@@ -320,6 +320,24 @@ void ceph_readpage_to_fscache(struct inode *inode, struct page *page)
 fscache_uncache_page(ci->fscache, page);
 }
 
+void ceph_writepage_to_fscache(struct inode *inode, struct page *page)
+{
+   struct ceph_inode_info *ci = ceph_inode(inode);
+   int ret;
+
+   if (!cache_valid(ci))
+   return;
+
+   if (!PageFsCache(page)) {
+   if (fscache_alloc_page(ci->fscache, page, GFP_KERNEL))
+   return;
+   }
+
+   if (fscache_write_page(ci->fscache, page, GFP_KERNEL))
+   fscache_uncache_page(ci->fscache, page);
+}
+
+
 void ceph_invalidate_fscache_page(struct inode* inode, struct page *page)
 {
struct ceph_inode_info *ci = ceph_inode(inode);
@@ -328,6 +346,17 @@ void ceph_invalidate_fscache_page(struct inode* inode, struct page *page)
fscache_uncache_page(ci->fscache, page);
 }
 
+void ceph_maybe_release_fscache_page(struct inode *inode, struct page *page)
+{
+   struct ceph_inode_info *ci = ceph_inode(inode);
+
+   if (PageFsCache(page)) {
+   if (!fscache_check_page_write(ci->fscache, page))
+   fscache_maybe_release_page(ci->fscache,
+  page, GFP_KERNEL);
+   }
+}
+
 void ceph_fscache_unregister_fs(struct ceph_fs_client* fsc)
 {
if (fsc->revalidate_wq)
diff --git a/fs/ceph/cache.h b/fs/ceph/cache.h
index ba94940..aa02b7a 100644
--- a/fs/ceph/cache.h
+++ b/fs/ceph/cache.h
@@ -45,7 +45,9 @@ int ceph_readpages_from_fscache(struct inode *inode,
struct list_head *pages,
unsigned *nr_pages);
 void ceph_readpage_to_fscache(struct inode *inode, struct page *page);
+void ceph_writepage_to_fscache(struct inode *inode, struct page *page);
 void ceph_invalidate_fscache_page(struct inode* inode, struct page *page);
+void ceph_maybe_release_fscache_page(struct inode *inode, struct page *page);
 void ceph_queue_revalidate(struct inode *inode);
 
 static inline void ceph_fscache_invalidate(struct inode *inode)
@@ -127,6 +129,11 @@ static inline void ceph_readpage_to_fscache(struct inode *inode,
 {
 }
 
+static inline void

Re: [RFC PATCH] ceph: Write through cache support based on fscache

2013-11-01 Thread Milosz Tanski
Li,

I think it would be fantastic to see a write cache. In many workloads
you end up writing out a file and then turning around and reading it
right back in on the same node.

There are a few things that I would like to see. First, a mount option
to turn write-through caching on/off. There are some workloads / user
hardware configurations that will not benefit from this (it might be a
net negative). Also, I think it's nice to have a fallback to disable
it if it's misbehaving.

Second, for correctness I think you should only do write-through
caching if you have an exclusive cap on the file. Currently, as the
code is written, it only reads from fscache if the file is open in
read-only mode and has the cache cap. This would also have to change.

Thanks,
- Milosz

P.S: Sorry for the second message Li, I fail at email and forgot to reply-all.

On Fri, Nov 1, 2013 at 9:49 AM, Li Wang liw...@ubuntukylin.com wrote:
 Currently, fscache only acts as a read cache for ceph; this patch
 enables it to act as a write-through cache as well.

 A small trick to be discussed: if the write to the OSD finishes before
 the write to fscache, the fscache write is cancelled to avoid
 slowing down the writepages() process.

 Signed-off-by: Min Chen minc...@ubuntukylin.com
 Signed-off-by: Li Wang liw...@ubuntukylin.com
 Signed-off-by: Yunchuan Wen yunchuan...@ubuntukylin.com
 ---
  fs/ceph/addr.c  |   10 +++---
  fs/ceph/cache.c |   29 +
  fs/ceph/cache.h |   13 +
  3 files changed, 49 insertions(+), 3 deletions(-)

 diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
 index 6df8bd4..2465c49 100644
 --- a/fs/ceph/addr.c
 +++ b/fs/ceph/addr.c
 @@ -506,7 +506,7 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
 CONGESTION_ON_THRESH(fsc->mount_options->congestion_kb))
 set_bdi_congested(&fsc->backing_dev_info, BLK_RW_ASYNC);

 -   ceph_readpage_to_fscache(inode, page);
 +   ceph_writepage_to_fscache(inode, page);

 set_page_writeback(page);
 err = ceph_osdc_writepages(osdc, ceph_vino(inode),
 @@ -634,6 +634,7 @@ static void writepages_finish(struct ceph_osd_request *req,
 if ((issued & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)) == 0)
 generic_error_remove_page(inode->i_mapping, page);

 +   ceph_maybe_release_fscache_page(inode, page);
 unlock_page(page);
 }
 dout("%p wrote+cleaned %d pages\n", inode, wrote);
 @@ -746,7 +747,7 @@ retry:

 while (!done && index <= end) {
 int num_ops = do_sync ? 2 : 1;
 -   unsigned i;
 +   unsigned i, j;
 int first;
 pgoff_t next;
 int pvec_pages, locked_pages;
 @@ -894,7 +895,6 @@ get_more_pages:
 if (!locked_pages)
 goto release_pvec_pages;
 if (i) {
 -   int j;
 BUG_ON(!locked_pages || first < 0);

 if (pvec_pages && i == pvec_pages &&
 @@ -924,6 +924,10 @@ get_more_pages:

 osd_req_op_extent_osd_data_pages(req, 0, pages, len, 0,
 !!pool, false);
 +   for (j = 0; j < locked_pages; j++) {
 +   struct page *page = pages[j];
 +   ceph_writepage_to_fscache(inode, page);
 +   }

 pages = NULL;   /* request message now owns the pages array */
 pool = NULL;
 diff --git a/fs/ceph/cache.c b/fs/ceph/cache.c
 index 6bfe65e..6f928c4 100644
 --- a/fs/ceph/cache.c
 +++ b/fs/ceph/cache.c
 @@ -320,6 +320,24 @@ void ceph_readpage_to_fscache(struct inode *inode, struct page *page)
  fscache_uncache_page(ci->fscache, page);
  }

 +void ceph_writepage_to_fscache(struct inode *inode, struct page *page)
 +{
 +   struct ceph_inode_info *ci = ceph_inode(inode);
 +   int ret;
 +
 +   if (!cache_valid(ci))
 +   return;
 +
 +   if (!PageFsCache(page)) {
 +   if (fscache_alloc_page(ci->fscache, page, GFP_KERNEL))
 +   return;
 +   }
 +
 +   if (fscache_write_page(ci->fscache, page, GFP_KERNEL))
 +   fscache_uncache_page(ci->fscache, page);
 +}
 +
 +
  void ceph_invalidate_fscache_page(struct inode* inode, struct page *page)
  {
 struct ceph_inode_info *ci = ceph_inode(inode);
 @@ -328,6 +346,17 @@ void ceph_invalidate_fscache_page(struct inode* inode, struct page *page)
 fscache_uncache_page(ci->fscache, page);
  }

 +void ceph_maybe_release_fscache_page(struct inode *inode, struct page *page)
 +{
 +   struct ceph_inode_info *ci = ceph_inode(inode);
 +
 +   if (PageFsCache(page)) {
 +   if (!fscache_check_page_write(ci->fscache, page))
 +   fscache_maybe_release_page(ci->fscache

write-through cache

2012-05-23 Thread Mandell Degerness
I would like to test the effect of using the new write-through cache
on RBD volumes mounted to Openstack VMs.  What, precisely, are the
changes I need to make to the volume XML in order to do so?

-Mandell


Re: write-through cache

2012-05-23 Thread Josh Durgin

On 05/23/2012 04:44 PM, Mandell Degerness wrote:

I would like to test the effect of using the new write-through cache
on RBD volumes mounted to Openstack VMs.  What, precisely, are the
changes I need to make to the volume XML in order to do so?


If your only volumes are rbd, you can append 
:rbd_cache=true:rbd_cache_max_dirty=0 to the image name in the volume 
xml, so it would include something like


<source protocol='rbd' name='pool/image:rbd_cache=true:rbd_cache_max_dirty=0'/>
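
For reference, a complete disk element might look something like the
following (the virtio target here is just an example; add an auth
section as well if you use cephx):

<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <source protocol='rbd' name='pool/image:rbd_cache=true:rbd_cache_max_dirty=0'/>
  <target dev='vda' bus='virtio'/>
</disk>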


Josh