Re: [ceph-users] Experiences with Ceph in the June '14 issue of USENIX ;login:

2014-06-04 Thread Filippos Giannakos
Hello Ian,

Thanks for your interest.

On Mon, Jun 02, 2014 at 06:37:48PM -0400, Ian Colle wrote:
 Thanks, Filippos! Very interesting reading.
 
 Are you comfortable enough yet to remove the RAID-1 from your architecture and
 get all that space back?

Actually, we are not ready to do that yet. There are three major things to
consider.

First, to be able to get rid of the RAID-1 setup, we need to increase the
replication level to at least 3x. So the space gain is not that great to begin
with.

Second, according to our calculations and previous experience, this operation
could take about a month at our scale. During this period of increased I/O we
might see peaks of performance degradation. Plus, we currently do not have the
necessary hardware available to increase the replication level before getting
rid of the RAID setup.

Third, we have a few disk failures per month. The RAID-1 setup has allowed us to
replace them seamlessly, without any hiccup or even a clue to the end user that
something went wrong. We can certainly rely on RADOS to avoid data loss, but
relying on RADOS for recovery might cause some (minor) performance degradation,
especially for VM I/O traffic.

Kind Regards,
-- 
Filippos
philipg...@grnet.gr
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Experiences with Ceph in the June '14 issue of USENIX ;login:

2014-06-02 Thread Filippos Giannakos
Hello all,

As you may already know, we have been using Ceph for quite some time now to back
the ~okeanos [1] public cloud service, which is powered by Synnefo [2].

A few months ago we were kindly invited to write an article about our
experiences with Ceph for the USENIX ;login: magazine. The article is out in
this month's (June '14) issue and we are really happy to share it with you all:

https://www.usenix.org/publications/login/june14/giannakos

In the article we describe our storage needs, how we use Ceph and how it has
worked so far. I hope you enjoy reading it.

Kind Regards,
Filippos

[1] http://okeanos.grnet.gr
[2] http://www.synnefo.org

-- 
Filippos
philipg...@grnet.gr


Re: Assertion error in librados

2014-03-28 Thread Filippos Giannakos
Hello,

We recently ran into the same assertion error again.
Do you have any indications or updates regarding the cause?

On Tue, Feb 25, 2014 at 11:26:15AM -0800, Noah Watkins wrote:
 On Tue, Feb 25, 2014 at 9:51 AM, Josh Durgin josh.dur...@inktank.com wrote:
  That's a good idea. This particular assert in a Mutex is almost always
  a use-after-free of the Mutex or structure containing it though.
 
 I think that a use-after-free will also return EINVAL (assuming it
 isn't a pathological case), since pthread_mutex_lock checks an
 initialization magic variable. I think that particular mutex isn't
 initialized with flags that would cause any of the other possible
 return values.

Kind regards,
-- 
Filippos
philipg...@grnet.gr


Assertion error in librados

2014-02-25 Thread Filippos Giannakos
Hello all,

We recently bumped into the following assertion error in librados on our
production service:


common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7fa2c2ccf700 time 
2014-02-21 07:23:26.340791
common/Mutex.cc: 93: FAILED assert(r == 0)
 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
 1: (Mutex::Lock(bool)+0x131) [0x7fa2c7707431]
 2: (SimpleMessenger::submit_message(Message*, Connection*, entity_addr_t 
const, int, bool)+0x52) [0x7fa2c7863172]
 3: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x23e) 
[0x7fa2c7863bfe]
 4: (Objecter::send_op(Objecter::Op*)+0x32c) [0x7fa2c76b317c]
 5: (Objecter::handle_osd_map(MOSDMap*)+0x365) [0x7fa2c76b7805]
 6: (librados::RadosClient::_dispatch(Message*)+0x7c) [0x7fa2c768c70c]
 7: (librados::RadosClient::ms_dispatch(Message*)+0x9b) [0x7fa2c768c82b]
 8: (DispatchQueue::entry()+0x4eb) [0x7fa2c7800d2b]
 9: (DispatchQueue::DispatchThread::entry()+0xd) [0x7fa2c78666ad]
 10: (()+0x6b50) [0x7fa2c7203b50]
 11: (clone()+0x6d) [0x7fa2c6b570ed]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'


From what I can tell, there were some network problems on our RADOS cluster,
after which many of our librados clients failed with the above assertion error.

Do you have any idea what might have gone wrong?

Kind Regards,
-- 
Filippos
philipg...@grnet.gr


Re: RADOS + deep scrubbing performance issues in production environment

2014-01-28 Thread Filippos Giannakos
On Mon, Jan 27, 2014 at 01:10:23PM -0500, Kyle Bader wrote:
  Are there any tools we are not aware of for controlling, possibly pausing,
  deep-scrub and/or getting some progress about the procedure ?
  Also, since I believe it would be bad practice to disable deep-scrubbing, do you
  have any recommendations on how to work around (or even solve) this issue?
 
 The periodicity of scrubs is controllable with these tunables:
 
 osd scrub max interval
 osd deep scrub interval
 
 You may also be interested in adjusting:
 
 osd scrub load threshold
 
 More information on the docs page:
 
 http://ceph.com/docs/master/rados/configuration/osd-config-ref/#scrubbing
 
 Hope that helps some!
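For reference, a sketch of how those tunables might be set in ceph.conf — the values below are illustrative (intervals in seconds), not recommendations:

```ini
[osd]
; force a shallow scrub if a PG has not been scrubbed for 7 days
osd scrub max interval = 604800
; deep scrub each PG every 30 days instead of the weekly default
osd deep scrub interval = 2592000
; only start scheduled scrubs while the load average is below 0.5
osd scrub load threshold = 0.5
```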
 

Thanks, Kyle, but this does not address the performance degradation while deep
scrubbing is actually running. Plus, it can take several days to complete.

Kind Regards,
-- 
Filippos
philipg...@grnet.gr


Re: RADOS + deep scrubbing performance issues in production environment

2014-01-28 Thread Filippos Giannakos
On Mon, Jan 27, 2014 at 10:45:48AM -0800, Sage Weil wrote:
 There is also 
 
  ceph osd set noscrub
 
 and then later
 
  ceph osd unset noscrub
 
 I forget whether this pauses an in-progress PG scrub or just makes it stop 
 when it gets to the next PG boundary.
 
 sage

I bumped into those settings but I couldn't find any documentation about them.
When I first tried them, they didn't do anything immediately, so I thought they
weren't the answer. After your mention, I tried them again, and after a while
the deep-scrubbing stopped. So I'm guessing they stop scrubbing on the next PG
boundary.

I see from this thread, and others before it, that some people think it is a
spindle issue. I'm not sure it is just that. Reproducing it on an idle cluster
that can sustain more than 250MiB/s, with single requests stalling for 4-5
seconds, sounds like an issue in itself. Maybe there is too much locking, or not
enough priority given to the actual I/O? Plus, the idea of throttling deep
scrubbing based on IOPS sounds appealing.

Kind Regards,
-- 
Filippos
philipg...@grnet.gr


Re: RADOS + deep scrubbing performance issues in production environment

2014-01-28 Thread Filippos Giannakos
On Tue, Jan 28, 2014 at 01:30:46AM -0500, Mike Dawson wrote:
 
 On 1/27/2014 1:45 PM, Sage Weil wrote:
 There is also
 
   ceph osd set noscrub
 
 and then later
 
   ceph osd unset noscrub
 
 In my experience scrub isn't nearly as much of a problem as
 deep-scrub. On an IOPS constrained cluster with writes approaching
 the available aggregate spindle performance minus replication
 penalty and possibly co-located osd journal penalty, scrub may run
 without any disruption. But deep-scrub tends to make iowait on the
 spindles get ugly.
 
 To disable/enable deep-scrub use:
 
 ceph osd set nodeep-scrub
 ceph osd unset nodeep-scrub


Yes, deep-scrubbing is much worse than scrubbing, but I think fully disabling it
is not a good option. Then again, having days of degraded performance isn't
either. That's why I am bringing up the problem and seeking a solid solution to
the matter.

Kind Regards,
-- 
Filippos
philipg...@grnet.gr


RADOS + deep scrubbing performance issues in production environment

2014-01-27 Thread Filippos Giannakos
Hello all,

We have been running RADOS in a large scale, production, public cloud
environment for a few months now and we are generally happy with it.

However, we experience performance problems when deep scrubbing is active.

We managed to reproduce them in our testing cluster running emperor, even while
it was idle.

We ran a simple rados bench test:

  rados -p bench bench -b 524288 120 write

and could easily sustain 230MB/s consistently [1].

Then, we manually initiated a deep scrub and re-ran the test.

As you can see from the results [2], the performance dropped significantly and
even paused for a few seconds.

Now imagine that behavior in a loaded cluster with thousands of VMs on top of
it. The performance drop is unacceptable for our service.

Are there any tools we are not aware of for controlling, possibly pausing,
deep-scrub and/or getting some progress about the procedure ?
Also, since I believe it would be bad practice to disable deep-scrubbing, do you
have any recommendations on how to work around (or even solve) this issue?

[1] https://pithos.okeanos.grnet.gr/public/yzq5fHNkl5OnjgLOPlRTA3
[2] https://pithos.okeanos.grnet.gr/public/OjIGAQFBGwcsBNMHtA8ir5

Kind Regards,
-- 
Filippos
philipg...@grnet.gr


[PATCH 0/2 v2] librados: Add RADOS locks to the C/C++ API

2013-06-04 Thread Filippos Giannakos
Hi team,

This set of patches exports the RADOS advisory locking functionality to the C/C++
API.
They have been refactored to incorporate Josh's suggestions from his review:
 * Always set tag to "" for exclusive locks
 * Add a duration argument to lock_{exclusive, shared}
 * Add lock flags to librados.h
 * Return -EINVAL on client parsing errors
 * Remove unneeded std::string conversions
 * Typos/style fixes

Kind Regards,
Filippos

Filippos Giannakos (2):
  Add RADOS lock mechanism to the librados C/C++ API.
  Add RADOS API lock tests

 src/Makefile.am|   11 +-
 src/include/rados/librados.h   |  102 +-
 src/include/rados/librados.hpp |   29 
 src/librados/librados.cc   |  180 
 src/test/librados/lock.cc  |  301 
 5 files changed, 621 insertions(+), 2 deletions(-)
 create mode 100644 src/test/librados/lock.cc

-- 
1.7.10.4



[PATCH 1/2 v2] Add RADOS lock mechanism to the librados C/C++ API.

2013-06-04 Thread Filippos Giannakos
Add functions to the librados C/C++ API to take advantage of the advisory
locking system offered by RADOS.

Signed-off-by: Filippos Giannakos philipg...@grnet.gr
---
 src/Makefile.am|5 +-
 src/include/rados/librados.h   |  102 ++-
 src/include/rados/librados.hpp |   29 +++
 src/librados/librados.cc   |  180 
 4 files changed, 314 insertions(+), 2 deletions(-)

diff --git a/src/Makefile.am b/src/Makefile.am
index 5e17687..3b95662 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -357,7 +357,10 @@ librados_SOURCES = \
librados/RadosClient.cc \
librados/IoCtxImpl.cc \
osdc/Objecter.cc \
-   osdc/Striper.cc
+   osdc/Striper.cc \
+   cls/lock/cls_lock_client.cc \
+   cls/lock/cls_lock_types.cc \
+   cls/lock/cls_lock_ops.cc
 librados_la_SOURCES = ${librados_SOURCES}
 librados_la_CFLAGS = ${CRYPTO_CFLAGS} ${AM_CFLAGS}
 librados_la_CXXFLAGS = ${AM_CXXFLAGS}
diff --git a/src/include/rados/librados.h b/src/include/rados/librados.h
index a575042..cf38e01 100644
--- a/src/include/rados/librados.h
+++ b/src/include/rados/librados.h
@@ -24,7 +24,7 @@ extern "C" {
 #endif
 
 #define LIBRADOS_VER_MAJOR 0
-#define LIBRADOS_VER_MINOR 52
+#define LIBRADOS_VER_MINOR 53
 #define LIBRADOS_VER_EXTRA 0
 
 #define LIBRADOS_VERSION(maj, min, extra) ((maj << 16) + (min << 8) + extra)
@@ -33,6 +33,11 @@ extern "C" {
 
 #define LIBRADOS_SUPPORTS_WATCH 1
 
+/* RADOS lock flags
+ * They are also defined in cls_lock_types.h. Keep them in sync!
+ */
+#define LIBRADOS_LOCK_FLAG_RENEW 0x1
+
 /**
  * @defgroup librados_h_xattr_comp xattr comparison operations
  * @note BUG: there's no way to use these in the C api
@@ -1571,6 +1576,101 @@ int rados_notify(rados_ioctx_t io, const char *o, uint64_t ver, const char *buf,
 
 /** @} Watch/Notify */
 
+/**
+ * Take an exclusive lock on an object.
+ *
+ * @param io the context to operate in
+ * @param oid the name of the object
+ * @param name the name of the lock
+ * @param cookie user-defined identifier for this instance of the lock
+ * @param desc user-defined lock description
+ * @param duration the duration of the lock. Set to NULL for infinite duration.
+ * @param flags lock flags
+ * @returns 0 on success, negative error code on failure
+ * @returns -EBUSY if the lock is already held by another (client, cookie) pair
+ * @returns -EEXIST if the lock is already held by the same (client, cookie) pair
+ */
+int rados_lock_exclusive(rados_ioctx_t io, const char * o, const char * name,
+                         const char * cookie, const char * desc,
+                         struct timeval * duration, uint8_t flags);
+
+/**
+ * Take a shared lock on an object.
+ *
+ * @param io the context to operate in
+ * @param o the name of the object
+ * @param name the name of the lock
+ * @param cookie user-defined identifier for this instance of the lock
+ * @param tag The tag of the lock
+ * @param desc user-defined lock description
+ * @param duration the duration of the lock. Set to NULL for infinite duration.
+ * @param flags lock flags
+ * @returns 0 on success, negative error code on failure
+ * @returns -EBUSY if the lock is already held by another (client, cookie) pair
+ * @returns -EEXIST if the lock is already held by the same (client, cookie) pair
+ */
+int rados_lock_shared(rados_ioctx_t io, const char * o, const char * name,
+  const char * cookie, const char * tag, const char * desc,
+  struct timeval * duration, uint8_t flags);
+
+/**
+ * Release a shared or exclusive lock on an object.
+ *
+ * @param io the context to operate in
+ * @param o the name of the object
+ * @param name the name of the lock
+ * @param cookie user-defined identifier for the instance of the lock
+ * @returns 0 on success, negative error code on failure
+ * @returns -ENOENT if the lock is not held by the specified (client, cookie) pair
+ */
+int rados_unlock(rados_ioctx_t io, const char *o, const char *name,
+const char *cookie);
+
+/**
+ * List clients that have locked the named object lock and information about
+ * the lock.
+ *
+ * The number of bytes required in each buffer is put in the
+ * corresponding size out parameter. If any of the provided buffers
+ * are too short, -ERANGE is returned after these sizes are filled in.
+ *
+ * @param io the context to operate in
+ * @param o the name of the object
+ * @param name the name of the lock
+ * @param exclusive where to store whether the lock is exclusive (1) or shared (0)
+ * @param tag where to store the tag associated with the object lock
+ * @param tag_len number of bytes in tag buffer
+ * @param clients buffer in which locker clients are stored, separated by '\0'
+ * @param clients_len number of bytes in the clients buffer
+ * @param cookies buffer in which locker cookies are stored, separated by '\0'
+ * @param cookies_len number of bytes in the cookies buffer
+ * @param addrs buffer in which

[PATCH 2/2 v2] Add RADOS API lock tests

2013-06-04 Thread Filippos Giannakos
Add tests for the advisory locking API calls.

Signed-off-by: Filippos Giannakos philipg...@grnet.gr
---
 src/Makefile.am   |6 +
 src/test/librados/lock.cc |  301 +
 2 files changed, 307 insertions(+)
 create mode 100644 src/test/librados/lock.cc

diff --git a/src/Makefile.am b/src/Makefile.am
index 3b95662..ccabf19 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
 ceph_test_rados_api_misc_LDADD = librados.la ${UNITTEST_STATIC_LDADD}
 ceph_test_rados_api_misc_CXXFLAGS = ${AM_CXXFLAGS} ${UNITTEST_CXXFLAGS}
 bin_DEBUGPROGRAMS += ceph_test_rados_api_misc
 
+ceph_test_rados_api_lock_SOURCES = test/librados/lock.cc test/librados/test.cc
+ceph_test_rados_api_lock_LDFLAGS = ${AM_LDFLAGS}
+ceph_test_rados_api_lock_LDADD =  librados.la ${UNITTEST_STATIC_LDADD}
+ceph_test_rados_api_lock_CXXFLAGS = ${AM_CXXFLAGS} ${UNITTEST_CXXFLAGS}
+bin_DEBUGPROGRAMS += ceph_test_rados_api_lock
+
 ceph_test_libcephfs_SOURCES = test/libcephfs/test.cc test/libcephfs/readdir_r_cb.cc test/libcephfs/caps.cc
 ceph_test_libcephfs_LDFLAGS = $(PTHREAD_CFLAGS) ${AM_LDFLAGS}
 ceph_test_libcephfs_LDADD =  ${UNITTEST_STATIC_LDADD} libcephfs.la
diff --git a/src/test/librados/lock.cc b/src/test/librados/lock.cc
new file mode 100644
index 000..1d33d46
--- /dev/null
+++ b/src/test/librados/lock.cc
@@ -0,0 +1,301 @@
+#include "include/rados/librados.h"
+#include "include/rados/librados.hpp"
+#include "test/librados/test.h"
+#include "cls/lock/cls_lock_client.h"
+
+#include <algorithm>
+#include <errno.h>
+#include <gtest/gtest.h>
+#include <sys/time.h>
+
+using namespace librados;
+
+TEST(LibRadosLock, LockExclusive) {
+  rados_t cluster;
+  rados_ioctx_t ioctx;
+  std::string pool_name = get_temp_pool_name();
+  ASSERT_EQ("", create_one_pool(pool_name, &cluster));
+  rados_ioctx_create(cluster, pool_name.c_str(), &ioctx);
+  ASSERT_EQ(0, rados_lock_exclusive(ioctx, "foo", "TestLock", "Cookie", "", NULL, 0));
+  ASSERT_EQ(-EEXIST, rados_lock_exclusive(ioctx, "foo", "TestLock", "Cookie", "", NULL, 0));
+  rados_ioctx_destroy(ioctx);
+  ASSERT_EQ(0, destroy_one_pool(pool_name, &cluster));
+}
+
+TEST(LibRadosLock, LockExclusivePP) {
+  Rados cluster;
+  std::string pool_name = get_temp_pool_name();
+  ASSERT_EQ("", create_one_pool_pp(pool_name, cluster));
+  IoCtx ioctx;
+  cluster.ioctx_create(pool_name.c_str(), ioctx);
+  ASSERT_EQ(0, ioctx.lock_exclusive("foo", "TestLock", "Cookie", "", NULL, 0));
+  ASSERT_EQ(-EEXIST, ioctx.lock_exclusive("foo", "TestLock", "Cookie", "", NULL, 0));
+  ioctx.close();
+  ASSERT_EQ(0, destroy_one_pool_pp(pool_name, cluster));
+}
+
+TEST(LibRadosLock, LockShared) {
+  rados_t cluster;
+  rados_ioctx_t ioctx;
+  std::string pool_name = get_temp_pool_name();
+  ASSERT_EQ("", create_one_pool(pool_name, &cluster));
+  rados_ioctx_create(cluster, pool_name.c_str(), &ioctx);
+  ASSERT_EQ(0, rados_lock_shared(ioctx, "foo", "TestLock", "Cookie", "Tag", "", NULL, 0));
+  ASSERT_EQ(-EEXIST, rados_lock_shared(ioctx, "foo", "TestLock", "Cookie", "Tag", "", NULL, 0));
+  rados_ioctx_destroy(ioctx);
+  ASSERT_EQ(0, destroy_one_pool(pool_name, &cluster));
+}
+
+TEST(LibRadosLock, LockSharedPP) {
+  Rados cluster;
+  std::string pool_name = get_temp_pool_name();
+  ASSERT_EQ("", create_one_pool_pp(pool_name, cluster));
+  IoCtx ioctx;
+  cluster.ioctx_create(pool_name.c_str(), ioctx);
+  ASSERT_EQ(0, ioctx.lock_shared("foo", "TestLock", "Cookie", "Tag", "", NULL, 0));
+  ASSERT_EQ(-EEXIST, ioctx.lock_shared("foo", "TestLock", "Cookie", "Tag", "", NULL, 0));
+  ioctx.close();
+  ASSERT_EQ(0, destroy_one_pool_pp(pool_name, cluster));
+}
+
+TEST(LibRadosLock, LockExclusiveDur) {
+  struct timeval tv;
+  tv.tv_sec = 1;
+  tv.tv_usec = 0;
+  rados_t cluster;
+  rados_ioctx_t ioctx;
+  std::string pool_name = get_temp_pool_name();
+  ASSERT_EQ("", create_one_pool(pool_name, &cluster));
+  rados_ioctx_create(cluster, pool_name.c_str(), &ioctx);
+  ASSERT_EQ(0, rados_lock_exclusive(ioctx, "foo", "TestLock", "Cookie", "", &tv, 0));
+  sleep(1);
+  ASSERT_EQ(0, rados_lock_exclusive(ioctx, "foo", "TestLock", "Cookie", "", NULL, 0));
+  rados_ioctx_destroy(ioctx);
+  ASSERT_EQ(0, destroy_one_pool(pool_name, &cluster));
+}
+
+TEST(LibRadosLock, LockExclusiveDurPP) {
+  struct timeval tv;
+  tv.tv_sec = 1;
+  tv.tv_usec = 0;
+  Rados cluster;
+  std::string pool_name = get_temp_pool_name();
+  ASSERT_EQ("", create_one_pool_pp(pool_name, cluster));
+  IoCtx ioctx;
+  cluster.ioctx_create(pool_name.c_str(), ioctx);
+  ASSERT_EQ(0, ioctx.lock_exclusive("foo", "TestLock", "Cookie", "", &tv, 0));
+  sleep(1);
+  ASSERT_EQ(0, ioctx.lock_exclusive("foo", "TestLock", "Cookie", "", NULL, 0));
+  ioctx.close();
+  ASSERT_EQ(0, destroy_one_pool_pp(pool_name, cluster));
+}
+
+TEST(LibRadosLock, LockSharedDur) {
+  struct timeval tv;
+  tv.tv_sec = 1;
+  tv.tv_usec = 0;
+  rados_t cluster;
+  rados_ioctx_t ioctx;
+  std::string pool_name = get_temp_pool_name();
+  ASSERT_EQ("", create_one_pool(pool_name, &cluster));
+  rados_ioctx_create(cluster, pool_name.c_str(), &ioctx);
+  ASSERT_EQ(0

Re: [PATCH 0/2] librados: Add RADOS locks to the C/C++ API

2013-06-03 Thread Filippos Giannakos

Hi Josh,

On 05/31/2013 10:44 PM, Josh Durgin wrote:

On 05/30/2013 06:02 AM, Filippos Giannakos wrote:

The following patches export the RADOS advisory locks functionality to the
C/C++ librados API. The extra API calls added are inspired by the relevant
functions of librbd.


This looks good to me overall. I wonder if we should create a new
library in the future for these kinds of things that are built on top
of librados. Other generally useful class client operations could go
there, as well as generally useful things built on top of librados,
like methods for striping over many objects.


Thanks for the review. I will incorporate all your suggestions in a new
patch, which I will submit shortly.
As for the new library you mention, it is a good idea, but for now I
think that the basic RADOS locking functionality should be at the core
librados API.

Kind Regards,
--
Filippos.
philipg...@grnet.gr


[PATCH 1/2] Add RADOS lock mechanism to the librados C/C++ API.

2013-05-30 Thread Filippos Giannakos
Add functions to the librados C/C++ API to take advantage of the advisory
locking system offered by RADOS.

Signed-off-by: Filippos Giannakos philipg...@grnet.gr
---
 src/Makefile.am|5 +-
 src/include/rados/librados.h   |   95 +++-
 src/include/rados/librados.hpp |   27 ++
 src/librados/librados.cc   |  187 
 4 files changed, 312 insertions(+), 2 deletions(-)

diff --git a/src/Makefile.am b/src/Makefile.am
index 5e17687..3b95662 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -357,7 +357,10 @@ librados_SOURCES = \
librados/RadosClient.cc \
librados/IoCtxImpl.cc \
osdc/Objecter.cc \
-   osdc/Striper.cc
+   osdc/Striper.cc \
+   cls/lock/cls_lock_client.cc \
+   cls/lock/cls_lock_types.cc \
+   cls/lock/cls_lock_ops.cc
 librados_la_SOURCES = ${librados_SOURCES}
 librados_la_CFLAGS = ${CRYPTO_CFLAGS} ${AM_CFLAGS}
 librados_la_CXXFLAGS = ${AM_CXXFLAGS}
diff --git a/src/include/rados/librados.h b/src/include/rados/librados.h
index a575042..ae8db4a 100644
--- a/src/include/rados/librados.h
+++ b/src/include/rados/librados.h
@@ -24,7 +24,7 @@ extern "C" {
 #endif
 
 #define LIBRADOS_VER_MAJOR 0
-#define LIBRADOS_VER_MINOR 52
+#define LIBRADOS_VER_MINOR 53
 #define LIBRADOS_VER_EXTRA 0
 
 #define LIBRADOS_VERSION(maj, min, extra) ((maj << 16) + (min << 8) + extra)
@@ -1571,6 +1571,99 @@ int rados_notify(rados_ioctx_t io, const char *o, uint64_t ver, const char *buf,
 
 /** @} Watch/Notify */
 
+/**
+ * Take an exclusive lock on an object.
+ *
+ * @param io the context to operate in
+ * @param oid the name of the object
+ * @param name the name of the lock
+ * @param cookie user-defined identifier for this instance of the lock
+ * @param tag The tag of the lock
+ * @param desc user-defined lock description
+ * @param flags lock flags
+ * @returns 0 on success, negative error code on failure
+ * @returns -EBUSY if the lock is already held by another (client, cookie) pair
+ * @returns -EEXIST if the lock is already held by the same (client, cookie) pair
+ */
+int rados_lock_exclusive(rados_ioctx_t io, const char * o, const char * name,
+  const char * cookie, const char * tag, const char * desc,
+  uint8_t flags);
+
+/**
+ * Take a shared lock on an object.
+ *
+ * @param io the context to operate in
+ * @param o the name of the object
+ * @param name the name of the lock
+ * @param cookie user-defined identifier for this instance of the lock
+ * @param tag The tag of the lock
+ * @param desc user-defined lock description
+ * @param flags lock flags
+ * @returns 0 on success, negative error code on failure
+ * @returns -EBUSY if the lock is already held by another (client, cookie) pair
+ * @returns -EEXIST if the lock is already held by the same (client, cookie) pair
+ */
+int rados_lock_shared(rados_ioctx_t io, const char * o, const char * name,
+  const char * cookie, const char * tag, const char * desc,
+  uint8_t flags);
+
+/**
+ * Release a shared or exclusive lock on an object.
+ *
+ * @param io the context to operate in
+ * @param o the name of the object
+ * @param name the name of the lock
+ * @param cookie user-defined identifier for the instance of the lock
+ * @returns 0 on success, negative error code on failure
+ * @returns -ENOENT if the lock is not held by the specified (client, cookie) pair
+ */
+int rados_unlock(rados_ioctx_t io, const char *o, const char *name,
+const char *cookie);
+
+/**
+ * List clients that have locked the named object lock and information about
+ * the lock.
+ *
+ * The number of bytes required in each buffer is put in the
+ * corresponding size out parameter. If any of the provided buffers
+ * are too short, -ERANGE is returned after these sizes are filled in.
+ *
+ * @param io the context to operate in
+ * @param o the name of the object
+ * @param name the name of the lock
+ * @param exclusive where to store whether the lock is exclusive (1) or shared (0)
+ * @param tag where to store the tag associated with the object lock
+ * @param tag_len number of bytes in tag buffer
+ * @param clients buffer in which locker clients are stored, separated by '\0'
+ * @param clients_len number of bytes in the clients buffer
+ * @param cookies buffer in which locker cookies are stored, separated by '\0'
+ * @param cookies_len number of bytes in the cookies buffer
+ * @param addrs buffer in which locker addresses are stored, separated by '\0'
+ * @param addrs_len number of bytes in the addrs buffer
+ * @returns number of lockers on success, negative error code on failure
+ * @returns -ERANGE if any of the buffers are too short
+ */
+ssize_t rados_list_lockers(rados_ioctx_t io, const char *o,
+  const char *name, int *exclusive,
+  char *tag, size_t *tag_len,
+  char *clients, size_t *clients_len

[PATCH 0/2] librados: Add RADOS locks to the C/C++ API

2013-05-30 Thread Filippos Giannakos
Hi Team,

The following patches export the RADOS advisory locks functionality to the C/C++
librados API. The extra API calls added are inspired by the relevant functions
of librbd.

Kind Regards,
Filippos

Filippos Giannakos (2):
  Add RADOS lock mechanism to the librados C/C++ API.
  Add RADOS API lock tests

 src/Makefile.am|   11 +-
 src/include/rados/librados.h   |   95 +++-
 src/include/rados/librados.hpp |   27 +
 src/librados/librados.cc   |  187 +++
 src/test/librados/lock.cc  |  236 
 5 files changed, 554 insertions(+), 2 deletions(-)
 create mode 100644 src/test/librados/lock.cc

--
1.7.10.4



[PATCH] Add X-Python-Version >= 2.6 to debian control file.

2013-02-27 Thread Filippos Giannakos
python-ceph complains about the 'with' statement when installed on Debian
squeeze. Apparently the installation tries to build the python-ceph package for
Python 2.5, which does not support the 'with' statement natively.

Signed-off-by: Filippos Giannakos philipg...@grnet.gr
---
 debian/control |1 +
 1 file changed, 1 insertion(+)

diff --git a/debian/control b/debian/control
index eefb4ee..fbf517b 100644
--- a/debian/control
+++ b/debian/control
@@ -8,6 +8,7 @@ Maintainer: Laszlo Boszormenyi (GCS) g...@debian.hu
 Uploaders: Sage Weil s...@newdream.net
 Build-Depends: debhelper (>= 6.0.7~), autotools-dev, autoconf, automake, libfuse-dev, libboost-dev (>= 1.34), libboost-thread-dev, libedit-dev, libnss3-dev, libtool, libexpat1-dev, libfcgi-dev, libatomic-ops-dev, libgoogle-perftools-dev [i386 amd64], pkg-config, libcurl4-gnutls-dev, libkeyutils-dev, uuid-dev, libaio-dev, python (>= 2.6.6-3~), libxml2-dev, javahelper, default-jdk, junit4, libboost-program-options-dev
 Standards-Version: 3.9.3
+X-Python-Version: >= 2.6
 
 Package: ceph
 Architecture: linux-any
-- 
1.7.10.4



Re: [PATCH 0/2] Librados aio stat

2013-01-07 Thread Filippos Giannakos

Hi Josh,

On 01/05/2013 02:08 AM, Josh Durgin wrote:

On 01/04/2013 05:01 AM, Filippos Giannakos wrote:

Hi Team,

Is there any progress or any comments regarding the librados aio stat
patch ?


They look good to me. I put them in the wip-librados-aio-stat branch.
Can we add your signed-off-by to them?

Thanks,
Josh


Sorry for my late response. You can go ahead and add my Signed-off-by.

Best Regards

--
Filippos.
philipg...@grnet.gr


Re: [PATCH 0/2] Librados aio stat

2013-01-04 Thread Filippos Giannakos

Hi Team,

Is there any progress or any comments regarding the librados aio stat
patch ?

Best regards

On 12/20/2012 10:05 PM, Filippos Giannakos wrote:

Hi Team,

Here is the patch with the changes, plus the tests you requested.

Best regards,
Filippos



--
Filippos.


[PATCH 0/2] Librados aio stat

2012-12-20 Thread Filippos Giannakos
Hi Team,

Here is the patch with the changes, plus the tests you requested.

Best regards,
Filippos




[PATCH 1/2] Implement librados aio_stat

2012-12-20 Thread Filippos Giannakos
Implement aio stat and also export this functionality to the C API.
---
 src/include/rados/librados.h   |   16 ++-
 src/include/rados/librados.hpp |4 +++-
 src/librados/IoCtxImpl.cc  |   42 
 src/librados/IoCtxImpl.h   |9 +
 src/librados/librados.cc   |   18 +
 5 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/src/include/rados/librados.h b/src/include/rados/librados.h
index 44d6f71..b2df767 100644
--- a/src/include/rados/librados.h
+++ b/src/include/rados/librados.h
@@ -23,7 +23,7 @@ extern "C" {
 #endif
 
 #define LIBRADOS_VER_MAJOR 0
-#define LIBRADOS_VER_MINOR 48
+#define LIBRADOS_VER_MINOR 49
 #define LIBRADOS_VER_EXTRA 0
 
 #define LIBRADOS_VERSION(maj, min, extra) ((maj << 16) + (min << 8) + extra)
@@ -1444,6 +1444,20 @@ int rados_aio_read(rados_ioctx_t io, const char *oid,
  */
 int rados_aio_flush(rados_ioctx_t io);
 
+
+/**
+ * Asynchronously get object stats (size/mtime)
+ *
+ * @param io ioctx
+ * @param o object name
+ * @param psize where to store object size
+ * @param pmtime where to store modification time
+ * @returns 0 on success, negative error code on failure
+ */
+int rados_aio_stat(rados_ioctx_t io, const char *o, 
+  rados_completion_t completion,
+  uint64_t *psize, time_t *pmtime);
+
 /** @} Asynchronous I/O */
 
 /**
diff --git a/src/include/rados/librados.hpp b/src/include/rados/librados.hpp
index 3df4c86..b30b18f 100644
--- a/src/include/rados/librados.hpp
+++ b/src/include/rados/librados.hpp
@@ -478,9 +478,11 @@ namespace librados
  * other than CEPH_NOSNAP
  */
 int aio_remove(const std::string& oid, AioCompletion *c);
-
+
 int aio_flush();
 
+int aio_stat(const std::string& oid, AioCompletion *c, uint64_t *psize, time_t *pmtime);
+
 int aio_exec(const std::string& oid, AioCompletion *c, const char *cls, const char *method,
              bufferlist& inbl, bufferlist *outbl);
 
diff --git a/src/librados/IoCtxImpl.cc b/src/librados/IoCtxImpl.cc
index 934a101..808a30e 100644
--- a/src/librados/IoCtxImpl.cc
+++ b/src/librados/IoCtxImpl.cc
@@ -851,6 +851,21 @@ int librados::IoCtxImpl::aio_remove(const object_t& oid, AioCompletionImpl *c)
   return 0;
 }
 
+
+int librados::IoCtxImpl::aio_stat(const object_t& oid, AioCompletionImpl *c,
+				  uint64_t *psize, time_t *pmtime)
+{
+  c->io = this;
+  C_aio_stat_Ack *onack = new C_aio_stat_Ack(c, pmtime);
+
+  Mutex::Locker l(*lock);
+  objecter->stat(oid, oloc,
+		 snap_seq, psize, &onack->mtime, 0,
+		 onack, &c->objver);
+
+  return 0;
+}
+
 int librados::IoCtxImpl::remove(const object_t& oid)
 {
   utime_t ut = ceph_clock_now(client->cct);
@@ -1564,6 +1579,33 @@ void librados::IoCtxImpl::C_aio_Ack::finish(int r)
   c->put_unlock();
 }
 
+///////////////////////// C_aio_stat_Ack ////////////////////////////
+
+librados::IoCtxImpl::C_aio_stat_Ack::C_aio_stat_Ack(AioCompletionImpl *_c,
+						    time_t *pm)
+  : c(_c), pmtime(pm)
+{
+  c->get();
+}
+
+void librados::IoCtxImpl::C_aio_stat_Ack::finish(int r)
+{
+  c->lock.Lock();
+  c->rval = r;
+  c->ack = true;
+  c->cond.Signal();
+
+  if (r >= 0 && pmtime) {
+    *pmtime = mtime.sec();
+  }
+
+  if (c->callback_complete) {
+    c->io->client->finisher.queue(new C_AioComplete(c));
+  }
+
+  c->put_unlock();
+}
+
 /// C_aio_sparse_read_Ack //
 
 
 librados::IoCtxImpl::C_aio_sparse_read_Ack::C_aio_sparse_read_Ack(AioCompletionImpl *_c,
diff --git a/src/librados/IoCtxImpl.h b/src/librados/IoCtxImpl.h
index feea0e8..55b07ee 100644
--- a/src/librados/IoCtxImpl.h
+++ b/src/librados/IoCtxImpl.h
@@ -144,6 +144,14 @@ struct librados::IoCtxImpl {
 C_aio_Ack(AioCompletionImpl *_c);
 void finish(int r);
   };
+  
+  struct C_aio_stat_Ack : public Context {
+    librados::AioCompletionImpl *c;
+    time_t *pmtime;
+    utime_t mtime;
+    C_aio_stat_Ack(AioCompletionImpl *_c, time_t *pm);
+    void finish(int r);
+  };
 
   struct C_aio_sparse_read_Ack : public Context {
 AioCompletionImpl *c;
@@ -177,6 +185,7 @@ struct librados::IoCtxImpl {
   int aio_remove(const object_t& oid, AioCompletionImpl *c);
   int aio_exec(const object_t& oid, AioCompletionImpl *c, const char *cls,
	       const char *method, bufferlist& inbl, bufferlist *outbl);
+  int aio_stat(const object_t& oid, AioCompletionImpl *c, uint64_t *psize, time_t *pmtime);
 
   int pool_change_auid(unsigned long long auid);
   int pool_change_auid_async(unsigned long long auid, PoolAsyncCompletionImpl *c);
diff --git a/src/librados/librados.cc b/src/librados/librados.cc
index c31b82a..c3756cd 100644
--- a/src/librados/librados.cc
+++ b/src/librados/librados.cc
@@ -978,6 +978,14 @@ int librados::IoCtx::aio_flush()
   return 0;
 }
 
+int librados::IoCtx::aio_stat(const std::string& oid, librados::AioCompletion *c,
+   

[PATCH 2/2] Add librados aio stat tests

2012-12-20 Thread Filippos Giannakos
Implement simple write-stat test, and a write-stat-remove-stat test cycle.
---
 src/include/rados/librados.h |2 +-
 src/test/librados/aio.cc |  176 ++
 2 files changed, 177 insertions(+), 1 deletion(-)

diff --git a/src/include/rados/librados.h b/src/include/rados/librados.h
index b2df767..d40d9b5 100644
--- a/src/include/rados/librados.h
+++ b/src/include/rados/librados.h
@@ -1454,7 +1454,7 @@ int rados_aio_flush(rados_ioctx_t io);
  * @param pmtime where to store modification time
  * @returns 0 on success, negative error code on failure
  */
-int rados_aio_stat(rados_ioctx_t io, const char *o, 
+int rados_aio_stat(rados_ioctx_t io, const char *o,
   rados_completion_t completion,
   uint64_t *psize, time_t *pmtime);
 
diff --git a/src/test/librados/aio.cc b/src/test/librados/aio.cc
index 4983fee..33b5942 100644
--- a/src/test/librados/aio.cc
+++ b/src/test/librados/aio.cc
@@ -762,6 +762,182 @@ TEST(LibRadosAio, RoundTripWriteFullPP) {
   delete my_completion3;
 }
 
+
+TEST(LibRadosAio, SimpleStat) {
+  AioTestData test_data;
+  rados_completion_t my_completion;
+  ASSERT_EQ("", test_data.init());
+  ASSERT_EQ(0, rados_aio_create_completion((void*)&test_data,
+	  set_completion_complete, set_completion_safe, &my_completion));
+  char buf[128];
+  memset(buf, 0xcc, sizeof(buf));
+  ASSERT_EQ(0, rados_aio_write(test_data.m_ioctx, "foo",
+			       my_completion, buf, sizeof(buf), 0));
+  {
+    TestAlarm alarm;
+    sem_wait(&test_data.m_sem);
+    sem_wait(&test_data.m_sem);
+  }
+  uint64_t psize;
+  time_t pmtime;
+  rados_completion_t my_completion2;
+  ASSERT_EQ(0, rados_aio_create_completion((void*)&test_data,
+	  set_completion_complete, set_completion_safe, &my_completion2));
+  ASSERT_EQ(0, rados_aio_stat(test_data.m_ioctx, "foo",
+			      my_completion2, &psize, &pmtime));
+  {
+    TestAlarm alarm;
+    ASSERT_EQ(0, rados_aio_wait_for_complete(my_completion2));
+  }
+  ASSERT_EQ(sizeof(buf), psize);
+  rados_aio_release(my_completion);
+  rados_aio_release(my_completion2);
+}
+
+TEST(LibRadosAio, SimpleStatPP) {
+  AioTestDataPP test_data;
+  ASSERT_EQ("", test_data.init());
+  AioCompletion *my_completion = test_data.m_cluster.aio_create_completion(
+	  (void*)&test_data, set_completion_complete, set_completion_safe);
+  AioCompletion *my_completion_null = NULL;
+  ASSERT_NE(my_completion, my_completion_null);
+  char buf[128];
+  memset(buf, 0xcc, sizeof(buf));
+  bufferlist bl1;
+  bl1.append(buf, sizeof(buf));
+  ASSERT_EQ(0, test_data.m_ioctx.aio_write("foo", my_completion,
+					   bl1, sizeof(buf), 0));
+  {
+    TestAlarm alarm;
+    sem_wait(&test_data.m_sem);
+    sem_wait(&test_data.m_sem);
+  }
+  uint64_t psize;
+  time_t pmtime;
+  AioCompletion *my_completion2 = test_data.m_cluster.aio_create_completion(
+	  (void*)&test_data, set_completion_complete, set_completion_safe);
+  ASSERT_NE(my_completion2, my_completion_null);
+  ASSERT_EQ(0, test_data.m_ioctx.aio_stat("foo", my_completion2,
+					  &psize, &pmtime));
+  {
+    TestAlarm alarm;
+    ASSERT_EQ(0, my_completion2->wait_for_complete());
+  }
+  ASSERT_EQ(sizeof(buf), psize);
+  delete my_completion;
+  delete my_completion2;
+}
+
+TEST(LibRadosAio, StatRemove) {
+  AioTestData test_data;
+  rados_completion_t my_completion;
+  ASSERT_EQ("", test_data.init());
+  ASSERT_EQ(0, rados_aio_create_completion((void*)&test_data,
+	  set_completion_complete, set_completion_safe, &my_completion));
+  char buf[128];
+  memset(buf, 0xcc, sizeof(buf));
+  ASSERT_EQ(0, rados_aio_write(test_data.m_ioctx, "foo",
+			       my_completion, buf, sizeof(buf), 0));
+  {
+    TestAlarm alarm;
+    sem_wait(&test_data.m_sem);
+    sem_wait(&test_data.m_sem);
+  }
+  uint64_t psize;
+  time_t pmtime;
+  rados_completion_t my_completion2;
+  ASSERT_EQ(0, rados_aio_create_completion((void*)&test_data,
+	  set_completion_complete, set_completion_safe, &my_completion2));
+  ASSERT_EQ(0, rados_aio_stat(test_data.m_ioctx, "foo",
+			      my_completion2, &psize, &pmtime));
+  {
+    TestAlarm alarm;
+    ASSERT_EQ(0, rados_aio_wait_for_complete(my_completion2));
+  }
+  ASSERT_EQ(sizeof(buf), psize);
+  rados_completion_t my_completion3;
+  ASSERT_EQ(0, rados_aio_create_completion((void*)&test_data,
+	  set_completion_complete, set_completion_safe, &my_completion3));
+  ASSERT_EQ(0, rados_aio_remove(test_data.m_ioctx, "foo", my_completion3));
+  {
+    TestAlarm alarm;
+    ASSERT_EQ(0, rados_aio_wait_for_complete(my_completion3));
+  }
+  uint64_t psize2;
+  time_t pmtime2;
+  rados_completion_t my_completion4;
+  ASSERT_EQ(0, rados_aio_create_completion((void*)&test_data,
+	  set_completion_complete, set_completion_safe, &my_completion4));
+  ASSERT_EQ(0, rados_aio_stat(test_data.m_ioctx, "foo",
+

Re: [PATCH] implement librados aio_stat

2012-12-19 Thread Filippos Giannakos

OK. About the LIBRADOS_VER_MINOR, do you want me to bump it and submit a
new patch?

Best regards,
Filippos

On 12/15/2012 09:49 AM, Yehuda Sadeh wrote:

Went through it briefly, looks fine, though I'd like to go over it
some more before picking this up. Note that LIBRADOS_VER_MINOR needs
to be bumped up too.

Thanks,
Yehuda

On Fri, Dec 14, 2012 at 3:18 AM, Filippos Giannakos <philipg...@grnet.gr> wrote:

---
  src/include/rados/librados.h   |   14 ++
  src/include/rados/librados.hpp |   15 +-
  src/librados/IoCtxImpl.cc  |   42 
  src/librados/IoCtxImpl.h   |9 +
  src/librados/librados.cc   |   10 ++
  5 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/src/include/rados/librados.h b/src/include/rados/librados.h
index 44d6f71..7f4b5c0 100644
--- a/src/include/rados/librados.h
+++ b/src/include/rados/librados.h
@@ -1444,6 +1444,20 @@ int rados_aio_read(rados_ioctx_t io, const char *oid,
   */
  int rados_aio_flush(rados_ioctx_t io);

+
+/**
+ * Asynchronously get object stats (size/mtime)
+ *
+ * @param io ioctx
+ * @param o object name
+ * @param psize where to store object size
+ * @param pmtime where to store modification time
+ * @returns 0 on success, negative error code on failure
+ */
+int rados_aio_stat(rados_ioctx_t io, const char *o,
+  rados_completion_t completion,
+  uint64_t *psize, time_t *pmtime);
+
  /** @} Asynchronous I/O */

  /**
diff --git a/src/include/rados/librados.hpp b/src/include/rados/librados.hpp
index e50acdb..96bfc15 100644
--- a/src/include/rados/librados.hpp
+++ b/src/include/rados/librados.hpp
@@ -473,9 +473,22 @@ namespace librados
   * other than CEPH_NOSNAP
   */
 int aio_remove(const std::string& oid, AioCompletion *c);
-
+
  int aio_flush();

+/**
+ * Asynchronously get object stats (size/mtime)
+ *
+ * @param io ioctx
+ * @param o object name
+ * @param psize where to store object size
+ * @param pmtime where to store modification time
+ * @returns 0 on success, negative error code on failure
+ */
+int rados_aio_stat(rados_ioctx_t io, const char *o,
+  rados_completion_t completion,
+  uint64_t *psize, time_t *pmtime);
+
 int aio_exec(const std::string& oid, AioCompletion *c, const char *cls, const char *method,
              bufferlist& inbl, bufferlist *outbl);

diff --git a/src/librados/IoCtxImpl.cc b/src/librados/IoCtxImpl.cc
index 01b4a94..50aab1e 100644
--- a/src/librados/IoCtxImpl.cc
+++ b/src/librados/IoCtxImpl.cc
@@ -851,6 +851,21 @@ int librados::IoCtxImpl::aio_remove(const object_t& oid, AioCompletionImpl *c)
return 0;
  }

+
+int librados::IoCtxImpl::aio_stat(const object_t& oid, AioCompletionImpl *c,
+				  uint64_t *psize, time_t *pmtime)
+{
+  c->io = this;
+  C_aio_stat_Ack *onack = new C_aio_stat_Ack(c, pmtime);
+
+  Mutex::Locker l(*lock);
+  objecter->stat(oid, oloc,
+		 snap_seq, psize, &onack->mtime, 0,
+		 onack, &c->objver);
+
+  return 0;
+}
+
 int librados::IoCtxImpl::remove(const object_t& oid)
  {
   utime_t ut = ceph_clock_now(client->cct);
@@ -1562,6 +1577,33 @@ void librados::IoCtxImpl::C_aio_Ack::finish(int r)
   c->put_unlock();
  }

+///////////////////////// C_aio_stat_Ack ////////////////////////////
+
+librados::IoCtxImpl::C_aio_stat_Ack::C_aio_stat_Ack(AioCompletionImpl *_c,
+						    time_t *pm)
+  : c(_c), pmtime(pm)
+{
+  c->get();
+}
+
+void librados::IoCtxImpl::C_aio_stat_Ack::finish(int r)
+{
+  c->lock.Lock();
+  c->rval = r;
+  c->ack = true;
+  c->cond.Signal();
+
+  if (r >= 0 && pmtime) {
+    *pmtime = mtime.sec();
+  }
+
+  if (c->callback_complete) {
+    c->io->client->finisher.queue(new C_AioComplete(c));
+  }
+
+  c->put_unlock();
+}
+
  /// C_aio_sparse_read_Ack //

  
 librados::IoCtxImpl::C_aio_sparse_read_Ack::C_aio_sparse_read_Ack(AioCompletionImpl *_c,
diff --git a/src/librados/IoCtxImpl.h b/src/librados/IoCtxImpl.h
index feea0e8..55b07ee 100644
--- a/src/librados/IoCtxImpl.h
+++ b/src/librados/IoCtxImpl.h
@@ -144,6 +144,14 @@ struct librados::IoCtxImpl {
  C_aio_Ack(AioCompletionImpl *_c);
  void finish(int r);
};
+
+  struct C_aio_stat_Ack : public Context {
+    librados::AioCompletionImpl *c;
+    time_t *pmtime;
+    utime_t mtime;
+    C_aio_stat_Ack(AioCompletionImpl *_c, time_t *pm);
+    void finish(int r);
+  };

struct C_aio_sparse_read_Ack : public Context {
  AioCompletionImpl *c;
@@ -177,6 +185,7 @@ struct librados::IoCtxImpl {
   int aio_remove(const object_t& oid, AioCompletionImpl *c);
   int aio_exec(const object_t& oid, AioCompletionImpl *c, const char *cls,
	       const char *method, bufferlist& inbl, bufferlist *outbl);
+  int aio_stat(const object_t& oid, AioCompletionImpl *c, uint64_t