Re: [PATCH v4] ceph: add acl for cephfs

2013-11-08 Thread Li Wang

Hi,
  It seems to me there are three issues; please take a look below to check
whether they are really there.


On 11/08/2013 01:23 PM, Guangliang Zhao wrote:

v4: check the validity before set/get_cached_acl()

v3: handle the attr change in ceph_set_acl()

v2: remove some redundant code in ceph_setattr()

Signed-off-by: Guangliang Zhao lucienc...@gmail.com
---
  fs/ceph/Kconfig  |   13 +++
  fs/ceph/Makefile |1 +
  fs/ceph/acl.c|  326 ++
  fs/ceph/dir.c|5 +
  fs/ceph/inode.c  |   14 +++
  fs/ceph/super.c  |4 +
  fs/ceph/super.h  |   35 ++
  fs/ceph/xattr.c  |   60 --
  8 files changed, 446 insertions(+), 12 deletions(-)
  create mode 100644 fs/ceph/acl.c

diff --git a/fs/ceph/Kconfig b/fs/ceph/Kconfig
index ac9a2ef..264e9bf 100644
--- a/fs/ceph/Kconfig
+++ b/fs/ceph/Kconfig
@@ -25,3 +25,16 @@ config CEPH_FSCACHE
  caching support for Ceph clients using FS-Cache

  endif
+
+config CEPH_FS_POSIX_ACL
+   bool "Ceph POSIX Access Control Lists"
+   depends on CEPH_FS
+   select FS_POSIX_ACL
+   help
+ POSIX Access Control Lists (ACLs) support permissions for users and
+ groups beyond the owner/group/world scheme.
+
+ To learn more about Access Control Lists, visit the POSIX ACLs for
+ Linux website <http://acl.bestbits.at/>.
+
+ If you don't know what Access Control Lists are, say N
diff --git a/fs/ceph/Makefile b/fs/ceph/Makefile
index 32e3010..85a4230 100644
--- a/fs/ceph/Makefile
+++ b/fs/ceph/Makefile
@@ -10,3 +10,4 @@ ceph-y := super.o inode.o dir.o file.o locks.o addr.o ioctl.o \
	debugfs.o

  ceph-$(CONFIG_CEPH_FSCACHE) += cache.o
+ceph-$(CONFIG_CEPH_FS_POSIX_ACL) += acl.o
diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
new file mode 100644
index 000..a474626
--- /dev/null
+++ b/fs/ceph/acl.c
@@ -0,0 +1,326 @@
+/*
+ * linux/fs/ceph/acl.c
+ *
+ * Copyright (C) 2013 Guangliang Zhao, lucienc...@gmail.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include <linux/ceph/ceph_debug.h>
+#include <linux/fs.h>
+#include <linux/string.h>
+#include <linux/xattr.h>
+#include <linux/posix_acl_xattr.h>
+#include <linux/posix_acl.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+#include "super.h"
+
+static inline void ceph_set_cached_acl(struct inode *inode,
+   int type, struct posix_acl *acl)
+{
+   struct ceph_inode_info *ci = ceph_inode(inode);
+   int issued;
+
+   spin_lock(&ci->i_ceph_lock);
+   issued = __ceph_caps_issued(ci, NULL);
+   if (issued & (CEPH_CAP_XATTR_EXCL | CEPH_CAP_XATTR_SHARED)) {
+   set_cached_acl(inode, type, acl);
+   ci->i_aclcache_gen = ci->i_rdcache_gen;
+   }
+   spin_unlock(&ci->i_ceph_lock);
+
+}
+
+static inline struct posix_acl *ceph_get_cached_acl(struct inode *inode,
+   int type)
+{
+   struct ceph_inode_info *ci = ceph_inode(inode);
+   struct posix_acl *acl = NULL;
+
+   spin_lock(&ci->i_ceph_lock);
+   if (ci->i_aclcache_gen == ci->i_rdcache_gen)
+   acl = get_cached_acl(inode, type);
+   spin_unlock(&ci->i_ceph_lock);
+
+   return acl;
+}
+
+struct posix_acl *ceph_get_acl(struct inode *inode, int type)
+{
+   int size;
+   const char *name;
+   char *value = NULL;
+   struct posix_acl *acl;
+
+   if (!IS_POSIXACL(inode))
+   return NULL;
+
+   acl = ceph_get_cached_acl(inode, type);
+   if (acl != ACL_NOT_CACHED)
+   return acl;


If the client does not hold the caps, it cannot rely on the cached ACL; in
that case, ceph_get_cached_acl() will return NULL rather than ACL_NOT_CACHED.

That will prevent the synchronous MDS consultation below from happening.

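One way to address this might be (untested sketch on top of this patch, not
something I have run): initialize the local variable to ACL_NOT_CACHED
instead of NULL, so that a generation mismatch falls through to the xattr
fetch below instead of looking like "no ACL":

static inline struct posix_acl *ceph_get_cached_acl(struct inode *inode,
						    int type)
{
	struct ceph_inode_info *ci = ceph_inode(inode);
	/* ACL_NOT_CACHED means "ask the MDS"; NULL means "cached: no ACL" */
	struct posix_acl *acl = ACL_NOT_CACHED;

	spin_lock(&ci->i_ceph_lock);
	if (ci->i_aclcache_gen == ci->i_rdcache_gen)
		acl = get_cached_acl(inode, type);
	spin_unlock(&ci->i_ceph_lock);

	return acl;
}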

+
+   switch (type) {
+   case ACL_TYPE_ACCESS:
+   name = POSIX_ACL_XATTR_ACCESS;
+   break;
+   case ACL_TYPE_DEFAULT:
+   name = POSIX_ACL_XATTR_DEFAULT;
+   break;
+   default:
+   BUG();
+   }
+
+   size = __ceph_getxattr(inode, name, "", 0);
+   if (size > 0) {
+   value = kzalloc(size, GFP_NOFS);
+   if (!value)
+   return ERR_PTR(-ENOMEM);
+   size = __ceph_getxattr(inode, name, value, size);
+   }
+
+   if (size > 0)
+  

[PATCH] MAINTAINERS: update an e-mail address

2013-11-08 Thread Alex Elder
I no longer have direct access to my Inktank e-mail.  I still pay
attention to rbd, so update its entry in MAINTAINERS accordingly.

Signed-off-by: Alex Elder el...@linaro.org
---
 MAINTAINERS |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 831b869..30f82a2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6889,7 +6889,7 @@ F:drivers/media/parport/*-qcam*
 RADOS BLOCK DEVICE (RBD)
 M: Yehuda Sadeh yeh...@inktank.com
 M: Sage Weil s...@inktank.com
-M: Alex Elder el...@inktank.com
+M: Alex Elder el...@kernel.org
 M: ceph-devel@vger.kernel.org
 W: http://ceph.com/
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
--
1.7.9.5
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: xfs Warnings in syslog

2013-11-08 Thread Niklas Goerke
I'm sorry for reporting back only now, but I did not get to try it out 
earlier.
I upgraded one of my nodes to kernel 3.10 and the messages did not come 
up yet. It seems the kernel upgrade did the job.


Thank you very much for your help!

On Tue, 22 Oct 2013 13:36:56 +0100, Andrey Korolyov wrote:

Just my two cents:

XFS is quite unstable with Ceph, especially along with heavy CPU
usage, up to 3.7 (primarily soft lockups). I used 3.7 for eight months
before the upgrade on a production system and it performs just perfectly.

On Tue, Oct 22, 2013 at 1:29 PM, Jeff Liu jeff@oracle.com wrote:

Hello,

So it's better to add the XFS mailing list to the CC list. :)

I think this issue has been fixed by this upstream commit:

From ff9a28f6c25d18a635abcab1f49db68108203dfb
From: Jan Kara j...@suse.cz
Date: Thu, 14 Mar 2013 14:30:54 +0100
Subject: [PATCH 1/1] xfs: Fix WARN_ON(delalloc) in xfs_vm_releasepage()



Thanks,
-Jeff

On 10/22/2013 07:46 PM, Niklas Goerke wrote:


Hi

My syslog and dmesg are being filled with the warnings attached.
Looking at today's syslog I got up to 1101 of these warnings in the time
from 10:50 to 11:13 (and only in that time, else the log was clean). I
found them on all four of my OSD hosts, all at about the same time.
I'm running kernel 3.2.0-4-amd64 on Debian 7.0. Ceph is at version
0.67.4. I have 15 OSDs per OSD host.

Ceph does not really seem to care about this, so I'm not sure what it is
all about… Still, they are warnings in syslog, and I hope you guys can
tell me what went wrong here and what I can do about it.

Thank you
Niklas


Oct 22 11:11:19 cs-bigfoot06 kernel: [9744648.388018] [ cut here ]
Oct 22 11:11:19 cs-bigfoot06 kernel: [9744648.388030] WARNING: at /build/linux-s5x2oE/linux-3.2.46/fs/xfs/xfs_aops.c:1091 xfs_vm_releasepage+0x76/0x8e [xfs]()
Oct 22 11:11:19 cs-bigfoot06 kernel: [9744648.388034] Hardware name: X9DR3-F
Oct 22 11:11:19 cs-bigfoot06 kernel: [9744648.388036] Modules linked in: xfs autofs4 nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc ext3 jbd loop acpi_cpufreq mperf coretemp crc32c_intel ghash_clmulni_intel snd_pcm aesni_intel snd_page_alloc aes_x86_64 snd_timer aes_generic snd cryptd soundcore pcspkr sb_edac joydev evdev edac_core iTCO_wdt i2c_i801 iTCO_vendor_support i2c_core ioatdma processor thermal_sys container button ext4 crc16 jbd2 mbcache usbhid hid ses enclosure sg sd_mod crc_t10dif megaraid_sas ehci_hcd usbcore isci libsas usb_common libata ixgbe mdio scsi_transport_sas scsi_mod igb dca [last unloaded: scsi_wait_scan]
Oct 22 11:11:19 cs-bigfoot06 kernel: [9744648.388093] Pid: 3459605, comm: ceph-osd Tainted: G W 3.2.0-4-amd64 #1 Debian 3.2.46-1
Oct 22 11:11:19 cs-bigfoot06 kernel: [9744648.388096] Call Trace:
Oct 22 11:11:19 cs-bigfoot06 kernel: [9744648.388102] [81046b75] ? warn_slowpath_common+0x78/0x8c
Oct 22 11:11:19 cs-bigfoot06 kernel: [9744648.388115] [a048b98c] ? xfs_vm_releasepage+0x76/0x8e [xfs]
Oct 22 11:11:19 cs-bigfoot06 kernel: [9744648.388122] [810bedc5] ? invalidate_inode_page+0x5e/0x80
Oct 22 11:11:19 cs-bigfoot06 kernel: [9744648.388129] [810bee5d] ? invalidate_mapping_pages+0x76/0x102
Oct 22 11:11:19 cs-bigfoot06 kernel: [9744648.388135] [810b7b83] ? sys_fadvise64_64+0x19f/0x1e2
Oct 22 11:11:19 cs-bigfoot06 kernel: [9744648.388140] [81353b52] ? system_call_fastpath+0x16/0x1b
Oct 22 11:11:19 cs-bigfoot06 kernel: [9744648.388144] ---[ end trace e9640ed6f82f066d ]---


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: zero size file on cephfs

2013-11-08 Thread Yan, Zheng
Sounds like it is caused by the file recovery bug. The fix
(https://github.com/ceph/ceph/commit/eb381ffc8db14f13a7c5e3528a109bf89a7c5b31)
is included in the 0.72 release.

Regards
Yan, Zheng


On Fri, Nov 8, 2013 at 10:36 AM, Drunkard Zhang gongfan...@gmail.com wrote:
 Hi,
 I've been using cephfs for a while, and it's great, but now I've found a
 serious problem.

 My situation is simple: a lot of servers receive syslog to local disk,
 then mount ceph FS via kernel driver and move those received files to
 ceph FS.

 Some days ago, I found my script missed some files while processing
 data, then I noticed those files' size was zero. But that should be
 impossible: rsyslog won't generate a file until data arrives. So I logged
 the file size before mv; here's one example of these exceptions:

 In log:
 2013-11-07 18:17:29 /scripts/log/collector: file size: 4910689
 /gwbn/nat/20131107/dnsquery-pb24-192.168.156.60-by_squid79-20131107_17.log

 Zero size file on ceph FS:
 -rw-r--r-- 1 root root 0 Jan  1  1970
 dnsquery-pb24-192.168.156.60-by_squid79-20131107_17.log

 stat dnsquery-pb24-192.168.156.60-by_squid79-20131107_17.log
   File: ‘dnsquery-pb24-192.168.156.60-by_squid79-20131107_17.log’
   Size: 0 Blocks: 0  IO Block: 4194304 regular empty file
 Device: 0h/0dInode: 1099518968218  Links: 1
 Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
 Access: 2013-11-07 17:00:00.167501860 +0800
 Modify: 1970-01-01 08:00:00.0 +0800
 Change: 2013-11-07 18:17:29.843558000 +0800
  Birth: -

 I'm running multiple kernel versions, ranging from 3.8.x to 3.11.6.
 Most of the zero-size file events occur on the servers with kernel
 version 3.10.[45], some on 3.8.6 and 3.11.4. I didn't test all of
 them; these are just the versions I'm running.

 I think the ceph FS kernel driver should sync before umount, shouldn't
 it? So what can I do to shoot this problem down? And is there any
 advice for a temporary workaround to avoid this, e.g. manually syncing?
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: writing a ceph cliente for MS windows

2013-11-08 Thread Alphe Salas Michels
Hello Malcolm and Matt, thank you for contributing some more information
sources. OpenAFS is surely interesting, and httpfs too.

I hope it will help us decide on the best path to follow for our
interface with Windows.
Actually I am still trying to isolate the needed client code in the
shortest way possible.


Regards.

Alphe Salas

On Nov 7, 2013, 9:11 p.m., Malcolm Haak malc...@sgi.com wrote:


   I'm just going to throw these in there.

   http://www.acc.umu.se/~bosse/

   They are GPLv2; some already use sockets and such from inside the
   kernel.  Heck, you might even be able to mod the HTTP one to use the
   rados gateway. I don't know, as I haven't sat down and pulled them
   apart enough yet.

   They might help, but they might be useless. Not sure.

   On 08/11/13 06:47, Alphe Salas Michels wrote:

    Hello all, I finally finished my first source code extraction, which
    starts from ceph/src/client/fuse_ll.cc.
    The result is accurate, unlike the previously provided results.
    Basically the script starts from a file, extracts all the private
    include definitions (#include "something.h") and recursively extracts
    private includes too: the best way to know who is related to whom.

    Starting from fuse_ll.cc I obtain 390 files retrieved and 120,000
    lines of code!
    The involved dirs in ceph/src are:
    objclass/, common/, msg/, common/, osdc/, include/, client/, mds/,
    global/, json_spirit/, log/, os/, crush/, mon/, osd/, auth/

    This is probably not a good way to gauge the amount of work involved,
    since most of those directories are the implementation of the servers
    (osd, mon, mds) and only a tiny bit of them is needed at the client
    level: you need two structures from ./osd/OSD.h, and my script, by
    relation, will take into account the whole directory...

    I ran the script with libcephfs.cc as the starting point and got almost
    the same results: 131,000 lines of code and 386 files, with most of the
    same dirs involved.



    I think I will spend a lot of time doing the manual source code
    isolation and understanding why each #include is set in the files I
    read (what purpose they have, whether they bring in a crucial data
    type or not).

    The other way around would be to read src/libcephfs.cc. It seems
    shorter, but without understanding what part is used from each included
    header I can't say anything...

    I will keep reading the source code and take notes. I think in the case
    of libcephfs I will gain a lot of time.

    Alphé Salas
    IT Engineer
    asa...@kepler.cl
    www.kepler.cl

   On 11/07/13 15:02, Alphe Salas Michels wrote:

    Hello D. Ketor and Matt Benjamin,
    You give me a lot to think about, and this is great!
    I merged your previous posts to make a single reply that anyone can
    refer to easily.

    Windows NFS 4.1 is available here:
    http://www.citi.umich.edu/projects/nfsv4/windows/readme.html

    pNFS is another name for NFS 4.x. It is presented as an alternative
    to Ceph, and we find familiar terminology such as MDS and OSD, but
    without the self-healing part, if I understand my rapid look at the
    topic well. (When I say rapid look I mean... 5 minutes spent on it...
    which is a really small amount of time to get an accurate view of
    something.)


    Starting from mount.ceph... I know that mount.ceph does little, but
    it is a great hint for knowing what Ceph needs and does. Basically
    mount.ceph modprobes the ceph driver into the Linux kernel, then
    calls mount with the command-line args and the cephfs type as
    arguments. Then the kernel does the work; I don't understand yet what
    initial calls are made to the ceph driver, but it seemed to me that
    it was relatively light (a first impression compared to ceph-fuse).

    I think I will do both: isolate the source code from the ceph-client
    kernel tree (the cephfs module for the Linux kernel) and from the path
    pointed out by Sage, starting from client/fuse_ll.cc in the ceph master
    branch. The common files between those two extractions will be our core
    set of mandatory features.

    Then we try to compile a cephfs client library with cygwin. Then we
    will try to interface with a modified Windows NFS 4.1 client, or pNFS,
    or any other that will accept being compiled with gcc for win32...

    the fact that 

Re: cache tier blueprint (part 2)

2013-11-08 Thread Gregory Farnum
On Thu, Nov 7, 2013 at 6:56 AM, Sage Weil s...@inktank.com wrote:
 I typed up what I think is remaining for the cache tier work for firefly.
 Greg, can you take a look?  I'm most likely missing a bunch of stuff here.

  
 http://wiki.ceph.com/01Planning/02Blueprints/Emperor/rados_cache_pool_(part_2)

 This will be one of the meatier sessions at the upcoming CDS.  There will
 be a fair bit of work not just on the OSD side but also in building
 testing tools that exercise and validate the functionality here; that
 may be a great way for a new contributor to get involved and help out.

This looks reasonably complete to me as an enumeration of areas, but
is definitely targeted at the cache pool use case rather than demoting
to cold storage. I'm not sure yet if we're going to want to use the
same agent/process/whatever in both use cases; do you want a separate
blueprint for that, or should we broaden this one?
-Greg
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Ceph: allocate non-zero page to fscache in readpage()

2013-11-08 Thread Li Wang
ceph_osdc_readpages() returns the number of bytes read. Currently the
code only puts fully-zeroed pages (err == 0) into fscache; this patch
fixes that so pages with data are cached as well.

Signed-off-by: Li Wang liw...@ubuntukylin.com
---
 fs/ceph/addr.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 6df8bd4..1e561c0 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -216,7 +216,7 @@ static int readpage_nounlock(struct file *filp, struct page *page)
}
SetPageUptodate(page);
 
-   if (err == 0)
+   if (err >= 0)
ceph_readpage_to_fscache(inode, page);
 
 out:
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


v0.72 Emperor released

2013-11-08 Thread Sage Weil
This is the fifth major release of Ceph, the fourth since adopting a
3-month development cycle. This release brings several new features,
including multi-datacenter replication for the radosgw, improved
usability, and lands a lot of incremental performance and internal
refactoring work to support upcoming features in Firefly.

Thank you to everyone who contributed to this release!  There were 46
authors in all.

Highlights include:

 * common: improved crc32c performance
 * librados: new example client and class code
 * mds: many bug fixes and stability improvements
 * mon: health warnings when pool pg_num values are not reasonable
 * mon: per-pool performance stats
 * osd, librados: new object copy primitives
 * osd: improved interaction with backend file system to reduce latency
 * osd: much internal refactoring to support ongoing erasure coding and 
   tiering support
 * rgw: bucket quotas
 * rgw: improved CORS support
 * rgw: performance improvements
 * rgw: validate S3 tokens against Keystone

Coincident with core Ceph, the Emperor release also brings:

 * radosgw-agent: support for multi-datacenter replication for disaster 
   recovery (building on the multi-site features that appeared in 
   Dumpling)
 * tgt: improved support for iSCSI via upstream tgt

Upgrading

There are no specific upgrade restrictions on the order or sequence of
upgrading from 0.67.x Dumpling.  We normally suggest a rolling upgrade
of monitors first, and then OSDs, followed by the radosgw and ceph-mds
daemons (if any).

It is also possible to do a rolling upgrade from 0.61.x Cuttlefish, but
there are ordering restrictions. (This is the same set of restrictions
for Cuttlefish to Dumpling.)

 1. Upgrade ceph-common on all nodes that will use the command line 
ceph utility.
 2. Upgrade all monitors (upgrade ceph package, restart ceph-mon 
daemons). This can happen one daemon or host at a time. Note that
because cuttlefish and dumpling monitors can't talk to each other,
all monitors should be upgraded in relatively short succession to
minimize the risk that an untimely failure will reduce availability.
 3. Upgrade all osds (upgrade ceph package, restart ceph-osd daemons). 
This can happen one daemon or host at a time.
 4. Upgrade radosgw (upgrade radosgw package, restart radosgw daemons).

There are several minor compatibility changes in the librados API that
direct users of librados should be aware of.  For a full summary of
those changes, please see the complete release notes:

 * http://ceph.com/docs/master/release-notes/#v0-72-emperor

The next major release of Ceph, Firefly, is scheduled for release in
February of 2014.

You can download v0.72 Emperor from the usual locations:

 * Git at git://github.com/ceph/ceph.git
 * Tarball at http://ceph.com/download/ceph-0.72.tar.gz
 * For Debian/Ubuntu packages, see http://ceph.com/docs/master/install/debian
 * For RPMs, see http://ceph.com/docs/master/install/rpm
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fwd: Waiters on OSD operations will hang if replies invalid?

2013-11-08 Thread Sage Weil
On Sat, 9 Nov 2013, Li Wang wrote:
 Hi Sage,
   I am wondering if this issue is there. My understanding is that, for OSD
 requests, if replies get lost, each request will get re-sent, even to
 different OSDs if the Monitor tells the client about the corresponding OSD
 error. Then each request will finally get handled in handle_reply(), right?
 But what about if the replies are invalid, as described below?
   If this issue is really there, I will try to prepare patches.

Yeah, I think you are right.  If we get an invalid reply, something is 
clearly wrong with the cluster, so this isn't the highest concern, but it 
would definitely be better if the client failed with EIO instead of 
hanging forever.  I suspect this is mainly a matter of making the bad_put 
label also set r_result and kick the waiters, although there is probably 
some reorganization of the flow in this function that could be done to 
avoid duplicating any code.
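
Something along these lines, perhaps (untested sketch only; the exact
field and helper names in net/ceph/osd_client.c may need adjusting):

bad_put:
	req->r_result = -EIO;			/* report the failure to callers */
	complete_all(&req->r_completion);	/* wake ceph_osdc_wait_request() waiters */
	ceph_osdc_put_request(req);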

Thanks!
sage

 
 Cheers,
 Li Wang
 
  Original Message 
 Subject: Waiters on OSD operations will hang if replies invalid?
 Date: Thu, 07 Nov 2013 11:08:24 +0800
 From: Li Wang liw...@ubuntukylin.com
 To: ceph-devel@vger.kernel.org ceph-devel@vger.kernel.org
 CC: Sage Weil s...@inktank.com
 
 For ceph_sync_write()/ceph_osdc_readpages()/ceph_osdc_writepages(), the
 user process or kernel thread will wait for the pending OSD requests to
 complete on the corresponding req->r_completion. But it seems they are
 only woken up in handle_reply(), and only provided the replies are
 correct. What about if the replies are invalid, as in the situations the
 'bad_put' label in this function is intended to capture? Do the waiters
 just hang there?
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 
 
 
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


libuuid vs boost uuid

2013-11-08 Thread James Harper
Just out of curiosity (recent thread about windows port) I just had a quick go 
at compiling librados under mingw (win32 cross compile), and one of the errors 
that popped up was the lack of libuuid under mingw. Ceph appears to use 
libuuid, but I notice boost appears to include a uuid class too, and it seems 
that ceph already uses some of boost (which already builds under mingw).

Is there anything special about libuuid that would mean boost's uuid class 
couldn't replace it? And would it be better to still use ceph's uuid.h as a 
wrapper around the boost uuid class, or to modify ceph to use the boost uuid 
class directly?

Thanks

James
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cache tier blueprint (part 2)

2013-11-08 Thread Sage Weil
On Fri, 8 Nov 2013, Gregory Farnum wrote:
 On Thu, Nov 7, 2013 at 6:56 AM, Sage Weil s...@inktank.com wrote:
  I typed up what I think is remaining for the cache tier work for firefly.
  Greg, can you take a look?  I'm most likely missing a bunch of stuff here.
 
   
  http://wiki.ceph.com/01Planning/02Blueprints/Emperor/rados_cache_pool_(part_2)
 
  This will be one of the meatier sessions at the upcoming CDS.  There will
  be a fair bit of work not just on the OSD side but also in building
   testing tools that exercise and validate the functionality here; that
   may be a great way for a new contributor to get involved and help out.
 
 This looks reasonably complete to me as an enumeration of areas, but
 is definitely targeted at the cache pool use case rather than demoting
 to cold storage. I'm not sure yet if we're going to want to use the
 same agent/process/whatever in both use cases; do you want a separate
 blueprint for that, or should we broaden this one?

I think they are going to be quite similar, so it seems simplest to just 
start with this use-case and keep in mind that it may need to generalize 
slightly.  To a first approximation, both are making some decision about 
when to flush/evict or demote based on how old the object is and/or how 
full the pool is.  It will probably be a pretty compact bit of logic, so 
I'm not too worried.

The other question for me is where we want to implement the agent.  The 
logic is pretty simple, and it can operate entirely using the new librados 
calls.  It may be simpler to drive it from a thread inside ceph-osd 
(simpler than managing a separate external daemon).  It might also be easy 
(and slightly more efficient) to plug into the scrub process.  It is 
possible/likely that in the future we (or others) will want to construct 
more sophisticated policies using an external agent, but at this point the 
logic is simple enough that we can probably pick something simple and 
still be able to easily change it up down the line...

sage
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: libuuid vs boost uuid

2013-11-08 Thread Sage Weil
On Sat, 9 Nov 2013, James Harper wrote:
 Just out of curiosity (recent thread about windows port) I just had a 
 quick go at compiling librados under mingw (win32 cross compile), and 
 one of the errors that popped up was the lack of libuuid under mingw. 
 Ceph appears to use libuuid, but I notice boost appears to include a 
 uuid class too, and it seems that ceph already uses some of boost (which 
 already builds under mingw).
 
 Is there anything special about libuuid that would mean boost's uuid 
 class couldn't replace it? And would it be better to still use ceph's 
 uuid.h as a wrapper around the boost uuid class, or to modify ceph to 
 use the boost uuid class directly?

Nice!  Boost uuid looks like it would work just fine.  It is probably 
easier and less disruptive to use it from within the existing class in 
include/uuid.h.

sage
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html