Re: ceph-deploy: too many argument: --setgroup 10
I see. This is the ownership info for /var/lib/ceph:

[nwatkins@cn67 ~]$ stat /var/lib/ceph
  File: `/var/lib/ceph' -> `/ram/var/lib/ceph'
  Size: 17          Blocks: 3          IO Block: 524288   symbolic link
Device: 12h/18d     Inode: 4429801211  Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (   10/   wheel)

This is probably a rather non-traditional environment that Ceph is being deployed onto. I talked with the sysadmins and they were rather surprised by this. I guess we can take care of it on this end now that we see the issue.

In general, should the ceph-deploy tool version be paired up with the version of Ceph being deployed, or is it the case that for Hammer the file system uid/gid _should be_ 0 so the unsupported --setgroup flag is never passed?

Thanks!

On Wed, Sep 2, 2015 at 3:32 PM, Travis Rhoden wrote:
> Hi Noah,
>
> What is the ownership on /var/lib/ceph ?
>
> ceph-deploy should only be trying to use --setgroup if /var/lib/ceph is
> owned by non-root.
>
> On a fresh install of Hammer, this should be root:root.
>
> The --setgroup flag was added to ceph-deploy in 1.5.26.
>
> - Travis
>
> On Wed, Sep 2, 2015 at 1:59 PM, Noah Watkins wrote:
>>
>> I'm getting the following error using ceph-deploy to set up a cluster.
>> It's CentOS 6.6 and I'm using Hammer and the latest ceph-deploy. It
>> looks like --setgroup wasn't an option in Hammer, but ceph-deploy adds
>> it. Is there a trick or an older version of ceph-deploy I should try?
>>
>> - Noah
>>
>> [cn67][INFO  ] Running command: sudo ceph-mon --cluster ceph --mkfs -i cn67 --keyring /var/lib/ceph/tmp/ceph-cn67.mon.keyring --setgroup 10
>> [cn67][WARNIN] too many arguments: [--setgroup,10]
>> [cn67][DEBUG ] --conf/-c FILE    read configuration from the given configuration file
>> [cn67][WARNIN] usage: ceph-mon -i monid [flags]
>> [cn67][DEBUG ] --id/-i ID        set ID portion of my name
>> [cn67][WARNIN]   --debug_mon n
>> [cn67][DEBUG ] --name/-n TYPE.ID set name
>> [cn67][WARNIN]         debug monitor level (e.g. 10)
>> [cn67][DEBUG ] --cluster NAME    set cluster name (default: ceph)
>> [cn67][WARNIN]   --mkfs
>> [cn67][DEBUG ] --version         show version and quit
>> [cn67][WARNIN]         build fresh monitor fs
>> [cn67][DEBUG ]
>> [cn67][WARNIN]   --force-sync
>> [cn67][DEBUG ] -d                run in foreground, log to stderr.
>> [cn67][WARNIN]         force a sync from another mon by wiping local data (BE CAREFUL)
>> [cn67][DEBUG ] -f                run in foreground, log to usual location.
>> [cn67][WARNIN]   --yes-i-really-mean-it
>> [cn67][DEBUG ] --debug_ms N      set message debug level (e.g. 1)
>> [cn67][WARNIN]         mandatory safeguard for --force-sync
>> [cn67][WARNIN]   --compact
>> [cn67][WARNIN]         compact the monitor store
>> [cn67][WARNIN]   --osdmap
>> [cn67][WARNIN]         only used when --mkfs is provided: load the osdmap from
>> [cn67][WARNIN]   --inject-monmap
>> [cn67][WARNIN]         write the monmap to the local monitor store and exit
>> [cn67][WARNIN]   --extract-monmap
>> [cn67][WARNIN]         extract the monmap from the local monitor store and exit
>> [cn67][WARNIN]   --mon-data
>> [cn67][WARNIN]         where the mon store and keyring are located
>> [cn67][ERROR ] RuntimeError: command returned non-zero exit status: 1
>> [ceph_deploy.mon][ERROR ] Failed to execute command: ceph-mon --cluster ceph --mkfs -i cn67 --keyring /var/lib/ceph/tmp/ceph-cn67.mon.keyring --setgroup 10
>> [ceph_deploy][ERROR ] GenericError: Failed to create 1 monitors
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
ceph-deploy: too many argument: --setgroup 10
I'm getting the following error using ceph-deploy to set up a cluster. It's CentOS 6.6 and I'm using Hammer and the latest ceph-deploy. It looks like --setgroup wasn't an option in Hammer, but ceph-deploy adds it. Is there a trick or an older version of ceph-deploy I should try?

- Noah

[cn67][INFO  ] Running command: sudo ceph-mon --cluster ceph --mkfs -i cn67 --keyring /var/lib/ceph/tmp/ceph-cn67.mon.keyring --setgroup 10
[cn67][WARNIN] too many arguments: [--setgroup,10]
[cn67][DEBUG ] --conf/-c FILE    read configuration from the given configuration file
[cn67][WARNIN] usage: ceph-mon -i monid [flags]
[cn67][DEBUG ] --id/-i ID        set ID portion of my name
[cn67][WARNIN]   --debug_mon n
[cn67][DEBUG ] --name/-n TYPE.ID set name
[cn67][WARNIN]         debug monitor level (e.g. 10)
[cn67][DEBUG ] --cluster NAME    set cluster name (default: ceph)
[cn67][WARNIN]   --mkfs
[cn67][DEBUG ] --version         show version and quit
[cn67][WARNIN]         build fresh monitor fs
[cn67][DEBUG ]
[cn67][WARNIN]   --force-sync
[cn67][DEBUG ] -d                run in foreground, log to stderr.
[cn67][WARNIN]         force a sync from another mon by wiping local data (BE CAREFUL)
[cn67][DEBUG ] -f                run in foreground, log to usual location.
[cn67][WARNIN]   --yes-i-really-mean-it
[cn67][DEBUG ] --debug_ms N      set message debug level (e.g. 1)
[cn67][WARNIN]         mandatory safeguard for --force-sync
[cn67][WARNIN]   --compact
[cn67][WARNIN]         compact the monitor store
[cn67][WARNIN]   --osdmap
[cn67][WARNIN]         only used when --mkfs is provided: load the osdmap from
[cn67][WARNIN]   --inject-monmap
[cn67][WARNIN]         write the monmap to the local monitor store and exit
[cn67][WARNIN]   --extract-monmap
[cn67][WARNIN]         extract the monmap from the local monitor store and exit
[cn67][WARNIN]   --mon-data
[cn67][WARNIN]         where the mon store and keyring are located
[cn67][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.mon][ERROR ] Failed to execute command: ceph-mon --cluster ceph --mkfs -i cn67 --keyring /var/lib/ceph/tmp/ceph-cn67.mon.keyring --setgroup 10
[ceph_deploy][ERROR ] GenericError: Failed to create 1 monitors
newstore OSD magic error
I've deployed a cluster with one monitor and one OSD using ceph-deploy. The OSD is spinning up, and there are some errors in the log after `ceph-deploy create osd osd0:/dev/sdb`:

2015-08-27 14:23:28.061297 7f46d6811980 -1 OSD magic @??7?V != my ceph osd volume v026

ceph.conf:

[global]
fsid = 9126937a-c39c-42c5-a00d-30625ce8fa11
mon_initial_members = mon0
mon_host = 10.10.2.3
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
enable experimental unrecoverable data corrupting features = newstore rocksdb
public network = 10.10.2.0/8
cluster network = 10.10.1.0/8

[mon.a]
host = mon0
mon addr = 10.10.2.3:6789

[osd]
#osd objectstore = memstore
osd objectstore = newstore

[osd.0]
public addr = 10.10.2.1
cluster addr = 10.10.1.2

full config log:

2015-08-27 14:23:05.227604 7f1db823b980  0 ceph version 0.46-24817-g94ce100 (94ce1007fdde72a67281c18447a7ac879f2614ad), process ceph-osd, pid 13468
2015-08-27 14:23:05.227632 7f1db823b980 -1 WARNING: experimental feature 'newstore' is enabled
Please be aware that this feature is experimental, untested, unsupported, and may result in data corruption, data loss, and/or irreparable damage to your cluster. Do not use feature with important data.
2015-08-27 14:23:05.247424 7f5576b43980  0 ceph version 0.46-24817-g94ce100 (94ce1007fdde72a67281c18447a7ac879f2614ad), process ceph-osd, pid 13493
2015-08-27 14:23:05.247449 7f5576b43980 -1 WARNING: experimental feature 'newstore' is enabled
Please be aware that this feature is experimental, untested, unsupported, and may result in data corruption, data loss, and/or irreparable damage to your cluster. Do not use feature with important data.
2015-08-27 14:23:05.274768 7fbe45229980  0 ceph version 0.46-24817-g94ce100 (94ce1007fdde72a67281c18447a7ac879f2614ad), process ceph-osd, pid 13497
2015-08-27 14:23:05.274797 7fbe45229980 -1 WARNING: experimental feature 'newstore' is enabled
Please be aware that this feature is experimental, untested, unsupported, and may result in data corruption, data loss, and/or irreparable damage to your cluster. Do not use feature with important data.
2015-08-27 14:23:25.171436 7ff6a3816980  0 ceph version 0.46-24817-g94ce100 (94ce1007fdde72a67281c18447a7ac879f2614ad), process ceph-osd, pid 13677
2015-08-27 14:23:25.171457 7ff6a3816980 -1 WARNING: experimental feature 'newstore' is enabled
Please be aware that this feature is experimental, untested, unsupported, and may result in data corruption, data loss, and/or irreparable damage to your cluster. Do not use feature with important data.
2015-08-27 14:23:25.173214 7ff6a3816980 -1 WARNING: the following dangerous and experimental features are enabled: newstore,rocksdb
2015-08-27 14:23:25.173623 7ff6a3816980  1 newstore(/var/lib/ceph/tmp/mnt.sU2pxt) mkfs path /var/lib/ceph/tmp/mnt.sU2pxt
2015-08-27 14:23:25.173633 7ff6a3816980  1 newstore(/var/lib/ceph/tmp/mnt.sU2pxt) _open_path using fs driver 'generic'
2015-08-27 14:23:25.173651 7ff6a3816980  1 newstore(/var/lib/ceph/tmp/mnt.sU2pxt) mkfs fsid is already set to db74b022-d47b-4313-8367-44ed6ab5a2ce
2015-08-27 14:23:25.173994 7ff6a3816980 -1 WARNING: experimental feature 'rocksdb' is enabled
Please be aware that this feature is experimental, untested, unsupported, and may result in data corruption, data loss, and/or irreparable damage to your cluster. Do not use feature with important data.
2015-08-27 14:23:25.235063 7ff6a3816980  1 newstore(/var/lib/ceph/tmp/mnt.sU2pxt) _open_db opened rocksdb path /var/lib/ceph/tmp/mnt.sU2pxt options
2015-08-27 14:23:25.235237 7ff6a3816980  1 newstore(/var/lib/ceph/tmp/mnt.sU2pxt) mount path /var/lib/ceph/tmp/mnt.sU2pxt
2015-08-27 14:23:25.235254 7ff6a3816980  1 newstore(/var/lib/ceph/tmp/mnt.sU2pxt) _open_path using fs driver 'generic'
2015-08-27 14:23:25.235281 7ff6a3816980 -1 WARNING: experimental feature 'rocksdb' is enabled
Please be aware that this feature is experimental, untested, unsupported, and may result in data corruption, data loss, and/or irreparable damage to your cluster. Do not use feature with important data.
2015-08-27 14:23:25.263694 7ff6a3816980  1 newstore(/var/lib/ceph/tmp/mnt.sU2pxt) _open_db opened rocksdb path /var/lib/ceph/tmp/mnt.sU2pxt options
2015-08-27 14:23:25.263865 7ff6a3816980  1 newstore(/var/lib/ceph/tmp/mnt.sU2pxt) _recover_next_fid old fid_max 0/0
2015-08-27 14:23:25.263887 7ff6a3816980  1 newstore(/var/lib/ceph/tmp/mnt.sU2pxt) _recover_next_nid old nid_max 0
2015-08-27 14:23:25.303719 7ff6a3816980  1 newstore(/var/lib/ceph/tmp/mnt.sU2pxt) umount
2015-08-27 14:23:25.525720 7ff6a3816980 -1 created object store /var/lib/ceph/tmp/mnt.sU2pxt journal /var/lib/ceph/tmp/mnt.sU2pxt/journal for osd.0 fsid 9126937a-c39c-42c5-a00d-30625ce8fa11
2015-08-27 14:23:25.525785 7ff6a3816980 -1 auth: error reading file: /var/lib/ceph/tmp/mnt.sU2pxt/keyring: can't open /var/lib/ceph/tmp/mnt.sU2pxt/keyring: (2) No such file or directory
2015-08-27 14:23:25.525984 7ff6a
Re: Ceph bindings for go & docker
Hi Loic,

This sounds great. The librados bindings have good test coverage, but I merged a PR for RBD support a couple weeks ago and haven't had time to get it cleaned up and tests written. Do you need support for the AIO interface in librbd?

-Noah

----- Original Message -----
From: "Loic Dachary"
To: "Noah Watkins"
Cc: "Ceph Development", "Vincent Batts", "Johan Euphrosine"
Sent: Monday, February 9, 2015 9:15:02 AM
Subject: Ceph bindings for go & docker

Hi,

I discovered https://github.com/noahdesu/go-ceph today :-) It would be useful in the context of a Ceph volume driver for docker (see https://github.com/docker/docker/issues/10661 & https://github.com/docker/docker/pull/8484). Are you a docker user by any chance ?

--
Loïc Dachary, Artisan Logiciel Libre
[ann] fio plugin for libcephfs
I've posted a preliminary patch set to support a libcephfs io engine in fio:

  http://github.com/noahdesu/fio (branch: cephfs)

You can use this right now to generate load through libcephfs. The plugin needs a bit more work before it goes upstream (patches welcome), but feel free to play around with it. There is an example script in examples/cephfs.fio.

Issues: currently all the files that are created get the same size as the total job size, rather than the total size being divided by the number of threads.

- Noah
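For reference, a job file for the plugin would look roughly like this. This is only a sketch: the engine name `cephfs` is assumed from the branch name, and the bundled examples/cephfs.fio is authoritative; the remaining options are standard fio job options.

```ini
; hypothetical libcephfs job -- the ioengine name is an assumption
[global]
ioengine=cephfs
rw=write
bs=64k
size=256m        ; note the open issue: each file currently gets the full job size

[writers]
numjobs=4
```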
Re: arm7 gitbuilder ?
On Fri, Oct 10, 2014 at 3:21 PM, Loic Dachary wrote:
>
> You are lucky to have access to ARMv8 boxes :-) Would you be willing to run a
> few tests on my behalf ?

That might be a challenge since we (our research group) don't own them -- we just have remote access for a specific project. But it doesn't hurt to ask! I'll let you know.
Re: arm7 gitbuilder ?
On Fri, Oct 10, 2014 at 2:54 PM, Loic Dachary wrote:
> I would be surprised if it could easily be set up for cross compilation.
> Although it would be nice to have an ARMv8 I don't need it right now. Do you ?

Potentially. I'll poke around a bit and see. Maybe we can run 32-bit builds on ARMv8. I'm not in a huge rush, and can always build from source on those boxes, as cross-compiling might be a major headache :)
Re: arm7 gitbuilder ?
On Fri, Oct 10, 2014 at 12:16 PM, Loic Dachary wrote:
> Hi Noah,
>
> My focus is to create centos7 and ubuntu-14.04 packages at this point.

I think a newish Ubuntu should work just fine for us. Are the builds using a cross compile? If so, it'd be great if there was one build for ARMv8.
Re: arm7 gitbuilder ?
On Fri, Oct 10, 2014 at 10:14 AM, Sage Weil wrote:
> On Fri, 10 Oct 2014, Loic Dachary wrote:
>> On 10/10/2014 17:11, Sage Weil wrote:
>>> On Fri, 10 Oct 2014, Loic Dachary wrote:
>>>> Hi Sandon,
>>>>
>>>> Would it be possible to resurrect / create an arm7 gitbuilder for
>>>> whatever distribution is more convenient ? Janne made a great
>>>> contribution to erasure code optimization on NEON (
>>>> https://bitbucket.org/jannau/gf-complete/branch/neon ) and it would make
>>>> it easier to have a gitbuilder to run it through teuthology and torture
>>>> it ;-) If that's complicated or there is another way to run teuthology
>>>> jobs for this purpose, I'm open to suggestions.
>>>
>>> We have a bunch of armv7l nodes that used to run distcc and a gitbuilder.
>>> It's a bit of a time suck to maintain, though. Is there an interested
>>> party we can set up with lab access that can help administer these on an
>>> ongoing basis? Sandon is a highly contended resource. :)
>>
>> Long term I'm not sure. I'll try to set up something temporary and see
>> where it goes. Is there a risk that these machines go away in the next 6
>> months ? 3 months ?

I am an interested party, and at least in the short term would dedicate some time to getting arm7 builds working too.
Re: why index (collectionIndex) need a lock?
I didn't know about mutrace, thanks for that reference!

On Tue, Sep 30, 2014 at 8:13 PM, Milosz Tanski wrote:
> On Tue, Sep 30, 2014 at 7:36 PM, Noah Watkins wrote:
>> On Tue, Sep 30, 2014 at 10:42 AM, Somnath Roy wrote:
>>> Also, I don't think this lock has a big impact on performance since it is
>>> already sharded to index level. I tried with a reader/writer implementation
>>> of this lock (logic will be somewhat similar to your state concept) and am
>>> not getting any benefit.
>>
>> If there is interest in identifying locks that are introducing latency
>> it might be useful to add some tracking features to Mutex and RWLock. A
>> simple thing would be to just record maximum wait times per lock and
>> dump this via the admin socket.
>
> Noah,
>
> You're better off running some kind of synthetic test using mutrace
> (you can't use tcmalloc/jemalloc) or measuring futex syscalls via a
> perf tracepoint. Generally adding this kind of tracking into the locks
> themselves ends up being even more expensive.
>
> --
> Milosz Tanski
> CTO
> 16 East 34th Street, 15th floor
> New York, NY 10016
>
> p: 646-253-9055
> e: mil...@adfin.com
Re: why index (collectionIndex) need a lock?
On Tue, Sep 30, 2014 at 10:42 AM, Somnath Roy wrote:
> Also, I don't think this lock has a big impact on performance since it is
> already sharded to index level. I tried with a reader/writer implementation of
> this lock (logic will be somewhat similar to your state concept) and am not
> getting any benefit.

If there is interest in identifying locks that are introducing latency, it might be useful to add some tracking features to Mutex and RWLock. A simple thing would be to just record maximum wait times per lock and dump this via the admin socket.
Re: LTTng unfriendly with mixed static/dynamic linking
The Mutex tracepoints were just a driving example, so definitely feel free to remove them. But libcommon is pretty big, so I suspect that if tracing is merged someone will eventually want tracepoints in libcommon.

On Tue, Aug 12, 2014 at 12:41 PM, Adam Crume wrote:
> Sage, if I understood you correctly on the video call, you have
> reservations about making libcommon a dynamic library because of
> incompatible changes between versions causing problems when packages
> use different versions, and you brought up the idea of having a static
> version and a dynamic version. I don't think that would entirely
> work, because rbd (which must use the dynamic version) and libcommon
> would have to be in different packages, so they could have version
> mismatches.
>
> There's another alternative, which is to remove all tracepoints from
> libcommon. At the moment, the only tracepoints are in Mutex, and
> they're not necessary for rbd-replay. (Noah added them as an example
> of using LTTng in Ceph. Noah, are you using these tracepoints?) If
> we ever wanted to trace anything in libcommon, though, this issue
> would come up again.
>
> On Sat, Jul 26, 2014 at 3:29 AM, Joao Eduardo Luis wrote:
>> On 07/25/2014 11:12 PM, Sage Weil wrote:
>>> On Fri, 25 Jul 2014, Adam Crume wrote:
>>>> I tried all solutions, and it looks like only #1 works. #2 gives the
>>>> error "/usr/bin/ld: main: hidden symbol `tracepoint_dlopen' in
>>>> common_tp.a(common_tp.o) is referenced by DSO" when linking. #3 gives
>>>> the error "./liblibrary.so: undefined reference to `tracepoint_dlopen'"
>>>> when linking. (Linking is complicated by the fact that LTTng uses
>>>> special symbol attributes, and tracepoint_dlopen happens to be weak
>>>> and hidden.)
>>>
>>> I think #1 is good for other reasons, too. We already have issues (I
>>> think!) with binaries that use librados and also link libcommon
>>> statically. Specifically, I think we've seen that having mismatched
>>> versions of librados and the binary installed lead to confusion about the
>>> contents/structure of md_config_t (g_conf). This is one of the reasons
>>> why the libcommon and rgw packages require an identical version of
>>> librados or librbd or whatever -- to avoid this inconsistency.
>>>
>>>> Unless I'm mistaken (and I very well may be), we will have to ensure
>>>> that all traced code is either 1) placed in a shared library and never
>>>> statically linked elsewhere, or 2) never linked into any shared library.
>>>
>>> That sounds doable and sane to me:
>>>
>>> - librados, librbd, libceph_common, etc. would have the tracepoints in
>>>   the same .so
>>> - ceph-osd could have its own tracepoints, as long as they are always
>>>   static. (For example, libos.la, which is linked statically by ceph-mon
>>>   and ceph-osd but never dynamically.)
>>>
>>> One pain point in all of this, though, is that the libceph_common.so (or
>>> whatever) will need to go into a separate package that is required by
>>> librados.so and librbd and ceph-common and everything else. 'ceph-common'
>>> is what this ought to be called, but we've coopted it to mean 'ceph
>>> clients'. I'm not sure if it is worthwhile to go through the hijinks to
>>> rename ceph-common to ceph-clients and repurpose ceph-common for this?
>>>
>>> sage
>>
>> I notice that ceph-common contains no libs whatsoever. We may want to
>> change ceph-common to ceph-client or something and have libcommon shipped
>> as ceph-common, but I imagine that would be a pain as package management
>> goes. Or we could take the path of least resistance (and possibly open
>> ourselves to confusion?) and ship libcommon in a 'ceph-libs' package --
>> although it looks like it would be a 1-lib package :)
>>
>> -Joao
>>
>>>> Thoughts?
>>>> Adam
>>>>
>>>> On Fri, Jul 25, 2014 at 11:48 AM, Adam Crume wrote:
>>>>> LTTng requires tracepoints to be linked into a program only once. If
>>>>> tracepoints are linked in multiple times, the program crashes at
>>>>> startup with: "LTTng-UST: Error (-17) while registering tracepoint
>>>>> probe. Duplicate registration of tracepoint probes having the same
>>>>> name is not allowed."
>>>>>
>>>>> This is problematic when mixing static and dynamic linking. If the
>>>>> tracepoints are in a static library, that library can end up in an
>>>>> executable multiple times by being linked in directly, as well as
>>>>> being statically linked into a dynamic library. Even if the
>>>>> tracepoints are not linked directly into the executable, they can be
>>>>> statically linked into multiple dynamic libraries that the executable
>>>>> loads.
>>>>>
>>>>> For us, this problem shows up with libcommon, and could show up with
>>>>> others such as libosd. (In general, I think anything added to
>>>>> noinst_LTLIBRARIES is static, and anything added to lib_LTLIBRARIES is
>>>>> dynamic.)
>>>>>
>>>>> There are a few ways of solving the issue:
>>>>> 1. Change every
[ANN] Maven support for CephFS Java
The `cephfs-java` package can now be retrieved from the Maven repository below. Note that this is a _very alpha_ release, but it contains a whole lot of awesome in the form of the native JNI bits embedded in the package, so there should be no need to do an RPM/Deb install of the native dependencies (e.g. libcephfs-jni) and deal with all those silly search paths. Please shoot any bug reports to the list.

Current support is limited to Linux/x86-64, but other archs can be added if requested. All existing cephfs-java packages are unaffected.

- Noah

  <repository>
    <id>ceph-maven</id>
    <url>http://ceph.com/maven</url>
  </repository>

  <dependency>
    <groupId>com.ceph</groupId>
    <artifactId>cephfs</artifactId>
    <version>0.0.1</version>
  </dependency>
Packaging RFC for cephfs-java
Hey ceph-devel,

In its current form, deploying any software that depends on `cephfs-java` can be a real pain! Users need to make sure they have one of each of the following installed:

* platform-indep: libcephfs.jar
* platform-dep: libcephfs_jni.[so,dylib,...]
  - osx/linux
  - x86/x86-64
  - etc...

We build RPM and Debian packages for both, but that doesn't solve the following dependency problems.

First, the JVM needs to be told the location of the native library via a special "java.library.path" property specified when the JVM starts, and the JVM doesn't always look in the places where the RPM/Deb packages stick the binaries. Projects like Hadoop invoke the JVM deep within shell scripts, so getting these dependencies to resolve has been a continuous source of frustration for users.

The second problem is that users expect easy dependency management using solutions like Maven Central, so currently dealing with libcephfs.jar turns into a special case in the deployment procedure.

We solve the second problem by publishing our artifacts to Maven Central. That is pretty easy, but the problem of resolving the native library dependencies remains. There are two ways to deal with this challenge.

First, several projects (e.g. JNA and LevelDB-Java) embed everything, including pre-built libraries for all supported platforms, into a single JAR file, and the native library is transparently loaded at run-time. I've created a POC that shows how to do this:

  http://github.com/ceph/ceph (branch: wip-native-cephfs-java)

This approach is ideal from a user's perspective, but means more packaging/shipping complexity.

The second approach would depend on the current RPM/Deb installation, and we could manually search for the native libraries in well-known places. This could work, I suppose, but it still requires that extra step to rendezvous deps from Maven and yum/apt.

Any thoughts welcome!

-Noah
ceph-osd seg-fault with small writes
Running rados bench with the default 4MB object size works fine, but when I shrink the write size to 4K (i.e. running `rados -p data -b 4096 bench 30 write`), ceph-osd will segfault almost immediately. Here is the segfault, and a link to the full debug log uploaded using ceph-post-file.

ceph-post-file id: f9c8a1f8-969f-46c2-9cef-41b4683fdc76

ceph version 0.77-624-g82f62b1 (82f62b1ea82f6d92f7a5ed0bcbacd608770a15e3)
 1: ./ceph-osd() [0x95dbaf]
 2: (()+0xfbb0) [0x7f3f1f2f4bb0]
 3: (gsignal()+0x37) [0x7f3f1d7cdf77]
 4: (abort()+0x148) [0x7f3f1d7d15e8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f3f1e0d96e5]
 6: (()+0x5e856) [0x7f3f1e0d7856]
 7: (()+0x5e883) [0x7f3f1e0d7883]
 8: (()+0x5eaae) [0x7f3f1e0d7aae]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1f2) [0xa3d1b2]
 10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0x8e9) [0x749c59]
 11: (FileStore::_do_transactions(std::list >&, unsigned long, ThreadPool::TPHandle*)+0x6c) [0x74d88c]
 12: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x167) [0x74da17]
 13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xaef) [0xa2dfdf]
 14: (ThreadPool::WorkThread::entry()+0x10) [0xa2eed0]
 15: (()+0x7f6e) [0x7f3f1f2ecf6e]
 16: (clone()+0x6d) [0x7f3f1d8919cd]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
Re: librados::ObjectIterator segfault
Oh great. I'll pull in that patch. Thanks!

On Sun, Mar 2, 2014 at 11:12 AM, Ilya Dryomov wrote:
> On Sun, Mar 2, 2014 at 8:38 PM, Noah Watkins wrote:
>> This is a segfault occurring in the latest master when listing objects with
>> `rados -p data ls`.
>>
>> Full trace: http://pastebin.com/3JG9cX0Z
>>
>> nwatkins@kyoto:~/ceph2/src$ CEPH_CONF=ceph.conf ./rados lspools
>> data
>> metadata
>> rbd
>> nwatkins@kyoto:~/ceph2/src$ CEPH_CONF=ceph.conf ./rados -p data ls
>> *** Caught signal (Segmentation fault) **
>> in thread 7f84f02ce7c0
>> ceph version 0.77-620-gf3976c1 (f3976c16531096b9979842fc4445d40d6e889932)
>> 1: /home/nwatkins/ceph2/src/.libs/lt-rados() [0x43e2df]
>> 2: (()+0xfbb0) [0x7f84eef6bbb0]
>> 3: (librados::ObjectIterator::operator=(librados::ObjectIterator const&)+0x27) [0x7f84ef31f807]
>> 4: (librados::ObjectIterator::ObjectIterator(librados::ObjectIterator const&)+0x30) [0x7f84ef31fb10]
>> 5: (main()+0x1bff) [0x41274f]
>> 6: (__libc_start_main()+0xf5) [0x7f84ee193de5]
>> 7: /home/nwatkins/ceph2/src/.libs/lt-rados() [0x41b967]
>> 2014-03-02 10:35:57.471206 7f84f02ce7c0 -1 *** Caught signal
>> (Segmentation fault) **
>> in thread 7f84f02ce7c0
>
> I ran into it a couple days ago. I think Josh has it fixed.
>
> https://github.com/ceph/ceph/pull/1322
>
> Thanks,
>
>                 Ilya
librados::ObjectIterator segfault
This is a segfault occurring in the latest master when listing objects with `rados -p data ls`.

Full trace: http://pastebin.com/3JG9cX0Z

nwatkins@kyoto:~/ceph2/src$ CEPH_CONF=ceph.conf ./rados lspools
data
metadata
rbd
nwatkins@kyoto:~/ceph2/src$ CEPH_CONF=ceph.conf ./rados -p data ls
*** Caught signal (Segmentation fault) **
 in thread 7f84f02ce7c0
 ceph version 0.77-620-gf3976c1 (f3976c16531096b9979842fc4445d40d6e889932)
 1: /home/nwatkins/ceph2/src/.libs/lt-rados() [0x43e2df]
 2: (()+0xfbb0) [0x7f84eef6bbb0]
 3: (librados::ObjectIterator::operator=(librados::ObjectIterator const&)+0x27) [0x7f84ef31f807]
 4: (librados::ObjectIterator::ObjectIterator(librados::ObjectIterator const&)+0x30) [0x7f84ef31fb10]
 5: (main()+0x1bff) [0x41274f]
 6: (__libc_start_main()+0xf5) [0x7f84ee193de5]
 7: /home/nwatkins/ceph2/src/.libs/lt-rados() [0x41b967]
2014-03-02 10:35:57.471206 7f84f02ce7c0 -1 *** Caught signal (Segmentation fault) **
 in thread 7f84f02ce7c0
Re: Assertion error in librados
On Tue, Feb 25, 2014 at 9:51 AM, Josh Durgin wrote:
> That's a good idea. This particular assert in a Mutex is almost always
> a use-after-free of the Mutex or structure containing it though.

I think a use-after-free will also return EINVAL (assuming it isn't a pathological case), since pthread_mutex_lock checks an initialization magic variable. I think that particular mutex isn't initialized with flags that would cause any of the other possible return values.
Re: Assertion error in librados
Perhaps using gtest-style asserts (ASSERT_EQ(r, 0)) in Ceph would be useful so we can see parameter values for the assertion in the log. In this case, the return value from pthread_mutex_lock is almost certainly EINVAL, but it'd be informative to know for sure.

On Tue, Feb 25, 2014 at 7:58 AM, Filippos Giannakos wrote:
> Hi Greg,
>
> Unfortunately we don't keep any Ceph-related logs on the client side. On the
> server side, we kept the default log settings to avoid overlogging.
> Do you think there might be something useful on the OSD side ?
>
> On Tue, Feb 25, 2014 at 07:28:30AM -0800, Gregory Farnum wrote:
>> Do you have logs? The assert indicates that the messenger got back
>> something other than "okay" when trying to grab a local Mutex, which
>> shouldn't be able to happen. It may be that some error-handling path
>> didn't drop it (within the same thread that later tried to grab it
>> again), but we'll need more details to track it down.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
> Kind Regards,
> --
> Filippos
Re: doxygen help
I don't think Asphyxiate handles 'class' http://marc.info/?l=ceph-devel&m=135130277326664&w=2 On Sun, Feb 23, 2014 at 12:38 PM, Sage Weil wrote: > A while back I added some doxygen comments/docs to librados.hpp and tried > to have sphinx slurp it up into the generated docs on ceph.com: > > > https://github.com/ceph/ceph/commit/d0c4600e6645d116b9b2c4eea56ef5851eea54d5 > > Unfortunately this flames out with some obscure doxygen error and I wasn't > able to sort it out given my limited attention span and experience with > such tools. You can see the error here: > > > http://gitbuilder.sepia.ceph.com/gitbuilder-doc/log.cgi?log=d0c4600e6645d116b9b2c4eea56ef5851eea54d5 > > Exception occurred: > File "/tmp/virtualenv-docs/src/asphyxiate/asphyxiate/__init__.py", line > 388, in render_compounddef > "cannot handle {node.tag} kind={node.attrib[kind]}".format(node=node) > Error: Assertion cannot handle compounddef kind=class > > Anybody know what might be going on? This is the main thing preventing > the C++ librados API docs (such as they are) from appearing on the site. > > sage > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ceph-users] ceph hadoop using ambari
Hi Kesten, It's a little difficult to tell what the source of the problem is, but looking at the gist you referenced, I don't see anything that would indicate that Ceph is causing the issue. For instance, hadoop-mapred-tasktracker-xxx-yyy-hdfs01.log looks like Hadoop daemons are having problems connecting to each other. Finding out what command in hadoop-daemon.sh is causing the permission errors might be informative, but I don't have any experience with Ambari. On Mon, Feb 17, 2014 at 9:23 AM, Kesten Broughton wrote: > I posted this to ceph-devel-owner before seeing that this is the correct > place to post. > > My company is trying to evaluate virtualized hdfs clusters using ceph as a > drop-in replacement for staging and development > following http://ceph.com/docs/master/cephfs/hadoop/. We deploy clusters > with ambari 1.3.2. > > I spun up a 10 node cluster with 3 datanodes, name, secondary, 3 > zookeepers, ambari master, and accumulo master. > > Our process is > This was likely the cause of shutdown errors. Should do > 1. Run ambari install > 2. shut down all ambari services > 3. push modified core-site.xml to datanodes, name, secondary > 4. restart ambari services > > I am getting errors > /usr/lib/hadoop/bin/hadoop-daemon.sh: Permission denied > > in the ambari console error log from the command: > su - hdfs -c 'export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec && > /usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start > datanode' > > > I think this is an ambari issue, but I'm wondering > 1. Is there a detailed guide of using ambari with ceph-hadoop, or has > anyone tried it? > 2. Is there a script or list of log files useful for debugging ceph > issues in general? > > thanks, > > kesten > > > ps. 
> I have opened a gist via > https://gist.github.com/darKoram/9051450 > and an issue on the horton forums at > http://hortonworks.com/community/forums/topic/ambari-restart-services-give- > bash-usrlibhadoopbinhadoop-daemon-sh-permiss/#post-48793 > > > ___ > ceph-users mailing list > ceph-us...@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: feedback on supporting libc++
I've sent up a pull request with an initial run at this patch set: https://github.com/ceph/ceph/pull/1064 The only option we haven't mentioned is to use the boost variants exclusively, rather than switching between the ones in std:: (in C++11) and std::tr1::. On Tue, Dec 31, 2013 at 11:59 AM, Josh Durgin wrote: > On 12/31/2013 08:59 AM, Noah Watkins wrote: >> >> Thanks for testing that Josh. Before cleaning up this patch set, I >> have a few questions. >> >> I'm still not clear on how to handle the "std::tr1::shared_ptr < >> ObjListCtx > ctx;" in librados.hpp. If we change this to >> ceph::shared_ptr, then we'll also need to somehow ship the >> translations here: >> >>https://github.com/ceph/ceph/blob/port/libc%2B%2B/src/include/memory.h > > > I'd suggest treating it like we did buffer.h and associated headers - > make a symlink to it from include/rados/memory.h, and install a copy of > it with the librados headers. > > >> It's also not clear that ceph::shared_ptr should be exposed publicly >> if there is a thought we might start switching out implementations of >> ceph::shared_ptr via memory.h (e.g. by using the boost implementation). > > > We can't change the actual type used by librados, since AIUI that's > part of the ABI, so if we want to use another type internally we can > make include/rados/memory.h a copy of the original instead of a symlink, > and then change the internal include/memory.h however we like. > > Josh -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ceph-users] how to use the function ceph_open_layout
You'll need to register the new pool with the MDS: ceph mds add_data_pool On Thu, Jan 2, 2014 at 9:48 PM, 鹏 wrote: > Hi all; > Today I want to use the function ceph_open_layout() in libcephfs.h. > > I created a new pool successfully: > # rados mkpool data1 > and then I edited the code like this: > > int fd = ceph_open_layout( cmount, c_path, O_RDONLY|O_CREAT, 0666, (1<<22), > 1, (1<<22) , "data1") > > and then the fd is -22! > > When I use the data pool, it succeeds: > int fd = ceph_open_layout( cmount, c_path, O_RDONLY|O_CREAT, 0666, (1<<22), > 1, (1<<22) , "data") > > Does ceph_open_layout support read/write to a new pool? > > Thank you for the help! > > ___ > ceph-users mailing list > ceph-us...@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: feedback on supporting libc++
Thanks for testing that Josh. Before cleaning up this patch set, I have a few questions. I'm still not clear on how to handle the "std::tr1::shared_ptr < ObjListCtx > ctx;" in librados.hpp. If we change this to ceph::shared_ptr, then we'll also need to somehow ship the translations here: https://github.com/ceph/ceph/blob/port/libc%2B%2B/src/include/memory.h It's also not clear that ceph::shared_ptr should be exposed publicly if there is a thought we might start switching out implementations of ceph::shared_ptr via memory.h (e.g. by using the boost implementation). On Mon, Dec 30, 2013 at 5:19 PM, Josh Durgin wrote: > On 12/27/2013 03:34 PM, Noah Watkins wrote: >> >> On Wed, Oct 30, 2013 at 2:02 PM, Josh Durgin >> wrote: >>> >>> On 10/29/2013 03:51 PM, Noah Watkins wrote: >>> >>> unsafe to me. Could you check whether you can run 'rados ls' compiled >>> against an old librados, but dynamically loading librados from this >>> branch compiled in c++98 mode? >> >> >> I'm still working on this, but my understanding so far from libc++ >> documentation is that libc++ and libstdc++ are API but not ABI >> compatible, so there shouldn't be an expectation that a librados binary >> built against libstdc++ will work if dynamically linked against >> libc++. > > > I meant if it was compiled against libstdc++ both times; I was curious > whether changing std::tr1::shared_ptr to ceph::shared_ptr would result > in any incompatibility. > > I just tried this, and it worked fine (I think because it does not > actually create a new c++ type, but acts like a typedef and just > creates an alias), so I've got no issues with this approach. > > Josh -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Shared library symbol visibility
It looks like we may be outgrowing the use of export-symbols-regex and friends to control symbol visibility for published shared libraries. On Linux, ld seems to be quite content linking against hidden symbols, but at least on OSX with Clang it seems the visibility is strictly enforced. For instance, librados exports only the prefix "rados_", but that regex hides everything in the C++ interface. Unfortunately, export-symbols-regex doesn't play nice with C++ name mangling. Large projects that I've been looking at for examples (chromium, v8, Java) seem to use a different approach based on the compiler flag "-fvisibility=hidden" that hides everything by default and uses explicit exporting. These are the basics, and there are variants that work on Windows for DLL's as well with more macro magic. #define CEPH_EXPORT __attribute__((__visibility__("default"))) class CEPH_EXPORT ObjectOperation { public: ObjectOperation(); virtual ~ObjectOperation(); ... There is a sample branch up with this approach at: http://github.com/ceph/ceph port/visibility More info https://www.gnu.org/software/gnulib/manual/html_node/Exported-Symbols-of-Shared-Libraries.html http://gcc.gnu.org/wiki/Visibility - Noah -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: feedback on supporting libc++
On Wed, Oct 30, 2013 at 2:02 PM, Josh Durgin wrote: > On 10/29/2013 03:51 PM, Noah Watkins wrote: > > unsafe to me. Could you check whether you can run 'rados ls' compiled > against an old librados, but dynamically loading librados from this > branch compiled in c++98 mode? I'm still working on this, but my understanding so far from libc++ documentation is that libc++ and libstdc++ are API but not ABI compatible, so there shouldn't be an expectation that a librados binary built against libstdc++ will work if dynamically linked against libc++. -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Building Ceph using CMake
On Tue, Dec 17, 2013 at 2:09 PM, Ali Maredia wrote: > > Most of the speedup can be attributed to the fact that libtool is compiling > both PIC and non-PIC versions of every source file. CMake just builds > everything with -fPIC. We don't have an opinion on the matter, but you may > want to consider doing the same with the autotools build. > > Many source files are compiled into several targets, causing them to be built > multiple times. With the CMake build I was able to pull them into static > libraries and link them into the targets that needed them. In the latest round of automake clean-ups I know some of the compilation redundancy was removed, but I don't recall if it was all taken care of. If you happen to have a list of the redundant stuff you found, that would be helpful for improving automake. -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
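Ali's two points above (build everything with -fPIC, and factor shared sources into static libraries compiled once) can be sketched in CMake. Target and file names here are hypothetical, not from the actual CMake port:

```cmake
# Hypothetical sketch: build all objects -fPIC once, factor shared
# sources into a static library, and link it into every consumer.
set(CMAKE_POSITION_INDEPENDENT_CODE ON)   # everything gets -fPIC

# Sources shared by several targets compile exactly once here.
add_library(common STATIC buffer.cc log.cc crc32.cc)

add_executable(ceph_tool tool.cc)
target_link_libraries(ceph_tool common)   # reused, not recompiled

add_library(rados SHARED librados.cc)
target_link_libraries(rados common)
```

Because `common` is already PIC, the same objects can be linked into both the executable and the shared library, which is what removes the libtool-style double compilation.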
Re: /usr/bin/ld: cannot find -lboost_program_options
Hi Charles, Out of curiosity, do you have a config.log handy? A recent change to configure.ac should have caught the absence of the boost program options library before this step. On Sun, Dec 1, 2013 at 7:08 PM, charles L wrote: > i reinstalled ...libboost-program-options-dev and it fixed the issue. > > Thanks. > >> Date: Mon, 2 Dec 2013 10:18:33 +0800 >> From: liw...@ubuntukylin.com >> To: charlesboy...@hotmail.com >> CC: ceph-devel@vger.kernel.org >> Subject: Re: /usr/bin/ld: cannot find -lboost_program_options >> >> Please install libboost-program-options-dev package before compiling >> for example, for Ubuntu, >> sudo apt-get install libboost-program-options-dev >> >> On 12/02/2013 09:57 AM, charles L wrote: >>> Please, can someone help? I'm compiling ceph. I did the make -j2 command and got >>> "cannot find -lboost_program_options" many times, so I tried to run >>> make in verbose mode and got this... >>> >>> root@ubuntuserver:/home/ceph# V=1 make >>> Making all in . >>> make[1]: Entering directory `/home/ceph' >>> make[1]: Nothing to be done for `all-am'. >>> make[1]: Leaving directory `/home/ceph' >>> Making all in src >>> make[1]: Entering directory `/home/ceph/src' >>> make all-recursive >>> make[2]: Entering directory `/home/ceph/src' >>> Making all in ocf >>> make[3]: Entering directory `/home/ceph/src/ocf' >>> make[3]: Nothing to be done for `all'. >>> make[3]: Leaving directory `/home/ceph/src/ocf' >>> Making all in java >>> make[3]: Entering directory `/home/ceph/src/java' >>> make all-am >>> make[4]: Entering directory `/home/ceph/src/java' >>> make[4]: Nothing to be done for `all-am'. >>> make[4]: Leaving directory `/home/ceph/src/java' >>> make[3]: Leaving directory `/home/ceph/src/java' >>> make[3]: Entering directory `/home/ceph/src' >>> ./check_version ./.git_version >>> ./.git_version is up to date. 
>>> /bin/bash ../libtool --tag=CXX --mode=link g++ -Wall -Wtype-limits >>> -Wignored-qualifiers -Winit-self -Wpointer-arith -Werror=format-security >>> -fno-strict-aliasing -fsigned-char -rdynamic -Wnon-virtual-dtor >>> -Wno-invalid-offsetof -fno-builtin-malloc -fno-builtin-calloc >>> -fno-builtin-realloc -fno-builtin-free -Wstrict-null-sentinel -g >>> -Wl,--as-needed -latomic_ops -o ceph_filestore_tool >>> tools/ceph-filestore-tool.o libosd.la libosdc.la libos.la -laio -lleveldb >>> -lsnappy libperfglue.la -ltcmalloc libos.la -laio -lleveldb -lsnappy >>> libglobal.la -lpthread -lm -lcrypto++ -luuid -lm -lkeyutils -lrt >>> -lboost_program_options -ldl -lboost_thread -lboost_system -lleveldb >>> -lsnappy >>> libtool: link: g++ -Wall -Wtype-limits -Wignored-qualifiers -Winit-self >>> -Wpointer-arith -Werror=format-security -fno-strict-aliasing -fsigned-char >>> -rdynamic -Wnon-virtual-dtor -Wno-invalid-offsetof -fno-builtin-malloc >>> -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free >>> -Wstrict-null-sentinel -g -Wl,--as-needed -o ceph_filestore_tool >>> tools/ceph-filestore-tool.o /usr/lib/libatomic_ops.a ./.libs/libosd.a >>> ./.libs/libosdc.a ./.libs/libperfglue.a -ltcmalloc ./.libs/libos.a -laio >>> ./.libs/libglobal.a -lpthread -lcrypto++ -luuid -lm -lkeyutils -lrt >>> -lboost_program_options -ldl -lboost_thread -lboost_system -lleveldb >>> -lsnappy >>> /usr/bin/ld: cannot find -lboost_program_options >>> collect2: error: ld returned 1 exit status >>> make[3]: *** [ceph_filestore_tool] Error 1 >>> make[3]: Leaving directory `/home/ceph/src' >>> make[2]: *** [all-recursive] Error 1 >>> make[2]: Leaving directory `/home/ceph/src' >>> make[1]: *** [all] Error 2 >>> make[1]: Leaving directory `/home/ceph/src' >>> make: *** [all-recursive] Error 1 -- >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>> the body of a message to majord...@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>-- > 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [May be a bug?]Cannot umount cephfs after all mons is stopped
Did you try `umount -f`? I wouldn't say that is 'clean', but it might avoid a reboot. It would seem there isn't much else that can be done if there is dirty data and no cluster to flush it to. This also looks relevant: http://tracker.ceph.com/issues/206 On Thu, Nov 28, 2013 at 9:51 PM, Ketor D wrote: > Hi Sage: >We are testing cephfs mounting with kernel 3.12, and we have hit a > situation where we cannot umount cephfs and also cannot reboot the > client machine. Only a hard reset can restart the client machine. > Here is the flow: > 1) Create a ceph cluster with 3 mons, 2 mds (active-standby), 2 osds. > 2) Mount the cephfs with linux kernel 3.12. > 3) Use service ceph stop mon.x to stop all mons. > 4) Then we cannot umount the cephfs. The commands lsof and > fuser do not return, and even if we use "umount -l [mount_point]", we > only see the mount_point disappear from /etc/mtab, but we still > cannot soft reboot the client machine. > > If we get into this situation, we can only hard reset the client > machine. > So the question is whether there is some method to gracefully > umount the cephfs without leaving the client machine unable to reboot? > > Regards! > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Building Ceph using CMake
On Nov 26, 2013, at 2:06 PM, Ali Maredia wrote: > Hi all, > > I'm a student working on a project to make ceph build faster and to help with > efforts to port ceph to other platforms using cmake. CMake is awesome. Also, you might be interested in checking out the portability work going on at github.com/ceph/ceph wip-port. -Noah -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: libuuid vs boost uuid
I put up a patch here for review: https://github.com/ceph/ceph/pull/875/files It seems OK as long as boost doesn't ever change its internal representation, which this patch reaches in and grabs for the 16-octet representation. Why not just grab a copy of libuuid from util-linux and keep it in tree? On Nov 25, 2013, at 9:52 PM, James Harper wrote: >> >> James, >> >> I'm using uuid.begin()/end() to grab the 16-byte representation of the UUID. >> Did you figure out how to populate a boost::uuid_t from the bytes? In >> particular, I'm referring to FileJournal::decode. >> >> Actually, I suppose that any Ceph usage of the 16-byte representation should >> be replaced using the Boost serialization of uuid_t? >> > > As I said I haven't actually tested it, apart from that I have librbd working > under Windows now ("rbd ls" and "rbd export" both work but I don't know if > they actually do anything with uuid's...) > > My patch to MStatfsReply.h to make it compile is: > > diff --git a/src/messages/MStatfsReply.h b/src/messages/MStatfsReply.h > index 8ceec9c..40a5bdd 100644 > --- a/src/messages/MStatfsReply.h > +++ b/src/messages/MStatfsReply.h > @@ -22,7 +22,7 @@ public: > > MStatfsReply() : Message(CEPH_MSG_STATFS_REPLY) {} > MStatfsReply(uuid_d &f, tid_t t, epoch_t epoch) : > Message(CEPH_MSG_STATFS_REPLY) { > -memcpy(&h.fsid, f.uuid, sizeof(h.fsid)); > +memcpy(&h.fsid, &f.uuid, sizeof(h.fsid)); > header.tid = t; > h.version = epoch; > } > > So assuming this actually works, the uuid bytes are accessible as per the > above. > > James > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: libuuid vs boost uuid
James, I’m using uuid.begin()/end() to grab the 16-byte representation of the UUID. Did you figure out how to populate a boost::uuid_t from the bytes? In particular, I’m referring to FileJournal::decode. Actually, I suppose that any Ceph usage of the 16-byte representation should be replaced using the Boost serialization of uuid_t? Thanks, -Noah On Nov 13, 2013, at 2:33 PM, James Harper wrote: > Patch follows. When I wrote it I was just thinking it would be used for win32 > build, hence the #ifdef. As I said before, it compiles but I haven't tested > it. I can clean it up a bit and resend it with a signed-off-by if anyone > wants to pick it up and follow it through sooner than I can. I don't know how > boost behaves if the uuid parse fails (exception maybe?) so that would need > resolving too. > > In addition, a bunch of ceph files include the libuuid header directly, even > though all the ones I've found don't appear to need it, so they need to be > fixed for a clean compile under win32, and to remove dependency on libuuid. > There may also be other cases that need work, in particular anything that > memcpy's into the 16 byte uuid directly. See patch for MStatfsReply.h where a > minor tweak was necessary. > > (if anyone is interested, I have librados and librbd compiling under mingw32, > but I can't get boost to build its thread library so I don't get a clean > link, and there are probably other link errors too. 
I've run out of time for > doing much more on this for the moment) > > James > > diff --git a/src/include/uuid.h b/src/include/uuid.h > index 942b807..201ac76 100644 > --- a/src/include/uuid.h > +++ b/src/include/uuid.h > @@ -8,6 +8,70 @@ > #include "encoding.h" > #include > > +#if defined(_WIN32) > + > +#include > +#include > +#include > +#include > +#include > + > +struct uuid_d { > + boost::uuids::uuid uuid; > + > + uuid_d() { > +uuid = boost::uuids::nil_uuid(); > + } > + > + bool is_zero() const { > +return uuid.is_nil(); > +//return boost::uuids::uuid::is_nil(uuid); > + } > + > + void generate_random() { > +boost::uuids::random_generator gen; > +uuid = gen(); > + } > + > + bool parse(const char *s) { > +boost::uuids::string_generator gen; > +uuid 
= gen(s); > +return true; > +// what happens if parse fails? > + } > + void print(char *s) { > +std::string str = boost::lexical_cast(uuid); > +memcpy(s, str.c_str(), 37); > + } > + > + void encode(bufferlist& bl) const { > +::encode_raw(uuid, bl); > + } > + void decode(bufferlist::iterator& p) const { > +::decode_raw(uuid, p); > + } > + > + uuid_d& operator=(const uuid_d& r) { > +uuid = r.uuid; > +return *this; > + } > +}; > +WRITE_CLASS_ENCODER(uuid_d) > + > +inline std::ostream& operator<<(std::ostream& out, const uuid_d& u) { > + //char b[37]; > + //uuid_unparse(u.uuid, b); > + return out << u.uuid; > +} > + > +inline bool operator==(const uuid_d& l, const uuid_d& r) { > + return l.uuid == r.uuid; > +} > + > +inline bool operator!=(const uuid_d& l, const uuid_d& r) { > + return l.uuid != r.uuid; > +} > +#else > extern "C" { > #include > #include > @@ -56,6 +120,6 @@ inline bool operator==(const uuid_d& l, const uuid_d& r) { > inline bool operator!=(const uuid_d& l, const uuid_d& r) { > return uuid_compare(l.uuid, r.uuid) != 0; > } > - > +#endif > > #endif > diff --git a/src/messages/MStatfsReply.h b/src/messages/MStatfsReply.h > index 8ceec9c..40a5bdd 100644 > --- a/src/messages/MStatfsReply.h > +++ b/src/messages/MStatfsReply.h > @@ -22,7 +22,7 @@ public: > > MStatfsReply() : Message(CEPH_MSG_STATFS_REPLY) {} > MStatfsReply(u
RFC: object operation instruction set
The ObjectOperation interface in librados is great for performing compound atomic operations. However, it doesn't seem to be capable of expressing more complex flows. Consider the following set of operations that one might want to run atomically to optionally initialize an xattr: int ret = getxattr("foo") if (ret < 0 && ret != -ENODATA) return ret; if (ret == -ENODATA) /* do some initialization */ else /* do something else */ As it stands, one would need to build a cls_xyz module to do this. Alternatively, something like cls_lua could be used, but there are a lot of downsides to that. However, after building several cls_xyz modules, it is clear that the majority of the time is spent doing basic logic, moving some data around, and occasionally doing something like incrementing some counters. I've put a prototype solution up in github `github.com/ceph/ceph.git obj_op_virt_machine` that adds control instructions into the ObjectOperation interface. Using the interface I can express the above logic as follows: ObjectReadOperation op; // foo doesn't exist. the return value // will be placed into a named register "ret" op.getxattr("foo", bl, NULL); // jump to label if "ret" register >= 0 op.ois_jge("ret", 0, "has_attr"); // jump to label if "ret" register == -ENODATA op.ois_jeq("ret", -ENODATA, "no_attr"); // fall through to return any error in the // "ret" register. returns immediately op.ois_ret("ret"); // define a label target op.ois_label("has_attr"); /* … do some stuff … */ op.ois_ret(0); // defines a label target op.ois_label("no_attr"); /* … do initialization … */ op.ois_ret(0); ioctx.operate("obj", &op); Using only a few instructions, we can get pretty good building blocks. Adding a few to examine some data in primitive ways would add another level of usefulness, too (e.g. atomic counter increment). And this can also be made safe by ensuring that jumps are always forward, removing any problems like infinite loops. 
- Noah-- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: libuuid vs boost uuid
Oh ok, no rush. Just wanted to know if you were still hacking on it. Thanks! On Nov 13, 2013, at 1:42 PM, James Harper wrote: >> Hi James, >> >> I just wanted to follow up on this thread. I'd like to bring this patch into >> the >> wip-port portability branch. Were you able to get the boost::uuid to work as >> a drop-in replacement? >> > > I have it compiling but haven't tested. I'll send through what I have. > > James -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: libuuid vs boost uuid
Hi James, I just wanted to follow up on this thread. I’d like to bring this patch into the wip-port portability branch. Were you able to get the boost::uuid to work as a drop-in replacement? Thanks, Noah On Nov 9, 2013, at 9:22 PM, Sage Weil wrote: > On Sun, 10 Nov 2013, James Harper wrote: >>> >>> On Sat, 9 Nov 2013, James Harper wrote: Just out of curiosity (recent thread about windows port) I just had a quick go at compiling librados under mingw (win32 cross compile), and one of the errors that popped up was the lack of libuuid under mingw. Ceph appears to use libuuid, but I notice boost appears to include a uuid class too, and it seems that ceph already uses some of boost (which already builds under mingw). Is there anything special about libuuid that would mean boost's uuid class couldn't replace it? And would it be better to still use ceph's uuid.h as a wrapper around the boost uuid class, or to modify ceph to use the boost uuid class directly? >>> >>> Nice! Boost uuid looks like it would work just fine. It is probably >>> easier and less disruptive to use it from within the existing class in >>> include/uuid.h. >>> >> >> That seems to work (the header compiles at least), but then it falls >> down when things try to memcpy out of it. In particular, an fsid appears >> to be a char[16]. Is that a uuid? And is keeping it as a byte array an >> optimisation? > > Probably just being lazy; where was that? Feel free to replace the memcpy > with methods to copy in/out if it's necessary... > > sage > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ceph-users] how to use rados_exec
The cls_crypto.cc file in src/ hasn't been included in the Ceph compilation for a long time. Take a look at src/cls/* for a list of modules that are compiled. In particular, there is a "Hello World" example that is nice. These should work for you out-of-the-box. You could also try to compile cls_crypto.cc (follow the basic structure of src/cls/Makefile.am). -Noah On Tue, Nov 12, 2013 at 1:05 AM, 鹏 wrote: > Hi all! >long time no see! >I want to use the function rados_exec, and I found the class > cls_crypto.cc in the source code of ceph, > so I ran the function like this: > >rados_exec(ioctx, "foo_object", "crypto" , "md5", buf, > sizeof(buf),buf2, sizeof(buf2) ) > > and the function returned "operation not supported"! > > I checked the source of ceph, and found that cls_crypto.cc is not > built. How can I build the class and run it? > > ___ > ceph-users mailing list > ceph-us...@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: libuuid vs boost uuid
Alan, would this fix the problem on FreeBSD? IIRC libuuid is a terrible, terrible headache, with the recommended approach being to upstream changes to e2fsprogs-libuuid. On Fri, Nov 8, 2013 at 10:43 PM, Sage Weil wrote: > On Sat, 9 Nov 2013, James Harper wrote: >> Just out of curiosity (recent thread about windows port) I just had a >> quick go at compiling librados under mingw (win32 cross compile), and >> one of the errors that popped up was the lack of libuuid under mingw. >> Ceph appears to use libuuid, but I notice boost appears to include a >> uuid class too, and it seems that ceph already uses some of boost (which >> already builds under mingw). >> >> Is there anything special about libuuid that would mean boost's uuid >> class couldn't replace it? And would it be better to still use ceph's >> uuid.h as a wrapper around the boost uuid class, or to modify ceph to >> use the boost uuid class directly? > > Nice! Boost uuid looks like it would work just fine. It is probably > easier and less disruptive to use it from within the existing class in > include/uuid.h. > > sage > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: portability issue with gmtime method in utime_t
Hi James, I think my vote would be to rename `ostream& gmtime(ostream& out) const` to something like `write_gmtime`, but the global namespace approach seems OK too. Maybe someone else has a strong opinion. --- Sort of related: while we are on the subject of utime_t, `timegm` isn't portable. This is the hack I'm using in `wip-port`, but I don't think it should stay this way:

diff --git a/src/include/utime.h b/src/include/utime.h
index 5bebc70..1a74a85 100644
--- a/src/include/utime.h
+++ b/src/include/utime.h
@@ -238,6 +238,22 @@ class utime_t {
            bdt.tm_hour, bdt.tm_min, bdt.tm_sec, usec());
   }

+  static time_t my_timegm(struct tm *tm) {
+    time_t ret;
+    char *tz;
+
+    tz = getenv("TZ");
+    setenv("TZ", "", 1);
+    tzset();
+    ret = mktime(tm);
+    if (tz)
+      setenv("TZ", tz, 1);
+    else
+      unsetenv("TZ");
+    tzset();
+    return ret;
+  }
+
   static int parse_date(const string& date, uint64_t *epoch, uint64_t *nsec,
                         string *out_date=NULL, string *out_time=NULL) {
     struct tm tm;
@@ -274,7 +290,7 @@ class utime_t {
     } else {
       return -EINVAL;
     }
-    time_t t = timegm(&tm);
+    time_t t = my_timegm(&tm);
     if (epoch)
       *epoch = (uint64_t)t;

On Sat, Nov 9, 2013 at 12:24 AM, James Harper wrote: > utime.h defines a utime_t class with a gmtime() method, and also calls the > library function gmtime_r(). > > mingw implements gmtime_r() as a macro in pthread.h that in turn calls > gmtime(), and gcc bails because it gets confused about which is being called: > > utime.h: In member function 'utime_t utime_t::round_to_minute()': > utime.h:113:5: error: no matching function for call to > 'utime_t::gmtime(time_t*)' > utime.h:113:5: note: candidate is: > utime.h:146:12: note: std::ostream& utime_t::gmtime(std::ostream&) const > utime.h:146:12: note: no known conversion for argument 1 from 'time_t* {aka > long long int*}' to 'std::ostream& {aka std::basic_ostream&}' > > Same for asctime and localtime.
I can work around it by creating a static > method that in turn calls ::gmtime() etc, but I'm not sure that's the best > way to do it. > > There's a bunch of other build errors in there too so it may be a lost > cause... > > James > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
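The macro collision James describes, and the static-wrapper workaround he proposes, can be reproduced in miniature. Everything below is an illustrative sketch, not the real utime_t:

```cpp
// Sketch of the name collision and the static-wrapper workaround.
// utime_t_demo is illustrative; it is not the real utime_t.
#include <ctime>
#include <ostream>

class utime_t_demo {
  time_t sec_;
public:
  explicit utime_t_demo(time_t s) : sec_(s) {}

  // Member that shadows libc's ::gmtime inside this class scope.
  std::ostream& gmtime(std::ostream& out) const { return out; }

  // Static wrapper: always resolves to the libc function, even if a
  // platform's gmtime_r is a macro that expands to a bare gmtime() call.
  static struct tm* libc_gmtime(const time_t* t) { return ::gmtime(t); }

  int utc_hour() const {
    // A plain gmtime(&sec_) here would pick the member above and fail
    // to compile; the wrapper (or an explicit ::gmtime) is unambiguous.
    return libc_gmtime(&sec_)->tm_hour;
  }
};
```

The same pattern applies to asctime and localtime, since name lookup finds the class member before the global function in either case.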
Re: emperor leftovers
On Nov 7, 2013, at 5:47 PM, Matt W. Benjamin wrote: > MSVC is the default windows env. It's probably the ideal, despite most > requirement for moving furthest towards the windows mindset. It has better > open source tool support than you might expect. Cool, thanks for the clarification. This might be a good reason for the source code reorganization blueprint, assuming part of its goals are to be able to build Ceph in components. Being able to work on just porting, say, librados, would be nice/easier. http://wiki.ceph.com/01Planning/02Blueprints/Emperor/Source_tree_restructuring > > Matt > > - "Noah Watkins" wrote: > >> Oh, my ignorance of Windows development is enormous :) So there are >> cygwin, mingw, and msvc. And mingw “more” native than cygwin, but >> doesn’t try to do posix, and msvc just the default/native windows >> development env? >> >> On Nov 7, 2013, at 5:34 PM, Matt W. Benjamin >> wrote: >> >>> Or, MSVC, frankly. >>> >>> - "Matt W. Benjamin" wrote: >>> >>>> Yes. But you may wish to think about mingwXX porting rather than >>>> Cygwin, >>>> if you prefer native results. >>>> >>>> Matt >>>> >>>> - "Noah Watkins" wrote: >>>> >>>>> On Thu, Nov 7, 2013 at 5:15 PM, Sage Weil >> wrote: >>>>> >>>>>> curious if the discussion on windows portability is relevant >> here >>>> or >>>>> if >>>>>> it's better treated as a separate but related effort. >>>>> >>>>> The kernel space talk that's been tossed around probably isn't >>>>> relevant, but I'd be nice to learn about cygwin porting if anyone >>>> has >>>>> knowledge in this area. >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe >>>> ceph-devel" >>>>> in >>>>> the body of a message to majord...@vger.kernel.org >>>>> More majordomo info at >> http://vger.kernel.org/majordomo-info.html >>>> >>>> -- >>>> Matt Benjamin >>>> The Linux Box >>>> 206 South Fifth Ave. Suite 150 >>>> Ann Arbor, MI 48104 >>>> >>>> http://linuxbox.com >>>> >>>> tel. 734-761-4689 >>>> fax. 734-769-8938 >>>> cel. 
734-216-5309 >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe >> ceph-devel" >>>> in >>>> the body of a message to majord...@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>> -- >>> Matt Benjamin >>> The Linux Box >>> 206 South Fifth Ave. Suite 150 >>> Ann Arbor, MI 48104 >>> >>> http://linuxbox.com >>> >>> tel. 734-761-4689 >>> fax. 734-769-8938 >>> cel. 734-216-5309 > > -- > Matt Benjamin > The Linux Box > 206 South Fifth Ave. Suite 150 > Ann Arbor, MI 48104 > > http://linuxbox.com > > tel. 734-761-4689 > fax. 734-769-8938 > cel. 734-216-5309 -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: emperor leftovers
Oh, my ignorance of Windows development is enormous :) So there are cygwin, mingw, and msvc. And mingw “more” native than cygwin, but doesn’t try to do posix, and msvc just the default/native windows development env? On Nov 7, 2013, at 5:34 PM, Matt W. Benjamin wrote: > Or, MSVC, frankly. > > - "Matt W. Benjamin" wrote: > >> Yes. But you may wish to think about mingwXX porting rather than >> Cygwin, >> if you prefer native results. >> >> Matt >> >> - "Noah Watkins" wrote: >> >>> On Thu, Nov 7, 2013 at 5:15 PM, Sage Weil wrote: >>> >>>> curious if the discussion on windows portability is relevant here >> or >>> if >>>> it's better treated as a separate but related effort. >>> >>> The kernel space talk that's been tossed around probably isn't >>> relevant, but I'd be nice to learn about cygwin porting if anyone >> has >>> knowledge in this area. >>> -- >>> To unsubscribe from this list: send the line "unsubscribe >> ceph-devel" >>> in >>> the body of a message to majord...@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> -- >> Matt Benjamin >> The Linux Box >> 206 South Fifth Ave. Suite 150 >> Ann Arbor, MI 48104 >> >> http://linuxbox.com >> >> tel. 734-761-4689 >> fax. 734-769-8938 >> cel. 734-216-5309 >> -- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" >> in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > Matt Benjamin > The Linux Box > 206 South Fifth Ave. Suite 150 > Ann Arbor, MI 48104 > > http://linuxbox.com > > tel. 734-761-4689 > fax. 734-769-8938 > cel. 734-216-5309 -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: emperor leftovers
On Thu, Nov 7, 2013 at 5:15 PM, Sage Weil wrote: > curious if the discussion on windows portability is relevant here or if > it's better treated as a separate but related effort. The kernel space talk that's been tossed around probably isn't relevant, but it'd be nice to learn about cygwin porting if anyone has knowledge in this area. -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: unable to compile
This pull request (https://github.com/ceph/ceph/pull/812) reverts the patch that changed the struct initialization. Designated initializers are C99 style, but they aren't part of standard C++, and GNU supports them only as an extension. I don't have a better solution at the moment; as Greg mentioned, perhaps detecting C++11 is an option, but that means a big nasty ifdef. Another solution might be to just put the struct initialization in a C file. On Sun, Nov 3, 2013 at 2:33 PM, Xing Lin wrote: > Thanks, Noah! > > Xing > > On 11/3/2013 3:17 PM, Noah Watkins wrote: >> >> Thanks for looking at this. Unless there is a good solution I think >> reverting it is ok as breaking the compile on a few platforms is not ok. >> I'll be looking at this tonight. > > > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
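The incompatibility can be illustrated in a few lines. The struct and values below are made up, not from the Ceph tree:

```cpp
// Minimal illustration of the issue: designated initializers are C99;
// standard C++ did not get them until C++20, and GCC accepts them in
// C++ before that only as an extension -- hence the revert.
struct example_opts {  // illustrative struct, not from the Ceph tree
  int timeout;
  int retries;
};

// C99 / GNU-extension style used by the reverted patch:
//   struct example_opts o = { .timeout = 30, .retries = 5 };
// Portable C++ style (positional aggregate initialization):
const example_opts default_opts = { 30, 5 };

int get_timeout() { return default_opts.timeout; }
```

Positional aggregate initialization loses the self-documenting field names, which is presumably why the "put it in a C file" option is on the table.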
feedback on supporting libc++
Out of the box on OS X Mavericks, libc++ [1] is used as opposed to libstdc++. One of the issues is that stuff from tr1 isn't available (e.g. std::tr1::shared_ptr), as it has moved into std in C++11. A set of patches on ceph.git:wip-libc++ [2] adds initial support (with a couple of temporary hacks). These patches are very similar to the method used to support libc++ in MongoDB. I'm looking for any feedback on this patch set, or whether there is a better way forward. Summary of changes: std::tr1::shared/weak_ptr maps to ceph::shared/weak_ptr; hash_map/set maps to ceph::unordered_map/set, which will choose tr1::unordered_map/set over ext/hash_map/set. [1] http://libcxx.llvm.org/ [2] https://github.com/ceph/ceph/compare/wip-libc%2B%2B -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
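The indirection amounts to a small compat header. The sketch below uses an illustrative feature test; the real patches would key off configure-time checks rather than this exact #if:

```cpp
// Sketch of the ceph::shared_ptr indirection: one compat header picks
// the namespace, and the rest of the tree only ever names
// ceph::shared_ptr. The feature test below is illustrative.
#if __cplusplus >= 201103L || defined(_LIBCPP_VERSION)
#include <memory>
namespace ceph {
  using std::shared_ptr;
  using std::weak_ptr;
}
#else
#include <tr1/memory>
namespace ceph {
  using std::tr1::shared_ptr;
  using std::tr1::weak_ptr;
}
#endif

// Callers never mention std vs std::tr1 directly:
int use_count_demo() {
  ceph::shared_ptr<int> p(new int(42));
  ceph::shared_ptr<int> q = p;  // shared ownership; count is now 2
  return static_cast<int>(p.use_count());
}
```

The same single-choke-point trick is what makes the ceph::unordered_map/set mapping work: one header decides between tr1::unordered_map and ext/hash_map, and nothing else has to care.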
Re: Paxos vs Raft
I'm curious about what exactly the consensus requirements and assumptions are for the monitors. For instance, in the discussion between Loic and Joao, this statement: Joao: the recovery logic in our implementation tries to alleviate the burden of recovering multiple versions at the same time. We propose a version, let the peons accept it, then move on to the next version. In Ceph, we only provide one value at a time. seems to indicate that the leader is proposing changes sequentially. However, that makes Ceph's use of Paxos sound a lot like the reason for the development of the Zab protocol used in ZooKeeper: https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+vs.+Paxos Either way, as a testament to its understandability, or maybe just its cool factor, there are a lot of Raft reference implementations listed on this page: https://ramcloud.stanford.edu/wiki/display/logcabin/LogCabin On Fri, Sep 13, 2013 at 11:39 PM, Loic Dachary wrote: > Hi, > > Ceph ( http://ceph.com/ ) relies on a custom implementation of Paxos to > provide exabyte scale distributed storage. Like most people recently exposed > to Paxos, I struggle to understand it ... but will keep studying until I get > it :-) When a friend mentioned Raft ( > http://en.wikipedia.org/wiki/Raft_%28computer_science%29 ), it looked like an > easy way out. But it's very recent and I would very much appreciate your > opinion. Do you think it is a viable alternative to Paxos ? > > Cheers > > -- > Loïc Dachary, Artisan Logiciel Libre > All that is necessary for the triumph of evil is that good people do nothing. > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: subdir-objects
I'm so excited to have a refactored automake setup :) I was just looking through build-refactor, and it doesn't really look like there is much that could be reused for the non-recursive approach. I'll leave it up for a few days, just in case. On Sat, Sep 7, 2013 at 1:11 PM, Roald van Loon wrote: > On Sat, Sep 7, 2013 at 7:47 PM, Noah Watkins wrote: >> Oh, and one question about the non-recursive approach. If I stick a >> Makefile.am in the test/ directory I can do things like: >> >> LDADD = all-the-test-dependencies >> >> and then avoid redundant per-target primaries like test_LDADD = (deps) >> $(LDADD), because it applies to everything in the Makefile. >> >> Is that possible with the include approach, or would a naked LDADD in >> an included Makefile fragment affect all the targets in the file >> including it? > > LDADD = xyz would indeed affect all targets. However, it's not > something you want to do anyway; using an _LDADD at a target is less > confusing and less prone to errors because you know exactly what > libraries a target needs. > > For instance, in test/Makefile.am you can have a debug target > depending on libglobal, which has dependencies set by libglobal > itself; > > CEPH_GLOBAL = $(LIBGLOBAL) $(PTHREAD_LIBS) -lm $(CRYPTO_LIBS) $(EXTRALIBS) > > And then in test/Makefile.am; > > ceph_test_crypto_SOURCES = test/testcrypto.cc > ceph_test_crypto_LDADD = $(CEPH_GLOBAL) > bin_DEBUGPROGRAMS += ceph_test_crypto > > And a unittest also depending on libosd; > > unittest_pglog_SOURCES = test/osd/TestPGLog.cc > unittest_pglog_LDADD = $(LIBOSD) $(CEPH_GLOBAL) > check_PROGRAMS += unittest_pglog > > However, libosd requires libos and libosdc, but that dependency is set > by libosd; > > LIBOSD += $(LIBOSDC) $(LIBOS) > > This way, you have the dependencies in the right place.
With recursive > builds you'll need an "LDADD = libosd.la libosdc.la libos.la > libglobal.la $(PTHREAD_LIBS) -lm $(CRYPTO_LIBS) $(EXTRALIBS)", so > basically you're setting the dependencies of the required libraries in > the makefile requiring those libraries, which is IMHO way too complex. > >>> I think the benefits of using recursive builds are that it may be >>> familiar to most people, it reflects the methods/suggestions in >>> the automake manual, and, most importantly, it would seem that its use >>> forces good decomposition whereas a non-recursive approach relies on >>> guidelines that are easily broken. > > I don't know which method is more familiar, but I personally think > that anyone understanding recursive automake is capable of > understanding a simple include :-) > > The decomposition is a valid argument. I think that there are some > libraries which might benefit from complete separation, like librados > and a "libceph_client" or something like it. Those can be separated, > but most others can't. The mon, mds, os, and osd subdirs have > inter-dependencies for instance. > > We might need to restructure the source tree anyway because at some > points it has grown messy (for instance, libcommon including stuff > from mds, mon but also from include). However, I think implementing a > recursive automake right now forces us to do two things at once; > clean up the makefiles and do some restructuring in the subdirs. I > personally think it's best to start with cleaning up makefiles and use > an include per subdir, so we can restructure the subdirs into > segregated libraries later on. > > So it all boils down to: what to do first :-) Because I agree some > things are better off with recursive builds, but it might be wise not > to do that before we have revisited the source tree layout.
> > Roald -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
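Roald's non-recursive layout can be condensed into one sketch. The file names and variables below are illustrative, not the tree's actual contents:

```makefile
# src/Makefile.am -- the only Makefile automake generates; it pulls in
# per-directory fragments instead of recursing with SUBDIRS:
#
#   AUTOMAKE_OPTIONS = gnu subdir-objects
#   include osd/Makefile.am
#   include test/Makefile.am

# osd/Makefile.am -- a library declares its own transitive deps once:
LIBOSD += $(LIBOSDC) $(LIBOS)

# test/Makefile.am -- targets then name only their direct deps:
unittest_pglog_SOURCES = test/osd/TestPGLog.cc
unittest_pglog_LDADD = $(LIBOSD) $(CEPH_GLOBAL)
check_PROGRAMS += unittest_pglog
```

Because everything ends up in a single generated Makefile, make sees the full dependency graph, which is what enables the parallel -j and minimal-rebuild benefits Roald lists.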
Re: subdir-objects
Oh, and one question about the non-recursive approach. If I stick a Makefile.am in the test/ directory I can do things like: LDADD = all-the-test-dependencies and then avoid redundant per-target primaries like test_LDADD = (deps) $(LDADD), because it applies to everything in the Makefile. Is that possible with the include approach, or would a naked LDADD in an included Makefile fragment affect all the targets in the file including it? -Noah On Sat, Sep 7, 2013 at 10:38 AM, Noah Watkins wrote: > The non-recursive approach is interesting. I just had a quick look in > the tree I despise building the most, openmpi. It has 414 Makefile.am, > and uses recursive builds. The rebuild definitely takes a while to > visit all the sub-dirs, and is pretty annoying when my patience is low > :) > > And there is definitely a big +1 for avoiding the SUBDIRS > synchronization that slows down parallel make. > > I think the benefits of using recursive builds are that it may be > familiar to the most people, it reflects the methods/suggestions in > the automake manual, and, most importantly, it would seem that its use > forces good decomposition where as a non-recursive approach relies on > guidelines that are easily broken. > > Given that the Ceph tree is relatively small (certinaly in comparison > to the 414 directory openmpi monster), are there benefits to the > non-recursive approach that are not performance related? > > - Noah > > On Sat, Sep 7, 2013 at 1:52 AM, Roald van Loon wrote: >> Hi Noah, >> >> I just had a quick look at your build-refactor branch, and I think the >> greatest difference is that you use recursive builds and I don't. I'm >> more in favor of non-recursive builds using includes for a number of >> reasons. 
I think the most important reasons for me are; >> 1) recursive make leads to repetitive AM code >> 2) recursive make takes much more time to compile (as each directory >> needs to run configure and probably most important: you lose optimal >> -jX usage due to serialization) >> 3) non-recursive make knows all deps so rebuilding is much quicker >> (it only compiles/links what is required instead of entering all >> subdirs) >> >> There is IMHO one good reason to use recursive build, and that is >> separation of AM code. However, that can be easily achieved with >> includes and subdir-objects. >> >> I think this is the most important difference between your and my >> approach, and I'd like to hear your arguments for recursive builds so we >> can agree on recursive vs non-recursive make. Then I think it would be >> great to combine work! >> >> Roald >> >> On Fri, Sep 6, 2013 at 7:27 PM, Noah Watkins >> wrote: >>> Hi Roald, >>> >>> Sage just pointed me at your wip-automake branch. I also just pushed >>> up a branch, make-refactor, that I was hacking on a bit. Not sure how >>> much overlap there is, or if my approach is bogus, but I thought I'd >>> point it out to see if there is anything that can be combined :) >>> >>> -Noah >>> >>> On Wed, Aug 21, 2013 at 2:01 PM, Roald van Loon >>> wrote: >>>> On Wed, Aug 21, 2013 at 10:41 PM, Sage Weil wrote: >>>>> Yes, the Makefile.am is in dire need of TLC from someone who knows a >>>>> bit of autotools-fu. It is only this way because in the beginning I >>>>> didn't know any better. >>>> >>>> Well, my average knowledge of autotools could at least fix this >>>> particular issue and clean up a bit more. It's a start I guess and >>>> helps me to continue my RGW things. >>>> >>>> I'll send out a pull request when I've found some time to implement >>>> and test this.
>>>> >>>> Roald -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: subdir-objects
The non-recursive approach is interesting. I just had a quick look in the tree I despise building the most, openmpi. It has 414 Makefile.am files and uses recursive builds. The rebuild definitely takes a while to visit all the sub-dirs, and is pretty annoying when my patience is low :) And there is definitely a big +1 for avoiding the SUBDIRS synchronization that slows down parallel make. I think the benefits of using recursive builds are that it may be familiar to most people, it reflects the methods/suggestions in the automake manual, and, most importantly, it would seem that its use forces good decomposition whereas a non-recursive approach relies on guidelines that are easily broken. Given that the Ceph tree is relatively small (certainly in comparison to the 414 directory openmpi monster), are there benefits to the non-recursive approach that are not performance related? - Noah On Sat, Sep 7, 2013 at 1:52 AM, Roald van Loon wrote: > Hi Noah, > > I just had a quick look at your build-refactor branch, and I think the > greatest difference is that you use recursive builds and I don't. I'm > more in favor of non-recursive builds using includes for a number of > reasons. I think the most important reasons for me are; > > 1) recursive make leads to repetitive AM code > 2) recursive make takes much more time to compile (as each directory > needs to run configure and probably most important: you lose optimal > -jX usage due to serialization) > 3) non-recursive make knows all deps so rebuilding is much quicker > (it only compiles/links what is required instead of entering all > subdirs) > > There is IMHO one good reason to use recursive build, and that is > separation of AM code. However, that can be easily achieved with > includes and subdir-objects. > > I think this is the most important difference between your and my > approach, and I'd like to hear your arguments for recursive builds so we > can agree on recursive vs non-recursive make.
Then I think it would be > great to combine work! > > Roald > > On Fri, Sep 6, 2013 at 7:27 PM, Noah Watkins wrote: >> Hi Roald, >> >> Sage just pointed me at your wip-automake branch. I also just pushed >> up a branch, make-refactor, that I was hacking on a bit. Not sure how >> much overlap there is, or if my approach is bogus, but I thought I'd >> point it out to see if there is anything that can be combined :) >> >> -Noah >> >> On Wed, Aug 21, 2013 at 2:01 PM, Roald van Loon >> wrote: >>> On Wed, Aug 21, 2013 at 10:41 PM, Sage Weil wrote: >>>> Yes, the Makefile.am is in dire need of from TLC from someone who knows a >>>> bit of autotools-fu. It is only this way because in the beginning I >>>> didn't know any better. >>> >>> Well, my average knowledge of autotools could at least fix this >>> particular issue and clean up a bit more. It's a start I guess and >>> helps me to continue my RGW things. >>> >>> I'll send out a pull request when I've found some time to implement >>> and test this. >>> >>> Roald -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: subdir-objects
Hi Roald, Sage just pointed me at your wip-automake branch. I also just pushed up a branch, make-refactor, that I was hacking on a bit. Not sure how much overlap there is, or if my approach is bogus, but I thought I'd point it out to see if there is anything that can be combined :) -Noah On Wed, Aug 21, 2013 at 2:01 PM, Roald van Loon wrote: > On Wed, Aug 21, 2013 at 10:41 PM, Sage Weil wrote: >> Yes, the Makefile.am is in dire need of TLC from someone who knows a >> bit of autotools-fu. It is only this way because in the beginning I >> didn't know any better. > > Well, my average knowledge of autotools could at least fix this > particular issue and clean up a bit more. It's a start I guess and > helps me to continue my RGW things. > > I'll send out a pull request when I've found some time to implement > and test this. > > Roald -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Need some help with the RBD Java bindings
On Wed, Aug 21, 2013 at 11:20 PM, Wido den Hollander wrote: > > Yes, seems like a good thing to do. I wasn't sure myself when I was writing > the bindings on how the packaging should be. I'm not entirely sure either. With JNI it's pretty simple, but JNA introduces all sorts of additional classes. Do you know of any large projects using JNA? It'd be nice to find some references to examine. > One of the things I haven't tested thoroughly enough is if you as a user of > the bindings are able to crash the JVM. Since that should never happen. I think that JNA is safe in that it will prevent itself from being used incorrectly, but I don't think anything would prevent someone from say creating a pointer off into space with JNA and then passing it to a external library that would dereference it. -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: subdir-objects
On Wed, Aug 21, 2013 at 12:45 PM, Roald van Loon wrote: > > from auto-registering the plugins in the RGW core. The only fix for > this is making the RGW core aware of the subdirs/plugins, but I think > that's nasty design. I'd like to have it in my make conf. This patch will turn on the option (which should also fix your problem if I understand correctly?), and should probably be committed anyway as newer versions of autotools will complain loudly about our current Makefile structure.

diff --git a/src/Makefile.am b/src/Makefile.am
index 93f3331..fb7c9dd 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -1,4 +1,4 @@
-AUTOMAKE_OPTIONS = gnu
+AUTOMAKE_OPTIONS = gnu subdir-objects
 SUBDIRS = ocf java
 DIST_SUBDIRS = gtest ocf libs3 java

> So, the question is; is there a reason why we don't use subdir objects? I believe it is just historical, and unfortunately has just been repeated over and over. Ideally I think that there should be a restructuring to place a Makefile.am in every subdirectory. This would address your issue and make it significantly easier to deal with situations where we want to build a subset of Ceph, such as just FUSE and librados, for example. -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Need some help with the RBD Java bindings
Wido, How would you feel about creating two RbdSnapInfo objects? The first would be something like ceph.rbd.RbdSnapInfo and the second would be ceph.rbd.jna.RbdSnapInfo. The former is what will be exposed through the API, and the latter is used only internally. That should address the hacky-ness of my snap listing fix: just create a copy of the SnapInfo into the public struct. It also means we can avoid exposing users to JNA structures. On Wed, Aug 21, 2013 at 5:11 AM, Wido den Hollander wrote: > On 08/20/2013 11:26 PM, Noah Watkins wrote: >> >> Wido, >> >> I pushed up a patch to >> >> >> https://github.com/ceph/rados-java/commit/ca16d82bc5b596620609880e429ec9f4eaa4d5ce >> >> That includes a fix for this problem. The fix is a bit hacky, but the >> tests pass now. I included more details about the hack in the code. >> > > I see. Works like a charm for me now. I'll do some further testing with > CloudStack. > > Wido > >> On Thu, Aug 15, 2013 at 9:57 AM, Noah Watkins >> wrote: >>> >>> On Thu, Aug 15, 2013 at 8:51 AM, Wido den Hollander >>> wrote: >>>> >>>> >>>> public List snapList() throws RbdException { >>>> IntByReference numSnaps = new IntByReference(16); >>>> PointerByReference snaps = new PointerByReference(); >>>> List list = new ArrayList(); >>>> RbdSnapInfo snapInfo, snapInfos[]; >>>> >>>> while (true) { >>>> int r = rbd.rbd_snap_list(this.getPointer(), snaps, numSnaps); >>> >>> >>> I think you need to allocate the memory for `snaps` yourself. Here is >>> the RBD wrapper for Python which does that: >>> >>> self.snaps = (rbd_snap_info_t * num_snaps.value)() >>> ret = self.librbd.rbd_snap_list(image.image, byref(self.snaps), >>> byref(num_snaps)) >>> >>> - Noah >> >> -- >> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > > -- > Wido den Hollander > 42on B.V.
> > Phone: +31 (0)20 700 9902 > Skype: contact42on -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Need some help with the RBD Java bindings
Wido, I pushed up a patch to https://github.com/ceph/rados-java/commit/ca16d82bc5b596620609880e429ec9f4eaa4d5ce That includes a fix for this problem. The fix is a bit hacky, but the tests pass now. I included more details about the hack in the code. On Thu, Aug 15, 2013 at 9:57 AM, Noah Watkins wrote: > On Thu, Aug 15, 2013 at 8:51 AM, Wido den Hollander wrote: >> >> public List snapList() throws RbdException { >> IntByReference numSnaps = new IntByReference(16); >> PointerByReference snaps = new PointerByReference(); >> List list = new ArrayList(); >> RbdSnapInfo snapInfo, snapInfos[]; >> >> while (true) { >> int r = rbd.rbd_snap_list(this.getPointer(), snaps, numSnaps); > > I think you need to allocate the memory for `snaps` yourself. Here is > the RBD wrapper for Python which does that: > > self.snaps = (rbd_snap_info_t * num_snaps.value)() > ret = self.librbd.rbd_snap_list(image.image, byref(self.snaps), >byref(num_snaps)) > > - Noah -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Need some help with the RBD Java bindings
On Thu, Aug 15, 2013 at 8:51 AM, Wido den Hollander wrote:
>
> public List snapList() throws RbdException {
>     IntByReference numSnaps = new IntByReference(16);
>     PointerByReference snaps = new PointerByReference();
>     List list = new ArrayList();
>     RbdSnapInfo snapInfo, snapInfos[];
>
>     while (true) {
>         int r = rbd.rbd_snap_list(this.getPointer(), snaps, numSnaps);

I think you need to allocate the memory for `snaps` yourself. Here is the RBD wrapper for Python which does that:

    self.snaps = (rbd_snap_info_t * num_snaps.value)()
    ret = self.librbd.rbd_snap_list(image.image, byref(self.snaps),
                                    byref(num_snaps))

- Noah -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
blueprint follow-up: paper cuts
I couldn't find any info on ownership of items in the CDS 'paper cuts' session. Was that done / documented somewhere? Specifically: Quickstart for librados - cls_helloworld - helloworld_rados - Noah -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Rados Protocoll
On Fri, Aug 2, 2013 at 1:58 AM, Niklas Goerke wrote: > > As for the documentation you referenced: I didn't find a documentation of > the RADOS Protocol which could be used to base an implementation of librados > upon. Does anything like this exist, or would I need to "translate" the c > implementation? I do not know of any detailed documentation of the protocol except for the source :( -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: blueprint: osd: ceph on zfs
I was thinking along the lines of whether it made sense to multi-purpose the BackingFileSystem for non-Linux portability. In that case even things like posix_fallocate, xattr access, etc. might fit in well, since they may have equivalent functionality under a different name. On Sun, Aug 4, 2013 at 5:47 PM, Yan, Zheng wrote: > On Mon, Aug 5, 2013 at 7:39 AM, Noah Watkins wrote: >> It seems to make sense that fiemap should be part of the `class >> BackingFileSystem` abstraction? >> > > FS_IOC_FIEMAP is a standard API, I think no need to implement it in `class > BackingFileSystem`. > > >> On Thu, Jul 25, 2013 at 4:53 PM, Sage Weil wrote: >>> http://wiki.ceph.com/01Planning/02Blueprints/Emperor/osd:_ceph_on_zfs >>> >>> We've done some preliminary testing and xattr debugging that allows >>> ceph-osd to run on zfsforlinux using the normal writeahead journaling mode >>> (the same mode used for xfs and ext4). However, we aren't doing anything >>> special to take advantage of zfs's capabilities. >>> >>> This session would go over what is needed to make parallel journaling work >>> (which would leverage zfs snapshots). I would also like to have a >>> discussion about whether other longer-term possibilities, such as storing >>> objects directly using the DMU, make sense given what ceph-osd's >>> ObjectStore interface really needs. It might also be an appropriate time >>> to visit whether other snapshotting linux filesystems (like nilfs2) would >>> fit well into any generalization of the filestore code that comes out of >>> this effort. >>> >>> If anybody is interested in this, please add yourself to the interested >>> parties section (or claim ownership) of this blueprint!
Re: blueprint: osd: ceph on zfs
It seems to make sense that fiemap should be part of the `class BackingFileSystem` abstraction? On Thu, Jul 25, 2013 at 4:53 PM, Sage Weil wrote: > http://wiki.ceph.com/01Planning/02Blueprints/Emperor/osd:_ceph_on_zfs > > We've done some preliminary testing and xattr debugging that allows > ceph-osd to run on zfsforlinux using the normal writeahead journaling mode > (the same mode used for xfs and ext4). However, we aren't doing anything > special to take advantage of zfs's capabilities. > > This session would go over what is needed to make parallel journaling work > (which would leverage zfs snapshots). I would also like to have a > discussion about whether other longer-term possibilities, such as storing > objects directly using the DMU, make sense given what ceph-osd's > ObjectStore interface really needs. It might also be an appropriate time > to visit whether other snapshotting linux filesystems (like nilfs2) would > fit well into any generalization of the filestore code that comes out of > this effort. > > If anybody is interested in this, please add yourself to the interested > parties section (or claim ownership) of this blueprint! > > > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Rados Protocoll
Hi Niklas, The RADOS reference implementation in C++ is quite large. Reproducing it all in another language would be interesting, but I'm curious whether wrapping the C interface is an option for you? There are Java bindings being worked on here: https://github.com/wido/rados-java. There are also links on ceph.com/docs to more information about Ceph, as well as videos on YouTube and links to academic papers. -Noah On Thu, Aug 1, 2013 at 1:01 PM, Niklas Goerke wrote: > Hi, > > I was wondering why there is no native Java implementation of librados. I'm > thinking about creating one and I'm thus looking for a documentation of the > RADOS protocol. > Also the way I see it librados implements the crush algorithm. Is there a > documentation for it? > Also an educated guess about whether the RADOS Protocol is due to changes > would be very much appreciated. > > Thank you in advance > > Niklas
blueprint: ceph platform portability
http://wiki.ceph.com/01Planning/02Blueprints/Emperor/Increasing_Ceph_portability Recently I've managed to get Ceph built and running on OSX. There was a past effort to get Ceph working on non-Linux platforms, most notably FreeBSD, but that approach introduced a lot of ad-hoc macros that have made it difficult to manage the changes needed to support additional platforms. This session would address the areas within Ceph that are currently non-portable, discuss the state of OSX support, and touch on what is needed to factor out platform-specific functionality. Changes are roughly grouped into (1) internal critical (e.g. locking), (2) internal non-critical (some optimizations), and (3) exported headers. A significant amount of the OSX changes have been introduced as feature tests with generic alternatives, and as such the tree may already be near building on additional platforms, so it would be great to find people who would be willing to test on them. If you are interested please add yourself as an interested party or owner of this blueprint :)
Re: __bitwise__ annotation in inttypes
Nevermind :) I see that is for FreeBSD. On Fri, Jul 12, 2013 at 7:32 PM, Noah Watkins wrote: > The following is in include/inttypes.h > > #define __bitwise__ > > typedef __u16 __bitwise__ __le16; > typedef __u16 __bitwise__ __be16; > ... > > In Linux, the same definition of __bitwise__ is used when not being > run through Sparse. > > Is there any purpose in Ceph for _always_ removing those annotations, > as is done here?
__bitwise__ annotation in inttypes
The following is in include/inttypes.h:

    #define __bitwise__

    typedef __u16 __bitwise__ __le16;
    typedef __u16 __bitwise__ __be16;
    ...

In Linux, the same definition of __bitwise__ is used when not being run through Sparse. Is there any purpose in Ceph for _always_ removing those annotations, as is done here?
Re: assertion failure in update_from_paxos
It appears to be resolved in master now. On Tue, Jul 9, 2013 at 12:43 PM, Joao Eduardo Luis wrote: > On 07/09/2013 04:37 PM, Noah Watkins wrote: >> >> I'm getting the following failure when running a vstart instance with >> 1 of each daemon. > > > I can confirm this happens, as it just happened to me as well. > > My guess is that this is something Sage may have fixed last night, but will > have to check. > > -Joao > >> >> -- >> >> 0> 2013-07-09 08:30:43.213345 7fdc289e97c0 -1 mon/OSDMonitor.cc: In >> function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread >> 7fdc289e97c0 time 2013-07-09 08:30:43.207686 >> mon/OSDMonitor.cc: 129: FAILED assert(latest_bl.length() != 0) >> >> ceph version 0.65-307-g0e93dd9 >> (0e93dd93e5439fb82c416cb8eec7f36598ae7b48) >> 1: (OSDMonitor::update_from_paxos(bool*)+0x16bd) [0x5ad5ed] >> 2: (PaxosService::refresh(bool*)+0x143) [0x592963] >> 3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x536c87] >> 4: (Paxos::finish_proposal()+0x3a) [0x58c13a] >> 5: (Paxos::begin(ceph::buffer::list&)+0x82c) [0x58bbac] >> 6: (Paxos::propose_queued()+0xd9) [0x58be79] >> 7: (Paxos::propose_new_value(ceph::buffer::list&, Context*)+0x140) >> [0x58d230] >> 8: (PaxosService::propose_pending()+0x6c8) [0x593c28] >> 9: (PaxosService::_active()+0x58b) [0x5956fb] >> 10: (PaxosService::election_finished()+0x328) [0x595c38] >> 11: (Monitor::win_election(unsigned int, std::set> std::less, std::allocator >&, unsigned long)+0x2e5) >> [0x55ae05] >> 12: (Monitor::win_standalone_election()+0x197) [0x55b027] >> 13: (Monitor::bootstrap()+0x84b) [0x55b8fb] >> 14: (Monitor::init()+0xad) [0x55bbcd] >> 15: (main()+0x1ac7) [0x5296b7] >> 16: (__libc_start_main()+0xf5) [0x3f2dc21b75] >> 17: ./ceph-mon() [0x52e259] >> -- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > > -- > Joao Eduardo Luis > Software Engineer | 
http://inktank.com | http://ceph.com
assertion failure in update_from_paxos
I'm getting the following failure when running a vstart instance with 1 of each daemon.

--

     0> 2013-07-09 08:30:43.213345 7fdc289e97c0 -1 mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread 7fdc289e97c0 time 2013-07-09 08:30:43.207686
    mon/OSDMonitor.cc: 129: FAILED assert(latest_bl.length() != 0)

    ceph version 0.65-307-g0e93dd9 (0e93dd93e5439fb82c416cb8eec7f36598ae7b48)
    1: (OSDMonitor::update_from_paxos(bool*)+0x16bd) [0x5ad5ed]
    2: (PaxosService::refresh(bool*)+0x143) [0x592963]
    3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x536c87]
    4: (Paxos::finish_proposal()+0x3a) [0x58c13a]
    5: (Paxos::begin(ceph::buffer::list&)+0x82c) [0x58bbac]
    6: (Paxos::propose_queued()+0xd9) [0x58be79]
    7: (Paxos::propose_new_value(ceph::buffer::list&, Context*)+0x140) [0x58d230]
    8: (PaxosService::propose_pending()+0x6c8) [0x593c28]
    9: (PaxosService::_active()+0x58b) [0x5956fb]
    10: (PaxosService::election_finished()+0x328) [0x595c38]
    11: (Monitor::win_election(unsigned int, std::set, std::allocator >&, unsigned long)+0x2e5) [0x55ae05]
    12: (Monitor::win_standalone_election()+0x197) [0x55b027]
    13: (Monitor::bootstrap()+0x84b) [0x55b8fb]
    14: (Monitor::init()+0xad) [0x55bbcd]
    15: (main()+0x1ac7) [0x5296b7]
    16: (__libc_start_main()+0xf5) [0x3f2dc21b75]
    17: ./ceph-mon() [0x52e259]
Fwd: Java bindings for RADOS and RBD
Resending. HTML+vger issue. -- Forwarded message -- From: Noah Watkins Date: Thu, May 30, 2013 at 12:59 PM Subject: Re: Java bindings for RADOS and RBD To: Wido den Hollander Cc: ceph-devel On Mon, May 6, 2013 at 8:21 AM, Wido den Hollander wrote: > > > The reason to use JNA is that it allows us to simply drop the bindings and > run them without having them compiled against the librados or librbd headers. Nice. The JNA stuff is very easy to use. We originally looked at it for the CephFS bindings, but there was some concern about dependencies and licensing w.r.t. Hadoop. > > I've chosen to implement both RADOS and RBD into the same package and using > com.ceph.rados and com.ceph.rbd for as the package name. It could be splitted > into different projects, but I think that won't benefit anybody. Having it > all in one package seem easy, since RBD needs a RADOS IoCTX to work anyway. Cool. We have the com.ceph.fs and com.ceph.crush namespaces right now. > > I'd like to get some feedback on the bindings about the way they are now. > They are still work-in-progress, but my Unit Testing shows me they are in a > reasonable shape already. They look like a really good start. Here is some feedback in no particular order. 1. Enforcing rados usage assumptions Unlike with the C interface where users are expected to behave, we want to avoid ever crashing the JVM. So, stuff like "what happens if I create an IoCTX, then destroy the Rados object and use the IoCTX?" comes to mind. I think this would correspond to the GC running finalize on an out of scope Rados object. 2. Making IoCTX safer to use I've used the library now to bring the IoCTX into a completely separate RADOS project. Designing for that to be common would be great. For instance, that may be _very_ common for users of IoCTX.Exec() since they will have completely distinct libraries. A first step might be to expose a constant/read-only pointer. 
I'm not a JNA expert, but after reading a bit, it seems as though we can subclass IoCTX from Structure or maybe PointerType to make IoCTX behave. 3. Async Getting one or two async wrappers put it in the library might help reveal any challenges early on, even if the API coverage expands slowly. > Getting them into Maven Central [2] won't be that easy, so I'd like to > request space for that on ceph.com, for example ceph.com/download/maven and > users will have to manually add that to their pom.xml configuration. This would be really nice. I dunno what people expect from Maven projects that rely on native libraries. At least getting debian packages would be a good step. -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Erasure encoding as a storage backend
On May 4, 2013, at 9:51 PM, Gregory Farnum wrote: > I'm pretty sure we'd just want to use erasure-coded RADOS pools, > rather than trying to do any CephFS magic erasure encoding. Doing it > above the RADOS layers would introduce some very odd behaviors in > terms of losing objects, as you've mentioned, and requires the clients > to do a lot more network traffic for reads and writes. Cool. I was just thinking of some setups I've heard of in HPC environments where the extra client work was ostensibly worth it in terms of reducing disk heads, or something :) -Noah
Re: Erasure encoding as a storage backend
On May 4, 2013, at 11:36 AM, Loic Dachary wrote: > > > On 05/04/2013 08:27 PM, Noah Watkins wrote: >> >> On May 4, 2013, at 10:16 AM, Loic Dachary wrote: >> >>> it would be great to get feedback before the ceph summit to address the >>> most prominent issues. >> >> One thing that has been in the back of my mind is how this proposal is >> influenced (if at all) by a future that includes declustered per-file raid >> in CephFS. I realize that may be a distant future, but it seems as though >> there could be a lot of overlap for the (non-client driven) rebuild/recovery >> component of such an architecture. > > Hi Noah, > > I'm not sure what declustered per-file raid is, which means it had no > influence on this proposal ;-) Would you be so kind as to educate me ? I'm definitely far from an expert on the topic. But briefly the way I think about it is: Currently CephFS stripes a file byte stream across a set of objects (e.g. first MB in object 0, 2nd in object 1, etc..), and each of these objects is in turn replicated. Following a failure, PGs re-replicate objects. In client drive raid the striping algorithm is changed, and clients are calculating and distributing parity. In this case the parity rather than replication provides redundancy. So, one might consider storing objects in a pool with replication size 1. However, the standard PG that does replication wouldn't be able to handle faults correctly (parity rebuild, rather than re-replication), and a smart PG like the ErasureCodedPG would be needed. So it seems like the problems are related, but I'm not sure exactly how much overlap there is :) -Noah > Cheers > >> -Noah >> >> -- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > Loïc Dachary, Artisan Logiciel Libre > All that is necessary for the triumph of evil is that good people do nothing. 
Re: Erasure encoding as a storage backend
On May 4, 2013, at 10:16 AM, Loic Dachary wrote: > it would be great to get feedback before the ceph summit to address the most > prominent issues. One thing that has been in the back of my mind is how this proposal is influenced (if at all) by a future that includes declustered per-file raid in CephFS. I realize that may be a distant future, but it seems as though there could be a lot of overlap for the (non-client driven) rebuild/recovery component of such an architecture. -Noah
Re: PG statechart
Very cool! On Apr 26, 2013, at 9:21 PM, Loic Dachary wrote: > Hi Noah, > > Nice tool :-) Here is the statechart generated from PG.h. > > Cheers > > On 04/26/2013 06:07 PM, Noah Watkins wrote: >> Boost Statechart Viewer generates GraphViz: >> >> http://rtime.felk.cvut.cz/statechart-viewer/ >> >> Having trouble with my LLVM environment on 12.04, so I haven't tested it. >> >> -Noah >> >> On Apr 26, 2013, at 8:20 AM, Loic Dachary wrote: >> >>> Hi, >>> >>> I was considering drawing a statechart ( >>> http://www.math-cs.gordon.edu/courses/cs211/ATMExample/SessionStatechart.gif >>> ) to better understand the transitions of PG >>> >>> https://github.com/ceph/ceph/blob/master/src/osd/PG.h#L1341 >>> >>> and realized that it probably already exists somewhere. Does it ? >>> >>> /me hopefull ;-) >>> >>> -- >>> Loïc Dachary, Artisan Logiciel Libre >>> >> > > -- > Loïc Dachary, Artisan Logiciel Libre > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PG statechart
Boost Statechart Viewer generates GraphViz: http://rtime.felk.cvut.cz/statechart-viewer/ Having trouble with my LLVM environment on 12.04, so I haven't tested it. -Noah On Apr 26, 2013, at 8:20 AM, Loic Dachary wrote: > Hi, > > I was considering drawing a statechart ( > http://www.math-cs.gordon.edu/courses/cs211/ATMExample/SessionStatechart.gif > ) to better understand the transitions of PG > > https://github.com/ceph/ceph/blob/master/src/osd/PG.h#L1341 > > and realized that it probably already exists somewhere. Does it ? > > /me hopefull ;-) > > -- > Loïc Dachary, Artisan Logiciel Libre >
Re: erasure coding (sorry)
On Apr 18, 2013, at 2:08 PM, Josh Durgin wrote: > I talked to some folks interested in doing a more limited form of this > yesterday. They started a blueprint [1]. One of their ideas was to have > erasure coding done by a separate process (or thread perhaps). It would > use erasure coding on an object and then use librados to store the > erasure-encoded pieces in a separate pool, and finally leave a marker in > place of the original object in the first pool. This sounds at a high-level similar to work out of Microsoft: https://www.usenix.org/system/files/conference/atc12/atc12-final181_0.pdf The basic idea is to replicate first, then erasure code in the background. - Noah
Re: CephFS locality API RFC
On Mar 14, 2013, at 12:39 PM, Sage Weil wrote: > Unless those old bindings are already broken because of the preferred osd > thing… Well, for preferred_pg EOPNOTSUPP will be ignored by the old bindings, so I guess it still works :)
Re: CephFS locality API RFC
On Mar 14, 2013, at 11:29 AM, Greg Farnum wrote: > On Thursday, March 14, 2013 at 11:14 AM, Noah Watkins wrote: >> The current CephFS API is used to extract locality information as follows: >> >> First we get a list of OSD IDs: >> >> ceph_get_file_extent_osds(offset) -> [OSD ID]* >> >> Using the OSD IDs we can then query for the CRUSH bucket hierarchy: >> >> ceph_get_osd_crush_location(osd_id) -> path >> >> The path includes hostname information, but we'd still like to get the IP. >> The current API for doing this is: >> >> ceph_get_file_stripe_address(offset) -> [sockaddr]* >> >> that returns an IP for each OSD that holds replicas. The order of the output list >> should be the same as the OSD list, but it'd be nice to have a >> consistent API that deals with OSD id, making the correspondence explicit. > Agreed. We should probably deprecate the get_file_stripe_address() and make > them turn IDs into addresses on their own. Is there an API deprecation protocol, or just -ENOTSUPP? >> For instance: >> >> ceph_get_file_stripe_address(osd_id) -> sockaddr > How about > ceph_get_osd_address(osd_id) -> sockaddr > ;) Sounds good. Thanks!
CephFS locality API RFC
The current CephFS API is used to extract locality information as follows. First we get a list of OSD IDs:

    ceph_get_file_extent_osds(offset) -> [OSD ID]*

Using the OSD IDs we can then query for the CRUSH bucket hierarchy:

    ceph_get_osd_crush_location(osd_id) -> path

The path includes hostname information, but we'd still like to get the IP. The current API for doing this is:

    ceph_get_file_stripe_address(offset) -> [sockaddr]*

which returns an IP for each OSD that holds replicas. The order of the output list should be the same as the OSD list, but it'd be nice to have a consistent API that deals with OSD ids, making the correspondence explicit. For instance:

    ceph_get_file_stripe_address(osd_id) -> sockaddr

Another option is to have `ceph_get_osd_crush_location` return both the path and a sockaddr. Thoughts? Thanks, Noah
Re: MDS running at 100% CPU, no clients
On Mar 7, 2013, at 9:24 AM, Greg Farnum wrote: > This isn't bringing up anything in my brain, but I don't know what that > _sample() function is actually doing — did you get any farther into it? _sample reads /proc/self/maps in a loop until eof or some other conditions. i couldn't figure out if the thread was stuck in _sample or a level up. Anyhow, my gdb-foo isn't stellar and I managed to crash the mds. I'm gonna stick some log points in and try to reproduce it. > -Greg > > On Wednesday, March 6, 2013 at 6:23 PM, Noah Watkins wrote: > >> Which, looks to be in a tight loop in the memory model _sample… >> >> (gdb) bt >> #0 0x7f0270d84d2d in read () from /lib/x86_64-linux-gnu/libpthread.so.0 >> #1 0x7f027046dd88 in std::__basic_file::xsgetn(char*, long) () >> from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 >> #2 0x7f027046f4c5 in std::basic_filebuf >> >::underflow() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 >> #3 0x7f0270467ceb in std::basic_istream >& >> std::getline, std::allocator >> >(std::basic_istream >&, >> std::basic_string, std::allocator >&, >> char) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 >> #4 0x0072bdd4 in MemoryModel::_sample(MemoryModel::snap*) () >> #5 0x005658db in MDCache::check_memory_usage() () >> #6 0x004ba929 in MDS::tick() () >> #7 0x00794c65 in SafeTimer::timer_thread() () >> #8 0x0000007958ad in SafeTimerThread::entry() () >> #9 0x7f0270d7de9a in start_thread () from >> /lib/x86_64-linux-gnu/libpthread.so.0 >> >> On Mar 6, 2013, at 6:18 PM, Noah Watkins > (mailto:jayh...@cs.ucsc.edu)> wrote: >> >>> >>> On Mar 6, 2013, at 5:57 PM, Noah Watkins >> (mailto:jayh...@cs.ucsc.edu)> wrote: >>> >>>> The MDS process in my cluster is running at 100% CPU. In fact I thought >>>> the cluster came down, but rather an ls was taking a minute. There aren't >>>> any clients active. 
I've left the process running in case there is any >>>> probing you'd like to do on it: >>>> >>>> virt res cpu >>>> 4629m 88m 5260 S 92 1.1 113:32.79 ceph-mds >>>> >>>> Thanks, >>>> Noah >>> >>> >>> >>> >>> This is a ceph-mds child thread under strace. The only thread >>> that appears to be doing anything. >>> >>> root@issdm-44:/home/hadoop/hadoop-common# strace -p 3372 >>> Process 3372 attached - interrupt to quit >>> read(1649, "7f0203235000-7f0203236000 ---p 0"..., 8191) = 4050 >>> read(1649, "7f0205053000-7f0205054000 ---p 0"..., 8191) = 4050 >>> read(1649, "7f0206e71000-7f0206e72000 ---p 0"..., 8191) = 4050 >>> read(1649, "7f0214144000-7f0214244000 rw-p 0"..., 8191) = 4020 >>> read(1649, "7f0215f62000-7f0216062000 rw-p 0"..., 8191) = 4020 >>> read(1649, "7f0217d8-7f0217e8 rw-p 0"..., 8191) = 4020 >>> read(1649, "7f0219b9e000-7f0219c9e000 rw-p 0"..., 8191) = 4020 >>> ... >>> >>> That file looks to be: >>> >>> ceph-mds 3337 root 1649r REG 0,3 0 266903 /proc/3337/maps >>> >>> (3337 is the parent process). >> >> -- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> the body of a message to majord...@vger.kernel.org >> (mailto:majord...@vger.kernel.org) >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: MDS running at 100% CPU, no clients
Which, looks to be in a tight loop in the memory model _sample… (gdb) bt #0 0x7f0270d84d2d in read () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x7f027046dd88 in std::__basic_file::xsgetn(char*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #2 0x7f027046f4c5 in std::basic_filebuf >::underflow() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #3 0x7f0270467ceb in std::basic_istream >& std::getline, std::allocator >(std::basic_istream >&, std::basic_string, std::allocator >&, char) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #4 0x0072bdd4 in MemoryModel::_sample(MemoryModel::snap*) () #5 0x005658db in MDCache::check_memory_usage() () #6 0x004ba929 in MDS::tick() () #7 0x00794c65 in SafeTimer::timer_thread() () #8 0x007958ad in SafeTimerThread::entry() () #9 0x7f0270d7de9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 On Mar 6, 2013, at 6:18 PM, Noah Watkins wrote: > > On Mar 6, 2013, at 5:57 PM, Noah Watkins wrote: > >> The MDS process in my cluster is running at 100% CPU. In fact I thought the >> cluster came down, but rather an ls was taking a minute. There aren't any >> clients active. I've left the process running in case there is any probing >> you'd like to do on it: >> >> virt res cpu >> 4629m 88m 5260 S 92 1.1 113:32.79 ceph-mds >> >> Thanks, >> Noah >> > > > This is a ceph-mds child thread under strace. The only thread > that appears to be doing anything. > > root@issdm-44:/home/hadoop/hadoop-common# strace -p 3372 > Process 3372 attached - interrupt to quit > read(1649, "7f0203235000-7f0203236000 ---p 0"..., 8191) = 4050 > read(1649, "7f0205053000-7f0205054000 ---p 0"..., 8191) = 4050 > read(1649, "7f0206e71000-7f0206e72000 ---p 0"..., 8191) = 4050 > read(1649, "7f0214144000-7f0214244000 rw-p 0"..., 8191) = 4020 > read(1649, "7f0215f62000-7f0216062000 rw-p 0"..., 8191) = 4020 > read(1649, "7f0217d8-7f0217e8 rw-p 0"..., 8191) = 4020 > read(1649, "7f0219b9e000-7f0219c9e000 rw-p 0"..., 8191) = 4020 > ... 
> > That file looks to be: > > ceph-mds 3337 root 1649r REG0,30 266903 /proc/3337/maps > > (3337 is the parent process). -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: MDS running at 100% CPU, no clients
On Mar 6, 2013, at 5:57 PM, Noah Watkins wrote: > The MDS process in my cluster is running at 100% CPU. In fact I thought the > cluster came down, but rather an ls was taking a minute. There aren't any > clients active. I've left the process running in case there is any probing > you'd like to do on it: > > virt res cpu > 4629m 88m 5260 S 92 1.1 113:32.79 ceph-mds > > Thanks, > Noah > This is a ceph-mds child thread under strace. The only thread that appears to be doing anything. root@issdm-44:/home/hadoop/hadoop-common# strace -p 3372 Process 3372 attached - interrupt to quit read(1649, "7f0203235000-7f0203236000 ---p 0"..., 8191) = 4050 read(1649, "7f0205053000-7f0205054000 ---p 0"..., 8191) = 4050 read(1649, "7f0206e71000-7f0206e72000 ---p 0"..., 8191) = 4050 read(1649, "7f0214144000-7f0214244000 rw-p 0"..., 8191) = 4020 read(1649, "7f0215f62000-7f0216062000 rw-p 0"..., 8191) = 4020 read(1649, "7f0217d8-7f0217e8 rw-p 0"..., 8191) = 4020 read(1649, "7f0219b9e000-7f0219c9e000 rw-p 0"..., 8191) = 4020 ... That file looks to be: ceph-mds 3337 root 1649r REG0,30 266903 /proc/3337/maps (3337 is the parent process).-- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
MDS running at 100% CPU, no clients
The MDS process in my cluster is running at 100% CPU. In fact I thought the cluster came down, but rather an ls was taking a minute. There aren't any clients active. I've left the process running in case there is any probing you'd like to do on it:

    virt res cpu
    4629m 88m 5260 S 92 1.1 113:32.79 ceph-mds

Thanks, Noah
Re: Approaches to wrapping aio_exec
So I've been playing with the ObjectOperationCompletion code a bit. It seems to be really important to be able to handle decoding errors in in the handle_completion() callback. In particular, I'd like to be able to reach out and set the return value the user will see in the AioCompletion. Any thoughts on dealing with this some how? -Noah On Mar 4, 2013, at 11:44 AM, Yehuda Sadeh wrote: > On Mon, Mar 4, 2013 at 11:34 AM, Noah Watkins wrote: >> >> On Mar 3, 2013, at 6:31 PM, Yehuda Sadeh wrote: >> >>> I pushed the wip-librados-exec branch last week that solves a similar >>> issue. I added two more ObjectOperation::exec() api calls. The more >>> interesting one added a callback context that is called with the >>> output buffer of the completed sub-op. Currently in order to use it >>> you'll need to use operate()/aio_operate(), however, a similar >>> aio_exec interface can be added. >> >> Thanks for the pointer to the branch. So, if I understand correctly, >> we might have a new librados::aio_exec_completion call that accepts >> a completion object? For example: >> >> aio_exec_completion(AioCompletion *c, bufferlist *outbl, >>ObjectOperationCompletion* completion) >> { >> Context *onack = new C_aio_Ack(c); >> >> ::ObjectOperation rd; >> ObjectOpCompletionCtx *ctx = new ObjectOpCompletionCtx(completion); >> rd.call(cls, method, inbl, ctx->outbl, ctx, NULL); >> objecter->read(oid, oloc, rd, snap_seq, outbl, 0, onack, &c->objver); >> >> return 0; >> } >> >> where the caller would provide an ObjectOperationCompletion where it's >> finish(..) would unwrap the protocol? > > Right. > >> >> Do you expect wip-librados-exec going up stream pretty soon, and would > > We can push it ahead if needed, it doesn't depend on any of the stuff > I'm working on right now. It just waits for someone to properly review > it. > >> something like librados::aio_exec_completion be a candidate for adding >> to librados? >> > > Sure, if there's a need then I don't see why not. 
> > Yehuda -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Approaches to wrapping aio_exec
On Mar 3, 2013, at 6:31 PM, Yehuda Sadeh wrote: > I pushed the wip-librados-exec branch last week that solves a similar > issue. I added two more ObjectOperation::exec() api calls. The more > interesting one added a callback context that is called with the > output buffer of the completed sub-op. Currently in order to use it > you'll need to use operate()/aio_operate(), however, a similar > aio_exec interface can be added. Thanks for the pointer to the branch. So, if I understand correctly, we might have a new librados::aio_exec_completion call that accepts a completion object? For example: aio_exec_completion(AioCompletion *c, bufferlist *outbl, ObjectOperationCompletion* completion) { Context *onack = new C_aio_Ack(c); ::ObjectOperation rd; ObjectOpCompletionCtx *ctx = new ObjectOpCompletionCtx(completion); rd.call(cls, method, inbl, ctx->outbl, ctx, NULL); objecter->read(oid, oloc, rd, snap_seq, outbl, 0, onack, &c->objver); return 0; } where the caller would provide an ObjectOperationCompletion where it's finish(..) would unwrap the protocol? Do you expect wip-librados-exec going up stream pretty soon, and would something like librados::aio_exec_completion be a candidate for adding to librados? Thanks, Noah -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Approaches to wrapping aio_exec
I've built a custom protocol on top of Rados::exec that uses serialized versions of InputObject and OutputObject to implement the protocol. Here's simple pseudo-code that provides my_service::exec:

void my_service::exec(oid, input_params, bufferlist& out)
{
  bufferlist inbl, outbl;

  InputObject in(input_params);
  ::encode(in, inbl);

  librados::exec(oid, inbl, outbl);

  OutputObject reply;
  ::decode(reply, outbl);

  out = reply.payload;
}

I'd like to provide a my_service::aio_exec that wraps librados::aio_exec, but doing so doesn't seem to be straightforward with the current async interface. Notice in the above example that the caller's output bufferlist must be unpacked from the reply protocol. However, the librados::aio_exec interface unpacks the output directly into the caller parameter:

int librados::aio_exec(oid, …, bufferlist *outbl)
{
  ...
  objecter->read(oid, oloc, rd, snap_seq, outbl, 0, onack, &c->objver);
}

What's needed is an intermediate bufferlist that can be decoded when the data is available. One way to do this would be to wrap AioCompletion and intercept the ack and safe callbacks to do the data unpacking. The problem with this is that we have to introduce a new AioCompletion type, or a new AioCompletionImpl. I started to do this, but it feels quite clunky.

Any suggestions for handling this?

Thanks, Noah
Re: [Ceph - Feature #4230] (New) librados: node.js bindings
On Feb 23, 2013, at 12:49 PM, Alexandre Marangone wrote: > Of course, thoughts and suggestions are welcome. Oo, this is nice. A while back I built a cls_v8 object class to play with, but never got around to working on a front-end. The motivation was to be able to write things like this in the client: App.js -- function plugin() { function helper(args) { /* do stuff */ } function handler1(input, output) { /* do stuff */ } function handler2(input, output) { /* do stuff */ } } /* serialize the plugin as a string */ var myplugin = plugin.toString(); rados.ioctx_create('mypool', ioctx, function (err) { if (err) throw err; command = { 'script': myplugin, 'handler': 'handler1', 'input': 'hello', }; /* run the plugin in OSD using V8 JIT compilation */ ioctx.exec('myobject1', 'cls_v8', 'eval', command); }); - Noah > -- > Alexandre > > On Fri, Feb 22, 2013 at 2:42 AM, wrote: >> Issue #4230 has been reported by Wido den Hollander. >> >> >> Feature #4230: librados: node.js bindings >> >> Author: Wido den Hollander >> Status: New >> Priority: Low >> Assignee: >> Category: librados >> Target version: >> Source: Community (dev) >> Tags: >> Reviewed: >> >> Although I don't have a use-case at this specific point it would be very >> cool to have node.js bindings. >> >> From the docs it seems pretty simple to write these bindings and make cool >> stuff with it. >> >> I'm just opening the feature here so that it shows up and can be picked up: >> >> Some reference docs: >> >> http://nodejs.org/api/addons.html#addons_wrapping_c_objects >> http://www.slideshare.net/nsm.nikhil/writing-native-bindings-to-nodejs-in-c >> https://github.com/nikhilm/jsfoo-pune-2012 >> >> Without bindings you would be able to use node-ffi, but I think native >> bindings would be cleaner: https://github.com/rbranson/node-ffi >> >> >> >> You have received this notification because you have either subscribed to >> it, or are involved in it. 
>> To change your notification preferences, please click here: >> http://tracker.ceph.com/my/account > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hadoop release jars
Hi all, I've pushed up some changes that let us build stand-alone jar files for the Hadoop CephFS bindings. github.com/ceph/hadoop-common.git cephfs/branch-1.0-build-jar Running "ant cephfs" will produce "build/hadoop-cephfs.jar". I've tested it locally and things work well, so hopefully this means we can continue to develop in the hadoop-common tree and produce jar releases for whatever version combinations we care about. Is this something that should be easy to integrate into gitbuilder so we can link off the documentation page to the release jars? -Noah -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Hadoop DNS/topology details
On Feb 20, 2013, at 9:57 AM, Noah Watkins wrote:
>
>   vector<int> osds;
> }
>
> ceph_get_file_extents(file, offset, length, vector<extent>& extents);
>
> Then we could re-use the Striper or something?
>
> -Noah

Although, I think your previous suggestion would be much simpler to do for the C api :)
Re: Hadoop DNS/topology details
On Feb 20, 2013, at 9:31 AM, Sage Weil wrote:
>> or something like this that replaces the current extent-to-sockaddr
>> interface? The proposed interface above would do the host/ip mapping, as
>> well as the topology mapping?
>
> Yeah. The ceph_offset_to_osds should probably also have an (optional?)
> out argument that tells you how long the extent is starting from offset
> that is on those devices. Then you can do another call at offset+len to
> get the next segment.

It'd be nice to hide the striping strategy so we don't have to reproduce it in the Hadoop shim as we currently do, and which is needed with an interface using only an offset (we have to know the stripe unit to jump to the next extent). So, something like this might work:

struct extent {
  loff_t offset, length;
  vector<int> osds;
}

ceph_get_file_extents(file, offset, length, vector<extent>& extents);

Then we could re-use the Striper or something?

-Noah
Re: Hadoop DNS/topology details
On Feb 19, 2013, at 4:39 PM, Sage Weil wrote:
> However, we do have host and rack information in the crush map, at least
> for non-customized installations. How about something like
>
> string ceph_get_osd_crush_location(int osd, string type);
>
> or similar. We could call that with "host" and "rack" and get exactly
> what we need, without making any changes to the data structures.

This would then be used in conjunction with an interface:

ceph_offset_to_osds(offset, vector<int>& osds)
  ... osdmap->pg_to_acting_osds(osds) ...

or something like this that replaces the current extent-to-sockaddr interface? The proposed interface above would do the host/ip mapping, as well as the topology mapping?

- Noah
Re: Hadoop DNS/topology details
On Feb 19, 2013, at 2:22 PM, Gregory Farnum wrote: > On Tue, Feb 19, 2013 at 2:10 PM, Noah Watkins wrote: > > That is just truly annoying. Is this described anywhere in their docs? Not really. It's just there in the code--I can figure out the metric if you're interested. I suspect it is local node, local rack, off rack ordering, with no special tie breakers. > I don't think it would be hard to sort, if we had some mechanism for > doing so (crush map nearness, presumably?), Topology information from the bucket hierarchy? I think it's always some sort of heuristic. >> 1. Expand CephFS interface to return IP and hostname > > Ceph doesn't store hostnames anywhere — it really can't do this. All > it has is IPs associated with OSD ID numbers. :) Adding hostnames > would be a monitor and map change, which we could do, but given the > issues we've had with hostnames in other contexts I'd really rather > not. What is the fate of hostnames used in ceph.conf? Could that information be leveraged, when specified by the cluster admin? -Noah-- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hadoop DNS/topology details
Here is the information that I've found so far regarding the operation of Hadoop w.r.t. DNS/topology. There are two parts, the file system client requirements, and other consumers of topology information. -- File System Client -- The relevant interface between the Hadoop VFS and its underlying file system is: FileSystem:getFileBlockLocations(File, Extent) which is expected to return a list of hosts (a 3-tuple: hostname, IP, topology path) for each block that contains any part of the specified file extent. So, with triplication and 2 blocks, there are 2 * 3 = 6 3-tuples present. *** Note: HDFS sorts each list of hosts based on a distance metric applied between the initiating file system client and each of the blocks in the list using the HDFS cluster map. This should not affect correctness, although it's possible that consumers of this list (e.g. MapReduce) may assume an ordering. *** The current Ceph client can produce the same list, but does not include hostname nor topology information. Currently reverse DNS is used to fill in the hostname, and defaults to a flat topology in which all hosts are in a single topology path: "/default-rack/host". - Reverse DNS could be quite slow: - 3x replication * 1 TB / 64 MB blocks = 49152 lookups - Caching lookups could help -- Topology Information -- Services that run on a Hadoop cluster (such as MapReduce) use hostname and topology information attached to each file system block to schedule and aggregate work based on various policies. These services don't have direct access to the HDFS cluster map, and instead rely on a service to provide a mapping: DNS-names/IP -> topology path mapping This can be performed using a script/utility program that will perform bulk translations, or implemented in Java. -- A Possible Approach -- 1. Expand CephFS interface to return IP and hostname 2. 
Build a Ceph tool to perform DNS-name/IP -> topology path mapping Using (2) from the Hadoop shim we can perform distance sorting, as well as resolve the topology information. The tool will also be used by other Hadoop services that can make use of the topology. This would seem like a good incremental step forward. There are a _lot_ of other analytics systems out there that might be interested in running on top of Ceph, including the next-generation Hadoop releases, all of which may have slightly different requirements. So wedding ourselves to an expansion of the CephFS API at this point might be a little premature. On the other hand, providing all information now should cover our bases later :) - Noah-- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Links to various language bindings
On Feb 9, 2013, at 1:50 AM, Wido den Hollander wrote: > Hi Noah, > > On 02/08/2013 04:42 PM, Noah Watkins wrote: >> >> On Feb 8, 2013, at 1:06 AM, Wido den Hollander wrote: >> >>> Hi, >>> >>> I knew that there were Java bindings for RADOS, but they weren't linked. >>> >>> Well, some searching on Github lead me to Noah's bindings [0], but it was a >>> bit of searching. >> >> The RADOS Java bindings on my Github page should be taken down--they are >> very old and not stable. Any links to it should probably be taken down. If >> there is interest in the RADOS Java bindings, I'd be happy to get that ball >> rolling. >> > > Could you then update the README? Sure. > I did a quick test yesterday with them and they worked fine, but I just > created a Cluster object and a pool, didn't do anything fancy. Yes, they do "work", and in fact I think some people might be playing with them. They are however full of various problems, and likely need a re-write. I wouldn't rely on them for anything important. - Noah -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Links to various language bindings
On Feb 8, 2013, at 1:06 AM, Wido den Hollander wrote:
> Hi,
>
> I knew that there were Java bindings for RADOS, but they weren't linked.
>
> Well, some searching on Github led me to Noah's bindings [0], but it was a
> bit of searching.

The RADOS Java bindings on my Github page should be taken down--they are very old and not stable. Any links to it should probably be taken down. If there is interest in the RADOS Java bindings, I'd be happy to get that ball rolling.

- Noah
Re: [PATCH] configure.ac: check for org.junit.rules.ExternalResource
On Tue, Jan 15, 2013 at 1:32 AM, Danny Al-Gaaf wrote: > Am 15.01.2013 10:04, schrieb James Page: >> On 12/01/13 16:36, Noah Watkins wrote: >>> On Thu, Jan 10, 2013 at 9:13 PM, Gary Lowell >>> wrote: > > I would also prefer to not add another huge build dependency to ceph, > especially since it's e.g. not supported by SLES11 and since ceph > currently builds fine (even with these small warnings from autotools). Ahh, I had in my head a separate repository for Java bindings managed by maven (or ant). Either way, I have no strong opinion -- we only have one junit dependency :) -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] configure.ac: check for org.junit.rules.ExternalResource
On Thu, Jan 10, 2013 at 9:13 PM, Gary Lowell wrote: > > Thanks Danny. Installing sharutils solved that minor issue. We now get > though the build just fine on opensuse 12, but sles 11sp2 gives more warnings > (pasted below). Should we be using a newer version of autoconf on sles? > I've tried moving AC_CANONICAL_TARGET earlier in the file, but that causes > some other issues with the new java macros. We could also move away from using autoconf/automake for Java, and use a packaging/dependency system designed for Java, like Maven. - Noah -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] configure.ac: check for org.junit.rules.ExternalResource
I haven't tested this yet, but I like it. I think several of these macros can be used to simplify a bit more of the Java config bit. I also just saw the ax_jni_include_dir macro in the autoconf archive and it looks like that can help clean-up too. On Wed, Jan 9, 2013 at 1:35 PM, Danny Al-Gaaf wrote: > The attached patch depends on the set of 6 patches I send some days ago. > See: http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/11793 > > Danny Al-Gaaf (1): > configure.ac: check for org.junit.rules.ExternalResource > > autogen.sh| 2 +- > configure.ac | 29 ++--- > m4/ac_check_class.m4 | 108 > ++ > m4/ac_check_classpath.m4 | 24 +++ > m4/ac_check_rqrd_class.m4 | 26 +++ > m4/ac_java_options.m4 | 33 ++ > m4/ac_prog_jar.m4 | 39 + > m4/ac_prog_java.m4| 83 +++ > m4/ac_prog_java_works.m4 | 98 + > m4/ac_prog_javac.m4 | 45 +++ > m4/ac_prog_javac_works.m4 | 36 > m4/ac_prog_javah.m4 | 28 > m4/ac_try_compile_java.m4 | 40 + > m4/ac_try_run_javac.m4| 41 ++ > 14 files changed, 615 insertions(+), 17 deletions(-) > create mode 100644 m4/ac_check_class.m4 > create mode 100644 m4/ac_check_classpath.m4 > create mode 100644 m4/ac_check_rqrd_class.m4 > create mode 100644 m4/ac_java_options.m4 > create mode 100644 m4/ac_prog_jar.m4 > create mode 100644 m4/ac_prog_java.m4 > create mode 100644 m4/ac_prog_java_works.m4 > create mode 100644 m4/ac_prog_javac.m4 > create mode 100644 m4/ac_prog_javac_works.m4 > create mode 100644 m4/ac_prog_javah.m4 > create mode 100644 m4/ac_try_compile_java.m4 > create mode 100644 m4/ac_try_run_javac.m4 > > -- > 1.8.1 > > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Usage of CEPH FS versa HDFS for Hadoop: TeraSort benchmark performance comparison issue
Hi Jutta,

On Wed, Jan 9, 2013 at 7:11 AM, Lachfeld, Jutta wrote:
>
> the current content of the web page http://ceph.com/docs/master/cephfs/hadoop
> shows a configuration parameter ceph.object.size.
> Is it the CEPH equivalent to the "HDFS block size" parameter which I have
> been looking for?

Yes. By specifying ceph.object.size, Hadoop will use a default Ceph file layout with stripe unit = object size, and stripe count = 1. This is effectively the same meaning as dfs.block.size for HDFS.

> Does the parameter ceph.object.size apply to version 0.56.1?

The Ceph/Hadoop file system plugin is being developed here:

  git://github.com/ceph/hadoop-common cephfs/branch-1.0

There is an old version of the Hadoop plugin in the Ceph tree which will be removed shortly. Regarding the versions, development is taking place in cephfs/branch-1.0 and in ceph.git master. We don't yet have a system in place for dealing with compatibility across versions because the code is in heavy development. If you are running 0.56.1 then a recent version of cephfs/branch-1.0 should work with that, but may not for long, as development continues.

> I would be interested in setting this parameter to values higher than 64MB,
> e.g. 256MB or 512MB similar to the values I have used for HDFS for increasing
> the performance of the TeraSort benchmark. Would these values be allowed and
> would they at all make sense for the mechanisms used in CEPH?

I can't think of any reason why a large size would cause concern, but maybe someone else can chime in?

- Noah
Re: Usage of CEPH FS versa HDFS for Hadoop: TeraSort benchmark performance comparison issue
The bindings use the default Hadoop settings (e.g. 64 or 128 MB chunks) when creating new files. The chunk size can also be specified on a per-file basis using the same interface as Hadoop. Additionally, while Hadoop doesn't provide an interface to configuration parameters beyond chunk size, we will also let users fully configure for any Ceph striping strategy. http://ceph.com/docs/master/dev/file-striping/ -Noah On Thu, Dec 13, 2012 at 12:27 PM, Gregory Farnum wrote: > On Thu, Dec 13, 2012 at 12:23 PM, Cameron Bahar wrote: >> Is the chunk size tunable in A Ceph cluster. I don't mean dynamic, but even >> statically configurable when a cluster is first installed? > > Yeah. You can set chunk size on a per-file basis; you just can't > change it once the file has any data written to it. > In the context of Hadoop the question is just if the bindings are > configured correctly to do so automatically. > -Greg > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ceph-commit] [ceph/ceph] e6a154: osx: compile on OSX
On Sun, Dec 9, 2012 at 10:05 AM, Gregory Farnum wrote:
> Oooh, very nice! Do you have a list of the dependencies that you actually
> needed to install?

I can put that together. They were boost, gperf, fuse4x, cryptopp. I think that might have been it.

> Apart from breaking this up into smaller patches, we'll also want to reformat
> some of it. Rather than sticking an #if APPLE on top of every spin lock, we
> should have utility functions that do this for us. ;)

Definitely. OSX has spinlock implementations for user space, but it's going to take some reading. For example, spinlocks in Ceph are initialized for shared memory, rather than the default private. It isn't clear from documentation what the semantics are of OSX spinlocks, nor is it clear if the shared memory attribute is needed.

> Also, we should be able to find libatomic_ops for OS X (its parent project
> works under OS X), and we can use that to construct a spin lock if we think
> it'll be useful. I'm not too sure how effective its mutexes are at
> spinlock-y workloads.

This patch set uses the OSX atomic inc/dec ops, rather than spinlocks.

Another fun fact: msg/Pipe.cc and common/pipe.c are compiled into libcommon_la-Pipe.o and libcommon_la-pipe.o, but HFS+ is case-insensitive by default. The result is duplicate symbols. That took a while to figure out :P

-Noah
Review request: wip-localized-read-tests
I've pushed up patches for the first phase of testing read from replica functionality, which looks only at objecter/client level ops: wip-localized-read-tests The major points are: 1. Run libcephfs tests w/ and w/o localized reads enabled 2. Add the performance counter in Objecter to record ops sent to replica 3. Add performance counter accessor in unit tests Locally I have verified that the performance counters are working with a 3 OSD setup, although there are not yet any unit tests that try to specifically assert a positive value on the counters. Thanks, Noah -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Client crash on getcwd with non-default root mount
Here is the full test case: TEST(LibCephFS, MountRootChdir) { struct ceph_mount_info *cmount; /* create mount and new directory */ ASSERT_EQ(ceph_create(&cmount, NULL), 0); ASSERT_EQ(ceph_conf_read_file(cmount, NULL), 0); ASSERT_EQ(ceph_mount(cmount, "/"), 0); ASSERT_EQ(ceph_mkdir(cmount, "/xyz", 0700), 0); ceph_shutdown(cmount); /* create mount with non-"/" root */ ASSERT_EQ(ceph_create(&cmount, NULL), 0); ASSERT_EQ(ceph_conf_read_file(cmount, NULL), 0); ASSERT_EQ(ceph_mount(cmount, "/xyz"), 0); /* should be at "root" directory, but blows up */ ASSERT_STREQ(ceph_getcwd(cmount), "/"); } On Thu, Nov 29, 2012 at 12:02 PM, Noah Watkins wrote: > Oh, let me clarify. /otherdir exists, and the mount succeeds. > > - Noah > > On Thu, Nov 29, 2012 at 11:58 AM, Sam Lang wrote: >> On 11/29/2012 01:52 PM, Noah Watkins wrote: >>> >>> I'm getting the assert failure below with the following test: >>> >>>ceph_mount(cmount, "/otherdir"); >> >> >> This should fail with ENOENT if you check the return code. >> -sam >> >>>ceph_getcwd(cmount); >>> >>> -- >>> >>> client/Inode.h: In function 'Dentry* Inode::get_first_parent()' thread >>> 7fded47c8780 time 2012-11-29 11:49:00.890184 >>> client/Inode.h: 165: FAILED assert(!dn_set.empty()) >>> ceph version 0.54-808-g1ed5a1f >>> (1ed5a1f984d8260d86cc25b1ae95ffedf597e579) >>> 1: (()+0x11ee89) [0x7fded36fae89] >>> 2: (()+0x1429d3) [0x7fded371e9d3] >>> 3: (ceph_getcwd()+0x11) [0x7fded36fdb41] >>> 4: (MountedTest2_XYZ_Test::TestBody()+0x63a) [0x42563a] >>> 5: (testing::Test::Run()+0xaa) [0x45017a] >>> 6: (testing::internal::TestInfoImpl::Run()+0x100) [0x450280] >>> 7: (testing::TestCase::Run()+0xbd) [0x45034d] >>> 8: (testing::internal::UnitTestImpl::RunAllTests()+0x217) [0x4505b7] >>> 9: (main()+0x35) [0x423115] >>> 10: (__libc_start_main()+0xed) [0x7fded2d2876d] >>> 11: /home/nwatkins/projects/ceph/ceph/src/.libs/lt-test_libcephfs() >>> [0x423171] >>> NOTE: a copy of the executable, or `objdump -rdS ` is >>> needed to interpret this. 
>>> terminate called after throwing an instance of 'ceph::FailedAssertion' >>> Aborted (core dumped) >>> >>> Thanks, >>> Noah >>> -- >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>> the body of a message to majord...@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >> -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Client crash on getcwd with non-default root mount
Oh, let me clarify. /otherdir exists, and the mount succeeds. - Noah On Thu, Nov 29, 2012 at 11:58 AM, Sam Lang wrote: > On 11/29/2012 01:52 PM, Noah Watkins wrote: >> >> I'm getting the assert failure below with the following test: >> >>ceph_mount(cmount, "/otherdir"); > > > This should fail with ENOENT if you check the return code. > -sam > >>ceph_getcwd(cmount); >> >> -- >> >> client/Inode.h: In function 'Dentry* Inode::get_first_parent()' thread >> 7fded47c8780 time 2012-11-29 11:49:00.890184 >> client/Inode.h: 165: FAILED assert(!dn_set.empty()) >> ceph version 0.54-808-g1ed5a1f >> (1ed5a1f984d8260d86cc25b1ae95ffedf597e579) >> 1: (()+0x11ee89) [0x7fded36fae89] >> 2: (()+0x1429d3) [0x7fded371e9d3] >> 3: (ceph_getcwd()+0x11) [0x7fded36fdb41] >> 4: (MountedTest2_XYZ_Test::TestBody()+0x63a) [0x42563a] >> 5: (testing::Test::Run()+0xaa) [0x45017a] >> 6: (testing::internal::TestInfoImpl::Run()+0x100) [0x450280] >> 7: (testing::TestCase::Run()+0xbd) [0x45034d] >> 8: (testing::internal::UnitTestImpl::RunAllTests()+0x217) [0x4505b7] >> 9: (main()+0x35) [0x423115] >> 10: (__libc_start_main()+0xed) [0x7fded2d2876d] >> 11: /home/nwatkins/projects/ceph/ceph/src/.libs/lt-test_libcephfs() >> [0x423171] >> NOTE: a copy of the executable, or `objdump -rdS ` is >> needed to interpret this. >> terminate called after throwing an instance of 'ceph::FailedAssertion' >> Aborted (core dumped) >> >> Thanks, >> Noah >> -- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Client crash on getcwd with non-default root mount
I'm getting the assert failure below with the following test: ceph_mount(cmount, "/otherdir"); ceph_getcwd(cmount); -- client/Inode.h: In function 'Dentry* Inode::get_first_parent()' thread 7fded47c8780 time 2012-11-29 11:49:00.890184 client/Inode.h: 165: FAILED assert(!dn_set.empty()) ceph version 0.54-808-g1ed5a1f (1ed5a1f984d8260d86cc25b1ae95ffedf597e579) 1: (()+0x11ee89) [0x7fded36fae89] 2: (()+0x1429d3) [0x7fded371e9d3] 3: (ceph_getcwd()+0x11) [0x7fded36fdb41] 4: (MountedTest2_XYZ_Test::TestBody()+0x63a) [0x42563a] 5: (testing::Test::Run()+0xaa) [0x45017a] 6: (testing::internal::TestInfoImpl::Run()+0x100) [0x450280] 7: (testing::TestCase::Run()+0xbd) [0x45034d] 8: (testing::internal::UnitTestImpl::RunAllTests()+0x217) [0x4505b7] 9: (main()+0x35) [0x423115] 10: (__libc_start_main()+0xed) [0x7fded2d2876d] 11: /home/nwatkins/projects/ceph/ceph/src/.libs/lt-test_libcephfs() [0x423171] NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this. terminate called after throwing an instance of 'ceph::FailedAssertion' Aborted (core dumped) Thanks, Noah -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Hadoop and Ceph client/mds view of modification time
(Sorry for the dupe message. vger rejected due to HTML). Thanks, I'll try this patch this morning. Client B should perform a single stat after a notification from Client A. But, won't Sage's patch still be required, since Client A needs the MDS time to pass to Client B? On Tue, Nov 20, 2012 at 12:20 PM, Sam Lang wrote: > On 11/20/2012 01:44 PM, Noah Watkins wrote: >> >> This is a description of the clock synchronization issue we are facing >> in Hadoop: >> >> Components of Hadoop use mtime as a versioning mechanism. Here is an >> example where Client B tests the expected 'version' of a file created >> by Client A: >> >>Client A: create file, write data into file. >>Client A: expected_mtime <-- lstat(file) >>Client A: broadcast expected_mtime to client B >>... >>Client B: mtime <-- lstat(file) >>Client B: test expected_mtime == mtime > > > Here's a patch that might work to push the setattr out to the mds every time > (the same as Sage's patch for getattr). This isn't quite writeback, as it > waits for the setattr at the server to complete before returning, but I > think that's actually what you want in this case. It needs to be enabled by > setting client setattr writethru = true in the config. Also, I haven't > tested that it sends the setattr, just a basic test of functionality. > > BTW, if its always client B's first stat of the file, you won't need Sage's > patch. 
>
> -sam
>
> diff --git a/src/client/Client.cc b/src/client/Client.cc
> index 8d4a5ac..a7dd8f7 100644
> --- a/src/client/Client.cc
> +++ b/src/client/Client.cc
> @@ -4165,6 +4165,7 @@ int Client::_getattr(Inode *in, int mask, int uid, int gid)
>
>  int Client::_setattr(Inode *in, struct stat *attr, int mask, int uid, int gid)
>  {
> +  int orig_mask = mask;
>    int issued = in->caps_issued();
>
>    ldout(cct, 10) << "_setattr mask " << mask << " issued " << ccap_string(issued) << dendl;
> @@ -4219,7 +4220,7 @@ int Client::_setattr(Inode *in, struct stat *attr, int mask, int uid, int gid)
>        mask &= ~(CEPH_SETATTR_MTIME|CEPH_SETATTR_ATIME);
>      }
>    }
> -  if (!mask)
> +  if (!cct->_conf->client_setattr_writethru && !mask)
>      return 0;
>
>    MetaRequest *req = new MetaRequest(CEPH_MDS_OP_SETATTR);
> @@ -4229,6 +4230,10 @@ int Client::_setattr(Inode *in, struct stat *attr, int mask, int uid, int gid)
>    req->set_filepath(path);
>    req->inode = in;
>
> +  // reset mask back to original if we're meant to do writethru
> +  if (cct->_conf->client_setattr_writethru)
> +    mask = orig_mask;
> +
>    if (mask & CEPH_SETATTR_MODE) {
>      req->head.args.setattr.mode = attr->st_mode;
>      req->inode_drop |= CEPH_CAP_AUTH_SHARED;
> diff --git a/src/common/config_opts.h b/src/common/config_opts.h
> index cc05095..51a2769 100644
> --- a/src/common/config_opts.h
> +++ b/src/common/config_opts.h
> @@ -178,6 +178,7 @@ OPTION(client_oc_max_dirty, OPT_INT, 1024*1024* 100)  // MB * n (dirty OR tx.
>  OPTION(client_oc_target_dirty, OPT_INT, 1024*1024* 8) // target dirty (keep this smallish)
>  OPTION(client_oc_max_dirty_age, OPT_DOUBLE, 5.0) // max age in cache before writeback
>  OPTION(client_oc_max_objects, OPT_INT, 1000) // max objects in cache
> +OPTION(client_setattr_writethru, OPT_BOOL, false) // send the attributes to the mds server
>  // note: the max amount of "in flight" dirty data is roughly (max - target)
>  OPTION(fuse_use_invalidate_cb, OPT_BOOL, false) // use fuse 2.8+ invalidate callback to keep page cache consistent
>  OPTION(fuse_big_writes, OPT_BOOL, true)
>
>
>> Since mtime may be set in Ceph by both client and MDS, inconsistent
>> mtime view is possible when clocks are not adequately synchronized.
>>
>> Here is a test that reproduces the problem. In the following output,
>> issdm-18 has the MDS, and issdm-22 is a non-Ceph node with its time
>> set to an hour earlier than the MDS node.
>>
>> nwatkins@issdm-22:~$ ssh issdm-18 date && ./test
>> Tue Nov 20 11:40:28 PST 2012   // MDS TIME
>> local time: Tue Nov 20 10:42:47 2012   // Client TIME
>> fstat time: Tue Nov 20 11:40:28 2012   // mtime seen after file creation (MDS time)
>> lstat time: Tue Nov 20 10:42:47 2012   // mtime seen after file write (client time)
>>
>> Here is the code used to produce that output.
>>
>> #include
>> #include
>> #include
>> #include
>> #