Re: ceph-deploy: too many argument: --setgroup 10

2015-09-03 Thread Noah Watkins
I see. This is the ownership info for /var/lib/ceph:

[nwatkins@cn67 ~]$ stat /var/lib/ceph
  File: `/var/lib/ceph' -> `/ram/var/lib/ceph'
  Size: 17        Blocks: 3          IO Block: 524288 symbolic link
Device: 12h/18d Inode: 4429801211  Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (   10/   wheel)

This is probably a rather non-traditional environment for Ceph to be
deployed onto. I talked with the sysadmins and they were rather
surprised by this. I guess we can take care of it on this end now that
we see the issue.

In general, should the ceph-deploy tool version be paired up with the
version of Ceph being deployed, or is it the case that for Hammer the
file system uid/gid _should be_ 0 to avoid the unsupported --setgroup
flag?

Thanks!

On Wed, Sep 2, 2015 at 3:32 PM, Travis Rhoden  wrote:
> Hi Noah,
>
> What is the ownership on /var/lib/ceph ?
>
> ceph-deploy should only be trying to use --setgroup if /var/lib/ceph is
> owned by non-root.
>
> On a fresh install of Hammer, this should be root:root.
>
> The --setgroup flag was added to ceph-deploy in 1.5.26.
>
>  - Travis
>
> On Wed, Sep 2, 2015 at 1:59 PM, Noah Watkins  wrote:
>>
>> I'm getting the following error using ceph-deploy to set up a cluster.
>> It's CentOS 6.6 and I'm using Hammer and the latest ceph-deploy. It
>> looks like setgroup wasn't an option in Hammer, but ceph-deploy adds
>> it. Is there a trick or an older version of ceph-deploy I should try?
>>
>> - Noah
>>
>> [cn67][INFO  ] Running command: sudo ceph-mon --cluster ceph --mkfs -i
>> cn67 --keyring /var/lib/ceph/tmp/ceph-cn67.mon.keyring --setgroup 10
>> [cn67][WARNIN] too many arguments: [--setgroup,10]
>> [cn67][DEBUG ]   --conf/-c FILEread configuration from the given
>> configuration file
>> [cn67][WARNIN] usage: ceph-mon -i monid [flags]
>> [cn67][DEBUG ]   --id/-i IDset ID portion of my name
>> [cn67][WARNIN]   --debug_mon n
>> [cn67][DEBUG ]   --name/-n TYPE.ID set name
>> [cn67][WARNIN] debug monitor level (e.g. 10)
>> [cn67][DEBUG ]   --cluster NAMEset cluster name (default: ceph)
>> [cn67][WARNIN]   --mkfs
>> [cn67][DEBUG ]   --version show version and quit
>> [cn67][WARNIN] build fresh monitor fs
>> [cn67][DEBUG ]
>> [cn67][WARNIN]   --force-sync
>> [cn67][DEBUG ]   -drun in foreground, log to stderr.
>> [cn67][WARNIN] force a sync from another mon by wiping local
>> data (BE CAREFUL)
>> [cn67][DEBUG ]   -frun in foreground, log to usual
>> location.
>> [cn67][WARNIN]   --yes-i-really-mean-it
>> [cn67][DEBUG ]   --debug_ms N  set message debug level (e.g. 1)
>> [cn67][WARNIN] mandatory safeguard for --force-sync
>> [cn67][WARNIN]   --compact
>> [cn67][WARNIN] compact the monitor store
>> [cn67][WARNIN]   --osdmap <filename>
>> [cn67][WARNIN] only used when --mkfs is provided: load the
>> osdmap from <filename>
>> [cn67][WARNIN]   --inject-monmap <filename>
>> [cn67][WARNIN] write the <filename> monmap to the local
>> monitor store and exit
>> [cn67][WARNIN]   --extract-monmap <filename>
>> [cn67][WARNIN] extract the monmap from the local monitor store and
>> exit
>> [cn67][WARNIN]   --mon-data <directory>
>> [cn67][WARNIN] where the mon store and keyring are located
>> [cn67][ERROR ] RuntimeError: command returned non-zero exit status: 1
>> [ceph_deploy.mon][ERROR ] Failed to execute command: ceph-mon
>> --cluster ceph --mkfs -i cn67 --keyring
>> /var/lib/ceph/tmp/ceph-cn67.mon.keyring --setgroup 10
>> [ceph_deploy][ERROR ] GenericError: Failed to create 1 monitors


ceph-deploy: too many argument: --setgroup 10

2015-09-02 Thread Noah Watkins
I'm getting the following error using ceph-deploy to set up a cluster.
It's CentOS 6.6 and I'm using Hammer and the latest ceph-deploy. It
looks like setgroup wasn't an option in Hammer, but ceph-deploy adds
it. Is there a trick or an older version of ceph-deploy I should try?

- Noah

[cn67][INFO  ] Running command: sudo ceph-mon --cluster ceph --mkfs -i
cn67 --keyring /var/lib/ceph/tmp/ceph-cn67.mon.keyring --setgroup 10
[cn67][WARNIN] too many arguments: [--setgroup,10]
[cn67][DEBUG ]   --conf/-c FILEread configuration from the given
configuration file
[cn67][WARNIN] usage: ceph-mon -i monid [flags]
[cn67][DEBUG ]   --id/-i IDset ID portion of my name
[cn67][WARNIN]   --debug_mon n
[cn67][DEBUG ]   --name/-n TYPE.ID set name
[cn67][WARNIN] debug monitor level (e.g. 10)
[cn67][DEBUG ]   --cluster NAMEset cluster name (default: ceph)
[cn67][WARNIN]   --mkfs
[cn67][DEBUG ]   --version show version and quit
[cn67][WARNIN] build fresh monitor fs
[cn67][DEBUG ]
[cn67][WARNIN]   --force-sync
[cn67][DEBUG ]   -drun in foreground, log to stderr.
[cn67][WARNIN] force a sync from another mon by wiping local
data (BE CAREFUL)
[cn67][DEBUG ]   -frun in foreground, log to usual location.
[cn67][WARNIN]   --yes-i-really-mean-it
[cn67][DEBUG ]   --debug_ms N  set message debug level (e.g. 1)
[cn67][WARNIN] mandatory safeguard for --force-sync
[cn67][WARNIN]   --compact
[cn67][WARNIN] compact the monitor store
[cn67][WARNIN]   --osdmap <filename>
[cn67][WARNIN] only used when --mkfs is provided: load the
osdmap from <filename>
[cn67][WARNIN]   --inject-monmap <filename>
[cn67][WARNIN] write the <filename> monmap to the local
monitor store and exit
[cn67][WARNIN]   --extract-monmap <filename>
[cn67][WARNIN] extract the monmap from the local monitor store and exit
[cn67][WARNIN]   --mon-data <directory>
[cn67][WARNIN] where the mon store and keyring are located
[cn67][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.mon][ERROR ] Failed to execute command: ceph-mon
--cluster ceph --mkfs -i cn67 --keyring
/var/lib/ceph/tmp/ceph-cn67.mon.keyring --setgroup 10
[ceph_deploy][ERROR ] GenericError: Failed to create 1 monitors


newstore OSD magic error

2015-08-27 Thread Noah Watkins
I've deployed a cluster with one monitor and one OSD using
ceph-deploy. The OSD is spinning up, and there are some errors in the
log after `ceph-deploy osd create osd0:/dev/sdb`.

2015-08-27 14:23:28.061297 7f46d6811980 -1 OSD magic @??7?V != my ceph
osd volume v026

ceph.conf:

[global]
fsid = 9126937a-c39c-42c5-a00d-30625ce8fa11
mon_initial_members = mon0
mon_host = 10.10.2.3
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
enable experimental unrecoverable data corrupting features = newstore rocksdb

public network = 10.10.2.0/8
cluster network = 10.10.1.0/8

[mon.a]
host = mon0
mon addr = 10.10.2.3:6789

[osd]
#osd objectstore = memstore
osd objectstore = newstore

[osd.0]
public addr = 10.10.2.1
cluster addr = 10.10.1.2

full config log:

2015-08-27 14:23:05.227604 7f1db823b980  0 ceph version
0.46-24817-g94ce100 (94ce1007fdde72a67281c18447a7ac879f2614ad),
process ceph-osd, pid 13468
2015-08-27 14:23:05.227632 7f1db823b980 -1 WARNING: experimental
feature 'newstore' is enabled
Please be aware that this feature is experimental, untested,
unsupported, and may result in data corruption, data loss,
and/or irreparable damage to your cluster.  Do not use
feature with important data.

2015-08-27 14:23:05.247424 7f5576b43980  0 ceph version
0.46-24817-g94ce100 (94ce1007fdde72a67281c18447a7ac879f2614ad),
process ceph-osd, pid 13493
2015-08-27 14:23:05.247449 7f5576b43980 -1 WARNING: experimental
feature 'newstore' is enabled
Please be aware that this feature is experimental, untested,
unsupported, and may result in data corruption, data loss,
and/or irreparable damage to your cluster.  Do not use
feature with important data.

2015-08-27 14:23:05.274768 7fbe45229980  0 ceph version
0.46-24817-g94ce100 (94ce1007fdde72a67281c18447a7ac879f2614ad),
process ceph-osd, pid 13497
2015-08-27 14:23:05.274797 7fbe45229980 -1 WARNING: experimental
feature 'newstore' is enabled
Please be aware that this feature is experimental, untested,
unsupported, and may result in data corruption, data loss,
and/or irreparable damage to your cluster.  Do not use
feature with important data.

2015-08-27 14:23:25.171436 7ff6a3816980  0 ceph version
0.46-24817-g94ce100 (94ce1007fdde72a67281c18447a7ac879f2614ad),
process ceph-osd, pid 13677
2015-08-27 14:23:25.171457 7ff6a3816980 -1 WARNING: experimental
feature 'newstore' is enabled
Please be aware that this feature is experimental, untested,
unsupported, and may result in data corruption, data loss,
and/or irreparable damage to your cluster.  Do not use
feature with important data.

2015-08-27 14:23:25.173214 7ff6a3816980 -1 WARNING: the following
dangerous and experimental features are enabled: newstore,rocksdb
2015-08-27 14:23:25.173623 7ff6a3816980  1
newstore(/var/lib/ceph/tmp/mnt.sU2pxt) mkfs path
/var/lib/ceph/tmp/mnt.sU2pxt
2015-08-27 14:23:25.173633 7ff6a3816980  1
newstore(/var/lib/ceph/tmp/mnt.sU2pxt) _open_path using fs driver
'generic'
2015-08-27 14:23:25.173651 7ff6a3816980  1
newstore(/var/lib/ceph/tmp/mnt.sU2pxt) mkfs fsid is already set to
db74b022-d47b-4313-8367-44ed6ab5a2ce
2015-08-27 14:23:25.173994 7ff6a3816980 -1 WARNING: experimental
feature 'rocksdb' is enabled
Please be aware that this feature is experimental, untested,
unsupported, and may result in data corruption, data loss,
and/or irreparable damage to your cluster.  Do not use
feature with important data.

2015-08-27 14:23:25.235063 7ff6a3816980  1
newstore(/var/lib/ceph/tmp/mnt.sU2pxt) _open_db opened rocksdb path
/var/lib/ceph/tmp/mnt.sU2pxt options
2015-08-27 14:23:25.235237 7ff6a3816980  1
newstore(/var/lib/ceph/tmp/mnt.sU2pxt) mount path
/var/lib/ceph/tmp/mnt.sU2pxt
2015-08-27 14:23:25.235254 7ff6a3816980  1
newstore(/var/lib/ceph/tmp/mnt.sU2pxt) _open_path using fs driver
'generic'
2015-08-27 14:23:25.235281 7ff6a3816980 -1 WARNING: experimental
feature 'rocksdb' is enabled
Please be aware that this feature is experimental, untested,
unsupported, and may result in data corruption, data loss,
and/or irreparable damage to your cluster.  Do not use
feature with important data.

2015-08-27 14:23:25.263694 7ff6a3816980  1
newstore(/var/lib/ceph/tmp/mnt.sU2pxt) _open_db opened rocksdb path
/var/lib/ceph/tmp/mnt.sU2pxt options
2015-08-27 14:23:25.263865 7ff6a3816980  1
newstore(/var/lib/ceph/tmp/mnt.sU2pxt) _recover_next_fid old fid_max
0/0
2015-08-27 14:23:25.263887 7ff6a3816980  1
newstore(/var/lib/ceph/tmp/mnt.sU2pxt) _recover_next_nid old nid_max 0
2015-08-27 14:23:25.303719 7ff6a3816980  1
newstore(/var/lib/ceph/tmp/mnt.sU2pxt) umount
2015-08-27 14:23:25.525720 7ff6a3816980 -1 created object store
/var/lib/ceph/tmp/mnt.sU2pxt journal
/var/lib/ceph/tmp/mnt.sU2pxt/journal for osd.0 fsid
9126937a-c39c-42c5-a00d-30625ce8fa11
2015-08-27 14:23:25.525785 7ff6a3816980 -1 auth: error reading file:
/var/lib/ceph/tmp/mnt.sU2pxt/keyring: can't open
/var/lib/ceph/tmp/mnt.sU2pxt/keyring: (2) No such file or directory
2015-08-27 14:23:25.525984 7ff6a

Re: Ceph bindings for go & docker

2015-02-09 Thread Noah Watkins
Hi Loic,

This sounds great. The librados bindings have good test coverage, but I merged
a PR for RBD support a couple of weeks ago and haven't had time to get it cleaned
up and tests written. Do you need support for the AIO interface in librbd?

-Noah

- Original Message -
From: "Loic Dachary" 
To: "Noah Watkins" 
Cc: "Ceph Development" , "Vincent Batts" 
, "Johan Euphrosine" 
Sent: Monday, February 9, 2015 9:15:02 AM
Subject: Ceph bindings for go & docker

Hi,

I discovered https://github.com/noahdesu/go-ceph today :-) It would be useful 
in the context of a Ceph volume driver for docker ( see 
https://github.com/docker/docker/issues/10661 & 
https://github.com/docker/docker/pull/8484 ). 

Are you a docker user by any chance ?

-- 
Loïc Dachary, Artisan Logiciel Libre



[ann] fio plugin for libcephfs

2014-11-17 Thread Noah Watkins
I've posted a preliminary patch set to support a libcephfs io engine in fio:

   http://github.com/noahdesu/fio cephfs

You can use this right now to generate load through libcephfs. The
plugin needs a bit more work before it goes upstream (patches
welcome), but feel free to play around with it. There is an example
job file in examples/cephfs.fio.

Issues:
  Currently all the files that are created get the same size as the
total job size rather than the total size being divided by the number
of threads.

- Noah


Re: arm7 gitbuilder ?

2014-10-10 Thread Noah Watkins
On Fri, Oct 10, 2014 at 3:21 PM, Loic Dachary  wrote:
>
> You are lucky to have access to ARMv8 boxes :-) Would you be willing to run a 
> few tests on my behalf ?

That might be a challenge since we (our research group) don't own
them; we just have remote access for a specific project. But it
doesn't hurt to ask! I'll let you know.


Re: arm7 gitbuilder ?

2014-10-10 Thread Noah Watkins
On Fri, Oct 10, 2014 at 2:54 PM, Loic Dachary  wrote:
> I would be surprised if it could easily be setup for cross compilation. 
> Although it would be nice to have an ARMv8 I don't need it right now. Do you ?

Potentially. I'll poke around a bit and see. Maybe we can run 32-bit
builds on ARMv8. I'm not in a huge rush, and can always build from
source on those boxes, as cross-compiling might be a major headache :)


Re: arm7 gitbuilder ?

2014-10-10 Thread Noah Watkins
On Fri, Oct 10, 2014 at 12:16 PM, Loic Dachary  wrote:
> Hi Noah,
>
> My focus is to create centos7 and ubuntu-14.04 packages at this point.

I think a newish Ubuntu should work just fine for us. Are the builds
using a cross-compile? If so, it'd be great if there were one build for
ARMv8.


Re: arm7 gitbuilder ?

2014-10-10 Thread Noah Watkins
On Fri, Oct 10, 2014 at 10:14 AM, Sage Weil  wrote:
> On Fri, 10 Oct 2014, Loic Dachary wrote:
>> On 10/10/2014 17:11, Sage Weil wrote:
>> > On Fri, 10 Oct 2014, Loic Dachary wrote:
>> >> Hi Sandon,
>> >>
>> >> Would it be possible to resurrect / create an arm7 gitbuilder for
>> >> whatever distribution is more convenient ? Janne made a great
>> >> contribution to erasure code optimization on NEON (
>> >> https://bitbucket.org/jannau/gf-complete/branch/neon ) and it would make
>> >> it easier to have a gitbuilder to run it through teuthology and torture
>> >> it ;-) If that's complicated or there is another way to run teuthology
>> >> jobs for this purpose, I'm open to suggestions.
>> >
>> > We have a bunch of armv7l nodes that used to run distcc and a gitbuilder.
>> > It's a bit of a time suck to maintain, though.  Is there an interested
>> > party we can set up with lab access that can help administer these on an
>> > ongoing basis?  Sandon is a highly contended resource.  :)
>>
>> Long term I'm not sure. I'll try to setup something temporary and see
>> where it goes. Is there a risk that these machines go away in the next 6
>> months ? 3 months ?

I am an interested party, and at least in the short term would
dedicate some time to getting arm7 builds working too.


Re: why index (collectionIndex) need a lock?

2014-10-01 Thread Noah Watkins
I didn't know about mutrace, thanks for that reference!

On Tue, Sep 30, 2014 at 8:13 PM, Milosz Tanski  wrote:
> On Tue, Sep 30, 2014 at 7:36 PM, Noah Watkins  
> wrote:
>> On Tue, Sep 30, 2014 at 10:42 AM, Somnath Roy  
>> wrote:
>>> Also, I don't think this lock has a big impact on performance since it is
>>> already sharded to the index level. I tried a reader/writer implementation
>>> of this lock (the logic will be somewhat similar to your state concept) and
>>> did not get any benefit.
>>
>> If there is interest in identifying locks that are introducing latency,
>> it might be useful to add some tracking features to Mutex and RWLock. A
>> simple thing would be to just record maximum wait times per lock and
>> dump this via the admin socket.
>
> Noah,
>
> You're better off running some kind of synthetic test using mutrace
> (you can't use tcmalloc/jemalloc) or measuring futex syscalls via a
> perf tracepoint. Generally, adding this kind of tracking into the locks
> themselves ends up being even more expensive.
>
>
>
>
> --
> Milosz Tanski
> CTO
> 16 East 34th Street, 15th floor
> New York, NY 10016
>
> p: 646-253-9055
> e: mil...@adfin.com


Re: why index (collectionIndex) need a lock?

2014-09-30 Thread Noah Watkins
On Tue, Sep 30, 2014 at 10:42 AM, Somnath Roy  wrote:
> Also, I don't think this lock has a big impact on performance since it is
> already sharded to the index level. I tried a reader/writer implementation of
> this lock (the logic will be somewhat similar to your state concept) and did
> not get any benefit.

If there is interest in identifying locks that are introducing latency,
it might be useful to add some tracking features to Mutex and RWLock. A
simple thing would be to just record maximum wait times per lock and
dump this via the admin socket.
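
To make that concrete, here is a minimal sketch of the idea (illustrative
only, not Ceph's actual Mutex API):

    #include <atomic>
    #include <chrono>
    #include <mutex>

    // Wrap a mutex and record the maximum time any caller spent blocked
    // in lock(), so it could later be dumped via the admin socket.
    class TrackedMutex {
      std::mutex m;
      std::atomic<long long> max_wait_ns{0};
    public:
      void lock() {
        auto t0 = std::chrono::steady_clock::now();
        m.lock();
        long long waited = std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - t0).count();
        long long cur = max_wait_ns.load();
        while (waited > cur && !max_wait_ns.compare_exchange_weak(cur, waited)) {}
      }
      void unlock() { m.unlock(); }
      long long max_wait() const { return max_wait_ns.load(); }  // e.g. for a dump
    };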


Re: LTTng unfriendly with mixed static/dynamic linking

2014-08-12 Thread Noah Watkins
The Mutex tracepoints were just a driving example, so definitely feel
free to remove them. But libcommon is pretty big, so I suspect that
if tracing is merged, someone will eventually want tracepoints
in libcommon.

On Tue, Aug 12, 2014 at 12:41 PM, Adam Crume  wrote:
> Sage, if I understood you correctly on the video call, you have
> reservations about making libcommon a dynamic library because of
> incompatible changes between versions causing problems when packages
> use different versions, and you brought up the idea of having a static
> version and a dynamic version.  I don't think that would entirely
> work, because rbd (which must use the dynamic version) and libcommon
> would have to be in different packages, so they could have version
> mismatches.
>
> There's another alternative, which is to remove all tracepoints from
> libcommon.  At the moment, the only tracepoints are in Mutex, and
> they're not necessary for rbd-replay.  (Noah added them as an example
> of using LTTng in Ceph.  Noah, are you using these tracepoints?)  If
> we ever wanted to trace anything in libcommon, though, this issue
> would come up again.
>
> On Sat, Jul 26, 2014 at 3:29 AM, Joao Eduardo Luis
>  wrote:
>> On 07/25/2014 11:12 PM, Sage Weil wrote:
>>>
>>> On Fri, 25 Jul 2014, Adam Crume wrote:

 I tried all solutions, and it looks like only #1 works.  #2 gives the
 error "/usr/bin/ld: main: hidden symbol `tracepoint_dlopen' in
 common_tp.a(common_tp.o) is referenced by DSO" when linking.  #3 gives
 the error "./liblibrary.so: undefined reference to
 `tracepoint_dlopen'" when linking.  (Linking is complicated by the
 fact that LTTng uses special symbol attributes, and tracepoint_dlopen
 happens to be weak and hidden.)
>>>
>>>
>>> I think #1 is good for other reasons, too.  We already have issues (I
>>> think!) with binaries that use librados and also link libcommon
>>> statically.  Specifically, I think we've seen that having mismatched
>>> versions of librados and the binary installed lead to confusion about the
>>> contents/structure of mdconfig_t (g_conf).  This is one of the reasons
>>> why the libcommon and rgw packages require an identical version of
>>> librados or librbd or whatever--to avoid this inconsistency.
>>>
 Unless I'm mistaken (and I very well
 may be), we will have to ensure that all traced code is either 1)
 placed in a shared library and never statically linked elsewhere, or
 2) never linked into any shared library.
>>>
>>>
>>> That sounds doable and sane to me:
>>>
>>>   - librados, librbd, libceph_common, etc. would have the tracepoints in
>>> the same .so
>>>   - ceph-osd could have its own tracepoints, as long as they are always
>>> static.  (For example, libos.la, which is linked statically by ceph-mon
>>> and ceph-osd but never dynamically.)
>>>
>>> One pain point in all of this, though, is that the libceph_common.so (or
>>> whatever) will need to go into a separate package that is required by
>>> librados.so and librbd and ceph-common and everything else.  'ceph-common'
>>> is what this ought to be called, but we've coopted it to mean 'ceph
>>> clients'.  I'm not sure it if it worthwhile to go through the hinjinx to
>>> rename ceph-common to ceph-clients and repurpose ceph-common for this?
>>>
>>> sage
>>
>>
>> I notice that ceph-common contains no libs whatsoever.  We may want to
>> change ceph-common to ceph-client or something and have libcommon shipped as
>> ceph-common, but I imagine that would be a pain as package management goes.
>> Or we could take the path of least resistance (and possibly open ourselves
>> to confusion?) and ship libcommon in a 'ceph-libs' package -- although it
>> looks like it would be a 1-lib package :)
>>
>>   -Joao
>>
>>>
>>>
>>>

 Thoughts?

 Adam

 On Fri, Jul 25, 2014 at 11:48 AM, Adam Crume  wrote:
>
> LTTng requires tracepoints to be linked into a program only once.  If
> tracepoints are linked in multiple times, the program crashes at
> startup with: "LTTng-UST: Error (-17) while registering tracepoint
> probe. Duplicate registration of tracepoint probes having the same
> name is not allowed."
>
> This is problematic when mixing static and dynamic linking.  If the
> tracepoints are in a static library, that library can end up in an
> executable multiple times by being linked in directly, as well as
> being statically linked into a dynamic library.  Even if the
> tracepoints are not linked directly into the executable, they can be
> statically linked into multiple dynamic libraries that the executable
> loads.
>
> For us, this problem shows up with libcommon, and could show up with
> others such as libosd.  (In general, I think anything added to
> noinst_LTLIBRARIES is static, and anything added to lib_LTLIBRARIES is
> dynamic.)
>
> There are a few ways of solving the issue:
> 1. Change every 

[ANN] Maven support for CephFS Java

2014-05-23 Thread Noah Watkins
The `cephfs-java` package can now be retrieved from the Maven repository below.

Note that this is a _very alpha_ release, but contains a whole lot of
awesome in the form of the native JNI bits embedded in the package, so
there should be no need to do an RPM/Deb install of these dependencies
(e.g. libcephfs-jni) and deal with all those silly search paths.

Please shoot any bug reports to the list.

Current support is limited to Linux/x86-64, but I can add other archs if
requested. All existing cephfs-java packages are unaffected.

- Noah

  <repositories>
    <repository>
      <id>ceph-maven</id>
      <url>http://ceph.com/maven</url>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
      <groupId>com.ceph</groupId>
      <artifactId>cephfs</artifactId>
      <version>0.0.1</version>
    </dependency>
  </dependencies>


Packaging RFC for cephfs-java

2014-05-21 Thread Noah Watkins
Hey ceph-devel,

In its current form deploying any software that depends on
`cephfs-java` can be a real pain! Users need to make sure they have
one of each of the following installed:

* platform-indep: libcephfs.jar
* platform-dep: libcephfs_jni.[so,dylib,...]
  - osx/linux × x86/x86-64 × etc.

We build RPM and Debian packages for both, but that doesn't solve the
following dependency problems. First, the JVM needs to be told the
location of the native library via a special "java.library.path"
property specified when the JVM starts, and the JVM doesn't always
look in the places that the RPM/Deb packages stick the binaries.
Projects like Hadoop invoke the JVM deep within shell scripts, so
getting these dependencies to resolve has been a continuous source of
frustration for users. The second problem is that users expect easy
dependency management using solutions like Maven Central, and so
currently dealing with the libcephfs.jar turns into a special case in
the deployment procedure.

We solve the second problem by publishing our artifacts into Maven
Central. That is pretty easy, but the problem of resolving the native
library dependencies remains. There are two ways to deal with this
challenge.

First, several projects (e.g. JNA and LevelDB-Java) embed everything,
including pre-built libraries for all supported platforms, into a
single JAR file, and the native library is transparently loaded at
run-time. I've created a POC that shows how to do this:

   http://github.com/ceph/ceph wip-native-cephfs-java

This approach is ideal from a user's perspective, but means more
packaging/shipping complexity.

The second approach would depend on the current RPM/Deb installation,
and we could manually search for these in well-known places. This
could work, I suppose, but it still requires an extra step to
rendezvous deps from Maven and yum/apt.

Any thoughts welcome!

-Noah


ceph-osd seg-fault with small writes

2014-03-03 Thread Noah Watkins
Running rados bench with the default 4MB object size works fine, but
when I shrink the write size to 4K (i.e. running `rados -p data -b
4096 bench 30 write`), ceph-osd will segfault almost immediately.

Here is the segfault, and a link to the full debug uploaded using
ceph-post-file.

ceph-post-file id: f9c8a1f8-969f-46c2-9cef-41b4683fdc76

 ceph version 0.77-624-g82f62b1 (82f62b1ea82f6d92f7a5ed0bcbacd608770a15e3)
 1: ./ceph-osd() [0x95dbaf]
 2: (()+0xfbb0) [0x7f3f1f2f4bb0]
 3: (gsignal()+0x37) [0x7f3f1d7cdf77]
 4: (abort()+0x148) [0x7f3f1d7d15e8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f3f1e0d96e5]
 6: (()+0x5e856) [0x7f3f1e0d7856]
 7: (()+0x5e883) [0x7f3f1e0d7883]
 8: (()+0x5eaae) [0x7f3f1e0d7aae]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1f2) [0xa3d1b2]
 10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned
long, int, ThreadPool::TPHandle*)+0x8e9) [0x749c59]
 11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*,
std::allocator<ObjectStore::Transaction*> >&, unsigned long,
ThreadPool::TPHandle*)+0x6c) [0x74d88c]
 12: (FileStore::_do_op(FileStore::OpSequencer*,
ThreadPool::TPHandle&)+0x167) [0x74da17]
 13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xaef) [0xa2dfdf]
 14: (ThreadPool::WorkThread::entry()+0x10) [0xa2eed0]
 15: (()+0x7f6e) [0x7f3f1f2ecf6e]
 16: (clone()+0x6d) [0x7f3f1d8919cd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.


Re: librados::ObjectIterator segfault

2014-03-02 Thread Noah Watkins
Oh great. I'll pull in that patch. Thanks

On Sun, Mar 2, 2014 at 11:12 AM, Ilya Dryomov  wrote:
> On Sun, Mar 2, 2014 at 8:38 PM, Noah Watkins  wrote:
>> This is a segfault occurring in the latest master listing objects with
>> `rados -p data ls`
>>
>> Full trace: http://pastebin.com/3JG9cX0Z
>>
>> nwatkins@kyoto:~/ceph2/src$ CEPH_CONF=ceph.conf ./rados lspools
>> data
>> metadata
>> rbd
>> nwatkins@kyoto:~/ceph2/src$ CEPH_CONF=ceph.conf ./rados -p data ls
>> *** Caught signal (Segmentation fault) **
>>  in thread 7f84f02ce7c0
>>  ceph version 0.77-620-gf3976c1 (f3976c16531096b9979842fc4445d40d6e889932)
>>  1: /home/nwatkins/ceph2/src/.libs/lt-rados() [0x43e2df]
>>  2: (()+0xfbb0) [0x7f84eef6bbb0]
>>  3: (librados::ObjectIterator::operator=(librados::ObjectIterator
>> const&)+0x27) [0x7f84ef31f807]
>>  4: (librados::ObjectIterator::ObjectIterator(librados::ObjectIterator
>> const&)+0x30) [0x7f84ef31fb10]
>>  5: (main()+0x1bff) [0x41274f]
>>  6: (__libc_start_main()+0xf5) [0x7f84ee193de5]
>>  7: /home/nwatkins/ceph2/src/.libs/lt-rados() [0x41b967]
>> 2014-03-02 10:35:57.471206 7f84f02ce7c0 -1 *** Caught signal
>> (Segmentation fault) **
>>  in thread 7f84f02ce7c0
>
> I ran into it a couple days ago.  I think Josh has it fixed.
>
> https://github.com/ceph/ceph/pull/1322
>
> Thanks,
>
> Ilya


librados::ObjectIterator segfault

2014-03-02 Thread Noah Watkins
This is a segfault occurring in the latest master listing objects with
`rados -p data ls`

Full trace: http://pastebin.com/3JG9cX0Z

nwatkins@kyoto:~/ceph2/src$ CEPH_CONF=ceph.conf ./rados lspools
data
metadata
rbd
nwatkins@kyoto:~/ceph2/src$ CEPH_CONF=ceph.conf ./rados -p data ls
*** Caught signal (Segmentation fault) **
 in thread 7f84f02ce7c0
 ceph version 0.77-620-gf3976c1 (f3976c16531096b9979842fc4445d40d6e889932)
 1: /home/nwatkins/ceph2/src/.libs/lt-rados() [0x43e2df]
 2: (()+0xfbb0) [0x7f84eef6bbb0]
 3: (librados::ObjectIterator::operator=(librados::ObjectIterator
const&)+0x27) [0x7f84ef31f807]
 4: (librados::ObjectIterator::ObjectIterator(librados::ObjectIterator
const&)+0x30) [0x7f84ef31fb10]
 5: (main()+0x1bff) [0x41274f]
 6: (__libc_start_main()+0xf5) [0x7f84ee193de5]
 7: /home/nwatkins/ceph2/src/.libs/lt-rados() [0x41b967]
2014-03-02 10:35:57.471206 7f84f02ce7c0 -1 *** Caught signal
(Segmentation fault) **
 in thread 7f84f02ce7c0


Re: Assertion error in librados

2014-02-25 Thread Noah Watkins
On Tue, Feb 25, 2014 at 9:51 AM, Josh Durgin  wrote:
> That's a good idea. This particular assert in a Mutex is almost always
> a use-after-free of the Mutex or structure containing it though.

I think that a use-after-free will also return EINVAL (assuming it
isn't a pathological case), as pthread_mutex_lock checks an
initialization magic variable. I think that particular mutex isn't
initialized with flags that would cause any of the other possible
return values.
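
As a contrived illustration of that return-value channel (not Ceph code;
an error-checking mutex is used here to provoke a defined error, whereas a
clobbered mutex typically surfaces as EINVAL):

    #include <cerrno>
    #include <cstdio>
    #include <pthread.h>

    // pthread_mutex_lock reports misuse via its return value, which is
    // the value a bare assert(r == 0) throws away on failure.
    int main() {
      pthread_mutexattr_t attr;
      pthread_mutexattr_init(&attr);
      pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

      pthread_mutex_t m;
      pthread_mutex_init(&m, &attr);
      pthread_mutex_lock(&m);
      int r = pthread_mutex_lock(&m);  // relock in the same thread -> EDEADLK
      printf("relock returned %d (EDEADLK=%d, EINVAL=%d)\n", r, EDEADLK, EINVAL);

      pthread_mutex_unlock(&m);
      pthread_mutex_destroy(&m);
      pthread_mutexattr_destroy(&attr);
      return 0;
    }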


Re: Assertion error in librados

2014-02-25 Thread Noah Watkins
Perhaps using gtest-style asserts (ASSERT_EQ(r, 0)) in Ceph would be
useful so we can see parameter values to the assertion in the log. In
this case, the return value from pthread_mutex_lock is almost
certainly EINVAL, but it'd be informative to know for sure.
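
Something like this, sketched with gtest (hypothetical test code, not
anything in the tree today):

    #include <gtest/gtest.h>
    #include <pthread.h>

    // ASSERT_EQ prints the expected and actual values on failure, so a bad
    // pthread_mutex_lock() return (e.g. 22 == EINVAL) would land in the log
    // instead of a bare assert(r == 0) failure.
    TEST(MutexAssert, LockReportsReturnValue) {
      pthread_mutex_t m;
      ASSERT_EQ(0, pthread_mutex_init(&m, NULL));
      ASSERT_EQ(0, pthread_mutex_lock(&m));
      ASSERT_EQ(0, pthread_mutex_unlock(&m));
      ASSERT_EQ(0, pthread_mutex_destroy(&m));
    }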

On Tue, Feb 25, 2014 at 7:58 AM, Filippos Giannakos  wrote:
> Hi Greg,
>
> Unfortunately, we don't keep any Ceph-related logs on the client side. On the
> server side, we kept the default log settings to avoid overlogging.
> Do you think that there might be something useful on the OSD side?
>
> On Tue, Feb 25, 2014 at 07:28:30AM -0800, Gregory Farnum wrote:
>> Do you have logs? The assert indicates that the messenger got back
>> something other than "okay" when trying to grab a local Mutex, which
>> shouldn't be able to happen. It may be that some error-handling path
>> didn't drop it (within the same thread that later tried to grab it
>> again), but we'll need more details to track it down.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>
> Kind Regards,
> --
> Filippos
> 


Re: doxygen help

2014-02-23 Thread Noah Watkins
I don't think Asphyxiate handles 'class'

  http://marc.info/?l=ceph-devel&m=135130277326664&w=2

On Sun, Feb 23, 2014 at 12:38 PM, Sage Weil  wrote:
> A while back I added some doxygen comments/docs to librados.hpp and tried
> to have sphinx slurp it up into the generated docs on ceph.com:
>
> 
> https://github.com/ceph/ceph/commit/d0c4600e6645d116b9b2c4eea56ef5851eea54d5
>
> Unfortunately this flames out with some obscure doxygen error and I wasn't
> able to sort it out given my limited attention span and experience with
> such tools.  You can see the error here:
>
> 
> http://gitbuilder.sepia.ceph.com/gitbuilder-doc/log.cgi?log=d0c4600e6645d116b9b2c4eea56ef5851eea54d5
>
> Exception occurred:
> File "/tmp/virtualenv-docs/src/asphyxiate/asphyxiate/__init__.py", line
> 388, in render_compounddef
> "cannot handle {node.tag} kind={node.attrib[kind]}".format(node=node)
> Error: Assertion cannot handle compounddef kind=class
>
> Anybody know what might be going on?  This is the main thing preventing
> the C++ librados API docs (such as they are) from appearing on the site.
>
> sage


Re: [ceph-users] ceph hadoop using ambari

2014-02-17 Thread Noah Watkins
Hi Kesten,

It's a little difficult to tell what the source of the problem is, but
looking at the gist you referenced, I don't see anything that would
indicate that Ceph is causing the issue. For instance,
hadoop-mapred-tasktracker-xxx-yyy-hdfs01.log looks like Hadoop daemons
are having problems connecting to each other. Finding out what command
in hadoop-daemon.sh is causing the permission errors might be
informative, but I don't have any experience with Ambari.

On Mon, Feb 17, 2014 at 9:23 AM, Kesten Broughton  wrote:
> I posted this to ceph-devel-owner before seeing that this is the correct
> place to post.
>
> My company is trying to evaluate virtualized hdfs clusters using ceph as a
> drop-in replacement for staging and development
> following http://ceph.com/docs/master/cephfs/hadoop/.  We deploy clusters
> with ambari 1.3.2.
>
> I spun up a 10 node cluster with 3 datanodes, name, secondary, 3
> zookeepers, ambari master, and accumulo master.
>
> Our process is:
> 1. Run ambari install
> 2. shut down all ambari services
> 3. push modified core-site.xml to datanodes, name, secondary
> 4. restart ambari services
> (This was likely the cause of shutdown errors.)
>
> I am getting errors
> /usr/lib/hadoop/bin/hadoop-daemon.sh: Permission denied
>
> in the ambari console error log from the command:
> su - hdfs -c  'export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec &&
> /usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start
> datanode'
>
>
> I think this is an ambari issue, but I'm wondering:
> 1.  Is there a detailed guide of using ambari with ceph-hadoop, or has
> anyone tried it?
> 2.  Is there a script or list of log files useful for debugging ceph
> issues in general?
>
> thanks,
>
> kesten
>
>
> ps.
> I have opened a gist via
> https://gist.github.com/darKoram/9051450
> and an issue on the horton forums at
> http://hortonworks.com/community/forums/topic/ambari-restart-services-give-
> bash-usrlibhadoopbinhadoop-daemon-sh-permiss/#post-48793
>
>


Re: feedback on supporting libc++

2014-01-09 Thread Noah Watkins
I've sent up a pull request with an initial run at this patch set:

   https://github.com/ceph/ceph/pull/1064

The one option we haven't discussed yet is to use the Boost variants
exclusively, rather than switching between the ones in
std:: (in C++11) and std::tr1::.

On Tue, Dec 31, 2013 at 11:59 AM, Josh Durgin  wrote:
> On 12/31/2013 08:59 AM, Noah Watkins wrote:
>>
>> Thanks for testing that Josh. Before cleaning up this patch set, I
>> have a few questions.
>>
>> I'm still not clear on how to handle the "std::tr1::shared_ptr <
>> ObjListCtx > ctx;" in librados.hpp. If we change this to
>> ceph::shared_ptr, then we'll also need to somehow ship the
>> translations here:
>>
>>https://github.com/ceph/ceph/blob/port/libc%2B%2B/src/include/memory.h
>
>
> I'd suggest treating it like we did buffer.h and associated headers -
> make a symlink to it from include/rados/memory.h, and install a copy of
> it with the librados headers.
>
>
>> It's also not clear that ceph::shared_ptr should be exposed publicly
>> if there is a thought we might start switching out implementations of
>> ceph::shared_ptr via memory.h (e.g. by using boost implementation).
>
>
> We can't change the actual type used by librados, since AIUI that's
> part of the ABI, so if we want to use another type internally we can
> make include/rados/memory.h a copy of the original instead of a symlink,
> and then change the internal include/memory.h however we like.
>
> Josh


Re: [ceph-users] how to use the function ceph_open_layout

2014-01-03 Thread Noah Watkins
You'll need to register the new pool with the MDS:

    ceph mds add_data_pool <pool>
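
For reference, a minimal sketch of the call in context (assumes a reachable
cluster and a default ceph.conf; signature as declared in cephfs/libcephfs.h):

    #include <cephfs/libcephfs.h>
    #include <fcntl.h>
    #include <stdio.h>

    int main() {
      struct ceph_mount_info *cmount;
      ceph_create(&cmount, NULL);
      ceph_conf_read_file(cmount, NULL);
      ceph_mount(cmount, "/");
      // args: (stripe_unit, stripe_count, object_size, data_pool); -22
      // (EINVAL) is what you get when the pool isn't registered with the MDS.
      int fd = ceph_open_layout(cmount, "/file1", O_RDWR | O_CREAT, 0666,
                                (1 << 22), 1, (1 << 22), "data1");
      printf("fd = %d\n", fd);
      if (fd >= 0)
        ceph_close(cmount, fd);
      ceph_shutdown(cmount);
      return 0;
    }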

On Thu, Jan 2, 2014 at 9:48 PM, 鹏  wrote:
>  Hi all;
> Today I want to use the function ceph_open_layout() in libcephfs.h.
>
> I created a new pool successfully:
> # rados mkpool data1
> and then I edited the code like this:
>
> int fd = ceph_open_layout( cmount, c_path, O_RDONLY|O_CREAT, 0666, (1<<22),
> 1, (1<<22) , "data1")
>
> and then the fd is -22!
>
> When I use the data pool, it succeeds:
> int fd = ceph_open_layout( cmount, c_path, O_RDONLY|O_CREAT, 0666, (1<<22),
> 1, (1<<22) , "data")
>
> Does ceph_open_layout support read/write to a new pool?
>
> Thank you for the help!


Re: feedback on supporting libc++

2013-12-31 Thread Noah Watkins
Thanks for testing that, Josh. Before cleaning up this patch set, I
have a few questions.

I'm still not clear on how to handle the "std::tr1::shared_ptr <
ObjListCtx > ctx;" in librados.hpp. If we change this to
ceph::shared_ptr, then we'll also need to somehow ship the
translations here:

  https://github.com/ceph/ceph/blob/port/libc%2B%2B/src/include/memory.h

It's also not clear that ceph::shared_ptr should be exposed publicly
if there is a thought we might start switching out implementations of
ceph::shared_ptr via memory.h (e.g. by using the boost implementation).
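
For anyone following along, the translation header boils down to something
like this (a sketch; the configure macro name here is illustrative):

    // include/memory.h idea: pick one shared_ptr implementation and alias
    // it into the ceph:: namespace, so callers just say ceph::shared_ptr<T>.
    #ifdef CEPH_HAVE_CXX11_SHARED_PTR   // illustrative macro name
    #include <memory>
    namespace ceph { using std::shared_ptr; }
    #else
    #include <tr1/memory>
    namespace ceph { using std::tr1::shared_ptr; }
    #endif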

On Mon, Dec 30, 2013 at 5:19 PM, Josh Durgin  wrote:
> On 12/27/2013 03:34 PM, Noah Watkins wrote:
>>
>> On Wed, Oct 30, 2013 at 2:02 PM, Josh Durgin 
>> wrote:
>>>
>>> On 10/29/2013 03:51 PM, Noah Watkins wrote:
>>>
>>> unsafe to me. Could you check whether you can run 'rados ls' compiled
>>> against an old librados, but dynamically loading librados from this
>>> branch compiled in c++98 mode?
>>
>>
>> I'm still working on this, but my understanding so far from the libc++
>> documentation is that libc++ and libstdc++ are API- but not
>> ABI-compatible, so there shouldn't be an expectation that a librados
>> binary built against libstdc++ will work if dynamically linked against
>> libc++.
>
>
> I meant if it was compiled against libstdc++ both times, I was curious
> whether changing std::tr1::shared_ptr to ceph::shared_ptr would result
> in any incompatibility.
>
> I just tried this, and it worked fine (I think because it does not
> actually create a new c++ type, but acts like a typedef and just
> creates an alias), so I've got no issues with this approach.
>
> Josh


Shared library symbol visibility

2013-12-30 Thread Noah Watkins
It looks like we may be outgrowing the use of export-symbols-regex and
friends to control symbol visibility for published shared libraries.
On Linux, ld seems to be quite content linking against hidden symbols,
but at least on OSX with Clang it seems the visibility is strictly
enforced.

For instance, librados exports only the prefix "rados_", but that
regex hides everything in the C++ interface. Unfortunately,
export-symbols-regex doesn't play nice with C++ name mangling.

Large projects that I've been looking at for examples (chromium, v8,
Java) seem to use a different approach based on the compiler flag
"-fvisibility=hidden" that hides everything by default and uses
explicit exporting.

The basics look like this; there are variants that work on Windows
DLLs as well, with more macro magic:

#define CEPH_EXPORT __attribute__((__visibility__("default")))

class CEPH_EXPORT ObjectOperation {
 public:
  ObjectOperation();
  virtual ~ObjectOperation();
  ...
};
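
Sketched end to end (the entry-point names here are illustrative, not real
librados symbols):

    // Build everything with hidden default visibility, e.g.:
    //   g++ -fvisibility=hidden -fPIC -shared -o librados.so ...
    #define CEPH_EXPORT __attribute__((__visibility__("default")))

    CEPH_EXPORT int rados_example_entry();  // exported
    int internal_helper();                  // hidden under -fvisibility=hidden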

There is a sample branch up with this approach at:

   http://github.com/ceph/ceph port/visibility

More info:

https://www.gnu.org/software/gnulib/manual/html_node/Exported-Symbols-of-Shared-Libraries.html
http://gcc.gnu.org/wiki/Visibility

- Noah


Re: feedback on supporting libc++

2013-12-27 Thread Noah Watkins
On Wed, Oct 30, 2013 at 2:02 PM, Josh Durgin  wrote:
> On 10/29/2013 03:51 PM, Noah Watkins wrote:
>
> unsafe to me. Could you check whether you can run 'rados ls' compiled
> against an old librados, but dynamically loading librados from this
> branch compiled in c++98 mode?

I'm still working on this, but my understanding so far from the libc++
documentation is that libc++ and libstdc++ are API- but not
ABI-compatible, so there shouldn't be an expectation that a librados
binary built against libstdc++ will work if dynamically linked against
libc++.


Re: Building Ceph using CMake

2013-12-20 Thread Noah Watkins
On Tue, Dec 17, 2013 at 2:09 PM, Ali Maredia  wrote:
>
> Most of the speedup can be attributed to the fact that libtool is compiling 
> both PIC and non-PIC versions of every source file. CMake just builds 
> everything with -fPIC. We don't have an opinion on the matter, but you may 
> want to consider doing the same with the autotools build.
>
> Many source files are compiled into several targets causing them to be built 
> multiple time. With the CMake build I was able to pull them into static 
> libraries and link them into the targets that needed them.

In the latest round of automake clean-ups I know some of the
compilation redundancy was removed, but I don't recall if it was all
taken care of. If you happen to have a list of the redundant
stuff you found, that would be helpful for improving automake.


Re: /usr/bin/ld: cannot find -lboost_program_options

2013-12-01 Thread Noah Watkins
Hi Charles,

Out of curiosity, do you have a config.log handy? A recent change to
configure.ac should have caught the absence of the Boost program
options library before this step.

On Sun, Dec 1, 2013 at 7:08 PM, charles L  wrote:
> I reinstalled ...libboost-program-options-dev and it fixed the issue.
>
> Thanks.
> 
>> Date: Mon, 2 Dec 2013 10:18:33 +0800
>> From: liw...@ubuntukylin.com
>> To: charlesboy...@hotmail.com
>> CC: ceph-devel@vger.kernel.org
>> Subject: Re: /usr/bin/ld: cannot find -lboost_program_options
>>
>> Please install libboost-program-options-dev package before compiling
>> for example, for Ubuntu,
>> sudo apt-get install libboost-program-options-dev
>>
>> On 12/02/2013 09:57 AM, charles L wrote:
>>> Please, can someone help? I'm compiling Ceph... I did the `make -j2` command and got
>>> this "cannot find -lboost_program_options" many times, so I tried to run
>>> make in verbose mode and got this:
>>>
>>> root@ubuntuserver:/home/ceph# V=1 make
>>> Making all in .
>>> make[1]: Entering directory `/home/ceph'
>>> make[1]: Nothing to be done for `all-am'.
>>> make[1]: Leaving directory `/home/ceph'
>>> Making all in src
>>> make[1]: Entering directory `/home/ceph/src'
>>> make all-recursive
>>> make[2]: Entering directory `/home/ceph/src'
>>> Making all in ocf
>>> make[3]: Entering directory `/home/ceph/src/ocf'
>>> make[3]: Nothing to be done for `all'.
>>> make[3]: Leaving directory `/home/ceph/src/ocf'
>>> Making all in java
>>> make[3]: Entering directory `/home/ceph/src/java'
>>> make all-am
>>> make[4]: Entering directory `/home/ceph/src/java'
>>> make[4]: Nothing to be done for `all-am'.
>>> make[4]: Leaving directory `/home/ceph/src/java'
>>> make[3]: Leaving directory `/home/ceph/src/java'
>>> make[3]: Entering directory `/home/ceph/src'
>>> ./check_version ./.git_version
>>> ./.git_version is up to date.
>>> /bin/bash ../libtool --tag=CXX --mode=link g++ -Wall -Wtype-limits 
>>> -Wignored-qualifiers -Winit-self -Wpointer-arith -Werror=format-security 
>>> -fno-strict-aliasing -fsigned-char -rdynamic -Wnon-virtual-dtor 
>>> -Wno-invalid-offsetof -fno-builtin-malloc -fno-builtin-calloc 
>>> -fno-builtin-realloc -fno-builtin-free -Wstrict-null-sentinel -g 
>>> -Wl,--as-needed -latomic_ops -o ceph_filestore_tool 
>>> tools/ceph-filestore-tool.o libosd.la libosdc.la libos.la -laio -lleveldb 
>>> -lsnappy libperfglue.la -ltcmalloc libos.la -laio -lleveldb -lsnappy 
>>> libglobal.la -lpthread -lm -lcrypto++ -luuid -lm -lkeyutils -lrt 
>>> -lboost_program_options -ldl -lboost_thread -lboost_system -lleveldb 
>>> -lsnappy
>>> libtool: link: g++ -Wall -Wtype-limits -Wignored-qualifiers -Winit-self 
>>> -Wpointer-arith -Werror=format-security -fno-strict-aliasing -fsigned-char 
>>> -rdynamic -Wnon-virtual-dtor -Wno-invalid-offsetof -fno-builtin-malloc 
>>> -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free 
>>> -Wstrict-null-sentinel -g -Wl,--as-needed -o ceph_filestore_tool 
>>> tools/ceph-filestore-tool.o /usr/lib/libatomic_ops.a ./.libs/libosd.a 
>>> ./.libs/libosdc.a ./.libs/libperfglue.a -ltcmalloc ./.libs/libos.a -laio 
>>> ./.libs/libglobal.a -lpthread -lcrypto++ -luuid -lm -lkeyutils -lrt 
>>> -lboost_program_options -ldl -lboost_thread -lboost_system -lleveldb 
>>> -lsnappy
>>> /usr/bin/ld: cannot find -lboost_program_options
>>> collect2: error: ld returned 1 exit status
>>> make[3]: *** [ceph_filestore_tool] Error 1
>>> make[3]: Leaving directory `/home/ceph/src'
>>> make[2]: *** [all-recursive] Error 1
>>> make[2]: Leaving directory `/home/ceph/src'
>>> make[1]: *** [all] Error 2
>>> make[1]: Leaving directory `/home/ceph/src'
>>> make: *** [all-recursive] Error 1


Re: [May be a bug?]Cannot umount cephfs after all mons is stopped

2013-11-29 Thread Noah Watkins
Did you try `umount -f`? I wouldn't say that is 'clean', but might
avoid a reboot. It would seem there isn't much else that can be done
if there is dirty data and no cluster to flush it to.

This also looks relevant:

   http://tracker.ceph.com/issues/206

On Thu, Nov 28, 2013 at 9:51 PM, Ketor D  wrote:
> Hi Sage:
>We are testing cephfs mounting with kernel 3.12, and we hit a
> situation where we cannot umount cephfs and also cannot reboot the
> client machine. Only a hard reset can restart the client machine.
> Here is the flow:
> 1) Create a ceph cluster with 3 mons, 2 mds (active-standby), 2 osds.
> 2) Mount the cephfs with linux kernel 3.12.
> 3) Using service ceph stop mon.x to stop all mons.
> 4) Then we cannot umount the cephfs. The commands lsof and
> fuser do not return, and even if we use "umount -l [mount_point]", we
> only see the mount_point disappear from /etc/mtab; we still
> cannot soft-reboot the client machine.
>
> If we get into this situation, we can only hard-reset the client
> machine to recover.
> So the question is whether there is some method to gracefully
> umount the cephfs without leaving the client machine unable to reboot?
>
>  Regards!


Re: Building Ceph using CMake

2013-11-26 Thread Noah Watkins

On Nov 26, 2013, at 2:06 PM, Ali Maredia  wrote:

> Hi all,
> 
> I'm a student working on a project to make ceph build faster and to help with 
> efforts to port ceph to other platforms using cmake. 

CMake is awesome. Also, you might be interested in checking out the portability 
work going on at github.com/ceph/ceph wip-port.

-Noah



Re: libuuid vs boost uuid

2013-11-26 Thread Noah Watkins
I put up a patch here for review:

https://github.com/ceph/ceph/pull/875/files

It seems OK as long as Boost doesn't ever try to change its internal
representation, which in this patch we reach in and grab as the 16-octet
representation. Why not just grab a copy of libuuid from util-linux and keep it
in tree?
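
Concretely, the reach-in looks like this (a sketch of the approach, not a
quote from the PR):

    #include <algorithm>
    #include <boost/uuid/uuid.hpp>

    // boost::uuids::uuid exposes its 16 octets via begin()/end(), so the
    // libuuid-style byte view is a simple copy. This is what relies on
    // Boost never changing the internal representation.
    void uuid_bytes(const boost::uuids::uuid& u, unsigned char out[16]) {
      std::copy(u.begin(), u.end(), out);
    }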

On Nov 25, 2013, at 9:52 PM, James Harper  wrote:

>> 
>> James,
>> 
>> I'm using uuid.begin()/end() to grab the 16-byte representation of the UUID.
>> Did you figure out how to populate a boost::uuid_t from the bytes? In
>> particular, I'm referring to FileJournal::decode.
>> 
>> Actually, I suppose that any Ceph usage of the 16-byte representation should
>> be replaced using the Boost serialization of uuid_t?
>> 
> 
> As I said I haven't actually tested it, apart from that I have librbd working 
> under Windows now ("rbd ls" and "rbd export" both work but I don't know if 
> they actually do anything with uuid's...)
> 
> My patch to MStatfsReply.h to make it compile is:
> 
> diff --git a/src/messages/MStatfsReply.h b/src/messages/MStatfsReply.h
> index 8ceec9c..40a5bdd 100644
> --- a/src/messages/MStatfsReply.h
> +++ b/src/messages/MStatfsReply.h
> @@ -22,7 +22,7 @@ public:
> 
>   MStatfsReply() : Message(CEPH_MSG_STATFS_REPLY) {}
>   MStatfsReply(uuid_d &f, tid_t t, epoch_t epoch) : 
> Message(CEPH_MSG_STATFS_REPLY) {
> -memcpy(&h.fsid, f.uuid, sizeof(h.fsid));
> +memcpy(&h.fsid, &f.uuid, sizeof(h.fsid));
> header.tid = t;
> h.version = epoch;
>   }
> 
> So assuming this actually works, the uuid bytes are accessible as per the 
> above.
> 
> James
> 



Re: libuuid vs boost uuid

2013-11-25 Thread Noah Watkins
James,

I’m using uuid.begin()/end() to grab the 16-byte representation of the UUID. 
Did you figure out how to populate a boost::uuid_t from the bytes? In 
particular, I’m referring to FileJournal::decode. 

Actually, I suppose that any Ceph usage of the 16-byte representation should be 
replaced using the Boost serialization of uuid_t?

Thanks,
-Noah

On Nov 13, 2013, at 2:33 PM, James Harper  wrote:

> Patch follows. When I wrote it I was just thinking it would be used for win32 
> build, hence the #ifdef. As I said before, it compiles but I haven't tested 
> it. I can clean it up a bit and resend it with a signed-off-by if anyone 
> wants to pick it up and follow it through sooner than I can. I don't know how 
> boost behaves if the uuid parse fails (exception maybe?) so that would need 
> resolving too.
> 
> In addition, a bunch of ceph files include the libuuid header directly, even 
> though all the ones I've found don't appear to need it, so they need to be 
> fixed for a clean compile under win32, and to remove dependency on libuuid. 
> There may also be other cases that need work, in particular anything that 
> memcpy's into the 16 byte uuid directly. See patch for MStatfsReply.h where a 
> minor tweak was necessary.
> 
> (if anyone is interested, I have librados and librbd compiling under mingw32, 
> but I can't get boost to build its thread library so I don't get a clean 
> link, and there are probably other link errors too. I've run out of time for 
> doing much more on this for the moment)
> 
> James
> 
> diff --git a/src/include/uuid.h b/src/include/uuid.h
> index 942b807..201ac76 100644
> --- a/src/include/uuid.h
> +++ b/src/include/uuid.h
> @@ -8,6 +8,70 @@
> #include "encoding.h"
> #include 
> 
> +#if defined(_WIN32)
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +struct uuid_d {
> +  boost::uuids::uuid uuid;
> +
> +  uuid_d() {
> +uuid = boost::uuids::nil_uuid();
> +  }
> +
> +  bool is_zero() const {
> +return uuid.is_nil();
> +//return boost::uuids::uuid::is_nil(uuid);
> +  }
> +
> +  void generate_random() {
> +boost::uuids::random_generator gen;
> +uuid = gen();
> +  }
> +
> +  bool parse(const char *s) {
> +boost::uuids::string_generator gen;
> +uuid = gen(s);
> +return true;
> +// what happens if parse fails?
> +  }
> +  void print(char *s) {
> +std::string str = boost::lexical_cast<std::string>(uuid);
> +memcpy(s, str.c_str(), 37);
> +  }
> +
> +  void encode(bufferlist& bl) const {
> +::encode_raw(uuid, bl);
> +  }
> +  void decode(bufferlist::iterator& p) {
> +::decode_raw(uuid, p);
> +  }
> +
> +  uuid_d& operator=(const uuid_d& r) {
> +uuid = r.uuid;
> +return *this;
> +  }
> +};
> +WRITE_CLASS_ENCODER(uuid_d)
> +
> +inline std::ostream& operator<<(std::ostream& out, const uuid_d& u) {
> +  //char b[37];
> +  //uuid_unparse(u.uuid, b);
> +  return out << u.uuid;
> +}
> +
> +inline bool operator==(const uuid_d& l, const uuid_d& r) {
> +  return l.uuid == r.uuid;
> +}
> +
> +inline bool operator!=(const uuid_d& l, const uuid_d& r) {
> +  return l.uuid != r.uuid;
> +}
> +#else
> extern "C" {
> #include 
> #include 
> @@ -56,6 +120,6 @@ inline bool operator==(const uuid_d& l, const uuid_d& r) {
> inline bool operator!=(const uuid_d& l, const uuid_d& r) {
>   return uuid_compare(l.uuid, r.uuid) != 0;
> }
> -
> +#endif
> 
> #endif
> diff --git a/src/messages/MStatfsReply.h b/src/messages/MStatfsReply.h
> index 8ceec9c..40a5bdd 100644
> --- a/src/messages/MStatfsReply.h
> +++ b/src/messages/MStatfsReply.h
> @@ -22,7 +22,7 @@ public:
> 
>   MStatfsReply() : Message(CEPH_MSG_STATFS_REPLY) {}
>   MStatfsReply(u
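
(A note on the open question in the patch above: boost's string_generator
signals malformed input by throwing, std::runtime_error in the Boost versions
I've looked at, so parse() could be written along these lines instead of
always returning true:)

   bool parse(const char *s) {
     try {
       boost::uuids::string_generator gen;
       uuid = gen(s);
       return true;
     } catch (std::runtime_error&) {
       return false;  // malformed uuid string
     }
   }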

RFC: object operation instruction set

2013-11-17 Thread Noah Watkins
The ObjectOperation interface in librados is great for performing compound 
atomic operations. However, it doesn’t seem to be capable of expressing more 
complex flows. Consider the following set of operations that one might want 
to run atomically to optionally initialize an xattr:

   int ret = getxattr("foo");
   if (ret < 0 && ret != -ENODATA)
     return ret;

   if (ret == -ENODATA)
     /* do some initialization */
   else
     /* do something else */

As it stands, one would need to build a cls_xyz to do this. Alternatively, 
something like cls_lua could be used, but there are a lot of downsides to that. 
However, after building several cls_xyz modules, it is clear that the majority 
of the time is spent doing basic logic, moving some data around, and 
occasionally doing something like incrementing some counters.

I’ve put a prototype solution up in github

`github.com/ceph/ceph.git obj_op_virt_machine`

that adds control instructions into the ObjectOperation interface. Using the 
interface I can express the above logic as follows:

   ObjectReadOperation op;

   // the getxattr return value (e.g. -ENODATA if "foo"
   // doesn't exist) is placed into a named register "ret"
   op.getxattr("foo", bl, NULL);

   // jump to label if "ret" register >= 0
   op.ois_jge("ret", 0, "has_attr");

   // jump to label if "ret" register == -ENODATA
   op.ois_jeq("ret", -ENODATA, "no_attr");

   // fall through to return any error in the
   // "ret" register. returns immediately
   op.ois_ret("ret");

   // define a label target
   op.ois_label("has_attr");
   /* … do some stuff … */
   op.ois_ret(0);

   // define a label target
   op.ois_label("no_attr");
   /* … do initialization … */
   op.ois_ret(0);

   ioctx.operate("obj", &op);

Using only a few instructions, we can get pretty good building blocks. Adding a 
few to examine some data in primitive ways would add another level of 
usefulness, too (e.g. atomic counter increment). And, this can also be made 
safe by ensuring that jumps are always forward, removing any problems like 
infinite loops.

- Noah


Re: libuuid vs boost uuid

2013-11-13 Thread Noah Watkins
Oh ok, no rush.  Just wanted to know if you were still hacking on it. Thanks!

On Nov 13, 2013, at 1:42 PM, James Harper  wrote:

>> Hi James,
>> 
>> I just wanted to follow up on this thread. I'd like to bring this patch into 
>> the
>> wip-port portability branch. Were you able to get the boost::uuid to work as
>> a drop-in replacement?
>> 
> 
> I have it compiling but haven't tested. I'll send through what I have.
> 
> James



Re: libuuid vs boost uuid

2013-11-13 Thread Noah Watkins
Hi James,

I just wanted to follow up on this thread. I’d like to bring this patch into 
the wip-port portability branch. Were you able to get the boost::uuid to work 
as a drop-in replacement?

Thanks,
Noah

On Nov 9, 2013, at 9:22 PM, Sage Weil  wrote:

> On Sun, 10 Nov 2013, James Harper wrote:
>>> 
>>> On Sat, 9 Nov 2013, James Harper wrote:
 Just out of curiosity (recent thread about windows port) I just had a
 quick go at compiling librados under mingw (win32 cross compile), and
 one of the errors that popped up was the lack of libuuid under mingw.
 Ceph appears to use libuuid, but I notice boost appears to include a
 uuid class too, and it seems that ceph already uses some of boost (which
 already builds under mingw).
 
 Is there anything special about libuuid that would mean boost's uuid
 class couldn't replace it? And would it be better to still use ceph's
 uuid.h as a wrapper around the boost uuid class, or to modify ceph to
 use the boost uuid class directly?
>>> 
>>> Nice!  Boost uuid looks like it would work just fine.  It is probably
>>> easier and less disruptive to use it from within the existing class in
>>> include/uuid.h.
>>> 
>> 
>> That seems to work (the header compiles at least), but then it falls 
>> down when things try to memcpy out of it. In particular, an fsid appears 
>> to be a char[16]. Is that a uuid? And is keeping it as a byte array an 
>> optimisation?
> 
> Probably just being lazy; where was that?  Feel free to replace the memcpy 
> with methods to copy in/out if it's necessary...
> 
> sage



Re: [ceph-users] how to use rados_exec

2013-11-12 Thread Noah Watkins
The cls_crypto.cc file in src/ hasn't been included in the Ceph
compilation for a long time. Take a look at src/cls/* for a list of
modules that are compiled. In particular, there is a "Hello World"
example that is nice. These should work for you out-of-the-box.

You could also try to compile cls_crypto.cc (follow the basic
structure of src/cls/Makefile.am).
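
A minimal caller against the hello world module would look something like
this; the class and method names are assumed from src/cls/hello, so check the
sources for the exact strings:

  #include <rados/librados.h>
  #include <cstdio>
  #include <cstring>

  char out[128];
  // in_buf may be NULL when the method takes no input
  int r = rados_exec(ioctx, "foo_object", "hello", "say_hello",
                     NULL, 0, out, sizeof(out));
  if (r < 0)
    fprintf(stderr, "exec failed: %s\n", strerror(-r));

A negative return like the "operation not supported" you saw just means the
OSDs don't have the named class compiled in and loaded.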

-Noah

On Tue, Nov 12, 2013 at 1:05 AM, 鹏  wrote:
>  Hi all!
>    Long time no see!
>    I want to use the function rados_exec, and I found the class
> cls_crypto.cc in the source code of ceph;
> so I ran the function like this:
>
>    rados_exec(ioctx, "foo_object", "crypto", "md5", buf,
> sizeof(buf), buf2, sizeof(buf2))
>
> and the function returned "operation not supported"!
>
>    I checked the source of ceph, and found that cls_crypto.cc is not
> built. How can I build the class and run it?
>
>
>
> ___
> ceph-users mailing list
> ceph-us...@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: libuuid vs boost uuid

2013-11-09 Thread Noah Watkins
Alan, would this fix the problem on FreeBSD? IIRC libuuid is a
terrible, terrible headache, with the recommended approach being to
upstream changes to e2fsprogs-libuuid.

On Fri, Nov 8, 2013 at 10:43 PM, Sage Weil  wrote:
> On Sat, 9 Nov 2013, James Harper wrote:
>> Just out of curiosity (recent thread about windows port) I just had a
>> quick go at compiling librados under mingw (win32 cross compile), and
>> one of the errors that popped up was the lack of libuuid under mingw.
>> Ceph appears to use libuuid, but I notice boost appears to include a
>> uuid class too, and it seems that ceph already uses some of boost (which
>> already builds under mingw).
>>
>> Is there anything special about libuuid that would mean boost's uuid
>> class couldn't replace it? And would it be better to still use ceph's
>> uuid.h as a wrapper around the boost uuid class, or to modify ceph to
>> use the boost uuid class directly?
>
> Nice!  Boost uuid looks like it would work just fine.  It is probably
> easier and less disruptive to use it from within the existing class in
> include/uuid.h.
>
> sage


Re: portability issue with gmtime method in utime_t

2013-11-09 Thread Noah Watkins
Hi James,

I think my vote would be to rename `ostream& gmtime(ostream& out)
const` to something like `write_gmtime`, but the global namespace
approach seems OK too. Maybe someone else has a strong opinion.

---

sort of related: while we are on the subject of utime_t, `timegm`
isn't portable. This is the hack I'm using in `wip-port`, but I don't
think it should stay this way:

diff --git a/src/include/utime.h b/src/include/utime.h
index 5bebc70..1a74a85 100644
--- a/src/include/utime.h
+++ b/src/include/utime.h
@@ -238,6 +238,22 @@ class utime_t {
 bdt.tm_hour, bdt.tm_min, bdt.tm_sec, usec());
   }

+  static time_t my_timegm (struct tm *tm) {
+time_t ret;
+char *tz;
+
+tz = getenv("TZ");
+setenv("TZ", "", 1);
+tzset();
+ret = mktime(tm);
+if (tz)
+  setenv("TZ", tz, 1);
+else
+  unsetenv("TZ");
+tzset();
+return ret;
+  }
+
   static int parse_date(const string& date, uint64_t *epoch, uint64_t *nsec,
 string *out_date=NULL, string *out_time=NULL) {
 struct tm tm;
@@ -274,7 +290,7 @@ class utime_t {
 } else {
   return -EINVAL;
 }
-time_t t = timegm(&tm);
+time_t t = my_timegm(&tm);
 if (epoch)
   *epoch = (uint64_t)t;
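
One caveat with the hack above, if it ends up staying around longer than
planned: setenv() may invalidate the pointer that getenv() returned, so a
safer variant copies the TZ value first. A sketch:

  #include <cstdlib>
  #include <cstring>
  #include <ctime>

  static time_t my_timegm(struct tm *tm) {
    char *tz = getenv("TZ");
    char *saved = tz ? strdup(tz) : NULL;  // copy before setenv clobbers it
    setenv("TZ", "", 1);
    tzset();
    time_t ret = mktime(tm);
    if (saved) {
      setenv("TZ", saved, 1);
      free(saved);
    } else {
      unsetenv("TZ");
    }
    tzset();
    return ret;
  }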


On Sat, Nov 9, 2013 at 12:24 AM, James Harper
 wrote:
> utime.h defines a utime_t class with a gmtime() method, and also calls the 
> library function gmtime_r().
>
> mingw implements gmtime_r() as a macro in pthread.h that in turn calls 
> gmtime(), and gcc bails because it gets confused about which is being called:
>
> utime.h: In member function 'utime_t utime_t::round_to_minute()':
> utime.h:113:5: error: no matching function for call to 
> 'utime_t::gmtime(time_t*)'
> utime.h:113:5: note: candidate is:
> utime.h:146:12: note: std::ostream& utime_t::gmtime(std::ostream&) const
> utime.h:146:12: note:   no known conversion for argument 1 from 'time_t* {aka 
> long long int*}' to 'std::ostream& {aka std::basic_ostream<char>&}'
>
> Same for asctime and localtime. I can work around it by creating a static 
> method that in turn calls ::gmtime() etc, but I'm not sure that's the best 
> way to do it.
>
> There's a bunch of other build errors in there too so it may be a lost 
> cause...
>
> James


Re: emperor leftovers

2013-11-07 Thread Noah Watkins

On Nov 7, 2013, at 5:47 PM, Matt W. Benjamin  wrote:

> MSVC is the default windows env.  It's probably the ideal, despite
> requiring the most movement towards the windows mindset.  It has better
> open source tool support than you might expect.

Cool, thanks for the clarification. This might be a good reason for the source 
code reorganization blueprint, assuming one of its goals is to be able to 
build Ceph in components. Being able to work on just porting, say, librados, 
would be nice/easier.

http://wiki.ceph.com/01Planning/02Blueprints/Emperor/Source_tree_restructuring

> 
> Matt
> 
> - "Noah Watkins"  wrote:
> 
>> Oh, my ignorance of Windows development is enormous :) So there are
>> cygwin, mingw, and msvc. And mingw is “more” native than cygwin, but
>> doesn’t try to do posix, and msvc is just the default/native windows
>> development env?
>> 
>> On Nov 7, 2013, at 5:34 PM, Matt W. Benjamin 
>> wrote:
>> 
>>> Or, MSVC, frankly.
>>> 
>>> - "Matt W. Benjamin"  wrote:
>>> 
>>>> Yes.  But you may wish to think about mingwXX porting rather than
>>>> Cygwin,
>>>> if you prefer native results.
>>>> 
>>>> Matt
>>>> 
>>>> - "Noah Watkins"  wrote:
>>>> 
>>>>> On Thu, Nov 7, 2013 at 5:15 PM, Sage Weil 
>> wrote:
>>>>> 
>>>>>> curious if the discussion on windows portability is relevant
>> here
>>>> or
>>>>> if
>>>>>> it's better treated as a separate but related effort.
>>>>> 
>>>>> The kernel space talk that's been tossed around probably isn't
>>>>> relevant, but it'd be nice to learn about cygwin porting if anyone
>>>> has
>>>>> knowledge in this area.
>>>> 
>>>> -- 
>>>> Matt Benjamin
>>>> The Linux Box
>>>> 206 South Fifth Ave. Suite 150
>>>> Ann Arbor, MI  48104
>>>> 
>>>> http://linuxbox.com
>>>> 
>>>> tel.  734-761-4689 
>>>> fax.  734-769-8938 
>>>> cel.  734-216-5309 
>>> 
>>> -- 
>>> Matt Benjamin
>>> The Linux Box
>>> 206 South Fifth Ave. Suite 150
>>> Ann Arbor, MI  48104
>>> 
>>> http://linuxbox.com
>>> 
>>> tel.  734-761-4689 
>>> fax.  734-769-8938 
>>> cel.  734-216-5309
> 
> -- 
> Matt Benjamin
> The Linux Box
> 206 South Fifth Ave. Suite 150
> Ann Arbor, MI  48104
> 
> http://linuxbox.com
> 
> tel.  734-761-4689 
> fax.  734-769-8938 
> cel.  734-216-5309



Re: emperor leftovers

2013-11-07 Thread Noah Watkins
Oh, my ignorance of Windows development is enormous :) So there are cygwin, 
mingw, and msvc. And mingw is “more” native than cygwin, but doesn’t try to do 
posix, and msvc is just the default/native windows development env?

On Nov 7, 2013, at 5:34 PM, Matt W. Benjamin  wrote:

> Or, MSVC, frankly.
> 
> - "Matt W. Benjamin"  wrote:
> 
>> Yes.  But you may wish to think about mingwXX porting rather than
>> Cygwin,
>> if you prefer native results.
>> 
>> Matt
>> 
>> - "Noah Watkins"  wrote:
>> 
>>> On Thu, Nov 7, 2013 at 5:15 PM, Sage Weil  wrote:
>>> 
>>>> curious if the discussion on windows portability is relevant here
>> or
>>> if
>>>> it's better treated as a separate but related effort.
>>> 
>>> The kernel space talk that's been tossed around probably isn't
>>> relevant, but it'd be nice to learn about cygwin porting if anyone
>> has
>>> knowledge in this area.
>> 
>> -- 
>> Matt Benjamin
>> The Linux Box
>> 206 South Fifth Ave. Suite 150
>> Ann Arbor, MI  48104
>> 
>> http://linuxbox.com
>> 
>> tel.  734-761-4689 
>> fax.  734-769-8938 
>> cel.  734-216-5309 
> 
> -- 
> Matt Benjamin
> The Linux Box
> 206 South Fifth Ave. Suite 150
> Ann Arbor, MI  48104
> 
> http://linuxbox.com
> 
> tel.  734-761-4689 
> fax.  734-769-8938 
> cel.  734-216-5309 



Re: emperor leftovers

2013-11-07 Thread Noah Watkins
On Thu, Nov 7, 2013 at 5:15 PM, Sage Weil  wrote:

> curious if the discussion on windows portability is relevant here or if
> it's better treated as a separate but related effort.

The kernel space talk that's been tossed around probably isn't
relevant, but it'd be nice to learn about cygwin porting if anyone has
knowledge in this area.


Re: unable to compile

2013-11-04 Thread Noah Watkins
This pull request (https://github.com/ceph/ceph/pull/812) reverts the
patch that changed the struct initialization. That is C99 style, but
apparently it isn't part of standard C++; most C++ compilers reject it,
and GNU supports it only as an extension.

I don't have a better solution at the moment, but as Greg mentioned,
perhaps detecting C++11 is an option, but that means a big nasty
ifdef. Another solution might be to just put the struct initialization
in a C file.
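
For illustration, the construct at issue looks like this (a hypothetical
struct, not the actual one from the reverted patch):

  // C99 designated initializers: valid C, but only a GNU extension in C++.
  struct opts { int a; int b; };
  struct opts c99_style = { .a = 1, .b = 2 };  // g++ accepts; other C++ compilers may not
  struct opts portable  = { 1, 2 };            // portable positional form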

On Sun, Nov 3, 2013 at 2:33 PM, Xing Lin  wrote:
> Thanks, Noah!
>
> Xing
>
> On 11/3/2013 3:17 PM, Noah Watkins wrote:
>>
>> Thanks for looking at this. Unless there is a good solution I think
>> reverting it is ok, as breaking the compile on a few platforms is not ok.
>> I'll be looking at this tonight.
>
>


feedback on supporting libc++

2013-10-29 Thread Noah Watkins
Out of the box on OSX Mavericks libc++ [1] is being used as opposed to
libstdc++. One of the issues is that stuff from tr1 isn't available
(e.g. std::tr1::shared_ptr), as those components moved to std in C++11.

I'm looking for any feedback on this patch set, or if there is a
better way forward.

A set of patches on ceph.git:wip-libc++ [2] adds initial support (with
a couple temporary hacks). These patches are very similar to the
method used to support libc++ in mongodb.

Summary of changes:

  std::tr1::shared/weak_ptr maps to ceph::shared/weak_ptr

  hash_map/set maps to ceph::unordered_map/set

which will choose tr1::unordered_map/set over ext/hash_map/set.
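
Roughly, the mapping can live in a single compatibility header. A sketch of
the idea (not the exact wip-libc++ patch; the _LIBCPP_VERSION test is the
usual way to detect libc++):

  #include <ciso646>  // harmless include that exposes the library's version macros
  #ifdef _LIBCPP_VERSION
    #include <memory>
    namespace ceph { using std::shared_ptr; using std::weak_ptr; }
  #else
    #include <tr1/memory>
    namespace ceph { using std::tr1::shared_ptr; using std::tr1::weak_ptr; }
  #endif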

[1] http://libcxx.llvm.org/
[2] https://github.com/ceph/ceph/compare/wip-libc%2B%2B


Re: Paxos vs Raft

2013-09-14 Thread Noah Watkins
I'm curious about what exactly the consensus requirement and
assumptions are for the monitors. For instance, in the discussion
between Loic and Joao, this statement:

  Joao: the recovery logic in our implementation tries to alleviate
the burden of recovering multiple versions at the same time. We
propose a version, let the peons accept it, then move on to the next
version. In Ceph, we only provide one value at a time.

seems to indicate that the leader is proposing changes sequentially.
However, that makes Ceph's use of paxos sound a lot like the reason
for the development of the Zab protocol used in Zookeeper:

  https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+vs.+Paxos

Either way, as a testament to its understandability, or maybe just its
cool factor, there are a lot of Raft reference implementations listed
on this page:

  https://ramcloud.stanford.edu/wiki/display/logcabin/LogCabin

On Fri, Sep 13, 2013 at 11:39 PM, Loic Dachary  wrote:
> Hi,
>
> Ceph ( http://ceph.com/ ) relies on a custom implementation of Paxos to 
> provide exabyte scale distributed storage. Like most people recently exposed 
> to Paxos, I struggle to understand it ... but will keep studying until I get 
> it :-) When a friend mentionned Raft (  
> http://en.wikipedia.org/wiki/Raft_%28computer_science%29 ), it looked like an 
> easy way out. But it's very recent and I would very much appreciate your 
> opinion. Do you think it is a viable alternative to Paxos ?
>
> Cheers
>
> --
> Loïc Dachary, Artisan Logiciel Libre
> All that is necessary for the triumph of evil is that good people do nothing.
>


Re: subdir-objects

2013-09-07 Thread Noah Watkins
I'm so excited to have a refactored automake setup :)

I was just looking through build-refactor, and it doesn't really look
like there is much that could be reused for the non-recursive
approach. I'll leave it up for a few days, just in case.

On Sat, Sep 7, 2013 at 1:11 PM, Roald van Loon  wrote:
> On Sat, Sep 7, 2013 at 7:47 PM, Noah Watkins  wrote:
>> Oh, and one question about the non-recursive approach. If I stick a
>> Makefile.am in the test/ directory I can do things like:
>>
>>   LDADD = all-the-test-dependencies
>>
>> and then avoid redundant per-target primaries like test_LDADD = (deps)
>> $(LDADD), because it applies to everything in the Makefile.
>>
>> Is that possible with the include approach, or would a naked LDADD in
>> an included Makefile fragment affect all the targets in the file
>> including it?
>
> LDADD = xyz would indeed affect all targets. However, it's not
> something you want to do anyway; using an _LDADD at a target is less
> confusing and less prone to errors because you know exactly what
> libraries a target needs.
>
> For instance, in test/Makefile.am you can have a debug target
> depending on libglobal, which has dependencies set by libglobal
> itself;
>
> CEPH_GLOBAL = $(LIBGLOBAL) $(PTHREAD_LIBS) -lm $(CRYPTO_LIBS) $(EXTRALIBS)
>
> And then in test/Makefile.am;
>
> ceph_test_crypto_SOURCES = test/testcrypto.cc
> ceph_test_crypto_LDADD = $(CEPH_GLOBAL)
> bin_DEBUGPROGRAMS += ceph_test_crypto
>
> And a unittest also depending on libosd;
>
> unittest_pglog_SOURCES = test/osd/TestPGLog.cc
> unittest_pglog_LDADD = $(LIBOSD) $(CEPH_GLOBAL)
> check_PROGRAMS += unittest_pglog
>
> However, libosd requires libos and libosdc, but that dependency is set
> by libosd;
>
> LIBOSD += $(LIBOSDC) $(LIBOS)
>
> This way, you have the dependencies in the right place. With recusive
> builds you'll need an "LDADD = libosd.la libosdc.la libos.la
> libglobal.la $(PTHREAD_LIBS) -lm $(CRYPTO_LIBS) $(EXTRALIBS)", so
> basically you're setting the dependencies of the required libraries in
> the makefile requiring those libraries, which is IMHO way to complex.
>
>>> I think the benefits of using recursive builds are that it may be
>>> familiar to the most people, it reflects the methods/suggestions in
>>> the automake manual, and, most importantly, it would seem that its use
>>> forces good decomposition where as a non-recursive approach relies on
>>> guidelines that are easily broken.
>
> I don't know how which method is more familiar, but I personally think
> that anyone understanding recursive automake is capable of
> understanding a simple include :-)
>
> The decomposition is a valid argument. I think that there are some
> libraries which might benefit from complete separation, like librados
> and a "libceph_client" or something like it. Those can be separated,
> but most others can't. The mon, mds, os, and osd subdirs have
> inter-dependencies for instance.
>
> We might need to restructure the source tree anyway because at some
> points it has grown messy (for instance, libcommon including stuff
> from mds, mon but also from include). However, I think implementing a
> recursive automake right now forces us to do two things at once;
> cleanup the makefiles and do some restructuring in the subdirs. I
> personally think it's best to start with cleaning up makefiles and use
> an include per subdir, so we can restructure the subdirs into
> segregated libraries later on.
>
> So it all boils down to: what to do first :-) Because I agree some
> things are better off with recursive builds, but it might be wise not
> to do that before we have revisited the source tree layout.
>
> Roald


Re: subdir-objects

2013-09-07 Thread Noah Watkins
Oh, and one question about the non-recursive approach. If I stick a
Makefile.am in the test/ directory I can do things like:

  LDADD = all-the-test-dependencies

and then avoid redundant per-target primaries like test_LDADD = (deps)
$(LDADD), because it applies to everything in the Makefile.

Is that possible with the include approach, or would a naked LDADD in
an included Makefile fragment affect all the targets in the file
including it?

-Noah

On Sat, Sep 7, 2013 at 10:38 AM, Noah Watkins  wrote:
> The non-recursive approach is interesting. I just had a quick look in
> the tree I despise building the most, openmpi. It has 414 Makefile.am,
> and uses recursive builds. The rebuild definitely takes a while to
> visit all the sub-dirs, and is pretty annoying when my patience is low
> :)
>
> And there is definitely a big +1 for avoiding the SUBDIRS
> synchronization that slows down parallel make.
>
> I think the benefits of using recursive builds are that it may be
> familiar to the most people, it reflects the methods/suggestions in
> the automake manual, and, most importantly, it would seem that its use
> forces good decomposition where as a non-recursive approach relies on
> guidelines that are easily broken.
>
> Given that the Ceph tree is relatively small (certainly in comparison
> to the 414 directory openmpi monster), are there benefits to the
> non-recursive approach that are not performance related?
>
> - Noah
>
> On Sat, Sep 7, 2013 at 1:52 AM, Roald van Loon  wrote:
>> Hi Noah,
>>
>> I just had a quick look at your build-refactor branch, and I think the
>> greatest difference is that you use recursive builds and I don't. I'm
>> more in favor of non-recursive builds using includes for a number of
>> reasons. I think the most important reasons for me are;
>>
>> 1) recursive make leads to repetitive AM code
>> 2) recursive make takes much more time to compile (as each directory
>> needs to run configure and, probably most important, you lose optimal
>> -jX usage due to serialization)
>> 3) non-recursive make knows all deps so rebuilding is much quicker
>> (it only compiles/links what is required instead of entering all
>> subdirs)
>>
>> There is IMHO one good reason to use recursive build, and that is
>> separation of AM code. However, that can be easily achieved with
>> includes and subdir-objects.
>>
>> I think this is the most important difference between your and my
>> approach, and I like to hear your arguments for recursive builds so we
>> can agree on recursive vs non-recursive make. Then I think it would be
>> great to combine work!
>>
>> Roald
>>
>> On Fri, Sep 6, 2013 at 7:27 PM, Noah Watkins  
>> wrote:
>>> Hi Roald,
>>>
>>> Sage just pointed me at your wip-automake branch. I also just pushed
>>> up a branch, make-refactor, that I was hacking on a bit. Not sure how
>>> much overlap there is, or if my approach is bogus, but I thought I'd
>>> point it out to see if there is anything that can be combined :)
>>>
>>> -Noah
>>>
>>> On Wed, Aug 21, 2013 at 2:01 PM, Roald van Loon  
>>> wrote:
>>>> On Wed, Aug 21, 2013 at 10:41 PM, Sage Weil  wrote:
>>>>> Yes, the Makefile.am is in dire need of TLC from someone who knows a
>>>>> bit of autotools-fu.  It is only this way because in the beginning I
>>>>> didn't know any better.
>>>>
>>>> Well, my average knowledge of autotools could at least fix this
>>>> particular issue and clean up a bit more. It's a start I guess and
>>>> helps me to continue my RGW things.
>>>>
>>>> I'll send out a pull request when I've found some time to implement
>>>> and test this.
>>>>
>>>> Roald


Re: subdir-objects

2013-09-07 Thread Noah Watkins
The non-recursive approach is interesting. I just had a quick look in
the tree I despise building the most, openmpi. It has 414 Makefile.am files,
and uses recursive builds. The rebuild definitely takes a while to
visit all the sub-dirs, and is pretty annoying when my patience is low
:)

And there is definitely a big +1 for avoiding the SUBDIRS
synchronization that slows down parallel make.

I think the benefits of using recursive builds are that it may be
familiar to the most people, it reflects the methods/suggestions in
the automake manual, and, most importantly, it would seem that its use
forces good decomposition whereas a non-recursive approach relies on
guidelines that are easily broken.

Given that the Ceph tree is relatively small (certainly in comparison
to the 414 directory openmpi monster), are there benefits to the
non-recursive approach that are not performance related?

- Noah

On Sat, Sep 7, 2013 at 1:52 AM, Roald van Loon  wrote:
> Hi Noah,
>
> I just had a quick look at your build-refactor branch, and I think the
> greatest difference is that you use recursive builds and I don't. I'm
> more in favor of non-recursive builds using includes for a number of
> reasons. I think the most important reasons for me are;
>
> 1) recursive make leads to repetitive AM code
> 2) recursive make takes much more time to compile (as each directory
> needs to run configure and, probably most important, you lose optimal
> -jX usage due to serialization)
> 3) non-recursive make knows all deps so rebuilding is much quicker
>
> There is IMHO one good reason to use recursive build, and that is
> separation of AM code. However, that can be easily achieved with
> includes and subdir-objects.
>
> I think this is the most important difference between your and my
> approach, and I like to hear your arguments for recursive builds so we
> can agree on recursive vs non-recursive make. Then I think it would be
> great to combine work!
>
> Roald
>
> On Fri, Sep 6, 2013 at 7:27 PM, Noah Watkins  wrote:
>> Hi Roald,
>>
>> Sage just pointed me at your wip-automake branch. I also just pushed
>> up a branch, make-refactor, that I was hacking on a bit. Not sure how
>> much overlap there is, or if my approach is bogus, but I thought I'd
>> point it out to see if there is anything that can be combined :)
>>
>> -Noah
>>
>> On Wed, Aug 21, 2013 at 2:01 PM, Roald van Loon  
>> wrote:
>>> On Wed, Aug 21, 2013 at 10:41 PM, Sage Weil  wrote:
>>>> Yes, the Makefile.am is in dire need of TLC from someone who knows a
>>>> bit of autotools-fu.  It is only this way because in the beginning I
>>>> didn't know any better.
>>>
>>> Well, my average knowledge of autotools could at least fix this
>>> particular issue and clean up a bit more. It's a start I guess and
>>> helps me to continue my RGW things.
>>>
>>> I'll send out a pull request when I've found some time to implement
>>> and test this.
>>>
>>> Roald


Re: subdir-objects

2013-09-06 Thread Noah Watkins
Hi Roald,

Sage just pointed me at your wip-automake branch. I also just pushed
up a branch, make-refactor, that I was hacking on a bit. Not sure how
much overlap there is, or if my approach is bogus, but I thought I'd
point it out to see if there is anything that can be combined :)

-Noah

On Wed, Aug 21, 2013 at 2:01 PM, Roald van Loon  wrote:
> On Wed, Aug 21, 2013 at 10:41 PM, Sage Weil  wrote:
>> Yes, the Makefile.am is in dire need of TLC from someone who knows a
>> bit of autotools-fu.  It is only this way because in the beginning I
>> didn't know any better.
>
> Well, my average knowledge of autotools could at least fix this
> particular issue and clean up a bit more. It's a start I guess and
> helps me to continue my RGW things.
>
> I'll send out a pull request when I've found some time to implement
> and test this.
>
> Roald


Re: Need some help with the RBD Java bindings

2013-08-22 Thread Noah Watkins
On Wed, Aug 21, 2013 at 11:20 PM, Wido den Hollander  wrote:
>
> Yes, seems like a good thing to do. I wasn't sure myself when I was writing
> the bindings on how the packaging should be.

I'm not entirely sure either. With JNI it's pretty simple, but JNA
introduces all sorts of additional classes. Do you know of any large
projects using JNA? It'd be nice to find some references to examine.

> One of the things I haven't tested thoroughly enough is if you as a user of
> the bindings are able to crash the JVM. Since that should never happen.

I think that JNA is safe in that it will prevent itself from being
used incorrectly, but I don't think anything would prevent someone
from, say, creating a pointer off into space with JNA and then passing
it to an external library that would dereference it.


Re: subdir-objects

2013-08-21 Thread Noah Watkins
On Wed, Aug 21, 2013 at 12:45 PM, Roald van Loon  wrote:
>
> from auto-registering the plugins in the RGW core. The only fix for
> this is making the RGW core aware of the subdirs/plugins, but I think
> that's nasty design. I'd like to have it in my make conf.

This patch will turn on the option (which should also fix your problem
if I understand correctly?), and should probably be committed anyway
as newer versions of autotools will complain loudly about our current
Makefile structure.

diff --git a/src/Makefile.am b/src/Makefile.am
index 93f3331..fb7c9dd 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -1,4 +1,4 @@
-AUTOMAKE_OPTIONS = gnu
+AUTOMAKE_OPTIONS = gnu subdir-objects
 SUBDIRS = ocf java
 DIST_SUBDIRS = gtest ocf libs3 java

> So, the question is; is there a reason why we don't use subdir objects?

I believe it is just historical, and unfortunately has just been
repeated over and over. Ideally I think that there should be a
restructuring to place a Makefile.am in every subdirectory. This would
address your issue and make it significantly easier to deal with
situations where we want to build a subset of Ceph, such as just FUSE
and librados, for example.


Re: Need some help with the RBD Java bindings

2013-08-21 Thread Noah Watkins
Wido,

How would you feel about creating two RbdSnapInfo objects? The first
would be something like ceph.rbd.RbdSnapInfo and the second would be
ceph.rbd.jna.RbdSnapInfo. The former is what will be exposed through
the API, and the latter is used only internally. That should address
the hackiness of my snap listing fix: just create a copy of the
SnapInfo into the public struct. It also means we can avoid exposing
users to JNA structures.

On Wed, Aug 21, 2013 at 5:11 AM, Wido den Hollander  wrote:
> On 08/20/2013 11:26 PM, Noah Watkins wrote:
>>
>> Wido,
>>
>> I pushed up a patch to
>>
>>
>> https://github.com/ceph/rados-java/commit/ca16d82bc5b596620609880e429ec9f4eaa4d5ce
>>
>> That includes a fix for this problem. The fix is a bit hacky, but the
>> tests pass now. I included more details about the hack in the code.
>>
>
> I see. Works like a charm for me now. I'll do some further testing with
> CloudStack.
>
> Wido
>
>> On Thu, Aug 15, 2013 at 9:57 AM, Noah Watkins 
>> wrote:
>>>
>>> On Thu, Aug 15, 2013 at 8:51 AM, Wido den Hollander 
>>> wrote:
>>>>
>>>>
>>>> public List<RbdSnapInfo> snapList() throws RbdException {
>>>>  IntByReference numSnaps = new IntByReference(16);
>>>>  PointerByReference snaps = new PointerByReference();
>>>>  List<RbdSnapInfo> list = new ArrayList<RbdSnapInfo>();
>>>>  RbdSnapInfo snapInfo, snapInfos[];
>>>>
>>>>  while (true) {
>>>>  int r = rbd.rbd_snap_list(this.getPointer(), snaps, numSnaps);
>>>
>>>
>>> I think you need to allocate the memory for `snaps` yourself. Here is
>>> the RBD wrapper for Python which does that:
>>>
>>>self.snaps = (rbd_snap_info_t * num_snaps.value)()
>>>ret = self.librbd.rbd_snap_list(image.image, byref(self.snaps),
>>> byref(num_snaps))
>>>
>>> - Noah
>>
>>
>
>
> --
> Wido den Hollander
> 42on B.V.
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on


Re: Need some help with the RBD Java bindings

2013-08-20 Thread Noah Watkins
Wido,

I pushed up a patch to

   
https://github.com/ceph/rados-java/commit/ca16d82bc5b596620609880e429ec9f4eaa4d5ce

That includes a fix for this problem. The fix is a bit hacky, but the
tests pass now. I included more details about the hack in the code.

On Thu, Aug 15, 2013 at 9:57 AM, Noah Watkins  wrote:
> On Thu, Aug 15, 2013 at 8:51 AM, Wido den Hollander  wrote:
>>
>> public List<RbdSnapInfo> snapList() throws RbdException {
>> IntByReference numSnaps = new IntByReference(16);
>> PointerByReference snaps = new PointerByReference();
>> List<RbdSnapInfo> list = new ArrayList<RbdSnapInfo>();
>> RbdSnapInfo snapInfo, snapInfos[];
>>
>> while (true) {
>> int r = rbd.rbd_snap_list(this.getPointer(), snaps, numSnaps);
>
> I think you need to allocate the memory for `snaps` yourself. Here is
> the RBD wrapper for Python which does that:
>
>   self.snaps = (rbd_snap_info_t * num_snaps.value)()
>   ret = self.librbd.rbd_snap_list(image.image, byref(self.snaps),
>                                  byref(num_snaps))
>
> - Noah


Re: Need some help with the RBD Java bindings

2013-08-15 Thread Noah Watkins
On Thu, Aug 15, 2013 at 8:51 AM, Wido den Hollander  wrote:
>
> public List<RbdSnapInfo> snapList() throws RbdException {
> IntByReference numSnaps = new IntByReference(16);
> PointerByReference snaps = new PointerByReference();
> List<RbdSnapInfo> list = new ArrayList<RbdSnapInfo>();
> RbdSnapInfo snapInfo, snapInfos[];
>
> while (true) {
> int r = rbd.rbd_snap_list(this.getPointer(), snaps, numSnaps);

I think you need to allocate the memory for `snaps` yourself. Here is
the RBD wrapper for Python which does that:

  self.snaps = (rbd_snap_info_t * num_snaps.value)()
  ret = self.librbd.rbd_snap_list(image.image, byref(self.snaps),
                                  byref(num_snaps))
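
For reference, the underlying C call follows the usual allocate-and-retry
pattern; `image` here is an assumed open rbd_image_t, and IIRC librbd returns
-ERANGE and updates max_snaps when the array is too small:

  #include <rbd/librbd.h>
  #include <vector>

  std::vector<rbd_snap_info_t> snaps(16);
  int max = (int)snaps.size();
  int r;
  while ((r = rbd_snap_list(image, snaps.data(), &max)) == -ERANGE)
    snaps.resize(max);  // max now holds the required count
  // on success, r is the number of snapshots filled in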

- Noah


blueprint follow-up: paper cuts

2013-08-12 Thread Noah Watkins
I couldn't find any info on ownership of items in the CDS 'paper cuts'
session. Was that done / documented somewhere? Specifically:

Quickstart for librados
  - cls_helloworld
  - helloworld_rados

- Noah


Re: Rados Protocoll

2013-08-04 Thread Noah Watkins
On Fri, Aug 2, 2013 at 1:58 AM, Niklas Goerke  wrote:
>
> As for the documentation you referenced: I didn't find a documentation of
> the RADOS Protocol which could be used to base an implementation of librados
> upon. Does anything like this exist, or would I need to "translate" the c
> implementation?

I do not know of any detailed documentation of the protocol except for
the source :(


Re: blueprint: osd: ceph on zfs

2013-08-04 Thread Noah Watkins
I was thinking along the lines of whether it made sense to multi-purpose
the BackingFileSystem abstraction for non-Linux portability, in which case
even things like posix_fallocate and xattr access might fit in well, since
other platforms may offer equivalent functionality under a different name.
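
For context, the call in question: FS_IOC_FIEMAP is a Linux-only ioctl, which
is exactly the portability rub. A sketch (`fd` is an assumed open file
descriptor):

  #include <linux/fiemap.h>
  #include <linux/fs.h>
  #include <sys/ioctl.h>
  #include <cstring>

  // Ask how many extents back the file, without fetching the extent array.
  struct fiemap fm;
  memset(&fm, 0, sizeof(fm));
  fm.fm_length = FIEMAP_MAX_OFFSET;  // map the whole file
  fm.fm_extent_count = 0;            // count only
  int r = ioctl(fd, FS_IOC_FIEMAP, &fm);
  // on success, fm.fm_mapped_extents holds the extent count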

On Sun, Aug 4, 2013 at 5:47 PM, Yan, Zheng  wrote:
> On Mon, Aug 5, 2013 at 7:39 AM, Noah Watkins  wrote:
>> It seems to make sense that fiemap should be part of the `class
>> BackingFileSystem` abstraction?
>>
>
> FS_IOC_FIEMAP is a standard API, I think no need to implement it in `class
> BackingFileSystem`.
>
>
>> On Thu, Jul 25, 2013 at 4:53 PM, Sage Weil  wrote:
>>> http://wiki.ceph.com/01Planning/02Blueprints/Emperor/osd:_ceph_on_zfs
>>>
>>> We've done some preliminary testing and xattr debugging that allows
>>> ceph-osd to run on zfsforlinux using the normal writeahead journaling mode
>>> (the same mode used for xfs and ext4).  However, we aren't doing anything
>>> special to take advantage of zfs's capabilities.
>>>
>>> This session would go over what is needed to make parallel journaling work
>>> (which would leverage zfs snapshots).  I would also like to have a
>>> discussion about whether other longer-term possibilities, such as storing
>>> objects directly using the DMU, make sense given what ceph-osd's
>>> ObjectStore interface really needs.  It might also be an appropriate time
>>> to visit whether other snapshotting linux filesystems (like nilfs2) would
>>> fit well into any generalization of the filestore code that comes out of
>>> this effort.
>>>
>>> If anybody is interested in this, please add yourself to the interested
>>> parties section (or claim ownership) of this blueprint!
>>>
>>>


Re: blueprint: osd: ceph on zfs

2013-08-04 Thread Noah Watkins
It seems to make sense that fiemap should be part of the `class
BackingFileSystem` abstraction?

On Thu, Jul 25, 2013 at 4:53 PM, Sage Weil  wrote:
> http://wiki.ceph.com/01Planning/02Blueprints/Emperor/osd:_ceph_on_zfs
>
> We've done some preliminary testing and xattr debugging that allows
> ceph-osd to run on zfsforlinux using the normal writeahead journaling mode
> (the same mode used for xfs and ext4).  However, we aren't doing anything
> special to take advantage of zfs's capabilities.
>
> This session would go over what is needed to make parallel journaling work
> (which would leverage zfs snapshots).  I would also like to have a
> discussion about whether other longer-term possibilities, such as storing
> objects directly using the DMU, make sense given what ceph-osd's
> ObjectStore interface really needs.  It might also be an appropriate time
> to visit whether other snapshotting linux filesystems (like nilfs2) would
> fit well into any generalization of the filestore code that comes out of
> this effort.
>
> If anybody is interested in this, please add yourself to the interested
> parties section (or claim ownership) of this blueprint!
>
>


Re: Rados Protocoll

2013-08-01 Thread Noah Watkins
Hi Niklas,

The RADOS reference implementation in C++ is quite large. Reproducing
it all in another language would be interesting, but I'm curious
whether wrapping the C interface is an option for you. There are Java
bindings that are being worked on here:
https://github.com/wido/rados-java.
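
To give a flavor of the surface area involved, the C entry points are small.
A minimal connect sequence looks roughly like this (error handling elided):

  #include <rados/librados.h>

  rados_t cluster;
  rados_create(&cluster, "admin");      // connect as client.admin
  rados_conf_read_file(cluster, NULL);  // NULL = default ceph.conf search path
  rados_connect(cluster);
  /* ... rados_ioctx_create(), I/O, ... */
  rados_shutdown(cluster);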

There are links on ceph.com/docs to some information about Ceph, as
well as videos on Youtube, and academic papers linked to.

-Noah

On Thu, Aug 1, 2013 at 1:01 PM, Niklas Goerke  wrote:
> Hi,
>
> I was wondering why there is no native Java implementation of librados. I'm
> thinking about creating one and I'm thus looking for a documentation of the
> RADOS protocol.
> Also the way I see it librados implements the crush algorithm. Is there a
> documentation for it?
> Also an educated guess about whether the RADOS Protocol is due to changes
> would be very much appreciated.
>
> Thank you in advance
>
> Niklas


blueprint: ceph platform portability

2013-07-26 Thread Noah Watkins
http://wiki.ceph.com/01Planning/02Blueprints/Emperor/Increasing_Ceph_portability

Recently I've managed to get Ceph built and running on OSX. There was
a past effort to get Ceph working on non-Linux platforms, most notably
FreeBSD, but that approach introduced a lot of ad-hoc macros that has
made it difficult to manage changes needed to support additional
platforms.

This session would address the areas within Ceph that are currently
non-portable, discuss the state of OSX support, and touch on what is
needed to factor out platform specific functionality. Changes are
roughly grouped into (1) internal critical (e.g. locking), (2) internal
non-critical (some optimizations), and (3) exported headers. A
significant amount of the OSX changes have been introduced as feature
tests with generic alternatives, and as such the tree may already be
near building on additional platforms, so it would be great to find
people that would be willing to test on additional platforms.

If you are interested please add yourself as an interested party or
owner of this blueprint :)


Re: __bitwise__ annotation in inttypes

2013-07-12 Thread Noah Watkins
Nevermind :) I see that is for freebsd..

On Fri, Jul 12, 2013 at 7:32 PM, Noah Watkins  wrote:
> The following is in include/inttypes.h
>
>   #define __bitwise__
>
>   typedef __u16 __bitwise__ __le16;
>   typedef __u16 __bitwise__ __be16;
>   ...
>
> In linux, the same definition of __bitwise__ is used when not being
> run through Sparse.
>
> Is there any purpose in Ceph for _always_ removing those annotations,
> as is done here?


__bitwise__ annotation in inttypes

2013-07-12 Thread Noah Watkins
The following is in include/inttypes.h

  #define __bitwise__

  typedef __u16 __bitwise__ __le16;
  typedef __u16 __bitwise__ __be16;
  ...

In linux, the same definition of __bitwise__ is used when not being
run through Sparse.
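
For comparison, this is how the kernel defines it: the annotation only
expands to something under Sparse's __CHECKER__, and to nothing for real
compilers.

  #ifdef __CHECKER__
  #define __bitwise__ __attribute__((bitwise))
  #else
  #define __bitwise__
  #endif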

Is there any purpose in Ceph for _always_ removing those annotations,
as is done here?


Re: assertion failure in update_from_paxos

2013-07-09 Thread Noah Watkins
It appears to be resolved in master now.

On Tue, Jul 9, 2013 at 12:43 PM, Joao Eduardo Luis
 wrote:
> On 07/09/2013 04:37 PM, Noah Watkins wrote:
>>
>> I'm getting the following failure when running a vstart instance with
>> 1 of each daemon.
>
>
> I can confirm this happens, as it just happened to me as well.
>
> My guess is that this is something Sage may have fixed last night, but will
> have to check.
>
>   -Joao
>
>>
>> --
>>
>> 0> 2013-07-09 08:30:43.213345 7fdc289e97c0 -1 mon/OSDMonitor.cc: In
>> function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread
>> 7fdc289e97c0 time 2013-07-09 08:30:43.207686
>> mon/OSDMonitor.cc: 129: FAILED assert(latest_bl.length() != 0)
>>
>>   ceph version 0.65-307-g0e93dd9
>> (0e93dd93e5439fb82c416cb8eec7f36598ae7b48)
>>   1: (OSDMonitor::update_from_paxos(bool*)+0x16bd) [0x5ad5ed]
>>   2: (PaxosService::refresh(bool*)+0x143) [0x592963]
>>   3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x536c87]
>>   4: (Paxos::finish_proposal()+0x3a) [0x58c13a]
>>   5: (Paxos::begin(ceph::buffer::list&)+0x82c) [0x58bbac]
>>   6: (Paxos::propose_queued()+0xd9) [0x58be79]
>>   7: (Paxos::propose_new_value(ceph::buffer::list&, Context*)+0x140)
>> [0x58d230]
>>   8: (PaxosService::propose_pending()+0x6c8) [0x593c28]
>>   9: (PaxosService::_active()+0x58b) [0x5956fb]
>>   10: (PaxosService::election_finished()+0x328) [0x595c38]
>>   11: (Monitor::win_election(unsigned int, std::set<int, std::less<int>, std::allocator<int> >&, unsigned long)+0x2e5)
>> [0x55ae05]
>>   12: (Monitor::win_standalone_election()+0x197) [0x55b027]
>>   13: (Monitor::bootstrap()+0x84b) [0x55b8fb]
>>   14: (Monitor::init()+0xad) [0x55bbcd]
>>   15: (main()+0x1ac7) [0x5296b7]
>>   16: (__libc_start_main()+0xf5) [0x3f2dc21b75]
>>   17: ./ceph-mon() [0x52e259]
>>
>
>
> --
> Joao Eduardo Luis
> Software Engineer | http://inktank.com | http://ceph.com


assertion failure in update_from_paxos

2013-07-09 Thread Noah Watkins
I'm getting the following failure when running a vstart instance with
1 of each daemon.

--

0> 2013-07-09 08:30:43.213345 7fdc289e97c0 -1 mon/OSDMonitor.cc: In
function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread
7fdc289e97c0 time 2013-07-09 08:30:43.207686
mon/OSDMonitor.cc: 129: FAILED assert(latest_bl.length() != 0)

 ceph version 0.65-307-g0e93dd9 (0e93dd93e5439fb82c416cb8eec7f36598ae7b48)
 1: (OSDMonitor::update_from_paxos(bool*)+0x16bd) [0x5ad5ed]
 2: (PaxosService::refresh(bool*)+0x143) [0x592963]
 3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x536c87]
 4: (Paxos::finish_proposal()+0x3a) [0x58c13a]
 5: (Paxos::begin(ceph::buffer::list&)+0x82c) [0x58bbac]
 6: (Paxos::propose_queued()+0xd9) [0x58be79]
 7: (Paxos::propose_new_value(ceph::buffer::list&, Context*)+0x140) [0x58d230]
 8: (PaxosService::propose_pending()+0x6c8) [0x593c28]
 9: (PaxosService::_active()+0x58b) [0x5956fb]
 10: (PaxosService::election_finished()+0x328) [0x595c38]
 11: (Monitor::win_election(unsigned int, std::set<int, std::less<int>, std::allocator<int> >&, unsigned long)+0x2e5)
[0x55ae05]
 12: (Monitor::win_standalone_election()+0x197) [0x55b027]
 13: (Monitor::bootstrap()+0x84b) [0x55b8fb]
 14: (Monitor::init()+0xad) [0x55bbcd]
 15: (main()+0x1ac7) [0x5296b7]
 16: (__libc_start_main()+0xf5) [0x3f2dc21b75]
 17: ./ceph-mon() [0x52e259]


Fwd: Java bindings for RADOS and RBD

2013-05-30 Thread Noah Watkins
Resending. HTML+vger issue.

-- Forwarded message --
From: Noah Watkins 
Date: Thu, May 30, 2013 at 12:59 PM
Subject: Re: Java bindings for RADOS and RBD
To: Wido den Hollander 
Cc: ceph-devel 


On Mon, May 6, 2013 at 8:21 AM, Wido den Hollander  wrote:
>
>
> The reason to use JNA is that it allows us to simply drop the bindings and 
> run them without having them compiled against the librados or librbd headers.


Nice. The JNA stuff is very easy to use. We originally looked at it
for the CephFS bindings, but there was some concern about dependencies
and licensing w.r.t. Hadoop.

>
> I've chosen to implement both RADOS and RBD in the same project, using 
> com.ceph.rados and com.ceph.rbd as the package names. It could be split 
> into different projects, but I think that won't benefit anybody. Having it 
> all in one package seem easy, since RBD needs a RADOS IoCTX to work anyway.


Cool. We have the com.ceph.fs and com.ceph.crush namespaces right now.

>
> I'd like to get some feedback on the bindings about the way they are now. 
> They are still work-in-progress, but my Unit Testing shows me they are in a 
> reasonable shape already.


They look like a really good start. Here is some feedback in no
particular order.

1. Enforcing rados usage assumptions

Unlike with the C interface where users are expected to behave, we
want to avoid ever crashing the JVM. So, stuff like "what happens if I
create an IoCTX, then destroy the Rados object and use the IoCTX?"
comes to mind. I think this would correspond to the GC running
finalize on an out of scope Rados object.

2. Making IoCTX safer to use

I've used the library now to bring the IoCTX into a completely
separate RADOS project. Designing for that to be common would be
great. For instance, that may be _very_ common for users of
IoCTX.Exec() since they will have completely distinct libraries.

A first step might be to expose a constant/read-only pointer. I'm not
a JNA expert, but after reading a bit, it seems as though we can
subclass IoCTX from Structure or maybe PointerType to make IoCTX
behave.

3. Async

Getting one or two async wrappers put it in the library might help
reveal any challenges early on, even if the API coverage expands
slowly.


> Getting them into Maven Central [2] won't be that easy, so I'd like to 
> request space for that on ceph.com, for example ceph.com/download/maven and 
> users will have to manually add that to their pom.xml configuration.


This would be really nice. I dunno what people expect from Maven
projects that rely on native libraries. At least getting debian
packages would be a good step.


Re: Erasure encoding as a storage backend

2013-05-05 Thread Noah Watkins

On May 4, 2013, at 9:51 PM, Gregory Farnum  wrote:

> I'm pretty sure we'd just want to use erasure-coded RADOS pools,
> rather than trying to do any CephFS magic erasure encoding. Doing it
> above the RADOS layers would introduce some very odd behaviors in
> terms of losing objects, as you've mentioned, and requires the clients
> to do a lot more network traffic for reads and writes.

Cool. I was just thinking of some setups I've heard of in HPC environments 
where the extra client work was ostensibly worth it in terms of reducing disk 
heads, or something :)

-Noah


Re: Erasure encoding as a storage backend

2013-05-04 Thread Noah Watkins

On May 4, 2013, at 11:36 AM, Loic Dachary  wrote:

> 
> 
> On 05/04/2013 08:27 PM, Noah Watkins wrote:
>> 
>> On May 4, 2013, at 10:16 AM, Loic Dachary  wrote:
>> 
>>> it would be great to get feedback before the ceph summit to address the 
>>> most prominent issues.
>> 
>> One thing that has been in the back of my mind is how this proposal is 
>> influenced (if at all) by a future that includes declustered per-file raid 
>> in CephFS. I realize that may be a distant future, but it seems as though 
>> there could be a lot of overlap for the (non-client driven) rebuild/recovery 
>> component of such an architecture.
> 
> Hi Noah,
> 
> I'm not sure what declustered per-file raid is, which means it had no 
> influence on this proposal ;-) Would you be so kind as to educate me ?

I'm definitely far from an expert on the topic. But briefly the way I think 
about it is:

Currently CephFS stripes a file byte stream across a set of objects (e.g. first 
MB in object 0, 2nd in object 1, etc.), and each of these objects is in turn 
replicated. Following a failure, PGs re-replicate objects.
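
As a toy sketch of that striping (assuming a fixed object size and the 
default stripe count of 1; the real layout code is more general):

  #include <cstdint>
  #include <iostream>

  // Illustration only: with a fixed object size and stripe count 1, a
  // file offset maps directly to an object index plus an offset
  // within that object.
  static const uint64_t OBJECT_SIZE = 1 << 20; // 1 MB, assumed

  int main() {
    uint64_t offset = 5 * OBJECT_SIZE + 1234;
    std::cout << "object " << offset / OBJECT_SIZE
              << ", offset " << offset % OBJECT_SIZE << std::endl;
    return 0;
  }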

In client-driven RAID the striping algorithm is changed, and clients are 
calculating and distributing parity. In this case the parity rather than 
replication provides redundancy. So, one might consider storing objects in a 
pool with replication size 1. However, the standard PG that does replication 
wouldn't be able to handle faults correctly (parity rebuild, rather than 
re-replication), and a smart PG like the ErasureCodedPG would be needed.

So it seems like the problems are related, but I'm not sure exactly how much 
overlap there is :)

-Noah


> Cheers
> 
>> -Noah
>> 
> 
> -- 
> Loïc Dachary, Artisan Logiciel Libre
> All that is necessary for the triumph of evil is that good people do nothing.
> 



Re: Erasure encoding as a storage backend

2013-05-04 Thread Noah Watkins

On May 4, 2013, at 10:16 AM, Loic Dachary  wrote:

> it would be great to get feedback before the ceph summit to address the most 
> prominent issues.

One thing that has been in the back of my mind is how this proposal is 
influenced (if at all) by a future that includes declustered per-file raid in 
CephFS. I realize that may be a distant future, but it seems as though there 
could be a lot of overlap for the (non-client driven) rebuild/recovery 
component of such an architecture.

-Noah



Re: PG statechart

2013-04-26 Thread Noah Watkins
Very cool!

On Apr 26, 2013, at 9:21 PM, Loic Dachary  wrote:

> Hi Noah,
> 
> Nice tool :-) Here is the statechart generated from PG.h.
> 
> Cheers
> 
> On 04/26/2013 06:07 PM, Noah Watkins wrote:
>> Boost Statechart Viewer generates GraphViz:
>> 
>>  http://rtime.felk.cvut.cz/statechart-viewer/
>> 
>> Having trouble with my LLVM environment on 12.04, so I haven't tested it.
>> 
>> -Noah
>> 
>> On Apr 26, 2013, at 8:20 AM, Loic Dachary  wrote:
>> 
>>> Hi,
>>> 
>>> I was considering drawing a statechart ( 
>>> http://www.math-cs.gordon.edu/courses/cs211/ATMExample/SessionStatechart.gif
>>>  ) to better understand the transitions of PG
>>> 
>>> https://github.com/ceph/ceph/blob/master/src/osd/PG.h#L1341
>>> 
>>> and realized that it probably already exists somewhere. Does it ? 
>>> 
>>> /me hopeful ;-)
>>> 
>>> -- 
>>> Loïc Dachary, Artisan Logiciel Libre
>>> 
>> 
> 
> -- 
> Loïc Dachary, Artisan Logiciel Libre
> 



Re: PG statechart

2013-04-26 Thread Noah Watkins
Boost Statechart Viewer generates GraphViz:

  http://rtime.felk.cvut.cz/statechart-viewer/

Having trouble with my LLVM environment on 12.04, so I haven't tested it.

-Noah

On Apr 26, 2013, at 8:20 AM, Loic Dachary  wrote:

> Hi,
> 
> I was considering drawing a statechart ( 
> http://www.math-cs.gordon.edu/courses/cs211/ATMExample/SessionStatechart.gif 
> ) to better understand the transitions of PG
> 
> https://github.com/ceph/ceph/blob/master/src/osd/PG.h#L1341
> 
> and realized that it probably already exists somewhere. Does it ? 
> 
> /me hopeful ;-)
> 
> -- 
> Loïc Dachary, Artisan Logiciel Libre
> 



Re: erasure coding (sorry)

2013-04-18 Thread Noah Watkins

On Apr 18, 2013, at 2:08 PM, Josh Durgin  wrote:

> I talked to some folks interested in doing a more limited form of this
> yesterday. They started a blueprint [1]. One of their ideas was to have
> erasure coding done by a separate process (or thread perhaps). It would
> use erasure coding on an object and then use librados to store the
> erasure-encoded pieces in a separate pool, and finally leave a marker in
> place of the original object in the first pool.

This sounds at a high-level similar to work out of Microsoft:

  https://www.usenix.org/system/files/conference/atc12/atc12-final181_0.pdf

The basic idea is to replicate first, then erasure code in the background.

- Noah


Re: CephFS locality API RFC

2013-03-14 Thread Noah Watkins

On Mar 14, 2013, at 12:39 PM, Sage Weil  wrote:

> Unless those old bindings are already broken because of the preferred osd 
> thing…

Well, for preferred_pg EOPNOTSUPP will be ignored by the old bindings, so I 
guess it still works :)


Re: CephFS locality API RFC

2013-03-14 Thread Noah Watkins

On Mar 14, 2013, at 11:29 AM, Greg Farnum  wrote:

> On Thursday, March 14, 2013 at 11:14 AM, Noah Watkins wrote:
>> The current CephFS API is used to extract locality information as follows:
>> 
>> First we get a list of OSD IDs:
>> 
>> ceph_get_file_extent_osds(offset) -> [OSD ID]*
>> 
>> Using the OSD IDs we can then query for the CRUSH bucket hierarchy:
>> 
>> ceph_get_osd_crush_location(osd_id) -> path
>> 
>> The path includes hostname information, but we'd still like to get the IP. 
>> The current API for doing this is:
>> 
>> ceph_get_file_stripe_address(offset) -> [sockaddr]*
>> 
>> that returns an IP for each OSD that holds replicas. The order of the output list 
>> should be the same as the OSD list, but it'd be nice to have a 
>> consistent API that deals with OSD IDs, making the correspondence explicit.
> Agreed. We should probably deprecate the get_file_stripe_address() and make 
> them turn IDs into addresses on their own.

Is there an API deprecation protocol, or just -ENOTSUPP?


>> For instance:
>> 
>> ceph_get_file_stripe_address(osd_id) -> sockaddr
> How about  
> ceph_get_osd_address(osd_id) -> sockaddr
> ;)

Sounds good.

Thanks!


CephFS locality API RFC

2013-03-14 Thread Noah Watkins
The current CephFS API is used to extract locality information as follows:

First we get a list of OSD IDs:

  ceph_get_file_extent_osds(offset) -> [OSD ID]*

Using the OSD IDs we can then query for the CRUSH bucket hierarchy:

  ceph_get_osd_crush_location(osd_id) -> path

The path includes hostname information, but we'd still like to get the IP. The 
current API for doing this is:

  ceph_get_file_stripe_address(offset) -> [sockaddr]*

that returns an IP for each OSD that holds replicas. The order of the output list 
should be the same as the OSD list, but it'd be nice to have a consistent 
API that deals with OSD IDs, making the correspondence explicit. For instance:

  ceph_get_file_stripe_address(osd_id) -> sockaddr

Another option is to have `ceph_get_osd_crush_location` return both the path 
and a sockaddr.
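
Putting the pieces together, a client might use it roughly like this (a 
sketch only: cmount/fd are assumed handles, the buffer sizes are arbitrary, 
and ceph_get_osd_address is the hypothetical per-OSD variant, so none of 
these signatures should be read as final):

  /* For each OSD holding an extent at `offset`, resolve its CRUSH
   * location and its address. Error handling omitted. */
  int osds[16];
  int n = ceph_get_file_extent_osds(cmount, fd, offset, osds, 16);
  for (int i = 0; i < n; i++) {
    char path[256];
    struct sockaddr_storage addr;
    ceph_get_osd_crush_location(cmount, osds[i], path, sizeof(path));
    ceph_get_osd_address(cmount, osds[i], &addr);
    /* emit the (crush path, address) pair for this replica */
  }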

Thoughts?

Thanks,
Noah


Re: MDS running at 100% CPU, no clients

2013-03-07 Thread Noah Watkins

On Mar 7, 2013, at 9:24 AM, Greg Farnum  wrote:

> This isn't bringing up anything in my brain, but I don't know what that 
> _sample() function is actually doing — did you get any farther into it?

_sample reads /proc/self/maps in a loop until EOF or some other condition. I 
couldn't figure out if the thread was stuck in _sample or a level up. Anyhow, 
my gdb-foo isn't stellar and I managed to crash the MDS. I'm gonna stick some 
log points in and try to reproduce it.
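
For reference, the robust shape of such a loop is roughly the following (a 
paraphrase for illustration, not the actual MemoryModel code):

  #include <fstream>
  #include <string>

  // Paraphrased sketch: read /proc/self/maps line by line. Using the
  // stream state as the loop condition terminates on EOF *and* on
  // error; a loop that only tests eof() can spin forever if the
  // stream enters a fail state instead of reaching EOF.
  void sample_maps() {
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
      // parse one mapping entry here
    }
  }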


> -Greg
> 
> On Wednesday, March 6, 2013 at 6:23 PM, Noah Watkins wrote:
> 
>> Which looks to be in a tight loop in the memory model _sample…
>> 
>> (gdb) bt
>> #0 0x7f0270d84d2d in read () from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1 0x7f027046dd88 in std::__basic_file<char>::xsgetn(char*, long) () 
>> from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> #2 0x7f027046f4c5 in std::basic_filebuf<char, std::char_traits<char> 
>> >::underflow() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> #3 0x7f0270467ceb in std::basic_istream<char, std::char_traits<char> >& 
>> std::getline<char, std::char_traits<char>, std::allocator<char> 
>> >(std::basic_istream<char, std::char_traits<char> >&, 
>> std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, 
>> char) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> #4 0x0072bdd4 in MemoryModel::_sample(MemoryModel::snap*) ()
>> #5 0x005658db in MDCache::check_memory_usage() ()
>> #6 0x004ba929 in MDS::tick() ()
>> #7 0x00794c65 in SafeTimer::timer_thread() ()
>> #8 0x0000007958ad in SafeTimerThread::entry() ()
>> #9 0x7f0270d7de9a in start_thread () from 
>> /lib/x86_64-linux-gnu/libpthread.so.0
>> 
>> On Mar 6, 2013, at 6:18 PM, Noah Watkins <jayh...@cs.ucsc.edu> wrote:
>> 
>>> 
>>> On Mar 6, 2013, at 5:57 PM, Noah Watkins <jayh...@cs.ucsc.edu> wrote:
>>> 
>>>> The MDS process in my cluster is running at 100% CPU. In fact I thought 
>>>> the cluster had come down, but it was just an ls taking a minute. There aren't 
>>>> any clients active. I've left the process running in case there is any 
>>>> probing you'd like to do on it:
>>>> 
>>>> virt res cpu
>>>> 4629m 88m 5260 S 92 1.1 113:32.79 ceph-mds
>>>> 
>>>> Thanks,
>>>> Noah
>>> 
>>> 
>>> 
>>> 
>>> This is a ceph-mds child thread under strace. The only thread
>>> that appears to be doing anything.
>>> 
>>> root@issdm-44:/home/hadoop/hadoop-common# strace -p 3372
>>> Process 3372 attached - interrupt to quit
>>> read(1649, "7f0203235000-7f0203236000 ---p 0"..., 8191) = 4050
>>> read(1649, "7f0205053000-7f0205054000 ---p 0"..., 8191) = 4050
>>> read(1649, "7f0206e71000-7f0206e72000 ---p 0"..., 8191) = 4050
>>> read(1649, "7f0214144000-7f0214244000 rw-p 0"..., 8191) = 4020
>>> read(1649, "7f0215f62000-7f0216062000 rw-p 0"..., 8191) = 4020
>>> read(1649, "7f0217d8-7f0217e8 rw-p 0"..., 8191) = 4020
>>> read(1649, "7f0219b9e000-7f0219c9e000 rw-p 0"..., 8191) = 4020
>>> ...
>>> 
>>> That file looks to be:
>>> 
>>> ceph-mds 3337 root 1649r REG 0,3 0 266903 /proc/3337/maps
>>> 
>>> (3337 is the parent process).
>> 
> 
> 
> 



Re: MDS running at 100% CPU, no clients

2013-03-06 Thread Noah Watkins
Which looks to be in a tight loop in the memory model _sample…

(gdb) bt
#0  0x7f0270d84d2d in read () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x7f027046dd88 in std::__basic_file<char>::xsgetn(char*, long) () from 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x7f027046f4c5 in std::basic_filebuf<char, std::char_traits<char> 
>::underflow() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x7f0270467ceb in std::basic_istream<char, std::char_traits<char> >& 
std::getline<char, std::char_traits<char>, std::allocator<char> 
>(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, 
std::char_traits<char>, std::allocator<char> >&, char) () from 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x0072bdd4 in MemoryModel::_sample(MemoryModel::snap*) ()
#5  0x005658db in MDCache::check_memory_usage() ()
#6  0x004ba929 in MDS::tick() ()
#7  0x00794c65 in SafeTimer::timer_thread() ()
#8  0x007958ad in SafeTimerThread::entry() ()
#9  0x7f0270d7de9a in start_thread () from 
/lib/x86_64-linux-gnu/libpthread.so.0

On Mar 6, 2013, at 6:18 PM, Noah Watkins  wrote:

> 
> On Mar 6, 2013, at 5:57 PM, Noah Watkins  wrote:
> 
>> The MDS process in my cluster is running at 100% CPU. In fact I thought the 
>> cluster had come down, but it was just an ls taking a minute. There aren't any 
>> clients active. I've left the process running in case there is any probing 
>> you'd like to do on it:
>> 
>> virt   res  cpu
>> 4629m  88m 5260 S   92  1.1 113:32.79 ceph-mds
>> 
>> Thanks,
>> Noah
>> 
> 
> 
> This is a ceph-mds child thread under strace. The only thread
> that appears to be doing anything.
> 
> root@issdm-44:/home/hadoop/hadoop-common# strace -p 3372
> Process 3372 attached - interrupt to quit
> read(1649, "7f0203235000-7f0203236000 ---p 0"..., 8191) = 4050
> read(1649, "7f0205053000-7f0205054000 ---p 0"..., 8191) = 4050
> read(1649, "7f0206e71000-7f0206e72000 ---p 0"..., 8191) = 4050
> read(1649, "7f0214144000-7f0214244000 rw-p 0"..., 8191) = 4020
> read(1649, "7f0215f62000-7f0216062000 rw-p 0"..., 8191) = 4020
> read(1649, "7f0217d8-7f0217e8 rw-p 0"..., 8191) = 4020
> read(1649, "7f0219b9e000-7f0219c9e000 rw-p 0"..., 8191) = 4020
> ...
> 
> That file looks to be:
> 
> ceph-mds 3337 root 1649r   REG   0,3   0   266903 /proc/3337/maps
> 
> (3337 is the parent process).



Re: MDS running at 100% CPU, no clients

2013-03-06 Thread Noah Watkins

On Mar 6, 2013, at 5:57 PM, Noah Watkins  wrote:

> The MDS process in my cluster is running at 100% CPU. In fact I thought the 
> cluster had come down, but it was just an ls taking a minute. There aren't any 
> clients active. I've left the process running in case there is any probing 
> you'd like to do on it:
> 
> virt   res  cpu
> 4629m  88m 5260 S   92  1.1 113:32.79 ceph-mds
> 
> Thanks,
> Noah
> 


This is a ceph-mds child thread under strace. The only thread
that appears to be doing anything.

root@issdm-44:/home/hadoop/hadoop-common# strace -p 3372
Process 3372 attached - interrupt to quit
read(1649, "7f0203235000-7f0203236000 ---p 0"..., 8191) = 4050
read(1649, "7f0205053000-7f0205054000 ---p 0"..., 8191) = 4050
read(1649, "7f0206e71000-7f0206e72000 ---p 0"..., 8191) = 4050
read(1649, "7f0214144000-7f0214244000 rw-p 0"..., 8191) = 4020
read(1649, "7f0215f62000-7f0216062000 rw-p 0"..., 8191) = 4020
read(1649, "7f0217d8-7f0217e8 rw-p 0"..., 8191) = 4020
read(1649, "7f0219b9e000-7f0219c9e000 rw-p 0"..., 8191) = 4020
...

That file looks to be:

ceph-mds 3337 root 1649r   REG   0,3   0   266903 /proc/3337/maps

(3337 is the parent process).


MDS running at 100% CPU, no clients

2013-03-06 Thread Noah Watkins
The MDS process in my cluster is running at 100% CPU. In fact I thought the 
cluster had come down, but it was just an ls taking a minute. There aren't any 
clients active. I've left the process running in case there is any probing 
you'd like to do on it:

virt   res  cpu
4629m  88m 5260 S   92  1.1 113:32.79 ceph-mds

Thanks,
Noah



Re: Approaches to wrapping aio_exec

2013-03-06 Thread Noah Watkins
So I've been playing with the ObjectOperationCompletion code a bit. It seems to 
be really important to be able to handle decoding errors in the 
handle_completion() callback. In particular, I'd like to be able to reach out 
and set the return value the user will see in the AioCompletion.

Any thoughts on dealing with this somehow?

-Noah

On Mar 4, 2013, at 11:44 AM, Yehuda Sadeh  wrote:

> On Mon, Mar 4, 2013 at 11:34 AM, Noah Watkins  wrote:
>> 
>> On Mar 3, 2013, at 6:31 PM, Yehuda Sadeh  wrote:
>> 
>>> I pushed the wip-librados-exec branch last week that solves a similar
>>> issue. I added two more ObjectOperation::exec() api calls. The more
>>> interesting one added a callback context that is called with the
>>> output buffer of the completed sub-op. Currently in order to use it
>>> you'll need to use operate()/aio_operate(), however, a similar
>>> aio_exec interface can be added.
>> 
>> Thanks for the pointer to the branch. So, if I understand correctly,
>> we might have a new librados::aio_exec_completion call that accepts
>> a completion object? For example:
>> 
>> aio_exec_completion(AioCompletion *c, bufferlist *outbl,
>>ObjectOperationCompletion* completion)
>> {
>>  Context *onack = new C_aio_Ack(c);
>> 
>>  ::ObjectOperation rd;
>>  ObjectOpCompletionCtx *ctx = new ObjectOpCompletionCtx(completion);
>>  rd.call(cls, method, inbl, ctx->outbl, ctx, NULL);
>>  objecter->read(oid, oloc, rd, snap_seq, outbl, 0, onack, &c->objver);
>> 
>>  return 0;
>> }
>> 
>> where the caller would provide an ObjectOperationCompletion whose
>> finish(..) would unwrap the protocol?
> 
> Right.
> 
>> 
>> Do you expect wip-librados-exec to go upstream pretty soon, and would
> 
> We can push it ahead if needed, it doesn't depend on any of the stuff
> I'm working on right now. It just waits for someone to properly review
> it.
> 
>> something like librados::aio_exec_completion be a candidate for adding
>> to librados?
>> 
> 
> Sure, if there's a need then I don't see why not.
> 
> Yehuda



Re: Approaches to wrapping aio_exec

2013-03-04 Thread Noah Watkins

On Mar 3, 2013, at 6:31 PM, Yehuda Sadeh  wrote:

> I pushed the wip-librados-exec branch last week that solves a similar
> issue. I added two more ObjectOperation::exec() api calls. The more
> interesting one added a callback context that is called with the
> output buffer of the completed sub-op. Currently in order to use it
> you'll need to use operate()/aio_operate(), however, a similar
> aio_exec interface can be added.

Thanks for the pointer to the branch. So, if I understand correctly,
we might have a new librados::aio_exec_completion call that accepts
a completion object? For example:

aio_exec_completion(AioCompletion *c, bufferlist *outbl,
ObjectOperationCompletion* completion)
{
  Context *onack = new C_aio_Ack(c);

  ::ObjectOperation rd;
  ObjectOpCompletionCtx *ctx = new ObjectOpCompletionCtx(completion);
  rd.call(cls, method, inbl, ctx->outbl, ctx, NULL);
  objecter->read(oid, oloc, rd, snap_seq, outbl, 0, onack, &c->objver);

  return 0;
}

where the caller would provide an ObjectOperationCompletion whose
finish(..) would unwrap the protocol?

Do you expect wip-librados-exec to go upstream pretty soon, and would
something like librados::aio_exec_completion be a candidate for adding
to librados?

Thanks,
Noah


Approaches to wrapping aio_exec

2013-03-03 Thread Noah Watkins
I've built a custom protocol on top of Rados::exec that uses serialized 
versions of InputObject and OutputObject to implement the protocol. Here's 
simple pseudo-code that provides my_service::exec:

 void my_service::exec(oid, input_params, bufferlist& out) {
   bufferlist inbl, outbl;
   InputObject in(input_params);
   ::encode(in, inbl);               // wrap the input in our protocol

   librados::exec(oid, inbl, outbl);

   OutputObject reply;
   bufferlist::iterator it = outbl.begin();
   ::decode(reply, it);              // unwrap the reply protocol
   out = reply.payload;
 }

I'd like to provide a my_service::aio_exec that wraps librados::aio_exec, but 
doing so doesn't seem to be straight-forward with the current async interface.

Notice in the above example that the caller's output bufferlist must be unpacked 
from the reply protocol. However, the librados::aio_exec interface unpacks the 
output directly into the caller parameter:

int librados::aio_exec(oid, …, bufferlist *outbl)
{
...
   objecter->read(oid, oloc, rd, snap_seq, outbl, 0, onack, &c->objver);
}

What's needed is an intermediate bufferlist that can be decoded when the data is 
available.
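
One possible shape, sketched below (DecodeCtx is a hypothetical name and 
error handling is elided): hand the objecter an intermediate bufferlist 
owned by a small context, and decode into the caller's bufferlist when the 
callback fires.

  // Hypothetical sketch: decode the reply protocol once the op
  // completes, then fill the caller-visible bufferlist.
  struct DecodeCtx : public Context {
    bufferlist raw;        // intermediate buffer the objecter fills
    bufferlist *user_out;  // the caller's output bufferlist
    explicit DecodeCtx(bufferlist *out) : user_out(out) {}
    void finish(int r) {
      if (r < 0)
        return;            // let the AioCompletion carry the error
      OutputObject reply;
      bufferlist::iterator it = raw.begin();
      ::decode(reply, it); // a real version would catch decode errors
      *user_out = reply.payload;
    }
  };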

One way to do this would be to wrap AioCompletion and intercept ack and safe 
callbacks to do the data unpacking. The problem with this is that we have to 
introduce a new AioCompletion type, or new AioCompletionImpl. I started to do 
this, but it feels quite clunky.

Any suggestions for handling this?

Thanks,
Noah


Re: [Ceph - Feature #4230] (New) librados: node.js bindings

2013-02-23 Thread Noah Watkins

On Feb 23, 2013, at 12:49 PM, Alexandre Marangone wrote:

> Of course, thoughts and suggestions are welcome.

Oo, this is nice. A while back I built a cls_v8 object class to play with, but 
never got around to working on a front-end. The motivation was to be able to 
write things like this in the client:

App.js
--

function plugin() {

  function helper(args) {
/* do stuff */
  }

  function handler1(input, output) {
/* do stuff */
  }

  function handler2(input, output) {
/* do stuff */
  }

}

/* serialize the plugin as a string */
var myplugin = plugin.toString();

rados.ioctx_create('mypool', ioctx, function (err) {
  if (err)
throw err;

  command = {
'script': myplugin,
'handler': 'handler1',
'input': 'hello',
  };

  /* run the plugin in OSD using V8 JIT compilation */
  ioctx.exec('myobject1', 'cls_v8', 'eval', command);

});

- Noah

> --
> Alexandre
> 
> On Fri, Feb 22, 2013 at 2:42 AM,   wrote:
>> Issue #4230 has been reported by Wido den Hollander.
>> 
>> 
>> Feature #4230: librados: node.js bindings
>> 
>> Author: Wido den Hollander
>> Status: New
>> Priority: Low
>> Assignee:
>> Category: librados
>> Target version:
>> Source: Community (dev)
>> Tags:
>> Reviewed:
>> 
>> Although I don't have a use-case at this specific point it would be very
>> cool to have node.js bindings.
>> 
>> From the docs it seems pretty simple to write these bindings and make cool
>> stuff with it.
>> 
>> I'm just opening the feature here so that it shows up and can be picked up:
>> 
>> Some reference docs:
>> 
>> http://nodejs.org/api/addons.html#addons_wrapping_c_objects
>> http://www.slideshare.net/nsm.nikhil/writing-native-bindings-to-nodejs-in-c
>> https://github.com/nikhilm/jsfoo-pune-2012
>> 
>> Without bindings you would be able to use node-ffi, but I think native
>> bindings would be cleaner: https://github.com/rbranson/node-ffi
>> 
>> 
>> 
>> You have received this notification because you have either subscribed to
>> it, or are involved in it.
>> To change your notification preferences, please click here:
>> http://tracker.ceph.com/my/account



Hadoop release jars

2013-02-22 Thread Noah Watkins
Hi all,

I've pushed up some changes that let us build stand-alone jar files for the 
Hadoop CephFS bindings.

  github.com/ceph/hadoop-common.git cephfs/branch-1.0-build-jar

Running "ant cephfs" will produce "build/hadoop-cephfs.jar".

I've tested it locally and things work well, so hopefully this means we can 
continue to develop in the hadoop-common tree and produce jar releases for 
whatever version combinations we care about.

Is this something that should be easy to integrate into gitbuilder so we can 
link off the documentation page to the release jars?

-Noah



Re: Hadoop DNS/topology details

2013-02-20 Thread Noah Watkins

On Feb 20, 2013, at 9:57 AM, Noah Watkins  wrote:
> 
>vector<int> osds;
>  }
> 
>  ceph_get_file_extents(file, offset, length, vector<extent>& extents);
> 
> Then we could re-use the Striper or something?
> 
> -Noah

Although, I think your previous suggestion would be much simpler to do for the 
C API :)


Re: Hadoop DNS/topology details

2013-02-20 Thread Noah Watkins

On Feb 20, 2013, at 9:31 AM, Sage Weil  wrote:

>> or something like this that replaces the current extent-to-sockaddr 
>> interface? The proposed interface about would do the host/ip mapping, as 
>> well as the topology mapping?
> 
> Yeah.  The ceph_offset_to_osds should probably also have an (optional?) 
> out argument that tells you how long the extent is starting from offset 
> that is on those devices.  Then you can do another call at offset+len to 
> get the next segment.


It'd be nice to hide the striping strategy so we don't have to reproduce it in 
the Hadoop shim, as we currently do and as is needed with an interface using 
only an offset (we have to know the stripe unit to jump to the next extent). 
So, something like this might work:

  struct extent {
    loff_t offset, length;
    vector<int> osds;
  };

  ceph_get_file_extents(file, offset, length, vector<extent>& extents);

Then we could re-use the Striper or something?
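
A caller (the Hadoop shim, say) could then walk a file without knowing the 
striping strategy at all; a rough sketch, assuming the interface above:

  // Illustrative only: iterate the extents covering [offset, length).
  vector<extent> extents;
  ceph_get_file_extents(file, offset, length, extents);
  for (unsigned i = 0; i < extents.size(); i++) {
    const extent &e = extents[i];
    // e.offset/e.length locate this extent in the file's byte stream;
    // e.osds lists the OSDs holding its replicas.
  }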

-Noah


Re: Hadoop DNS/topology details

2013-02-20 Thread Noah Watkins

On Feb 19, 2013, at 4:39 PM, Sage Weil  wrote:

> However, we do have host and rack information in the crush map, at least 
> for non-customized installations.  How about something like
> 
>  string ceph_get_osd_crush_location(int osd, string type);
> 
> or similar.  We could call that with "host" and "rack" and get exactly 
> what we need, without making any changes to the data structures.

This would then be used in conjunction with an interface:

 ceph_offset_to_osds(offset, vector<int>& osds)
...
osdmap->pg_to_acting_osds(osds)
...

or something like this that replaces the current extent-to-sockaddr interface? 
The proposed interface above would do the host/IP mapping, as well as the 
topology mapping?

- Noah



Re: Hadoop DNS/topology details

2013-02-19 Thread Noah Watkins

On Feb 19, 2013, at 2:22 PM, Gregory Farnum  wrote:

> On Tue, Feb 19, 2013 at 2:10 PM, Noah Watkins  wrote:
> 
> That is just truly annoying. Is this described anywhere in their docs?

Not really. It's just there in the code--I can figure out the metric if you're 
interested. I suspect it is local node, local rack, off rack ordering, with no 
special tie breakers.

> I don't think it would be hard to sort, if we had some mechanism for
> doing so (crush map nearness, presumably?),

Topology information from the bucket hierarchy? I think it's always some sort 
of heuristic.

>> 1. Expand CephFS interface to return IP and hostname
> 
> Ceph doesn't store hostnames anywhere — it really can't do this. All
> it has is IPs associated with OSD ID numbers. :) Adding hostnames
> would be a monitor and map change, which we could do, but given the
> issues we've had with hostnames in other contexts I'd really rather
> not.

What is the fate of hostnames used in ceph.conf? Could that information be 
leveraged, when specified by the cluster admin?

-Noah


Hadoop DNS/topology details

2013-02-19 Thread Noah Watkins
Here is the information that I've found so far regarding the operation of 
Hadoop w.r.t. DNS/topology. There are two parts, the file system client 
requirements, and other consumers of topology information.

-- File System Client --

The relevant interface between the Hadoop VFS and its underlying file system is:

  FileSystem:getFileBlockLocations(File, Extent)

which is expected to return a list of hosts (each a 3-tuple: hostname, IP, topology 
path) for each block that contains any part of the specified file extent. So, 
with triplication and 2 blocks, there are 2 * 3 = 6 3-tuples present.

  *** Note: HDFS sorts each list of hosts based on a distance metric applied 
between the initiating file system client and each of the blocks in the list 
using the HDFS cluster map. This should not affect correctness, although it's 
possible that consumers of this list (e.g. MapReduce) may assume an ordering. 
***

The current Ceph client can produce the same list, but does not include 
hostname or topology information. Currently reverse DNS is used to fill in the 
hostname, and defaults to a flat topology in which all hosts are in a single 
topology path: "/default-rack/host".

- Reverse DNS could be quite slow:
   - 3x replication * 1 TB / 64 MB blocks = 49152 lookups
   - Caching lookups could help (see the sketch below)
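
A minimal memoized lookup along these lines (a sketch only; a real version 
would bound the cache and deal with threading and entry expiry):

  #include <map>
  #include <string>
  #include <netdb.h>

  // Resolve each distinct address once instead of once per replica.
  std::string lookup_host(const struct sockaddr *sa, socklen_t salen) {
    static std::map<std::string, std::string> cache; // not thread-safe
    char ip[NI_MAXHOST], host[NI_MAXHOST];
    getnameinfo(sa, salen, ip, sizeof(ip), NULL, 0, NI_NUMERICHOST);
    std::map<std::string, std::string>::iterator it = cache.find(ip);
    if (it != cache.end())
      return it->second;                 // cache hit: no DNS traffic
    if (getnameinfo(sa, salen, host, sizeof(host), NULL, 0, NI_NAMEREQD))
      return cache[ip] = ip;             // lookup failed: keep the IP
    return cache[ip] = host;
  }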

-- Topology Information --

Services that run on a Hadoop cluster (such as MapReduce) use hostname and 
topology information attached to each file system block to schedule and 
aggregate work based on various policies. These services don't have direct 
access to the HDFS cluster map, and instead rely on a service to provide a 
mapping:

   DNS-names/IP -> topology path mapping

This can be performed using a script/utility program that will perform bulk 
translations, or implemented in Java.

-- A Possible Approach --

1. Expand CephFS interface to return IP and hostname
2. Build a Ceph tool to perform DNS-name/IP -> topology path mapping

Using (2) from the Hadoop shim we can perform distance sorting, as well as 
resolve the topology information. The tool will also be used by other Hadoop 
services that can make use of the topology.

This would seem like a good incremental step forward. There are a _lot_ of 
other analytics systems out there that might be interested in running on top of 
Ceph, including the next-generation Hadoop releases, all of which may have 
slightly different requirements. So wedding ourselves to an expansion of the 
CephFS API at this point might be a little premature. On the other hand, 
providing all information now should cover our bases later :)

- Noah


Re: Links to various language bindings

2013-02-10 Thread Noah Watkins

On Feb 9, 2013, at 1:50 AM, Wido den Hollander  wrote:

> Hi Noah,
> 
> On 02/08/2013 04:42 PM, Noah Watkins wrote:
>> 
>> On Feb 8, 2013, at 1:06 AM, Wido den Hollander  wrote:
>> 
>>> Hi,
>>> 
>>> I knew that there were Java bindings for RADOS, but they weren't linked.
>>> 
>>> Well, some searching on Github led me to Noah's bindings [0], but it was a 
>>> bit of searching.
>> 
>> The RADOS Java bindings on my Github page should be taken down--they are 
>> very old and not stable. Any links to them should probably be removed. If 
>> there is interest in the RADOS Java bindings, I'd be happy to get that ball 
>> rolling.
>> 
> 
> Could you then update the README?

Sure.

> I did a quick test yesterday with them and they worked fine, but I just 
> created a Cluster object and a pool, didn't do anything fancy.

Yes, they do "work", and in fact I think some people might be playing with 
them. They are however full of various problems, and likely need a re-write. I 
wouldn't rely on them for anything important.

- Noah



Re: Links to various language bindings

2013-02-08 Thread Noah Watkins

On Feb 8, 2013, at 1:06 AM, Wido den Hollander  wrote:

> Hi,
> 
> I knew that there were Java bindings for RADOS, but they weren't linked.
> 
> Well, some searching on Github led me to Noah's bindings [0], but it was a 
> bit of searching.

The RADOS Java bindings on my Github page should be taken down--they are very 
old and not stable. Any links to them should probably be removed. If there is 
interest in the RADOS Java bindings, I'd be happy to get that ball rolling.

- Noah


Re: [PATCH] configure.ac: check for org.junit.rules.ExternalResource

2013-01-15 Thread Noah Watkins
On Tue, Jan 15, 2013 at 1:32 AM, Danny Al-Gaaf  wrote:
> Am 15.01.2013 10:04, schrieb James Page:
>> On 12/01/13 16:36, Noah Watkins wrote:
>>> On Thu, Jan 10, 2013 at 9:13 PM, Gary Lowell
>>>  wrote:
>
> I would also prefer to not add another huge build dependency to ceph,
> especially since it's e.g. not supported by SLES11 and since ceph
> currently builds fine (even with these small warnings from autotools).

Ahh, I had in my head a separate repository for Java bindings managed
by Maven (or Ant). Either way, I have no strong opinion -- we only
have one JUnit dependency :)


Re: [PATCH] configure.ac: check for org.junit.rules.ExternalResource

2013-01-12 Thread Noah Watkins
On Thu, Jan 10, 2013 at 9:13 PM, Gary Lowell  wrote:
>
> Thanks Danny.  Installing sharutils solved that minor issue.  We now get 
> through the build just fine on openSUSE 12, but SLES 11sp2 gives more warnings 
> (pasted below).  Should we be using a newer version of autoconf on SLES?  
> I've tried moving AC_CANONICAL_TARGET earlier in the file, but that causes 
> some other issues with the new java macros.

We could also move away from using autoconf/automake for Java, and use
a packaging/dependency system designed for Java, like Maven.

- Noah


Re: [PATCH] configure.ac: check for org.junit.rules.ExternalResource

2013-01-09 Thread Noah Watkins
I haven't tested this yet, but I like it. I think several of these
macros can be used to simplify a bit more of the Java config bit. I
also just saw the ax_jni_include_dir macro in the autoconf archive and
it looks like that can help clean up too.

On Wed, Jan 9, 2013 at 1:35 PM, Danny Al-Gaaf  wrote:
> The attached patch depends on the set of 6 patches I send some days ago.
> See: http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/11793
>
> Danny Al-Gaaf (1):
>   configure.ac: check for org.junit.rules.ExternalResource
>
>  autogen.sh|   2 +-
>  configure.ac  |  29 ++---
>  m4/ac_check_class.m4  | 108 
> ++
>  m4/ac_check_classpath.m4  |  24 +++
>  m4/ac_check_rqrd_class.m4 |  26 +++
>  m4/ac_java_options.m4 |  33 ++
>  m4/ac_prog_jar.m4 |  39 +
>  m4/ac_prog_java.m4|  83 +++
>  m4/ac_prog_java_works.m4  |  98 +
>  m4/ac_prog_javac.m4   |  45 +++
>  m4/ac_prog_javac_works.m4 |  36 
>  m4/ac_prog_javah.m4   |  28 
>  m4/ac_try_compile_java.m4 |  40 +
>  m4/ac_try_run_javac.m4|  41 ++
>  14 files changed, 615 insertions(+), 17 deletions(-)
>  create mode 100644 m4/ac_check_class.m4
>  create mode 100644 m4/ac_check_classpath.m4
>  create mode 100644 m4/ac_check_rqrd_class.m4
>  create mode 100644 m4/ac_java_options.m4
>  create mode 100644 m4/ac_prog_jar.m4
>  create mode 100644 m4/ac_prog_java.m4
>  create mode 100644 m4/ac_prog_java_works.m4
>  create mode 100644 m4/ac_prog_javac.m4
>  create mode 100644 m4/ac_prog_javac_works.m4
>  create mode 100644 m4/ac_prog_javah.m4
>  create mode 100644 m4/ac_try_compile_java.m4
>  create mode 100644 m4/ac_try_run_javac.m4
>
> --
> 1.8.1
>


Re: Usage of CEPH FS versa HDFS for Hadoop: TeraSort benchmark performance comparison issue

2013-01-09 Thread Noah Watkins
Hi Jutta,

On Wed, Jan 9, 2013 at 7:11 AM, Lachfeld, Jutta wrote:
>
> the current content of the web page http://ceph.com/docs/master/cephfs/hadoop 
> shows a configuration parameter ceph.object.size.
> Is it the CEPH equivalent  to the "HDFS block size" parameter which I have 
> been looking for?

Yes. By specifying ceph.object.size, Hadoop will use a default
Ceph file layout with stripe unit = object size, and stripe count = 1.
This has effectively the same meaning as dfs.block.size for HDFS.

> Does the parameter ceph.object.size apply to version 0.56.1?

The Ceph/Hadoop file system plugin is being developed here:

  git://github.com/ceph/hadoop-common cephfs/branch-1.0

There is an old version of the Hadoop plugin in the Ceph tree which
will be removed shortly. Regarding the versions, development is taking
place in cephfs/branch-1.0 and in ceph.git master. We don't yet have a
system in place for dealing with compatibility across versions because
the code is in heavy development.

If you are running 0.56.1 then a recent version of cephfs/branch-1.0
should work with that, but may not for long, as development continues.

> I would be interested in setting this parameter to values higher than 64MB, 
> e.g. 256MB or 512MB similar to the values I have used for HDFS for increasing 
> the performance of the TeraSort benchmark. Would these values be allowed and 
> would they at all make sense for the mechanisms used in CEPH?

I can't think of any reason why a large size would cause concern, but
maybe someone else can chime in?

- Noah


Re: Usage of CEPH FS versa HDFS for Hadoop: TeraSort benchmark performance comparison issue

2012-12-13 Thread Noah Watkins
The bindings use the default Hadoop settings (e.g. 64 or 128 MB
chunks) when creating new files. The chunk size can also be specified
on a per-file basis using the same interface as Hadoop. Additionally,
while Hadoop doesn't provide an interface to configuration parameters
beyond chunk size, we will also let users fully configure any Ceph
striping strategy. http://ceph.com/docs/master/dev/file-striping/

-Noah

On Thu, Dec 13, 2012 at 12:27 PM, Gregory Farnum  wrote:
> On Thu, Dec 13, 2012 at 12:23 PM, Cameron Bahar  wrote:
>> Is the chunk size tunable in a Ceph cluster? I don't mean dynamic, but even 
>> statically configurable when a cluster is first installed?
>
> Yeah. You can set chunk size on a per-file basis; you just can't
> change it once the file has any data written to it.
> In the context of Hadoop the question is just if the bindings are
> configured correctly to do so automatically.
> -Greg


Re: [ceph-commit] [ceph/ceph] e6a154: osx: compile on OSX

2012-12-09 Thread Noah Watkins
On Sun, Dec 9, 2012 at 10:05 AM, Gregory Farnum  wrote:
> Oooh, very nice! Do you have a list of the dependencies that you actually 
> needed to install?

I can put that together. They were boost, gperf, fuse4x, cryptopp. I
think that might have been it.

> Apart from breaking this up into smaller patches, we'll also want to reformat 
> some of it. Rather than sticking an #if APPLE on top of every spin lock, we 
> should have utility functions that do this for us. ;)

Definitely. OSX has spinlock implementations for user space, but it's
going to take some reading. For example, spinlocks in Ceph are
initialized for shared memory, rather than the default private. It
isn't clear from documentation what the semantics are of OSX
spinlocks, nor is it clear if the shared memory attribute is needed.

> Also, we should be able to find libatomic_ops for OS X (its parent project 
> works under OS X), and we can use that to construct a spin lock if we think 
> it'll be useful. I'm not too sure how effective its mutexes are at 
> spinlock-y workloads.

This patch set uses the OSX atomic inc/dec ops, rather than spinlocks.

Another fun fact:

msg/Pipe.cc and common/pipe.c are compiled into libcommon_la-Pipe.o
and libcommon_la-pipe.o, but HFS+ is case-insensitive by default.
Result is duplicate symbols. That took a while to figure out :P

-Noah


Review request: wip-localized-read-tests

2012-11-30 Thread Noah Watkins
I've pushed up patches for the first phase of testing read from
replica functionality, which looks only at objecter/client level ops:

   wip-localized-read-tests

The major points are:

  1. Run libcephfs tests w/ and w/o localized reads enabled
  2. Add the performance counter in Objecter to record ops sent to replica
  3. Add performance counter accessor in unit tests

Locally I have verified that the performance counters are working with
a 3 OSD setup, although there are not yet any unit tests that try to
specifically assert a positive value on the counters.

Thanks,
Noah


Re: Client crash on getcwd with non-default root mount

2012-11-29 Thread Noah Watkins
Here is the full test case:

TEST(LibCephFS, MountRootChdir) {
  struct ceph_mount_info *cmount;

  /* create mount and new directory */
  ASSERT_EQ(ceph_create(&cmount, NULL), 0);
  ASSERT_EQ(ceph_conf_read_file(cmount, NULL), 0);
  ASSERT_EQ(ceph_mount(cmount, "/"), 0);
  ASSERT_EQ(ceph_mkdir(cmount, "/xyz", 0700), 0);
  ceph_shutdown(cmount);

  /* create mount with non-"/" root */
  ASSERT_EQ(ceph_create(&cmount, NULL), 0);
  ASSERT_EQ(ceph_conf_read_file(cmount, NULL), 0);
  ASSERT_EQ(ceph_mount(cmount, "/xyz"), 0);

   /* should be at "root" directory, but blows up */
  ASSERT_STREQ(ceph_getcwd(cmount), "/");
}

On Thu, Nov 29, 2012 at 12:02 PM, Noah Watkins  wrote:
> Oh, let me clarify. /otherdir exists, and the mount succeeds.
>
> - Noah
>
> On Thu, Nov 29, 2012 at 11:58 AM, Sam Lang  wrote:
>> On 11/29/2012 01:52 PM, Noah Watkins wrote:
>>>
>>> I'm getting the assert failure below with the following test:
>>>
>>>ceph_mount(cmount, "/otherdir");
>>
>>
>> This should fail with ENOENT if you check the return code.
>> -sam
>>
>>>ceph_getcwd(cmount);
>>>
>>> --
>>>
>>> client/Inode.h: In function 'Dentry* Inode::get_first_parent()' thread
>>> 7fded47c8780 time 2012-11-29 11:49:00.890184
>>> client/Inode.h: 165: FAILED assert(!dn_set.empty())
>>>   ceph version 0.54-808-g1ed5a1f
>>> (1ed5a1f984d8260d86cc25b1ae95ffedf597e579)
>>>   1: (()+0x11ee89) [0x7fded36fae89]
>>>   2: (()+0x1429d3) [0x7fded371e9d3]
>>>   3: (ceph_getcwd()+0x11) [0x7fded36fdb41]
>>>   4: (MountedTest2_XYZ_Test::TestBody()+0x63a) [0x42563a]
>>>   5: (testing::Test::Run()+0xaa) [0x45017a]
>>>   6: (testing::internal::TestInfoImpl::Run()+0x100) [0x450280]
>>>   7: (testing::TestCase::Run()+0xbd) [0x45034d]
>>>   8: (testing::internal::UnitTestImpl::RunAllTests()+0x217) [0x4505b7]
>>>   9: (main()+0x35) [0x423115]
>>>   10: (__libc_start_main()+0xed) [0x7fded2d2876d]
>>>   11: /home/nwatkins/projects/ceph/ceph/src/.libs/lt-test_libcephfs()
>>> [0x423171]
>>>   NOTE: a copy of the executable, or `objdump -rdS ` is
>>> needed to interpret this.
>>> terminate called after throwing an instance of 'ceph::FailedAssertion'
>>> Aborted (core dumped)
>>>
>>> Thanks,
>>> Noah
>>>
>>


Re: Client crash on getcwd with non-default root mount

2012-11-29 Thread Noah Watkins
Oh, let me clarify. /otherdir exists, and the mount succeeds.

- Noah

On Thu, Nov 29, 2012 at 11:58 AM, Sam Lang  wrote:
> On 11/29/2012 01:52 PM, Noah Watkins wrote:
>>
>> I'm getting the assert failure below with the following test:
>>
>>ceph_mount(cmount, "/otherdir");
>
>
> This should fail with ENOENT if you check the return code.
> -sam
>
>>ceph_getcwd(cmount);
>>
>> --
>>
>> client/Inode.h: In function 'Dentry* Inode::get_first_parent()' thread
>> 7fded47c8780 time 2012-11-29 11:49:00.890184
>> client/Inode.h: 165: FAILED assert(!dn_set.empty())
>>   ceph version 0.54-808-g1ed5a1f
>> (1ed5a1f984d8260d86cc25b1ae95ffedf597e579)
>>   1: (()+0x11ee89) [0x7fded36fae89]
>>   2: (()+0x1429d3) [0x7fded371e9d3]
>>   3: (ceph_getcwd()+0x11) [0x7fded36fdb41]
>>   4: (MountedTest2_XYZ_Test::TestBody()+0x63a) [0x42563a]
>>   5: (testing::Test::Run()+0xaa) [0x45017a]
>>   6: (testing::internal::TestInfoImpl::Run()+0x100) [0x450280]
>>   7: (testing::TestCase::Run()+0xbd) [0x45034d]
>>   8: (testing::internal::UnitTestImpl::RunAllTests()+0x217) [0x4505b7]
>>   9: (main()+0x35) [0x423115]
>>   10: (__libc_start_main()+0xed) [0x7fded2d2876d]
>>   11: /home/nwatkins/projects/ceph/ceph/src/.libs/lt-test_libcephfs()
>> [0x423171]
>>   NOTE: a copy of the executable, or `objdump -rdS ` is
>> needed to interpret this.
>> terminate called after throwing an instance of 'ceph::FailedAssertion'
>> Aborted (core dumped)
>>
>> Thanks,
>> Noah
>>
>


Client crash on getcwd with non-default root mount

2012-11-29 Thread Noah Watkins
I'm getting the assert failure below with the following test:

  ceph_mount(cmount, "/otherdir");
  ceph_getcwd(cmount);

--

client/Inode.h: In function 'Dentry* Inode::get_first_parent()' thread
7fded47c8780 time 2012-11-29 11:49:00.890184
client/Inode.h: 165: FAILED assert(!dn_set.empty())
 ceph version 0.54-808-g1ed5a1f (1ed5a1f984d8260d86cc25b1ae95ffedf597e579)
 1: (()+0x11ee89) [0x7fded36fae89]
 2: (()+0x1429d3) [0x7fded371e9d3]
 3: (ceph_getcwd()+0x11) [0x7fded36fdb41]
 4: (MountedTest2_XYZ_Test::TestBody()+0x63a) [0x42563a]
 5: (testing::Test::Run()+0xaa) [0x45017a]
 6: (testing::internal::TestInfoImpl::Run()+0x100) [0x450280]
 7: (testing::TestCase::Run()+0xbd) [0x45034d]
 8: (testing::internal::UnitTestImpl::RunAllTests()+0x217) [0x4505b7]
 9: (main()+0x35) [0x423115]
 10: (__libc_start_main()+0xed) [0x7fded2d2876d]
 11: /home/nwatkins/projects/ceph/ceph/src/.libs/lt-test_libcephfs() [0x423171]
 NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
Aborted (core dumped)

Thanks,
Noah


Re: Hadoop and Ceph client/mds view of modification time

2012-11-21 Thread Noah Watkins
(Sorry for the dupe message. vger rejected due to HTML).

Thanks, I'll try this patch this morning.

Client B should perform a single stat after a notification from Client
A. But, won't Sage's patch still be required, since Client A needs the
MDS time to pass to Client B?

On Tue, Nov 20, 2012 at 12:20 PM, Sam Lang  wrote:
> On 11/20/2012 01:44 PM, Noah Watkins wrote:
>>
>> This is a description of the clock synchronization issue we are facing
>> in Hadoop:
>>
>> Components of Hadoop use mtime as a versioning mechanism. Here is an
>> example where Client B tests the expected 'version' of a file created
>> by Client A:
>>
>>Client A: create file, write data into file.
>>Client A: expected_mtime <-- lstat(file)
>>Client A: broadcast expected_mtime to client B
>>...
>>Client B: mtime <-- lstat(file)
>>Client B: test expected_mtime == mtime
>
>
> Here's a patch that might work to push the setattr out to the mds every time
> (the same as Sage's patch for getattr).  This isn't quite writeback, as it
> waits for the setattr at the server to complete before returning, but I
> think that's actually what you want in this case.  It needs to be enabled by
> setting client setattr writethru = true in the config.  Also, I haven't
> tested that it sends the setattr, just a basic test of functionality.
>
> BTW, if its always client B's first stat of the file, you won't need Sage's
> patch.
>
> -sam
>
> diff --git a/src/client/Client.cc b/src/client/Client.cc
> index 8d4a5ac..a7dd8f7 100644
> --- a/src/client/Client.cc
> +++ b/src/client/Client.cc
> @@ -4165,6 +4165,7 @@ int Client::_getattr(Inode *in, int mask, int uid, int
> gid)
>
>  int Client::_setattr(Inode *in, struct stat *attr, int mask, int uid, int
> gid)
>  {
> +  int orig_mask = mask;
>int issued = in->caps_issued();
>
>ldout(cct, 10) << "_setattr mask " << mask << " issued " <<
> ccap_string(issued) << dendl;
> @@ -4219,7 +4220,7 @@ int Client::_setattr(Inode *in, struct stat *attr, int
> mask, int uid, int gid)
>mask &= ~(CEPH_SETATTR_MTIME|CEPH_SETATTR_ATIME);
>  }
>}
> -  if (!mask)
> +  if (!cct->_conf->client_setattr_writethru && !mask)
>  return 0;
>
>MetaRequest *req = new MetaRequest(CEPH_MDS_OP_SETATTR);
> @@ -4229,6 +4230,10 @@ int Client::_setattr(Inode *in, struct stat *attr,
> int mask, int uid, int gid)
>req->set_filepath(path);
>req->inode = in;
>
> +  // reset mask back to original if we're meant to do writethru
> +  if (cct->_conf->client_setattr_writethru)
> +mask = orig_mask;
> +
>if (mask & CEPH_SETATTR_MODE) {
>  req->head.args.setattr.mode = attr->st_mode;
>  req->inode_drop |= CEPH_CAP_AUTH_SHARED;
> diff --git a/src/common/config_opts.h b/src/common/config_opts.h
> index cc05095..51a2769 100644
> --- a/src/common/config_opts.h
> +++ b/src/common/config_opts.h
> @@ -178,6 +178,7 @@ OPTION(client_oc_max_dirty, OPT_INT, 1024*1024* 100)
> // MB * n  (dirty OR tx.
>  OPTION(client_oc_target_dirty, OPT_INT, 1024*1024* 8) // target dirty (keep
> this smallish)
>  OPTION(client_oc_max_dirty_age, OPT_DOUBLE, 5.0)  // max age in cache
> before writeback
>  OPTION(client_oc_max_objects, OPT_INT, 1000)  // max objects in cache
> +OPTION(client_setattr_writethru, OPT_BOOL, false)  // send the attributes
> to the mds server
>  // note: the max amount of "in flight" dirty data is roughly (max - target)
>  OPTION(fuse_use_invalidate_cb, OPT_BOOL, false) // use fuse 2.8+ invalidate
> callback to keep page cache consistent
>  OPTION(fuse_big_writes, OPT_BOOL, true)
>
>
>>
>> Since mtime may be set in Ceph by both client and MDS, inconsistent
>> mtime view is possible when clocks are not adequately synchronized.
>>
>> Here is a test that reproduces the problem. In the following output,
>> issdm-18 has the MDS, and issdm-22 is a non-Ceph node with its time
>> set to an hour earlier than the MDS node.
>>
>> nwatkins@issdm-22:~$ ssh issdm-18 date && ./test
>> Tue Nov 20 11:40:28 PST 2012   // MDS TIME
>> local time: Tue Nov 20 10:42:47 2012  // Client TIME
>> fstat time: Tue Nov 20 11:40:28 2012  // mtime seen after file
>> creation (MDS time)
>> lstat time: Tue Nov 20 10:42:47 2012  // mtime seen after file write
>> (client time)
>>
>> Here is the code used to produce that output.
>>
>> #include 
>> #include 
>> #include 
>> #include 
>> #
