How to set MDS log size

2012-04-11 Thread Madhusudhana U
Hi all,
On the MDS node of my ceph cluster, the entire root partition is full because of 
one big mds log file:


[root@ceph-node-7 ceph]# du -sh *
0   mds.admin.log
27G mds.ceph-node-7.log

[root@ceph-node-7 ceph]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2  39G   39G 0 100% /
tmpfs  24G 0   24G   0% /dev/shm
/dev/sda1 2.0G   76M  1.8G   5% /boot
/dev/sda5  19G  409M   18G   3% /ceph

How do I set a max size for the mds log? Does it auto-rotate? Because my root 
partition is full, will this affect read/write performance in
my ceph cluster?

Thanks
Madhusudhana



Re: Can we have multiple OSD in a single machine

2012-04-11 Thread Stefan Kleijkers

Hello,

Yes, that's no problem; I've been using that configuration for some time now. 
Just generate a config with multiple OSD clauses that use the same node/host.


With the newer ceph versions, mkcephfs is smart enough to detect the OSDs 
on the same node and will generate a crushmap whereby the objects get 
replicated to different nodes.
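
For illustration, the rule such a generated crushmap ends up with typically
looks something like this when decompiled (a sketch only; names, ids and
defaults vary by version):

rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        # chooseleaf by host puts each replica on a different node,
        # even when one node carries several OSDs
        step chooseleaf firstn 0 type host
        step emit
}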


I didn't see any impact on performance (provided you have enough 
processing power, because you need more of it).


I wanted to use just a few OSDs per node with mdraid, so I could use 
RAID6. That way I could swap a faulty disk without bringing the node 
down. But I couldn't get it stable with mdraid.


Stefan

On 04/11/2012 09:42 AM, Madhusudhana U wrote:

Hi all,
I have a system with a 2T SATA drive that I want to add to my ceph
cluster. Instead of creating one large OSD, can't
I have 4 OSDs of 450G each? Is this possible? If so, will
this improve read/write performance?

Thanks
__Madhusudhana



Re: Can we have multiple OSD in a single machine

2012-04-11 Thread Tomasz Paszkowski
Hi,

Please correct me if I'm wrong: you would like to partition a single drive?


On 11-04-2012 at 09:42, Madhusudhana U wrote:

> Hi all,
> I have a system with 2T SATA drive and I want to add it to my ceph
> cluster. I was thinking instead of creating one large OSD, can't
> I have 44 osd's of 450G each ? Is this possible ? if possible, will
> this improve read/write performance ?
>
> Thanks
> __Madhusudhana
>


Re: Can we have multiple OSD in a single machine

2012-04-11 Thread Madhusudhana U
Stefan Kleijkers  unilogicnetworks.net> writes:

> 
> Hello,
> 
> Yes that's no problem. I'm using that configuration for some time now. 
> Just generate a config with multiple OSD clauses with the same node/host.
> 
> With the newer ceph version mkcephfs is smart enough to detect the osd's 
> on the same node and will generate a crushmap whereby the objects get 
> replicated to different nodes.
> 
> I didn't see any impact on the performance (if you have enough 
> processing power, because you need more of that).
> 
> I wanted to use just a few OSD's per node with mdraid, so I could use 
> RAID6. This way I could swap a faulty disk without bringing the node 
> down. But I couldn't get it stable with mdraid.
> 
This is how the OSD part of my ceph.conf looks:

[osd.0]
host = ceph-node-1
btrfs devs = /dev/sda6

[osd.1]
host = ceph-node-2
btrfs devs = /dev/sda6

[osd.2]
host = ceph-node-3
btrfs devs = /dev/sda6

[osd.3]
host = ceph-node-4
btrfs devs = /dev/sda6



Can you please help me with how I can add multiple OSDs on the same machine, 
considering that I have 4 partitions created for OSDs?

I have powerful machines with 6 quad-core Intel Xeons and 48G of RAM.









Re: Can we have multiple OSD in a single machine

2012-04-11 Thread Tomasz Paszkowski
Hi,

It'll not increase overall storage system performance; partitioning a
single disk drive gives you no performance gains.





On Wed, Apr 11, 2012 at 2:38 PM, Madhusudhana U
 wrote:
> Tomasz Paszkowski  gmail.com> writes:
>
>>
>> Hi,
>>
>> Please correct me if I'am wrong. You would like to partition single drive ?
>>
> Yes,
> I want to create 4 partitions in a  single drive. This will increase the
> OSD number. Will this increase in OSD also increases performance ?
>
> Thanks
>
>
>
>



-- 
Tomasz Paszkowski
SS7, Asterisk, SAN, Datacenter, Cloud Computing
+48500166299


Re: Can we have multiple OSD in a single machine

2012-04-11 Thread Tomasz Paszkowski
If you're on a single machine, keep the hostname the same in the config, but
you need to use a different dev name for each osd process on that
single machine.

On Wed, Apr 11, 2012 at 2:42 PM, Madhusudhana U
 wrote:
> Stefan Kleijkers  unilogicnetworks.net> writes:
>
>>
>> Hello,
>>
>> Yes that's no problem. I'm using that configuration for some time now.
>> Just generate a config with multiple OSD clauses with the same node/host.
>>
>> With the newer ceph version mkcephfs is smart enough to detect the osd's
>> on the same node and will generate a crushmap whereby the objects get
>> replicated to different nodes.
>>
>> I didn't see any impact on the performance (if you have enough
>> processing power, because you need more of that).
>>
>> I wanted to use just a few OSD's per node with mdraid, so I could use
>> RAID6. This way I could swap a faulty disk without bringing the node
>> down. But I couldn't get it stable with mdraid.
>>
> This is how my OSD part in ceph.conf looks like
>
> [osd.0]
>        host = ceph-node-1
>        btrfs devs = /dev/sda6
>
> [osd.1]
>        host = ceph-node-2
>        btrfs devs = /dev/sda6
>
> [osd.2]
>        host = ceph-node-3
>        btrfs devs = /dev/sda6
>
> [osd.3]
>        host = ceph-node-4
>        btrfs devs = /dev/sda6
>
>
>
> Can you please help me how I can add multiple OSD in the same machine
> considering that i have 4 partition created for OSD ?
>
> I have powerful machines having 6 quad core Intel Xeon with 48G of RAM
>
>
>
>
>
>
>



-- 
Tomasz Paszkowski
SS7, Asterisk, SAN, Datacenter, Cloud Computing
+48500166299


Re: Can we have multiple OSD in a single machine

2012-04-11 Thread Stefan Kleijkers

Hello,

You will get something like this:

[osd.0]
host = ceph-node-1
btrfs devs = /dev/sda6

[osd.1]
host = ceph-node-1
btrfs devs = /dev/sda7

[osd.2]
host = ceph-node-1
btrfs devs = /dev/sda8

[osd.3]
host = ceph-node-1
btrfs devs = /dev/sda9


[osd.4]
host = ceph-node-2
btrfs devs = /dev/sda6

[osd.5]
host = ceph-node-2
btrfs devs = /dev/sda7

etc...

But as Tomasz mentions, you get no extra performance, because in most cases the 
disk is the bottleneck.

Besides, I recommend not using btrfs devs anymore; they want to deprecate that option, so you 
only get the "osd data =" option.

If you really want to add performance, use more disks or a fast journal 
device (I use an SSD).
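
For example, putting each OSD's journal on its own SSD partition can be
expressed directly in ceph.conf; a minimal sketch (the device paths and the
journal size here are just placeholders, not a recommendation):

[osd.0]
        host = ceph-node-1
        osd data = /ceph/osd.0
        ; journal on a dedicated SSD partition instead of the data disk
        osd journal = /dev/sdb1
        osd journal size = 1000

[osd.1]
        host = ceph-node-1
        osd data = /ceph/osd.1
        osd journal = /dev/sdb2
        osd journal size = 1000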

Stefan




On 04/11/2012 02:42 PM, Madhusudhana U wrote:

Stefan Kleijkers  unilogicnetworks.net>  writes:


Hello,

Yes that's no problem. I'm using that configuration for some time now.
Just generate a config with multiple OSD clauses with the same node/host.

With the newer ceph version mkcephfs is smart enough to detect the osd's
on the same node and will generate a crushmap whereby the objects get
replicated to different nodes.

I didn't see any impact on the performance (if you have enough
processing power, because you need more of that).

I wanted to use just a few OSD's per node with mdraid, so I could use
RAID6. This way I could swap a faulty disk without bringing the node
down. But I couldn't get it stable with mdraid.


This is how my OSD part in ceph.conf looks like

[osd.0]
 host = ceph-node-1
 btrfs devs = /dev/sda6

[osd.1]
 host = ceph-node-2
 btrfs devs = /dev/sda6

[osd.2]
 host = ceph-node-3
 btrfs devs = /dev/sda6

[osd.3]
 host = ceph-node-4
 btrfs devs = /dev/sda6



Can you please help me how I can add multiple OSD in the same machine
considering that i have 4 partition created for OSD ?

I have powerful machines having 6 quad core Intel Xeon with 48G of RAM












Re: ceph 0.44+ and leveldb

2012-04-11 Thread Jonathan Dieter
On Mon, 2012-04-09 at 22:36 -0700, Sage Weil wrote:
> Hi Laszlo, Jonathan,
> 
> configure will now accept a --with-system-leveldb flag that will use the 
> installed libleveldb.  For Debian, we just need to update debian/rules and 
> add libsnappy-dev and libleveldb-dev to the build depends.
> 
> Jonathan, I'm not doing anything to the .spec file yet either, since 
> presumably leveldb needs to be packaged first.
> 
> The change is in the next and master branches, so it'll be there for 
> v0.45 (which I plan to release tomorrow).

Thanks much!  I'll see what it will take to get leveldb into Fedora.

Jonathan
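
For anyone who wants to try this before distro packages exist, a build against
the system leveldb would look roughly like the following (a sketch; the package
names are the Debian-style ones mentioned above and may differ elsewhere):

# install the system libraries (Debian/Ubuntu-style package names)
apt-get install libleveldb-dev libsnappy-dev
# when building from a git checkout, regenerate the build system first
./autogen.sh
# build ceph against the system leveldb instead of the bundled copy
./configure --with-system-leveldb
make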




Re: Can we have multiple OSD in a single machine

2012-04-11 Thread Sage Weil
On Wed, 11 Apr 2012, Tomasz Paszkowski wrote:
> Hi,
> 
> It'll not increase overall storage system performance. Partitioning of
> single disk drive gives you no performance gains.

It will in fact slow these down, because each ceph-osd instance will be 
doing periodic syncfs(2) calls and they will interfere.

sage

> 
> 
> 
> 
> 
> On Wed, Apr 11, 2012 at 2:38 PM, Madhusudhana U
>  wrote:
> > Tomasz Paszkowski  gmail.com> writes:
> >
> >>
> >> Hi,
> >>
> >> Please correct me if I'am wrong. You would like to partition single drive ?
> >>
> > Yes,
> > I want to create 4 partitions in a  single drive. This will increase the
> > OSD number. Will this increase in OSD also increases performance ?
> >
> > Thanks
> >
> >
> >
> >
> 
> 
> 
> -- 
> Tomasz Paszkowski
> SS7, Asterisk, SAN, Datacenter, Cloud Computing
> +48500166299
> 
> 

Interesting Error

2012-04-11 Thread Alex Elder

I'm running suites/iozone.sh on a 3-node ceph cluster, with each node
running the kernel client from ceph-client/wip-layout-helpers.

I've hit a consistent error twice now; it only seems to be
hit when running with particular arguments.

Here are the three commands in that workunit:
iozone -c -e -s 1024M -r 16K -t 1 -F f1 -i 0 -i 1
iozone -c -e -s 1024M -r 1M -t 1 -F f2 -i 0 -i 1
iozone -c -e -s 10240M -r 1M -t 1 -F f3 -i 0 -i 1

The first two run to completion without a problem.  The third
one runs for a while, then reports something like what's
below, and then hangs the test (the system is still operational).
I see this in the syslog, but I'm not sure its timing is aligned
with the failure:
[ 3925.501128] libceph: osd1 10.214.133.32:6800 socket closed

Since it shows up only with the 10GB file size and 1MB record
size, I am wondering if this combination hits some sort of
boundary that would help me understand what's wrong.  Anyone
have any ideas?

Here is how my three nodes are configured in the teuthology file:
- [mon.a, mon.c, osd.0]
- [mon.b, mds.a, osd.1]
- [client.0]

Thanks.

-Alex


Run began: Wed Apr 11 08:36:52 2012

Include close in write timing
Include fsync in write timing
File size set to 10485760 KB
Record Size 1024 KB
Command line used: iozone -c -e -s 10240M -r 1M -t 1 -F f3 -i 0 
-i 1

Output is in Kbytes/sec
Time Resolution = 0.01 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
Throughput test with 1 process
Each process writes a 10485760 Kbyte file in 1024 Kbyte records

Error writing block 9408, fd= 3

Children see throughput for  1 initial writers  =   0.00 KB/sec
Parent sees throughput for  1 initial writers   =   0.00 KB/sec
Min throughput per process  =   0.00 KB/sec
Max throughput per process  =   0.00 KB/sec
Avg throughput per process  =   0.00 KB/sec
Min xfer=   0.00 KB

Child 0
f3: No such file or directory


Re: [PATCH] Statically binding ports for ceph-osd

2012-04-11 Thread Greg Farnum
You're unlikely to hit it since you're setting all addresses, but we somehow 
managed to introduce an error even in that small patch -- you may want to pull 
in commit cd4a760e9b22047fa5a45d0211ec4130809d725e as well.
-Greg

On Tuesday, April 10, 2012 at 5:13 PM, Nick Bartos wrote:
> Good enough for me, I'll just patch it for the short term.
> 
> Thanks!
> 
> > On Tue, Apr 10, 2012 at 4:51 PM, Sage Weil wrote:
> > On Tue, 10 Apr 2012, Nick Bartos wrote:
> > > Awesome, thanks so much!  Can I assume this will make it into the next
> > > ceph stable release?  I'll probably just backport it now before we
> > > actually start using it, so I don't have to change the config later.
> > > 
> > 
> > 
> > v0.45 is out today/tomorrow, but it'll be in v0.46.
> > 
> > sage
> > 
> > 
> > > 
> > > > On Tue, Apr 10, 2012 at 4:16 PM, Greg Farnum wrote:
> > > > Yep, you're absolutely correct. Might as well let users specify the 
> > > > whole address rather than just the port, though -- since your patch 
> > > > won't apply to current upstream due to some heartbeating changes, I 
> > > > whipped up another one which adds the "osd heartbeat addr" option. It's 
> > > > pushed to master in commit 6fbac10dc68e67d1c700421f311cf5e26991d39c, 
> > > > but you'll want to backport (easy) or carry your change until you 
> > > > upgrade (and remember to change the config!). :)
> > > > Thanks for the report!
> > > > -Greg
> > > > 
> > > > 
> > > > On Tuesday, April 10, 2012 at 12:56 PM, Nick Bartos wrote:
> > > > 
> > > > > After doing some more looking at the code, it appears that this option
> > > > > is not supported. I created a small patch (attached) which adds the
> > > > > functionality. Is there any way we could get this, or something like
> > > > > this, applied upstream? I think this is important functionality for
> > > > > firewalled environments, and seems like a simple fix since all the
> > > > > other services (including ones for ceph-mon and ceph-mds) already
> > > > > allow you to specify a static port.
> > > > > 
> > > > > 
> > > > > > On Mon, Apr 9, 2012 at 5:27 PM, Nick Bartos wrote:
> > > > > > I'm trying to get ceph-osd's listening ports to be set statically 
> > > > > > for
> > > > > > firewall reasons. I am able to get 2 of the 3 ports set statically,
> > > > > > however the 3rd one is still getting set dynamically.
> > > > > > 
> > > > > > I am using:
> > > > > > 
> > > > > > [osd.48]
> > > > > > host = 172.16.0.13
> > > > > > cluster addr = 172.16.0.13:6944
> > > > > > public addr = 172.16.0.13:6945
> > > > > > 
> > > > > > The daemon will successfully bind to 6944 and 6945, but also binds 
> > > > > > to
> > > > > > 6800. What additional option do I need? I started looking at the
> > > > > > code and thought "hb addr = 172.16.0.13:6946" would do it, but
> > > > > > specifying that option seems to have no effect (or at least does not
> > > > > > achieve the desired result).
> > > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > Attachments:
> > > > > - ceph-0.41-osd_hb_port.patch
> > > > > 
> > > > 
> > > > 
> > > 
> > > 
> > 
> > 
> 
> 
> 
> 
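
For reference, with the "osd heartbeat addr" option Greg describes above, the
section from the original report would presumably end up looking something like
this (a sketch; the heartbeat port shown is just the one Nick was already trying):

[osd.48]
        host = 172.16.0.13
        cluster addr = 172.16.0.13:6944
        public addr = 172.16.0.13:6945
        ; pin the heartbeat listener as well, so all three ports are static
        osd heartbeat addr = 172.16.0.13:6946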





Make libcephfs return error when unmounted?

2012-04-11 Thread Noah Watkins
Hi all,

A simple program like this:

int main(int argc, char **argv)
{
int ret;
struct ceph_mount_info *cmount;

ceph_create(&cmount, NULL);
//ceph_mount(cmount, NULL);
ceph_chdir(cmount, "/");
}

will segfault because in the below snippet, cmount->get_client() returns NULL 
when ceph_mount(..) has not been called with success.

extern "C" int ceph_chdir (struct ceph_mount_info *cmount, const char *s)   
  
{   
  
return cmount->get_client()->chdir(s);  
   
}

It would be very useful to get a uniform error return value rather than the 
fault. Something like this came to mind:

diff --git a/src/libcephfs.cc b/src/libcephfs.cc
index b1481e6..4751e8f 100644
--- a/src/libcephfs.cc
+++ b/src/libcephfs.cc
@@ -180,6 +180,10 @@ public:
     return cct;
   }
 
+  bool is_mounted(void) {
+    return mounted;
+  }
+
 private:
   uint64_t msgr_nonce;
   bool mounted;
@@ -282,6 +286,8 @@ extern "C" const char* ceph_getcwd(struct ceph_mount_info *cmount)
 
 extern "C" int ceph_chdir (struct ceph_mount_info *cmount, const char *s)
 {
+  if (!cmount->is_mounted())
+    return -1004;
   return cmount->get_client()->chdir(s);
 }

Any thoughts on a good way to handle this?

-Noah


Re: How to set MDS log size

2012-04-11 Thread Josh Durgin

On 04/11/2012 12:33 AM, Madhusudhana U wrote:

Hi all,
On the MDS node of my ceph cluster, the entire root partition is full because of
one big mds log file:


[root@ceph-node-7 ceph]# du -sh *
0   mds.admin.log
27G mds.ceph-node-7.log

[root@ceph-node-7 ceph]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2  39G   39G 0 100% /
tmpfs  24G 0   24G   0% /dev/shm
/dev/sda1 2.0G   76M  1.8G   5% /boot
/dev/sda5  19G  409M   18G   3% /ceph

How do I set a max size for the mds log? Does it auto-rotate? Because my root
partition is full, will this affect read/write performance in
my ceph cluster?


Ceph comes with a logrotate config, which should be in
/etc/logrotate.d/ceph on debian-based distros. If you changed the
location of the logs, you'll need to update the logrotate configuration
to match.
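
For reference, a minimal logrotate stanza for the ceph logs looks something like
the sketch below. This is not the exact file shipped with the packages (that one
may also signal the daemons to reopen their logs); adjust the path if your logs
live elsewhere:

/var/log/ceph/*.log {
        daily
        rotate 7
        compress
        missingok
        notifempty
        # copytruncate avoids having to make the daemons reopen their log files
        copytruncate
}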

Josh


Re: Make libcephfs return error when unmounted?

2012-04-11 Thread Greg Farnum
On Wednesday, April 11, 2012 at 11:12 AM, Noah Watkins wrote:
> Hi all,
> 
> A simple program like this:
> 
> int main(int argc, char **argv)
> {
> int ret;
> struct ceph_mount_info *cmount;
> 
> ceph_create(&cmount, NULL);
> //ceph_mount(cmount, NULL);
> ceph_chdir(cmount, "/");
> }
> 
> will segfault because in the below snippet, cmount->get_client() returns NULL 
> when ceph_mount(..) has not been called with success.
> 
> extern "C" int ceph_chdir (struct ceph_mount_info *cmount, const char *s) 
> { 
> return cmount->get_client()->chdir(s); 
> }
> 
> It would be very useful to get a uniform error return value rather than the 
> fault. Something like this came to mind:
> 
> diff --git a/src/libcephfs.cc b/src/libcephfs.cc
> index b1481e6..4751e8f 100644
> --- a/src/libcephfs.cc
> +++ b/src/libcephfs.cc
> @@ -180,6 +180,10 @@ public:
> return cct;
> }
> 
> + bool is_mounted(void) {
> + return mounted;
> + }
> +
> private:
> uint64_t msgr_nonce;
> bool mounted;
> @@ -282,6 +286,8 @@ extern "C" const char* ceph_getcwd(struct ceph_mount_info 
> *cmount)
> 
> extern "C" int ceph_chdir (struct ceph_mount_info *cmount, const char *s)
> {
> + if (!cmount->is_mounted())
> + return -1004;
> return cmount->get_client()->chdir(s);
> }
> 
> Any thoughts on a good way to handle this?
> 
> -Noah 
I'm not sure where the "-1004" came from, but I've been doing things very much 
like this in my code recently. Functions that require simple preconditions 
(well, really any preconditions) like that should document them, have defined 
behavior if they're not met, and return defined error codes. I believe our new 
code does, but . 

Patches welcome, of course, but if it's still a problem this summer we can 
probably put an intern on it for a day to help them understand why standard 
programming practices are good things. ;)

-Greg


Re: Make libcephfs return error when unmounted?

2012-04-11 Thread Yehuda Sadeh Weinraub
On Wed, Apr 11, 2012 at 11:12 AM, Noah Watkins  wrote:
>
> Hi all,
>
> A simple program like this:
>
> int main(int argc, char **argv)
> {
>        int ret;
>        struct ceph_mount_info *cmount;
>
>        ceph_create(&cmount, NULL);
>        //ceph_mount(cmount, NULL);
>        ceph_chdir(cmount, "/");
> }
>
> will segfault because in the below snippet, cmount->get_client() returns
> NULL when ceph_mount(..) has not been called with success.
>
> extern "C" int ceph_chdir (struct ceph_mount_info *cmount, const char *s)
> {
>        return cmount->get_client()->chdir(s);
> }
>
> It would be very useful to get a uniform error return value rather than
> the fault. Something like this came to mind:
>
> diff --git a/src/libcephfs.cc b/src/libcephfs.cc
> index b1481e6..4751e8f 100644
> --- a/src/libcephfs.cc
> +++ b/src/libcephfs.cc
> @@ -180,6 +180,10 @@ public:
>     return cct;
>   }
>
> +  bool is_mounted(void) {
> +    return mounted;
> +  }
> +
>  private:
>   uint64_t msgr_nonce;
>   bool mounted;
> @@ -282,6 +286,8 @@ extern "C" const char* ceph_getcwd(struct
> ceph_mount_info *cmount)
>
>  extern "C" int ceph_chdir (struct ceph_mount_info *cmount, const char *s)
>  {
> +  if (!cmount->is_mounted())
> +    return -1004;
>   return cmount->get_client()->chdir(s);
>  }
>
> Any thoughts on a good way to handle this?

Also need to check that cmount is initialized.  I'd add a helper:

Client *ceph_get_client(struct ceph_mount_info *cmount)
{
  if (cmount && cmount->is_mounted())
    return cmount->get_client();

  return NULL;
}

extern "C" int ceph_chdir (struct ceph_mount_info *cmount, const char *s)
{
  Client *client = ceph_get_client(cmount);
  if (!client)
    return -EINVAL;

  return client->chdir(s);
}


>
> -Noah


Re: Make libcephfs return error when unmounted?

2012-04-11 Thread Noah Watkins

On Apr 11, 2012, at 11:22 AM, Greg Farnum wrote:

> On Wednesday, April 11, 2012 at 11:12 AM, Noah Watkins wrote:
>> Hi all,
>> 
>> -Noah 
> I'm not sure where the "-1004" came from,

ceph_mount(..) seems to return some random error codes (-1000, 1001) already  :)

> Patches welcome, of course, but if it's still a problem this summer we can 
> probably put an intern on it for a day to help them understand why standard 
> programming practices are good things. ;)

I'll happily create a patch for this :) Any method to come up with 
Ceph-specific error codes?

-Noah


Re: Make libcephfs return error when unmounted?

2012-04-11 Thread Greg Farnum
On Wednesday, April 11, 2012 at 11:25 AM, Noah Watkins wrote:
> 
> On Apr 11, 2012, at 11:22 AM, Greg Farnum wrote:
> 
> > On Wednesday, April 11, 2012 at 11:12 AM, Noah Watkins wrote:
> > > Hi all,
> > > 
> > > -Noah 
> > I'm not sure where the "-1004" came from,
> 
> ceph_mount(..) seems to return some random error codes (-1000, 1001) already 
> :)

 legacy undocumented grr 
Let's try to use standard error codes where available, and (if we have to 
create our own) document any new ones with user-accessible names and 
explanations. I don't know which one is "best" but I see a lot of applicable 
choices when scanning errno-base et al.

Also, what Yehuda said. :)
-Greg



Re: Make libcephfs return error when unmounted?

2012-04-11 Thread Noah Watkins

On Apr 11, 2012, at 11:22 AM, Yehuda Sadeh Weinraub wrote:

> Also need to check that cmount is initialized.  I'd add a helper:
> 
> Client *ceph_get_client(struct ceph_mount_info *cmont)
> {
>  if (cmount && cmount->is_mounted())
>return cmount->get_client();
> 
>  return NULL;
> }

How useful is checking cmount != NULL here? This defensive check depends on 
users initializing their cmount pointers to NULL, but the API doesn't do 
anything to require this initialization assumption.

- Noah


Re: Make libcephfs return error when unmounted?

2012-04-11 Thread Greg Farnum
On Wednesday, April 11, 2012 at 2:59 PM, Noah Watkins wrote:
> 
> On Apr 11, 2012, at 11:22 AM, Yehuda Sadeh Weinraub wrote:
> 
> > Also need to check that cmount is initialized. I'd add a helper:
> > 
> > Client *ceph_get_client(struct ceph_mount_info *cmont)
> > {
> > if (cmount && cmount->is_mounted())
> > return cmount->get_client();
> > 
> > return NULL;
> > }
> 
> 
> 
> How useful is checking cmount != NULL here? This defensive check depends on 
> users initializing their cmount pointers to NULL, but the API doesn't do 
> anything to require this initialization assumption.
> 
> - Noah 
I had a whole email going until I realized you were just right. So, yeah, that 
wouldn't do anything since a cmount they forgot to have the API initialize is 
just going to hold random data. Urgh.
-Greg



Re: Make libcephfs return error when unmounted?

2012-04-11 Thread Noah Watkins

On Apr 11, 2012, at 3:13 PM, Greg Farnum wrote:

> On Wednesday, April 11, 2012 at 2:59 PM, Noah Watkins wrote:
>> 
>> On Apr 11, 2012, at 11:22 AM, Yehuda Sadeh Weinraub wrote:
>> 
>>> Also need to check that cmount is initialized. I'd add a helper:
>>> 
>>> Client *ceph_get_client(struct ceph_mount_info *cmont)
>>> {
>>> if (cmount && cmount->is_mounted())
>>> return cmount->get_client();
>>> 
>>> return NULL;
>>> }
>> 
>> 
>> 
>> How useful is checking cmount != NULL here? This defensive check depends on 
>> users initializing their cmount pointers to NULL, but the API doesn't do 
>> anything to require this initialization assumption.
>> 
>> - Noah 
> I had a whole email going until I realized you were just right. So, yeah, 
> that wouldn't do anything since a cmount they forgot to have the API 
> initialize is just going to hold random data. Urgh.

One could pair the pointer with a magic value in a separate structure, but even 
libc doesn't go to these lengths to protect users...


Re: Make libcephfs return error when unmounted?

2012-04-11 Thread Noah Watkins

On Apr 11, 2012, at 11:32 AM, Greg Farnum wrote:

> On Wednesday, April 11, 2012 at 11:25 AM, Noah Watkins wrote:
>> 
>> On Apr 11, 2012, at 11:22 AM, Greg Farnum wrote:
>> 
>>> On Wednesday, April 11, 2012 at 11:12 AM, Noah Watkins wrote:
 Hi all,
 
 -Noah 
>>> I'm not sure where the "-1004" came from,
>> 
>> ceph_mount(..) seems to return some random error codes (-1000, 1001) already 
>> :)
> 
>  legacy undocumented grr 
> Let's try to use standard error codes where available, and (if we have to 
> create our own) document any new ones with user-accessible names and 
> explanations. I don't know which one is "best" but I see a lot of applicable 
> choices when scanning errno-base et al.

If I'm choosing from from errno-base I might go with

  #define ENXIO 6  /* No such device or address */

It's used:

osd/OSD.cc
void OSD::handle_misdirected_op(PG *pg, OpRequest *op)
…
reply_op_error(op, -ENXIO)

I'm wondering if you happen to know whether this will be propagated back to the 
client? It'd be nice to have an exclusive not-mounted condition on the client 
side.

-Noah


Re: Make libcephfs return error when unmounted?

2012-04-11 Thread Yehuda Sadeh Weinraub
On Wed, Apr 11, 2012 at 3:13 PM, Greg Farnum
 wrote:
> On Wednesday, April 11, 2012 at 2:59 PM, Noah Watkins wrote:
>>
>> On Apr 11, 2012, at 11:22 AM, Yehuda Sadeh Weinraub wrote:
>>
>> > Also need to check that cmount is initialized. I'd add a helper:
>> >
>> > Client *ceph_get_client(struct ceph_mount_info *cmont)
>> > {
>> > if (cmount && cmount->is_mounted())
>> > return cmount->get_client();
>> >
>> > return NULL;
>> > }
>>
>>
>>
>> How useful is checking cmount != NULL here? This defensive check depends on 
>> users initializing their cmount pointers to NULL, but the API doesn't do 
>> anything to require this initialization assumption.
>>
>> - Noah
> I had a whole email going until I realized you were just right. So, yeah, 
> that wouldn't do anything since a cmount they forgot to have the API 
> initialize is just going to hold random data. Urgh.

There's no destructor either, maybe it's a good time to add one?

Yehuda


Re: Make libcephfs return error when unmounted?

2012-04-11 Thread Greg Farnum
On Wednesday, April 11, 2012 at 3:18 PM, Noah Watkins wrote:
> One could pair the pointer with a magic value in a separate structure, but 
> even libc doesn't go to these lengths to protect users...

At that point we'd run a pretty high risk of segfaulting when trying to deref 
and look at the magic value anyway. Nothing useful we can do here, I think. :/  




On Wednesday, April 11, 2012 at 3:29 PM, Noah Watkins wrote:

>  
> On Apr 11, 2012, at 11:32 AM, Greg Farnum wrote:
>  
> > On Wednesday, April 11, 2012 at 11:25 AM, Noah Watkins wrote:
> > >  
> > > On Apr 11, 2012, at 11:22 AM, Greg Farnum wrote:
> > >  
> > > > On Wednesday, April 11, 2012 at 11:12 AM, Noah Watkins wrote:
> > > > > Hi all,
> > > > >  
> > > > > -Noah  
> > > > I'm not sure where the "-1004" came from,
> > >  
> > >  
> > >  
> > > ceph_mount(..) seems to return some random error codes (-1000, 1001) 
> > > already :)
> >  
> >  legacy undocumented grr 
> > Let's try to use standard error codes where available, and (if we have to 
> > create our own) document any new ones with user-accessible names and 
> > explanations. I don't know which one is "best" but I see a lot of 
> > applicable choices when scanning errno-base et al.
>  
>  
>  
> If I'm choosing from from errno-base I might go with
>  
> #define ENXIO 6 /* No such device or address */
>  
> It's used:
>  
> osd/OSD.cc
> void OSD::handle_misdirected_op(PG *pg, OpRequest *op)
> …
> reply_op_error(op, -ENXIO)
>  
> I'm wondering if you happen to know if this will be propagated back to the 
> client? I'd be nice to have an exclusive not-mounted condition on the client 
> side.
>  
> -Noah  
Unfortunately it is. (I just grepped and see it's returned by both the MDS and 
the OSD, but it's not in Objecter et al so it's not filtered out anywhere.)

ENODEV isn't returned from anything; do you think that makes sense?
-Greg
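
Putting Yehuda's helper together with that, the guarded call might look roughly
like this (a sketch of the idea being discussed, not the final patch):

extern "C" int ceph_chdir(struct ceph_mount_info *cmount, const char *s)
{
  /* refuse the call with a standard errno instead of dereferencing a NULL Client */
  Client *client = ceph_get_client(cmount);
  if (!client)
    return -ENODEV;
  return client->chdir(s);
}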





Re: Make libcephfs return error when unmounted?

2012-04-11 Thread Greg Farnum


On Wednesday, April 11, 2012 at 3:34 PM, Yehuda Sadeh Weinraub wrote:

> On Wed, Apr 11, 2012 at 3:13 PM, Greg Farnum wrote:
> > On Wednesday, April 11, 2012 at 2:59 PM, Noah Watkins wrote:
> > > 
> > > On Apr 11, 2012, at 11:22 AM, Yehuda Sadeh Weinraub wrote:
> > > 
> > > > Also need to check that cmount is initialized. I'd add a helper:
> > > > 
> > > > Client *ceph_get_client(struct ceph_mount_info *cmont)
> > > > {
> > > > if (cmount && cmount->is_mounted())
> > > > return cmount->get_client();
> > > > 
> > > > return NULL;
> > > > }
> > > 
> > > 
> > > 
> > > 
> > > 
> > > How useful is checking cmount != NULL here? This defensive check depends 
> > > on users initializing their cmount pointers to NULL, but the API doesn't 
> > > do anything to require this initialization assumption.
> > > 
> > > - Noah
> > I had a whole email going until I realized you were just right. So, yeah, 
> > that wouldn't do anything since a cmount they forgot to have the API 
> > initialize is just going to hold random data. Urgh.
> 
> 
> 
> There's no destructor either, maybe it's a good time to add one?
> 
> Yehuda 
Actually, there is. The problem is that to the client it's an opaque pointer 
under many (most?) circumstances, so that it can be used by C users.
-Greg



Re: Make libcephfs return error when unmounted?

2012-04-11 Thread Noah Watkins

On Apr 11, 2012, at 3:39 PM, Greg Farnum wrote:

> On Wednesday, April 11, 2012 at 3:18 PM, Noah Watkins wrote:
>> One could pair the pointer with a magic value in a separate structure, but 
>> even libc doesn't go to these lengths to protect users...
> 
> At that point we'd run a pretty high risk of segfaulting when trying to deref 
> and look at the magic value anyway. Nothing useful we can do here, I think. :/

Actually I meant:

struct container {
  int magic;
  struct ceph_mount_info *ptr;
};

ceph_chdir(struct container something);

Then, container.magic should equal INITIALIZED where INITIALIZED is "some value 
that is unlikely to ever be present at a memory location where container is 
allocated". Lol… overkill and not bullet-proof.

> On Wednesday, April 11, 2012 at 3:29 PM, Noah Watkins wrote:
> 
>> 
>> On Apr 11, 2012, at 11:32 AM, Greg Farnum wrote:
>> 
>>> On Wednesday, April 11, 2012 at 11:25 AM, Noah Watkins wrote:
 
 On Apr 11, 2012, at 11:22 AM, Greg Farnum wrote:
 
> On Wednesday, April 11, 2012 at 11:12 AM, Noah Watkins wrote:
>> Hi all,
>> 
>> -Noah  
> I'm not sure where the "-1004" came from,
 
 
 
 ceph_mount(..) seems to return some random error codes (-1000, 1001) 
 already :)
>>> 
>>>  legacy undocumented grr 
>>> Let's try to use standard error codes where available, and (if we have to 
>>> create our own) document any new ones with user-accessible names and 
>>> explanations. I don't know which one is "best" but I see a lot of 
>>> applicable choices when scanning errno-base et al.
>> 
>> 
>> 
>> If I'm choosing from from errno-base I might go with
>> 
>> #define ENXIO 6 /* No such device or address */
>> 
>> It's used:
>> 
>> osd/OSD.cc
>> void OSD::handle_misdirected_op(PG *pg, OpRequest *op)
>> …
>> reply_op_error(op, -ENXIO)
>> 
>> I'm wondering if you happen to know if this will be propagated back to the 
>> client? I'd be nice to have an exclusive not-mounted condition on the client 
>> side.
>> 
>> -Noah  
> Unfortunately it is. (I just grepped and see it's returned by both the MDS 
> and the OSD, but it's not in Objecter et al so it's not filtered out 
> anywhere.)
> 
> ENODEV isn't returned from anything; do you think that makes sense?
> -Greg
> 
> 
> 



Kernel crashes with RBD

2012-04-11 Thread Danny Kukawka
Hi,

we are currently testing CEPH with RBD on a cluster with 1GBit and
10Gbit interfaces. While we see no kernel crashes with RBD if the
cluster runs on the 1GBit interfaces, we see very frequent kernel
crashes with the 10Gbit network while running tests with e.g. fio
against the RBDs.

I've tested it with kernel v3.0 and also 3.3.0 (with the patches from
the 'for-linus' branch from ceph-client.git at git.kernel.org).

With more client machines running tests the crashes occur even much
faster. The issue is fully reproducible here.

Has anyone seen similar problems? See the backtrace below.

Regards

Danny

PID: 10902  TASK: 88032a9a2080  CPU: 0   COMMAND: "kworker/0:0"
 #0 [8803235fd950] machine_kexec at 810265ee
 #1 [8803235fd9a0] crash_kexec at 810a3bda
 #2 [8803235fda70] oops_end at 81444688
 #3 [8803235fda90] __bad_area_nosemaphore at 81032a35
 #4 [8803235fdb50] do_page_fault at 81446d3e
 #5 [8803235fdc50] page_fault at 81443865
[exception RIP: read_partial_message+816]
RIP: a041e500  RSP: 8803235fdd00  RFLAGS: 00010246
RAX:   RBX: 09d7  RCX: 8000
RDX:   RSI: 09d7  RDI: 813c8d78
RBP: 880328827030   R8: 09d7   R9: 4000
R10:   R11: 81205800  R12: 
R13: 0069  R14: 88032a9bc780  R15: 
ORIG_RAX:   CS: 0010  SS: 0018
 #6 [8803235fdd38] thread_return at 81440e82
 #7 [8803235fdd78] try_read at a041ed58 [libceph]
 #8 [8803235fddf8] con_work at a041fb2e [libceph]
 #9 [8803235fde28] process_one_work at 8107487c
#10 [8803235fde78] worker_thread at 8107740a
#11 [8803235fdee8] kthread at 8107b736
#12 [8803235fdf48] kernel_thread_helper at 8144c144


