Re: [ceph-users] monitor failover of ceph

2013-10-11 Thread Michael Lowe
You must have a quorum, i.e. MORE than 50% of your monitors functioning, for the 
cluster to function.  With one of two monitors you only have 50%, which isn't enough, 
so I/O stops.
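
For reference, the usual fix is to run an odd number of monitors (three or more) so 
that a majority survives the loss of any single one.  A minimal sketch of a third 
monitor section for the config below, assuming a hypothetical third host (sheepdog3) 
and address:

[mon.2]
 host = sheepdog3                ; hypothetical third node
 mon addr = 192.168.0.x:6789     ; replace with the real address

With 2 of 3 monitors up there is still a majority, so client I/O keeps flowing.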

Sent from my iPad

> On Oct 11, 2013, at 11:28 PM, "飞"  wrote:
> 
> hello, I am a new user of ceph.
> I have built a ceph test environment for block storage
> with 2 OSDs and 2 monitors. Apart from the failover tests, everything is 
> normal.
> When I perform the failover test, if I stop one OSD the cluster is OK,
> but if I stop one monitor the whole cluster dies. Why? Thank you.
> 
> my configure file :
> ; global
> [global]
>  ; enable secure authentication
>  ; auth supported = cephx
>  
>  auth cluster required = none
>  auth service required = none
>  auth client required = none
>  
>  mon clock drift allowed = 3
>  
> ;  monitors
> ;  You need at least one.  You need at least three if you want to
> ;  tolerate any node failures.  Always create an odd number.
> [mon]
>  mon data = /home/ceph/mon$id
>  ; some minimal logging (just message traffic) to aid debugging
>  debug ms = 1
> [mon.0]
>  host = sheepdog1
>  mon addr = 192.168.0.19:6789
>  
> [mon.1]
>  mon data = /var/lib/ceph/mon.$id
> host = sheepdog2
> mon addr = 192.168.0.219:6789
>  
> ; mds
> ;  You need at least one.  Define two to get a standby.
> [mds]
>  ; where the mds keeps its secret encryption keys
>  keyring = /home/ceph/keyring.mds.$id
> [mds.0]
>  host = sheepdog1
> ; osd
> ;  You need at least one.  Two if you want data to be replicated.
> ;  Define as many as you like.
> [osd]
> ; This is where the btrfs volume will be mounted.  
>  osd data = /home/ceph/osd.$id
>  osd journal = /home/ceph/osd.$id/journal
>  osd journal size = 512
>  ; working with ext4
>  filestore xattr use omap = true
>  
>  ; solve rbd data corruption
>  filestore fiemap = false
> 
> [osd.0]
> host = sheepdog1
> osd data = /var/lib/ceph/osd/diskb
> osd journal = /var/lib/ceph/osd/diskb/journal
> [osd.2]
>  host = sheepdog2
>  osd data = /var/lib/ceph/osd/diskc
>  osd journal = /var/lib/ceph/osd/diskc/journal
> 


[ceph-users] monitor failover of ceph

2013-10-11 Thread 飞
hello, I am a new user of ceph.
I have built a ceph test environment for block storage
with 2 OSDs and 2 monitors. Apart from the failover tests, everything is normal.
When I perform the failover test, if I stop one OSD the cluster is OK,
but if I stop one monitor the whole cluster dies. Why? Thank you.

my configure file :
; global
[global]
 ; enable secure authentication
 ; auth supported = cephx
 
 auth cluster required = none
 auth service required = none
 auth client required = none
 
 mon clock drift allowed = 3
 
;  monitors
;  You need at least one.  You need at least three if you want to
;  tolerate any node failures.  Always create an odd number.
[mon]
 mon data = /home/ceph/mon$id
 ; some minimal logging (just message traffic) to aid debugging
 debug ms = 1
[mon.0]
 host = sheepdog1
 mon addr = 192.168.0.19:6789
 
[mon.1]
 mon data = /var/lib/ceph/mon.$id
host = sheepdog2
mon addr = 192.168.0.219:6789
 
; mds
;  You need at least one.  Define two to get a standby.
[mds]
 ; where the mds keeps its secret encryption keys
 keyring = /home/ceph/keyring.mds.$id
[mds.0]
 host = sheepdog1
; osd
;  You need at least one.  Two if you want data to be replicated.
;  Define as many as you like. 
[osd]
; This is where the btrfs volume will be mounted.   
 osd data = /home/ceph/osd.$id
 osd journal = /home/ceph/osd.$id/journal
 osd journal size = 512
 ; working with ext4
 filestore xattr use omap = true
 
 ; solve rbd data corruption
 filestore fiemap = false


[osd.0]
host = sheepdog1
osd data = /var/lib/ceph/osd/diskb
osd journal = /var/lib/ceph/osd/diskb/journal
[osd.2]
 host = sheepdog2
 osd data = /var/lib/ceph/osd/diskc
 osd journal = /var/lib/ceph/osd/diskc/journal


Re: [ceph-users] cephforum.com

2013-10-11 Thread Shain Miley
I was wondering if something like this:

http://www.osqa.net/

might be a bit more useful than setting up a brand new forum.

There is a lot of help available between the mailing list and both of the IRC 
rooms; however, there are common questions that definitely seem to come up over 
and over again (on both the ML and IRC), and something like this could help 
eliminate a bit of the unnecessary traffic as well as help new users get 
answers to some of their most basic questions right away.

Just a thought.

Shain

Shain Miley | Manager of Systems and Infrastructure, Digital Media | 
smi...@npr.org | 202.513.3649

From: ceph-users-boun...@lists.ceph.com [ceph-users-boun...@lists.ceph.com] on 
behalf of Darren Birkett [darren.birk...@gmail.com]
Sent: Friday, October 11, 2013 3:39 AM
To: Joao Eduardo Luis
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] cephforum.com

Hi,

I'd have to say in general I agree with the other responders.  Not really for 
reasons of preferring a ML over a forum necessarily, but just because the ML 
already exists.  One of the biggest challenges for anyone new coming in to an 
open source project such as ceph is availability of information and 
documentation.  Having this information in as few places as possible, rather 
than sprawling over a lot of different formats and locations, makes it easier 
to find what you need and to know where to go when you want to ask a question.

- Darren


On 11 October 2013 00:16, Joao Eduardo Luis (joao.l...@inktank.com) wrote:
On 10/10/2013 09:55 PM, Wido den Hollander wrote:
On 10/10/2013 10:49 PM, ja...@peacon.co.uk wrote:
Hello!

Anyone else think a web forum for ceph could work?  I'm thinking simple
vbulletin or phpBB site.

To me it seems this would increase accessibility to the great info
(&minds) on here... but obviously it would need those great minds to
work :)


Well, I'm not sure. Forums are nice, but for technical discussions they
most of the time don't work that well.

The problem imho usually is that they distract a lot from the technical
discussion with all the footers, banners, smilies, etc, etc.

Another thing is that most people in the community already have a hard
time keeping up with the dev and users mailing lists, so adding another
channel of information would make it even harder to keep up.

Following up on that you get the problem that you suddenly have multiple
channels:
- mailinglists
- irc
- forum

So information becomes decentralized and harder to find for people.

I'd personally prefer to stick to the mailinglist and IRC.

I'm with Wido.

For a user-facing experience, where you ask a question about something ailing 
you and get an answer, stackoverflow (or the likes) work pretty well as it is, 
with the added benefit that the right answers get upvoted.  I however am not 
sure if it would work that well for Ceph, where answers may not be that 
straightforward.

For technical discussions, I believe the lists tend to be the best format to 
address them.  Besides, as someone who already follows both ceph-users and 
ceph-devel, both irc channels, and the tracker, I feel that following an 
additional forum as well would impose an extra overhead to how we interact.  
And we should keep in mind that most questions that would end up being asked in 
the forums would have already been answered on the mailing lists (which are 
archived btw), or would end up duplicating things that are under current 
discussion.


  -Joao

--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com



Re: [ceph-users] radosgw-admin doesn't list user anymore

2013-10-11 Thread Yehuda Sadeh
On Fri, Oct 11, 2013 at 7:46 AM, Valery Tschopp
 wrote:
> Hi,
>
> Since we upgraded ceph to 0.67.4, the radosgw-admin doesn't list all the
> users anymore:
>
> root@ineri:~# radosgw-admin user info
> could not fetch user info: no user info saved
>
>
> But it still works for a single user:
>
> root@ineri:~# radosgw-admin user info --uid=valery
> { "user_id": "valery",
>"display_name": "Valery Tschopp",
>"email": "valery.tsch...@switch.ch",
> ...
>
> The debug log file is too big for the mailing-list, but here it is on
> pastebin: http://pastebin.com/cFypJ2Qd
>

What version did you upgrade from?

You can try using the following:

$ radosgw-admin metadata list bucket

Thanks,
Yehuda
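
For reference, in 0.67.x the user list can also be pulled through the same metadata 
interface; a hedged example, assuming the stock dumpling radosgw-admin:

$ radosgw-admin metadata list user

That prints the user IDs, which can then be fed back into "radosgw-admin user info 
--uid=...".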


Re: [ceph-users] SSD pool write performance

2013-10-11 Thread james

Just a thought; did you try setting noop scheduler for the SSDs?

I guess the journal is written uncached (?)  So maybe sticking the SSDs 
behind BBWC might help by reducing write latency to near zero.  Also 
maybe wear rate might be lower on the SSD too (if journal IO straddles 
physical cells).
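
For what it's worth, switching the scheduler is just a sysfs write; a hedged sketch, 
assuming the SSDs show up as sdd and sde as in the poster's config:

echo noop > /sys/block/sdd/queue/scheduler
echo noop > /sys/block/sde/queue/scheduler
cat /sys/block/sdd/queue/scheduler    # the active scheduler is shown in brackets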



On 2013-10-11 16:55, Gregory Farnum wrote:
On Thu, Oct 10, 2013 at 12:47 PM, Sergey Pimkov  wrote:

Hello!

I'm testing a small CEPH pool consisting of some SSD drives (without any
spinners).  Ceph version is 0.67.4. Seems like the write performance of this
configuration is not as good as it could be when I test it with a small block
size (4k).

Pool configuration:
2 mons on separate hosts, one host with two OSDs. The first partition of each
disk is used for the journal and has 20Gb size, the second is formatted as XFS
and used for data (mount options: rw,noexec,nodev,noatime,nodiratime,inode64).
20% of space is left unformatted. Journal aio and dio turned on.

Each disk has about 15k IOPS with 4k blocks, iodepth 1 and 50k IOPS with 4k
blocks, iodepth 16 (tested with fio). Linear throughput of the disks is about
420Mb/s. Network throughput is 1Gbit/s.

I use an rbd pool with size 1 and want this pool to act like RAID0 at this
time.

A virtual machine (QEMU/KVM) on a separate host is configured to use a 100Gb RBD
as a second disk. Fio running in this machine (iodepth 16, buffered=0,
direct=1, libaio, 4k randwrite) shows about 2.5-3k IOPS.
Multiple guests with the same configuration show a similar summary result.
Local kernel RBD on the host with the OSDs also shows about 2-2.5k IOPS.
Latency is about 7ms.


You need to figure out where this is coming from. The OSD does have
some internal queueing that can add up to a millisecond or so of
latency, but 7ms of latency is far more than you should be getting on
an SSD.

You also aren't putting enough concurrency on the disks — with 16
in-flight ops against two disks, that's 8 each, plus you're traversing
the network so it looks a lot more like 1 IO queued than 16 to the
SSD.

All that said, Ceph is a distributed storage system that is respecting
the durability constraints you give it — you aren't going to get IOP
numbers matching a good local SSD without a big investment.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


I also tried to pre-fill the RBD without any results.

Atop shows about 90% disk utilization during the tests. CPU utilization is
about 400% (2x Xeon E5504 is installed on the ceph node). There is a lot of free
memory on the host. Blktrace shows that about 4k operations (4k to about 40k
bytes) complete every second on every disk. OSD throughput is about 30 MB/s.

I expected to see about 2 x 50k/4 = 20-30k IOPS on RBD, so is that too
optimistic for CEPH with such a load, or did I miss something important?
I also tried to use one disk as journal (20GB, last space left unformatted)
and configure the next disk as an OSD; this configuration has shown almost the
same result.

Playing with some osd/filestore/journal options with the admin socket ended with
no result.

Please, tell me am I wrong with this setup? Or should I use more disks to
get better performance with small concurrent writes? Or is ceph optimized
for work with slow spinners and shouldn't be used with SSD disks only?

Thank you very much in advance!

My ceph configuration:
ceph.conf

==
[global]

  auth cluster required = none
  auth service required = none
  auth client required = none

[client]

  rbd cache = true
  rbd cache max dirty = 0

[osd]

  osd journal aio = true
  osd max backfills = 4
  osd recovery max active = 1
  filestore max sync interval = 5

[mon.1]

  host = ceph1
  mon addr = 10.10.0.1:6789

[mon.2]

host = ceph2
mon addr = 10.10.0.2:6789

[osd.72]
  host = ceph7
  devs = /dev/sdd2
  osd journal = /dev/sdd1

[osd.73]
  host = ceph7
  devs = /dev/sde2
  osd journal = /dev/sde1



Re: [ceph-users] SSD pool write performance

2013-10-11 Thread Gregory Farnum
On Thu, Oct 10, 2013 at 12:47 PM, Sergey Pimkov  wrote:
> Hello!
>
> I'm testing small CEPH pool consists of some SSD drives (without any
> spinners).  Ceph version is 0.67.4. Seems like write performance of this
> configuration is not so good as possible, when I testing it with small block
> size (4k).
>
> Pool configuration:
> 2 mons on separated hosts, one host with two OSD. First partition of each
> disk is used for journal and has 20Gb size, second is formatted as XFS and
> used for data (mount options: rw,noexec,nodev,noatime,nodiratime,inode64).
> 20% of space left unformatted. Journal aio and dio turned on.
>
> Each disk has about 15k IOPS with 4k blocks, iodepth 1 and 50k IOPS with 4k
> block, iodepth 16 (tested with fio). Linear throughput of disks is about
> 420Mb/s. Network throughput is 1Gbit/s.
>
> I use rbd pool with size 1 and want this pool to act like RAID0 at this
> time.
>
> Virtual machine (QEMU/KVM) on separated host is configured to use 100Gb RBD
> as second disk. Fio running in this machine (iodepth 16, buffered=0,
> direct=1, libaio, 4k randwrite) shows about 2.5-3k IOPS.
> Multiple guests with the same configuration show a similar summary result.
> Local kernel RBD on host with OSD also shows about 2-2.5k IOPS. Latency is
> about 7ms.

You need to figure out where this is coming from. The OSD does have
some internal queueing that can add up to a millisecond or so of
latency, but 7ms of latency is far more than you should be getting on
an SSD.

You also aren't putting enough concurrency on the disks — with 16
in-flight ops against two disks, that's 8 each, plus you're traversing
the network so it looks a lot more like 1 IO queued than 16 to the
SSD.
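
For example, a hedged fio sketch that puts more parallel I/O on the OSDs from inside
the guest (the device name, iodepth and job count here are assumptions, not from the
original post, and it writes directly to the test device):

fio --name=randwrite --filename=/dev/vdb --ioengine=libaio --direct=1 \
    --rw=randwrite --bs=4k --iodepth=64 --numjobs=4 --runtime=60 --group_reporting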

All that said, Ceph is a distributed storage system that is respecting
the durability constraints you give it — you aren't going to get IOP
numbers matching a good local SSD without a big investment.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

> I also tried to pre-fill RBD without any results.
>
> Atop shows about 90% disks utilization during tests. CPU utilization is
> about 400% (2x Xeon E5504 is installed on ceph node). There is a lot of free
> memory on host. Blktrace shows that about 4k operations (4k to about 40k
> bytes) completing every second on every disk. OSD throughput is about 30
> MB/s.
>
> I expected to see about 2 x 50k/4 = 20-30k IOPS on RBD, so is that too
> optimistic for CEPH with such load or if I missed up something important?
> I also tried to use one disk as journal (20GB, last space left unformatted)
> and configure the next disk as OSD, this configuration have shown almost the
> same result.
>
> Playing with some osd/filestore/journal options with admin socket ended with
> no result.
>
> Please, tell me am I wrong with this setup? Or should I use more disks to
> get better performance with small concurrent writes? Or is ceph optimized
> for work with slow spinners and shouldn't be used with SSD disk only?
> Thank you very much in advance!
>
> My ceph configuration:
> ceph.conf
> ==
> [global]
>
>   auth cluster required = none
>   auth service required = none
>   auth client required = none
>
> [client]
>
>   rbd cache = true
>   rbd cache max dirty = 0
>
> [osd]
>
>   osd journal aio = true
>   osd max backfills = 4
>   osd recovery max active = 1
>   filestore max sync interval = 5
>
> [mon.1]
>
>   host = ceph1
>   mon addr = 10.10.0.1:6789
>
> [mon.2]
>
> host = ceph2
> mon addr = 10.10.0.2:6789
>
> [osd.72]
>   host = ceph7
>   devs = /dev/sdd2
>   osd journal = /dev/sdd1
>
> [osd.73]
>   host = ceph7
>   devs = /dev/sde2
>   osd journal = /dev/sde1
>


Re: [ceph-users] SSD pool write performance

2013-10-11 Thread Andrei Mikhailovsky
Hi 

i've also tested 4k performance and found similar results with fio and iozone 
tests as well as simple dd. I've noticed that my io rate doesn't go above 2k-3k 
in the virtual machines. I've got two servers with ssd journals but spindles 
for the osd. I've previously tried to use nfs + zfs on the same hardware with 
the same drives acting as cache drives. The nfs performance was far better for 
4k io. I was hitting around 60k when the storage servers were reading the test 
file from ram. 

It looks like some more optimisations have to be done to fix the current 
bottleneck. 

Having said this, the read performance from multiple clients would exceed NFS by 
far. With nfs I would not see total speeds over 450-500, but with ceph I was 
going over 1GB/s. 

Andrei 
- Original Message -

From: "Sergey Pimkov"  
To: ceph-users@lists.ceph.com 
Sent: Thursday, 10 October, 2013 8:47:32 PM 
Subject: [ceph-users] SSD pool write performance 

Hello! 

I'm testing small CEPH pool consists of some SSD drives (without any 
spinners). Ceph version is 0.67.4. Seems like write performance of this 
configuration is not so good as possible, when I testing it with small 
block size (4k). 

Pool configuration: 
2 mons on separated hosts, one host with two OSD. First partition of 
each disk is used for journal and has 20Gb size, second is formatted as 
XFS and used for data (mount options: 
rw,noexec,nodev,noatime,nodiratime,inode64). 20% of space left 
unformatted. Journal aio and dio turned on. 

Each disk has about 15k IOPS with 4k blocks, iodepth 1 and 50k IOPS with 
4k block, iodepth 16 (tested with fio). Linear throughput of disks is 
about 420Mb/s. Network throughput is 1Gbit/s. 

I use rbd pool with size 1 and want this pool to act like RAID0 at this 
time. 

Virtual machine (QEMU/KVM) on separated host is configured to use 100Gb 
RBD as second disk. Fio running in this machine (iodepth 16, buffered=0, 
direct=1, libaio, 4k randwrite) shows about 2.5-3k IOPS. 
Multiple guests with the same configuration show a similar summary 
result. Local kernel RBD on host with OSD also shows about 2-2.5k IOPS. 
Latency is about 7ms. I also tried to pre-fill RBD without any results. 

Atop shows about 90% disks utilization during tests. CPU utilization is 
about 400% (2x Xeon E5504 is installed on ceph node). There is a lot of 
free memory on host. Blktrace shows that about 4k operations (4k to 
about 40k bytes) completing every second on every disk. OSD throughput 
is about 30 MB/s. 

I expected to see about 2 x 50k/4 = 20-30k IOPS on RBD, so is that too 
optimistic for CEPH with such load or if I missed up something important? 
I also tried to use one disk as journal (20GB, last space left 
unformatted) and configure the next disk as OSD, this configuration have 
shown almost the same result. 

Playing with some osd/filestore/journal options with admin socket ended 
with no result. 

Please, tell me am I wrong with this setup? Or should I use more disks 
to get better performance with small concurrent writes? Or is ceph 
optimized for work with slow spinners and shouldn't be used with SSD 
disk only? 
Thank you very much in advance! 

My ceph configuration: 
ceph.conf 
== 
[global] 

auth cluster required = none 
auth service required = none 
auth client required = none 

[client] 

rbd cache = true 
rbd cache max dirty = 0 

[osd] 

osd journal aio = true 
osd max backfills = 4 
osd recovery max active = 1 
filestore max sync interval = 5 

[mon.1] 

host = ceph1 
mon addr = 10.10.0.1:6789 

[mon.2] 

host = ceph2 
mon addr = 10.10.0.2:6789 

[osd.72] 
host = ceph7 
devs = /dev/sdd2 
osd journal = /dev/sdd1 

[osd.73] 
host = ceph7 
devs = /dev/sde2 
osd journal = /dev/sde1 



Re: [ceph-users] Problems with RadosGW bench

2013-10-11 Thread Gregory Farnum
Without more details it sounds like you're just overloading the
cluster. How are the clients generating their load — is there any
throttling?
4 gateways can probably process on the order of 15k ops/second; each
of those PUT ops is going to require 3 writes to the disks on the
backend (times whatever the replication value is), so the OSDs can
probably handle 72*120/(2*3)=1440 PUTS/s; meanwhile you have 300
clients all trying to do 800k puts (= 240 million puts, or about 2 days
of write time), and I'm guessing they're sending them out as fast as
they can generate them.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
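
Spelled out, the estimate above is roughly (per-disk IOPS and replication are the
assumptions stated):

72 OSDs * 120 IOPS each / (2x replication * 3 writes per PUT) = 8640 / 6 = 1440 PUTs/s
300 clients * 800,000 objects = 240,000,000 PUTs
240,000,000 PUTs / 1440 PUTs/s ~= 167,000 s ~= 2 days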


On Fri, Oct 11, 2013 at 5:56 AM, Alexis GÜNST HORN
 wrote:
> Hello to all,
>
> Here is my context :
>
> - Ceph cluster composed of 72 OSDs (ie 72 disks).
> - 4 radosgw gateways
> - Round robin DNS for load balancing accross gateways
>
> My goal is to test / bench the S3 API.
>
> Here is my scenario, with 300 clients from 300 différents hosts :
>
> 1) each client uploading about 800.000 files. One bucket / client, on
> the same account
> 2) each client making recursive "ls" to get the whole list of the bucket
> 3) each  client randomly copying one object to another
> 4) each client randomly moving one object to another
> 5) each client randomly deleting an object
>
>
> Here is the result :
> 1 => OK
> 2 => OK
> but, 3, 4, 5 are both OK and KO.
>
> In fact, i get a lot of error 500 with PUT requests.
>
> Here is the Apache Log :
> [Fri Oct 11 12:46:36 2013] [error] [client xxx.xxx.xxx.xxx] FastCGI:
> comm with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
> [Fri Oct 11 12:46:36 2013] [error] [client xxx.xxx.xxx.xxx] FastCGI:
> incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
>
> And, in radosgw logs, i have some of these lines :
> radosgw: 2013-10-07 17:12:20.843522 7f61462ad700  1 heartbeat_map
> is_healthy 'RGWProcess::m_tp thread 0x7f60e4fa7700' had timed out
> after 600
>
> and
>
> radosgw: 2013-10-07 17:12:14.027608 7f61007d3700  1 heartbeat_map
> reset_timeout 'RGWProcess::m_tp thread 0x7f61007d3700' had timed out
> after 600
>
> But, the Ceph cluster is still OK (HEALTH_OK).
>
> Here are my options for radosgw :
>
> [client.radosgw.gateway]
> host = 
> keyring = /etc/ceph/keyring.radosgw.gateway
> rgw socket path = /tmp/radosgw.sock
> rgw enable ops log = false
> rgw print continue = false
> rgw enable usage log = true
> debug rgw = 0
> rgw usage log tick interval = 30
> rgw usage log flush threshold = 1024
> rgw usage max shards = 32
> rgw usage max user shards = 1
> rgw dns name = 
> rgw thread pool size = 150
> rgw gc max objs = 64
>
> Do you have any idea to explain theses Errors 500 ?
>
> Thanks a lot for your help
> Alexis


Re: [ceph-users] Issue with OSDs not starting

2013-10-11 Thread Gregory Farnum
On Fri, Oct 11, 2013 at 2:49 AM,   wrote:
> Hi
>
>
>
> I am installing Ceph using the chef cookbook recipes and I am having an
> issue with ceph-osd-all-starter
>
>
>
> Here’s a dump from client.log
>
>
>
> 
>
> Error executing action `start` on resource 'service[ceph_osd]'
>
> 
>
>
>
> Chef::Exceptions::Exec
>
> --
>
> /sbin/start ceph-osd-all-starter returned 1, expected 0
>
>
>
> Resource Declaration:
>
> -
>
> # In /var/cache/chef/cookbooks/ceph/recipes/osd.rb
>
>
>
> 153:   service "ceph_osd" do
>
> 154: case service_type
>
> 155: when "upstart"
>
> 156:   service_name "ceph-osd-all-starter"
>
> 157:   provider Chef::Provider::Service::Upstart
>
> 158: else
>
> 159:   service_name "ceph"
>
> 160: end
>
> 161: action [ :enable, :start ]
>
>
>
> Compiled Resource:
>
> --
>
> # Declared in /var/cache/chef/cookbooks/ceph/recipes/osd.rb:153:in
> `from_file'
>
>
>
> service("ceph_osd") do
>
>   enabled true
>
>   pattern "ceph_osd"
>
>   provider Chef::Provider::Service::Upstart
>
>   recipe_name "osd"
>
>   supports {:restart=>true}
>
>   action [:enable, :start]
>
>   startup_type :automatic
>
>   retry_delay 2
>
>   cookbook_name "ceph"
>
>   service_name "ceph-osd-all-starter"
>
>   retries 0
>
> end
>
>
>
> [Sat, 21 Sep 2013 14:30:12 +] ERROR: Running exception handlers
>
> [Sat, 21 Sep 2013 14:30:13 +] FATAL: Saving node information to
> /var/cache/chef/failed-run-data.json
>
> [Sat, 21 Sep 2013 14:30:13 +] ERROR: Exception handlers complete
>
> [Sat, 21 Sep 2013 14:30:13 +] ERROR: Chef::Exceptions::Exec:
> service[ceph_osd] (ceph::osd line 153) had an error: Chef::Exceptions::Exec:
> /sbin/start ceph-o$
>
> [Sat, 21 Sep 2013 14:30:13 +] FATAL: Stacktrace dumped to
> /var/cache/chef/chef-stacktrace.out
>
> [Sat, 21 Sep 2013 14:30:13 +] ERROR: Sleeping for 900 seconds before
> trying again
>
>
>
> Also I checked out /var/log/upstart/ceph-osd-all-starter.log  and its
> reporting the following
>
>
>
> ceph-disk: Error: ceph osd create failed: Command '['/usr/bin/ceph',
> '--cluster', 'ceph', '--name', 'client.bootstrap-osd', '--keyring',
> '/var/lib/ceph/bootstra$

This line looks to have been mangled somewhere — is that actually the
file it's trying to use? If so, I imagine that's the problem... ;)
If it is referring to the right key location, have you checked that it
exists, is readable by Chef, and matches what the monitors have?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
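
A hedged way to check, assuming the conventional bootstrap-osd keyring location (yours
may differ if Chef puts it elsewhere):

ceph auth get client.bootstrap-osd
cat /var/lib/ceph/bootstrap-osd/ceph.keyring

If the two keys don't match, the "Operation not permitted" error below is exactly what
you'd expect.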

>
> INFO:ceph-disk:Activating
> /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.3e1a93b4-75ab-4c76-b325-64d88d1ba0fe
>
> 2013-09-21 14:30:12.793020 7f1bf4702700  0 librados: client.bootstrap-osd
> authentication error (1) Operation not permitted
>
> Error connecting to cluster: PermissionError
>
> ERROR:ceph-disk:Failed to activate
>
> ceph-disk: Error: ceph osd create failed: Command '['/usr/bin/ceph',
> '--cluster', 'ceph', '--name', 'client.bootstrap-osd', '--keyring',
> '/var/lib/ceph/bootstra$
>
> ceph-disk: Error: One or more partitions failed to activate
>
>
>
>
>
> Any suggestions how to track this issue down? Sounds like some sort of
> permissions issue
>
>
>
> Regards,
>
>
>
> Ian
>
>
>
>
>
> Dell Corporation Limited is registered in England and Wales. Company
> Registration Number: 2081369
> Registered address: Dell House, The Boulevard, Cain Road, Bracknell,
> Berkshire, RG12 1LF, UK.
> Company details for other Dell UK entities can be found on  www.dell.co.uk.
>
>


[ceph-users] radosgw-admin doesn't list user anymore

2013-10-11 Thread Valery Tschopp

Hi,

Since we upgraded ceph to 0.67.4, the radosgw-admin doesn't list all the 
users anymore:


root@ineri:~# radosgw-admin user info
could not fetch user info: no user info saved


But it still works for a single user:

root@ineri:~# radosgw-admin user info --uid=valery
{ "user_id": "valery",
   "display_name": "Valery Tschopp",
   "email": "valery.tsch...@switch.ch",
...

The debug log file is too big for the mailing-list, but here it is on 
pastebin: http://pastebin.com/cFypJ2Qd


Cheers,
Valery
--
SWITCH
--
Valery Tschopp, Software Engineer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
email: valery.tsch...@switch.ch phone: +41 44 268 1544







Re: [ceph-users] SSD pool write performance

2013-10-11 Thread Mark Nelson

On 10/10/2013 02:47 PM, Sergey Pimkov wrote:

Hello!

I'm testing small CEPH pool consists of some SSD drives (without any
spinners).  Ceph version is 0.67.4. Seems like write performance of this
configuration is not so good as possible, when I testing it with small
block size (4k).

Pool configuration:
2 mons on separated hosts, one host with two OSD. First partition of
each disk is used for journal and has 20Gb size, second is formatted as
XFS and used for data (mount options:
rw,noexec,nodev,noatime,nodiratime,inode64). 20% of space left
unformatted. Journal aio and dio turned on.

Each disk has about 15k IOPS with 4k blocks, iodepth 1 and 50k IOPS with
4k block, iodepth 16 (tested with fio). Linear throughput of disks is
about 420Mb/s. Network throughput is 1Gbit/s.

I use rbd pool with size 1 and want this pool to act like RAID0 at this
time.

Virtual machine (QEMU/KVM) on separated host is configured to use 100Gb
RBD as second disk. Fio running in this machine (iodepth 16, buffered=0,
direct=1, libaio, 4k randwrite) shows about 2.5-3k IOPS.
Multiple guests with the same configuration show a similar summary
result. Local kernel RBD on host with OSD also shows about 2-2.5k IOPS.
Latency is about 7ms. I also tried to pre-fill RBD without any results.

Atop shows about 90% disks utilization during tests. CPU utilization is
about 400% (2x Xeon E5504 is installed on ceph node). There is a lot of
free memory on host. Blktrace shows that about 4k operations (4k to
about 40k bytes) completing every second on every disk. OSD throughput
is about 30 MB/s.


Hi!  First thing to try is disabling all in-memory dubugging.  Not sure 
how much it will help, but it should give you something.


debug asok = 0/0
debug auth = 0/0
debug buffer = 0/0
debug client = 0/0
debug context = 0/0
debug crush = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug hadoop = 0/0
debug heartbeatmap = 0/0
debug journal = 0/0
debug journaler = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0
debug mds log = 0/0
debug mds log expire = 0/0
debug mds migrator = 0/0
debug mon = 0/0
debug monc = 0/0
debug ms = 0/0
debug objclass = 0/0
debug objectcacher = 0/0
debug objecter = 0/0
debug optracker = 0/0
debug osd = 0/0
debug paxos = 0/0
debug perfcounter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug rgw = 0/0
debug throttle = 0/0
debug timer = 0/0
debug tp = 0/0
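
These can also be flipped at runtime through the admin socket, so the OSDs don't need
a restart just to test it; a hedged example for osd.72, assuming the default socket
path:

ceph --admin-daemon /var/run/ceph/ceph-osd.72.asok config set debug_osd 0/0
ceph --admin-daemon /var/run/ceph/ceph-osd.72.asok config set debug_ms 0/0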



I expected to see about 2 x 50k/4 = 20-30k IOPS on RBD, so is that too
optimistic for CEPH with such load or if I missed up something important?
I also tried to use one disk as journal (20GB, last space left
unformatted) and configure the next disk as OSD, this configuration have
shown almost the same result.

Playing with some osd/filestore/journal options with admin socket ended
with no result.

Please, tell me am I wrong with this setup? Or should I use more disks
to get better performance with small concurrent writes? Or is ceph
optimized for work with slow spinners and shouldn't be used with SSD
disk only?


We definitely have some things to work on for small IO performance.  I 
suspect some of the changes we'll be making over the coming months 
should help.



Thank you very much in advance!

My ceph configuration:
ceph.conf
==
[global]

   auth cluster required = none
   auth service required = none
   auth client required = none

[client]

   rbd cache = true
   rbd cache max dirty = 0

[osd]

   osd journal aio = true
   osd max backfills = 4
   osd recovery max active = 1
   filestore max sync interval = 5

[mon.1]

   host = ceph1
   mon addr = 10.10.0.1:6789

[mon.2]

host = ceph2
mon addr = 10.10.0.2:6789

[osd.72]
   host = ceph7
   devs = /dev/sdd2
   osd journal = /dev/sdd1

[osd.73]
   host = ceph7
   devs = /dev/sde2
   osd journal = /dev/sde1



Re: [ceph-users] kernel: [ 8773.432358] libceph: osd1 192.168.0.131:6803 socket error on read

2013-10-11 Thread Sébastien Han
Hi,

I was wondering, why did you use CephFS instead of RBD?
RBD is much more reliable and well integrated with QEMU/KVM.

Or perhaps you want to try CephFS?


Sébastien Han
Cloud Engineer

"Always give 100%. Unless you're giving blood.”

Phone: +33 (0)1 49 70 99 72
Mail: sebastien@enovance.com
Address : 10, rue de la Victoire - 75009 Paris
Web : www.enovance.com - Twitter : @enovance

On October 11, 2013 at 4:47:58 AM, Frerot, Jean-Sébastien 
(jsfre...@egliseespoir.com) wrote:
>
>Hi,
>I followed this documentation and didn't specify any CRUSH settings.
>
>http://ceph.com/docs/next/rbd/rbd-openstack/
>
>--
>Jean-Sébastien Frerot
>jsfre...@egliseespoir.com
>
>
>2013/10/10 Gregory Farnum  
>
>> Okay. As a quick guess you probably used a CRUSH placement option with
>> your new pools that wasn't supported by the old kernel, although it
>> might have been something else.
>>
>> I suspect that you'll find FUSE works better for you anyway as long as
>> you can use it — faster updates from us to you. ;)
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Thu, Oct 10, 2013 at 10:53 AM, Frerot, Jean-Sébastien
>> wrote:
>> > Hi,
>> > Thx for your reply :)
>> >
>> > kernel: Linux compute01 3.8.0-31-generic #46-Ubuntu SMP Tue Sep 10
>> 20:03:44
>> > UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>> >
>> > So yes I'm using cephfs and was also using rdb at the same time using
>> > different pools. My ceph fs was setup 3 months ago and I upgraded it a
>> > couple of days ago. I move VM images from rdb to the cephfs by copying
>> the
>> > file from rdb to local FS then to cephfs.
>> >
>> > I create pools like this:
>> > ceph osd pool create volumes 128
>> > ceph osd pool create images 128
>> > ceph osd pool create live_migration 128
>> >
>> > Yes I had checked dmesg but didn't find anything relevant.
>> >
>> > However, as a last resort I decided to mount my FS using fuse. And it
>> works
>> > like a charm. So for now I'm sticking with fuse :)
>> >
>> > Let me know if you want me to do some explicit testing. It may take some
>> > time for me to do them as I'm using ceph but I can manage to have some
>> time
>> > for maintenances.
>> >
>> > Regards,
>> >
>> >
>> > --
>> > Jean-Sébastien Frerot
>> > jsfre...@egliseespoir.com
>> >
>> >
>> > 2013/10/10 Gregory Farnum  
>> >>
>> >> (Sorry for the delayed response, this was in my spam folder!)
>> >>
>> >> Has this issue persisted? Are you using the stock 13.04 kernel?
>> >>
>> >> Can you describe your setup a little more clearly? It sounds like
>> >> maybe you're using CephFS now and were using rbd before; is that
>> >> right? What data did you move, when, and how did you set up your
>> >> CephFS to use the pools?
>> >> The socket errors are often a slightly spammy notification that the
>> >> socket isn't in use but has shut down; here they look to be an
>> >> indicator of something actually gone wrong — perhaps you've
>> >> inadvertently activated features incompatible with your kernel client,
>> >> but let's see what's going on more before we jump to that conclusion.
>> >> have you checked dmesg for anything else at those points?
>> >> -Greg
>> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
>> >>
>> >> On Sat, Oct 5, 2013 at 6:42 PM, Frerot, Jean-Sébastien
>> >> wrote:
>> >> > Hi,
>> >> > I have a ceph cluster running with 3 physical servers,
>> >> >
>> >> > Here is how my setup is configured
>> >> > server1: mon, osd, mds
>> >> > server2: mon, osd, mds
>> >> > server3: mon
>> >> > OS ubuntu 13.04
>> >> > ceph version: 0.67.4-1raring (recentrly upgrade to see if my problem
>> >> > still
>> >> > persisted with the new version)
>> >> >
>> >> > So I was running version CUTTLEFISH until yesterday. And I was using
>> >> > ceph
>> >> > with openstack (using rdb) but I simplified my setup and removed
>> >> > openstack
>> >> > to simply use kvm with virtmanager.
>> >> >
>> >> > So I created a new pool to be able to do live migration of kvm
>> instances
>> >> > #ceph osd lspools
>> >> > 0 data,1 metadata,2 rbd,3 volumes,4 images,6 live_migration,
>> >> >
>> >> > I've been running VMs for some days without problems, but then I
>> notice
>> >> > that
>> >> > I couldn't use the full disk size of my first VM (web01 which was 160G
>> >> > big
>> >> > originaly) but now is only 119G stored in ceph. I also have a windows
>> >> > instance running on a 300G raw file located in ceph too. So trying to
>> >> > fix
>> >> > the issue I decided to do a local backup of my file in cause something
>> >> > goes
>> >> > wrong and guess what, i wasn't able to copy the file from ceph to my
>> >> > local
>> >> > drive. The moment I tried to do that "cp live_migration/web01 /mnt/"
>> the
>> >> > OS
>> >> > hangs, and syslog show this >30 lines/s:
>> >> >
>> >> > Oct 5 15:25:45 server2 kernel: [ 8773.432358] libceph: osd1
>> >> > 192.168.0.131:6803 socket error on read
>> >> >
>> >> > i couldn't kill my cp neither normally reboot my server. So I had to
>> >> > r

[ceph-users] Problems with RadosGW bench

2013-10-11 Thread Alexis GÜNST HORN
Hello to all,

Here is my context :

- Ceph cluster composed of 72 OSDs (ie 72 disks).
- 4 radosgw gateways
- Round robin DNS for load balancing accross gateways

My goal is to test / bench the S3 API.

Here is my scenario, with 300 clients from 300 différents hosts :

1) each client uploading about 800.000 files. One bucket / client, on
the same account
2) each client making recursive "ls" to get the whole list of the bucket
3) each  client randomly copying one object to another
4) each client randomly moving one object to another
5) each client randomly deleting an object


Here is the result :
1 => OK
2 => OK
but, 3, 4, 5 are both OK and KO.

In fact, i get a lot of error 500 with PUT requests.

Here is the Apache Log :
[Fri Oct 11 12:46:36 2013] [error] [client xxx.xxx.xxx.xxx] FastCGI:
comm with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
[Fri Oct 11 12:46:36 2013] [error] [client xxx.xxx.xxx.xxx] FastCGI:
incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"

And, in radosgw logs, i have some of these lines :
radosgw: 2013-10-07 17:12:20.843522 7f61462ad700  1 heartbeat_map
is_healthy 'RGWProcess::m_tp thread 0x7f60e4fa7700' had timed out
after 600

and

radosgw: 2013-10-07 17:12:14.027608 7f61007d3700  1 heartbeat_map
reset_timeout 'RGWProcess::m_tp thread 0x7f61007d3700' had timed out
after 600

But, the Ceph cluster is still OK (HEALTH_OK).

Here are my options for radosgw :

[client.radosgw.gateway]
host = 
keyring = /etc/ceph/keyring.radosgw.gateway
rgw socket path = /tmp/radosgw.sock
rgw enable ops log = false
rgw print continue = false
rgw enable usage log = true
debug rgw = 0
rgw usage log tick interval = 30
rgw usage log flush threshold = 1024
rgw usage max shards = 32
rgw usage max user shards = 1
rgw dns name = 
rgw thread pool size = 150
rgw gc max objs = 64

Do you have any idea to explain theses Errors 500 ?

Thanks a lot for your help
Alexis


[ceph-users] Issue with OSDs not starting

2013-10-11 Thread Ian_M_Porter
Hi

I am installing Ceph using the chef cookbook recipes and I am having an issue 
with ceph-osd-all-starter

Here's a dump from client.log


Error executing action `start` on resource 'service[ceph_osd]'


Chef::Exceptions::Exec
--
/sbin/start ceph-osd-all-starter returned 1, expected 0

Resource Declaration:
-
# In /var/cache/chef/cookbooks/ceph/recipes/osd.rb

153:   service "ceph_osd" do
154: case service_type
155: when "upstart"
156:   service_name "ceph-osd-all-starter"
157:   provider Chef::Provider::Service::Upstart
158: else
159:   service_name "ceph"
160: end
161: action [ :enable, :start ]

Compiled Resource:
--
# Declared in /var/cache/chef/cookbooks/ceph/recipes/osd.rb:153:in `from_file'

service("ceph_osd") do
  enabled true
  pattern "ceph_osd"
  provider Chef::Provider::Service::Upstart
  recipe_name "osd"
  supports {:restart=>true}
  action [:enable, :start]
  startup_type :automatic
  retry_delay 2
  cookbook_name "ceph"
  service_name "ceph-osd-all-starter"
  retries 0
end

[Sat, 21 Sep 2013 14:30:12 +] ERROR: Running exception handlers
[Sat, 21 Sep 2013 14:30:13 +] FATAL: Saving node information to 
/var/cache/chef/failed-run-data.json
[Sat, 21 Sep 2013 14:30:13 +] ERROR: Exception handlers complete
[Sat, 21 Sep 2013 14:30:13 +] ERROR: Chef::Exceptions::Exec: 
service[ceph_osd] (ceph::osd line 153) had an error: Chef::Exceptions::Exec: 
/sbin/start ceph-o$
[Sat, 21 Sep 2013 14:30:13 +] FATAL: Stacktrace dumped to 
/var/cache/chef/chef-stacktrace.out
[Sat, 21 Sep 2013 14:30:13 +] ERROR: Sleeping for 900 seconds before trying 
again

Also I checked out /var/log/upstart/ceph-osd-all-starter.log  and its reporting 
the following

ceph-disk: Error: ceph osd create failed: Command '['/usr/bin/ceph', 
'--cluster', 'ceph', '--name', 'client.bootstrap-osd', '--keyring', 
'/var/lib/ceph/bootstra$
INFO:ceph-disk:Activating 
/dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.3e1a93b4-75ab-4c76-b325-64d88d1ba0fe
2013-09-21 14:30:12.793020 7f1bf4702700  0 librados: client.bootstrap-osd 
authentication error (1) Operation not permitted
Error connecting to cluster: PermissionError
ERROR:ceph-disk:Failed to activate
ceph-disk: Error: ceph osd create failed: Command '['/usr/bin/ceph', 
'--cluster', 'ceph', '--name', 'client.bootstrap-osd', '--keyring', 
'/var/lib/ceph/bootstra$
ceph-disk: Error: One or more partitions failed to activate


Any suggestions how to track this issue down? Sounds like some sort of 
permissions issue

Regards,

Ian


Dell Corporation Limited is registered in England and Wales. Company 
Registration Number: 2081369
Registered address: Dell House, The Boulevard, Cain Road, Bracknell,  
Berkshire, RG12 1LF, UK.
Company details for other Dell UK entities can be found on  www.dell.co.uk.


Re: [ceph-users] Ceph error after upgrade Argonaut to Bobtail to Cuttlefish

2013-10-11 Thread Wido den Hollander

On 10/11/2013 10:25 AM, Ansgar Jazdzewski wrote:

Hi,

i updated my cluster yesterday and all went well.
But today i got an error i have never seen before.

-
# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 2.5 is active+clean+inconsistent, acting [9,4]
1 scrub errors
-

any idea to fix it?

after i did the upgrade i created a new pool with a higher pg_num
(rbd_new 1024)

-
# ceph osd dump | grep rep\ size
pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins
pg_num 64 pgp_num 64 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins
pg_num 64 pgp_num 64 last_change 1 owner 0
pool 3 'rbd_new' rep size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 1024 pgp_num 1024 last_change 2604 owner 0
-

but i guess this can cause the error?



No, since the PG is in the pool 'rbd'.

The PG number is always prefixed with the pool ID, so in this case pool 
2 is 'rbd'.


I recommend you try repairing the PG, see: 
http://ceph.com/docs/master/rados/operations/control/
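
A hedged sketch of the usual sequence for the PG reported above:

ceph pg repair 2.5
ceph -w        # watch for the scrub/repair messages to confirm it comes back clean

Note that repair generally trusts the primary's copy, so it is worth checking the OSD
logs first to see which object actually failed the scrub.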


Wido


Thanks for any help
Ansgar







--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on


Re: [ceph-users] rgw s3API failed to authorize request

2013-10-11 Thread Matt Thompson
The documentation page at http://ceph.com/docs/master/radosgw/config/ states:

Important: Check the key output. Sometimes radosgw-admin generates a key
with an escape (\) character, and some clients do not know how to handle
escape characters. Remedies include removing the escape character (\),
encapsulating the string in quotes, or simply regenerating the key and
ensuring that it does not have an escape character.

Since your secret key does have a "\" in it, do you want to try
regenerating to see if that helps?

-Matt
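
A hedged example of regenerating the S3 secret for that user, using the dumpling
radosgw-admin flags:

radosgw-admin key create --uid=johndoe --key-type=s3 --gen-secret

Then paste the new secret into the boto script exactly as printed, without the JSON
"\/" escaping.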


On Fri, Oct 11, 2013 at 9:20 AM, lixuehui wrote:

> **
> Hi All:
> I installed gateway on my cluster. but always get 403 response:
>   for bucket in conn.get_all_buckets():
>
>  File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 
> 387, in get_all_buckets
> response.status, response.reason, body)
> boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
>
> AccessDenied
>
> In fact, I've defined the permissions for the user:
>
>
>   { "user_id": "johndoe",
>   "display_name": "John Doe",
>   "email": "",
>   "suspended": 0,
>   "max_buckets": 1000,
>   "auid": 0,
>   "subusers": [],
>   "keys": [
> { "user": "johndoe",
>   "access_key": "OEGPBGHD9DJRWVR3TYZC",
>   "secret_key": "639gPny\/AZN2CTYAy1BV5V4kfqRP3\/1GOikHgUni"}],
>   "swift_keys": [],
>   "caps": [
> { "type": "usage",
>   "perm": "*"},
> { "type": "user",
>   "perm": "*"}],
>   "op_mask": "read, write, delete",
>   "default_placement": "",
>   "placement_tags": []}
>
>
> and in the client the code is :
>
>  #!/usr/bin/env python2
> import boto
> import boto.s3.connection
> access_key='OEGPBGHD9DJRWVR3TYZC'
> secret_key='639gPny\/AZN2CTYAy1BV5V4kfqRP3\/1GOikHgUni'
> conn=boto.connect_s3(
> aws_access_key_id=access_key,
> aws_secret_access_key=secret_key,
> host="cephclient21.com",
> is_secure = False ,
> calling_format=boto.s3.connection.OrdinaryCallingFormat(),
> )
>  for bucket in conn.get_all_buckets():
> print "{name}\t{created}".format(
> name=bucket.name,
> created=bucket.creation_date,
> )
>
>The gateway info is :
>  2013-10-11 13:16:31.456348 7fcdf0073780 20 enqueued request req=0x154d760
> 2013-10-11 13:16:31.456436 7fcdf0073780 20 RGWWQ:
> 2013-10-11 13:16:31.456458 7fcdf0073780 20 req: 0x154d760
> 2013-10-11 13:16:31.456505 7fcdf0073780 10 allocated request req=0x154dfa0
> 2013-10-11 13:16:31.456561 7fcddcff9700 20 dequeued request req=0x154d760
> 2013-10-11 13:16:31.456633 7fcddcff9700 20 RGWWQ: empty
> 2013-10-11 13:16:31.456671 7fcddcff9700  1 == starting new request req=0x154d760 =
> 2013-10-11 13:16:31.456965 7fcddcff9700  2 req 4:0.000296::PUT 
> /my-new-bucket/::initializing
> 2013-10-11 13:16:31.457168 7fcddcff9700 10 s->object= s->bucket=my-new-bucket
> 2013-10-11 13:16:31.457205 7fcddcff9700 20 FCGI_ROLE=RESPONDER
> 2013-10-11 13:16:31.457217 7fcddcff9700 20 SCRIPT_URL=/my-new-bucket/
> 2013-10-11 13:16:31.457226 7fcddcff9700 20 SCRIPT_URI=
> http://ceph-client21/my-new-bucket/
>
> 2013-10-11 13:16:31.457235 7fcddcff9700 20 HTTP_AUTHORIZATION=AWS 
> OEGPBGHD9DJRWVR3TYZC:QjpQBiyGqQ+X3Hp6E0MTUeQSkXw=
> 2013-10-11 13:16:31.457246 7fcddcff9700 20 HTTP_HOST=ceph-client21
> 2013-10-11 13:16:31.457257 7fcddcff9700 20 HTTP_ACCEPT_ENCODING=identity
>
> 2013-10-11 13:16:31.457266 7fcddcff9700 20 HTTP_DATE=Fri, 11 Oct 2013 
> 05:15:35 GMT
> 2013-10-11 13:16:31.457275 7fcddcff9700 20 CONTENT_LENGTH=0
>
> 2013-10-11 13:16:31.457285 7fcddcff9700 20 HTTP_USER_AGENT=Boto/2.13.3 
> Python/2.7.3 Linux/3.5.0-23-generic
>
> 2013-10-11 13:16:31.457294 7fcddcff9700 20 PATH=/usr/local/bin:/usr/bin:/bin
> 2013-10-11 13:16:31.457303 7fcddcff9700 20 SERVER_SIGNATURE=
>
> 2013-10-11 13:16:31.457312 7fcddcff9700 20 SERVER_SOFTWARE=Apache/2.2.22 
> (Ubuntu)
> 2013-10-11 13:16:31.457321 7fcddcff9700 20 SERVER_NAME=ceph-client21
> 2013-10-11 13:16:31.457330 7fcddcff9700 20 SERVER_ADDR=192.168.50.115
> 2013-10-11 13:16:31.457339 7fcddcff9700 20 SERVER_PORT=80
> 2013-10-11 13:16:31.457348 7fcddcff9700 20 REMOTE_ADDR=192.168.50.105
> 2013-10-11 13:16:31.457357 7fcddcff9700 20 DOCUMENT_ROOT=/var/www
> 2013-10-11 13:16:31.457366 7fcddcff9700 20 SERVER_ADMIN=[no address given]
>
> 2013-10-11 13:16:31.457376 7fcddcff9700 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi
> 2013-10-11 13:16:31.457389 7fcddcff9700 20 REMOTE_PORT=38823
> 2013-10-11 13:16:31.457404 7fcddcff9700 20 GATEWAY_INTERFACE=CGI/1.1
> 2013-10-11 13:16:31.457420 7fcddcff9700 20 SERVER_PROTOCOL=HTTP/1.1
> 2013-10-11 13:16:31.457430 7fcddcff9700 20 REQUEST_METHOD=PUT
>
> 2013-10-11 13:16:31.457439 7fcddcff9700 20 
> QUERY_STRING=page=my-new-bucket&params=/
> 2013-10-11 13:16:31.457448 7fcddcff9700 20 REQUEST_URI=/my-new-bucket/
> 2013-10-11 13:16:31.457457 7fcddcff9700 20 SCRIPT_NAME=/my-new-bucket/
>
> 2013-10-11 13:16:31.457469 7fcddcff9700  2 req 4:0.000799:s3:PUT 
> /my-new-bucket/::getting o

[ceph-users] Ceph error after upgrade Argonaut to Bobtail to Cuttlefish

2013-10-11 Thread Ansgar Jazdzewski
Hi,

i updated my cluster yesterday and all went well.
But today i got an error i have never seen before.

-
# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 2.5 is active+clean+inconsistent, acting [9,4]
1 scrub errors
-

any idea to fix it?

after i did the upgrade i created a new pool with a higher pg_num (rbd_new
1024)

-
# ceph osd dump | grep rep\ size
pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins
pg_num 64 pgp_num 64 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins
pg_num 64 pgp_num 64 last_change 1 owner 0
pool 3 'rbd_new' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins
pg_num 1024 pgp_num 1024 last_change 2604 owner 0
-

but i guess this can cause the error?

Thanks for any help
Ansgar


[ceph-users] rgw s3API failed to authorize request

2013-10-11 Thread lixuehui
Hi All:
I installed gateway on my cluster. but always get 403 response:
 for bucket in conn.get_all_buckets():
 File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 387, 
in get_all_buckets
response.status, response.reason, body)
boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
AccessDenied

In fact, I've defined the permissions for the user:
 
{ "user_id": "johndoe",
  "display_name": "John Doe",
  "email": "",
  "suspended": 0,
  "max_buckets": 1000,
  "auid": 0,
  "subusers": [],
  "keys": [
{ "user": "johndoe",
  "access_key": "OEGPBGHD9DJRWVR3TYZC",
  "secret_key": "639gPny\/AZN2CTYAy1BV5V4kfqRP3\/1GOikHgUni"}],
  "swift_keys": [],
  "caps": [
{ "type": "usage",
  "perm": "*"},
{ "type": "user",
  "perm": "*"}],
  "op_mask": "read, write, delete",
  "default_placement": "",
  "placement_tags": []}

and in the client the code is :
#!/usr/bin/env python2
import boto
import boto.s3.connection 
access_key='OEGPBGHD9DJRWVR3TYZC'
secret_key='639gPny\/AZN2CTYAy1BV5V4kfqRP3\/1GOikHgUni'
conn=boto.connect_s3(
aws_access_key_id=access_key,
aws_secret_access_key=secret_key,
host="cephclient21.com",
is_secure = False ,
calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)
 for bucket in conn.get_all_buckets():
print "{name}\t{created}".format(
name=bucket.name,
created=bucket.creation_date,
)
   The gateway info is :
2013-10-11 13:16:31.456348 7fcdf0073780 20 enqueued request req=0x154d760
2013-10-11 13:16:31.456436 7fcdf0073780 20 RGWWQ:
2013-10-11 13:16:31.456458 7fcdf0073780 20 req: 0x154d760
2013-10-11 13:16:31.456505 7fcdf0073780 10 allocated request req=0x154dfa0
2013-10-11 13:16:31.456561 7fcddcff9700 20 dequeued request req=0x154d760
2013-10-11 13:16:31.456633 7fcddcff9700 20 RGWWQ: empty
2013-10-11 13:16:31.456671 7fcddcff9700  1 == starting new request 
req=0x154d760 =
2013-10-11 13:16:31.456965 7fcddcff9700  2 req 4:0.000296::PUT 
/my-new-bucket/::initializing
2013-10-11 13:16:31.457168 7fcddcff9700 10 s->object= 
s->bucket=my-new-bucket
2013-10-11 13:16:31.457205 7fcddcff9700 20 FCGI_ROLE=RESPONDER
2013-10-11 13:16:31.457217 7fcddcff9700 20 SCRIPT_URL=/my-new-bucket/
2013-10-11 13:16:31.457226 7fcddcff9700 20 
SCRIPT_URI=http://ceph-client21/my-new-bucket/
2013-10-11 13:16:31.457235 7fcddcff9700 20 HTTP_AUTHORIZATION=AWS 
OEGPBGHD9DJRWVR3TYZC:QjpQBiyGqQ+X3Hp6E0MTUeQSkXw=
2013-10-11 13:16:31.457246 7fcddcff9700 20 HTTP_HOST=ceph-client21
2013-10-11 13:16:31.457257 7fcddcff9700 20 HTTP_ACCEPT_ENCODING=identity
2013-10-11 13:16:31.457266 7fcddcff9700 20 HTTP_DATE=Fri, 11 Oct 2013 05:15:35 
GMT
2013-10-11 13:16:31.457275 7fcddcff9700 20 CONTENT_LENGTH=0
2013-10-11 13:16:31.457285 7fcddcff9700 20 HTTP_USER_AGENT=Boto/2.13.3 
Python/2.7.3 Linux/3.5.0-23-generic
2013-10-11 13:16:31.457294 7fcddcff9700 20 PATH=/usr/local/bin:/usr/bin:/bin
2013-10-11 13:16:31.457303 7fcddcff9700 20 SERVER_SIGNATURE=
2013-10-11 13:16:31.457312 7fcddcff9700 20 SERVER_SOFTWARE=Apache/2.2.22 
(Ubuntu)
2013-10-11 13:16:31.457321 7fcddcff9700 20 SERVER_NAME=ceph-client21
2013-10-11 13:16:31.457330 7fcddcff9700 20 SERVER_ADDR=192.168.50.115
2013-10-11 13:16:31.457339 7fcddcff9700 20 SERVER_PORT=80
2013-10-11 13:16:31.457348 7fcddcff9700 20 REMOTE_ADDR=192.168.50.105
2013-10-11 13:16:31.457357 7fcddcff9700 20 DOCUMENT_ROOT=/var/www
2013-10-11 13:16:31.457366 7fcddcff9700 20 SERVER_ADMIN=[no address given]
2013-10-11 13:16:31.457376 7fcddcff9700 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi
2013-10-11 13:16:31.457389 7fcddcff9700 20 REMOTE_PORT=38823
2013-10-11 13:16:31.457404 7fcddcff9700 20 GATEWAY_INTERFACE=CGI/1.1
2013-10-11 13:16:31.457420 7fcddcff9700 20 SERVER_PROTOCOL=HTTP/1.1
2013-10-11 13:16:31.457430 7fcddcff9700 20 REQUEST_METHOD=PUT
2013-10-11 13:16:31.457439 7fcddcff9700 20 
QUERY_STRING=page=my-new-bucket&params=/
2013-10-11 13:16:31.457448 7fcddcff9700 20 REQUEST_URI=/my-new-bucket/
2013-10-11 13:16:31.457457 7fcddcff9700 20 SCRIPT_NAME=/my-new-bucket/
2013-10-11 13:16:31.457469 7fcddcff9700  2 req 4:0.000799:s3:PUT 
/my-new-bucket/::getting op
2013-10-11 13:16:31.457504 7fcddcff9700  2 req 4:0.000835:s3:PUT 
/my-new-bucket/:create_bucket:authorizing
2013-10-11 13:16:31.457594 7fcddcff9700 20 get_obj_state: rctx=0x7fcd80009c50 
obj=.users:OEGPBGHD9DJRWVR3TYZC state=0x7fcd80009d18 s->prefetch_data=0
2013-10-11 13:16:31.457651 7fcddcff9700 10 moving .users+OEGPBGHD9DJRWVR3TYZC 
to cache LRU end
2013-10-11 13:16:31.457671 7fcddcff9700 10 cache get: 
name=.users+OEGPBGHD9DJRWVR3TYZC : type miss (requested=6, cached=3)
2013-10-11 13:16:31.464221 7fcddcff9700 10 cache put: 
name=.users+OEGPBGHD9DJRWVR3TYZC
2013-10-11 13:16:31.464242 7fcddcff9700 10 moving .users+OEGPBGHD9DJRWVR3TYZC 
to cache LRU end
2013-10-11 13:16:31.464276 7fcddcff9700 20 get_obj_state: s->obj_tag was set 
empty
2013-10-11 13:16:31.464303 7fcddcff9700 10 moving .users+

Re: [ceph-users] cephforum.com

2013-10-11 Thread Darren Birkett
Hi,

I'd have to say in general I agree with the other responders.  Not really
for reasons of preferring a ML over a forum necessarily, but just because
the ML already exists.  One of the biggest challenges for anyone new coming
in to an open source project such as ceph is availability of information
and documentation.  Having this information in as few places as possible,
rather than sprawling over a lot of different formats and locations, makes
it easier to find what you need and to know where to go when you want to
ask a question.

- Darren


On 11 October 2013 00:16, Joao Eduardo Luis  wrote:

> On 10/10/2013 09:55 PM, Wido den Hollander wrote:
>
>> On 10/10/2013 10:49 PM, ja...@peacon.co.uk wrote:
>>
>>> Hello!
>>>
>>> Anyone else think a web forum for ceph could work?  I'm thinking simple
>>> vbulletin or phpBB site.
>>>
>>> To me it seems this would increase accessibility to the great info
>>> (&minds) on here... but obviously it would need those great minds to
>>> work :)
>>>
>>>
>> Well, I'm not sure. Forums are nice, but for technical discussions they
>> most of the time don't work that well.
>>
>> The problem imho usually is that they distract a lot from the technical
>> discussion with all the footers, banners, smilies, etc, etc.
>>
>> Another thing is that most people in the community already have a hard
>> time keeping up with the dev and users mailing lists, so adding another
>> channel of information would make it even harder to keep up.
>>
>> Following up on that you get the problem that you suddenly have multiple
>> channels:
>> - mailinglists
>> - irc
>> - forum
>>
>> So information becomes decentralized and harder to find for people.
>>
>> I'd personally prefer to stick to the mailinglist and IRC.
>>
>
> I'm with Wido.
>
> For a user-facing experience, where you ask a question about something
> ailing you and get an answer, stackoverflow (or the likes) work pretty well
> as it is, with the added benefit that the right answers get upvoted.  I
> however am not sure if it would work that well for Ceph, where answers may
> not be that straightforward.
>
> For technical discussions, I believe the lists tend to be the best format
> to address them.  Besides, as someone who already follows both ceph-users
> and ceph-devel, both irc channels, and the tracker, I feel that following
> an additional forum as well would impose an extra overhead to how we
> interact.  And we should keep in mind that most questions that would end up
> being asked in the forums would have already been answered on the mailing
> lists (which are archived btw), or would end up duplicating things that are
> under current discussion.
>
>
>   -Joao
>
> --
> Joao Eduardo Luis
> Software Engineer | http://inktank.com | http://ceph.com
>