Re: [ceph-users] buckets and users

2014-11-06 Thread Marco Garcês
Your solution of prepending the environment name to the bucket was
my first choice, but at the moment I can't ask the devs to change the
code to do that. For now I have to stick with the zones solution.
Should I follow the federated zones docs
(http://ceph.com/docs/master/radosgw/federated-config/) but skip the
sync step?

Thank you,

Marco Garcês

On Wed, Nov 5, 2014 at 8:13 PM, Craig Lewis cle...@centraldesktop.com wrote:
 You could set up dedicated zones for each environment, and not
 replicate between them.

 Each zone would have its own URL, but you would be able to re-use
 usernames and bucket names.  If different URLs are a problem, you
 might be able to get around that in the load balancer or the web
 servers.  I wouldn't really recommend that, but it's possible.


 I have a similar requirement.  I was able to prepend the
 environment name to the bucket in my client code, which made things
 much easier.


 On Wed, Nov 5, 2014 at 8:52 AM, Marco Garcês ma...@garces.cc wrote:
 Hi there,

 I have this situation, where I'm using the same Ceph cluster (with
 radosgw), for two different environments, QUAL and PRE-PRODUCTION.

 I need different users for each environment, but I need to create the
 same buckets, with the same name; I understand there is no way to have
 2 buckets with the same name, but how can I go around this? Perhaps
 creating a different pool for each user?

 Can you help me? Thank you in advance, my best regards,

 Marco Garcês
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] All OSDs don't restart after shutdown

2014-11-06 Thread Luca Mazzaferro

Dear Users,
I'm quite new to Ceph.
I completed the tutorial here:
http://ceph.com/docs/giant/start/quick-ceph-deploy

After it, I turned off the VMs where the OSDs, monitors and MDS were running.

This morning I restarted the machines, but the OSDs don't want to restart,
while the other services restarted without any problems.
On one node:
[root@ceph-node1 ~]# service ceph status
=== mon.ceph-node1 ===
mon.ceph-node1: running {"version":"0.80.7"}
=== osd.2 ===
osd.2: not running.
=== mds.ceph-node1 ===
mds.ceph-node1: running {"version":"0.80.7"}

[root@ceph-node1 ~]# /etc/init.d/ceph -a start osd.2
=== osd.2 ===
failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.2 
--keyring=/var/lib/ceph/osd/ceph-2/keyring osd crush create-or-move -- 2 
0.01 host=ceph-node1 root=default'


This is happening on all the machines.

On the admin-node side, the ceph health command and ceph -w hang forever.

The log files don't show any problems; the last line in the OSD logs is from
yesterday.


Could anyone help me to solve this problem?
Thank you.
Cheers.

   Luca

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] All OSDs don't restart after shutdown

2014-11-06 Thread Antonio Messina
On Thu, Nov 6, 2014 at 12:00 PM, Luca Mazzaferro
luca.mazzafe...@rzg.mpg.de wrote:
 Dear Users,

Hi Luca,

 On the admin-node side the ceph healt command or the ceph -w hangs forever.

I'm not a ceph expert either, but this is usually an indication that
the monitors are not running.

How many MONs are you running? Are they all alive? What's in the mon
logs? Also check the time on the mon nodes.
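
A few checks that work even while "ceph -s" hangs (a sketch; the admin socket
path below is the default one):

  # on each monitor node
  ceph --admin-daemon /var/run/ceph/ceph-mon.$(hostname -s).asok mon_status
  tail -n 50 /var/log/ceph/ceph-mon.$(hostname -s).log
  date; ntpq -p    # clock skew between the mons also breaks quorum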

cheers,
Antonio

-- 
antonio.s.mess...@gmail.com
antonio.mess...@uzh.ch +41 (0)44 635 42 22
S3IT: Service and Support for Science IT   http://www.s3it.uzh.ch/
University of Zurich
Winterthurerstrasse 190
CH-8057 Zurich Switzerland
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Typical 10GbE latency

2014-11-06 Thread Wido den Hollander
Hello,

While working at a customer I've run into 10GbE latency which seems
high to me.

I have access to a couple of Ceph clusters and I ran a simple ping test:

$ ping -s 8192 -c 100 -n <ip>

Two results I got:

rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms
rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms

Both these environments are running with Intel 82599ES 10Gbit cards in
LACP. One with Extreme Networks switches, the other with Arista.

Now, on an environment with Cisco Nexus 3000 and Nexus 7000 switches I'm
seeing:

rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms

As you can see, the Cisco Nexus network has higher latency compared to the
other setups.

You would say the switches are to blame, but we also tried a direct
TwinAx connection, and that didn't help.

This setup also uses the Intel 82599ES cards, so the cards don't seem to
be the problem.

The MTU is set to 9000 on all these networks and cards.

I was wondering, others with a Ceph cluster running on 10GbE, could you
perform a simple network latency test like this? I'd like to compare the
results.
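
If it helps, a small loop to run the same test against several hosts and keep
only the average RTT (hostnames are placeholders):

  for h in osd1 osd2 osd3; do
      printf '%s: ' "$h"
      ping -s 8192 -c 100 -n -q "$h" | awk -F/ '/^rtt/ {print $5 " ms avg"}'
  done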

-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] PG inconsistency

2014-11-06 Thread GuangYang
Hello Cephers,
Recently we observed a couple of inconsistencies in our Ceph cluster. There
were two major patterns leading to inconsistency as I observed them: 1) EIO when
reading the file, 2) an inconsistent digest (for EC) even though there is no read error.

While Ceph has built-in tool sets to repair the inconsistencies, I would also
like to check with the community about the best way to handle such issues
(e.g. should we run fsck / xfs_repair when such an issue happens).

In more detail, I have the following questions:
1. When an inconsistency is detected, what is the chance that there is some
hardware issue which needs to be repaired physically, or should I run some
disk/filesystem tools to check further?
2. Should we use fsck / xfs_repair to fix the inconsistencies, or should we
rely solely on Ceph's repair tool sets?

It would be great to hear your experience and suggestions.

BTW, we are using XFS in the cluster.
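
For concreteness, the kind of checks I have in mind look roughly like this
(a sketch, not a recommendation; device names are placeholders):

  ceph health detail | grep inconsistent     # which PGs, which OSDs
  # on the OSD host that holds the bad copy:
  dmesg | grep -i 'i/o error'
  smartctl -a /dev/sdX                       # disk behind that OSD
  xfs_repair -n /dev/sdX1                    # dry-run check, OSD stopped and unmounted
  # Ceph's own repair, per PG:
  ceph pg repair <pgid>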

Thanks,
Guang 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread Dan van der Ster
Between two hosts on an HP Procurve 6600, no jumbo frames:

rtt min/avg/max/mdev = 0.096/0.128/0.151/0.019 ms

Cheers, Dan

On Thu Nov 06 2014 at 2:19:07 PM Wido den Hollander w...@42on.com wrote:

 Hello,

 While working at a customer I've ran into a 10GbE latency which seems
 high to me.

 I have access to a couple of Ceph cluster and I ran a simple ping test:

 $ ping -s 8192 -c 100 -n ip

 Two results I got:

 rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms
 rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms

 Both these environment are running with Intel 82599ES 10Gbit cards in
 LACP. One with Extreme Networks switches, the other with Arista.

 Now, on a environment with Cisco Nexus 3000 and Nexus 7000 switches I'm
 seeing:

 rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms

 As you can see, the Cisco Nexus network has high latency compared to the
 other setup.

 You would say the switches are to blame, but we also tried with a direct
 TwinAx connection, but that didn't help.

 This setup also uses the Intel 82599ES cards, so the cards don't seem to
 be the problem.

 The MTU is set to 9000 on all these networks and cards.

 I was wondering, others with a Ceph cluster running on 10GbE, could you
 perform a simple network latency test like this? I'd like to compare the
 results.

 --
 Wido den Hollander
 42on B.V.
 Ceph trainer and consultant

 Phone: +31 (0)20 700 9902
 Skype: contact42on
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG inconsistency

2014-11-06 Thread Dan van der Ster
Hi,
I've only ever seen (1), EIO to read a file. In this case I've always just
killed / formatted / replaced that OSD completely -- that moves the PG to a
new master and the new replication fixes the inconsistency. This way,
I've never had to run pg repair. I don't know if this is a best or even good
practice, but it works for us.
Cheers, Dan

On Thu Nov 06 2014 at 2:24:32 PM GuangYang yguan...@outlook.com wrote:

 Hello Cephers,
 Recently we observed a couple of inconsistencies in our Ceph cluster,
 there were two major patterns leading to inconsistency as I observed: 1)
 EIO to read the file, 2) the digest is inconsistent (for EC) even there is
 no read error).

 While ceph has built-in tool sets to repair the inconsistencies, I also
 would like to check with the community in terms of what is the best ways to
 handle such issues (e.g. should we run fsck / xfs_repair when such issue
 happens).

 In more details, I have the following questions:
 1. When there is inconsistency detected, what is the chance there is some
 hardware issues which need to be repaired physically, or should I run some
 disk/filesystem tools to further check?
 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should
 we solely relay on Ceph's repair tool sets?

 It would be great to hear you experience and suggestions.

 BTW, we are using XFS in the cluster.

 Thanks,
 Guang
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread Luis Periquito
Hi Wido,

What is the full topology? Are you using north-south or east-west? So far
I've seen that east-west fabrics are slightly slower. What fabric modes have
you configured? How is everything connected? Also, you have given no
information on the OS - if I remember correctly there were a lot of
improvements in the latest kernels...

And what about the bandwidth?

The values you present don't seem awfully high, and the deviation seems low.

On Thu, Nov 6, 2014 at 1:18 PM, Wido den Hollander w...@42on.com wrote:

 Hello,

 While working at a customer I've ran into a 10GbE latency which seems
 high to me.

 I have access to a couple of Ceph cluster and I ran a simple ping test:

 $ ping -s 8192 -c 100 -n ip

 Two results I got:

 rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms
 rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms

 Both these environment are running with Intel 82599ES 10Gbit cards in
 LACP. One with Extreme Networks switches, the other with Arista.

 Now, on a environment with Cisco Nexus 3000 and Nexus 7000 switches I'm
 seeing:

 rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms

 As you can see, the Cisco Nexus network has high latency compared to the
 other setup.

 You would say the switches are to blame, but we also tried with a direct
 TwinAx connection, but that didn't help.

 This setup also uses the Intel 82599ES cards, so the cards don't seem to
 be the problem.

 The MTU is set to 9000 on all these networks and cards.

 I was wondering, others with a Ceph cluster running on 10GbE, could you
 perform a simple network latency test like this? I'd like to compare the
 results.

 --
 Wido den Hollander
 42on B.V.
 Ceph trainer and consultant

 Phone: +31 (0)20 700 9902
 Skype: contact42on
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] All OSDs don't restart after shutdown

2014-11-06 Thread Luca Mazzaferro

On 11/06/2014 12:36 PM, Antonio Messina wrote:

On Thu, Nov 6, 2014 at 12:00 PM, Luca Mazzaferro
luca.mazzafe...@rzg.mpg.de wrote:

Dear Users,

Hi Luca,


On the admin-node side the ceph healt command or the ceph -w hangs forever.

I'm not a ceph expert either, but this is usually an indication that
the monitors are not running.

How many MONs are you running? Are they all alive? What's in the mon
logs? Also check the time on the mon nodes.

cheers,
Antonio


Ciao Antonio,
thank you very much for your answer.

I'm running 3 MONs and they are all alive.

The logs don't show any problem that I can recognize.
This is a section after a restart of the initial monitor:

2014-11-06 14:31:36.795298 7fb66e4867a0  0 ceph version 0.80.7 
(6c0127fcb58008793d3c8b62d925bc91963672a3), process ceph-mon, pid 28050
2014-11-06 14:31:36.860884 7fb66e4867a0  0 starting mon.ceph-node1 rank 
0 at 192.168.122.21:6789/0 mon_data /var/lib/ceph/mon/ceph-ceph-node1 
fsid 62e03428-0c4a-4ede-be18-c2cfed10639d
2014-11-06 14:31:36.861383 7fb66e4867a0  1 mon.ceph-node1@-1(probing) e3 
preinit fsid 62e03428-0c4a-4ede-be18-c2cfed10639d
2014-11-06 14:31:36.862614 7fb66e4867a0  1 
mon.ceph-node1@-1(probing).paxosservice(pgmap 1..218) refresh upgraded, 
format 0 -> 1
2014-11-06 14:31:36.862666 7fb66e4867a0  1 mon.ceph-node1@-1(probing).pg 
v0 on_upgrade discarding in-core PGMap
2014-11-06 14:31:36.866958 7fb66e4867a0  0 
mon.ceph-node1@-1(probing).mds e4 print_map

epoch   4
flags   0
created 2014-11-04 12:30:56.224692
modified        2014-11-05 13:00:53.377356
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   1099511627776
last_failure    0
last_failure_osd_epoch  0
compat  compat={},rocompat={},incompat={1=base v0.20,2=client
writeable ranges,3=default file layouts on dirs,4=dir inode in separate
object,5=mds uses versioned encoding,6=dirfrag is stored in omap}

max_mds 1
in      0
up      {0=4243}
failed
stopped
data_pools      0
metadata_pool   1
inline_data     disabled
4243:   192.168.122.21:6805/28039 'ceph-node1' mds.0.1 up:active seq 2

2014-11-06 14:31:36.867144 7fb66e4867a0  0 
mon.ceph-node1@-1(probing).osd e15 crush map has features 1107558400, 
adjusting msgr requires
2014-11-06 14:31:36.867155 7fb66e4867a0  0 
mon.ceph-node1@-1(probing).osd e15 crush map has features 1107558400, 
adjusting msgr requires
2014-11-06 14:31:36.867157 7fb66e4867a0  0 
mon.ceph-node1@-1(probing).osd e15 crush map has features 1107558400, 
adjusting msgr requires
2014-11-06 14:31:36.867159 7fb66e4867a0  0 
mon.ceph-node1@-1(probing).osd e15 crush map has features 1107558400, 
adjusting msgr requires
2014-11-06 14:31:36.867850 7fb66e4867a0  1 
mon.ceph-node1@-1(probing).paxosservice(auth 1..37) refresh upgraded, 
format 0 -> 1
2014-11-06 14:31:36.868898 7fb66e4867a0  0 mon.ceph-node1@-1(probing) 
e3  my rank is now 0 (was -1)
2014-11-06 14:31:36.869655 7fb666410700  0 -- 192.168.122.21:6789/0 >>
192.168.122.22:6789/0 pipe(0x2b18a00 sd=22 :0 s=1 pgs=0 cs=0 l=0
c=0x2950c60).fault
2014-11-06 14:31:36.869817 7fb66630f700  0 -- 192.168.122.21:6789/0 >>
192.168.122.23:6789/0 pipe(0x2b19680 sd=21 :0 s=1 pgs=0 cs=0 l=0
c=0x29518c0).fault
2014-11-06 14:31:52.224266 7fb66580d700  0 -- 192.168.122.21:6789/0 >>
192.168.122.22:6789/0 pipe(0x2b1be80 sd=23 :6789 s=0 pgs=0 cs=0 l=0
c=0x2951b80).accept connect_seq 0 vs existing 0 state connecting
2014-11-06 14:31:57.987230 7fb66570c700  0 -- 192.168.122.21:6789/0 >>
192.168.122.23:6789/0 pipe(0x2b1d280 sd=24 :6789 s=0 pgs=0 cs=0 l=0
c=0x2951ce0).accept connect_seq 0 vs existing 0 state connecting
2014-11-06 14:32:36.868421 7fb668213700  0 
mon.ceph-node1@0(probing).data_health(0) update_stats avail 20% total 
8563152 used 6364364 avail 1763796
2014-11-06 14:32:36.868739 7fb668213700  0 log [WRN] : reached 
concerning levels of available space on local monitor storage (20% free)
2014-11-06 14:33:36.869029 7fb668213700  0 
mon.ceph-node1@0(probing).data_health(0) update_stats avail 20% total 
8563152 used 6364364 avail 1763796
2014-11-06 14:34:36.869285 7fb668213700  0 
mon.ceph-node1@0(probing).data_health(0) update_stats avail 20% total 
8563152 used 6364364 avail 1763796
2014-11-06 14:35:36.869588 7fb668213700  0 
mon.ceph-node1@0(probing).data_health(0) update_stats avail 20% total 
8563152 used 6364364 avail 1763796
2014-11-06 14:36:36.869910 7fb668213700  0 
mon.ceph-node1@0(probing).data_health(0) update_stats avail 20% total 
8563152 used 6364364 avail 1763796
2014-11-06 14:37:36.870395 7fb668213700  0 
mon.ceph-node1@0(probing).data_health(0) update_stats avail 20% total 
8563152 used 6364364 avail 1763796



Instead, from my admin node, after waiting for about 5 minutes I got this:
[rzgceph@admin-node my-cluster]$ ceph -s
2014-11-06 12:18:43.723751 7f3f5d645700  0 monclient(hunting): 
authenticate timed out after 300
2014-11-06 12:18:43.723848 7f3f5d645700  0 librados: client.admin 
authentication error (110) Connection timed out
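
Given the pipe ... fault lines in the mon log above, is it worth checking that
the monitors can actually reach each other on the mon port? Something like
(addresses taken from that log, using whichever of nc/telnet is installed):

  nc -zv 192.168.122.22 6789
  nc -zv 192.168.122.23 6789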


Which leads me to this discussion:


Re: [ceph-users] PG inconsistency

2014-11-06 Thread GuangYang
Thanks Dan. By killed/formatted/replaced the OSD, did you replace the disk?
Not a filesystem expert here, but I would like to understand what happened
underneath to cause the EIO, and whether that reveals something (e.g. a hardware issue).

In our case we are using 6TB drives, so there is a lot of data to migrate,
and since backfilling/recovery increases latency, we hope to avoid that
as much as we can.

Thanks,
Guang


 From: daniel.vanders...@cern.ch 
 Date: Thu, 6 Nov 2014 13:36:46 + 
 Subject: Re: PG inconsistency 
 To: yguan...@outlook.com; ceph-users@lists.ceph.com 
 
 Hi, 
 I've only ever seen (1), EIO to read a file. In this case I've always 
 just killed / formatted / replaced that OSD completely -- that moves 
 the PG to a new master and the new replication fixes the 
 inconsistency. This way, I've never had to pg repair. I don't know if 
 this is a best or even good practise, but it works for us. 
 Cheers, Dan 
 
 On Thu Nov 06 2014 at 2:24:32 PM GuangYang 
 yguan...@outlook.com wrote: 
 Hello Cephers, 
 Recently we observed a couple of inconsistencies in our Ceph cluster, 
 there were two major patterns leading to inconsistency as I observed: 
 1) EIO to read the file, 2) the digest is inconsistent (for EC) even 
 there is no read error). 
 
 While ceph has built-in tool sets to repair the inconsistencies, I also 
 would like to check with the community in terms of what is the best 
 ways to handle such issues (e.g. should we run fsck / xfs_repair when 
 such issue happens). 
 
 In more details, I have the following questions: 
 1. When there is inconsistency detected, what is the chance there is 
 some hardware issues which need to be repaired physically, or should I 
 run some disk/filesystem tools to further check? 
 2. Should we use fsck / xfs_repair to fix the inconsistencies, or 
 should we solely relay on Ceph's repair tool sets? 
 
 It would be great to hear you experience and suggestions. 
 
 BTW, we are using XFS in the cluster. 
 
 Thanks, 
 Guang 
  
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG inconsistency

2014-11-06 Thread Irek Fasikhov
What is your version of Ceph?
0.80.0 - 0.80.3
https://github.com/ceph/ceph/commit/7557a8139425d1705b481d7f010683169fd5e49b

Thu Nov 06 2014 at 16:24:21, GuangYang yguan...@outlook.com:

 Hello Cephers,
 Recently we observed a couple of inconsistencies in our Ceph cluster,
 there were two major patterns leading to inconsistency as I observed: 1)
 EIO to read the file, 2) the digest is inconsistent (for EC) even there is
 no read error).

 While ceph has built-in tool sets to repair the inconsistencies, I also
 would like to check with the community in terms of what is the best ways to
 handle such issues (e.g. should we run fsck / xfs_repair when such issue
 happens).

 In more details, I have the following questions:
 1. When there is inconsistency detected, what is the chance there is some
 hardware issues which need to be repaired physically, or should I run some
 disk/filesystem tools to further check?
 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should
 we solely relay on Ceph's repair tool sets?

 It would be great to hear you experience and suggestions.

 BTW, we are using XFS in the cluster.

 Thanks,
 Guang
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG inconsistency

2014-11-06 Thread GuangYang
We are using v0.80.4. Just wanted to ask for general suggestions here :)

Thanks,
Guang


 From: malm...@gmail.com 
 Date: Thu, 6 Nov 2014 13:46:12 + 
 Subject: Re: [ceph-users] PG inconsistency 
 To: yguan...@outlook.com; ceph-de...@vger.kernel.org; 
 ceph-users@lists.ceph.com 
 
 What is your version of the ceph? 
 0.80.0 - 0.80.3 
 https://github.com/ceph/ceph/commit/7557a8139425d1705b481d7f010683169fd5e49b 
 
 Thu Nov 06 2014 at 16:24:21, GuangYang 
  yguan...@outlook.com: 
 Hello Cephers, 
 Recently we observed a couple of inconsistencies in our Ceph cluster, 
 there were two major patterns leading to inconsistency as I observed: 
 1) EIO to read the file, 2) the digest is inconsistent (for EC) even 
 there is no read error). 
 
 While ceph has built-in tool sets to repair the inconsistencies, I also 
 would like to check with the community in terms of what is the best 
 ways to handle such issues (e.g. should we run fsck / xfs_repair when 
 such issue happens). 
 
 In more details, I have the following questions: 
 1. When there is inconsistency detected, what is the chance there is 
 some hardware issues which need to be repaired physically, or should I 
 run some disk/filesystem tools to further check? 
 2. Should we use fsck / xfs_repair to fix the inconsistencies, or 
 should we solely relay on Ceph's repair tool sets? 
 
 It would be great to hear you experience and suggestions. 
 
 BTW, we are using XFS in the cluster. 
 
 Thanks, 
 Guang 
 ___ 
 ceph-users mailing list 
  ceph-users@lists.ceph.com 
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
  
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG inconsistency

2014-11-06 Thread Irek Fasikhov
Thu Nov 06 2014 at 16:44:09, GuangYang yguan...@outlook.com:

 Thanks Dan. By killed/formatted/replaced the OSD, did you replace the
 disk? Not an filesystem expert here, but would like to understand the
 underlying what happened behind the EIO and does that reveal something
 (e.g. hardware issue).

 In our case, we are using 6TB drive so that there are lot of data to
 migrate and as backfilling/recovering bring latency increasing, we hope to
 avoid that as much as we can..


For example, use the following parameters:
osd_recovery_delay_start = 10
osd recovery op priority = 2
osd max backfills = 1
osd recovery max active =1
osd recovery threads = 1
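
If it helps, these can also be applied at runtime without restarting the OSDs
(a sketch; option names as above):

ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 2'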




 Thanks,
 Guang

 
  From: daniel.vanders...@cern.ch
  Date: Thu, 6 Nov 2014 13:36:46 +
  Subject: Re: PG inconsistency
  To: yguan...@outlook.com; ceph-users@lists.ceph.com
 
  Hi,
  I've only ever seen (1), EIO to read a file. In this case I've always
  just killed / formatted / replaced that OSD completely -- that moves
  the PG to a new master and the new replication fixes the
  inconsistency. This way, I've never had to pg repair. I don't know if
  this is a best or even good practise, but it works for us.
  Cheers, Dan
 
  On Thu Nov 06 2014 at 2:24:32 PM GuangYang
   yguan...@outlook.com wrote:
  Hello Cephers,
  Recently we observed a couple of inconsistencies in our Ceph cluster,
  there were two major patterns leading to inconsistency as I observed:
  1) EIO to read the file, 2) the digest is inconsistent (for EC) even
  there is no read error).
 
  While ceph has built-in tool sets to repair the inconsistencies, I also
  would like to check with the community in terms of what is the best
  ways to handle such issues (e.g. should we run fsck / xfs_repair when
  such issue happens).
 
  In more details, I have the following questions:
  1. When there is inconsistency detected, what is the chance there is
  some hardware issues which need to be repaired physically, or should I
  run some disk/filesystem tools to further check?
  2. Should we use fsck / xfs_repair to fix the inconsistencies, or
  should we solely relay on Ceph's repair tool sets?
 
  It would be great to hear you experience and suggestions.
 
  BTW, we are using XFS in the cluster.
 
  Thanks,
  Guang

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread German Anders


also, between two hosts on a NetGear SW model at 10GbE:

rtt min/avg/max/mdev = 0.104/0.196/0.288/0.055 ms



German Anders

--- Original message ---
Subject: [ceph-users] Typical 10GbE latency
From: Wido den Hollander w...@42on.com
To: ceph-us...@ceph.com
Date: Thursday, 06/11/2014 10:18

Hello,

While working at a customer I've ran into a 10GbE latency which seems
high to me.

I have access to a couple of Ceph cluster and I ran a simple ping test:

$ ping -s 8192 -c 100 -n ip

Two results I got:

rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms
rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms

Both these environment are running with Intel 82599ES 10Gbit cards in
LACP. One with Extreme Networks switches, the other with Arista.

Now, on a environment with Cisco Nexus 3000 and Nexus 7000 switches I'm
seeing:

rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms

As you can see, the Cisco Nexus network has high latency compared to the
other setup.

You would say the switches are to blame, but we also tried with a direct
TwinAx connection, but that didn't help.

This setup also uses the Intel 82599ES cards, so the cards don't seem to
be the problem.

The MTU is set to 9000 on all these networks and cards.

I was wondering, others with a Ceph cluster running on 10GbE, could you
perform a simple network latency test like this? I'd like to compare the
results.

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG inconsistency

2014-11-06 Thread Dan van der Ster
IIRC, the EIO we had also correlated with a SMART status that showed the
disk was bad enough for a warranty replacement -- so yes, I replaced the
disk in these cases.

Cheers, Dan

On Thu Nov 06 2014 at 2:44:08 PM GuangYang yguan...@outlook.com wrote:

 Thanks Dan. By killed/formatted/replaced the OSD, did you replace the
 disk? Not an filesystem expert here, but would like to understand the
 underlying what happened behind the EIO and does that reveal something
 (e.g. hardware issue).

 In our case, we are using 6TB drive so that there are lot of data to
 migrate and as backfilling/recovering bring latency increasing, we hope to
 avoid that as much as we can..

 Thanks,
 Guang

 
  From: daniel.vanders...@cern.ch
  Date: Thu, 6 Nov 2014 13:36:46 +
  Subject: Re: PG inconsistency
  To: yguan...@outlook.com; ceph-users@lists.ceph.com
 
  Hi,
  I've only ever seen (1), EIO to read a file. In this case I've always
  just killed / formatted / replaced that OSD completely -- that moves
  the PG to a new master and the new replication fixes the
  inconsistency. This way, I've never had to pg repair. I don't know if
  this is a best or even good practise, but it works for us.
  Cheers, Dan
 
  On Thu Nov 06 2014 at 2:24:32 PM GuangYang
   yguan...@outlook.com wrote:
  Hello Cephers,
  Recently we observed a couple of inconsistencies in our Ceph cluster,
  there were two major patterns leading to inconsistency as I observed:
  1) EIO to read the file, 2) the digest is inconsistent (for EC) even
  there is no read error).
 
  While ceph has built-in tool sets to repair the inconsistencies, I also
  would like to check with the community in terms of what is the best
  ways to handle such issues (e.g. should we run fsck / xfs_repair when
  such issue happens).
 
  In more details, I have the following questions:
  1. When there is inconsistency detected, what is the chance there is
  some hardware issues which need to be repaired physically, or should I
  run some disk/filesystem tools to further check?
  2. Should we use fsck / xfs_repair to fix the inconsistencies, or
  should we solely relay on Ceph's repair tool sets?
 
  It would be great to hear you experience and suggestions.
 
  BTW, we are using XFS in the cluster.
 
  Thanks,
  Guang

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread Wido den Hollander
On 11/06/2014 02:38 PM, Luis Periquito wrote:
 Hi Wido,
 
 What is the full topology? Are you using a north-south or east-west? So far
 I've seen the east-west are slightly slower. What are the fabric modes you
 have configured? How is everything connected? Also you have no information
 on the OS - if I remember correctly there was a lot of improvements in the
 latest kernels...

The Nexus 3000s are connected with 40Gbit to the Nexus 7000. There are
two 7000 units and 8 3000s spread out over 4 racks.

But the test I did was with two hosts connected to the same Nexus 3000
switch using TwinAx cabling of 3m.

The tests were performed with Ubuntu 14.04 (3.13) and RHEL 7 (3.10), but
that didn't make a difference.

 
 And what about the bandwith?
 

Just fine, no problems getting 10Gbit through the NICs.

 The values you present don't seem awfully high, and the deviation seems low.
 

No, they don't seem high, but they are about 40% higher than the values
I see on other environments. 40% is a lot.

This Ceph cluster is SSD-only, so the lower the latency, the more IOps
the system can do.

Wido

 On Thu, Nov 6, 2014 at 1:18 PM, Wido den Hollander w...@42on.com wrote:
 
 Hello,

 While working at a customer I've ran into a 10GbE latency which seems
 high to me.

 I have access to a couple of Ceph cluster and I ran a simple ping test:

 $ ping -s 8192 -c 100 -n ip

 Two results I got:

 rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms
 rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms

 Both these environment are running with Intel 82599ES 10Gbit cards in
 LACP. One with Extreme Networks switches, the other with Arista.

 Now, on a environment with Cisco Nexus 3000 and Nexus 7000 switches I'm
 seeing:

 rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms

 As you can see, the Cisco Nexus network has high latency compared to the
 other setup.

 You would say the switches are to blame, but we also tried with a direct
 TwinAx connection, but that didn't help.

 This setup also uses the Intel 82599ES cards, so the cards don't seem to
 be the problem.

 The MTU is set to 9000 on all these networks and cards.

 I was wondering, others with a Ceph cluster running on 10GbE, could you
 perform a simple network latency test like this? I'd like to compare the
 results.

 --
 Wido den Hollander
 42on B.V.
 Ceph trainer and consultant

 Phone: +31 (0)20 700 9902
 Skype: contact42on
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] buckets and users

2014-11-06 Thread Marco Garcês
By the way,
Is it possible to run 2 radosgw on the same host?

I think I have created the zone, but I'm not sure it was correct, because
it used the default pool names even though I had changed them in the
json file I provided.

Now I am trying to run ceph-radosgw with two different entries in the
ceph.conf file, but without success. Example:

[client.radosgw.gw]
host = GATEWAY
keyring = /etc/ceph/keyring.radosgw.gw
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /var/log/ceph/client.radosgw.gateway.log
rgw print continue = false
rgw dns name = gateway.local
rgw enable ops log = false
rgw enable usage log = true
rgw usage log tick interval = 30
rgw usage log flush threshold = 1024
rgw usage max shards = 32
rgw usage max user shards = 1
rgw cache lru size = 15000
rgw thread pool size = 2048

#[client.radosgw.gw.env2]
#host = GATEWAY
#keyring = /etc/ceph/keyring.radosgw.gw
#rgw socket path = /var/run/ceph/ceph.env2.radosgw.gateway.fastcgi.sock
#log file = /var/log/ceph/client.env2.radosgw.gateway.log
#rgw print continue = false
#rgw dns name = cephppr.local
#rgw enable ops log = false
#rgw enable usage log = true
#rgw usage log tick interval = 30
#rgw usage log flush threshold = 1024
#rgw usage max shards = 32
#rgw usage max user shards = 1
#rgw cache lru size = 15000
#rgw thread pool size = 2048
#rgw zone = ppr

It fails to create the socket:
2014-11-06 15:39:08.862364 7f80cc670880  0 ceph version 0.80.5
(38b73c67d375a2552d8ed67843c8a65c2c0feba6), process radosgw, pid 7930
2014-11-06 15:39:08.870429 7f80cc670880  0 librados:
client.radosgw.gw.env2 authentication error (1) Operation not
permitted
2014-11-06 15:39:08.870889 7f80cc670880 -1 Couldn't init storage
provider (RADOS)


What am I doing wrong?
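
Presumably that librados "authentication error (1) Operation not permitted"
just means there is no cephx key for the new client name yet; a minimal sketch
of creating one (caps and paths are assumptions):

  ceph auth get-or-create client.radosgw.gw.env2 mon 'allow rw' osd 'allow rwx' \
      -o /etc/ceph/keyring.radosgw.gw.env2
  # then point "keyring = ..." in the [client.radosgw.gw.env2] section at that file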

Marco Garcês
#sysadmin
Maputo - Mozambique
[Skype] marcogarces


On Thu, Nov 6, 2014 at 10:11 AM, Marco Garcês ma...@garces.cc wrote:
 Your solution of pre-pending the environment name to the bucket, was
 my first choice, but at the moment I can't ask the devs to change the
 code to do that. For now I have to stick with the zones solution.
 Should I follow the federated zones docs
 (http://ceph.com/docs/master/radosgw/federated-config/) but skip the
 sync step?

 Thank you,

 Marco Garcês

 On Wed, Nov 5, 2014 at 8:13 PM, Craig Lewis cle...@centraldesktop.com wrote:
 You could setup dedicated zones for each environment, and not
 replicate between them.

 Each zone would have it's own URL, but you would be able to re-use
 usernames and bucket names.  If different URLs are a problem, you
 might be able to get around that in the load balancer or the web
 servers.  I wouldn't really recommend that, but it's possible.


 I have a similar requirement.  I was able to pre-pending the
 environment name to the bucket in my client code, which made things
 much easier.


 On Wed, Nov 5, 2014 at 8:52 AM, Marco Garcês ma...@garces.cc wrote:
 Hi there,

 I have this situation, where I'm using the same Ceph cluster (with
 radosgw), for two different environments, QUAL and PRE-PRODUCTION.

 I need different users for each environment, but I need to create the
 same buckets, with the same name; I understand there is no way to have
 2 buckets with the same name, but how can I go around this? Perhaps
 creating a different pool for each user?

 Can you help me? Thank you in advance, my best regards,

 Marco Garcês
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread Udo Lembke
Hi,
from one host to five OSD-hosts.

NIC Intel 82599EB; jumbo-frames; single Switch IBM G8124 (blade network).

rtt min/avg/max/mdev = 0.075/0.114/0.231/0.037 ms
rtt min/avg/max/mdev = 0.088/0.164/0.739/0.072 ms
rtt min/avg/max/mdev = 0.081/0.141/0.229/0.030 ms
rtt min/avg/max/mdev = 0.083/0.115/0.183/0.030 ms
rtt min/avg/max/mdev = 0.087/0.144/0.190/0.028 ms


Udo

Am 06.11.2014 14:18, schrieb Wido den Hollander:
 Hello,
 
 While working at a customer I've ran into a 10GbE latency which seems
 high to me.
 
 I have access to a couple of Ceph cluster and I ran a simple ping test:
 
 $ ping -s 8192 -c 100 -n ip
 
 Two results I got:
 
 rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms
 rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms
 
 Both these environment are running with Intel 82599ES 10Gbit cards in
 LACP. One with Extreme Networks switches, the other with Arista.
 
 Now, on a environment with Cisco Nexus 3000 and Nexus 7000 switches I'm
 seeing:
 
 rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms
 
 As you can see, the Cisco Nexus network has high latency compared to the
 other setup.
 
 You would say the switches are to blame, but we also tried with a direct
 TwinAx connection, but that didn't help.
 
 This setup also uses the Intel 82599ES cards, so the cards don't seem to
 be the problem.
 
 The MTU is set to 9000 on all these networks and cards.
 
 I was wondering, others with a Ceph cluster running on 10GbE, could you
 perform a simple network latency test like this? I'd like to compare the
 results.
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread Luis Periquito
What is the COPP?

On Thu, Nov 6, 2014 at 1:53 PM, Wido den Hollander w...@42on.com wrote:

 On 11/06/2014 02:38 PM, Luis Periquito wrote:
  Hi Wido,
 
  What is the full topology? Are you using a north-south or east-west? So
 far
  I've seen the east-west are slightly slower. What are the fabric modes
 you
  have configured? How is everything connected? Also you have no
 information
  on the OS - if I remember correctly there was a lot of improvements in
 the
  latest kernels...

 The Nexus 3000s are connected with 40Gbit to the Nexus 7000. There are
 two 7000 units and 8 3000s spread out over 4 racks.

 But the test I did was with two hosts connected to the same Nexus 3000
 switch using TwinAx cabling of 3m.

 The tests were performed with Ubuntu 14.04 (3.13) and RHEL 7 (3.10), but
 that didn't make a difference.

 
  And what about the bandwith?
 

 Just fine, no problems getting 10Gbit through the NICs.

  The values you present don't seem awfully high, and the deviation seems
 low.
 

 No, they don't seem high, but they are about 40% higher then the values
 I see on other environments. 40% is a lot.

 This Ceph cluster is SSD-only, so the lower the latency, the more IOps
 the system can do.

 Wido

  On Thu, Nov 6, 2014 at 1:18 PM, Wido den Hollander w...@42on.com
 wrote:
 
  Hello,
 
  While working at a customer I've ran into a 10GbE latency which seems
  high to me.
 
  I have access to a couple of Ceph cluster and I ran a simple ping test:
 
  $ ping -s 8192 -c 100 -n ip
 
  Two results I got:
 
  rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms
  rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms
 
  Both these environment are running with Intel 82599ES 10Gbit cards in
  LACP. One with Extreme Networks switches, the other with Arista.
 
  Now, on a environment with Cisco Nexus 3000 and Nexus 7000 switches I'm
  seeing:
 
  rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms
 
  As you can see, the Cisco Nexus network has high latency compared to the
  other setup.
 
  You would say the switches are to blame, but we also tried with a direct
  TwinAx connection, but that didn't help.
 
  This setup also uses the Intel 82599ES cards, so the cards don't seem to
  be the problem.
 
  The MTU is set to 9000 on all these networks and cards.
 
  I was wondering, others with a Ceph cluster running on 10GbE, could you
  perform a simple network latency test like this? I'd like to compare the
  results.
 
  --
  Wido den Hollander
  42on B.V.
  Ceph trainer and consultant
 
  Phone: +31 (0)20 700 9902
  Skype: contact42on
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 


 --
 Wido den Hollander
 42on B.V.
 Ceph trainer and consultant

 Phone: +31 (0)20 700 9902
 Skype: contact42on

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread Irek Fasikhov
Hi, Udo.
Good values :)

Is there any additional optimization on the host?
Thanks.

Thu Nov 06 2014 at 16:57:36, Udo Lembke ulem...@polarzone.de:

 Hi,
 from one host to five OSD-hosts.

 NIC Intel 82599EB; jumbo-frames; single Switch IBM G8124 (blade network).

 rtt min/avg/max/mdev = 0.075/0.114/0.231/0.037 ms
 rtt min/avg/max/mdev = 0.088/0.164/0.739/0.072 ms
 rtt min/avg/max/mdev = 0.081/0.141/0.229/0.030 ms
 rtt min/avg/max/mdev = 0.083/0.115/0.183/0.030 ms
 rtt min/avg/max/mdev = 0.087/0.144/0.190/0.028 ms


 Udo

 Am 06.11.2014 14:18, schrieb Wido den Hollander:
  Hello,
 
  While working at a customer I've ran into a 10GbE latency which seems
  high to me.
 
  I have access to a couple of Ceph cluster and I ran a simple ping test:
 
  $ ping -s 8192 -c 100 -n ip
 
  Two results I got:
 
  rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms
  rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms
 
  Both these environment are running with Intel 82599ES 10Gbit cards in
  LACP. One with Extreme Networks switches, the other with Arista.
 
  Now, on a environment with Cisco Nexus 3000 and Nexus 7000 switches I'm
  seeing:
 
  rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms
 
  As you can see, the Cisco Nexus network has high latency compared to the
  other setup.
 
  You would say the switches are to blame, but we also tried with a direct
  TwinAx connection, but that didn't help.
 
  This setup also uses the Intel 82599ES cards, so the cards don't seem to
  be the problem.
 
  The MTU is set to 9000 on all these networks and cards.
 
  I was wondering, others with a Ceph cluster running on 10GbE, could you
  perform a simple network latency test like this? I'd like to compare the
  results.
 

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread Robert Sander
Hi,

2 LACP bonded Intel Corporation Ethernet 10G 2P X520 Adapters, no jumbo
frames, here:

rtt min/avg/max/mdev = 0.141/0.207/0.313/0.040 ms
rtt min/avg/max/mdev = 0.124/0.223/0.289/0.044 ms
rtt min/avg/max/mdev = 0.302/0.378/0.460/0.038 ms
rtt min/avg/max/mdev = 0.282/0.389/0.473/0.035 ms

All hosts on the same stacked pair of Dell N4032F switches.

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] buckets and users

2014-11-06 Thread Marco Garcês
Update:

I was able to fix the authentication error, and I now have 2 radosgw
instances running on the same host.
The problem now is that I believe I have created the zone wrong, or I am
doing something wrong, because I can log in with the user I had before
and access his buckets. I need to have everything separated.

Here is my zone info:

default zone:
{ "domain_root": ".rgw",
  "control_pool": ".rgw.control",
  "gc_pool": ".rgw.gc",
  "log_pool": ".log",
  "intent_log_pool": ".intent-log",
  "usage_log_pool": ".usage",
  "user_keys_pool": ".users",
  "user_email_pool": ".users.email",
  "user_swift_pool": ".users.swift",
  "user_uid_pool": ".users.uid",
  "system_key": { "access_key": "",
      "secret_key": ""},
  "placement_pools": [
    { "key": "default-placement",
      "val": { "index_pool": ".rgw.buckets.index",
          "data_pool": ".rgw.buckets",
          "data_extra_pool": ".rgw.buckets.extra"}}]}

env2 zone:
{ "domain_root": ".rgw",
  "control_pool": ".rgw.control",
  "gc_pool": ".rgw.gc",
  "log_pool": ".log",
  "intent_log_pool": ".intent-log",
  "usage_log_pool": ".usage",
  "user_keys_pool": ".users",
  "user_email_pool": ".users.email",
  "user_swift_pool": ".users.swift",
  "user_uid_pool": ".users.uid",
  "system_key": { "access_key": "",
      "secret_key": ""},
  "placement_pools": [
    { "key": "default-placement",
      "val": { "index_pool": ".rgw.buckets.index",
          "data_pool": ".rgw.buckets",
          "data_extra_pool": ".rgw.buckets.extra"}}]}

Could you guys help me?
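
Note that both zone dumps above point at exactly the same pools, which would
explain why nothing is separated. A sketch of what the env2 zone would need to
look like with its own pools (the ".env2" prefix is just an example name):

{ "domain_root": ".env2.rgw",
  "control_pool": ".env2.rgw.control",
  "gc_pool": ".env2.rgw.gc",
  "log_pool": ".env2.log",
  "intent_log_pool": ".env2.intent-log",
  "usage_log_pool": ".env2.usage",
  "user_keys_pool": ".env2.users",
  "user_email_pool": ".env2.users.email",
  "user_swift_pool": ".env2.users.swift",
  "user_uid_pool": ".env2.users.uid",
  "system_key": { "access_key": "", "secret_key": ""},
  "placement_pools": [
    { "key": "default-placement",
      "val": { "index_pool": ".env2.rgw.buckets.index",
          "data_pool": ".env2.rgw.buckets",
          "data_extra_pool": ".env2.rgw.buckets.extra"}}]}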



Marco Garcês


On Thu, Nov 6, 2014 at 3:56 PM, Marco Garcês ma...@garces.cc wrote:
 By the way,
 Is it possible to run 2 radosgw on the same host?

 I think I have created the zone, not sure if it was correct, because
 it used the default pool names, even though I had changed them in the
 json file I had provided.

 Now I am trying to run ceph-radosgw with two different entries in the
 ceph.conf file, but without sucess. Example:

 [client.radosgw.gw]
 host = GATEWAY
 keyring = /etc/ceph/keyring.radosgw.gw
 rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
 log file = /var/log/ceph/client.radosgw.gateway.log
 rgw print continue = false
 rgw dns name = gateway.local
 rgw enable ops log = false
 rgw enable usage log = true
 rgw usage log tick interval = 30
 rgw usage log flush threshold = 1024
 rgw usage max shards = 32
 rgw usage max user shards = 1
 rgw cache lru size = 15000
 rgw thread pool size = 2048

 #[client.radosgw.gw.env2]
 #host = GATEWAY
 #keyring = /etc/ceph/keyring.radosgw.gw
 #rgw socket path = /var/run/ceph/ceph.env2.radosgw.gateway.fastcgi.sock
 #log file = /var/log/ceph/client.env2.radosgw.gateway.log
 #rgw print continue = false
 #rgw dns name = cephppr.local
 #rgw enable ops log = false
 #rgw enable usage log = true
 #rgw usage log tick interval = 30
 #rgw usage log flush threshold = 1024
 #rgw usage max shards = 32
 #rgw usage max user shards = 1
 #rgw cache lru size = 15000
 #rgw thread pool size = 2048
 #rgw zone = ppr

 It fails to create the socket:
 2014-11-06 15:39:08.862364 7f80cc670880  0 ceph version 0.80.5
 (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process radosgw, pid 7930
 2014-11-06 15:39:08.870429 7f80cc670880  0 librados:
 client.radosgw.gw.env2 authentication error (1) Operation not
 permitted
 2014-11-06 15:39:08.870889 7f80cc670880 -1 Couldn't init storage
 provider (RADOS)


 What am I doing wrong?

 Marco Garcês
 #sysadmin
 Maputo - Mozambique
 [Skype] marcogarces


 On Thu, Nov 6, 2014 at 10:11 AM, Marco Garcês ma...@garces.cc wrote:
 Your solution of pre-pending the environment name to the bucket, was
 my first choice, but at the moment I can't ask the devs to change the
 code to do that. For now I have to stick with the zones solution.
 Should I follow the federated zones docs
 (http://ceph.com/docs/master/radosgw/federated-config/) but skip the
 sync step?

 Thank you,

 Marco Garcês

 On Wed, Nov 5, 2014 at 8:13 PM, Craig Lewis cle...@centraldesktop.com 
 wrote:
 You could setup dedicated zones for each environment, and not
 replicate between them.

 Each zone would have it's own URL, but you would be able to re-use
 usernames and bucket names.  If different URLs are a problem, you
 might be able to get around that in the load balancer or the web
 servers.  I wouldn't really recommend that, but it's possible.


 I have a similar requirement.  I was able to pre-pending the
 environment name to the bucket in my client code, which made things
 much easier.


 On Wed, Nov 5, 2014 at 8:52 AM, Marco Garcês ma...@garces.cc wrote:
 Hi there,

 I have this situation, where I'm using the same Ceph cluster (with
 radosgw), for two different environments, QUAL and PRE-PRODUCTION.

 I need different users for each environment, but I need to create the
 same buckets, with the same name; I understand there is no way to have
 2 buckets with the same name, but how can I go around this? Perhaps
 creating a different pool for each user?

 Can you help me? Thank you in advance, my best regards,

 Marco Garcês
 ___
 ceph-users 

Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread Wido den Hollander
On 11/06/2014 02:58 PM, Luis Periquito wrote:
 What is the COPP?
 

Nothing special, default settings. 200 ICMP packets/second.

But we also tested with a direct TwinAx cable between two hosts, so no
switch involved. That did not improve the latency.

So this seems to be a kernel/driver issue somewhere, but I can't think
of anything.

The systems I have access to have no special tuning and get much better
latency.
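
One thing that may be worth comparing between the fast and the slow hosts is
the NIC driver version and interrupt coalescing settings; a quick way to dump
them (interface name is a placeholder):

  ethtool -i eth0    # driver / firmware version
  ethtool -c eth0    # interrupt coalescing (rx-usecs etc.)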

Wido

 On Thu, Nov 6, 2014 at 1:53 PM, Wido den Hollander w...@42on.com wrote:
 
 On 11/06/2014 02:38 PM, Luis Periquito wrote:
 Hi Wido,

 What is the full topology? Are you using a north-south or east-west? So
 far
 I've seen the east-west are slightly slower. What are the fabric modes
 you
 have configured? How is everything connected? Also you have no
 information
 on the OS - if I remember correctly there was a lot of improvements in
 the
 latest kernels...

 The Nexus 3000s are connected with 40Gbit to the Nexus 7000. There are
 two 7000 units and 8 3000s spread out over 4 racks.

 But the test I did was with two hosts connected to the same Nexus 3000
 switch using TwinAx cabling of 3m.

 The tests were performed with Ubuntu 14.04 (3.13) and RHEL 7 (3.10), but
 that didn't make a difference.


 And what about the bandwith?


 Just fine, no problems getting 10Gbit through the NICs.

 The values you present don't seem awfully high, and the deviation seems
 low.


 No, they don't seem high, but they are about 40% higher then the values
 I see on other environments. 40% is a lot.

 This Ceph cluster is SSD-only, so the lower the latency, the more IOps
 the system can do.

 Wido

 On Thu, Nov 6, 2014 at 1:18 PM, Wido den Hollander w...@42on.com
 wrote:

 Hello,

 While working at a customer I've ran into a 10GbE latency which seems
 high to me.

 I have access to a couple of Ceph cluster and I ran a simple ping test:

 $ ping -s 8192 -c 100 -n ip

 Two results I got:

 rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms
 rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms

 Both these environment are running with Intel 82599ES 10Gbit cards in
 LACP. One with Extreme Networks switches, the other with Arista.

 Now, on a environment with Cisco Nexus 3000 and Nexus 7000 switches I'm
 seeing:

 rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms

 As you can see, the Cisco Nexus network has high latency compared to the
 other setup.

 You would say the switches are to blame, but we also tried with a direct
 TwinAx connection, but that didn't help.

 This setup also uses the Intel 82599ES cards, so the cards don't seem to
 be the problem.

 The MTU is set to 9000 on all these networks and cards.

 I was wondering, others with a Ceph cluster running on 10GbE, could you
 perform a simple network latency test like this? I'd like to compare the
 results.

 --
 Wido den Hollander
 42on B.V.
 Ceph trainer and consultant

 Phone: +31 (0)20 700 9902
 Skype: contact42on
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --
 Wido den Hollander
 42on B.V.
 Ceph trainer and consultant

 Phone: +31 (0)20 700 9902
 Skype: contact42on

 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-06 Thread Chad William Seys
Hi Sam,

 Sounds like you needed osd 20.  You can mark osd 20 lost.
 -Sam

Does not work:

# ceph osd lost 20 --yes-i-really-mean-it   

osd.20 is not down or doesn't exist


Also, here is an interesting post from October which I will follow:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-October/044059.html


Hello, all. I got some advice from the IRC channel (thanks bloodice!) that I 
temporarily reduce the min_size of my cluster (size = 2) from 2 down to 1. 
That immediately caused all of my incomplete PGs to start recovering and 
everything seemed to come back OK. I was serving out an RBD from here and 
xfs_repair reported no problems. So... happy ending?

What started this all was that I was altering my CRUSH map causing significant 
rebalancing on my cluster which had size = 2. During this process I lost an 
OSD (osd.10) and eventually ended up with incomplete PGs. Knowing that I only 
lost 1 OSD, I was pretty sure that I hadn't lost any data; I just couldn't get 
the PGs to recover without changing the min_size.


It is good that this worked for him, but it also seems like a bug that it 
worked!  (I.e. ceph should have been able to recover on its own without weird 
workarounds.)
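
For the record, the workaround from that post boils down to something like this
(pool name is a placeholder; remember to put min_size back afterwards):

  ceph osd pool set <pool> min_size 1
  # ...wait for the incomplete PGs to recover, then:
  ceph osd pool set <pool> min_size 2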

I'll let you know if this works for me!

Thanks,
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] buckets and users

2014-11-06 Thread Craig Lewis
You need to tell each radosgw daemon which zone to use.  In ceph.conf, I
have:
[client.radosgw.ceph3c]
  host = ceph3c
  rgw socket path = /var/run/ceph/radosgw.ceph3c
  keyring = /etc/ceph/ceph.client.radosgw.ceph3c.keyring
  log file = /var/log/ceph/radosgw.log
  admin socket = /var/run/ceph/radosgw.asok
  rgw dns name = us-central-1.ceph.cdlocal
  rgw region = us
  rgw region root pool = .us.rgw.root
  rgw zone = us-central-1
  rgw zone root pool = .us-central-1.rgw.root
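
A second instance for the other environment would get its own section of the
same shape, e.g. (all names here are placeholders):

[client.radosgw.ceph3c-zone2]
  host = ceph3c
  rgw socket path = /var/run/ceph/radosgw.ceph3c-zone2
  keyring = /etc/ceph/ceph.client.radosgw.ceph3c-zone2.keyring
  log file = /var/log/ceph/radosgw-zone2.log
  rgw region = us
  rgw region root pool = .us.rgw.root
  rgw zone = us-west-1
  rgw zone root pool = .us-west-1.rgw.root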




On Thu, Nov 6, 2014 at 6:35 AM, Marco Garcês ma...@garces.cc wrote:

 Update:

 I was able to fix the authentication error, and I have 2 radosgw
 running on the same host.
 The problem now, is, I believe I have created the zone wrong, or, I am
 doing something wrong, because I can login with the user I had before,
 and I can access his buckets. I need to have everything separated.

 Here are my zone info:

 default zone:
 { domain_root: .rgw,
   control_pool: .rgw.control,
   gc_pool: .rgw.gc,
   log_pool: .log,
   intent_log_pool: .intent-log,
   usage_log_pool: .usage,
   user_keys_pool: .users,
   user_email_pool: .users.email,
   user_swift_pool: .users.swift,
   user_uid_pool: .users.uid,
   system_key: { access_key: ,
   secret_key: },
   placement_pools: [
 { key: default-placement,
   val: { index_pool: .rgw.buckets.index,
   data_pool: .rgw.buckets,
   data_extra_pool: .rgw.buckets.extra}}]}

 env2 zone:
 { domain_root: .rgw,
   control_pool: .rgw.control,
   gc_pool: .rgw.gc,
   log_pool: .log,
   intent_log_pool: .intent-log,
   usage_log_pool: .usage,
   user_keys_pool: .users,
   user_email_pool: .users.email,
   user_swift_pool: .users.swift,
   user_uid_pool: .users.uid,
   system_key: { access_key: ,
   secret_key: },
   placement_pools: [
 { key: default-placement,
   val: { index_pool: .rgw.buckets.index,
   data_pool: .rgw.buckets,
   data_extra_pool: .rgw.buckets.extra}}]}

 Could you guys help me?



 Marco Garcês


 On Thu, Nov 6, 2014 at 3:56 PM, Marco Garcês ma...@garces.cc wrote:
  By the way,
  Is it possible to run 2 radosgw on the same host?
 
  I think I have created the zone, not sure if it was correct, because
  it used the default pool names, even though I had changed them in the
  json file I had provided.
 
  Now I am trying to run ceph-radosgw with two different entries in the
  ceph.conf file, but without sucess. Example:
 
  [client.radosgw.gw]
  host = GATEWAY
  keyring = /etc/ceph/keyring.radosgw.gw
  rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
  log file = /var/log/ceph/client.radosgw.gateway.log
  rgw print continue = false
  rgw dns name = gateway.local
  rgw enable ops log = false
  rgw enable usage log = true
  rgw usage log tick interval = 30
  rgw usage log flush threshold = 1024
  rgw usage max shards = 32
  rgw usage max user shards = 1
  rgw cache lru size = 15000
  rgw thread pool size = 2048
 
  #[client.radosgw.gw.env2]
  #host = GATEWAY
  #keyring = /etc/ceph/keyring.radosgw.gw
  #rgw socket path = /var/run/ceph/ceph.env2.radosgw.gateway.fastcgi.sock
  #log file = /var/log/ceph/client.env2.radosgw.gateway.log
  #rgw print continue = false
  #rgw dns name = cephppr.local
  #rgw enable ops log = false
  #rgw enable usage log = true
  #rgw usage log tick interval = 30
  #rgw usage log flush threshold = 1024
  #rgw usage max shards = 32
  #rgw usage max user shards = 1
  #rgw cache lru size = 15000
  #rgw thread pool size = 2048
  #rgw zone = ppr
 
  It fails to create the socket:
  2014-11-06 15:39:08.862364 7f80cc670880  0 ceph version 0.80.5
  (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process radosgw, pid 7930
  2014-11-06 15:39:08.870429 7f80cc670880  0 librados:
  client.radosgw.gw.env2 authentication error (1) Operation not
  permitted
  2014-11-06 15:39:08.870889 7f80cc670880 -1 Couldn't init storage
  provider (RADOS)
 
 
  What am I doing wrong?
 
  Marco Garcês
  #sysadmin
  Maputo - Mozambique
  [Skype] marcogarces
 
 
  On Thu, Nov 6, 2014 at 10:11 AM, Marco Garcês ma...@garces.cc wrote:
  Your solution of pre-pending the environment name to the bucket, was
  my first choice, but at the moment I can't ask the devs to change the
  code to do that. For now I have to stick with the zones solution.
  Should I follow the federated zones docs
  (http://ceph.com/docs/master/radosgw/federated-config/) but skip the
  sync step?
 
  Thank you,
 
  Marco Garcês
 
  On Wed, Nov 5, 2014 at 8:13 PM, Craig Lewis cle...@centraldesktop.com
 wrote:
  You could setup dedicated zones for each environment, and not
  replicate between them.
 
  Each zone would have it's own URL, but you would be able to re-use
  usernames and bucket names.  If different URLs are a problem, you
  might be able to get around that in the load balancer or the web
  servers.  I wouldn't really recommend that, but it's possible.
 
 
  I have a similar requirement.  I was able to pre-pending the
  

[ceph-users] Red Hat/CentOS kernel-ml to get RBD module

2014-11-06 Thread Robert LeBlanc
The maintainers of the kernel-ml[1] package have graciously accepted the
request to include the RBD module in the mainline kernel build[2]. This
should make it easier to test new kernels with RBD if you have better
things to do than build your own kernels.

Thanks kernel-ml maintainers!

Robert LeBlanc

[1] http://elrepo.org/tiki/kernel-ml
[2] http://elrepo.org/bugs/view.php?id=521
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread Udo Lembke
Hi,
no special optimizations on the host.
In this case the pings are from a proxmox-ve host to ceph-osds (ubuntu
+ debian).

The pings from one osd to the others are comparable.
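
In case it matters: the rtt figures are just plain ping output; the
jumbo-frame path itself can be checked separately with something along
the lines of

  ping -c 100 -M do -s 8972 <osd-host>
  # 8972 bytes = 9000-byte MTU minus 28 bytes of IP/ICMP headers;
  # -M do forbids fragmentation, so it fails loudly if jumbo frames are broken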

Udo

On 06.11.2014 15:00, Irek Fasikhov wrote:
 Hi, Udo.
 Good value :)

 Did you apply any additional optimization on the host?
 Thanks.

 Thu Nov 06 2014 at 16:57:36, Udo Lembke ulem...@polarzone.de
 mailto:ulem...@polarzone.de:

 Hi,
 from one host to five OSD-hosts.

 NIC Intel 82599EB; jumbo-frames; single Switch IBM G8124 (blade
 network).

 rtt min/avg/max/mdev = 0.075/0.114/0.231/0.037 ms
 rtt min/avg/max/mdev = 0.088/0.164/0.739/0.072 ms
 rtt min/avg/max/mdev = 0.081/0.141/0.229/0.030 ms
 rtt min/avg/max/mdev = 0.083/0.115/0.183/0.030 ms
 rtt min/avg/max/mdev = 0.087/0.144/0.190/0.028 ms


 Udo


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread Robert LeBlanc
rtt min/avg/max/mdev = 0.130/0.157/0.190/0.016 ms

IPoIB Mellanox ConnectX-3 MT27500 FDR adapter and Mellanox IS5022 QDR
switch MTU set to 65520. CentOS 7.0.1406 running 3.17.2-1.el7.elrepo.x86_64
on Intel(R) Atom(TM) CPU  C2750 with 32 GB of RAM.

On Thu, Nov 6, 2014 at 9:46 AM, Udo Lembke ulem...@polarzone.de wrote:

  Hi,
 no special optimizations on the host.
 In this case the pings are from a proxmox-ve host to ceph-osds (ubuntu +
 debian).

 The pings from one osd to the others are comparable.

 Udo

 On 06.11.2014 15:00, Irek Fasikhov wrote:

 Hi, Udo.
 Good value :)

  Did you apply any additional optimization on the host?
 Thanks.

 Thu Nov 06 2014 at 16:57:36, Udo Lembke ulem...@polarzone.de:

 Hi,
 from one host to five OSD-hosts.

 NIC Intel 82599EB; jumbo-frames; single Switch IBM G8124 (blade network).

 rtt min/avg/max/mdev = 0.075/0.114/0.231/0.037 ms
 rtt min/avg/max/mdev = 0.088/0.164/0.739/0.072 ms
 rtt min/avg/max/mdev = 0.081/0.141/0.229/0.030 ms
 rtt min/avg/max/mdev = 0.083/0.115/0.183/0.030 ms
 rtt min/avg/max/mdev = 0.087/0.144/0.190/0.028 ms


 Udo



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds isn't working anymore after osd's running full

2014-11-06 Thread John Spray
Jasper,

Thanks for this -- I've reproduced this issue in a development
environment.  We'll see if this is also an issue on giant, and
backport a fix if appropriate.  I'll update this thread soon.

Cheers,
John

On Mon, Nov 3, 2014 at 8:49 AM, Jasper Siero
jasper.si...@target-holding.nl wrote:
 Hello Greg,

 I saw that the site of the previous link of the logs uses a very short 
 expiring time so I uploaded it to another one:

 http://www.mediafire.com/download/gikiy7cqs42cllt/ceph-mds.th1-mon001.log.tar.gz

 Thanks,

 Jasper

 
 From: gregory.far...@inktank.com [gregory.far...@inktank.com] on behalf of Gregory 
 Farnum [gfar...@redhat.com]
 Sent: Thursday, 30 October 2014 1:03
 To: Jasper Siero
 CC: John Spray; ceph-users
 Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

 On Wed, Oct 29, 2014 at 7:51 AM, Jasper Siero
 jasper.si...@target-holding.nl wrote:
 Hello Greg,

 I added the debug options which you mentioned and started the process again:

 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
 /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
 --reset-journal 0
 old journal was 9483323613~134233517
 new journal start will be 9621733376 (4176246 bytes past old end)
 writing journal head
 writing EResetJournal entry
 done
 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 -c /etc/ceph/ceph.conf 
 --cluster ceph --undump-journal 0 journaldumptgho-mon001
 undump journaldumptgho-mon001
 start 9483323613 len 134213311
 writing header 200.
  writing 9483323613~1048576
  writing 9484372189~1048576
  writing 9485420765~1048576
  writing 9486469341~1048576
  writing 9487517917~1048576
  writing 9488566493~1048576
  writing 9489615069~1048576
  writing 9490663645~1048576
  writing 9491712221~1048576
  writing 9492760797~1048576
  writing 9493809373~1048576
  writing 9494857949~1048576
  writing 9495906525~1048576
  writing 9496955101~1048576
  writing 9498003677~1048576
  writing 9499052253~1048576
  writing 9500100829~1048576
  writing 9501149405~1048576
  writing 9502197981~1048576
  writing 9503246557~1048576
  writing 9504295133~1048576
  writing 9505343709~1048576
  writing 9506392285~1048576
  writing 9507440861~1048576
  writing 9508489437~1048576
  writing 9509538013~1048576
  writing 9510586589~1048576
  writing 9511635165~1048576
  writing 9512683741~1048576
  writing 9513732317~1048576
  writing 9514780893~1048576
  writing 9515829469~1048576
  writing 9516878045~1048576
  writing 9517926621~1048576
  writing 9518975197~1048576
  writing 9520023773~1048576
  writing 9521072349~1048576
  writing 9522120925~1048576
  writing 9523169501~1048576
  writing 9524218077~1048576
  writing 9525266653~1048576
  writing 9526315229~1048576
  writing 9527363805~1048576
  writing 9528412381~1048576
  writing 9529460957~1048576
  writing 9530509533~1048576
  writing 9531558109~1048576
  writing 9532606685~1048576
  writing 9533655261~1048576
  writing 9534703837~1048576
  writing 9535752413~1048576
  writing 9536800989~1048576
  writing 9537849565~1048576
  writing 9538898141~1048576
  writing 9539946717~1048576
  writing 9540995293~1048576
  writing 9542043869~1048576
  writing 9543092445~1048576
  writing 9544141021~1048576
  writing 9545189597~1048576
  writing 9546238173~1048576
  writing 9547286749~1048576
  writing 9548335325~1048576
  writing 9549383901~1048576
  writing 9550432477~1048576
  writing 9551481053~1048576
  writing 9552529629~1048576
  writing 9553578205~1048576
  writing 9554626781~1048576
  writing 9555675357~1048576
  writing 9556723933~1048576
  writing 9557772509~1048576
  writing 9558821085~1048576
  writing 9559869661~1048576
  writing 9560918237~1048576
  writing 9561966813~1048576
  writing 9563015389~1048576
  writing 9564063965~1048576
  writing 9565112541~1048576
  writing 9566161117~1048576
  writing 9567209693~1048576
  writing 9568258269~1048576
  writing 9569306845~1048576
  writing 9570355421~1048576
  writing 9571403997~1048576
  writing 9572452573~1048576
  writing 9573501149~1048576
  writing 9574549725~1048576
  writing 9575598301~1048576
  writing 9576646877~1048576
  writing 9577695453~1048576
  writing 9578744029~1048576
  writing 9579792605~1048576
  writing 9580841181~1048576
  writing 9581889757~1048576
  writing 9582938333~1048576
  writing 9583986909~1048576
  writing 9585035485~1048576
  writing 9586084061~1048576
  writing 9587132637~1048576
  writing 9588181213~1048576
  writing 9589229789~1048576
  writing 9590278365~1048576
  writing 9591326941~1048576
  writing 9592375517~1048576
  writing 9593424093~1048576
  writing 9594472669~1048576
  writing 9595521245~1048576
  writing 9596569821~1048576
  writing 9597618397~1048576
  writing 9598666973~1048576
  writing 9599715549~1048576
  writing 9600764125~1048576
  writing 9601812701~1048576
  writing 9602861277~1048576
  writing 9603909853~1048576
  writing 9604958429~1048576
  writing 

Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-06 Thread Samuel Just
Amusingly, that's what I'm working on this week.

http://tracker.ceph.com/issues/7862

There are pretty good reasons for why it works the way it does right
now, but it certainly is unexpected.
-Sam

On Thu, Nov 6, 2014 at 7:18 AM, Chad William Seys
cws...@physics.wisc.edu wrote:
 Hi Sam,

 Sounds like you needed osd 20.  You can mark osd 20 lost.
 -Sam

 Does not work:

 # ceph osd lost 20 --yes-i-really-mean-it
 osd.20 is not down or doesn't exist


 Also, here is an interesting post which I will follow from October:
 http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-October/044059.html

 
 Hello, all. I got some advice from the IRC channel (thanks bloodice!) that I
 temporarily reduce the min_size of my cluster (size = 2) from 2 down to 1.
 That immediately caused all of my incomplete PGs to start recovering and
 everything seemed to come back OK. I was serving out and RBD from here and
 xfs_repair reported no problems. So... happy ending?

 What started this all was that I was altering my CRUSH map causing significant
 rebalancing on my cluster which had size = 2. During this process I lost an
 OSD (osd.10) and eventually ended up with incomplete PGs. Knowing that I only
 lost 1 osd I was pretty sure that I hadn't lost any data I just couldn't get
 the PGs to recover without changing the min_size.
 

 It is good that this worked for him, but it also seems like a bug that it
 worked!  (I.e. ceph should have been able to recover on its own without weird
 workarounds.)
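
 (For reference, the min_size toggle described above should just be, per pool:

   ceph osd pool set <pool> min_size 1
   # ... wait for the incomplete PGs to peer and recover ...
   ceph osd pool set <pool> min_size 2

 with <pool> being whichever pool holds the stuck PGs.)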

 I'll let you know if this works for me!

 Thanks,
 Chad.
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-06 Thread Samuel Just
Also, are you certain that osd 20 is not up?
-Sam

On Thu, Nov 6, 2014 at 10:52 AM, Samuel Just sam.j...@inktank.com wrote:
 Amusingly, that's what I'm working on this week.

 http://tracker.ceph.com/issues/7862

 There are pretty good reasons for why it works the way it does right
 now, but it certainly is unexpected.
 -Sam

 On Thu, Nov 6, 2014 at 7:18 AM, Chad William Seys
 cws...@physics.wisc.edu wrote:
 Hi Sam,

 Sounds like you needed osd 20.  You can mark osd 20 lost.
 -Sam

 Does not work:

 # ceph osd lost 20 --yes-i-really-mean-it
 osd.20 is not down or doesn't exist


 Also, here is an interesting post which I will follow from October:
 http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-October/044059.html

 
 Hello, all. I got some advice from the IRC channel (thanks bloodice!) that I
 temporarily reduce the min_size of my cluster (size = 2) from 2 down to 1.
 That immediately caused all of my incomplete PGs to start recovering and
 everything seemed to come back OK. I was serving out an RBD from here and
 xfs_repair reported no problems. So... happy ending?

 What started this all was that I was altering my CRUSH map causing 
 significant
 rebalancing on my cluster which had size = 2. During this process I lost an
 OSD (osd.10) and eventually ended up with incomplete PGs. Knowing that I only
 lost 1 osd I was pretty sure that I hadn't lost any data I just couldn't get
 the PGs to recover without changing the min_size.
 

 It is good that this worked for him, but it also seems like a bug that it
 worked!  (I.e. ceph should have been able to recover on its own without weird
 workarounds.)

 I'll let you know if this works for me!

 Thanks,
 Chad.
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-06 Thread Chad Seys
Hi Sam,

  Amusingly, that's what I'm working on this week.
  
  http://tracker.ceph.com/issues/7862

Well, thanks for any bugfixes in advance!  :)

 Also, are you certain that osd 20 is not up?
 -Sam

Yep.

# ceph osd metadata 20
Error ENOENT: osd.20 does not exist

So part of ceph thinks osd.20 doesn't exist, but another part (the 
down_osds_we_would_probe) thinks the osd exists and is down?

In other news, my min_size was set to 1, so the same fix might not apply to 
me.  Instead I set the pool size from 2 to 1, then back again.  Looks like the 
end result is merely going to be that the down+incomplete get converted to 
incomplete.  :/  I'll let you (and future googlers) know.

Thanks!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Basic Ceph Questions

2014-11-06 Thread Craig Lewis
On Wed, Nov 5, 2014 at 11:57 PM, Wido den Hollander w...@42on.com wrote:

 On 11/05/2014 11:03 PM, Lindsay Mathieson wrote:

 
  - Geo Replication - that's done via federated gateways? looks complicated
 :(
* The remote slave, it would be read only?
 

 That is only for the RADOS Gateway. Ceph itself (RADOS) does not support
 Geo Replication.



That is only for the RADOS Gateway. Ceph itself (RADOS) does not support
Geo Replication.

The 3 services built on top of RADOS support backups, but RADOS itself does
not.  For RBD, you can use snapshot diffs and ship them offsite (see
various threads on the ML).  For RadosGW, there is Federation.  For CephFS,
you can use traditional POSIX filesystem backup tools.
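
A minimal sketch of that RBD snapshot-diff workflow, with pool, image and
host names purely illustrative:

  # one-off: seed the remote copy and take the first snapshot on both sides
  rbd snap create rbd/vm1@base
  rbd export rbd/vm1@base - | ssh backuphost rbd import - rbd/vm1
  ssh backuphost rbd snap create rbd/vm1@base

  # afterwards, ship only the blocks changed between snapshots
  rbd snap create rbd/vm1@daily1
  rbd export-diff --from-snap base rbd/vm1@daily1 - | ssh backuphost rbd import-diff - rbd/vm1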




  - Disaster strikes, apart from DR backups how easy is it to recover your
 data
  off ceph OSD's? one of the things I liked about gluster was that if I
 totally
  screwed up the gluster masters, I could always just copy the data off the
  filesystem. Not so much with ceph.
 

 It's a bit harder with Ceph. Eventually it is doable, but that is
 something that would take a lot of time.


In practice, not really.  Out of curiosity, I attempted this for some
RadosGW objects.  It was easy when there was a single object less than
4MB.  It very quickly became complicated with a few larger objects.  You'd
have to have a very deep understanding of the service to track all of the
information down with the cluster offline.

It's definitely possible, just not practical.




 
  - Am I abusing ceph? :) I just have a small 3 node VM server cluster
 with 20
  windows VMs, some servers, some VDI. The shared store is a QNAP nas
 which is
  struggling. I'm using ceph for
  - Shared Storage
  - Replication/Redundancy
  - Improved performance
 

 I think that 3 nodes is not sufficient, Ceph really starts performing
 when you go 10 nodes (excluding monitors).


If it meets your needs, then it's working.  :-)

You're going to spend a lot more time managing the 3 node Ceph cluster than
you spent on the QNAP.  If it doesn't make sense for you to spent a lot of
time dealing with storage, then a single shared store with more IOPS would
be a better fit.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds isn't working anymore after osd's running full

2014-11-06 Thread John Spray
This is still an issue on master, so a fix will be coming soon.
Follow the ticket for updates:
http://tracker.ceph.com/issues/10025

Thanks for finding the bug!

John

On Thu, Nov 6, 2014 at 6:21 PM, John Spray john.sp...@redhat.com wrote:
 Jasper,

 Thanks for this -- I've reproduced this issue in a development
 environment.  We'll see if this is also an issue on giant, and
 backport a fix if appropriate.  I'll update this thread soon.

 Cheers,
 John

 On Mon, Nov 3, 2014 at 8:49 AM, Jasper Siero
 jasper.si...@target-holding.nl wrote:
 Hello Greg,

 I saw that the site of the previous link of the logs uses a very short 
 expiring time so I uploaded it to another one:

 http://www.mediafire.com/download/gikiy7cqs42cllt/ceph-mds.th1-mon001.log.tar.gz

 Thanks,

 Jasper

 
 From: gregory.far...@inktank.com [gregory.far...@inktank.com] on behalf of Gregory 
 Farnum [gfar...@redhat.com]
 Sent: Thursday, 30 October 2014 1:03
 To: Jasper Siero
 CC: John Spray; ceph-users
 Subject: Re: [ceph-users] mds isn't working anymore after osd's running 
 full

 On Wed, Oct 29, 2014 at 7:51 AM, Jasper Siero
 jasper.si...@target-holding.nl wrote:
 Hello Greg,

 I added the debug options which you mentioned and started the process again:

 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
 /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
 --reset-journal 0
 old journal was 9483323613~134233517
 new journal start will be 9621733376 (4176246 bytes past old end)
 writing journal head
 writing EResetJournal entry
 done
 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 -c /etc/ceph/ceph.conf 
 --cluster ceph --undump-journal 0 journaldumptgho-mon001
 undump journaldumptgho-mon001
 start 9483323613 len 134213311
 writing header 200.
  writing 9483323613~1048576
  writing 9484372189~1048576
  writing 9485420765~1048576
  writing 9486469341~1048576
  writing 9487517917~1048576
  writing 9488566493~1048576
  writing 9489615069~1048576
  writing 9490663645~1048576
  writing 9491712221~1048576
  writing 9492760797~1048576
  writing 9493809373~1048576
  writing 9494857949~1048576
  writing 9495906525~1048576
  writing 9496955101~1048576
  writing 9498003677~1048576
  writing 9499052253~1048576
  writing 9500100829~1048576
  writing 9501149405~1048576
  writing 9502197981~1048576
  writing 9503246557~1048576
  writing 9504295133~1048576
  writing 9505343709~1048576
  writing 9506392285~1048576
  writing 9507440861~1048576
  writing 9508489437~1048576
  writing 9509538013~1048576
  writing 9510586589~1048576
  writing 9511635165~1048576
  writing 9512683741~1048576
  writing 9513732317~1048576
  writing 9514780893~1048576
  writing 9515829469~1048576
  writing 9516878045~1048576
  writing 9517926621~1048576
  writing 9518975197~1048576
  writing 9520023773~1048576
  writing 9521072349~1048576
  writing 9522120925~1048576
  writing 9523169501~1048576
  writing 9524218077~1048576
  writing 9525266653~1048576
  writing 9526315229~1048576
  writing 9527363805~1048576
  writing 9528412381~1048576
  writing 9529460957~1048576
  writing 9530509533~1048576
  writing 9531558109~1048576
  writing 9532606685~1048576
  writing 9533655261~1048576
  writing 9534703837~1048576
  writing 9535752413~1048576
  writing 9536800989~1048576
  writing 9537849565~1048576
  writing 9538898141~1048576
  writing 9539946717~1048576
  writing 9540995293~1048576
  writing 9542043869~1048576
  writing 9543092445~1048576
  writing 9544141021~1048576
  writing 9545189597~1048576
  writing 9546238173~1048576
  writing 9547286749~1048576
  writing 9548335325~1048576
  writing 9549383901~1048576
  writing 9550432477~1048576
  writing 9551481053~1048576
  writing 9552529629~1048576
  writing 9553578205~1048576
  writing 9554626781~1048576
  writing 9555675357~1048576
  writing 9556723933~1048576
  writing 9557772509~1048576
  writing 9558821085~1048576
  writing 9559869661~1048576
  writing 9560918237~1048576
  writing 9561966813~1048576
  writing 9563015389~1048576
  writing 9564063965~1048576
  writing 9565112541~1048576
  writing 9566161117~1048576
  writing 9567209693~1048576
  writing 9568258269~1048576
  writing 9569306845~1048576
  writing 9570355421~1048576
  writing 9571403997~1048576
  writing 9572452573~1048576
  writing 9573501149~1048576
  writing 9574549725~1048576
  writing 9575598301~1048576
  writing 9576646877~1048576
  writing 9577695453~1048576
  writing 9578744029~1048576
  writing 9579792605~1048576
  writing 9580841181~1048576
  writing 9581889757~1048576
  writing 9582938333~1048576
  writing 9583986909~1048576
  writing 9585035485~1048576
  writing 9586084061~1048576
  writing 9587132637~1048576
  writing 9588181213~1048576
  writing 9589229789~1048576
  writing 9590278365~1048576
  writing 9591326941~1048576
  writing 9592375517~1048576
  writing 9593424093~1048576
  writing 9594472669~1048576
  writing 9595521245~1048576
  writing 9596569821~1048576
  

Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-06 Thread Craig Lewis
On Thu, Nov 6, 2014 at 11:27 AM, Chad Seys


  Also, are you certain that osd 20 is not up?
  -Sam

 Yep.

 # ceph osd metadata 20
 Error ENOENT: osd.20 does not exist

 So part of ceph thinks osd.20 doesn't exist, but another part (the
 down_osds_we_would_probe) thinks the osd exists and is down?


You'll have trouble until osd.20 exists again.

Ceph really does not want to lose data.  Even if you tell it the osd is
gone, ceph won't believe you.  Once ceph can probe any osd that claims to
be 20, it might let you proceed with your recovery.  Then you'll probably
need to use ceph pg pgid mark_unfound_lost.

If you don't have a free bay to create a real osd.20, it's possible to fake
it with some small loop-back filesystems.  Bring it up and mark it OUT.  It
will probably cause some remapping.  I would keep it around until you get
things healthy.
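
A rough sketch of that loop-back trick (image size, paths and host name are
illustrative, and the crush weight of 0 keeps any real data off it):

  dd if=/dev/zero of=/srv/fake-osd-20.img bs=1M count=10240
  losetup /dev/loop0 /srv/fake-osd-20.img
  mkfs.xfs /dev/loop0
  mkdir -p /var/lib/ceph/osd/ceph-20
  mount /dev/loop0 /var/lib/ceph/osd/ceph-20
  ceph osd create     # hands out the lowest free id (20, assuming nothing lower is free)
  ceph-osd -i 20 --mkfs --mkkey
  ceph auth add osd.20 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-20/keyring
  ceph osd crush add osd.20 0 host=somehost
  service ceph start osd.20
  ceph osd out 20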

If you create a real osd.20, you might want to leave it OUT until you get
things healthy again.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-06 Thread Chad Seys
Hi Craig,

 You'll have trouble until osd.20 exists again.
 
 Ceph really does not want to lose data.  Even if you tell it the osd is
 gone, ceph won't believe you.  Once ceph can probe any osd that claims to
 be 20, it might let you proceed with your recovery.  Then you'll probably
 need to use ceph pg pgid mark_unfound_lost.
 
 If you don't have a free bay to create a real osd.20, it's possible to fake
 it with some small loop-back filesystems.  Bring it up and mark it OUT.  It
 will probably cause some remapping.  I would keep it around until you get
 things healthy.
 
 If you create a real osd.20, you might want to leave it OUT until you get
 things healthy again.

Thanks for the recovery tip!

I would guess that safely removing an OSD (mark it OUT, wait for migration to 
stop, then crush osd rm) and then adding it back in as osd.20 would work?
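
By which I mean roughly this sequence, with N being whichever osd I sacrifice:

  ceph osd out N
  # wait for the rebalancing to finish
  ceph osd crush remove osd.N
  ceph auth del osd.N
  ceph osd rm N
  # then the normal add procedure; "ceph osd create" hands out the lowest
  # free id, so it should come back as 20 as long as 20 is the lowest gap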

New switch:
--yes-i-really-REALLY-mean-it

;)
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD Diff based on Timestamp

2014-11-06 Thread Nick Fisk
I have been thinking about the implications of losing the snapshot chain on
an RBD when doing export-diff/import-diff between two separate physical
locations. As I understand it, in this scenario when you take the first
snapshot again on the source, you would in effect end up copying the whole
RBD image across to the other site, as the diff would be based on
creation to 1st snap. If this was a large multi-TB RBD, even over a reasonably
fast link, this could take a long time to resync.

 

From what I understand, the RADOS objects which RBDs are striped across
have last-modified timestamps. Would it be feasible to add an option to the
rbd command to export a diff of modified blocks since a certain timestamp?

 

This way you could take a new snapshot on the source RBD and then specify a
timestamp from just before the previously deleted snapshot and export the
blocks to bring the 2nd copy back up to date. You could then resume the
normal export-diff-import-diff procedure.
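
(As a crude sanity check that those per-object timestamps really are there,
something along these lines, with pool and image name purely illustrative,
lists an image's backing objects together with their mtimes:

  PREFIX=$(rbd info rbd/myimage | awk '/block_name_prefix/ {print $2}')
  rados -p rbd ls | grep "$PREFIX" | while read obj; do
      rados -p rbd stat "$obj"    # prints the object's size and mtime
  done

Obviously that is nowhere near an export format; it just shows that the
information the proposed option would need is already kept by RADOS.)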

 

Please tell me if I am thinking about this in completely the wrong way, or
if this is actually a possible solution.

 

Many Thanks,

Nick




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Installing CephFs via puppet

2014-11-06 Thread JIten Shah
Hi Guys,

I am sure many of you guys have installed cephfs using puppet. I am trying to 
install “firefly” using the puppet module from  
https://github.com/ceph/puppet-ceph.git  

and running into the “ceph_config” file issue where it’s unable to find the 
config file and I am not sure why.

Here’s the error I get while running puppet on one of the mon nodes:

Error: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_pgp_num]: Could 
not evaluate: No ability to determine if ceph_config exists
Error: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_min_size]: Could 
not evaluate: No ability to determine if ceph_config exists
Error: /Stage[main]/Ceph/Ceph_config[global/auth_service_required]: Could not 
evaluate: No ability to determine if ceph_config exists
Error: /Stage[main]/Ceph/Ceph_config[global/mon_initial_members]: Could not 
evaluate: No ability to determine if ceph_config exists
Error: /Stage[main]/Ceph/Ceph_config[global/fsid]: Could not evaluate: No 
ability to determine if ceph_config exists
Error: /Stage[main]/Ceph/Ceph_config[global/auth_supported]: Could not 
evaluate: No ability to determine if ceph_config exists
Error: /Stage[main]/Ceph/Ceph_config[global/auth_cluster_required]: Could not 
evaluate: No ability to determine if ceph_config exists

—Jiten
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] installing ceph object gateway

2014-11-06 Thread Michael Kuriger
Is there updated documentation explaining how to install and use the
object gateway?


http://docs.ceph.com/docs/master/install/install-ceph-gateway/

I attempted this install and quickly ran into problems.

Thanks!
-M

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd down

2014-11-06 Thread Shain Miley
I tried restarting all the osd's on that node; osd.70 was the only ceph process 
that did not come back online.

There is nothing in the ceph-osd log for osd.70.

However I do see over 13,000 of these messages in the kern.log:

Nov  6 19:54:27 hqosd6 kernel: [34042786.392178] XFS (sdl1): xfs_log_force: 
error 5 returned.

Does anyone have any suggestions on how I might be able to get this HD back in 
the cluster (or whether or not it is worth even trying)?
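
(For what it is worth, error 5 is EIO, i.e. the kernel is getting I/O errors
back from sdl even though the mount point still looks fine. Before trying to
re-add it I was planning to check the drive itself along these lines, unless
that is considered a waste of time:

  dmesg | grep -i sdl
  smartctl -a /dev/sdl       # assuming the RAID controller passes SMART through
  umount /var/lib/ceph/osd/ceph-70 && xfs_repair -n /dev/sdl1    # -n = check only, no repairs
)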

Thanks,

Shain

Shain Miley | Manager of Systems and Infrastructure, Digital Media | 
smi...@npr.org | 202.513.3649


From: Shain Miley [smi...@npr.org]
Sent: Tuesday, November 04, 2014 3:55 PM
To: ceph-users@lists.ceph.com
Subject: osd down

Hello,

We are running ceph version 0.80.5 with 108 osd's.

Today I noticed that one of the osd's is down:

root@hqceph1:/var/log/ceph# ceph -s
 cluster 504b5794-34bd-44e7-a8c3-0494cf800c23
  health HEALTH_WARN crush map has legacy tunables
  monmap e1: 3 mons at
{hqceph1=10.35.1.201:6789/0,hqceph2=10.35.1.203:6789/0,hqceph3=10.35.1.205:6789/0},
election epoch 146, quorum 0,1,2 hqceph1,hqceph2,hqceph3
  osdmap e7119: 108 osds: 107 up, 107 in
   pgmap v6729985: 3208 pgs, 17 pools, 81193 GB data, 21631 kobjects
 216 TB used, 171 TB / 388 TB avail
 3204 active+clean
4 active+clean+scrubbing
   client io 4079 kB/s wr, 8 op/s


Using osd dump I determined that it is osd number 70:

osd.70 down out weight 0 up_from 2668 up_thru 6886 down_at 6913
last_clean_interval [488,2665) 10.35.1.217:6814/22440
10.35.1.217:6820/22440 10.35.1.217:6824/22440 10.35.1.217:6830/22440
autoout,exists 5dbd4a14-5045-490e-859b-15533cd67568


Looking at that node, the drive is still mounted and I did not see any
errors in any of the system logs, and the raid level status shows the
drive as up and healthy, etc.


root@hqosd6:~# df -h |grep 70
/dev/sdl1   3.7T  1.9T  1.9T  51% /var/lib/ceph/osd/ceph-70


I was hoping that someone might be able to advise me on the next course
of action (can I add the osd back in? should I replace the drive
altogether? etc.)

I have attached the osd log to this email.

Any suggestions would be great.

Thanks,

Shain

--
Shain Miley | Manager of Systems and Infrastructure, Digital Media |
smi...@npr.org | 202.513.3649
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Installing CephFs via puppet

2014-11-06 Thread Loic Dachary
Hi,

At the moment puppet-ceph does not support CephFS. The error you're seeing does 
not ring a bell; would you have more context to help diagnose it?

Cheers

On 06/11/2014 23:44, JIten Shah wrote:
 Hi Guys,
 
 I am sure many of you guys have installed cephfs using puppet. I am trying to 
 install “firefly” using the puppet module from  
 https://github.com/ceph/puppet-ceph.git  
 
 and running into the “ceph_config” file issue where it’s unable to find the 
 config file and I am not sure why.
 
 Here’s the error I get while running puppet on one of the mon nodes:
 
 Error: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_pgp_num]: Could 
 not evaluate: No ability to determine if ceph_config exists
 Error: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_min_size]: Could 
 not evaluate: No ability to determine if ceph_config exists
 Error: /Stage[main]/Ceph/Ceph_config[global/auth_service_required]: Could not 
 evaluate: No ability to determine if ceph_config exists
 Error: /Stage[main]/Ceph/Ceph_config[global/mon_initial_members]: Could not 
 evaluate: No ability to determine if ceph_config exists
 Error: /Stage[main]/Ceph/Ceph_config[global/fsid]: Could not evaluate: No 
 ability to determine if ceph_config exists
 Error: /Stage[main]/Ceph/Ceph_config[global/auth_supported]: Could not 
 evaluate: No ability to determine if ceph_config exists
 Error: /Stage[main]/Ceph/Ceph_config[global/auth_cluster_required]: Could not 
 evaluate: No ability to determine if ceph_config exists
 
 —Jiten
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Installing CephFs via puppet

2014-11-06 Thread JIten Shah
Thanks Loic. 

What is the recommended puppet module for installing CephFS?

I can send more details about puppet-ceph but basically I haven't changed 
anything in there except for assigning values to the required params in the 
yaml file. 

--Jiten 



 On Nov 6, 2014, at 7:24 PM, Loic Dachary l...@dachary.org wrote:
 
 Hi,
 
 At the moment puppet-ceph does not support CephFS. The error you're seeing 
 does not ring a bell; would you have more context to help diagnose it?
 
 Cheers
 
 On 06/11/2014 23:44, JIten Shah wrote:
 Hi Guys,
 
 I am sure many of you guys have installed cephfs using puppet. I am trying 
 to install “firefly” using the puppet module from  
 https://github.com/ceph/puppet-ceph.git  
 
 and running into the “ceph_config” file issue where it’s unable to find the 
 config file and I am not sure why.
 
 Here’s the error I get while running puppet on one of the mon nodes:
 
 Error: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_pgp_num]: Could 
 not evaluate: No ability to determine if ceph_config exists
 Error: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_min_size]: 
 Could not evaluate: No ability to determine if ceph_config exists
 Error: /Stage[main]/Ceph/Ceph_config[global/auth_service_required]: Could 
 not evaluate: No ability to determine if ceph_config exists
 Error: /Stage[main]/Ceph/Ceph_config[global/mon_initial_members]: Could not 
 evaluate: No ability to determine if ceph_config exists
 Error: /Stage[main]/Ceph/Ceph_config[global/fsid]: Could not evaluate: No 
 ability to determine if ceph_config exists
 Error: /Stage[main]/Ceph/Ceph_config[global/auth_supported]: Could not 
 evaluate: No ability to determine if ceph_config exists
 Error: /Stage[main]/Ceph/Ceph_config[global/auth_cluster_required]: Could 
 not evaluate: No ability to determine if ceph_config exists
 
 —Jiten
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 -- 
 Loïc Dachary, Artisan Logiciel Libre
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Cluster with two radosgw

2014-11-06 Thread lakshmi k s
Any best practices available for Radosgw HA? Please suggest.


On Wednesday, November 5, 2014 2:08 PM, lakshmi k s lux...@yahoo.com wrote:
 


Hello -

My ceph cluster needs to have two rados gateway nodes, eventually interfacing 
with the OpenStack haproxy. I have been successful in bringing up one of them. What 
are the steps for an additional rados gateway node to be included in the cluster? Any 
help is greatly appreciated.
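
Something like this is what I have in mind on the haproxy side, with the
gateway host names and port being placeholders:

  frontend radosgw_frontend
      bind *:80
      default_backend radosgw_backend

  backend radosgw_backend
      balance roundrobin
      option httpchk GET /
      server rgw1 gateway1.example.com:80 check
      server rgw2 gateway2.example.com:80 check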

Thanks much.
Lakshmi.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] installing ceph object gateway

2014-11-06 Thread M Ranga Swami Reddy
Please share the problem/issue details (error messages, etc.). We can then
check and help.

Thanks
Swami

On Fri, Nov 7, 2014 at 4:41 AM, Michael Kuriger mk7...@yp.com wrote:
 Is there updated documentation explaining how to install and use the
 object gateway?


 http://docs.ceph.com/docs/master/install/install-ceph-gateway/

 I attempted this install and quickly ran into problems.

 Thanks!
 -M

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Is it normal that osd's memory exceed 1GB under stress test?

2014-11-06 Thread 谢锐
I set mon_osd_down_out_interval to two days and ran a stress test. The memory of 
the osd exceeded 1 GB.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is it normal that osd's memory exceed 1GB under stresstest?

2014-11-06 Thread 谢锐
and took one osd down, then ran the stress test with fio.
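
To see where the memory actually goes I was going to dump the allocator
statistics, assuming the osd is linked against tcmalloc:

  ceph tell osd.0 heap stats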

-- Original --


From:  谢锐xie...@szsandstone.com;

Date:  Fri, Nov 7, 2014 02:50 PM

To:  ceph-usersceph-us...@ceph.com; 


Subject:  [ceph-users] Is it normal that osd's memory exceed 1GB under 
stresstest?


 I set mon_osd_down_out_interval to two days and ran a stress test. The memory of 
 the osd exceeded 1 GB.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com