Re: [ceph-users] full osd ssd cluster advise : replication 2x or 3x ?

2014-05-23 Thread Alexandre DERUMIER
BTW,
the new Samsung PM853T SSD is rated at 665 TBW for 4K random writes:
http://www.tomsitpro.com/articles/samsung-3-bit-nand-enterprise-ssd,1-1922.html

and the price is cheaper than the Intel S3500 (around 450€ ex VAT).

(The cluster will be built next year, so I have some time to choose the right 
SSD.)


My main concern is to know whether replication 3x is really needed 
(mainly because of cost).
But I can wait for lower SSD prices next year, and go to 3x if necessary.



- Mail original - 

De: Alexandre DERUMIER aderum...@odiso.com 
À: Christian Balzer ch...@gol.com 
Cc: ceph-users@lists.ceph.com 
Envoyé: Vendredi 23 Mai 2014 07:59:58 
Objet: Re: [ceph-users] full osd ssd cluster advise : replication 2x or 3x ? 

That's not the only thing you should worry about. 
Aside from the higher risk there's total cost of ownership or Cost per 
terabyte written ($/TBW). 
So while the DC S3700 800GB is about $1800 and the same sized DC S3500 at 
about $850, the 3700 can reliably store 7300TB while the 3500 is only 
rated for 450TB. 
You do the math. ^.^ 

Yes, I know, I have already done the math. But I'm far from reaching that amount of 
writes.

The workload is (really) random, so 20% writes out of 3iops, with 4k blocks = 25MB/s of 
writes, 2TB each day. 
With replication 3x, that is 6TB of writes each day. 
60 x 450TBW = 27000TBW / 6TB = 4500 days = ~12.5 years ;) 

So with the journal writes it's of course less, but I think it should be enough for 5 
years.
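For reference, the same back-of-the-envelope calculation as a quick shell sketch (the 60 drives, the 450 TBW rating and the ~6TB/day of cluster writes are the assumptions above):

echo $(( 60 * 450 / 6 ))          # 60 drives x 450 TBW each, / 6 TB written per day -> 4500 days
echo "scale=1; 4500 / 365" | bc   # -> roughly 12 years of rated endurance, before journal overhead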


I'll also test the key-value store: no more journal, so fewer writes. 
(Not sure it works fine with rbd for the moment.) 

- Mail original - 

De: Christian Balzer ch...@gol.com 
À: ceph-users@lists.ceph.com 
Envoyé: Vendredi 23 Mai 2014 07:29:52 
Objet: Re: [ceph-users] full osd ssd cluster advise : replication 2x or 3x ? 


On Fri, 23 May 2014 07:02:15 +0200 (CEST) Alexandre DERUMIER wrote: 

 What is your main goal for that cluster, high IOPS, high sequential 
 writes or reads? 
 
 high iops, mostly random. (it's an rbd cluster, with qemu-kvm guests, 
 around 1000 vms, each doing small IOs). 
 
 80%read|20% write 
 
 I don't care about sequential workloads, or bandwidth. 
 
 
 Remember my Slow IOPS on RBD... thread, you probably shouldn't expect 
 more than 800 write IOPS and 4000 read IOPS per OSD (replication 2). 
 
 Yes, that's enough for me! I can't use spinning disks, because they're 
 really too slow. I need around 3iops for around 20TB of storage. 
 
 I could even go to cheaper consumer ssds (like the crucial m550); I think I 
 could reach 2000-4000 iops from them. But I'm afraid of 
 durability|stability. 
 
That's not the only thing you should worry about. 
Aside from the higher risk there's total cost of ownership or Cost per 
terabyte written ($/TBW). 
So while the DC S3700 800GB is about $1800 and the same sized DC S3500 at 
about $850, the 3700 can reliably store 7300TB while the 3500 is only 
rated for 450TB. 
You do the math. ^.^ 

Christian 
 - Mail original - 
 
 De: Christian Balzer ch...@gol.com 
 À: ceph-users@lists.ceph.com 
 Envoyé: Vendredi 23 Mai 2014 04:57:51 
 Objet: Re: [ceph-users] full osd ssd cluster advise : replication 2x or 
 3x ? 
 
 
 Hello, 
 
 On Thu, 22 May 2014 18:00:56 +0200 (CEST) Alexandre DERUMIER wrote: 
 
  Hi, 
  
  I'm looking to build a full osd ssd cluster, with this config: 
  
 What is your main goal for that cluster, high IOPS, high sequential 
 writes or reads? 
 
 Remember my Slow IOPS on RBD... thread, you probably shouldn't expect 
 more than 800 write IOPS and 4000 read IOPS per OSD (replication 2). 
 
  6 nodes, 
  
  each node 10 osd/ ssd drives (dual 10gbit network). (1journal + datas 
  on each osd) 
  
 Halving the write speed of the SSD, leaving you with about 2GB/s max 
 write speed per node. 
 
 If you're after good write speeds and with a replication factor of 2 I 
 would split the network into public and cluster ones. 
 If you're however after top read speeds, use bonding for the 2 links 
 into the public network, half of your SSDs per node are able to saturate 
 that. 
 
 ssd drives will be enterprise grade, 
  
  maybe intel sc3500 800GB (well known ssd) 
  
 How much write activity do you expect per OSD (remember that in your 
 case writes are doubled)? Those drives have a total write capacity of 
 about 450TB (within 5 years). 
 
  or new Samsung SSD PM853T 960GB (don't have too much info about it for 
  the moment, but price seem a little bit lower than intel) 
  
 
 Looking at the specs it seems to have a better endurance (I used 
 500GB/day, a value that seemed realistic given the 2 numbers they gave), 
 at least double that of the Intel. 
 Alas they only give a 3 year warranty, which makes me wonder. 
 Also the latencies are significantly higher than the 3500. 
 
  
  I would like to have some advise on replication level, 
  
  
  Maybe somebody have experience with intel sc3500 failure rate ? 
 
 I doubt many people have managed to wear out SSDs of that vintage in 
 normal usage yet. And 

Re: [ceph-users] collectd / graphite / grafana .. calamari?

2014-05-23 Thread Alexandre DERUMIER
https://github.com/rochaporto/collectd-ceph 

It has a set of collectd plugins pushing metrics which mostly map what 
the ceph commands return. In the setup we have it pushes them to 
graphite and the displays rely on grafana (check for a screenshot in 
the link above). 


Thanks for sharing, Ricardo!

I was looking to create a dashboard for grafana too; yours seems very good :)



- Mail original - 

De: Ricardo Rocha rocha.po...@gmail.com 
À: 'ceph-users@lists.ceph.com' (ceph-users@lists.ceph.com) 
ceph-users@lists.ceph.com, ceph-de...@vger.kernel.org 
Envoyé: Vendredi 23 Mai 2014 02:58:04 
Objet: collectd / graphite / grafana .. calamari? 

Hi. 

I saw the thread a couple days ago on ceph-users regarding collectd... 
and yes, i've been working on something similar for the last few days 
:) 

https://github.com/rochaporto/collectd-ceph 

It has a set of collectd plugins pushing metrics which mostly map what 
the ceph commands return. In the setup we have it pushes them to 
graphite and the displays rely on grafana (check for a screenshot in 
the link above). 

As it relies on common building blocks, it's easily extensible and 
we'll come up with new dashboards soon - things like plotting osd data 
against the metrics from the collectd disk plugin, which we also 
deploy. 
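(If anyone wants to poke at the raw series behind the dashboards, graphite's 
render API is enough on its own -- the metric name below is just an example of 
the kind of series the plugins push:) 

curl 'http://graphite.example.com/render?target=ceph.pool.rbd.objects&from=-1h&format=json' 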

This email is mostly to share the work, but also to check on Calamari? 
I asked Patrick after the RedHat/Inktank news and have no idea what it 
provides, but i'm sure it comes with lots of extra sauce - he 
suggested to ask in the list. 

What's the timeline to have it open sourced? It would be great to have 
a look at it, and as there's work from different people in this area 
maybe start working together on some fancier monitoring tools. 

Regards, 
Ricardo 
-- 
To unsubscribe from this list: send the line unsubscribe ceph-devel in 
the body of a message to majord...@vger.kernel.org 
More majordomo info at http://vger.kernel.org/majordomo-info.html 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean

2014-05-23 Thread jan.zeller
Dear ceph,

I am trying to setup ceph 0.80.1 with the following components :

1 x mon - Debian Wheezy (i386)
3 x osds - Debian Wheezy (i386)

(all are kvm powered)

Status after the standard setup procedure :

root@ceph-node2:~# ceph -s
cluster d079dd72-8454-4b4a-af92-ef4c424d96d8
 health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs 
stuck unclean
 monmap e1: 1 mons at {ceph-node1=192.168.123.48:6789/0}, election epoch 2, 
quorum 0 ceph-node1
 osdmap e11: 3 osds: 3 up, 3 in
  pgmap v18: 192 pgs, 3 pools, 0 bytes data, 0 objects
103 MB used, 15223 MB / 15326 MB avail
 192 incomplete

root@ceph-node2:~# ceph health
HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean

root@ceph-node2:~# ceph osd tree
# id    weight  type name       up/down reweight
-1  0   root default
-2  0   host ceph-node2
0   0   osd.0   up  1
-3  0   host ceph-node3
1   0   osd.1   up  1
-4  0   host ceph-node4
2   0   osd.2   up  1


root@ceph-node2:~# ceph osd dump
epoch 11
fsid d079dd72-8454-4b4a-af92-ef4c424d96d8
created 2014-05-23 09:00:08.780211
modified 2014-05-23 09:01:33.438001
flags 

pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool 
crash_replay_interval 45 stripe_width 0

pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool 
stripe_width 0

pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0 
max_osd 3

osd.0 up   in  weight 1 up_from 4 up_thru 5 down_at 0 last_clean_interval [0,0) 
192.168.123.49:6800/11373 192.168.123.49:6801/11373 192.168.123.49:6802/11373 
192.168.123.49:6803/11373 exists,up 21a7d2a8-b709-4a28-bc3b-850913fe4c6b

osd.1 up   in  weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval [0,0) 
192.168.123.50:6800/10542 192.168.123.50:6801/10542 192.168.123.50:6802/10542 
192.168.123.50:6803/10542 exists,up c1cd3ad1-b086-438f-a22d-9034b383a1be

osd.2 up   in  weight 1 up_from 11 up_thru 0 down_at 0 last_clean_interval 
[0,0) 192.168.123.53:6800/6962 192.168.123.53:6801/6962 
192.168.123.53:6802/6962 192.168.123.53:6803/6962 exists,up 
aa06d7e4-181c-4d70-bb8e-018b088c5053


What am I doing wrong here?
Or what kind of additional information should I provide to help troubleshoot this?

thanks,

---

Jan

P.S. with emperor 0.72.2 I had no such problems
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Screencast/tutorial on setting up Ceph

2014-05-23 Thread Sankar P
Hi,

I have four old machines lying around. I would like to setup ceph on
these machines.

Are there any screencasts or tutorials with commands on how to obtain,
install and configure Ceph on these machines?

The official documentation page OS Recommendations seems to list only
old distros and not the new versions of the distros (openSUSE and Ubuntu).

So I wanted to ask if there is a screencast, tutorial or tech talk on
how to set up Ceph for a total newbie?

-- 
Sankar P
http://psankar.blogspot.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph deploy on rhel6.5 installs ceph from el6 and fails

2014-05-23 Thread Lukac, Erik
Hi Simon,

thanks for your reply.
I already installed the OS for my ceph nodes via Kickstart (over the network) from 
Red Hat Satellite and I don't want to do that again because some other config has 
also been done.
xfsprogs is not part of the RHEL base repository but of an extra, per-node/CPU-priced 
add-on called Scalable File System. For some other nodes 
I installed xfsprogs from the centos-6-base repo, but now I want to try a clean 
rhel-based-only install, so I'll add ceph on my nodes from 
/etc/yum.repos.d/ceph, install manually with yum and then do a ceph-deploy and 
see what will happen ;)

Greetz from munich

Erik


Von: Simon Ironside [sirons...@caffetine.org]
Gesendet: Freitag, 23. Mai 2014 01:07
An: Lukac, Erik
Cc: ceph-users@lists.ceph.com
Betreff: Re: [ceph-users] ceph deploy on rhel6.5 installs ceph from el6 and 
fails

On 22/05/14 23:56, Lukac, Erik wrote:
 But: this fails because of the dependencies. xfsprogs is in the rhel6 repo,
 but not in el6.

I hadn't noticed that xfsprogs is included in the ceph repos, I'm using
the package from the RHEL 6.5 DVD, which is the same version, you'll
find it in the ScalableFileSystem repo on the Install DVD.

HTH,
Simon.

--
Bayerischer Rundfunk; Rundfunkplatz 1; 80335 München
Telefon: +49 89 590001; E-Mail: i...@br.de; Website: http://www.BR.de
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about osd objectstore = keyvaluestore-dev setting

2014-05-23 Thread GMail


Sent from my iPhone

 On 22 May 2014, at 22:26, Gregory Farnum g...@inktank.com wrote:
 
 On Thu, May 22, 2014 at 5:04 AM, Geert Lindemulder glindemul...@snow.nl 
 wrote:
 Hello All
 
 Trying to implement the osd leveldb backend at an existing ceph test
 cluster.
 The test cluster was updated from 0.72.1 to 0.80.1. The update was ok.
 After the update, the osd objectstore = keyvaluestore-dev setting was
 added to ceph.conf.
 
 Does that mean you tried to switch to the KeyValueStore on one of your
 existing OSDs? That isn't going to work; you'll need to create new
 ones (or knock out old ones and recreate them with it).
 
 After restarting an osd it gives the following error:
 2014-05-22 12:28:06.805290 7f2e7d9de800 -1 KeyValueStore::mount : stale
 version stamp 3. Please run the KeyValueStore update script before starting
 the OSD, or set keyvaluestore_update_to to 1
 
 How can the keyvaluestore_update_to parameter be set or where can i find
 the KeyValueStore update script
 
 Hmm, it looks like that config value isn't actually plugged in to the
 KeyValueStore, so you can't set it with the stock binaries. Maybe
 Haomai has an idea?

Yes, the error is that KeyValueStore reads a version stamp from the existing osd data. The 
version is incorrect, and maybe there should be a clearer error message.

 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Screencast/tutorial on setting up Ceph

2014-05-23 Thread jan.zeller
 -Ursprüngliche Nachricht-
 Von: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] Im Auftrag
 von Sankar P
 Gesendet: Freitag, 23. Mai 2014 11:14
 An: ceph-users@lists.ceph.com
 Betreff: [ceph-users] Screencast/tutorial on setting up Ceph
 
 Hi,
 
 I have four old machines lying around. I would like to setup ceph on these
 machines.
 
 Are there any screencast or tutorial with commands, on how to obtain,
 install and configure on ceph on these machines ?
 
 The official documentation page OS Recommendations seem to list only
 old distros and not the new version of distros (openSUSE and Ubuntu).
 
 So I wanted to ask if there is a screencast or tutorial or techtalk on how to
 setup Ceph for a total newbie ?
 
 --
 Sankar P
 http://psankar.blogspot.com

Hi,

I am a rookie too and only used this: 
http://ceph.com/docs/master/start/

It's a very nice doc.

---

jan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw Timeout

2014-05-23 Thread Georg Höllrigl

On 22.05.2014 15:36, Yehuda Sadeh wrote:

On Thu, May 22, 2014 at 6:16 AM, Georg Höllrigl
georg.hoellr...@xidras.com wrote:

Hello List,

Using the radosgw works fine, as long as the amount of data doesn't get too
big.

I have created one bucket that holds many small files, separated into
different directories. But whenever I try to access the bucket, I only run
into some timeout. The timeout is at around 30 - 100 seconds. This is
smaller than the Apache timeout of 300 seconds.

I've tried to access the bucket with different clients - one thing is s3cmd
- which is still able to upload things, but takes a rather long time when
listing the contents.
Then I've tried s3fs-fuse - which throws
ls: reading directory .: Input/output error

Also Cyberduck and S3Browser show similar behavior.

Is there an option to only send back maybe 1000 list entries, like Amazon
does? So that the client can decide if it wants to list all the contents?



That's how it works; it doesn't return more than 1000 entries at once.


OK. I found that in the requests. So it's the client that states how 
many objects should be in the listing, by sending the max-keys=1000 
parameter:


- - - [23/May/2014:08:49:33 +] GET 
/test/?delimiter=%2Fmax-keys=1000prefix HTTP/1.1 200 715 - 
Cyberduck/4.4.4 (14505) (Windows NT (unknown)/6.2) (x86) 
xidrasservice.com:443



Are there any timeout values in radosgw?


Are you sure the timeout is in the gateway itself? Could be apache
that is timing out. Will need to see the apache access logs for these
operations, radosgw debug and messenger logs (debug rgw = 20, debug ms
= 1), to give a better answer.


No I'm not sure where the timeout comes from. As far as I can tell, 
apache times out after 300 seconds - so that should not be the problem.


I think I found something in the apache logs:
[Fri May 23 08:59:39.385548 2014] [fastcgi:error] [pid 3035:tid 
140723006891776] [client 10.0.1.66:46049] FastCGI: comm with server 
/var/www/s3gw.fcgi aborted: idle timeout (30 sec)
[Fri May 23 08:59:39.385604 2014] [fastcgi:error] [pid 3035:tid 
140723006891776] [client 10.0.1.66:46049] FastCGI: incomplete headers (0 
bytes) received from server /var/www/s3gw.fcgi


I've increased the timeout to 900 in the apache vhosts config:
FastCgiExternalServer /var/www/s3gw.fcgi -socket 
/var/run/ceph/radosgw.vvx-ceph-m-02 -idle-timeout 900

Now it's not working, and I don't get a log entry any more.
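(For reference, I enabled the suggested debug levels with something like the 
following in ceph.conf -- the section name depends on how the rgw instance is 
named:)

[client.radosgw.gateway]
debug rgw = 20
debug ms = 1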

Most interesting: when watching the debug output I see that rados 
successfully finished the request, but at the same time the client 
tells me it failed.


I've shortened the log file, as far as I could see, the info repeats 
itself...


2014-05-23 09:38:43.051395 7f1b427fc700  1 == starting new request 
req=0x7f1b3400f1c0 =
2014-05-23 09:38:43.051597 7f1b427fc700  1 -- 10.0.1.107:0/1005898 -- 
10.0.1.199:6800/14453 -- osd_op(client.72942.0:120 UHXW458EH1RVULE1BCEH 
[getxattrs,stat] 11.10193f7e ack+read e279) v4 -- ?+0 0x7f1b4640 con 
0x2455930
2014-05-23 09:38:43.053180 7f1b96d80700  1 -- 10.0.1.107:0/1005898 == 
osd.0 10.0.1.199:6800/14453 23  osd_op_reply(120 
UHXW458EH1RVULE1BCEH [getxattrs,stat] v0'0 uv1 ondisk = 0) v6  
229+0+20 (1060030390 0 1010060712) 0x7f1b58002540 con 0x2455930
2014-05-23 09:38:43.053380 7f1b427fc700  1 -- 10.0.1.107:0/1005898 -- 
10.0.1.199:6800/14453 -- osd_op(client.72942.0:121 UHXW458EH1RVULE1BCEH 
[read 0~524288] 11.10193f7e ack+read e279) v4 -- ?+0 0x7f1b45d0 con 
0x2455930
2014-05-23 09:38:43.054359 7f1b96d80700  1 -- 10.0.1.107:0/1005898 == 
osd.0 10.0.1.199:6800/14453 24  osd_op_reply(121 
UHXW458EH1RVULE1BCEH [read 0~8] v0'0 uv1 ondisk = 0) v6  187+0+8 
(3510944971 0 3829959217) 0x7f1b580057b0 con 0x2455930
2014-05-23 09:38:43.054490 7f1b427fc700  1 -- 10.0.1.107:0/1005898 -- 
10.0.1.199:6806/15018 -- osd_op(client.72942.0:122 macm [getxattrs,stat] 
7.1069f101 ack+read e279) v4 -- ?+0 0x7f1b6010 con 0x2457de0
2014-05-23 09:38:43.055871 7f1b96d80700  1 -- 10.0.1.107:0/1005898 == 
osd.2 10.0.1.199:6806/15018 3  osd_op_reply(122 macm 
[getxattrs,stat] v0'0 uv46 ondisk = 0) v6  213+0+91 (22324782 0 
2022698800) 0x7f1b500025a0 con 0x2457de0
2014-05-23 09:38:43.055963 7f1b427fc700  1 -- 10.0.1.107:0/1005898 -- 
10.0.1.199:6806/15018 -- osd_op(client.72942.0:123 macm [read 0~524288] 
7.1069f101 ack+read e279) v4 -- ?+0 0x7f1b3950 con 0x2457de0
2014-05-23 09:38:43.057087 7f1b96d80700  1 -- 10.0.1.107:0/1005898 == 
osd.2 10.0.1.199:6806/15018 4  osd_op_reply(123 macm [read 0~310] 
v0'0 uv46 ondisk = 0) v6  171+0+310 (3762965810 0 1648184722) 
0x7f1b500026e0 con 0x2457de0
2014-05-23 09:38:43.057364 7f1b427fc700  1 -- 10.0.1.107:0/1005898 -- 
10.0.0.26:6809/4834 -- osd_op(client.72942.0:124 store [call 
version.read,getxattrs,stat] 5.c5755cee ack+read e279) v4 -- ?+0 
0x7f1b66b0 con 0x7f1b440022e0
2014-05-23 09:38:43.059223 7f1b96d80700  1 -- 10.0.1.107:0/1005898 == 
osd.7 10.0.0.26:6809/4834 37  

Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean

2014-05-23 Thread Karan Singh
Try increasing the placement groups for pools

ceph osd pool set data pg_num 128  
ceph osd pool set data pgp_num 128

similarly for other 2 pools as well.
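For the two other default pools that would be something like:

ceph osd pool set metadata pg_num 128
ceph osd pool set metadata pgp_num 128
ceph osd pool set rbd pg_num 128
ceph osd pool set rbd pgp_num 128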

- karan -


On 23 May 2014, at 11:50, jan.zel...@id.unibe.ch wrote:

 Dear ceph,
 
 I am trying to setup ceph 0.80.1 with the following components :
 
 1 x mon - Debian Wheezy (i386)
 3 x osds - Debian Wheezy (i386)
 
 (all are kvm powered)
 
 Status after the standard setup procedure :
 
 root@ceph-node2:~# ceph -s
cluster d079dd72-8454-4b4a-af92-ef4c424d96d8
 health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs 
 stuck unclean
 monmap e1: 1 mons at {ceph-node1=192.168.123.48:6789/0}, election epoch 
 2, quorum 0 ceph-node1
 osdmap e11: 3 osds: 3 up, 3 in
  pgmap v18: 192 pgs, 3 pools, 0 bytes data, 0 objects
103 MB used, 15223 MB / 15326 MB avail
 192 incomplete
 
 root@ceph-node2:~# ceph health
 HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
 
 root@ceph-node2:~# ceph osd tree
 # idweight  type name   up/down reweight
 -1  0   root default
 -2  0   host ceph-node2
 0   0   osd.0   up  1
 -3  0   host ceph-node3
 1   0   osd.1   up  1
 -4  0   host ceph-node4
 2   0   osd.2   up  1
 
 
 root@ceph-node2:~# ceph osd dump
 epoch 11
 fsid d079dd72-8454-4b4a-af92-ef4c424d96d8
 created 2014-05-23 09:00:08.780211
 modified 2014-05-23 09:01:33.438001
 flags 
 
 pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
 rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool 
 crash_replay_interval 45 stripe_width 0
 
 pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
 rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool 
 stripe_width 0
 
 pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
 rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool 
 stripe_width 0 max_osd 3
 
 osd.0 up   in  weight 1 up_from 4 up_thru 5 down_at 0 last_clean_interval 
 [0,0) 192.168.123.49:6800/11373 192.168.123.49:6801/11373 
 192.168.123.49:6802/11373 192.168.123.49:6803/11373 exists,up 
 21a7d2a8-b709-4a28-bc3b-850913fe4c6b
 
 osd.1 up   in  weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval 
 [0,0) 192.168.123.50:6800/10542 192.168.123.50:6801/10542 
 192.168.123.50:6802/10542 192.168.123.50:6803/10542 exists,up 
 c1cd3ad1-b086-438f-a22d-9034b383a1be
 
 osd.2 up   in  weight 1 up_from 11 up_thru 0 down_at 0 last_clean_interval 
 [0,0) 192.168.123.53:6800/6962 192.168.123.53:6801/6962 
 192.168.123.53:6802/6962 192.168.123.53:6803/6962 exists,up 
 aa06d7e4-181c-4d70-bb8e-018b088c5053
 
 
 What am I doing wrong here ?
 Or what kind of additional information should be provided to get 
 troubleshooted.
 
 thanks,
 
 ---
 
 Jan
 
 P.S. with emperor 0.72.2 I had no such problems
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Screencast/tutorial on setting up Ceph

2014-05-23 Thread Karan Singh
use my blogs if you like   
http://karan-mj.blogspot.fi/2013/12/ceph-storage-part-2.html 

- Karan Singh -

On 23 May 2014, at 12:30, jan.zel...@id.unibe.ch jan.zel...@id.unibe.ch 
wrote:

 -Ursprüngliche Nachricht-
 Von: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] Im Auftrag
 von Sankar P
 Gesendet: Freitag, 23. Mai 2014 11:14
 An: ceph-users@lists.ceph.com
 Betreff: [ceph-users] Screencast/tutorial on setting up Ceph
 
 Hi,
 
 I have four old machines lying around. I would like to setup ceph on these
 machines.
 
 Are there any screencast or tutorial with commands, on how to obtain,
 install and configure on ceph on these machines ?
 
 The official documentation page OS Recommendations seem to list only
 old distros and not the new version of distros (openSUSE and Ubuntu).
 
 So I wanted to ask if there is a screencast or tutorial or techtalk on how to
 setup Ceph for a total newbie ?
 
 --
 Sankar P
 http://psankar.blogspot.com
 
 Hi,
 
 I am rookie too and only used just this : 
 http://ceph.com/docs/master/start/
 
 it's a very nice doc
 
 ---
 
 jan
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw Timeout

2014-05-23 Thread Georg Höllrigl
Thank you very much - I think I've solved the whole thing. It wasn't in 
radosgw.


The solution was,
- increase the timeout in Apache conf.
- when using haproxy, also increase the timeouts there!
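For the haproxy side that means raising the client/server timeouts, roughly 
like this (the values shown are only an example):

defaults
    timeout connect 10s
    timeout client  900s
    timeout server  900s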


Georg

On 22.05.2014 15:36, Yehuda Sadeh wrote:

On Thu, May 22, 2014 at 6:16 AM, Georg Höllrigl
georg.hoellr...@xidras.com wrote:

Hello List,

Using the radosgw works fine, as long as the amount of data doesn't get too
big.

I have created one bucket that holds many small files, separated into
different directories. But whenever I try to acess the bucket, I only run
into some timeout. The timeout is at around 30 - 100 seconds. This is
smaller then the Apache timeout of 300 seconds.

I've tried to access the bucket with different clients - one thing is s3cmd
- which still is able to upload things, but takes rather long time, when
listing the contents.
Then I've  tried with s3fs-fuse - which throws
ls: reading directory .: Input/output error

Also Cyberduck and S3Browser show a similar behaivor.

Is there an option, to only send back maybe 1000 list entries, like Amazon
das? So that the client might decide, if he want's to list all the contents?



That how it works, it doesn't return more than 1000 entries at once.



Are there any timeout values in radosgw?


Are you sure the timeout is in the gateway itself? Could be apache
that is timing out. Will need to see the apache access logs for these
operations, radosgw debug and messenger logs (debug rgw = 20, debug ms
= 1), to give a better answer.

Yehuda


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean

2014-05-23 Thread Michael
64 PGs per pool /shouldn't/ cause any issues while there are only 3 
OSDs. It'll be something to pay attention to if a lot more get added 
though.


Your replication setup is probably using something other than host.
You'll want to extract your crush map, decompile it and see if your 
step is set to osd or rack.

If it's not host, then change it to that and pull it in again.

Check the docs on crush maps 
http://ceph.com/docs/master/rados/operations/crush-map/ for more info.
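The round trip looks roughly like this (file names are just examples):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt so the rule says: step chooseleaf firstn 0 type host
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new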


-Michael

On 23/05/2014 10:53, Karan Singh wrote:

Try increasing the placement groups for pools

ceph osd pool set data pg_num 128
ceph osd pool set data pgp_num 128

similarly for other 2 pools as well.

- karan -


On 23 May 2014, at 11:50, jan.zel...@id.unibe.ch 
mailto:jan.zel...@id.unibe.ch wrote:



Dear ceph,

I am trying to setup ceph 0.80.1 with the following components :

1 x mon - Debian Wheezy (i386)
3 x osds - Debian Wheezy (i386)

(all are kvm powered)

Status after the standard setup procedure :

root@ceph-node2:~# ceph -s
   cluster d079dd72-8454-4b4a-af92-ef4c424d96d8
health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 
192 pgs stuck unclean
monmap e1: 1 mons at {ceph-node1=192.168.123.48:6789/0}, election 
epoch 2, quorum 0 ceph-node1

osdmap e11: 3 osds: 3 up, 3 in
 pgmap v18: 192 pgs, 3 pools, 0 bytes data, 0 objects
   103 MB used, 15223 MB / 15326 MB avail
192 incomplete

root@ceph-node2:~# ceph health
HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck 
unclean


root@ceph-node2:~# ceph osd tree
# idweight  type name   up/down reweight
-1  0   root default
-2  0   host ceph-node2
0   0   osd.0   up  1
-3  0   host ceph-node3
1   0   osd.1   up  1
-4  0   host ceph-node4
2   0   osd.2   up  1


root@ceph-node2:~# ceph osd dump
epoch 11
fsid d079dd72-8454-4b4a-af92-ef4c424d96d8
created 2014-05-23 09:00:08.780211
modified 2014-05-23 09:01:33.438001
flags

pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags 
hashpspool crash_replay_interval 45 stripe_width 0


pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags 
hashpspool stripe_width 0


pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool 
stripe_width 0 max_osd 3


osd.0 up   in  weight 1 up_from 4 up_thru 5 down_at 0 
last_clean_interval [0,0) 192.168.123.49:6800/11373 
192.168.123.49:6801/11373 192.168.123.49:6802/11373 
192.168.123.49:6803/11373 exists,up 21a7d2a8-b709-4a28-bc3b-850913fe4c6b


osd.1 up   in  weight 1 up_from 8 up_thru 0 down_at 0 
last_clean_interval [0,0) 192.168.123.50:6800/10542 
192.168.123.50:6801/10542 192.168.123.50:6802/10542 
192.168.123.50:6803/10542 exists,up c1cd3ad1-b086-438f-a22d-9034b383a1be


osd.2 up   in  weight 1 up_from 11 up_thru 0 down_at 0 
last_clean_interval [0,0) 192.168.123.53:6800/6962 
192.168.123.53:6801/6962 192.168.123.53:6802/6962 
192.168.123.53:6803/6962 exists,up aa06d7e4-181c-4d70-bb8e-018b088c5053



What am I doing wrong here ?
Or what kind of additional information should be provided to get 
troubleshooted.


thanks,

---

Jan

P.S. with emperor 0.72.2 I had no such problems
___
ceph-users mailing list
ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] collectd / graphite / grafana .. calamari?

2014-05-23 Thread John Spray
Hi Ricardo,

Let me share a few notes on metrics in calamari:
 * We're bundling graphite, and using diamond to send home metrics.
The diamond collector used in calamari has always been open source
[1].
 * The Calamari UI has its own graphs page that talks directly to the
graphite API (the calamari REST API does not duplicate any of the
graphing interface)
 * We also bundle the default graphite dashboard, so that folks can go
to /graphite/dashboard/ on the calamari server to plot anything custom
they want to.

It could be quite interesting hook in Grafana there in the same way
that we currently hook in the default graphite dashboard, as it
grafana definitely nicer and would give us a roadmap to influxdb (a
project I am quite excited about).

Cheers,
John

1. https://github.com/ceph/Diamond/commits/calamari

On Fri, May 23, 2014 at 1:58 AM, Ricardo Rocha rocha.po...@gmail.com wrote:
 Hi.

 I saw the thread a couple days ago on ceph-users regarding collectd...
 and yes, i've been working on something similar for the last few days
 :)

 https://github.com/rochaporto/collectd-ceph

 It has a set of collectd plugins pushing metrics which mostly map what
 the ceph commands return. In the setup we have it pushes them to
 graphite and the displays rely on grafana (check for a screenshot in
 the link above).

 As it relies on common building blocks, it's easily extensible and
 we'll come up with new dashboards soon - things like plotting osd data
 against the metrics from the collectd disk plugin, which we also
 deploy.

 This email is mostly to share the work, but also to check on Calamari?
 I asked Patrick after the RedHat/Inktank news and have no idea what it
 provides, but i'm sure it comes with lots of extra sauce - he
 suggested to ask in the list.

 What's the timeline to have it open sourced? It would be great to have
 a look at it, and as there's work from different people in this area
 maybe start working together on some fancier monitoring tools.

 Regards,
   Ricardo
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw Timeout

2014-05-23 Thread Georg Höllrigl



On 22.05.2014 17:30, Craig Lewis wrote:

On 5/22/14 06:16 , Georg Höllrigl wrote:


I have created one bucket that holds many small files, separated into
different directories. But whenever I try to acess the bucket, I
only run into some timeout. The timeout is at around 30 - 100 seconds.
This is smaller then the Apache timeout of 300 seconds.


Just so we're all talking about the same things, what does many small
files mean to you?  Also, how are you separating them into
directories?  Are you just giving files in the same directory the
same leading string, like dir1_subdir1_filename?


I can only estimate how many files. ATM I have 25M files on the origin, but 
only 1/10th has been synced to radosgw. These are distributed through 20 
folders, each containing about 2k directories with ~100 - 500 files each.

Do you think that's too much for that use case?


I'm putting about 1M objects, random sizes, in each bucket.  I'm not
having problems getting individual files, or uploading new ones.  It
does take a long time for s3cmd to list the contents of the bucket. The
only time I get timeouts is when my cluster is very unhealthy.

If you're doing a lot more than that, say 10M or 100M objects, then that
could cause a hot spot on disk.  You might be better off taking your
directories, and putting them in their own bucket.


--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com mailto:cle...@centraldesktop.com

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website http://www.centraldesktop.com/  | Twitter
http://www.twitter.com/centraldesktop  | Facebook
http://www.facebook.com/CentralDesktop  | LinkedIn
http://www.linkedin.com/groups?gid=147417  | Blog
http://cdblog.centraldesktop.com/



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Occasional Missing Admin Sockets

2014-05-23 Thread Loic Dachary
Hi Mike,

Sorry I missed this message. Are you able to reproduce the problem? Does it 
always happen when you logrotate --force, or only sometimes?

Cheers

On 13/05/2014 21:23, Gregory Farnum wrote:
 Yeah, I just did so. :(
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com
 
 
 On Tue, May 13, 2014 at 11:41 AM, Mike Dawson mike.daw...@cloudapt.com 
 wrote:
 Greg/Loic,

 I can confirm that logrotate --force /etc/logrotate.d/ceph removes the
 monitor admin socket on my boxes running 0.80.1 just like the description in
 Issue 7188 [0].

 0: http://tracker.ceph.com/issues/7188

 Should that bug be reopened?

 Thanks,
 Mike Dawson



 On 5/13/2014 2:10 PM, Gregory Farnum wrote:

 On Tue, May 13, 2014 at 9:06 AM, Mike Dawson mike.daw...@cloudapt.com
 wrote:

 All,

 I have a recurring issue where the admin sockets
 (/var/run/ceph/ceph-*.*.asok) may vanish on a running cluster while the
 daemons keep running


 Hmm.

 (or restart without my knowledge).


 I'm guessing this might be involved:

 I see this issue on
 a dev cluster running Ubuntu and Ceph Emperor/Firefly, deployed with
 ceph-deploy using Upstart to control daemons. I never see this issue on
 Ubuntu / Dumpling / sysvinit.


 *goes and greps the git log*

 I'm betting it was commit 45600789f1ca399dddc5870254e5db883fb29b38
 (which has, in fact, been backported to dumpling and emperor),
 intended so that turning on a new daemon wouldn't remove the admin
 socket of an existing one. But I think that means that if you activate
 the new daemon before the old one has finished shutting down and
 unlinking, you would end up with a daemon that had no admin socket.
 Perhaps it's an incomplete fix and we need a tracker ticket?
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com



-- 
Loïc Dachary, Artisan Logiciel Libre



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Occasional Missing Admin Sockets

2014-05-23 Thread Loic Dachary


On 13/05/2014 20:10, Gregory Farnum wrote:
 On Tue, May 13, 2014 at 9:06 AM, Mike Dawson mike.daw...@cloudapt.com wrote:
 All,

 I have a recurring issue where the admin sockets
 (/var/run/ceph/ceph-*.*.asok) may vanish on a running cluster while the
 daemons keep running
 
 Hmm.
 
 (or restart without my knowledge).
 
 I'm guessing this might be involved:
 
 I see this issue on
 a dev cluster running Ubuntu and Ceph Emperor/Firefly, deployed with
 ceph-deploy using Upstart to control daemons. I never see this issue on
 Ubuntu / Dumpling / sysvinit.
 
 *goes and greps the git log*
 
 I'm betting it was commit 45600789f1ca399dddc5870254e5db883fb29b38
 (which has, in fact, been backported to dumpling and emperor),
 intended so that turning on a new daemon wouldn't remove the admin
 socket of an existing one. But I think that means that if you activate
 the new daemon before the old one has finished shutting down and
 unlinking, you would end up with a daemon that had no admin socket.
 Perhaps it's an incomplete fix and we need a tracker ticket?

https://github.com/ceph/ceph/commit/45600789f1ca399dddc5870254e5db883fb29b38

I see the race condition now, missed it the first time around, thanks Greg :-) 
I'll work on it.

Cheers

 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com
 

-- 
Loïc Dachary, Artisan Logiciel Libre



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean

2014-05-23 Thread Alexandre DERUMIER
Hi,

if you use debian,

try to use a recent kernel from backport (3.10)

also check your libleveldb1 version, it should be 1.9.0-1~bpo70+1  (debian 
wheezy version is too old)

I don't see it in ceph repo:
http://ceph.com/debian-firefly/pool/main/l/leveldb/

(only for squeeze ~bpo60+1)

but you can take it from our proxmox repository
http://download.proxmox.com/debian/dists/wheezy/pve-no-subscription/binary-amd64/libleveldb1_1.9.0-1~bpo70+1_amd64.deb


- Mail original - 

De: jan zeller jan.zel...@id.unibe.ch 
À: ceph-users@lists.ceph.com 
Envoyé: Vendredi 23 Mai 2014 10:50:40 
Objet: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean 

Dear ceph, 

I am trying to setup ceph 0.80.1 with the following components : 

1 x mon - Debian Wheezy (i386) 
3 x osds - Debian Wheezy (i386) 

(all are kvm powered) 

Status after the standard setup procedure : 

root@ceph-node2:~# ceph -s 
cluster d079dd72-8454-4b4a-af92-ef4c424d96d8 
health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck 
unclean 
monmap e1: 1 mons at {ceph-node1=192.168.123.48:6789/0}, election epoch 2, 
quorum 0 ceph-node1 
osdmap e11: 3 osds: 3 up, 3 in 
pgmap v18: 192 pgs, 3 pools, 0 bytes data, 0 objects 
103 MB used, 15223 MB / 15326 MB avail 
192 incomplete 

root@ceph-node2:~# ceph health 
HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean 

root@ceph-node2:~# ceph osd tree 
# id weight type name up/down reweight 
-1 0 root default 
-2 0 host ceph-node2 
0 0 osd.0 up 1 
-3 0 host ceph-node3 
1 0 osd.1 up 1 
-4 0 host ceph-node4 
2 0 osd.2 up 1 


root@ceph-node2:~# ceph osd dump 
epoch 11 
fsid d079dd72-8454-4b4a-af92-ef4c424d96d8 
created 2014-05-23 09:00:08.780211 
modified 2014-05-23 09:01:33.438001 
flags 

pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool 
crash_replay_interval 45 stripe_width 0 

pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool 
stripe_width 0 

pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0 
max_osd 3 

osd.0 up in weight 1 up_from 4 up_thru 5 down_at 0 last_clean_interval [0,0) 
192.168.123.49:6800/11373 192.168.123.49:6801/11373 192.168.123.49:6802/11373 
192.168.123.49:6803/11373 exists,up 21a7d2a8-b709-4a28-bc3b-850913fe4c6b 

osd.1 up in weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval [0,0) 
192.168.123.50:6800/10542 192.168.123.50:6801/10542 192.168.123.50:6802/10542 
192.168.123.50:6803/10542 exists,up c1cd3ad1-b086-438f-a22d-9034b383a1be 

osd.2 up in weight 1 up_from 11 up_thru 0 down_at 0 last_clean_interval [0,0) 
192.168.123.53:6800/6962 192.168.123.53:6801/6962 192.168.123.53:6802/6962 
192.168.123.53:6803/6962 exists,up aa06d7e4-181c-4d70-bb8e-018b088c5053 


What am I doing wrong here ? 
Or what kind of additional information should be provided to get 
troubleshooted. 

thanks, 

--- 

Jan 

P.S. with emperor 0.72.2 I had no such problems 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about osd objectstore = keyvaluestore-dev setting

2014-05-23 Thread Geert Lindemulder
Hello Greg and Haomai,

Thanks for the answers.
I was trying to implement the osd leveldb backend at an existing ceph
test cluster.

At the moment i am removing the osd's one by one and recreate them with
the objectstore = keyvaluestore-dev option in place in ceph.conf.
This works fine and the backend is leveldb now for the new osd's.
The leveldb backend looks more efficient.
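Per OSD the sequence is roughly the following (N is the osd id; the recreate 
step depends on how the OSD was deployed in the first place):

ceph osd out N
# wait for rebalancing to finish, stop the ceph-osd daemon, then:
ceph osd crush remove osd.N
ceph auth del osd.N
ceph osd rm N
# recreate it with 'osd objectstore = keyvaluestore-dev' set in ceph.conf,
# e.g. via ceph-deploy osd create <host>:<disk>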

The error gave me the idea that migrating an osd from a non-leveldb backend
to the new leveldb type was possible.
Will online migration of existing osd's be added in the future?

Thanks,
Geert

On 05/23/2014 11:31 AM, GMail wrote:
 implement the osd leveldb backend at an existing ceph test
  cluster.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean

2014-05-23 Thread jan.zeller
 -Ursprüngliche Nachricht-
 Von: Alexandre DERUMIER [mailto:aderum...@odiso.com]
 Gesendet: Freitag, 23. Mai 2014 13:20
 An: Zeller, Jan (ID)
 Cc: ceph-users@lists.ceph.com
 Betreff: Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck
 unclean
 
 Hi,
 
 if you use debian,
 
 try to use a recent kernel from backport (3.10)
 
 also check your libleveldb1 version, it should be 1.9.0-1~bpo70+1  (debian
 wheezy version is too old)
 
 I don't see it in ceph repo:
 http://ceph.com/debian-firefly/pool/main/l/leveldb/
 
 (only for squeeze ~bpo60+1)
 
 but you can take it from our proxmox repository
 http://download.proxmox.com/debian/dists/wheezy/pve-no-
 subscription/binary-amd64/libleveldb1_1.9.0-1~bpo70+1_amd64.deb
 

Thanks Alexandre; because of this I'll try the whole setup on Ubuntu 12.04.
Maybe it's going to be a bit easier...

---

jan


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean

2014-05-23 Thread Alexandre DERUMIER
thanks Alexandre, due to this I'll try the whole setup on Ubuntu 12.04. 
May be it's going to be a bit more easier... 

Yes, I think you can use the latest Ubuntu LTS. I think ceph 0.79 is officially 
supported, so it should not be a problem for firefly.


- Mail original - 

De: jan zeller jan.zel...@id.unibe.ch 
À: aderum...@odiso.com 
Cc: ceph-users@lists.ceph.com 
Envoyé: Vendredi 23 Mai 2014 13:36:04 
Objet: AW: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean 

 -Ursprüngliche Nachricht- 
 Von: Alexandre DERUMIER [mailto:aderum...@odiso.com] 
 Gesendet: Freitag, 23. Mai 2014 13:20 
 An: Zeller, Jan (ID) 
 Cc: ceph-users@lists.ceph.com 
 Betreff: Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck 
 unclean 
 
 Hi, 
 
 if you use debian, 
 
 try to use a recent kernel from backport (3.10) 
 
 also check your libleveldb1 version, it should be 1.9.0-1~bpo70+1 (debian 
 wheezy version is too old) 
 
 I don't see it in ceph repo: 
 http://ceph.com/debian-firefly/pool/main/l/leveldb/ 
 
 (only for squeeze ~bpo60+1) 
 
 but you can take it from our proxmox repository 
 http://download.proxmox.com/debian/dists/wheezy/pve-no- 
 subscription/binary-amd64/libleveldb1_1.9.0-1~bpo70+1_amd64.deb 
 

thanks Alexandre, due to this I'll try the whole setup on Ubuntu 12.04. 
May be it's going to be a bit more easier... 

--- 

jan 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to create authentication signature for getting user details

2014-05-23 Thread Shanil S
Hi All,

I would like to create a function for getting the user details by passing a
user id (id) using PHP and curl. I am planning to pass the user id
'admin' (admin is a user which is already there) and get the details of
that user. Could you please tell me how to create the authentication
signature for this? I tried the approach described in
http://mashupguide.net/1.0/html/ch16s05.xhtml#ftn.d0e27318 but it's not
working and I get a Failed to authenticate error (this is because the
signature is not generated properly).

If anyone knows a proper way to generate the authentication signature using
PHP, please help me solve this.
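For reference, the request I am trying to build is the usual S3-style (v2) 
signed request; sketched with curl/openssl instead of PHP it looks roughly 
like this (access key, secret key and host are placeholders):

access_key=PLACEHOLDER_ACCESS_KEY
secret_key=PLACEHOLDER_SECRET_KEY
httpdate=$(date -Ru)
resource="/admin/user"
string_to_sign=$(printf "GET\n\n\n%s\n%s" "$httpdate" "$resource")
signature=$(printf "%s" "$string_to_sign" | openssl sha1 -hmac "$secret_key" -binary | base64)
curl -H "Date: $httpdate" -H "Authorization: AWS ${access_key}:${signature}" \
     "http://rgw.example.com/admin/user?uid=admin&format=json"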

-- 
Regards
Shanil
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about osd objectstore = keyvaluestore-dev setting

2014-05-23 Thread Wang Haomai


Best Wishes!

 在 2014年5月23日,19:27,Geert Lindemulder glindemul...@snow.nl 写道:
 
 Hello Greg and Haomai,
 
 Thanks for the answers.
 I was trying to implement the osd leveldb backend at an existing ceph
 test cluster.
 
 At the moment i am removing the osd's one by one and recreate them with
 the objectstore = keyvaluestore-dev option in place in ceph.conf.
 This works fine and the backend is leveldb now for the new osd's.
 The leveldb backend looks more efficient.

Happy to see it, although I'm still trying to improve performance for some 
workloads.

 
 The error gave me the idea that migrating from non-leveldb backend osd
 to new type leveldb was possible.
 Will online migration of existings osd's be added in the future?

Not yet. I think it's a good feature; we could implement it at the ObjectStore 
class level and simply convert one type to another.
 
 Thanks,
 Geert
 
 On 05/23/2014 11:31 AM, GMail wrote:
 implement the osd leveldb backend at an existing ceph test
 cluster.
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to get Object ID ?

2014-05-23 Thread Shashank Puntamkar
I want to know/read the object ID assigned by ceph to a file which I
transferred via crossftp.
How can I read the 64-bit object ID?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osd pool default pg num problem

2014-05-23 Thread Cao, Buddy
In Firefly, I added the lines below to the [global] section in ceph.conf; however, 
after creating the cluster, the default pools (metadata/data/rbd) still have a pg num 
over 900, not 375. Any suggestions?


osd pool default pg num = 375
osd pool default pgp num = 375


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Pool snaps

2014-05-23 Thread Thorwald Lundqvist
Hi!

I can't find any information about ceph osd pool snapshots, except for the
commands mksnap and rmsnap.

What features do snapshots enable? Can I do things such as
diff export/import just like rbd can?
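The only operations I have found so far are along these lines (pool and object 
names are just examples):

ceph osd pool mksnap rbd mysnap
rados -p rbd -s mysnap get myobject /tmp/myobject.old   # read an object as it was at snapshot time
ceph osd pool rmsnap rbd mysnap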


Thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean

2014-05-23 Thread jan.zeller
Thanks for your tips & tricks.



This setup is now based on ubuntu 12.04, ceph version 0.80.1



Still using



1 x mon

3 x osds





root@ceph-node2:~# ceph osd tree
# id    weight  type name       up/down reweight
-1      0       root default
-2      0       host ceph-node2
0       0       osd.0   up      1
-3      0       host ceph-node3
1       0       osd.1   up      1
-4      0       host ceph-node1
2       0       osd.2   up      1



root@ceph-node2:~# ceph -s
    cluster c30e1410-fe1a-4924-9112-c7a5d789d273
     health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
     monmap e1: 1 mons at {ceph-node1=192.168.123.48:6789/0}, election epoch 2, quorum 0 ceph-node1
     osdmap e11: 3 osds: 3 up, 3 in
      pgmap v18: 192 pgs, 3 pools, 0 bytes data, 0 objects
            102 MB used, 15224 MB / 15326 MB avail
                 192 incomplete







root@ceph-node2:~# cat mycrushmap.txt
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host ceph-node2 {
    id -2    # do not change unnecessarily
    # weight 0.000
    alg straw
    hash 0   # rjenkins1
    item osd.0 weight 0.000
}
host ceph-node3 {
    id -3    # do not change unnecessarily
    # weight 0.000
    alg straw
    hash 0   # rjenkins1
    item osd.1 weight 0.000
}
host ceph-node1 {
    id -4    # do not change unnecessarily
    # weight 0.000
    alg straw
    hash 0   # rjenkins1
    item osd.2 weight 0.000
}
root default {
    id -1    # do not change unnecessarily
    # weight 0.000
    alg straw
    hash 0   # rjenkins1
    item ceph-node2 weight 0.000
    item ceph-node3 weight 0.000
    item ceph-node1 weight 0.000
}

# rules
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map





Is there anything wrong with it ?







root@ceph-node2:~# ceph osd dump
epoch 11
fsid c30e1410-fe1a-4924-9112-c7a5d789d273
created 2014-05-23 15:16:57.772981
modified 2014-05-23 15:18:17.022152
flags

pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0
max_osd 3

osd.0 up   in  weight 1 up_from 4 up_thru 5 down_at 0 last_clean_interval [0,0) 192.168.123.49:6800/4714 192.168.123.49:6801/4714 192.168.123.49:6802/4714 192.168.123.49:6803/4714 exists,up bc991a4b-9e60-4759-b35a-7f58852aa804
osd.1 up   in  weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval [0,0) 192.168.123.50:6800/4685 192.168.123.50:6801/4685 192.168.123.50:6802/4685 192.168.123.50:6803/4685 exists,up bd099d83-2483-42b9-9dbc-7f4e4043ca60
osd.2 up   in  weight 1 up_from 11 up_thru 0 down_at 0 last_clean_interval [0,0) 192.168.123.53:6800/16807 192.168.123.53:6801/16807 192.168.123.53:6802/16807 192.168.123.53:6803/16807 exists,up 80a302d0-3493-4c39-b34b-5af233b32ba1





thanks

Von: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] Im Auftrag von 
Michael
Gesendet: Freitag, 23. Mai 2014 12:36
An: ceph-users@lists.ceph.com
Betreff: Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean

64 PGs per pool shouldn't cause any issues while there are only 3 OSDs. It'll 
be something to pay attention to if a lot more get added though.

Your replication setup is probably using something other than host.
You'll want to extract your crush map, decompile it and see if your step 
is set to osd or rack.
If it's not host, then change it to that and pull it in again.

Check the docs on 

Re: [ceph-users] Unable to update Swift ACL's on existing containers

2014-05-23 Thread James Page

Hi Yehuda

On 23/05/14 02:25, Yehuda Sadeh wrote:
 That looks like a bug; generally the permission checks there are 
 broken. I opened issue #8428, and pushed a fix on top of the
 firefly branch to wip-8428.

I cherry picked the fix and tested - LGTM.

Thanks for the quick fix.

Cheers

James

- -- 
James Page
Ubuntu and Debian Developer
james.p...@ubuntu.com
jamesp...@debian.org
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd pool default pg num problem

2014-05-23 Thread John Spray
Those settings are applied when creating new pools with osd pool
create, but not to the pools that are created automatically during
cluster setup.
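If the pools are still empty, the simplest workaround is probably to delete and 
recreate them with an explicit pg count (careful: the delete is destructive), e.g.:

ceph osd pool delete data data --yes-i-really-really-mean-it
ceph osd pool create data 375 375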

We've had the same question before
(http://comments.gmane.org/gmane.comp.file-systems.ceph.user/8150), so
maybe it's worth opening a ticket to do something about it.

Cheers,
John

On Fri, May 23, 2014 at 2:01 PM, Cao, Buddy buddy@intel.com wrote:
 In Firefly, I added below lines to [global] section in ceph.conf, however,
 after creating the cluster, the default pool “metadata/data/rbd”’s pg num is
 still over 900 but not 375.  Any suggestion?





 osd pool default pg num = 375

 osd pool default pgp num = 375






 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to find the disk partitions attached to a OSD

2014-05-23 Thread Sharmila Govind
Thanks:-) That helped.

Thanks & Regards,
Sharmila


On Thu, May 22, 2014 at 6:41 PM, Alfredo Deza alfredo.d...@inktank.comwrote:

 Hopefully I am not late to the party :)

 But ceph-deploy recently gained a `osd list` subcommand that does this
 plus a bunch of other interesting metadata:

 $ ceph-deploy osd list node1
 [ceph_deploy.conf][DEBUG ] found configuration file at:
 /Users/alfredo/.cephdeploy.conf
 [ceph_deploy.cli][INFO  ] Invoked (1.5.2):
 /Users/alfredo/.virtualenvs/ceph-deploy/bin/ceph-deploy osd list node1
 [node1][DEBUG ] connected to host: node1
 [node1][DEBUG ] detect platform information from remote host
 [node1][DEBUG ] detect machine type
 [node1][INFO  ] Running command: sudo ceph --cluster=ceph osd tree
 --format=json
 [node1][DEBUG ] connected to host: node1
 [node1][DEBUG ] detect platform information from remote host
 [node1][DEBUG ] detect machine type
 [node1][INFO  ] Running command: sudo ceph-disk list
 [node1][INFO  ] 
 [node1][INFO  ] ceph-0
 [node1][INFO  ] 
 [node1][INFO  ] Path   /var/lib/ceph/osd/ceph-0
 [node1][INFO  ] ID 0
 [node1][INFO  ] Name   osd.0
 [node1][INFO  ] Status up
 [node1][INFO  ] Reweight   1.00
 [node1][INFO  ] Magic  ceph osd volume v026
 [node1][INFO  ] Journal_uuid   214a6865-416b-4c09-b031-a354d4f8bdff
 [node1][INFO  ] Active ok
 [node1][INFO  ] Device /dev/sdb1
 [node1][INFO  ] Whoami 0
 [node1][INFO  ] Journal path   /dev/sdb2
 [node1][INFO  ] 

 On Thu, May 22, 2014 at 8:30 AM, John Spray john.sp...@inktank.com
 wrote:
  On Thu, May 22, 2014 at 10:57 AM, Sharmila Govind
  sharmilagov...@gmail.com wrote:
  root@cephnode4:/mnt/ceph/osd2# mount |grep ceph
  /dev/sdc on /mnt/ceph/osd3 type ext4 (rw)
  /dev/sdb on /mnt/ceph/osd2 type ext4 (rw)
 
  All the above commands just pointed out the mount
 points(/mnt/ceph/osd3),
  the folders were named by me as ceph/osd. But, if a new user has to get
 the
  osd mapping to the mounted devices, would be difficult if we named the
 osd
  disk folders differently. Any other command which could give the mapping
  would be useful.
 
  It really depends on how you have set up the OSDs.  If you're using
  ceph-deploy or ceph-disk to partition and format the drives, they get
  a special partition type set which marks them as a Ceph OSD.  On a
  system set up that way, you get nice uniform output like this:
 
  # ceph-disk list
  /dev/sda :
   /dev/sda1 other, ext4, mounted on /boot
   /dev/sda2 other, LVM2_member
  /dev/sdb :
   /dev/sdb1 ceph data, active, cluster ceph, osd.0, journal /dev/sdb2
   /dev/sdb2 ceph journal, for /dev/sdb1
  /dev/sdc :
   /dev/sdc1 ceph data, active, cluster ceph, osd.3, journal /dev/sdc2
   /dev/sdc2 ceph journal, for /dev/sdc1
 
  John
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to backup mon-data?

2014-05-23 Thread Fabian Zimmermann
Hello,

I’m running a 3 node cluster with 2 hdd/osd and one mon on each node.
Sadly the fsyncs done by mon-processes eat my hdd.

I was able to disable this impact by moving the mon-data-dir to ramfs.
This should work until at least 2 nodes are running, but I want to implement 
some kind of disaster recovery.

What’s the correct way to backup mon-data - if there is any?

Thanks,

Fabian



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to backup mon-data?

2014-05-23 Thread Dan Van Der Ster
Hi,
I think you’re rather brave (sorry, foolish) to store the mon data dir in 
ramfs. One power outage and your cluster is dead. Even with good backups of the 
data dir I wouldn't want to go through that exercise.

Saying that, we had a similar disk-io-bound problem with the mon data dirs, and 
solved it by moving the mons to SSDs. Maybe in your case using the cfq io 
scheduler would help, since at least then the OSD and MON processes would get 
fair shares of the disk IOs.

Anyway, to backup the data dirs, you need to stop the mon daemon to get a 
consistent leveldb before copying the data to a safe place.
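A minimal sketch of that procedure, assuming the default data path
/var/lib/ceph/mon/ceph-a, a monitor id of "a" and sysvinit-style init scripts
(adjust to your distro; the /backup target is just an example):

  service ceph stop mon.a
  tar czf /backup/mon-a-$(date +%F).tar.gz /var/lib/ceph/mon/ceph-a
  service ceph start mon.a

Do this one monitor at a time, so the remaining mons keep quorum.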
Cheers, Dan

-- Dan van der Ster || Data & Storage Services || CERN IT Department --


On 23 May 2014, at 15:45, Fabian Zimmermann f.zimmerm...@xplosion.de wrote:

 Hello,
 
 I’m running a 3 node cluster with 2 hdd/osd and one mon on each node.
 Sadly the fsyncs done by mon-processes eat my hdd.
 
 I was able to disable this impact by moving the mon-data-dir to ramfs.
 This should work until at least 2 nodes are running, but I want to implement 
 some kind of disaster recover.
 
 What’s the correct way to backup mon-data - if there is any?
 
 Thanks,
 
 Fabian
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] network Ports Linked to each OSD process

2014-05-23 Thread Sharmila Govind
Hi,

Iam trying to do some network control on the storage nodes. For this, I
need to know the ports opened for communication by each OSD processes.

I learned from the link
http://ceph.com/docs/master/rados/configuration/network-config-ref/ that
each OSD process requires 3 ports, and that ports from 6800 onwards are
reserved for OSD processes.
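If you need to constrain that range for firewalling or traffic shaping, the
bind range can be narrowed in ceph.conf; a sketch (the values are only an
example, pick a window large enough for all daemons on the host):

  [osd]
      ms bind port min = 6800
      ms bind port max = 6850

Daemons restarted after that change should only bind ports inside the window.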

However, when I do a ceph osd dump command, it lists 4 ports in use for
each of the OSDs:

root@cephnode2:~# ceph osd dump | grep osd
max_osd 4
osd.0 up   in  weight 1 up_from 71 up_thru 71 down_at 68 last_clean_interval [4,70) 10.223.169.166:6800/83380 10.223.169.166:6810/1083380 10.223.169.166:6811/1083380 10.223.169.166:6812/1083380 exists,up fdbbc6eb-7d9f-4ad8-a8c3-caf995422528
osd.1 up   in  weight 1 up_from 7 up_thru 71 down_at 0 last_clean_interval [0,0) 10.223.169.201:6800/83569 10.223.169.201:6801/83569 10.223.169.201:6802/83569 10.223.169.201:6803/83569 exists,up db545fd7-071f-4671-b1c4-c57221f894a3
osd.2 up   in  weight 1 up_from 64 up_thru 64 down_at 61 last_clean_interval [12,60) 10.223.169.166:6805/92402 10.223.169.166:6806/92402 10.223.169.166:6807/92402 10.223.169.166:6808/92402 exists,up 594b73b9-1908-4757-b914-d887d850b386
osd.3 up   in  weight 1 up_from 17 up_thru 71 down_at 0 last_clean_interval [0,0) 10.223.169.201:6805/84590 10.223.169.201:6806/84590 10.223.169.201:6807/84590 10.223.169.201:6808/84590 exists,up 37536050-ef92-4eba-95a7-e7a099c6d059
root@cephnode2:~#



I also listed the listening ports of the first OSD process above (osd.0,
pid 83380) using lsof:

 root@cephnode2:~/nethogs# lsof -i | grep ceph | grep 83380
ntpd        1627  ntp   19u  IPv4   33890  0t0  UDP cephnode2.iind.intel.com:ntp
ceph-osd   83380 root    4u  IPv4 4881747  0t0  TCP *:6800 (LISTEN)
ceph-osd   83380 root    5u  IPv4 5045544  0t0  TCP cephnode2.iind.intel.com:6810 (LISTEN)
ceph-osd   83380 root    6u  IPv4 5045545  0t0  TCP cephnode2.iind.intel.com:6811 (LISTEN)
ceph-osd   83380 root    7u  IPv4 5045546  0t0  TCP cephnode2.iind.intel.com:6812 (LISTEN)
ceph-osd   83380 root    8u  IPv4 4881751  0t0  TCP *:6804 (LISTEN)
ceph-osd   83380 root   19u  IPv4 5101954  0t0  TCP cephnode2.iind.intel.com:6800->computeich.iind.intel.com:60781 (ESTABLISHED)
ceph-osd   83380 root   23u  IPv4 5013387  0t0  TCP cephnode2.iind.intel.com:41878->cephnode4.iind.intel.com:6803 (ESTABLISHED)
ceph-osd   83380 root   25u  IPv4 5037728  0t0  TCP cephnode2.iind.intel.com:44251->cephnode4.iind.intel.com:6802 (ESTABLISHED)
ceph-osd   83380 root   83u  IPv4 5025954  0t0  TCP cephnode2.iind.intel.com:47863->cephnode4.iind.intel.com:6808 (ESTABLISHED)
ceph-osd   83380 root  111u  IPv4 4850005  0t0  TCP cephnode2.iind.intel.com:43189->cephnode2.iind.intel.com:6807 (ESTABLISHED)
ceph-osd   83380 root  112u  IPv4 4850839  0t0  TCP cephnode2.iind.intel.com:59738->cephnode2.iind.intel.com:6808 (ESTABLISHED)
ceph-osd   83380 root  130u  IPv4 5037729  0t0  TCP cephnode2.iind.intel.com:41902->cephnode4.iind.intel.com:6807 (ESTABLISHED)
ceph-osd   83380 root  152u  IPv4 5013621  0t0  TCP cephnode2.iind.intel.com:34798->cephmon.iind.intel.com:6789 (ESTABLISHED)
ceph-osd   83380 root  159u  IPv4 5040569  0t0  TCP cephnode2.iind.intel.com:6811->cephnode4.iind.intel.com:35321 (ESTABLISHED)
ceph-osd   83380 root  160u  IPv4 5040570  0t0  TCP cephnode2.iind.intel.com:6812->cephnode4.iind.intel.com:42682 (ESTABLISHED)
ceph-osd   83380 root  161u  IPv4 5043767  0t0  TCP cephnode2.iind.intel.com:6812->cephnode4.iind.intel.com:42683 (ESTABLISHED)
ceph-osd   83380 root  162u  IPv4 5038664  0t0  TCP cephnode2.iind.intel.com:6811->cephnode4.iind.intel.com:35324 (ESTABLISHED)


In the above list, it looks like the process is listening on some additional
ports (6810-6812) compared to what is listed in the ceph osd dump command.

I would like to know if there is any straightforward way of listing the ports
used by each OSD process.
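One way to narrow it down per OSD (a sketch, reusing the pid 83380 that appears
in the osd dump address 6800/83380 above) is to ask lsof for just that
process's listening sockets:

  sudo lsof -a -p 83380 -i -sTCP:LISTEN

That prints only the TCP ports the osd.0 process is bound to, without the
established connections.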

I would also like to understand the networking architecture of Ceph
in more detail. Is there any link/doc for that?

Thanks in Advance,
Sharmila
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to backup mon-data?

2014-05-23 Thread Wido den Hollander

On 05/23/2014 04:09 PM, Dan Van Der Ster wrote:

Hi,
I think you’re rather brave (sorry, foolish) to store the mon data dir in 
ramfs. One power outage and your cluster is dead. Even with good backups of the 
data dir I wouldn't want to go through that exercise.



Agreed. Foolish. I'd never do that.


Saying that, we had a similar disk-io-bound problem with the mon data dirs, and 
solved it by moving the mons to SSDs. Maybe in your case using the cfq io 
scheduler would help, since at least then the OSD and MON processes would get 
fair shares of the disk IOs.

Anyway, to backup the data dirs, you need to stop the mon daemon to get a 
consistent leveldb before copying the data to a safe place.


I wrote a blog about this: 
http://blog.widodh.nl/2014/03/safely-backing-up-your-ceph-monitors/


Wido


Cheers, Dan

-- Dan van der Ster || Data & Storage Services || CERN IT Department --


On 23 May 2014, at 15:45, Fabian Zimmermann f.zimmerm...@xplosion.de wrote:


Hello,

I’m running a 3 node cluster with 2 hdd/osd and one mon on each node.
Sadly the fsyncs done by mon-processes eat my hdd.

I was able to disable this impact by moving the mon-data-dir to ramfs.
This should work until at least 2 nodes are running, but I want to implement 
some kind of disaster recover.

What’s the correct way to backup mon-data - if there is any?

Thanks,

Fabian

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow IOPS on RBD compared to journal and backing devices

2014-05-23 Thread Christian Balzer

For what it's worth (very little in my case)...

Since the cluster wasn't in production yet and Firefly (0.80.1) did hit
Debian Jessie today I upgraded it.

Big mistake...

I did the recommended upgrade song and dance, MONs first, OSDs after that.

Then applied ceph osd crush tunables default as per the update
instructions and since ceph -s was whining about it.

Lastly I did a ceph osd pool set rbd hashpspool true and after that was
finished (people with either a big cluster or slow network probably should
avoid this like the plague) I re-ran the below fio from a VM (old or new
client libraries made no difference) again.

The result, 2800 write IOPS instead of 3200 with Emperor.

So much for improved latency and whatnot...
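For anyone who wants to reproduce this kind of number, a 4k random-write fio
run along these lines should do (parameters are illustrative, not necessarily
the exact statement referenced below; /dev/vdb is a placeholder for the
RBD-backed disk inside the VM and will be overwritten):

  fio --name=randwrite --ioengine=libaio --direct=1 --rw=randwrite \
      --bs=4k --iodepth=32 --numjobs=1 --runtime=60 --filename=/dev/vdb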

Christian

On Wed, 14 May 2014 21:33:06 +0900 Christian Balzer wrote:

 
 Hello!
 
 On Wed, 14 May 2014 11:29:47 +0200 Josef Johansson wrote:
 
  Hi Christian,
  
  I missed this thread, haven't been reading the list that well the last
  weeks.
  
  You already know my setup, since we discussed it in an earlier thread.
  I don't have a fast backing store, but I see the slow IOPS when doing
  randwrite inside the VM, with rbd cache. Still running dumpling here
  though.
  
 Nods, I do recall that thread.
 
  A thought struck me that I could test with a pool that consists of OSDs
  that have tempfs-based disks, think I have a bit more latency than your
  IPoIB but I've pushed 100k IOPS with the same network devices before.
  This would verify if the problem is with the journal disks. I'll also
  try to run the journal devices in tempfs as well, as it would test
  purely Ceph itself.
 
 That would be interesting indeed.
 Given what I've seen (with the journal at 20% utilization and the actual
  filestore at around 5%) I'd expect Ceph to be the culprit. 
  
  I'll get back to you with the results, hopefully I'll manage to get
  them done during this night.
 
 Looking forward to that. ^^
 
 
 Christian 
  Cheers,
  Josef
  
  On 13/05/14 11:03, Christian Balzer wrote:
   I'm clearly talking to myself, but whatever.
  
   For Greg, I've played with all the pertinent journal and filestore
   options and TCP nodelay, no changes at all.
  
   Is there anybody on this ML who's running a Ceph cluster with a fast
   network and FAST filestore, so like me with a big HW cache in front
   of a RAID/JBODs or using SSDs for final storage?
  
   If so, what results do you get out of the fio statement below per
   OSD? In my case with 4 OSDs and 3200 IOPS that's about 800 IOPS per
    OSD, which is of course vastly faster than the normal individual HDDs 
   could do.
  
   So I'm wondering if I'm hitting some inherent limitation of how fast
   a single OSD (as in the software) can handle IOPS, given that
   everything else has been ruled out from where I stand.
  
   This would also explain why none of the option changes or the use of
   RBD caching has any measurable effect in the test case below. 
   As in, a slow OSD aka single HDD with journal on the same disk would
   clearly benefit from even the small 32MB standard RBD cache, while in
   my test case the only time the caching becomes noticeable is if I
   increase the cache size to something larger than the test data size.
   ^o^
  
   On the other hand if people here regularly get thousands or tens of
   thousands IOPS per OSD with the appropriate HW I'm stumped. 
  
   Christian
  
   On Fri, 9 May 2014 11:01:26 +0900 Christian Balzer wrote:
  
   On Wed, 7 May 2014 22:13:53 -0700 Gregory Farnum wrote:
  
   Oh, I didn't notice that. I bet you aren't getting the expected
   throughput on the RAID array with OSD access patterns, and that's
   applying back pressure on the journal.
  
   In the a picture being worth a thousand words tradition, I give
   you this iostat -x output taken during a fio run:
  
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              50.82    0.00   19.43    0.17    0.00   29.58
   
    Device:  rrqm/s   wrqm/s     r/s      w/s    rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    sda        0.00    51.50    0.00  1633.50     0.00   7460.00     9.13     0.18    0.11    0.00    0.11   0.01   1.40
    sdb        0.00     0.00    0.00  1240.50     0.00   5244.00     8.45     0.30    0.25    0.00    0.25   0.02   2.00
    sdc        0.00     5.00    0.00  2468.50     0.00  13419.00    10.87     0.24    0.10    0.00    0.10   0.09  22.00
    sdd        0.00     6.50    0.00  1913.00     0.00  10313.00    10.78     0.20    0.10    0.00    0.10   0.09  16.60
  
   The %user CPU utilization is pretty much entirely the 2 OSD
   processes, note the nearly complete absence of iowait.
  
   sda and sdb are the OSDs RAIDs, sdc and sdd are the journal SSDs.
   Look at these numbers, the lack of queues, the low wait and service
   times (this is in ms) plus overall utilization.
  
   The only conclusion I can draw from these numbers and the network
   results below is that the latency happens 

Re: [ceph-users] How to backup mon-data?

2014-05-23 Thread Fabian Zimmermann
Hi,

Am 23.05.2014 um 16:09 schrieb Dan Van Der Ster daniel.vanders...@cern.ch:

 Hi,
 I think you’re rather brave (sorry, foolish) to store the mon data dir in 
 ramfs. One power outage and your cluster is dead. Even with good backups of 
 the data dir I wouldn't want to go through that exercise.
 

I know - I’m still testing my env and I don’t really plan to use ramfs in prod, 
but technically it’s quite interesting ;)

 Saying that, we had a similar disk-io-bound problem with the mon data dirs, 
 and solved it by moving the mons to SSDs. Maybe in your case using the cfq io 
 scheduler would help, since at least then the OSD and MON processes would get 
 fair shares of the disk IOs.

Oh, when did they switch the default sched to deadline? Thanks for the hint, 
moved to cfq - tests are running.

 Anyway, to backup the data dirs, you need to stop the mon daemon to get a 
 consistent leveldb before copying the data to a safe place.

Well, this wouldn’t be a real problem, but I’m wondering how effective 
this would be.

Is it enough to restore such a backup even if in the meantime (since the backup 
was done) data-objects have changed? I don’t think so :(

To conclude:

* ceph would stop/freeze as soon as the number of mons drops below quorum
* ceph would continue to work as soon as the nodes come up again
* I could create a fresh mon on every node directly on boot by importing the
current state with ceph-mon --force-sync --yes-i-really-mean-it ...

So, as long as there are enough mons to form a quorum, it should work with 
ramfs. 
If nodes fail one by one, ceph would stop if quorum is lost and continue once 
the nodes are back.
But if all nodes stop (e.g. a power outage) my ceph-cluster is dead, and backups 
wouldn’t prevent this, would they?

Maybe snapshotting the pool could help?

Backup:
* create a snapshot
* shutdown one mon
* backup mon-dir

Restore:
* import mon-dir
* create further mons until quorum is restored
* restore snapshot

Possible?.. :D

Thanks,

Fabian


signature.asc
Description: Message signed with OpenPGP using GPGMail
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to backup mon-data?

2014-05-23 Thread Fabian Zimmermann

Hi,

 Am 23.05.2014 um 17:31 schrieb Wido den Hollander w...@42on.com:
 
 I wrote a blog about this: 
 http://blog.widodh.nl/2014/03/safely-backing-up-your-ceph-monitors/

so you assume restoring the old data works, or did you prove this?

Fabian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Day Boston Schedule Released

2014-05-23 Thread Patrick McGarry
Hey cephers,

Just wanted to let you know that the schedule has been posted for Ceph
Day Boston happening on 10 June at the Sheraton Boston, MA:

http://www.inktank.com/cephdays/boston/

There are still a couple of talk title tweaks that are pending, but I
wanted to get the info out as soon as possible.  We have some really
solid speakers, including a couple of highly technical talks from the
CohortFS guys and a demo of one of the hot new ethernet drives that is
poised to take the market by storm.

If you haven't signed up yet, please don't wait!  We want to make sure
we can adequately accommodate everyone that wishes to attend.  Thanks,
and see you there!


Best Regards,

Patrick McGarry
Director, Community || Inktank
http://ceph.com  ||  http://inktank.com
@scuttlemonkey || @ceph || @inktank
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] centos and 'print continue' support

2014-05-23 Thread Bryan Stillwell
Yesterday I went through manually configuring a ceph cluster with a
rados gateway on centos 6.5, and I have a question about the
documentation.  On this page:

https://ceph.com/docs/master/radosgw/config/

It mentions "On CentOS/RHEL distributions, turn off print continue. If
you have it set to true, you may encounter problems with PUT
operations."  However, when I had 'rgw print continue = false' in my
ceph.conf, adding objects with the python boto module would hang at:

key.set_contents_from_string('Hello World!')

After switching it to 'rgw print continue = true' things started working.
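For reference, the setting lives in the radosgw client section of ceph.conf; a
minimal sketch (client.radosgw.gateway is just the common example instance
name, use whatever your gateway runs as, and restart it afterwards):

  [client.radosgw.gateway]
      rgw print continue = true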

I'm wondering if this is because I installed the custom
apache/mod_fastcgi packages from the instructions on this page?:

http://ceph.com/docs/master/install/install-ceph-gateway/#id2

If that's the case, could the docs be updated to mention that setting
'rgw print continue = false' is only needed if you're using the distro
packages?

Thanks,
Bryan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd pool default pg num problem

2014-05-23 Thread McNamara, Bradley
The other thing to note, too, is that it appears you're trying to decrease the 
PG/PGP_num parameters, which is not supported.  In order to decrease those 
settings, you'll need to delete and recreate the pools.  All new pools created 
will use the settings defined in the ceph.conf file.
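A sketch of that delete-and-recreate step for the default rbd pool, assuming it
holds no data you care about (pool name and PG counts taken from this thread):

  ceph osd pool delete rbd rbd --yes-i-really-really-mean-it
  ceph osd pool create rbd 375 375

The same pattern applies to the data and metadata pools if you need those
recreated as well.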

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of John 
Spray
Sent: Friday, May 23, 2014 6:38 AM
To: Cao, Buddy
Cc: ceph-users@lists.ceph.com; ceph-u...@ceph.com
Subject: Re: [ceph-users] osd pool default pg num problem

Those settings are applied when creating new pools with osd pool create, but 
not to the pools that are created automatically during cluster setup.

We've had the same question before
(http://comments.gmane.org/gmane.comp.file-systems.ceph.user/8150), so maybe 
it's worth opening a ticket to do something about it.

Cheers,
John

On Fri, May 23, 2014 at 2:01 PM, Cao, Buddy buddy@intel.com wrote:
 In Firefly, I added below lines to [global] section in ceph.conf, 
 however, after creating the cluster, the default pool 
 “metadata/data/rbd”’s pg num is still over 900 but not 375.  Any suggestion?





 osd pool default pg num = 375

 osd pool default pgp num = 375






 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Designing a cluster with ceph and benchmark (ceph vs ext4)

2014-05-23 Thread Listas@Adminlinux

Hi !

I have failover clusters for some aplications. Generally with 2 members 
configured with Ubuntu + Drbd + Ext4. For example, my IMAP cluster works 
fine with ~ 50k email accounts and my HTTP cluster hosts ~2k sites.


See design here: http://adminlinux.com.br/cluster_design.txt

I would like to provide load balancing instead of just failover. So I 
would like to use a distributed filesystem architecture. As we 
know, Ext4 isn't a distributed filesystem, so I wish to use Ceph in my 
clusters.


Any suggestions for design of the cluster with Ubuntu+Ceph?

I built a simple cluster of 2 servers to test simultaneous reading and 
writing with Ceph. My conf:  http://adminlinux.com.br/ceph_conf.txt


But my simultaneous benchmarks found errors in reading and writing. I 
ran iozone -t 5 -r 4k -s 2m simultaneously on both servers in the 
cluster. The performance was poor and there were errors like this:


Error in file: Found ?0? Expecting ?6d6d6d6d6d6d6d6d? addr b660
Error in file: Position 1060864
Record # 259 Record size 4 kb
where b660 loop 0

Performance graphs of benchmark: http://adminlinux.com.br/ceph_bench.html

Can you help me find what I did wrong?

Thanks !

--
Thiago Henrique
www.adminlinux.com.br
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests

2014-05-23 Thread Craig Lewis

On 5/22/14 11:51 , Győrvári Gábor wrote:

Hello,

Got this kind of logs in two node of 3 node cluster both node has 2 
OSD, only affected 2 OSD on two separate node thats why i dont 
understand the situation. There wasnt any extra io on the system at 
the given time.


Using radosgw with s3 api to store objects under ceph average ops 
around 20-150 and bw usage 100-2000kb read / sec and only 50-1000kb / 
sec written.


osd_op(client.7821.0:67251068 
default.4181.1_products/800x600/537e28022fdcc.jpg [cmpxattr 
user.rgw.idtag (22) op 1 mode 1,setxattr user.rgw.idtag (33),call 
refcount.put] 11.fe53a6fb e590) v4 *currently waiting for subops from 
[2] **

*


Are any of your PGs in recovery or backfill?

I've seen this happen two different ways.  The first time was because I 
had the recovery and backfill parameters set too high for my cluster.  
If your journals aren't SSDs, the default parameters are too high.  The 
recovery operation will use most of the IOps, and starve the clients.
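A sketch of that kind of throttling (the option names are real, the values are 
just conservative examples):

  [osd]
      osd max backfills = 1
      osd recovery max active = 1

or, on a running cluster:

  ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'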


The second time I saw this is when one disk was starting to fail. 
Sectors starting failing, and the drive spent a lot of time reading and 
remapping bad sectors.  Consumer class SATA disks will retry bad sectors 
for 30+ seconds.  It happens in the drive firmware, so it's not something 
you can stop.  Enterprise class drives will give up quicker, since they 
know you have another copy of the data.  (Nobody uses enterprise class 
drives stand-alone; they're always in some sort of storage array).


I've had reports of 6+ OSDs blocking subops, and I traced it back to one 
disk that was blocking others.  I replaced that disk, and the warnings 
went away.



If your cluster is healthy, check the SMART attributes for osd.2. If 
osd.2 looks good, it might be another osd.  Check osd.2's logs, and check any 
osds that are blocking osd.2.  If your cluster is small, it might be 
faster to just check all disks instead of following the trail.
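A quick sketch of that check (the device name is a placeholder; map the OSD to 
its disk first):

  df /var/lib/ceph/osd/ceph-2        # or: ceph-disk list
  sudo smartctl -a /dev/sdX | egrep -i 'realloc|pending|uncorrect'

Growing reallocated or pending sector counts are the usual sign of a disk that 
is quietly retrying reads.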




--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com mailto:cle...@centraldesktop.com

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website http://www.centraldesktop.com/  | Twitter 
http://www.twitter.com/centraldesktop  | Facebook 
http://www.facebook.com/CentralDesktop  | LinkedIn 
http://www.linkedin.com/groups?gid=147417  | Blog 
http://cdblog.centraldesktop.com/


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd pool default pg num problem

2014-05-23 Thread Craig Lewis
If you're not using CephFS, you don't need metadata or data pools.  You 
can delete them.

If you're not using RBD, you don't need the rbd pool.

If you are using CephFS, and you do delete and recreate the 
metadata/data pools, you'll need to tell CephFS.  I think the command is 
ceph mds add_data_pool new_data_pool_id.  I'm not using CephFS, so I 
can't test that.  I don't see any commands to set the metadata pool 
for CephFS, but it seems strange that you have to tell it about the data 
pool, but not the metadata pool.




On 5/23/14 11:22 , McNamara, Bradley wrote:

The other thing to note, too, is that it appears you're trying to decrease the 
PG/PGP_num parameters, which is not supported.  In order to decrease those 
settings, you'll need to delete and recreate the pools.  All new pools created 
will use the settings defined in the ceph.conf file.

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of John 
Spray
Sent: Friday, May 23, 2014 6:38 AM
To: Cao, Buddy
Cc: ceph-users@lists.ceph.com; ceph-u...@ceph.com
Subject: Re: [ceph-users] osd pool default pg num problem

Those settings are applied when creating new pools with osd pool create, but 
not to the pools that are created automatically during cluster setup.

We've had the same question before
(http://comments.gmane.org/gmane.comp.file-systems.ceph.user/8150), so maybe 
it's worth opening a ticket to do something about it.

Cheers,
John

On Fri, May 23, 2014 at 2:01 PM, Cao, Buddy buddy@intel.com wrote:

In Firefly, I added below lines to [global] section in ceph.conf,
however, after creating the cluster, the default pool
“metadata/data/rbd”’s pg num is still over 900 but not 375.  Any suggestion?





osd pool default pg num = 375

osd pool default pgp num = 375






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com mailto:cle...@centraldesktop.com

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website http://www.centraldesktop.com/  | Twitter 
http://www.twitter.com/centraldesktop  | Facebook 
http://www.facebook.com/CentralDesktop  | LinkedIn 
http://www.linkedin.com/groups?gid=147417  | Blog 
http://cdblog.centraldesktop.com/


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to backup mon-data?

2014-05-23 Thread Wido den Hollander

On 05/23/2014 06:30 PM, Fabian Zimmermann wrote:


Hi,


Am 23.05.2014 um 17:31 schrieb Wido den Hollander w...@42on.com:

I wrote a blog about this: 
http://blog.widodh.nl/2014/03/safely-backing-up-your-ceph-monitors/


so you assume restoring the old data is working, or did you proof this?



No, that won't work in ALL situations. But it's always better to have a 
backup of your mons instead of having none.



Fabian




--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to backup mon-data?

2014-05-23 Thread Craig Lewis

On 5/23/14 09:30 , Fabian Zimmermann wrote:

Hi,


Am 23.05.2014 um 17:31 schrieb Wido den Hollander w...@42on.com:

I wrote a blog about this: 
http://blog.widodh.nl/2014/03/safely-backing-up-your-ceph-monitors/

so you assume restoring the old data is working, or did you proof this?


I did some of the same things, but never tested a restore 
(http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/3087). 
There is a discussion, but I can't figure out how to get gmane to show 
me the threaded version from a google search.



I stopped doing the backups, because they seemed rather useless.

The monitors have a snapshot of the cluster state right now.  If you 
ever need to restore a monitor backup, you're effectively rolling the 
whole cluster back to that point in time.


What happens if you've added disks after the backup?
What happens if a disk has failed after the backup?
What happens if you write data to the cluster after the backup?
What happens if you delete data after the backup, and it gets garbage 
collected?


All questions that can be tested and answered... with a lot of time and 
experimentation.  I decided to add more monitors and stop taking backups.



I'm still thinking about doing manual backups before a major ceph 
version upgrade.  In that case, I'd only need to test the write/delete 
cases, because I can control the add/remove disk cases.  The backups 
would only be useful between restarting the MON and the OSD processes 
though.  I can't really backup the OSD state[1], so once they're 
upgraded, there's no going back.



1: ZFS or Btrfs snapshots could do this, but neither one are recommended 
for production.  I do plan to make snapshots once either FS is 
production ready.  LVM snapshots could do it, but they're such a pain 
that I never bothered.  And I have the scripts I used to use to make LVM 
snapshots of MySQL data directories.



--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com mailto:cle...@centraldesktop.com

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website http://www.centraldesktop.com/  | Twitter 
http://www.twitter.com/centraldesktop  | Facebook 
http://www.facebook.com/CentralDesktop  | LinkedIn 
http://www.linkedin.com/groups?gid=147417  | Blog 
http://cdblog.centraldesktop.com/


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to backup mon-data?

2014-05-23 Thread Dimitri Maziuk
On 05/23/2014 03:06 PM, Craig Lewis wrote:

 1: ZFS or Btrfs snapshots could do this, but neither one are recommended
 for production.

Out of curiosity, what's the current beef with zfs? I know what problems
are cited for btrfs, but I haven't heard much about zfs lately.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw Timeout

2014-05-23 Thread Craig Lewis

On 5/23/14 03:47 , Georg Höllrigl wrote:



On 22.05.2014 17:30, Craig Lewis wrote:

On 5/22/14 06:16 , Georg Höllrigl wrote:


I have created one bucket that holds many small files, separated into
different directories. But whenever I try to acess the bucket, I
only run into some timeout. The timeout is at around 30 - 100 seconds.
This is smaller then the Apache timeout of 300 seconds.


Just so we're all talking about the same things, what does many small
files mean to you?  Also, how are you separating them into
directories?  Are you just giving files in the same directory the
same leading string, like dir1_subdir1_filename?


I can only estimate how many files. ATM I've 25M files on the origin 
but only 1/10th has been synced to radosgw. These are distributed 
through 20 folders, each containing about 2k directories with ~ 100 - 
500 files each.


Do you think that's too much in that usecase?

The recommendations I've seen indicate that 25M objects per bucket is 
doable, but painful.  The bucket is itself an object stored in Ceph, 
which stores the list of objects in that bucket.   With a single bucket 
containing 25M objects, you're going to hotspot on the bucket.  Think of 
a bucket like a directory on a filesystem.  You wouldn't store 25M files 
in a single directory.


Buckets are a bit simpler than directories.  They don't have to track 
permissions, per file ACLs, and all the other things that POSIX 
filesystems do.  You can push them harder than a normal directory, but 
the same concepts still apply.  The more files you put in a 
bucket/directory, the slower it gets.  Most filesystems impose a hard 
limit on the number of files in a directory.  RadosGW doesn't have a 
limit, it just gets slower.


Even the list of buckets has this problem.  You wouldn't want to create 
25M buckets with one object each.  By default, there is a 1000 bucket 
limit per user, but you can increase that.
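For reference, that per-user limit can be raised with radosgw-admin (a sketch; 
the uid is illustrative):

  radosgw-admin user modify --uid=johndoe --max-buckets=5000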



If you can handle using 20 buckets, it would be worthwhile to put each 
one of your top 20 folders into its own bucket.  If you can break it 
apart even more, that would be even better.


I mentioned that I have a bunch of buckets with ~1M objects each. GET 
and PUT of objects is still fast, but listing the contents of the bucket 
takes a long time.  Each bucket takes 20-30 minutes to get a full 
listing.  If you're going to be doing a lot of bucket listing, you might 
want to keep each bucket below 1000 items.  Maybe each of your 2k 
directories gets its own bucket.



If using more than one bucket is difficult, then 25M objects in one 
bucket will work.



--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com mailto:cle...@centraldesktop.com

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website http://www.centraldesktop.com/  | Twitter 
http://www.twitter.com/centraldesktop  | Facebook 
http://www.facebook.com/CentralDesktop  | LinkedIn 
http://www.linkedin.com/groups?gid=147417  | Blog 
http://cdblog.centraldesktop.com/


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Questions about zone and disater recovery

2014-05-23 Thread Craig Lewis

On 5/21/14 19:49 , wsnote wrote:

Hi,everyone!
I have 2 ceph clusters, one master zone, another secondary zone.
Now I have some question.
1. Can ceph have two or more secondary zones?


It's supposed to work, but I haven't tested it.



2. Can the role of master zone and secondary zone transform mutual?
I mean I can change the secondary zone to be master and the master 
zone to secondary.
Yes and no.  You can promote the slave to a master at any time by 
disabling replication, and writing to it.  You'll want to update your 
region and zone maps, but that's only required to make replication 
between zones work.


Converting the master to a secondary zone... I don't know. Everything 
will work if you delete the contents of the old master, set it up as a 
new secondary of the new master, and re-replicate everything.   Nobody 
wants to do that.  It would be nice if you could just point the old 
master (with it's existing data) at the new master, and it would start 
replicating.  I can't answer that.




3. How to deal with the situation when the master zone is down?
Now the secondary zone forbids all the operations of files, such as 
create objects, delete objects.
When the master zone is down, users can't do anything to the files 
except read objects from the secondary zone.
It's a bad user experience. Additionly, it will have a bad influence 
on the confidence of the users.
I know the limit of secondary zone is out of consideration for the 
consistency of data. However, is there another way to improve some 
experience?

I think:
There can be a config that allow the files operations of the secondary 
zone.If the master zone is down, the admin can enable it, then the 
users can do files opeartions as usually. The secondary record all the 
files operations of the files. When the master zone gets right, the 
admin can sync files to the master zone manually.




The secondary zone tracks what metadata operations that it has replayed 
from the master zone.  It does this per bucket.


In theory, there's no reason you can't have additional buckets in the 
slave zone that the master zone doesn't have.  Since these buckets 
aren't replicated, there shouldn't be a problem writing to them.  In 
theory, you should even be able to write objects to the existing buckets 
in the slave, as long as the master doesn't have those objects.  I don't 
know what would happen if you created one of those buckets or objects on 
the master.  Maybe replication breaks, or maybe it just overwrites the 
data in the slave.


That's a lot of in theory though.  I wouldn't attempt it without a lot 
of simulation in test clusters.


--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com mailto:cle...@centraldesktop.com

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website http://www.centraldesktop.com/  | Twitter 
http://www.twitter.com/centraldesktop  | Facebook 
http://www.facebook.com/CentralDesktop  | LinkedIn 
http://www.linkedin.com/groups?gid=147417  | Blog 
http://cdblog.centraldesktop.com/


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to backup mon-data?

2014-05-23 Thread Cédric Lemarchand
Hello Dimitri,
 Le 23 mai 2014 à 22:33, Dimitri Maziuk dmaz...@bmrb.wisc.edu a écrit :
 
 On 05/23/2014 03:06 PM, Craig Lewis wrote:
 
 1: ZFS or Btrfs snapshots could do this, but neither one are recommended
 for production.
 
 Out of curiosity, what's the current beef with zfs? I know what problems
 are cited for btrfs, but I haven't heard much about zfs lately.

The Linux implementation (ZoL) is actually stable for production, but is quite 
memory hungry because of a spl/slab fragmentation issue ...

But I would ask a question : even with a snapshot capable FS, is it sufficient 
to achieve a consistent backup of a running leveldb ? Or did you plan to 
stop/snap/start the mon ? (No knowledge at all about leveldb ...)

Cheers 

 
 -- 
 Dimitri Maziuk
 Programmer/sysadmin
 BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] collectd / graphite / grafana .. calamari?

2014-05-23 Thread Ricardo Rocha
Hi John.

Thanks for the reply, sounds very good.

The extra visualizations from kibana (grafana only seems to pack a
small subset, but the codebase is basically the same) look cool, will
put some more in soon - seems like they can still be useful later.

Looking forward to some calamari.

Cheers,
  Ricardo

On Fri, May 23, 2014 at 10:42 PM, John Spray john.sp...@inktank.com wrote:
 Hi Ricardo,

 Let me share a few notes on metrics in calamari:
  * We're bundling graphite, and using diamond to send home metrics.
 The diamond collector used in calamari has always been open source
 [1].
  * The Calamari UI has its own graphs page that talks directly to the
 graphite API (the calamari REST API does not duplicate any of the
 graphing interface)
  * We also bundle the default graphite dashboard, so that folks can go
 to /graphite/dashboard/ on the calamari server to plot anything custom
 they want to.

 It could be quite interesting to hook Grafana in there in the same way
 that we currently hook in the default graphite dashboard, as Grafana is
 definitely nicer and would give us a roadmap to influxdb (a
 project I am quite excited about).

 Cheers,
 John

 1. https://github.com/ceph/Diamond/commits/calamari

 On Fri, May 23, 2014 at 1:58 AM, Ricardo Rocha rocha.po...@gmail.com wrote:
 Hi.

 I saw the thread a couple days ago on ceph-users regarding collectd...
 and yes, i've been working on something similar for the last few days
 :)

 https://github.com/rochaporto/collectd-ceph

 It has a set of collectd plugins pushing metrics which mostly map what
 the ceph commands return. In the setup we have it pushes them to
 graphite and the displays rely on grafana (check for a screenshot in
 the link above).

 As it relies on common building blocks, it's easily extensible and
 we'll come up with new dashboards soon - things like plotting osd data
 against the metrics from the collectd disk plugin, which we also
 deploy.

 This email is mostly to share the work, but also to check on Calamari?
 I asked Patrick after the RedHat/Inktank news and have no idea what it
 provides, but i'm sure it comes with lots of extra sauce - he
 suggested to ask in the list.

 What's the timeline to have it open sourced? It would be great to have
 a look at it, and as there's work from different people in this area
 maybe start working together on some fancier monitoring tools.

 Regards,
   Ricardo
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests

2014-05-23 Thread Győrvári Gábor

Hello,

No, I don't see any backfill activity in ceph.log during that period. The drives 
are WD2000FYYZ-01UL1B1, but I did not find anything suspicious in SMART, and 
yes, I will check the other drives too.


Could I somehow determine which PG the file is placed in?
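For reference, the placement of a single RADOS object can be looked up with 
ceph osd map; a sketch using the object from the log above (the pool name 
.rgw.buckets is the usual radosgw data pool default, adjust if yours differs):

  ceph osd map .rgw.buckets default.4181.1_products/800x600/537e28022fdcc.jpg

The output shows the PG id and the acting OSD set for that object.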

Thanks

2014.05.23. 20:51 keltezéssel, Craig Lewis írta:

On 5/22/14 11:51 , Győrvári Gábor wrote:

Hello,

Got this kind of logs in two node of 3 node cluster both node has 2 
OSD, only affected 2 OSD on two separate node thats why i dont 
understand the situation. There wasnt any extra io on the system at 
the given time.


Using radosgw with s3 api to store objects under ceph average ops 
around 20-150 and bw usage 100-2000kb read / sec and only 50-1000kb / 
sec written.


osd_op(client.7821.0:67251068 
default.4181.1_products/800x600/537e28022fdcc.jpg [cmpxattr 
user.rgw.idtag (22) op 1 mode 1,setxattr user.rgw.idtag (33),call 
refcount.put] 11.fe53a6fb e590) v4 *currently waiting for subops from 
[2] **

*


Are any of your PGs in recovery or backfill?

I've seen this happen two different ways.  The first time was because 
I had the recovery and backfill parameters set too high for my 
cluster.  If your journals aren't SSDs, the default parameters are too 
high.  The recovery operation will use most of the IOps, and starve 
the clients.


The second time I saw this is when one disk was starting to fail. 
Sectors starting failing, and the drive spent a lot of time reading 
and remapping bad sectors.  Consumer class SATA disks will retry bad 
sectors for 30+ second.  It happens in the drive firmware, so it's not 
something you can stop.  Enterprise class drives will give up quicker, 
since they know you have another copy of the data.  (Nobody uses 
enterprise class drives stand-alone; they're always in some sort of 
storage array).


I've had reports of 6+ OSDs blocking subops, and I traced it back to 
one disk that was blocking others.  I replaced that disk, and the 
warnings went away.



If your cluster is healthy, check the SMART attributes for osd.2. If 
osd.2 looks good, it might another osd.  Check osd.2 logs, and check 
any osd that are blocking osd.2.  If your cluster is small, it might 
be faster to just check all disks instead of following the trail.




--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com mailto:cle...@centraldesktop.com

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website http://www.centraldesktop.com/  | Twitter 
http://www.twitter.com/centraldesktop  | Facebook 
http://www.facebook.com/CentralDesktop  | LinkedIn 
http://www.linkedin.com/groups?gid=147417  | Blog 
http://cdblog.centraldesktop.com/




--
Győrvári Gábor - Scr34m
scr...@frontember.hu

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Designing a cluster with ceph and benchmark (ceph vs ext4)

2014-05-23 Thread Christian Balzer

Hello,

On Fri, 23 May 2014 15:41:23 -0300 Listas@Adminlinux wrote:

 Hi !
 
 I have failover clusters for some aplications. Generally with 2 members 
 configured with Ubuntu + Drbd + Ext4. For example, my IMAP cluster works 
 fine with ~ 50k email accounts and my HTTP cluster hosts ~2k sites.
 
My mailbox servers are also multiple DRBD based cluster pairs. 
For performance in fully redundant storage there isn't anything better
(in the OSS, generic hardware section at least).

 See design here: http://adminlinux.com.br/cluster_design.txt
 
 I would like to provide load balancing instead of just failover. So, I 
 would like to use a distributed architecture of the filesystem. As we 
 know, Ext4 isn't a distributed filesystem. So wish to use Ceph in my 
 clusters.

You will find that all cluster/distributed filesystems have severe
performance shortcomings when compared to something like Ext4.

On top of that, CephFS isn't ready for production as the MDS isn't HA.

A potential middle way might be to use Ceph/RBD volumes formatted in Ext4.
That doesn't give you shared access, but it will allow you to separate
storage and compute nodes, so when one compute node becomes busy, mount
that volume from a more powerful compute node instead.
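A minimal sketch of that middle way (image name, size and mount point are
illustrative; rbd map needs the rbd kernel module on the compute node):

  rbd create imap01 --size 102400      # size is in MB here, i.e. ~100 GB
  rbd map imap01                       # shows up as /dev/rbd/rbd/imap01
  mkfs.ext4 /dev/rbd/rbd/imap01
  mount /dev/rbd/rbd/imap01 /srv/imap01

To move the workload, unmount and rbd unmap on one node, then map and mount on
another; it stays a single-writer filesystem, just decoupled from any one host.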

That all said, I can't see any way and reason to replace my mailbox DRBD
clusters with Ceph in the foreseeable future.
To get similar performance/reliability to DRBD I would have to spend 3-4
times the money.

Where Ceph/RBD works well is situations where you can't fit the compute
needs into a storage node (as required with DRBD) and where you want to
access things from multiple compute nodes, primarily for migration
purposes. 
In short, as a shared storage for VMs.

 Any suggestions for design of the cluster with Ubuntu+Ceph?
 
 I built a simple cluster of 2 servers to test simultaneous reading and 
 writing with Ceph. My conf:  http://adminlinux.com.br/ceph_conf.txt
 
Again, CephFS isn't ready for production, but other than that I know very
little about it as I don't use it.
However your version of Ceph is severely outdated; you really should be
looking at something more recent to rule out that you're experiencing long-fixed
bugs. The same goes for your entire setup and kernel.

Also Ceph only starts to perform decently with many OSDs (disks) and
the journals on SSDs instead of being on the same disk.
Think DRBD AL metadata-internal, but with MUCH more impact.

Regards,

Christian
 But in my simultaneous benchmarks found errors in reading and writing. I 
 ran iozone -t 5 -r 4k -s 2m simultaneously on both servers in the 
 cluster. The performance was poor and had errors like this:
 
 Error in file: Found ?0? Expecting ?6d6d6d6d6d6d6d6d? addr b660
 Error in file: Position 1060864
 Record # 259 Record size 4 kb
 where b660 loop 0
 
 Performance graphs of benchmark: http://adminlinux.com.br/ceph_bench.html
 
 Can you help me find what I did wrong?
 
 Thanks !
 


-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com