Re: [ceph-users] Cannot attach volumes

2014-06-11 Thread yalla.gnan.kumar
Hi Karan,

I have checked the cinder logs but could not find anything suspicious.


Thanks
Kumar

From: Karan Singh [mailto:karan.si...@csc.fi]
Sent: Tuesday, June 10, 2014 3:14 PM
To: Gnan Kumar, Yalla
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Cannot attach volumes

Hi Kumar

Clock skew is just a warning and should not be related to this problem. It is 
pretty easy to silence the warning, either by setting up NTP on all Ceph cluster 
nodes or by adding  mon clock drift warn backoff = <seconds>  to ceph.conf 
(do not do this in production).
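
For illustration, a minimal ceph.conf sketch of that workaround (the values are 
only placeholders; keeping the monitors in sync with NTP is the proper fix):

[mon]
    # widen the clock-skew warning threshold (test/lab use only, as noted above)
    mon clock drift allowed = 0.5
    mon clock drift warn backoff = 30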

Regarding your second problem, check the cinder volume and scheduler logs; you 
should find something there. If not, try increasing cinder's debug level and 
look for clues.
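
For example, a small sketch of turning up cinder logging (assuming a stock 
cinder.conf; restart the cinder-volume and cinder-scheduler services afterwards):

# /etc/cinder/cinder.conf
[DEFAULT]
debug = True
verbose = True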


- Karan Singh -

On 10 Jun 2014, at 09:53, 
yalla.gnan.ku...@accenture.com wrote:


Hi All,


I have a four-node Ceph cluster and another three-node setup for OpenStack. I have 
integrated Ceph with OpenStack.
Whenever I try to create a volume with Ceph as the storage backend for an OpenStack 
VM, the creation process runs forever in the Horizon dashboard and never completes. 
Likewise, attaching a Ceph volume to a VM in OpenStack freezes and never completes.

To investigate this issue, I ran the 'ceph -s' command on the Ceph nodes. The health 
of the Ceph cluster is in a warning state: it says it detected clock skew on two of 
the nodes. Is this time synchronization issue the reason behind the VM freezing 
while attaching volumes?


Thanks
Kumar




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG Selection Criteria for Deep-Scrub

2014-06-11 Thread Dan Van Der Ster
Hi Greg,
This tracker issue is relevant: http://tracker.ceph.com/issues/7288
Cheers, Dan

On 11 Jun 2014, at 00:30, Gregory Farnum g...@inktank.com wrote:

 Hey Mike, has your manual scheduling resolved this? I think I saw
 another similar-sounding report, so a feature request to improve scrub
 scheduling would be welcome. :)
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com
 
 
 On Tue, May 20, 2014 at 5:46 PM, Mike Dawson mike.daw...@cloudapt.com wrote:
 I tend to set it whenever I don't want to be bothered by storage performance
 woes (nights I value sleep, etc).
 
 This cluster is bounded by relentless small writes (it has a couple dozen
 rbd volumes backing video surveillance DVRs). Some of the software we run is
 completely unaffected whereas other software falls apart during periods of
 deep-scrubs. I theorize it has to do with the individual software's attitude
 about flushing to disk / buffering.
 
 - Mike
 
 
 
 On 5/20/2014 8:31 PM, Aaron Ten Clay wrote:
 
 For what it's worth, version 0.79 has different headers, and the awk
 command needs $19 instead of $20. But here is the output I have on a
 small cluster that I recently rebuilt:
 
 $ ceph pg dump all | grep active | awk '{ print $19}' | sort -k1 | uniq -c
 dumped all in format plain
   1 2014-05-15
   2 2014-05-17
  19 2014-05-18
 193 2014-05-19
 105 2014-05-20
 
 I have set noscrub and nodeep-scrub, as well as noout and nodown off and
 on while I performed various maintenance, but that hasn't (apparently)
 impeded the regular schedule.
 
 With what frequency are you setting the nodeep-scrub flag?
 
 -Aaron
 
 
  On Tue, May 20, 2014 at 5:21 PM, Mike Dawson mike.daw...@cloudapt.com wrote:
 
Today I noticed that deep-scrub is consistently missing some of my
Placement Groups, leaving me with the following distribution of PGs
and the last day they were successfully deep-scrubbed.
 
# ceph pg dump all | grep active | awk '{ print $20}' | sort -k1 |
uniq -c
   5 2013-11-06
 221 2013-11-20
   1 2014-02-17
  25 2014-02-19
  60 2014-02-20
   4 2014-03-06
   3 2014-04-03
   6 2014-04-04
   6 2014-04-05
  13 2014-04-06
   4 2014-04-08
   3 2014-04-10
   2 2014-04-11
  50 2014-04-12
  28 2014-04-13
  14 2014-04-14
   3 2014-04-15
  78 2014-04-16
  44 2014-04-17
   8 2014-04-18
   1 2014-04-20
  16 2014-05-02
  69 2014-05-04
 140 2014-05-05
 569 2014-05-06
9231 2014-05-07
 103 2014-05-08
 514 2014-05-09
1593 2014-05-10
 393 2014-05-16
2563 2014-05-17
1283 2014-05-18
1640 2014-05-19
1979 2014-05-20
 
I have been running the default osd deep scrub interval of once
per week, but have disabled deep-scrub on several occasions in an
attempt to avoid the associated degraded cluster performance I have
written about before.
 
To get the PGs longest in need of a deep-scrub started, I set the
nodeep-scrub flag, and wrote a script to manually kick off
deep-scrub according to age. It is processing as expected.
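
   For reference, a rough sketch of what such a script might look like (this is
   hypothetical, not Mike's actual script); it reuses the pg dump / awk approach
   from earlier in the thread to scrub the oldest PGs first:

   #!/bin/bash
   # Deep-scrub the N placement groups with the oldest deep-scrub stamp.
   # The stamp is column 20 on this version (see Aaron's note: 0.79 uses 19).
   N=10
   ceph pg dump all 2>/dev/null | grep active | sort -k20 | head -n "$N" | \
     awk '{print $1}' | while read pgid; do
       echo "deep-scrubbing $pgid"
       ceph pg deep-scrub "$pgid"
       sleep 60   # pace the scrubs to limit client impact
   done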
 
Do you consider this a feature request or a bug? Perhaps the code
that schedules PGs to deep-scrub could be improved to prioritize PGs
that have needed a deep-scrub the longest.
 
Thanks,
Mike Dawson
   ___
   ceph-users mailing list
   ceph-users@lists.ceph.com
   http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to avoid deep-scrubbing performance hit?

2014-06-11 Thread Dan Van Der Ster
On 10 Jun 2014, at 11:59, Dan Van Der Ster daniel.vanders...@cern.ch wrote:

 One idea I had was to check the behaviour under different disk io schedulers, 
 trying to exploit thread io priorities with cfq. So I have a question for the 
 developers about using ionice or ioprio_set to lower the IO priorities of the 
 threads responsible for scrubbing: 
   - Are there dedicated threads always used for scrubbing only, and never for 
 client IOs? If so, can an admin identify the thread IDs so he can ionice 
 those? 
   - If OTOH a disk/op thread is switching between scrubbing and client IO 
 responsibilities, could Ceph use ioprio_set to change the io priorities on 
 the fly??
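
As a rough illustration of the first idea (purely a sketch: it assumes the cfq 
scheduler and that the relevant thread IDs of a given ceph-osd can be identified, 
which is exactly the open question above):

# cfq is required for io priorities to have any effect
echo cfq > /sys/block/sdb/queue/scheduler      # assuming the OSD's data disk is sdb

# coarse version: drop ALL threads of osd.11 to the idle class
# (this also hits client io; adjust the pgrep pattern for your environment)
OSD_PID=$(pgrep -f 'ceph-osd.* -i 11 ')
for t in /proc/$OSD_PID/task/*; do
    ionice -c 3 -p "$(basename "$t")"
done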

I just submitted a feature request for this:  
http://tracker.ceph.com/issues/8580

Cheers, Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] failed when activate the OSD

2014-06-11 Thread jiangdahui
After I created 1 mon and prepared 2 OSDs, I checked and found that the fsids of the 
three are the same. But when I run *ceph-deploy osd activate 
node2:/var/local/osd0 node3:/var/local/osd1*, the error output is as follows:
[node2][WARNIN] ceph-disk: Error: No cluster conf found in /etc/ceph with fsid 
3e68a2b5-cbf3-4149-9462-b89e2a40236e


It is strange that the fsid in the output differs from that of the three nodes, 
and if I modify the three nodes' fsid, another error happens:
[node2][WARNIN] 2014-06-11 01:39:17.738451 b63cfb40  0 librados: 
client.bootstrap-osd authentication error (1) Operation not permitted


What should I do?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is it still unsafe to map a RBD device on an OSD server?

2014-06-11 Thread Mikaël Cluseau

On 06/11/2014 08:20 AM, Sebastien Han wrote:

Thanks for your answers


I have had that for an apt-cache for more than a year now and never had an issue. 
Of course, your question is not about having a krbd device backing an OSD of the 
same cluster ;-)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Monitor down

2014-06-11 Thread yalla.gnan.kumar
I have a four-node Ceph storage cluster. 'ceph -s' is showing one monitor as down. 
How do I start it, and on which server do I have to start it?


---
root@cephadmin:/home/oss# ceph -w
cluster 9acd33d7-759b-45f4-b48f-a4682fd6c674
 health HEALTH_WARN 1 mons down, quorum 0,1 cephnode1,cephnode2
 monmap e3: 3 mons at 
{cephnode1=10.211.203.237:6789/0,cephnode2=10.211.203.238:6789/0,cephnode3=10.211.203.239:6789/0},
 election epoch 844, quorum 0,1 cephnode1,cephnode2
 mdsmap e225: 1/1/1 up {0=cephnode1=up:active}
 osdmap e297: 3 osds: 3 up, 3 in
  pgmap v214969: 448 pgs, 5 pools, 9495 bytes data, 30 objects
21881 MB used, 51663 MB / 77501 MB avail
 448 active+clean
--

Thanks
Kumar



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is CRUSH used on reading ?

2014-06-11 Thread Wido den Hollander

On 06/11/2014 12:51 PM, Florent B wrote:

Hi,

I would like to know whether Ceph uses the CRUSH algorithm when a read operation
occurs, for example to select the nearest OSD storing the requested object.


CRUSH is used when reading, since it's THE algorithm inside Ceph to determine 
data placement.


CRUSH doesn't support reading from the nearest OSD; a client always reads from 
the primary OSD of a PG, but you can influence the primary affinity.
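
For example (a small sketch; 0.5 is just an illustrative weight, and the monitors 
need 'mon osd allow primary affinity = true' for this to take effect):

# make osd.3 less likely to be chosen as the primary for the PGs it holds
ceph osd primary-affinity osd.3 0.5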




Thank you :)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
Ceph consultant and trainer
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitor down

2014-06-11 Thread Wido den Hollander

On 06/11/2014 01:23 PM, yalla.gnan.ku...@accenture.com wrote:

I have a four node ceph storage cluster. Ceph –s  is showing one monitor
as down . How to start it and in which server do I have to start it ?



It's cephnode3 which is down. Log in and do:

$ start ceph-mon-all
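
Or, to start just that one monitor (assuming Ubuntu with upstart, as in a default 
ceph-deploy install):

$ sudo start ceph-mon id=cephnode3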


---

root@cephadmin:/home/oss# ceph -w

 cluster 9acd33d7-759b-45f4-b48f-a4682fd6c674

  health HEALTH_WARN 1 mons down, quorum 0,1 cephnode1,cephnode2

  monmap e3: 3 mons at
{cephnode1=10.211.203.237:6789/0,cephnode2=10.211.203.238:6789/0,cephnode3=10.211.203.239:6789/0},
election epoch 844, quorum 0,1 cephnode1,cephnode2

  mdsmap e225: 1/1/1 up {0=cephnode1=up:active}

  osdmap e297: 3 osds: 3 up, 3 in

   pgmap v214969: 448 pgs, 5 pools, 9495 bytes data, 30 objects

 21881 MB used, 51663 MB / 77501 MB avail

  448 active+clean

--

Thanks

Kumar






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
Ceph consultant and trainer
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Unable to remove mds

2014-06-11 Thread yalla.gnan.kumar
Hi All,

I have a four-node Ceph cluster. The metadata service is showing as degraded in 
health. How do I remove the MDS service from Ceph?


=-
root@cephadmin:/home/oss# ceph -s
cluster 9acd33d7-759b-45f4-b48f-a4682fd6c674
 health HEALTH_WARN mds cluster is degraded
 monmap e3: 3 mons at 
{cephnode1=10.211.203.237:6789/0,cephnode2=10.211.203.238:6789/0,cephnode3=10.211.203.239:6789/0},
 election epoch 874, quorum 0,1,2 cephnode1,cephnode2,cephnode3
 mdsmap e227: 1/1/1 up {0=cephnode1=up:replay}
 osdmap e299: 3 osds: 3 up, 3 in
  pgmap v214988: 448 pgs, 5 pools, 9495 bytes data, 30 objects
22693 MB used, 50851 MB / 77501 MB avail
 448 active+clean
--

Thanks
Kumar





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can we map OSDs from different hosts (servers) to a Pool in Ceph

2014-06-11 Thread Davide Fanciola
Hi,

we have a similar setup where we have SSD and HDD in the same hosts.
Our very basic crushmap is configured as follows:

# ceph osd tree
# id weight type name up/down reweight
-6 3 root ssd
3 1 osd.3 up 1
4 1 osd.4 up 1
5 1 osd.5 up 1
-5 3 root platters
0 1 osd.0 up 1
1 1 osd.1 up 1
2 1 osd.2 up 1
-1 3 root default
-2 1 host chgva-srv-stor-001
0 1 osd.0 up 1
3 1 osd.3 up 1
-3 1 host chgva-srv-stor-002
1 1 osd.1 up 1
4 1 osd.4 up 1
-4 1 host chgva-srv-stor-003
2 1 osd.2 up 1
5 1 osd.5 up 1


We do not seem to have problems with this setup, but I'm not sure whether it's a
good practice to have elements appearing multiple times in different
branches.
On the other hand, I see no way to follow the physical hierarchy of a
datacenter for pools, since a pool can be spread among
servers/racks/rooms...

Can someone confirm this crushmap is any good for our configuration?

Thanks in advance.

BR
Davide



On Mon, Mar 3, 2014 at 12:48 PM, Wido den Hollander w...@42on.com wrote:

 On 03/03/2014 12:45 PM, Vikrant Verma wrote:

 Hi All,

 Is it possible to map OSDs from different hosts (servers) to a Pool in
 ceph cluster?

 In Crush Map we can add a bucket mentioning the host details (hostname
 and its weight).

 Is it possible to configure a bucket  which contains OSDs from different
 hosts?


 I think it's possible.

 But you can always try it and afterwards run crushtool with tests:

 $ crushtool -i mycrushmap --test --rule 0 --num-rep 3 --show-statistics

 That will run some tests on your compiled crushmap
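
 As an illustration (a hedged sketch, not a tested map): with a root such as the
 ssd root above, which contains OSDs directly rather than host buckets, a rule
 selecting from it could look like the following, and a pool can then be pointed
 at it with 'ceph osd pool set <pool> crush_ruleset 1'. Note that because the
 OSDs sit directly under the root, CRUSH cannot guarantee the copies land on
 different hosts.

 rule ssd {
         ruleset 1
         type replicated
         min_size 1
         max_size 10
         step take ssd
         step choose firstn 0 type osd
         step emit
 }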


 if possible please let me know how to configure it.

 Regards,
 Vikrant



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 --
 Wido den Hollander
 42on B.V.

 Phone: +31 (0)20 700 9902
 Skype: contact42on
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] I have PGs that I can't deep-scrub

2014-06-11 Thread Sage Weil
Hi Craig,

It's hard to say what is going wrong with that level of logs.  Can you 
reproduce with debug ms = 1 and debug osd = 20?
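
If restarting the OSDs is inconvenient, something along these lines should also 
raise the levels at runtime (a sketch using the OSD ids from this thread):

  ceph tell osd.11 injectargs '--debug-ms 1 --debug-osd 20'
  ceph tell osd.0  injectargs '--debug-ms 1 --debug-osd 20'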

There were a few things fixed in scrub between emperor and firefly.  Are 
you planning on upgrading soon?

sage


On Tue, 10 Jun 2014, Craig Lewis wrote:

 Every time I deep-scrub one PG, all of the OSDs responsible get kicked
 out of the cluster.  I've deep-scrubbed this PG 4 times now, and it
 fails the same way every time.  OSD logs are linked at the bottom.
 
 What can I do to get this deep-scrub to complete cleanly?
 
 This is the first time I've deep-scrubbed these PGs since Sage helped
 me recover from some OSD problems
 (http://t53277.file-systems-ceph-development.file-systemstalk.info/70-osd-are-down-and-not-coming-up-t53277.html)
 
 I can trigger the issue easily in this cluster, but have not been able
 to re-create in other clusters.
 
 
 
 
 
 
 The PG stats for this PG say that last_deep_scrub and deep_scrub_stamp
 are 48009'1904117 2014-05-21 07:28:01.315996 respectively.  This PG is
 owned by OSDs [11,0]
 
 This is a secondary cluster, so I stopped all external I/O on it.  I
 set nodeep-scrub, and restarted both OSDs with:
   debug osd = 5/5
   debug filestore = 5/5
   debug journal = 1
   debug monc = 20/20
 
 then I ran a deep-scrub on this PG.
 
 2014-06-10 10:47:50.881783 mon.0 [INF] pgmap v8832020: 2560 pgs: 2555
 active+clean, 5 active+clean+scrubbing; 27701 GB data, 56218 GB used,
 77870 GB / 130 TB avail
 2014-06-10 10:47:54.039829 mon.0 [INF] pgmap v8832021: 2560 pgs: 2554
 active+clean, 5 active+clean+scrubbing, 1 active+clean+scrubbing+deep;
 27701 GB data, 56218 GB used, 77870 GB / 130 TB avail
 
 
 At 10:49:09, I see ceph-osd for both 11 and 0 spike to 100% CPU
 (100.3% +/- 1.0%).  Prior to this, they were both using ~30% CPU.  It
 might've started a few seconds sooner, I'm watching top.
 
 I forgot to watch IO stat until 10:56.  At this point, both OSDs are
 reading.  iostat reports that they're both doing ~100
 transactions/sec, reading ~1 MiBps, 0 writes.
 
 
 At 11:01:26, iostat reports that both osds are no longer consuming any
 disk I/O.  They both go for  30 seconds with 0 transactions, and 0
 kiB read/write.  There are small bumps of 2 transactions/sec for one
 second, then it's back to 0.
 
 
 At 11:02:41, the primary OSD gets kicked out by the monitors:
 2014-06-10 11:02:41.168443 mon.0 [INF] pgmap v8832125: 2560 pgs: 2555
 active+clean, 4 active+clean+scrubbing, 1 active+clean+scrubbing+deep;
 27701 GB data, 56218 GB used, 77870 GB / 130 TB avail; 1996 B/s rd, 2
 op/s
 2014-06-10 11:02:57.801047 mon.0 [INF] osd.11 marked down after no pg
 stats for 903.825187seconds
 2014-06-10 11:02:57.823115 mon.0 [INF] osdmap e58834: 36 osds: 35 up, 36 in
 
 Both ceph-osd processes (11 and 0) continue to use 100% CPU (same range).
 
 
 At ~11:10, I see that osd.11 has resumed reading from disk at the
 original levels (~100 tps, ~1MiBps read, 0 MiBps write).  Since it's
 down, but doing something, I let it run.
 
 Both the osd.11 and osd.0 repeat this pattern.  Reading for a while at
 ~1 MiBps, then nothing.  The duty cycle seems about 50%, with a 20
 minute period, but I haven't timed anything.  CPU usage remains at
 100%, regardless of whether IO is happening or not.
 
 
 At 12:24:15, osd.11 rejoins the cluster:
 2014-06-10 12:24:15.294646 mon.0 [INF] osd.11 10.193.0.7:6804/7100 boot
 2014-06-10 12:24:15.294725 mon.0 [INF] osdmap e58838: 36 osds: 35 up, 36 in
 2014-06-10 12:24:15.343869 mon.0 [INF] pgmap v8832827: 2560 pgs: 1
 stale+active+clean+scrubbing+deep, 2266 active+clean, 5
 stale+active+clean, 287 active+degraded, 1 active+clean+scrubbing;
 27701 GB data, 56218 GB used, 77870 GB / 130 TB avail; 15650 B/s rd,
 18 op/s; 3617854/61758142 objects degraded (5.858%)
 
 
 osd.0's CPU usage drops back to normal when osd.11 rejoins the
 cluster.  The PG stats have not changed.   The last_deep_scrub and
 deep_scrub_stamp are still 48009'1904117 2014-05-21 07:28:01.315996
 respectively.
 
 
 This time, osd.0 did not get kicked out by the monitors.  In previous
 attempts, osd.0 was kicked out 5-10 minutes after osd.11.  When that
 happens, osd.0 rejoins the cluster after osd.11.
 
 
 I have several more PGs exhibiting the same behavior.  At least 3 that
 I know of, and many more that I haven't attempted to deep-scrub.
 
 
 
 
 
 
 ceph -v: ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
 ceph.conf: https://cd.centraldesktop.com/p/eAAADvxuAHJRUk4
 ceph-osd.11.log (5.7 MiB):
 https://cd.centraldesktop.com/p/eAAADvxyABPwaeM
 ceph-osd.0.log (6.3 MiB):
 https://cd.centraldesktop.com/p/eAAADvx0ADWEGng
 ceph pg 40.11e query: https://cd.centraldesktop.com/p/eAAADvxvAAylTW0
 
 (the pg query was collected at 13:24, after the above events)
 
 
 
 
 Things that probably don't matter:
 The OSD partitions were created using ceph-disk-prepare --dmcrypt.

[ceph-users] ceph-deploy - problem creating an osd

2014-06-11 Thread Markus Goldberg

Hi,
ceph-deploy-1.5.3 can cause trouble if a reboot is done between the preparation 
and activation of an OSD:


The osd-disk was /dev/sdb at this time, osd itself should go to sdb1, 
formatted to cleared, journal should go to sdb2, formatted to btrfs

I prepared an osd:

root@bd-a:/etc/ceph# ceph-deploy -v --overwrite-conf osd --fs-type btrfs 
prepare bd-1:/dev/sdb1:/dev/sdb2
[ceph_deploy.conf][DEBUG ] found configuration file at: 
/root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy -v 
--overwrite-conf osd --fs-type btrfs prepare bd-1:/dev/sdb1:/dev/sdb2
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks 
bd-1:/dev/sdb1:/dev/sdb2

[bd-1][DEBUG ] connected to host: bd-1
[bd-1][DEBUG ] detect platform information from remote host
[bd-1][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] Deploying osd to bd-1
[bd-1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[bd-1][INFO  ] Running command: udevadm trigger --subsystem-match=block 
--action=add
[ceph_deploy.osd][DEBUG ] Preparing host bd-1 disk /dev/sdb1 journal 
/dev/sdb2 activate False
[bd-1][INFO  ] Running command: ceph-disk-prepare --fs-type btrfs 
--cluster ceph -- /dev/sdb1 /dev/sdb2

[bd-1][DEBUG ]
[bd-1][DEBUG ] WARNING! - Btrfs v3.12 IS EXPERIMENTAL
[bd-1][DEBUG ] WARNING! - see http://btrfs.wiki.kernel.org before using
[bd-1][DEBUG ]
[bd-1][DEBUG ] fs created label (null) on /dev/sdb1
[bd-1][DEBUG ]  nodesize 32768 leafsize 32768 sectorsize 4096 size 19.99TiB
[bd-1][DEBUG ] Btrfs v3.12
[bd-1][WARNIN] WARNING:ceph-disk:OSD will not be hot-swappable if 
journal is not the same device as the osd data
[bd-1][WARNIN] Turning ON incompat feature 'extref': increased hardlink 
limit per file to 65536
[bd-1][WARNIN] Error: Partition(s) 1 on /dev/sdb1 have been written, but 
we have been unable to inform the kernel of the change, probably because 
it/they are in use.  As a result, the old partition(s) will remain in 
use.  You should reboot now before making further changes.

[bd-1][INFO  ] checking OSD status...
[bd-1][INFO  ] Running command: ceph --cluster=ceph osd stat --format=json
[ceph_deploy.osd][DEBUG ] Host bd-1 is now ready for osd use.
Unhandled exception in thread started by
sys.excepthook is missing
lost sys.stderr

ceph-deploy told me to do a reboot, so I did.
After the reboot the OSD disk changed from sdb to sda. This is a known 
problem of Linux (Ubuntu).


root@bd-a:/etc/ceph# ceph-deploy -v osd activate bd-1:/dev/sda1:/dev/sda2
[ceph_deploy.conf][DEBUG ] found configuration file at: 
/root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy -v osd 
activate bd-1:/dev/sda1:/dev/sda2
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks 
bd-1:/dev/sda1:/dev/sda2

[bd-1][DEBUG ] connected to host: bd-1
[bd-1][DEBUG ] detect platform information from remote host
[bd-1][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] activating host bd-1 disk /dev/sda1
[ceph_deploy.osd][DEBUG ] will use init type: upstart
[bd-1][INFO  ] Running command: ceph-disk-activate --mark-init upstart 
--mount /dev/sda1

[bd-1][WARNIN] got monmap epoch 1
[bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
[bd-1][WARNIN] 2014-06-10 11:45:07.222697 7f5c111af800 -1 journal check: 
ondisk fsid c8ce6ee2-f21b-4ba3-a20e-649224244b9a doesn't match expected 
fcaaf66f-b7b7-4702-83a4-54832b7131fa, invalid (someone else's?) journal

[bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
[bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
[bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
[bd-1][WARNIN] 2014-06-10 11:45:08.125384 7f5c111af800 -1 
filestore(/var/lib/ceph/tmp/mnt.LryOxo) could not find 
23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
[bd-1][WARNIN] 2014-06-10 11:45:08.320327 7f5c111af800 -1 created object 
store /var/lib/ceph/tmp/mnt.LryOxo journal 
/var/lib/ceph/tmp/mnt.LryOxo/journal for osd.4 fsid 
08066b4a-3f36-4e3f-bd1e-15c006a09057
[bd-1][WARNIN] 2014-06-10 11:45:08.320367 7f5c111af800 -1 auth: error 
reading file: /var/lib/ceph/tmp/mnt.LryOxo/keyring: can't open 
/var/lib/ceph/tmp/mnt.LryOxo/keyring: (2) No such file or directory
[bd-1][WARNIN] 2014-06-10 11:45:08.320419 7f5c111af800 -1 created new 
key in keyring /var/lib/ceph/tmp/mnt.LryOxo/keyring

[bd-1][WARNIN] added key for osd.4
[bd-1][INFO  ] checking OSD status...
[bd-1][INFO  ] Running command: ceph --cluster=ceph osd stat --format=json
[bd-1][WARNIN] there are 2 OSDs down
[bd-1][WARNIN] there are 2 OSDs out
root@bd-a:/etc/ceph# ceph -s
cluster 08066b4a-3f36-4e3f-bd1e-15c006a09057
 health HEALTH_WARN 679 pgs degraded; 992 pgs stuck unclean; 
recovery 19/60 objects degraded (31.667%); clock skew detected on mon.bd-1
 monmap e1: 3 mons at 

[ceph-users] pid_max value?

2014-06-11 Thread Cao, Buddy
Hi, what is the recommended value for /proc/sys/kernel/pid_max? Is 32768 enough 
for a Ceph cluster with 4 nodes (40 1 TB OSDs on each node)? My Ceph nodes already 
run into a 'create thread fail' problem in the OSD log, whose root cause is pid_max.


Wei Cao (Buddy)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pid_max value?

2014-06-11 Thread Maciej Bonin
Hello,

The values we use are as follows:
# sysctl -p
net.ipv4.ip_local_port_range = 1024 65535
net.core.netdev_max_backlog = 3
net.core.somaxconn = 16384
net.ipv4.tcp_max_syn_backlog = 252144
net.ipv4.tcp_max_tw_buckets = 36
net.ipv4.tcp_fin_timeout = 3
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 65536 8388608
net.ipv4.tcp_mem = 8388608 8388608 8388608
net.ipv4.route.flush = 1
kernel.pid_max = 4194303

The timeouts don't really make sense without tw reuse/recycling, but we found 
increasing the max and letting the old ones hang gives better performance.
Somaxconn was the most important value we had to increase: with 3 mons, 3 
storage nodes, 3 VM hypervisors, 16 VMs and 48 OSDs we started running into 
major problems with servers dying left and right.
Most of those values are lifted from some OpenStack python script IIRC. Please 
let us know if you find a more efficient/stable configuration; however, we're 
quite happy with this one.

Regards,
Maciej Bonin
Systems Engineer | M247 Limited

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Cao, 
Buddy
Sent: 11 June 2014 15:00
To: ceph-users@lists.ceph.com
Subject: [ceph-users] pid_max value?

Hi, what is the recommended value for /proc/sys/kernel/pid_max? Is 32768 enough 
for Ceph cluster with 4 nodes (40 1T OSDs on each node)? My ceph node already 
run into create thread fail problem in osd log which root cause at pid_max.


Wei Cao (Buddy)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy - problem creating an osd

2014-06-11 Thread Alfredo Deza
On Wed, Jun 11, 2014 at 9:29 AM, Markus Goldberg
goldb...@uni-hildesheim.de wrote:
 Hi,
 ceph-deploy-1.5.3 can make trouble, if a reboot is done between preparation
 and aktivation of an osd:

 The osd-disk was /dev/sdb at this time, osd itself should go to sdb1,
 formatted to cleared, journal should go to sdb2, formatted to btrfs
 I prepared an osd:

 root@bd-a:/etc/ceph# ceph-deploy -v --overwrite-conf osd --fs-type btrfs
 prepare bd-1:/dev/sdb1:/dev/sdb2
 [ceph_deploy.conf][DEBUG ] found configuration file at:
 /root/.cephdeploy.conf
 [ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy -v
 --overwrite-conf osd --fs-type btrfs prepare bd-1:/dev/sdb1:/dev/sdb2
 [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks
 bd-1:/dev/sdb1:/dev/sdb2
 [bd-1][DEBUG ] connected to host: bd-1
 [bd-1][DEBUG ] detect platform information from remote host
 [bd-1][DEBUG ] detect machine type
 [ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
 [ceph_deploy.osd][DEBUG ] Deploying osd to bd-1
 [bd-1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
 [bd-1][INFO  ] Running command: udevadm trigger --subsystem-match=block
 --action=add
 [ceph_deploy.osd][DEBUG ] Preparing host bd-1 disk /dev/sdb1 journal
 /dev/sdb2 activate False
 [bd-1][INFO  ] Running command: ceph-disk-prepare --fs-type btrfs --cluster
 ceph -- /dev/sdb1 /dev/sdb2
 [bd-1][DEBUG ]
 [bd-1][DEBUG ] WARNING! - Btrfs v3.12 IS EXPERIMENTAL
 [bd-1][DEBUG ] WARNING! - see http://btrfs.wiki.kernel.org before using
 [bd-1][DEBUG ]
 [bd-1][DEBUG ] fs created label (null) on /dev/sdb1
 [bd-1][DEBUG ]  nodesize 32768 leafsize 32768 sectorsize 4096 size 19.99TiB
 [bd-1][DEBUG ] Btrfs v3.12
 [bd-1][WARNIN] WARNING:ceph-disk:OSD will not be hot-swappable if journal is
 not the same device as the osd data
 [bd-1][WARNIN] Turning ON incompat feature 'extref': increased hardlink
 limit per file to 65536
 [bd-1][WARNIN] Error: Partition(s) 1 on /dev/sdb1 have been written, but we
 have been unable to inform the kernel of the change, probably because
 it/they are in use.  As a result, the old partition(s) will remain in use.
 You should reboot now before making further changes.
 [bd-1][INFO  ] checking OSD status...
 [bd-1][INFO  ] Running command: ceph --cluster=ceph osd stat --format=json
 [ceph_deploy.osd][DEBUG ] Host bd-1 is now ready for osd use.
 Unhandled exception in thread started by
 sys.excepthook is missing
 lost sys.stderr

 ceph-deploy told me to do a reboot, so i did.

This is actually not ceph-deploy asking you for a reboot but the
stderr captured from the
remote node (bd-1 in your case).

ceph-deploy will log output from remote nodes and will preface the
logs with the hostname when
the output happens remotely. stderr will be used as WARNING level and
stdout as DEBUG.

So in your case this line is output from ceph-disk-prepare/btrfs:

 [bd-1][WARNIN] Error: Partition(s) 1 on /dev/sdb1 have been written, but we
 have been unable to inform the kernel of the change, probably because
 it/they are in use.  As a result, the old partition(s) will remain in use.
 You should reboot now before making further changes.

Have you tried 'create' instead of 'prepare' and 'activate' ?
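
i.e. something like the following, which runs prepare and activate in one step 
(a sketch reusing the host/disk/journal spec from above):

ceph-deploy -v --overwrite-conf osd --fs-type btrfs create bd-1:/dev/sdb1:/dev/sdb2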

 After the reboot the osd-disk changed from sdb to sda. This is a known
 problem of linux (ubuntu)

 root@bd-a:/etc/ceph# ceph-deploy -v osd activate bd-1:/dev/sda1:/dev/sda2
 [ceph_deploy.conf][DEBUG ] found configuration file at:
 /root/.cephdeploy.conf
 [ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy -v osd
 activate bd-1:/dev/sda1:/dev/sda2
 [ceph_deploy.osd][DEBUG ] Activating cluster ceph disks
 bd-1:/dev/sda1:/dev/sda2
 [bd-1][DEBUG ] connected to host: bd-1
 [bd-1][DEBUG ] detect platform information from remote host
 [bd-1][DEBUG ] detect machine type
 [ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
 [ceph_deploy.osd][DEBUG ] activating host bd-1 disk /dev/sda1
 [ceph_deploy.osd][DEBUG ] will use init type: upstart
 [bd-1][INFO  ] Running command: ceph-disk-activate --mark-init upstart
 --mount /dev/sda1
 [bd-1][WARNIN] got monmap epoch 1
 [bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
 [bd-1][WARNIN] 2014-06-10 11:45:07.222697 7f5c111af800 -1 journal check:
 ondisk fsid c8ce6ee2-f21b-4ba3-a20e-649224244b9a doesn't match expected
 fcaaf66f-b7b7-4702-83a4-54832b7131fa, invalid (someone else's?) journal
 [bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
 [bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
 [bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
 [bd-1][WARNIN] 2014-06-10 11:45:08.125384 7f5c111af800 -1
 filestore(/var/lib/ceph/tmp/mnt.LryOxo) could not find
 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
 [bd-1][WARNIN] 2014-06-10 11:45:08.320327 7f5c111af800 -1 created object
 store /var/lib/ceph/tmp/mnt.LryOxo journal
 /var/lib/ceph/tmp/mnt.LryOxo/journal for osd.4 fsid
 08066b4a-3f36-4e3f-bd1e-15c006a09057
 [bd-1][WARNIN] 

Re: [ceph-users] pid_max value?

2014-06-11 Thread Cao, Buddy
Thanks Bonin. Do you have 48 OSDs in total, or 48 OSDs on each storage node? Do 
you think kernel.pid_max = 4194303 is reasonable, given that it is a big increase 
over the default OS setting?


Wei Cao (Buddy)

-Original Message-
From: Maciej Bonin [mailto:maciej.bo...@m247.com] 
Sent: Wednesday, June 11, 2014 10:07 PM
To: Cao, Buddy; ceph-users@lists.ceph.com
Subject: RE: pid_max value?

Hello,

The values we use are as follows:
# sysctl -p
net.ipv4.ip_local_port_range = 1024 65535
net.core.netdev_max_backlog = 3
net.core.somaxconn = 16384
net.ipv4.tcp_max_syn_backlog = 252144
net.ipv4.tcp_max_tw_buckets = 36
net.ipv4.tcp_fin_timeout = 3
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 65536 8388608
net.ipv4.tcp_mem = 8388608 8388608 8388608
net.ipv4.route.flush = 1
kernel.pid_max = 4194303

The timeouts don't really make sense without tw reuse/recycling but we found 
increasing the max and letting the old ones hang gives better performance.
Somaxconn was the most important value we had to increase as with 3 mons, 3 
storage nodes, 3 vm hypervisors, 16vms and 48 OSDs we've started running into 
major problems with servers dying left and right.
Most of those values are lifted from some openstack python script IIRC, please 
let us know if you find a more efficient/stable configuration, however we're 
quite happy with this one.

Regards,
Maciej Bonin
Systems Engineer | M247 Limited

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Cao, 
Buddy
Sent: 11 June 2014 15:00
To: ceph-users@lists.ceph.com
Subject: [ceph-users] pid_max value?

Hi, what is the recommended value for /proc/sys/kernel/pid_max? Is 32768 enough 
for Ceph cluster with 4 nodes (40 1T OSDs on each node)? My ceph node already 
run into create thread fail problem in osd log which root cause at pid_max.


Wei Cao (Buddy)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS crash dump ?

2014-06-11 Thread Gregory Farnum
On Wednesday, June 11, 2014, Florent B flor...@coppint.com wrote:

 Hi every one,

 Sometimes my MDS crashes... sometimes after a few hours, sometimes after
 a few days.

 I know I could enable debugging and so on to get more information. But
 if it crashes after a few days, it generates gigabytes of debugging data
 that are not related to the crash.

 Is it possible to get just a crash dump when MDS is crashing, to see
 what's wrong ?


You should be getting a backtrace regardless of what debugging levels are
enabled, so I assume you mean having it dump out prior log lines when that
happens. And indeed you can.
Normally you specify something like
debug mds = 10
and that dumps out the log. You can instead specify two values, separated
by a slash, and the daemon will take the time to generate all the log lines
at the second value but only dump to disk the first value:
debug mds = 0/10
That will put nothing in the log, but will generate debug output at level 10
in a memory ring buffer (a fixed number of entries), and dump it on a crash.
You can do this with any debug setting.
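
For example, a minimal ceph.conf sketch of that setting (0/10 is just the level 
used in the example above):

[mds]
    # write nothing to the log in normal operation, but keep level-10 messages
    # in an in-memory ring buffer and dump them if the daemon crashes
    debug mds = 0/10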
-Greg






 Thank you.
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com javascript:;
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] umount gets stuck when umounting a cloned rbd image

2014-06-11 Thread Alphe Salas
Hello, I am writing to you about an issue I noticed with Ceph 0.72.2 on 
Ubuntu 13.10 and with 0.80.1 on Ubuntu 14.04.

Here is what I do:
1) I create an rbd image of 4 TB and format it to ext4 or xfs. The image 
has --order 25 and --image-format 2.

2) I create a snapshot of that rbd image.
3) I protect that snapshot.
4) I create a clone image of that initial rbd image using the protected 
snapshot as reference.
5) I insert the line in /etc/ceph/rbdmap, map the new image, and mount it 
on my Ceph client server.


Until here, all is fine, cool and dandy.

6) I umount /dev/rbd1, which is the previously mounted rbd clone image, 
and umount gets stuck.


On the client server with the stuck umount I have this message in 
/var/log/syslog:


Jun 11 12:26:10 tesla kernel: [63365.178657] libceph: osd8 
20.10.10.105:6803 socket error on read


As the problem seems somehow related to osd.8 on my 20.10.10.105 ceph node, 
I went there to get more information from its log.


In /var/log/ceph-osd.8.log this message keeps coming in endlessly:

2014-06-11 12:31:51.692031 7fa26085c700  0 -- 20.10.10.105:6805/23321  
20.10.10.12:0/2563935849 pipe(0x9dd6780 sd=231 :6805 s=0 pgs=0 cs=0 l=0 
c=0x7ed6840).accept peer addr is really 20.10.10.12:0/2563935849 (socket 
is 20.10.10.12:33056/0)




Can anyone help me solve this issue ?

--
Alphe Salas
IT engineer
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to remove mds

2014-06-11 Thread Gregory Farnum
On Wed, Jun 11, 2014 at 4:56 AM,  yalla.gnan.ku...@accenture.com wrote:
 Hi All,



 I have a four node ceph cluster. The metadata service is showing as degraded
 in health. How to remove the mds service from ceph ?

Unfortunately you can't remove it entirely right now, but if you
create a new filesystem using the newfs command, and don't turn on
an MDS daemon after that, it won't report a health error.
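
If it helps, a sketch of that (the arguments are pool IDs from 'ceph osd dump', 
not pool names, and this discards the existing CephFS metadata mapping):

ceph mds newfs <metadata-pool-id> <data-pool-id> --yes-i-really-mean-it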
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can we map OSDs from different hosts (servers) to a Pool in Ceph

2014-06-11 Thread Gregory Farnum
On Wed, Jun 11, 2014 at 5:18 AM, Davide Fanciola dfanci...@gmail.com wrote:
 Hi,

 we have a similar setup where we have SSD and HDD in the same hosts.
 Our very basic crushmap is configured as follows:

 # ceph osd tree
 # id weight type name up/down reweight
 -6 3 root ssd
 3 1 osd.3 up 1
 4 1 osd.4 up 1
 5 1 osd.5 up 1
 -5 3 root platters
 0 1 osd.0 up 1
 1 1 osd.1 up 1
 2 1 osd.2 up 1
 -1 3 root default
 -2 1 host chgva-srv-stor-001
 0 1 osd.0 up 1
 3 1 osd.3 up 1
 -3 1 host chgva-srv-stor-002
 1 1 osd.1 up 1
 4 1 osd.4 up 1
 -4 1 host chgva-srv-stor-003
 2 1 osd.2 up 1
 5 1 osd.5 up 1


 We do not seem to have problems with this setup, but i'm not sure if it's a
 good practice to have elements appearing multiple times in different
 branches.
 On the other hand, I see no way to follow the physical hierarchy of a
 datacenter for pools, since a pool can be spread among
 servers/racks/rooms...

 Can someone confirm this crushmap is any good for our configuration?

If you accidentally use the default node anywhere, you'll get data
scattered across both classes of device. If you try and use both the
platters and ssd nodes within a single CRUSH rule, you might end
up with copies of data on the same host (reducing your data
resiliency). Otherwise this is just fine.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pid_max value?

2014-06-11 Thread Maciej Bonin
We have not experienced any downsides to this approach, performance- or 
stability-wise. If you prefer you can experiment with the values, but I see no 
real advantage in doing so.
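
For what it's worth, a small sketch of applying it persistently, matching the 
sysctl usage earlier in the thread (the value is the one from our list):

sysctl -w kernel.pid_max=4194303
echo 'kernel.pid_max = 4194303' >> /etc/sysctl.conf   # persist across reboots
sysctl -p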

Regards,
Maciej Bonin
Systems Engineer | M247 Limited


-Original Message-
From: Cao, Buddy [mailto:buddy@intel.com] 
Sent: 11 June 2014 17:00
To: Maciej Bonin; ceph-users@lists.ceph.com
Subject: RE: pid_max value?

Thanks Bonin.  Do you have totally 48 OSDs or there are 48 OSDs on each storage 
node?  Do you think kernel.pid_max = 4194303 is reasonable since it increase 
a lot from the default OS setting.


Wei Cao (Buddy)

-Original Message-
From: Maciej Bonin [mailto:maciej.bo...@m247.com] 
Sent: Wednesday, June 11, 2014 10:07 PM
To: Cao, Buddy; ceph-users@lists.ceph.com
Subject: RE: pid_max value?

Hello,

The values we use are as follows:
# sysctl -p
net.ipv4.ip_local_port_range = 1024 65535
net.core.netdev_max_backlog = 3
net.core.somaxconn = 16384
net.ipv4.tcp_max_syn_backlog = 252144
net.ipv4.tcp_max_tw_buckets = 36
net.ipv4.tcp_fin_timeout = 3
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 65536 8388608
net.ipv4.tcp_mem = 8388608 8388608 8388608
net.ipv4.route.flush = 1
kernel.pid_max = 4194303

The timeouts don't really make sense without tw reuse/recycling but we found 
increasing the max and letting the old ones hang gives better performance.
Somaxconn was the most important value we had to increase as with 3 mons, 3 
storage nodes, 3 vm hypervisors, 16vms and 48 OSDs we've started running into 
major problems with servers dying left and right.
Most of those values are lifted from some openstack python script IIRC, please 
let us know if you find a more efficient/stable configuration, however we're 
quite happy with this one.

Regards,
Maciej Bonin
Systems Engineer | M247 Limited

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Cao, 
Buddy
Sent: 11 June 2014 15:00
To: ceph-users@lists.ceph.com
Subject: [ceph-users] pid_max value?

Hi, what is the recommended value for /proc/sys/kernel/pid_max? Is 32768 enough 
for Ceph cluster with 4 nodes (40 1T OSDs on each node)? My ceph node already 
run into create thread fail problem in osd log which root cause at pid_max.


Wei Cao (Buddy)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Moving Ceph cluster to different network segment

2014-06-11 Thread Fred Yang
We need to move a Ceph cluster to a different network segment for
interconnectivity between mon and osd. Does anybody have a procedure for how
that can be done? Note that the host name references will change: an OSD host
originally referenced as cephnode1 will be cephnode1-n in the new segment.

Thanks,
Fred

Sent from my Samsung Galaxy S3
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] tiering : hit_set_count hit_set_period memory usage ?

2014-06-11 Thread Alexandre DERUMIER
Hi,

I'm reading tiering doc here
http://ceph.com/docs/firefly/dev/cache-pool/


The hit_set_count and hit_set_period define how much time each HitSet should 
cover, and how many such HitSets to store. Binning accesses over time allows 
Ceph to independently determine whether an object was accessed at least once 
and whether it was accessed more than once over some time period (“age” vs 
“temperature”). Note that the longer the period and the higher the count the 
more RAM will be consumed by the ceph-osd process. In particular, when the 
agent is active to flush or evict cache objects, all hit_set_count HitSets are 
loaded into RAM

About how much memory are we talking here? Is there a formula (number of objects x ...)?

I'm looking at a hit_set_period of something like 12h or 24h.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] tiering : hit_set_count hit_set_period memory usage ?

2014-06-11 Thread Gregory Farnum
On Wed, Jun 11, 2014 at 12:44 PM, Alexandre DERUMIER
aderum...@odiso.com wrote:
 Hi,

 I'm reading tiering doc here
 http://ceph.com/docs/firefly/dev/cache-pool/

 
 The hit_set_count and hit_set_period define how much time each HitSet should 
 cover, and how many such HitSets to store. Binning accesses over time allows 
 Ceph to independently determine whether an object was accessed at least once 
 and whether it was accessed more than once over some time period (“age” vs 
 “temperature”). Note that the longer the period and the higher the count the 
 more RAM will be consumed by the ceph-osd process. In particular, when the 
 agent is active to flush or evict cache objects, all hit_set_count HitSets 
 are loaded into RAM

 about how much memory do we talk here ? any formula ? (nr object x ? )

We haven't really quantified that yet. In particular, it's going to
depend on how many objects are accessed within a period; the OSD sizes
them based on the previous access count and the false positive
probability that you give it.
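
For reference, a sketch of the knobs involved, per the cache-pool doc linked 
above (the pool name and values are only placeholders):

ceph osd pool set hot-pool hit_set_type bloom
ceph osd pool set hot-pool hit_set_count 12      # number of HitSets to keep
ceph osd pool set hot-pool hit_set_period 3600   # seconds covered by each HitSet
ceph osd pool set hot-pool hit_set_fpp 0.05      # bloom false-positive probability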
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] HEALTH_WARN pool has too few pgs

2014-06-11 Thread Eric Eastman

Hi,

I am seeing the following warning on one of my test clusters:

# ceph health detail
HEALTH_WARN pool Ray has too few pgs
pool Ray objects per pg (24) is more than 12 times cluster average (2)

This is a reported issue and is set to Won't Fix at:
http://tracker.ceph.com/issues/8103

My test cluster has a mix of test data, and the pool showing the 
warning is used for RBD Images.



# ceph df detail
GLOBAL:
    SIZE  AVAIL RAW USED %RAW USED OBJECTS
    1009G 513G  496G     49.14     33396
POOLS:
    NAME               ID CATEGORY USED   %USED OBJECTS DIRTY READ   WRITE
    data               0  -        0      0     0       0     0      0
    metadata           1  -        0      0     0       0     0      0
    rbd                2  -        0      0     0       0     0      0
    iscsi              3  -        847M   0.08  241     211   11839k 10655k
    cinder             4  -        305M   0.03  53      2     51579  31584
    glance             5  -        65653M 6.35  8222    7     512k   10405
    .users.swift       7  -        0      0     0       0     0      4
    .rgw.root          8  -        1045   0     4       4     23     5
    .rgw.control       9  -        0      0     8       8     0      0
    .rgw               10 -        252    0     2       2     3      11
    .rgw.gc            11 -        0      0     32      32    4958   3328
    .users.uid         12 -        575    0     3       3     70     23
    .users             13 -        9      0     1       1     0      9
    .users.email       14 -        0      0     0       0     0      0
    .rgw.buckets       15 -        0      0     0       0     0      0
    .rgw.buckets.index 16 -        0      0     1       1     1      1
    Ray                17 -        99290M 9.61  24829   24829 0      0



It would be nice if we could turn off this message.

Eric

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] For ceph firefly, which version kernel client should be used?

2014-06-11 Thread Liu Baogang
Dear Sir,

In our tests we use Ceph firefly to build a cluster. On a node with kernel 
3.10.xx, if we use the kernel client to mount CephFS, the 'ls' command sometimes 
does not list all of the files. With ceph-fuse 0.80.x it seems to work well so far.

I guess that kernel 3.10.xx is too old, so the kernel client does not work 
well. If that is right, which kernel version should we use?

Thanks,
Baogang
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph pgs stuck inactive since forever

2014-06-11 Thread Akhil.Labudubariki
I installed Ceph, and when I run ceph health it gives me the following output:

HEALTH_WARN 384 pgs incomplete; 384 pgs stuck inactive; 384 pgs 
stuck unclean; 2 near full osd(s)

This is the output of a single pg when I use ceph health detail

pg 2.2 is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; 
search ceph.com/docs for 'incomplete')

and a similar line comes up for all the pgs.

This is the output of ceph -s:

cluster 89cbb30c-023b-4f8b-ac14-abc78fb6b07a
 health HEALTH_WARN 384 pgs incomplete; 384 pgs stuck inactive; 384 pgs 
stuck unclean; 2 near full osd(s)
 monmap e1: 1 mons at {a=100.112.12.28:6789/0}, election epoch 2, quorum 0 a
 osdmap e5: 2 osds: 2 up, 2 in
  pgmap v64: 384 pgs, 3 pools, 0 bytes data, 0 objects
111 GB used, 8346 MB / 125 GB avail
 384 incomplete
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problem installing ceph from package manager / ceph repositories

2014-06-11 Thread Dimitri Maziuk
On 06/09/2014 03:08 PM, Karan Singh wrote:

 1. When installing Ceph using the package manager and the ceph repositories, the
 package manager (i.e. YUM) does not respect the ceph.repo file and takes the ceph
 package directly from EPEL.

Option 1: install yum-plugin-priorities, add priority = X to ceph.repo.
X should be less than EPEL's priority, the default is I believe 99.

Option 2: add exclude = ceph_package(s) to epel.repo.
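
For example, a sketch of both options (the priority value is only an example; with 
yum priorities, a lower number means a higher priority than EPEL's default):

# option 1
yum install yum-plugin-priorities
# then in /etc/yum.repos.d/ceph.repo, in each [ceph*] section:
priority=1

# option 2, in /etc/yum.repos.d/epel.repo under [epel]:
exclude=ceph*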

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HEALTH_WARN pool has too few pgs

2014-06-11 Thread Jean-Charles LOPEZ
Hi Eric,

increase the number of PGs in your pool with 
Step 1: ceph osd pool set poolname pg_num newvalue 
Step 2: ceph osd pool set poolname pgp_num newvalue 

You can check the number of PGs in your pool with ceph osd dump | grep ^pool

See documentation: http://ceph.com/docs/master/rados/operations/pools/
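
For example, for the pool from your output (the new value is only illustrative; 
pick something appropriate for your OSD count, and note pg_num can only be increased):

ceph osd dump | grep ^pool
ceph osd pool set Ray pg_num 256
ceph osd pool set Ray pgp_num 256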

JC



On Jun 11, 2014, at 12:59, Eric Eastman eri...@aol.com wrote:

 Hi,
 
 I am seeing the following warning on one of my test clusters:
 
 # ceph health detail
 HEALTH_WARN pool Ray has too few pgs
 pool Ray objects per pg (24) is more than 12 times cluster average (2)
 
 This is a reported issue and is set to Won't Fix at:
 http://tracker.ceph.com/issues/8103
 
 My test cluster has a mix of test data, and the pool showing the warning is 
 used for RBD Images.
 
 
 # ceph df detail
 GLOBAL:
     SIZE  AVAIL RAW USED %RAW USED OBJECTS
     1009G 513G  496G     49.14     33396
 POOLS:
     NAME               ID CATEGORY USED   %USED OBJECTS DIRTY READ   WRITE
     data               0  -        0      0     0       0     0      0
     metadata           1  -        0      0     0       0     0      0
     rbd                2  -        0      0     0       0     0      0
     iscsi              3  -        847M   0.08  241     211   11839k 10655k
     cinder             4  -        305M   0.03  53      2     51579  31584
     glance             5  -        65653M 6.35  8222    7     512k   10405
     .users.swift       7  -        0      0     0       0     0      4
     .rgw.root          8  -        1045   0     4       4     23     5
     .rgw.control       9  -        0      0     8       8     0      0
     .rgw               10 -        252    0     2       2     3      11
     .rgw.gc            11 -        0      0     32      32    4958   3328
     .users.uid         12 -        575    0     3       3     70     23
     .users             13 -        9      0     1       1     0      9
     .users.email       14 -        0      0     0       0     0      0
     .rgw.buckets       15 -        0      0     0       0     0      0
     .rgw.buckets.index 16 -        0      0     1       1     1      1
     Ray                17 -        99290M 9.61  24829   24829 0      0
 
 
 It would be nice if we could turn off this message.
 
 Eric
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problem installing ceph from package manager / ceph repositories

2014-06-11 Thread Karan Singh
Hi Dimitri

It was already resolved; the moderator took a long time to approve my email to 
the mailing list.

Thanks for your solution.

- Karan -

On 12 Jun 2014, at 00:02, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:

 On 06/09/2014 03:08 PM, Karan Singh wrote:
 
1. When installing Ceph using package manger and ceph repositores , the
package manager i.e YUM does not respect the ceph.repo file and takes ceph
package directly from EPEL .
 
 Option 1: install yum-plugin-priorities, add priority = X to ceph.repo.
 X should be less than EPEL's priority, the default is I believe 99.
 
 Option 2: add exclude = ceph_package(s) to epel.repo.
 
 -- 
 Dimitri Maziuk
 Programmer/sysadmin
 BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] I have PGs that I can't deep-scrub

2014-06-11 Thread Craig Lewis
New logs, with debug ms = 1, debug osd = 20.


In this timeline, I started the deep-scrub at 11:04:00; Ceph started
deep-scrubbing at 11:04:03.

osd.11 started consuming 100% CPU around 11:07.  Same for osd.0.  CPU
usage is all user; iowait is  0.10%.  There is more variance in the
CPU usage now, ranging between 98.5% and 101.2%

This time, I didn't see any major IO, read or write.

osd.11 was marked down at 11:22:00:
2014-06-11 11:22:00.820118 mon.0 [INF] osd.11 marked down after no pg
stats for 902.656777seconds

osd.0 was marked down at 11:36:00:
 2014-06-11 11:36:00.890869 mon.0 [INF] osd.0 marked down after no pg
stats for 902.498894seconds




ceph.conf: https://cd.centraldesktop.com/p/eAAADwbcABIDZuE
ceph-osd.0.log.gz (140MiB, 18MiB compressed):
https://cd.centraldesktop.com/p/eAAADwbdAHnmhFQ
ceph-osd.11.log.gz (131MiB, 17MiB compressed):
https://cd.centraldesktop.com/p/eAAADwbeAEUR9AI
ceph pg 40.11e query: https://cd.centraldesktop.com/p/eAAADwbfAEJcwvc





On Wed, Jun 11, 2014 at 5:42 AM, Sage Weil s...@inktank.com wrote:
 Hi Craig,

 It's hard to say what is going wrong with that level of logs.  Can you
 reproduce with debug ms = 1 and debug osd = 20?

 There were a few things fixed in scrub between emperor and firefly.  Are
 you planning on upgrading soon?

 sage


 On Tue, 10 Jun 2014, Craig Lewis wrote:

 Every time I deep-scrub one PG, all of the OSDs responsible get kicked
 out of the cluster.  I've deep-scrubbed this PG 4 times now, and it
 fails the same way every time.  OSD logs are linked at the bottom.

 What can I do to get this deep-scrub to complete cleanly?

 This is the first time I've deep-scrubbed these PGs since Sage helped
 me recover from some OSD problems
 (http://t53277.file-systems-ceph-development.file-systemstalk.info/70-osd-are-down-and-not-coming-up-t53277.html)

 I can trigger the issue easily in this cluster, but have not been able
 to re-create it in other clusters.






 The PG stats for this PG say that last_deep_scrub and deep_scrub_stamp
 are 48009'1904117 2014-05-21 07:28:01.315996 respectively.  This PG is
 owned by OSDs [11,0]

 This is a secondary cluster, so I stopped all external I/O on it.  I
 set nodeep-scrub, and restarted both OSDs with:
   debug osd = 5/5
   debug filestore = 5/5
   debug journal = 1
   debug monc = 20/20

 then I ran a deep-scrub on this PG.
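
(presumably something along the lines of

  ceph pg deep-scrub 40.11e

with 40.11e being the PG whose query is linked above)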

 2014-06-10 10:47:50.881783 mon.0 [INF] pgmap v8832020: 2560 pgs: 2555
 active+clean, 5 active+clean+scrubbing; 27701 GB data, 56218 GB used,
 77870 GB / 130 TB avail
 2014-06-10 10:47:54.039829 mon.0 [INF] pgmap v8832021: 2560 pgs: 2554
 active+clean, 5 active+clean+scrubbing, 1 active+clean+scrubbing+deep;
 27701 GB data, 56218 GB used, 77870 GB / 130 TB avail


 At 10:49:09, I see ceph-osd for both 11 and 0 spike to 100% CPU
 (100.3% +/- 1.0%).  Prior to this, they were both using ~30% CPU.  It
 might've started a few seconds sooner; I'm going by top.

 I forgot to watch iostat until 10:56.  At this point, both OSDs are
 reading.  iostat reports that they're both doing ~100
 transactions/sec, reading ~1 MiBps, 0 writes.


 At 11:01:26, iostat reports that both osds are no longer consuming any
 disk I/O.  They both go for > 30 seconds with 0 transactions, and 0
 kiB read/write.  There are small bumps of 2 transactions/sec for one
 second, then it's back to 0.


 At 11:02:41, the primary OSD gets kicked out by the monitors:
 2014-06-10 11:02:41.168443 mon.0 [INF] pgmap v8832125: 2560 pgs: 2555
 active+clean, 4 active+clean+scrubbing, 1 active+clean+scrubbing+deep;
 27701 GB data, 56218 GB used, 77870 GB / 130 TB avail; 1996 B/s rd, 2
 op/s
 2014-06-10 11:02:57.801047 mon.0 [INF] osd.11 marked down after no pg
 stats for 903.825187seconds
 2014-06-10 11:02:57.823115 mon.0 [INF] osdmap e58834: 36 osds: 35 up, 36 in

 Both ceph-osd processes (11 and 0) continue to use 100% CPU (same range).


 At ~11:10, I see that osd.11 has resumed reading from disk at the
 original levels (~100 tps, ~1MiBps read, 0 MiBps write).  Since it's
 down, but doing something, I let it run.

 Both the osd.11 and osd.0 repeat this pattern.  Reading for a while at
 ~1 MiBps, then nothing.  The duty cycle seems about 50%, with a 20
 minute period, but I haven't timed anything.  CPU usage remains at
 100%, regardless of whether IO is happening or not.


 At 12:24:15, osd.11 rejoins the cluster:
 2014-06-10 12:24:15.294646 mon.0 [INF] osd.11 10.193.0.7:6804/7100 boot
 2014-06-10 12:24:15.294725 mon.0 [INF] osdmap e58838: 36 osds: 35 up, 36 in
 2014-06-10 12:24:15.343869 mon.0 [INF] pgmap v8832827: 2560 pgs: 1
 stale+active+clean+scrubbing+deep, 2266 active+clean, 5
 stale+active+clean, 287 active+degraded, 1 active+clean+scrubbing;
 27701 GB data, 56218 GB used, 77870 GB / 130 TB avail; 15650 B/s rd,
 18 op/s; 3617854/61758142 objects degraded (5.858%)


 osd.0's CPU usage drops back to normal when osd.11 rejoins the
 cluster.  The PG stats have not changed.   The last_deep_scrub and
 deep_scrub_stamp values are unchanged.

Re: [ceph-users] Ceph pgs stuck inactive since forever

2014-06-11 Thread John Wilkins
I'll update the docs to incorporate the term 'incomplete'. I believe this
is due to an inability to complete backfilling. Your cluster is nearly
full. You indicated that you installed Ceph. Did you store data in the
cluster? Your usage indicates that you have used 111GB of 125GB, so you
only have about 8GB left. Did it ever reach an active+clean state?
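
As the health detail output itself hints, lowering min_size on the affected
pools may let those PGs go active with a single copy; that works around the
placement problem but not the near-full condition. For example:

  ceph osd pool set rbd min_size 1

and likewise for your other pools (presumably data and metadata) if they show
the same message.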


On Wed, Jun 11, 2014 at 6:08 AM, akhil.labudubar...@ril.com wrote:

  I installed ceph, and when I run ceph health it gives me the following
 output



 HEALTH_WARN 384 pgs incomplete; 384 pgs stuck inactive;
 384 pgs stuck unclean; 2 near full osd(s)



 This is the output of a single pg when I use ceph health detail



 pg 2.2 is incomplete, acting [0] (reducing pool rbd min_size from 2 may
 help; search ceph.com/docs for 'incomplete')



 and similar line comes up for all the pgs.



 This is the output of ceph - s



  cluster 89cbb30c-023b-4f8b-ac14-abc78fb6b07a
   health HEALTH_WARN 384 pgs incomplete; 384 pgs stuck inactive; 384 pgs stuck unclean; 2 near full osd(s)
   monmap e1: 1 mons at {a=100.112.12.28:6789/0}, election epoch 2, quorum 0 a
   osdmap e5: 2 osds: 2 up, 2 in
    pgmap v64: 384 pgs, 3 pools, 0 bytes data, 0 objects
          111 GB used, 8346 MB / 125 GB avail
                384 incomplete



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Swift API Authentication Failure

2014-06-11 Thread Yehuda Sadeh
(resending also to list)
Right. So basically the swift subuser wasn't created correctly. I created
issue #8587. Can you try creating a second subuser and see if it's created
correctly the second time?
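
Something along the lines of your earlier commands, just with a new subuser
name (the name and secret below are only placeholders):

  radosgw-admin subuser create --uid=hive_cache --subuser=hive_cache:swift2 --access=full
  radosgw-admin key create --subuser=hive_cache:swift2 --key-type=swift --secret=SOMEOTHERSECRET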


On Wed, Jun 11, 2014 at 2:03 PM, David Curtiss dcurtiss_c...@dcurtiss.com
wrote:

 Hmm Using that method, the subuser object appears to be an empty
 string.

 First, note that I skipped the Create Pools step:
 http://ceph.com/docs/master/radosgw/config/#create-pools
 because it says "If the user you created has permissions, the gateway will
 create the pools automatically."

 And indeed, the .users.swift pool is there:

 $ rados lspools
 data
 metadata
 rbd
 .rgw.root
 .rgw.control
 .rgw
 .rgw.gc
 .users.uid
 .users.email
 .users
 .users.swift

 But the only entry in that pool is an empty string.

 $ rados ls -p .users.swift
 <blank line>

 And that is indeed a blank line (as opposed to 0 lines), because there is
 1 object in that pool:
 $ rados df
 pool name     category  KB  objects  clones  degraded  unfound  rd  rd KB  wr  wr KB
 ...
 .users.swift  -         1   1        0       0         0        0   0      1   1

 For comparison, the 'df' line for the .users pool lists 2 objects, which
 are as follows:

 $ rados ls -p .users
 4U5H60BMDL7OSI5ZBL8P
 F7HZCI4SL12KVVSJ9UVZ

 - David


 On Tue, Jun 10, 2014 at 11:49 PM, Yehuda Sadeh yeh...@inktank.com wrote:

 Can you verify that the subuser object actually exist? Try doing:

 $ rados ls -p .users.swift

 (unless you have non default pools set)

 Yehuda

 On Tue, Jun 10, 2014 at 6:44 PM, David Curtiss
 dcurtiss_c...@dcurtiss.com wrote:
  No good. In fact, for some reason when I tried to load up my cluster VMs
  today, I couldn't get them to work (something to do with a pipe
 fault), so
  I recreated my VMs nearly from scratch, to no avail.
 
  Here are the commands I used to create the user and subuser:
  radosgw-admin user create --uid=hive_cache --display-name=Hive Cache
  --email=pds.supp...@ni.com
  radosgw-admin subuser create --uid=hive_cache --subuser=hive_cache:swift
  --access=full
  radosgw-admin key create --subuser=hive_cache:swift --key-type=swift
  --secret=QFAMEDSJP5DEKJO0DDXY
 
  - David
 
 
  On Mon, Jun 9, 2014 at 11:14 PM, Yehuda Sadeh yeh...@inktank.com
 wrote:
 
  It seems that the subuser object was not created for some reason. Can
  you try recreating it?
 
  Yehuda
 
  On Sun, Jun 8, 2014 at 5:50 PM, David Curtiss
  dcurtiss_c...@dcurtiss.com wrote:
   Here's the log: http://pastebin.com/bRt9kw9C
  
   Thanks,
   David
  
  
   On Fri, Jun 6, 2014 at 10:58 PM, Yehuda Sadeh yeh...@inktank.com
   wrote:
  
   On Wed, Jun 4, 2014 at 12:00 PM, David Curtiss
   dcurtiss_c...@dcurtiss.com wrote:
Over the last two days, I set up ceph on a set of ubuntu 12.04 VMs
(my
first
time working with ceph), and it seems to be working fine (I have
HEALTH_OK,
and can create a test document via the rados commandline tool),
 but I
can't
authenticate with the swift API.
   
I followed the quickstart guides to get ceph and radosgw
 installed.
(Listed
here, if you want to check my work: http://pastebin.com/nfPWCn9P
 )
   
Visiting the root of the web server shows the
 ListAllMyBucketsResult
XML, as
expected, but trying to authenticate always gives me 403
 Forbidden
errors.
   
Here's the output of radosgw-admin user info --uid=hive_cache:
http://pastebin.com/vwwbyd4c
And here's my curl invocation: http://pastebin.com/EfQ8nw8a
   
Any ideas on what might be wrong?
   
  
   Not sure. Can you try reproducing it with 'debug rgw = 20' and
 'debug
   ms = 1' on rgw and provide the log?
  
   Thanks,
   Yehuda
  
  
 
 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Current OS kernel recommendations

2014-06-11 Thread Blair Bethwaite
This http://ceph.com/docs/master/start/os-recommendations/ appears to be a
bit out of date (it only goes to Ceph 0.72). Presumably Ubuntu Trusty should
now be on that list in some form, e.g., for Firefly?

-- 
Cheers,
~Blairo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] For ceph firefly, which version kernel client should be used?

2014-06-11 Thread Yan, Zheng
On Mon, Jun 9, 2014 at 3:49 PM, Liu Baogang liubaog...@gmail.com wrote:
 Dear Sir,

 In our test, we use ceph firefly to build a cluster. On a node with kernel
 3.10.xx, if we use the kernel client to mount cephfs, sometimes not all the
 files are listed when we run 'ls'. If we use ceph-fuse 0.80.x, so far it
 seems to work well.

 I guess that kernel 3.10.xx is too old, so the kernel client does not
 work well. If that is right, which kernel version should we use?

3.14


 Thanks,
 Baogang

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG Selection Criteria for Deep-Scrub

2014-06-11 Thread David Zafman

The code checks the pg with the oldest scrub_stamp/deep_scrub_stamp to see 
whether the osd_scrub_min_interval/osd_deep_scrub_interval time has elapsed.  
So the output you are showing with the very old scrub stamps shouldn’t happen 
under default settings.  As soon as deep-scrub is re-enabled, the 5 pgs with 
that old stamp should be the first to get run.

A PG needs to have active and clean set to be scrubbed.   If any weren’t 
active+clean, then even a manual scrub would do nothing.

Now that I’m looking at the code I see that your symptom is possible if the 
values of osd_scrub_min_interval or osd_scrub_max_interval are larger than your 
osd_deep_scrub_interval.  Should the osd_scrub_min_interval be greater than 
osd_deep_scrub_interval, there won't be a deep scrub until the 
osd_scrub_min_interval has elapsed.  If an OSD is under load and the 
osd_scrub_max_interval is greater than the osd_deep_scrub_interval, there won't 
be a deep scrub until osd_scrub_max_interval has elapsed.

Please check the 3 interval config values.  Verify that your PGs are 
active+clean just to be sure.
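
For reference, the three values live in the [osd] section of ceph.conf; the
numbers below are what I believe the defaults to be, so double-check them
against your release:

  [osd]
  osd scrub min interval  = 86400      # 1 day: earliest a scheduled (shallow) scrub may run
  osd scrub max interval  = 604800     # 7 days: a scrub runs regardless of load after this
  osd deep scrub interval = 604800     # 7 days: deep scrubs can be held up if this value is
                                       # smaller than the two intervals above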

David


On May 20, 2014, at 5:21 PM, Mike Dawson mike.daw...@cloudapt.com wrote:

 Today I noticed that deep-scrub is consistently missing some of my Placement 
 Groups, leaving me with the following distribution of PGs and the last day 
 they were successfully deep-scrubbed.
 
 # ceph pg dump all | grep active | awk '{ print $20}' | sort -k1 | uniq -c
  5 2013-11-06
221 2013-11-20
  1 2014-02-17
 25 2014-02-19
 60 2014-02-20
  4 2014-03-06
  3 2014-04-03
  6 2014-04-04
  6 2014-04-05
 13 2014-04-06
  4 2014-04-08
  3 2014-04-10
  2 2014-04-11
 50 2014-04-12
 28 2014-04-13
 14 2014-04-14
  3 2014-04-15
 78 2014-04-16
 44 2014-04-17
  8 2014-04-18
  1 2014-04-20
 16 2014-05-02
 69 2014-05-04
140 2014-05-05
569 2014-05-06
   9231 2014-05-07
103 2014-05-08
514 2014-05-09
   1593 2014-05-10
393 2014-05-16
   2563 2014-05-17
   1283 2014-05-18
   1640 2014-05-19
   1979 2014-05-20
 
 I have been running the default osd deep scrub interval of once per week, 
 but have disabled deep-scrub on several occasions in an attempt to avoid the 
 associated degraded cluster performance I have written about before.
 
 To get the PGs longest in need of a deep-scrub started, I set the 
 nodeep-scrub flag, and wrote a script to manually kick off deep-scrub 
 according to age. It is processing as expected.
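
(For anyone wanting to do the same, a rough sketch of such a script, assuming
the same 'ceph pg dump' column layout as above:

  #!/bin/bash
  # Deep-scrub the PGs with the oldest deep-scrub stamps first.
  # Column 20 is the deep-scrub date in this cluster's 'ceph pg dump' output;
  # adjust the field number if your version prints a different layout.
  ceph pg dump all 2>/dev/null | grep active | sort -k20 | awk '{ print $1 }' | head -n 20 |
  while read pg; do
      echo "deep-scrubbing $pg"
      ceph pg deep-scrub "$pg"
      sleep 600    # pace the scrubs so client I/O is not crushed
  done
)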
 
 Do you consider this a feature request or a bug? Perhaps the code that 
 schedules PGs to deep-scrub could be improved to prioritize PGs that have 
 needed a deep-scrub the longest.
 
 Thanks,
 Mike Dawson
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Swift API Authentication Failure

2014-06-11 Thread David Curtiss
Success! You nailed it. Thanks, Yehuda.

I can successfully use the second subuser.

Given this success, I also tried the following:

$ rados -p .users.swift get '' tmp
$ rados -p .users.swift put hive_cache:swift tmp
$ rados -p .users.swift rm ''
$ rados -p .users.swift ls
hive_cache:swift2
hive_cache:swift

So everything looked good, as far as I can tell, but I still can't
authenticate with the first subuser. (But at least the second one still
works.)

- David


On Wed, Jun 11, 2014 at 5:38 PM, Yehuda Sadeh yeh...@inktank.com wrote:

  (resending also to list)
 Right. So basically the swift subuser wasn't created correctly. I created
 issue #8587. Can you try creating a second subuser and see if it's created
 correctly the second time?


 On Wed, Jun 11, 2014 at 2:03 PM, David Curtiss dcurtiss_c...@dcurtiss.com
  wrote:

 Hmm Using that method, the subuser object appears to be an empty
 string.

 First, note that I skipped the Create Pools step:
 http://ceph.com/docs/master/radosgw/config/#create-pools
  because it says "If the user you created has permissions, the gateway
  will create the pools automatically."

 And indeed, the .users.swift pool is there:

 $ rados lspools
 data
 metadata
 rbd
 .rgw.root
 .rgw.control
 .rgw
 .rgw.gc
 .users.uid
 .users.email
 .users
 .users.swift

 But the only entry in that pool is an empty string.

 $ rados ls -p .users.swift
  <blank line>

 And that is indeed a blank line (as opposed to 0 lines), because there is
 1 object in that pool:
  $ rados df
  pool name     category  KB  objects  clones  degraded  unfound  rd  rd KB  wr  wr KB
  ...
  .users.swift  -         1   1        0       0         0        0   0      1   1

 For comparison, the 'df' line for the .users pool lists 2 objects, which
 are as follows:

 $ rados ls -p .users
 4U5H60BMDL7OSI5ZBL8P
 F7HZCI4SL12KVVSJ9UVZ

 - David


 On Tue, Jun 10, 2014 at 11:49 PM, Yehuda Sadeh yeh...@inktank.com
 wrote:

 Can you verify that the subuser object actually exist? Try doing:

 $ rados ls -p .users.swift

 (unless you have non default pools set)

 Yehuda

 On Tue, Jun 10, 2014 at 6:44 PM, David Curtiss
 dcurtiss_c...@dcurtiss.com wrote:
  No good. In fact, for some reason when I tried to load up my cluster
 VMs
   today, I couldn't get them to work (something to do with a pipe
 fault), so
  I recreated my VMs nearly from scratch, to no avail.
 
  Here are the commands I used to create the user and subuser:
  radosgw-admin user create --uid=hive_cache --display-name=Hive Cache
  --email=pds.supp...@ni.com
  radosgw-admin subuser create --uid=hive_cache
 --subuser=hive_cache:swift
  --access=full
  radosgw-admin key create --subuser=hive_cache:swift --key-type=swift
  --secret=QFAMEDSJP5DEKJO0DDXY
 
  - David
 
 
  On Mon, Jun 9, 2014 at 11:14 PM, Yehuda Sadeh yeh...@inktank.com
 wrote:
 
  It seems that the subuser object was not created for some reason. Can
  you try recreating it?
 
  Yehuda
 
  On Sun, Jun 8, 2014 at 5:50 PM, David Curtiss
  dcurtiss_c...@dcurtiss.com wrote:
   Here's the log: http://pastebin.com/bRt9kw9C
  
   Thanks,
   David
  
  
   On Fri, Jun 6, 2014 at 10:58 PM, Yehuda Sadeh yeh...@inktank.com
   wrote:
  
   On Wed, Jun 4, 2014 at 12:00 PM, David Curtiss
   dcurtiss_c...@dcurtiss.com wrote:
Over the last two days, I set up ceph on a set of ubuntu 12.04
 VMs
(my
first
time working with ceph), and it seems to be working fine (I have
HEALTH_OK,
and can create a test document via the rados commandline tool),
 but I
can't
authenticate with the swift API.
   
I followed the quickstart guides to get ceph and radosgw
 installed.
(Listed
here, if you want to check my work: http://pastebin.com/nfPWCn9P
 )
   
Visiting the root of the web server shows the
 ListAllMyBucketsResult
XML, as
expected, but trying to authenticate always gives me 403
 Forbidden
errors.
   
Here's the output of radosgw-admin user info --uid=hive_cache:
http://pastebin.com/vwwbyd4c
And here's my curl invocation: http://pastebin.com/EfQ8nw8a
   
Any ideas on what might be wrong?
   
  
   Not sure. Can you try reproducing it with 'debug rgw = 20' and
 'debug
   ms = 1' on rgw and provide the log?
  
   Thanks,
   Yehuda
  
  
 
 




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] tiering : hit_set_count & hit_set_period memory usage ?

2014-06-11 Thread Alexandre DERUMIER
"We haven't really quantified that yet. In particular, it's going to
depend on how many objects are accessed within a period; the OSD sizes
them based on the previous access count and the false positive
probability that you give it"

Ok, thanks Greg.



Another question: the doc describes how objects move from the cache tier to the 
base tier.
But how does it work from the base tier to the cache tier (cache-mode writeback)?
Does any read on the base tier promote the object into the cache tier?
Or are there also statistics kept on the base tier?

(I ask because I have cold data, but full backup jobs run each week and read 
all of this cold data.)
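
(For reference, the hit set knobs I'm asking about are per-pool settings, along
these lines; the pool name and numbers are just placeholders:

  ceph osd pool set hot-storage hit_set_type bloom
  ceph osd pool set hot-storage hit_set_count 4
  ceph osd pool set hot-storage hit_set_period 1200
  ceph osd pool set hot-storage hit_set_fpp 0.05
)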



- Original Message - 

From: Gregory Farnum g...@inktank.com 
To: Alexandre DERUMIER aderum...@odiso.com 
Cc: ceph-users ceph-users@lists.ceph.com 
Sent: Wednesday, 11 June 2014 21:56:29 
Subject: Re: [ceph-users] tiering : hit_set_count & hit_set_period memory usage ?

On Wed, Jun 11, 2014 at 12:44 PM, Alexandre DERUMIER 
aderum...@odiso.com wrote: 
 Hi, 
 
 I'm reading tiering doc here 
 http://ceph.com/docs/firefly/dev/cache-pool/ 
 
  
 The hit_set_count and hit_set_period define how much time each HitSet should 
 cover, and how many such HitSets to store. Binning accesses over time allows 
 Ceph to independently determine whether an object was accessed at least once 
 and whether it was accessed more than once over some time period (“age” vs 
 “temperature”). Note that the longer the period and the higher the count the 
 more RAM will be consumed by the ceph-osd process. In particular, when the 
 agent is active to flush or evict cache objects, all hit_set_count HitSets 
 are loaded into RAM 
 
 about how much memory do we talk here ? any formula ? (nr object x ? ) 

We haven't really quantified that yet. In particular, it's going to 
depend on how many objects are accessed within a period; the OSD sizes 
them based on the previous access count and the false positive 
probability that you give it. 
-Greg 
Software Engineer #42 @ http://inktank.com | http://ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com