Re: [ceph-users] osds fails to start with mismatch in id

2014-11-11 Thread Ramakrishna Nishtala (rnishtal)
Hi
It appears that, in the case of pre-created partitions, ceph-deploy create is unable 
to change the partition GUIDs; the GUIDs set by parted remain as they are.

I ran sgdisk manually on each partition, along the lines of
sgdisk --change-name=2:"ceph data" --partition-guid=2:${osd_uuid} --typecode=2:${ptype2} /dev/${i}
The typecodes for the journal and data partitions were taken from ceph-disk-udev.
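For anyone wanting to replicate this, a rough sketch of the idea (the device name, partition 
numbers and UUID generation below are placeholders rather than the exact commands from this 
cluster; the two typecodes are the non-dmcrypt "ceph data" and "ceph journal" GUIDs as I read 
them out of ceph-disk-udev, so verify against your own copy):

ptype_data=4fbd7e29-9d25-41b8-afd0-062c0ceff05d      # "ceph data" per ceph-disk-udev
ptype_journal=45b0969e-9b03-4f30-b4c6-b4b80ceff106   # "ceph journal" per ceph-disk-udev
osd_uuid=$(uuidgen)
# partition 1 = journal, partition 2 = data on /dev/sdX (placeholder)
sgdisk --change-name=1:"ceph journal" --partition-guid=1:$(uuidgen) --typecode=1:${ptype_journal} /dev/sdX
sgdisk --change-name=2:"ceph data" --partition-guid=2:${osd_uuid} --typecode=2:${ptype_data} /dev/sdX
partprobe /dev/sdX

After that, the ceph osd udev rules / ceph-disk-udev should pick the partitions up on the next 
udev trigger or reboot.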

udev is working fine now after reboot, and no changes to fstab were required. All OSDs are 
up too.
ceph -s
cluster 9c6cd1ae-66bf-45ce-b7ba-0256b572a8b7
 health HEALTH_OK
 osdmap e358: 60 osds: 60 up, 60 in
  pgmap v1258: 4096 pgs, 1 pools, 0 bytes data, 0 objects
2802 MB used, 217 TB / 217 TB avail
4096 active+clean

Thanks to all who responded.

Regards,

Rama

From: Daniel Schwager [mailto:daniel.schwa...@dtnet.de]
Sent: Monday, November 10, 2014 10:39 PM
To: 'Irek Fasikhov'; Ramakrishna Nishtala (rnishtal); 'Gregory Farnum'
Cc: 'ceph-us...@ceph.com'
Subject: RE: [ceph-users] osds fails to start with mismatch in id

Hi Ramakrishna,

we use the physical path (containing the serial number) to a disk to prevent 
complexity and wrong mapping... This path will never change:

/etc/ceph/ceph.conf
[osd.16]
devs = /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z0SDCY-part1
osd_journal = 
/dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1
...
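
For reference, the stable names can simply be read from the by-id directory, e.g.:

ls -l /dev/disk/by-id/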

regards
Danny



From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Irek 
Fasikhov
Sent: Tuesday, November 11, 2014 6:36 AM
To: Ramakrishna Nishtala (rnishtal); Gregory Farnum
Cc: ceph-us...@ceph.commailto:ceph-us...@ceph.com
Subject: Re: [ceph-users] osds fails to start with mismatch in id

Hi, Ramakrishna.
I think you understand what the problem is:
[ceph@ceph05 ~]$ cat /var/lib/ceph/osd/ceph-56/whoami
56
[ceph@ceph05 ~]$ cat /var/lib/ceph/osd/ceph-57/whoami
57


Tue Nov 11 2014 at 6:01:40, Ramakrishna Nishtala (rnishtal) 
rnish...@cisco.com:

Hi Greg,

Thanks for the pointer. I think you are right. The full story is like this.



After installation, everything works fine until I reboot. I do observe udevadm 
getting triggered in the logs, but the devices do not come up after reboot. It is exactly 
the same issue as http://tracker.ceph.com/issues/5194, although per the case details that 
was fixed a while back.

As a workaround, I copied the contents of /proc/mounts into fstab, and that's 
where I ran into the issue.



Following your suggestion I defined the mounts by UUID in fstab, but hit a similar problem.

blkid.tab has now moved to tmpfs and isn't consistent even after issuing blkid 
explicitly to get the UUIDs, which is in line with the comments in ceph-disk.



I decided to reinstall, dd the partitions, zap the disks, etc. That did not help. Very weird 
that the links below change in /dev/disk/by-uuid, /dev/disk/by-partuuid, etc.



Before reboot

lrwxrwxrwx 1 root root 10 Nov 10 06:31 11aca3e2-a9d5-4bcc-a5b0-441c53d473b6 -> ../../sdd2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 89594989-90cb-4144-ac99-0ffd6a04146e -> ../../sde2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 c17fe791-5525-4b09-92c4-f90eaaf80dc6 -> ../../sda2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 c57541a1-6820-44a8-943f-94d68b4b03d4 -> ../../sdc2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 da7030dd-712e-45e4-8d89-6e795d9f8011 -> ../../sdb2



After reboot

lrwxrwxrwx 1 root root 10 Nov 10 09:50 11aca3e2-a9d5-4bcc-a5b0-441c53d473b6 -> ../../sdd2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 89594989-90cb-4144-ac99-0ffd6a04146e -> ../../sde2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 c17fe791-5525-4b09-92c4-f90eaaf80dc6 -> ../../sda2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 c57541a1-6820-44a8-943f-94d68b4b03d4 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 da7030dd-712e-45e4-8d89-6e795d9f8011 -> ../../sdh2



Essentially, the transformation here is sdb2 -> sdh2 and sdc2 -> sdb2. In fact I 
hadn't partitioned sdh at all before the test. The only difference from the standard 
procedure is probably that I pre-created the partitions for the journal and data 
with parted.



The osd rules in /lib/udev/rules.d list four different partition type GUID codes:

45b0969e-9b03-4f30-b4c6-5ec00ceff106,
45b0969e-9b03-4f30-b4c6-b4b80ceff106,
4fbd7e29-9d25-41b8-afd0-062c0ceff05d,
4fbd7e29-9d25-41b8-afd0-5ec00ceff05d,



But all of my journal and data partitions have 
ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 as their partition type GUID.
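
For anyone checking the same thing, the type GUID of a given partition can be read with 
sgdisk (device and partition number below are just placeholders):

sgdisk --info=2 /dev/sdb | grep 'GUID code'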



Appreciate any help.



Regards,



Rama

=

-Original Message-
From: Gregory Farnum [mailto:g...@gregs42.com]
Sent: Sunday, November 09, 2014 3:36 PM
To: Ramakrishna Nishtala (rnishtal)
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] osds fails to start with mismatch in id



On Sun, Nov 9, 2014 at 3:21 PM, Ramakrishna Nishtala (rnishtal) 
rnish...@cisco.com wrote:

 Hi



 I am on ceph 0.87, RHEL 7



 Out of 60 few osd’s start and the rest complain about mismatch about

 id’s 

Re: [ceph-users] PG's incomplete after OSD failure

2014-11-11 Thread Matthew Anderson
Thanks for your reply Sage!

I've tested with 8.6ae and no luck I'm afraid. Steps taken were -
Stop osd.117
Export 8.6ae from osd.117
Remove 8.6ae from osd.117
start osd.117
restart osd.190 after still showing incomplete

After this the PG was still showing incomplete and ceph pg dump_stuck
inactive shows -
pg_stat objects mip degr misp unf bytes log disklog state state_stamp
v reported up up_primary acting acting_primary last_scrub scrub_stamp
last_deep_scrub deep_scrub_stamp
8.6ae 0 0 0 0 0 0 0 0 incomplete 2014-11-11 17:34:27.168078 0'0
161425:40 [117,190] 117 [117,190] 117 86424'389748 2013-09-09
16:52:58.796650 86424'389748 2013-09-09 16:52:58.796650

I then tried an export from OSD 190 to OSD 117 by doing -
Stop osd.190 and osd.117
Export pg 8.6ae from osd.190
Import from file generated in previous step into osd.117
Boot both osd.190 and osd.117

When osd.117 attempts to start it generates a failed assert; the full log
is here http://pastebin.com/S4CXrTAL
-1 2014-11-11 17:25:15.130509 7f9f44512900  0 osd.117 161404 load_pgs
 0 2014-11-11 17:25:18.604696 7f9f44512900 -1 osd/OSD.h: In
function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f9f44512900
time 2014-11-11 17:25:18.602626
osd/OSD.h: 715: FAILED assert(ret)

 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x8b) [0xb8231b]
 2: (OSDService::get_map(unsigned int)+0x3f) [0x6eea2f]
 3: (OSD::load_pgs()+0x1b78) [0x6aae18]
 4: (OSD::init()+0x71f) [0x6abf5f]
 5: (main()+0x252c) [0x638cfc]
 6: (__libc_start_main()+0xf5) [0x7f9f41650ec5]
 7: /usr/bin/ceph-osd() [0x651027]

I also attempted the same steps with 8.ca and got the same results.
The below is the current state of the pg with it removed from osd.111
-
pg_stat objects mip degr misp unf bytes log disklog state state_stamp
v reported up up_primary acting acting_primary last_scrub scrub_stamp
last_deep_scrub deep_scrub_stamp
8.ca 2440 0 0 0 0 10219748864 9205 9205 incomplete 2014-11-11
17:39:28.570675 160435'959618 161425:6071759 [190,111] 190 [190,111]
190 86417'207324 2013-09-09 12:58:10.749001 86229'196887 2013-09-02
12:57:58.162789

Any idea of where I can go from here?
One thought I had was setting osd.111 and osd.117 out of the cluster
and once the data is moved I can shut them down and mark them as lost
which would make osd.190 the only replica available for those PG's.

Thanks again

On Tue, Nov 11, 2014 at 1:10 PM, Sage Weil sw...@redhat.com wrote:
 On Tue, 11 Nov 2014, Matthew Anderson wrote:
 Just an update, it appears that no data actually exists for those PG's
 on osd.117 and osd.111 but it's showing as incomplete anyway.

 So for the 8.ca PG, osd.111 has only an empty directory but osd 190 is
 filled with data.
 For 8.6ae, osd.117 has no data in the pg directory and osd.190 is
 filled with data as before.

 Since all of the required data is on OSD.190, would there be a way to
 make osd.111 and osd.117 forget they have ever seen the two incomplete
 PG's and therefore restart backfilling?

 Ah, that's good news.  You should know that the copy on osd.190 is
 slightly out of date, but it is much better than losing the entire
 contents of the PG.  More specifically, for 8.6ae the latest version was
 1935986 but the osd.190 is 1935747, about 200 writes in the past.  You'll
 need to fsck the RBD images after this is all done.

 I don't think we've tested this recovery scenario, but I think you'll be
 able to recovery with ceph_objectstore_tool, which has an import/export
 function and a delete function.  First, try removing the newer version of
 the pg on osd.117.  First export it for good measure (even tho it's
 empty):

 stop the osd

 ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117  \
 --journal-path /var/lib/ceph/osd/ceph-117/journal \
 --op export --pgid 8.6ae --file osd.117.8.7ae

 ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117  \
 --journal-path /var/lib/ceph/osd/ceph-117/journal \
 --op remove --pgid 8.6ae

 and restart.  If that doesn't peer, you can also try exporting the pg from
 osd.190 and importing it into osd.117.  I think just removing the
 newer empty pg on osd.117 will do the trick, though...

 sage





 On Tue, Nov 11, 2014 at 10:37 AM, Matthew Anderson
 manderson8...@gmail.com wrote:
  Hi All,
 
  We've had a string of very unfortunate failures and need a hand fixing
  the incomplete PG's that we're now left with. We're configured with 3
  replicas over different hosts with 5 in total.
 
  The timeline goes -
  -1 week  :: A full server goes offline with a failed backplane. Still
  not working
  -1 day  ::  OSD 190 fails
  -1 day + 3 minutes :: OSD 121 in a different server fails, taking
  out several PG's and blocking IO
  Today  :: The first failed osd (osd.190) was cloned to a good drive
  with xfs_dump | xfs_restore and now boots fine. The last failed osd
  (osd.121) is completely unrecoverable and was marked as lost.
 
  What we're left with 

[ceph-users] Stackforge Puppet Module

2014-11-11 Thread Nick Fisk
Hi,

I'm just looking through the different methods of deploying Ceph, and I
particularly liked the idea the stackforge puppet module advertises of
using discovery to automatically add new disks. I understand the principle of
how it should work, using ceph-disk list to find unknown disks, but I would
like to see in a little more detail how it has been implemented.

I've been looking through the puppet module on Github, but I can't see
anywhere that this discovery is carried out.

Could anyone confirm whether this puppet module currently supports the auto
discovery, and where in the code it's carried out?

Many Thanks,
Nick




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Weight field in osd dump osd tree

2014-11-11 Thread Mallikarjun Biradar
Hi all

When I issue ceph osd dump it displays the weight for that osd as 1, and when
I issue ceph osd tree it displays 0.35.

output from osd dump:
{ osd: 20,
  uuid: b2a97a29-1b8a-43e4-a4b0-fd9ee351086e,
  up: 1,
  in: 1,
  weight: 1.00,
  primary_affinity: 1.00,
  last_clean_begin: 0,
  last_clean_end: 0,
  up_from: 103,
  up_thru: 106,
  down_at: 0,
  lost_at: 0,
  public_addr: 10.242.43.116:6820\/27623,
  cluster_addr: 10.242.43.116:6821\/27623,
  heartbeat_back_addr: 10.242.43.116:6822\/27623,
  heartbeat_front_addr: 10.242.43.116:6823\/27623,
  state: [
exists,
up]}],

output from osd tree:
# id    weight  type name               up/down reweight
-1      7.35    root default
-2      2.8         host rack6-storage-5
0       0.35            osd.0           up      1
1       0.35            osd.1           up      1
2       0.35            osd.2           up      1
3       0.35            osd.3           up      1
4       0.35            osd.4           up      1
5       0.35            osd.5           up      1
6       0.35            osd.6           up      1
7       0.35            osd.7           up      1
-3      2.8         host rack6-storage-4
8       0.35            osd.8           up      1
9       0.35            osd.9           up      1
10      0.35            osd.10          up      1
11      0.35            osd.11          up      1
12      0.35            osd.12          up      1
13      0.35            osd.13          up      1
14      0.35            osd.14          up      1
15      0.35            osd.15          up      1
-4      1.75        host rack6-storage-6
16      0.35            osd.16          up      1
17      0.35            osd.17          up      1
18      0.35            osd.18          up      1
19      0.35            osd.19          up      1
20      0.35            osd.20          up      1

Please help me to understand this

-regards,
Mallikarjun Biradar
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stackforge Puppet Module

2014-11-11 Thread David Moreau Simard
Hi Nick,

The great thing about puppet-ceph's implementation on Stackforge is that it is 
both unit and integration tested.
You can see the integration tests here: 
https://github.com/ceph/puppet-ceph/tree/master/spec/system

What I'm getting at is that the tests let you see, to a certain extent, how you can use the 
module.
For example, in the OSD integration tests:
- 
https://github.com/ceph/puppet-ceph/blob/master/spec/system/ceph_osd_spec.rb#L24
 and then:
- 
https://github.com/ceph/puppet-ceph/blob/master/spec/system/ceph_osd_spec.rb#L82-L110

There's no auto-discovery mechanism built into the module right now. It's kind of 
dangerous; you don't want to format the wrong disks.

Now, this doesn't mean you can't discover the disks yourself and pass them to 
the module from your site.pp or from a composition layer.
Here's something I have for my CI environment: it uses the $::blockdevices 
fact to discover all devices, splits that fact into a list of devices and 
then rejects the drives I don't want (such as the OS disk):

# Assume OS is installed on xvda/sda/vda.
# On an Openstack VM, vdb is ephemeral, we don't want to use vdc.
# WARNING: ALL OTHER DISKS WILL BE FORMATTED/PARTITIONED BY CEPH!
$block_devices = reject(split($::blockdevices, ','), '(xvda|sda|vda|vdc|sr0)')
$devices = prefix($block_devices, '/dev/')

And then you can pass $devices to the module.

Let me know if you have any questions !
--
David Moreau Simard

 On Nov 11, 2014, at 6:23 AM, Nick Fisk n...@fisk.me.uk wrote:
 
 Hi,
 
 I'm just looking through the different methods of deploying Ceph and I
 particularly liked the idea that the stackforge puppet module advertises of
 using discover to automatically add new disks. I understand the principle of
 how it should work; using ceph-disk list to find unknown disks, but I would
 like to see in a little more detail on how it's been implemented.
 
 I've been looking through the puppet module on Github, but I can't see
 anyway where this discovery is carried out.
 
 Could anyone confirm if this puppet modules does currently support the auto
 discovery and where  in the code its carried out?
 
 Many Thanks,
 Nick
 
 
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weight field in osd dump osd tree

2014-11-11 Thread Christian Balzer
On Tue, 11 Nov 2014 17:14:49 +0530 Mallikarjun Biradar wrote:

 Hi all
 
 When Issued ceph osd dump it displays weight for that osd as 1 and when
 issued osd tree it displays 0.35
 

There are many threads about this, google is your friend. For example:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg11010.html

In short, one is the CRUSH weight (usually based on the capacity of the
OSD), the other is the OSD weight (or reweight in the tree display). 

For example, think about a cluster with 100 2TB OSDs where you're planning to
replace them (bit by bit) with 4TB OSDs. The hard disks are the same
speed, so if you just replaced them, more and more data would
migrate to your bigger OSDs, making the whole cluster actually slower.
Setting the OSD weight (reweight) to 0.5 for the 4TB OSDs (until the
replacement is complete) will result in them getting the same allocation as
the 2TB ones, keeping things even.
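
For completeness, the two values are changed with different commands; a minimal sketch, 
with the OSD id and weights as examples only:

ceph osd crush reweight osd.20 3.64    # CRUSH weight, typically the capacity in TB
ceph osd reweight 20 0.5               # OSD weight ("reweight" column), a value between 0 and 1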

Christian

 output from osd dump:
 { osd: 20,
   uuid: b2a97a29-1b8a-43e4-a4b0-fd9ee351086e,
   up: 1,
   in: 1,
   weight: 1.00,
   primary_affinity: 1.00,
   last_clean_begin: 0,
   last_clean_end: 0,
   up_from: 103,
   up_thru: 106,
   down_at: 0,
   lost_at: 0,
   public_addr: 10.242.43.116:6820\/27623,
   cluster_addr: 10.242.43.116:6821\/27623,
   heartbeat_back_addr: 10.242.43.116:6822\/27623,
   heartbeat_front_addr: 10.242.43.116:6823\/27623,
   state: [
 exists,
 up]}],
 
 output from osd tree:
 # idweight  type name   up/down reweight
 -1  7.35root default
 -2  2.8 host rack6-storage-5
 0   0.35osd.0   up  1
 1   0.35osd.1   up  1
 2   0.35osd.2   up  1
 3   0.35osd.3   up  1
 4   0.35osd.4   up  1
 5   0.35osd.5   up  1
 6   0.35osd.6   up  1
 7   0.35osd.7   up  1
 -3  2.8 host rack6-storage-4
 8   0.35osd.8   up  1
 9   0.35osd.9   up  1
 10  0.35osd.10  up  1
 11  0.35osd.11  up  1
 12  0.35osd.12  up  1
 13  0.35osd.13  up  1
 14  0.35osd.14  up  1
 15  0.35osd.15  up  1
 -4  1.75host rack6-storage-6
 16  0.35osd.16  up  1
 17  0.35osd.17  up  1
 18  0.35osd.18  up  1
 19  0.35osd.19  up  1
 20  0.35osd.20  up  1
 
 Please help me to understand this
 
 -regards,
 Mallikarjun Biradar


-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weight field in osd dump osd tree

2014-11-11 Thread Loic Dachary
Hi Christian,

On 11/11/2014 13:09, Christian Balzer wrote:
 On Tue, 11 Nov 2014 17:14:49 +0530 Mallikarjun Biradar wrote:
 
 Hi all

 When Issued ceph osd dump it displays weight for that osd as 1 and when
 issued osd tree it displays 0.35

 
 There are many threads about this, google is your friend. For example:
 https://www.mail-archive.com/ceph-users@lists.ceph.com/msg11010.html
 
 In short, one is the CRUSH weight (usually based on the capacity of the
 OSD), the other is the OSD weight (or reweight in the tree display). 
 
 For example think about a cluster with 100 2TB OSDs and you're planning to
 replace them (bit by bit) with 4TB OSDs. But the hard disks are the same
 speed, so if you would just replace things, more and more data would
 migrate to your bigger OSDs, making the whole cluster actually slower.
 Setting the OSD weight (reweight) to 0.5 for the 4TB OSDs (untiil the
 replacement is complete) will result in them getting the same allocation as
 the 2TB ones, keeping things even.

It is a great example. Would you like to add it to 
http://ceph.com/docs/giant/rados/operations/control/#osd-subsystem ? If you do 
not have time, I volunteer to do it :-)

Cheers

 
 Christian
 
 output from osd dump:
 { osd: 20,
   uuid: b2a97a29-1b8a-43e4-a4b0-fd9ee351086e,
   up: 1,
   in: 1,
   weight: 1.00,
   primary_affinity: 1.00,
   last_clean_begin: 0,
   last_clean_end: 0,
   up_from: 103,
   up_thru: 106,
   down_at: 0,
   lost_at: 0,
   public_addr: 10.242.43.116:6820\/27623,
   cluster_addr: 10.242.43.116:6821\/27623,
   heartbeat_back_addr: 10.242.43.116:6822\/27623,
   heartbeat_front_addr: 10.242.43.116:6823\/27623,
   state: [
 exists,
 up]}],

 output from osd tree:
 # idweight  type name   up/down reweight
 -1  7.35root default
 -2  2.8 host rack6-storage-5
 0   0.35osd.0   up  1
 1   0.35osd.1   up  1
 2   0.35osd.2   up  1
 3   0.35osd.3   up  1
 4   0.35osd.4   up  1
 5   0.35osd.5   up  1
 6   0.35osd.6   up  1
 7   0.35osd.7   up  1
 -3  2.8 host rack6-storage-4
 8   0.35osd.8   up  1
 9   0.35osd.9   up  1
 10  0.35osd.10  up  1
 11  0.35osd.11  up  1
 12  0.35osd.12  up  1
 13  0.35osd.13  up  1
 14  0.35osd.14  up  1
 15  0.35osd.15  up  1
 -4  1.75host rack6-storage-6
 16  0.35osd.16  up  1
 17  0.35osd.17  up  1
 18  0.35osd.18  up  1
 19  0.35osd.19  up  1
 20  0.35osd.20  up  1

 Please help me to understand this

 -regards,
 Mallikarjun Biradar
 
 

-- 
Loïc Dachary, Artisan Logiciel Libre



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Configuring swift user for ceph Rados Gateway - 403 Access Denied

2014-11-11 Thread ವಿನೋದ್ Vinod H I
Hi,
I am having problems accessing rados gateway using swift interface.
I am using ceph firefly version and have configured a us region as
explained in the docs.
There are two zones us-east and us-west.
us-east gateway is running on host ceph-node-1 and us-west gateway is
running on host ceph-node-2.

Here is the output when i try to connect with swift interface.

user1@ceph-node-4:~$ swift -A http://ceph-node-1/auth -U useast:swift -K
FmQYYbzly4RH+PmNlrWA3ynN+eJrayYXzeISGDSw --debug stat
INFO:urllib3.connectionpool:Starting new HTTP connection (1): ceph-node-1
DEBUG:urllib3.connectionpool:Setting read timeout to <object object at 0x7f45834a7090>
DEBUG:urllib3.connectionpool:"GET /auth HTTP/1.1" 403 23
INFO:swiftclient:REQ: curl -i http://ceph-node-1/auth -X GET
INFO:swiftclient:RESP STATUS: 403 Forbidden
INFO:swiftclient:RESP HEADERS: [('date', 'Tue, 11 Nov 2014 12:30:58 GMT'),
('accept-ranges', 'bytes'), ('content-type', 'application/json'),
('content-length', '23'), ('server', 'Apache/2.2.22 (Ubuntu)')]
INFO:swiftclient:RESP BODY: {"Code":"AccessDenied"}
ERROR:swiftclient:Auth GET failed: http://ceph-node-1/auth 403 Forbidden
Traceback (most recent call last):
  File /usr/lib/python2.7/dist-packages/swiftclient/client.py, line 1181,
in _retry
self.url, self.token = self.get_auth()
  File /usr/lib/python2.7/dist-packages/swiftclient/client.py, line 1155,
in get_auth
insecure=self.insecure)
  File /usr/lib/python2.7/dist-packages/swiftclient/client.py, line 318,
in get_auth
insecure=insecure)
  File /usr/lib/python2.7/dist-packages/swiftclient/client.py, line 241,
in get_auth_1_0
http_reason=resp.reason)
ClientException: Auth GET failed: http://ceph-node-1/auth 403 Forbidden
Auth GET failed: http://ceph-node-1/auth 403 Forbidden

The region map is as follows.

vinod@ceph-node-1:~$ radosgw-admin region get
--name=client.radosgw.us-east-1

{ name: us,
  api_name: us,
  is_master: true,
  endpoints: [],
  master_zone: us-east,
  zones: [
{ name: us-east,
  endpoints: [
http:\/\/ceph-node-1:80\/],
  log_meta: true,
  log_data: true},
{ name: us-west,
  endpoints: [
http:\/\/ceph-node-2:80\/],
  log_meta: true,
  log_data: true}],
  placement_targets: [
{ name: default-placement,
  tags: []}],
  default_placement: default-placement}

The user info is follows.
vinod@ceph-node-1:~$ radosgw-admin user info --uid=useast
--name=client.radosgw.us-east-1
{ user_id: useast,
  display_name: Region-US Zone-East,
  email: ,
  suspended: 0,
  max_buckets: 1000,
  auid: 0,
  subusers: [
{ id: useast:swift,
  permissions: full-control}],
  keys: [
{ user: useast,
  access_key: 45BEF1XQ3Z94B0LIBTLX,
  secret_key: 123},
{ user: useast:swift,
  access_key: WF2QYTY0LDN66CHJ8JSE,
  secret_key: }],
  swift_keys: [
{ user: useast:swift,
  secret_key: FmQYYbzly4RH+PmNlrWA3ynN+eJrayYXzeISGDSw}],
  caps: [],
  op_mask: read, write, delete,
  system: true,
  default_placement: ,
  placement_tags: [],
  bucket_quota: { enabled: false,
  max_size_kb: -1,
  max_objects: -1},
  user_quota: { enabled: false,
  max_size_kb: -1,
  max_objects: -1},
  temp_url_keys: []}

Contents of rgw-us-east.conf file is as follows.

vinod@ceph-node-1:~$ cat /etc/apache2/sites-enabled/rgw-us-east.conf
FastCgiExternalServer /var/www/s3gw.fcgi -socket
/var/run/ceph/client.radosgw.us-east-1.sock

<VirtualHost *:80>

ServerName ceph-node-1
ServerAdmin vinvi...@gmail.com
DocumentRoot /var/www
RewriteEngine On
RewriteRule ^/(.*) /s3gw.fcgi?%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

<IfModule mod_fastcgi.c>
<Directory /var/www>
Options +ExecCGI
AllowOverride All
SetHandler fastcgi-script
Order allow,deny
Allow from all
AuthBasicAuthoritative Off
</Directory>
</IfModule>

AllowEncodedSlashes On
ErrorLog /var/log/apache2/error.log
CustomLog /var/log/apache2/access.log combined
ServerSignature Off

</VirtualHost>

Can someone point out where I am going wrong?

-- 
Vinod
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds isn't working anymore after osd's running full

2014-11-11 Thread Jasper Siero
No problem, thanks for helping.
I don't want to disable the deep scrubbing process itself, because it's very 
useful, but one placement group (3.30) is continuously deep scrubbing; it 
should finish after some time, but it won't.

Jasper

From: Gregory Farnum [g...@gregs42.com]
Sent: Monday, 10 November 2014 18:24
To: Jasper Siero
CC: ceph-users; John Spray
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

It's supposed to do that; deep scrubbing is an ongoing
consistency-check mechanism. If you really want to disable it you can
set an osdmap flag to prevent it, but you'll have to check the docs
for exactly what that is as I can't recall.
Glad things are working for you; sorry it took so long!
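(For reference, the flag being referred to is most likely the nodeep-scrub osdmap flag, 
toggled roughly like this; there is a matching noscrub flag for regular scrubs:

ceph osd set nodeep-scrub      # stop scheduling new deep scrubs
ceph osd unset nodeep-scrub    # re-enable them
)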
-Greg

On Mon, Nov 10, 2014 at 8:49 AM, Jasper Siero
jasper.si...@target-holding.nl wrote:
 Hello John and Greg,

 I used the new patch and now the undump succeeded and the mds is working fine 
 and I can mount cephfs again!

 I still have one placement group which keeps deep scrubbing even after 
 restarting the ceph cluster:
 dumped all in format plain
 3.300   0   0   0   0   0   0   
 active+clean+scrubbing+deep 2014-11-10 17:21:15.866965  0'0 
 2414:418[1,9]   1   [1,9]   1   631'34632014-08-21 
 15:14:45.430926  602'31312014-08-18 15:14:37.494913

 I there a way to solve this?

 Kind regards,

 Jasper
 
 From: Gregory Farnum [g...@gregs42.com]
 Sent: Friday, 7 November 2014 22:42
 To: Jasper Siero
 CC: ceph-users; John Spray
 Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

 On Thu, Nov 6, 2014 at 11:49 AM, John Spray john.sp...@redhat.com wrote:
 This is still an issue on master, so a fix will be coming soon.
 Follow the ticket for updates:
 http://tracker.ceph.com/issues/10025

 Thanks for finding the bug!

 John is off for a vacation, but he pushed a branch wip-10025-firefly
 that if you install that (similar address to the other one) should
 work for you. You'll need to reset and undump again (I presume you
 still have the journal-as-a-file). I'll be merging them in to the
 stable branches pretty shortly as well.
 -Greg


 John

 On Thu, Nov 6, 2014 at 6:21 PM, John Spray john.sp...@redhat.com wrote:
 Jasper,

 Thanks for this -- I've reproduced this issue in a development
 environment.  We'll see if this is also an issue on giant, and
 backport a fix if appropriate.  I'll update this thread soon.

 Cheers,
 John

 On Mon, Nov 3, 2014 at 8:49 AM, Jasper Siero
 jasper.si...@target-holding.nl wrote:
 Hello Greg,

 I saw that the site of the previous link of the logs uses a very short 
 expiring time so I uploaded it to another one:

 http://www.mediafire.com/download/gikiy7cqs42cllt/ceph-mds.th1-mon001.log.tar.gz

 Thanks,

 Jasper

 
  From: gregory.far...@inktank.com [gregory.far...@inktank.com] on behalf of 
  Gregory Farnum [gfar...@redhat.com]
  Sent: Thursday, 30 October 2014 1:03
  To: Jasper Siero
  CC: John Spray; ceph-users
  Subject: Re: [ceph-users] mds isn't working anymore after osd's running 
  full

 On Wed, Oct 29, 2014 at 7:51 AM, Jasper Siero
 jasper.si...@target-holding.nl wrote:
 Hello Greg,

 I added the debug options which you mentioned and started the process 
 again:

 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
 /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
 --reset-journal 0
 old journal was 9483323613~134233517
 new journal start will be 9621733376 (4176246 bytes past old end)
 writing journal head
 writing EResetJournal entry
 done
 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 -c 
 /etc/ceph/ceph.conf --cluster ceph --undump-journal 0 
 journaldumptgho-mon001
 undump journaldumptgho-mon001
 start 9483323613 len 134213311
 writing header 200.
  writing 9483323613~1048576
  writing 9484372189~1048576
  writing 9485420765~1048576
  writing 9486469341~1048576
  writing 9487517917~1048576
  writing 9488566493~1048576
  writing 9489615069~1048576
  writing 9490663645~1048576
  writing 9491712221~1048576
  writing 9492760797~1048576
  writing 9493809373~1048576
  writing 9494857949~1048576
  writing 9495906525~1048576
  writing 9496955101~1048576
  writing 9498003677~1048576
  writing 9499052253~1048576
  writing 9500100829~1048576
  writing 9501149405~1048576
  writing 9502197981~1048576
  writing 9503246557~1048576
  writing 9504295133~1048576
  writing 9505343709~1048576
  writing 9506392285~1048576
  writing 9507440861~1048576
  writing 9508489437~1048576
  writing 9509538013~1048576
  writing 9510586589~1048576
  writing 9511635165~1048576
  writing 9512683741~1048576
  writing 9513732317~1048576
  writing 9514780893~1048576
  writing 9515829469~1048576
  writing 9516878045~1048576
  writing 9517926621~1048576
  writing 9518975197~1048576
  writing 

Re: [ceph-users] Configuring swift user for ceph Rados Gateway - 403 Access Denied

2014-11-11 Thread Daniel Schneller

On 2014-11-11 13:12:32 +, ವಿನೋದ್ Vinod H I said:


Hi,
I am having problems accessing rados gateway using swift interface.
I am using ceph firefly version and have configured a us region as 
explained in the docs.

There are two zones us-east and us-west.
us-east gateway is running on host ceph-node-1 and us-west gateway is 
running on host ceph-node-2.


[...]



Auth GET failed: http://ceph-node-1/auth 403 Forbidden
[...]



  swift_keys: [
        { user: useast:swift,
          secret_key: FmQYYbzly4RH+PmNlrWA3ynN+eJrayYXzeISGDSw}],


We have seen problems when the secret_key has special characters. I am 
not sure if + is one of them, but the manual states this somewhere. 
Try setting the key explicitly, or re-generate one until you get one 
without any special chars.
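
A hedged sketch of re-generating the swift secret (user and gateway instance names are 
taken from the output earlier in this thread, so adjust to your own setup):

radosgw-admin key create --subuser=useast:swift --key-type=swift --gen-secret \
    --name=client.radosgw.us-east-1

Repeat until the generated secret contains no characters like + or /.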


Drove me nuts.

Daniel


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weight field in osd dump osd tree

2014-11-11 Thread Mallikarjun Biradar
Thanks Christian, that makes the concept clear. Thanks very much :)

On Tue, Nov 11, 2014 at 5:47 PM, Loic Dachary l...@dachary.org wrote:

 Hi Christian,

 On 11/11/2014 13:09, Christian Balzer wrote:
  On Tue, 11 Nov 2014 17:14:49 +0530 Mallikarjun Biradar wrote:
 
  Hi all
 
  When Issued ceph osd dump it displays weight for that osd as 1 and when
  issued osd tree it displays 0.35
 
 
  There are many threads about this, google is your friend. For example:
  https://www.mail-archive.com/ceph-users@lists.ceph.com/msg11010.html
 
  In short, one is the CRUSH weight (usually based on the capacity of the
  OSD), the other is the OSD weight (or reweight in the tree display).
 
  For example think about a cluster with 100 2TB OSDs and you're planning
 to
  replace them (bit by bit) with 4TB OSDs. But the hard disks are the same
  speed, so if you would just replace things, more and more data would
  migrate to your bigger OSDs, making the whole cluster actually slower.
  Setting the OSD weight (reweight) to 0.5 for the 4TB OSDs (untiil the
  replacement is complete) will result in them getting the same allocation
 as
  the 2TB ones, keeping things even.

 It is a great example. Would you like to add it to
 http://ceph.com/docs/giant/rados/operations/control/#osd-subsystem ? If
 you do not have time, I volunteer to do it :-)

 Cheers

 
  Christian
 
  output from osd dump:
  { osd: 20,
uuid: b2a97a29-1b8a-43e4-a4b0-fd9ee351086e,
up: 1,
in: 1,
weight: 1.00,
primary_affinity: 1.00,
last_clean_begin: 0,
last_clean_end: 0,
up_from: 103,
up_thru: 106,
down_at: 0,
lost_at: 0,
public_addr: 10.242.43.116:6820\/27623,
cluster_addr: 10.242.43.116:6821\/27623,
heartbeat_back_addr: 10.242.43.116:6822\/27623,
heartbeat_front_addr: 10.242.43.116:6823\/27623,
state: [
  exists,
  up]}],
 
  output from osd tree:
  # idweight  type name   up/down reweight
  -1  7.35root default
  -2  2.8 host rack6-storage-5
  0   0.35osd.0   up  1
  1   0.35osd.1   up  1
  2   0.35osd.2   up  1
  3   0.35osd.3   up  1
  4   0.35osd.4   up  1
  5   0.35osd.5   up  1
  6   0.35osd.6   up  1
  7   0.35osd.7   up  1
  -3  2.8 host rack6-storage-4
  8   0.35osd.8   up  1
  9   0.35osd.9   up  1
  10  0.35osd.10  up  1
  11  0.35osd.11  up  1
  12  0.35osd.12  up  1
  13  0.35osd.13  up  1
  14  0.35osd.14  up  1
  15  0.35osd.15  up  1
  -4  1.75host rack6-storage-6
  16  0.35osd.16  up  1
  17  0.35osd.17  up  1
  18  0.35osd.18  up  1
  19  0.35osd.19  up  1
  20  0.35osd.20  up  1
 
  Please help me to understand this
 
  -regards,
  Mallikarjun Biradar
 
 

 --
 Loïc Dachary, Artisan Logiciel Libre


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Federated gateways

2014-11-11 Thread Aaron Bassett
Ok I believe I’ve made some progress here. I have everything syncing *except* 
data. The data is getting 500s when it tries to sync to the backup zone. I have 
a log from the radosgw with debug cranked up to 20:

2014-11-11 14:37:06.688331 7f54447f0700  1 == starting new request 
req=0x7f546800f3b0 =
2014-11-11 14:37:06.688978 7f54447f0700  0 WARNING: couldn't find acl header 
for bucket, generating default
2014-11-11 14:37:06.689358 7f54447f0700  1 -- 172.16.10.103:0/1007381 -- 
172.16.10.103:6934/14875 -- osd_op(client.5673295.0:1783 
statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) 
v4 -- ?+0 0x7f534800d770 con 0x7f53f00053f0
2014-11-11 14:37:06.689396 7f54447f0700 20 -- 172.16.10.103:0/1007381 
submit_message osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call 
statelog.add] 193.1cf20a5a ondisk+write e47531) v4 remote, 
172.16.10.103:6934/14875, have pipe.
2014-11-11 14:37:06.689481 7f51ff1f1700 10 -- 172.16.10.103:0/1007381  
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.689592 7f51ff1f1700 20 -- 172.16.10.103:0/1007381  
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).writer encoding 48 features 17592186044415 0x7f534800d770 
osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call statelog.add] 
193.1cf20a5a ondisk+write e47531) v4
2014-11-11 14:37:06.689756 7f51ff1f1700 20 -- 172.16.10.103:0/1007381  
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).writer signed seq # 48): sig = 206599450695048354
2014-11-11 14:37:06.689804 7f51ff1f1700 20 -- 172.16.10.103:0/1007381  
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).writer sending 48 0x7f534800d770
2014-11-11 14:37:06.689884 7f51ff1f1700 10 -- 172.16.10.103:0/1007381  
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.689915 7f51ff1f1700 20 -- 172.16.10.103:0/1007381  
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).writer sleeping
2014-11-11 14:37:06.694968 7f51ff0f0700 20 -- 172.16.10.103:0/1007381  
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader got ACK
2014-11-11 14:37:06.695053 7f51ff0f0700 15 -- 172.16.10.103:0/1007381  
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader got ack seq 48
2014-11-11 14:37:06.695067 7f51ff0f0700 20 -- 172.16.10.103:0/1007381  
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader reading tag...
2014-11-11 14:37:06.695079 7f51ff0f0700 20 -- 172.16.10.103:0/1007381  
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader got MSG
2014-11-11 14:37:06.695093 7f51ff0f0700 20 -- 172.16.10.103:0/1007381  
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader got envelope type=43 src osd.25 front=190 data=0 off 0
2014-11-11 14:37:06.695108 7f51ff0f0700 10 -- 172.16.10.103:0/1007381  
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader wants 190 from dispatch throttler 0/104857600
2014-11-11 14:37:06.695135 7f51ff0f0700 20 -- 172.16.10.103:0/1007381  
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader got front 190
2014-11-11 14:37:06.695150 7f51ff0f0700 10 -- 172.16.10.103:0/1007381  
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).aborted = 0
2014-11-11 14:37:06.695158 7f51ff0f0700 20 -- 172.16.10.103:0/1007381  
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader got 190 + 0 + 0 byte message
2014-11-11 14:37:06.695284 7f51ff0f0700 10 -- 172.16.10.103:0/1007381  
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader got message 48 0x7f51b4001950 osd_op_reply(1783 
statelog.obj_opstate.97 [call] v47531'13 uv13 ondisk = 0) v6
2014-11-11 14:37:06.695313 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 queue 
0x7f51b4001950 prio 127
2014-11-11 14:37:06.695374 7f51ff0f0700 20 -- 172.16.10.103:0/1007381  
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader reading tag...
2014-11-11 14:37:06.695384 7f51ff1f1700 10 -- 172.16.10.103:0/1007381  
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.695426 7f51ff1f1700 10 -- 172.16.10.103:0/1007381  
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 

[ceph-users] long term support version?

2014-11-11 Thread Chad Seys
Hi all,

Did I notice correctly that firefly is going to be supported long term 
whereas Giant is not going to be supported as long?

http://ceph.com/releases/v0-80-firefly-released/
This release will form the basis for our long-term supported release Firefly, 
v0.80.x.

http://ceph.com/uncategorized/v0-87-giant-released/
This release will form the basis for the stable release Giant, v0.87.x.

Thanks!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-11 Thread Chad Seys
Thanks Craig,

I'll jiggle the OSDs around to see if that helps.

Otherwise, I'm almost certain removing the pool will work. :/

Have a good one,
Chad.

 I had the same experience with force_create_pg too.
 
 I ran it, and the PGs sat there in creating state.  I left the cluster
 overnight, and sometime in the middle of the night, they created.  The
 actual transition from creating to active+clean happened during the
 recovery after a single OSD was kicked out.  I don't recall if that single
 OSD was responsible for the creating PGs.  I really can't say what
 un-jammed my creating.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Installing ceph on a single machine with ceph-deploy, ubuntu 14.04 64 bit

2014-11-11 Thread tejaksjy

Hi,

I am unable to figure out how to install and deploy ceph on a single machine 
with ceph-deploy. I have ubuntu 14.04 64-bit installed in a virtual machine 
(on windows 8.1 through VMware player) and have installed devstack on ubuntu. 
I am trying to install ceph on the same machine (Ubuntu) and interface it with 
openstack. I tried the following steps, but they say that mkcephfs does not 
exist, and I read that it is deprecated and ceph-deploy replaces it. However, the 
documentation talks about multiple nodes. I am lost as to how to use ceph-deploy 
to install and set up ceph on a single machine. Please guide me. The steps I tried 
earlier, which were given for mkcephfs, are below.

(Reference: http://eu.ceph.com/docs/wip-6919/start/quick-start/)
(1) sudo apt-get update
    sudo apt-get install ceph
(2) Execute hostname -s on the command line to retrieve the name of your host. Then, 
replace {hostname} in the sample configuration file with your host name. Execute ifconfig 
on the command line to retrieve the IP address of your host. Then, replace {ip-address} 
with the IP address of your host. Finally, copy the contents of the modified configuration 
file and save it to /etc/ceph/ceph.conf. This file will configure Ceph to 
operate a monitor, two OSD daemons and one metadata server on your local machine.

[osd]
osd journal size = 1000
filestore xattr use omap = true
# Execute $ hostname to retrieve the name of your host,
# and replace {hostname} with the name of your host.
# For the monitor, replace {ip-address} with the IP
# address of your host.


[mon.a]
host = {hostname}
mon addr = {ip-address}:6789


[osd.0]
host = {hostname}

[osd.1]
host = {hostname}

[mds.a]
host = {hostname}

sudo mkdir /var/lib/ceph/osd/ceph-0
sudo mkdir /var/lib/ceph/osd/ceph-1
sudo mkdir /var/lib/ceph/mon/ceph-a
sudo mkdir /var/lib/ceph/mds/ceph-a

cd /etc/ceph
sudo mkcephfs -a -c /etc/ceph/ceph.conf -k ceph.keyring

sudo service ceph start
ceph health
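
Is something like the following the right replacement for the mkcephfs steps above? It is 
only my guess at a single-node ceph-deploy sequence (the hostname node1, the data 
directories and the single-host settings are placeholders I made up):

ceph-deploy new node1
# in the generated ceph.conf, allow a one-host cluster:
#   osd pool default size = 2
#   osd crush chooseleaf type = 0
ceph-deploy install node1
ceph-deploy mon create-initial
mkdir -p /var/local/osd0 /var/local/osd1       # run on node1
ceph-deploy osd prepare node1:/var/local/osd0 node1:/var/local/osd1
ceph-deploy osd activate node1:/var/local/osd0 node1:/var/local/osd1
ceph-deploy admin node1
ceph health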





Regards






Sent from Windows Mail
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] long term support version?

2014-11-11 Thread Gregory Farnum
Yep! Every other stable release gets the LTS treatment. We're still fixing
bugs and backporting some minor features to Dumpling, but haven't done any
serious updates to Emperor since Firefly came out. Giant will be superseded
by Hammer in the February timeframe, if I have my dates right.
-Greg
On Tue, Nov 11, 2014 at 8:54 AM Chad Seys cws...@physics.wisc.edu wrote:

 Hi all,

 Did I notice correctly that firefly is going to be supported long term
 whereas Giant is not going to be supported as long?

 http://ceph.com/releases/v0-80-firefly-released/
 This release will form the basis for our long-term supported release
 Firefly,
 v0.80.x.

 http://ceph.com/uncategorized/v0-87-giant-released/
 This release will form the basis for the stable release Giant, v0.87.x.

 Thanks!
 Chad.
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deep scrub, cache pools, replica 1

2014-11-11 Thread Gregory Farnum
On Mon, Nov 10, 2014 at 10:58 PM, Christian Balzer ch...@gol.com wrote:

 Hello,

 One of my clusters has become busy enough (I'm looking at you, evil Window
 VMs that I shall banish elsewhere soon) to experience client noticeable
 performance impacts during deep scrub.
 Before this I instructed all OSDs to deep scrub in parallel at Saturday
 night and that finished before Sunday morning.
 So for now I'll fire them off one by one to reduce the load.

 Looking forward, that cluster doesn't need more space so instead of adding
 more hosts and OSDs I was thinking of a cache pool instead.

 I suppose that will keep the clients happy while the slow pool gets
 scrubbed.
 Is there anybody who tested cache pools with Firefly and compared the
 performance to Giant?

 For testing I'm currently playing with a single storage node and 8 SSD
 backed OSDs.
 Now what very much blew my mind is that a pool with a replication of 1
 still does quite the impressive read orgy, clearly reading all the data in
 the PGs.
 Why? And what is it comparing that data with, the cosmic background
 radiation?

Yeah, cache pools currently do full-object promotions whenever an
object is accessed. There are some ideas and projects to improve this
or reduce its effects, but they're mostly just getting started.
At least, I assume that's what you mean by a read orgy; perhaps you
are seeing something else entirely?

Also, even on cache pools you don't really want to run with 1x
replication as they hold the only copy of whatever data is dirty...
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds isn't working anymore after osd's running full

2014-11-11 Thread Gregory Farnum
On Tue, Nov 11, 2014 at 5:06 AM, Jasper Siero
jasper.si...@target-holding.nl wrote:
 No problem thanks for helping.
 I don't want to disable the deep scrubbing process itself because its very 
 useful but one placement group (3.30) is continuously deep scrubbing and it 
 should finish after some time but it won't.

Hmm, how are you determining that this one PG won't stop scrubbing?
This doesn't sound like any issues familiar to me.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull

2014-11-11 Thread JIten Shah
Hi Guys,

We ran into this issue after we nearly maxed out the OSDs. Since then, we 
have cleaned up a lot of data on the OSDs, but the PGs seem to be stuck for the last 4 to 
5 days. I have run "ceph osd reweight-by-utilization" and that did not seem to 
work.

Any suggestions? 


ceph -s
cluster 909c7fe9-0012-4c27-8087-01497c661511
 health HEALTH_WARN 224 pgs backfill; 130 pgs backfill_toofull; 86 pgs 
backfilling; 4 pgs degraded; 14 pgs recovery_wait; 324 pgs stuck unclean; 
recovery -11922/573322 objects degraded (-2.079%)
 monmap e5: 5 mons at 
{Lab-mon001=x.x.96.12:6789/0,Lab-mon002=x.x.96.13:6789/0,Lab-mon003=x.x.96.14:6789/0,Lab-mon004=x.x.96.15:6789/0,Lab-mon005=x.x.96.16:6789/0},
 election epoch 28, quorum 0,1,2,3,4 
Lab-mon001,Lab-mon002,Lab-mon003,Lab-mon004,Lab-mon005
 mdsmap e6: 1/1/1 up {0=Lab-mon001=up:active}
 osdmap e10598: 495 osds: 492 up, 492 in
  pgmap v1827231: 21568 pgs, 3 pools, 221 GB data, 184 kobjects
4142 GB used, 4982 GB / 9624 GB avail
-11922/573322 objects degraded (-2.079%)
   9 active+recovery_wait
   21244 active+clean
  90 active+remapped+wait_backfill
   5 active+recovery_wait+remapped
   4 active+degraded+remapped+wait_backfill
 130 active+remapped+wait_backfill+backfill_toofull
  86 active+remapped+backfilling
  client io 0 B/s rd, 0 op/s

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Typical 10GbE latency

2014-11-11 Thread Alexandre DERUMIER
Don't have yet 10GBE, but here my result my simple lacp on 2 gigabit links with 
a cisco 6500

rtt min/avg/max/mdev = 0.179/0.202/0.221/0.019 ms


(Seem to be lower than your 10gbe nexus)


- Original Message - 

From: Wido den Hollander w...@42on.com 
To: ceph-users@lists.ceph.com 
Sent: Monday, 10 November 2014 17:22:04 
Subject: Re: [ceph-users] Typical 10GbE latency 

On 08-11-14 02:42, Gary M wrote: 
 Wido, 
 
 Take the switch out of the path between nodes and remeasure.. ICMP-echo 
 requests are very low priority traffic for switches and network stacks. 
 

I tried with a direct TwinAx and fiber cable. No difference. 

 If you really want to know, place a network analyzer between the nodes 
 to measure the request packet to response packet latency.. The ICMP 
 traffic to the ping application is not accurate in the sub-millisecond 
 range. And should only be used as a rough estimate. 
 

True, I fully agree with you. But, why is everybody showing a lower 
latency here? My latencies are about 40% higher then what I see in this 
setup and other setups. 

 You also may want to install the high resolution timer patch, sometimes 
 called HRT, to the kernel which may give you different results. 
 
 ICMP traffic takes a different path than the TCP traffic and should not 
 be considered an indicator of defect. 
 

Yes, I'm aware. But it still doesn't explain me why the latency on other 
systems, which are in production, is lower then on this idle system. 

 I believe the ping app calls the sendto system call.(sorry its been a 
 while since I last looked) Systems calls can take between .1us and .2us 
 each. However, the ping application makes several of these calls and 
 waits for a signal from the kernel. The wait for a signal means the ping 
 application must wait to be rescheduled to report the time.Rescheduling 
 will depend on a lot of other factors in the os. eg, timers, card 
 interrupts other tasks with higher priorities. Reporting the time must 
 add a few more systems calls for this to happen. As the ping application 
 loops to post the next ping request which again requires a few systems 
 calls which may cause a task switch while in each system call. 
 
 For the above factors, the ping application is not a good representation 
 of network performance due to factors in the application and network 
 traffic shaping performed at the switch and the tcp stacks. 
 

I think that netperf is probably a better tool, but that also does TCP 
latencies. 

I want the real IP latency, so I assumed that ICMP would be the most 
simple one. 

The other setups I have access to are in production and do not have any 
special tuning, yet their latency is still lower then on this new 
deployment. 

That's what gets me confused. 

Wido 

 cheers, 
 gary 
 
 
 On Fri, Nov 7, 2014 at 4:32 PM, Łukasz Jagiełło 
 jagiello.luk...@gmail.com mailto:jagiello.luk...@gmail.com wrote: 
 
 Hi, 
 
 rtt min/avg/max/mdev = 0.070/0.177/0.272/0.049 ms 
 
 04:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit 
 SFI/SFP+ Network Connection (rev 01) 
 
 at both hosts and Arista 7050S-64 between. 
 
 Both hosts were part of active ceph cluster. 
 
 
 On Thu, Nov 6, 2014 at 5:18 AM, Wido den Hollander w...@42on.com 
 mailto:w...@42on.com wrote: 
 
 Hello, 
 
 While working at a customer I've ran into a 10GbE latency which 
 seems 
 high to me. 
 
 I have access to a couple of Ceph cluster and I ran a simple 
 ping test: 
 
 $ ping -s 8192 -c 100 -n ip 
 
 Two results I got: 
 
 rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms 
 rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms 
 
 Both these environment are running with Intel 82599ES 10Gbit 
 cards in 
 LACP. One with Extreme Networks switches, the other with Arista. 
 
 Now, on a environment with Cisco Nexus 3000 and Nexus 7000 
 switches I'm 
 seeing: 
 
 rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms 
 
 As you can see, the Cisco Nexus network has high latency 
 compared to the 
 other setup. 
 
 You would say the switches are to blame, but we also tried with 
 a direct 
 TwinAx connection, but that didn't help. 
 
 This setup also uses the Intel 82599ES cards, so the cards don't 
 seem to 
 be the problem. 
 
 The MTU is set to 9000 on all these networks and cards. 
 
 I was wondering, others with a Ceph cluster running on 10GbE, 
 could you 
 perform a simple network latency test like this? I'd like to 
 compare the 
 results. 
 
 -- 
 Wido den Hollander 
 42on B.V. 
 Ceph trainer and consultant 
 
 Phone: +31 (0)20 700 9902 tel:%2B31%20%280%2920%20700%209902 
 Skype: contact42on 
 ___ 
 ceph-users mailing list 
 ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com 
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
 
 
 
 
 -- 
 Łukasz Jagiełło 
 lukaszatjagiellodotorg 
 
 ___ 
 ceph-users mailing list 
 ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com 
 

Re: [ceph-users] PG's incomplete after OSD failure

2014-11-11 Thread Matthew Anderson
I've done a bit more work tonight and managed to get some more data
back. Osd.121, which was previously completely dead, has made it
through an XFS repair with a more fault tolerant HBA firmware and I
was able to export both of the placement groups required using
ceph_objectstore_tool. The osd would probably boot if I hadn't already
marked it as lost :(

I've basically got it down to two options.

I can import the exported data from osd.121 into osd.190 which would
complete the PG but this fails with a filestore feature mismatch
because the sharded objects feature is missing on the target osd.
Export has incompatible features set
compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
objects,12=transaction hints}

The second one would be to run a ceph pg force_create_pg on each of
the problem PG's to reset them back to empty and them import the data
using ceph_objectstore_tool import-rados. Unfortunately this has
failed as well when I tested ceph pg force_create_pg on an incomplete
PG in another pool. The PG gets set to creating but then goes back to
incomplete after a few minutes.

I've trawled the mailing list for solutions but have come up empty,
neither problem appears to have been resolved before.

On Tue, Nov 11, 2014 at 5:54 PM, Matthew Anderson
manderson8...@gmail.com wrote:
 Thanks for your reply Sage!

 I've tested with 8.6ae and no luck I'm afraid. Steps taken were -
 Stop osd.117
 Export 8.6ae from osd.117
 Remove 8.6ae from osd.117
 start osd.117
 restart osd.190 after still showing incomplete

 After this the PG was still showing incomplete and ceph pg dump_stuck
 inactive shows -
 pg_stat objects mip degr misp unf bytes log disklog state state_stamp
 v reported up up_primary acting acting_primary last_scrub scrub_stamp
 last_deep_scrub deep_scrub_stamp
 8.6ae 0 0 0 0 0 0 0 0 incomplete 2014-11-11 17:34:27.168078 0'0
 161425:40 [117,190] 117 [117,190] 117 86424'389748 2013-09-09
 16:52:58.796650 86424'389748 2013-09-09 16:52:58.796650

 I then tried an export from OSD 190 to OSD 117 by doing -
 Stop osd.190 and osd.117
 Export pg 8.6ae from osd.190
 Import from file generated in previous step into osd.117
 Boot both osd.190 and osd.117

 When osd.117 attempts to start it generates an failed assert, full log
 is here http://pastebin.com/S4CXrTAL
 -1 2014-11-11 17:25:15.130509 7f9f44512900  0 osd.117 161404 load_pgs
  0 2014-11-11 17:25:18.604696 7f9f44512900 -1 osd/OSD.h: In
 function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f9f44512900
 time 2014-11-11 17:25:18.602626
 osd/OSD.h: 715: FAILED assert(ret)

  ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
 const*)+0x8b) [0xb8231b]
  2: (OSDService::get_map(unsigned int)+0x3f) [0x6eea2f]
  3: (OSD::load_pgs()+0x1b78) [0x6aae18]
  4: (OSD::init()+0x71f) [0x6abf5f]
  5: (main()+0x252c) [0x638cfc]
  6: (__libc_start_main()+0xf5) [0x7f9f41650ec5]
  7: /usr/bin/ceph-osd() [0x651027]

 I also attempted the same steps with 8.ca and got the same results.
 The below is the current state of the pg with it removed from osd.111
 -
 pg_stat objects mip degr misp unf bytes log disklog state state_stamp
 v reported up up_primary acting acting_primary last_scrub scrub_stamp
 last_deep_scrub deep_scrub_stamp
 8.ca 2440 0 0 0 0 10219748864 9205 9205 incomplete 2014-11-11
 17:39:28.570675 160435'959618 161425:6071759 [190,111] 190 [190,111]
 190 86417'207324 2013-09-09 12:58:10.749001 86229'196887 2013-09-02
 12:57:58.162789

 Any idea of where I can go from here?
 One thought I had was setting osd.111 and osd.117 out of the cluster
 and once the data is moved I can shut them down and mark them as lost
 which would make osd.190 the only replica available for those PG's.

 Thanks again

 On Tue, Nov 11, 2014 at 1:10 PM, Sage Weil sw...@redhat.com wrote:
 On Tue, 11 Nov 2014, Matthew Anderson wrote:
 Just an update, it appears that no data actually exists for those PG's
 on osd.117 and osd.111 but it's showing as incomplete anyway.

 So for the 8.ca PG, osd.111 has only an empty directory but osd 190 is
 filled with data.
 For 8.6ae, osd.117 has no data in the pg directory and osd.190 is
 filled with data as before.

 Since all of the required data is on OSD.190, would there be a way to
 make osd.111 and osd.117 forget they have ever seen the two incomplete
 PG's and therefore restart backfilling?

 Ah, that's good news.  You should know that the copy on osd.190 is
 slightly out of date, but it is much better than losing the entire
 contents of the PG.  More specifically, for 8.6ae the latest version was
 1935986 but the osd.190 is 1935747, about 200 writes in the past.  You'll
 need to fsck the RBD images after this is all done.

 I don't think we've tested this recovery scenario, but I think you'll be
 able to recovery with 

[ceph-users] Not finding systemd files in Giant CentOS7 packages

2014-11-11 Thread Robert LeBlanc
I was trying to get systemd to bring up the monitor using the new systemd
files in Giant. However, I'm not finding the systemd files included in the
CentOS 7 packages. Are they missing or am I confused about how it should
work?
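One quick way to check whether the unit files are in the packages at all (just
an illustrative check, not a fix):

rpm -ql ceph | grep -iE 'systemd|\.service'
rpm -ql ceph-common | grep -iE 'systemd|\.service'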

ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
Installed Packages
ceph.x86_64          1:0.87-0.el7.centos   @Ceph
ceph-common.x86_64   1:0.87-0.el7.centos   @Ceph
ceph-deploy.noarch   1.5.19-0              @Ceph-noarch
ceph-release.noarch  1-0.el7               installed
libcephfs1.x86_64    1:0.87-0.el7.centos   @Ceph
python-ceph.x86_64   1:0.87-0.el7.centos   @Ceph

Thanks,
Robert LeBlanc
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull

2014-11-11 Thread Chad Seys
Find out which OSD it is:

ceph health detail

Squeeze blocks off the affected OSD:

ceph osd reweight OSDNUM 0.8

Repeat with any OSD which becomes toofull.

Your cluster is only about 50% used, so I think this will be enough.

Then when it finishes, allow data back on OSD:

ceph osd reweight OSDNUM 1

Hopefully ceph will someday be taught to move PGs in a better order!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Federated gateways

2014-11-11 Thread Craig Lewis
Is that radosgw log from the primary or the secondary zone?  Nothing in
that log jumps out at me.

I see you're running 0.80.5.  Are you using Apache 2.4?  There is a known
issue with Apache 2.4 on the primary and replication.  It's fixed, just
waiting for the next firefly release.  Although, that causes 40x errors
with Apache 2.4, not 500 errors.

Have you verified that both system users can read and write to both
clusters?  (Just make sure you clean up the writes to the slave cluster).




On Tue, Nov 11, 2014 at 6:51 AM, Aaron Bassett aa...@five3genomics.com
wrote:

 Ok I believe I’ve made some progress here. I have everything syncing
 *except* data. The data is getting 500s when it tries to sync to the backup
 zone. I have a log from the radosgw with debug cranked up to 20:

 2014-11-11 14:37:06.688331 7f54447f0700  1 == starting new request
 req=0x7f546800f3b0 =
 2014-11-11 14:37:06.688978 7f54447f0700  0 WARNING: couldn't find acl
 header for bucket, generating default
 2014-11-11 14:37:06.689358 7f54447f0700  1 -- 172.16.10.103:0/1007381 --
 172.16.10.103:6934/14875 -- osd_op(client.5673295.0:1783
 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write
 e47531) v4 -- ?+0 0x7f534800d770 con 0x7f53f00053f0
 2014-11-11 14:37:06.689396 7f54447f0700 20 -- 172.16.10.103:0/1007381
 submit_message osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call
 statelog.add] 193.1cf20a5a ondisk+write e47531) v4 remote,
 172.16.10.103:6934/14875, have pipe.
 2014-11-11 14:37:06.689481 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
 2014-11-11 14:37:06.689592 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
 cs=1 l=1 c=0x7f53f00053f0).writer encoding 48 features 17592186044415
 0x7f534800d770 osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call
 statelog.add] 193.1cf20a5a ondisk+write e47531) v4
 2014-11-11 14:37:06.689756 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
 cs=1 l=1 c=0x7f53f00053f0).writer signed seq # 48): sig = 206599450695048354
 2014-11-11 14:37:06.689804 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
 cs=1 l=1 c=0x7f53f00053f0).writer sending 48 0x7f534800d770
 2014-11-11 14:37:06.689884 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
 2014-11-11 14:37:06.689915 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
 cs=1 l=1 c=0x7f53f00053f0).writer sleeping
 2014-11-11 14:37:06.694968 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
 cs=1 l=1 c=0x7f53f00053f0).reader got ACK
 2014-11-11 14:37:06.695053 7f51ff0f0700 15 -- 172.16.10.103:0/1007381 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
 cs=1 l=1 c=0x7f53f00053f0).reader got ack seq 48
 2014-11-11 14:37:06.695067 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
 cs=1 l=1 c=0x7f53f00053f0).reader reading tag...
 2014-11-11 14:37:06.695079 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
 cs=1 l=1 c=0x7f53f00053f0).reader got MSG
 2014-11-11 14:37:06.695093 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
 cs=1 l=1 c=0x7f53f00053f0).reader got envelope type=43 src osd.25 front=190
 data=0 off 0
 2014-11-11 14:37:06.695108 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
 cs=1 l=1 c=0x7f53f00053f0).reader wants 190 from dispatch throttler
 0/104857600
 2014-11-11 14:37:06.695135 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
 cs=1 l=1 c=0x7f53f00053f0).reader got front 190
 2014-11-11 14:37:06.695150 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
 cs=1 l=1 c=0x7f53f00053f0).aborted = 0
 2014-11-11 14:37:06.695158 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
 cs=1 l=1 c=0x7f53f00053f0).reader got 190 + 0 + 0 byte message
 2014-11-11 14:37:06.695284 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
 cs=1 l=1 c=0x7f53f00053f0).reader got message 48 0x7f51b4001950
 osd_op_reply(1783 statelog.obj_opstate.97 [call] v47531'13 uv13 ondisk = 0)
 v6
 2014-11-11 14:37:06.695313 

Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull

2014-11-11 Thread JIten Shah
Thanks Chad. It seems to be working.

—Jiten

On Nov 11, 2014, at 12:47 PM, Chad Seys cws...@physics.wisc.edu wrote:

 Find out which OSD it is:
 
 ceph health detail
 
 Squeeze blocks off the affected OSD:
 
 ceph osd reweight OSDNUM 0.8
 
 Repeat with any OSD which becomes toofull.
 
 Your cluster is only about 50% used, so I think this will be enough.
 
 Then when it finishes, allow data back on OSD:
 
 ceph osd reweight OSDNUM 1
 
 Hopefully ceph will someday be taught to move PGs in a better order!
 Chad.
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull

2014-11-11 Thread Craig Lewis
How many OSDs are nearfull?

I've seen Ceph want two toofull OSDs to swap PGs.  In that case, I
dynamically raised mon_osd_nearfull_ratio and osd_backfill_full_ratio a
bit, then put it back to normal once the scheduling deadlock finished.
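For reference, raising those on a running cluster can be done with injectargs,
e.g. something like the following (values are only examples and should be
reverted afterwards):

ceph tell mon.<id> injectargs '--mon-osd-nearfull-ratio 0.88'
ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.90'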

Keep in mind that ceph osd reweight is temporary.  If you mark an osd OUT
then IN, the weight will be set to 1.0.  If you need something that's
persistent, you can use ceph osd crush reweight osd.NUM crush_weight.
Look at ceph osd tree to get the current weight.

I also recommend stepping towards your goal.  Changing either weight can
cause a lot of unrelated migrations, and the crush weight seems to cause
more than the osd weight.  I step osd weight by 0.125, and crush weight by
0.05.
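A concrete stepping sequence might look like this (osd.12 and the weights are
purely illustrative):

ceph osd tree | grep osd.12           # note the current crush weight
ceph osd crush reweight osd.12 1.75   # e.g. step down from 1.80 by 0.05
ceph osd reweight 12 0.875            # or step the osd weight down by 0.125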


On Tue, Nov 11, 2014 at 12:47 PM, Chad Seys cws...@physics.wisc.edu wrote:

 Find out which OSD it is:

 ceph health detail

 Squeeze blocks off the affected OSD:

 ceph osd reweight OSDNUM 0.8

 Repeat with any OSD which becomes toofull.

 Your cluster is only about 50% used, so I think this will be enough.

 Then when it finishes, allow data back on OSD:

 ceph osd reweight OSDNUM 1

 Hopefully ceph will someday be taught to move PGs in a better order!
 Chad.
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Federated gateways

2014-11-11 Thread Aaron Bassett

 On Nov 11, 2014, at 4:21 PM, Craig Lewis cle...@centraldesktop.com wrote:
 
 Is that radosgw log from the primary or the secondary zone?  Nothing in that 
 log jumps out at me.
This is the log from the secondary zone. That HTTP 500 response code coming 
back is the only problem I can find. There are a bunch of 404s from other 
requests to logs and stuff, but I assume those are normal because there’s no 
activity going on. I guess it's just that cryptic "WARNING: set_req_state_err 
err_no=5 resorting to 500" line that's the problem. I think I need to get a 
stack trace from that somehow.

 I see you're running 0.80.5.  Are you using Apache 2.4?  There is a known 
 issue with Apache 2.4 on the primary and replication.  It's fixed, just 
 waiting for the next firefly release.  Although, that causes 40x errors with 
 Apache 2.4, not 500 errors.
It is apache 2.4, but I’m actually running 0.80.7 so I probably have that bug 
fix?

 
 Have you verified that both system users can read and write to both clusters? 
  (Just make sure you clean up the writes to the slave cluster).
Yes I can write everywhere and radosgw-agent isn’t getting any 403s like it was 
earlier when I had mismatched keys. The .us-nh.rgw.buckets.index pool is 
syncing properly, as are the users. It seems like really the only thing that 
isn’t syncing is the .zone.rgw.buckets pool.

Thanks, Aaron 
 
 
 
 
 On Tue, Nov 11, 2014 at 6:51 AM, Aaron Bassett aa...@five3genomics.com 
 mailto:aa...@five3genomics.com wrote:
 Ok I believe I’ve made some progress here. I have everything syncing *except* 
 data. The data is getting 500s when it tries to sync to the backup zone. I 
 have a log from the radosgw with debug cranked up to 20:
 
 2014-11-11 14:37:06.688331 7f54447f0700  1 == starting new request 
 req=0x7f546800f3b0 =
 2014-11-11 14:37:06.688978 7f54447f0700  0 WARNING: couldn't find acl header 
 for bucket, generating default
 2014-11-11 14:37:06.689358 7f54447f0700  1 -- 172.16.10.103:0/1007381 
 http://172.16.10.103:0/1007381 -- 172.16.10.103:6934/14875 
 http://172.16.10.103:6934/14875 -- osd_op(client.5673295.0:1783 
 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) 
 v4 -- ?+0 0x7f534800d770 con 0x7f53f00053f0
 2014-11-11 14:37:06.689396 7f54447f0700 20 -- 172.16.10.103:0/1007381 
 http://172.16.10.103:0/1007381 submit_message osd_op(client.5673295.0:1783 
 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) 
 v4 remote, 172.16.10.103:6934/14875 http://172.16.10.103:6934/14875, have 
 pipe.
 2014-11-11 14:37:06.689481 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 
 http://172.16.10.103:0/1007381  172.16.10.103:6934/14875 
 http://172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 
 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
 2014-11-11 14:37:06.689592 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 
 http://172.16.10.103:0/1007381  172.16.10.103:6934/14875 
 http://172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 
 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer encoding 48 features 
 17592186044415 0x7f534800d770 osd_op(client.5673295.0:1783 
 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) 
 v4
 2014-11-11 14:37:06.689756 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 
 http://172.16.10.103:0/1007381  172.16.10.103:6934/14875 
 http://172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 
 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer signed seq # 48): sig = 
 206599450695048354
 2014-11-11 14:37:06.689804 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 
 http://172.16.10.103:0/1007381  172.16.10.103:6934/14875 
 http://172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 
 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sending 48 0x7f534800d770
 2014-11-11 14:37:06.689884 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 
 http://172.16.10.103:0/1007381  172.16.10.103:6934/14875 
 http://172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 
 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
 2014-11-11 14:37:06.689915 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 
 http://172.16.10.103:0/1007381  172.16.10.103:6934/14875 
 http://172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 
 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sleeping
 2014-11-11 14:37:06.694968 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 
 http://172.16.10.103:0/1007381  172.16.10.103:6934/14875 
 http://172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 
 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got ACK
 2014-11-11 14:37:06.695053 7f51ff0f0700 15 -- 172.16.10.103:0/1007381 
 http://172.16.10.103:0/1007381  172.16.10.103:6934/14875 
 http://172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 
 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got ack seq 48
 2014-11-11 14:37:06.695067 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 
 http://172.16.10.103:0/1007381  172.16.10.103:6934/14875 
 

Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull

2014-11-11 Thread JIten Shah
Actually there were 100’s that were too full. We manually set the OSD weights 
to 0.5 and it seems to be recovering.

Thanks of the tips on crush reweight. I will look into it.

—Jiten

On Nov 11, 2014, at 1:37 PM, Craig Lewis cle...@centraldesktop.com wrote:

 How many OSDs are nearfull?
 
 I've seen Ceph want two toofull OSDs to swap PGs.  In that case, I 
 dynamically raised mon_osd_nearfull_ratio and osd_backfill_full_ratio a bit, 
 then put it back to normal once the scheduling deadlock finished. 
 
 Keep in mind that ceph osd reweight is temporary.  If you mark an osd OUT 
 then IN, the weight will be set to 1.0.  If you need something that's 
 persistent, you can use ceph osd crush reweight osd.NUM crush_weight.  Look 
 at ceph osd tree to get the current weight.
 
 I also recommend stepping towards your goal.  Changing either weight can 
 cause a lot of unrelated migrations, and the crush weight seems to cause more 
 than the osd weight.  I step osd weight by 0.125, and crush weight by 0.05.
 
 
 On Tue, Nov 11, 2014 at 12:47 PM, Chad Seys cws...@physics.wisc.edu wrote:
 Find out which OSD it is:
 
 ceph health detail
 
 Squeeze blocks off the affected OSD:
 
 ceph osd reweight OSDNUM 0.8
 
 Repeat with any OSD which becomes toofull.
 
 Your cluster is only about 50% used, so I think this will be enough.
 
 Then when it finishes, allow data back on OSD:
 
 ceph osd reweight OSDNUM 1
 
 Hopefully ceph will someday be taught to move PGs in a better order!
 Chad.
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull

2014-11-11 Thread cwseys
0.5 might be too much.  All the PGs squeezed off of one OSD will need to 
be stored on another.  The fewer you move the less likely a different 
OSD will become toofull.


Better to adjust in small increments as Craig suggested.

Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Typical 10GbE latency

2014-11-11 Thread Robert LeBlanc
Is this with an 8192-byte payload? The theoretical serialization time of 8192
bytes at 1 Gbps (you are only sending one packet, so LACP won't help) is about
0.061 ms one direction; double that and you are at 0.122 ms of bits in flight.
Then there is context switching, switch latency (store-and-forward assumed for
1 Gbps), etc., which I'm not sure would fit in the remaining 0.057 ms of your
min time. If it is an 8192-byte payload, then I'm really impressed!

On Tue, Nov 11, 2014 at 11:56 AM, Alexandre DERUMIER aderum...@odiso.com
wrote:

 Don't have yet 10GBE, but here my result my simple lacp on 2 gigabit links
 with a cisco 6500

 rtt min/avg/max/mdev = 0.179/0.202/0.221/0.019 ms


 (Seem to be lower than your 10gbe nexus)


 - Mail original -

 De: Wido den Hollander w...@42on.com
 À: ceph-users@lists.ceph.com
 Envoyé: Lundi 10 Novembre 2014 17:22:04
 Objet: Re: [ceph-users] Typical 10GbE latency

 On 08-11-14 02:42, Gary M wrote:
  Wido,
 
  Take the switch out of the path between nodes and remeasure.. ICMP-echo
  requests are very low priority traffic for switches and network stacks.
 

 I tried with a direct TwinAx and fiber cable. No difference.

  If you really want to know, place a network analyzer between the nodes
  to measure the request-packet-to-response-packet latency. The ICMP
  traffic to the ping application is not accurate in the sub-millisecond
  range and should only be used as a rough estimate.
 

 True, I fully agree with you. But why is everybody showing a lower
 latency here? My latencies are about 40% higher than what I see in this
 setup and other setups.

  You also may want to install the high resolution timer patch, sometimes
  called HRT, to the kernel which may give you different results.
 
  ICMP traffic takes a different path than the TCP traffic and should not
  be considered an indicator of defect.
 

 Yes, I'm aware. But it still doesn't explain why the latency on other
 systems, which are in production, is lower than on this idle system.

  I believe the ping app calls the sendto system call (sorry, it's been a
  while since I last looked). System calls can take between 0.1 us and 0.2 us
  each. However, the ping application makes several of these calls and
  waits for a signal from the kernel. The wait for a signal means the ping
  application must wait to be rescheduled to report the time. Rescheduling
  will depend on a lot of other factors in the OS, e.g. timers, card
  interrupts, other tasks with higher priorities. Reporting the time must
  add a few more system calls for this to happen. The ping application then
  loops to post the next ping request, which again requires a few system
  calls and may cause a task switch while in each system call.
 
  For the above factors, the ping application is not a good representation
  of network performance due to factors in the application and network
  traffic shaping performed at the switch and the tcp stacks.
 

 I think that netperf is probably a better tool, but that also does TCP
 latencies.

 I want the real IP latency, so I assumed that ICMP would be the most
 simple one.
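 (For what it's worth, a netperf request/response test gives application-level
 round-trip latency without relying on ICMP; assuming netperf is installed,
 something like:

 netperf -H <remote-ip> -t TCP_RR -- -r 1,1

 where round-trip latency is roughly 1 / the reported transactions per
 second.)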

 The other setups I have access to are in production and do not have any
 special tuning, yet their latency is still lower than on this new
 deployment.

 That's what gets me confused.

 Wido

  cheers,
  gary
 
 
  On Fri, Nov 7, 2014 at 4:32 PM, Łukasz Jagiełło
  jagiello.luk...@gmail.com mailto:jagiello.luk...@gmail.com wrote:
 
  Hi,
 
  rtt min/avg/max/mdev = 0.070/0.177/0.272/0.049 ms
 
  04:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit
  SFI/SFP+ Network Connection (rev 01)
 
  at both hosts and Arista 7050S-64 between.
 
  Both hosts were part of active ceph cluster.
 
 
  On Thu, Nov 6, 2014 at 5:18 AM, Wido den Hollander w...@42on.com
  mailto:w...@42on.com wrote:
 
  Hello,
 
  While working at a customer I've ran into a 10GbE latency which
  seems
  high to me.
 
  I have access to a couple of Ceph cluster and I ran a simple
  ping test:
 
  $ ping -s 8192 -c 100 -n ip
 
  Two results I got:
 
  rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms
  rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms
 
  Both these environment are running with Intel 82599ES 10Gbit
  cards in
  LACP. One with Extreme Networks switches, the other with Arista.
 
  Now, on a environment with Cisco Nexus 3000 and Nexus 7000
  switches I'm
  seeing:
 
  rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms
 
  As you can see, the Cisco Nexus network has high latency
  compared to the
  other setup.
 
  You would say the switches are to blame, but we also tried with
  a direct
  TwinAx connection, but that didn't help.
 
  This setup also uses the Intel 82599ES cards, so the cards don't
  seem to
  be the problem.
 
  The MTU is set to 9000 on all these networks and cards.
 
  I was wondering, others with a Ceph cluster running on 10GbE,
  could you
  perform a simple network latency test like this? I'd like to
  compare the
  results.
 
  --
  Wido den Hollander
  42on B.V.
  

Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull

2014-11-11 Thread JIten Shah
I agree. This was just our brute-force method on our test cluster. We won't do 
this on the production cluster.

--Jiten

On Nov 11, 2014, at 2:11 PM, cwseys cws...@physics.wisc.edu wrote:

 0.5 might be too much.  All the PGs squeezed off of one OSD will need to be 
 stored on another.  The fewer you move the less likely a different OSD will 
 become toofull.
 
 Better to adjust in small increments as Craig suggested.
 
 Chad.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deep scrub, cache pools, replica 1

2014-11-11 Thread Christian Balzer
On Tue, 11 Nov 2014 10:21:49 -0800 Gregory Farnum wrote:

 On Mon, Nov 10, 2014 at 10:58 PM, Christian Balzer ch...@gol.com wrote:
 
  Hello,
 
  One of my clusters has become busy enough (I'm looking at you, evil
  Window VMs that I shall banish elsewhere soon) to experience client
  noticeable performance impacts during deep scrub.
  Before this I instructed all OSDs to deep scrub in parallel at Saturday
  night and that finished before Sunday morning.
  So for now I'll fire them off one by one to reduce the load.
 
  Looking forward, that cluster doesn't need more space so instead of
  adding more hosts and OSDs I was thinking of a cache pool instead.
 
  I suppose that will keep the clients happy while the slow pool gets
  scrubbed.
  Is there anybody who tested cache pools with Firefly and compared the
  performance to Giant?
 
  For testing I'm currently playing with a single storage node and 8 SSD
  backed OSDs.
  Now what very much blew my mind is that a pool with a replication of 1
  still does quite the impressive read orgy, clearly reading all the
  data in the PGs.
  Why? And what is it comparing that data with, the cosmic background
  radiation?
 
 Yeah, cache pools currently do full-object promotions whenever an
 object is accessed. There are some ideas and projects to improve this
 or reduce its effects, but they're mostly just getting started.
Thanks for confirming that; so probably not much better than Firefly
_aside_ from the fact that SSD pools should be quite a bit faster in and
of themselves in Giant.
Guess there is no other way to find out than to test things; I have a
feeling that determining the hot working set otherwise will be rather
difficult.

 At least, I assume that's what you mean by a read orgy; perhaps you
 are seeing something else entirely?
 
Indeed I did, this was just an observation that any pool with a replica of
1 will still read ALL the data during a deep-scrub. What good would that
do?

 Also, even on cache pools you don't really want to run with 1x
 replication as they hold the only copy of whatever data is dirty...

Oh, I agree, this is for testing only. 
Also a replica of 1 doesn't have to mean that the data is unsafe (the OSDs
could be RAIDed). But even so, in production the loss of a single node
shouldn't impact things. And once you go there, a replica of 2 comes
naturally.

Christian
-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Federated gateways

2014-11-11 Thread Craig Lewis

 I see you're running 0.80.5.  Are you using Apache 2.4?  There is a known
 issue with Apache 2.4 on the primary and replication.  It's fixed, just
 waiting for the next firefly release.  Although, that causes 40x errors
 with Apache 2.4, not 500 errors.

 It is apache 2.4, but I’m actually running 0.80.7 so I probably have that
 bug fix?


No, the unreleased 0.80.8 has the fix.




 Have you verified that both system users can read and write to both
 clusters?  (Just make sure you clean up the writes to the slave cluster).

 Yes I can write everywhere and radosgw-agent isn’t getting any 403s like
 it was earlier when I had mismatched keys. The .us-nh.rgw.buckets.index
 pool is syncing properly, as are the users. It seems like really the only
 thing that isn’t syncing is the .zone.rgw.buckets pool.


That's pretty much the same behavior I was seeing with Apache 2.4.

Try downgrading the primary cluster to Apache 2.2.  In my testing, the
secondary cluster could run 2.2 or 2.4.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v0.88 released

2014-11-11 Thread Sage Weil
This is the first development release after Giant.  The two main
features merged this round are the new AsyncMessenger (an alternative
implementation of the network layer) from Haomai Wang at UnitedStack,
and support for POSIX file locks in ceph-fuse and libcephfs from Yan,
Zheng.  There is also a big pile of smaller items that were merged while
we were stabilizing Giant, including a range of smaller performance
and bug fixes and some new tracepoints for LTTNG.

Notable Changes
---

* ceph-disk: Scientific Linux support (Dan van der Ster)
* ceph-disk: respect --statedir for keyring (Loic Dachary)
* ceph-fuse, libcephfs: POSIX file lock support (Yan, Zheng)
* ceph-fuse, libcephfs: fix cap flush overflow (Greg Farnum, Yan, Zheng)
* ceph-fuse, libcephfs: fix root inode xattrs (Yan, Zheng)
* ceph-fuse, libcephfs: preserve dir ordering (#9178 Yan, Zheng)
* ceph-fuse, libcephfs: trim inodes before reconnecting to MDS (Yan, 
  Zheng)
* ceph: do not parse injectargs twice (Loic Dachary)
* ceph: make 'ceph -s' output more readable (Sage Weil)
* ceph: new 'ceph tell mds.$name_or_rank_or_gid' (John Spray)
* ceph: test robustness (Joao Eduardo Luis)
* ceph_objectstore_tool: behave with sharded flag (#9661 David Zafman)
* cephfs-journal-tool: fix journal import (#10025 John Spray)
* cephfs-journal-tool: skip up to expire_pos (#9977 John Spray)
* cleanup rados.h definitions with macros (Ilya Dryomov)
* common: shared_cache unit tests (Cheng Cheng)
* config: add $cctid meta variable (Adam Crume)
* crush: fix buffer overrun for poorly formed rules (#9492 Johnu George)
* crush: improve constness (Loic Dachary)
* crushtool: add --location id command (Sage Weil, Loic Dachary)
* default to libnss instead of crypto++ (Federico Gimenez)
* doc: ceph osd reweight vs crush weight (Laurent Guerby)
* doc: document the LRC per-layer plugin configuration (Yuan Zhou)
* doc: erasure code doc updates (Loic Dachary)
* doc: misc updates (Alfredo Deza, VRan Liu)
* doc: preflight doc fixes (John Wilkins)
* doc: update PG count guide (Gerben Meijer, Laurent Guerby, Loic Dachary)
* keyvaluestore: misc fixes (Haomai Wang)
* keyvaluestore: performance improvements (Haomai Wang)
* librados: add rados_pool_get_base_tier() call (Adam Crume)
* librados: cap buffer length (Loic Dachary)
* librados: fix objecter races (#9617 Josh Durgin)
* libradosstriper: misc fixes (Sebastien Ponce)
* librbd: add missing python docstrings (Jason Dillaman)
* librbd: add readahead (Adam Crume)
* librbd: fix cache tiers in list_children and snap_unprotect (Adam Crume)
* librbd: fix performance regression in ObjectCacher (#9513 Adam Crume)
* librbd: lttng tracepoints (Adam Crume)
* librbd: misc fixes (Xinxin Shu, Jason Dillaman)
* mds: fix sessionmap lifecycle bugs (Yan, Zheng)
* mds: initialize root inode xattr version (Yan, Zheng)
* mds: introduce auth caps (John Spray)
* mds: misc bugs (Greg Farnum, John Spray, Yan, Zheng, Henry Change)
* misc coverity fixes (Danny Al-Gaaf)
* mon: add 'ceph osd rename-bucket ...' command (Loic Dachary)
* mon: clean up auth list output (Loic Dachary)
* mon: fix 'osd crush link' id resolution (John Spray)
* mon: fix misc error paths (Joao Eduardo Luis)
* mon: fix paxos off-by-one corner case (#9301 Sage Weil)
* mon: new 'ceph pool ls [detail]' command (Sage Weil)
* mon: wait for writeable before cross-proposing (#9794 Joao Eduardo Luis)
* msgr: avoid useless new/delete (Haomai Wang)
* msgr: fix delay injection bug (#9910 Sage Weil, Greg Farnum)
* msgr: new AsyncMessenger alternative implementation (Haomai Wang)
* msgr: prefetch data when doing recv (Yehuda Sadeh)
* osd: add erasure code corpus (Loic Dachary)
* osd: add misc tests (Loic Dachary, Danny Al-Gaaf)
* osd: cleanup boost optionals (William Kennington)
* osd: expose non-journal backends via ceph-osd CLI (Haomai Wang)
* osd: fix JSON output for stray OSDs (Loic Dachary)
* osd: fix ioprio options (Loic Dachary)
* osd: fix transaction accounting (Jianpeng Ma)
* osd: misc optimizations (Xinxin Shu, Zhiqiang Wang, Xinze Chi)
* osd: use FIEMAP_FLAGS_SYNC instead of fsync (Jianpeng Ma)
* rados: fix put of /dev/null (Loic Dachary)
* rados: parse command-line arguments more strictly (#8983 Adam Crume)
* rbd-fuse: fix memory leak (Adam Crume)
* rbd-replay-many (Adam Crume)
* rbd-replay: --anonymize flag to rbd-replay-prep (Adam Crume)
* rbd: fix 'rbd diff' for non-existent objects (Adam Crume)
* rbd: fix error when striping with format 1 (Sebastien Han)
* rbd: fix export for image sizes over 2GB (Vicente Cheng)
* rbd: use rolling average for rbd bench-write throughput (Jason Dillaman)
* rgw: send explicit HTTP status string (Yehuda Sadeh)
* rgw: set length for keystone token validation request (#7796 Yehuda 
  Sadeh, Mark Kirkwood)
* udev: fix rules for CentOS7/RHEL7 (Loic Dachary)
* use clock_gettime instead of gettimeofday (Jianpeng Ma)
* vstart.sh: set up environment for s3-tests (Luis Pabon)

Getting Ceph


* Git at git://github.com/ceph/ceph.git
* Tarball at 

[ceph-users] Log reading/how do I tell what an OSD is trying to connect to

2014-11-11 Thread Scott Laird
I'm having a problem with my cluster.  It's running 0.87 right now, but I
saw the same behavior with 0.80.5 and 0.80.7.

The problem is that my logs are filling up with "replacing existing (lossy)
channel" log lines (see below), to the point where I'm filling drives to
100% almost daily just with logs.

It doesn't appear to be network related, because it happens even when
talking to other OSDs on the same host.  The logs pretty much all point to
port 0 on the remote end.  Is this an indicator that it's failing to
resolve port numbers somehow, or is this normal at this point in connection
setup?

The systems that are causing this problem are somewhat unusual; they're
running OSDs in Docker containers, but they *should* be configured to run
as root and have full access to the host's network stack.  They manage to
work, mostly, but things are still really flaky.

Also, is there documentation on what the various fields mean, short of
digging through the source?  And how does Ceph resolve OSD numbers into
host/port addresses?
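As far as I can tell the id-to-address mapping comes from the OSDMap, which
can be inspected with something like:

ceph osd dump | grep '^osd\.25 '
ceph osd find 25

but corrections welcome.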


2014-11-12 01:50:40.802604 7f7828db8700  0 -- 10.2.0.36:6819/1 
10.2.0.36:0/1 pipe(0x1ce31c80 sd=135 :6819 s=0 pgs=0 cs=0 l=1
c=0x1e070580).accept replacing existing (lossy) channel (new one lossy=1)

2014-11-12 01:50:40.802708 7f7816538700  0 -- 10.2.0.36:6830/1 
10.2.0.36:0/1 pipe(0x1ff61080 sd=120 :6830 s=0 pgs=0 cs=0 l=1
c=0x1f3db2e0).accept replacing existing (lossy) channel (new one lossy=1)

2014-11-12 01:50:40.803346 7f781ba8d700  0 -- 10.2.0.36:6819/1 
10.2.0.36:0/1 pipe(0x1ce31180 sd=125 :6819 s=0 pgs=0 cs=0 l=1
c=0x1e070420).accept replacing existing (lossy) channel (new one lossy=1)

2014-11-12 01:50:40.803944 7f781996c700  0 -- 10.2.0.36:6830/1 
10.2.0.36:0/1 pipe(0x1ff618c0 sd=107 :6830 s=0 pgs=0 cs=0 l=1
c=0x1f3d8420).accept replacing existing (lossy) channel (new one lossy=1)

2014-11-12 01:50:40.804185 7f7816538700  0 -- 10.2.0.36:6819/1 
10.2.0.36:0/1 pipe(0x1ffd1e40 sd=20 :6819 s=0 pgs=0 cs=0 l=1
c=0x1e070840).accept replacing existing (lossy) channel (new one lossy=1)

2014-11-12 01:50:40.805235 7f7813407700  0 -- 10.2.0.36:6819/1 
10.2.0.36:0/1 pipe(0x1ffd1340 sd=60 :6819 s=0 pgs=0 cs=0 l=1
c=0x1b2d6260).accept replacing existing (lossy) channel (new one lossy=1)

2014-11-12 01:50:40.806364 7f781bc8f700  0 -- 10.2.0.36:6819/1 
10.2.0.36:0/1 pipe(0x1ffd0b00 sd=162 :6819 s=0 pgs=0 cs=0 l=1
c=0x675c580).accept replacing existing (lossy) channel (new one lossy=1)

2014-11-12 01:50:40.806425 7f781aa7d700  0 -- 10.2.0.36:6830/1 
10.2.0.36:0/1 pipe(0x1db29600 sd=143 :6830 s=0 pgs=0 cs=0 l=1
c=0x1f3d9600).accept replacing existing (lossy) channel (new one lossy=1)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Triggering shallow scrub on OSD where scrub is already in progress

2014-11-11 Thread Mallikarjun Biradar
Hi Greg,

I am using 0.86

I am referring to the OSD logs to check scrub behaviour. Please have a look at
the log snippet from the OSD log below.
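(The scrubs in both cases were kicked off with the usual admin command, i.e.
something like: ceph osd scrub 10)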

##Triggered scrub on osd.10---
2014-11-12 16:24:21.393135 7f5026f31700  0 log_channel(default) log [INF] :
0.4 scrub ok
2014-11-12 16:24:24.393586 7f5026f31700  0 log_channel(default) log [INF] :
0.20 scrub ok
2014-11-12 16:24:30.393989 7f5026f31700  0 log_channel(default) log [INF] :
0.21 scrub ok
2014-11-12 16:24:33.394764 7f5026f31700  0 log_channel(default) log [INF] :
0.23 scrub ok
2014-11-12 16:24:34.395293 7f5026f31700  0 log_channel(default) log [INF] :
0.36 scrub ok
2014-11-12 16:24:35.941704 7f5026f31700  0 log_channel(default) log [INF] :
1.1 scrub ok
2014-11-12 16:24:39.533780 7f5026f31700  0 log_channel(default) log [INF] :
1.d scrub ok
2014-11-12 16:24:41.811185 7f5026f31700  0 log_channel(default) log [INF] :
1.44 scrub ok
2014-11-12 16:24:54.257384 7f5026f31700  0 log_channel(default) log [INF] :
1.5b scrub ok
2014-11-12 16:25:02.973101 7f5026f31700  0 log_channel(default) log [INF] :
1.67 scrub ok
2014-11-12 16:25:17.597546 7f5026f31700  0 log_channel(default) log [INF] :
1.6b scrub ok
##Previous scrub is still in progress, triggered scrub on osd.10 again--
Ceph restarted the scrub operation
2014-11-12 16:25:19.394029 7f5026f31700  0 log_channel(default) log [INF]
: 0.4 scrub ok
2014-11-12 16:25:22.402630 7f5026f31700  0 log_channel(default) log [INF] :
0.20 scrub ok
2014-11-12 16:25:24.695565 7f5026f31700  0 log_channel(default) log [INF] :
0.21 scrub ok
2014-11-12 16:25:25.408821 7f5026f31700  0 log_channel(default) log [INF] :
0.23 scrub ok
2014-11-12 16:25:29.467527 7f5026f31700  0 log_channel(default) log [INF] :
0.36 scrub ok
2014-11-12 16:25:32.558838 7f5026f31700  0 log_channel(default) log [INF] :
1.1 scrub ok
2014-11-12 16:25:35.763056 7f5026f31700  0 log_channel(default) log [INF] :
1.d scrub ok
2014-11-12 16:25:38.166853 7f5026f31700  0 log_channel(default) log [INF] :
1.44 scrub ok
2014-11-12 16:25:40.602758 7f5026f31700  0 log_channel(default) log [INF] :
1.5b scrub ok
2014-11-12 16:25:42.169788 7f5026f31700  0 log_channel(default) log [INF] :
1.67 scrub ok
2014-11-12 16:25:45.851419 7f5026f31700  0 log_channel(default) log [INF] :
1.6b scrub ok
2014-11-12 16:25:51.259453 7f5026f31700  0 log_channel(default) log [INF] :
1.a8 scrub ok
2014-11-12 16:25:53.012220 7f5026f31700  0 log_channel(default) log [INF] :
1.a9 scrub ok
2014-11-12 16:25:54.009265 7f5026f31700  0 log_channel(default) log [INF] :
1.cb scrub ok
2014-11-12 16:25:56.516569 7f5026f31700  0 log_channel(default) log [INF] :
1.e2 scrub ok


 -Thanks & regards,
Mallikarjun Biradar

On Tue, Nov 11, 2014 at 12:18 PM, Gregory Farnum g...@gregs42.com wrote:

 On Sun, Nov 9, 2014 at 9:29 PM, Mallikarjun Biradar
 mallikarjuna.bira...@gmail.com wrote:
  Hi all,
 
  Triggering shallow scrub on OSD where scrub is already in progress,
 restarts
  scrub from beginning on that OSD.
 
 
  Steps:
  Triggered shallow scrub on an OSD (Cluster is running heavy IO)
  While scrub is in progress, triggered shallow scrub again on that OSD.
 
  Observed behavior, is scrub restarted from beginning on that OSD.
 
  Please let me know, whether its expected behaviour?

 What version of Ceph are you seeing this on? How are you identifying
 that scrub is restarting from the beginning? It sounds sort of
 familiar to me, but I thought this was fixed so it was a no-op if you
 issue another scrub. (That's not authoritative though; I might just be
 missing a reason we want to restart it.)
 -Greg

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rados mkpool fails, but not ceph osd pool create

2014-11-11 Thread Gauvain Pocentek

Hi all,

I'm facing a problem on a ceph deployment. rados mkpool always fails:

# rados -n client.admin mkpool test
error creating pool test: (2) No such file or directory

The rados lspools and rmpool commands work just fine, and the following also 
works:


# ceph osd pool create test 128 128
pool 'test' created

I've enabled rados debug but it really didn't help much. Should I look 
at the mon or OSD debug logs?
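If it matters, I can bump the debug levels on the fly with something like:

ceph tell mon.<id> injectargs '--debug-mon 10 --debug-ms 1'
ceph tell osd.* injectargs '--debug-osd 10'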


Any idea about what could be happening?

Thanks,
Gauvain Pocentek

Objectif Libre - Infrastructure et Formations Linux
http://www.objectif-libre.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com