Re: [ceph-users] Recovery question

2015-07-30 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

I'm glad you were able to recover. I'm sure you learned a lot about
Ceph through the exercise (that always seems to be the case for me).
I'll look forward to your report so that we can include it in our
operations manual, just in case.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Thu, Jul 30, 2015 at 12:41 PM, Peter Hinman  wrote:
> For the record, I have been able to recover.  Thank you very much for the
> guidance.
>
> I hate searching the web and finding only partial information on threads
> like this, so I'm going to document and post what I've learned as best I can
> in hopes that it will help someone else out in the future.
>
> --
> Peter Hinman
>
> On 7/29/2015 5:15 PM, Robert LeBlanc wrote:
>>
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA256
>>
>> If you had multiple monitors, you should recover more than 50% of them
>> if possible (they will need to form a quorum). If you can't, it is
>> messy, but you can manually remove enough monitors to start a quorum.
>> From /etc/ceph/ you will want the keyring and the ceph.conf at a
>> minimum. The keys for the monitor are, I think, in the store.db, which
>> will let the monitors start, but the keyring has the admin key which
>> lets you manage the cluster once you get it up. rbdmap is not needed
>> for recovery (it only mounts RBDs automatically at boot time), so we
>> can deal with that later if needed.
>> - 
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Wed, Jul 29, 2015 at 4:40 PM, Peter Hinman  wrote:
>>>
>>> Ok - that is encouraging.  I believe I've got data from a previous
>>> monitor. I see files in a store.db dated yesterday, with a
>>> MANIFEST-
>>> file that is significantly greater than the MANIFEST-07 file listed
>>> for
>>> the current monitors.
>>>
>>> I've actually found data for two previous monitors.  Any idea which one I
>>> should select? The one with the highest manifest number? The most recent
>>> time stamp?
>>>
>>> What files should I be looking for in /etc/ceph/?  Just the keyring and
>>> rbdmap files?  How important is it to use the same keyring file?
>>>
>>> --
>>> Peter Hinman
>>>
>>>
>>> On 7/29/2015 3:47 PM, Robert LeBlanc wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256

The default is /var/lib/ceph/mon/<cluster>-<id> (/var/lib/ceph/mon/ceph-mon1 for
 me). You will also need the information from /etc/ceph/ to reconstruct
 the data. I *think* you should be able to just copy this to a new box
 with the same name and IP address and start it up.

 I haven't actually done this, so there still may be some bumps.
 - 
 Robert LeBlanc
 PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


 On Wed, Jul 29, 2015 at 3:44 PM, Peter Hinman  wrote:
>
> Thanks Robert -
>
> Where would that monitor data (database) be found?
>
> --
> Peter Hinman
>
>
> On 7/29/2015 3:39 PM, Robert LeBlanc wrote:
>>
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA256
>>
>> If you built new monitors, this will not work. You would have to
>> recover the monitor data (database) from at least one monitor and
>> rebuild the monitor. The new monitors would not have any information
>> about pools, OSDs, PGs, etc to allow an OSD to be rejoined.
>> - 
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Wed, Jul 29, 2015 at 2:46 PM, Peter Hinman  wrote:
>>>
>>> Hi Greg -
>>>
>>> So at the moment, I seem to be trying to resolve a permission error.
>>>
>>> === osd.3 ===
>>> Mounting xfs on stor-2:/var/lib/ceph/osd/ceph-3
>>> 2015-07-29 13:35:08.809536 7f0a0262e700  0 librados: osd.3
>>> authentication
>>> error (1) Operation not permitted
>>> Error connecting to cluster: PermissionError
>>> failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf
>>> --name=osd.3
>>> --keyring=/var/lib/ceph/osd/ceph-3/keyring osd crush create-or-move
>>> --
>>> 3
>>> 3.64 host=stor-2 root=default'
>>> ceph-disk: Error: ceph osd start failed: Command
>>> '['/usr/sbin/service',
>>> 'ceph', '--cluster', 'ceph', 'start', 'osd.3']' returned non-zero
>>> exit
>>> status 1
>>> ceph-disk: Error: One or more partitions failed to activate
>>>
>>>
>>> Is there a way to identify the cause of this PermissionError?  I've
>>> copied
>>> the client.bootstrap-osd key from the output of ceph auth list, and
>>> pasted
>>> it into /var/lib/ceph/bootstrap-osd/ceph.keyring, but that has not
>>> resolved
>>> the error.
>>>
>>> But it sounds like you are saying that even once I get this resolved, I
>>> have no hope of recovering the data?

Re: [ceph-users] Recovery question

2015-07-30 Thread Peter Hinman
For the record, I have been able to recover.  Thank you very much for 
the guidance.


I hate searching the web and finding only partial information on threads 
like this, so I'm going to document and post what I've learned as best I 
can in hopes that it will help someone else out in the future.


--
Peter Hinman

On 7/29/2015 5:15 PM, Robert LeBlanc wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

If you had multiple monitors, you should recover more than 50% of them
if possible (they will need to form a quorum). If you can't, it is
messy, but you can manually remove enough monitors to start a quorum.
From /etc/ceph/ you will want the keyring and the ceph.conf at a
minimum. The keys for the monitor are, I think, in the store.db, which
will let the monitors start, but the keyring has the admin key which
lets you manage the cluster once you get it up. rbdmap is not needed
for recovery (it only mounts RBDs automatically at boot time), so we
can deal with that later if needed.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jul 29, 2015 at 4:40 PM, Peter Hinman  wrote:

Ok - that is encouraging.  I believe I've got data from a previous
monitor. I see files in a store.db dated yesterday, with a MANIFEST-
file that is significantly greater than the MANIFEST-07 file listed for
the current monitors.

I've actually found data for two previous monitors.  Any idea which one I
should select? The one with the highest manifest number? The most recent
time stamp?

What files should I be looking for in /etc/ceph/?  Just the keyring and
rbdmap files?  How important is it to use the same keyring file?

--
Peter Hinman


On 7/29/2015 3:47 PM, Robert LeBlanc wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

The default is /var/lib/ceph/mon/<cluster>-<id> (/var/lib/ceph/mon/ceph-mon1 for
me). You will also need the information from /etc/ceph/ to reconstruct
the data. I *think* you should be able to just copy this to a new box
with the same name and IP address and start it up.

I haven't actually done this, so there still may be some bumps.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jul 29, 2015 at 3:44 PM, Peter Hinman  wrote:

Thanks Robert -

Where would that monitor data (database) be found?

--
Peter Hinman


On 7/29/2015 3:39 PM, Robert LeBlanc wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

If you built new monitors, this will not work. You would have to
recover the monitor data (database) from at least one monitor and
rebuild the monitor. The new monitors would not have any information
about pools, OSDs, PGs, etc to allow an OSD to be rejoined.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jul 29, 2015 at 2:46 PM, Peter Hinman  wrote:

Hi Greg -

So at the moment, I seem to be trying to resolve a permission error.

=== osd.3 ===
Mounting xfs on stor-2:/var/lib/ceph/osd/ceph-3
2015-07-29 13:35:08.809536 7f0a0262e700  0 librados: osd.3
authentication
error (1) Operation not permitted
Error connecting to cluster: PermissionError
failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf
--name=osd.3
--keyring=/var/lib/ceph/osd/ceph-3/keyring osd crush create-or-move --
3
3.64 host=stor-2 root=default'
ceph-disk: Error: ceph osd start failed: Command
'['/usr/sbin/service',
'ceph', '--cluster', 'ceph', 'start', 'osd.3']' returned non-zero exit
status 1
ceph-disk: Error: One or more partitions failed to activate


Is there a way to identify the cause of this PermissionError?  I've
copied
the client.bootstrap-osd key from the output of ceph auth list, and
pasted
it into /var/lib/ceph/bootstrap-osd/ceph.keyring, but that has not
resolved
the error.

But it sounds like you are saying that even once I get this resolved, I
have
no hope of recovering the data?

--
Peter Hinman

On 7/29/2015 1:57 PM, Gregory Farnum wrote:

This sounds like you're trying to reconstruct a cluster after
destroying
the
monitors. That is...not going to work well. The monitors define the
cluster
and you can't move OSDs into different clusters. We have ideas for how
to
reconstruct monitors and it can be done manually with a lot of hassle,
but
the process isn't written down and there aren't really fools I help
with
it.
:/
-Greg

On Wed, Jul 29, 2015 at 5:48 PM Peter Hinman  wrote:

I've got a situation that seems on the surface like it should be
recoverable, but I'm struggling to understand how to do it.

I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
multiple hardware failures, I pulled the 3 osd disks and 3 journal
ssds
and am attempting to bring them back up again on new hardware in a new
cluster.  I see plenty of documentation on how to zap and initialize
and
add "new" osds, but I don't see anything on rebuilding with existing
osd
disks.

Could somebody provide guidance on how to do this?  I'm running 94.2 on all machines.

Re: [ceph-users] Recovery question

2015-07-29 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

If you had multiple monitors, you should recover more than 50% of them
if possible (they will need to form a quorum). If you can't, it is
messy, but you can manually remove enough monitors to start a quorum.
From /etc/ceph/ you will want the keyring and the ceph.conf at a
minimum. The keys for the monitor are, I think, in the store.db, which
will let the monitors start, but the keyring has the admin key which
lets you manage the cluster once you get it up. rbdmap is not needed
for recovery (it only mounts RBDs automatically at boot time), so we
can deal with that later if needed.
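
For reference, "manually remove enough monitors" usually means monmap
surgery on one surviving monitor. A rough sketch, assuming a surviving
monitor id of "mon1" and dead monitors "mon2" and "mon3" (all the names
here are placeholders):

  # stop the surviving monitor first, then edit and re-inject its monmap
  ceph-mon -i mon1 --extract-monmap /tmp/monmap
  monmaptool --rm mon2 /tmp/monmap
  monmaptool --rm mon3 /tmp/monmap
  ceph-mon -i mon1 --inject-monmap /tmp/monmap
  service ceph start mon.mon1

With only mon1 left in the map, it can form a quorum by itself.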
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jul 29, 2015 at 4:40 PM, Peter Hinman  wrote:
> Ok - that is encouraging.  I believe I've got data from a previous
> monitor. I see files in a store.db dated yesterday, with a MANIFEST-
> file that is significantly greater than the MANIFEST-07 file listed for
> the current monitors.
>
> I've actually found data for two previous monitors.  Any idea which one I
> should select? The one with the highest manifest number? The most recent
> time stamp?
>
> What files should I be looking for in /etc/ceph/?  Just the keyring and
> rbdmap files?  How important is it to use the same keyring file?
>
> --
> Peter Hinman
> International Bridge / ParcelPool.com
>
> On 7/29/2015 3:47 PM, Robert LeBlanc wrote:
>>
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA256
>>
>> The default is /var/lib/ceph/mon/<cluster>-<id> (/var/lib/ceph/mon/ceph-mon1 for
>> me). You will also need the information from /etc/ceph/ to reconstruct
>> the data. I *think* you should be able to just copy this to a new box
>> with the same name and IP address and start it up.
>>
>> I haven't actually done this, so there still may be some bumps.
>> - 
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Wed, Jul 29, 2015 at 3:44 PM, Peter Hinman  wrote:
>>>
>>> Thanks Robert -
>>>
>>> Where would that monitor data (database) be found?
>>>
>>> --
>>> Peter Hinman
>>>
>>>
>>> On 7/29/2015 3:39 PM, Robert LeBlanc wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256

 If you built new monitors, this will not work. You would have to
 recover the monitor data (database) from at least one monitor and
 rebuild the monitor. The new monitors would not have any information
 about pools, OSDs, PGs, etc to allow an OSD to be rejoined.
 - 
 Robert LeBlanc
 PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


 On Wed, Jul 29, 2015 at 2:46 PM, Peter Hinman  wrote:
>
> Hi Greg -
>
> So at the moment, I seem to be trying to resolve a permission error.
>
>=== osd.3 ===
>Mounting xfs on stor-2:/var/lib/ceph/osd/ceph-3
>2015-07-29 13:35:08.809536 7f0a0262e700  0 librados: osd.3
> authentication
> error (1) Operation not permitted
>Error connecting to cluster: PermissionError
>failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf
> --name=osd.3
> --keyring=/var/lib/ceph/osd/ceph-3/keyring osd crush create-or-move --
> 3
> 3.64 host=stor-2 root=default'
>ceph-disk: Error: ceph osd start failed: Command
> '['/usr/sbin/service',
> 'ceph', '--cluster', 'ceph', 'start', 'osd.3']' returned non-zero exit
> status 1
>ceph-disk: Error: One or more partitions failed to activate
>
>
> Is there a way to identify the cause of this PermissionError?  I've
> copied
> the client.bootstrap-osd key from the output of ceph auth list, and
> pasted
> it into /var/lib/ceph/bootstrap-osd/ceph.keyring, but that has not
> resolved
> the error.
>
> But it sounds like you are saying that even once I get this resolved, I
> have
> no hope of recovering the data?
>
> --
> Peter Hinman
>
> On 7/29/2015 1:57 PM, Gregory Farnum wrote:
>
> This sounds like you're trying to reconstruct a cluster after
> destroying
> the
> monitors. That is...not going to work well. The monitors define the
> cluster
> and you can't move OSDs into different clusters. We have ideas for how
> to
> reconstruct monitors and it can be done manually with a lot of hassle,
> but
> the process isn't written down and there aren't really fools I help
> with
> it.
> :/
> -Greg
>
> On Wed, Jul 29, 2015 at 5:48 PM Peter Hinman  wrote:
>>
>> I've got a situation that seems on the surface like it should be
>> recoverable, but I'm struggling to understand how to do it.
>>
>> I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
>> multiple hardware failures, I pulled the 3 osd disks and 3 journal
>> ssds
>> and am attempting to bring them back up again on new hardware in a new cluster.

Re: [ceph-users] Recovery question

2015-07-29 Thread Peter Hinman
Ok - that is encouraging.  I believe I've got data from a previous
monitor. I see files in a store.db dated yesterday, with a 
MANIFEST- file that is significantly greater than the 
MANIFEST-07 file listed for the current monitors.


I've actually found data for two previous monitors.  Any idea which one 
I should select? The one with the highest manifest number? The most 
recent time stamp?


What files should I be looking for in /etc/ceph/?  Just the keyring and
rbdmap files?  How important is it to use the same keyring file?


--
Peter Hinman
International Bridge / ParcelPool.com

On 7/29/2015 3:47 PM, Robert LeBlanc wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

The default is /var/lib/ceph/mon/<cluster>-<id> (/var/lib/ceph/mon/ceph-mon1 for
me). You will also need the information from /etc/ceph/ to reconstruct
the data. I *think* you should be able to just copy this to a new box
with the same name and IP address and start it up.

I haven't actually done this, so there still may be some bumps.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jul 29, 2015 at 3:44 PM, Peter Hinman  wrote:

Thanks Robert -

Where would that monitor data (database) be found?

--
Peter Hinman


On 7/29/2015 3:39 PM, Robert LeBlanc wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

If you built new monitors, this will not work. You would have to
recover the monitor data (database) from at least one monitor and
rebuild the monitor. The new monitors would not have any information
about pools, OSDs, PGs, etc to allow an OSD to be rejoined.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jul 29, 2015 at 2:46 PM, Peter Hinman  wrote:

Hi Greg -

So at the moment, I seem to be trying to resolve a permission error.

   === osd.3 ===
   Mounting xfs on stor-2:/var/lib/ceph/osd/ceph-3
   2015-07-29 13:35:08.809536 7f0a0262e700  0 librados: osd.3
authentication
error (1) Operation not permitted
   Error connecting to cluster: PermissionError
   failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.3
--keyring=/var/lib/ceph/osd/ceph-3/keyring osd crush create-or-move -- 3
3.64 host=stor-2 root=default'
   ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service',
'ceph', '--cluster', 'ceph', 'start', 'osd.3']' returned non-zero exit
status 1
   ceph-disk: Error: One or more partitions failed to activate


Is there a way to identify the cause of this PermissionError?  I've
copied
the client.bootstrap-osd key from the output of ceph auth list, and
pasted
it into /var/lib/ceph/bootstrap-osd/ceph.keyring, but that has not
resolved
the error.

But it sounds like you are saying that even once I get this resolved, I
have
no hope of recovering the data?

--
Peter Hinman

On 7/29/2015 1:57 PM, Gregory Farnum wrote:

This sounds like you're trying to reconstruct a cluster after destroying
the
monitors. That is...not going to work well. The monitors define the
cluster
and you can't move OSDs into different clusters. We have ideas for how to
reconstruct monitors and it can be done manually with a lot of hassle,
but
the process isn't written down and there aren't really fools I help with
it.
:/
-Greg

On Wed, Jul 29, 2015 at 5:48 PM Peter Hinman  wrote:

I've got a situation that seems on the surface like it should be
recoverable, but I'm struggling to understand how to do it.

I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
multiple hardware failures, I pulled the 3 osd disks and 3 journal ssds
and am attempting to bring them back up again on new hardware in a new
cluster.  I see plenty of documentation on how to zap and initialize and
add "new" osds, but I don't see anything on rebuilding with existing osd
disks.

Could somebody provide guidance on how to do this?  I'm running 94.2 on
all machines.

Thanks,

--
Peter Hinman


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Recovery question

2015-07-29 Thread Peter Hinman

Thanks Robert -

Where would that monitor data (database) be found?

--
Peter Hinman

On 7/29/2015 3:39 PM, Robert LeBlanc wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

If you built new monitors, this will not work. You would have to
recover the monitor data (database) from at least one monitor and
rebuild the monitor. The new monitors would not have any information
about pools, OSDs, PGs, etc to allow an OSD to be rejoined.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jul 29, 2015 at 2:46 PM, Peter Hinman  wrote:

Hi Greg -

So at the moment, I seem to be trying to resolve a permission error.

  === osd.3 ===
  Mounting xfs on stor-2:/var/lib/ceph/osd/ceph-3
  2015-07-29 13:35:08.809536 7f0a0262e700  0 librados: osd.3 authentication
error (1) Operation not permitted
  Error connecting to cluster: PermissionError
  failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.3
--keyring=/var/lib/ceph/osd/ceph-3/keyring osd crush create-or-move -- 3
3.64 host=stor-2 root=default'
  ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service',
'ceph', '--cluster', 'ceph', 'start', 'osd.3']' returned non-zero exit
status 1
  ceph-disk: Error: One or more partitions failed to activate


Is there a way to identify the cause of this PermissionError?  I've copied
the client.bootstrap-osd key from the output of ceph auth list, and pasted
it into /var/lib/ceph/bootstrap-osd/ceph.keyring, but that has not resolved
the error.

But it sounds like you are saying that even once I get this resolved, I have
no hope of recovering the data?

--
Peter Hinman

On 7/29/2015 1:57 PM, Gregory Farnum wrote:

This sounds like you're trying to reconstruct a cluster after destroying the
monitors. That is...not going to work well. The monitors define the cluster
and you can't move OSDs into different clusters. We have ideas for how to
reconstruct monitors and it can be done manually with a lot of hassle, but
the process isn't written down and there aren't really fools I help with it.
:/
-Greg

On Wed, Jul 29, 2015 at 5:48 PM Peter Hinman  wrote:

I've got a situation that seems on the surface like it should be
recoverable, but I'm struggling to understand how to do it.

I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
multiple hardware failures, I pulled the 3 osd disks and 3 journal ssds
and am attempting to bring them back up again on new hardware in a new
cluster.  I see plenty of documentation on how to zap and initialize and
add "new" osds, but I don't see anything on rebuilding with existing osd
disks.

Could somebody provide guidance on how to do this?  I'm running 94.2 on
all machines.

Thanks,

--
Peter Hinman


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recovery question

2015-07-29 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

The default is /var/lib/ceph/mon/<cluster>-<id> (/var/lib/ceph/mon/ceph-mon1 for
me). You will also need the information from /etc/ceph/ to reconstruct
the data. I *think* you should be able to just copy this to a new box
with the same name and IP address and start it up.

I haven't actually done this, so there still may be some bumps.
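
A rough sketch of that copy-and-restart idea, assuming the default
cluster name "ceph", a monitor id of "mon1", and a new box that already
carries the old monitor's hostname and IP (all placeholders here):

  # from wherever the old monitor's data still lives
  rsync -a /var/lib/ceph/mon/ceph-mon1/ newbox:/var/lib/ceph/mon/ceph-mon1/
  scp /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring newbox:/etc/ceph/
  # on newbox
  service ceph start mon.mon1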
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jul 29, 2015 at 3:44 PM, Peter Hinman  wrote:
> Thanks Robert -
>
> Where would that monitor data (database) be found?
>
> --
> Peter Hinman
>
>
> On 7/29/2015 3:39 PM, Robert LeBlanc wrote:
>>
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA256
>>
>> If you built new monitors, this will not work. You would have to
>> recover the monitor data (database) from at least one monitor and
>> rebuild the monitor. The new monitors would not have any information
>> about pools, OSDs, PGs, etc to allow an OSD to be rejoined.
>> - 
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Wed, Jul 29, 2015 at 2:46 PM, Peter Hinman  wrote:
>>>
>>> Hi Greg -
>>>
>>> So at the moment, I seem to be trying to resolve a permission error.
>>>
>>>   === osd.3 ===
>>>   Mounting xfs on stor-2:/var/lib/ceph/osd/ceph-3
>>>   2015-07-29 13:35:08.809536 7f0a0262e700  0 librados: osd.3
>>> authentication
>>> error (1) Operation not permitted
>>>   Error connecting to cluster: PermissionError
>>>   failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.3
>>> --keyring=/var/lib/ceph/osd/ceph-3/keyring osd crush create-or-move -- 3
>>> 3.64 host=stor-2 root=default'
>>>   ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service',
>>> 'ceph', '--cluster', 'ceph', 'start', 'osd.3']' returned non-zero exit
>>> status 1
>>>   ceph-disk: Error: One or more partitions failed to activate
>>>
>>>
>>> Is there a way to identify the cause of this PermissionError?  I've
>>> copied
>>> the client.bootstrap-osd key from the output of ceph auth list, and
>>> pasted
>>> it into /var/lib/ceph/bootstrap-osd/ceph.keyring, but that has not
>>> resolved
>>> the error.
>>>
>>> But it sounds like you are saying that even once I get this resolved, I
>>> have
>>> no hope of recovering the data?
>>>
>>> --
>>> Peter Hinman
>>>
>>> On 7/29/2015 1:57 PM, Gregory Farnum wrote:
>>>
>>> This sounds like you're trying to reconstruct a cluster after destroying
>>> the
>>> monitors. That is...not going to work well. The monitors define the
>>> cluster
>>> and you can't move OSDs into different clusters. We have ideas for how to
>>> reconstruct monitors and it can be done manually with a lot of hassle,
>>> but
>>> the process isn't written down and there aren't really fools I help with
>>> it.
>>> :/
>>> -Greg
>>>
>>> On Wed, Jul 29, 2015 at 5:48 PM Peter Hinman  wrote:

 I've got a situation that seems on the surface like it should be
 recoverable, but I'm struggling to understand how to do it.

 I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
 multiple hardware failures, I pulled the 3 osd disks and 3 journal ssds
 and am attempting to bring them back up again on new hardware in a new
 cluster.  I see plenty of documentation on how to zap and initialize and
 add "new" osds, but I don't see anything on rebuilding with existing osd
 disks.

 Could somebody provide guidance on how to do this?  I'm running 94.2 on
 all machines.

 Thanks,

 --
 Peter Hinman


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>
>
>


Re: [ceph-users] Recovery question

2015-07-29 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

If you built new monitors, this will not work. You would have to
recover the monitor data (database) from at least one monitor and
rebuild the monitor. The new monitors would not have any information
about pools, OSDs, PGs, etc to allow an OSD to be rejoined.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jul 29, 2015 at 2:46 PM, Peter Hinman  wrote:
> Hi Greg -
>
> So at the moment, I seem to be trying to resolve a permission error.
>
>  === osd.3 ===
>  Mounting xfs on stor-2:/var/lib/ceph/osd/ceph-3
>  2015-07-29 13:35:08.809536 7f0a0262e700  0 librados: osd.3 authentication
> error (1) Operation not permitted
>  Error connecting to cluster: PermissionError
>  failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.3
> --keyring=/var/lib/ceph/osd/ceph-3/keyring osd crush create-or-move -- 3
> 3.64 host=stor-2 root=default'
>  ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service',
> 'ceph', '--cluster', 'ceph', 'start', 'osd.3']' returned non-zero exit
> status 1
>  ceph-disk: Error: One or more partitions failed to activate
>
>
> Is there a way to identify the cause of this PermissionError?  I've copied
> the client.bootstrap-osd key from the output of ceph auth list, and pasted
> it into /var/lib/ceph/bootstrap-osd/ceph.keyring, but that has not resolved
> the error.
>
> But it sounds like you are saying that even once I get this resolved, I have
> no hope of recovering the data?
>
> --
> Peter Hinman
>
> On 7/29/2015 1:57 PM, Gregory Farnum wrote:
>
> This sounds like you're trying to reconstruct a cluster after destroying the
> monitors. That is...not going to work well. The monitors define the cluster
> and you can't move OSDs into different clusters. We have ideas for how to
> reconstruct monitors and it can be done manually with a lot of hassle, but
> the process isn't written down and there aren't really fools I help with it.
> :/
> -Greg
>
> On Wed, Jul 29, 2015 at 5:48 PM Peter Hinman  wrote:
>>
>> I've got a situation that seems on the surface like it should be
>> recoverable, but I'm struggling to understand how to do it.
>>
>> I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
>> multiple hardware failures, I pulled the 3 osd disks and 3 journal ssds
>> and am attempting to bring them back up again on new hardware in a new
>> cluster.  I see plenty of documentation on how to zap and initialize and
>> add "new" osds, but I don't see anything on rebuilding with existing osd
>> disks.
>>
>> Could somebody provide guidance on how to do this?  I'm running 94.2 on
>> all machines.
>>
>> Thanks,
>>
>> --
>> Peter Hinman
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recovery question

2015-07-29 Thread Peter Hinman
The end goal is to recover the data.  I don't need to re-implement the 
cluster as it was - that just appeared to be the natural way to recover
the data.


What monitor data would be required to re-implement the cluster?

--
Peter Hinman
International Bridge / ParcelPool.com

On 7/29/2015 2:55 PM, Gregory Farnum wrote:



On Wednesday, July 29, 2015, Peter Hinman wrote:


Hi Greg -

So at the moment, I seem to be trying to resolve a permission error.

 === osd.3 ===
 Mounting xfs on stor-2:/var/lib/ceph/osd/ceph-3
 2015-07-29 13:35:08.809536 7f0a0262e700  0 librados: osd.3
authentication error (1) Operation not permitted
 Error connecting to cluster: PermissionError
 failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf
--name=osd.3 --keyring=/var/lib/ceph/osd/ceph-3/keyring osd crush
create-or-move -- 3 3.64 host=stor-2 root=default'
 ceph-disk: Error: ceph osd start failed: Command
'['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start',
'osd.3']' returned non-zero exit status 1
 ceph-disk: Error: One or more partitions failed to activate


Is there a way to identify the cause of this PermissionError? I've
copied the client.bootstrap-osd key from the output of ceph auth
list, and pasted it into /var/lib/ceph/bootstrap-osd/ceph.keyring,
but that has not resolved the error.

But it sounds like you are saying that even once I get this
resolved, I have no hope of recovering the data?


Well, I think you'd need to buy help to assemble a working cluster 
with these OSDs. But if you have rbd images you want to get out, you 
might be able to string together the tools to make that happen. I'd 
have to defer to David (for OSD object extraction options) or 
Josh/Jason (for rbd export/import) for that, though.


ceph-objectstore-tool will I think be part of your solution, but I'm 
not sure how much it can do on its own. What's your end goal?



-- 
Peter Hinman


On 7/29/2015 1:57 PM, Gregory Farnum wrote:

This sounds like you're trying to reconstruct a cluster after
destroying the monitors. That is...not going to work well. The
monitors define the cluster and you can't move OSDs into
different clusters. We have ideas for how to reconstruct monitors
and it can be done manually with a lot of hassle, but the process
isn't written down and there aren't really fools I help with it. :/


*tools to help with it. Sorry for the unfortunate autocorrect!




On Wed, Jul 29, 2015 at 5:48 PM Peter Hinman wrote:

I've got a situation that seems on the surface like it should be
recoverable, but I'm struggling to understand how to do it.

I had a cluster of 3 monitors, 3 osd disks, and 3 journal
ssds. After
multiple hardware failures, I pulled the 3 osd disks and 3
journal ssds
and am attempting to bring them back up again on new hardware
in a new
cluster.  I see plenty of documentation on how to zap and
initialize and
add "new" osds, but I don't see anything on rebuilding with
existing osd
disks.

Could somebody provide guidance on how to do this?  I'm
running 94.2 on
all machines.

Thanks,

--
Peter Hinman


___
ceph-users mailing list
ceph-users@lists.ceph.com

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recovery question

2015-07-29 Thread Gregory Farnum
On Wednesday, July 29, 2015, Peter Hinman  wrote:

>  Hi Greg -
>
> So at the moment, I seem to be trying to resolve a permission error.
>
>  === osd.3 ===
>  Mounting xfs on stor-2:/var/lib/ceph/osd/ceph-3
>  2015-07-29 13:35:08.809536 7f0a0262e700  0 librados: osd.3 authentication
> error (1) Operation not permitted
>  Error connecting to cluster: PermissionError
>  failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.3
> --keyring=/var/lib/ceph/osd/ceph-3/keyring osd crush create-or-move -- 3
> 3.64 host=stor-2 root=default'
>  ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service',
> 'ceph', '--cluster', 'ceph', 'start', 'osd.3']' returned non-zero exit
> status 1
>  ceph-disk: Error: One or more partitions failed to activate
>
>
> Is there a way to identify the cause of this PermissionError?  I've copied
> the client.bootstrap-osd key from the output of ceph auth list, and pasted
> it into /var/lib/ceph/bootstrap-osd/ceph.keyring, but that has not resolved
> the error.
>
> But it sounds like you are saying that even once I get this resolved, I
> have no hope of recovering the data?
>

Well, I think you'd need to buy help to assemble a working cluster with
these OSDs. But if you have rbd images you want to get out, you might be
able to string together the tools to make that happen. I'd have to defer to
David (for OSD object extraction options) or Josh/Jason (for rbd
export/import) for that, though.

ceph-objectstore-tool will I think be part of your solution, but I'm not
sure how much it can do on its own. What's your end goal?
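
For what it's worth, a hypothetical sketch of what that extraction path
can look like with ceph-objectstore-tool against a stopped OSD (the
paths and the pgid below are placeholders):

  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
      --journal-path /var/lib/ceph/osd/ceph-3/journal --op list
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
      --journal-path /var/lib/ceph/osd/ceph-3/journal \
      --op export --pgid 1.2f --file /backup/1.2f.export

And once a cluster is healthy enough to serve reads, "rbd export
<pool>/<image> <file>" is the usual way to pull images out.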


>
> --
> Peter Hinman
>
>
> On 7/29/2015 1:57 PM, Gregory Farnum wrote:
>
> This sounds like you're trying to reconstruct a cluster after destroying
> the monitors. That is...not going to work well. The monitors define the
> cluster and you can't move OSDs into different clusters. We have ideas for
> how to reconstruct monitors and it can be done manually with a lot of
> hassle, but the process isn't written down and there aren't really fools I
> help with it. :/
>
> *tools to help with it. Sorry for the unfortunate autocorrect!




>
>
>
>  On Wed, Jul 29, 2015 at 5:48 PM Peter Hinman <peter.hin...@myib.com> wrote:
>
>> I've got a situation that seems on the surface like it should be
>> recoverable, but I'm struggling to understand how to do it.
>>
>> I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
>> multiple hardware failures, I pulled the 3 osd disks and 3 journal ssds
>> and am attempting to bring them back up again on new hardware in a new
>> cluster.  I see plenty of documentation on how to zap and initialize and
>> add "new" osds, but I don't see anything on rebuilding with existing osd
>> disks.
>>
>> Could somebody provide guidance on how to do this?  I'm running 94.2 on
>> all machines.
>>
>> Thanks,
>>
>> --
>> Peter Hinman
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recovery question

2015-07-29 Thread Peter Hinman

Hi Greg -

So at the moment, I seem to be trying to resolve a permission error.

 === osd.3 ===
 Mounting xfs on stor-2:/var/lib/ceph/osd/ceph-3
 2015-07-29 13:35:08.809536 7f0a0262e700  0 librados: osd.3 
authentication error (1) Operation not permitted

 Error connecting to cluster: PermissionError
 failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.3 
--keyring=/var/lib/ceph/osd/ceph-3/keyring osd crush create-or-move -- 
3 3.64 host=stor-2 root=default'
 ceph-disk: Error: ceph osd start failed: Command 
'['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start', 'osd.3']' 
returned non-zero exit status 1

 ceph-disk: Error: One or more partitions failed to activate


Is there a way to identify the cause of this PermissionError?  I've 
copied the client.bootstrap-osd key from the output of ceph auth list, 
and pasted it into /var/lib/ceph/bootstrap-osd/ceph.keyring, but that 
has not resolved the error.
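
(One way to narrow that down, as a sketch: compare the key the OSD is
presenting with what the monitors have on file for it. osd.3 and the
paths below just follow the error output above.)

  cat /var/lib/ceph/osd/ceph-3/keyring
  ceph auth get osd.3
  # if osd.3 is missing from "ceph auth list", the on-disk key can be
  # registered again, for example:
  ceph auth add osd.3 osd 'allow *' mon 'allow profile osd' \
      -i /var/lib/ceph/osd/ceph-3/keyring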


But it sounds like you are saying that even once I get this resolved, I 
have no hope of recovering the data?


--
Peter Hinman

On 7/29/2015 1:57 PM, Gregory Farnum wrote:
This sounds like you're trying to reconstruct a cluster after 
destroying the monitors. That is...not going to work well. The 
monitors define the cluster and you can't move OSDs into different 
clusters. We have ideas for how to reconstruct monitors and it can be 
done manually with a lot of hassle, but the process isn't written down 
and there aren't really fools I help with it. :/

-Greg

On Wed, Jul 29, 2015 at 5:48 PM Peter Hinman wrote:


I've got a situation that seems on the surface like it should be
recoverable, but I'm struggling to understand how to do it.

I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
multiple hardware failures, I pulled the 3 osd disks and 3 journal
ssds
and am attempting to bring them back up again on new hardware in a new
cluster.  I see plenty of documentation on how to zap and
initialize and
add "new" osds, but I don't see anything on rebuilding with
existing osd
disks.

Could somebody provide guidance on how to do this?  I'm running
94.2 on
all machines.

Thanks,

--
Peter Hinman


___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recovery question

2015-07-29 Thread Gregory Farnum
This sounds like you're trying to reconstruct a cluster after destroying
the monitors. That is...not going to work well. The monitors define the
cluster and you can't move OSDs into different clusters. We have ideas for
how to reconstruct monitors and it can be done manually with a lot of
hassle, but the process isn't written down and there aren't really fools I
help with it. :/
-Greg

On Wed, Jul 29, 2015 at 5:48 PM Peter Hinman  wrote:

> I've got a situation that seems on the surface like it should be
> recoverable, but I'm struggling to understand how to do it.
>
> I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
> multiple hardware failures, I pulled the 3 osd disks and 3 journal ssds
> and am attempting to bring them back up again on new hardware in a new
> cluster.  I see plenty of documentation on how to zap and initialize and
> add "new" osds, but I don't see anything on rebuilding with existing osd
> disks.
>
> Could somebody provide guidance on how to do this?  I'm running 94.2 on
> all machines.
>
> Thanks,
>
> --
> Peter Hinman
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recovery question

2015-07-29 Thread Gregory Farnum
This sounds odd. Can you create a ticket in the tracker with all the
details you can remember or reconstruct?
-Greg

On Wed, Jul 29, 2015 at 8:34 PM Steve Taylor 
wrote:

> I recently migrated 240 OSDs to new servers this way in a single cluster,
> and it worked great. There are two additional items I would note based on
> my experience though.
>
> First, if you're using dmcrypt then of course you need to copy the dmcrypt
> keys for the OSDs to the new host(s). I had to do this in my case, but it
> was very straightforward.
>
> Second was an issue I didn't expect, probably just because of my
> ignorance. I was not able to migrate existing OSDs from different failure
> domains into a new, single failure domain without waiting for full recovery
> to HEALTH_OK in between. The very first server I put OSD disks from two
> different failure domains into had issues. The OSDs came up and in just
> fine, but immediately started flapping and failed to make progress toward
> recovery. I removed the disks from one failure domain and left the others,
> and recovery progressed as expected. As soon as I saw HEALTH_OK I
> re-migrated the OSDs from the other failure domain and again the cluster
> recovered as expected. Proceeding via this method allowed me to migrate all
> 240 OSDs without any further problems. I was also able to migrate as many
> OSDs as I wanted to simultaneously as long as I didn't mix OSDs from
> different, old failure domains in a new failure domain without recovering
> in between. I understand mixing failure domains like this is risky, but I
> sort of expected it to work anyway. Maybe it was
> better in the end that Ceph forced me to do it more safely.
>
> Steve Taylor | Senior Software Engineer | StorageCraft Technology
> Corporation
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2799 | Fax: 801.545.4705
>
> If you are not the intended recipient of this message, be advised that any
> dissemination or copying of this message is prohibited.
> If you received this message erroneously, please notify the sender and
> delete it, together with any attachments.
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Peter Hinman
> Sent: Wednesday, July 29, 2015 12:58 PM
> To: Robert LeBlanc 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Recovery question
>
> Thanks for the guidance.  I'm working on building a valid ceph.conf right
> now.  I'm not familiar with the osd-bootstrap key. Is that the standard
> filename for it?  Is it the keyring that is stored on the osd?
>
> I'll see if the logs turn up anything I can decipher after I rebuild the
> ceph.conf file.
>
> --
> Peter Hinman
>
> On 7/29/2015 12:49 PM, Robert LeBlanc wrote:
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA256
> >
> > Did you use ceph-deploy or ceph-disk to create the OSDs? If so, it
> > should use udev to start the OSDs. In that case, a new host that has
> > the correct ceph.conf and osd-bootstrap key should be able to bring up
> > the OSDs into the cluster automatically. Just make sure you have the
> > correct journal in the same host with the matching OSD disk, udev
> > should do the magic.
> >
> > The OSD logs are your friend if they don't start properly.
> > - 
> > Robert LeBlanc
> > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> >
> >
> > On Wed, Jul 29, 2015 at 10:48 AM, Peter Hinman  wrote:
> >> I've got a situation that seems on the surface like it should be
> >> recoverable, but I'm struggling to understand how to do it.
> >>
> >> I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
> >> multiple hardware failures, I pulled the 3 osd disks and 3 journal
> >> ssds and am attempting to bring them back up again on new hardware in a
> new cluster.
> >> I see plenty of documentation on how to zap and initialize and add "new"
> >> osds, but I don't see anything on rebuilding with existing osd disks.
> >>
> >> Could somebody provide guidance on how to do this?  I'm running 94.2
> >> on all machines.
> >>
> >> Thanks,
> >>
> >> --
> >> Peter Hinman
> >>
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Recovery question

2015-07-29 Thread Steve Taylor
I recently migrated 240 OSDs to new servers this way in a single cluster, and 
it worked great. There are two additional items I would note based on my 
experience though.

First, if you're using dmcrypt then of course you need to copy the dmcrypt keys 
for the OSDs to the new host(s). I had to do this in my case, but it was very 
straightforward.
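
As a sketch, assuming the OSDs were prepared with ceph-disk's default
dmcrypt key directory (the hostname is a placeholder):

  rsync -a /etc/ceph/dmcrypt-keys/ newhost:/etc/ceph/dmcrypt-keys/

If a non-default --dmcrypt-key-dir was used at prepare time, copy that
directory instead.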

Second was an issue I didn't expect, probably just because of my ignorance. I 
was not able to migrate existing OSDs from different failure domains into a 
new, single failure domain without waiting for full recovery to HEALTH_OK in 
between. The very first server I put OSD disks from two different failure 
domains into had issues. The OSDs came up and in just fine, but immediately 
started flapping and failed to make progress toward recovery. I removed the 
disks from one failure domain and left the others, and recovery progressed as 
expected. As soon as I saw HEALTH_OK I re-migrated the OSDs from the other 
failure domain and again the cluster recovered as expected. Proceeding via this 
method allowed me to migrate all 240 OSDs without any further problems. I was 
also able to migrate as many OSDs as I wanted to simultaneously as long as I 
didn't mix OSDs from different, old failure domains in a new failure domain 
without recovering in between. I understand mixing failure domains like
this is risky, but I sort of expected it to work anyway. Maybe it was
better in the end that Ceph forced me to do it more safely.

Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | Fax: 801.545.4705

If you are not the intended recipient of this message, be advised that any 
dissemination or copying of this message is prohibited.
If you received this message erroneously, please notify the sender and delete 
it, together with any attachments.

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Peter 
Hinman
Sent: Wednesday, July 29, 2015 12:58 PM
To: Robert LeBlanc 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Recovery question

Thanks for the guidance.  I'm working on building a valid ceph.conf right now.  
I'm not familiar with the osd-bootstrap key. Is that the standard filename for 
it?  Is it the keyring that is stored on the osd?

I'll see if the logs turn up anything I can decipher after I rebuild the 
ceph.conf file.

--
Peter Hinman

On 7/29/2015 12:49 PM, Robert LeBlanc wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Did you use ceph-deploy or ceph-disk to create the OSDs? If so, it
> should use udev to start the OSDs. In that case, a new host that has
> the correct ceph.conf and osd-bootstrap key should be able to bring up 
> the OSDs into the cluster automatically. Just make sure you have the 
> correct journal in the same host with the matching OSD disk, udev 
> should do the magic.
>
> The OSD logs are your friend if they don't start properly.
> - 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Wed, Jul 29, 2015 at 10:48 AM, Peter Hinman  wrote:
>> I've got a situation that seems on the surface like it should be 
>> recoverable, but I'm struggling to understand how to do it.
>>
>> I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After 
>> multiple hardware failures, I pulled the 3 osd disks and 3 journal 
>> ssds and am attempting to bring them back up again on new hardware in a new 
>> cluster.
>> I see plenty of documentation on how to zap and initialize and add "new"
>> osds, but I don't see anything on rebuilding with existing osd disks.
>>
>> Could somebody provide guidance on how to do this?  I'm running 94.2 
>> on all machines.
>>
>> Thanks,
>>
>> --
>> Peter Hinman
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Recovery question

2015-07-29 Thread Peter Hinman
Thanks for the guidance.  I'm working on building a valid ceph.conf 
right now.  I'm not familiar with the osd-bootstrap key. Is that the 
standard filename for it?  Is it the keyring that is stored on the osd?


I'll see if the logs turn up anything I can decipher after I rebuild the 
ceph.conf file.


--
Peter Hinman

On 7/29/2015 12:49 PM, Robert LeBlanc wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Did you use ceph-deploy or ceph-disk to create the OSDs? If so, it
should use udev to start the OSDs. In that case, a new host that has
the correct ceph.conf and osd-bootstrap key should be able to bring up
the OSDs into the cluster automatically. Just make sure you have the
correct journal in the same host with the matching OSD disk, udev
should do the magic.

The OSD logs are your friend if they don't start properly.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jul 29, 2015 at 10:48 AM, Peter Hinman  wrote:

I've got a situation that seems on the surface like it should be
recoverable, but I'm struggling to understand how to do it.

I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
multiple hardware failures, I pulled the 3 osd disks and 3 journal ssds and
am attempting to bring them back up again on new hardware in a new cluster.
I see plenty of documentation on how to zap and initialize and add "new"
osds, but I don't see anything on rebuilding with existing osd disks.

Could somebody provide guidance on how to do this?  I'm running 94.2 on all
machines.

Thanks,

--
Peter Hinman


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recovery question

2015-07-29 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Did you use ceph-deploy or ceph-disk to create the OSDs? If so, it
should use udev to start the OSDs. In that case, a new host that has
the correct ceph.conf and osd-bootstrap key should be able to bring up
the OSDs into the cluster automatically. Just make sure you have the
correct journal in the same host with the matching OSD disk, udev
should do the magic.

The OSD logs are your friend if they don't start properly.
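
If udev doesn't pick the disks up on its own, a rough sketch of kicking
them manually, assuming ceph-disk-prepared disks, default paths, and
that ceph.conf plus the bootstrap-osd keyring are already in place (the
device name is a placeholder):

  ceph-disk activate /dev/sdb1       # one data partition at a time
  ceph-disk activate-all             # or let ceph-disk find them all
  less /var/log/ceph/ceph-osd.3.log  # if an OSD still refuses to start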
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jul 29, 2015 at 10:48 AM, Peter Hinman  wrote:
> I've got a situation that seems on the surface like it should be
> recoverable, but I'm struggling to understand how to do it.
>
> I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
> multiple hardware failures, I pulled the 3 osd disks and 3 journal ssds and
> am attempting to bring them back up again on new hardware in a new cluster.
> I see plenty of documentation on how to zap and initialize and add "new"
> osds, but I don't see anything on rebuilding with existing osd disks.
>
> Could somebody provide guidance on how to do this?  I'm running 94.2 on all
> machines.
>
> Thanks,
>
> --
> Peter Hinman
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Recovery question

2015-07-29 Thread Peter Hinman
I've got a situation that seems on the surface like it should be 
recoverable, but I'm struggling to understand how to do it.


I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After 
multiple hardware failures, I pulled the 3 osd disks and 3 journal ssds 
and am attempting to bring them back up again on new hardware in a new 
cluster.  I see plenty of documentation on how to zap and initialize and 
add "new" osds, but I don't see anything on rebuilding with existing osd 
disks.


Could somebody provide guidance on how to do this?  I'm running 94.2 on 
all machines.


Thanks,

--
Peter Hinman


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com