Re: Files lost after mds rebuild

2012-12-07 Thread Gregory Farnum
On Wed, Nov 21, 2012 at 11:23 PM, Drunkard Zhang gongfan...@gmail.com wrote:

 2012/11/22 Gregory Farnum g...@inktank.com:
  On Tue, Nov 20, 2012 at 8:28 PM, Drunkard Zhang gongfan...@gmail.com 
  wrote:
  2012/11/21 Gregory Farnum g...@inktank.com:
  No, absolutely not. There is no relationship between different RADOS
  pools. If you've been using the cephfs tool to place some filesystem
  data in different pools then your configuration is a little more
  complicated (have you done that?), but deleting one pool is never
  going to remove data from the others.
  -Greg
 
  I think that should be a bug. Here's what I did:
  I created a directory 'audit' in a running ceph filesystem, and put
  some data (about 100GB) into it before running these commands:
  ceph osd pool create audit
  ceph mds add_data_pool 4
  cephfs /mnt/temp/audit/ set_layout -p 4
 
  log3 ~ # ceph osd dump | grep audit
  pool 4 'audit' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num
  8 pgp_num 8 last_change 1558 owner 0
 
  At that point, all the data in audit was still usable; after 'ceph osd
  pool delete data', the disk space was reclaimed (I forgot to test whether
  the data was still usable): only 200MB used, according to 'ceph -s'. So
  here's what I'm thinking: data stored before the new pool was created
  doesn't follow that pool; it still follows the default pool 'data'. Is
  this a bug, or intended behavior?
 
  Oh, I see. Data is not moved when you set directory layouts; it only
  impacts files created after that point. This is intended behavior —
  Ceph would need to copy the data around anyway in order to make it
  follow the pool. There's no sense in hiding that from the user,
  especially given the complexity involved in doing so safely —
  especially when there are many use cases where you want the files in
  different pools.
  -Greg

 Got it, but how can I know which pool a file lives in? Are there any
 commands?

You can get this information with the cephfs program if you're using
the kernel client. There's not yet a way to get it out of ceph-fuse,
although we will be implementing it as virtual xattrs in the
not-too-distant future.
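
As a rough sketch (the mount point and file name here are hypothetical, and
the exact output fields may vary by version), on a kernel-client mount you
can ask the cephfs tool for a file's layout and read off its data pool ID:

  cephfs /mnt/ceph/audit/somefile show_layout

The reported data pool ID can then be matched against
'ceph osd dump | grep ^pool' to get the pool name.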


 About the relationship between data and pools: I thought objects were
 hooked to a pool, and that when the pool changed they would simply be
 unhooked from one and hooked to another; it seems I was wrong.

Indeed that's incorrect. Pools are a logical namespace; when you
delete the pool you are also deleting everything else in it. Doing
otherwise is totally infeasible with Ceph since they also represent
placement policies.
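
As a small illustration (pool names taken from this thread), each pool can
be inspected on its own, which is also why deleting one pool only drops the
objects stored under that pool:

  rados df                      # per-pool object counts and space usage
  rados -p audit ls | head      # objects stored in the 'audit' pool
  ceph osd dump | grep ^pool    # each pool's crush_ruleset is its placement policy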
-Greg


Re: Files lost after mds rebuild

2012-11-22 Thread Drunkard Zhang
2012/11/22 Gregory Farnum g...@inktank.com:
 On Tue, Nov 20, 2012 at 8:28 PM, Drunkard Zhang gongfan...@gmail.com wrote:
 2012/11/21 Gregory Farnum g...@inktank.com:
 No, absolutely not. There is no relationship between different RADOS
 pools. If you've been using the cephfs tool to place some filesystem
 data in different pools then your configuration is a little more
 complicated (have you done that?), but deleting one pool is never
 going to remove data from the others.
 -Greg

 I think that should be a bug. Here's what I did:
 I created a directory 'audit' in a running ceph filesystem, and put
 some data (about 100GB) into it before running these commands:
 ceph osd pool create audit
 ceph mds add_data_pool 4
 cephfs /mnt/temp/audit/ set_layout -p 4

 log3 ~ # ceph osd dump | grep audit
 pool 4 'audit' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num
 8 pgp_num 8 last_change 1558 owner 0

 At that point, all the data in audit was still usable; after 'ceph osd
 pool delete data', the disk space was reclaimed (I forgot to test whether
 the data was still usable): only 200MB used, according to 'ceph -s'. So
 here's what I'm thinking: data stored before the new pool was created
 doesn't follow that pool; it still follows the default pool 'data'. Is
 this a bug, or intended behavior?

 Oh, I see. Data is not moved when you set directory layouts; it only
 impacts files created after that point. This is intended behavior —
 Ceph would need to copy the data around anyway in order to make it
 follow the pool. There's no sense in hiding that from the user,
 especially given the complexity involved in doing so safely —
 especially when there are many use cases where you want the files in
 different pools.
 -Greg

Got it, but how can I know which pool a file lives in? Are there any commands?

About the relationship between data and pools: I thought objects were
hooked to a pool, and that when the pool changed they would simply be
unhooked from one and hooked to another; it seems I was wrong.


Re: Files lost after mds rebuild

2012-11-21 Thread Gregory Farnum
On Tue, Nov 20, 2012 at 8:28 PM, Drunkard Zhang gongfan...@gmail.com wrote:
 2012/11/21 Gregory Farnum g...@inktank.com:
 No, absolutely not. There is no relationship between different RADOS
 pools. If you've been using the cephfs tool to place some filesystem
 data in different pools then your configuration is a little more
 complicated (have you done that?), but deleting one pool is never
 going to remove data from the others.
 -Greg

 I think that should be a bug. Here's what I did:
 I created a directory 'audit' in a running ceph filesystem, and put
 some data (about 100GB) into it before running these commands:
 ceph osd pool create audit
 ceph mds add_data_pool 4
 cephfs /mnt/temp/audit/ set_layout -p 4

 log3 ~ # ceph osd dump | grep audit
 pool 4 'audit' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num
 8 pgp_num 8 last_change 1558 owner 0

 At that point, all the data in audit was still usable; after 'ceph osd
 pool delete data', the disk space was reclaimed (I forgot to test whether
 the data was still usable): only 200MB used, according to 'ceph -s'. So
 here's what I'm thinking: data stored before the new pool was created
 doesn't follow that pool; it still follows the default pool 'data'. Is
 this a bug, or intended behavior?

Oh, I see. Data is not moved when you set directory layouts; it only
impacts files created after that point. This is intended behavior —
Ceph would need to copy the data around anyway in order to make it
follow the pool. There's no sense in hiding that from the user,
especially given the complexity involved in doing so safely —
especially when there are many use cases where you want the files in
different pools.
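
As a sketch of what that means in practice (paths and file names here are
hypothetical): after setting the layout on the directory, only newly written
files land in pool 4, so moving an existing file into that pool amounts to
rewriting it, e.g.:

  cephfs /mnt/temp/audit/ set_layout -p 4
  cp /mnt/temp/audit/old.log /mnt/temp/audit/old.log.copy   # the copy is written to pool 4
  mv /mnt/temp/audit/old.log.copy /mnt/temp/audit/old.log   # replace the original

The objects backing the old copy in the previous pool are freed once the
original file is removed.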
-Greg


Re: Files lost after mds rebuild

2012-11-20 Thread Gregory Farnum
On Tue, Nov 20, 2012 at 1:25 AM, Drunkard Zhang gongfan...@gmail.com wrote:
 2012/11/20 Gregory Farnum g...@inktank.com:
 On Mon, Nov 19, 2012 at 7:55 AM, Drunkard Zhang gongfan...@gmail.com wrote:
 I created a ceph cluster for testing; here's the mistake I made:
 I added a second mds, mds.ab, ran 'ceph mds set_max_mds 2', and then
 removed the mds I had just added;
 then, after 'ceph mds set_max_mds 1', the first mds, mds.aa, crashed
 and became laggy.
 Since I couldn't repair mds.aa, I ran 'ceph mds newfs metadata data
 --yes-i-really-mean-it';

 So this command is a mkfs sort of thing. It's deleted all the
 allocation tables and filesystem metadata in favor of new, empty
 ones. You should not run --yes-i-really-mean-it commands if you
 don't know exactly what the command is doing and why you're using it.

 mds.aa came back, but 1TB of data in the cluster was lost, while the
 disk space still shows as used in 'ceph -s'.

 Is there any chance I can get my data back? If not, how can I reclaim
 the disk space?

 There's not currently a great way to get that data back. With
 sufficient energy it could be re-constructed by looking through all
 the RADOS objects and putting something together.
 To retrieve the disk space, you'll want to delete the data and
 metadata RADOS pools. This will of course *eliminate* the data you
 have in your new filesystem, so grab that out first if there's
 anything there you care about. Then create the pools and run the newfs
 command again.
 Also, you've got the syntax wrong on that newfs command. You should be
 using pool IDs:
 ceph mds newfs 1 0 --yes-i-really-mean-it
 (Though these IDs may change after re-creating the pools.)
 -Greg

 I followed your instructions, but it didn't succeed: 'ceph mds newfs 1 0
 --yes-i-really-mean-it' changed nothing. Do I have to delete all the
 pools I created first? Why does it work this way? I'm confused.

If you look below at your pools, you no longer have pool IDs 0 and 1.
They were the old data and metadata pools that you just deleted.
You will need to create new pools for the filesystem and use their
IDs.
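
Roughly (a sketch; the new pool IDs are hypothetical and must be taken from
what 'ceph osd dump' actually reports on your cluster):

  ceph osd pool create data
  ceph osd pool create metadata
  ceph osd dump | grep ^pool    # suppose 'data' comes back as pool 6, 'metadata' as pool 7
  ceph mds newfs 7 6 --yes-i-really-mean-it   # metadata pool ID first, then data pool ID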

 While testing, I found that the default pool seems to be the parent of
 all the pools I created later, right? So deleting the default 'data'
 pool also deleted data belonging to other pools; is this true?

No, absolutely not. There is no relationship between different RADOS
pools. If you've been using the cephfs tool to place some filesystem
data in different pools then your configuration is a little more
complicated (have you done that?), but deleting one pool is never
going to remove data from the others.
-Greg

 log3 ~ # ceph osd pool delete data
 pool 'data' deleted
 log3 ~ # ceph osd pool delete metadata
 pool 'metadata' deleted

 log3 ~ # ceph mds newfs 1 0 --yes-i-really-mean-it
 new fs with metadata pool 1 and data pool 0
 log3 ~ # ceph osd dump | grep ^pool
 pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num
 320 pgp_num 320 last_change 1 owner 0
 pool 3 'netflow' rep size 2 crush_ruleset 0 object_hash rjenkins
 pg_num 8 pgp_num 8 last_change 1556 owner 0
 pool 4 'audit' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num
 8 pgp_num 8 last_change 1558 owner 0
 pool 5 'dns-trend' rep size 2 crush_ruleset 0 object_hash rjenkins
 pg_num 8 pgp_num 8 last_change 1561 owner 0
 log3 ~ # ceph -s
health HEALTH_OK
monmap e1: 1 mons at {log3=10.205.119.2:6789/0}, election epoch 0,
 quorum 0 log3
osdmap e1581: 28 osds: 20 up, 20 in
 pgmap v57715: 344 pgs: 344 active+clean; 0 bytes data, 22050 MB
 used, 53628 GB / 55890 GB avail
mdsmap e825: 0/0/1 up


Re: Files lost after mds rebuild

2012-11-20 Thread Drunkard Zhang

2012/11/21 Gregory Farnum g...@inktank.com:
 On Tue, Nov 20, 2012 at 1:25 AM, Drunkard Zhang gongfan...@gmail.com wrote:
 2012/11/20 Gregory Farnum g...@inktank.com:
 On Mon, Nov 19, 2012 at 7:55 AM, Drunkard Zhang gongfan...@gmail.com 
 wrote:
 I created a ceph cluster for testing; here's the mistake I made:
 I added a second mds, mds.ab, ran 'ceph mds set_max_mds 2', and then
 removed the mds I had just added;
 then, after 'ceph mds set_max_mds 1', the first mds, mds.aa, crashed
 and became laggy.
 Since I couldn't repair mds.aa, I ran 'ceph mds newfs metadata data
 --yes-i-really-mean-it';

 So this command is a mkfs sort of thing. It's deleted all the
 allocation tables and filesystem metadata in favor of new, empty
 ones. You should not run --yes-i-really-mean-it commands if you
 don't know exactly what the command is doing and why you're using it.

 mds.aa came back, but 1TB of data in the cluster was lost, while the
 disk space still shows as used in 'ceph -s'.

 Is there any chance I can get my data back? If not, how can I reclaim
 the disk space?

 There's not currently a great way to get that data back. With
 sufficient energy it could be re-constructed by looking through all
 the RADOS objects and putting something together.
 To retrieve the disk space, you'll want to delete the data and
 metadata RADOS pools. This will of course *eliminate* the data you
 have in your new filesystem, so grab that out first if there's
 anything there you care about. Then create the pools and run the newfs
 command again.
 Also, you've got the syntax wrong on that newfs command. You should be
 using pool IDs:
 ceph mds newfs 1 0 --yes-i-really-mean-it
 (Though these IDs may change after re-creating the pools.)
 -Greg

 I followed your instructions, but it didn't succeed: 'ceph mds newfs 1 0
 --yes-i-really-mean-it' changed nothing. Do I have to delete all the
 pools I created first? Why does it work this way? I'm confused.

 If you look below at your pools, you no longer have pool IDs 0 and 1.
 They were the old data and metadata pools that you just deleted.
 You will need to create new pools for the filesystem and use their
 IDs.

I did it, but it didn't succeed:
log3 ~ # ceph mds newfs 1 0 --yes-i-really-mean-it
new fs with metadata pool 1 and data pool 0
log3 ~ # ceph osd dump | grep ^pool
pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num
320 pgp_num 320 last_change 1 owner 0
pool 3 'netflow' rep size 2 crush_ruleset 0 object_hash rjenkins
pg_num 8 pgp_num 8 last_change 1556 owner 0
pool 4 'audit' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num
8 pgp_num 8 last_change 1558 owner 0
pool 5 'dns-trend' rep size 2 crush_ruleset 0 object_hash rjenkins
pg_num 8 pgp_num 8 last_change 1561 owner 0

Do I have to delete all of pools 3, 4, and 5 before recreating the mds?

 While testing, I found that the default pool seems to be the parent of
 all the pools I created later, right? So deleting the default 'data'
 pool also deleted data belonging to other pools; is this true?

 No, absolutely not. There is no relationship between different RADOS
 pools. If you've been using the cephfs tool to place some filesystem
 data in different pools then your configuration is a little more
 complicated (have you done that?), but deleting one pool is never
 going to remove data from the others.
 -Greg

I think that should be a bug. Here's what I did:
I created a directory 'audit' in a running ceph filesystem, and put
some data (about 100GB) into it before running these commands:
ceph osd pool create audit
ceph mds add_data_pool 4
cephfs /mnt/temp/audit/ set_layout -p 4

log3 ~ # ceph osd dump | grep audit
pool 4 'audit' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num
8 pgp_num 8 last_change 1558 owner 0

At that point, all the data in audit was still usable; after 'ceph osd
pool delete data', the disk space was reclaimed (I forgot to test whether
the data was still usable): only 200MB used, according to 'ceph -s'. So
here's what I'm thinking: data stored before the new pool was created
doesn't follow that pool; it still follows the default pool 'data'. Is
this a bug, or intended behavior?


Re: Files lost after mds rebuild

2012-11-20 Thread Sage Weil
On Wed, 21 Nov 2012, Drunkard Zhang wrote:
 2012/11/21 Gregory Farnum g...@inktank.com:
  On Tue, Nov 20, 2012 at 1:25 AM, Drunkard Zhang gongfan...@gmail.com 
  wrote:
  2012/11/20 Gregory Farnum g...@inktank.com:
  On Mon, Nov 19, 2012 at 7:55 AM, Drunkard Zhang gongfan...@gmail.com 
  wrote:
  I created a ceph cluster for testing; here's the mistake I made:
  I added a second mds, mds.ab, ran 'ceph mds set_max_mds 2', and then
  removed the mds I had just added;
  then, after 'ceph mds set_max_mds 1', the first mds, mds.aa, crashed
  and became laggy.
  Since I couldn't repair mds.aa, I ran 'ceph mds newfs metadata data
  --yes-i-really-mean-it';
 
  So this command is a mkfs sort of thing. It's deleted all the
  allocation tables and filesystem metadata in favor of new, empty
  ones. You should not run --yes-i-really-mean-it commands if you
  don't know exactly what the command is doing and why you're using it.
 
  mds.aa came back, but 1TB of data in the cluster was lost, while the
  disk space still shows as used in 'ceph -s'.

  Is there any chance I can get my data back? If not, how can I reclaim
  the disk space?
 
  There's not currently a great way to get that data back. With
  sufficient energy it could be re-constructed by looking through all
  the RADOS objects and putting something together.
  To retrieve the disk space, you'll want to delete the data and
  metadata RADOS pools. This will of course *eliminate* the data you
  have in your new filesystem, so grab that out first if there's
  anything there you care about. Then create the pools and run the newfs
  command again.
  Also, you've got the syntax wrong on that newfs command. You should be
  using pool IDs:
  ceph mds newfs 1 0 --yes-i-really-mean-it
  (Though these IDs may change after re-creating the pools.)
  -Greg
 
  I followed your instructions, but it didn't succeed: 'ceph mds newfs 1 0
  --yes-i-really-mean-it' changed nothing. Do I have to delete all the
  pools I created first? Why does it work this way? I'm confused.
 
  If you look below at your pools, you no longer have pool IDs 0 and 1.
  They were the old data and metadata pools that you just deleted.
  You will need to create new pools for the filesystem and use their
  IDs.
 
 I did it, but it didn't succeed:
 log3 ~ # ceph mds newfs 1 0 --yes-i-really-mean-it
 new fs with metadata pool 1 and data pool 0

Those pool #'s need to refer to pools that currently exist. Run

 ceph osd pool create data
 ceph osd pool create metadata
 ceph osd dump | grep ^pool

to figure out the new pool IDs, and then do the newfs command and 
substitute *those* in instead of 1 and 0.
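
For example (hypothetical IDs; given the existing pools 2-5, the recreated
pools would likely come back as 6 and 7), if 'data' is recreated as pool 6
and 'metadata' as pool 7:

 ceph mds newfs 7 6 --yes-i-really-mean-it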

 log3 ~ # ceph osd dump | grep ^pool
 pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num
 320 pgp_num 320 last_change 1 owner 0
 pool 3 'netflow' rep size 2 crush_ruleset 0 object_hash rjenkins
 pg_num 8 pgp_num 8 last_change 1556 owner 0
 pool 4 'audit' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num
 8 pgp_num 8 last_change 1558 owner 0
 pool 5 'dns-trend' rep size 2 crush_ruleset 0 object_hash rjenkins
 pg_num 8 pgp_num 8 last_change 1561 owner 0
 
 Do I have to delete all of pools 3, 4, and 5 before recreating the mds?

The other pools are ignored; no need to remove them.

Good luck!
sage


 
  While testing, I found that the default pool seems to be the parent of
  all the pools I created later, right? So deleting the default 'data'
  pool also deleted data belonging to other pools; is this true?
 
  No, absolutely not. There is no relationship between different RADOS
  pools. If you've been using the cephfs tool to place some filesystem
  data in different pools then your configuration is a little more
  complicated (have you done that?), but deleting one pool is never
  going to remove data from the others.
  -Greg
 
 I think that should be a bug. Here's what I did:
 I created a directory 'audit' in a running ceph filesystem, and put
 some data (about 100GB) into it before running these commands:
 ceph osd pool create audit
 ceph mds add_data_pool 4
 cephfs /mnt/temp/audit/ set_layout -p 4
 
 log3 ~ # ceph osd dump | grep audit
 pool 4 'audit' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num
 8 pgp_num 8 last_change 1558 owner 0
 
 At that point, all the data in audit was still usable; after 'ceph osd
 pool delete data', the disk space was reclaimed (I forgot to test whether
 the data was still usable): only 200MB used, according to 'ceph -s'. So
 here's what I'm thinking: data stored before the new pool was created
 doesn't follow that pool; it still follows the default pool 'data'. Is
 this a bug, or intended behavior?


Re: Files lost after mds rebuild

2012-11-19 Thread Gregory Farnum
On Mon, Nov 19, 2012 at 7:55 AM, Drunkard Zhang gongfan...@gmail.com wrote:
 I created a ceph cluster for testing; here's the mistake I made:
 I added a second mds, mds.ab, ran 'ceph mds set_max_mds 2', and then
 removed the mds I had just added;
 then, after 'ceph mds set_max_mds 1', the first mds, mds.aa, crashed
 and became laggy.
 Since I couldn't repair mds.aa, I ran 'ceph mds newfs metadata data
 --yes-i-really-mean-it';

So this command is a mkfs sort of thing. It's deleted all the
allocation tables and filesystem metadata in favor of new, empty
ones. You should not run --yes-i-really-mean-it commands if you
don't know exactly what the command is doing and why you're using it.

 mds.aa came back, but 1TB of data in the cluster was lost, while the
 disk space still shows as used in 'ceph -s'.

 Is there any chance I can get my data back? If not, how can I reclaim
 the disk space?

There's not currently a great way to get that data back. With
sufficient energy it could be re-constructed by looking through all
the RADOS objects and putting something together.
To retrieve the disk space, you'll want to delete the data and
metadata RADOS pools. This will of course *eliminate* the data you
have in your new filesystem, so grab that out first if there's
anything there you care about. Then create the pools and run the newfs
command again.
Also, you've got the syntax wrong on that newfs command. You should be
using pool IDs:
ceph mds newfs 1 0 --yes-i-really-mean-it
(Though these IDs may change after re-creating the pools.)
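
As a rough outline of that sequence (the final IDs are whatever 'ceph osd
dump' reports after the pools are recreated):

  ceph osd pool delete data
  ceph osd pool delete metadata
  ceph osd pool create data
  ceph osd pool create metadata
  ceph osd dump | grep ^pool      # note the new IDs of 'data' and 'metadata'
  ceph mds newfs <metadata-id> <data-id> --yes-i-really-mean-it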
-Greg


 Now it looks like:
 log3 ~ # ceph -s
health HEALTH_OK
monmap e1: 1 mons at {log3=10.205.119.2:6789/0}, election epoch 0,
 quorum 0 log3
osdmap e1555: 28 osds: 20 up, 20 in
 pgmap v56518: 960 pgs: 960 active+clean; 1134 GB data, 2306 GB
 used, 51353 GB / 55890 GB avail
mdsmap e703: 1/1/1 up {0=aa=up:active}, 1 up:standby

 log3 ~ # df | grep osd |sort
 /dev/sdb1   2.8T  124G  2.5T   5% /ceph/osd.0
 /dev/sdc1   2.8T  104G  2.6T   4% /ceph/osd.1
 /dev/sdd1   2.8T   84G  2.6T   4% /ceph/osd.2
 /dev/sde1   2.8T  117G  2.6T   5% /ceph/osd.3
 /dev/sdf1   2.8T  105G  2.6T   4% /ceph/osd.4
 /dev/sdg1   2.8T   84G  2.6T   4% /ceph/osd.5
 /dev/sdh1   2.8T  140G  2.5T   6% /ceph/osd.6
 /dev/sdi1   2.8T  134G  2.5T   5% /ceph/osd.8
 /dev/sdj1   2.8T  112G  2.6T   5% /ceph/osd.7
 /dev/sdk1   2.8T  159G  2.5T   6% /ceph/osd.9
 /dev/sdl1   2.8T  126G  2.5T   5% /ceph/osd.10

 OSDs on the other host aren't shown here.