Re: Files lost after mds rebuild
On Wed, Nov 21, 2012 at 11:23 PM, Drunkard Zhang <gongfan...@gmail.com> wrote:
> 2012/11/22 Gregory Farnum <g...@inktank.com>:
>> [...]
>> Oh, I see. Data is not moved when you set directory layouts; the layout
>> only affects files created after that point. This is intended behavior:
>> Ceph would need to copy the data around anyway in order to make it
>> follow the pool, and there's no sense in hiding that from the user,
>> especially given the complexity involved in doing so safely, and given
>> the many use cases where you want files in different pools.
>> -Greg
>
> Got you, but how can I know which pool a file lives in? Is there a
> command for that?

You can get this information with the cephfs program if you're using the
kernel client. There's not yet a way to get it out of ceph-fuse, although
we will be implementing it as virtual xattrs in the not-too-distant
future.

> About the relationship between data and pools: I thought objects were
> hooked to a pool, and that when the pool changed they were simply
> unhooked from one and hooked to another. It seems I was wrong.

Indeed, that's incorrect. Pools are a logical namespace; when you delete
a pool you are also deleting everything in it. Doing otherwise is totally
infeasible in Ceph, since pools also represent placement policies.
-Greg
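A sketch of the lookup Greg describes, on a kernel-client mount, reusing
the mount point from this thread; the file name is hypothetical and the
exact output fields vary by version:

  # Ask the kernel client for a file's layout; the data_pool field is the
  # ID of the RADOS pool the file's objects are written to.
  cephfs /mnt/temp/audit/flows.log show_layout
  # layout.data_pool:   4
  # layout.object_size: 4194304
  # ...

  # Map the pool ID back to a pool name.
  ceph osd dump | grep "^pool 4 "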
Re: Files lost after mds rebuild
2012/11/22 Gregory Farnum <g...@inktank.com>:
> On Tue, Nov 20, 2012 at 8:28 PM, Drunkard Zhang <gongfan...@gmail.com> wrote:
>> [...]
>> So here's what I'm thinking: data stored before the pool was created
>> doesn't follow the pool, it stays in the default pool 'data'. Is this a
>> bug, or intended behavior?
>
> Oh, I see. Data is not moved when you set directory layouts; the layout
> only affects files created after that point. This is intended behavior:
> Ceph would need to copy the data around anyway in order to make it
> follow the pool, and there's no sense in hiding that from the user,
> especially given the complexity involved in doing so safely, and given
> the many use cases where you want files in different pools.
> -Greg

Got you, but how can I know which pool a file lives in? Is there a
command for that?

About the relationship between data and pools: I thought objects were
hooked to a pool, and that when the pool changed they were simply
unhooked from one and hooked to another. It seems I was wrong.
Re: Files lost after mds rebuild
On Tue, Nov 20, 2012 at 8:28 PM, Drunkard Zhang <gongfan...@gmail.com> wrote:
> 2012/11/21 Gregory Farnum <g...@inktank.com>:
>> No, absolutely not. There is no relationship between different RADOS
>> pools. If you've been using the cephfs tool to place some filesystem
>> data in different pools then your configuration is a little more
>> complicated (have you done that?), but deleting one pool is never going
>> to remove data from the others.
>> -Greg
>
> I think that should be a bug. Here's what I did: I created a directory
> 'audit' in a running Ceph filesystem and put some data (about 100GB)
> into it before running these commands:
>
>   ceph osd pool create audit
>   ceph mds add_data_pool 4
>   cephfs /mnt/temp/audit/ set_layout -p 4
>
>   log3 ~ # ceph osd dump | grep audit
>   pool 4 'audit' rep size 2 crush_ruleset 0 object_hash rjenkins
>   pg_num 8 pgp_num 8 last_change 1558 owner 0
>
> At that point all the data in audit was still usable. After 'ceph osd
> pool delete data' the disk space was reclaimed (I forgot to test whether
> the data was still usable); 'ceph -s' showed only 200MB used. So here's
> what I'm thinking: data stored before the pool was created doesn't
> follow the pool, it stays in the default pool 'data'. Is this a bug, or
> intended behavior?

Oh, I see. Data is not moved when you set directory layouts; the layout
only affects files created after that point. This is intended behavior:
Ceph would need to copy the data around anyway in order to make it follow
the pool, and there's no sense in hiding that from the user, especially
given the complexity involved in doing so safely, and given the many use
cases where you want files in different pools.
-Greg
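A hypothetical walk-through of the behavior Greg describes, with the pool
and mount point from this thread (the file names are made up):

  # Point the directory at pool 4; this only changes the layout that new
  # files will inherit, nothing is moved.
  cephfs /mnt/temp/audit/ set_layout -p 4

  # A file written before set_layout keeps its objects in the original
  # pool 0 ('data')...
  cephfs /mnt/temp/audit/old.log show_layout   # layout.data_pool: 0

  # ...while a file created afterwards lands in pool 4 ('audit').
  touch /mnt/temp/audit/new.log
  cephfs /mnt/temp/audit/new.log show_layout   # layout.data_pool: 4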
Re: Files lost after mds rebuild
On Tue, Nov 20, 2012 at 1:25 AM, Drunkard Zhang <gongfan...@gmail.com> wrote:
> 2012/11/20 Gregory Farnum <g...@inktank.com>:
>> On Mon, Nov 19, 2012 at 7:55 AM, Drunkard Zhang <gongfan...@gmail.com> wrote:
>>> I created a ceph cluster for testing; here's the mistake I made: I
>>> added a second mds, mds.ab, executed 'ceph mds set_max_mds 2', then
>>> removed the mds I had just added. Then, after 'ceph mds set_max_mds 1',
>>> the first mds, mds.aa, crashed and became laggy. As I couldn't repair
>>> mds.aa, I did 'ceph mds newfs metadata data --yes-i-really-mean-it'.
>>
>> So this command is a mkfs sort of thing. It's deleted all the allocation
>> tables and filesystem metadata in favor of new, empty ones. You should
>> not run --yes-i-really-mean-it commands if you don't know exactly what
>> the command is doing and why you're using it.
>>
>>> mds.aa came back, but 1TB of data in the cluster was lost while the
>>> disk space was still used, according to 'ceph -s'. Is there any chance
>>> I can get my data back? If not, how can I reclaim the disk space?
>>
>> There's not currently a great way to get that data back. With
>> sufficient energy it could be reconstructed by looking through all the
>> RADOS objects and putting something together.
>> To reclaim the disk space, you'll want to delete the data and metadata
>> RADOS pools. This will of course *eliminate* the data you have in your
>> new filesystem, so grab that out first if there's anything there you
>> care about. Then create the pools and run the newfs command again.
>> Also, you've got the syntax wrong on that newfs command. You should be
>> using pool IDs:
>>   ceph mds newfs 1 0 --yes-i-really-mean-it
>> (Though these IDs may change after re-creating the pools.)
>> -Greg
>
> I followed your instructions, but didn't succeed; 'ceph mds newfs 1 0
> --yes-i-really-mean-it' changed nothing. Do I have to delete all the
> pools I created first? Why is that? Confused.

If you look below at your pools, you no longer have pool IDs 0 and 1.
They were the old data and metadata pools that you just deleted. You will
need to create new pools for the filesystem and use their IDs.

> While testing, I found that the default pool is the parent of all pools
> I created later, right? So deleting the default 'data' pool also deleted
> data belonging to other pools. Is this true?

No, absolutely not. There is no relationship between different RADOS
pools. If you've been using the cephfs tool to place some filesystem data
in different pools then your configuration is a little more complicated
(have you done that?), but deleting one pool is never going to remove
data from the others.
-Greg

> log3 ~ # ceph osd pool delete data
> pool 'data' deleted
> log3 ~ # ceph osd pool delete metadata
> pool 'metadata' deleted
> log3 ~ # ceph mds newfs 1 0 --yes-i-really-mean-it
> new fs with metadata pool 1 and data pool 0
> log3 ~ # ceph osd dump | grep ^pool
> pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins
> pg_num 320 pgp_num 320 last_change 1 owner 0
> pool 3 'netflow' rep size 2 crush_ruleset 0 object_hash rjenkins
> pg_num 8 pgp_num 8 last_change 1556 owner 0
> pool 4 'audit' rep size 2 crush_ruleset 0 object_hash rjenkins
> pg_num 8 pgp_num 8 last_change 1558 owner 0
> pool 5 'dns-trend' rep size 2 crush_ruleset 0 object_hash rjenkins
> pg_num 8 pgp_num 8 last_change 1561 owner 0
> log3 ~ # ceph -s
>    health HEALTH_OK
>    monmap e1: 1 mons at {log3=10.205.119.2:6789/0}, election epoch 0, quorum 0 log3
>    osdmap e1581: 28 osds: 20 up, 20 in
>    pgmap v57715: 344 pgs: 344 active+clean; 0 bytes data, 22050 MB used, 53628 GB / 55890 GB avail
>    mdsmap e825: 0/0/1 up
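To make the failure concrete, a small check, assuming only that the osd
dump output above is current, showing why 'newfs 1 0' had nothing to
operate on:

  # Pools 0 and 1 were just deleted, so neither appears in the dump, and
  # 'ceph mds newfs 1 0' points at pools that no longer exist.
  ceph osd dump | grep "^pool [01] "   # prints nothing on this cluster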
Re: Files lost after mds rebuild
2012/11/21 Gregory Farnum <g...@inktank.com>:
> On Tue, Nov 20, 2012 at 1:25 AM, Drunkard Zhang <gongfan...@gmail.com> wrote:
>> [...]
>> I followed your instructions, but didn't succeed; 'ceph mds newfs 1 0
>> --yes-i-really-mean-it' changed nothing. Do I have to delete all the
>> pools I created first? Why is that? Confused.
>
> If you look below at your pools, you no longer have pool IDs 0 and 1.
> They were the old data and metadata pools that you just deleted. You
> will need to create new pools for the filesystem and use their IDs.

I did it, but didn't succeed:

  log3 ~ # ceph mds newfs 1 0 --yes-i-really-mean-it
  new fs with metadata pool 1 and data pool 0
  log3 ~ # ceph osd dump | grep ^pool
  pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins
  pg_num 320 pgp_num 320 last_change 1 owner 0
  pool 3 'netflow' rep size 2 crush_ruleset 0 object_hash rjenkins
  pg_num 8 pgp_num 8 last_change 1556 owner 0
  pool 4 'audit' rep size 2 crush_ruleset 0 object_hash rjenkins
  pg_num 8 pgp_num 8 last_change 1558 owner 0
  pool 5 'dns-trend' rep size 2 crush_ruleset 0 object_hash rjenkins
  pg_num 8 pgp_num 8 last_change 1561 owner 0

Do I have to delete pools [345] before recreating the mds?

>> While testing, I found that the default pool is the parent of all pools
>> I created later, right? So deleting the default 'data' pool also
>> deleted data belonging to other pools. Is this true?
>
> No, absolutely not. There is no relationship between different RADOS
> pools. If you've been using the cephfs tool to place some filesystem
> data in different pools then your configuration is a little more
> complicated (have you done that?), but deleting one pool is never going
> to remove data from the others.
> -Greg

I think that should be a bug. Here's what I did: I created a directory
'audit' in a running Ceph filesystem and put some data (about 100GB) into
it before running these commands:

  ceph osd pool create audit
  ceph mds add_data_pool 4
  cephfs /mnt/temp/audit/ set_layout -p 4

  log3 ~ # ceph osd dump | grep audit
  pool 4 'audit' rep size 2 crush_ruleset 0 object_hash rjenkins
  pg_num 8 pgp_num 8 last_change 1558 owner 0

At that point all the data in audit was still usable. After 'ceph osd
pool delete data' the disk space was reclaimed (I forgot to test whether
the data was still usable); 'ceph -s' showed only 200MB used. So here's
what I'm thinking: data stored before the pool was created doesn't follow
the pool, it stays in the default pool 'data'. Is this a bug, or intended
behavior?
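One way the accounting could have been checked before deleting anything:
'rados df' reports usage per pool, so the 100GB written before the layout
change would have shown up under 'data' rather than 'audit'. The figures
below are illustrative, not taken from the thread:

  # Per-pool usage breakdown; objects written before the directory was
  # pointed at pool 4 remain accounted to 'data'.
  rados df
  # pool name   KB          objects
  # audit       204800      ...
  # data        104857600   ...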
Re: Files lost after mds rebuild
On Wed, 21 Nov 2012, Drunkard Zhang wrote:
> 2012/11/21 Gregory Farnum <g...@inktank.com>:
>> [...]
>> If you look below at your pools, you no longer have pool IDs 0 and 1.
>> They were the old data and metadata pools that you just deleted. You
>> will need to create new pools for the filesystem and use their IDs.
>
> I did it, but didn't succeed:
>
>   log3 ~ # ceph mds newfs 1 0 --yes-i-really-mean-it
>   new fs with metadata pool 1 and data pool 0

Those pool #'s need to refer to pools that currently exist. Run

  ceph osd pool create data
  ceph osd pool create metadata
  ceph osd dump | grep ^pool

to figure out the new pool IDs, and then do the newfs command and
substitute *those* in instead of 1 and 0.

>   log3 ~ # ceph osd dump | grep ^pool
>   pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins
>   pg_num 320 pgp_num 320 last_change 1 owner 0
>   pool 3 'netflow' rep size 2 crush_ruleset 0 object_hash rjenkins
>   pg_num 8 pgp_num 8 last_change 1556 owner 0
>   pool 4 'audit' rep size 2 crush_ruleset 0 object_hash rjenkins
>   pg_num 8 pgp_num 8 last_change 1558 owner 0
>   pool 5 'dns-trend' rep size 2 crush_ruleset 0 object_hash rjenkins
>   pg_num 8 pgp_num 8 last_change 1561 owner 0
>
> Do I have to delete pools [345] before recreating the mds?

The other pools are ignored; no need to remove them.

Good luck!
sage

> [...]
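Sage's recipe, collected into one sequence; the IDs 6 and 7 below are
hypothetical, since new pools simply take the next free IDs on the
cluster:

  # Recreate the filesystem pools and look up the IDs they were given.
  ceph osd pool create data
  ceph osd pool create metadata
  ceph osd dump | grep ^pool
  # pool 6 'data' ...      (hypothetical ID)
  # pool 7 'metadata' ...  (hypothetical ID)

  # newfs takes the metadata pool ID first, then the data pool ID.
  ceph mds newfs 7 6 --yes-i-really-mean-it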
Re: Files lost after mds rebuild
2012/11/20 Gregory Farnum <g...@inktank.com>:
> On Mon, Nov 19, 2012 at 7:55 AM, Drunkard Zhang <gongfan...@gmail.com> wrote:
>> I created a ceph cluster for testing; here's the mistake I made: I
>> added a second mds, mds.ab, executed 'ceph mds set_max_mds 2', then
>> removed the mds I had just added. Then, after 'ceph mds set_max_mds 1',
>> the first mds, mds.aa, crashed and became laggy. As I couldn't repair
>> mds.aa, I did 'ceph mds newfs metadata data --yes-i-really-mean-it'.
>
> So this command is a mkfs sort of thing. It's deleted all the allocation
> tables and filesystem metadata in favor of new, empty ones. You should
> not run --yes-i-really-mean-it commands if you don't know exactly what
> the command is doing and why you're using it.
>
>> mds.aa came back, but 1TB of data in the cluster was lost while the
>> disk space was still used, according to 'ceph -s'. Is there any chance
>> I can get my data back? If not, how can I reclaim the disk space?
>
> There's not currently a great way to get that data back. With sufficient
> energy it could be reconstructed by looking through all the RADOS
> objects and putting something together.
> To reclaim the disk space, you'll want to delete the data and metadata
> RADOS pools. This will of course *eliminate* the data you have in your
> new filesystem, so grab that out first if there's anything there you
> care about. Then create the pools and run the newfs command again.
> Also, you've got the syntax wrong on that newfs command. You should be
> using pool IDs:
>   ceph mds newfs 1 0 --yes-i-really-mean-it
> (Though these IDs may change after re-creating the pools.)
> -Greg

Now it looks like this:

  log3 ~ # ceph -s
     health HEALTH_OK
     monmap e1: 1 mons at {log3=10.205.119.2:6789/0}, election epoch 0, quorum 0 log3
     osdmap e1555: 28 osds: 20 up, 20 in
     pgmap v56518: 960 pgs: 960 active+clean; 1134 GB data, 2306 GB used, 51353 GB / 55890 GB avail
     mdsmap e703: 1/1/1 up {0=aa=up:active}, 1 up:standby
  log3 ~ # df | grep osd | sort
  /dev/sdb1   2.8T  124G  2.5T   5%  /ceph/osd.0
  /dev/sdc1   2.8T  104G  2.6T   4%  /ceph/osd.1
  /dev/sdd1   2.8T   84G  2.6T   4%  /ceph/osd.2
  /dev/sde1   2.8T  117G  2.6T   5%  /ceph/osd.3
  /dev/sdf1   2.8T  105G  2.6T   4%  /ceph/osd.4
  /dev/sdg1   2.8T   84G  2.6T   4%  /ceph/osd.5
  /dev/sdh1   2.8T  140G  2.5T   6%  /ceph/osd.6
  /dev/sdi1   2.8T  134G  2.5T   5%  /ceph/osd.8
  /dev/sdj1   2.8T  112G  2.6T   5%  /ceph/osd.7
  /dev/sdk1   2.8T  159G  2.5T   6%  /ceph/osd.9
  /dev/sdl1   2.8T  126G  2.5T   5%  /ceph/osd.10

(The OSDs on the other host don't show up in df here.)
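A rough illustration of the object-level salvage Greg alludes to, heavily
hedged: the object name below is made up, and reassembling files this way
means mapping inode numbers back to file names by hand.

  # CephFS stores file data as objects named <inode>.<stripe-index> in
  # hex; list what is left in the old data pool.
  rados -p data ls | head

  # Pull one object out for inspection (object name hypothetical).
  rados -p data get 10000000abc.00000000 /tmp/chunk0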