Re: [ceph-users] replace dead SSD journal
Yes, I know, but too late now, I'm afraid :)

On 18 April 2015 at 14:18, Josef Johansson jose...@gmail.com wrote:

Have you looked into the Samsung 845 DC? They were not that expensive last time I checked.
/Josef

--
Andrija Panić
Re: [ceph-users] metadata management in case of ceph object storage and ceph block storage
Thanks to all for your replies.

It is clear that the monitors keep track of pools and PGs in the cluster, and that there is no need for an MDS with Ceph object storage or block storage, since neither storage type has to maintain a file hierarchy. But I have a question: in Ceph Object Storage, if we use the Swift API to store objects, then from the user's point of view objects are stored in a container within an account, while in the backend they are stored in PGs within a pool. From the user's perspective, there is a need to keep track of the number of objects stored in a container, the number of containers in an account, and so on, and this information has to be updated whenever the counts change. How does Ceph maintain all this information?

Please reply.

-Regards
Pragya Jain
Department of Computer Science
University of Delhi
Delhi, India

On Saturday, 18 April 2015 9:16 AM, pragya jain prag_2...@yahoo.co.in wrote:

Thanks to all for your reply.

-Regards
Pragya Jain

On Friday, 17 April 2015 4:36 PM, Steffen W Sørensen ste...@me.com wrote:

On 17/04/2015, at 07.33, Josef Johansson jose...@gmail.com wrote:

To your question, which I'm not sure I understand completely: so yes, you don't need the MDS if you just keep track of block storage and object storage (i.e. images for KVM). So the Mon keeps track of the metadata for the Pool and PG.

Well, there really isn't any metadata at all in the sense of a traditional file system; the monitors keep track of the status of the OSDs. Clients compute which OSDs to talk to in order to get to the wanted objects, so there is no need for a central metadata service to tell clients where data are stored. Ceph is a distributed object storage system with potentially no single point of failure and the ability to scale out. Try studying Ross' slides, f.ex. here:
http://www.slideshare.net/buildacloud/ceph-intro-and-architectural-overview-by-ross-turk
or many other good intros on the net, YouTube etc.

Clients of a Ceph cluster can access 'objects' (blobs of data) through several means: programmatically with librados, as virtual block devices through librbd+librados, and finally as an S3 service through the rados GW over http[s]; the metadata (users + ACLs, buckets + data…) for S3 objects are stored in various pools in Ceph.

CephFS, built on top of a Ceph object store, can best be compared with a combination of a POSIX file system and other networked file systems, f.ex. NFS, CIFS, AFP, only with a different protocol and access means (FUSE daemon or kernel module). As it implements a regular file name space, it needs to store metadata about which files exist in that name space; this is the job of the MDS server(s), which of course use Ceph object store pools to persistently store the file system metadata.

and the MDS keeps track of all the files, hence the MDS should have at least 10x the memory of what the Mon has.

Hmm, 10x memory isn't a rule of thumb in my book; it all depends on the use case at hand. The MDS tracks metadata of files stored in a CephFS, which is usually far from all the data in a cluster, unless CephFS is the only usage of course :) Many use Ceph for sharing virtual block devices among multiple hypervisors as disk devices for virtual machines (VM images), f.ex. with OpenStack, Proxmox etc.

I'm no Ceph expert, especially not on CephFS, but this is my picture of it :) Maybe the architecture docs could help you out?
http://docs.ceph.com/docs/master/architecture/#cluster-map

Hope that resolves your question.
Cheers,
Josef

On 06 Apr 2015, at 18:51, pragya jain prag_2...@yahoo.co.in wrote:

Please, somebody reply to my queries. Thank you.

-Regards
Pragya Jain
Department of Computer Science
University of Delhi
Delhi, India

On Saturday, 4 April 2015 3:24 PM, pragya jain prag_2...@yahoo.co.in wrote:

Hello all!

As the documentation says, one of the unique features of Ceph is that it decouples data and metadata. To apply this decoupling, Ceph uses a Metadata Server (MDS) cluster; the MDS cluster manages metadata operations, like opening or renaming a file. On the other hand, Ceph's implementation of object storage as a service and block storage as a service does not require an MDS.

My question is: in the case of object storage and block storage, how does Ceph manage the metadata?

Please help me understand this concept more clearly. Thank you.

-Regards
Pragya Jain
Department of Computer Science
University of Delhi
Delhi, India
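Regarding the container and account counts asked about above: as far as I understand, the radosgw maintains those statistics itself (in per-bucket index and stats objects kept in its own pools) and exposes them through the normal Swift API as container and account HEAD metadata, so the client never has to maintain the counts. A minimal sketch with python-swiftclient follows; the auth endpoint, user and key are hypothetical placeholders, not values from this thread.

    # Sketch only: endpoint, user and key below are hypothetical placeholders.
    from swiftclient.client import Connection

    conn = Connection(
        authurl='http://radosgw.example.com/auth/v1.0',  # hypothetical RGW Swift auth endpoint
        user='myaccount:swiftuser',                      # hypothetical account:subuser
        key='secret')

    # Account-level metadata: container count, object count, bytes used.
    account_headers = conn.head_account()
    print(account_headers.get('x-account-container-count'),
          account_headers.get('x-account-object-count'),
          account_headers.get('x-account-bytes-used'))

    # Container-level metadata: object count and bytes used for one container.
    container_headers = conn.head_container('my-container')
    print(container_headers.get('x-container-object-count'),
          container_headers.get('x-container-bytes-used'))

The same per-bucket numbers can also be inspected on the cluster side with radosgw-admin bucket stats.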
Re: [ceph-users] replace dead SSD journal
Might be true, yes - we had 128GB Intel drives (Intel S3500 or S3700), but those have horrible sequential/random speeds. The Samsung 850 PROs are at least 3 times faster on sequential and more than 3 times faster on random/IOPS measures. And of course modern enterprise drives = ...

On 18 April 2015 at 12:42, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:

Yes, it sure is - my experience with 'consumer' SSDs is that they die with obscure firmware bugs (wrong capacity, zero capacity, no longer detected in the BIOS) rather than flash wearout. It seems that the 'enterprise' tagged drives are less inclined to suffer this fate.

Regards
Mark

--
Andrija Panić
Re: [ceph-users] replace dead SSD journal
Yes, it sure is - my experience with 'consumer' SSDs is that they die with obscure firmware bugs (wrong capacity, zero capacity, no longer detected in the BIOS) rather than flash wearout. It seems that the 'enterprise' tagged drives are less inclined to suffer this fate.

Regards
Mark

On 18/04/15 22:23, Andrija Panic wrote:

These 2 drives are on the regular (on-board) SATA controller, and besides this there are 12 x 4TB drives on the front of the servers - a normal backplane on the front. Anyway, we are going to check those dead SSDs on a PC/laptop or so, just to confirm they are really dead - but this is the way they die: not wear-out, but simply reporting a different size instead of the real one - these were only 3 months old when they died...
Re: [ceph-users] replace dead SSD journal
Have you looked into the Samsung 845 DC? They were not that expensive last time I checked.
/Josef

On 18 Apr 2015 13:15, Andrija Panic andrija.pa...@gmail.com wrote:

Might be true, yes - we had 128GB Intel drives (Intel S3500 or S3700), but those have horrible sequential/random speeds. The Samsung 850 PROs are at least 3 times faster on sequential and more than 3 times faster on random/IOPS measures. And of course modern enterprise drives = ...
[ceph-users] Questions about an example of ceph infrastructure
Hi,

We are thinking about a Ceph infrastructure and I have some questions. Here is the planned (but not yet implemented) infrastructure. In summary (the original message contained an ASCII schema):

- Users (browsers) reach the cluster over the WAN, which connects to both datacenters.
- Datacenter 1 (DC1): monitor-1 and monitor-2, OSD-1 to OSD-12, client-a1 and client-b1.
- Datacenter 2 (DC2): monitor-3, OSD-13 to OSD-24, client-a2 and client-b2.
- DC1 and DC2 are linked by a fiber connection.

In DC1: 2 OSD nodes, each with 6 OSD daemons, one per disk. Journals are on SSDs; there are 2 SSDs, so 3 journals per SSD. In DC2: the same configuration.

You can imagine, for instance, that:
- client-a1 and client-a2 are radosgw;
- client-b1 and client-b2 are web servers which use the CephFS of the cluster.

And of course, the principle is to have data dispatched across DC1 and DC2 (size == 2: one copy of each object in DC1, the other in DC2).

1. If I suppose that the latency between DC1 and DC2 (via the fiber connection) is OK, I would like to know what throughput I need to avoid a network bottleneck. Is there a rule to compute the needed throughput? I suppose it depends on the disk throughputs? For instance, if I suppose the OSD disks in DC1 (and in DC2) each have a throughput of 150 MB/s, then with 12 OSD disks in each DC I have: 12 x 150 = 1800 MB/s, i.e. 1.8 GB/s, i.e. about 14.4 Gbit/s. So I would need roughly 14.4 Gbit/s on the fiber. Is that correct? Maybe this reasoning is too naive? Furthermore, I have not taken the SSDs into account. How can I evaluate the needed throughput more precisely? (See the sketch after this message.)

2. I'm thinking about disaster recovery too. For instance, if there is a disaster in DC2, DC1 will keep working (fine). But if there is a disaster in DC1, DC2 will not work (no quorum). Now suppose there is a long and serious disaster in DC1, so that DC1 is totally unreachable. In this case, I want to start (manually) my Ceph cluster in DC2. No problem with that, I have seen the explanations in the documentation for doing it:
- I stop monitor-3;
- I extract the monmap;
- I remove monitor-1 and monitor-2 from this monmap;
- I inject the new monmap into monitor-3;
- I restart monitor-3.

After that, DC1 is unreachable but DC2 is working (with only one monitor). But what happens if DC1 becomes reachable again? What will the behaviour of monitor-1 and monitor-2 be in this case? Do monitor-1 and monitor-2 understand that they no longer belong to the Ceph cluster?

And now I imagine the worst scenario: DC1 becomes reachable again, but the switch in DC1 which is connected to the fiber takes a long time to restart, so that during a short period DC1 is up but the connection with DC2 is not yet operational. What happens during this period? client-a1 and client-b1 could write data to the cluster in this case, right? And the data in the cluster could be compromised, because DC1 is not aware of the writes in DC2. Am I wrong?

My conclusion is: in case of a long disaster in DC1, I can restart the Ceph cluster in DC2 with the method described above (removing monitor-1 and monitor-2 from the monmap in monitor-3, etc.), but *only if* I can definitively stop monitor-1 and monitor-2 in DC1 first (and if I can't, I do nothing and I wait). Is that correct?

Thanks in advance for your explanations.

--
François Lafont
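For question 1, here is the same arithmetic written out with the unit conversion made explicit. This is only a back-of-the-envelope sketch under the assumptions stated in the message (12 OSD disks per DC at roughly 150 MB/s each, size=2 with one replica per datacenter); real inter-DC traffic also depends on journal writes, recovery/backfill and read locality, so treat it as a rough upper bound for steady-state replication traffic rather than a precise requirement.

    # Back-of-the-envelope sizing of the inter-DC link (assumptions from the message above).
    disks_per_dc = 12
    disk_throughput_mb_s = 150                 # assumed sustained throughput per OSD disk, MB/s

    aggregate_mb_s = disks_per_dc * disk_throughput_mb_s    # 1800 MB/s per datacenter
    aggregate_gbit_s = aggregate_mb_s * 8 / 1000.0           # ~14.4 Gbit/s

    print("Aggregate OSD disk throughput per DC: %d MB/s" % aggregate_mb_s)
    print("Worst-case replication traffic on the fiber: ~%.1f Gbit/s" % aggregate_gbit_s)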
[ceph-users] What is a dirty object
Hi,

With my testing cluster (Hammer on Ubuntu 14.04), I have this:

--
~# ceph df detail
GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED     OBJECTS
    4073G     3897G     176G         4.33          23506
POOLS:
    NAME                   ID     CATEGORY     USED       %USED     MAX AVAIL     OBJECTS     DIRTY     READ     WRITE
    data                   0      -            20579M     0.49      1934G         6973        6973      597k     2898k
    metadata               1      -            81447k     0         1934G         53          53        243      135k
    volumes                3      -            56090M     1.34      1934G         14393       14393     208k     2416k
    images                 4      -            12194M     0.29      1934G         1551        1551      6263     5912
    .rgw.buckets           13     -            362M       0         1934G         445         445       9244     14954
    .users                 25     -            26         0         1934G         3           3         0        3
    .users.email           26     -            26         0         1934G         3           3         0        3
    .users.uid             27     -            1059       0         1934G         6           6         12       6
    .rgw.root              28     -            840        0         1934G         3           3         63       3
    .rgw.control           29     -            0          0         1934G         8           8         0        8
    .rgw.buckets.extra     30     -            0          0         1934G         8           8         0        8
    .rgw.buckets.index     31     -            0          0         1934G         11          11        0        11
    .rgw.gc                32     -            0          0         1934G         32          32        0        32
    .rgw                   33     -            3064       0         1934G         17          17        0        17
--

If I understand well, all objects in the cluster are dirty. Is it normal? What is a dirty object?

Thanks for your help.

--
François Lafont
Re: [ceph-users] replace dead SSD journal
On 17/04/2015, at 21.07, Andrija Panic andrija.pa...@gmail.com wrote:

Nah, Samsung 850 PRO 128GB - dead after 3 months - 2 of these died... wear level is at 96%, so only 4% worn... (yes, I know these are not enterprise, etc…)

Damn… but maybe your surname says it all - Don't Panic :) But making sure SSD devices of the same type aren't of nearly the same age, and doing preventive replacement rotation, might be good practice I guess.

/Steffen
Re: [ceph-users] replace dead SSD journal
If the same chassis/chip/backplane is behind both drives, and maybe other drives in the chassis have trouble too, it may be a defect there as well.

On 18 Apr 2015 09:42, Steffen W Sørensen ste...@me.com wrote:

Damn… but maybe your surname says it all - Don't Panic :) But making sure SSD devices of the same type aren't of nearly the same age, and doing preventive replacement rotation, might be good practice I guess.

/Steffen
Re: [ceph-users] replace dead SSD journal
Heh :) yes, interesting last name :) Anyway, all of them are exactly the same age - we implemented the new Ceph nodes at exactly the same time - but it's not a wear problem; the dead SSDs were simply DEAD - smartctl -a shows nothing, except a 600 PB space/size :)

On 18 April 2015 at 09:41, Steffen W Sørensen ste...@me.com wrote:

Damn… but maybe your surname says it all - Don't Panic :) But making sure SSD devices of the same type aren't of nearly the same age, and doing preventive replacement rotation, might be good practice I guess.

/Steffen

--
Andrija Panić
Re: [ceph-users] replace dead SSD journal
These 2 drives are on the regular (on-board) SATA controller, and besides this there are 12 x 4TB drives on the front of the servers - a normal backplane on the front. Anyway, we are going to check those dead SSDs on a PC/laptop or so, just to confirm they are really dead - but this is the way they die: not wear-out, but simply reporting a different size instead of the real one - these were only 3 months old when they died...

On 18 April 2015 at 11:55, Josef Johansson jose...@gmail.com wrote:

If the same chassis/chip/backplane is behind both drives, and maybe other drives in the chassis have trouble too, it may be a defect there as well.

--
Andrija Panić