Re: [ceph-users] Disk failures
On 15 Jun 2016 03:27, "Christian Balzer" wrote:
> And that makes deep-scrubbing something of quite limited value.

This is not true. If you checksum *before* writing to disk (while the data is still in RAM), then when reading back from disk you can verify the checksum, and if it doesn't match you can heal from the other nodes.

Obviously you have to replicate directly from RAM, before bitrot can happen: if you write to disk and then replicate the written data, you could replicate an already-rotted value.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
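The write-path checksumming described above can be sketched in a few lines (a toy model for illustration only, not Ceph code; the store layout and function names are invented):

```python
import hashlib

def put(store, key, data):
    # Checksum is computed in RAM, *before* the data hits any disk,
    # so a later on-disk bit flip cannot corrupt the checksum itself.
    store[key] = {"data": bytearray(data),
                  "sum": hashlib.sha256(data).hexdigest()}

def get(store, key, replicas):
    obj = store[key]
    if hashlib.sha256(bytes(obj["data"])).hexdigest() == obj["sum"]:
        return bytes(obj["data"])
    # Mismatch: the local copy rotted; heal from the first healthy replica.
    for rep in replicas:
        good = get(rep, key, [])
        obj["data"] = bytearray(good)
        obj["sum"] = hashlib.sha256(good).hexdigest()
        return good
    raise IOError("all copies corrupt")

primary, replica = {}, {}
put(primary, "obj", b"payload")
put(replica, "obj", b"payload")
primary["obj"]["data"][0] ^= 0xFF      # simulate bit rot on the primary
assert get(primary, "obj", [replica]) == b"payload"   # healed from replica
```

The key point the poster makes is the ordering: the checksum must be taken before the first write, otherwise a rotted on-disk value would be checksummed as if it were correct.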
Re: [ceph-users] Disk failures
This is why I use btrfs mirror sets underneath ceph, and hopefully more than make up for the space loss by going with 2 replicas instead of 3 plus on-the-fly lzo compression. The ceph deep scrubs replace any need for btrfs scrubs, but I still get the benefit of self-healing when btrfs finds bit rot. The only errors I've run into are from hard shutdowns and possibly ECC errors due to working with consumer hardware and memory. I've been on top of btrfs using gentoo since Firefly.

Bill Sharer

On 06/14/2016 09:27 PM, Christian Balzer wrote:
> Hello,
>
> On Tue, 14 Jun 2016 14:26:41 +0200 Jan Schermer wrote:
>> Hi,
>> bit rot is not "bit rot" per se - nothing is rotting on the drive platter.
>
> Never mind that I used the wrong terminology (according to Wiki) and that my long experience with "laser-rot" probably caused me to choose that term; there are data degradation scenarios that are caused by undetected media failures or by corruption happening in the write path, thus making them quite reproducible.
>
>> It occurs during reads (mostly, anyway), and it's random. You can happily read a block and get the correct data, then read it again and get garbage, then get correct data again. This could be caused by a worn-out cell on an SSD, but firmware looks for that and rewrites it if the signal is attenuated too much. On spinners there are no cells to refresh, so rewriting doesn't help.
>>
>> You can't really "look for" bit rot due to the reasons above; strong checksumming/hash verification during reads is the only solution.
>
> Which is what I've been saying in the mail below and for years on this ML.
> And that makes deep-scrubbing something of quite limited value.
>
> Christian
>
>> And trust me, bit rot is a very real thing and very dangerous as well - do you think companies like Seagate or WD would lie about bit rot if it's not real? I'd buy a drive with a BER of 10^999 over one with 10^14, wouldn't everyone? And it is especially dangerous when something like Ceph handles much larger blocks of data than the client does. While the client (or an app) has some knowledge of the data _and_ hopefully throws an error if it reads garbage, Ceph will (if for example snapshots are used and FIEMAP is off) actually have to read the whole object (say 4MiB) and write it elsewhere, without any knowledge of whether what it read (and wrote) made any sense to the app. This way corruption might spread silently into your backups if you don't validate the data somehow (or dump it from a database, for example, where it's likely to get detected).
>>
>> Btw, just because you think you haven't seen it doesn't mean you haven't seen it - never seen artefacting in movies? Just a random bug in the decoder, is it? The VoD guys would tell you...
>>
>> For things like databases this is somewhat less impactful - bit rot doesn't "flip a bit" but affects larger blocks of data (like one sector), so databases usually catch this during read and err instead of returning garbage to the client.
>>
>> Jan
>>
>>> On 09 Jun 2016, at 09:16, Christian Balzer wrote:
>>>
>>> Hello,
>>>
>>> On Thu, 9 Jun 2016 08:43:23 +0200 Gandalf Corvotempesta wrote:
>>>> On 9 Jun 2016 02:09, "Christian Balzer" wrote:
>>>>> Ceph currently doesn't do any (relevant) checksumming at all, so if a PRIMARY PG suffers from bit-rot this will be undetected until the next deep-scrub.
>>>>>
>>>>> This is one of the longest and gravest outstanding issues with Ceph and is supposed to be addressed with bluestore (which currently doesn't have checksum-verified reads either).
>>>>
>>>> So if bit rot happens on the primary PG, ceph is spreading the corrupted data across the cluster?
>>> No.
>>>
>>> You will want to re-read the Ceph docs and the countless posts here about how replication within Ceph works.
>>> http://docs.ceph.com/docs/hammer/architecture/#smart-daemons-enable-hyperscale
>>>
>>> A client write goes to the primary OSD/PG and will not be ACK'ed to the client until it has reached all replica OSDs. This happens while the data is in flight (in RAM); it's not read from the journal or filestore.
>>>
>>>> What would be sent to the replica, the original data or the saved one?
>>>>
>>>> When bit rot happens I'll have 1 corrupted object and 2 good ones. How do you manage this between deep scrubs? Which data would be used by ceph? I think that bitrot on a huge VM block device could lead to a mess, like the whole device being corrupted. Would a VM affected by bitrot be able to stay up and running? And bitrot on a qcow2 file?
>>>>
>>> Bitrot is a bit hyped; I haven't seen any on the Ceph clusters I run, nor on other systems here where I (can) actually check for it.
>>>
>>> As to how it would affect things, that very much depends.
>>>
>>> If it's something like a busy directory inode that gets corrupted, the data in question will be in RAM (SLAB) and the next update will correct things.
>>>
>>> If it's a logfile, you're likely to never notice until deep-scrub detects it eventually.
>>>
>>> This isn't a Ceph specific question; on all systems that aren't backed by something like ZFS or BTRFS you're potentially vulnerable to this.
>>>
>>> Of course if you're that worried, you could always run BTRFS or ZFS inside your VM and notice immediately when something goes wrong.
Re: [ceph-users] striping for a small cluster
Looks like we'll rebuild the cluster when bluestore is released anyway. Thanks!

On Tue, Jun 14, 2016 at 7:02 PM Christian Balzer wrote:
>
> Hello,
>
> On Wed, 15 Jun 2016 00:22:51 +0000 pixelfairy wrote:
>
>> We have a small cluster: 3 mons, each of which also has 6 4TB OSDs, and a 20gig link to the cluster (2x10gig LACP to a stacked pair of switches). We'll have at least one replicated pool (size=3) and one erasure-coded pool.
>
> I'm neither particularly knowledgeable about nor a fan of EC pools, but keep in mind that the coding is dictated by the number of OSD nodes, so 3 doesn't give a lot of options, IIRC.
> In fact, it will be the same as a RAID5 and only sustain the loss of one OSD/disk, something nobody in their right mind does these days.
>
>> current plan is to have journals coexist with osds as that seems to be the safest and most economical.
>
> You will be thoroughly disappointed by the performance if you do this, unless your use case is something like a backup server with very few random I/Os.
> Any performance optimization will suggest looking at journal SSDs first.
>
>> what levels of striping would you recommend for this size cluster? any other optimization considerations? looking for a starting point to work from.
>
> Striping is one of the last things to ponder.
> Not only does it depend a LOT on your use case, it's also not possible to change later on, so getting it right for the initial size and future growth is an interesting challenge.
>
>> also, any recommendations for testing / benchmarking these configurations?
>>
>> so far, looking at
>> https://www.sebastien-han.fr/blog/2012/08/26/ceph-benchmarks/
>> bsd rebuilding itself, and maybe phoronix.
>
> Those benchmarks are very much out-dated, both in terms of Ceph versions and capabilities as well as the tools used (fio has been the most common benchmark tool for some time now).
> Once bluestore comes along (in a year or so), there will be another performance and HW design shift.
>
> Christian
> --
> Christian Balzer    Network/Systems Engineer
> ch...@gol.com       Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
Re: [ceph-users] striping for a small cluster
Hello,

On Wed, 15 Jun 2016 00:22:51 +0000 pixelfairy wrote:

> We have a small cluster: 3 mons, each of which also has 6 4TB OSDs, and a 20gig link to the cluster (2x10gig LACP to a stacked pair of switches). We'll have at least one replicated pool (size=3) and one erasure-coded pool.

I'm neither particularly knowledgeable about nor a fan of EC pools, but keep in mind that the coding is dictated by the number of OSD nodes, so 3 doesn't give a lot of options, IIRC.
In fact, it will be the same as a RAID5 and only sustain the loss of one OSD/disk, something nobody in their right mind does these days.

> current plan is to have journals coexist with osds as that seems to be the safest and most economical.

You will be thoroughly disappointed by the performance if you do this, unless your use case is something like a backup server with very few random I/Os.
Any performance optimization will suggest looking at journal SSDs first.

> what levels of striping would you recommend for this size cluster? any other optimization considerations? looking for a starting point to work from.

Striping is one of the last things to ponder.
Not only does it depend a LOT on your use case, it's also not possible to change later on, so getting it right for the initial size and future growth is an interesting challenge.

> also, any recommendations for testing / benchmarking these configurations?
>
> so far, looking at
> https://www.sebastien-han.fr/blog/2012/08/26/ceph-benchmarks/
> bsd rebuilding itself, and maybe phoronix.

Those benchmarks are very much out-dated, both in terms of Ceph versions and capabilities as well as the tools used (fio has been the most common benchmark tool for some time now).
Once bluestore comes along (in a year or so), there will be another performance and HW design shift.

Christian
--
Christian Balzer    Network/Systems Engineer
ch...@gol.com       Global OnLine Japan/Rakuten Communications
http://www.gol.com/
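To make the striping discussion concrete, here is a rough sketch of how a logical byte offset maps onto RADOS objects given a stripe unit, stripe count and object size (an editorial illustration of the layout described in the Ceph architecture docs; the parameter values in the assertions are examples, not recommendations):

```python
def map_offset(offset, stripe_unit, stripe_count, object_size):
    """Map a logical byte offset to (object_number, offset_within_object)."""
    units_per_object = object_size // stripe_unit
    unit = offset // stripe_unit                      # which stripe unit overall
    set_span = stripe_count * units_per_object        # stripe units per object set
    object_set = unit // set_span
    idx = unit % set_span
    obj_in_set = idx % stripe_count                   # round-robin across the set
    stripe_in_obj = idx // stripe_count
    obj_no = object_set * stripe_count + obj_in_set
    return obj_no, stripe_in_obj * stripe_unit + offset % stripe_unit

MiB = 1 << 20
# Defaults (stripe_unit == object_size, stripe_count == 1) degenerate to
# simple 4 MiB chunking:
assert map_offset(9 * MiB, 4 * MiB, 1, 4 * MiB) == (2, 1 * MiB)
# With 1 MiB units striped across 4 objects, consecutive units land on
# consecutive objects, spreading one client's I/O over more OSDs:
assert map_offset(0 * MiB, 1 * MiB, 4, 4 * MiB) == (0, 0)
assert map_offset(1 * MiB, 1 * MiB, 4, 4 * MiB) == (1, 0)
```

This also shows why Christian calls it hard to change later: the offset-to-object mapping is baked into every object name, so existing data would have to be rewritten.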
Re: [ceph-users] Disk failures
Hello,

On Tue, 14 Jun 2016 14:26:41 +0200 Jan Schermer wrote:
> Hi,
> bit rot is not "bit rot" per se - nothing is rotting on the drive platter.

Never mind that I used the wrong terminology (according to Wiki) and that my long experience with "laser-rot" probably caused me to choose that term; there are data degradation scenarios that are caused by undetected media failures or by corruption happening in the write path, thus making them quite reproducible.

> It occurs during reads (mostly, anyway), and it's random. You can happily read a block and get the correct data, then read it again and get garbage, then get correct data again. This could be caused by a worn-out cell on an SSD, but firmware looks for that and rewrites it if the signal is attenuated too much. On spinners there are no cells to refresh, so rewriting doesn't help.
>
> You can't really "look for" bit rot due to the reasons above; strong checksumming/hash verification during reads is the only solution.

Which is what I've been saying in the mail below and for years on this ML.
And that makes deep-scrubbing something of quite limited value.

Christian

> And trust me, bit rot is a very real thing and very dangerous as well - do you think companies like Seagate or WD would lie about bit rot if it's not real? I'd buy a drive with a BER of 10^999 over one with 10^14, wouldn't everyone? And it is especially dangerous when something like Ceph handles much larger blocks of data than the client does. While the client (or an app) has some knowledge of the data _and_ hopefully throws an error if it reads garbage, Ceph will (if for example snapshots are used and FIEMAP is off) actually have to read the whole object (say 4MiB) and write it elsewhere, without any knowledge of whether what it read (and wrote) made any sense to the app. This way corruption might spread silently into your backups if you don't validate the data somehow (or dump it from a database, for example, where it's likely to get detected).
>
> Btw, just because you think you haven't seen it doesn't mean you haven't seen it - never seen artefacting in movies? Just a random bug in the decoder, is it? The VoD guys would tell you...
>
> For things like databases this is somewhat less impactful - bit rot doesn't "flip a bit" but affects larger blocks of data (like one sector), so databases usually catch this during read and err instead of returning garbage to the client.
>
> Jan
>
>> On 09 Jun 2016, at 09:16, Christian Balzer wrote:
>>
>> Hello,
>>
>> On Thu, 9 Jun 2016 08:43:23 +0200 Gandalf Corvotempesta wrote:
>>> On 9 Jun 2016 02:09, "Christian Balzer" wrote:
>>>> Ceph currently doesn't do any (relevant) checksumming at all, so if a PRIMARY PG suffers from bit-rot this will be undetected until the next deep-scrub.
>>>>
>>>> This is one of the longest and gravest outstanding issues with Ceph and is supposed to be addressed with bluestore (which currently doesn't have checksum-verified reads either).
>>>
>>> So if bit rot happens on the primary PG, ceph is spreading the corrupted data across the cluster?
>> No.
>>
>> You will want to re-read the Ceph docs and the countless posts here about how replication within Ceph works.
>> http://docs.ceph.com/docs/hammer/architecture/#smart-daemons-enable-hyperscale
>>
>> A client write goes to the primary OSD/PG and will not be ACK'ed to the client until it has reached all replica OSDs. This happens while the data is in flight (in RAM); it's not read from the journal or filestore.
>>
>>> What would be sent to the replica, the original data or the saved one?
>>>
>>> When bit rot happens I'll have 1 corrupted object and 2 good ones. How do you manage this between deep scrubs? Which data would be used by ceph? I think that bitrot on a huge VM block device could lead to a mess, like the whole device being corrupted. Would a VM affected by bitrot be able to stay up and running? And bitrot on a qcow2 file?
>>>
>> Bitrot is a bit hyped; I haven't seen any on the Ceph clusters I run, nor on other systems here where I (can) actually check for it.
>>
>> As to how it would affect things, that very much depends.
>>
>> If it's something like a busy directory inode that gets corrupted, the data in question will be in RAM (SLAB) and the next update will correct things.
>>
>> If it's a logfile, you're likely to never notice until deep-scrub detects it eventually.
>>
>> This isn't a Ceph specific question; on all systems that aren't backed by something like ZFS or BTRFS you're potentially vulnerable to this.
>>
>> Of course if you're that worried, you could always run BTRFS or ZFS inside your VM and notice immediately when something goes wrong.
>> I personally wouldn't though, due to the performance penalties involved (CoW).
>>
>>> Let me try to explain
Re: [ceph-users] ceph-deploy jewel install dependencies
Working for me now. Thanks for taking care of this.

- Noah

On Tue, Jun 14, 2016 at 5:42 PM, Alfredo Deza wrote:
> We are now good to go.
>
> Sorry for all the trouble; some packages were missed in the metadata, and I had to resync and re-sign them to get everything in order.
>
> Just tested it out and it works as expected. Let me know if you have any issues.
>
> On Tue, Jun 14, 2016 at 5:57 PM, Noah Watkins wrote:
>> Yeh, I'm still seeing the problem too. Thanks.
>>
>> On Tue, Jun 14, 2016 at 2:55 PM Alfredo Deza wrote:
>>>
>>> On Tue, Jun 14, 2016 at 5:52 PM, Alfredo Deza wrote:
>>>> Is it possible you tried to install just when I was syncing 10.2.2?
>>>>
>>>> :)
>>>>
>>>> Would you mind trying this again and see if you are good?
>>>>
>>>> On Tue, Jun 14, 2016 at 5:31 PM, Noah Watkins wrote:
>>>>> Installing Jewel with ceph-deploy has been working for weeks. Today I started to get some dependency issues:
>>>>>
>>>>> [b61808c8624c][DEBUG ] The following packages have unmet dependencies:
>>>>> [b61808c8624c][DEBUG ]  ceph : Depends: ceph-mon (= 10.2.1-1trusty) but it is not going to be installed
>>>>> [b61808c8624c][DEBUG ]         Depends: ceph-osd (= 10.2.1-1trusty) but it is not going to be installed
>>>>> [b61808c8624c][DEBUG ]  ceph-mds : Depends: ceph-base (= 10.2.1-1trusty) but it is not going to be installed
>>>>> [b61808c8624c][WARNIN] E: Unable to correct problems, you have held broken packages.
>>>>> [b61808c8624c][ERROR ] RuntimeError: command returned non-zero exit status: 100
>>>>> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get --assume-yes -q --no-install-recommends install -o Dpkg::Options::=--force-confnew ceph ceph-mds radosgw
>>>>>
>>>>> Seems to be an issue with 10.2.1 vs 10.2.2?
>>>
>>> Bah, it looks like this is still an issue even right now.
>>>
>>> I will update once I know what is going on.
>>>
>>>>> root@b61808c8624c:/ceph-deploy# apt-get install ceph-mon ceph-base
>>>>> Reading package lists... Done
>>>>> Building dependency tree
>>>>> Reading state information... Done
>>>>> Some packages could not be installed. This may mean that you have requested an impossible situation or, if you are using the unstable distribution, that some required packages have not yet been created or been moved out of Incoming.
>>>>> The following information may help to resolve the situation:
>>>>>
>>>>> The following packages have unmet dependencies:
>>>>>  ceph-mon : Depends: ceph-base (= 10.2.1-1trusty) but 10.2.2-1trusty is to be installed
>>>>> E: Unable to correct problems, you have held broken packages.
Re: [ceph-users] Spreading deep-scrubbing load
Hello,

On Wed, 15 Jun 2016 00:01:42 +0000 Jared Curtis wrote:

> I’ve just started looking into one of our ceph clusters because a weekly deep scrub had a major IO impact on the cluster, which caused multiple VMs to grind to a halt.

A story you will find aplenty in the ML archives.

> So far I’ve discovered that this particular cluster is configured incorrectly for the number of PGs per OSD. Currently that setting is 6 but should be closer to ~4096 based on the calc tool.

You're having a case of apples and oranges here.
PGs (and PGPs, don't forget them!) are configured per pool; the number of PGs per OSD is the result of all PGs in all pools.
Output of "ceph osd pool ls detail" would be helpful for us.

> If I change the number of PGs to the suggested values, what should I expect, especially around deep scrub performance but also just in general, as I’m very new to ceph?

We're not psychic.
The amount of PGs will have an impact, but that very much depends on your existing setup.
So the usual: all versions (Ceph/OS), detailed cluster description (all HW details down to the SSD model if you have them, network, etc.).

Generally speaking, deep-scrub is a very expensive operation of questionable value; see the current "Disk failures" thread for example.
That said, your cluster should be able to cope with it, as the deep-scrub impact is a lot like what you'd get from recovery and/or backfilling operations.
Think of deep-scrub causing pain as an early warning sign that your cluster is underpowered and/or badly configured.

> What I’m hoping will happen is that instead of a single weekly deep scrub that runs for 24+ hours, we would have lots of smaller deep scrubs that can hopefully finish in a reasonable time with minimal cluster impact.

Google and the (albeit often lagging behind) documentation are your friends.

These are the scrub-related configuration parameters; this sample is from my Hammer test cluster, with comments below the relevant ones:

  "osd_scrub_thread_timeout": "60",
  "osd_scrub_thread_suicide_timeout": "300",
  "osd_scrub_finalize_thread_timeout": "600",
  "osd_scrub_invalid_stats": "true",
  "osd_max_scrubs": "1",
Default AFAIK: no more than one scrub per OSD, though deep scrubs from other OSDs might of course want data from this one as well.
  "osd_scrub_begin_hour": "0",
  "osd_scrub_end_hour": "6",
These 2 are perfect if your cluster can finish a deep scrub within off-peak hours.
  "osd_scrub_load_threshold": "0.5",
Adjust to not starve your I/O.
  "osd_scrub_min_interval": "86400",
  "osd_scrub_max_interval": "604800",
  "osd_scrub_interval_randomize_ratio": "0.5",
Latest Hammer and afterwards can randomize things (spreading the load out), but if you want things to happen within a certain time frame this might not be helpful.
  "osd_scrub_chunk_min": "5",
  "osd_scrub_chunk_max": "25",
  "osd_scrub_sleep": "0.1",
This will allow client I/O to get a foot in and tends to be the biggest help in Hammer and before. In Jewel the combined I/O queue should help a lot as well.
  "osd_deep_scrub_interval": "604800",
Once that's exceeded, Ceph will deep-scrub come hell or high water, ignoring at the very least the load setting above.
  "osd_deep_scrub_stride": "524288",
  "osd_deep_scrub_update_digest_min_age": "7200",
  "osd_debug_scrub_chance_rewrite_digest": "0",

Christian
--
Christian Balzer    Network/Systems Engineer
ch...@gol.com       Global OnLine Japan/Rakuten Communications
http://www.gol.com/
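For reference, the rule of thumb behind the pg calc tool mentioned above (roughly 100 PGs per OSD, divided by the replica size, rounded up to a power of two) can be sketched as follows. This is a simplified editorial approximation; the actual tool applies more per-pool nuance:

```python
def suggested_pg_count(num_osds, replica_size, target_pgs_per_osd=100):
    # Aim for ~target_pgs_per_osd PG copies per OSD, then round the
    # pool's pg_num up to the next power of two.
    raw = num_osds * target_pgs_per_osd / replica_size
    power = 1
    while power < raw:
        power *= 2
    return power

# e.g. 18 OSDs with size=3: 18 * 100 / 3 = 600, rounded up to 1024
assert suggested_pg_count(18, 3) == 1024
```

Remember this is a per-pool number and that the PG count per OSD is the sum over all pools, which is exactly the apples-and-oranges point made in the reply.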
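A small sketch of the begin/end hour semantics discussed above (one reading of the osd_scrub_begin_hour / osd_scrub_end_hour pair, including a window that wraps past midnight; the authoritative logic lives in the OSD code, and the begin == end case is treated here as an always-open window by assumption):

```python
def in_scrub_window(hour, begin, end):
    """True if `hour` (0-23) falls inside the allowed scrub window."""
    if begin == end:
        return True                    # degenerate window: always allowed
    if begin < end:
        return begin <= hour < end     # simple same-day window
    return hour >= begin or hour < end # window wraps past midnight

# The sample config above allows scrubs from 0:00 to 6:00:
assert in_scrub_window(3, 0, 6)
assert not in_scrub_window(12, 0, 6)
# A wrapping window, e.g. 22:00 to 6:00:
assert in_scrub_window(23, 22, 6)
```

Note the caveat from the reply still applies: once osd_deep_scrub_interval is exceeded, the deep scrub runs regardless of such limits.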
Re: [ceph-users] ceph-deploy jewel install dependencies
We are now good to go.

Sorry for all the trouble; some packages were missed in the metadata, and I had to resync and re-sign them to get everything in order.

Just tested it out and it works as expected. Let me know if you have any issues.

On Tue, Jun 14, 2016 at 5:57 PM, Noah Watkins wrote:
> Yeh, I'm still seeing the problem too. Thanks.
>
> On Tue, Jun 14, 2016 at 2:55 PM Alfredo Deza wrote:
>>
>> On Tue, Jun 14, 2016 at 5:52 PM, Alfredo Deza wrote:
>>> Is it possible you tried to install just when I was syncing 10.2.2?
>>>
>>> :)
>>>
>>> Would you mind trying this again and see if you are good?
>>>
>>> On Tue, Jun 14, 2016 at 5:31 PM, Noah Watkins wrote:
>>>> Installing Jewel with ceph-deploy has been working for weeks. Today I started to get some dependency issues:
>>>>
>>>> [b61808c8624c][DEBUG ] The following packages have unmet dependencies:
>>>> [b61808c8624c][DEBUG ]  ceph : Depends: ceph-mon (= 10.2.1-1trusty) but it is not going to be installed
>>>> [b61808c8624c][DEBUG ]         Depends: ceph-osd (= 10.2.1-1trusty) but it is not going to be installed
>>>> [b61808c8624c][DEBUG ]  ceph-mds : Depends: ceph-base (= 10.2.1-1trusty) but it is not going to be installed
>>>> [b61808c8624c][WARNIN] E: Unable to correct problems, you have held broken packages.
>>>> [b61808c8624c][ERROR ] RuntimeError: command returned non-zero exit status: 100
>>>> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get --assume-yes -q --no-install-recommends install -o Dpkg::Options::=--force-confnew ceph ceph-mds radosgw
>>>>
>>>> Seems to be an issue with 10.2.1 vs 10.2.2?
>>
>> Bah, it looks like this is still an issue even right now.
>>
>> I will update once I know what is going on.
>>
>>>> root@b61808c8624c:/ceph-deploy# apt-get install ceph-mon ceph-base
>>>> Reading package lists... Done
>>>> Building dependency tree
>>>> Reading state information... Done
>>>> Some packages could not be installed. This may mean that you have requested an impossible situation or, if you are using the unstable distribution, that some required packages have not yet been created or been moved out of Incoming.
>>>> The following information may help to resolve the situation:
>>>>
>>>> The following packages have unmet dependencies:
>>>>  ceph-mon : Depends: ceph-base (= 10.2.1-1trusty) but 10.2.2-1trusty is to be installed
>>>> E: Unable to correct problems, you have held broken packages.
[ceph-users] striping for a small cluster
We have a small cluster: 3 mons, each of which also has 6 4TB OSDs, and a 20gig link to the cluster (2x10gig LACP to a stacked pair of switches). We'll have at least one replicated pool (size=3) and one erasure-coded pool.

Current plan is to have journals coexist with OSDs, as that seems to be the safest and most economical.

What levels of striping would you recommend for this size cluster? Any other optimization considerations? Looking for a starting point to work from.

Also, any recommendations for testing / benchmarking these configurations?

So far, looking at
https://www.sebastien-han.fr/blog/2012/08/26/ceph-benchmarks/
bsd rebuilding itself, and maybe phoronix.
[ceph-users] Spreading deep-scrubbing load
I’ve just started looking into one of our ceph clusters because a weekly deep scrub had a major IO impact on the cluster, which caused multiple VMs to grind to a halt.

So far I’ve discovered that this particular cluster is configured incorrectly for the number of PGs per OSD. Currently that setting is 6 but should be closer to ~4096 based on the calc tool.

If I change the number of PGs to the suggested values, what should I expect, especially around deep scrub performance but also just in general, as I’m very new to ceph? What I’m hoping will happen is that instead of a single weekly deep scrub that runs for 24+ hours, we would have lots of smaller deep scrubs that can hopefully finish in a reasonable time with minimal cluster impact.

Thanks.
Re: [ceph-users] ceph-deploy jewel install dependencies
Yeh, I'm still seeing the problem too. Thanks.

On Tue, Jun 14, 2016 at 2:55 PM Alfredo Deza wrote:
>
> On Tue, Jun 14, 2016 at 5:52 PM, Alfredo Deza wrote:
>> Is it possible you tried to install just when I was syncing 10.2.2?
>>
>> :)
>>
>> Would you mind trying this again and see if you are good?
>>
>> On Tue, Jun 14, 2016 at 5:31 PM, Noah Watkins wrote:
>>> Installing Jewel with ceph-deploy has been working for weeks. Today I started to get some dependency issues:
>>>
>>> [b61808c8624c][DEBUG ] The following packages have unmet dependencies:
>>> [b61808c8624c][DEBUG ]  ceph : Depends: ceph-mon (= 10.2.1-1trusty) but it is not going to be installed
>>> [b61808c8624c][DEBUG ]         Depends: ceph-osd (= 10.2.1-1trusty) but it is not going to be installed
>>> [b61808c8624c][DEBUG ]  ceph-mds : Depends: ceph-base (= 10.2.1-1trusty) but it is not going to be installed
>>> [b61808c8624c][WARNIN] E: Unable to correct problems, you have held broken packages.
>>> [b61808c8624c][ERROR ] RuntimeError: command returned non-zero exit status: 100
>>> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get --assume-yes -q --no-install-recommends install -o Dpkg::Options::=--force-confnew ceph ceph-mds radosgw
>>>
>>> Seems to be an issue with 10.2.1 vs 10.2.2?
>
> Bah, it looks like this is still an issue even right now.
>
> I will update once I know what is going on.
>
>>> root@b61808c8624c:/ceph-deploy# apt-get install ceph-mon ceph-base
>>> Reading package lists... Done
>>> Building dependency tree
>>> Reading state information... Done
>>> Some packages could not be installed. This may mean that you have requested an impossible situation or, if you are using the unstable distribution, that some required packages have not yet been created or been moved out of Incoming.
>>> The following information may help to resolve the situation:
>>>
>>> The following packages have unmet dependencies:
>>>  ceph-mon : Depends: ceph-base (= 10.2.1-1trusty) but 10.2.2-1trusty is to be installed
>>> E: Unable to correct problems, you have held broken packages.
Re: [ceph-users] ceph-deploy jewel install dependencies
On Tue, Jun 14, 2016 at 5:52 PM, Alfredo Deza wrote:
> Is it possible you tried to install just when I was syncing 10.2.2?
>
> :)
>
> Would you mind trying this again and see if you are good?
>
> On Tue, Jun 14, 2016 at 5:31 PM, Noah Watkins wrote:
>> Installing Jewel with ceph-deploy has been working for weeks. Today I started to get some dependency issues:
>>
>> [b61808c8624c][DEBUG ] The following packages have unmet dependencies:
>> [b61808c8624c][DEBUG ]  ceph : Depends: ceph-mon (= 10.2.1-1trusty) but it is not going to be installed
>> [b61808c8624c][DEBUG ]         Depends: ceph-osd (= 10.2.1-1trusty) but it is not going to be installed
>> [b61808c8624c][DEBUG ]  ceph-mds : Depends: ceph-base (= 10.2.1-1trusty) but it is not going to be installed
>> [b61808c8624c][WARNIN] E: Unable to correct problems, you have held broken packages.
>> [b61808c8624c][ERROR ] RuntimeError: command returned non-zero exit status: 100
>> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get --assume-yes -q --no-install-recommends install -o Dpkg::Options::=--force-confnew ceph ceph-mds radosgw
>>
>> Seems to be an issue with 10.2.1 vs 10.2.2?

Bah, it looks like this is still an issue even right now.

I will update once I know what is going on.

>> root@b61808c8624c:/ceph-deploy# apt-get install ceph-mon ceph-base
>> Reading package lists... Done
>> Building dependency tree
>> Reading state information... Done
>> Some packages could not be installed. This may mean that you have requested an impossible situation or, if you are using the unstable distribution, that some required packages have not yet been created or been moved out of Incoming.
>> The following information may help to resolve the situation:
>>
>> The following packages have unmet dependencies:
>>  ceph-mon : Depends: ceph-base (= 10.2.1-1trusty) but 10.2.2-1trusty is to be installed
>> E: Unable to correct problems, you have held broken packages.
Re: [ceph-users] ceph-deploy jewel install dependencies
Is it possible you tried to install just when I was syncing 10.2.2?

:)

Would you mind trying this again and see if you are good?

On Tue, Jun 14, 2016 at 5:31 PM, Noah Watkins wrote:
> Installing Jewel with ceph-deploy has been working for weeks. Today I started to get some dependency issues:
>
> [b61808c8624c][DEBUG ] The following packages have unmet dependencies:
> [b61808c8624c][DEBUG ]  ceph : Depends: ceph-mon (= 10.2.1-1trusty) but it is not going to be installed
> [b61808c8624c][DEBUG ]         Depends: ceph-osd (= 10.2.1-1trusty) but it is not going to be installed
> [b61808c8624c][DEBUG ]  ceph-mds : Depends: ceph-base (= 10.2.1-1trusty) but it is not going to be installed
> [b61808c8624c][WARNIN] E: Unable to correct problems, you have held broken packages.
> [b61808c8624c][ERROR ] RuntimeError: command returned non-zero exit status: 100
> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get --assume-yes -q --no-install-recommends install -o Dpkg::Options::=--force-confnew ceph ceph-mds radosgw
>
> Seems to be an issue with 10.2.1 vs 10.2.2?
>
> root@b61808c8624c:/ceph-deploy# apt-get install ceph-mon ceph-base
> Reading package lists... Done
> Building dependency tree
> Reading state information... Done
> Some packages could not be installed. This may mean that you have requested an impossible situation or, if you are using the unstable distribution, that some required packages have not yet been created or been moved out of Incoming.
> The following information may help to resolve the situation:
>
> The following packages have unmet dependencies:
>  ceph-mon : Depends: ceph-base (= 10.2.1-1trusty) but 10.2.2-1trusty is to be installed
> E: Unable to correct problems, you have held broken packages.
Re: [ceph-users] cephfs reporting 2x data available
Thanks John, I just wanted to make sure I wasn't doing anything wrong, that should work fine. Dan On 06/14/2016 03:24 PM, John Spray wrote: On Tue, Jun 14, 2016 at 7:45 PM, Daniel Davidson wrote: I have just deployed a cluster and started messing with it, with, I think, two replicas. However when I have a metadata server and mount via fuse, it is reporting its full size. With two replicas, I thought it would be only reporting half of that. Did I make a mistake, or is there something I can change to get around that? It reports the overall (raw) free space available on the cluster, i.e. not accounting for replication. I'm assuming that by "it is reporting" you mean that "df" is reporting this on your ceph-fuse mount. Because the replica count is a per-pool thing, and a filesystem can use multiple pools with different replica counts (via files having different layouts), giving the raw free space is the most consistent thing we can do. If you want to see a smarter view of available space, use "ceph df", which gives you a pool breakdown and an "available" size that takes account of replication. John How do you check that your replicas are actually set correctly? It is set in my ceph.conf file, but I am guessing there is someplace else I should look at. Dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs
But is there any way to recreate the bucket index for an existing bucket? Is it possible to change the bucket's index pool to some new pool in its metadata and then tell RadosGW to rebuild the index (--check --fix)? Sounds really crazy, but will it work? Will the new index become sharded? 2016-06-14 13:18 GMT+03:00 Ansgar Jazdzewski : > Hi, > > your cluster will be in a warning state if you disable scrubbing, and > you really need it in case of some data loss > > cheers, > Ansgar > > 2016-06-14 11:05 GMT+02:00 Wido den Hollander : >> >>> On 14 June 2016 at 11:00, Василий Ангапов wrote: >>> >>> >>> Is it a good idea to disable scrub and deep-scrub for the bucket.index >>> pool? What negative consequences might it cause? >>> >> >> No, I would not do that. Scrubbing is essential to detect (silent) data >> corruption. >> >> You should really scrub all your data. >> >>> 2016-06-14 11:51 GMT+03:00 Wido den Hollander : >>> > >>> >> On 14 June 2016 at 10:10, Ansgar Jazdzewski >>> >> wrote: >>> >> >>> >> >>> >> Hi, >>> >> >>> >> we are using ceph and radosGW to store images (~300kb each) in S3; >>> >> when it comes to deep-scrubbing we are facing task timeouts (> 30s ...) >>> >> >>> >> my question is: >>> >> >>> >> with that number of objects/files, is it better to calculate the >>> >> PGs on an object basis instead of the volume size? And how should it be >>> >> done? >>> >> >>> > >>> > Do you have bucket sharding enabled? >>> > >>> > And how many objects do you have in a single bucket? >>> > >>> > If sharding is not enabled for the bucket index you might have large >>> > RADOS objects with bucket indexes which are hard to scrub. 
>>> > >>> > Wido >>> > >>> >> thanks >>> >> Ansgar >>> >> ___ >>> >> ceph-users mailing list >>> >> ceph-users@lists.ceph.com >>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> > ___ >>> > ceph-users mailing list >>> > ceph-users@lists.ceph.com >>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
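For context on the sharding Wido mentions: from Hammer onward radosgw has a config option that shards the index of newly created buckets; it does not retroactively shard existing ones, which is exactly why the rebuild question above is hard. A hedged sketch of the setting (the client section name is hypothetical and depends on how your rgw instance is named):

```ini
# ceph.conf on the radosgw hosts -- only affects buckets created afterwards
[client.radosgw.gateway]
; hypothetical instance name -- use your own rgw section
rgw override bucket index max shards = 16
```

With ~40 million objects in a single bucket, 16 shards would put each index object around 2.5 million omap entries instead of one 40-million-entry object, which is far friendlier to deep-scrub.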
[ceph-users] Protecting rbd from multiple simultaneous mapping.
The email thread here : http://www.spinics.net/lists/ceph-devel/msg12226.html discusses a way of preventing multiple simultaneous clients from mapping an rbd via the legacy advisory locking scheme, along with osd blacklisting. Is it now advisable to use the exclusive lock feature, discussed here : http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-September/004857.html ? In other words, does the exclusive lock feature automatically break the lock of any older lock holders and prevent any writes to the rbd from the older holder ? Another way to frame my question would be : what is the recommended way of preventing multiple simultaneous rbd mappings, based on the state-of-the-art in ceph? thanks in advance, - Puneet ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph-deploy jewel install dependencies
Installing Jewel with ceph-deploy has been working for weeks. Today I started to get some dependency issues: [b61808c8624c][DEBUG ] The following packages have unmet dependencies: [b61808c8624c][DEBUG ] ceph : Depends: ceph-mon (= 10.2.1-1trusty) but it is not going to be installed [b61808c8624c][DEBUG ] Depends: ceph-osd (= 10.2.1-1trusty) but it is not going to be installed [b61808c8624c][DEBUG ] ceph-mds : Depends: ceph-base (= 10.2.1-1trusty) but it is not going to be installed [b61808c8624c][WARNIN] E: Unable to correct problems, you have held broken packages. [b61808c8624c][ERROR ] RuntimeError: command returned non-zero exit status: 100 [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get --assume-yes -q --no-install-recommends install -o Dpkg::Options::=--force-confnew ceph ceph-mds radosgw Seems to be an issue with 10.2.1 vs 10.2.2? root@b61808c8624c:/ceph-deploy# apt-get install ceph-mon ceph-base Reading package lists... Done Building dependency tree Reading state information... Done Some packages could not be installed. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming. The following information may help to resolve the situation: The following packages have unmet dependencies: ceph-mon : Depends: ceph-base (= 10.2.1-1trusty) but 10.2.2-1trusty is to be installed E: Unable to correct problems, you have held broken packages. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] cephfs reporting 2x data available
On Tue, Jun 14, 2016 at 7:45 PM, Daniel Davidson wrote: > I have just deployed a cluster and started messing with it, with, I think, > two replicas. However when I have a metadata server and mount via fuse, it > is reporting its full size. With two replicas, I thought it would be only > reporting half of that. Did I make a mistake, or is there something I can > change to get around that? It reports the overall (raw) free space available on the cluster, i.e. not accounting for replication. I'm assuming that by "it is reporting" you mean that "df" is reporting this on your ceph-fuse mount. Because the replica count is a per-pool thing, and a filesystem can use multiple pools with different replica counts (via files having different layouts), giving the raw free space is the most consistent thing we can do. If you want to see a smarter view of available space, use "ceph df", which gives you a pool breakdown and an "available" size that takes account of replication. John > > How do you check that your replicas are actually set correctly? It is set in > my ceph.conf file, but I am guessing there is someplace else I should look > at. > > Dan > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
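John's point above can be made concrete with a small sketch: raw free space (what df shows) has to be divided by each pool's replica count to estimate usable space, and because pools can have different counts there is no single right divisor. This is an illustrative model, not Ceph code:

```python
def usable_free(raw_free_bytes, pool_sizes):
    """Estimate usable free space per pool from the cluster's raw free
    space, by dividing by each pool's replica count ("size").

    An upper bound only: all pools draw from the same raw capacity,
    which is why ceph-fuse's df reports the raw figure instead.
    """
    return {pool: raw_free_bytes // size for pool, size in pool_sizes.items()}

# 4 TiB raw free: a size-2 data pool can hold at most 2 TiB of user data
est = usable_free(4 * 1024**4, {"cephfs_data": 2, "cephfs_metadata": 3})
assert est["cephfs_data"] == 2 * 1024**4
```

"ceph df" applies the same idea per pool, which is why its available figures differ from df on the mount.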
Re: [ceph-users] Clearing Incomplete Clones State
Hi, Additional information. It seems that the snapshot state is wrong. Any ideas on my case? How can I manually edit the pool flags to remove the "incomplete_clones" flag? [root@management-b ~]# rados -p rbd ls rbd_directory [root@management-b ~]# rados -p rbd_cache ls rbd_directory [root@management-b ~]# rados -p rbd lssnap 0 snaps [root@management-b ~]# rados -p rbd_cache lssnap 0 snaps Best regards, On Tue, Jun 14, 2016 at 4:55 AM, Lazuardi Nasution wrote: > Hi, > > I have removed cache tiering due to a "missing hit_sets" warning. After > removing it, I want to try to add tiering again with the same cache pool and > storage pool, but I can't, even though the cache pool is empty or forced to be cleared. > The following is some output. How can I deal with this? Is it possible to clear > "incomplete_clones" and the "snapshot state"? How do I keep the "missing hit_sets" > warning from appearing again? > > [root@management-b ~]# ceph osd tier add rbd rbd_cache > Error ENOTEMPTY: tier pool 'rbd_cache' is not empty; --force-nonempty to > force > [root@management-b ~]# ceph osd tier add rbd rbd_cache --force-nonempty > Error ENOTEMPTY: tier pool 'rbd_cache' has snapshot state; it cannot be > added as a tier without breaking the pool > [root@management-b ~]# rados -p rbd_cache ls > rbd_directory > [root@management-b ~]# rados -p rbd lssnap > 0 snaps > [root@management-b ~]# ceph osd dump | grep rbd_cache > pool 6 'rbd_cache' replicated size 3 min_size 1 crush_ruleset 1 > object_hash rjenkins pg_num 128 pgp_num 128 last_change 8090 flags > hashpspool,incomplete_clones stripe_width 0 > [root@management-b ~]# ceph -v > ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd) > [root@management-b ~]# > > Best regards, > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph and Openstack
On Tue, Jun 14, 2016 at 05:48:11PM +0200, Iban Cabrillo wrote: :Hi Jon, : Which hypervisor is used for your Openstack deployment? We had lots :of trouble with Xen until the latest libvirt ( in the libvirt < 1.3.2 package, the RBD :driver was not supported ) we're using kvm (Ubuntu 14.04, libvirt 1.2.12 ) -Jon : :Regards, I : :2016-06-14 17:38 GMT+02:00 Jonathan D. Proulx : : :> On Tue, Jun 14, 2016 at 02:15:45PM +0200, Fran Barrera wrote: :> :Hi all, :> : :> :I have a problem integrating Glance with Ceph. :> : :> :Openstack Mitaka :> :Ceph Jewel :> : :> :I've followed the Ceph doc ( :> :http://docs.ceph.com/docs/jewel/rbd/rbd-openstack/) but when I try to :> list :> :or create images, I have an error "Unable to establish connection to :> :http://IP:9292/v2/images";, and in the debug mode I can see this: :> :> This suggests that the Glance API service isn't running properly :> and probably isn't related to the rbd backend. :> :> You should be able to connect to the glance API endpoint even if the :> ceph config is wrong (though you'd probably get 'internal server :> errors' if the storage backend isn't set up correctly). :> :> In either case you'll probably get a better response on the openstack :> lists, but my suggestion would be to try the regular file backend to :> verify your glance setup is working, then switch to the rbd backend. :> :> -Jon :> :> : :> :2016-06-14 14:02:54.634 2256 DEBUG glance_store.capabilities [-] Store :> :glance_store._drivers.rbd.Store doesn't support updating dynamic storage :> :capabilities. Please overwrite 'update_capabilities' method of the store :> to :> :implement updating logics if needed. update_capabilities :> :/usr/lib/python2.7/dist-packages/glance_store/capabilities.py:98 :> : :> :I've also tried to remove the database and populate it again, but I get the same :> :error. :> :Cinder with Ceph works correctly. :> : :> :Any suggestions? :> : :> :Thanks, :> :Fran. 
:> :> :___ :> :ceph-users mailing list :> :ceph-users@lists.ceph.com :> :http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com :> :> :> -- :> ___ :> ceph-users mailing list :> ceph-users@lists.ceph.com :> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com :> : : : :-- : :Iban Cabrillo Bartolome :Instituto de Fisica de Cantabria (IFCA) :Santander, Spain :Tel: +34942200969 :PGP PUBLIC KEY: :http://pgp.mit.edu/pks/lookup?op=get&search=0xD9DF0B3D6C8C08AC : :Bertrand Russell: :*"El problema con el mundo es que los estúpidos están seguros de todo y los :inteligentes están llenos de dudas*" -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] which CentOS 7 kernel is compatible with jewel?
On Mon, Jun 13, 2016 at 8:37 PM, Michael Kuriger wrote: > I just realized that this issue is probably because I’m running jewel 10.2.1 > on the servers side, but accessing from a client running hammer 0.94.7 or > infernalis 9.2.1 > > Here is what happens if I run rbd ls from a client on infernalis. I was > testing this access since we weren’t planning on building rpms for Jewel on > CentOS 6 > > $ rbd ls > 2016-06-13 11:24:06.881591 7fe61e568700 0 -- :/3877046932 >> > 10.1.77.165:6789/0 pipe(0x562ed3ea7550 sd=3 :0 s=1 pgs=0 cs=0 l=1 > c=0x562ed3ea0ac0).fault > 2016-06-13 11:24:09.882051 7fe61137f700 0 -- :/3877046932 >> > 10.1.78.75:6789/0 pipe(0x7fe608000c00 sd=3 :0 s=1 pgs=0 cs=0 l=1 > c=0x7fe608004ef0).fault > 2016-06-13 11:24:12.882389 7fe61e568700 0 -- :/3877046932 >> > 10.1.77.165:6789/0 pipe(0x7fe608008350 sd=4 :0 s=1 pgs=0 cs=0 l=1 > c=0x7fe60800c5f0).fault > 2016-06-13 11:24:18.883642 7fe61e568700 0 -- :/3877046932 >> > 10.1.77.165:6789/0 pipe(0x7fe608008350 sd=3 :0 s=1 pgs=0 cs=0 l=1 > c=0x7fe6080078e0).fault > 2016-06-13 11:24:21.884259 7fe61137f700 0 -- :/3877046932 >> > 10.1.78.75:6789/0 pipe(0x7fe608000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 > c=0x7fe608007110).fault Accessing jewel with older clients should work as long as you don't enable jewel tunables and such; the same goes for older kernels. Can you do rbd --debug-ms=20 ls and attach the output? Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] cephfs reporting 2x data available
I have just deployed a cluster and started messing with it, with, I think, two replicas. However when I have a metadata server and mount via fuse, it is reporting its full size. With two replicas, I thought it would be only reporting half of that. Did I make a mistake, or is there something I can change to get around that? How do you check that your replicas are actually set correctly? It is set in my ceph.conf file, but I am guessing there is someplace else I should look at. Dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] librados and multithreading
Thank you, Jason. 2016-06-14 18:43 GMT+03:00 Jason Dillaman : > On Fri, Jun 10, 2016 at 12:37 PM, Юрий Соколов wrote: >> Good day, all. >> >> I found this issue: https://github.com/ceph/ceph/pull/5991 >> >> Did this issue affect librados? > > No -- this affected the start-up and shut-down of librbd as described > in the associated tracker ticket. > >> Was it safe to use a single rados_ioctx_t from multiple threads before this >> fix? > > Yes. > >> >> -- >> With regards, >> Sokolov Yura aka funny_falcon >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > -- > Jason -- With regards, Sokolov Yura aka funny_falcon ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph and Openstack
Hi Jon, Which hypervisor is used for your Openstack deployment? We had lots of trouble with Xen until the latest libvirt ( in the libvirt < 1.3.2 package, the RBD driver was not supported ) Regards, I 2016-06-14 17:38 GMT+02:00 Jonathan D. Proulx : > On Tue, Jun 14, 2016 at 02:15:45PM +0200, Fran Barrera wrote: > :Hi all, > : > :I have a problem integrating Glance with Ceph. > : > :Openstack Mitaka > :Ceph Jewel > : > :I've followed the Ceph doc ( > :http://docs.ceph.com/docs/jewel/rbd/rbd-openstack/) but when I try to > list > :or create images, I have an error "Unable to establish connection to > :http://IP:9292/v2/images";, and in the debug mode I can see this: > > This suggests that the Glance API service isn't running properly > and probably isn't related to the rbd backend. > > You should be able to connect to the glance API endpoint even if the > ceph config is wrong (though you'd probably get 'internal server > errors' if the storage backend isn't set up correctly). > > In either case you'll probably get a better response on the openstack > lists, but my suggestion would be to try the regular file backend to > verify your glance setup is working, then switch to the rbd backend. > > -Jon > > : > :2016-06-14 14:02:54.634 2256 DEBUG glance_store.capabilities [-] Store > :glance_store._drivers.rbd.Store doesn't support updating dynamic storage > :capabilities. Please overwrite 'update_capabilities' method of the store > to > :implement updating logics if needed. update_capabilities > :/usr/lib/python2.7/dist-packages/glance_store/capabilities.py:98 > : > :I've also tried to remove the database and populate it again, but I get the same > :error. > :Cinder with Ceph works correctly. > : > :Any suggestions? > : > :Thanks, > :Fran. 
> > :___ > :ceph-users mailing list > :ceph-users@lists.ceph.com > :http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > -- > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > -- Iban Cabrillo Bartolome Instituto de Fisica de Cantabria (IFCA) Santander, Spain Tel: +34942200969 PGP PUBLIC KEY: http://pgp.mit.edu/pks/lookup?op=get&search=0xD9DF0B3D6C8C08AC Bertrand Russell: *"El problema con el mundo es que los estúpidos están seguros de todo y los inteligentes están llenos de dudas*" ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] librados and multithreading
On Fri, Jun 10, 2016 at 12:37 PM, Юрий Соколов wrote: > Good day, all. > > I found this issue: https://github.com/ceph/ceph/pull/5991 > > Did this issue affect librados? No -- this affected the start-up and shut-down of librbd as described in the associated tracker ticket. > Was it safe to use a single rados_ioctx_t from multiple threads before this > fix? Yes. > > -- > With regards, > Sokolov Yura aka funny_falcon > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
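Jason's "yes" above -- one rados_ioctx_t shared by many threads -- can be sketched without a live cluster by standing in a fake ioctx for the real one. `FakeIoctx` here is a mock, not the librados API (though the Python binding's ioctx does expose a similar `write_full` call); the structure is the point: one shared handle, many writer threads, no locking in the callers.

```python
import threading

class FakeIoctx:
    """Stand-in for a rados ioctx; like librados, it synchronizes
    internally so callers may share one handle across threads."""
    def __init__(self):
        self._lock = threading.Lock()
        self._objects = {}

    def write_full(self, name, data):
        with self._lock:  # internal locking, invisible to callers
            self._objects[name] = data

ioctx = FakeIoctx()  # ONE handle shared by all threads, as discussed above
workers = [threading.Thread(target=ioctx.write_full,
                            args=("obj-%d" % i, b"payload"))
           for i in range(32)]
for t in workers:
    t.start()
for t in workers:
    t.join()
assert len(ioctx._objects) == 32  # every write landed; none lost
```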
Re: [ceph-users] Ceph and Openstack
On Tue, Jun 14, 2016 at 02:15:45PM +0200, Fran Barrera wrote: :Hi all, : :I have a problem integrating Glance with Ceph. : :Openstack Mitaka :Ceph Jewel : :I've followed the Ceph doc ( :http://docs.ceph.com/docs/jewel/rbd/rbd-openstack/) but when I try to list :or create images, I have an error "Unable to establish connection to :http://IP:9292/v2/images", and in the debug mode I can see this: This suggests that the Glance API service isn't running properly and probably isn't related to the rbd backend. You should be able to connect to the glance API endpoint even if the ceph config is wrong (though you'd probably get 'internal server errors' if the storage backend isn't set up correctly). In either case you'll probably get a better response on the openstack lists, but my suggestion would be to try the regular file backend to verify your glance setup is working, then switch to the rbd backend. -Jon : :2016-06-14 14:02:54.634 2256 DEBUG glance_store.capabilities [-] Store :glance_store._drivers.rbd.Store doesn't support updating dynamic storage :capabilities. Please overwrite 'update_capabilities' method of the store to :implement updating logics if needed. update_capabilities :/usr/lib/python2.7/dist-packages/glance_store/capabilities.py:98 : :I've also tried to remove the database and populate it again, but I get the same :error. :Cinder with Ceph works correctly. : :Any suggestions? : :Thanks, :Fran. :___ :ceph-users mailing list :ceph-users@lists.ceph.com :http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
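Jon's first diagnostic step -- check whether the Glance endpoint answers at all, independent of the rbd backend -- can be scripted. A sketch; the local test server here merely stands in for the real http://IP:9292 endpoint so the example is self-contained:

```python
import http.server
import threading
import urllib.error
import urllib.request

def endpoint_alive(url, timeout=2.0):
    """True if something answers HTTP at url (even with an error status);
    False if nothing is listening -- Fran's 'unable to establish
    connection' error corresponds to the False branch."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # server answered, just unhappy (e.g. 401/500)
    except OSError:
        return False  # connection refused / timed out: service not running

# local stand-in for the Glance API endpoint
srv = http.server.HTTPServer(("127.0.0.1", 0),
                             http.server.SimpleHTTPRequestHandler)
threading.Thread(target=srv.serve_forever, daemon=True).start()
port = srv.server_address[1]
assert endpoint_alive("http://127.0.0.1:%d/" % port)
srv.shutdown()
```

If this probe fails against the real endpoint, the problem is the Glance service itself, not the Ceph backend.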
Re: [ceph-users] RGW: ERROR: failed to distribute cache
BTW, I have 10 RGW load balanced through Apache. When restarting one of them I get the following messages in log: 2016-06-14 14:44:15.919801 7fd4728dea40 2 all 8 watchers are set, enabling cache 2016-06-14 14:44:15.919879 7fce370f7700 2 garbage collection: start 2016-06-14 14:44:15.919990 7fce368f6700 2 object expiration: start 2016-06-14 14:44:15.920534 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.15 2016-06-14 14:44:15.921257 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.16 2016-06-14 14:44:15.922145 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.17 2016-06-14 14:44:15.923772 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.18 2016-06-14 14:44:15.924557 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.19 2016-06-14 14:44:15.925400 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.20 2016-06-14 14:44:15.926349 7fd4728dea40 0 starting handler: fastcgi 2016-06-14 14:44:15.927125 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.21 2016-06-14 14:44:15.927897 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.22 2016-06-14 14:44:15.928412 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.23 2016-06-14 14:44:15.929042 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.24 2016-06-14 14:44:15.930752 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.25 2016-06-14 14:44:15.931313 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.26 2016-06-14 14:44:15.932482 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.27 2016-06-14 14:44:15.933237 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.28 2016-06-14 14:44:15.934097 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.29 2016-06-14 14:44:15.934660 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.30 2016-06-14 14:44:15.936322 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.31 2016-06-14 14:44:15.936979 7fce370f7700 0 
RGWGC::process() failed to acquire lock on gc.0 2016-06-14 14:44:15.937559 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.1 2016-06-14 14:44:15.938222 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.2 2016-06-14 14:44:15.939000 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.3 2016-06-14 14:44:15.939622 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.4 2016-06-14 14:44:15.940135 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.5 2016-06-14 14:44:15.940669 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.6 2016-06-14 14:44:15.941227 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.7 2016-06-14 14:44:15.941854 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.8 2016-06-14 14:44:15.942333 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.9 2016-06-14 14:44:15.943036 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.10 2016-06-14 14:44:15.944708 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.11 2016-06-14 14:44:15.946347 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.12 2016-06-14 14:44:15.947001 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.13 2016-06-14 14:44:15.947610 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.14 2016-06-14 14:44:15.947615 7fce370f7700 2 garbage collection: stop 2016-06-14 14:44:15.947949 7fd4728dea40 -1 rgw realm watcher: Failed to watch realms.87abf44e-cab3-48c4-b012-0a9247519a5b.control with (2) No such file or directory 2016-06-14 14:44:15.948370 7fd4728dea40 -1 rgw realm watcher: Failed to establish a watch on RGWRealm, disabling dynamic reconfiguration. 
2016-06-14 17:34 GMT+03:00 Василий Ангапов : > I also get the following: > > $ radosgw-admin period update --commit > 2016-06-14 14:32:28.982847 7fed392baa40 0 ERROR: failed to distribute > cache for .rgw.root:periods.87abf44e-cab3-48c4-b012-0a9247519a5b:staging > 2016-06-14 14:32:38.991846 7fed392baa40 0 ERROR: failed to distribute > cache for > .rgw.root:periods.87abf44e-cab3-48c4-b012-0a9247519a5b:staging.latest_epoch > 2016-06-14 14:32:49.002380 7fed392baa40 0 ERROR: failed to distribute > cache for .rgw.root:periods.af0b6743-82ba-4517-bd51-36bdfbe48f9f.3 > 2016-06-14 14:32:59.013307 7fed392baa40 0 ERROR: failed to distribute > cache for .rgw.root:periods.af0b6743-82ba-4517-bd51-36bdfbe48f9f.latest_epoch > 2016-06-14 14:33:09.023554 7fed392baa40 0 ERROR: failed to distribute > cache for .rgw.root:periods.af0b6743-82ba-4517-bd51-36bdfbe48f9f.latest_epoch > 2016-06-14 14:33:19.034593 7fed392baa40 0 ERROR: failed to distribute > cache for .rgw.root:zonegroup_info.bef0aa4e-6670-4c39-8520-ee51140424cc > 2016-06-14 14:33:29.043825 7fed392baa40 0 ERROR: failed to distribute > cache for .rgw.root:zonegroups_names.ed > 2016-06-14 14:33:29.046386 7fed392baa40 0 Realm notify failed with -2 > { > "id": "af0b6743-82ba-4517-bd51-36bdfbe48f9f", > "epoch": 3, >
Re: [ceph-users] librados and multithreading
Come on, friends, does no one know the answer? On 12 June 2016 at 16:21, "Юрий Соколов" wrote: > I don't know. That is why I'm asking here. > > 2016-06-12 6:36 GMT+03:00 Ken Peng : > > Hi, > > > > We experienced a similar error: when writing to an RBD block device with > > multiple threads using fio, some OSDs got errors and went down. > > Are we talking about the same thing? > > > > 2016-06-11 0:37 GMT+08:00 Юрий Соколов : > >> > >> Good day, all. > >> > >> I found this issue: https://github.com/ceph/ceph/pull/5991 > >> > >> Did this issue affect librados? > >> Was it safe to use a single rados_ioctx_t from multiple threads before > this > >> fix? > >> > >> -- > >> With regards, > >> Sokolov Yura aka funny_falcon > >> ___ > >> ceph-users mailing list > >> ceph-users@lists.ceph.com > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > > > > -- > With regards, > Sokolov Yura aka funny_falcon > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RGW: ERROR: failed to distribute cache
I also get the following: $ radosgw-admin period update --commit 2016-06-14 14:32:28.982847 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:periods.87abf44e-cab3-48c4-b012-0a9247519a5b:staging 2016-06-14 14:32:38.991846 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:periods.87abf44e-cab3-48c4-b012-0a9247519a5b:staging.latest_epoch 2016-06-14 14:32:49.002380 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:periods.af0b6743-82ba-4517-bd51-36bdfbe48f9f.3 2016-06-14 14:32:59.013307 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:periods.af0b6743-82ba-4517-bd51-36bdfbe48f9f.latest_epoch 2016-06-14 14:33:09.023554 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:periods.af0b6743-82ba-4517-bd51-36bdfbe48f9f.latest_epoch 2016-06-14 14:33:19.034593 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:zonegroup_info.bef0aa4e-6670-4c39-8520-ee51140424cc 2016-06-14 14:33:29.043825 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:zonegroups_names.ed 2016-06-14 14:33:29.046386 7fed392baa40 0 Realm notify failed with -2 { "id": "af0b6743-82ba-4517-bd51-36bdfbe48f9f", "epoch": 3, "predecessor_uuid": "f2645d83-b1b4-4045-bf26-2b762c71937b", "sync_status": [ "", "", 2016-06-14 17:12 GMT+03:00 Василий Ангапов : > Hello, > > I have Ceph 10.2.1 and when creating user in RGW I get the following error: > > $ radosgw-admin user create --uid=test --display-name="test" > 2016-06-14 14:07:32.332288 7f00a4487a40 0 ERROR: failed to distribute > cache for ed-1.rgw.meta:.meta:user:test:_dW3fzQ3UX222SWQvr3qeHYR:1 > 2016-06-14 14:07:42.338251 7f00a4487a40 0 ERROR: failed to distribute > cache for ed-1.rgw.users.uid:test > 2016-06-14 14:07:52.362768 7f00a4487a40 0 ERROR: failed to distribute > cache for ed-1.rgw.users.keys:3J7DOREPC0ZLVFTMIW75 > { > "user_id": "test", > "display_name": "test", > "email": "", > "suspended": 0, > "max_buckets": 1000, > "auid": 0, > "subusers": [], > "keys": [ > { > 
"user": "melesta", > "access_key": "***", > "secret_key": "***" > } > ], > "swift_keys": [], > "caps": [], > "op_mask": "read, write, delete", > "default_placement": "", > "placement_tags": [], > "bucket_quota": { > "enabled": false, > "max_size_kb": -1, > "max_objects": -1 > }, > "user_quota": { > "enabled": false, > "max_size_kb": -1, > "max_objects": -1 > }, > "temp_url_keys": [] > } > > What does it mean? Is something wrong? > > Thanks! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] RGW: ERROR: failed to distribute cache
Hello, I have Ceph 10.2.1 and when creating user in RGW I get the following error: $ radosgw-admin user create --uid=test --display-name="test" 2016-06-14 14:07:32.332288 7f00a4487a40 0 ERROR: failed to distribute cache for ed-1.rgw.meta:.meta:user:test:_dW3fzQ3UX222SWQvr3qeHYR:1 2016-06-14 14:07:42.338251 7f00a4487a40 0 ERROR: failed to distribute cache for ed-1.rgw.users.uid:test 2016-06-14 14:07:52.362768 7f00a4487a40 0 ERROR: failed to distribute cache for ed-1.rgw.users.keys:3J7DOREPC0ZLVFTMIW75 { "user_id": "test", "display_name": "test", "email": "", "suspended": 0, "max_buckets": 1000, "auid": 0, "subusers": [], "keys": [ { "user": "melesta", "access_key": "***", "secret_key": "***" } ], "swift_keys": [], "caps": [], "op_mask": "read, write, delete", "default_placement": "", "placement_tags": [], "bucket_quota": { "enabled": false, "max_size_kb": -1, "max_objects": -1 }, "user_quota": { "enabled": false, "max_size_kb": -1, "max_objects": -1 }, "temp_url_keys": [] } What does it mean? Is something wrong? Thanks! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time
I am at the Ceph Day at CERN; I asked Sage whether it is supported to enable both the S3 and Swift APIs at the same time. The answer is yes, so it is meant to be supported, and what we see here is probably a bug. I opened a bug report: http://tracker.ceph.com/issues/16293 If anyone has a chance to test it on a Ceph version newer than Hammer, you can update the bug :) thank you Saverio 2016-05-12 15:49 GMT+02:00 Yehuda Sadeh-Weinraub : > On Thu, May 12, 2016 at 12:29 AM, Saverio Proto wrote: >>> While I'm usually not fond of blaming the client application, this is >>> really a swift command line tool issue. It tries to be smart by >>> comparing the md5sum of the object's content with the object's etag, >>> and it breaks with multipart objects. Multipart object etags are calculated >>> differently (an md5sum of the md5sums of each part). I think the swift >>> tool has special handling for swift large objects (which are not the >>> same as s3 multipart objects), so that's why it works in that specific >>> use case. >> >> Well, I also tried with rclone and I have the same issue. >> >> Clients I tried: >> rclone (both SWIFT and S3) >> s3cmd (S3) >> python-swiftclient (SWIFT). >> >> I can reproduce the issue with different clients. >> Once a multipart object is uploaded via S3 (with rclone or s3cmd) I >> cannot read it anymore via SWIFT (either with rclone or >> python-swiftclient). >> >> Are you saying that all SWIFT client implementations are wrong ? > > Yes. > >> >> Or should the radosgw be configured with only 1 API active ? >> >> Saverio ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
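Yehuda's point about why the client-side md5 check breaks can be shown in a few lines: for a multipart upload, S3 (and radosgw) report an ETag that is the md5 of the concatenated binary md5 digests of the parts, suffixed with the part count, so it can never equal the md5 of the whole object. A sketch (the part contents here are arbitrary):

```python
import hashlib

def s3_multipart_etag(parts):
    """ETag reported for an S3 multipart upload: md5 over the
    concatenation of each part's raw md5 digest, plus '-<part count>'."""
    digests = b"".join(hashlib.md5(p).digest() for p in parts)
    return "%s-%d" % (hashlib.md5(digests).hexdigest(), len(parts))

parts = [b"a" * 1024, b"b" * 1024]
whole_md5 = hashlib.md5(b"".join(parts)).hexdigest()

etag = s3_multipart_etag(parts)
assert etag.endswith("-2")              # trailing part count marks multipart
assert etag.split("-")[0] != whole_md5  # why naive md5-vs-etag checks fail
```

A client could detect the "-N" suffix and skip the md5 comparison, which is roughly the special handling Yehuda describes for the tools that do cope.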
Re: [ceph-users] "mount error 5 = Input/output error" with the CephFS file system from client node
On Tue, Jun 14, 2016 at 4:29 AM, Rakesh Parkiti wrote: > Hello, > > Unable to mount the CephFS file system from client node with "mount error 5 > = Input/output error" > MDS was installed on a separate node. Ceph Cluster health is OK and mds > services are running. firewall was disabled across all the nodes in a > cluster. > > -- Ceph Cluster Nodes (RHEL 7.2 version + Jewel version 10.2.1) > -- Client Nodes - Ubuntu 14.04 LTS > > Admin Node: > [root@Admin ceph]# ceph mds stat > e34: 0/0/1 up > > Client Side: > user@clientA2:/etc/ceph$ ceph fs ls --name client.admin > name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ] > > user@clientA2:/etc/ceph$ sudo mount -t ceph 10.10.100.5:6789:/user > /home/user/cephfs -o > name=admin,secret=AQAQK1NXgupKIRAA9O7fKxadI/iIq/vPKLI9rw== > mount error 5 = Input/output error > > Connection Establishment was successful to monitor node. > $tail -f /var/log/syslog > Jun 14 16:32:24 clientA2 kernel: [82270.155030] libceph: client134154 fsid > 66c5f31c-1756-47ce-889d-960e0d99f37a > Jun 14 16:32:24 clientA2 kernel: [82270.156726] libceph: mon0 > 10.10.100.5:6789 session established > > Able to check ceph health status from client node with client.admin > keyring.: > > user@clientA2:/etc/ceph$ ceph -s --name client.admin > cluster 66c5f31c-1756-47ce-889d-960e0d99f37a > health HEALTH_OK > monmap e6: 3 mons at > {siteAmon=10.10.100.5:6789/0,siteBmon=10.10.150.6:6789/0,siteCmon=10.10.200.7:6789/0} > election epoch 70, quorum 0,1,2 siteAmon,siteBmon,siteCmon > fsmap e34: 0/0/1 up > osdmap e1097: 19 osds: 19 up, 19 in > flags sortbitwise > pgmap v25719: 1286 pgs, 5 pools, 92160 kB data, 9 objects > 3998 MB used, 4704 GB / 4708 GB avail > 1286 active+clean According to this, you don't have an active MDS in your cluster. If it really is running, you'll need to figure out why it's not connecting. -Greg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph and Openstack
On Tue, Jun 14, 2016 at 8:15 AM, Fran Barrera wrote: > 2016-06-14 14:02:54.634 2256 DEBUG glance_store.capabilities [-] Store > glance_store._drivers.rbd.Store doesn't support updating dynamic storage > capabilities. Please overwrite 'update_capabilities' method of the store to > implement updating logics if needed. update_capabilities > /usr/lib/python2.7/dist-packages/glance_store/capabilities.py:98 I don't think that is anything to worry about -- it looks like a TODO comment [1]. In fact, it doesn't appear like any store drivers implement that method. [1] https://github.com/openstack/glance_store/blob/stable/mitaka/glance_store/capabilities.py#L94 -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] How to select particular OSD to act as primary OSD.
Thanks for the reply shylesh, but the procedure is not working. In ceph.com it is mentioned that we can make particular osd as a primary osd by setting primary affinity weightage between 0-1. But it is not working. On 14 Jun 2016 16:15, "shylesh kumar" wrote: > Hi, > > I think you can edit the crush rule something like below > > rule another_replicated_ruleset { > ruleset 1 > type replicated > min_size 1 > max_size 10 > step take default > step take osd1 > step choose firstn 1 type osd > step emit > step take osd2 > step choose firstn 1 type osd > step emit > step take osd5 > step choose firstn 1 type osd > step emit > step take osd4 > step choose firstn 1 type osd > step emit > } > > and create pool using this rule. > > It might work , though I am not 100% sure. > > Thanks, > Shylesh > > On Tue, Jun 14, 2016 at 4:05 PM, Kanchana. P > wrote: > >> Hi, >> >> How to select particular OSD to act as primary OSD. >> I modified the ceph.conf file and added >> [mon] >> ... >> mon osd allow primary affinity = true >> Restarted ceph target, now primary affinity is set to true in all monitor >> nodes. >> Using the below commands set some weights to the osds. >> >> $ ceph osd primary-affinity osd.1 0.25 >> $ ceph osd primary-affinity osd.6 0.50 >> $ ceph osd primary-affinity osd.11 0.75 >> $ ceph osd primary-affinity osd.16 1 >> >> Created a pool "poolA" and set a crush_ruleset so that it takes OSDs in >> order 16,11,6,1 >> Even after setting the primary affinity weight, it took osds in different >> order. >> Can we select the primary OSD, if so, how can we do that. Please let me >> know what I am missing here to set an OSD as a primary OSD. >> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> >> > > > -- > Thanks & Regards > Shylesh Kumar M > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
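One thing worth checking before concluding it does not work: primary affinity never reorders the set of OSDs CRUSH picks for a PG; it only biases which member of that acting set is made primary. A diagnostic sketch (pool and object names are placeholders) to see what CRUSH actually returned and which OSD got the primary marker:

```
# Map a test object: output like "up ([16,11,6], p16)" means OSDs 16,11,6
# hold the PG and osd.16 is the current primary
ceph osd map poolA testobject

# Verify the affinity values were actually accepted by the monitors
ceph osd dump | grep primary_affinity
```

If the acting set itself is in an unexpected order, that is a CRUSH rule question, not a primary-affinity one.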
Re: [ceph-users] Disk failures
Hi, bit rot is not "bit rot" per se - nothing is rotting on the drive platter. It occurs during reads (mostly, anyway), and it's random. You can happily read a block and get the correct data, then read it again and get garbage, then get correct data again. This could be caused by a worn-out cell on an SSD, but firmwares look for that and rewrite the cell if the signal is attenuated too much. On spinners there are no cells to refresh, so rewriting doesn't help. You can't really "look for" bit rot due to the reasons above; strong checksumming/hash verification during reads is the only solution. And trust me, bit rot is a very real thing and very dangerous as well - do you think companies like Seagate or WD would lie about bit rot if it's not real? I'd buy a drive with a BER of 10^999 over one with 10^14, wouldn't everyone? And it is especially dangerous when something like Ceph handles much larger blocks of data than the client does. While the client (or an app) has some knowledge of the data _and_ hopefully throws an error if it reads garbage, Ceph will (if for example snapshots are used and FIEMAP is off) actually have to read the whole object (say 4MiB) and write it elsewhere, without any knowledge of whether what it read (and wrote) made any sense to the app. This way corruption might spread silently into your backups if you don't validate the data somehow (or dump it from a database, for example, where it's likely to get detected). Btw, just because you think you haven't seen it doesn't mean you haven't seen it - never seen artefacting in movies? Just a random bug in the decoder, is it? The VoD guys would tell you... For things like databases this is somewhat less impactful - bit rot doesn't "flip a bit" but affects larger blocks of data (like one sector), so databases usually catch this during read and err instead of returning garbage to the client.
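The checksum-on-read idea above can be sketched in a few lines. This is a toy model in plain Python (a hypothetical store, not Ceph code): the digest is computed while the data is still in RAM, replicated alongside the data, verified on every read, and a copy that fails verification is healed from a replica that still verifies.

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class ChecksummedStore:
    """Toy replicated store: verify a digest on every read, heal rotted copies."""

    def __init__(self, num_replicas: int = 3):
        # each replica maps object name -> (data, digest)
        self.replicas = [{} for _ in range(num_replicas)]

    def write(self, name: str, data: bytes) -> None:
        digest = checksum(data)              # computed while data is still in RAM
        for replica in self.replicas:        # replicate data + digest together,
            replica[name] = (data, digest)   # never re-read from "disk" first

    def read(self, name: str) -> bytes:
        rotted = []
        for replica in self.replicas:
            data, digest = replica[name]
            if checksum(data) == digest:     # checksum verified on every read
                for bad in rotted:           # heal copies that failed verification
                    bad[name] = (data, digest)
                return data
            rotted.append(replica)
        raise IOError("all replicas of %r are corrupt" % name)

store = ChecksummedStore()
store.write("obj", b"original payload")

# simulate silent media corruption on the first copy: data changes, digest doesn't
_, good_digest = store.replicas[0]["obj"]
store.replicas[0]["obj"] = (b"original pay1oad", good_digest)

recovered = store.read("obj")   # mismatch detected, healed from a good replica
```

Note the ordering: the digest travels with the in-RAM data at write time, which is exactly why a replica that rotted on disk can be distinguished from one that didn't.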
Jan > On 09 Jun 2016, at 09:16, Christian Balzer wrote: > > > Hello, > > On Thu, 9 Jun 2016 08:43:23 +0200 Gandalf Corvotempesta wrote: > >> Il 09 giu 2016 02:09, "Christian Balzer" ha scritto: >>> Ceph currently doesn't do any (relevant) checksumming at all, so if a >>> PRIMARY PG suffers from bit-rot this will be undetected until the next >>> deep-scrub. >>> >>> This is one of the longest and gravest outstanding issues with Ceph and >>> supposed to be addressed with bluestore (which currently doesn't have >>> checksum verified reads either). >> >> So if bit rot happens on primary PG, ceph is spreading the currupted data >> across the cluster? > No. > > You will want to re-read the Ceph docs and the countless posts here about > replication within Ceph works. > http://docs.ceph.com/docs/hammer/architecture/#smart-daemons-enable-hyperscale > > A client write goes to the primary OSD/PG and will not be ACK'ed to the > client until is has reached all replica OSDs. > This happens while the data is in-flight (in RAM), it's not read from the > journal or filestore. > >> What would be sent to the replica, the original data or the saved one? >> >> When bit rot happens I'll have 1 corrupted object and 2 good. >> how do you manage this between deep scrubs? Which data would be used by >> ceph? I think that a bitrot on a huge VM block device could lead to a >> mess like the whole device corrupted >> VM affected by bitrot would be able to stay up and running? >> And bitrot on a qcow2 file? >> > Bitrot is a bit hyped, I haven't seen any on the Ceph clusters I run nor > on other systems here where I (can) actually check for it. > > As to how it would affect things, that very much depends. > > If it's something like a busy directory inode that gets corrupted, the data > in question will be in RAM (SLAB) and the next update will correct things. > > If it's a logfile, you're likely to never notice until deep-scrub detects > it eventually. 
> > This isn't a Ceph specific question, on all systems that aren't backed > by something like ZFS or BTRFS you're potentially vulnerable to this. > > Of course if you're that worried, you could always run BTRFS of ZFS inside > your VM and notice immediately when something goes wrong. > I personally wouldn't though, due to the performance penalties involved > (CoW). > > >> Let me try to explain: when writing to primary PG i have to write bit "1" >> Due to a bit rot, I'm saving "0". >> Would ceph read the wrote bit and spread that across the cluster (so it >> will spread "0") or spread the in memory value "1" ? >> >> What if the journal fails during a read or a write? > Again, you may want to get a deeper understanding of Ceph. > The journal isn't involved in reads. > >> Ceph is able to >> recover by removing that journal from the affected osd (and still >> running at lower speed) or should i use a raid1 on ssds used by journal ? >> > Neither, a journal failure is lethal for the OSD involved and unless you > have LOTS of money RAID1 SSDs are a waste. > > If you use DC level SSDs with sufficient endurance
[ceph-users] Ceph and Openstack
Hi all, I have a problem integrating Glance with Ceph. OpenStack Mitaka, Ceph Jewel. I've followed the Ceph doc ( http://docs.ceph.com/docs/jewel/rbd/rbd-openstack/ ) but when I try to list or create images, I get the error "Unable to establish connection to http://IP:9292/v2/images", and in debug mode I can see this: 2016-06-14 14:02:54.634 2256 DEBUG glance_store.capabilities [-] Store glance_store._drivers.rbd.Store doesn't support updating dynamic storage capabilities. Please overwrite 'update_capabilities' method of the store to implement updating logics if needed. update_capabilities /usr/lib/python2.7/dist-packages/glance_store/capabilities.py:98 I've also tried to remove the database and populate it again, but I get the same error. Cinder with Ceph works correctly. Any suggestions? Thanks, Fran. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] local variable 'region_name' referenced before assignment
- Original Message - > From: "Parveen Sharma" > To: ceph-users@lists.ceph.com > Sent: Tuesday, June 14, 2016 2:34:27 PM > Subject: [ceph-users] local variable 'region_name' referenced before > assignment > > Hi, > > I'm getting "UnboundLocalError: local variable 'region_name' referenced > before assignment " error while placing an object in my earlier created > bucket using my RADOSGW with boto. > > > My package details: > > $ sudo rpm -qa | grep rados > librados2-10.2.1-0.el7.x86_64 > libradosstriper1-10.2.1-0.el7.x86_64 > python-rados-10.2.1-0.el7.x86_64 > ceph-radosgw-10.2.1-0.el7.x86_64 > $ > > $ python > Python 2.7.10 (default, Oct 23 2015, 19:19:21) > [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin > Type "help", "copyright", "credits" or "license" for more information. >>> > import sys, boto > >>> boto.Version '2.40.0' >>> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1343813 is mentioning a > workaround but it apparently not working for me or I am missing something. 
> > > > > $ cat ~/.boto > > [Credentials] > > aws_access_key_id = X > > aws_secret_access_key = YYY > > > > > [s3] > > > > > use-sigv4 = True > > $ > > $ > > $ cat s3test_for_placing_object_in_bucket.py > > import boto > > import boto.s3.connection > > > > > conn = boto.connect_s3( > > host = 'mc2', port = 7480, > > is_secure=False, calling_format = boto.s3.connection.OrdinaryCallingFormat(), > > ) > > > > #From > http://stackoverflow.com/questions/15085864/how-to-upload-a-file-to-directory-in-s3-bucket-using-boto > > bucket = conn.get_bucket('my-new-bucket') > > key = boto.s3.key.Key(bucket, 'myTestFileIn_my-new-bucket.txt') > > with open('myTestFileIn_my-new-bucket.txt') as f: > > key.send_file(f) > > $ > > $ > > $ python s3test_for_placing_object_in_bucket.py > > Traceback (most recent call last): > > File "s3test_for_placing_object_in_bucket.py", line 12, in > > bucket = conn.get_bucket('my-new-bucket') > > File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line 506, in > get_bucket > > return self.head_bucket(bucket_name, headers=headers) > > File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line 525, in > head_bucket > > response = self.make_request('HEAD', bucket_name, headers=headers) > > File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line 668, in > make_request > > retry_handler=retry_handler > > File "/Library/Python/2.7/site-packages/boto/connection.py", line 1071, in > make_request > > retry_handler=retry_handler) > > File "/Library/Python/2.7/site-packages/boto/connection.py", line 927, in > _mexe > > request.authorize(connection=self) > > File "/Library/Python/2.7/site-packages/boto/connection.py", line 377, in > authorize > > connection._auth_handler.add_auth(self, **kwargs) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 722, in add_auth > > **kwargs) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 542, in add_auth > > string_to_sign = self.string_to_sign(req, 
canonical_request) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 482, in > string_to_sign > > sts.append(self.credential_scope(http_request)) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 464, in > credential_scope > > region_name = self.determine_region_name(http_request.host) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 657, in > determine_region_name > > return region_name > > UnboundLocalError: local variable 'region_name' referenced before assignment > > $ > > > > > - > > Parveen > You have to make that change in boto/auth.py. Please take a look at this: http://www.spinics.net/lists/ceph-devel/msg30612.html Shilpa > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
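For what it's worth, until boto's auth.py is patched as described in the link above, a common workaround (a sketch, assuming the Jewel RGW endpoint accepts v2 signatures, which it does by default) is simply not to opt into SigV4, since the v2 signer never goes through determine_region_name():

```
# ~/.boto -- same credentials as above, but with the [s3] use-sigv4
# section removed, so boto falls back to SigV2 signing
[Credentials]
aws_access_key_id = X
aws_secret_access_key = YYY
```

SigV4 can be re-enabled once a fixed boto is in place.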
Re: [ceph-users] Unable to mount the CephFS file system fromclientnode with "mount error 5 = Input/output error"
Hi, On 06/14/2016 01:21 PM, Rakesh Parkiti wrote: Hello, Unable to mount the CephFS file system from client node with *"mount error 5 = Input/output error"* MDS was installed on a separate node. Ceph Cluster health is OK and mds services are running. firewall was disabled across all the nodes in a cluster. -- Ceph Cluster Nodes (RHEL 7.2 version + Jewel version 10.2.1) -- Client Nodes - Ubuntu 14.04 LTS Admin Node: *[root@Admin ceph]# ceph mds stat* e34: 0/0/1 up *snipsnap* The MDS is not up and running. Otherwise the output should look like this: # ceph mds stat e190193: 1/1/1 up {0=XYZ=up:active} Regards, Burkhard ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] "mount error 5 = Input/output error" with the CephFS file system from client node
Hello, Unable to mount the CephFS file system from client node with "mount error 5 = Input/output error" MDS was installed on a separate node. Ceph Cluster health is OK and mds services are running. firewall was disabled across all the nodes in a cluster. -- Ceph Cluster Nodes (RHEL 7.2 version + Jewel version 10.2.1) -- Client Nodes - Ubuntu 14.04 LTS Admin Node: [root@Admin ceph]# ceph mds stat e34: 0/0/1 up Client Side: user@clientA2:/etc/ceph$ ceph fs ls --name client.admin name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ] user@clientA2:/etc/ceph$ sudo mount -t ceph 10.10.100.5:6789:/user /home/user/cephfs -o name=admin,secret=AQAQK1NXgupKIRAA9O7fKxadI/iIq/vPKLI9rw== mount error 5 = Input/output error Connection Establishment was successful to monitor node. $tail -f /var/log/syslog Jun 14 16:32:24 clientA2 kernel: [82270.155030] libceph: client134154 fsid 66c5f31c-1756-47ce-889d-960e0d99f37a Jun 14 16:32:24 clientA2 kernel: [82270.156726] libceph: mon0 10.10.100.5:6789 session established Able to check ceph health status from client node with client.admin keyring.: user@clientA2:/etc/ceph$ ceph -s --name client.admin cluster 66c5f31c-1756-47ce-889d-960e0d99f37a health HEALTH_OK monmap e6: 3 mons at {siteAmon=10.10.100.5:6789/0,siteBmon=10.10.150.6:6789/0,siteCmon=10.10.200.7:6789/0} election epoch 70, quorum 0,1,2 siteAmon,siteBmon,siteCmon fsmap e34: 0/0/1 up osdmap e1097: 19 osds: 19 up, 19 in flags sortbitwise pgmap v25719: 1286 pgs, 5 pools, 92160 kB data, 9 objects 3998 MB used, 4704 GB / 4708 GB avail 1286 active+clean Can anyone please help with solution for above issue. Thanks Rakesh Parkiti ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] strange cache tier behaviour with cephfs
The basic logic is that if an IO is not in the cache tier, then proxy it, which means do the IO directly on the base tier. The throttle is designed to minimise the latency impact of promotions and flushes. So yes, during testing it will not promote everything, but during normal workloads it makes things much better. The defaults were chosen after benchmarks showed they were the turning point where performance started to become affected. But yes, I think there could be a better section on tuning the cache tier; it's not an easy task, as there are a lot of variables that can change depending on the hardware and workload. Sent from Nine From: Oliver Dzombic Sent: 14 Jun 2016 12:11 p.m. To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] strange cache tier behaviour with cephfs Hi, ok, the write test now also shows a more expected behaviour. As it seems to me, if there is more writing than osd_tier_promote_max_bytes_sec allows, the writes go directly against the cold pool ( which is a really good behaviour ( seriously ) ). But that should definitely be added to the documentation. Otherwise (new) people have no chance to find that. The search engines show < 10 hits for "osd_tier_promote_max_bytes_sec", one of them in http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007632.html which has a totally different topic. Anyway, super super big thanks for your time ! -- Mit freundlichen Gruessen / Best regards Oliver Dzombic IP-Interactive mailto:i...@ip-interactive.de Anschrift: IP Interactive UG ( haftungsbeschraenkt ) Zum Sonnenberg 1-3 63571 Gelnhausen HRB 93402 beim Amtsgericht Hanau Geschäftsführung: Oliver Dzombic Steuer Nr.: 35 236 3622 1 UST ID: DE274086107 Am 14.06.2016 um 07:47 schrieb Nick Fisk: > osd_tier_promote_max_objects_sec > and > osd_tier_promote_max_bytes_sec > > is what you are looking for, I think by default it's set to 5MB/s, which > would roughly correlate to why you are only seeing around 8 objects each > time being promoted.
This was done like this as too many promotions hurt > performance, so you don't actually want to promote on every IO. > >> -Original Message- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >> Christian Balzer >> Sent: 14 June 2016 02:00 >> To: ceph-users@lists.ceph.com >> Subject: Re: [ceph-users] strange cache tier behaviour with cephfs >> >> >> Hello, >> >> On Tue, 14 Jun 2016 02:52:43 +0200 Oliver Dzombic wrote: >> >>> Hi Christian, >>> >>> if i read a 1,5 GB file, which is not changing at all. >>> >>> Then i expect the agent to copy it one time from the cold pool to the >>> cache pool. >>> >> Before Jewel, that is what you would have seen, yes. >> >> Did you read what Sam wrote and me in reply to him? >> >>> In fact its every time making a new copy. >>> >> Is it? >> Is there 1.5GB of data copied into the cache tier each time? >> An object is 4MB, you only had 8 in your first run, then 16... >> >>> I can see that by increasing disc usage of the cache and the >>> increasing object number. >>> >>> And the non existing improvement of speed. >>> >> That could be down to your network or other factors on your client. >> >> Christian >> -- >> Christian Balzer Network/Systems Engineer >> ch...@gol.com Global OnLine Japan/Rakuten Communications >> http://www.gol.com/ >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
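To see whether this promotion throttle is what is limiting a given workload, the value can be inspected and raised at runtime (standard ceph CLI commands; the 20 MB/s figure below is only an example, not a recommendation):

```
# Ask a running OSD for its current setting (run on that OSD's host)
ceph daemon osd.0 config get osd_tier_promote_max_bytes_sec

# Raise it cluster-wide at runtime, e.g. to 20 MB/s
ceph tell osd.* injectargs '--osd_tier_promote_max_bytes_sec 20971520'

# To persist it across restarts, set in ceph.conf under [osd]:
#   osd tier promote max bytes sec = 20971520
```

As discussed above, raising it trades lower steady-state latency for more aggressive promotion, so benchmark with your own workload before committing to a value.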
Re: [ceph-users] How to select particular OSD to act as primary OSD.
Hi, I think you can edit the crush rule something like below rule another_replicated_ruleset { ruleset 1 type replicated min_size 1 max_size 10 step take default step take osd1 step choose firstn 1 type osd step emit step take osd2 step choose firstn 1 type osd step emit step take osd5 step choose firstn 1 type osd step emit step take osd4 step choose firstn 1 type osd step emit } and create pool using this rule. It might work , though I am not 100% sure. Thanks, Shylesh On Tue, Jun 14, 2016 at 4:05 PM, Kanchana. P wrote: > Hi, > > How to select particular OSD to act as primary OSD. > I modified the ceph.conf file and added > [mon] > ... > mon osd allow primary affinity = true > Restarted ceph target, now primary affinity is set to true in all monitor > nodes. > Using the below commands set some weights to the osds. > > $ ceph osd primary-affinity osd.1 0.25 > $ ceph osd primary-affinity osd.6 0.50 > $ ceph osd primary-affinity osd.11 0.75 > $ ceph osd primary-affinity osd.16 1 > > Created a pool "poolA" and set a crush_ruleset so that it takes OSDs in > order 16,11,6,1 > Even after setting the primary affinity weight, it took osds in different > order. > Can we select the primary OSD, if so, how can we do that. Please let me > know what I am missing here to set an OSD as a primary OSD. > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > -- Thanks & Regards Shylesh Kumar M ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] How to select particular OSD to act as primary OSD.
Hi, How to select particular OSD to act as primary OSD. I modified the ceph.conf file and added [mon] ... mon osd allow primary affinity = true Restarted ceph target, now primary affinity is set to true in all monitor nodes. Using the below commands set some weights to the osds. $ ceph osd primary-affinity osd.1 0.25 $ ceph osd primary-affinity osd.6 0.50 $ ceph osd primary-affinity osd.11 0.75 $ ceph osd primary-affinity osd.16 1 Created a pool "poolA" and set a crush_ruleset so that it takes OSDs in order 16,11,6,1 Even after setting the primary affinity weight, it took osds in different order. Can we select the primary OSD, if so, how can we do that. Please let me know what I am missing here to set an OSD as a primary OSD. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs
Hi, your cluster will be in a warning state if you disable scrubbing, and you really need it in case of some data loss. cheers, Ansgar 2016-06-14 11:05 GMT+02:00 Wido den Hollander : > >> Op 14 juni 2016 om 11:00 schreef Василий Ангапов : >> >> >> Is it a good idea to disable scrub and deep-scrub for bucket.index >> pool? What negative consequences it may cause? >> > > No, I would not do that. Scrubbing is essential to detect (silent) data > corruption. > > You should really scrub all your data. > >> 2016-06-14 11:51 GMT+03:00 Wido den Hollander : >> > >> >> Op 14 juni 2016 om 10:10 schreef Ansgar Jazdzewski >> >> : >> >> >> >> >> >> Hi, >> >> >> >> we are using ceph and radosGW to store images (~300kb each) in S3; >> >> when it comes to deep-scrubbing we are facing task timeouts (> 30s ...) >> >> >> >> my question is: >> >> >> >> in case of that amount of objects/files, is it better to calculate the >> >> PGs on an object basis instead of the volume size? and how should it be >> >> done? >> >> >> > >> > Do you have bucket sharding enabled? >> > >> > And how many objects do you have in a single bucket? >> > >> > If sharding is not enabled for the bucket index you might have large RADOS >> > objects with bucket indexes which are hard to scrub. >> > >> > Wido >> > >> >> thanks >> >> Ansgar >> >> ___ >> >> ceph-users mailing list >> >> ceph-users@lists.ceph.com >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > ___ >> > ceph-users mailing list >> > ceph-users@lists.ceph.com >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs
Hi, yes, we have index sharding enabled; we have only two big buckets at the moment, with 15Mil objects each, and some smaller ones. cheers, Ansgar 2016-06-14 10:51 GMT+02:00 Wido den Hollander : > >> Op 14 juni 2016 om 10:10 schreef Ansgar Jazdzewski >> : >> >> >> Hi, >> >> we are using ceph and radosGW to store images (~300kb each) in S3; >> when it comes to deep-scrubbing we are facing task timeouts (> 30s ...) >> >> my question is: >> >> in case of that amount of objects/files, is it better to calculate the >> PGs on an object basis instead of the volume size? and how should it be >> done? >> > > Do you have bucket sharding enabled? > > And how many objects do you have in a single bucket? > > If sharding is not enabled for the bucket index you might have large RADOS > objects with bucket indexes which are hard to scrub. > > Wido > >> thanks >> Ansgar >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
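On the original sizing question in this thread: the usual guidance works from the OSD count rather than from object count or volume, since placement groups cost resources per OSD. A hedged sketch of the common rule of thumb (roughly 100 PGs per OSD divided by the replica count, rounded up to the next power of two; the numbers are illustrative, not a recommendation for this particular cluster):

```python
def pg_count(num_osds: int, pool_size: int, target_per_osd: int = 100) -> int:
    """Rule-of-thumb placement group count for a pool:
    (OSDs * target per OSD) / replica count, rounded up to a power of two."""
    raw = num_osds * target_per_osd / pool_size
    power = 1
    while power < raw:
        power *= 2
    return power

# e.g. a 19-OSD cluster (as shown elsewhere in this digest) with 3x replication:
total = pg_count(19, 3)
print(total)  # 1900/3 ~ 633, rounds up to 1024
```

The large-index problem Wido describes is orthogonal: more PGs split the data pool's objects into more scrub units, but an unsharded bucket index is still one big omap object, so index sharding is the fix there.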
Re: [ceph-users] strange cache tier behaviour with cephfs
Hi, ok, the write test now also shows a more expected behaviour. As it seems to me, if there is more writing than osd_tier_promote_max_bytes_sec allows, the writes go directly against the cold pool ( which is a really good behaviour ( seriously ) ). But that should definitely be added to the documentation. Otherwise (new) people have no chance to find that. The search engines show < 10 hits for "osd_tier_promote_max_bytes_sec", one of them in http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007632.html which has a totally different topic. Anyway, super super big thanks for your time ! -- Mit freundlichen Gruessen / Best regards Oliver Dzombic IP-Interactive mailto:i...@ip-interactive.de Anschrift: IP Interactive UG ( haftungsbeschraenkt ) Zum Sonnenberg 1-3 63571 Gelnhausen HRB 93402 beim Amtsgericht Hanau Geschäftsführung: Oliver Dzombic Steuer Nr.: 35 236 3622 1 UST ID: DE274086107 Am 14.06.2016 um 07:47 schrieb Nick Fisk: > osd_tier_promote_max_objects_sec > and > osd_tier_promote_max_bytes_sec > > is what you are looking for, I think by default it's set to 5MB/s, which > would roughly correlate to why you are only seeing around 8 objects each > time being promoted. This was done like this as too many promotions hurt > performance, so you don't actually want to promote on every IO. > >> -Original Message- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >> Christian Balzer >> Sent: 14 June 2016 02:00 >> To: ceph-users@lists.ceph.com >> Subject: Re: [ceph-users] strange cache tier behaviour with cephfs >> >> >> Hello, >> >> On Tue, 14 Jun 2016 02:52:43 +0200 Oliver Dzombic wrote: >> >>> Hi Christian, >>> >>> if i read a 1,5 GB file, which is not changing at all. >>> >>> Then i expect the agent to copy it one time from the cold pool to the >>> cache pool. >>> >> Before Jewel, that is what you would have seen, yes. >> >> Did you read what Sam wrote and me in reply to him? >> >>> In fact its every time making a new copy.
>>> >> Is it? >> Is there 1.5GB of data copied into the cache tier each time? >> An object is 4MB, you only had 8 in your first run, then 16... >> >>> I can see that by increasing disc usage of the cache and the >>> increasing object number. >>> >>> And the non existing improvement of speed. >>> >> That could be down to your network or other factors on your client. >> >> Christian >> -- >> Christian BalzerNetwork/Systems Engineer >> ch...@gol.comGlobal OnLine Japan/Rakuten Communications >> http://www.gol.com/ >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs
Hello, On Tue, 14 Jun 2016 12:20:44 +0300 Nmz wrote: > > > > - Original Message - > From: Wido den Hollander > To: Василий Ангапов > Date: Tuesday, June 14, 2016, 12:05:51 PM > Subject: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs > > > >> Op 14 juni 2016 om 11:00 schreef Василий Ангапов : > >> > >> > >> Is it a good idea to disable scrub and deep-scrub for bucket.index > >> pool? What negative consequences it may cause? > >> > > > No, I would not do that. Scrubbing is essential to detect (silent) > > data corruption. > > > You should really scrub all your data. > > Ceph does not protect from silent data corruption at all. > > You can read this thread > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007680.html > While that is unfortunately very true, Ceph does at least allow you to detect it (after the fact), and if you're lucky it is a replica and not the primary object that's corrupted. So it's better than Ext4 or XFS, but worse than ZFS or BTRFS. Bluestore is supposed to address this, but currently lacks live checksums as well. Now with a storage that large (40 million 300KB objects...) the statistical chances of bitrot do of course increase. I've run a cluster with a few TB of data for more than a year w/o deep scrubs, and unsurprisingly nothing bad was found when I turned it back on. But your mileage may vary, caveat emptor, etc. Christian > >> 2016-06-14 11:51 GMT+03:00 Wido den Hollander : > >> > > >> >> Op 14 juni 2016 om 10:10 schreef Ansgar Jazdzewski > >> >> : > >> >> > >> >> > >> >> Hi, > >> >> > >> >> we are using ceph and radosGW to store images (~300kb each) in S3; > >> >> when it comes to deep-scrubbing we are facing task timeouts (> 30s ...) > >> >> > >> >> my question is: > >> >> > >> >> in case of that amount of objects/files, is it better to calculate > >> >> the PGs on an object basis instead of the volume size? and how it > >> >> should be done? > >> >> > >> > > >> > Do you have bucket sharding enabled?
> >> > > >> > And how many objects do you have in a single bucket? > >> > > >> > If sharding is not enabled for the bucket index you might have > >> > large RADOS objects with bucket indexes which are hard to scrub. > >> > > >> > Wido > >> > > >> >> thanks > >> >> Ansgar > >> >> ___ > >> >> ceph-users mailing list > >> >> ceph-users@lists.ceph.com > >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >> > ___ > >> > ceph-users mailing list > >> > ceph-users@lists.ceph.com > >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Christian BalzerNetwork/Systems Engineer ch...@gol.com Global OnLine Japan/Rakuten Communications http://www.gol.com/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] local variable 'region_name' referenced before assignment
I'm sending on personnel ID as my posts to ceph-users@lists.ceph.com are not reaching to the mailing list, though I've subscribed. On Tue, Jun 14, 2016 at 2:49 PM, Parveen Sharma wrote: > > Hi, > > I'm getting "UnboundLocalError: local variable 'region_name' referenced > before assignment" error while placing an object in my earlier created > bucket using my RADOSGW with boto. > > > My package details: > > $ sudo rpm -qa | grep rados > librados2-10.2.1-0.el7.x86_64 > libradosstriper1-10.2.1-0.el7.x86_64 > python-rados-10.2.1-0.el7.x86_64 > ceph-radosgw-10.2.1-0.el7.x86_64 > $ > > $ python > Python 2.7.10 (default, Oct 23 2015, 19:19:21) > [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin > Type "help", "copyright", "credits" or "license" for more information.>>> > import sys, boto > >>> boto.Version > '2.40.0'>>> > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1343813 is mentioning a > workaround but it apparently not working for me or I am missing something. > > > *$ cat ~/.boto * > > [Credentials] > > aws_access_key_id = X > > aws_secret_access_key = YYY > > > [s3] > > use-sigv4 = True > > $ > > $ > > *$ cat s3test_for_placing_object_in_bucket.py* > > import boto > > import boto.s3.connection > > > > conn = boto.connect_s3( > > host = 'mc2', port = 7480, > > is_secure=False, calling_format = > boto.s3.connection.OrdinaryCallingFormat(), > > ) > > #From > http://stackoverflow.com/questions/15085864/how-to-upload-a-file-to-directory-in-s3-bucket-using-boto > > bucket = conn.get_bucket('my-new-bucket') > > key = boto.s3.key.Key(bucket, 'myTestFileIn_my-new-bucket.txt') > > with open('myTestFileIn_my-new-bucket.txt') as f: > > key.send_file(f) > > $ > > $ > > *$ python s3test_for_placing_object_in_bucket.py* > > Traceback (most recent call last): > > File "s3test_for_placing_object_in_bucket.py", line 12, in > > bucket = conn.get_bucket('my-new-bucket') > > File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line > 506, in 
get_bucket > > return self.head_bucket(bucket_name, headers=headers) > > File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line > 525, in head_bucket > > response = self.make_request('HEAD', bucket_name, headers=headers) > > File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line > 668, in make_request > > retry_handler=retry_handler > > File "/Library/Python/2.7/site-packages/boto/connection.py", line 1071, > in make_request > > retry_handler=retry_handler) > > File "/Library/Python/2.7/site-packages/boto/connection.py", line 927, > in _mexe > > request.authorize(connection=self) > > File "/Library/Python/2.7/site-packages/boto/connection.py", line 377, > in authorize > > connection._auth_handler.add_auth(self, **kwargs) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 722, in > add_auth > > **kwargs) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 542, in > add_auth > > string_to_sign = self.string_to_sign(req, canonical_request) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 482, in > string_to_sign > > sts.append(self.credential_scope(http_request)) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 464, in > credential_scope > > region_name = self.determine_region_name(http_request.host) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 657, in > determine_region_name > > return region_name > > *UnboundLocalError: local variable 'region_name' referenced before > assignment* > > $ > > > - > > Parveen > > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] strange cache tier behaviour with cephfs
Hi, wow. After setting this in ceph.conf and restarting the whole cluster: osd tier promote max bytes sec = 1610612736 osd tier promote max objects sec = 2 And repeating the test, the cache pool got the full 11 GB of the test file with 2560 objects copied from the cold pool. Aaand, repeating the test multiple times showed that each time there is some movement within the cache pool WITHOUT a copy from the cold pool. So it shifts some MB within the cache pool from one OSD to another. So it's for example changing from: /dev/sde1 234315556 2559404 231756152 2% /var/lib/ceph/osd/ceph-0 /dev/sdf1 234315556 2848300 231467256 2% /var/lib/ceph/osd/ceph-1 /dev/sdi1 234315556 2820596 231494960 2% /var/lib/ceph/osd/ceph-2 /dev/sdj1 234315556 2712796 231602760 2% /var/lib/ceph/osd/ceph-3 to /dev/sde1 234315556 2670360 231645196 2% /var/lib/ceph/osd/ceph-0 /dev/sdf1 234315556 2951116 231364440 2% /var/lib/ceph/osd/ceph-1 /dev/sdi1 234315556 2903000 231412556 2% /var/lib/ceph/osd/ceph-2 /dev/sdj1 234315556 2831992 231483564 2% /var/lib/ceph/osd/ceph-3 So around 400 MB has been shifted inside the cache pool ( why ever ). The number of objects is stable and not changed. The speed is going from ~ 100 MB/s up to ~ 170 MB/s which is close to the network maximum considering the client is busy too. So this hidden and undocumented config option changed the behaviour to the, according to the documentation, expected behaviour. Thank you very much for this hint ! I will repeat now all the testing.
-- Mit freundlichen Gruessen / Best regards Oliver Dzombic IP-Interactive mailto:i...@ip-interactive.de Anschrift: IP Interactive UG ( haftungsbeschraenkt ) Zum Sonnenberg 1-3 63571 Gelnhausen HRB 93402 beim Amtsgericht Hanau Geschäftsführung: Oliver Dzombic Steuer Nr.: 35 236 3622 1 UST ID: DE274086107 Am 14.06.2016 um 07:47 schrieb Nick Fisk: > osd_tier_promote_max_objects_sec > and > osd_tier_promote_max_bytes_sec > > is what you are looking for, I think by default its set to 5MB/s, which > would roughly correlate to why you are only seeing around 8 objects each > time being promoted. This was done like this as too many promotions hurt > performance, so you don't actually want to promote on every IO. > >> -Original Message- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >> Christian Balzer >> Sent: 14 June 2016 02:00 >> To: ceph-users@lists.ceph.com >> Subject: Re: [ceph-users] strange cache tier behaviour with cephfs >> >> >> Hello, >> >> On Tue, 14 Jun 2016 02:52:43 +0200 Oliver Dzombic wrote: >> >>> Hi Christian, >>> >>> if i read a 1,5 GB file, which is not changing at all. >>> >>> Then i expect the agent to copy it one time from the cold pool to the >>> cache pool. >>> >> Before Jewel, that is what you would have seen, yes. >> >> Did you read what Sam wrote and me in reply to him? >> >>> In fact its every time making a new copy. >>> >> Is it? >> Is there 1.5GB of data copied into the cache tier each time? >> An object is 4MB, you only had 8 in your first run, then 16... >> >>> I can see that by increasing disc usage of the cache and the >>> increasing object number. >>> >>> And the non existing improvement of speed. >>> >> That could be down to your network or other factors on your client. 
>> >> Christian >> -- >> Christian BalzerNetwork/Systems Engineer >> ch...@gol.comGlobal OnLine Japan/Rakuten Communications >> http://www.gol.com/ >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
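For reference, the two promotion throttles discussed in this thread go in the [osd] section of ceph.conf. The values below are the ones reported to work above; the Jewel defaults are far lower (Nick mentions roughly 5 MB/s), which is why only a handful of objects were being promoted per run:

```ini
[osd]
; Cap on how many bytes per second the cache tier will promote
; from the base pool (1610612736 bytes = 1.5 GiB/s).
osd tier promote max bytes sec = 1610612736
; Cap on how many objects per second are promoted.
osd tier promote max objects sec = 2
```

They can reportedly also be changed at runtime with `ceph tell osd.* injectargs`, but restarting the OSDs (as done above) guarantees every daemon picks them up.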
Re: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs
- Original Message - From: Wido den Hollander To: Василий Ангапов Date: Tuesday, June 14, 2016, 12:05:51 PM Subject: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs >> Op 14 juni 2016 om 11:00 schreef Василий Ангапов : >> >> >> Is it a good idea to disable scrub and deep-scrub for bucket.index >> pool? What negative consequences it may cause? >> > No, I would not do that. Scrubbing is essential to detect (silent) data > corruption. > You should really scrub all your data. Ceph do not protect from silent data corruption at all. You can read this thread http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007680.html >> 2016-06-14 11:51 GMT+03:00 Wido den Hollander : >> > >> >> Op 14 juni 2016 om 10:10 schreef Ansgar Jazdzewski >> >> : >> >> >> >> >> >> Hi, >> >> >> >> we are using ceph and radosGW to store images (~300kb each) in S3, >> >> when in comes to deep-scrubbing we facing task timeouts (> 30s ...) >> >> >> >> my questions is: >> >> >> >> in case of that amount of objects/files is it better to calculate the >> >> PGs on a object-bases instant of the volume size? and how it should be >> >> done? >> >> >> > >> > Do you have bucket sharding enabled? >> > >> > And how many objects do you have in a single bucket? >> > >> > If sharding is not enabled for the bucket index you might have large RADOS >> > objects with bucket indexes which are hard to scrub. >> > >> > Wido >> > >> >> thanks >> >> Ansgar >> >> ___ >> >> ceph-users mailing list >> >> ceph-users@lists.ceph.com >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > ___ >> > ceph-users mailing list >> > ceph-users@lists.ceph.com >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] local variable 'region_name' referenced before assignment
Hi, I'm getting "UnboundLocalError: local variable 'region_name' referenced before assignment" error while placing an object in my earlier created bucket using my RADOSGW with boto. My package details: $ sudo rpm -qa | grep rados librados2-10.2.1-0.el7.x86_64 libradosstriper1-10.2.1-0.el7.x86_64 python-rados-10.2.1-0.el7.x86_64 ceph-radosgw-10.2.1-0.el7.x86_64 $ $ python Python 2.7.10 (default, Oct 23 2015, 19:19:21) [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin Type "help", "copyright", "credits" or "license" for more information.>>> import sys, boto >>> boto.Version '2.40.0'>>> https://bugzilla.redhat.com/show_bug.cgi?id=1343813 is mentioning a workaround but it apparently not working for me or I am missing something. *$ cat ~/.boto * [Credentials] aws_access_key_id = X aws_secret_access_key = YYY [s3] use-sigv4 = True $ $ *$ cat s3test_for_placing_object_in_bucket.py* import boto import boto.s3.connection conn = boto.connect_s3( host = 'mc2', port = 7480, is_secure=False, calling_format = boto.s3.connection.OrdinaryCallingFormat(), ) #From http://stackoverflow.com/questions/15085864/how-to-upload-a-file-to-directory-in-s3-bucket-using-boto bucket = conn.get_bucket('my-new-bucket') key = boto.s3.key.Key(bucket, 'myTestFileIn_my-new-bucket.txt') with open('myTestFileIn_my-new-bucket.txt') as f: key.send_file(f) $ $ *$ python s3test_for_placing_object_in_bucket.py* Traceback (most recent call last): File "s3test_for_placing_object_in_bucket.py", line 12, in bucket = conn.get_bucket('my-new-bucket') File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line 506, in get_bucket return self.head_bucket(bucket_name, headers=headers) File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line 525, in head_bucket response = self.make_request('HEAD', bucket_name, headers=headers) File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line 668, in make_request retry_handler=retry_handler File 
"/Library/Python/2.7/site-packages/boto/connection.py", line 1071, in make_request retry_handler=retry_handler) File "/Library/Python/2.7/site-packages/boto/connection.py", line 927, in _mexe request.authorize(connection=self) File "/Library/Python/2.7/site-packages/boto/connection.py", line 377, in authorize connection._auth_handler.add_auth(self, **kwargs) File "/Library/Python/2.7/site-packages/boto/auth.py", line 722, in add_auth **kwargs) File "/Library/Python/2.7/site-packages/boto/auth.py", line 542, in add_auth string_to_sign = self.string_to_sign(req, canonical_request) File "/Library/Python/2.7/site-packages/boto/auth.py", line 482, in string_to_sign sts.append(self.credential_scope(http_request)) File "/Library/Python/2.7/site-packages/boto/auth.py", line 464, in credential_scope region_name = self.determine_region_name(http_request.host) File "/Library/Python/2.7/site-packages/boto/auth.py", line 657, in determine_region_name return region_name *UnboundLocalError: local variable 'region_name' referenced before assignment* $ - Parveen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Issue installing ceph with ceph-deploy
Hi, Thanks to both of you, finally the problem was fixed deleting everything and the user ceph and install again as George commented. Best Regards, Fran. 2016-06-13 17:41 GMT+02:00 Tu Holmes : > I have seen this. > > Just stop ceph and kill any ssh processes related to it. > > I had the same issue, and the fix for me was to enable root login, ssh to > the node as root and run the env DEBIAN_FRONTEND=noninteractive > DEBIAN_PRIORITY=critical apt-get --assume-yes -q --no-install-recommends > install -o Dpkg::Options::=--force-confnew ceph ceph-mds radosgw as root > after the ceph-deploy fails. > > This worked for me. > > -Tu > > > > On Mon, Jun 13, 2016 at 6:18 AM George Shuklin > wrote: > >> I believe this is the source of issues (cited line). >> >> Purge all ceph packages from this node and remove user/group 'ceph', >> than retry. >> >> On 06/13/2016 02:46 PM, Fran Barrera wrote: >> > [ceph-admin][WARNIN] usermod: user ceph is currently used by process >> 1303 >> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs
Wido, can you please give more details about that? What sort of corruption may occur? What scrubbing actually does especially for bucket index pool? 2016-06-14 12:05 GMT+03:00 Wido den Hollander : > >> Op 14 juni 2016 om 11:00 schreef Василий Ангапов : >> >> >> Is it a good idea to disable scrub and deep-scrub for bucket.index >> pool? What negative consequences it may cause? >> > > No, I would not do that. Scrubbing is essential to detect (silent) data > corruption. > > You should really scrub all your data. > >> 2016-06-14 11:51 GMT+03:00 Wido den Hollander : >> > >> >> Op 14 juni 2016 om 10:10 schreef Ansgar Jazdzewski >> >> : >> >> >> >> >> >> Hi, >> >> >> >> we are using ceph and radosGW to store images (~300kb each) in S3, >> >> when in comes to deep-scrubbing we facing task timeouts (> 30s ...) >> >> >> >> my questions is: >> >> >> >> in case of that amount of objects/files is it better to calculate the >> >> PGs on a object-bases instant of the volume size? and how it should be >> >> done? >> >> >> > >> > Do you have bucket sharding enabled? >> > >> > And how many objects do you have in a single bucket? >> > >> > If sharding is not enabled for the bucket index you might have large RADOS >> > objects with bucket indexes which are hard to scrub. >> > >> > Wido >> > >> >> thanks >> >> Ansgar >> >> ___ >> >> ceph-users mailing list >> >> ceph-users@lists.ceph.com >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > ___ >> > ceph-users mailing list >> > ceph-users@lists.ceph.com >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs
> Op 14 juni 2016 om 11:00 schreef Василий Ангапов : > > > Is it a good idea to disable scrub and deep-scrub for bucket.index > pool? What negative consequences it may cause? > No, I would not do that. Scrubbing is essential to detect (silent) data corruption. You should really scrub all your data. > 2016-06-14 11:51 GMT+03:00 Wido den Hollander : > > > >> Op 14 juni 2016 om 10:10 schreef Ansgar Jazdzewski > >> : > >> > >> > >> Hi, > >> > >> we are using ceph and radosGW to store images (~300kb each) in S3, > >> when in comes to deep-scrubbing we facing task timeouts (> 30s ...) > >> > >> my questions is: > >> > >> in case of that amount of objects/files is it better to calculate the > >> PGs on a object-bases instant of the volume size? and how it should be > >> done? > >> > > > > Do you have bucket sharding enabled? > > > > And how many objects do you have in a single bucket? > > > > If sharding is not enabled for the bucket index you might have large RADOS > > objects with bucket indexes which are hard to scrub. > > > > Wido > > > >> thanks > >> Ansgar > >> ___ > >> ceph-users mailing list > >> ceph-users@lists.ceph.com > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs
Is it a good idea to disable scrub and deep-scrub for bucket.index pool? What negative consequences it may cause? 2016-06-14 11:51 GMT+03:00 Wido den Hollander : > >> Op 14 juni 2016 om 10:10 schreef Ansgar Jazdzewski >> : >> >> >> Hi, >> >> we are using ceph and radosGW to store images (~300kb each) in S3, >> when in comes to deep-scrubbing we facing task timeouts (> 30s ...) >> >> my questions is: >> >> in case of that amount of objects/files is it better to calculate the >> PGs on a object-bases instant of the volume size? and how it should be >> done? >> > > Do you have bucket sharding enabled? > > And how many objects do you have in a single bucket? > > If sharding is not enabled for the bucket index you might have large RADOS > objects with bucket indexes which are hard to scrub. > > Wido > >> thanks >> Ansgar >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] strange cache tier behaviour with cephfs
Hi, ok lets make it step by step: before `dd if=file of=/dev/zero` [root@cephmon1 ~]# rados -p ssd_cache cache-flush-evict-all -> Moving all away [root@cephmon1 ~]# rados -p ssd_cache ls [root@cephmon1 ~]# -> empty cache osds at that point: /dev/sde1 234315556 84368 234231188 1% /var/lib/ceph/osd/ceph-0 /dev/sdf1 234315556106716 234208840 1% /var/lib/ceph/osd/ceph-1 /dev/sdi1 234315556 97132 234218424 1% /var/lib/ceph/osd/ceph-2 /dev/sdj1 234315556 87584 234227972 1% /var/lib/ceph/osd/ceph-3 /dev/sde1 234315556 90252 234225304 1% /var/lib/ceph/osd/ceph-8 /dev/sdf1 234315556107424 234208132 1% /var/lib/ceph/osd/ceph-9 /dev/sdi1 234315556378104 233937452 1% /var/lib/ceph/osd/ceph-10 /dev/sdj1 234315556 94856 234220700 1% /var/lib/ceph/osd/ceph-11 Now we run the dd. 20971520+0 records in 20971520+0 records out 10737418240 bytes (11 GB) copied, 85.6032 s, 125 MB/s [root@cephmon1 ~]# rados -p ssd_cache ls | wc -l 40 /dev/sde1 234315556624896 233690660 1% /var/lib/ceph/osd/ceph-0 /dev/sdf1 234315556643200 233672356 1% /var/lib/ceph/osd/ceph-1 /dev/sdi1 234315556596744 233718812 1% /var/lib/ceph/osd/ceph-2 /dev/sdj1 234315556615868 233699688 1% /var/lib/ceph/osd/ceph-3 /dev/sde1 234315556573496 233742060 1% /var/lib/ceph/osd/ceph-8 /dev/sdf1 234315556570240 233745316 1% /var/lib/ceph/osd/ceph-9 /dev/sdi1 234315556624032 233691524 1% /var/lib/ceph/osd/ceph-10 /dev/sdj1 234315556627216 233688340 1% /var/lib/ceph/osd/ceph-11 So we were going from ~ 1 GB to ~ 4 GB. ( of a 11 GB file ). So 3 GB are copied from the cold pool to cache pool. So i assume 3 GB had, maybe, served from the cache pool, and the other 8 GB had been served from the cold storage. According to the docu it says for the writeback mode: " When a Ceph client needs data that resides in the storage tier, the cache tiering agent migrates the data to the cache tier on read, then it is sent to the Ceph client. " This is obviously not happening there. And the question is why. 
-- Mit freundlichen Gruessen / Best regards Oliver Dzombic IP-Interactive mailto:i...@ip-interactive.de Anschrift: IP Interactive UG ( haftungsbeschraenkt ) Zum Sonnenberg 1-3 63571 Gelnhausen HRB 93402 beim Amtsgericht Hanau Geschäftsführung: Oliver Dzombic Steuer Nr.: 35 236 3622 1 UST ID: DE274086107 Am 14.06.2016 um 03:00 schrieb Christian Balzer: > > Hello, > > On Tue, 14 Jun 2016 02:52:43 +0200 Oliver Dzombic wrote: > >> Hi Christian, >> >> if i read a 1,5 GB file, which is not changing at all. >> >> Then i expect the agent to copy it one time from the cold pool to the >> cache pool. >> > Before Jewel, that is what you would have seen, yes. > > Did you read what Sam wrote and me in reply to him? > >> In fact its every time making a new copy. >> > Is it? > Is there 1.5GB of data copied into the cache tier each time? > An object is 4MB, you only had 8 in your first run, then 16... > >> I can see that by increasing disc usage of the cache and the increasing >> object number. >> >> And the non existing improvement of speed. >> > That could be down to your network or other factors on your client. > > Christian > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
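The partial promotion observed above is consistent with the throttling explanation that follows in this thread: since Jewel, the cache tier promotes at most a configured number of bytes and objects per second, so a sequential read only pulls in whatever those caps allow for the duration of the read. A rough back-of-the-envelope model (an assumption about how the two caps combine, not Ceph's actual code):

```python
def max_promoted_bytes(duration_s, bytes_per_sec, objects_per_sec,
                       object_size=4 << 20):
    # Assume both throttles apply independently and the stricter one
    # wins; RADOS objects default to 4 MiB.
    by_bytes = duration_s * bytes_per_sec
    by_objects = duration_s * objects_per_sec * object_size
    return min(by_bytes, by_objects)
```

Under this model an 85-second read at a ~5 MB/s byte cap could promote only a few hundred MB of an 11 GB file, which matches the "3 GB promoted, 8 GB served from the cold pool" observation at least in spirit.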
Re: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs
> Op 14 juni 2016 om 10:10 schreef Ansgar Jazdzewski > : > > > Hi, > > we are using ceph and radosGW to store images (~300kb each) in S3, > when in comes to deep-scrubbing we facing task timeouts (> 30s ...) > > my questions is: > > in case of that amount of objects/files is it better to calculate the > PGs on a object-bases instant of the volume size? and how it should be > done? > Do you have bucket sharding enabled? And how many objects do you have in a single bucket? If sharding is not enabled for the bucket index you might have large RADOS objects with bucket indexes which are hard to scrub. Wido > thanks > Ansgar > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
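On the original PG question: the commonly cited rule of thumb sizes placement groups by OSD count and replication factor, not by object count — roughly (OSDs × 100) / pool size, rounded up to a power of two. A small sketch of that calculation (the 100-PGs-per-OSD target is the usual guideline from the Ceph docs, not a hard rule, and total PGs across all pools on an OSD should be considered too):

```python
def suggested_pg_count(num_osds, pool_size, target_pgs_per_osd=100):
    # Aim for ~target_pgs_per_osd PGs per OSD, divided by the
    # replication factor, rounded UP to the next power of two.
    raw = num_osds * target_pgs_per_osd / float(pool_size)
    power = 1
    while power < raw:
        power *= 2
    return power
```

For example, 10 OSDs with 3x replication suggests 512 PGs. For the bucket-index scrub timeouts in this thread, though, index sharding matters more than PG count, as Wido notes.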
[ceph-users] tier pool 'ssdpool' has snapshot state; it cannot be added as a tier without breaking the pool.
Hi, All i have make a sas pool and a ssd pool. then run "ceph osd tier add ssdpool saspool", it says: tier pool 'ssdpool' has snapshot state; it cannot be added as a tier without breaking the pool. anyone who had hit the case? what can i do? and, "ceph osd pool" has "mksnap" & "rmsnap" but no "list snap" option. so, how could i know snap details of a pool? Regards, XiuCai___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ubuntu Trusty: kernel 3.13 vs kernel 4.2
One storage setups has exhibited extremely poor performance in my lab on 4.2 kernel (mdraid1+lvm+nfs), others run fine. No problems with xenial so far. If I had to choose a LTS kernel for trusty I'd choose the xenial one. (Btw I think newest trusty point release already has the 4.2 HWE stack by default, not sure if 3.13 is supported? I usually just upgrade) Jan > On 14 Jun 2016, at 09:45, magicb...@hotmail.com wrote: > > Hi list, > > is there any opinion/recommendation regarding the ubuntu trusty available > kernels and Ceph(hammer, xfs)? > Does kernel 4.2 worth installing from Ceph(hammer, xfs) perspective? > > Thanks :) > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] 40Mil objects in S3 rados pool / how calculate PGs
Hi, we are using ceph and radosGW to store images (~300kb each) in S3, when in comes to deep-scrubbing we facing task timeouts (> 30s ...) my questions is: in case of that amount of objects/files is it better to calculate the PGs on a object-bases instant of the volume size? and how it should be done? thanks Ansgar ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Ubuntu Trusty: kernel 3.13 vs kernel 4.2
Hi list, is there any opinion/recommendation regarding the ubuntu trusty available kernels and Ceph(hammer, xfs)? Does kernel 4.2 worth installing from Ceph(hammer, xfs) perspective? Thanks :) ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] UnboundLocalError: local variable 'region_name' referenced before assignment
Any help for me as well, please. :) - Parveen On Tue, Jun 14, 2016 at 11:55 AM, Parveen Sharma wrote: > Hi, > > I'm getting "UnboundLocalError: local variable 'region_name' referenced > before assignment" error while placing an object in my earlier created > bucket using my RADOSGW with boto. > > > My package details: > > $ sudo rpm -qa | grep rados > librados2-10.2.1-0.el7.x86_64 > libradosstriper1-10.2.1-0.el7.x86_64 > python-rados-10.2.1-0.el7.x86_64 > ceph-radosgw-10.2.1-0.el7.x86_64 > $ > > $ python > Python 2.7.10 (default, Oct 23 2015, 19:19:21) > [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin > Type "help", "copyright", "credits" or "license" for more information.>>> > import sys, boto > >>> boto.Version > '2.40.0'>>> > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1343813 is mentioning a > workaround but it apparently not working for me or I am missing something. > > > *$ cat ~/.boto * > > [Credentials] > > aws_access_key_id = AKIAI6KEIQSY2746KS5Q > > aws_secret_access_key = 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 > > > [s3] > > use-sigv4 = True > > $ > > $ > > *$ cat s3test_for_placing_object_in_bucket.py* > > import boto > > import boto.s3.connection > > > > conn = boto.connect_s3( > > host = 'mc2', port = 7480, > > is_secure=False, calling_format = > boto.s3.connection.OrdinaryCallingFormat(), > > ) > > #From > http://stackoverflow.com/questions/15085864/how-to-upload-a-file-to-directory-in-s3-bucket-using-boto > > bucket = conn.get_bucket('my-new-bucket') > > key = boto.s3.key.Key(bucket, 'myTestFileIn_my-new-bucket.txt') > > with open('myTestFileIn_my-new-bucket.txt') as f: > > key.send_file(f) > > $ > > $ > > *$ python s3test_for_placing_object_in_bucket.py* > > Traceback (most recent call last): > > File "s3test_for_placing_object_in_bucket.py", line 12, in > > bucket = conn.get_bucket('my-new-bucket') > > File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line > 506, in get_bucket > > return 
self.head_bucket(bucket_name, headers=headers) > > File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line > 525, in head_bucket > > response = self.make_request('HEAD', bucket_name, headers=headers) > > File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line > 668, in make_request > > retry_handler=retry_handler > > File "/Library/Python/2.7/site-packages/boto/connection.py", line 1071, > in make_request > > retry_handler=retry_handler) > > File "/Library/Python/2.7/site-packages/boto/connection.py", line 927, > in _mexe > > request.authorize(connection=self) > > File "/Library/Python/2.7/site-packages/boto/connection.py", line 377, > in authorize > > connection._auth_handler.add_auth(self, **kwargs) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 722, in > add_auth > > **kwargs) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 542, in > add_auth > > string_to_sign = self.string_to_sign(req, canonical_request) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 482, in > string_to_sign > > sts.append(self.credential_scope(http_request)) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 464, in > credential_scope > > region_name = self.determine_region_name(http_request.host) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 657, in > determine_region_name > > return region_name > > UnboundLocalError: local variable 'region_name' referenced before > assignment > > $ > > > > > - > > Parveen > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] strange unfounding of PGs
Yes! After I read the mail I unset it immediately, and the recovery process started to continue. After I switched the OSD I had kept off back on, Ceph found the unfound objects, and now recovery is running. Thanks Nick and Christian, you saved me! :) Christian Balzer wrote (on 14 Jun 2016, Tue, 9:24): > On Tue, 14 Jun 2016 07:09:45 + Csaba Tóth wrote: > > Hi Nick! > > Yes i did. :( > > Do you know how can i fix it? > > > > > Supposedly just by un-setting it: > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg29651.html > > Christian > > > Nick Fisk wrote (on 14 Jun 2016, Tue, 7:52): > > > > > Did you enable the sortbitwise flag as per the upgrade instructions, as > > > there is a known bug with it? I don't know why these instructions > > > haven't been amended in light of this bug. > > > > > > http://tracker.ceph.com/issues/16113 > > > > > > > > > > > > > -Original Message- > > > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On > > > > Behalf Of Csaba Tóth > > > > Sent: 13 June 2016 16:17 > > > > To: ceph-us...@ceph.com > > > > Subject: [ceph-users] strange unfounding of PGs > > > > > > > > Hi! > > > > > > > > I have a soo strange problem. At friday night i upgraded my small > > > > ceph > > > cluster > > > > from hammer to jewel. Everything went so well, but the chowning of > > > > osd datadir took a lot time, so i skipped two osd and do the > > > > run-as-root > > > trick. > > > > Yesterday evening i wanted to fix this, shutted down the first OSD > > > > and chowned the lib/ceph dir. 
But when i started it back a lot of > > > > strange pg > > > not > > > > found error happened (this is just a small list): > > > > > > > > 2016-06-12 23:43:05.096078 osd.2 [ERR] 5.3d has 2 objects unfound and > > > > apparently lost > > > > [snip: ~30 more "objects unfound and apparently lost" lines for other PGs on osd.2 and osd.4, quoted in full in the original message further down this thread] > > > > > > > > After this error messages i see this ceph health: > > > > 2016-06-12 23:44:10.498613 7f5941e0f700 0 log_channel(cluster) log > > > [INF] : > > > > pgmap v23122505: 820 pgs: 1 peering, 37 active+degraded, 5 >
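[Archive note] For anyone landing here from the archives: the "un-setting" in the mail Christian links to boils down to a single monitor command. Sketch below, to be run against the affected cluster; recovery should then resume on its own, as it did here.

```
# clear the sortbitwise flag that was set during the Jewel upgrade
ceph osd unset sortbitwise

# then watch recovery pick the "unfound" objects back up
ceph -w
```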
Re: [ceph-users] strange unfounding of PGs
On Tue, 14 Jun 2016 07:09:45 + Csaba Tóth wrote: > Hi Nick! > Yes i did. :( > Do you know how can i fix it? > > Supposedly just by un-setting it: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg29651.html Christian > Nick Fisk wrote (on 14 Jun 2016, Tue, 7:52): > > Did you enable the sortbitwise flag as per the upgrade instructions, as > > there is a known bug with it? I don't know why these instructions > > haven't been amended in light of this bug. > > > > http://tracker.ceph.com/issues/16113 > > > > > > > > > -Original Message- > > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On > > > Behalf Of Csaba Tóth > > > Sent: 13 June 2016 16:17 > > > To: ceph-us...@ceph.com > > > Subject: [ceph-users] strange unfounding of PGs > > > > > > Hi! > > > > > > I have a soo strange problem. At friday night i upgraded my small > > > ceph > > cluster > > > from hammer to jewel. Everything went so well, but the chowning of > > > osd datadir took a lot time, so i skipped two osd and do the > > > run-as-root > > trick. > > > Yesterday evening i wanted to fix this, shutted down the first OSD > > > and chowned the lib/ceph dir. 
But when i started it back a lot of > > > strange pg > > not > > > found error happened (this is just a small list): > > > 2016-06-12 23:43:05.096078 osd.2 [ERR] 5.3d has 2 objects unfound and > > > apparently lost > > > [snip: the same list of "objects unfound and apparently lost" errors, quoted in full in the original message further down this thread] > > > After this error messages i see this ceph health: > > > 2016-06-12 23:44:10.498613 7f5941e0f700 0 log_channel(cluster) log > > [INF] : > > > pgmap v23122505: 820 pgs: 1 peering, 37 active+degraded, 5 > > > active+remapped+wait_backfill, 167 active+recovery_wait+degraded, 1 > > > active+remapped, 1 active+recovering+degraded, 13 > > > active+undersized+degraded+remapped+wait_backfill, 595 active+clean; > > > 795 GB data, 1926 GB used, 5512 GB / 7438 GB avail; 7695 B/s wr, 2 > > > op/s; 24459/3225218 objects degraded (0.758%); 44435/3225218 objects > > > misplaced (1.378%); 346/1231022 unfound (0.028%) > > > Some minutes later it stalled in this state: > > > 2016-06-13 00:07:32.761265 7f5941e0f700 0 log_channel(cluster) log > >
Re: [ceph-users] strange unfounding of PGs
Hi Nick! Yes i did. :( Do you know how I can fix it? Nick Fisk wrote (on 14 Jun 2016, Tue, 7:52): > Did you enable the sortbitwise flag as per the upgrade instructions, as > there is a known bug with it? I don't know why these instructions haven't > been amended in light of this bug. > > http://tracker.ceph.com/issues/16113 > > > > > -Original Message- > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > > Csaba Tóth > > Sent: 13 June 2016 16:17 > > To: ceph-us...@ceph.com > > Subject: [ceph-users] strange unfounding of PGs > > > > Hi! > > > > I have a soo strange problem. At friday night i upgraded my small ceph > cluster > > from hammer to jewel. Everything went so well, but the chowning of osd > > datadir took a lot time, so i skipped two osd and do the run-as-root > trick. > > Yesterday evening i wanted to fix this, shutted down the first OSD and > > chowned the lib/ceph dir. But when i started it back a lot of strange pg > not > > found error happened (this is just a small list): > > > > 2016-06-12 23:43:05.096078 osd.2 [ERR] 5.3d has 2 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.096915 osd.2 [ERR] 5.30 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.097702 osd.2 [ERR] 5.39 has 4 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.100449 osd.2 [ERR] 5.2f has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.104519 osd.2 [ERR] 1.8 has 2 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.106041 osd.2 [ERR] 5.3f has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.107379 osd.2 [ERR] 1.76 has 2 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.107630 osd.2 [ERR] 1.0 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.107661 osd.2 [ERR] 2.14 has 2 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.107722 osd.2 [ERR] 2.3 has 1 objects unfound and > > apparently lost > > 2016-06-12 
23:43:05.108082 osd.2 [ERR] 5.16 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.108417 osd.2 [ERR] 5.38 has 2 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.108910 osd.2 [ERR] 1.43 has 3 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.109561 osd.2 [ERR] 1.a has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.110299 osd.2 [ERR] 1.10 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.111781 osd.2 [ERR] 1.22 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.111869 osd.2 [ERR] 1.1a has 3 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.205688 osd.4 [ERR] 1.29 has 2 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.206016 osd.4 [ERR] 1.1c has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.206219 osd.4 [ERR] 5.24 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.209013 osd.4 [ERR] 1.6a has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.209421 osd.4 [ERR] 1.68 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.209597 osd.4 [ERR] 5.d has 3 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.209620 osd.4 [ERR] 1.9 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.210191 osd.4 [ERR] 5.62 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.210649 osd.4 [ERR] 2.57 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.212011 osd.4 [ERR] 1.6 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.212106 osd.4 [ERR] 2.b has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.212212 osd.4 [ERR] 5.8 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.215850 osd.4 [ERR] 2.56 has 2 objects unfound and > > apparently lost > > > > > > After this error messages i see this ceph health: > > 2016-06-12 23:44:10.498613 7f5941e0f700 0 log_channel(cluster) log > [INF] 
: > > pgmap v23122505: 820 pgs: 1 peering, 37 active+degraded, 5 > > active+remapped+wait_backfill, 167 active+recovery_wait+degraded, 1 > > active+remapped, 1 active+recovering+degraded, 13 > > active+undersized+degraded+remapped+wait_backfill, 595 active+clean; 795 > > GB data, 1926 GB used, 5512 GB / 7438 GB avail; 7695 B/s wr, 2 op/s; > > 24459/3225218 objects degraded (0.758%); 44435/3225218 objects misplaced > > (1.378%); 346/1231022 unfound (0.028%) > > > > Some minutes later it stalled in this state: > > 2016-06-13 00:07:32.761265 7f5941e0f700 0 log_channel(cluster) log > [INF] : > > pgmap v23123311: 820 pgs: 1 > > active+recovery_wait+undersized+degraded+remapped, 1 > > active+recovering+degraded, 11 > > active+undersized+degraded+remapped+wait_backfill, 5 > > active+remapped+wait_backfill, 207 active+recovery_wait+degraded, 595 > > active+clean; 795 GB data, 1878 GB used, 5559 GB / 7438 GB avail; 14164 > B/s > > wr, 3 op/s; 22562/3223912 objects degraded (0.700%); 3873
Re: [ceph-users] strange cache tier behaviour with cephfs
Hello, On Tue, 14 Jun 2016 06:47:03 +0100 Nick Fisk wrote: > osd_tier_promote_max_objects_sec > and > osd_tier_promote_max_bytes_sec > Right, I remember those from February and May. And I'm not asking for this feature, but personally I would have split that in read and write promotes. As in, throttle promotes done to satisfy reads, but not for writes (as that will benefit from the faster pool a lot more). > is what you are looking for, I think by default its set to 5MB/s, which > would roughly correlate to why you are only seeing around 8 objects each > time being promoted. This was done like this as too many promotions hurt > performance, so you don't actually want to promote on every IO. > Well, I do, but yeah. Obviously the defaults were picked to be on the safe side of things, though anybody running a cache tier worth its salt will be able to handle more than 5MB/s. But never mind that, since these parameters are not documented on the cache-tiering documentation page new users like Oliver will get unexpected results. And existing cache-tier users will be rudely surprised, as this isn't mentioned in the changelog either... Christian > > -Original Message- > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf > > Of Christian Balzer > > Sent: 14 June 2016 02:00 > > To: ceph-users@lists.ceph.com > > Subject: Re: [ceph-users] strange cache tier behaviour with cephfs > > > > > > Hello, > > > > On Tue, 14 Jun 2016 02:52:43 +0200 Oliver Dzombic wrote: > > > > > Hi Christian, > > > > > > if i read a 1,5 GB file, which is not changing at all. > > > > > > Then i expect the agent to copy it one time from the cold pool to the > > > cache pool. > > > > > Before Jewel, that is what you would have seen, yes. > > > > Did you read what Sam wrote and me in reply to him? > > > > > In fact its every time making a new copy. > > > > > Is it? > > Is there 1.5GB of data copied into the cache tier each time? 
> > An object is 4MB, you only had 8 in your first run, then 16... > > > I can see that by increasing disc usage of the cache and the > > > increasing object number. > > > > > > And the non existing improvement of speed. > > > That could be down to your network or other factors on your client. > > > > Christian > > -- > > Christian Balzer    Network/Systems Engineer > > ch...@gol.com Global OnLine Japan/Rakuten Communications > > http://www.gol.com/ -- Christian Balzer    Network/Systems Engineer ch...@gol.com Global OnLine Japan/Rakuten Communications http://www.gol.com/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
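[Archive note] For readers hitting the same throttled-promotion behaviour: the two options named above can be raised in ceph.conf. A sketch with example values only; the option names are the Jewel-era ones from this thread, and the right numbers depend entirely on what your cache tier can actually absorb.

```ini
# ceph.conf sketch -- example values, not recommendations
[osd]
# per-OSD caps on promotions into the cache tier
osd_tier_promote_max_objects_sec = 200
osd_tier_promote_max_bytes_sec = 52428800    # ~50 MB/s
```

At runtime, something like `ceph tell osd.* injectargs '--osd_tier_promote_max_bytes_sec 52428800'` should apply the change without restarting the OSDs.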