Re: [ceph-users] Disk failures
On 15 Jun 2016 03:27, "Christian Balzer" wrote:
> And that makes deep-scrubbing something of quite limited value.

This is not true. If you checksum *before* writing to disk (while the data is still in RAM), then when reading back from disk you can verify the checksum, and if it doesn't match you can heal from the other nodes.

Obviously you have to replicate directly from RAM, before bitrot can happen: if you write to disk and then replicate the written data, you could replicate an already-rotted value.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
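The write-path checksumming described above can be sketched in a few lines (a toy model for illustration only, not Ceph code; the store layout and function names are invented):

```python
import hashlib

def put(store, key, data):
    # Checksum is computed in RAM, *before* the data hits any disk,
    # so a later on-disk bit flip cannot corrupt the checksum itself.
    store[key] = {"data": bytearray(data),
                  "sum": hashlib.sha256(data).hexdigest()}

def get(store, key, replicas):
    obj = store[key]
    if hashlib.sha256(bytes(obj["data"])).hexdigest() == obj["sum"]:
        return bytes(obj["data"])
    # Mismatch: the local copy rotted; heal from the first healthy replica.
    for rep in replicas:
        good = get(rep, key, [])
        obj["data"] = bytearray(good)
        obj["sum"] = hashlib.sha256(good).hexdigest()
        return good
    raise IOError("all copies corrupt")

primary, replica = {}, {}
put(primary, "obj", b"payload")
put(replica, "obj", b"payload")
primary["obj"]["data"][0] ^= 0xFF      # simulate bit rot on the primary
assert get(primary, "obj", [replica]) == b"payload"   # healed from replica
```

The key point the poster makes is the ordering: the checksum must be taken before the first write, otherwise a rotted on-disk value would be checksummed as if it were correct.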
Re: [ceph-users] Disk failures
This is why I use btrfs mirror sets underneath ceph, and hopefully more than make up for the space loss by going with 2 replicas instead of 3 plus on-the-fly lzo compression. The ceph deep scrubs replace any need for btrfs scrubs, but I still get the benefit of self-healing when btrfs finds bit rot. The only errors I've run into are from hard shutdowns and possibly ECC errors due to working with consumer hardware and memory. I've been on top of btrfs using gentoo since Firefly.

Bill Sharer

On 06/14/2016 09:27 PM, Christian Balzer wrote:
> Hello,
>
> On Tue, 14 Jun 2016 14:26:41 +0200 Jan Schermer wrote:
>> Hi,
>> bit rot is not "bit rot" per se - nothing is rotting on the drive platter.
>
> Never mind that I used the wrong terminology (according to Wiki) and that my long experience with "laser-rot" probably caused me to choose that term; there are data degradation scenarios that are caused by undetected media failures or by corruption happening in the write path, thus making them quite reproducible.
>
>> It occurs during reads (mostly, anyway), and it's random. You can happily read a block and get the correct data, then read it again and get garbage, then get correct data again. This could be caused by a worn-out cell on an SSD, but firmware looks for that and rewrites it if the signal is attenuated too much. On spinners there are no cells to refresh, so rewriting doesn't help.
>>
>> You can't really "look for" bit rot due to the reasons above; strong checksumming/hash verification during reads is the only solution.
>
> Which is what I've been saying in the mail below and for years on this ML.
> And that makes deep-scrubbing something of quite limited value.
>
> Christian
>
>> And trust me, bit rot is a very real thing and very dangerous as well - do you think companies like Seagate or WD would lie about bit rot if it's not real? I'd buy a drive with a BER of 10^999 over one with 10^14, wouldn't everyone? And it is especially dangerous when something like Ceph handles much larger blocks of data than the client does. While the client (or an app) has some knowledge of the data _and_ hopefully throws an error if it reads garbage, Ceph will (if for example snapshots are used and FIEMAP is off) actually have to read the whole object (say 4MiB) and write it elsewhere, without any knowledge of whether what it read (and wrote) made any sense to the app. This way corruption might spread silently into your backups if you don't validate the data somehow (or dump it from a database, for example, where it's likely to get detected).
>>
>> Btw, just because you think you haven't seen it doesn't mean you haven't seen it - never seen artefacting in movies? Just a random bug in the decoder, is it? The VoD guys would tell you...
>>
>> For things like databases this is somewhat less impactful - bit rot doesn't "flip a bit" but affects larger blocks of data (like one sector), so databases usually catch this during read and err instead of returning garbage to the client.
>>
>> Jan
>>
>>> On 09 Jun 2016, at 09:16, Christian Balzer wrote:
>>>
>>> Hello,
>>>
>>> On Thu, 9 Jun 2016 08:43:23 +0200 Gandalf Corvotempesta wrote:
>>>> On 9 Jun 2016 02:09, "Christian Balzer" wrote:
>>>>> Ceph currently doesn't do any (relevant) checksumming at all, so if a PRIMARY PG suffers from bit-rot this will be undetected until the next deep-scrub.
>>>>>
>>>>> This is one of the longest and gravest outstanding issues with Ceph and is supposed to be addressed with bluestore (which currently doesn't have checksum-verified reads either).
>>>>
>>>> So if bit rot happens on the primary PG, ceph is spreading the corrupted data across the cluster?
>>> No.
>>>
>>> You will want to re-read the Ceph docs and the countless posts here about how replication within Ceph works.
>>> http://docs.ceph.com/docs/hammer/architecture/#smart-daemons-enable-hyperscale
>>>
>>> A client write goes to the primary OSD/PG and will not be ACK'ed to the client until it has reached all replica OSDs. This happens while the data is in flight (in RAM); it's not read from the journal or filestore.
>>>
>>>> What would be sent to the replica, the original data or the saved one?
>>>>
>>>> When bit rot happens I'll have 1 corrupted object and 2 good ones. How do you manage this between deep scrubs? Which data would be used by ceph? I think that bitrot on a huge VM block device could lead to a mess, like the whole device being corrupted. Would a VM affected by bitrot be able to stay up and running? And bitrot on a qcow2 file?
>>>>
>>> Bitrot is a bit hyped; I haven't seen any on the Ceph clusters I run, nor on other systems here where I (can) actually check for it.
>>>
>>> As to how it would affect things, that very much depends.
>>>
>>> If it's something like a busy directory inode that gets corrupted, the data in question will be in RAM (SLAB) and the next update will correct things.
>>>
>>> If it's a logfile, you're likely to never notice until deep-scrub detects it eventually.
>>>
>>> This isn't a Ceph specific question; on all systems that aren't backed by something like ZFS or BTRFS you're potentially vulnerable to this.
>>>
>>> Of course if you're that worried, you could always run BTRFS or ZFS inside your VM and notice immediately when something goes wrong.
Re: [ceph-users] striping for a small cluster
Looks like we'll rebuild the cluster when bluestore is released anyway. Thanks!

On Tue, Jun 14, 2016 at 7:02 PM Christian Balzer wrote:
>
> Hello,
>
> On Wed, 15 Jun 2016 00:22:51 +0000 pixelfairy wrote:
>
>> We have a small cluster: 3 mons, each of which also has 6 4TB OSDs, and a 20gig link to the cluster (2x10gig LACP to a stacked pair of switches). We'll have at least one replicated pool (size=3) and one erasure-coded pool.
>
> I'm neither particularly knowledgeable about nor a fan of EC pools, but keep in mind that the coding is dictated by the number of OSD nodes, so 3 doesn't give a lot of options, IIRC.
> In fact, it will be the same as a RAID5 and only sustain the loss of one OSD/disk, something nobody in their right mind does these days.
>
>> current plan is to have journals coexist with osds as that seems to be the safest and most economical.
>
> You will be thoroughly disappointed by the performance if you do this, unless your use case is something like a backup server with very few random I/Os.
> Any performance optimization will suggest looking at journal SSDs first.
>
>> what levels of striping would you recommend for this size cluster? any other optimization considerations? looking for a starting point to work from.
>
> Striping is one of the last things to ponder.
> Not only does it depend a LOT on your use case, it's also not possible to change later on, so getting it right for the initial size and future growth is an interesting challenge.
>
>> also, any recommendations for testing / benchmarking these configurations?
>>
>> so far, looking at
>> https://www.sebastien-han.fr/blog/2012/08/26/ceph-benchmarks/
>> bsd rebuilding itself, and maybe phoronix.
>
> Those benchmarks are very much out-dated, both in terms of Ceph versions and capabilities as well as the tools used (fio has been the most common benchmark tool for some time now).
> Once bluestore comes along (in a year or so), there will be another performance and HW design shift.
>
> Christian
> --
> Christian Balzer    Network/Systems Engineer
> ch...@gol.com       Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
Re: [ceph-users] striping for a small cluster
Hello,

On Wed, 15 Jun 2016 00:22:51 +0000 pixelfairy wrote:

> We have a small cluster: 3 mons, each of which also has 6 4TB OSDs, and a 20gig link to the cluster (2x10gig LACP to a stacked pair of switches). We'll have at least one replicated pool (size=3) and one erasure-coded pool.

I'm neither particularly knowledgeable about nor a fan of EC pools, but keep in mind that the coding is dictated by the number of OSD nodes, so 3 doesn't give a lot of options, IIRC.
In fact, it will be the same as a RAID5 and only sustain the loss of one OSD/disk, something nobody in their right mind does these days.

> current plan is to have journals coexist with osds as that seems to be the safest and most economical.

You will be thoroughly disappointed by the performance if you do this, unless your use case is something like a backup server with very few random I/Os.
Any performance optimization will suggest looking at journal SSDs first.

> what levels of striping would you recommend for this size cluster? any other optimization considerations? looking for a starting point to work from.

Striping is one of the last things to ponder.
Not only does it depend a LOT on your use case, it's also not possible to change later on, so getting it right for the initial size and future growth is an interesting challenge.

> also, any recommendations for testing / benchmarking these configurations?
>
> so far, looking at
> https://www.sebastien-han.fr/blog/2012/08/26/ceph-benchmarks/
> bsd rebuilding itself, and maybe phoronix.

Those benchmarks are very much out-dated, both in terms of Ceph versions and capabilities as well as the tools used (fio has been the most common benchmark tool for some time now).
Once bluestore comes along (in a year or so), there will be another performance and HW design shift.

Christian
--
Christian Balzer    Network/Systems Engineer
ch...@gol.com       Global OnLine Japan/Rakuten Communications
http://www.gol.com/
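To make the striping discussion concrete, here is a rough sketch of how a logical byte offset maps onto RADOS objects given a stripe unit, stripe count and object size (an editorial illustration of the layout described in the Ceph architecture docs; the parameter values in the assertions are examples, not recommendations):

```python
def map_offset(offset, stripe_unit, stripe_count, object_size):
    """Map a logical byte offset to (object_number, offset_within_object)."""
    units_per_object = object_size // stripe_unit
    unit = offset // stripe_unit                      # which stripe unit overall
    set_span = stripe_count * units_per_object        # stripe units per object set
    object_set = unit // set_span
    idx = unit % set_span
    obj_in_set = idx % stripe_count                   # round-robin across the set
    stripe_in_obj = idx // stripe_count
    obj_no = object_set * stripe_count + obj_in_set
    return obj_no, stripe_in_obj * stripe_unit + offset % stripe_unit

MiB = 1 << 20
# Defaults (stripe_unit == object_size, stripe_count == 1) degenerate to
# simple 4 MiB chunking:
assert map_offset(9 * MiB, 4 * MiB, 1, 4 * MiB) == (2, 1 * MiB)
# With 1 MiB units striped across 4 objects, consecutive units land on
# consecutive objects, spreading one client's I/O over more OSDs:
assert map_offset(0 * MiB, 1 * MiB, 4, 4 * MiB) == (0, 0)
assert map_offset(1 * MiB, 1 * MiB, 4, 4 * MiB) == (1, 0)
```

This also shows why Christian calls it hard to change later: the offset-to-object mapping is baked into every object name, so existing data would have to be rewritten.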
Re: [ceph-users] Disk failures
Hello,

On Tue, 14 Jun 2016 14:26:41 +0200 Jan Schermer wrote:
> Hi,
> bit rot is not "bit rot" per se - nothing is rotting on the drive platter.

Never mind that I used the wrong terminology (according to Wiki) and that my long experience with "laser-rot" probably caused me to choose that term; there are data degradation scenarios that are caused by undetected media failures or by corruption happening in the write path, thus making them quite reproducible.

> It occurs during reads (mostly, anyway), and it's random. You can happily read a block and get the correct data, then read it again and get garbage, then get correct data again. This could be caused by a worn-out cell on an SSD, but firmware looks for that and rewrites it if the signal is attenuated too much. On spinners there are no cells to refresh, so rewriting doesn't help.
>
> You can't really "look for" bit rot due to the reasons above; strong checksumming/hash verification during reads is the only solution.

Which is what I've been saying in the mail below and for years on this ML.
And that makes deep-scrubbing something of quite limited value.

Christian

> And trust me, bit rot is a very real thing and very dangerous as well - do you think companies like Seagate or WD would lie about bit rot if it's not real? I'd buy a drive with a BER of 10^999 over one with 10^14, wouldn't everyone? And it is especially dangerous when something like Ceph handles much larger blocks of data than the client does. While the client (or an app) has some knowledge of the data _and_ hopefully throws an error if it reads garbage, Ceph will (if for example snapshots are used and FIEMAP is off) actually have to read the whole object (say 4MiB) and write it elsewhere, without any knowledge of whether what it read (and wrote) made any sense to the app. This way corruption might spread silently into your backups if you don't validate the data somehow (or dump it from a database, for example, where it's likely to get detected).
>
> Btw, just because you think you haven't seen it doesn't mean you haven't seen it - never seen artefacting in movies? Just a random bug in the decoder, is it? The VoD guys would tell you...
>
> For things like databases this is somewhat less impactful - bit rot doesn't "flip a bit" but affects larger blocks of data (like one sector), so databases usually catch this during read and err instead of returning garbage to the client.
>
> Jan
>
>> On 09 Jun 2016, at 09:16, Christian Balzer wrote:
>>
>> Hello,
>>
>> On Thu, 9 Jun 2016 08:43:23 +0200 Gandalf Corvotempesta wrote:
>>> On 9 Jun 2016 02:09, "Christian Balzer" wrote:
>>>> Ceph currently doesn't do any (relevant) checksumming at all, so if a PRIMARY PG suffers from bit-rot this will be undetected until the next deep-scrub.
>>>>
>>>> This is one of the longest and gravest outstanding issues with Ceph and is supposed to be addressed with bluestore (which currently doesn't have checksum-verified reads either).
>>>
>>> So if bit rot happens on the primary PG, ceph is spreading the corrupted data across the cluster?
>> No.
>>
>> You will want to re-read the Ceph docs and the countless posts here about how replication within Ceph works.
>> http://docs.ceph.com/docs/hammer/architecture/#smart-daemons-enable-hyperscale
>>
>> A client write goes to the primary OSD/PG and will not be ACK'ed to the client until it has reached all replica OSDs. This happens while the data is in flight (in RAM); it's not read from the journal or filestore.
>>
>>> What would be sent to the replica, the original data or the saved one?
>>>
>>> When bit rot happens I'll have 1 corrupted object and 2 good ones. How do you manage this between deep scrubs? Which data would be used by ceph? I think that bitrot on a huge VM block device could lead to a mess, like the whole device being corrupted. Would a VM affected by bitrot be able to stay up and running? And bitrot on a qcow2 file?
>>>
>> Bitrot is a bit hyped; I haven't seen any on the Ceph clusters I run, nor on other systems here where I (can) actually check for it.
>>
>> As to how it would affect things, that very much depends.
>>
>> If it's something like a busy directory inode that gets corrupted, the data in question will be in RAM (SLAB) and the next update will correct things.
>>
>> If it's a logfile, you're likely to never notice until deep-scrub detects it eventually.
>>
>> This isn't a Ceph specific question; on all systems that aren't backed by something like ZFS or BTRFS you're potentially vulnerable to this.
>>
>> Of course if you're that worried, you could always run BTRFS or ZFS inside your VM and notice immediately when something goes wrong.
>> I personally wouldn't though, due to the performance penalties involved (CoW).
>>
>>> Let me try to explain
Re: [ceph-users] ceph-deploy jewel install dependencies
Working for me now. Thanks for taking care of this.

- Noah

On Tue, Jun 14, 2016 at 5:42 PM, Alfredo Deza wrote:
> We are now good to go.
>
> Sorry for all the trouble; some packages were missed in the metadata, and I had to resync and re-sign them to get everything in order.
>
> Just tested it out and it works as expected. Let me know if you have any issues.
>
> On Tue, Jun 14, 2016 at 5:57 PM, Noah Watkins wrote:
>> Yeh, I'm still seeing the problem too. Thanks.
>>
>> On Tue, Jun 14, 2016 at 2:55 PM Alfredo Deza wrote:
>>>
>>> On Tue, Jun 14, 2016 at 5:52 PM, Alfredo Deza wrote:
>>>> Is it possible you tried to install just when I was syncing 10.2.2?
>>>>
>>>> :)
>>>>
>>>> Would you mind trying this again and see if you are good?
>>>>
>>>> On Tue, Jun 14, 2016 at 5:31 PM, Noah Watkins wrote:
>>>>> Installing Jewel with ceph-deploy has been working for weeks. Today I started to get some dependency issues:
>>>>>
>>>>> [b61808c8624c][DEBUG ] The following packages have unmet dependencies:
>>>>> [b61808c8624c][DEBUG ]  ceph : Depends: ceph-mon (= 10.2.1-1trusty) but it is not going to be installed
>>>>> [b61808c8624c][DEBUG ]         Depends: ceph-osd (= 10.2.1-1trusty) but it is not going to be installed
>>>>> [b61808c8624c][DEBUG ]  ceph-mds : Depends: ceph-base (= 10.2.1-1trusty) but it is not going to be installed
>>>>> [b61808c8624c][WARNIN] E: Unable to correct problems, you have held broken packages.
>>>>> [b61808c8624c][ERROR ] RuntimeError: command returned non-zero exit status: 100
>>>>> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get --assume-yes -q --no-install-recommends install -o Dpkg::Options::=--force-confnew ceph ceph-mds radosgw
>>>>>
>>>>> Seems to be an issue with 10.2.1 vs 10.2.2?
>>>
>>> Bah, it looks like this is still an issue even right now.
>>>
>>> I will update once I know what is going on.
>>>
>>>>> root@b61808c8624c:/ceph-deploy# apt-get install ceph-mon ceph-base
>>>>> Reading package lists... Done
>>>>> Building dependency tree
>>>>> Reading state information... Done
>>>>> Some packages could not be installed. This may mean that you have requested an impossible situation or, if you are using the unstable distribution, that some required packages have not yet been created or been moved out of Incoming.
>>>>> The following information may help to resolve the situation:
>>>>>
>>>>> The following packages have unmet dependencies:
>>>>>  ceph-mon : Depends: ceph-base (= 10.2.1-1trusty) but 10.2.2-1trusty is to be installed
>>>>> E: Unable to correct problems, you have held broken packages.
Re: [ceph-users] Spreading deep-scrubbing load
Hello,

On Wed, 15 Jun 2016 00:01:42 +0000 Jared Curtis wrote:

> I’ve just started looking into one of our ceph clusters because a weekly deep scrub had a major IO impact on the cluster, which caused multiple VMs to grind to a halt.

A story you will find aplenty in the ML archives.

> So far I’ve discovered that this particular cluster is configured incorrectly for the number of PGs per OSD. Currently that setting is 6 but should be closer to ~4096 based on the calc tool.

You're having a case of apples and oranges here.
PGs (and PGPs, don't forget them!) are configured per pool; the number of PGs per OSD is the result of all PGs in all pools.
Output of "ceph osd pool ls detail" would be helpful for us.

> If I change the number of PGs to the suggested values, what should I expect, especially around deep scrub performance but also just in general, as I’m very new to ceph?

We're not psychic.
The amount of PGs will have an impact, but that very much depends on your existing setup.
So the usual: all versions (Ceph/OS), detailed cluster description (all HW details down to the SSD model if you have them, network, etc.).

Generally speaking, deep-scrub is a very expensive operation of questionable value; see the current "Disk failures" thread for example.
That said, your cluster should be able to cope with it, as the deep-scrub impact is a lot like what you'd get from recovery and/or backfilling operations.
Think of deep-scrub causing pain as an early warning sign that your cluster is underpowered and/or badly configured.

> What I’m hoping will happen is that instead of a single weekly deep scrub that runs for 24+ hours, we would have lots of smaller deep scrubs that can hopefully finish in a reasonable time with minimal cluster impact.

Google and the (albeit often lagging behind) documentation are your friends.

These are the scrub-related configuration parameters; this sample is from my Hammer test cluster, with comments below the relevant ones:

  "osd_scrub_thread_timeout": "60",
  "osd_scrub_thread_suicide_timeout": "300",
  "osd_scrub_finalize_thread_timeout": "600",
  "osd_scrub_invalid_stats": "true",
  "osd_max_scrubs": "1",
Default AFAIK: no more than one scrub per OSD, though deep scrubs from other OSDs might of course want data from this one as well.
  "osd_scrub_begin_hour": "0",
  "osd_scrub_end_hour": "6",
These 2 are perfect if your cluster can finish a deep scrub within off-peak hours.
  "osd_scrub_load_threshold": "0.5",
Adjust to not starve your I/O.
  "osd_scrub_min_interval": "86400",
  "osd_scrub_max_interval": "604800",
  "osd_scrub_interval_randomize_ratio": "0.5",
Latest Hammer and afterwards can randomize things (spreading the load out), but if you want things to happen within a certain time frame this might not be helpful.
  "osd_scrub_chunk_min": "5",
  "osd_scrub_chunk_max": "25",
  "osd_scrub_sleep": "0.1",
This will allow client I/O to get a foot in and tends to be the biggest help in Hammer and before. In Jewel the combined I/O queue should help a lot as well.
  "osd_deep_scrub_interval": "604800",
Once that's exceeded, Ceph will deep-scrub come hell or high water, ignoring at the very least the load setting above.
  "osd_deep_scrub_stride": "524288",
  "osd_deep_scrub_update_digest_min_age": "7200",
  "osd_debug_scrub_chance_rewrite_digest": "0",

Christian
--
Christian Balzer    Network/Systems Engineer
ch...@gol.com       Global OnLine Japan/Rakuten Communications
http://www.gol.com/
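For reference, the rule of thumb behind the pg calc tool mentioned above (roughly 100 PGs per OSD, divided by the replica size, rounded up to a power of two) can be sketched as follows. This is a simplified editorial approximation; the actual tool applies more per-pool nuance:

```python
def suggested_pg_count(num_osds, replica_size, target_pgs_per_osd=100):
    # Aim for ~target_pgs_per_osd PG copies per OSD, then round the
    # pool's pg_num up to the next power of two.
    raw = num_osds * target_pgs_per_osd / replica_size
    power = 1
    while power < raw:
        power *= 2
    return power

# e.g. 18 OSDs with size=3: 18 * 100 / 3 = 600, rounded up to 1024
assert suggested_pg_count(18, 3) == 1024
```

Remember this is a per-pool number and that the PG count per OSD is the sum over all pools, which is exactly the apples-and-oranges point made in the reply.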
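A small sketch of the begin/end hour semantics discussed above (one reading of the osd_scrub_begin_hour / osd_scrub_end_hour pair, including a window that wraps past midnight; the authoritative logic lives in the OSD code, and the begin == end case is treated here as an always-open window by assumption):

```python
def in_scrub_window(hour, begin, end):
    """True if `hour` (0-23) falls inside the allowed scrub window."""
    if begin == end:
        return True                    # degenerate window: always allowed
    if begin < end:
        return begin <= hour < end     # simple same-day window
    return hour >= begin or hour < end # window wraps past midnight

# The sample config above allows scrubs from 0:00 to 6:00:
assert in_scrub_window(3, 0, 6)
assert not in_scrub_window(12, 0, 6)
# A wrapping window, e.g. 22:00 to 6:00:
assert in_scrub_window(23, 22, 6)
```

Note the caveat from the reply still applies: once osd_deep_scrub_interval is exceeded, the deep scrub runs regardless of such limits.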
Re: [ceph-users] ceph-deploy jewel install dependencies
We are now good to go.

Sorry for all the trouble; some packages were missed in the metadata, and I had to resync and re-sign them to get everything in order.

Just tested it out and it works as expected. Let me know if you have any issues.

On Tue, Jun 14, 2016 at 5:57 PM, Noah Watkins wrote:
> Yeh, I'm still seeing the problem too. Thanks.
>
> On Tue, Jun 14, 2016 at 2:55 PM Alfredo Deza wrote:
>>
>> On Tue, Jun 14, 2016 at 5:52 PM, Alfredo Deza wrote:
>>> Is it possible you tried to install just when I was syncing 10.2.2?
>>>
>>> :)
>>>
>>> Would you mind trying this again and see if you are good?
>>>
>>> On Tue, Jun 14, 2016 at 5:31 PM, Noah Watkins wrote:
>>>> Installing Jewel with ceph-deploy has been working for weeks. Today I started to get some dependency issues:
>>>>
>>>> [b61808c8624c][DEBUG ] The following packages have unmet dependencies:
>>>> [b61808c8624c][DEBUG ]  ceph : Depends: ceph-mon (= 10.2.1-1trusty) but it is not going to be installed
>>>> [b61808c8624c][DEBUG ]         Depends: ceph-osd (= 10.2.1-1trusty) but it is not going to be installed
>>>> [b61808c8624c][DEBUG ]  ceph-mds : Depends: ceph-base (= 10.2.1-1trusty) but it is not going to be installed
>>>> [b61808c8624c][WARNIN] E: Unable to correct problems, you have held broken packages.
>>>> [b61808c8624c][ERROR ] RuntimeError: command returned non-zero exit status: 100
>>>> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get --assume-yes -q --no-install-recommends install -o Dpkg::Options::=--force-confnew ceph ceph-mds radosgw
>>>>
>>>> Seems to be an issue with 10.2.1 vs 10.2.2?
>>
>> Bah, it looks like this is still an issue even right now.
>>
>> I will update once I know what is going on.
>>
>>>> root@b61808c8624c:/ceph-deploy# apt-get install ceph-mon ceph-base
>>>> Reading package lists... Done
>>>> Building dependency tree
>>>> Reading state information... Done
>>>> Some packages could not be installed. This may mean that you have requested an impossible situation or, if you are using the unstable distribution, that some required packages have not yet been created or been moved out of Incoming.
>>>> The following information may help to resolve the situation:
>>>>
>>>> The following packages have unmet dependencies:
>>>>  ceph-mon : Depends: ceph-base (= 10.2.1-1trusty) but 10.2.2-1trusty is to be installed
>>>> E: Unable to correct problems, you have held broken packages.
[ceph-users] striping for a small cluster
We have a small cluster: 3 mons, each of which also has 6 4TB OSDs, and a 20gig link to the cluster (2x10gig LACP to a stacked pair of switches). We'll have at least one replicated pool (size=3) and one erasure-coded pool.

Current plan is to have journals coexist with OSDs, as that seems to be the safest and most economical.

What levels of striping would you recommend for this size cluster? Any other optimization considerations? Looking for a starting point to work from.

Also, any recommendations for testing / benchmarking these configurations?

So far, looking at
https://www.sebastien-han.fr/blog/2012/08/26/ceph-benchmarks/
bsd rebuilding itself, and maybe phoronix.
[ceph-users] Spreading deep-scrubbing load
I’ve just started looking into one of our ceph clusters because a weekly deep scrub had a major IO impact on the cluster, which caused multiple VMs to grind to a halt.

So far I’ve discovered that this particular cluster is configured incorrectly for the number of PGs per OSD. Currently that setting is 6 but should be closer to ~4096 based on the calc tool.

If I change the number of PGs to the suggested values, what should I expect, especially around deep scrub performance but also just in general, as I’m very new to ceph? What I’m hoping will happen is that instead of a single weekly deep scrub that runs for 24+ hours, we would have lots of smaller deep scrubs that can hopefully finish in a reasonable time with minimal cluster impact.

Thanks.
Re: [ceph-users] ceph-deploy jewel install dependencies
Yeh, I'm still seeing the problem too. Thanks.

On Tue, Jun 14, 2016 at 2:55 PM Alfredo Deza wrote:
>
> On Tue, Jun 14, 2016 at 5:52 PM, Alfredo Deza wrote:
>> Is it possible you tried to install just when I was syncing 10.2.2?
>>
>> :)
>>
>> Would you mind trying this again and see if you are good?
>>
>> On Tue, Jun 14, 2016 at 5:31 PM, Noah Watkins wrote:
>>> Installing Jewel with ceph-deploy has been working for weeks. Today I started to get some dependency issues:
>>>
>>> [b61808c8624c][DEBUG ] The following packages have unmet dependencies:
>>> [b61808c8624c][DEBUG ]  ceph : Depends: ceph-mon (= 10.2.1-1trusty) but it is not going to be installed
>>> [b61808c8624c][DEBUG ]         Depends: ceph-osd (= 10.2.1-1trusty) but it is not going to be installed
>>> [b61808c8624c][DEBUG ]  ceph-mds : Depends: ceph-base (= 10.2.1-1trusty) but it is not going to be installed
>>> [b61808c8624c][WARNIN] E: Unable to correct problems, you have held broken packages.
>>> [b61808c8624c][ERROR ] RuntimeError: command returned non-zero exit status: 100
>>> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get --assume-yes -q --no-install-recommends install -o Dpkg::Options::=--force-confnew ceph ceph-mds radosgw
>>>
>>> Seems to be an issue with 10.2.1 vs 10.2.2?
>
> Bah, it looks like this is still an issue even right now.
>
> I will update once I know what is going on.
>
>>> root@b61808c8624c:/ceph-deploy# apt-get install ceph-mon ceph-base
>>> Reading package lists... Done
>>> Building dependency tree
>>> Reading state information... Done
>>> Some packages could not be installed. This may mean that you have requested an impossible situation or, if you are using the unstable distribution, that some required packages have not yet been created or been moved out of Incoming.
>>> The following information may help to resolve the situation:
>>>
>>> The following packages have unmet dependencies:
>>>  ceph-mon : Depends: ceph-base (= 10.2.1-1trusty) but 10.2.2-1trusty is to be installed
>>> E: Unable to correct problems, you have held broken packages.
Re: [ceph-users] ceph-deploy jewel install dependencies
On Tue, Jun 14, 2016 at 5:52 PM, Alfredo Deza wrote:
> Is it possible you tried to install just when I was syncing 10.2.2?
>
> :)
>
> Would you mind trying this again and see if you are good?
>
> On Tue, Jun 14, 2016 at 5:31 PM, Noah Watkins wrote:
>> Installing Jewel with ceph-deploy has been working for weeks. Today I started to get some dependency issues:
>>
>> [b61808c8624c][DEBUG ] The following packages have unmet dependencies:
>> [b61808c8624c][DEBUG ]  ceph : Depends: ceph-mon (= 10.2.1-1trusty) but it is not going to be installed
>> [b61808c8624c][DEBUG ]         Depends: ceph-osd (= 10.2.1-1trusty) but it is not going to be installed
>> [b61808c8624c][DEBUG ]  ceph-mds : Depends: ceph-base (= 10.2.1-1trusty) but it is not going to be installed
>> [b61808c8624c][WARNIN] E: Unable to correct problems, you have held broken packages.
>> [b61808c8624c][ERROR ] RuntimeError: command returned non-zero exit status: 100
>> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get --assume-yes -q --no-install-recommends install -o Dpkg::Options::=--force-confnew ceph ceph-mds radosgw
>>
>> Seems to be an issue with 10.2.1 vs 10.2.2?

Bah, it looks like this is still an issue even right now.

I will update once I know what is going on.

>> root@b61808c8624c:/ceph-deploy# apt-get install ceph-mon ceph-base
>> Reading package lists... Done
>> Building dependency tree
>> Reading state information... Done
>> Some packages could not be installed. This may mean that you have requested an impossible situation or, if you are using the unstable distribution, that some required packages have not yet been created or been moved out of Incoming.
>> The following information may help to resolve the situation:
>>
>> The following packages have unmet dependencies:
>>  ceph-mon : Depends: ceph-base (= 10.2.1-1trusty) but 10.2.2-1trusty is to be installed
>> E: Unable to correct problems, you have held broken packages.
Re: [ceph-users] ceph-deploy jewel install dependencies
Is it possible you tried to install just when I was syncing 10.2.2?

:)

Would you mind trying this again and see if you are good?

On Tue, Jun 14, 2016 at 5:31 PM, Noah Watkins wrote:
> Installing Jewel with ceph-deploy has been working for weeks. Today I started to get some dependency issues:
>
> [b61808c8624c][DEBUG ] The following packages have unmet dependencies:
> [b61808c8624c][DEBUG ]  ceph : Depends: ceph-mon (= 10.2.1-1trusty) but it is not going to be installed
> [b61808c8624c][DEBUG ]         Depends: ceph-osd (= 10.2.1-1trusty) but it is not going to be installed
> [b61808c8624c][DEBUG ]  ceph-mds : Depends: ceph-base (= 10.2.1-1trusty) but it is not going to be installed
> [b61808c8624c][WARNIN] E: Unable to correct problems, you have held broken packages.
> [b61808c8624c][ERROR ] RuntimeError: command returned non-zero exit status: 100
> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get --assume-yes -q --no-install-recommends install -o Dpkg::Options::=--force-confnew ceph ceph-mds radosgw
>
> Seems to be an issue with 10.2.1 vs 10.2.2?
>
> root@b61808c8624c:/ceph-deploy# apt-get install ceph-mon ceph-base
> Reading package lists... Done
> Building dependency tree
> Reading state information... Done
> Some packages could not be installed. This may mean that you have requested an impossible situation or, if you are using the unstable distribution, that some required packages have not yet been created or been moved out of Incoming.
> The following information may help to resolve the situation:
>
> The following packages have unmet dependencies:
>  ceph-mon : Depends: ceph-base (= 10.2.1-1trusty) but 10.2.2-1trusty is to be installed
> E: Unable to correct problems, you have held broken packages.
Re: [ceph-users] cephfs reporting 2x data available
Thanks John, I just wanted to make sure I wasn't doing anything wrong, that should work fine. Dan On 06/14/2016 03:24 PM, John Spray wrote: On Tue, Jun 14, 2016 at 7:45 PM, Daniel Davidson wrote: I have just deployed a cluster and started messing with it, with, I think, two replicas. However when I have a metadata server and mount via fuse, it is reporting its full size. With two replicas, I thought it would be only reporting half of that. Did I make a mistake, or is there something I can change to get around that? It reports the overall (raw) free space available on the cluster, i.e. not accounting for replication. I'm assuming that by "it is reporting" you mean that "df" is reporting this on your ceph-fuse mount. Because the replica count is a per-pool thing, and a filesystem can use multiple pools with different replica counts (via files having different layouts), giving the raw free space is the most consistent thing we can do. If you want to see a smarter view of available space, use "ceph df", which gives you a pool breakdown and an "available" size that takes account of replication. John How do you check that your replicas are actually set correctly? It is set in my ceph.conf file, but I am guessing there is someplace else I should look at. Dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs
But is there any way to recreate the bucket index for an existing bucket? Is it possible to change the bucket's index pool to some new pool in its metadata and then tell RadosGW to rebuild the index (--check --fix)? Sounds really crazy, but will it work? Will the new index become sharded? 2016-06-14 13:18 GMT+03:00 Ansgar Jazdzewski : > Hi, > > your cluster will be in a warning state if you disable scrubbing, and > you really need it in case of some data loss > > cheers, > Ansgar > > 2016-06-14 11:05 GMT+02:00 Wido den Hollander : >> >>> On 14 June 2016 at 11:00, Василий Ангапов wrote: >>> >>> >>> Is it a good idea to disable scrub and deep-scrub for the bucket.index >>> pool? What negative consequences might it cause? >>> >> >> No, I would not do that. Scrubbing is essential to detect (silent) data >> corruption. >> >> You should really scrub all your data. >> >>> 2016-06-14 11:51 GMT+03:00 Wido den Hollander : >>> > >>> >> On 14 June 2016 at 10:10, Ansgar Jazdzewski >>> >> wrote: >>> >> >>> >> >>> >> Hi, >>> >> >>> >> we are using ceph and radosGW to store images (~300kb each) in S3; >>> >> when it comes to deep-scrubbing we are facing task timeouts (> 30s ...) >>> >> >>> >> my question is: >>> >> >>> >> with that number of objects/files, is it better to calculate the >>> >> PGs on an object basis instead of the volume size? And how should it be >>> >> done? >>> >> >>> > >>> > Do you have bucket sharding enabled? >>> > >>> > And how many objects do you have in a single bucket? >>> > >>> > If sharding is not enabled for the bucket index you might have large >>> > RADOS objects with bucket indexes which are hard to scrub. 
>>> > >>> > Wido >>> > >>> >> thanks >>> >> Ansgar >>> >> ___ >>> >> ceph-users mailing list >>> >> ceph-users@lists.ceph.com >>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> > ___ >>> > ceph-users mailing list >>> > ceph-users@lists.ceph.com >>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
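For context on the sharding Wido mentions: from Hammer onward radosgw has a config option that shards the index of newly created buckets; it does not retroactively shard existing ones, which is exactly why the rebuild question above is hard. A hedged sketch of the setting (the client section name is hypothetical and depends on how your rgw instance is named):

```ini
# ceph.conf on the radosgw hosts -- only affects buckets created afterwards
[client.radosgw.gateway]
; hypothetical instance name -- use your own rgw section
rgw override bucket index max shards = 16
```

With ~40 million objects in a single bucket, 16 shards would put each index object around 2.5 million omap entries instead of one 40-million-entry object, which is far friendlier to deep-scrub.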
[ceph-users] Protecting rbd from multiple simultaneous mapping.
The email thread here : http://www.spinics.net/lists/ceph-devel/msg12226.html discusses a way of preventing multiple simultaneous clients from mapping an rbd via the legacy advisory locking scheme, along with osd blacklisting. Is it now advisable to use the exclusive lock feature, discussed here : http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-September/004857.html ? In other words, does the exclusive lock feature automatically break the lock of any older lock holders and prevent any writes to the rbd from the older holder ? Another way to frame my question would be : what is the recommended way of preventing multiple simultaneous rbd mappings, based on the state-of-the-art in ceph? thanks in advance, - Puneet ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph-deploy jewel install dependencies
Installing Jewel with ceph-deploy has been working for weeks. Today I started to get some dependency issues: [b61808c8624c][DEBUG ] The following packages have unmet dependencies: [b61808c8624c][DEBUG ] ceph : Depends: ceph-mon (= 10.2.1-1trusty) but it is not going to be installed [b61808c8624c][DEBUG ] Depends: ceph-osd (= 10.2.1-1trusty) but it is not going to be installed [b61808c8624c][DEBUG ] ceph-mds : Depends: ceph-base (= 10.2.1-1trusty) but it is not going to be installed [b61808c8624c][WARNIN] E: Unable to correct problems, you have held broken packages. [b61808c8624c][ERROR ] RuntimeError: command returned non-zero exit status: 100 [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get --assume-yes -q --no-install-recommends install -o Dpkg::Options::=--force-confnew ceph ceph-mds radosgw Seems to be an issue with 10.2.1 vs 10.2.2? root@b61808c8624c:/ceph-deploy# apt-get install ceph-mon ceph-base Reading package lists... Done Building dependency tree Reading state information... Done Some packages could not be installed. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming. The following information may help to resolve the situation: The following packages have unmet dependencies: ceph-mon : Depends: ceph-base (= 10.2.1-1trusty) but 10.2.2-1trusty is to be installed E: Unable to correct problems, you have held broken packages. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] cephfs reporting 2x data available
On Tue, Jun 14, 2016 at 7:45 PM, Daniel Davidson wrote: > I have just deployed a cluster and started messing with it, with, I think, > two replicas. However when I have a metadata server and mount via fuse, it > is reporting its full size. With two replicas, I thought it would be only > reporting half of that. Did I make a mistake, or is there something I can > change to get around that? It reports the overall (raw) free space available on the cluster, i.e. not accounting for replication. I'm assuming that by "it is reporting" you mean that "df" is reporting this on your ceph-fuse mount. Because the replica count is a per-pool thing, and a filesystem can use multiple pools with different replica counts (via files having different layouts), giving the raw free space is the most consistent thing we can do. If you want to see a smarter view of available space, use "ceph df", which gives you a pool breakdown and an "available" size that takes account of replication. John > > How do you check that your replicas are actually set correctly? It is set in > my ceph.conf file, but I am guessing there is someplace else I should look > at. > > Dan > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
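John's point above can be made concrete with a small sketch: raw free space (what df shows) has to be divided by each pool's replica count to estimate usable space, and because pools can have different counts there is no single right divisor. This is an illustrative model, not Ceph code:

```python
def usable_free(raw_free_bytes, pool_sizes):
    """Estimate usable free space per pool from the cluster's raw free
    space, by dividing by each pool's replica count ("size").

    An upper bound only: all pools draw from the same raw capacity,
    which is why ceph-fuse's df reports the raw figure instead.
    """
    return {pool: raw_free_bytes // size for pool, size in pool_sizes.items()}

# 4 TiB raw free: a size-2 data pool can hold at most 2 TiB of user data
est = usable_free(4 * 1024**4, {"cephfs_data": 2, "cephfs_metadata": 3})
assert est["cephfs_data"] == 2 * 1024**4
```

"ceph df" applies the same idea per pool, which is why its available figures differ from df on the mount.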
Re: [ceph-users] Clearing Incomplete Clones State
Hi, Additional information. It seems that the snapshot state is wrong. Any ideas on my case? How can I manually edit the pool flags to remove the "incomplete_clones" flag? [root@management-b ~]# rados -p rbd ls rbd_directory [root@management-b ~]# rados -p rbd_cache ls rbd_directory [root@management-b ~]# rados -p rbd lssnap 0 snaps [root@management-b ~]# rados -p rbd_cache lssnap 0 snaps Best regards, On Tue, Jun 14, 2016 at 4:55 AM, Lazuardi Nasution wrote: > Hi, > > I have removed cache tiering due to a "missing hit_sets" warning. After > removing it, I want to try to add tiering again with the same cache pool and > storage pool, but I can't, even though the cache pool is empty or forced to be cleared. > The following is some output. How can I deal with this? Is it possible to clear > "incomplete_clones" and the "snapshot state"? How do I keep the "missing hit_sets" > warning from appearing again? > > [root@management-b ~]# ceph osd tier add rbd rbd_cache > Error ENOTEMPTY: tier pool 'rbd_cache' is not empty; --force-nonempty to > force > [root@management-b ~]# ceph osd tier add rbd rbd_cache --force-nonempty > Error ENOTEMPTY: tier pool 'rbd_cache' has snapshot state; it cannot be > added as a tier without breaking the pool > [root@management-b ~]# rados -p rbd_cache ls > rbd_directory > [root@management-b ~]# rados -p rbd lssnap > 0 snaps > [root@management-b ~]# ceph osd dump | grep rbd_cache > pool 6 'rbd_cache' replicated size 3 min_size 1 crush_ruleset 1 > object_hash rjenkins pg_num 128 pgp_num 128 last_change 8090 flags > hashpspool,incomplete_clones stripe_width 0 > [root@management-b ~]# ceph -v > ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd) > [root@management-b ~]# > > Best regards, > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph and Openstack
On Tue, Jun 14, 2016 at 05:48:11PM +0200, Iban Cabrillo wrote: :Hi Jon, : Which hypervisor is used for your Openstack deployment? We had lots :of trouble with Xen until the latest libvirt ( in the libvirt < 1.3.2 package, the RBD :driver was not supported ) we're using kvm (Ubuntu 14.04, libvirt 1.2.12 ) -Jon : :Regards, I : :2016-06-14 17:38 GMT+02:00 Jonathan D. Proulx : : :> On Tue, Jun 14, 2016 at 02:15:45PM +0200, Fran Barrera wrote: :> :Hi all, :> : :> :I have a problem integrating Glance with Ceph. :> : :> :Openstack Mitaka :> :Ceph Jewel :> : :> :I've followed the Ceph doc ( :> :http://docs.ceph.com/docs/jewel/rbd/rbd-openstack/) but when I try to :> list :> :or create images, I have an error "Unable to establish connection to :> :http://IP:9292/v2/images";, and in the debug mode I can see this: :> :> This suggests that the Glance API service isn't running properly :> and probably isn't related to the rbd backend. :> :> You should be able to connect to the glance API endpoint even if the :> ceph config is wrong (though you'd probably get 'internal server :> errors' if the storage backend isn't set up correctly). :> :> In either case you'll probably get a better response on the openstack :> lists, but my suggestion would be to try the regular file backend to :> verify your glance setup is working, then switch to the rbd backend. :> :> -Jon :> :> : :> :2016-06-14 14:02:54.634 2256 DEBUG glance_store.capabilities [-] Store :> :glance_store._drivers.rbd.Store doesn't support updating dynamic storage :> :capabilities. Please overwrite 'update_capabilities' method of the store :> to :> :implement updating logics if needed. update_capabilities :> :/usr/lib/python2.7/dist-packages/glance_store/capabilities.py:98 :> : :> :I've also tried to remove the database and populate it again, but I get the same :> :error. :> :Cinder with Ceph works correctly. :> : :> :Any suggestions? :> : :> :Thanks, :> :Fran. 
:> :> :___ :> :ceph-users mailing list :> :ceph-users@lists.ceph.com :> :http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com :> :> :> -- :> ___ :> ceph-users mailing list :> ceph-users@lists.ceph.com :> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com :> : : : :-- : :Iban Cabrillo Bartolome :Instituto de Fisica de Cantabria (IFCA) :Santander, Spain :Tel: +34942200969 :PGP PUBLIC KEY: :http://pgp.mit.edu/pks/lookup?op=get&search=0xD9DF0B3D6C8C08AC : :Bertrand Russell: :*"El problema con el mundo es que los estúpidos están seguros de todo y los :inteligentes están llenos de dudas*" -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] which CentOS 7 kernel is compatible with jewel?
On Mon, Jun 13, 2016 at 8:37 PM, Michael Kuriger wrote: > I just realized that this issue is probably because I’m running jewel 10.2.1 > on the servers side, but accessing from a client running hammer 0.94.7 or > infernalis 9.2.1 > > Here is what happens if I run rbd ls from a client on infernalis. I was > testing this access since we weren’t planning on building rpms for Jewel on > CentOS 6 > > $ rbd ls > 2016-06-13 11:24:06.881591 7fe61e568700 0 -- :/3877046932 >> > 10.1.77.165:6789/0 pipe(0x562ed3ea7550 sd=3 :0 s=1 pgs=0 cs=0 l=1 > c=0x562ed3ea0ac0).fault > 2016-06-13 11:24:09.882051 7fe61137f700 0 -- :/3877046932 >> > 10.1.78.75:6789/0 pipe(0x7fe608000c00 sd=3 :0 s=1 pgs=0 cs=0 l=1 > c=0x7fe608004ef0).fault > 2016-06-13 11:24:12.882389 7fe61e568700 0 -- :/3877046932 >> > 10.1.77.165:6789/0 pipe(0x7fe608008350 sd=4 :0 s=1 pgs=0 cs=0 l=1 > c=0x7fe60800c5f0).fault > 2016-06-13 11:24:18.883642 7fe61e568700 0 -- :/3877046932 >> > 10.1.77.165:6789/0 pipe(0x7fe608008350 sd=3 :0 s=1 pgs=0 cs=0 l=1 > c=0x7fe6080078e0).fault > 2016-06-13 11:24:21.884259 7fe61137f700 0 -- :/3877046932 >> > 10.1.78.75:6789/0 pipe(0x7fe608000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 > c=0x7fe608007110).fault Accessing jewel with older clients should work as long as you don't enable jewel tunables and such; the same goes for older kernels. Can you do rbd --debug-ms=20 ls and attach the output? Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] cephfs reporting 2x data available
I have just deployed a cluster and started messing with it, with, I think, two replicas. However when I have a metadata server and mount via fuse, it is reporting its full size. With two replicas, I thought it would be only reporting half of that. Did I make a mistake, or is there something I can change to get around that? How do you check that your replicas are actually set correctly? It is set in my ceph.conf file, but I am guessing there is someplace else I should look at. Dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] librados and multithreading
Thank you, Jason. 2016-06-14 18:43 GMT+03:00 Jason Dillaman : > On Fri, Jun 10, 2016 at 12:37 PM, Юрий Соколов wrote: >> Good day, all. >> >> I found this issue: https://github.com/ceph/ceph/pull/5991 >> >> Did this issue affect librados? > > No -- this affected the start-up and shut-down of librbd as described > in the associated tracker ticket. > >> Was it safe to use a single rados_ioctx_t from multiple threads before this >> fix? > > Yes. > >> >> -- >> With regards, >> Sokolov Yura aka funny_falcon >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > -- > Jason -- With regards, Sokolov Yura aka funny_falcon ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph and Openstack
Hi Jon, Which hypervisor is used for your Openstack deployment? We had lots of trouble with Xen until the latest libvirt ( in the libvirt < 1.3.2 package, the RBD driver was not supported ) Regards, I 2016-06-14 17:38 GMT+02:00 Jonathan D. Proulx : > On Tue, Jun 14, 2016 at 02:15:45PM +0200, Fran Barrera wrote: > :Hi all, > : > :I have a problem integrating Glance with Ceph. > : > :Openstack Mitaka > :Ceph Jewel > : > :I've followed the Ceph doc ( > :http://docs.ceph.com/docs/jewel/rbd/rbd-openstack/) but when I try to > list > :or create images, I have an error "Unable to establish connection to > :http://IP:9292/v2/images";, and in the debug mode I can see this: > > This suggests that the Glance API service isn't running properly > and probably isn't related to the rbd backend. > > You should be able to connect to the glance API endpoint even if the > ceph config is wrong (though you'd probably get 'internal server > errors' if the storage backend isn't set up correctly). > > In either case you'll probably get a better response on the openstack > lists, but my suggestion would be to try the regular file backend to > verify your glance setup is working, then switch to the rbd backend. > > -Jon > > : > :2016-06-14 14:02:54.634 2256 DEBUG glance_store.capabilities [-] Store > :glance_store._drivers.rbd.Store doesn't support updating dynamic storage > :capabilities. Please overwrite 'update_capabilities' method of the store > to > :implement updating logics if needed. update_capabilities > :/usr/lib/python2.7/dist-packages/glance_store/capabilities.py:98 > : > :I've also tried to remove the database and populate it again, but I get the same > :error. > :Cinder with Ceph works correctly. > : > :Any suggestions? > : > :Thanks, > :Fran. 
> > :___ > :ceph-users mailing list > :ceph-users@lists.ceph.com > :http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > -- > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > -- Iban Cabrillo Bartolome Instituto de Fisica de Cantabria (IFCA) Santander, Spain Tel: +34942200969 PGP PUBLIC KEY: http://pgp.mit.edu/pks/lookup?op=get&search=0xD9DF0B3D6C8C08AC Bertrand Russell: *"El problema con el mundo es que los estúpidos están seguros de todo y los inteligentes están llenos de dudas*" ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] librados and multithreading
On Fri, Jun 10, 2016 at 12:37 PM, Юрий Соколов wrote: > Good day, all. > > I found this issue: https://github.com/ceph/ceph/pull/5991 > > Did this issue affect librados? No -- this affected the start-up and shut-down of librbd as described in the associated tracker ticket. > Was it safe to use a single rados_ioctx_t from multiple threads before this > fix? Yes. > > -- > With regards, > Sokolov Yura aka funny_falcon > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
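Jason's "yes" above -- one rados_ioctx_t shared by many threads -- can be sketched without a live cluster by standing in a fake ioctx for the real one. `FakeIoctx` here is a mock, not the librados API (though the Python binding's ioctx does expose a similar `write_full` call); the structure is the point: one shared handle, many writer threads, no locking in the callers.

```python
import threading

class FakeIoctx:
    """Stand-in for a rados ioctx; like librados, it synchronizes
    internally so callers may share one handle across threads."""
    def __init__(self):
        self._lock = threading.Lock()
        self._objects = {}

    def write_full(self, name, data):
        with self._lock:  # internal locking, invisible to callers
            self._objects[name] = data

ioctx = FakeIoctx()  # ONE handle shared by all threads, as discussed above
workers = [threading.Thread(target=ioctx.write_full,
                            args=("obj-%d" % i, b"payload"))
           for i in range(32)]
for t in workers:
    t.start()
for t in workers:
    t.join()
assert len(ioctx._objects) == 32  # every write landed; none lost
```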
Re: [ceph-users] Ceph and Openstack
On Tue, Jun 14, 2016 at 02:15:45PM +0200, Fran Barrera wrote: :Hi all, : :I have a problem integrating Glance with Ceph. : :Openstack Mitaka :Ceph Jewel : :I've followed the Ceph doc ( :http://docs.ceph.com/docs/jewel/rbd/rbd-openstack/) but when I try to list :or create images, I have an error "Unable to establish connection to :http://IP:9292/v2/images", and in the debug mode I can see this: This suggests that the Glance API service isn't running properly and probably isn't related to the rbd backend. You should be able to connect to the glance API endpoint even if the ceph config is wrong (though you'd probably get 'internal server errors' if the storage backend isn't set up correctly). In either case you'll probably get a better response on the openstack lists, but my suggestion would be to try the regular file backend to verify your glance setup is working, then switch to the rbd backend. -Jon : :2016-06-14 14:02:54.634 2256 DEBUG glance_store.capabilities [-] Store :glance_store._drivers.rbd.Store doesn't support updating dynamic storage :capabilities. Please overwrite 'update_capabilities' method of the store to :implement updating logics if needed. update_capabilities :/usr/lib/python2.7/dist-packages/glance_store/capabilities.py:98 : :I've also tried to remove the database and populate it again, but I get the same :error. :Cinder with Ceph works correctly. : :Any suggestions? : :Thanks, :Fran. :___ :ceph-users mailing list :ceph-users@lists.ceph.com :http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
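Jon's first diagnostic step -- check whether the Glance endpoint answers at all, independent of the rbd backend -- can be scripted. A sketch; the local test server here merely stands in for the real http://IP:9292 endpoint so the example is self-contained:

```python
import http.server
import threading
import urllib.error
import urllib.request

def endpoint_alive(url, timeout=2.0):
    """True if something answers HTTP at url (even with an error status);
    False if nothing is listening -- Fran's 'unable to establish
    connection' error corresponds to the False branch."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # server answered, just unhappy (e.g. 401/500)
    except OSError:
        return False  # connection refused / timed out: service not running

# local stand-in for the Glance API endpoint
srv = http.server.HTTPServer(("127.0.0.1", 0),
                             http.server.SimpleHTTPRequestHandler)
threading.Thread(target=srv.serve_forever, daemon=True).start()
port = srv.server_address[1]
assert endpoint_alive("http://127.0.0.1:%d/" % port)
srv.shutdown()
```

If this probe fails against the real endpoint, the problem is the Glance service itself, not the Ceph backend.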
Re: [ceph-users] RGW: ERROR: failed to distribute cache
BTW, I have 10 RGW load balanced through Apache. When restarting one of them I get the following messages in log: 2016-06-14 14:44:15.919801 7fd4728dea40 2 all 8 watchers are set, enabling cache 2016-06-14 14:44:15.919879 7fce370f7700 2 garbage collection: start 2016-06-14 14:44:15.919990 7fce368f6700 2 object expiration: start 2016-06-14 14:44:15.920534 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.15 2016-06-14 14:44:15.921257 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.16 2016-06-14 14:44:15.922145 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.17 2016-06-14 14:44:15.923772 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.18 2016-06-14 14:44:15.924557 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.19 2016-06-14 14:44:15.925400 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.20 2016-06-14 14:44:15.926349 7fd4728dea40 0 starting handler: fastcgi 2016-06-14 14:44:15.927125 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.21 2016-06-14 14:44:15.927897 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.22 2016-06-14 14:44:15.928412 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.23 2016-06-14 14:44:15.929042 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.24 2016-06-14 14:44:15.930752 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.25 2016-06-14 14:44:15.931313 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.26 2016-06-14 14:44:15.932482 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.27 2016-06-14 14:44:15.933237 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.28 2016-06-14 14:44:15.934097 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.29 2016-06-14 14:44:15.934660 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.30 2016-06-14 14:44:15.936322 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.31 2016-06-14 14:44:15.936979 7fce370f7700 0 
RGWGC::process() failed to acquire lock on gc.0 2016-06-14 14:44:15.937559 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.1 2016-06-14 14:44:15.938222 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.2 2016-06-14 14:44:15.939000 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.3 2016-06-14 14:44:15.939622 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.4 2016-06-14 14:44:15.940135 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.5 2016-06-14 14:44:15.940669 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.6 2016-06-14 14:44:15.941227 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.7 2016-06-14 14:44:15.941854 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.8 2016-06-14 14:44:15.942333 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.9 2016-06-14 14:44:15.943036 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.10 2016-06-14 14:44:15.944708 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.11 2016-06-14 14:44:15.946347 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.12 2016-06-14 14:44:15.947001 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.13 2016-06-14 14:44:15.947610 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.14 2016-06-14 14:44:15.947615 7fce370f7700 2 garbage collection: stop 2016-06-14 14:44:15.947949 7fd4728dea40 -1 rgw realm watcher: Failed to watch realms.87abf44e-cab3-48c4-b012-0a9247519a5b.control with (2) No such file or directory 2016-06-14 14:44:15.948370 7fd4728dea40 -1 rgw realm watcher: Failed to establish a watch on RGWRealm, disabling dynamic reconfiguration. 
2016-06-14 17:34 GMT+03:00 Василий Ангапов : > I also get the following: > > $ radosgw-admin period update --commit > 2016-06-14 14:32:28.982847 7fed392baa40 0 ERROR: failed to distribute > cache for .rgw.root:periods.87abf44e-cab3-48c4-b012-0a9247519a5b:staging > 2016-06-14 14:32:38.991846 7fed392baa40 0 ERROR: failed to distribute > cache for > .rgw.root:periods.87abf44e-cab3-48c4-b012-0a9247519a5b:staging.latest_epoch > 2016-06-14 14:32:49.002380 7fed392baa40 0 ERROR: failed to distribute > cache for .rgw.root:periods.af0b6743-82ba-4517-bd51-36bdfbe48f9f.3 > 2016-06-14 14:32:59.013307 7fed392baa40 0 ERROR: failed to distribute > cache for .rgw.root:periods.af0b6743-82ba-4517-bd51-36bdfbe48f9f.latest_epoch > 2016-06-14 14:33:09.023554 7fed392baa40 0 ERROR: failed to distribute > cache for .rgw.root:periods.af0b6743-82ba-4517-bd51-36bdfbe48f9f.latest_epoch > 2016-06-14 14:33:19.034593 7fed392baa40 0 ERROR: failed to distribute > cache for .rgw.root:zonegroup_info.bef0aa4e-6670-4c39-8520-ee51140424cc > 2016-06-14 14:33:29.043825 7fed392baa40 0 ERROR: failed to distribute > cache for .rgw.root:zonegroups_names.ed > 2016-06-14 14:33:29.046386 7fed392baa40 0 Realm notify failed with -2 > { > "id": "af0b6743-82ba-4517-bd51-36bdfbe48f9f", > "epoch": 3, >
Re: [ceph-users] librados and multithreading
Come on, friends, does no one know the answer? On 12 June 2016 at 16:21, "Юрий Соколов" wrote: > I don't know. That is why I'm asking here. > > 2016-06-12 6:36 GMT+03:00 Ken Peng : > > Hi, > > > > We experienced a similar error: when writing to an RBD block device with > > multiple threads using fio, some OSDs got errors and went down. > > Are we talking about the same thing? > > > > 2016-06-11 0:37 GMT+08:00 Юрий Соколов : > >> > >> Good day, all. > >> > >> I found this issue: https://github.com/ceph/ceph/pull/5991 > >> > >> Did this issue affect librados? > >> Was it safe to use a single rados_ioctx_t from multiple threads before > this > >> fix? > >> > >> -- > >> With regards, > >> Sokolov Yura aka funny_falcon > >> ___ > >> ceph-users mailing list > >> ceph-users@lists.ceph.com > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > > > > -- > With regards, > Sokolov Yura aka funny_falcon > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RGW: ERROR: failed to distribute cache
I also get the following: $ radosgw-admin period update --commit 2016-06-14 14:32:28.982847 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:periods.87abf44e-cab3-48c4-b012-0a9247519a5b:staging 2016-06-14 14:32:38.991846 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:periods.87abf44e-cab3-48c4-b012-0a9247519a5b:staging.latest_epoch 2016-06-14 14:32:49.002380 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:periods.af0b6743-82ba-4517-bd51-36bdfbe48f9f.3 2016-06-14 14:32:59.013307 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:periods.af0b6743-82ba-4517-bd51-36bdfbe48f9f.latest_epoch 2016-06-14 14:33:09.023554 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:periods.af0b6743-82ba-4517-bd51-36bdfbe48f9f.latest_epoch 2016-06-14 14:33:19.034593 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:zonegroup_info.bef0aa4e-6670-4c39-8520-ee51140424cc 2016-06-14 14:33:29.043825 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:zonegroups_names.ed 2016-06-14 14:33:29.046386 7fed392baa40 0 Realm notify failed with -2 { "id": "af0b6743-82ba-4517-bd51-36bdfbe48f9f", "epoch": 3, "predecessor_uuid": "f2645d83-b1b4-4045-bf26-2b762c71937b", "sync_status": [ "", "", 2016-06-14 17:12 GMT+03:00 Василий Ангапов : > Hello, > > I have Ceph 10.2.1 and when creating user in RGW I get the following error: > > $ radosgw-admin user create --uid=test --display-name="test" > 2016-06-14 14:07:32.332288 7f00a4487a40 0 ERROR: failed to distribute > cache for ed-1.rgw.meta:.meta:user:test:_dW3fzQ3UX222SWQvr3qeHYR:1 > 2016-06-14 14:07:42.338251 7f00a4487a40 0 ERROR: failed to distribute > cache for ed-1.rgw.users.uid:test > 2016-06-14 14:07:52.362768 7f00a4487a40 0 ERROR: failed to distribute > cache for ed-1.rgw.users.keys:3J7DOREPC0ZLVFTMIW75 > { > "user_id": "test", > "display_name": "test", > "email": "", > "suspended": 0, > "max_buckets": 1000, > "auid": 0, > "subusers": [], > "keys": [ > { > 
"user": "melesta", > "access_key": "***", > "secret_key": "***" > } > ], > "swift_keys": [], > "caps": [], > "op_mask": "read, write, delete", > "default_placement": "", > "placement_tags": [], > "bucket_quota": { > "enabled": false, > "max_size_kb": -1, > "max_objects": -1 > }, > "user_quota": { > "enabled": false, > "max_size_kb": -1, > "max_objects": -1 > }, > "temp_url_keys": [] > } > > What does it mean? Is something wrong? > > Thanks! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] RGW: ERROR: failed to distribute cache
Hello, I have Ceph 10.2.1 and when creating user in RGW I get the following error: $ radosgw-admin user create --uid=test --display-name="test" 2016-06-14 14:07:32.332288 7f00a4487a40 0 ERROR: failed to distribute cache for ed-1.rgw.meta:.meta:user:test:_dW3fzQ3UX222SWQvr3qeHYR:1 2016-06-14 14:07:42.338251 7f00a4487a40 0 ERROR: failed to distribute cache for ed-1.rgw.users.uid:test 2016-06-14 14:07:52.362768 7f00a4487a40 0 ERROR: failed to distribute cache for ed-1.rgw.users.keys:3J7DOREPC0ZLVFTMIW75 { "user_id": "test", "display_name": "test", "email": "", "suspended": 0, "max_buckets": 1000, "auid": 0, "subusers": [], "keys": [ { "user": "melesta", "access_key": "***", "secret_key": "***" } ], "swift_keys": [], "caps": [], "op_mask": "read, write, delete", "default_placement": "", "placement_tags": [], "bucket_quota": { "enabled": false, "max_size_kb": -1, "max_objects": -1 }, "user_quota": { "enabled": false, "max_size_kb": -1, "max_objects": -1 }, "temp_url_keys": [] } What does it mean? Is something wrong? Thanks! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time
I am at the Ceph Day at CERN; I asked Sage whether it is supported to enable both the S3 and Swift APIs at the same time. The answer is yes, so it is meant to be supported, and what we see here is probably a bug. I opened a bug report: http://tracker.ceph.com/issues/16293 If anyone has a chance to test it on a Ceph version newer than Hammer, you can update the bug :) thank you Saverio 2016-05-12 15:49 GMT+02:00 Yehuda Sadeh-Weinraub : > On Thu, May 12, 2016 at 12:29 AM, Saverio Proto wrote: >>> While I'm usually not fond of blaming the client application, this is >>> really a swift command line tool issue. It tries to be smart by >>> comparing the md5sum of the object's content with the object's etag, >>> and it breaks with multipart objects. Multipart object etags are calculated >>> differently (an md5sum of the md5sums of each part). I think the swift >>> tool has special handling for swift large objects (which are not the >>> same as s3 multipart objects), so that's why it works in that specific >>> use case. >> >> Well, I also tried with rclone and I have the same issue. >> >> Clients I tried: >> rclone (both SWIFT and S3) >> s3cmd (S3) >> python-swiftclient (SWIFT). >> >> I can reproduce the issue with different clients. >> Once a multipart object is uploaded via S3 (with rclone or s3cmd) I >> cannot read it anymore via SWIFT (either with rclone or >> python-swiftclient). >> >> Are you saying that all SWIFT client implementations are wrong ? > > Yes. > >> >> Or should the radosgw be configured with only 1 API active ? >> >> Saverio ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
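Yehuda's point about why the client-side md5 check breaks can be shown in a few lines: for a multipart upload, S3 (and radosgw) report an ETag that is the md5 of the concatenated binary md5 digests of the parts, suffixed with the part count, so it can never equal the md5 of the whole object. A sketch (the part contents here are arbitrary):

```python
import hashlib

def s3_multipart_etag(parts):
    """ETag reported for an S3 multipart upload: md5 over the
    concatenation of each part's raw md5 digest, plus '-<part count>'."""
    digests = b"".join(hashlib.md5(p).digest() for p in parts)
    return "%s-%d" % (hashlib.md5(digests).hexdigest(), len(parts))

parts = [b"a" * 1024, b"b" * 1024]
whole_md5 = hashlib.md5(b"".join(parts)).hexdigest()

etag = s3_multipart_etag(parts)
assert etag.endswith("-2")              # trailing part count marks multipart
assert etag.split("-")[0] != whole_md5  # why naive md5-vs-etag checks fail
```

A client could detect the "-N" suffix and skip the md5 comparison, which is roughly the special handling Yehuda describes for the tools that do cope.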
Re: [ceph-users] "mount error 5 = Input/output error" with the CephFS file system from client node
On Tue, Jun 14, 2016 at 4:29 AM, Rakesh Parkiti wrote: > Hello, > > Unable to mount the CephFS file system from client node with "mount error 5 > = Input/output error" > MDS was installed on a separate node. Ceph Cluster health is OK and mds > services are running. firewall was disabled across all the nodes in a > cluster. > > -- Ceph Cluster Nodes (RHEL 7.2 version + Jewel version 10.2.1) > -- Client Nodes - Ubuntu 14.04 LTS > > Admin Node: > [root@Admin ceph]# ceph mds stat > e34: 0/0/1 up > > Client Side: > user@clientA2:/etc/ceph$ ceph fs ls --name client.admin > name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ] > > user@clientA2:/etc/ceph$ sudo mount -t ceph 10.10.100.5:6789:/user > /home/user/cephfs -o > name=admin,secret=AQAQK1NXgupKIRAA9O7fKxadI/iIq/vPKLI9rw== > mount error 5 = Input/output error > > Connection Establishment was successful to monitor node. > $tail -f /var/log/syslog > Jun 14 16:32:24 clientA2 kernel: [82270.155030] libceph: client134154 fsid > 66c5f31c-1756-47ce-889d-960e0d99f37a > Jun 14 16:32:24 clientA2 kernel: [82270.156726] libceph: mon0 > 10.10.100.5:6789 session established > > Able to check ceph health status from client node with client.admin > keyring.: > > user@clientA2:/etc/ceph$ ceph -s --name client.admin > cluster 66c5f31c-1756-47ce-889d-960e0d99f37a > health HEALTH_OK > monmap e6: 3 mons at > {siteAmon=10.10.100.5:6789/0,siteBmon=10.10.150.6:6789/0,siteCmon=10.10.200.7:6789/0} > election epoch 70, quorum 0,1,2 siteAmon,siteBmon,siteCmon > fsmap e34: 0/0/1 up > osdmap e1097: 19 osds: 19 up, 19 in > flags sortbitwise > pgmap v25719: 1286 pgs, 5 pools, 92160 kB data, 9 objects > 3998 MB used, 4704 GB / 4708 GB avail > 1286 active+clean According to this, you don't have an active MDS in your cluster. If it really is running, you'll need to figure out why it's not connecting. -Greg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph and Openstack
On Tue, Jun 14, 2016 at 8:15 AM, Fran Barrera wrote: > 2016-06-14 14:02:54.634 2256 DEBUG glance_store.capabilities [-] Store > glance_store._drivers.rbd.Store doesn't support updating dynamic storage > capabilities. Please overwrite 'update_capabilities' method of the store to > implement updating logics if needed. update_capabilities > /usr/lib/python2.7/dist-packages/glance_store/capabilities.py:98 I don't think that is anything to worry about -- it looks like a TODO comment [1]. In fact, it doesn't appear like any store drivers implement that method. [1] https://github.com/openstack/glance_store/blob/stable/mitaka/glance_store/capabilities.py#L94 -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] How to select particular OSD to act as primary OSD.
Thanks for the reply shylesh, but the procedure is not working. In ceph.com it is mentioned that we can make particular osd as a primary osd by setting primary affinity weightage between 0-1. But it is not working. On 14 Jun 2016 16:15, "shylesh kumar" wrote: > Hi, > > I think you can edit the crush rule something like below > > rule another_replicated_ruleset { > ruleset 1 > type replicated > min_size 1 > max_size 10 > step take default > step take osd1 > step choose firstn 1 type osd > step emit > step take osd2 > step choose firstn 1 type osd > step emit > step take osd5 > step choose firstn 1 type osd > step emit > step take osd4 > step choose firstn 1 type osd > step emit > } > > and create pool using this rule. > > It might work , though I am not 100% sure. > > Thanks, > Shylesh > > On Tue, Jun 14, 2016 at 4:05 PM, Kanchana. P > wrote: > >> Hi, >> >> How to select particular OSD to act as primary OSD. >> I modified the ceph.conf file and added >> [mon] >> ... >> mon osd allow primary affinity = true >> Restarted ceph target, now primary affinity is set to true in all monitor >> nodes. >> Using the below commands set some weights to the osds. >> >> $ ceph osd primary-affinity osd.1 0.25 >> $ ceph osd primary-affinity osd.6 0.50 >> $ ceph osd primary-affinity osd.11 0.75 >> $ ceph osd primary-affinity osd.16 1 >> >> Created a pool "poolA" and set a crush_ruleset so that it takes OSDs in >> order 16,11,6,1 >> Even after setting the primary affinity weight, it took osds in different >> order. >> Can we select the primary OSD, if so, how can we do that. Please let me >> know what I am missing here to set an OSD as a primary OSD. >> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> >> > > > -- > Thanks & Regards > Shylesh Kumar M > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
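One thing worth checking before concluding it does not work: primary affinity never reorders the set of OSDs CRUSH picks for a PG; it only biases which member of that acting set is made primary. A diagnostic sketch (pool and object names are placeholders) to see what CRUSH actually returned and which OSD got the primary marker:

```
# Map a test object: output like "up ([16,11,6], p16)" means OSDs 16,11,6
# hold the PG and osd.16 is the current primary
ceph osd map poolA testobject

# Verify the affinity values were actually accepted by the monitors
ceph osd dump | grep primary_affinity
```

If the acting set itself is in an unexpected order, that is a CRUSH rule question, not a primary-affinity one.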
Re: [ceph-users] Disk failures
Hi, bit rot is not "bit rot" per se - nothing is rotting on the drive platter. It occurs during reads (mostly, anyway), and it's random. You can happily read a block and get the correct data, then read it again and get garbage, then get correct data again. This could be caused by a worn-out cell on an SSD, but firmwares look for that and rewrite the cell if the signal is attenuated too much. On spinners there are no cells to refresh, so rewriting doesn't help. You can't really "look for" bit rot due to the reasons above; strong checksumming/hash verification during reads is the only solution. And trust me, bit rot is a very real thing and very dangerous as well - do you think companies like Seagate or WD would lie about bit rot if it's not real? I'd buy a drive with a BER of 10^999 over one with 10^14, wouldn't everyone? And it is especially dangerous when something like Ceph handles much larger blocks of data than the client does. While the client (or an app) has some knowledge of the data _and_ hopefully throws an error if it reads garbage, Ceph will (if for example snapshots are used and FIEMAP is off) actually have to read the whole object (say 4MiB) and write it elsewhere, without any knowledge of whether what it read (and wrote) made any sense to the app. This way corruption might spread silently into your backups if you don't validate the data somehow (or dump it from a database, for example, where it's likely to get detected). Btw, just because you think you haven't seen it doesn't mean you haven't seen it - never seen artefacting in movies? Just a random bug in the decoder, is it? The VoD guys would tell you... For things like databases this is somewhat less impactful - bit rot doesn't "flip a bit" but affects larger blocks of data (like one sector), so databases usually catch this during read and err instead of returning garbage to the client.
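The checksum-on-read idea above can be sketched in a few lines. This is a toy model in plain Python (a hypothetical store, not Ceph code): the digest is computed while the data is still in RAM, replicated alongside the data, verified on every read, and a copy that fails verification is healed from a replica that still verifies.

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class ChecksummedStore:
    """Toy replicated store: verify a digest on every read, heal rotted copies."""

    def __init__(self, num_replicas: int = 3):
        # each replica maps object name -> (data, digest)
        self.replicas = [{} for _ in range(num_replicas)]

    def write(self, name: str, data: bytes) -> None:
        digest = checksum(data)              # computed while data is still in RAM
        for replica in self.replicas:        # replicate data + digest together,
            replica[name] = (data, digest)   # never re-read from "disk" first

    def read(self, name: str) -> bytes:
        rotted = []
        for replica in self.replicas:
            data, digest = replica[name]
            if checksum(data) == digest:     # checksum verified on every read
                for bad in rotted:           # heal copies that failed verification
                    bad[name] = (data, digest)
                return data
            rotted.append(replica)
        raise IOError("all replicas of %r are corrupt" % name)

store = ChecksummedStore()
store.write("obj", b"original payload")

# simulate silent media corruption on the first copy: data changes, digest doesn't
_, good_digest = store.replicas[0]["obj"]
store.replicas[0]["obj"] = (b"original pay1oad", good_digest)

recovered = store.read("obj")   # mismatch detected, healed from a good replica
```

Note the ordering: the digest travels with the in-RAM data at write time, which is exactly why a replica that rotted on disk can be distinguished from one that didn't.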
Jan > On 09 Jun 2016, at 09:16, Christian Balzer wrote: > > > Hello, > > On Thu, 9 Jun 2016 08:43:23 +0200 Gandalf Corvotempesta wrote: > >> Il 09 giu 2016 02:09, "Christian Balzer" ha scritto: >>> Ceph currently doesn't do any (relevant) checksumming at all, so if a >>> PRIMARY PG suffers from bit-rot this will be undetected until the next >>> deep-scrub. >>> >>> This is one of the longest and gravest outstanding issues with Ceph and >>> supposed to be addressed with bluestore (which currently doesn't have >>> checksum verified reads either). >> >> So if bit rot happens on primary PG, ceph is spreading the currupted data >> across the cluster? > No. > > You will want to re-read the Ceph docs and the countless posts here about > replication within Ceph works. > http://docs.ceph.com/docs/hammer/architecture/#smart-daemons-enable-hyperscale > > A client write goes to the primary OSD/PG and will not be ACK'ed to the > client until is has reached all replica OSDs. > This happens while the data is in-flight (in RAM), it's not read from the > journal or filestore. > >> What would be sent to the replica, the original data or the saved one? >> >> When bit rot happens I'll have 1 corrupted object and 2 good. >> how do you manage this between deep scrubs? Which data would be used by >> ceph? I think that a bitrot on a huge VM block device could lead to a >> mess like the whole device corrupted >> VM affected by bitrot would be able to stay up and running? >> And bitrot on a qcow2 file? >> > Bitrot is a bit hyped, I haven't seen any on the Ceph clusters I run nor > on other systems here where I (can) actually check for it. > > As to how it would affect things, that very much depends. > > If it's something like a busy directory inode that gets corrupted, the data > in question will be in RAM (SLAB) and the next update will correct things. > > If it's a logfile, you're likely to never notice until deep-scrub detects > it eventually. 
> > This isn't a Ceph specific question, on all systems that aren't backed > by something like ZFS or BTRFS you're potentially vulnerable to this. > > Of course if you're that worried, you could always run BTRFS of ZFS inside > your VM and notice immediately when something goes wrong. > I personally wouldn't though, due to the performance penalties involved > (CoW). > > >> Let me try to explain: when writing to primary PG i have to write bit "1" >> Due to a bit rot, I'm saving "0". >> Would ceph read the wrote bit and spread that across the cluster (so it >> will spread "0") or spread the in memory value "1" ? >> >> What if the journal fails during a read or a write? > Again, you may want to get a deeper understanding of Ceph. > The journal isn't involved in reads. > >> Ceph is able to >> recover by removing that journal from the affected osd (and still >> running at lower speed) or should i use a raid1 on ssds used by journal ? >> > Neither, a journal failure is lethal for the OSD involved and unless you > have LOTS of money RAID1 SSDs are a waste. > > If you use DC level SSDs with sufficient endurance
[ceph-users] Ceph and Openstack
Hi all, I have a problem integrating Glance with Ceph. OpenStack Mitaka, Ceph Jewel. I've followed the Ceph doc ( http://docs.ceph.com/docs/jewel/rbd/rbd-openstack/ ) but when I try to list or create images, I get the error "Unable to establish connection to http://IP:9292/v2/images", and in debug mode I can see this: 2016-06-14 14:02:54.634 2256 DEBUG glance_store.capabilities [-] Store glance_store._drivers.rbd.Store doesn't support updating dynamic storage capabilities. Please overwrite 'update_capabilities' method of the store to implement updating logics if needed. update_capabilities /usr/lib/python2.7/dist-packages/glance_store/capabilities.py:98 I've also tried to remove the database and populate it again, but I get the same error. Cinder with Ceph works correctly. Any suggestions? Thanks, Fran. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] local variable 'region_name' referenced before assignment
- Original Message - > From: "Parveen Sharma" > To: ceph-users@lists.ceph.com > Sent: Tuesday, June 14, 2016 2:34:27 PM > Subject: [ceph-users] local variable 'region_name' referenced before > assignment > > Hi, > > I'm getting "UnboundLocalError: local variable 'region_name' referenced > before assignment " error while placing an object in my earlier created > bucket using my RADOSGW with boto. > > > My package details: > > $ sudo rpm -qa | grep rados > librados2-10.2.1-0.el7.x86_64 > libradosstriper1-10.2.1-0.el7.x86_64 > python-rados-10.2.1-0.el7.x86_64 > ceph-radosgw-10.2.1-0.el7.x86_64 > $ > > $ python > Python 2.7.10 (default, Oct 23 2015, 19:19:21) > [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin > Type "help", "copyright", "credits" or "license" for more information. >>> > import sys, boto > >>> boto.Version '2.40.0' >>> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1343813 is mentioning a > workaround but it apparently not working for me or I am missing something. 
> > > > > $ cat ~/.boto > > [Credentials] > > aws_access_key_id = X > > aws_secret_access_key = YYY > > > > > [s3] > > > > > use-sigv4 = True > > $ > > $ > > $ cat s3test_for_placing_object_in_bucket.py > > import boto > > import boto.s3.connection > > > > > conn = boto.connect_s3( > > host = 'mc2', port = 7480, > > is_secure=False, calling_format = boto.s3.connection.OrdinaryCallingFormat(), > > ) > > > > #From > http://stackoverflow.com/questions/15085864/how-to-upload-a-file-to-directory-in-s3-bucket-using-boto > > bucket = conn.get_bucket('my-new-bucket') > > key = boto.s3.key.Key(bucket, 'myTestFileIn_my-new-bucket.txt') > > with open('myTestFileIn_my-new-bucket.txt') as f: > > key.send_file(f) > > $ > > $ > > $ python s3test_for_placing_object_in_bucket.py > > Traceback (most recent call last): > > File "s3test_for_placing_object_in_bucket.py", line 12, in > > bucket = conn.get_bucket('my-new-bucket') > > File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line 506, in > get_bucket > > return self.head_bucket(bucket_name, headers=headers) > > File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line 525, in > head_bucket > > response = self.make_request('HEAD', bucket_name, headers=headers) > > File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line 668, in > make_request > > retry_handler=retry_handler > > File "/Library/Python/2.7/site-packages/boto/connection.py", line 1071, in > make_request > > retry_handler=retry_handler) > > File "/Library/Python/2.7/site-packages/boto/connection.py", line 927, in > _mexe > > request.authorize(connection=self) > > File "/Library/Python/2.7/site-packages/boto/connection.py", line 377, in > authorize > > connection._auth_handler.add_auth(self, **kwargs) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 722, in add_auth > > **kwargs) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 542, in add_auth > > string_to_sign = self.string_to_sign(req, 
canonical_request) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 482, in > string_to_sign > > sts.append(self.credential_scope(http_request)) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 464, in > credential_scope > > region_name = self.determine_region_name(http_request.host) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 657, in > determine_region_name > > return region_name > > UnboundLocalError: local variable 'region_name' referenced before assignment > > $ > > > > > - > > Parveen > You have to make that change in boto/auth.py. Please take a look at this: http://www.spinics.net/lists/ceph-devel/msg30612.html Shilpa > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
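For what it's worth, until boto's auth.py is patched as described in the link above, a common workaround (a sketch, assuming the Jewel RGW endpoint accepts v2 signatures, which it does by default) is simply not to opt into SigV4, since the v2 signer never goes through determine_region_name():

```
# ~/.boto -- same credentials as above, but with the [s3] use-sigv4
# section removed, so boto falls back to SigV2 signing
[Credentials]
aws_access_key_id = X
aws_secret_access_key = YYY
```

SigV4 can be re-enabled once a fixed boto is in place.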
Re: [ceph-users] Unable to mount the CephFS file system fromclientnode with "mount error 5 = Input/output error"
Hi, On 06/14/2016 01:21 PM, Rakesh Parkiti wrote: Hello, Unable to mount the CephFS file system from client node with *"mount error 5 = Input/output error"* MDS was installed on a separate node. Ceph Cluster health is OK and mds services are running. firewall was disabled across all the nodes in a cluster. -- Ceph Cluster Nodes (RHEL 7.2 version + Jewel version 10.2.1) -- Client Nodes - Ubuntu 14.04 LTS Admin Node: *[root@Admin ceph]# ceph mds stat* e34: 0/0/1 up *snipsnap* The MDS is not up and running. Otherwise the output should look like this: # ceph mds stat e190193: 1/1/1 up {0=XYZ=up:active} Regards, Burkhard ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] "mount error 5 = Input/output error" with the CephFS file system from client node
Hello, Unable to mount the CephFS file system from client node with "mount error 5 = Input/output error" MDS was installed on a separate node. Ceph Cluster health is OK and mds services are running. firewall was disabled across all the nodes in a cluster. -- Ceph Cluster Nodes (RHEL 7.2 version + Jewel version 10.2.1) -- Client Nodes - Ubuntu 14.04 LTS Admin Node: [root@Admin ceph]# ceph mds stat e34: 0/0/1 up Client Side: user@clientA2:/etc/ceph$ ceph fs ls --name client.admin name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ] user@clientA2:/etc/ceph$ sudo mount -t ceph 10.10.100.5:6789:/user /home/user/cephfs -o name=admin,secret=AQAQK1NXgupKIRAA9O7fKxadI/iIq/vPKLI9rw== mount error 5 = Input/output error Connection Establishment was successful to monitor node. $tail -f /var/log/syslog Jun 14 16:32:24 clientA2 kernel: [82270.155030] libceph: client134154 fsid 66c5f31c-1756-47ce-889d-960e0d99f37a Jun 14 16:32:24 clientA2 kernel: [82270.156726] libceph: mon0 10.10.100.5:6789 session established Able to check ceph health status from client node with client.admin keyring.: user@clientA2:/etc/ceph$ ceph -s --name client.admin cluster 66c5f31c-1756-47ce-889d-960e0d99f37a health HEALTH_OK monmap e6: 3 mons at {siteAmon=10.10.100.5:6789/0,siteBmon=10.10.150.6:6789/0,siteCmon=10.10.200.7:6789/0} election epoch 70, quorum 0,1,2 siteAmon,siteBmon,siteCmon fsmap e34: 0/0/1 up osdmap e1097: 19 osds: 19 up, 19 in flags sortbitwise pgmap v25719: 1286 pgs, 5 pools, 92160 kB data, 9 objects 3998 MB used, 4704 GB / 4708 GB avail 1286 active+clean Can anyone please help with solution for above issue. Thanks Rakesh Parkiti ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] strange cache tier behaviour with cephfs
The basic logic is that if an IO is not in the cache tier, then proxy it, which means do the IO directly on the base tier. The throttle is designed to minimise the latency impact of promotions and flushes. So yes, during testing it will not promote everything, but during normal workloads it makes things much better. The defaults were chosen after benchmarks showed they were the turning point where performance started to become affected. But yes, I think there could be a better section on tuning the cache tier; it's not an easy task, as there are a lot of variables that can change depending on the hardware and workload. Sent from Nine From: Oliver Dzombic Sent: 14 Jun 2016 12:11 p.m. To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] strange cache tier behaviour with cephfs Hi, ok, the write test now also shows a more expected behaviour. As it seems to me, if there is more writing than osd_tier_promote_max_bytes_sec allows, the writes go directly against the cold pool ( which is a really good behaviour ( seriously ) ). But that should definitely be added to the documentation. Otherwise (new) people have no chance to find that. The search engines show < 10 hits for "osd_tier_promote_max_bytes_sec", one of them in http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007632.html which has a totally different topic. Anyway, super super big thanks for your time ! -- Mit freundlichen Gruessen / Best regards Oliver Dzombic IP-Interactive mailto:i...@ip-interactive.de Anschrift: IP Interactive UG ( haftungsbeschraenkt ) Zum Sonnenberg 1-3 63571 Gelnhausen HRB 93402 beim Amtsgericht Hanau Geschäftsführung: Oliver Dzombic Steuer Nr.: 35 236 3622 1 UST ID: DE274086107 Am 14.06.2016 um 07:47 schrieb Nick Fisk: > osd_tier_promote_max_objects_sec > and > osd_tier_promote_max_bytes_sec > > is what you are looking for, I think by default it's set to 5MB/s, which > would roughly correlate to why you are only seeing around 8 objects each > time being promoted.
This was done like this as too many promotions hurt > performance, so you don't actually want to promote on every IO. > >> -Original Message- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >> Christian Balzer >> Sent: 14 June 2016 02:00 >> To: ceph-users@lists.ceph.com >> Subject: Re: [ceph-users] strange cache tier behaviour with cephfs >> >> >> Hello, >> >> On Tue, 14 Jun 2016 02:52:43 +0200 Oliver Dzombic wrote: >> >>> Hi Christian, >>> >>> if i read a 1,5 GB file, which is not changing at all. >>> >>> Then i expect the agent to copy it one time from the cold pool to the >>> cache pool. >>> >> Before Jewel, that is what you would have seen, yes. >> >> Did you read what Sam wrote and me in reply to him? >> >>> In fact its every time making a new copy. >>> >> Is it? >> Is there 1.5GB of data copied into the cache tier each time? >> An object is 4MB, you only had 8 in your first run, then 16... >> >>> I can see that by increasing disc usage of the cache and the >>> increasing object number. >>> >>> And the non existing improvement of speed. >>> >> That could be down to your network or other factors on your client. >> >> Christian >> -- >> Christian Balzer Network/Systems Engineer >> ch...@gol.com Global OnLine Japan/Rakuten Communications >> http://www.gol.com/ >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
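To see whether this promotion throttle is what is limiting a given workload, the value can be inspected and raised at runtime (standard ceph CLI commands; the 20 MB/s figure below is only an example, not a recommendation):

```
# Ask a running OSD for its current setting (run on that OSD's host)
ceph daemon osd.0 config get osd_tier_promote_max_bytes_sec

# Raise it cluster-wide at runtime, e.g. to 20 MB/s
ceph tell osd.* injectargs '--osd_tier_promote_max_bytes_sec 20971520'

# To persist it across restarts, set in ceph.conf under [osd]:
#   osd tier promote max bytes sec = 20971520
```

As discussed above, raising it trades lower steady-state latency for more aggressive promotion, so benchmark with your own workload before committing to a value.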
Re: [ceph-users] How to select particular OSD to act as primary OSD.
Hi, I think you can edit the crush rule something like below rule another_replicated_ruleset { ruleset 1 type replicated min_size 1 max_size 10 step take default step take osd1 step choose firstn 1 type osd step emit step take osd2 step choose firstn 1 type osd step emit step take osd5 step choose firstn 1 type osd step emit step take osd4 step choose firstn 1 type osd step emit } and create pool using this rule. It might work , though I am not 100% sure. Thanks, Shylesh On Tue, Jun 14, 2016 at 4:05 PM, Kanchana. P wrote: > Hi, > > How to select particular OSD to act as primary OSD. > I modified the ceph.conf file and added > [mon] > ... > mon osd allow primary affinity = true > Restarted ceph target, now primary affinity is set to true in all monitor > nodes. > Using the below commands set some weights to the osds. > > $ ceph osd primary-affinity osd.1 0.25 > $ ceph osd primary-affinity osd.6 0.50 > $ ceph osd primary-affinity osd.11 0.75 > $ ceph osd primary-affinity osd.16 1 > > Created a pool "poolA" and set a crush_ruleset so that it takes OSDs in > order 16,11,6,1 > Even after setting the primary affinity weight, it took osds in different > order. > Can we select the primary OSD, if so, how can we do that. Please let me > know what I am missing here to set an OSD as a primary OSD. > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > -- Thanks & Regards Shylesh Kumar M ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] How to select particular OSD to act as primary OSD.
Hi, How to select particular OSD to act as primary OSD. I modified the ceph.conf file and added [mon] ... mon osd allow primary affinity = true Restarted ceph target, now primary affinity is set to true in all monitor nodes. Using the below commands set some weights to the osds. $ ceph osd primary-affinity osd.1 0.25 $ ceph osd primary-affinity osd.6 0.50 $ ceph osd primary-affinity osd.11 0.75 $ ceph osd primary-affinity osd.16 1 Created a pool "poolA" and set a crush_ruleset so that it takes OSDs in order 16,11,6,1 Even after setting the primary affinity weight, it took osds in different order. Can we select the primary OSD, if so, how can we do that. Please let me know what I am missing here to set an OSD as a primary OSD. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs
Hi, your cluster will be in a warning state if you disable scrubbing, and you really need it in case of some data loss. cheers, Ansgar 2016-06-14 11:05 GMT+02:00 Wido den Hollander : > >> Op 14 juni 2016 om 11:00 schreef Василий Ангапов : >> >> >> Is it a good idea to disable scrub and deep-scrub for bucket.index >> pool? What negative consequences it may cause? >> > > No, I would not do that. Scrubbing is essential to detect (silent) data > corruption. > > You should really scrub all your data. > >> 2016-06-14 11:51 GMT+03:00 Wido den Hollander : >> > >> >> Op 14 juni 2016 om 10:10 schreef Ansgar Jazdzewski >> >> : >> >> >> >> >> >> Hi, >> >> >> >> we are using ceph and radosGW to store images (~300kb each) in S3; >> >> when it comes to deep-scrubbing we are facing task timeouts (> 30s ...) >> >> >> >> my question is: >> >> >> >> in case of that amount of objects/files, is it better to calculate the >> >> PGs on an object basis instead of the volume size? and how should it be >> >> done? >> >> >> > >> > Do you have bucket sharding enabled? >> > >> > And how many objects do you have in a single bucket? >> > >> > If sharding is not enabled for the bucket index you might have large RADOS >> > objects with bucket indexes which are hard to scrub. >> > >> > Wido >> > >> >> thanks >> >> Ansgar >> >> ___ >> >> ceph-users mailing list >> >> ceph-users@lists.ceph.com >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > ___ >> > ceph-users mailing list >> > ceph-users@lists.ceph.com >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs
Hi, yes, we have index sharding enabled; we have only two big buckets at the moment, with 15Mil objects each, and some smaller ones. cheers, Ansgar 2016-06-14 10:51 GMT+02:00 Wido den Hollander : > >> Op 14 juni 2016 om 10:10 schreef Ansgar Jazdzewski >> : >> >> >> Hi, >> >> we are using ceph and radosGW to store images (~300kb each) in S3; >> when it comes to deep-scrubbing we are facing task timeouts (> 30s ...) >> >> my question is: >> >> in case of that amount of objects/files, is it better to calculate the >> PGs on an object basis instead of the volume size? and how should it be >> done? >> > > Do you have bucket sharding enabled? > > And how many objects do you have in a single bucket? > > If sharding is not enabled for the bucket index you might have large RADOS > objects with bucket indexes which are hard to scrub. > > Wido > >> thanks >> Ansgar >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
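On the original sizing question in this thread: the usual guidance works from the OSD count rather than from object count or volume, since placement groups cost resources per OSD. A hedged sketch of the common rule of thumb (roughly 100 PGs per OSD divided by the replica count, rounded up to the next power of two; the numbers are illustrative, not a recommendation for this particular cluster):

```python
def pg_count(num_osds: int, pool_size: int, target_per_osd: int = 100) -> int:
    """Rule-of-thumb placement group count for a pool:
    (OSDs * target per OSD) / replica count, rounded up to a power of two."""
    raw = num_osds * target_per_osd / pool_size
    power = 1
    while power < raw:
        power *= 2
    return power

# e.g. a 19-OSD cluster (as shown elsewhere in this digest) with 3x replication:
total = pg_count(19, 3)
print(total)  # 1900/3 ~ 633, rounds up to 1024
```

The large-index problem Wido describes is orthogonal: more PGs split the data pool's objects into more scrub units, but an unsharded bucket index is still one big omap object, so index sharding is the fix there.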
Re: [ceph-users] strange cache tier behaviour with cephfs
Hi, ok, the write test now also shows a more expected behaviour. As it seems to me, if there is more writing than osd_tier_promote_max_bytes_sec allows, the writes go directly against the cold pool ( which is a really good behaviour ( seriously ) ). But that should definitely be added to the documentation. Otherwise (new) people have no chance to find that. The search engines show < 10 hits for "osd_tier_promote_max_bytes_sec", one of them in http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007632.html which has a totally different topic. Anyway, super super big thanks for your time ! -- Mit freundlichen Gruessen / Best regards Oliver Dzombic IP-Interactive mailto:i...@ip-interactive.de Anschrift: IP Interactive UG ( haftungsbeschraenkt ) Zum Sonnenberg 1-3 63571 Gelnhausen HRB 93402 beim Amtsgericht Hanau Geschäftsführung: Oliver Dzombic Steuer Nr.: 35 236 3622 1 UST ID: DE274086107 Am 14.06.2016 um 07:47 schrieb Nick Fisk: > osd_tier_promote_max_objects_sec > and > osd_tier_promote_max_bytes_sec > > is what you are looking for, I think by default it's set to 5MB/s, which > would roughly correlate to why you are only seeing around 8 objects each > time being promoted. This was done like this as too many promotions hurt > performance, so you don't actually want to promote on every IO. > >> -Original Message- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >> Christian Balzer >> Sent: 14 June 2016 02:00 >> To: ceph-users@lists.ceph.com >> Subject: Re: [ceph-users] strange cache tier behaviour with cephfs >> >> >> Hello, >> >> On Tue, 14 Jun 2016 02:52:43 +0200 Oliver Dzombic wrote: >> >>> Hi Christian, >>> >>> if i read a 1,5 GB file, which is not changing at all. >>> >>> Then i expect the agent to copy it one time from the cold pool to the >>> cache pool. >>> >> Before Jewel, that is what you would have seen, yes. >> >> Did you read what Sam wrote and me in reply to him? >> >>> In fact its every time making a new copy.
>>> >> Is it? >> Is there 1.5GB of data copied into the cache tier each time? >> An object is 4MB, you only had 8 in your first run, then 16... >> >>> I can see that by increasing disc usage of the cache and the >>> increasing object number. >>> >>> And the non existing improvement of speed. >>> >> That could be down to your network or other factors on your client. >> >> Christian >> -- >> Christian BalzerNetwork/Systems Engineer >> ch...@gol.comGlobal OnLine Japan/Rakuten Communications >> http://www.gol.com/ >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs
Hello, On Tue, 14 Jun 2016 12:20:44 +0300 Nmz wrote: > > > > - Original Message - > From: Wido den Hollander > To: Василий Ангапов > Date: Tuesday, June 14, 2016, 12:05:51 PM > Subject: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs > > > >> Op 14 juni 2016 om 11:00 schreef Василий Ангапов : > >> > >> > >> Is it a good idea to disable scrub and deep-scrub for bucket.index > >> pool? What negative consequences it may cause? > >> > > > No, I would not do that. Scrubbing is essential to detect (silent) > > data corruption. > > > You should really scrub all your data. > > Ceph does not protect from silent data corruption at all. > > You can read this thread > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007680.html > While that is unfortunately very true, Ceph does at least allow you to detect it (after the fact), and if you're lucky it is a replica and not the primary object that's corrupted. So it's better than Ext4 or XFS, but worse than ZFS or BTRFS. Bluestore is supposed to address this, but currently lacks live checksums as well. Now with a storage that large (40 million 300KB objects...) the statistical chances of bitrot do of course increase. I've run a cluster with a few TB of data for more than a year w/o deep scrubs, and unsurprisingly nothing bad was found when I turned it back on. But your mileage may vary, caveat emptor, etc. Christian > >> 2016-06-14 11:51 GMT+03:00 Wido den Hollander : > >> > > >> >> Op 14 juni 2016 om 10:10 schreef Ansgar Jazdzewski > >> >> : > >> >> > >> >> > >> >> Hi, > >> >> > >> >> we are using ceph and radosGW to store images (~300kb each) in S3; > >> >> when it comes to deep-scrubbing we are facing task timeouts (> 30s ...) > >> >> > >> >> my question is: > >> >> > >> >> in case of that amount of objects/files, is it better to calculate > >> >> the PGs on an object basis instead of the volume size? and how it > >> >> should be done? > >> >> > >> > > >> > Do you have bucket sharding enabled?
> >> > > >> > And how many objects do you have in a single bucket? > >> > > >> > If sharding is not enabled for the bucket index you might have > >> > large RADOS objects with bucket indexes which are hard to scrub. > >> > > >> > Wido > >> > > >> >> thanks > >> >> Ansgar > >> >> ___ > >> >> ceph-users mailing list > >> >> ceph-users@lists.ceph.com > >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >> > ___ > >> > ceph-users mailing list > >> > ceph-users@lists.ceph.com > >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Christian BalzerNetwork/Systems Engineer ch...@gol.com Global OnLine Japan/Rakuten Communications http://www.gol.com/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] local variable 'region_name' referenced before assignment
I'm sending on personnel ID as my posts to ceph-users@lists.ceph.com are not reaching to the mailing list, though I've subscribed. On Tue, Jun 14, 2016 at 2:49 PM, Parveen Sharma wrote: > > Hi, > > I'm getting "UnboundLocalError: local variable 'region_name' referenced > before assignment" error while placing an object in my earlier created > bucket using my RADOSGW with boto. > > > My package details: > > $ sudo rpm -qa | grep rados > librados2-10.2.1-0.el7.x86_64 > libradosstriper1-10.2.1-0.el7.x86_64 > python-rados-10.2.1-0.el7.x86_64 > ceph-radosgw-10.2.1-0.el7.x86_64 > $ > > $ python > Python 2.7.10 (default, Oct 23 2015, 19:19:21) > [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin > Type "help", "copyright", "credits" or "license" for more information.>>> > import sys, boto > >>> boto.Version > '2.40.0'>>> > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1343813 is mentioning a > workaround but it apparently not working for me or I am missing something. > > > *$ cat ~/.boto * > > [Credentials] > > aws_access_key_id = X > > aws_secret_access_key = YYY > > > [s3] > > use-sigv4 = True > > $ > > $ > > *$ cat s3test_for_placing_object_in_bucket.py* > > import boto > > import boto.s3.connection > > > > conn = boto.connect_s3( > > host = 'mc2', port = 7480, > > is_secure=False, calling_format = > boto.s3.connection.OrdinaryCallingFormat(), > > ) > > #From > http://stackoverflow.com/questions/15085864/how-to-upload-a-file-to-directory-in-s3-bucket-using-boto > > bucket = conn.get_bucket('my-new-bucket') > > key = boto.s3.key.Key(bucket, 'myTestFileIn_my-new-bucket.txt') > > with open('myTestFileIn_my-new-bucket.txt') as f: > > key.send_file(f) > > $ > > $ > > *$ python s3test_for_placing_object_in_bucket.py* > > Traceback (most recent call last): > > File "s3test_for_placing_object_in_bucket.py", line 12, in > > bucket = conn.get_bucket('my-new-bucket') > > File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line > 506, in 
get_bucket > > return self.head_bucket(bucket_name, headers=headers) > > File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line > 525, in head_bucket > > response = self.make_request('HEAD', bucket_name, headers=headers) > > File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line > 668, in make_request > > retry_handler=retry_handler > > File "/Library/Python/2.7/site-packages/boto/connection.py", line 1071, > in make_request > > retry_handler=retry_handler) > > File "/Library/Python/2.7/site-packages/boto/connection.py", line 927, > in _mexe > > request.authorize(connection=self) > > File "/Library/Python/2.7/site-packages/boto/connection.py", line 377, > in authorize > > connection._auth_handler.add_auth(self, **kwargs) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 722, in > add_auth > > **kwargs) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 542, in > add_auth > > string_to_sign = self.string_to_sign(req, canonical_request) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 482, in > string_to_sign > > sts.append(self.credential_scope(http_request)) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 464, in > credential_scope > > region_name = self.determine_region_name(http_request.host) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 657, in > determine_region_name > > return region_name > > *UnboundLocalError: local variable 'region_name' referenced before > assignment* > > $ > > > - > > Parveen > > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] strange cache tier behaviour with cephfs
Hi, wow. After setting this in ceph.conf and restarting the whole cluster: osd tier promote max bytes sec = 1610612736 osd tier promote max objects sec = 2 And repeating the test, the cache pool got the full 11 GB of the test file with 2560 objects copied from the cold pool. Aaand, repeating the test multiple times showed that each time there is some movement within the cache pool WITHOUT a copy from the cold pool. So it shifts some MB within the cache pool from one OSD to another. So it's for example changing from: /dev/sde1 234315556 2559404 231756152 2% /var/lib/ceph/osd/ceph-0 /dev/sdf1 234315556 2848300 231467256 2% /var/lib/ceph/osd/ceph-1 /dev/sdi1 234315556 2820596 231494960 2% /var/lib/ceph/osd/ceph-2 /dev/sdj1 234315556 2712796 231602760 2% /var/lib/ceph/osd/ceph-3 to /dev/sde1 234315556 2670360 231645196 2% /var/lib/ceph/osd/ceph-0 /dev/sdf1 234315556 2951116 231364440 2% /var/lib/ceph/osd/ceph-1 /dev/sdi1 234315556 2903000 231412556 2% /var/lib/ceph/osd/ceph-2 /dev/sdj1 234315556 2831992 231483564 2% /var/lib/ceph/osd/ceph-3 So around 400 MB has been shifted inside the cache pool ( why ever ). The number of objects is stable and not changed. The speed is going from ~ 100 MB/s up to ~ 170 MB/s which is close to the network maximum considering the client is busy too. So this hidden and undocumented config option changed the behaviour to the, according to the documentation, expected behaviour. Thank you very much for this hint ! I will repeat now all the testing.
-- Mit freundlichen Gruessen / Best regards Oliver Dzombic IP-Interactive mailto:i...@ip-interactive.de Anschrift: IP Interactive UG ( haftungsbeschraenkt ) Zum Sonnenberg 1-3 63571 Gelnhausen HRB 93402 beim Amtsgericht Hanau Geschäftsführung: Oliver Dzombic Steuer Nr.: 35 236 3622 1 UST ID: DE274086107 Am 14.06.2016 um 07:47 schrieb Nick Fisk: > osd_tier_promote_max_objects_sec > and > osd_tier_promote_max_bytes_sec > > is what you are looking for, I think by default its set to 5MB/s, which > would roughly correlate to why you are only seeing around 8 objects each > time being promoted. This was done like this as too many promotions hurt > performance, so you don't actually want to promote on every IO. > >> -Original Message- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >> Christian Balzer >> Sent: 14 June 2016 02:00 >> To: ceph-users@lists.ceph.com >> Subject: Re: [ceph-users] strange cache tier behaviour with cephfs >> >> >> Hello, >> >> On Tue, 14 Jun 2016 02:52:43 +0200 Oliver Dzombic wrote: >> >>> Hi Christian, >>> >>> if i read a 1,5 GB file, which is not changing at all. >>> >>> Then i expect the agent to copy it one time from the cold pool to the >>> cache pool. >>> >> Before Jewel, that is what you would have seen, yes. >> >> Did you read what Sam wrote and me in reply to him? >> >>> In fact its every time making a new copy. >>> >> Is it? >> Is there 1.5GB of data copied into the cache tier each time? >> An object is 4MB, you only had 8 in your first run, then 16... >> >>> I can see that by increasing disc usage of the cache and the >>> increasing object number. >>> >>> And the non existing improvement of speed. >>> >> That could be down to your network or other factors on your client. 
>> >> Christian >> -- >> Christian BalzerNetwork/Systems Engineer >> ch...@gol.comGlobal OnLine Japan/Rakuten Communications >> http://www.gol.com/ >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
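For reference, the two promotion throttles discussed in this thread go in the [osd] section of ceph.conf. The values below are the ones reported to work above; the Jewel defaults are far lower (Nick mentions roughly 5 MB/s), which is why only a handful of objects were being promoted per run:

```ini
[osd]
; Cap on how many bytes per second the cache tier will promote
; from the base pool (1610612736 bytes = 1.5 GiB/s).
osd tier promote max bytes sec = 1610612736
; Cap on how many objects per second are promoted.
osd tier promote max objects sec = 2
```

They can reportedly also be changed at runtime with `ceph tell osd.* injectargs`, but restarting the OSDs (as done above) guarantees every daemon picks them up.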
Re: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs
- Original Message - From: Wido den Hollander To: Василий Ангапов Date: Tuesday, June 14, 2016, 12:05:51 PM Subject: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs >> Op 14 juni 2016 om 11:00 schreef Василий Ангапов : >> >> >> Is it a good idea to disable scrub and deep-scrub for bucket.index >> pool? What negative consequences it may cause? >> > No, I would not do that. Scrubbing is essential to detect (silent) data > corruption. > You should really scrub all your data. Ceph do not protect from silent data corruption at all. You can read this thread http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007680.html >> 2016-06-14 11:51 GMT+03:00 Wido den Hollander : >> > >> >> Op 14 juni 2016 om 10:10 schreef Ansgar Jazdzewski >> >> : >> >> >> >> >> >> Hi, >> >> >> >> we are using ceph and radosGW to store images (~300kb each) in S3, >> >> when in comes to deep-scrubbing we facing task timeouts (> 30s ...) >> >> >> >> my questions is: >> >> >> >> in case of that amount of objects/files is it better to calculate the >> >> PGs on a object-bases instant of the volume size? and how it should be >> >> done? >> >> >> > >> > Do you have bucket sharding enabled? >> > >> > And how many objects do you have in a single bucket? >> > >> > If sharding is not enabled for the bucket index you might have large RADOS >> > objects with bucket indexes which are hard to scrub. >> > >> > Wido >> > >> >> thanks >> >> Ansgar >> >> ___ >> >> ceph-users mailing list >> >> ceph-users@lists.ceph.com >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > ___ >> > ceph-users mailing list >> > ceph-users@lists.ceph.com >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] local variable 'region_name' referenced before assignment
Hi, I'm getting "UnboundLocalError: local variable 'region_name' referenced before assignment" error while placing an object in my earlier created bucket using my RADOSGW with boto. My package details: $ sudo rpm -qa | grep rados librados2-10.2.1-0.el7.x86_64 libradosstriper1-10.2.1-0.el7.x86_64 python-rados-10.2.1-0.el7.x86_64 ceph-radosgw-10.2.1-0.el7.x86_64 $ $ python Python 2.7.10 (default, Oct 23 2015, 19:19:21) [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin Type "help", "copyright", "credits" or "license" for more information.>>> import sys, boto >>> boto.Version '2.40.0'>>> https://bugzilla.redhat.com/show_bug.cgi?id=1343813 is mentioning a workaround but it apparently not working for me or I am missing something. *$ cat ~/.boto * [Credentials] aws_access_key_id = X aws_secret_access_key = YYY [s3] use-sigv4 = True $ $ *$ cat s3test_for_placing_object_in_bucket.py* import boto import boto.s3.connection conn = boto.connect_s3( host = 'mc2', port = 7480, is_secure=False, calling_format = boto.s3.connection.OrdinaryCallingFormat(), ) #From http://stackoverflow.com/questions/15085864/how-to-upload-a-file-to-directory-in-s3-bucket-using-boto bucket = conn.get_bucket('my-new-bucket') key = boto.s3.key.Key(bucket, 'myTestFileIn_my-new-bucket.txt') with open('myTestFileIn_my-new-bucket.txt') as f: key.send_file(f) $ $ *$ python s3test_for_placing_object_in_bucket.py* Traceback (most recent call last): File "s3test_for_placing_object_in_bucket.py", line 12, in bucket = conn.get_bucket('my-new-bucket') File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line 506, in get_bucket return self.head_bucket(bucket_name, headers=headers) File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line 525, in head_bucket response = self.make_request('HEAD', bucket_name, headers=headers) File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line 668, in make_request retry_handler=retry_handler File 
"/Library/Python/2.7/site-packages/boto/connection.py", line 1071, in make_request retry_handler=retry_handler) File "/Library/Python/2.7/site-packages/boto/connection.py", line 927, in _mexe request.authorize(connection=self) File "/Library/Python/2.7/site-packages/boto/connection.py", line 377, in authorize connection._auth_handler.add_auth(self, **kwargs) File "/Library/Python/2.7/site-packages/boto/auth.py", line 722, in add_auth **kwargs) File "/Library/Python/2.7/site-packages/boto/auth.py", line 542, in add_auth string_to_sign = self.string_to_sign(req, canonical_request) File "/Library/Python/2.7/site-packages/boto/auth.py", line 482, in string_to_sign sts.append(self.credential_scope(http_request)) File "/Library/Python/2.7/site-packages/boto/auth.py", line 464, in credential_scope region_name = self.determine_region_name(http_request.host) File "/Library/Python/2.7/site-packages/boto/auth.py", line 657, in determine_region_name return region_name *UnboundLocalError: local variable 'region_name' referenced before assignment* $ - Parveen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Issue installing ceph with ceph-deploy
Hi, Thanks to both of you, finally the problem was fixed deleting everything and the user ceph and install again as George commented. Best Regards, Fran. 2016-06-13 17:41 GMT+02:00 Tu Holmes : > I have seen this. > > Just stop ceph and kill any ssh processes related to it. > > I had the same issue, and the fix for me was to enable root login, ssh to > the node as root and run the env DEBIAN_FRONTEND=noninteractive > DEBIAN_PRIORITY=critical apt-get --assume-yes -q --no-install-recommends > install -o Dpkg::Options::=--force-confnew ceph ceph-mds radosgw as root > after the ceph-deploy fails. > > This worked for me. > > -Tu > > > > On Mon, Jun 13, 2016 at 6:18 AM George Shuklin > wrote: > >> I believe this is the source of issues (cited line). >> >> Purge all ceph packages from this node and remove user/group 'ceph', >> than retry. >> >> On 06/13/2016 02:46 PM, Fran Barrera wrote: >> > [ceph-admin][WARNIN] usermod: user ceph is currently used by process >> 1303 >> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs
Wido, can you please give more details about that? What sort of corruption may occur? What scrubbing actually does especially for bucket index pool? 2016-06-14 12:05 GMT+03:00 Wido den Hollander : > >> Op 14 juni 2016 om 11:00 schreef Василий Ангапов : >> >> >> Is it a good idea to disable scrub and deep-scrub for bucket.index >> pool? What negative consequences it may cause? >> > > No, I would not do that. Scrubbing is essential to detect (silent) data > corruption. > > You should really scrub all your data. > >> 2016-06-14 11:51 GMT+03:00 Wido den Hollander : >> > >> >> Op 14 juni 2016 om 10:10 schreef Ansgar Jazdzewski >> >> : >> >> >> >> >> >> Hi, >> >> >> >> we are using ceph and radosGW to store images (~300kb each) in S3, >> >> when in comes to deep-scrubbing we facing task timeouts (> 30s ...) >> >> >> >> my questions is: >> >> >> >> in case of that amount of objects/files is it better to calculate the >> >> PGs on a object-bases instant of the volume size? and how it should be >> >> done? >> >> >> > >> > Do you have bucket sharding enabled? >> > >> > And how many objects do you have in a single bucket? >> > >> > If sharding is not enabled for the bucket index you might have large RADOS >> > objects with bucket indexes which are hard to scrub. >> > >> > Wido >> > >> >> thanks >> >> Ansgar >> >> ___ >> >> ceph-users mailing list >> >> ceph-users@lists.ceph.com >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > ___ >> > ceph-users mailing list >> > ceph-users@lists.ceph.com >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs
> Op 14 juni 2016 om 11:00 schreef Василий Ангапов : > > > Is it a good idea to disable scrub and deep-scrub for bucket.index > pool? What negative consequences it may cause? > No, I would not do that. Scrubbing is essential to detect (silent) data corruption. You should really scrub all your data. > 2016-06-14 11:51 GMT+03:00 Wido den Hollander : > > > >> Op 14 juni 2016 om 10:10 schreef Ansgar Jazdzewski > >> : > >> > >> > >> Hi, > >> > >> we are using ceph and radosGW to store images (~300kb each) in S3, > >> when in comes to deep-scrubbing we facing task timeouts (> 30s ...) > >> > >> my questions is: > >> > >> in case of that amount of objects/files is it better to calculate the > >> PGs on a object-bases instant of the volume size? and how it should be > >> done? > >> > > > > Do you have bucket sharding enabled? > > > > And how many objects do you have in a single bucket? > > > > If sharding is not enabled for the bucket index you might have large RADOS > > objects with bucket indexes which are hard to scrub. > > > > Wido > > > >> thanks > >> Ansgar > >> ___ > >> ceph-users mailing list > >> ceph-users@lists.ceph.com > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs
Is it a good idea to disable scrub and deep-scrub for bucket.index pool? What negative consequences it may cause? 2016-06-14 11:51 GMT+03:00 Wido den Hollander : > >> Op 14 juni 2016 om 10:10 schreef Ansgar Jazdzewski >> : >> >> >> Hi, >> >> we are using ceph and radosGW to store images (~300kb each) in S3, >> when in comes to deep-scrubbing we facing task timeouts (> 30s ...) >> >> my questions is: >> >> in case of that amount of objects/files is it better to calculate the >> PGs on a object-bases instant of the volume size? and how it should be >> done? >> > > Do you have bucket sharding enabled? > > And how many objects do you have in a single bucket? > > If sharding is not enabled for the bucket index you might have large RADOS > objects with bucket indexes which are hard to scrub. > > Wido > >> thanks >> Ansgar >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] strange cache tier behaviour with cephfs
Hi, ok lets make it step by step: before `dd if=file of=/dev/zero` [root@cephmon1 ~]# rados -p ssd_cache cache-flush-evict-all -> Moving all away [root@cephmon1 ~]# rados -p ssd_cache ls [root@cephmon1 ~]# -> empty cache osds at that point: /dev/sde1 234315556 84368 234231188 1% /var/lib/ceph/osd/ceph-0 /dev/sdf1 234315556106716 234208840 1% /var/lib/ceph/osd/ceph-1 /dev/sdi1 234315556 97132 234218424 1% /var/lib/ceph/osd/ceph-2 /dev/sdj1 234315556 87584 234227972 1% /var/lib/ceph/osd/ceph-3 /dev/sde1 234315556 90252 234225304 1% /var/lib/ceph/osd/ceph-8 /dev/sdf1 234315556107424 234208132 1% /var/lib/ceph/osd/ceph-9 /dev/sdi1 234315556378104 233937452 1% /var/lib/ceph/osd/ceph-10 /dev/sdj1 234315556 94856 234220700 1% /var/lib/ceph/osd/ceph-11 Now we run the dd. 20971520+0 records in 20971520+0 records out 10737418240 bytes (11 GB) copied, 85.6032 s, 125 MB/s [root@cephmon1 ~]# rados -p ssd_cache ls | wc -l 40 /dev/sde1 234315556624896 233690660 1% /var/lib/ceph/osd/ceph-0 /dev/sdf1 234315556643200 233672356 1% /var/lib/ceph/osd/ceph-1 /dev/sdi1 234315556596744 233718812 1% /var/lib/ceph/osd/ceph-2 /dev/sdj1 234315556615868 233699688 1% /var/lib/ceph/osd/ceph-3 /dev/sde1 234315556573496 233742060 1% /var/lib/ceph/osd/ceph-8 /dev/sdf1 234315556570240 233745316 1% /var/lib/ceph/osd/ceph-9 /dev/sdi1 234315556624032 233691524 1% /var/lib/ceph/osd/ceph-10 /dev/sdj1 234315556627216 233688340 1% /var/lib/ceph/osd/ceph-11 So we were going from ~ 1 GB to ~ 4 GB. ( of a 11 GB file ). So 3 GB are copied from the cold pool to cache pool. So i assume 3 GB had, maybe, served from the cache pool, and the other 8 GB had been served from the cold storage. According to the docu it says for the writeback mode: " When a Ceph client needs data that resides in the storage tier, the cache tiering agent migrates the data to the cache tier on read, then it is sent to the Ceph client. " This is obviously not happening there. And the question is why. 
-- Mit freundlichen Gruessen / Best regards Oliver Dzombic IP-Interactive mailto:i...@ip-interactive.de Anschrift: IP Interactive UG ( haftungsbeschraenkt ) Zum Sonnenberg 1-3 63571 Gelnhausen HRB 93402 beim Amtsgericht Hanau Geschäftsführung: Oliver Dzombic Steuer Nr.: 35 236 3622 1 UST ID: DE274086107 Am 14.06.2016 um 03:00 schrieb Christian Balzer: > > Hello, > > On Tue, 14 Jun 2016 02:52:43 +0200 Oliver Dzombic wrote: > >> Hi Christian, >> >> if i read a 1,5 GB file, which is not changing at all. >> >> Then i expect the agent to copy it one time from the cold pool to the >> cache pool. >> > Before Jewel, that is what you would have seen, yes. > > Did you read what Sam wrote and me in reply to him? > >> In fact its every time making a new copy. >> > Is it? > Is there 1.5GB of data copied into the cache tier each time? > An object is 4MB, you only had 8 in your first run, then 16... > >> I can see that by increasing disc usage of the cache and the increasing >> object number. >> >> And the non existing improvement of speed. >> > That could be down to your network or other factors on your client. > > Christian > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
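The partial promotion observed above is consistent with the throttling explanation that follows in this thread: since Jewel, the cache tier promotes at most a configured number of bytes and objects per second, so a sequential read only pulls in whatever those caps allow for the duration of the read. A rough back-of-the-envelope model (an assumption about how the two caps combine, not Ceph's actual code):

```python
def max_promoted_bytes(duration_s, bytes_per_sec, objects_per_sec,
                       object_size=4 << 20):
    # Assume both throttles apply independently and the stricter one
    # wins; RADOS objects default to 4 MiB.
    by_bytes = duration_s * bytes_per_sec
    by_objects = duration_s * objects_per_sec * object_size
    return min(by_bytes, by_objects)
```

Under this model an 85-second read at a ~5 MB/s byte cap could promote only a few hundred MB of an 11 GB file, which matches the "3 GB promoted, 8 GB served from the cold pool" observation at least in spirit.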
Re: [ceph-users] 40Mil objects in S3 rados pool / how calculate PGs
> Op 14 juni 2016 om 10:10 schreef Ansgar Jazdzewski > : > > > Hi, > > we are using ceph and radosGW to store images (~300kb each) in S3, > when in comes to deep-scrubbing we facing task timeouts (> 30s ...) > > my questions is: > > in case of that amount of objects/files is it better to calculate the > PGs on a object-bases instant of the volume size? and how it should be > done? > Do you have bucket sharding enabled? And how many objects do you have in a single bucket? If sharding is not enabled for the bucket index you might have large RADOS objects with bucket indexes which are hard to scrub. Wido > thanks > Ansgar > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
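On the original PG question: the commonly cited rule of thumb sizes placement groups by OSD count and replication factor, not by object count — roughly (OSDs × 100) / pool size, rounded up to a power of two. A small sketch of that calculation (the 100-PGs-per-OSD target is the usual guideline from the Ceph docs, not a hard rule, and total PGs across all pools on an OSD should be considered too):

```python
def suggested_pg_count(num_osds, pool_size, target_pgs_per_osd=100):
    # Aim for ~target_pgs_per_osd PGs per OSD, divided by the
    # replication factor, rounded UP to the next power of two.
    raw = num_osds * target_pgs_per_osd / float(pool_size)
    power = 1
    while power < raw:
        power *= 2
    return power
```

For example, 10 OSDs with 3x replication suggests 512 PGs. For the bucket-index scrub timeouts in this thread, though, index sharding matters more than PG count, as Wido notes.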
[ceph-users] tier pool 'ssdpool' has snapshot state; it cannot be added as a tier without breaking the pool.
Hi, All i have make a sas pool and a ssd pool. then run "ceph osd tier add ssdpool saspool", it says: tier pool 'ssdpool' has snapshot state; it cannot be added as a tier without breaking the pool. anyone who had hit the case? what can i do? and, "ceph osd pool" has "mksnap" & "rmsnap" but no "list snap" option. so, how could i know snap details of a pool? Regards, XiuCai___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ubuntu Trusty: kernel 3.13 vs kernel 4.2
One storage setups has exhibited extremely poor performance in my lab on 4.2 kernel (mdraid1+lvm+nfs), others run fine. No problems with xenial so far. If I had to choose a LTS kernel for trusty I'd choose the xenial one. (Btw I think newest trusty point release already has the 4.2 HWE stack by default, not sure if 3.13 is supported? I usually just upgrade) Jan > On 14 Jun 2016, at 09:45, magicb...@hotmail.com wrote: > > Hi list, > > is there any opinion/recommendation regarding the ubuntu trusty available > kernels and Ceph(hammer, xfs)? > Does kernel 4.2 worth installing from Ceph(hammer, xfs) perspective? > > Thanks :) > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] 40Mil objects in S3 rados pool / how calculate PGs
Hi, we are using ceph and radosGW to store images (~300kb each) in S3, when in comes to deep-scrubbing we facing task timeouts (> 30s ...) my questions is: in case of that amount of objects/files is it better to calculate the PGs on a object-bases instant of the volume size? and how it should be done? thanks Ansgar ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Ubuntu Trusty: kernel 3.13 vs kernel 4.2
Hi list, is there any opinion/recommendation regarding the ubuntu trusty available kernels and Ceph(hammer, xfs)? Does kernel 4.2 worth installing from Ceph(hammer, xfs) perspective? Thanks :) ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] UnboundLocalError: local variable 'region_name' referenced before assignment
Any help for me as well, please. :) - Parveen On Tue, Jun 14, 2016 at 11:55 AM, Parveen Sharma wrote: > Hi, > > I'm getting "UnboundLocalError: local variable 'region_name' referenced > before assignment" error while placing an object in my earlier created > bucket using my RADOSGW with boto. > > > My package details: > > $ sudo rpm -qa | grep rados > librados2-10.2.1-0.el7.x86_64 > libradosstriper1-10.2.1-0.el7.x86_64 > python-rados-10.2.1-0.el7.x86_64 > ceph-radosgw-10.2.1-0.el7.x86_64 > $ > > $ python > Python 2.7.10 (default, Oct 23 2015, 19:19:21) > [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin > Type "help", "copyright", "credits" or "license" for more information.>>> > import sys, boto > >>> boto.Version > '2.40.0'>>> > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1343813 is mentioning a > workaround but it apparently not working for me or I am missing something. > > > *$ cat ~/.boto * > > [Credentials] > > aws_access_key_id = AKIAI6KEIQSY2746KS5Q > > aws_secret_access_key = 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 > > > [s3] > > use-sigv4 = True > > $ > > $ > > *$ cat s3test_for_placing_object_in_bucket.py* > > import boto > > import boto.s3.connection > > > > conn = boto.connect_s3( > > host = 'mc2', port = 7480, > > is_secure=False, calling_format = > boto.s3.connection.OrdinaryCallingFormat(), > > ) > > #From > http://stackoverflow.com/questions/15085864/how-to-upload-a-file-to-directory-in-s3-bucket-using-boto > > bucket = conn.get_bucket('my-new-bucket') > > key = boto.s3.key.Key(bucket, 'myTestFileIn_my-new-bucket.txt') > > with open('myTestFileIn_my-new-bucket.txt') as f: > > key.send_file(f) > > $ > > $ > > *$ python s3test_for_placing_object_in_bucket.py* > > Traceback (most recent call last): > > File "s3test_for_placing_object_in_bucket.py", line 12, in > > bucket = conn.get_bucket('my-new-bucket') > > File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line > 506, in get_bucket > > return 
self.head_bucket(bucket_name, headers=headers) > > File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line > 525, in head_bucket > > response = self.make_request('HEAD', bucket_name, headers=headers) > > File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line > 668, in make_request > > retry_handler=retry_handler > > File "/Library/Python/2.7/site-packages/boto/connection.py", line 1071, > in make_request > > retry_handler=retry_handler) > > File "/Library/Python/2.7/site-packages/boto/connection.py", line 927, > in _mexe > > request.authorize(connection=self) > > File "/Library/Python/2.7/site-packages/boto/connection.py", line 377, > in authorize > > connection._auth_handler.add_auth(self, **kwargs) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 722, in > add_auth > > **kwargs) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 542, in > add_auth > > string_to_sign = self.string_to_sign(req, canonical_request) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 482, in > string_to_sign > > sts.append(self.credential_scope(http_request)) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 464, in > credential_scope > > region_name = self.determine_region_name(http_request.host) > > File "/Library/Python/2.7/site-packages/boto/auth.py", line 657, in > determine_region_name > > return region_name > > UnboundLocalError: local variable 'region_name' referenced before > assignment > > $ > > > > > - > > Parveen > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] strange unfounding of PGs
Yes! After I read the mail I unset it immediately, and the recovery process started to continue. After I switched the OSD I had kept off back on, Ceph found the unfound objects, and now recovery is running. Thanks Nick and Christian, you saved me! :) Christian Balzer wrote (on 14 Jun 2016, Tue, 9:24): > On Tue, 14 Jun 2016 07:09:45 + Csaba Tóth wrote: > > Hi Nick! > > Yes i did. :( > > Do you know how can i fix it? > > > > > Supposedly just by un-setting it: > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg29651.html > > Christian > > > Nick Fisk wrote (on 14 Jun 2016, Tue, 7:52): > > > > > Did you enable the sortbitwise flag as per the upgrade instructions, as > > > there is a known bug with it? I don't know why these instructions > > > haven't been amended in light of this bug. > > > > > > http://tracker.ceph.com/issues/16113 > > > > > > > > > > > > > -Original Message- > > > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On > > > > Behalf Of Csaba Tóth > > > > Sent: 13 June 2016 16:17 > > > > To: ceph-us...@ceph.com > > > > Subject: [ceph-users] strange unfounding of PGs > > > > > > > > Hi! > > > > > > > > I have a soo strange problem. At friday night i upgraded my small > > > > ceph > > > cluster > > > > from hammer to jewel. Everything went so well, but the chowning of > > > > osd datadir took a lot time, so i skipped two osd and do the > > > > run-as-root > > > trick. > > > > Yesterday evening i wanted to fix this, shutted down the first OSD > > > > and chowned the lib/ceph dir. 
But when i started it back a lot of > > > > strange pg > > > not > > > > found error happened (this is just a small list): > > > > > > > > 2016-06-12 23:43:05.096078 osd.2 [ERR] 5.3d has 2 objects unfound and > > > > apparently lost > > > > [snip: ~30 more "objects unfound and apparently lost" lines for other PGs on osd.2 and osd.4, quoted in full in the original message further down this thread] > > > > > > > > After this error messages i see this ceph health: > > > > 2016-06-12 23:44:10.498613 7f5941e0f700 0 log_channel(cluster) log > > > [INF] : > > > > pgmap v23122505: 820 pgs: 1 peering, 37 active+degraded, 5 >
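[Archive note] For anyone landing here from the archives: the "un-setting" in the mail Christian links to boils down to a single monitor command. Sketch below, to be run against the affected cluster; recovery should then resume on its own, as it did here.

```
# clear the sortbitwise flag that was set during the Jewel upgrade
ceph osd unset sortbitwise

# then watch recovery pick the "unfound" objects back up
ceph -w
```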
Re: [ceph-users] strange unfounding of PGs
On Tue, 14 Jun 2016 07:09:45 + Csaba Tóth wrote: > Hi Nick! > Yes i did. :( > Do you know how can i fix it? > > Supposedly just by un-setting it: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg29651.html Christian > Nick Fisk wrote (on 14 Jun 2016, Tue, 7:52): > > Did you enable the sortbitwise flag as per the upgrade instructions, as > > there is a known bug with it? I don't know why these instructions > > haven't been amended in light of this bug. > > > > http://tracker.ceph.com/issues/16113 > > > > > > > > > -Original Message- > > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On > > > Behalf Of Csaba Tóth > > > Sent: 13 June 2016 16:17 > > > To: ceph-us...@ceph.com > > > Subject: [ceph-users] strange unfounding of PGs > > > > > > Hi! > > > > > > I have a soo strange problem. At friday night i upgraded my small > > > ceph > > cluster > > > from hammer to jewel. Everything went so well, but the chowning of > > > osd datadir took a lot time, so i skipped two osd and do the > > > run-as-root > > trick. > > > Yesterday evening i wanted to fix this, shutted down the first OSD > > > and chowned the lib/ceph dir. 
But when i started it back a lot of > > > strange pg > > not > > > found error happened (this is just a small list): > > > 2016-06-12 23:43:05.096078 osd.2 [ERR] 5.3d has 2 objects unfound and > > > apparently lost > > > [snip: the same list of "objects unfound and apparently lost" errors, quoted in full in the original message further down this thread] > > > After this error messages i see this ceph health: > > > 2016-06-12 23:44:10.498613 7f5941e0f700 0 log_channel(cluster) log > > [INF] : > > > pgmap v23122505: 820 pgs: 1 peering, 37 active+degraded, 5 > > > active+remapped+wait_backfill, 167 active+recovery_wait+degraded, 1 > > > active+remapped, 1 active+recovering+degraded, 13 > > > active+undersized+degraded+remapped+wait_backfill, 595 active+clean; > > > 795 GB data, 1926 GB used, 5512 GB / 7438 GB avail; 7695 B/s wr, 2 > > > op/s; 24459/3225218 objects degraded (0.758%); 44435/3225218 objects > > > misplaced (1.378%); 346/1231022 unfound (0.028%) > > > Some minutes later it stalled in this state: > > > 2016-06-13 00:07:32.761265 7f5941e0f700 0 log_channel(cluster) log > >
Re: [ceph-users] strange unfounding of PGs
Hi Nick! Yes i did. :( Do you know how I can fix it? Nick Fisk wrote (on 14 Jun 2016, Tue, 7:52): > Did you enable the sortbitwise flag as per the upgrade instructions, as > there is a known bug with it? I don't know why these instructions haven't > been amended in light of this bug. > > http://tracker.ceph.com/issues/16113 > > > > > -Original Message- > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > > Csaba Tóth > > Sent: 13 June 2016 16:17 > > To: ceph-us...@ceph.com > > Subject: [ceph-users] strange unfounding of PGs > > > > Hi! > > > > I have a soo strange problem. At friday night i upgraded my small ceph > cluster > > from hammer to jewel. Everything went so well, but the chowning of osd > > datadir took a lot time, so i skipped two osd and do the run-as-root > trick. > > Yesterday evening i wanted to fix this, shutted down the first OSD and > > chowned the lib/ceph dir. But when i started it back a lot of strange pg > not > > found error happened (this is just a small list): > > > > 2016-06-12 23:43:05.096078 osd.2 [ERR] 5.3d has 2 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.096915 osd.2 [ERR] 5.30 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.097702 osd.2 [ERR] 5.39 has 4 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.100449 osd.2 [ERR] 5.2f has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.104519 osd.2 [ERR] 1.8 has 2 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.106041 osd.2 [ERR] 5.3f has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.107379 osd.2 [ERR] 1.76 has 2 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.107630 osd.2 [ERR] 1.0 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.107661 osd.2 [ERR] 2.14 has 2 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.107722 osd.2 [ERR] 2.3 has 1 objects unfound and > > apparently lost > > 2016-06-12 
23:43:05.108082 osd.2 [ERR] 5.16 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.108417 osd.2 [ERR] 5.38 has 2 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.108910 osd.2 [ERR] 1.43 has 3 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.109561 osd.2 [ERR] 1.a has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.110299 osd.2 [ERR] 1.10 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.111781 osd.2 [ERR] 1.22 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.111869 osd.2 [ERR] 1.1a has 3 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.205688 osd.4 [ERR] 1.29 has 2 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.206016 osd.4 [ERR] 1.1c has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.206219 osd.4 [ERR] 5.24 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.209013 osd.4 [ERR] 1.6a has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.209421 osd.4 [ERR] 1.68 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.209597 osd.4 [ERR] 5.d has 3 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.209620 osd.4 [ERR] 1.9 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.210191 osd.4 [ERR] 5.62 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.210649 osd.4 [ERR] 2.57 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.212011 osd.4 [ERR] 1.6 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.212106 osd.4 [ERR] 2.b has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.212212 osd.4 [ERR] 5.8 has 1 objects unfound and > > apparently lost > > 2016-06-12 23:43:05.215850 osd.4 [ERR] 2.56 has 2 objects unfound and > > apparently lost > > > > > > After this error messages i see this ceph health: > > 2016-06-12 23:44:10.498613 7f5941e0f700 0 log_channel(cluster) log > [INF] 
: > > pgmap v23122505: 820 pgs: 1 peering, 37 active+degraded, 5 > > active+remapped+wait_backfill, 167 active+recovery_wait+degraded, 1 > > active+remapped, 1 active+recovering+degraded, 13 > > active+undersized+degraded+remapped+wait_backfill, 595 active+clean; 795 > > GB data, 1926 GB used, 5512 GB / 7438 GB avail; 7695 B/s wr, 2 op/s; > > 24459/3225218 objects degraded (0.758%); 44435/3225218 objects misplaced > > (1.378%); 346/1231022 unfound (0.028%) > > > > Some minutes later it stalled in this state: > > 2016-06-13 00:07:32.761265 7f5941e0f700 0 log_channel(cluster) log > [INF] : > > pgmap v23123311: 820 pgs: 1 > > active+recovery_wait+undersized+degraded+remapped, 1 > > active+recovering+degraded, 11 > > active+undersized+degraded+remapped+wait_backfill, 5 > > active+remapped+wait_backfill, 207 active+recovery_wait+degraded, 595 > > active+clean; 795 GB data, 1878 GB used, 5559 GB / 7438 GB avail; 14164 > B/s > > wr, 3 op/s; 22562/3223912 objects degraded (0.700%); 3873
Re: [ceph-users] strange cache tier behaviour with cephfs
Hello, On Tue, 14 Jun 2016 06:47:03 +0100 Nick Fisk wrote: > osd_tier_promote_max_objects_sec > and > osd_tier_promote_max_bytes_sec > Right, I remember those from February and May. And I'm not asking for this feature, but personally I would have split that in read and write promotes. As in, throttle promotes done to satisfy reads, but not for writes (as that will benefit from the faster pool a lot more). > is what you are looking for, I think by default its set to 5MB/s, which > would roughly correlate to why you are only seeing around 8 objects each > time being promoted. This was done like this as too many promotions hurt > performance, so you don't actually want to promote on every IO. > Well, I do, but yeah. Obviously the defaults were picked to be on the safe side of things, though anybody running a cache tier worth its salt will be able to handle more than 5MB/s. But never mind that, since these parameters are not documented on the cache-tiering documentation page new users like Oliver will get unexpected results. And existing cache-tier users will be rudely surprised, as this isn't mentioned in the changelog either... Christian > > -Original Message- > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf > > Of Christian Balzer > > Sent: 14 June 2016 02:00 > > To: ceph-users@lists.ceph.com > > Subject: Re: [ceph-users] strange cache tier behaviour with cephfs > > > > > > Hello, > > > > On Tue, 14 Jun 2016 02:52:43 +0200 Oliver Dzombic wrote: > > > > > Hi Christian, > > > > > > if i read a 1,5 GB file, which is not changing at all. > > > > > > Then i expect the agent to copy it one time from the cold pool to the > > > cache pool. > > > > > Before Jewel, that is what you would have seen, yes. > > > > Did you read what Sam wrote and me in reply to him? > > > > > In fact its every time making a new copy. > > > > > Is it? > > Is there 1.5GB of data copied into the cache tier each time? 
> > An object is 4MB, you only had 8 in your first run, then 16... > > > I can see that by increasing disc usage of the cache and the > > > increasing object number. > > > > > > And the non existing improvement of speed. > > > That could be down to your network or other factors on your client. > > > > Christian > > -- > > Christian Balzer    Network/Systems Engineer > > ch...@gol.com Global OnLine Japan/Rakuten Communications > > http://www.gol.com/ -- Christian Balzer    Network/Systems Engineer ch...@gol.com Global OnLine Japan/Rakuten Communications http://www.gol.com/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
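[Archive note] For readers hitting the same throttled-promotion behaviour: the two options named above can be raised in ceph.conf. A sketch with example values only; the option names are the Jewel-era ones from this thread, and the right numbers depend entirely on what your cache tier can actually absorb.

```ini
# ceph.conf sketch -- example values, not recommendations
[osd]
# per-OSD caps on promotions into the cache tier
osd_tier_promote_max_objects_sec = 200
osd_tier_promote_max_bytes_sec = 52428800    # ~50 MB/s
```

At runtime, something like `ceph tell osd.* injectargs '--osd_tier_promote_max_bytes_sec 52428800'` should apply the change without restarting the OSDs.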