Re: [ceph-users] List of SSDs

2016-02-24 Thread Shinobu Kinjo
Thanks, Robert, for your more specific explanation.

Rgds,
Shinobu

- Original Message -
From: "Robert LeBlanc" 
To: "Shinobu Kinjo" 
Cc: "ceph-users" 
Sent: Thursday, February 25, 2016 2:56:15 PM
Subject: Re: [ceph-users] List of SSDs

We are moving to the Intel S3610, from our testing it is a good balance
between price, performance and longevity. But as with all things, do your
testing ahead of time. This will be our third model of SSDs for our
cluster. The S3500s didn't have enough life and their performance tapers off as
they get full. The Micron M600s looked good with the Sebastien journal
tests, but once in use for a while they go downhill pretty badly. We also tested
Micron M500dc drives and they were on par with the S3610s, but are more
expensive and closer to EoL. The S3700s didn't have quite the same
performance as the S3610s, but they will last forever and are very stable
in terms of performance and have the best power loss protection.

Short answer is test them for yourself to make sure they will work. You are
pretty safe with the Intel S3xxx drives. The Micron M500dc is also pretty
safe based on my experience. It has also been mentioned that someone has
had good experience with a Samsung DC Pro (it has to have both DC and Pro in
the name), but we weren't able to get any quickly enough to test, so I can't
vouch for them.

Sent from a mobile device, please excuse any typos.
On Feb 24, 2016 6:37 PM, "Shinobu Kinjo"  wrote:

> Hello,
>
> There has been a bunch of discussion about using SSD.
> Does anyone have any list of SSDs describing which SSD is highly
> recommended, which SSD is not.
>
> Rgds,
> Shinobu
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Observations with a SSD based pool under Hammer

2016-02-24 Thread Robert LeBlanc
With my S3500 drives in my test cluster, the latest master branch gave me
an almost 2x increase in performance compared to just a month or two ago.
There seem to be some really nice things coming in Jewel around SSD
performance. My drives are now 80-85% busy doing about 10-12K IOPS when
doing 4K fio to libRBD.
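
For reference, a 4K random-write run of that sort can be pointed straight at
librbd with fio's rbd engine. This is only a sketch (pool, image and client
names are placeholders) and assumes your fio build includes the rbd ioengine:

fio --name=4k-rbd-test --ioengine=rbd --clientname=admin --pool=rbd \
    --rbdname=fio-test --rw=randwrite --bs=4k --iodepth=32 \
    --runtime=60 --time_based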

Sent from a mobile device, please excuse any typos.
On Feb 24, 2016 8:10 PM, "Christian Balzer"  wrote:

>
> Hello,
>
> For posterity and of course to ask some questions, here are my experiences
> with a pure SSD pool.
>
> SW: Debian Jessie, Ceph Hammer 0.94.5.
>
> HW:
> 2 nodes (thus replication of 2) with each:
> 2x E5-2623 CPUs
> 64GB RAM
> 4x DC S3610 800GB SSDs
> Infiniband (IPoIB) network
>
> Ceph: no tuning or significant/relevant config changes, OSD FS is Ext4,
> Ceph journal is inline (journal file).
>
> Performance:
> A test run with "rados -p cache  bench 30 write -t 32" (4MB blocks) gives
> me about 620MB/s, the storage nodes are I/O bound (all SSDs are 100% busy
> according to atop) and this meshes nicely with the speeds I saw when
> testing the individual SSDs with fio before involving Ceph.
>
> To elaborate on that, an individual SSD of that type can do about 500MB/s
> sequential writes, so ideally you would see 1GB/s writes with Ceph
> (500*8/2(replication)/2(journal on same disk).
> However my experience tells me that other activities (FS journals, leveldb
> PG updates, etc) impact things as well.
>
> A test run with "rados -p cache  bench 30 write -t 32 -b 4096" (4KB
> blocks) gives me about 7200 IOPS, the SSDs are about 40% busy.
> All OSD processes are using about 2 cores and the OS another 2, but that
> leaves about 6 cores unused (MHz on all cores scales to max during the
> test run).
> Closer inspection with all CPUs being displayed in atop shows that no
> single core is fully used, they all average around 40% and even the
> busiest ones (handling IRQs) still have ample capacity available.
> I'm wondering if this an indication of insufficient parallelism or if it's
> latency of sorts.
> I'm aware of the many tuning settings for SSD based OSDs, however I was
> expecting to run into a CPU wall first and foremost.
>
>
> Write amplification:
> 10 second rados bench with 4MB blocks, 6348MB written in total.
> nand-writes per SSD:118*32MB=3776MB.
> 30208MB total written to all SSDs.
> Amplification:4.75
>
> Very close to what you would expect with a replication of 2 and journal on
> same disk.
>
>
> 10 second rados bench with 4KB blocks, 219MB written in total.
> nand-writes per SSD:41*32MB=1312MB.
> 10496MB total written to all SSDs.
> Amplification:48!!!
>
> Le ouch.
> In my use case with rbd cache on all VMs I expect writes to be rather
> large for the most part and not like this extreme example.
> But as I wrote the last time I did this kind of testing, this is an area
> where caveat emptor most definitely applies when planning and buying SSDs.
> And where the Ceph code could probably do with some attention.
>
> Regards,
>
> Christian
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] List of SSDs

2016-02-24 Thread Robert LeBlanc
We are moving to the Intel S3610, from our testing it is a good balance
between price, performance and longevity. But as with all things, do your
testing ahead of time. This will be our third model of SSDs for our
cluster. The S3500s didn't have enough life and their performance tapers off as
they get full. The Micron M600s looked good with the Sebastien journal
tests, but once in use for a while they go downhill pretty badly. We also tested
Micron M500dc drives and they were on par with the S3610s, but are more
expensive and closer to EoL. The S3700s didn't have quite the same
performance as the S3610s, but they will last forever and are very stable
in terms of performance and have the best power loss protection.

Short answer is test them for yourself to make sure they will work. You are
pretty safe with the Intel S3xxx drives. The Micron M500dc is also pretty
safe based on my experience. It has also been mentioned that someone has
had good experience with a Samsung DC Pro (it has to have both DC and Pro in
the name), but we weren't able to get any quickly enough to test, so I can't
vouch for them.

Sent from a mobile device, please excuse any typos.
On Feb 24, 2016 6:37 PM, "Shinobu Kinjo"  wrote:

> Hello,
>
> There has been a bunch of discussion about using SSD.
> Does anyone have any list of SSDs describing which SSD is highly
> recommended, which SSD is not.
>
> Rgds,
> Shinobu
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: Erasure code Plugins

2016-02-24 Thread Sharath Gururaj
Try using more OSDs.
I was encountering this scenario when my OSDs were equal to k+m.
The errors went away when I used k+m+2.
So in your case, try with 8 or 10 OSDs.
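
For what it's worth, with the stock lrc plugin every group of l chunks gets one
extra local parity chunk, so a k=2, m=2, l=2 profile needs k + m + (k+m)/l = 6
chunks, i.e. 6 distinct failure domains (hosts, given
ruleset-failure-domain=host). A quick way to check whether CRUSH can actually
satisfy that (the rule id is a placeholder):

ceph osd erasure-code-profile get lrctest1
ceph osd getcrushmap -o /tmp/crushmap
crushtool -i /tmp/crushmap --test --rule <rule-id> --num-rep 6 --show-bad-mappings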

On Thu, Feb 25, 2016 at 11:18 AM, Daleep Singh Bais 
wrote:

> hi All,
>
> Any help in this regard will be appreciated.
>
> Thanks..
> Daleep Singh Bais
>
>
>  Forwarded Message 
> Subject: Erasure code Plugins
> Date: Fri, 19 Feb 2016 12:13:36 +0530
> From: Daleep Singh Bais  
> To: ceph-users  
>
> Hi All,
>
> I am experimenting with erasure profiles and would like to understand more
> about them. I created an LRC profile based on *
> http://docs.ceph.com/docs/master/rados/operations/erasure-code-lrc/
> *
>
> The LRC profile created by me is
>
> *ceph osd erasure-code-profile get lrctest1*
> k=2
> l=2
> m=2
> plugin=lrc
> ruleset-failure-domain=host
> ruleset-locality=host
> ruleset-root=default
>
> However, when I create a pool based on this profile, I see a health
> warning in ceph -w ( 128 pgs stuck inactive and 128 pgs stuck unclean).
> This is the first pool in cluster.
>
> As i understand, m is parity bit and l will create additional parity bit
> for data bit k. Please correct me if I am wrong.
>
> Below is output of ceph -w
>
> health HEALTH_WARN
> *128 pgs stuck inactive*
> *128 pgs stuck unclean*
>  monmap e7: 1 mons at {node1=192.168.1.111:6789/0}
> election epoch 101, quorum 0 node1
>  osdmap e928: *6 osds: 6 up, 6 in*
> flags sortbitwise
>   pgmap v54114: 128 pgs, 1 pools, 0 bytes data, 0 objects
> 10182 MB used, 5567 GB / 5589 GB avail
>  *128 creating*
>
>
> Any help or guidance in this regard is highly appreciated.
>
> Thanks,
>
> Daleep Singh Bais
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: Erasure code Plugins

2016-02-24 Thread Daleep Singh Bais
hi All,

Any help in this regard will be appreciated.

Thanks..
Daleep Singh Bais


 Forwarded Message 
Subject:Erasure code Plugins
Date:   Fri, 19 Feb 2016 12:13:36 +0530
From:   Daleep Singh Bais 
To: ceph-users 



Hi All,

I am experimenting with erasure profiles and would like to understand
more about them. I created an LRC profile based on
http://docs.ceph.com/docs/master/rados/operations/erasure-code-lrc/

The LRC profile created by me is

ceph osd erasure-code-profile get lrctest1
k=2
l=2
m=2
plugin=lrc
ruleset-failure-domain=host
ruleset-locality=host
ruleset-root=default

However, when I create a pool based on this profile, I see a health
warning in ceph -w (128 pgs stuck inactive and 128 pgs stuck unclean).
This is the first pool in the cluster.

As I understand it, m is the number of parity chunks and l will create
additional local parity chunks for the k data chunks. Please correct me if I
am wrong.

Below is output of ceph -w

health HEALTH_WARN
            128 pgs stuck inactive
            128 pgs stuck unclean
     monmap e7: 1 mons at {node1=192.168.1.111:6789/0}
            election epoch 101, quorum 0 node1
     osdmap e928: 6 osds: 6 up, 6 in
            flags sortbitwise
      pgmap v54114: 128 pgs, 1 pools, 0 bytes data, 0 objects
            10182 MB used, 5567 GB / 5589 GB avail
                 128 creating


Any help or guidance in this regard is highly appreciated.

Thanks,

Daleep Singh Bais


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph hammer : rbd info/Status : operation not supported (95) (EC+RBD tier pools)

2016-02-24 Thread Robert LeBlanc
We have not seen this issue, but we don't run EC pools yet (we are waiting
for multiple layers to be available). We are not running 0.94.6 in
production yet either. We have adopted the policy to only run released
versions in production unless there is a really pressing need to have a
patch. We are running 0.94.6 through our alpha and staging clusters and
hoping to do the upgrade in the next couple of weeks. We won't know how
much the recency fix will help until then because we have not been able to
replicate our workload with fio accurately enough to get good test results.
Unfortunately we will probably be swapping out our M600s with S3610s. We've
burned through 30% of the life in 2 months and they have 8x the op latency.
Due to the 10 Minutes of Terror, we are going to have to do both at the
same time to reduce the impact. Luckily, when you have weighted out OSDs or
empty ones, it is much less impactful. If you get your upgrade done before
ours, I'd like to know how it went. I'll be posting the results from ours
when it is done.
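
For reference, "weighting out" an OSD ahead of a swap is roughly the following
(the OSD id is a placeholder); the data drains off the disk before it is
pulled, which is what makes the replacement less impactful:

ceph osd crush reweight osd.12 0
# wait for backfill to finish (watch ceph -s), then stop the OSD,
# mark it out and replace the drive
ceph osd out osd.12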

Sent from a mobile device, please excuse any typos.
On Feb 24, 2016 5:43 PM, "Christian Balzer"  wrote:

>
> Hello Jason (Ceph devs et al),
>
> On Wed, 24 Feb 2016 13:15:34 -0500 (EST) Jason Dillaman wrote:
>
> > If you run "rados -p  ls | grep "rbd_id." and
> > don't see that object, you are experiencing that issue [1].
> >
> > You can attempt to work around this issue by running "rados -p irfu-virt
> > setomapval rbd_id. dummy value" to force-promote the object
> > to the cache pool.  I haven't tested / verified that will alleviate the
> > issue, though.
> >
> > [1] http://tracker.ceph.com/issues/14762
> >
>
> This concerns me greatly, as I'm about to phase in a cache tier this
> weekend into a very busy, VERY mission critical Ceph cluster.
> That is on top of a replicated pool, Hammer.
>
> That issue and the related git blurb are less than crystal clear, so for
> my and everybody else's benefit could you elaborate a bit more on this?
>
> 1. Does this only affect EC base pools?
> 2. Is this a regressions of sorts and when came it about?
>I have a hard time imagining people not running into this earlier,
>unless that problem is very hard to trigger.
> 3. One assumes that this isn't fixed in any released version of Ceph,
>correct?
>
> Robert, sorry for CC'ing you, but AFAICT your cluster is about the closest
> approximation in terms of busyness to mine here.
> And I a assume that you're neither using EC pools (since you need
> performance, not space) and haven't experienced this bug all?
>
> Also, would you consider the benefits of the recency fix (thanks for
> that) being worth risk of being an early adopter of 0.94.6?
> In other words, are you eating your own dog food already and 0.94.6 hasn't
> eaten your data babies yet? ^o^
>
> Regards,
>
> Christian
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Observations with a SSD based pool under Hammer

2016-02-24 Thread Christian Balzer

Hello, 

For posterity and of course to ask some questions, here are my experiences
with a pure SSD pool.

SW: Debian Jessie, Ceph Hammer 0.94.5.

HW:
2 nodes (thus replication of 2) with each: 
2x E5-2623 CPUs
64GB RAM
4x DC S3610 800GB SSDs
Infiniband (IPoIB) network

Ceph: no tuning or significant/relevant config changes, OSD FS is Ext4,
Ceph journal is inline (journal file).

Performance:
A test run with "rados -p cache  bench 30 write -t 32" (4MB blocks) gives
me about 620MB/s, the storage nodes are I/O bound (all SSDs are 100% busy
according to atop) and this meshes nicely with the speeds I saw when
testing the individual SSDs with fio before involving Ceph.

To elaborate on that, an individual SSD of that type can do about 500MB/s
sequential writes, so ideally you would see 1GB/s writes with Ceph
(500 MB/s * 8 SSDs / 2 (replication) / 2 (journal on same disk) = 1000 MB/s).
However my experience tells me that other activities (FS journals, leveldb
PG updates, etc) impact things as well.

A test run with "rados -p cache  bench 30 write -t 32 -b 4096" (4KB
blocks) gives me about 7200 IOPS, the SSDs are about 40% busy.
All OSD processes are using about 2 cores and the OS another 2, but that
leaves about 6 cores unused (MHz on all cores scales to max during the
test run). 
Closer inspection with all CPUs being displayed in atop shows that no
single core is fully used; they all average around 40% and even the
busiest ones (handling IRQs) still have ample capacity available.
I'm wondering if this is an indication of insufficient parallelism or if it's
latency of sorts.
I'm aware of the many tuning settings for SSD based OSDs, however I was
expecting to run into a CPU wall first and foremost.


Write amplification:
10 second rados bench with 4MB blocks, 6348MB written in total.
NAND writes per SSD: 118*32MB = 3776MB.
30208MB total written to all SSDs.
Amplification: 4.75

Very close to what you would expect with a replication of 2 and journal on
same disk.


10 second rados bench with 4KB blocks, 219MB written in total.
NAND writes per SSD: 41*32MB = 1312MB.
10496MB total written to all SSDs.
Amplification: 48!!!

Le ouch. 
In my use case with rbd cache on all VMs I expect writes to be rather
large for the most part and not like this extreme example. 
But as I wrote the last time I did this kind of testing, this is an area
where caveat emptor most definitely applies when planning and buying SSDs.
And where the Ceph code could probably do with some attention.
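
For anyone wanting to reproduce these numbers: the per-SSD NAND write figures
come from the drives' SMART counters sampled before and after each bench run.
A rough sketch, assuming the drive exposes a NAND-writes attribute counted in
32 MiB units (the attribute name and unit vary by vendor and model, and the
device names are placeholders):

for d in sdb sdc sdd sde; do
    smartctl -A /dev/$d | awk -v d=$d '/NAND_Writes/ {print d, $NF}'
done
# amplification = (sum of per-SSD counter deltas * 32MB) / client data written
# e.g. the 4KB case above: 8 * 41 * 32MB / 219MB = ~48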
 
Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Ceph-maintainers] download.ceph.com has AAAA record that points to unavailable address

2016-02-24 Thread Dan Mick
Yes.  download.ceph.com does not currently support IPv6 access.

On 02/14/2016 11:53 PM, Artem Fokin wrote:
> Hi
> 
> It seems like download.ceph.com has some outdated IPv6 address
> 
> ~ curl -v -s download.ceph.com > /dev/null
> * About to connect() to download.ceph.com port 80 (#0)
> *   Trying 2607:f298:6050:51f3:f816:3eff:fe50:5ec... Connection refused
> *   Trying 173.236.253.173... connected
> 
> 
> 
> ~ dig AAAA download.ceph.com | grep AAAA
> ; <<>> DiG 9.8.1-P1 <<>> AAAA download.ceph.com
> ;download.ceph.com.        IN    AAAA
> download.ceph.com.    286    IN    AAAA    2607:f298:6050:51f3:f816:3eff:fe50:5ec
> 
> If this is the wrong mailing list, please refer to the correct one.
> 
> Thanks!
> ___
> Ceph-maintainers mailing list
> ceph-maintain...@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-maintainers-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] List of SSDs

2016-02-24 Thread Shinobu Kinjo
Thanks for the pointer.
That's perfect atm. 

Rgds,
Shinobu

- Original Message -
From: "Christian Balzer" 
To: "ceph-users" 
Cc: "Shinobu Kinjo" 
Sent: Thursday, February 25, 2016 10:49:02 AM
Subject: Re: [ceph-users] List of SSDs

On Wed, 24 Feb 2016 20:37:07 -0500 (EST) Shinobu Kinjo wrote:

> Hello,
> 
> There has been a bunch of discussion about using SSD.
> Does anyone have any list of SSDs describing which SSD is highly
> recommended, which SSD is not.
> 
The answer to that is of course in all those threads and the "reference"
link given in most if not all of them:
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

That being said, the only SSDs that never ever gave me any issues like
timeouts, task aborts (use your google foo) are Intel DC S3700s.

That's aside from the point above if they're suitable for Ceph or not. 

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] List of SSDs

2016-02-24 Thread Christian Balzer
On Wed, 24 Feb 2016 20:37:07 -0500 (EST) Shinobu Kinjo wrote:

> Hello,
> 
> There has been a bunch of discussion about using SSD.
> Does anyone have any list of SSDs describing which SSD is highly
> recommended, which SSD is not.
> 
The answer to that is of course in all those threads and the "reference"
link given in most if not all of them:
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
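
For convenience, the test from that post boils down to a single-threaded
O_DSYNC write run along these lines (a sketch only; the device name is a
placeholder and the run writes to the raw device, destroying data on it):

fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 \
    --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test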

That being said, the only SSDs that never ever gave me any issues like
timeouts, task aborts (use your google foo) are Intel DC S3700s.

That's aside from the question above of whether they're suitable for Ceph or not.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw flush_read_list(): d->client_c->handle_data() returned -5

2016-02-24 Thread Ben Hines
Any idea what is going on here? I get these intermittently, especially with
very large files.

The client is doing RANGE requests on this >51 GB file, incrementally
fetching later chunks.
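
(For context, each of those fetches is just a GET with a Range header, roughly
like the following; the endpoint and bucket name are placeholders, and the
object name is taken from the log below:)

curl -s -o /dev/null -D - \
    -H "Range: bytes=104857600-157286399" \
    "http://rgw.example.com/mybucket/int8-0.181.4-1654016.2016-02-23_03-53-42.pkg"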

2016-02-24 16:30:59.669561 7fd33b7fe700  1 == starting new request
req=0x7fd32c0879c0 =
2016-02-24 16:30:59.669675 7fd33b7fe700  2 req 3648804:0.000114::GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg::initializing for
trans_id = tx00037ad24-0056ce4b43-259914b-default
2016-02-24 16:30:59.669687 7fd33b7fe700 10 host=
2016-02-24 16:30:59.669757 7fd33b7fe700 10
s->object=/int8-0.181.4-1654016.2016-02-23_03-53-42.pkg
s->bucket=
2016-02-24 16:30:59.669767 7fd33b7fe700  2 req 3648804:0.000206:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg::getting op
2016-02-24 16:30:59.669776 7fd33b7fe700  2 req 3648804:0.000215:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:authorizing
2016-02-24 16:30:59.669785 7fd33b7fe700  2 req 3648804:0.000224:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:reading
permissions
2016-02-24 16:30:59.673797 7fd33b7fe700 10 manifest: total_size =
50346000384
2016-02-24 16:30:59.673841 7fd33b7fe700  2 req 3648804:0.004280:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:init op
2016-02-24 16:30:59.673867 7fd33b7fe700 10 cache get:
name=.users.uid+ : hit
2016-02-24 16:30:59.673881 7fd33b7fe700 10 cache get:
name=.users.uid+ : hit
2016-02-24 16:30:59.673921 7fd33b7fe700  2 req 3648804:0.004360:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying
op mask
2016-02-24 16:30:59.673929 7fd33b7fe700  2 req 3648804:0.004369:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying
op permissions
2016-02-24 16:30:59.673941 7fd33b7fe700  5 Searching permissions for
uid=anonymous mask=49
2016-02-24 16:30:59.673944 7fd33b7fe700  5 Permissions for user not found
2016-02-24 16:30:59.673946 7fd33b7fe700  5 Searching permissions for
group=1 mask=49
2016-02-24 16:30:59.673949 7fd33b7fe700  5 Found permission: 1
2016-02-24 16:30:59.673951 7fd33b7fe700  5 Searching permissions for
group=2 mask=49
2016-02-24 16:30:59.673953 7fd33b7fe700  5 Permissions for group not found
2016-02-24 16:30:59.673955 7fd33b7fe700  5 Getting permissions id=anonymous
owner= perm=1
2016-02-24 16:30:59.673957 7fd33b7fe700 10  uid=anonymous requested perm
(type)=1, policy perm=1, user_perm_mask=15, acl perm=1
2016-02-24 16:30:59.673961 7fd33b7fe700  2 req 3648804:0.004400:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying
op params
2016-02-24 16:30:59.673965 7fd33b7fe700  2 req 3648804:0.004404:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:executing
2016-02-24 16:30:59.674107 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=130023424 stripe_ofs=130023424 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:30:59.674193 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=134217728 stripe_ofs=134217728 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:30:59.674317 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=138412032 stripe_ofs=138412032 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:30:59.674433 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=142606336 stripe_ofs=142606336 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:31:00.046110 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=146800640 stripe_ofs=146800640 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:31:00.150966 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=150994944 stripe_ofs=150994944 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:31:00.151118 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=155189248 stripe_ofs=155189248 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:31:00.161000 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=157286400 stripe_ofs=157286400 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.199553 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=161480704 stripe_ofs=161480704 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.278308 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=165675008 stripe_ofs=165675008 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.312306 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=169869312 stripe_ofs=169869312 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.751626 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=174063616 stripe_ofs=174063616 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.833570 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=178257920 stripe_ofs=178257920 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.871774 7fd33b7fe700  0 ERROR: flush_read_list():
d->client_c->handle_data() returned -5
2016-02-24 16:31:00.872480 7fd33b7fe700  0 WARNING: set_req_state_err
err_no=5 resorting to 500
2016-02-24 16:31:00.872561 7

[ceph-users] List of SSDs

2016-02-24 Thread Shinobu Kinjo
Hello,

There has been a bunch of discussion about using SSDs.
Does anyone have a list of SSDs describing which SSDs are highly recommended
and which are not?

Rgds,
Shinobu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph hammer : rbd info/Status : operation not supported (95) (EC+RBD tier pools)

2016-02-24 Thread Jason Dillaman
I'll speak to what I can answer off the top of my head.  The most important 
point is that this issue is only related to EC pool base tiers, not replicated 
pools.

> Hello Jason (Ceph devs et al),
> 
> On Wed, 24 Feb 2016 13:15:34 -0500 (EST) Jason Dillaman wrote:
> 
> > If you run "rados -p  ls | grep "rbd_id." and
> > don't see that object, you are experiencing that issue [1].
> > 
> > You can attempt to work around this issue by running "rados -p irfu-virt
> > setomapval rbd_id. dummy value" to force-promote the object
> > to the cache pool.  I haven't tested / verified that will alleviate the
> > issue, though.
> > 
> > [1] http://tracker.ceph.com/issues/14762
> > 
> 
> This concerns me greatly, as I'm about to phase in a cache tier this
> weekend into a very busy, VERY mission critical Ceph cluster.
> That is on top of a replicated pool, Hammer.
> 
> That issue and the related git blurb are less than crystal clear, so for
> my and everybody else's benefit could you elaborate a bit more on this?
> 
> 1. Does this only affect EC base pools?

Correct -- this is only an issue because EC pools do not directly support 
several operations required by RBD.  Placing a replicated cache tier in front 
of an EC pool was, in effect, a work-around to this limitation.

> 2. Is this a regressions of sorts and when came it about?
>I have a hard time imagining people not running into this earlier,
>unless that problem is very hard to trigger.
> 3. One assumes that this isn't fixed in any released version of Ceph,
>correct?
> 
> Robert, sorry for CC'ing you, but AFAICT your cluster is about the closest
> approximation in terms of busyness to mine here.
> And I a assume that you're neither using EC pools (since you need
> performance, not space) and haven't experienced this bug all?
> 
> Also, would you consider the benefits of the recency fix (thanks for
> that) being worth risk of being an early adopter of 0.94.6?
> In other words, are you eating your own dog food already and 0.94.6 hasn't
> eaten your data babies yet? ^o^

Per the referenced email chain, it was potentially the recency fix that exposed 
this issue for EC pools fronted by a cache tier. 

> 
> Regards,
> 
> Christian
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
> 

-- 

Jason Dillaman 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph hammer : rbd info/Status : operation not supported (95) (EC+RBD tier pools)

2016-02-24 Thread Christian Balzer

Hello Jason (Ceph devs et al),

On Wed, 24 Feb 2016 13:15:34 -0500 (EST) Jason Dillaman wrote:

> If you run "rados -p  ls | grep "rbd_id." and
> don't see that object, you are experiencing that issue [1].
> 
> You can attempt to work around this issue by running "rados -p irfu-virt
> setomapval rbd_id. dummy value" to force-promote the object
> to the cache pool.  I haven't tested / verified that will alleviate the
> issue, though.
> 
> [1] http://tracker.ceph.com/issues/14762
> 

This concerns me greatly, as I'm about to phase in a cache tier this
weekend into a very busy, VERY mission critical Ceph cluster.
That is on top of a replicated pool, Hammer.

That issue and the related git blurb are less than crystal clear, so for
my and everybody else's benefit could you elaborate a bit more on this?

1. Does this only affect EC base pools?
2. Is this a regression of sorts, and when did it come about?
   I have a hard time imagining people not running into this earlier,
   unless that problem is very hard to trigger.
3. One assumes that this isn't fixed in any released version of Ceph,
   correct?

Robert, sorry for CC'ing you, but AFAICT your cluster is about the closest
approximation in terms of busyness to mine here.
And I assume that you're not using EC pools (since you need
performance, not space) and haven't experienced this bug at all?

Also, would you consider the benefits of the recency fix (thanks for
that) being worth risk of being an early adopter of 0.94.6?
In other words, are you eating your own dog food already and 0.94.6 hasn't
eaten your data babies yet? ^o^

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can not disable rbd cache

2016-02-24 Thread Christian Balzer

Hello,

On Wed, 24 Feb 2016 16:59:33 -0700 Robert LeBlanc wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
> 
> Let's start from the top. Where are you stuck with [1]? I have noticed
> that after evicting all the objects with RBD that one object for each
> active RBD is still left, I think this is the head object. 
Precisely.
That came up in my extensive tests as well.

> We haven't
> tried this, but our planned procedure for finishing the deactivation
> of a cache tier is to shut down the active VM, then flush again and
> then start the VM again. Once all VMs have been stopped, flushed and
> restarted, we should be able to remove the cache tier. That way we
> don't have to stop all the VMs at once or for long periods of time. 
Yup, I tested this as well and it works. 
Bit of a pain if you're in a hurry, but otherwise the way forward for now.

> I
> hope at some point the last object can be flushed without shutting
> down the VM.
> 
Very much so, I consider this a bug, both in terms of functionality and
from a documentation point of view.

Christian

> If you are experiencing something different, please provide some more
> info, especially more detailed steps of what you tried.
> 
> [1]
> http://docs.ceph.com/docs/master/rados/operations/cache-tiering/?highlight=cache#removing-a-cache-tier
> -BEGIN PGP SIGNATURE- Version: Mailvelope v1.3.6
> Comment: https://www.mailvelope.com
> 
> wsFcBAEBCAAQBQJWzkPgCRDmVDuy+mK58QAABrEQAIFAxEEmKroSKqqGluFE
> aCwvTTxye5IfIBjmVoreFZy+/r5B5D+aMBUFArANCk/A9V678mb/24MkCggT
> 8Ehb0eBVbkWxptUfexXfSuXvFqTGWA5BDnVTzT9rJ5liTQinXbhDCuJcVCDb
> hcHmNRnUrituZoDfivwp9ZMpe/ZqsQsIN06NyVhLyPWtA1/Ji06v1WwVkEKe
> b6FkS4J4C6RdmgBi1+QNntcgLjgWi5CXNBrPwhyvRMHYyjGFGUJQ87S7mQJL
> 4bBSs5e/bBraMBZlv59DgRjmvlGuBQHlSiqSsy3BKsHErKzjxYsh06fNTAZe
> TJ6bVPsa+vUKprRdWtUIaxqbY6vAXytwpswL57zgvD4PuPAFD80Wz9AK0mgz
> ypoUacAocRu+rIZ2NgEt4Xr6+K3pJ2wRT2Fs+xMmKt2uoH7XyccU+7kIrEhy
> CD4AZfCXlOgA5LWYPFpBXC9087OygNZ7907klCG2QMn5Qh15W/MiylU0ECF8
> n3kNm4qEO4ICl5MiAXfaw2yaFa7Hht6N+oyDBRUI93Oj9I7pFA4uCrPhuPNt
> oRgNN9nTwBdVqUICvWJxOsb0AHuJoVIZbLbJ5dNKpcxehrO9aC9Ursa5/Wqt
> BGljYMYyg1QNf/CbAhZTpT+H4NQLPbN4D0muCchVKe7gekvj6u6vKjWwEiWR
> cl7D
> =U5aZ
> -END PGP SIGNATURE-
> 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> 
> 
> On Wed, Feb 24, 2016 at 4:29 AM, Oliver Dzombic 
> wrote:
> > Hi Esta,
> >
> > how do you know, that its still active ?
> >
> > --
> > Mit freundlichen Gruessen / Best regards
> >
> > Oliver Dzombic
> > IP-Interactive
> >
> > mailto:i...@ip-interactive.de
> >
> > Anschrift:
> >
> > IP Interactive UG ( haftungsbeschraenkt )
> > Zum Sonnenberg 1-3
> > 63571 Gelnhausen
> >
> > HRB 93402 beim Amtsgericht Hanau
> > Geschäftsführung: Oliver Dzombic
> >
> > Steuer Nr.: 35 236 3622 1
> > UST ID: DE274086107
> >
> >
> > Am 24.02.2016 um 12:27 schrieb wikison:
> >> Hi,
> >> I want to disable rbd cache in my ceph cluster. I've set the *rbd
> >> cache* to be false in the [client] section of ceph.conf and rebooted
> >> the cluster. But caching system was still working. How can I disable
> >> the rbd caching system? Any help?
> >>
> >> best regards.
> >>
> >> 2016-02-24
> >> 
> >> Esta Wang
> >>
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can not disable rbd cache

2016-02-24 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Let's start from the top. Where are you stuck with [1]? I have noticed
that after evicting all the objects with RBD, one object for each
active RBD is still left; I think this is the head object. We haven't
tried this, but our planned procedure for finishing the deactivation
of a cache tier is to shut down the active VM, then flush again and
then start the VM again. Once all VMs have been stopped, flushed and
restarted, we should be able to remove the cache tier. That way we
don't have to stop all the VMs at once or for long periods of time. I
hope at some point the last object can be flushed without shutting
down the VM.

If you are experiencing something different, please provide some more
info, especially more detailed steps of what you tried.

[1] 
http://docs.ceph.com/docs/master/rados/operations/cache-tiering/?highlight=cache#removing-a-cache-tier
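
For reference, the removal sequence from [1] is roughly the following (pool
names are placeholders, and this assumes a writeback cache mode):

ceph osd tier cache-mode hot-pool forward
rados -p hot-pool cache-flush-evict-all
ceph osd tier remove-overlay cold-pool
ceph osd tier remove cold-pool hot-pool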
-BEGIN PGP SIGNATURE-
Version: Mailvelope v1.3.6
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWzkPgCRDmVDuy+mK58QAABrEQAIFAxEEmKroSKqqGluFE
aCwvTTxye5IfIBjmVoreFZy+/r5B5D+aMBUFArANCk/A9V678mb/24MkCggT
8Ehb0eBVbkWxptUfexXfSuXvFqTGWA5BDnVTzT9rJ5liTQinXbhDCuJcVCDb
hcHmNRnUrituZoDfivwp9ZMpe/ZqsQsIN06NyVhLyPWtA1/Ji06v1WwVkEKe
b6FkS4J4C6RdmgBi1+QNntcgLjgWi5CXNBrPwhyvRMHYyjGFGUJQ87S7mQJL
4bBSs5e/bBraMBZlv59DgRjmvlGuBQHlSiqSsy3BKsHErKzjxYsh06fNTAZe
TJ6bVPsa+vUKprRdWtUIaxqbY6vAXytwpswL57zgvD4PuPAFD80Wz9AK0mgz
ypoUacAocRu+rIZ2NgEt4Xr6+K3pJ2wRT2Fs+xMmKt2uoH7XyccU+7kIrEhy
CD4AZfCXlOgA5LWYPFpBXC9087OygNZ7907klCG2QMn5Qh15W/MiylU0ECF8
n3kNm4qEO4ICl5MiAXfaw2yaFa7Hht6N+oyDBRUI93Oj9I7pFA4uCrPhuPNt
oRgNN9nTwBdVqUICvWJxOsb0AHuJoVIZbLbJ5dNKpcxehrO9aC9Ursa5/Wqt
BGljYMYyg1QNf/CbAhZTpT+H4NQLPbN4D0muCchVKe7gekvj6u6vKjWwEiWR
cl7D
=U5aZ
-END PGP SIGNATURE-

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Feb 24, 2016 at 4:29 AM, Oliver Dzombic  wrote:
> Hi Esta,
>
> how do you know, that its still active ?
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:i...@ip-interactive.de
>
> Anschrift:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
>
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
>
>
> Am 24.02.2016 um 12:27 schrieb wikison:
>> Hi,
>> I want to disable rbd cache in my ceph cluster. I've set the *rbd cache*
>> to be false in the [client] section of ceph.conf and rebooted the
>> cluster. But caching system was still working. How can I disable the rbd
>> caching system? Any help?
>>
>> best regards.
>>
>> 2016-02-24
>> 
>> Esta Wang
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Crush map customization for production use

2016-02-24 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

I think I saw someone say that they had issues with "step take" when
it was not a "root" node. Otherwise it looks good to me. The "step
chooseleaf firstn 0 type chassis" says to pick one OSD from a different
chassis for each replica, where 0 means take as many as the replication
factor. Since an OSD can only be in one server, you are accomplishing what
you want.
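
One way to verify the rule behaves as intended before pushing it is to compile
the map and test it offline (a sketch; the file names are placeholders and
rule 1 is the ruleset number from the map below):

crushtool -c crushmap.txt -o crushmap.bin
crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-statistics
crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-bad-mappings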
-BEGIN PGP SIGNATURE-
Version: Mailvelope v1.3.6
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWzkEECRDmVDuy+mK58QAAPtgQAJc4AKklznB6tQBUOaF9
nRu1C7+CJMpdhWZLiJW96OwTCIQ4CDv0f86/W0tOEoMa5Swqk0kWj4CEaej3
65/MgHsk3BhW6qwKmOicI/y+bALPDuBXRTEUm97tuKjhVC19vpEsOqQhd7Ux
TlqCoQuf+yBjr5sOGj/NYRC6NKCVjmP6k3kth1INyvDPfjmK2h0VUuUB/AGo
2sWPdYG0Ki3I5JjtO3Ja5yjsYWMbDNZq1hgEFCfhEmsQSzbCmzRvIPRfbAiv
DmrRo7qy9M86tJKuucBuiUD0k4HmIEVR8b1f42w9Kfhc7FyhtkszyvCpo7cl
8yuAqgfQ5bgzRyHtPmvBCqxxNesca9T7jlLxn+Q6Wco2fwGbYvwb4HcF2v+I
+FAZQEOLZ1h4gxhsZ5j6IgSIwwoxlswc0G4DL1PIYwmWaqUBH3OUQjZg4tL5
eN1/X2fl7vgEdVO3fh+sm8+HfDLkEwL67GDxPm09RraSCpT/jyX+cjLWav0z
qbTT6GxG74YiZIgQ0/s95GUvYJem+W7XgfSuf7P5Hpk4ooKcI/H3H6WUjaby
kIvNvdgK2+DFcfRisE0WObKESQO/9tVojpEp9zkEH6OAv3cNvdCcGaHRiFDl
7cD0IpScVkSFHVn4MfOeB4Z+qw9ow9SwGB75BYm98axxsRdNlPNiQzxRcb5z
Tdal
=iMwX
-END PGP SIGNATURE-

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Feb 24, 2016 at 4:09 AM, Vickey Singh
 wrote:
> Hello Geeks
>
> Can someone please review and comment on my custom crush maps. I would
> really appreciate your help
>
>
> My setup :  1 Rack , 4 chassis , 3 storage nodes each chassis ( so total 12
> storage nodes ) , pool size = 3
>
> What i want to achieve is:
> - Survive chassis failures , even if i loose 2 complete chassis (containing
> 3 nodes each) , data should not be lost
> - The crush ruleset should store each copy on a unique chassis and host
>
> For example :
> copy 1 ---> c1-node1
> copy 2 ---> c2-node3
> copy 3 ---> c4-node2
>
>
>
> Here is my crushmap
> =
>
> chassis block_storage_chassis_4 {
> id -17 # do not change unnecessarily
> # weight weight 163.350
> alg straw
> hash 0 # rjenkins1
> item c4-node1 weight 54.450
> item c4-node2 weight 54.450
> item c4-node3 weight 54.450
>
> }
>
> chassis block_storage_chassis_3 {
> id -16 # do not change unnecessarily
> # weight weight 163.350
> alg straw
> hash 0 # rjenkins1
> item c3-node1 weight 54.450
> item c3-node2 weight 54.450
> item c3-node3 weight 54.450
>
> }
>
> chassis block_storage_chassis_2 {
> id -15 # do not change unnecessarily
> # weight weight 163.350
> alg straw
> hash 0 # rjenkins1
> item c2-node1 weight 54.450
> item c2-node2 weight 54.450
> item c3-node3 weight 54.450
>
> }
>
> chassis block_storage_chassis_1 {
> id -14 # do not change unnecessarily
> # weight 163.350
> alg straw
> hash 0 # rjenkins1
> item c1-node1 weight 54.450
> item c1-node2 weight 54.450
> item c1-node3 weight 54.450
>
> }
>
> rack block_storage_rack_1 {
> id -10 # do not change unnecessarily
> # weight 174.240
> alg straw
> hash 0 # rjenkins1
> item block_storage_chassis_1 weight 163.350
> item block_storage_chassis_2 weight 163.350
> item block_storage_chassis_3 weight 163.350
> item block_storage_chassis_4 weight 163.350
>
> }
>
> class block_storage {
> id -6 # do not change unnecessarily
> # weight 210.540
> alg straw
> hash 0 # rjenkins1
> item block_storage_rack_1 weight 656.400
> }
>
> rule ruleset_block_storage {
> ruleset 1
> type replicated
> min_size 1
> max_size 10
> step take block_storage
> step chooseleaf firstn 0 type chassis
> step emit
> }
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Event Calendar Update

2016-02-24 Thread Patrick McGarry
Hey cephers,

Just wanted to update you all on some of the upcoming important dates
in the Ceph community. We have a lot going on in the near future, so I
figured it would be good to get it all in one place:


25 Feb - 1p EST - Ceph Tech Talk: CephFS
(http://ceph.com/ceph-tech-talks) [Tomorrow!]

02 Mar - 9p EST - Ceph Dev Monthly: APAC
(http://tracker.ceph.com/projects/ceph/wiki/Planning)

30 Mar - All Day - Ceph Day Sunnyvale
(http://ceph.com/cephdays/ceph-day-sunnyvale/)

31 Mar - 2 Days - Ceph Hackathon San Jose (agenda TBD)

25 Apr - 5 Days - OpenStack Austin


As always, if you have questions or concerns please feel free to
contact me directly. Thanks!

-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph hammer : rbd info/Status : operation not supported (95) (EC+RBD tier pools)

2016-02-24 Thread Jason Dillaman
If you run "rados -p  ls | grep "rbd_id." and don't see 
that object, you are experiencing that issue [1].

You can attempt to work around this issue by running "rados -p irfu-virt 
setomapval rbd_id. dummy value" to force-promote the object to the 
cache pool.  I haven't tested / verified that will alleviate the issue, though.

[1] http://tracker.ceph.com/issues/14762

-- 

Jason Dillaman 

- Original Message - 

> From: "SCHAER Frederic" 
> To: ceph-us...@ceph.com
> Cc: "HONORE Pierre-Francois" 
> Sent: Wednesday, February 24, 2016 12:56:48 PM
> Subject: [ceph-users] ceph hammer : rbd info/Status : operation not supported
> (95) (EC+RBD tier pools)

> Hi,

> I just started testing VMs inside ceph this week, ceph-hammer 0.94-5 here.

> I built several pools, using pool tiering:
> - A small replicated SSD pool (5 SSDs only, but I thought it’d be better for
> IOPS, I intend to test the difference with disks only)
> - Overlaying a larger EC pool

> I just have 2 VMs in Ceph… and one of them is breaking something.
> The VM that is not breaking was migrated using qemu-img for creating the ceph
> volume, then migrating the data. Its rbd format is 1 :
> rbd image 'xxx-disk1':
> size 20480 MB in 5120 objects
> order 22 (4096 kB objects)
> block_name_prefix: rb.0.83a49.3d1b58ba
> format: 1

> The VM that’s failing has a rbd format 2
> this is what I had before things started breaking :
> rbd image 'yyy-disk1':
> size 10240 MB in 2560 objects
> order 22 (4096 kB objects)
> block_name_prefix: rbd_data.8ae1f47398c89
> format: 2
> features: layering, striping
> flags:
> stripe unit: 4096 kB
> stripe count: 1

> The VM started behaving weirdly with a huge IOwait % during its install
> (that’s to say it did not take long to go wrong ;) )
> Now, this is the only thing that I can get

> [root@ceph0 ~]# rbd -p irfu-virt info yyy-disk1
> 2016-02-24 18:30:33.213590 7f00e6f6d7c0 -1 librbd::ImageCtx: error reading
> image id: (95) Operation not supported
> rbd: error opening image yyy-disk1: (95) Operation not supported

> One thing to note : the VM * IS STILL * working : I can still do disk
> operations, apparently.
> During the VM installation, I realized I wrongly set the target SSD caching
> size to 100Mbytes, instead of 100Gbytes, and ceph complained it was almost
> full :
> health HEALTH_WARN
> 'ssd-hot-irfu-virt' at/near target max

> My question is…… am I facing the bug as reported in this list thread with
> title “Possible Cache Tier Bug - Can someone confirm” ?
> Or did I do something wrong ?

> The libvirt and kvm that are writing into ceph are the following :
> libvirt -1.2.17-13.el7_2.3.x86_64
> qemu- kvm -1.5.3-105.el7_2.3.x86_64

> Any idea how I could recover the VM file, if possible ?
> Please note I have no problem with deleting the VM and rebuilding it, I just
> spawned it to test.
> As a matter of fact, I just “virsh destroyed” the VM, to see if I could start
> it again… and I cant :

> # virsh start yyy
> error: Failed to start domain yyy
> error: internal error: process exited while connecting to monitor:
> 2016-02-24T17:49:59.262170Z qemu-kvm: -drive
> file=rbd:irfu-virt/yyy-disk1:id=irfu-virt:key=***==:auth_supported=cephx\;none:mon_host=_\:6789,if=none,id=drive-virtio-disk0,format=raw:
> error reading header from yyy-disk1
> 2016-02-24T17:49:59.263743Z qemu-kvm: -drive
> file=rbd:irfu-virt/yyy-disk1:id=irfu-virt:key=A***==:auth_supported=cephx\;none:mon_host=___\:6789,if=none,id=drive-virtio-disk0,format=raw:
> could not open disk image
> rbd:irfu-virt/___-disk1:id=irfu-***==:auth_supported=cephx\;none:mon_host=___\:6789:
> Could not open 'rbd:irfu-virt/yyy-disk1:id=irfu-virt:key=***

> Ideas ?
> Thanks
> Frederic

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph hammer : rbd info/Status : operation not supported (95) (EC+RBD tier pools)

2016-02-24 Thread SCHAER Frederic
Hi,

I just started testing VMs inside ceph this week, ceph-hammer 0.94-5 here.

I built several pools, using pool tiering:

-  A small replicated SSD pool (5 SSDs only, but I thought it'd be 
better for IOPS, I intend to test the difference with disks only)

-  Overlaying a larger EC pool

I just have 2 VMs in Ceph... and one of them is breaking something.
The VM that is not breaking was migrated using qemu-img for creating the ceph 
volume, then migrating the data. Its rbd format is 1 :
rbd image 'xxx-disk1':
size 20480 MB in 5120 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.83a49.3d1b58ba
format: 1

The VM that's failing has a rbd format 2
this is what I had before things started breaking :
rbd image 'yyy-disk1':
size 10240 MB in 2560 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.8ae1f47398c89
format: 2
features: layering, striping
flags:
stripe unit: 4096 kB
stripe count: 1


The VM started behaving weirdly with a huge IOwait % during its install (that's 
to say it did not take long to go wrong ;) )
Now, this is the only thing that I can get

[root@ceph0 ~]# rbd -p irfu-virt info yyy-disk1
2016-02-24 18:30:33.213590 7f00e6f6d7c0 -1 librbd::ImageCtx: error reading 
image id: (95) Operation not supported
rbd: error opening image yyy-disk1: (95) Operation not supported

One thing to note : the VM *IS STILL* working : I can still do disk operations, 
apparently.
During the VM installation, I realized I wrongly set the target SSD caching 
size to 100Mbytes, instead of 100Gbytes, and ceph complained it was almost full 
:
 health HEALTH_WARN
'ssd-hot-irfu-virt' at/near target max
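
(For reference, the target is given in bytes, so the intended 100 GB setting
would look roughly like this; the pool name is taken from the warning above:)

ceph osd pool set ssd-hot-irfu-virt target_max_bytes 107374182400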

My question is.. am I facing the bug as reported in this list thread with 
title "Possible Cache Tier Bug - Can someone confirm" ?
Or did I do something wrong ?

The libvirt and kvm that are writing into ceph are the following :
libvirt-1.2.17-13.el7_2.3.x86_64
qemu-kvm-1.5.3-105.el7_2.3.x86_64

Any idea how I could recover the VM file, if possible ?
Please note I have no problem with deleting the VM and rebuilding it, I just 
spawned it to test.
As a matter of fact, I just "virsh destroyed" the VM, to see if I could start
it again... and I can't:

# virsh start yyy
error: Failed to start domain yyy
error: internal error: process exited while connecting to monitor: 
2016-02-24T17:49:59.262170Z qemu-kvm: -drive 
file=rbd:irfu-virt/yyy-disk1:id=irfu-virt:key=***==:auth_supported=cephx\;none:mon_host=_\:6789,if=none,id=drive-virtio-disk0,format=raw:
 error reading header from yyy-disk1
2016-02-24T17:49:59.263743Z qemu-kvm: -drive 
file=rbd:irfu-virt/yyy-disk1:id=irfu-virt:key=A***==:auth_supported=cephx\;none:mon_host=___\:6789,if=none,id=drive-virtio-disk0,format=raw:
 could not open disk image 
rbd:irfu-virt/___-disk1:id=irfu-***==:auth_supported=cephx\;none:mon_host=___\:6789:
 Could not open 'rbd:irfu-virt/yyy-disk1:id=irfu-virt:key=***

Ideas ?
Thanks
Frederic
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problem create user RGW

2016-02-24 Thread Yehuda Sadeh-Weinraub
try running:

$ radosgw-admin --name client.rgw.servergw001 metadata list user
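
Presumably the point is that the create and the query have to run against the
same zone configuration: the create below used --name client.rgw.servergw001
while the metadata list did not. A sketch, reusing the names from the original
post:

$ radosgw-admin --name client.rgw.servergw001 user create --uid="user1site1" \
      --display-name="User test replica site1" --access-key=user1site1 --secret=pwd1
$ radosgw-admin --name client.rgw.servergw001 user info --uid=user1site1
$ radosgw-admin --name client.rgw.servergw001 metadata list user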


Yehuda

On Wed, Feb 24, 2016 at 8:41 AM, Andrea Annoè  wrote:
> I don’t see any user create in RGW
>
>
>
> sudo radosgw-admin metadata list user
>
> [
>
> ]
>
>
>
>
>
> sudo radosgw-admin user create --uid="user1site1" --display-name="User test
> replica site1" --name client.rgw.servergw001 --access-key=user1site1
> --secret=pwd1
>
> {
>
> "user_id": "user1site1",
>
> "display_name": "User test replica site1",
>
> "email": "",
>
> "suspended": 0,
>
> "max_buckets": 1000,
>
> "auid": 0,
>
> "subusers": [],
>
> "keys": [
>
> {
>
> "user": "user1site1",
>
> "access_key": "user1site1",
>
> "secret_key": "pwd1"
>
> }
>
> ],
>
> "swift_keys": [],
>
> "caps": [],
>
> "op_mask": "read, write, delete",
>
> "default_placement": "",
>
> "placement_tags": [],
>
> "bucket_quota": {
>
> "enabled": false,
>
> "max_size_kb": -1,
>
> "max_objects": -1
>
> },
>
> "user_quota": {
>
> "enabled": false,
>
> "max_size_kb": -1,
>
> "max_objects": -1
>
> },
>
> "temp_url_keys": []
>
> }
>
>
>
> sudo radosgw-admin metadata list user
>
> [
>
> ]
>
>
>
>
>
> The list of user don’t change… what’s the problem? Command, keyring… ??
>
> The command for create user don’t report error if I try to retry more time.
>
>
>
> Please help me.
>
>
>
> Best regards.
>
> Andrea
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Problem create user RGW

2016-02-24 Thread Andrea Annoè
I don't see any user created in RGW

sudo radosgw-admin metadata list user
[
]


sudo radosgw-admin user create --uid="user1site1" --display-name="User test 
replica site1" --name client.rgw.servergw001 --access-key=user1site1 
--secret=pwd1
{
"user_id": "user1site1",
"display_name": "User test replica site1",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [],
"keys": [
{
"user": "user1site1",
"access_key": "user1site1",
"secret_key": "pwd1"
}
],
"swift_keys": [],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
},
"temp_url_keys": []
}

sudo radosgw-admin metadata list user
[
]


The list of users doesn't change... what's the problem? The command, the keyring...?
The user create command doesn't report an error even if I retry it several times.

Please help me.

Best regards.
Andrea

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Problem RGW agent sync: [radosgw_agent][ERROR ] HttpError: Http error code 403 content Forbidden

2016-02-24 Thread Andrea Annoè
Hello to all,
I have created an async replica between 2 zones in a region.

I have a problem with users and permissions: Http error code 403 content Forbidden.

I have created a user to manage the replica, but I don't see its info.

sudo radosgw-admin user create --uid="site2" --display-name="Zone site2" --name 
client.rgw.servergw001 --system --access-key=admincephsite2 
--secret=admincephsite2pwd
{
"user_id": "site2",
"display_name": "Zone site2",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [],
"keys": [
{
"user": "site2",
"access_key": "admincephsite2",
"secret_key": "admincephsite2pwd"
}
],
"swift_keys": [],
"caps": [],
"op_mask": "read, write, delete",
"system": "true",
"default_placement": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
},
"temp_url_keys": []
}


sudo radosgw-admin metadata list user
[
]






Log of problem with replica agent.

2016-02-24 17:23:10,830 5524 [radosgw_agent][INFO  ]  ____  
________
2016-02-24 17:23:10,830 5524 [radosgw_agent][INFO  ] /__` \ / |\ | /  `   
/\   / _`   |__   |\ |  |
2016-02-24 17:23:10,830 5524 [radosgw_agent][INFO  ] .__/  |  | \| \__,/~~\ 
\__> |___ | \|  |
2016-02-24 17:23:10,830 5524 [radosgw_agent][INFO  ]
  v1.2.4
2016-02-24 17:23:10,830 5524 [radosgw_agent][INFO  ] agent options:
2016-02-24 17:23:10,831 5524 [radosgw_agent][INFO  ]  args:
2016-02-24 17:23:10,831 5524 [radosgw_agent][INFO  ]conf
  : None
2016-02-24 17:23:10,831 5524 [radosgw_agent][INFO  ]dest_access_key 
  : 
2016-02-24 17:23:10,831 5524 [radosgw_agent][INFO  ]dest_secret_key 
  : 
2016-02-24 17:23:10,831 5524 [radosgw_agent][INFO  ]dest_zone   
  : site2
2016-02-24 17:23:10,831 5524 [radosgw_agent][INFO  ]destination 
  : http://servergwmi001.intra.hosting.it:7480
2016-02-24 17:23:10,831 5524 [radosgw_agent][INFO  ]incremental_sync_delay  
  : 30
2016-02-24 17:23:10,831 5524 [radosgw_agent][INFO  ]lock_timeout
  : 60
2016-02-24 17:23:10,831 5524 [radosgw_agent][INFO  ]log_file
  : /var/log/radosgw/radosgw-sync.log
2016-02-24 17:23:10,831 5524 [radosgw_agent][INFO  ]log_lock_time   
  : 20
2016-02-24 17:23:10,831 5524 [radosgw_agent][INFO  ]max_entries 
  : 1000
2016-02-24 17:23:10,831 5524 [radosgw_agent][INFO  ]metadata_only   
  : False
2016-02-24 17:23:10,831 5524 [radosgw_agent][INFO  ]num_workers 
  : 1
2016-02-24 17:23:10,831 5524 [radosgw_agent][INFO  ]object_sync_timeout 
  : 216000
2016-02-24 17:23:10,831 5524 [radosgw_agent][INFO  ]prepare_error_delay 
  : 10
2016-02-24 17:23:10,831 5524 [radosgw_agent][INFO  ]quiet   
  : False
2016-02-24 17:23:10,831 5524 [radosgw_agent][INFO  ]rgw_data_log_window 
  : 30
2016-02-24 17:23:10,832 5524 [radosgw_agent][INFO  ]source  
  : http://servergw001.intra.hosting.it:7480
2016-02-24 17:23:10,832 5524 [radosgw_agent][INFO  ]src_access_key  
  : 
2016-02-24 17:23:10,832 5524 [radosgw_agent][INFO  ]src_secret_key  
  : 
2016-02-24 17:23:10,832 5524 [radosgw_agent][INFO  ]src_zone
  : site1
2016-02-24 17:23:10,832 5524 [radosgw_agent][INFO  ]sync_scope  
  : full
2016-02-24 17:23:10,832 5524 [radosgw_agent][INFO  ]test_server_host
  : None
2016-02-24 17:23:10,832 5524 [radosgw_agent][INFO  ]test_server_port
  : 8080
2016-02-24 17:23:10,832 5524 [radosgw_agent][INFO  ]verbose 
  : True
2016-02-24 17:23:10,832 5524 [radosgw_agent][INFO  ]versioned   
  : False
2016-02-24 17:23:10,832 5524 [radosgw_agent.client][INFO  ] creating connection 
to endpoint: http://servergwmi001.intra.hosting.it:7480
2016-02-24 17:23:10,832 5524 [boto][DEBUG ] Using access key provided by client.
2016-02-24 17:23:10,832 5524 [boto][DEBUG ] Using secret key provided by client.
2016-02-24 17:23:10,833 5524 [boto][DEBUG ] StringToSign:
GET


Wed, 24 Feb 2016 16:23:10 GMT
/admin/config
2016-02-24 17:23:10,833 5524 [boto][DEBUG ] Signature:
AWS admincephsite2:Jx9VyxhCYggBpnMnE2bM4stkMk4=
2016-02-24 17:23:10,833 5524 [boto][DEBUG ] url = 
'http://servergwmi001.intra.hosting.it:7480/admin/config'
params={}
headers={'Date': 'Wed, 24 Feb 2016 16:23:10 GMT', 'Content-Length': '0', 
'Authorization': u'AWS admincephsite2:Jx9VyxhCYggBpnMnE2bM4stkMk4=', 
'User-Agent'

Re: [ceph-users] OSDs are crashing during PG replication

2016-02-24 Thread Alexander Gubanov
Hmm. It seems that the cache pool quotas have not been set. At least I'm
sure I didn't set them; maybe they have default settings.

# ceph osd pool get-quota cache
quotas for pool 'cache':
  max objects: N/A
  max bytes  : N/A

But I did set target_max_bytes:

# ceph osd pool set cache target_max_bytes 1

Could that be the reason?

On Wed, Feb 24, 2016 at 4:08 PM, Alexey Sheplyakov  wrote:

> Hi,
>
> > 0> 2016-02-24 04:51:45.884445 7fd994825700 -1 osd/ReplicatedPG.cc: In
> function 'int ReplicatedPG::fill_in_copy_get(ReplicatedPG::OpContext*,
> ceph::buffer::list::iterator&, OSDOp&, ObjectContextRef&, bool)' thread
> 7fd994825700 time 2016-02-24 04:51:45.870995
> osd/ReplicatedPG.cc: 5558: FAILED assert(cursor.data_complete)
> > ceph version 0.80.11-8-g95c4287
> (95c4287b5d24b762bc8538633c5bb2918ecfe4dd)
>
> This one looks familiar: http://tracker.ceph.com/issues/13098
>
> A quick workaround is to unset the cache pool quota:
>
> ceph osd pool set-quota $cache_pool_name max_bytes 0
> ceph osd pool set-quota $cache_pool_name max_objects 0
>
> The problem has been properly fixed in infernalis v9.1.0, and
> (partially) in hammer (v0.94.6 which will be released soon).
>
>  Best regards,
>   Alexey
>
>
> On Wed, Feb 24, 2016 at 5:37 AM, Alexander Gubanov 
> wrote:
> > Hi,
> >
> > Every time, 2 of 18 OSDs crash. I think it happens during PG
> > replication, because only those 2 OSDs crash and they are the same
> > ones every time.
> >
> > 0> 2016-02-24 04:51:45.884445 7fd994825700 -1 osd/ReplicatedPG.cc: In
> > function 'int ReplicatedPG::fill_in_copy_get(ReplicatedPG::OpContext*,
> > ceph::buffer::list::iterator&, OSDOp&, ObjectContextRef&, bool)' thread
> > 7fd994825700 time 2016-02-24 04:51:45.870995
> > osd/ReplicatedPG.cc: 5558: FAILED assert(cursor.data_complete)
> >
> >  ceph version 0.80.11-8-g95c4287
> (95c4287b5d24b762bc8538633c5bb2918ecfe4dd)
> >  1: (ReplicatedPG::fill_in_copy_get(ReplicatedPG::OpContext*,
> > ceph::buffer::list::iterator&, OSDOp&,
> std::tr1::shared_ptr&,
> > bool)+0xffc) [0x7c1f7c]
> >  2: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*,
> std::vector > std::allocator >&)+0x4171) [0x809f21]
> >  3: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x62)
> > [0x814622]
> >  4: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x5f8)
> [0x815098]
> >  5: (ReplicatedPG::do_op(std::tr1::shared_ptr)+0x3dd4)
> [0x81a3f4]
> >  6: (ReplicatedPG::do_request(std::tr1::shared_ptr,
> > ThreadPool::TPHandle&)+0x66d) [0x7b4ecd]
> >  7: (OSD::dequeue_op(boost::intrusive_ptr,
> > std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x3a5) [0x600ee5]
> >  8: (OSD::OpWQ::_process(boost::intrusive_ptr,
> > ThreadPool::TPHandle&)+0x203) [0x61cba3]
> >  9: (ThreadPool::WorkQueueVal,
> > std::tr1::shared_ptr >, boost::intrusive_ptr
> >>::_void_process(void*, ThreadPool::TPHandle&)+0xac) [0x660f2c]
> >  10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb20) [0xa7def0]
> >  11: (ThreadPool::WorkThread::entry()+0x10) [0xa7ede0]
> >  12: (()+0x7dc5) [0x7fd9ad03edc5]
> >  13: (clone()+0x6d) [0x7fd9abd2828d]
> >  NOTE: a copy of the executable, or `objdump -rdS ` is
> needed to
> > interpret this.
> >
> > --- logging levels ---
> >0/ 5 none
> >0/ 1 lockdep
> >0/ 1 context
> >1/ 1 crush
> >1/ 5 mds
> >1/ 5 mds_balancer
> >1/ 5 mds_locker
> >1/ 5 mds_log
> >1/ 5 mds_log_expire
> >1/ 5 mds_migrator
> >0/ 1 buffer
> >0/ 1 timer
> >0/ 1 filer
> >0/ 1 striper
> >0/ 1 objecter
> >0/ 5 rados
> >0/ 5 rbd
> >0/ 5 journaler
> >0/ 5 objectcacher
> >0/ 5 client
> >0/ 5 osd
> >0/ 5 optracker
> >0/ 5 objclass
> >1/ 3 filestore
> >1/ 3 keyvaluestore
> >1/ 3 journal
> >0/ 5 ms
> >1/ 5 mon
> >0/10 monc
> >1/ 5 paxos
> >0/ 5 tp
> >1/ 5 auth
> >1/ 5 crypto
> >1/ 1 finisher
> >1/ 5 heartbeatmap
> >1/ 5 perfcounter
> >1/ 5 rgw
> >1/10 civetweb
> >1/ 5 javaclient
> >1/ 5 asok
> >1/ 1 throttle
> >   -2/-2 (syslog threshold)
> >   -1/-1 (stderr threshold)
> >   max_recent 1
> >   max_new 1000
> >   log_file /var/log/ceph/ceph-osd.3.log
> > --- end dump of recent events ---
> > 2016-02-24 04:51:45.97 7fd994825700 -1 *** Caught signal (Aborted) **
> >  in thread 7fd994825700
> >
> >  ceph version 0.80.11-8-g95c4287
> (95c4287b5d24b762bc8538633c5bb2918ecfe4dd)
> >  1: /usr/bin/ceph-osd() [0x9a24f6]
> >  2: (()+0xf100) [0x7fd9ad046100]
> >  3: (gsignal()+0x37) [0x7fd9abc675f7]
> >  4: (abort()+0x148) [0x7fd9abc68ce8]
> >  5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fd9ac56b9d5]
> >  6: (()+0x5e946) [0x7fd9ac569946]
> >  7: (()+0x5e973) [0x7fd9ac569973]
> >  8: (()+0x5eb93) [0x7fd9ac569b93]
> >  9: (ceph::__ceph_assert_fail(char cons

Re: [ceph-users] v0.94.6 Hammer released

2016-02-24 Thread Alfredo Deza
On Wed, Feb 24, 2016 at 4:31 AM, Dan van der Ster  wrote:
> Thanks Sage, looking forward to some scrub randomization.
>
> Were binaries built for el6? http://download.ceph.com/rpm-hammer/el6/x86_64/

We are no longer building binaries for el6; we only build for CentOS 7,
Ubuntu Trusty, and Debian Jessie.

>
> Cheers, Dan
>
>
> On Tue, Feb 23, 2016 at 5:01 PM, Sage Weil  wrote:
>> This Hammer point release fixes a range of bugs, most notably a fix for
>> unbounded growth of the monitor’s leveldb store, and a workaround in the
>> OSD to keep most xattrs small enough to be stored inline in XFS inodes.
>>
>> We recommend that all hammer v0.94.x users upgrade.
>>
>> For more detailed information, see the complete changelog:
>>
>>   http://docs.ceph.com/docs/master/_downloads/v0.94.6.txt
>>
>> Notable Changes
>> ---
>>
>> * build/ops: Ceph daemon failed to start, because the service name was 
>> already used. (#13474, Chuanhong Wang)
>> * build/ops: LTTng-UST tracing should be dynamically enabled (#13274, Jason 
>> Dillaman)
>> * build/ops: ceph upstart script rbdmap.conf incorrectly processes 
>> parameters (#13214, Sage Weil)
>> * build/ops: ceph.spec.in License line does not reflect COPYING (#12935, 
>> Nathan Cutler)
>> * build/ops: ceph.spec.in libcephfs_jni1 has no %post and %postun  (#12927, 
>> Owen Synge)
>> * build/ops: configure.ac: no use to add "+" before ac_ext=c (#14330, Kefu 
>> Chai, Robin H. Johnson)
>> * build/ops: deb: strip tracepoint libraries from Wheezy/Precise builds 
>> (#14801, Jason Dillaman)
>> * build/ops: init script reload doesn't work on EL7 (#13709, Hervé Rousseau)
>> * build/ops: init-rbdmap uses distro-specific functions (#12415, Boris Ranto)
>> * build/ops: logrotate reload error on Ubuntu 14.04 (#11330, Sage Weil)
>> * build/ops: miscellaneous spec file fixes (#12931, #12994, #12924, #12360, 
>> Boris Ranto, Nathan Cutler, Owen Synge, Travis Rhoden, Ken Dreyer)
>> * build/ops: pass tcmalloc env through to ceph-os (#14802, Sage Weil)
>> * build/ops: rbd-replay-* moved from ceph-test-dbg to ceph-common-dbg as 
>> well (#13785, Loic Dachary)
>> * build/ops: unknown argument --quiet in udevadm settle (#13560, Jason 
>> Dillaman)
>> * common: Objecter: pool op callback may hang forever. (#13642, xie xingguo)
>> * common: Objecter: potential null pointer access when do pool_snap_list. 
>> (#13639, xie xingguo)
>> * common: ThreadPool add/remove work queue methods not thread safe (#12662, 
>> Jason Dillaman)
>> * common: auth/cephx: large amounts of log are produced by osd (#13610, 
>> Qiankun Zheng)
>> * common: client nonce collision due to unshared pid namespaces (#13032, 
>> Josh Durgin)
>> * common: common/Thread:pthread_attr_destroy(thread_attr) when done with it 
>> (#12570, Piotr Dałek)
>> * common: log: Log.cc: Assign LOG_DEBUG priority to syslog calls (#13993, 
>> Brad Hubbard)
>> * common: objecter: cancellation bugs (#13071, Jianpeng Ma)
>> * common: pure virtual method called (#13636, Jason Dillaman)
>> * common: small probability sigabrt when setting rados_osd_op_timeout 
>> (#13208, Ruifeng Yang)
>> * common: wrong conditional for boolean function KeyServer::get_auth() 
>> (#9756, #13424, Nathan Cutler)
>> * crush: crash if we see CRUSH_ITEM_NONE in early rule step (#13477, Sage 
>> Weil)
>> * doc: man: document listwatchers cmd in "rados" manpage (#14556, Kefu Chai)
>> * doc: regenerate man pages, add orphans commands to radosgw-admin(8) 
>> (#14637, Ken Dreyer)
>> * fs: CephFS restriction on removing cache tiers is overly strict (#11504, 
>> John Spray)
>> * fs: fsstress.sh fails (#12710, Yan, Zheng)
>> * librados: LibRadosWatchNotify.WatchNotify2Timeout (#13114, Sage Weil)
>> * librbd: ImageWatcher shouldn't block the notification thread (#14373, 
>> Jason Dillaman)
>> * librbd: diff_iterate needs to handle holes in parent images (#12885, Jason 
>> Dillaman)
>> * librbd: fix merge-diff for >2GB diff-files (#14030, Jason Dillaman)
>> * librbd: invalidate object map on error even w/o holding lock (#13372, 
>> Jason Dillaman)
>> * librbd: reads larger than cache size hang (#13164, Lu Shi)
>> * mds: ceph mds add_data_pool check for EC pool is wrong (#12426, John Spray)
>> * mon: MonitorDBStore: get_next_key() only if prefix matches (#11786, Joao 
>> Eduardo Luis)
>> * mon: OSDMonitor: do not assume a session exists in send_incremental() 
>> (#14236, Joao Eduardo Luis)
>> * mon: check for store writeablility before participating in election 
>> (#13089, Sage Weil)
>> * mon: compact full epochs also (#14537, Kefu Chai)
>> * mon: include min_last_epoch_clean as part of PGMap::print_summary and 
>> PGMap::dump (#13198, Guang Yang)
>> * mon: map_cache can become inaccurate if osd does not receive the osdmaps 
>> (#10930, Kefu Chai)
>> * mon: should not set isvalid = true when cephx_verify_authorizer return 
>> false (#13525, Ruifeng Yang)
>> * osd: Ceph Pools' MAX AVAIL is 0 if some OSDs' weight is 0 (#13840, 
>> Chengyuan Li)
>> * osd: FileStore calls syncfs(2) even it

Re: [ceph-users] Old MDS resurrected after update

2016-02-24 Thread Scottix
Thanks for the responses John.

--Scott

On Wed, Feb 24, 2016 at 3:07 AM John Spray  wrote:

> On Tue, Feb 23, 2016 at 5:36 PM, Scottix  wrote:
> > I had a weird thing happen when I was testing an upgrade in a dev
> > environment where I have removed an MDS from a machine a while back.
> >
> > I upgraded to 0.94.6 and lo and behold the mds daemon started up on the
> > machine again. I know the /var/lib/ceph/mds folder was removed because I
> > renamed it /var/lib/ceph/mds-removed and I definitely have restarted this
> > machine several times with mds not starting before.
> >
> > Only thing I noticed was the auth keys were still in play. I am assuming
> the
> > upgrade recreated the folder and found it still had access so it started
> > back up.
>
> I don't see how the upgrade would have recreated auth keys for the
> MDS, unless you mean some external upgrade script rather than just the
> packages on the machine?
>
> > I am guessing we have to add one more step in the removal mds from this
> post
> >
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-January/045649.html
> >
> >  1 Stop the old MDS
> >  2 Run "ceph mds fail 0"
> >  3 Run "ceph auth del mds."
>
> Yes, good point, folks should also do the "auth del" part.
>
> > I am a little wary of command 2 since there is no clear depiction of
> what 0
> > is. Is this command better since it is more clear "ceph mds rm 0
> mds."
>
> '0' in this context means rank 0, i.e. whichever active daemon holds
> that rank at the time.  If you have more than one daemon, then you may
> not need to do this; if the daemon you're removing is not currently
> active (i.e. not holding a rank) then you don't actually need to do
> this.
>
> Cheers,
> John
>
> > Is there anything else that could possibly resurrect it?
>
> Nope, not that I can think of.  I actually don't understand how it got
> resurrected in this instance because removing its
> /var/lib/ceph/mds/... directory should have destroyed its auth keys.
>
> John
>
> > Best,
> > Scott
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can not disable rbd cache

2016-02-24 Thread Oliver Dzombic
Hi Esta,

How do you know that it's still active?
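
If the client has an admin socket configured, one way to check what librbd
actually picked up at runtime (the socket path below is only an example,
adjust it to your client) is:

ceph --admin-daemon /var/run/ceph/ceph-client.admin.asok config show | grep rbd_cache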

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


Am 24.02.2016 um 12:27 schrieb wikison:
> Hi,
> I want to disable rbd cache in my ceph cluster. I've set the *rbd cache*
> to be false in the [client] section of ceph.conf and rebooted the
> cluster. But the caching system was still working. How can I disable the rbd
> caching system? Any help?
>  
> best regards.
>  
> 2016-02-24
> 
> Esta Wang
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Can not disable rbd cache

2016-02-24 Thread wikison
Hi,
I want to disable the rbd cache in my ceph cluster. I've set rbd cache to
false in the [client] section of ceph.conf and rebooted the cluster, but the
caching system was still working. How can I disable the rbd caching system?
Any help?

best regards.

2016-02-24



Esta Wang
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Crush map customization for production use

2016-02-24 Thread Vickey Singh
Hello Geeks

Can someone please review and comment on my custom CRUSH map? I would
really appreciate your help.


My setup: 1 rack, 4 chassis, 3 storage nodes per chassis (12 storage nodes
in total), pool size = 3

What I want to achieve is:
- Survive chassis failures: even if I lose 2 complete chassis (containing
3 nodes each), data should not be lost
- The crush ruleset should store each copy on a unique chassis and host

For example :
copy 1 ---> c1-node1
copy 2 ---> c2-node3
copy 3 ---> c4-node2



Here is my crushmap
=

chassis block_storage_chassis_4 {
id -17 # do not change unnecessarily
# weight 163.350
alg straw
hash 0 # rjenkins1
item c4-node1 weight 54.450
item c4-node2 weight 54.450
item c4-node3 weight 54.450

}

chassis block_storage_chassis_3 {
id -16 # do not change unnecessarily
# weight 163.350
alg straw
hash 0 # rjenkins1
item c3-node1 weight 54.450
item c3-node2 weight 54.450
item c3-node3 weight 54.450

}

chassis block_storage_chassis_2 {
id -15 # do not change unnecessarily
# weight 163.350
alg straw
hash 0 # rjenkins1
item c2-node1 weight 54.450
item c2-node2 weight 54.450
item c2-node3 weight 54.450

}

chassis block_storage_chassis_1 {
id -14 # do not change unnecessarily
# weight 163.350
alg straw
hash 0 # rjenkins1
item c1-node1 weight 54.450
item c1-node2 weight 54.450
item c1-node3 weight 54.450

}

rack block_storage_rack_1 {
id -10 # do not change unnecessarily
# weight 174.240
alg straw
hash 0 # rjenkins1
item block_storage_chassis_1 weight 163.350
item block_storage_chassis_2 weight 163.350
item block_storage_chassis_3 weight 163.350
item block_storage_chassis_4 weight 163.350

}

class block_storage {
id -6 # do not change unnecessarily
# weight 210.540
alg straw
hash 0 # rjenkins1
item block_storage_rack_1 weight 656.400
}

rule ruleset_block_storage {
ruleset 1
type replicated
min_size 1
max_size 10
step take block_storage
step chooseleaf firstn 0 type chassis
step emit
}
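
Before injecting a hand-edited map like the one above, it is usually worth
compiling it and test-mapping the rule offline; a minimal sketch (file names
are only examples):

crushtool -c crushmap.txt -o crushmap.bin
crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-statistics
ceph osd setcrushmap -i crushmap.bin

The --test run shows how many inputs map to 3 distinct OSDs under ruleset 1,
which helps confirm the chassis-level separation before the map goes live.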
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Old MDS resurrected after update

2016-02-24 Thread John Spray
On Tue, Feb 23, 2016 at 5:36 PM, Scottix  wrote:
> I had a weird thing happen when I was testing an upgrade in a dev
> environment where I have removed an MDS from a machine a while back.
>
> I upgraded to 0.94.6 and lo and behold the mds daemon started up on the
> machine again. I know the /var/lib/ceph/mds folder was removed because I
> renamed it /var/lib/ceph/mds-removed and I definitely have restarted this
> machine several times with mds not starting before.
>
> Only thing I noticed was the auth keys were still in play. I am assuming the
> upgrade recreated the folder and found it still had access so it started
> back up.

I don't see how the upgrade would have recreated auth keys for the
MDS, unless you mean some external upgrade script rather than just the
packages on the machine?

> I am guessing we have to add one more step in the removal mds from this post
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-January/045649.html
>
>  1 Stop the old MDS
>  2 Run "ceph mds fail 0"
>  3 Run "ceph auth del mds."

Yes, good point, folks should also do the "auth del" part.

> I am a little wary of command 2 since there is no clear depiction of what 0
> is. Is this command better since it is more clear "ceph mds rm 0 mds."

'0' in this context means rank 0, i.e. whichever active daemon holds
that rank at the time.  If you have more than one daemon, then you may
not need to do this; if the daemon you're removing is not currently
active (i.e. not holding a rank) then you don't actually need to do
this.
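
Putting the steps from this thread together, a hedged sketch of the whole
removal (the daemon name and path are placeholders):

ceph mds fail 0              # only if the daemon currently holds rank 0
ceph auth del mds.<name>     # drop its cephx key so nothing can resurrect it
# then remove (or keep renamed) /var/lib/ceph/mds/<cluster>-<name> on the host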

Cheers,
John

> Is there anything else that could possibly resurrect it?

Nope, not that I can think of.  I actually don't understand how it got
resurrected in this instance because removing its
/var/lib/ceph/mds/... directory should have destroyed its auth keys.

John

> Best,
> Scott
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rack weight imbalance

2016-02-24 Thread Chen, Xiaoxi
My 0.02: there are two kinds of balance, one for space utilization and another
for performance.

Space utilization should be fine, but performance might suffer a bit as the
disk density increases. The new racks will hold 1/3 of the data on 1/5 of the
disks, so if the workload is evenly distributed (requests per amount of data
roughly constant), the new racks will become the bottleneck.

Primary affinity might help (it would steer read requests toward the old racks),
or maybe your disks are fairly idle and it is not a problem at all :)
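
A hedged example of what that would look like (the OSD id and value are
placeholders, and the monitors need 'mon osd allow primary affinity = true'):

ceph osd primary-affinity osd.700 0.5

A value below 1.0 makes CRUSH less likely to pick that OSD as primary, which
shifts reads toward the OSDs in the older racks.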


-Xiaoxi





On 2/23/16, 4:19 AM, "ceph-users on behalf of Gregory Farnum" 
 wrote:

>On Mon, Feb 22, 2016 at 9:29 AM, George Mihaiescu  wrote:
>> Hi,
>>
>> We have a fairly large Ceph cluster (3.2 PB) that we want to expand and we
>> would like to get your input on this.
>>
>> The current cluster has around 700 OSDs (4 TB and 6 TB) in three racks with
>> the largest pool being rgw and using a replica 3.
>> For non-technical reasons (budgetary, etc) we are considering getting three
>> more racks, but initially adding only two storage nodes with 36 x 8 TB
>> drives in each, which will basically cause the rack weights to be imbalanced
>> (three racks with weight around a 1000 and 288 OSDs, and three racks with
>> weight around 500 but only 72 OSDs)
>>
>> The one replica per rack CRUSH rule will cause existing data to be
>> re-balanced among all six racks, with OSDs in the new racks getting only a
>> proportionate amount of replicas.
>>
>> Do you see any possible problems with this approach? Should Ceph be able to
>> properly rebalance the existing data among racks with imbalanced weights?
>>
>> Thank you for your input and please let me know if you need additional info.
>
>This should be okay; you have multiple racks in each size and aren't
>trying to replicate a full copy to each rack individually. You can
>test it ahead of time with the crush tool, though:
>http://docs.ceph.com/docs/master/man/8/crushtool/
>It may turn out you're using old tunables and want to update them
>first or something.
>-Greg
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.94.6 Hammer released

2016-02-24 Thread Dan van der Ster
Thanks Sage, looking forward to some scrub randomization.

Were binaries built for el6? http://download.ceph.com/rpm-hammer/el6/x86_64/

Cheers, Dan


On Tue, Feb 23, 2016 at 5:01 PM, Sage Weil  wrote:
> This Hammer point release fixes a range of bugs, most notably a fix for
> unbounded growth of the monitor’s leveldb store, and a workaround in the
> OSD to keep most xattrs small enough to be stored inline in XFS inodes.
>
> We recommend that all hammer v0.94.x users upgrade.
>
> For more detailed information, see the complete changelog:
>
>   http://docs.ceph.com/docs/master/_downloads/v0.94.6.txt
>
> Notable Changes
> ---
>
> * build/ops: Ceph daemon failed to start, because the service name was 
> already used. (#13474, Chuanhong Wang)
> * build/ops: LTTng-UST tracing should be dynamically enabled (#13274, Jason 
> Dillaman)
> * build/ops: ceph upstart script rbdmap.conf incorrectly processes parameters 
> (#13214, Sage Weil)
> * build/ops: ceph.spec.in License line does not reflect COPYING (#12935, 
> Nathan Cutler)
> * build/ops: ceph.spec.in libcephfs_jni1 has no %post and %postun  (#12927, 
> Owen Synge)
> * build/ops: configure.ac: no use to add "+" before ac_ext=c (#14330, Kefu 
> Chai, Robin H. Johnson)
> * build/ops: deb: strip tracepoint libraries from Wheezy/Precise builds 
> (#14801, Jason Dillaman)
> * build/ops: init script reload doesn't work on EL7 (#13709, Hervé Rousseau)
> * build/ops: init-rbdmap uses distro-specific functions (#12415, Boris Ranto)
> * build/ops: logrotate reload error on Ubuntu 14.04 (#11330, Sage Weil)
> * build/ops: miscellaneous spec file fixes (#12931, #12994, #12924, #12360, 
> Boris Ranto, Nathan Cutler, Owen Synge, Travis Rhoden, Ken Dreyer)
> * build/ops: pass tcmalloc env through to ceph-os (#14802, Sage Weil)
> * build/ops: rbd-replay-* moved from ceph-test-dbg to ceph-common-dbg as well 
> (#13785, Loic Dachary)
> * build/ops: unknown argument --quiet in udevadm settle (#13560, Jason 
> Dillaman)
> * common: Objecter: pool op callback may hang forever. (#13642, xie xingguo)
> * common: Objecter: potential null pointer access when do pool_snap_list. 
> (#13639, xie xingguo)
> * common: ThreadPool add/remove work queue methods not thread safe (#12662, 
> Jason Dillaman)
> * common: auth/cephx: large amounts of log are produced by osd (#13610, 
> Qiankun Zheng)
> * common: client nonce collision due to unshared pid namespaces (#13032, Josh 
> Durgin)
> * common: common/Thread:pthread_attr_destroy(thread_attr) when done with it 
> (#12570, Piotr Dałek)
> * common: log: Log.cc: Assign LOG_DEBUG priority to syslog calls (#13993, 
> Brad Hubbard)
> * common: objecter: cancellation bugs (#13071, Jianpeng Ma)
> * common: pure virtual method called (#13636, Jason Dillaman)
> * common: small probability sigabrt when setting rados_osd_op_timeout 
> (#13208, Ruifeng Yang)
> * common: wrong conditional for boolean function KeyServer::get_auth() 
> (#9756, #13424, Nathan Cutler)
> * crush: crash if we see CRUSH_ITEM_NONE in early rule step (#13477, Sage 
> Weil)
> * doc: man: document listwatchers cmd in "rados" manpage (#14556, Kefu Chai)
> * doc: regenerate man pages, add orphans commands to radosgw-admin(8) 
> (#14637, Ken Dreyer)
> * fs: CephFS restriction on removing cache tiers is overly strict (#11504, 
> John Spray)
> * fs: fsstress.sh fails (#12710, Yan, Zheng)
> * librados: LibRadosWatchNotify.WatchNotify2Timeout (#13114, Sage Weil)
> * librbd: ImageWatcher shouldn't block the notification thread (#14373, Jason 
> Dillaman)
> * librbd: diff_iterate needs to handle holes in parent images (#12885, Jason 
> Dillaman)
> * librbd: fix merge-diff for >2GB diff-files (#14030, Jason Dillaman)
> * librbd: invalidate object map on error even w/o holding lock (#13372, Jason 
> Dillaman)
> * librbd: reads larger than cache size hang (#13164, Lu Shi)
> * mds: ceph mds add_data_pool check for EC pool is wrong (#12426, John Spray)
> * mon: MonitorDBStore: get_next_key() only if prefix matches (#11786, Joao 
> Eduardo Luis)
> * mon: OSDMonitor: do not assume a session exists in send_incremental() 
> (#14236, Joao Eduardo Luis)
> * mon: check for store writeablility before participating in election 
> (#13089, Sage Weil)
> * mon: compact full epochs also (#14537, Kefu Chai)
> * mon: include min_last_epoch_clean as part of PGMap::print_summary and 
> PGMap::dump (#13198, Guang Yang)
> * mon: map_cache can become inaccurate if osd does not receive the osdmaps 
> (#10930, Kefu Chai)
> * mon: should not set isvalid = true when cephx_verify_authorizer return 
> false (#13525, Ruifeng Yang)
> * osd: Ceph Pools' MAX AVAIL is 0 if some OSDs' weight is 0 (#13840, 
> Chengyuan Li)
> * osd: FileStore calls syncfs(2) even it is not supported (#12512, Kefu Chai)
> * osd: FileStore: potential memory leak if getattrs fails. (#13597, xie 
> xingguo)
> * osd: IO error on kvm/rbd with an erasure coded pool tier (#12012, Kefu Chai)
> * osd: OSD::build_past_intervals_parallel() shall re

Re: [ceph-users] osd not removed from crush map after ceph osd crush remove

2016-02-24 Thread Dimitar Boichev
I think this happened because of the incorrectly removed OSD...
A bug, maybe?

Do you think that "ceph pg repair" will force removal of the PG from the
missing OSD?
I am concerned about executing "pg repair" or "osd lost" because it might
decide that the stuck copy is the authoritative data, act on it, and discard
the active, running copy.
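
Before running anything destructive, a few read-only checks (the pg id is
taken from the earlier output) can at least show where the stale PGs think
they live:

ceph pg dump_stuck stale
ceph pg map 2.3a
ceph osd tree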


Regards.

Dimitar Boichev
SysAdmin Team Lead
AXSMarine Sofia
Phone: +359 889 22 55 42
Skype: dimitar.boichev.axsmarine
E-mail: dimitar.boic...@axsmarine.com


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Stillwell, Bryan
Sent: Tuesday, February 23, 2016 7:31 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] osd not removed from crush map after ceph osd crush 
remove

Dimitar,

I would agree with you that getting the cluster into a healthy state first is 
probably the better idea.  Based on your pg query, it appears like you're using 
only 1 replica.  Any ideas why that would be?

The output should look like this (with 3 replicas):

osdmap e133481 pg 11.1b8 (11.1b8) -> up [13,58,37] acting [13,58,37]

Bryan

From:  Dimitar Boichev 
Date:  Tuesday, February 23, 2016 at 1:08 AM
To:  CTG User , "ceph-users@lists.ceph.com"

Subject:  RE: [ceph-users] osd not removed from crush map after ceph osd crush 
remove


>Hello,
>Thank you Bryan.
>
>I was just trying to upgrade to hammer or newer, but before that I wanted
>to get the cluster into a healthy state.
>Do you think it is safe to upgrade first to the latest firefly and then to
>hammer?
>
>
>Regards.
>
>Dimitar Boichev
>SysAdmin Team Lead
>AXSMarine Sofia
>Phone: +359 889 22 55 42
>Skype: dimitar.boichev.axsmarine
>E-mail:
>dimitar.boic...@axsmarine.com
>
>
>From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com]
>On Behalf Of Stillwell, Bryan
>Sent: Tuesday, February 23, 2016 1:51 AM
>To: ceph-users@lists.ceph.com
>Subject: Re: [ceph-users] osd not removed from crush map after ceph osd 
>crush remove
>
>
>
>Dimitar,
>
>
>
>I'm not sure why those PGs would be stuck in the stale+active+clean 
>state.  Maybe try upgrading to the 0.80.11 release to see if it's a bug 
>that was fixed already?  You can use the 'ceph tell osd.*  version' 
>command after the upgrade to make sure all OSDs are running the new 
>version.  Also since firefly (0.80.x) is near its EOL, you should 
>consider upgrading to hammer (0.94.x).
>
>
>
>As for why osd.4 didn't get fully removed, the last command you ran 
>isn't correct.  It should be 'ceph osd rm 4'.  Trying to remember when 
>to use the CRUSH name (osd.4) versus the OSD number (4)  can be a pain.
>
>
>
>Bryan
>
>
>
>From: ceph-users  on behalf of 
>Dimitar Boichev 
>Date: Monday, February 22, 2016 at 1:10 AM
>To: Dimitar Boichev , 
>"ceph-users@lists.ceph.com" 
>Subject: Re: [ceph-users] osd not removed from crush map after ceph osd 
>crush remove
>
>
>
>>Anyone ?
>>
>>Regards.
>>
>>
>>From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com]
>>On Behalf Of Dimitar Boichev
>>Sent: Thursday, February 18, 2016 5:06 PM
>>To: ceph-users@lists.ceph.com
>>Subject: [ceph-users] osd not removed from crush map after ceph osd 
>>crush remove
>>
>>
>>
>>Hello,
>>I am running a tiny cluster of 2 nodes.
>>ceph -v
>>ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>>
>>One osd died and I added a new osd (not replacing the old one).
>>After that I wanted to remove the failed osd completely from the cluster.
>>Here is what I did:
>>ceph osd reweight osd.4 0.0
>>ceph osd crush reweight osd.4 0.0
>>ceph osd out osd.4
>>ceph osd crush remove osd.4
>>ceph auth del osd.4
>>ceph osd rm osd.4
>>
>>
>>But after the rebalancing I ended up with 155 PGs in 
>>stale+active+clean state.
>>
>>@storage1:/tmp# ceph -s
>>cluster 7a9120b9-df42-4308-b7b1-e1f3d0f1e7b3
>> health HEALTH_WARN 155 pgs stale; 155 pgs stuck stale; 1 requests 
>>are blocked > 32 sec; nodeep-scrub flag(s) set
>> monmap e1: 1 mons at {storage1=192.168.10.3:6789/0}, election 
>>epoch 1, quorum 0 storage1
>> osdmap e1064: 6 osds: 6 up, 6 in
>>flags nodeep-scrub
>>  pgmap v26760322: 712 pgs, 8 pools, 532 GB data, 155 kobjects
>>1209 GB used, 14210 GB / 15419 GB avail
>> 155 stale+active+clean
>> 557 active+clean
>>  client io 91925 B/s wr, 5 op/s
>>
>>I know about the 1 monitor problem I just want to fix the cluster to 
>>healthy state then I will add the third storage node and go up to 3 
>>monitors.
>>
>>The problem is as follows:
>>@storage1:/tmp# ceph pg map 2.3a
>>osdmap e1064 pg 2.3a (2.3a) -> up [6] acting [6] @storage1:/tmp# ceph 
>>pg 2.3a query Error ENOENT: i don't have pgid 2.3a
>>
>>
>>@storage1:/tmp# ceph health detail
>>HEALTH_WARN 155 pgs stale; 155 pgs stuck stale; 1 requests are blocked 
>>>
>>32 sec; 1 osds have slow requests; nodeep-scrub flag(s) set pg 7.2a is 
>>stuck stale for 8887559.656879, current state
>>stale+active+clean, last acting [4]
>>pg 5.28 is stu

Re: [ceph-users] Data transfer occasionally freezes for significant time

2016-02-24 Thread Haomai Wang
On Wed, Feb 24, 2016 at 4:57 PM,  wrote:

> >So you don't find any slow request
> Yes, exactly
> >we may has some problems on using poll call. The only potential related
> PR is https://github.com/ceph/ceph/pull/6971
> How can we verify this hypothesis?
> Obviously, we can't install an experimental OSD version on our production
> cluster.
> BTW there's no unusual CPU usage on the OSD host at that moment.
>
>
I don't remember anyone reporting a similar problem, so raising debug_ms to
10/10 may help find more. The other difference in your cluster is that ceph
runs in Docker with --net=host; I'm not sure whether that introduces any
problems, but it is obviously not a very common deployment.
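
If raising the messenger debug level cluster-wide is too heavy, a hedged way
to do it only on a suspect OSD (the id is taken from the thread) is:

ceph tell osd.139 injectargs '--debug-ms 10/10'
# and back to the default afterwards:
ceph tell osd.139 injectargs '--debug-ms 0/5'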


>
>
>
> 2016-02-20 9:01 GMT+03:00 Haomai Wang :
> >
> >
> > On Sat, Feb 20, 2016 at 2:26 AM,  wrote:
> >>
> >> Hi All.
> >>
> >> We're running 180-node cluster in docker containers -- official
> >> ceph:hammer.
> >> Recently, we've found a rarely reproducible problem on it: sometimes
> >> data transfer freezes for significant time (5-15 minutes). The issue
> >> is taking place while using radosgw & librados apps
> >> (docker-distribution). This problem can be worked around with "ms tcp
> >> read timeout" parameter decreased to 2-3 seconds on the client side,
> >> but that does not seem to be a good solution.
> >> I've written bash script, getting every object (and it's omap/xattr)
> >> with 'rados' cli utility  from data pool in infinite cycle, to
> >> reproduce the problem. Running that on 3 hosts simultaneously on
> >> docker-distribution's pool (4mb objects) during 8 hours resulted in 25
> >> reads, each of them took more than 60 seconds.
> >> Script results here (hostnames substituted):
> >>
> >>
> https://gist.github.com/aaaler/cb190c1eb636564519a5#file-distribution-pool-err-sorted
> >> But there's nothing suspicious on corresponding OSD logs.
> >> For example, take a look on the one of these faulty reads:
> >>  21:44:32 consumed 1891 seconds reading
> >> blob:daa46e8d-170e-43ab-8c00-526782f95e02-0 on host1(192.168.1.133)
> >> osdmap e80485 pool 'distribution' (17) object
> >> 'blob:daa46e8d-170e-43ab-8c00-526782f95e02-0' -> pg 17.97f485f (17.5f)
> >> -> up ([139,149,167], p139) acting ([139,149,167], p139)
> >>
> >> Thus, we've got 1891 seconds of waiting, and after that the client has
> >> just proceed without any errors occurred, so I tried to find something
> >> useful in osd.139 logs
> >> (https://gist.github.com/aaaler/cb190c1eb636564519a5#file-osd-139-log),
> >> but could not find anything interesting.
> >>
> >> Another example (next line in script output) showed us 2983 seconds of
> >> reading blob:f5c22093-6e6d-41a6-be36-462330b36c67-71 from osd.56.
> >> Again, nothing in osd.56 logs during that time:
> >> https://gist.github.com/aaaler/cb190c1eb636564519a5#file-osd-56-log
> >>
> >> How can I troubleshoot this? Enabling very verbose logging on a 180-node
> >> cluster would generate a lot of traffic and make it hard to find the
> >> right host whose log to check :(
> >
> >
> > So you don't find any slow request with ceph -s or at
> /var/log/ceph/ceph.log
> > in monitor side?
> >
> > You mentioned "ms tcp read timeout" has some effects on your case, I
> guess
> > we may has some problems on using poll call.
> >
> > The only potential related PR is https://github.com/ceph/ceph/pull/6971
> >
> >>
> >> Few words about underlying configuration:
> >> - ceph:hammer containers in docker 1.9.1 (--net=host)
> >> - gentoo with 3.14.18/3.18.10 kernel.
> >> - 1gbps LAN
> >> - osd using directory in /var
> >> - hosts share osd workload with some other php-fpm's
> >>
> >> The configuration is pretty default, except some osd parameters
> >> configured to reduce scrubbing workload:
> >> [osd]
> >> osd disk thread ioprio class = idle
> >> osd disk thread ioprio priority = 5
> >> osd recovery max active = 1
> >> osd max backfills = 2
> >>
> >> --
> >>  Sincerely, Alexey Griazin
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
> >
> > --
> >
> > Best Regards,
> >
> > Wheat
>



-- 

Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] librados: how to get notified when a certain object is created

2016-02-24 Thread Sorin Manolache

On 2016-02-23 20:48, Gregory Farnum wrote:

On Saturday, February 20, 2016, Sorin Manolache  wrote:

Hello,

I can set a watch on an object in librados. Does this object have to
exist already at the moment I'm setting the watch on it? What
happens if the object does not exist? Is my watcher valid? Will I
get notified when someone else creates the missing object that I'm
watching and sends a notification?


I believe a watch implicitly creates the object, but you could run it on
a non-existent object and check. ;) but...


I hadn't tried that when I asked the question, as I was still at the bottom
of the librados learning curve. I've tried it in the meantime, and
rados_watch returns ENOENT if the object does not exist.
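
A quick way to confirm that behaviour from the command line (pool and object
names are only examples) is to create an empty object first and then look at
its watchers:

rados -p mypool put watched-obj /dev/null
rados -p mypool listwatchers watched-obj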


Sorin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSDs are crashing during PG replication

2016-02-24 Thread Alexey Sheplyakov
Hi,

> 0> 2016-02-24 04:51:45.884445 7fd994825700 -1 osd/ReplicatedPG.cc: In 
> function 'int ReplicatedPG::fill_in_copy_get(ReplicatedPG::OpContext*, 
> ceph::buffer::list::iterator&, OSDOp&, ObjectContextRef&, bool)' thread 
> 7fd994825700 time 2016-02-24 04:51:45.870995
osd/ReplicatedPG.cc: 5558: FAILED assert(cursor.data_complete)
> ceph version 0.80.11-8-g95c4287 (95c4287b5d24b762bc8538633c5bb2918ecfe4dd)

This one looks familiar: http://tracker.ceph.com/issues/13098

A quick workaround is to unset the cache pool quota:

ceph osd pool set-quota $cache_pool_name max_bytes 0
ceph osd pool set-quota $cache_pool_name max_objects 0

The problem has been properly fixed in infernalis v9.1.0, and
(partially) in hammer (v0.94.6 which will be released soon).
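
After upgrading to a release that carries the fix, a simple sanity check is to
confirm that every OSD is actually running the new version:

ceph tell osd.* version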

 Best regards,
  Alexey


On Wed, Feb 24, 2016 at 5:37 AM, Alexander Gubanov  wrote:
> Hi,
>
> Every time, 2 of 18 OSDs crash. I think it happens during PG
> replication, because only those 2 OSDs crash and they are the same
> ones every time.
>
> 0> 2016-02-24 04:51:45.884445 7fd994825700 -1 osd/ReplicatedPG.cc: In
> function 'int ReplicatedPG::fill_in_copy_get(ReplicatedPG::OpContext*,
> ceph::buffer::list::iterator&, OSDOp&, ObjectContextRef&, bool)' thread
> 7fd994825700 time 2016-02-24 04:51:45.870995
> osd/ReplicatedPG.cc: 5558: FAILED assert(cursor.data_complete)
>
>  ceph version 0.80.11-8-g95c4287 (95c4287b5d24b762bc8538633c5bb2918ecfe4dd)
>  1: (ReplicatedPG::fill_in_copy_get(ReplicatedPG::OpContext*,
> ceph::buffer::list::iterator&, OSDOp&, std::tr1::shared_ptr&,
> bool)+0xffc) [0x7c1f7c]
>  2: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector std::allocator >&)+0x4171) [0x809f21]
>  3: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x62)
> [0x814622]
>  4: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x5f8) [0x815098]
>  5: (ReplicatedPG::do_op(std::tr1::shared_ptr)+0x3dd4) [0x81a3f4]
>  6: (ReplicatedPG::do_request(std::tr1::shared_ptr,
> ThreadPool::TPHandle&)+0x66d) [0x7b4ecd]
>  7: (OSD::dequeue_op(boost::intrusive_ptr,
> std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x3a5) [0x600ee5]
>  8: (OSD::OpWQ::_process(boost::intrusive_ptr,
> ThreadPool::TPHandle&)+0x203) [0x61cba3]
>  9: (ThreadPool::WorkQueueVal,
> std::tr1::shared_ptr >, boost::intrusive_ptr
>>::_void_process(void*, ThreadPool::TPHandle&)+0xac) [0x660f2c]
>  10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb20) [0xa7def0]
>  11: (ThreadPool::WorkThread::entry()+0x10) [0xa7ede0]
>  12: (()+0x7dc5) [0x7fd9ad03edc5]
>  13: (clone()+0x6d) [0x7fd9abd2828d]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to
> interpret this.
>
> --- logging levels ---
>0/ 5 none
>0/ 1 lockdep
>0/ 1 context
>1/ 1 crush
>1/ 5 mds
>1/ 5 mds_balancer
>1/ 5 mds_locker
>1/ 5 mds_log
>1/ 5 mds_log_expire
>1/ 5 mds_migrator
>0/ 1 buffer
>0/ 1 timer
>0/ 1 filer
>0/ 1 striper
>0/ 1 objecter
>0/ 5 rados
>0/ 5 rbd
>0/ 5 journaler
>0/ 5 objectcacher
>0/ 5 client
>0/ 5 osd
>0/ 5 optracker
>0/ 5 objclass
>1/ 3 filestore
>1/ 3 keyvaluestore
>1/ 3 journal
>0/ 5 ms
>1/ 5 mon
>0/10 monc
>1/ 5 paxos
>0/ 5 tp
>1/ 5 auth
>1/ 5 crypto
>1/ 1 finisher
>1/ 5 heartbeatmap
>1/ 5 perfcounter
>1/ 5 rgw
>1/10 civetweb
>1/ 5 javaclient
>1/ 5 asok
>1/ 1 throttle
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent 1
>   max_new 1000
>   log_file /var/log/ceph/ceph-osd.3.log
> --- end dump of recent events ---
> 2016-02-24 04:51:45.97 7fd994825700 -1 *** Caught signal (Aborted) **
>  in thread 7fd994825700
>
>  ceph version 0.80.11-8-g95c4287 (95c4287b5d24b762bc8538633c5bb2918ecfe4dd)
>  1: /usr/bin/ceph-osd() [0x9a24f6]
>  2: (()+0xf100) [0x7fd9ad046100]
>  3: (gsignal()+0x37) [0x7fd9abc675f7]
>  4: (abort()+0x148) [0x7fd9abc68ce8]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fd9ac56b9d5]
>  6: (()+0x5e946) [0x7fd9ac569946]
>  7: (()+0x5e973) [0x7fd9ac569973]
>  8: (()+0x5eb93) [0x7fd9ac569b93]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x1ef) [0xa8d9df]
>  10: (ReplicatedPG::fill_in_copy_get(ReplicatedPG::OpContext*,
> ceph::buffer::list::iterator&, OSDOp&, std::tr1::shared_ptr&,
> bool)+0xffc) [0x7c1f7c]
>  11: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector std::allocator >&)+0x4171) [0x809f21]
>  12: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x62)
> [0x814622]
>  13: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x5f8) [0x815098]
>  14: (ReplicatedPG::do_op(std::tr1::shared_ptr)+0x3dd4)
> [0x81a3f4]
>  15: (ReplicatedPG::do_request(std::tr1::shared_ptr,
> ThreadPool::TPHandle&)+0x66d) [0x7b4ecd]
>  16: (OSD::dequeue_op(boost::intrusive_ptr,
> std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x3a5) [0x600ee5]
>  17: (OSD::OpWQ::_process(boost::intrusive_ptr,
> ThreadPool::TPHandle&)