Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?

2017-05-03 Thread Łukasz Jagiełło
Hi Radek,

I can confirm that v10.2.7 without the two commits you mentioned earlier works
as expected.
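
For reference, this is roughly how I prepared the test build (just a sketch;
if the reverts conflict they need to be resolved by hand):

#v+
git checkout v10.2.7
git revert --no-edit c9445faf7fac2ccb8a05b53152c0ca16d7f4c6d0
git revert --no-edit b72fc1b820ede3cd186d887d9d30f7f91fe3764b
#v-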

Best,

On Wed, May 3, 2017 at 2:59 AM, Radoslaw Zarzynski wrote:

> Hello Łukasz,
>
> Thanks for your testing, and sorry for my mistake. It looks like two commits
> need to be reverted to get the previous behaviour:
>
> The already mentioned one:
>   https://github.com/ceph/ceph/commit/c9445faf7fac2ccb8a05b53152c0ca16d7f4c6d0
> Its dependency:
>   https://github.com/ceph/ceph/commit/b72fc1b820ede3cd186d887d9d30f7f91fe3764b
>
> They have been merged in the same pull request:
>   https://github.com/ceph/ceph/pull/11760
> and together they account for the difference in "in_hosted_domain" handling
> visible between v10.2.5 and v10.2.6:
>   https://github.com/ceph/ceph/blame/v10.2.5/src/rgw/rgw_rest.cc#L1773
>   https://github.com/ceph/ceph/blame/v10.2.6/src/rgw/rgw_rest.cc#L1781-L1782
>
> I'm really not sure we want to revert them. It may be that they merely
> unhide a misconfiguration issue while fixing the problems we had with the
> handling of virtual hosted buckets.
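>
> If it really is configuration, the first thing I'd check is whether the Host
> header sent by the clients matches the DNS name RGW expects for virtual
> hosted buckets. A minimal sketch (section name and value are only examples,
> adjust to your setup):
>
>   [client.rgw.gateway]
>   rgw dns name = s3.example.com
>
> The hostnames can also be set on the zonegroup (the "hostnames" field shown
> by "radosgw-admin zonegroup get").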
>
> Regards,
> Radek
>
> On Wed, May 3, 2017 at 3:12 AM, Łukasz Jagiełło wrote:
> > Hi,
> >
> > I tried reverting [1] from 10.2.7 today, but the problem is still there even
> > without the change. Reverting to 10.2.5 fixes the issue instantly.
> >
> > https://github.com/ceph/ceph/commit/c9445faf7fac2ccb8a05b53152c0ca16d7f4c6d0
> >
> > On Thu, Apr 27, 2017 at 4:53 AM, Radoslaw Zarzynski wrote:
> >>
> >> Bingo! From the 10.2.5-admin:
> >>
> >>   GET
> >>
> >>   Thu, 27 Apr 2017 07:49:59 GMT
> >>   /
> >>
> >> And also:
> >>
> >>   2017-04-27 09:49:59.117447 7f4a90ff9700 20 subdomain= domain=
> >> in_hosted_domain=0 in_hosted_domain_s3website=0
> >>   2017-04-27 09:49:59.117449 7f4a90ff9700 20 final domain/bucket
> >> subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0
> >> s->info.domain= s->info.request_uri=/
> >>
> >> The most interesting part is the "final ... in_hosted_domain=0".
> >> It looks like we need to dig around RGWREST::preprocess(),
> >> rgw_find_host_in_domains() & company.
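> >>
> >> If you want to double-check the signature side independently, you can
> >> recompute the expected value from the StringToSign logged above (a rough
> >> sketch; SECRET_KEY stands for the S3 secret key of the admin user, and
> >> the empty Content-MD5/Content-Type lines are included):
> >>
> >>   STS=$'GET\n\n\nThu, 27 Apr 2017 07:49:59 GMT\n/'
> >>   printf '%s' "$STS" | openssl dgst -sha1 -hmac "$SECRET_KEY" -binary | base64
> >>
> >> If the result matches the "calculated digest" from the log, RGW's side of
> >> the computation is consistent and the difference must be in what the
> >> client actually signed (e.g. a different canonical resource).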
> >>
> >> There is a commit introduced in v10.2.6 that touches this area [1].
> >> I'm definitely not saying it's the root cause. It might be that the change
> >> in the code just exposed a configuration issue [2].
> >>
> >> I will talk about the problem at today's sync-up.
> >>
> >> Thanks for the logs!
> >> Regards,
> >> Radek
> >>
> >> [1]
> >> https://github.com/ceph/ceph/commit/c9445faf7fac2ccb8a05b53152c0ca16d7f4c6d0
> >> [2] http://tracker.ceph.com/issues/17440
> >>
> >> On Thu, Apr 27, 2017 at 10:11 AM, Ben Morrice wrote:
> >> > Hello Radek,
> >> >
> >> > Thank you for your analysis so far! Please find attached logs for both
> >> > the admin user and a Keystone-backed user from 10.2.5 (same host as
> >> > before, I have simply downgraded the packages). Both users can
> >> > authenticate and list buckets on 10.2.5.
> >> >
> >> > Also, I tried version 10.2.6 and see the same behavior as 10.2.7, so
> >> > the bug I'm hitting looks like it was introduced in 10.2.6.
> >> >
> >> > Kind regards,
> >> >
> >> > Ben Morrice
> >> >
> >> > 
> __
> >> > Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670
> >> > EPFL / BBP
> >> > Biotech Campus
> >> > Chemin des Mines 9
> >> > 1202 Geneva
> >> > Switzerland
> >> >
> >> > On 27/04/17 04:45, Radoslaw Zarzynski wrote:
> >> >>
> >> >> Thanks for the logs, Ben.
> >> >>
> >> >> It looks like two completely different authenticators have failed:
> >> >> the local, RADOS-backed auth (admin.txt) as well as the Keystone-based
> >> >> one. In the second case I'm pretty sure that Keystone refused [1][2]
> >> >> to authenticate the provided signature/StringToSign. RGW tried to fall
> >> >> back to the local auth, which obviously had no chance as the
> >> >> credentials were stored remotely. This explains
> >> >> th

Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?

2017-05-02 Thread Łukasz Jagiełło
adosGW's log for the failed request?
> >>> I mean the lines starting from the header listing, through the start
> >>> marker ("== starting new request...") up to the end marker?
> >>>
> >>> At the moment we can't see any details related to the signature
> >>> calculation.
> >>>
> >>> Regards,
> >>> Radek
> >>>
> >>> On Thu, Apr 20, 2017 at 5:08 PM, Ben Morrice wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> I have tried upgrading one of our RGW servers from 10.2.5 to 10.2.7 (RHEL7)
> >>>> and authentication is in a very bad state. This installation is part of a
> >>>> multigw configuration, and I have just updated one host in the secondary
> >>>> zone (all other hosts/zones are running 10.2.5).
> >>>>
> >>>> On the 10.2.7 server I cannot authenticate as a user (normally backed by
> >>>> OpenStack Keystone), but even worse I also cannot authenticate with an
> >>>> admin user.
> >>>>
> >>>> Please see [1] for the results of performing a list bucket operation
> >>>> with python boto (the script works against rgw 10.2.5).
> >>>>
> >>>> Also, if I try to authenticate from the 'master' rgw zone with a
> >>>> "radosgw-admin sync status --rgw-zone=bbp-gva-master" I get:
> >>>>
> >>>> "ERROR: failed to fetch datalog info"
> >>>>
> >>>> "failed to retrieve sync info: (13) Permission denied"
> >>>>
> >>>> The above errors correlate with the errors in the log on the server
> >>>> running 10.2.7 (debug level 20) at [2].
> >>>>
> >>>> I'm not sure what I have done wrong or what I can try next.
> >>>>
> >>>> By the way, downgrading the packages from 10.2.7 to 10.2.5 restores
> >>>> authentication functionality.
> >>>>
> >>>> [1]
> >>>> boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
> >>>> <?xml version="1.0" encoding="UTF-8"?><Error><Code>SignatureDoesNotMatch</Code>
> >>>> <RequestId>tx4-0058f8c86a-3fa2959-bbp-gva-secondary</RequestId>
> >>>> <HostId>3fa2959-bbp-gva-secondary-bbp-gva</HostId></Error>
> >>>>
> >>>> [2]
> >>>> /bbpsrvc15.cscs.ch/admin/log
> >>>> 2017-04-20 16:43:04.916253 7ff87c6c0700 15 calculated
> >>>> digest=Ofg/f/NI0L4eEG1MsGk4PsVscTM=
> >>>> 2017-04-20 16:43:04.916255 7ff87c6c0700 15
> >>>> auth_sign=qZ3qsy7AuNCOoPMhr8yNoy5qMKU=
> >>>> 2017-04-20 16:43:04.916255 7ff87c6c0700 15 compare=34
> >>>> 2017-04-20 16:43:04.916266 7ff87c6c0700 10 failed to authorize request
> >>>> 2017-04-20 16:43:04.916268 7ff87c6c0700 20 handler->ERRORHANDLER:
> >>>> err_no=-2027 new_err_no=-2027
> >>>> 2017-04-20 16:43:04.916329 7ff87c6c0700  2 req 354:0.052585:s3:GET
> >>>> /admin/log:get_obj:op status=0
> >>>> 2017-04-20 16:43:04.916339 7ff87c6c0700  2 req 354:0.052595:s3:GET
> >>>> /admin/log:get_obj:http status=403
> >>>> 2017-04-20 16:43:04.916343 7ff87c6c0700  1 == req done
> >>>> req=0x7ff87c6ba710 op status=0 http_status=403 ==
> >>>> 2017-04-20 16:43:04.916350 7ff87c6c0700 20 process_request() returned
> >>>> -2027
> >>>> 2017-04-20 16:43:04.916390 7ff87c6c0700  1 civetweb: 0x7ff990015610:
> >>>> 10.80.6.26 - - [20/Apr/2017:16:43:04 +0200] "GET /admin/log HTTP/1.1"
> >>>> 403 0
> >>>> - -
> >>>> 2017-04-20 16:43:04.917212 7ff9777e6700 20
> >>>> cr:s=0x7ff97000d420:op=0x7ff9703a5440:18RGWMetaSyncShardCR: operate()
> >>>> 2017-04-20 16:43:04.917223 7ff9777e6700 20 rgw meta sync:
> >>>> incremental_sync:1544: shard_id=20
> >>>> mdlog_marker=1_1492686039.901886_5551978.1
> >>>> sync_marker.marker=1_1492686039.901886_5551978.1 period_marker=
> >>>> 2017-04-20 16:43:04.917227 7ff9777e6700 20 rgw meta sync:
> >>>> incremental_sync:1551: shard_id=20 syncing mdlog for shard_id=20
> >>>> 2017-04-20 16:43:04.917236 7ff9777e6700 20
> >>>> cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine:
> >>>> operate()
>

Re: [ceph-users] XFS no space left on device

2016-10-25 Thread Łukasz Jagiełło
Hi,

What is your "allocsize"? If it's default you're allocating 64KiB per file.
If you have lots of small files (check file size distribution) you may need
to use a smaller allocsize (eg. 4KiB).
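
A quick way to get a rough size distribution (the path is just an example;
point it at your OSD data dir):

#v+
find /var/lib/ceph/osd/ceph-123/current -type f -printf '%s\n' \
  | awk '{ b = int(log($1 + 1) / log(2)); c[b]++ }
         END { for (i in c) printf "~2^%d bytes: %d files\n", i, c[i] }'
#v-

If most files are much smaller than the allocation size, a smaller allocsize
can be set via the OSD mount options, e.g. something like:

#v+
[osd]
osd mount options xfs = rw,noatime,inode64,allocsize=4096
#v-

(the OSD filesystems need to be remounted for it to take effect).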

Best,

On Tue, Oct 25, 2016 at 4:37 AM, Василий Ангапов  wrote:

> Hello,
>
> I have a Ceph 10.2.1 cluster with 10 nodes, each having 29 * 6TB OSDs.
> Yesterday I found that 3 OSDs were down and out with 89% space
> utilization.
> In logs there is:
> 2016-10-24 22:36:37.599253 7f8309c5e800  0 ceph version 10.2.1
> (3a66dd4f30852819c1bdaa8ec23c795d4ad77269), process ceph-osd, pid
> 2602081
> 2016-10-24 22:36:37.600129 7f8309c5e800  0 pidfile_write: ignore empty
> --pid-file
> 2016-10-24 22:36:37.635769 7f8309c5e800  0
> filestore(/var/lib/ceph/osd/ceph-123) backend xfs (magic 0x58465342)
> 2016-10-24 22:36:37.635805 7f8309c5e800 -1
> genericfilestorebackend(/var/lib/ceph/osd/ceph-123) detect_features:
> unable to create /var/lib/ceph/osd/ceph-123/fiemap_test: (28) No space
> left on device
> 2016-10-24 22:36:37.635814 7f8309c5e800 -1
> filestore(/var/lib/ceph/osd/ceph-123) _detect_fs: detect_features
> error: (28) No space left on device
> 2016-10-24 22:36:37.635818 7f8309c5e800 -1
> filestore(/var/lib/ceph/osd/ceph-123) FileStore::mount: error in
> _detect_fs: (28) No space left on device
> 2016-10-24 22:36:37.635824 7f8309c5e800 -1 osd.123 0 OSD:init: unable
> to mount object store
> 2016-10-24 22:36:37.635827 7f8309c5e800 -1 ESC[0;31m ** ERROR: osd
> init failed: (28) No space left on deviceESC[0m
>
> root@ed-ds-c178:[/var/lib/ceph/osd/ceph-123]:$ df -h
> /var/lib/ceph/osd/ceph-123
> FilesystemSize  Used Avail Use% Mounted on
> /dev/mapper/disk23p1  5.5T  4.9T  651G  89% /var/lib/ceph/osd/ceph-123
>
> root@ed-ds-c178:[/var/lib/ceph/osd/ceph-123]:$ df -i
> /var/lib/ceph/osd/ceph-123
> Filesystem  InodesIUsed IFree IUse% Mounted on
> /dev/mapper/disk23p1 146513024 22074752 124438272   16%
> /var/lib/ceph/osd/ceph-123
>
> root@ed-ds-c178:[/var/lib/ceph/osd/ceph-123]:$ touch 123
> touch: cannot touch ‘123’: No space left on device
>
> root@ed-ds-c178:[/var/lib/ceph/osd/ceph-123]:$ grep ceph-123 /proc/mounts
> /dev/mapper/disk23p1 /var/lib/ceph/osd/ceph-123 xfs
> rw,noatime,attr2,inode64,noquota 0 0
>
> The situation is the same for all three down OSDs. The OSDs can be unmounted
> and remounted without problems:
> root@ed-ds-c178:[~]:$ umount /var/lib/ceph/osd/ceph-123
> root@ed-ds-c178:[~]:$
> root@ed-ds-c178:[~]:$ mount /var/lib/ceph/osd/ceph-123
> root@ed-ds-c178:[~]:$ touch /var/lib/ceph/osd/ceph-123/123
> touch: cannot touch ‘/var/lib/ceph/osd/ceph-123/123’: No space left on
> device
>
> xfs_repair gives no errors for the FS.
>
> Kernel is
> root@ed-ds-c178:[~]:$ uname -r
> 4.7.0-1.el7.wg.x86_64
>
> What else can I do to rectify that situation?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Łukasz Jagiełło
lukaszjagielloorg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd inside LXC

2016-07-13 Thread Łukasz Jagiełło
Hi,

Just wondering why you want each OSD inside a separate LXC container? Just to
pin them to specific CPUs?

On Tue, Jul 12, 2016 at 6:33 AM, Guillaume Comte <guillaume.co...@blade-group.com> wrote:

> Hi,
>
> I am currently defining a storage architecture based on Ceph, and I wish
> to check that I haven't misunderstood a few things.
>
> So, I plan to deploy one OSD per HDD on each server, with each OSD running
> inside an LXC container.
>
> Then, I wish to use the server itself as an RBD client for objects created
> in the pools; I also wish to have an SSD to enable caching (and to store
> OSD logs as well).
>
> The idea behind this is to create CRUSH rules which keep a set of objects
> within a couple of servers connected to the same pair of switches, in order
> to have the best proximity between where I store the objects and where I
> use them (I don't mind not having a very high assurance against losing data
> if my whole rack powers down).
>
> Am I already on the wrong track? Is there a way to guarantee proximity of
> data with Ceph without making a twisted configuration like the one I am
> ready to do?
>
> Thanks in advance,
>
> Regards
> --
> *Guillaume Comte*
> 06 25 85 02 02  | guillaume.co...@blade-group.com
> 
> 90 avenue des Ternes, 75 017 Paris
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Łukasz Jagiełło
lukaszjagielloorg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Missing bucket

2015-11-13 Thread Łukasz Jagiełło
On Fri, Nov 13, 2015 at 1:47 PM, Yehuda Sadeh-Weinraub wrote:

> >> >> > Recently I've noticed a problem with one of our buckets:
> >> >> >
> >> >> > I cannot list or stats on a bucket:
> >> >> > #v+
> >> >> > root@ceph-s1:~# radosgw-admin bucket stats
> >> >> > --bucket=problematic_bucket
> >> >> > error getting bucket stats ret=-22
> >> >>
> >> >> That's EINVAL, not ENOENT. It could mean lots of things, e.g., a
> >> >> radosgw-admin version mismatch vs. the version the OSDs are running. Try
> >> >> to add --debug-rgw=20 --debug-ms=1 --log-to-stderr to maybe get a bit
> >> >> more info about the source of this error.
> >> >
> >> >
> >> > https://gist.github.com/ljagiello/06a4dd1f34a776e38f77
> >> >
> >> > Result of more verbose debug.
> >> >
> >> 2015-11-13 21:10:19.160420 7fd9f91be7c0 1 -- 10.8.68.78:0/1007616 -->
> >> 10.8.42.35:6800/26514 -- osd_op(client.44897323.0:30
> >> .dir.default.5457.9 [call rgw.bucket_list] 16.2f979b1a e172956) v4 --
> >> ?+0 0x15f3740 con 0x15daa60
> >> 2015-11-13 21:10:19.161058 7fd9ef8a7700 1 -- 10.8.68.78:0/1007616 <==
> >> osd.12 10.8.42.35:6800/26514 6  osd_op_reply(30
> >> .dir.default.5457.9 [call] ondisk = -22 (Invalid argument)) v4 
> >> 118+0+0 (3885840820 0 0) 0x7fd9c8000d50 con 0x15daa60
> >> error getting bucket stats ret=-22
> >>
> >> You can try taking a look at osd.12 logs. Any chance osd.12 and
> >> radosgw-admin aren't running the same major version? (more likely
> >> radosgw-admin running a newer version).
> >
> >
> > From last 12h it's just deep-scrub info
> > #v+
> > 2015-11-13 08:23:00.690076 7fc4c62ee700  0 log [INF] : 15.621 deep-scrub
> ok
> > #v-
>
> This is unrelated.
>
> >
> > But yesterday there was a big rebalance and a host with that osd was
> > rebuilding from scratch.
> >
> > We're running the same version (ceph, rados) across entire cluster just
> > double check it.
> >
>
> what does 'radosgw-admin --version' return?
>

Everywhere the same:
ceph version 0.67.11 (bc8b67bef6309a32361be76cd11fb56b057ea9d2)

-- 
Łukasz Jagiełło
lukaszjagielloorg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Missing bucket

2015-11-13 Thread Łukasz Jagiełło
>
> >> > Recently I've noticed a problem with one of our buckets:
> >> >
> >> > I cannot list or stats on a bucket:
> >> > #v+
> >> > root@ceph-s1:~# radosgw-admin bucket stats
> --bucket=problematic_bucket
> >> > error getting bucket stats ret=-22
> >>
> >> That's EINVAL, not ENOENT. It could mean lots of things, e.g., a
> >> radosgw-admin version mismatch vs. the version the OSDs are running. Try
> >> to add --debug-rgw=20 --debug-ms=1 --log-to-stderr to maybe get a bit
> >> more info about the source of this error.
> >
> >
> > https://gist.github.com/ljagiello/06a4dd1f34a776e38f77
> >
> > Result of more verbose debug.
> >
> 2015-11-13 21:10:19.160420 7fd9f91be7c0 1 -- 10.8.68.78:0/1007616 -->
> 10.8.42.35:6800/26514 -- osd_op(client.44897323.0:30
> .dir.default.5457.9 [call rgw.bucket_list] 16.2f979b1a e172956) v4 --
> ?+0 0x15f3740 con 0x15daa60
> 2015-11-13 21:10:19.161058 7fd9ef8a7700 1 -- 10.8.68.78:0/1007616 <==
> osd.12 10.8.42.35:6800/26514 6  osd_op_reply(30
> .dir.default.5457.9 [call] ondisk = -22 (Invalid argument)) v4 
> 118+0+0 (3885840820 0 0) 0x7fd9c8000d50 con 0x15daa60
> error getting bucket stats ret=-22
>
> You can try taking a look at osd.12 logs. Any chance osd.12 and
> radosgw-admin aren't running the same major version? (more likely
> radosgw-admin running a newer version).


From the last 12 hours it's just deep-scrub info:
#v+
2015-11-13 08:23:00.690076 7fc4c62ee700  0 log [INF] : 15.621 deep-scrub ok
#v-

But yesterday there was a big rebalance and a host with that osd was
rebuilding from scratch.

We're running the same version (ceph, rados) across the entire cluster; I just
double-checked it.

-- 
Łukasz Jagiełło
lukaszjagielloorg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Missing bucket

2015-11-13 Thread Łukasz Jagiełło
On Fri, Nov 13, 2015 at 1:07 PM, Yehuda Sadeh-Weinraub wrote:

> > Recently I've noticed a problem with one of our buckets:
> >
> > I cannot list or stats on a bucket:
> > #v+
> > root@ceph-s1:~# radosgw-admin bucket stats --bucket=problematic_bucket
> > error getting bucket stats ret=-22
>
> That's EINVAL, not ENOENT. It could mean lots of things, e.g., a
> radosgw-admin version mismatch vs. the version the OSDs are running. Try
> to add --debug-rgw=20 --debug-ms=1 --log-to-stderr to maybe get a bit
> more info about the source of this error.


https://gist.github.com/ljagiello/06a4dd1f34a776e38f77

Result of more verbose debug.
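
That is, roughly:

#v+
radosgw-admin bucket stats --bucket=problematic_bucket \
  --debug-rgw=20 --debug-ms=1 --log-to-stderr
#v-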


> You're really behind.
>

I know; we've got an update scheduled for 2016. It's a big project to ensure
everything goes smoothly.

-- 
Łukasz Jagiełło
lukaszjagielloorg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Missing bucket

2015-11-13 Thread Łukasz Jagiełło
Hi all,

Recently I've noticed a problem with one of our buckets:

I cannot list or get stats on the bucket:
#v+
root@ceph-s1:~# radosgw-admin bucket stats --bucket=problematic_bucket
error getting bucket stats ret=-22
➜  ~  s3cmd -c /etc/s3cmd/prod.cfg ls
s3://problematic_bucket/images/e/e0/file.png
ERROR: S3 error: None
#v-

but a direct request for an object works perfectly fine:
#v+
➜  ~  curl -svo /dev/null
http://ceph-s1/problematic_bucket/images/e/e0/file.png
[…]
< HTTP/1.1 200 OK
< Content-Type: image/png
< Content-Length: 379906
[…]
#v-

Any ideas how to fix it? We're still running ceph 0.67.11.

Thanks,
-- 
Łukasz Jagiełło
lukaszjagielloorg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Typical 10GbE latency

2014-11-07 Thread Łukasz Jagiełło
Hi,

rtt min/avg/max/mdev = 0.070/0.177/0.272/0.049 ms

04:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+
Network Connection (rev 01)

on both hosts, with an Arista 7050S-64 in between.

Both hosts were part of an active Ceph cluster.


On Thu, Nov 6, 2014 at 5:18 AM, Wido den Hollander  wrote:

> Hello,
>
> While working at a customer's site I've run into 10GbE latency which seems
> high to me.
>
> I have access to a couple of Ceph clusters and I ran a simple ping test:
>
> $ ping -s 8192 -c 100 -n 
>
> Two results I got:
>
> rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms
> rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms
>
> Both these environments are running with Intel 82599ES 10Gbit cards in
> LACP, one with Extreme Networks switches, the other with Arista.
>
> Now, on a environment with Cisco Nexus 3000 and Nexus 7000 switches I'm
> seeing:
>
> rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms
>
> As you can see, the Cisco Nexus network has higher latency compared to the
> other setups.
>
> You would say the switches are to blame, but we also tried a direct
> TwinAx connection, and that didn't help.
>
> This setup also uses the Intel 82599ES cards, so the cards don't seem to
> be the problem.
>
> The MTU is set to 9000 on all these networks and cards.
>
> I was wondering, others with a Ceph cluster running on 10GbE, could you
> perform a simple network latency test like this? I'd like to compare the
> results.
>
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Łukasz Jagiełło
lukaszjagielloorg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] full osd ssd cluster advise : replication 2x or 3x ?

2014-05-27 Thread Łukasz Jagiełło
Hi,

I've got a 16-node, SSD-only cluster. Each node has 6x600GB drives and a 10Gbit uplink.

We're using the Intel 320 series. The cluster has been running in production
for half a year now with no problems with the SSDs.

Replication is x3 (main DC) and x2 in the backup DC (a 10-node cluster there =
less space).

From what I've noticed, it's just easier to remove an entire node from the
cluster and rebuild it than to work on it or try to fix it. With a 10Gbit
uplink, 3.6TB is rebalanced after 40-60 minutes. During rebalance we've been
able to saturate the 10Gbit network.
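
If it helps anyone, the replica count is just a per-pool setting, e.g. (pool
name is only an example):

#v+
ceph osd pool set rbd size 3
ceph osd pool set rbd min_size 2
#v-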


On Thu, May 22, 2014 at 9:00 AM, Alexandre DERUMIER wrote:

> Hi,
>
> I'm looking to build a full osd ssd cluster, with this config:
>
> 6 nodes,
>
> each node with 10 OSDs / SSD drives (dual 10Gbit network), with 1 journal +
> data on each OSD.
>
> The SSD drives will be enterprise grade,
>
> maybe the Intel SC3500 800GB (a well-known SSD)
>
> or the new Samsung SSD PM853T 960GB (I don't have much info about it for the
> moment, but the price seems a little bit lower than Intel's).
>
>
> I would like to have some advice on the replication level.
>
>
> Maybe somebody has experience with the Intel SC3500 failure rate?
> What are the chances of having 2 failing disks on 2 different nodes at the
> same time (Murphy's law ;)?
>
>
> I think in case of a disk failure, PGs should replicate quickly over 10Gbit
> links.
>
>
> So the question is:
>
> 2x or 3x ?
>
>
> Regards,
>
> Alexandre
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Łukasz Jagiełło
lukaszjagielloorg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Full OSD

2013-12-14 Thread Łukasz Jagiełło
Hi,

Inodes look OK, 9% used.


#v+
/dev/sdc1  69752420  6227392  63525028    9% /tmp/sdc1

touch /tmp/sdc1/swfdedefd
touch: cannot touch `/tmp/sdc1/swfdedefd': No space left on device

root@dfs-s2:~# xfs_db -r "-c freesp -s" /dev/sdc1
   from      to  extents    blocks    pct
      1       1    68631     68631   0.22
      2       3   220424    548648   1.73
      4       7   426549   2370963   7.47
      8      15  2224898  28577194  89.99
     16      31     8496    189768   0.60
total free extents 2948998
total free blocks 31755204
average free extent size 10.7681
#v-
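
Assuming the default 4 KiB XFS block size, 31,755,204 free blocks is roughly
121 GiB, which matches the ~122 GB df reported earlier, but the average free
extent is only about 10.8 blocks (~43 KiB) and nothing is larger than 31
blocks (~124 KiB), so the free space is really there, just spread over ~2.9
million tiny extents.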


2013/12/14 Sean Crosby 

> Since you are using XFS, you may have run out of inodes on the device and
> need to enable the inode64 option.
>
> What does `df -i` say?
>
> Sean
>
>
> On 13 December 2013 00:51, Łukasz Jagiełło wrote:
>
>> Hi,
>>
>> 72 OSDs (12 servers with 6 OSD per server) and 2000 placement groups.
>> Replica factor is 3.
>>
>>
>> 2013/12/12 Pierre BLONDEAU 
>>
>>> Hi,
>>>
>>> How many osd did you have ?
>>>
>>> It could be a problem of placement group :
>>> http://article.gmane.org/gmane.comp.file-systems.ceph.
>>> user/2261/match=pierre+blondeau
>>>
>>> Regards.
>>>
>>> Le 10/12/2013 23:23, Łukasz Jagiełło a écrit :
>>>
>>>> Hi,
>>>>
>>>> Today my ceph cluster suffer of such problem:
>>>>
>>>> #v+
>>>> root@dfs-s1:/var/lib/ceph/osd/ceph-1# df -h | grep ceph-1
>>>> /dev/sdc1   559G  438G  122G  79% /var/lib/ceph/osd/ceph-1
>>>> #v-
>>>>
>>>> Disk report 122GB free space. Looks ok but:
>>>>
>>>> #v+
>>>> root@dfs-s1:/var/lib/ceph/osd/ceph-1# touch aaa
>>>> touch: cannot touch `aaa': No space left on device
>>>> #v-
>>>>
>>>> Few more of data:
>>>> #v+
>>>> root@dfs-s1:/var/lib/ceph/osd/ceph-1# mount | grep ceph-1
>>>> /dev/sdc1 on /var/lib/ceph/osd/ceph-1 type xfs (rw,noatime)
>>>> root@dfs-s1:/var/lib/ceph/osd/ceph-1# xfs_db -r "-c freesp -s"
>>>> /dev/sdc1
>>>> from  to extents  blockspct
>>>>1   1  366476  366476   1.54
>>>>2   3  466928 1133786   4.76
>>>>    4   7  536691 2901804  12.18
>>>>8  15 1554873 19423430  81.52
>>>> total free extents 2924968
>>>> total free blocks 23825496
>>>> average free extent size 8.14556
>>>> root@dfs-s1:/var/lib/ceph/osd/ceph-1# xfs_db -c frag -r /dev/sdc1
>>>> actual 9043587, ideal 8926438, fragmentation factor 1.30%
>>>> #v-
>>>>
>>>> Any possible reason of that, and how to avoid that in future ? Someone
>>>> earlier mention it's problem with fragmentation but 122GB ?
>>>>
>>>> Best Regards
>>>> --
>>>> Łukasz Jagiełło
>>>> lukaszjagielloorg
>>>>
>>>>
>>>> ___
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>>
>>>
>>> --
>>> --
>>> Pierre BLONDEAU
>>> Administrateur Systèmes & réseaux
>>> Université de Caen
>>> Laboratoire GREYC, Département d'informatique
>>>
>>> tel : 02 31 56 75 42
>>> bureau  : Campus 2, Science 3, 406
>>> --
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>>
>> --
>> Łukasz Jagiełło
>> lukaszjagielloorg
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>


-- 
Łukasz Jagiełło
lukaszjagielloorg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Full OSD

2013-12-12 Thread Łukasz Jagiełło
Hi,

72 OSDs (12 servers with 6 OSD per server) and 2000 placement groups.
Replica factor is 3.
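
For what it's worth, the usual rule of thumb (about 100 PGs per OSD, divided
by the replica count) gives (72 * 100) / 3 = 2400, rounded to the nearest
power of two -> 2048 PGs, so 2000 should already be in the right ballpark.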


2013/12/12 Pierre BLONDEAU 

> Hi,
>
> How many osd did you have ?
>
> It could be a problem of placement group :
> http://article.gmane.org/gmane.comp.file-systems.ceph.user/2261/match=pierre+blondeau
>
> Regards.
>
> Le 10/12/2013 23:23, Łukasz Jagiełło a écrit :
>
>> Hi,
>>
>> Today my ceph cluster suffer of such problem:
>>
>> #v+
>> root@dfs-s1:/var/lib/ceph/osd/ceph-1# df -h | grep ceph-1
>> /dev/sdc1   559G  438G  122G  79% /var/lib/ceph/osd/ceph-1
>> #v-
>>
>> Disk report 122GB free space. Looks ok but:
>>
>> #v+
>> root@dfs-s1:/var/lib/ceph/osd/ceph-1# touch aaa
>> touch: cannot touch `aaa': No space left on device
>> #v-
>>
>> Few more of data:
>> #v+
>> root@dfs-s1:/var/lib/ceph/osd/ceph-1# mount | grep ceph-1
>> /dev/sdc1 on /var/lib/ceph/osd/ceph-1 type xfs (rw,noatime)
>> root@dfs-s1:/var/lib/ceph/osd/ceph-1# xfs_db -r "-c freesp -s" /dev/sdc1
>> from  to extents  blockspct
>>1   1  366476  366476   1.54
>>2   3  466928 1133786   4.76
>>4   7  536691 2901804  12.18
>>8  15 1554873 19423430  81.52
>> total free extents 2924968
>> total free blocks 23825496
>> average free extent size 8.14556
>> root@dfs-s1:/var/lib/ceph/osd/ceph-1# xfs_db -c frag -r /dev/sdc1
>> actual 9043587, ideal 8926438, fragmentation factor 1.30%
>> #v-
>>
>> Any possible reason of that, and how to avoid that in future ? Someone
>> earlier mention it's problem with fragmentation but 122GB ?
>>
>> Best Regards
>> --
>> Łukasz Jagiełło
>> lukaszjagielloorg
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
> --
> --
> Pierre BLONDEAU
> Administrateur Systèmes & réseaux
> Université de Caen
> Laboratoire GREYC, Département d'informatique
>
> tel : 02 31 56 75 42
> bureau  : Campus 2, Science 3, 406
> --
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Łukasz Jagiełło
lukaszjagielloorg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Full OSD

2013-12-10 Thread Łukasz Jagiełło
Hi,

Today my ceph cluster suffers from the following problem:

#v+
root@dfs-s1:/var/lib/ceph/osd/ceph-1# df -h | grep ceph-1
/dev/sdc1   559G  438G  122G  79% /var/lib/ceph/osd/ceph-1
#v-

The disk reports 122GB of free space. Looks OK, but:

#v+
root@dfs-s1:/var/lib/ceph/osd/ceph-1# touch aaa
touch: cannot touch `aaa': No space left on device
#v-

A few more bits of data:
#v+
root@dfs-s1:/var/lib/ceph/osd/ceph-1# mount | grep ceph-1
/dev/sdc1 on /var/lib/ceph/osd/ceph-1 type xfs (rw,noatime)
root@dfs-s1:/var/lib/ceph/osd/ceph-1# xfs_db -r "-c freesp -s" /dev/sdc1
   from      to  extents    blocks    pct
      1       1   366476    366476   1.54
      2       3   466928   1133786   4.76
      4       7   536691   2901804  12.18
      8      15  1554873  19423430  81.52
total free extents 2924968
total free blocks 23825496
average free extent size 8.14556
root@dfs-s1:/var/lib/ceph/osd/ceph-1# xfs_db -c frag -r /dev/sdc1
actual 9043587, ideal 8926438, fragmentation factor 1.30%
#v-

Any possible reason for that, and how to avoid it in the future? Someone
earlier mentioned it's a problem with fragmentation, but 122GB?

Best Regards
-- 
Łukasz Jagiełło
lukaszjagielloorg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com