[ceph-users] corrupted rbd filesystems since jewel

2017-05-03 Thread Stefan Priebe - Profihost AG
Hello,

since we've upgraded from hammer to jewel 10.2.7 and enabled
exclusive-lock, object-map and fast-diff, we've had problems with corrupted VM
filesystems.

Sometimes the VMs are just crashing with FS errors and a restart can
solve the problem. Sometimes the whole VM is not even bootable and we
need to import a backup.

All of them have the same problem that you can't revert to an older
snapshot. The rbd command just hangs at 99% forever.

Is this a known issue - anything we can check?

Greets,
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph newbie thoughts and questions

2017-05-03 Thread David Turner
The clients will need to be able to contact the mons and the osds.  NEVER
use 2 mons.  Mons form a quorum and work best with odd numbers (1, 3, 5,
etc); 1 mon is better than 2 mons.  It is better to remove the raid and
use the individual disks as OSDs.  Ceph handles the redundancy through
replica copies.  It is much better to have a third node for failure domain
reasons, so you can have 3 copies of your data and have 1 in each of the 3
servers.  The OSDs store data as objects that are grouped into PGs, which
are assigned to the OSDs.  To migrate the data into ceph, you would need to
set up CephFS and rsync the data into it.
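As a rough sketch of that migration step (the paths, monitor address and
client key here are just examples), it would look something like:

  # on the host that currently holds the data, mount CephFS...
  mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secret=AQD...==
  # ...and copy the existing data into it
  rsync -avH /srv/data/ /mnt/cephfs/data/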

I don't usually recommend this, but you might prefer Gluster.  You would
use the raided disks as the brick in each node.  Set it up to have 2 copies
(better to have 3, but you only have 2 nodes).  Each server can then mount
the gluster volume (natively or over NFS).  The files are stored as flat files on the
bricks, but you would still need to create the gluster volume first and then rsync
the data into the mounted volume instead of directly onto the disk.  With
this you don't have to worry about the mon service, mds service, osd
services, balancing the crush map, etc.  Gluster of course has its own
complexities and limitations, but it might be closer to what you're looking
for right now.
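If you went that way, the setup is roughly along these lines (volume name,
hostnames and brick paths are made up; this is only a sketch):

  # on node1, after installing glusterfs-server on both nodes
  gluster peer probe node2
  gluster volume create gvol0 replica 2 \
      node1:/bricks/brick1 node2:/bricks/brick1
  gluster volume start gvol0
  # mount the volume and rsync the existing data into it
  mount -t glusterfs node1:/gvol0 /mnt/gvol0
  rsync -avH /srv/data/ /mnt/gvol0/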

On Wed, May 3, 2017 at 4:06 PM Marcus Pedersén 
wrote:

> Hello everybody!
>
> I am a newbie on ceph and I really like it and want to try it out.
> I have a couple of thoughts and questions after reading documentation and
> need some help to see that I am on the right path.
>
> Today I have two file servers in production that I want to start my ceph
> fs on and expand from that.
> I want these servers to function as a failover cluster and as I see it I
> will be able to do it with ceph.
>
> To get a failover cluster without a single point of failure I need at
> least 2 monitors, 2 mds and 2 osd (my existing file servers), right?
> Today, both of the file servers use a raid on 8 disks. Do I format my raid
> xfs and run my osds on the raid?
> Or do I split up my raid and add the disks directly to the osds?
>
> When I connect clients to my ceph fs, are they talking to the mds or are
> the clients talking to the osds directly as well?
> If the clients just talk to the mds, then the osds and the monitor can be in
> a separate network and the mds connected both to the client network and the
> local "ceph" network.
>
> Today, we have about 11TB of data on these file servers; how do I move the
> data to the ceph fs? Is it possible to rsync to one of the osd disks, start
> the osd daemon and let it replicate itself?
>
> Is it possible to set up the ceph fs with 2 mds, 2 monitors and 1 osd and
> add the second osd later?
> This is to be able to have one file server in production, configure ceph and
> test with the other, swap to the ceph system, and when it is up and running
> add the second osd.
>
> Of course I will test this out before I bring it to production.
>
> Many thanks in advance!
>
> Best regards
> Marcus
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?

2017-05-03 Thread Łukasz Jagiełło
Hi Radek,

I can confirm that v10.2.7 without the 2 commits you mentioned earlier works as
expected.

Best,

On Wed, May 3, 2017 at 2:59 AM, Radoslaw Zarzynski 
wrote:

> Hello Łukasz,
>
> Thanks for your testing and sorry for my mistake. It looks that two commits
> need to be reverted to get the previous behaviour:
>
> The already mentioned one:
>   https://github.com/ceph/ceph/commit/c9445faf7fac2ccb8a05b53152c0ca
> 16d7f4c6d0
> Its dependency:
>   https://github.com/ceph/ceph/commit/b72fc1b820ede3cd186d887d9d30f7
> f91fe3764b
>
> They have been merged in the same pull request:
>   https://github.com/ceph/ceph/pull/11760
> and form the difference visible between v10.2.5 and v10.2.6 in the matter
> of "in_hosted_domain" handling:
>   https://github.com/ceph/ceph/blame/v10.2.5/src/rgw/rgw_rest.cc#L1773
>   https://github.com/ceph/ceph/blame/v10.2.6/src/rgw/rgw_
> rest.cc#L1781-L1782
>
> I'm really not sure we want to revert them. Still, it can be that they just
> unhide a misconfiguration issue while fixing the problems we had with
> handling of virtual hosted buckets.
>
> Regards,
> Radek
>
> On Wed, May 3, 2017 at 3:12 AM, Łukasz Jagiełło
>  wrote:
> > Hi,
> >
> > I tried today revert [1] from 10.2.7 but the problem is still there even
> > without the change. Revert to 10.2.5 fix the issue instantly.
> >
> > https://github.com/ceph/ceph/commit/c9445faf7fac2ccb8a05b53152c0ca
> 16d7f4c6d0
> >
> > On Thu, Apr 27, 2017 at 4:53 AM, Radoslaw Zarzynski
> >  wrote:
> >>
> >> Bingo! From the 10.2.5-admin:
> >>
> >>   GET
> >>
> >>   Thu, 27 Apr 2017 07:49:59 GMT
> >>   /
> >>
> >> And also:
> >>
> >>   2017-04-27 09:49:59.117447 7f4a90ff9700 20 subdomain= domain=
> >> in_hosted_domain=0 in_hosted_domain_s3website=0
> >>   2017-04-27 09:49:59.117449 7f4a90ff9700 20 final domain/bucket
> >> subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0
> >> s->info.domain= s->info.request_uri=/
> >>
> >> The most interesting part is the "final ... in_hosted_domain=0".
> >> It looks we need to dig around RGWREST::preprocess(),
> >> rgw_find_host_in_domains() & company.
> >>
> >> There is a commit introduced in v10.2.6 that touches this area [1].
> >> I'm definitely not saying it's the root cause. It might be that a change
> >> in the code just unhidden a configuration issue [2].
> >>
> >> I will talk about the problem on the today's sync-up.
> >>
> >> Thanks for the logs!
> >> Regards,
> >> Radek
> >>
> >> [1]
> >> https://github.com/ceph/ceph/commit/c9445faf7fac2ccb8a05b53152c0ca
> 16d7f4c6d0
> >> [2] http://tracker.ceph.com/issues/17440
> >>
> >> On Thu, Apr 27, 2017 at 10:11 AM, Ben Morrice 
> wrote:
> >> > Hello Radek,
> >> >
> >> > Thank-you for your analysis so far! Please find attached logs for both
> >> > the
> >> > admin user and a keystone backed user from 10.2.5 (same host as
> before,
> >> > I
> >> > have simply downgraded the packages). Both users can authenticate and
> >> > list
> >> > buckets on 10.2.5.
> >> >
> >> > Also - I tried version 10.2.6 and see the same behavior as 10.2.7, so
> >> > the
> >> > bug i'm hitting looks like it was introduced in 10.2.6
> >> >
> >> > Kind regards,
> >> >
> >> > Ben Morrice
> >> >
> >> > 
> __
> >> > Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670
> >> > EPFL / BBP
> >> > Biotech Campus
> >> > Chemin des Mines 9
> >> > 1202 Geneva
> >> > Switzerland
> >> >
> >> > On 27/04/17 04:45, Radoslaw Zarzynski wrote:
> >> >>
> >> >> Thanks for the logs, Ben.
> >> >>
> >> >> It looks that two completely different authenticators have failed:
> >> >> the local, RADOS-backed auth (admin.txt) and Keystone-based
> >> >> one as well. In the second case I'm pretty sure that Keystone has
> >> >> rejected [1][2] to authenticate provided signature/StringToSign.
> >> >> RGW tried to fallback to the local auth which obviously didn't have
> >> >> any chance as the credentials were stored remotely. This explains
> >> >> the presence of "error reading user info" in the user-keystone.txt.
> >> >>
> >> >> What is common for both scenarios are the low-level things related
> >> >> to StringToSign crafting/signature generation at RadosGW's side.
> >> >> Following one has been composed for the request from admin.txt:
> >> >>
> >> >>GET
> >> >>
> >> >>
> >> >>Wed, 26 Apr 2017 09:18:42 GMT
> >> >>/bbpsrvc15.cscs.ch/
> >> >>
> >> >> If you could provide a similar log from v10.2.5, I would be really
> >> >> grateful.
> >> >>
> >> >> Regards,
> >> >> Radek
> >> >>
> >> >> [1]
> >> >>
> >> >> https://github.com/ceph/ceph/blob/v10.2.7/src/rgw/rgw_rest_
> s3.cc#L3269-L3272
> >> >> [2] https://github.com/ceph/ceph/blob/v10.2.7/src/rgw/rgw_
> common.h#L170
> >> >>
> >> >> On Wed, Apr 26, 2017 at 11:29 AM, Morrice Ben 
> >> >> wrote:
> >> >>>
> >> >>> Hello Radek,
> >> >>>
> >> >>> Please find attached 

Re: [ceph-users] Intel power tuning - 30% throughput performance increase

2017-05-03 Thread Haomai Wang
refer to https://github.com/ceph/ceph/pull/5013

On Thu, May 4, 2017 at 7:56 AM, Brad Hubbard  wrote:
> +ceph-devel to get input on whether we want/need to check the value of
> /dev/cpu_dma_latency (platform dependent) at startup and issue a
> warning, or whether documenting this would suffice?
>
> Any doc contribution would be welcomed.
>
> On Wed, May 3, 2017 at 7:18 PM, Blair Bethwaite
>  wrote:
>> On 3 May 2017 at 19:07, Dan van der Ster  wrote:
>>> Whether cpu_dma_latency should be 0 or 1, I'm not sure yet. I assume
>>> your 30% boost was when going from throughput-performance to
>>> dma_latency=0, right? I'm trying to understand what is the incremental
>>> improvement from 1 to 0.
>>
>> Probably minimal given that represents a state transition latency
>> taking only 1us. Presumably the main issue is when the CPU can drop
>> into the lower states and the compounding impact of that over time. I
>> will do some simple characterisation of that over the next couple of
>> weeks and report back...
>>
>> --
>> Cheers,
>> ~Blairo
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Cheers,
> Brad
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intel power tuning - 30% throughput performance increase

2017-05-03 Thread Brad Hubbard
+ceph-devel to get input on whether we want/need to check the value of
/dev/cpu_dma_latency (platform dependent) at startup and issue a
warning, or whether documenting this would suffice?

Any doc contribution would be welcomed.

On Wed, May 3, 2017 at 7:18 PM, Blair Bethwaite
 wrote:
> On 3 May 2017 at 19:07, Dan van der Ster  wrote:
>> Whether cpu_dma_latency should be 0 or 1, I'm not sure yet. I assume
>> your 30% boost was when going from throughput-performance to
>> dma_latency=0, right? I'm trying to understand what is the incremental
>> improvement from 1 to 0.
>
> Probably minimal given that represents a state transition latency
> taking only 1us. Presumably the main issue is when the CPU can drop
> into the lower states and the compounding impact of that over time. I
> will do some simple characterisation of that over the next couple of
> weeks and report back...
>
> --
> Cheers,
> ~Blairo
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] kernel BUG at fs/ceph/inode.c:1197

2017-05-03 Thread Brad Hubbard
+ceph-devel

On Thu, May 4, 2017 at 12:51 AM, James Poole  wrote:
> Hello,
>
> We currently have a ceph cluster supporting an Openshift cluster using
> cephfs and dynamic rbd provisioning. The client nodes appear to be
> triggering a kernel bug and are rebooting unexpectedly with the same message
> each time. Clients are running CentOS 7:
>
>   KERNEL: /usr/lib/debug/lib/modules/3.10.0-514.10.2.el7.x86_64/vmlinux
> DUMPFILE: /var/crash/127.0.0.1-2017-05-02-09:06:17/vmcore  [PARTIAL
> DUMP]
> CPUS: 16
> DATE: Tue May  2 09:06:15 2017
>   UPTIME: 00:43:14
> LOAD AVERAGE: 1.52, 1.40, 1.48
>TASKS: 7408
> NODENAME: [redacted]
>  RELEASE: 3.10.0-514.10.2.el7.x86_64
>  VERSION: #1 SMP Fri Mar 3 00:04:05 UTC 2017
>  MACHINE: x86_64  (1997 Mhz)
>   MEMORY: 32 GB
>PANIC: "kernel BUG at fs/ceph/inode.c:1197!"
>  PID: 133
>  COMMAND: "kworker/1:1"
> TASK: 8801399bde20  [THREAD_INFO: 880138d0c000]
>  CPU: 1
>STATE: TASK_RUNNING (PANIC)
>
> [ 2596.061470] [ cut here ]
> [ 2596.061499] kernel BUG at fs/ceph/inode.c:1197!
> [ 2596.061516] invalid opcode:  [#1] SMP
> [ 2596.061535] Modules linked in: cfg80211 rfkill binfmt_misc veth ext4
> mbcache jbd2 rbd xt_statistic xt_nat xt_recent ipt_REJECT nf_reject_ipv4
> xt_mark ipt_MASQUERADE nf_nat_masquerad
> e_ipv4 xt_addrtype br_netfilter bridge stp llc dm_thin_pool
> dm_persistent_data dm_bio_prison dm_bufio loop fuse ceph libceph
> dns_resolver vport_vxlan vxlan ip6_udp_tunnel udp_tunnel op
> envswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6 iptable_nat
> nf_nat_ipv4 nf_nat xt_limit nf_log_ipv4 vmw_vsock_vmci_transport
> nf_log_common xt_LOG vsock nf_conntrack_ipv4 nf_defr
> ag_ipv4 xt_comment xt_multiport xt_conntrack nf_conntrack iptable_filter
> intel_powerclamp coretemp iosf_mbi crc32_pclmul ghash_clmulni_intel
> aesni_intel lrw gf128mul glue_helper ablk_h
> elper cryptd ppdev vmw_balloon pcspkr sg vmw_vmci shpchp i2c_piix4
> parport_pc
> [ 2596.061875]  parport nfsd nfs_acl lockd auth_rpcgss grace sunrpc
> ip_tables xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_generic
> ata_generic pata_acpi vmwgfx drm_kms_helper
>  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul
> crct10dif_common mptspi crc32c_intel drm ata_piix scsi_transport_spi
> serio_raw mptscsih libata mptbase vmxnet3 i2c_c
> ore fjes dm_mirror dm_region_hash dm_log dm_mod
> [ 2596.062042] CPU: 1 PID: 133 Comm: kworker/1:1 Not tainted
> 3.10.0-514.10.2.el7.x86_64 #1
> [ 2596.062070] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
> Desktop Reference Platform, BIOS 6.00 09/17/2015
> [ 2596.062118] Workqueue: ceph-msgr ceph_con_workfn [libceph]
> [ 2596.062140] task: fffdf8801399be20 ti: 880138d0c000 task.ti:
> 880138d0c000
> [ 2596.062166] RIP: 0010:[]  []
> ceph_fill_trace+0x893/0xa00 [ceph]
> [ 2596.062209] RSP: :880138d0fb80  EFLAGS: 00010287
> [ 2596.062230] RAX: 88083b079680 RBX: 8801efe86760 RCX:
> 880095e26c00
> [ 2596.062257] RDX: 880003e8f2c0 RSI: 88053b4c0a08 RDI:
> 88053b4c0a00
> [ 2596.062288] RBP: 880138d0fbf8 R08: 880003e8f2c0 R09:
> 
> [ 2596.062320] R10: 0001 R11: 8804256f3ac0 R12:
> 880121d15400
> [ 2596.062351] R13: 880138dd4000 R14: 88007053f280 R15:
> 8807ee10f2c0
> [ 2596.062379] FS:  () GS:88013b84()
> knlGS:
> [ 2596.062413] CS:  0010 DS:  ES:  CR0: 8005003b
> [ 2596.062436] CR2: 7fe3bab2dcd0 CR3: 00042ebe CR4:
> 001407e0
> [ 2596.062498] DR0:  DR1:  DR2:
> 
> [ 2596.062540] DR3:  DR6: 0ff0 DR7:
> 0400
> [ 2596.062567] Stack:
> [ 2596.062578]  880121d15778 880121d15718 880138d0fc50
> 880095e26e7a
> [ 2596.062612]  880035c12400 88053b4c7800 3b4c0800
> 880138d0fbb8
> [ 2596.062645]  880138d0fbb8 a5446715 88053b4c0800
> 88008238ee10
> [ 2596.062681] Call Trace:
> [ 2596.062703]  [] handle_reply+0x3e8/0xc80 [ceph]
> [ 2596.062736]  [] dispatch+0xd9/0xaf0 [ceph]
> [ 2596.062762]  [] ? kernel_recvmsg+0x3a/0x50
> [ 2596.062790]  [] try_read+0x4bf/0x1220 [libceph]
> [ 2596.062819]  [] ? try_write+0xa13/0xe60 [libceph]
> [ 2596.062851]  [] ceph_con_workfn+0xb9/0x650 [libceph]
> [ 2596.062878]  [] process_one_work+0x17b/0x470
> [ 2596.062902]  [] worker_thread+0x126/0x410
> [ 2596.062925]  [] ? rescuer_thread+0x460/0x460
> [ 2596.062949]  [] kthread+0xcf/0xe0
> [ 2596.064014]  [] ? kthread_create_on_node+0x140/0x140
> [ 2596.065010]  [] ret_from_fork+0x58/0x90
> [ 2596.065955]  [] ? kthread_create_on_node+0x140/0x140
> [ 2596.066945] Code: e8 c3 2b d6 e0 e9 ca fa ff ff 4c 89 fa 48 c7 c6 07 d0
> 60 a0 48 c7 c7 50 24 61 a0 31 c0 e8 a6 2b d6 e0 e9 cd fa ff ff 0f 0b 0f 0b
> <0f> 0b 

[ceph-users] Ceph newbie thoughts and questions

2017-05-03 Thread Marcus Pedersén
Hello everybody!

I am a newbie on ceph and I really like it and want to try it out.
I have a couple of thoughts and questions after reading documentation and need 
some help to see that I am on the right path.

Today I have two file servers in production that I want to start my ceph fs on 
and expand from that.
I want these servers to function as a failover cluster and as I see it I will 
be able to do it with ceph.

To get a failover cluster without a single point of failure I need at least 2 
monitors, 2 mds and 2 osd (my existing file servers), right?
Today, both of the file servers use a raid on 8 disks. Do I format my raid xfs 
and run my osds on the raid?
Or do I split up my raid and add the disks directly to the osds?

When I connect clients to my ceph fs, are they talking to the mds or are the 
clients talking to the osds directly as well?
If the clients just talk to the mds, then the osds and the monitor can be in a 
separate network and the mds connected both to the client network and the local 
"ceph" network.

Today, we have about 11TB of data on these file servers; how do I move the data to 
the ceph fs? Is it possible to rsync to one of the osd disks, start the osd 
daemon and let it replicate itself?

Is it possible to set up the ceph fs with 2 mds, 2 monitors and 1 osd and add 
the second osd later?
This is to be able to have one file server in production, configure ceph and test 
with the other, swap to the ceph system, and when it is up and running add the 
second osd.

Of course I will test this out before I bring it to production.

Many thanks in advance!

Best regards
Marcus

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD behavior for reads to a volume with no data written

2017-05-03 Thread Prashant Murthy
Thanks for the detailed explanation, Jason. It makes sense that such
operations would end up being a few metadata lookups only (and the metadata
lookups will hit the disk only if they are not cached in-memory).
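For anyone following along, the object-map behaviour Jason describes below
depends on the image features, which can be checked/enabled with something
like this (the pool/image names are just examples):

  rbd info rbd/myimage            # shows which features are enabled
  rbd feature enable rbd/myimage exclusive-lock object-map fast-diff
  rbd object-map rebuild rbd/myimage   # populate the map on an existing image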

Prashant

On Tue, May 2, 2017 at 11:29 AM, Jason Dillaman  wrote:

> If the RBD object map feature is enabled, the read request would never
> even be sent to the OSD if the client knows the backing object doesn't
> exist. However, if the object map feature is disabled, the read request
> will be sent to the OSD.
>
> The OSD isn't my area of expertise, but I can try to explain what occurs
> to the best of my knowledge. There is a small in-memory cache for object
> contexts with the OSD PG -- which includes a whiteout flag to indicate the
> object is deleted. I believe that the whiteout flag is only really used on
> a cache tier to avoid having to attempt to promote a known non-existent
> object. Therefore, in the common case the OSD would query the non-existent
> object from the object store (FileStore or BlueStore).
>
> In FileStore, it will attempt to open the associated backing file for
> object. If the necessary dentries are cached in the kernel, I'd expect that
> the -ENOENT error would be bubbled back to RBD w/o a disk hit. Otherwise,
> the kernel would need to read the associated dentries from disk to
> determine that the object is missing.
>
> In BlueStore, there is another in-memory cache for onodes that can quickly
> detect a missing object. If the object isn't in the cache, the associated
> onode will be looked up within the backing RocksDB. If the RocksDB metadata
> scan for the object's onode fails since the object is missing, the -ENOENT
> error would be bubbled back to the client.
>
>
>
> On Tue, May 2, 2017 at 1:24 PM, Prashant Murthy 
> wrote:
>
>> I wanted to add that I was particularly interested about the behavior
>> with filestore, but was also curious how this works on bluestore.
>>
>> Prashant
>>
>> On Mon, May 1, 2017 at 10:04 PM, Prashant Murthy 
>> wrote:
>>
>>> Hi all,
>>>
>>> I was wondering what happens when reads are issued to an RBD device with
>>> no previously written data. Can somebody explain how such requests flow
>>> from rbd (client) into OSDs and whether any of these reads would hit the
>>> disks at all or whether OSD metadata would recognize that there is no data
>>> at the offsets requested and returns a bunch of zeros back to the client?
>>>
>>> Thanks,
>>> Prashant
>>>
>>> --
>>> Prashant Murthy
>>> Sr Director, Software Engineering | Salesforce
>>> Mobile: 919-961-3041 <(919)%20961-3041>
>>>
>>>
>>> --
>>>
>>
>>
>>
>> --
>> Prashant Murthy
>> Sr Director, Software Engineering | Salesforce
>> Mobile: 919-961-3041 <(919)%20961-3041>
>>
>>
>> --
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
>
> --
> Jason
>



-- 
Prashant Murthy
Sr Director, Software Engineering | Salesforce
Mobile: 919-961-3041


--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing replica size of a running pool

2017-05-03 Thread David Turner
Those are both things that people have done and both work.  Neither is
optimal, but both options work fine.  The best option is definitely to just
get a third node now, as you aren't going to be gaining additional usable
space from it later.  Your usable space between a 2 node size 2 cluster and
a 3 node size 3 cluster is identical.

If getting a third node is not possible, I would recommend a size 2
min_size 2 configuration.  You will block writes if either of your nodes or
any copy of your data is down, but you will not get into the inconsistent
state that can happen with min_size of 1 (and you can always set the
min_size of a pool to 1 on the fly to perform maintenance).  If you go with
the option to use a failure domain of OSDs instead of hosts and have size
3, then a single node going down will block writes into your cluster.  The
only thing you gain from this is having 3 physical copies of the data until you
get a third node, at the cost of a lot of backfilling when you change the crush rule.

A more complex option, and I think a better solution than your 2 options,
would be to create 2 hosts in your crush map for each physical host
and split the OSDs in each node evenly between them.  That way you can have
2 copies of data in a given node, but never all 3 copies.  You have your 3
copies of data and a guarantee that not all 3 are on the same host.
Assuming min_size of 2, you will still block writes if you restart either
node.

If modifying the hosts in your crush map doesn't sound daunting, then I
would recommend going that route... For most people that is more complex
than they'd like to go and I would say size 2 min_size 2 would be the way
to go until you get a third node.  #my2cents
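To put rough commands to that (pool and host names are made up, and this is
only a sketch), the two approaches look like:

  # simple approach: size 2 / min_size 2 on an existing pool
  ceph osd pool set rbd size 2
  ceph osd pool set rbd min_size 2

  # split-host approach: two crush "hosts" per physical node,
  # with the OSDs divided between them, then size 3 / min_size 2
  ceph osd crush add-bucket node1-a host
  ceph osd crush add-bucket node1-b host
  ceph osd crush move node1-a root=default
  ceph osd crush move node1-b root=default
  ceph osd crush set osd.0 1.0 host=node1-a
  ceph osd crush set osd.1 1.0 host=node1-b
  # ...repeat for the remaining OSDs and for node2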

On Wed, May 3, 2017 at 12:41 PM Maximiliano Venesio 
wrote:

> Guys hi.
>
> I have a Jewel cluster composed of two storage servers which are
> configured in
> the crush map as different buckets to store data.
>
> I have to configure two new pools on this cluster, with the certainty
> that I'll have to add more servers in the short term.
>
> Taking into account that the recommended replication size for every
> pool is 3, I'm thinking of two possible scenarios.
>
> 1) Set the replica size to 2 now, and in the future change the replica
> size to 3 on a running pool.
> Is that possible? Can i have serious issues with the rebalance of the
> pgs, changing the pool size on the fly ?
>
> 2) Set the replica size to 3, and change the ruleset to replicate by
> OSD instead of HOST now, and in the future change this rule in the
> ruleset to replicate again by host in a running pool.
> Is that possible? Can i have serious issues with the rebalance of the
> pgs, changing the ruleset in a running pool ?
>
> Which do you think is the best option ?
>
>
> Thanks in advance.
>
>
> Maximiliano Venesio
> Chief Cloud Architect | NUBELIU
> E-mail: massimo@nubeliu.comCell: +54 9 11 3770 1853
> <+54%209%2011%203770-1853>
> _
> www.nubeliu.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Changing replica size of a running pool

2017-05-03 Thread Maximiliano Venesio
Guys hi.

I have a Jewel cluster composed of two storage servers which are configured in
the crush map as different buckets to store data.

I have to configure two new pools on this cluster, with the certainty
that I'll have to add more servers in the short term.

Taking into account that the recommended replication size for every
pool is 3, I'm thinking of two possible scenarios.

1) Set the replica size to 2 now, and in the future change the replica
size to 3 on a running pool.
Is that possible? Can i have serious issues with the rebalance of the
pgs, changing the pool size on the fly ?

2) Set the replica size to 3, and change the ruleset to replicate by
OSD instead of HOST now, and in the future change this rule in the
ruleset to replicate again by host in a running pool.
Is that possible? Can i have serious issues with the rebalance of the
pgs, changing the ruleset in a running pool ?

Which do you think is the best option ?


Thanks in advance.


Maximiliano Venesio
Chief Cloud Architect | NUBELIU
E-mail: massimo@nubeliu.comCell: +54 9 11 3770 1853
_
www.nubeliu.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help! how to create multiple zonegroups in single realm?

2017-05-03 Thread yiming xie
./bin/radosgw-admin -c ./run/c2/ceph.conf period update --commit 
--rgw-zonegroup=default --rgw-zone=default
2017-05-03 05:34:23.886966 7f9e6e0036c0  0 failed reading zonegroup info: ret 
-2 (2) No such file or directory
couldn't init storage provider

I will open this issue on the tracker.
 Thanks for your advice!

> On May 3, 2017, at 5:13 PM, Orit Wasserman wrote:
> 
> 
> 
> On Wed, May 3, 2017 at 12:05 PM, yiming xie  > wrote:
>  Cluster c2 have not zone:us-1
> 
> ./bin/radosgw-admin -c ./run/c2/ceph.conf period update --commit 
> --rgw-zonegroup=us --rgw-zone=us-1
> 
> try --rgw-zonegroup==default --rgw-zone=default.
> 
> Could you open a tracker (tracker.ceph.com ) issue 
> about this ,include all the configuration and all the commands you tried.
> it should have worked without parameters or with just zone us-2.
> 
> Orit
> 
> 
> 2017-05-03 05:01:30.219721 7efcff2606c0  1 Cannot find zone id= (name=us-1), 
> switching to local zonegroup configuration
> 2017-05-03 05:01:30.222956 7efcff2606c0 -1 Cannot find zone id= (name=us-1)
> couldn't init storage provider
> 
> ./bin/radosgw-admin -c ./run/c2/ceph.conf zone get --rgw-zone=us-1 
> unable to initialize zone: (2) No such file or directory
> 
> ./bin/radosgw-admin -c ./run/c2/ceph.conf zone list
> {
> "default_info": "0cae32e6-82d5-489f-adf5-99e92c70f86f",
> "zones": [
> "us-2"
> ]
> }
> 
> ./bin/radosgw-admin -c ./run/c2/ceph.conf zonegroup list
> {
> "default_info": "6cc7889a-3f00-4fcd-b4dd-0f5951fbd561",
> "zonegroups": [
> "us2",
> "us"
> ]
> }
> 
> 
>> On May 3, 2017, at 4:57 PM, Orit Wasserman wrote:
>> 
>> 
>> 
>> On Wed, May 3, 2017 at 11:51 AM, yiming xie > > wrote:
>> I run
>> ./bin/radosgw-admin -c ./run/c2/ceph.conf period update --commit
>> 
>> 
>> try adding --rgw-zonegroup=us1 --rgw-zone=us-1
>>  
>> the error:
>> 2017-05-03 04:46:10.298103 7fdb2e4226c0  1 Cannot find zone 
>> id=0cae32e6-82d5-489f-adf5-99e92c70f86f (name=us-2), switching to local 
>> zonegroup configuration
>> 2017-05-03 04:46:10.300145 7fdb2e4226c0 -1 Cannot find zone 
>> id=0cae32e6-82d5-489f-adf5-99e92c70f86f (name=us-2)
>> couldn't init storage provider
>> 
>> 
>>> On May 3, 2017, at 4:45 PM, Orit Wasserman wrote:
>>> 
>>> 
>>> 
>>> On Wed, May 3, 2017 at 11:36 AM, yiming xie >> > wrote:
>>> Hi Orit:
>>>   Thanks for your reply.
>>> 
>>>  when I recreate secondary zone group, there is still a error!
>>> 
>>>   radosgw-admin realm pull --url=http://localhost:8001 
>>>  --access-key=$SYSTEM_ACCESS_KEY 
>>> --secret=$SYSTEM_SECRET_KEY --default
>>>   radosgw-admin period pull --url=http://localhost:8001 
>>>  --access-key=$SYSTEM_ACCESS_KEY 
>>> --secret=$SYSTEM_SECRET_KEY --default
>>>   radosge-admin zonegroup create --rgw-zonegroup=us2 
>>> --endpoints=http://localhost:8002  --rgw-realm=earth
>>>   radosgw-admin zone create --rgw-zonegroup=us2 --rgw-zone=us-2 
>>> --endpoints=http://localhost:8002  --master 
>>> --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY
>>>  radosgw-admin period update --commit --rgw-zonegroup=us2 --rgw-zone=us-2
>>> 
>>> Remove  "--rgw-zonegroup=us2 --rgw-zone=us-2" - they don't exist yet
>>> 
>>>  
>>> 2017-05-03 04:31:57.319796 7f87dab4e6c0  1 error read_lastest_epoch 
>>> .rgw.root:periods.894eeaf6-4c1f-4478-88eb-413e58f1a4a4:staging.latest_epoch
>>> Sending period to new master zone cb8fd49d-9789-4cb3-8010-2523bf46a650
>>> could not find connection for zone or zonegroup id: 
>>> cb8fd49d-9789-4cb3-8010-2523bf46a650
>>> request failed: (2) No such file or directory
>>> failed to commit period: (2) No such file or directory
>>> 
>>> ceph version 11.1.0-7421-gd25b355 (d25b3550dae243f6868a526632e97405866e76d4)
>>> 
>>> 
>>>   
>>> 
 On May 3, 2017, at 4:07 PM, Orit Wasserman wrote:
 
 Hi,
 
 On Wed, May 3, 2017 at 11:00 AM, yiming xie > wrote:
 Hi orit:
 I try to create multiple zonegroups in single realm, but failed. Pls tell 
 me the correct way about creating multiple zonegroups
 Tks a lot!!
 
>> 1.create the firstr zone group on the c1 cluster
>> ./bin/radosgw-admin -c ./run/c1/ceph.conf realm create --rgw-realm=earth 
>> --default
>> ./bin/radosgw-admin -c ./run/c1/ceph.conf zonegroup create 
>> --rgw-zonegroup=us --endpoints=http://localhost:8001 
>>  --master --default
>> 
>> ./bin/radosgw-admin -c ./run/c1/ceph.conf zone create --rgw-zonegroup=us 
>> --rgw-zone=us-1 --endpoints=http://localhost:8001 
>>  

[ceph-users] kernel BUG at fs/ceph/inode.c:1197

2017-05-03 Thread James Poole
Hello,

We currently have a ceph cluster supporting an Openshift cluster using
cephfs and dynamic rbd provisioning. The client nodes appear to be
triggering a kernel bug and are rebooting unexpectedly with the same
message each time. Clients are running CentOS 7:

  KERNEL: /usr/lib/debug/lib/modules/3.10.0-514.10.2.el7.x86_64/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2017-05-02-09:06:17/vmcore  [PARTIAL
DUMP]
CPUS: 16
DATE: Tue May  2 09:06:15 2017
  UPTIME: 00:43:14
LOAD AVERAGE: 1.52, 1.40, 1.48
   TASKS: 7408
NODENAME: [redacted]
 RELEASE: 3.10.0-514.10.2.el7.x86_64
 VERSION: #1 SMP Fri Mar 3 00:04:05 UTC 2017
 MACHINE: x86_64  (1997 Mhz)
  MEMORY: 32 GB
   PANIC: "kernel BUG at fs/ceph/inode.c:1197!"
 PID: 133
 COMMAND: "kworker/1:1"
TASK: 8801399bde20  [THREAD_INFO: 880138d0c000]
 CPU: 1
   STATE: TASK_RUNNING (PANIC)

[ 2596.061470] [ cut here ]
[ 2596.061499] kernel BUG at fs/ceph/inode.c:1197!
[ 2596.061516] invalid opcode:  [#1] SMP
[ 2596.061535] Modules linked in: cfg80211 rfkill binfmt_misc veth ext4
mbcache jbd2 rbd xt_statistic xt_nat xt_recent ipt_REJECT nf_reject_ipv4
xt_mark ipt_MASQUERADE nf_nat_masquerad
e_ipv4 xt_addrtype br_netfilter bridge stp llc dm_thin_pool
dm_persistent_data dm_bio_prison dm_bufio loop fuse ceph libceph
dns_resolver vport_vxlan vxlan ip6_udp_tunnel udp_tunnel op
envswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6 iptable_nat
nf_nat_ipv4 nf_nat xt_limit nf_log_ipv4 vmw_vsock_vmci_transport
nf_log_common xt_LOG vsock nf_conntrack_ipv4 nf_defr
ag_ipv4 xt_comment xt_multiport xt_conntrack nf_conntrack iptable_filter
intel_powerclamp coretemp iosf_mbi crc32_pclmul ghash_clmulni_intel
aesni_intel lrw gf128mul glue_helper ablk_h
elper cryptd ppdev vmw_balloon pcspkr sg vmw_vmci shpchp i2c_piix4
parport_pc
[ 2596.061875]  parport nfsd nfs_acl lockd auth_rpcgss grace sunrpc
ip_tables xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_generic
ata_generic pata_acpi vmwgfx drm_kms_helper
 syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul
crct10dif_common mptspi crc32c_intel drm ata_piix scsi_transport_spi
serio_raw mptscsih libata mptbase vmxnet3 i2c_c
ore fjes dm_mirror dm_region_hash dm_log dm_mod
[ 2596.062042] CPU: 1 PID: 133 Comm: kworker/1:1 Not tainted
3.10.0-514.10.2.el7.x86_64 #1
[ 2596.062070] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
Desktop Reference Platform, BIOS 6.00 09/17/2015
[ 2596.062118] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[ 2596.062140] task: fffdf8801399be20 ti: 880138d0c000 task.ti:
880138d0c000
[ 2596.062166] RIP: 0010:[]  []
ceph_fill_trace+0x893/0xa00 [ceph]
[ 2596.062209] RSP: :880138d0fb80  EFLAGS: 00010287
[ 2596.062230] RAX: 88083b079680 RBX: 8801efe86760 RCX:
880095e26c00
[ 2596.062257] RDX: 880003e8f2c0 RSI: 88053b4c0a08 RDI:
88053b4c0a00
[ 2596.062288] RBP: 880138d0fbf8 R08: 880003e8f2c0 R09:

[ 2596.062320] R10: 0001 R11: 8804256f3ac0 R12:
880121d15400
[ 2596.062351] R13: 880138dd4000 R14: 88007053f280 R15:
8807ee10f2c0
[ 2596.062379] FS:  () GS:88013b84()
knlGS:
[ 2596.062413] CS:  0010 DS:  ES:  CR0: 8005003b
[ 2596.062436] CR2: 7fe3bab2dcd0 CR3: 00042ebe CR4:
001407e0
[ 2596.062498] DR0:  DR1:  DR2:

[ 2596.062540] DR3:  DR6: 0ff0 DR7:
0400
[ 2596.062567] Stack:
[ 2596.062578]  880121d15778 880121d15718 880138d0fc50
880095e26e7a
[ 2596.062612]  880035c12400 88053b4c7800 3b4c0800
880138d0fbb8
[ 2596.062645]  880138d0fbb8 a5446715 88053b4c0800
88008238ee10
[ 2596.062681] Call Trace:
[ 2596.062703]  [] handle_reply+0x3e8/0xc80 [ceph]
[ 2596.062736]  [] dispatch+0xd9/0xaf0 [ceph]
[ 2596.062762]  [] ? kernel_recvmsg+0x3a/0x50
[ 2596.062790]  [] try_read+0x4bf/0x1220 [libceph]
[ 2596.062819]  [] ? try_write+0xa13/0xe60 [libceph]
[ 2596.062851]  [] ceph_con_workfn+0xb9/0x650 [libceph]
[ 2596.062878]  [] process_one_work+0x17b/0x470
[ 2596.062902]  [] worker_thread+0x126/0x410
[ 2596.062925]  [] ? rescuer_thread+0x460/0x460
[ 2596.062949]  [] kthread+0xcf/0xe0
[ 2596.064014]  [] ? kthread_create_on_node+0x140/0x140
[ 2596.065010]  [] ret_from_fork+0x58/0x90
[ 2596.065955]  [] ? kthread_create_on_node+0x140/0x140
[ 2596.066945] Code: e8 c3 2b d6 e0 e9 ca fa ff ff 4c 89 fa 48 c7 c6 07
d0 60 a0 48 c7 c7 50 24 61 a0 31 c0 e8 a6 2b d6 e0 e9 cd fa ff ff 0f 0b
0f 0b <0f> 0b 0f 0b 48 8b 83 c8 fc ff ff
 4c 8b 89 c8 fc ff ff 4c 89 fa
[ 2596.069127] RIP  [] ceph_fill_trace+0x893/0xa00 [ceph]
[ 2596.070120]  RSP 


Just before the above there are lots of messages similar to this from
all ceph node ips:
[  933.282441] [IPTABLES:INPUT] dropped IN=eno33557248 OUT=

[ceph-users] Spurious 'incorrect nilfs2 checksum' breaking ceph OSD

2017-05-03 Thread Matthew Vernon
Hi,

This has bitten us a couple of times now (such that we're considering
re-building util-linux with the nilfs2 code commented out), so I'm
wondering if anyone else has seen it [and noting the failure mode in
case anyone else is confused in future]

We see this with our setup of rotating media for the osd, NVMe partition
for journal.

What happens is that sometimes an osd refuses to start up, complaining
that /var/lib/ceph/osd/ceph-XXX/journal is missing.

inspecting that file will show it's a broken symlink to an entry in
/dev/disk/by-partuuid:

/var/lib/ceph/osd/ceph-388/journal: broken symbolic link to
/dev/disk/by-partuuid/d2ace848-7e2d-4395-a195-a4428631b333

If you inspect the relevant partition, you see that it has the matching
block id:

blkid /dev/nvme0n1p11
/dev/nvme0n1p11: PARTLABEL="ceph journal"
PARTUUID="d2ace848-7e2d-4395-a195-a4428631b333

And, if you look in syslog, you'll see this:

Jan  4 09:25:29 sto-3-3 systemd-udevd[107317]: incorrect nilfs2 checksum
on /dev/nvme0n1p11

The problem is that the nilfs2 checker is too promiscuous, looking for a
relatively short magic number (0x3434) in 2 different places (location
0x400, and (((part_size-512)/8)*512)). So sometimes you'll be unlucky
and have a ceph journal that matches, at which point the nilfs2 prober
finds an invalid checksum, and so systemd/udevd doesn't create the
/dev/disk/by-partuuid link.

You can work around this by making the symlink by hand when the failure
occurs; I also understand that the nilfs2 prober in util_linux 2.29 is
more robust (but that's not in any LTS distributions yet, so I've not
tested it).
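For the record, the by-hand workaround is just recreating the symlink udev
should have made, e.g. (using the device and uuid from the example above):

  ln -s ../../nvme0n1p11 \
      /dev/disk/by-partuuid/d2ace848-7e2d-4395-a195-a4428631b333
  systemctl start ceph-osd@388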

Regards,

Matthew

util-linux issue: https://github.com/karelzak/util-linux/issues/361
Ubuntu bug:
https://bugs.launchpad.net/ubuntu/+source/util-linux/+bug/1653936


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help! how to create multiple zonegroups in single realm?

2017-05-03 Thread Orit Wasserman
On Wed, May 3, 2017 at 12:13 PM, Orit Wasserman  wrote:

>
>
> On Wed, May 3, 2017 at 12:05 PM, yiming xie  wrote:
>
>>  Cluster c2 have not *zone:us-1*
>>
>> ./bin/radosgw-admin -c ./run/c2/ceph.conf period update --commit
>> --rgw-zonegroup=us --rgw-zone=us-1
>>
>
> try --rgw-zonegroup==default --rgw-zone=default.
>
> Could you open a tracker (tracker.ceph.com) issue about this ,include all
> the configuration and all the commands you tried.
> it should have worked without parameters or with just zone us-2.
>

Apparently there is already an issue and a fix:
http://tracker.ceph.com/issues/19554.
At the moment you need to use the url:
radosgw-admin period commit --url=http://localhost:8001
--access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY


> Orit
>
>
>> 2017-05-03 05:01:30.219721 7efcff2606c0  1 Cannot find zone id=
>> (name=us-1), switching to local zonegroup configuration
>> 2017-05-03 05:01:30.222956 7efcff2606c0 -1 Cannot find zone id=
>> (name=us-1)
>> couldn't init storage provider
>>
>> ./bin/radosgw-admin -c ./run/c2/ceph.conf zone get --rgw-zone=us-1
>> unable to initialize zone: (2) No such file or directory
>>
>> ./bin/radosgw-admin -c ./run/c2/ceph.conf zone list
>> {
>> "default_info": "0cae32e6-82d5-489f-adf5-99e92c70f86f",
>> "zones": [
>> "us-2"
>> ]
>> }
>>
>> ./bin/radosgw-admin -c ./run/c2/ceph.conf zonegroup list
>> {
>> "default_info": "6cc7889a-3f00-4fcd-b4dd-0f5951fbd561",
>> "zonegroups": [
>> "us2",
>> "us"
>> ]
>> }
>>
>>
>> On May 3, 2017, at 4:57 PM, Orit Wasserman wrote:
>>
>>
>>
>> On Wed, May 3, 2017 at 11:51 AM, yiming xie  wrote:
>>
>>> I run
>>> ./bin/radosgw-admin -c ./run/c2/ceph.conf period update --commit
>>>
>>>
>> try adding --rgw-zonegroup=us1 --rgw-zone=us-1
>>
>>
>>> the error:
>>> 2017-05-03 04:46:10.298103 7fdb2e4226c0  1 Cannot find zone
>>> id=0cae32e6-82d5-489f-adf5-99e92c70f86f (name=us-2), switching to local
>>> zonegroup configuration
>>> 2017-05-03 04:46:10.300145 7fdb2e4226c0 -1 Cannot find zone
>>> id=0cae32e6-82d5-489f-adf5-99e92c70f86f (name=us-2)
>>> couldn't init storage provider
>>>
>>>
>>> On May 3, 2017, at 4:45 PM, Orit Wasserman wrote:
>>>
>>>
>>>
>>> On Wed, May 3, 2017 at 11:36 AM, yiming xie  wrote:
>>>
 Hi Orit:
   Thanks for your reply.

  when I recreate secondary zone group, there is still a error!

   radosgw-admin realm pull --url=http://localhost:8001 
 --access-key=$SYSTEM_ACCESS_KEY
 --secret=$SYSTEM_SECRET_KEY --default
   radosgw-admin period pull --url=http://localhost:8001 
 --access-key=$SYSTEM_ACCESS_KEY
 --secret=$SYSTEM_SECRET_KEY --default
   radosge-admin zonegroup create --rgw-zonegroup=us2 --endpoints=
 http://localhost:8002 --rgw-realm=earth
   radosgw-admin zone create --rgw-zonegroup=us2 --rgw-zone=us-2
 --endpoints=http://localhost:8002 --master
 --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY
  radosgw-admin period update --commit --rgw-zonegroup=us2
 --rgw-zone=us-2

>>>
>>> Remove  "--rgw-zonegroup=us2 --rgw-zone=us-2" - they don't exist yet
>>>
>>>

>>> 2017-05-03 04:31:57.319796 7f87dab4e6c0  1 error read_lastest_epoch
 .rgw.root:periods.894eeaf6-4c1f-4478-88eb-413e58f1a4a4:stagi
 ng.latest_epoch
 Sending period to new master zone cb8fd49d-9789-4cb3-8010-2523bf46a650
 could not find connection for zone or zonegroup id:
 cb8fd49d-9789-4cb3-8010-2523bf46a650
 request failed: (2) No such file or directory
 failed to commit period: (2) No such file or directory

 ceph version 11.1.0-7421-gd25b355 (d25b3550dae243f6868a526632e97
 405866e76d4)




 On May 3, 2017, at 4:07 PM, Orit Wasserman wrote:

 Hi,

 On Wed, May 3, 2017 at 11:00 AM, yiming xie  wrote:

> Hi orit:
> I try to create multiple zonegroups in single realm, but failed. Pls
> tell me the correct way about creating multiple zonegroups
> Tks a lot!!
>
> 1.create the firstr zone group on the c1 cluster
> ./bin/radosgw-admin -c ./run/c1/ceph.conf realm create
> --rgw-realm=earth --default
> ./bin/radosgw-admin -c ./run/c1/ceph.conf zonegroup create
> --rgw-zonegroup=us --endpoints=http://localhost:8001 --master
> --default
>
> ./bin/radosgw-admin -c ./run/c1/ceph.conf zone create
> --rgw-zonegroup=us --rgw-zone=us-1 --endpoints=http://localhost:8001 
> --master
> --default --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY
> ./bin/radosgw-admin -c ./run/c1/ceph.conf user create --uid=zone.user
> --display-name="Zone User" --access-key=$SYSTEM_ACCESS_KEY
> --secret=$SYSTEM_SECRET_KEY --system
> ./bin/radosgw-admin -c ./run/c1/ceph.conf period update --commit
> //start rgw
> ./bin/radosgw -c ./run/c1/ceph.conf 

Re: [ceph-users] cephfs metadata damage and scrub error

2017-05-03 Thread James Eckersall
Hi David,

Thanks for the reply, it's appreciated.
We're going to upgrade the cluster to Kraken and see if that fixes the
metadata issue.
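In case it helps anyone else hitting this, the omap comparison David
suggests below can be done with something along these lines (the omap path
is a guess for a filestore OSD; stop the OSD first):

  systemctl stop ceph-osd@3
  ceph-osdomap-tool --omap-path /var/lib/ceph/osd/ceph-3/current/omap \
      --command dump-objects-with-keys > osd3-omap.txt
  # repeat for OSDs 10, 11 and 23, then diff the outputs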

J

On 2 May 2017 at 17:00, David Zafman  wrote:

>
> James,
>
> You have an omap corruption.  It is likely caused by a bug which has
> already been identified.  A fix for that problem is available but it is
> still pending backport for the next Jewel point release.  All 4 of your
> replicas have different "omap_digest" values.
>
> Instead of the xattrs the ceph-osdomap-tool --command
> dump-objects-with-keys output from OSDs 3, 10, 11, 23 would be interesting
> to compare.
>
> ***WARNING*** Please backup your data before doing any repair attempts.
>
> If you can upgrade to Kraken v11.2.0, it will auto repair the omaps on
> ceph-osd start up.  It will likely still require a ceph pg repair to make
> the 4 replicas consistent with each other.  The final result may be the
> reappearance of removed MDS files in the directory.
>
> If you can recover the data, you could remove the directory entirely and
> rebuild it.  The original bug was triggered during omap deletion typically
> in a large directory which corresponds to an individual unlink in cephfs.
>
> If you can build a branch in github to get the newer ceph-osdomap-tool you
> could try to use it to repair the omaps.
>
> David
>
>
> On 5/2/17 5:05 AM, James Eckersall wrote:
>
> Hi,
>
> I'm having some issues with a ceph cluster.  It's an 8 node cluster running
> Jewel ceph-10.2.7-0.el7.x86_64 on CentOS 7.
> This cluster provides RBDs and a CephFS filesystem to a number of clients.
>
> ceph health detail is showing the following errors:
>
> pg 2.9 is active+clean+inconsistent, acting [3,10,11,23]
> 1 scrub errors
> mds0: Metadata damage detected
>
>
> The pg 2.9 is in the cephfs_metadata pool (id 2).
>
> I've looked at the OSD logs for OSD 3, which is the primary for this PG,
> but the only thing that appears relating to this PG is the following:
>
> log_channel(cluster) log [ERR] : 2.9 deep-scrub 1 errors
>
> After initiating a ceph pg repair 2.9, I see the following in the primary
> OSD log:
>
> log_channel(cluster) log [ERR] : 2.9 repair 1 errors, 0 fixed
> log_channel(cluster) log [ERR] : 2.9 deep-scrub 1 errors
>
>
> I found the below command in a previous ceph-users post.  Running this
> returns the following:
>
> # rados list-inconsistent-obj 2.9
> {"epoch":23738,"inconsistents":[{"object":{"name":"1411194.","nspace":"","locator":"","snap":"head","version":14737091},"errors":["omap_digest_mismatch"],"union_shard_errors":[],"selected_object_info":"2:9758b358:::1411194.:head(33456'14737091
> mds.0.214448:248532 dirty|omap|data_digest s 0 uv 14737091 dd
> )","shards":[{"osd":3,"errors":[],"size":0,"omap_digest":"0x6748eef3","data_digest":"0x"},{"osd":10,"errors":[],"size":0,"omap_digest":"0xa791d5a4","data_digest":"0x"},{"osd":11,"errors":[],"size":0,"omap_digest":"0x53f46ab0","data_digest":"0x"},{"osd":23,"errors":[],"size":0,"omap_digest":"0x97b80594","data_digest":"0x"}]}]}
>
>
> So from this, I think that the object in PG 2.9 with the problem is
> 1411194..
>
> This is what I see on the filesystem on the 4 OSD's this PG resides on:
>
> -rw-r--r--. 1 ceph ceph 0 Apr 27 12:31
> /var/lib/ceph/osd/ceph-3/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/1411194.__head_1ACD1AE9__2
> -rw-r--r--. 1 ceph ceph 0 Apr 15 22:05
> /var/lib/ceph/osd/ceph-10/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/1411194.__head_1ACD1AE9__2
> -rw-r--r--. 1 ceph ceph 0 Apr 15 22:07
> /var/lib/ceph/osd/ceph-11/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/1411194.__head_1ACD1AE9__2
> -rw-r--r--. 1 ceph ceph 0 Apr 16 03:58
> /var/lib/ceph/osd/ceph-23/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/1411194.__head_1ACD1AE9__2
>
> The extended attrs are as follows, although I have no idea what any of them
> mean.
>
> # file:
> var/lib/ceph/osd/ceph-11/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/1411194.__head_1ACD1AE9__2
> user.ceph._=0sDwj5BAM1ABQxMDAwMDQxMTE5NC4wMDAwMDAwMP7/6RrNGgAAAgAGAxwCAP8AAP//ABUn4QAAu4IAAK4m4QAAu4IAAAICFQIAAOSZDAAAsEUDjUoIWUgWsQQCAhUVJ+EAABwAAACNSghZESm8BP///w==
> user.ceph._@1=0s//8=
> user.ceph._layout=0sAgIY//8A
> user.ceph._parent=0sBQRPAQAAlBFBAAABAAAIAgIjjxFBAAABAAAPdHViZWFtYXRldXIubmV0qdgCAh0AAAB/EUEAAAEAAAkAAAB3cC1yb2NrZXREAAICGQAAABYNQQAAAQAABQAAAGNhY2hlUgACAh4QDUEAAAEAAAoAAAB3cC1jb250ZW50NAMCAhgNDUEAAAEAAAQAAABodG1sIAECAikAAADagTMAAAEAABUAAABuZ2lueC1waHA3LWNsdmdmLWRhdGGJAAICMwAAADkAAQ==
> user.ceph._parent@1
> 

[ceph-users] CDM tonight @ 9p EDT

2017-05-03 Thread Patrick McGarry
Hey cephers,

Just a reminder that the monthly Ceph Developer call is tonight at 9p EDT
as we're on an APAC-friendly month.

http://wiki.ceph.com/Planning

Please add any ongoing work to the wiki so that we can discuss. Thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Increase PG or reweight OSDs?

2017-05-03 Thread Luis Periquito
TL;DR: add the OSDs and then split the PGs

They are different commands for different situations...

Changing the weight is for bringing in a bigger number of nodes/devices.
Depending on the size of the cluster, the size of the devices, how busy it
is, and by how much you're growing it, the impact will vary.

Usually people add the devices and slowly increase the OSDs' weight to
gradually increase the usage and data on them. There are some ways to
improve performance and/or reduce the impact of that operation, like
the number of allowed concurrent backfills and the op/backfill priority
settings.

The other one will take *all* of the objects in the existing PGs and
redistribute them into another set of PGs.

The number of PGs doesn't change with the number of OSDs, so the more
OSDs you have when you do the splitting the better - the amount of work
is the same, so the more workers there are, the less each one has to do.

If impact/IO is important - for example the cluster is busy - then you can
additionally set the noscrub/nodeep-scrub flags.
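As a sketch of those knobs (the values are only examples):

  # reduce backfill/recovery impact while the new OSDs fill up
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-op-priority 1'
  # optionally stop scrubbing while backfilling/splitting runs
  ceph osd set noscrub
  ceph osd set nodeep-scrub
  # split the PGs once the new OSDs are in and weighted
  ceph osd pool set <pool> pg_num <new_value>
  ceph osd pool set <pool> pgp_num <new_value>
  # and re-enable scrubbing afterwards
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub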

On Tue, May 2, 2017 at 7:16 AM, M Ranga Swami Reddy
 wrote:
> Hello,
> I have added 5 new Ceph OSD nodes to my ceph cluster. Here, I wanted
> to increase PG/PGP numbers of pools based new OSDs count. Same time
> need to increase the newly added OSDs weight from 0 -> 1.
>
> My question is:
> Do I need to increase the PG/PGP num increase and then reweight the OSDs?
> Or
> Reweight the OSDs first and then increase the PG/PGP num. of pool(s)?
>
> Both will cause a rebalance... but I wanted to understand which one
> is preferable to do on a running cluster.
>
> Thanks
> Swami
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Increase PG or reweight OSDs?

2017-05-03 Thread M Ranga Swami Reddy
+ Ceph-devel

On Tue, May 2, 2017 at 11:46 AM, M Ranga Swami Reddy
 wrote:
> Hello,
> I have added 5 new Ceph OSD nodes to my ceph cluster. Here, I wanted
> to increase PG/PGP numbers of pools based new OSDs count. Same time
> need to increase the newly added OSDs weight from 0 -> 1.
>
> My question is:
> Do I need to increase the PG/PGP num increase and then reweight the OSDs?
> Or
> Reweight the OSDs first and then increase the PG/PGP num. of pool(s)?
>
> Both will cause a rebalance... but I wanted to understand which one
> is preferable to do on a running cluster.
>
> Thanks
> Swami
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?

2017-05-03 Thread Radoslaw Zarzynski
Hello Łukasz,

Thanks for your testing, and sorry for my mistake. It looks like two commits
need to be reverted to get the previous behaviour:

The already mentioned one:
  https://github.com/ceph/ceph/commit/c9445faf7fac2ccb8a05b53152c0ca16d7f4c6d0
Its dependency:
  https://github.com/ceph/ceph/commit/b72fc1b820ede3cd186d887d9d30f7f91fe3764b

They have been merged in the same pull request:
  https://github.com/ceph/ceph/pull/11760
and form the difference visible between v10.2.5 and v10.2.6 in the matter
of "in_hosted_domain" handling:
  https://github.com/ceph/ceph/blame/v10.2.5/src/rgw/rgw_rest.cc#L1773
  https://github.com/ceph/ceph/blame/v10.2.6/src/rgw/rgw_rest.cc#L1781-L1782

I'm really not sure we want to revert them. Still, it may be that they just
unhid a misconfiguration issue while fixing the problems we had with
handling of virtual hosted buckets.
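If it does turn out to be configuration, the usual suspect for virtual
hosted buckets is the rgw_dns_name setting, i.e. something like this in
ceph.conf on the gateway (the hostname is just an example):

  [client.rgw.gateway]
  rgw dns name = s3.example.com

so that RGW can recognise the Host header of bucket.s3.example.com-style
requests.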

Regards,
Radek

On Wed, May 3, 2017 at 3:12 AM, Łukasz Jagiełło
 wrote:
> Hi,
>
> I tried today revert [1] from 10.2.7 but the problem is still there even
> without the change. Revert to 10.2.5 fix the issue instantly.
>
> https://github.com/ceph/ceph/commit/c9445faf7fac2ccb8a05b53152c0ca16d7f4c6d0
>
> On Thu, Apr 27, 2017 at 4:53 AM, Radoslaw Zarzynski
>  wrote:
>>
>> Bingo! From the 10.2.5-admin:
>>
>>   GET
>>
>>   Thu, 27 Apr 2017 07:49:59 GMT
>>   /
>>
>> And also:
>>
>>   2017-04-27 09:49:59.117447 7f4a90ff9700 20 subdomain= domain=
>> in_hosted_domain=0 in_hosted_domain_s3website=0
>>   2017-04-27 09:49:59.117449 7f4a90ff9700 20 final domain/bucket
>> subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0
>> s->info.domain= s->info.request_uri=/
>>
>> The most interesting part is the "final ... in_hosted_domain=0".
>> It looks we need to dig around RGWREST::preprocess(),
>> rgw_find_host_in_domains() & company.
>>
>> There is a commit introduced in v10.2.6 that touches this area [1].
>> I'm definitely not saying it's the root cause. It might be that a change
>> in the code just unhidden a configuration issue [2].
>>
>> I will talk about the problem on the today's sync-up.
>>
>> Thanks for the logs!
>> Regards,
>> Radek
>>
>> [1]
>> https://github.com/ceph/ceph/commit/c9445faf7fac2ccb8a05b53152c0ca16d7f4c6d0
>> [2] http://tracker.ceph.com/issues/17440
>>
>> On Thu, Apr 27, 2017 at 10:11 AM, Ben Morrice  wrote:
>> > Hello Radek,
>> >
>> > Thank-you for your analysis so far! Please find attached logs for both
>> > the
>> > admin user and a keystone backed user from 10.2.5 (same host as before,
>> > I
>> > have simply downgraded the packages). Both users can authenticate and
>> > list
>> > buckets on 10.2.5.
>> >
>> > Also - I tried version 10.2.6 and see the same behavior as 10.2.7, so
>> > the
>> > bug i'm hitting looks like it was introduced in 10.2.6
>> >
>> > Kind regards,
>> >
>> > Ben Morrice
>> >
>> > __
>> > Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670
>> > EPFL / BBP
>> > Biotech Campus
>> > Chemin des Mines 9
>> > 1202 Geneva
>> > Switzerland
>> >
>> > On 27/04/17 04:45, Radoslaw Zarzynski wrote:
>> >>
>> >> Thanks for the logs, Ben.
>> >>
>> >> It looks that two completely different authenticators have failed:
>> >> the local, RADOS-backed auth (admin.txt) and Keystone-based
>> >> one as well. In the second case I'm pretty sure that Keystone has
>> >> rejected [1][2] to authenticate provided signature/StringToSign.
>> >> RGW tried to fallback to the local auth which obviously didn't have
>> >> any chance as the credentials were stored remotely. This explains
>> >> the presence of "error reading user info" in the user-keystone.txt.
>> >>
>> >> What is common for both scenarios are the low-level things related
>> >> to StringToSign crafting/signature generation at RadosGW's side.
>> >> Following one has been composed for the request from admin.txt:
>> >>
>> >>GET
>> >>
>> >>
>> >>Wed, 26 Apr 2017 09:18:42 GMT
>> >>/bbpsrvc15.cscs.ch/
>> >>
>> >> If you could provide a similar log from v10.2.5, I would be really
>> >> grateful.
>> >>
>> >> Regards,
>> >> Radek
>> >>
>> >> [1]
>> >>
>> >> https://github.com/ceph/ceph/blob/v10.2.7/src/rgw/rgw_rest_s3.cc#L3269-L3272
>> >> [2] https://github.com/ceph/ceph/blob/v10.2.7/src/rgw/rgw_common.h#L170
>> >>
>> >> On Wed, Apr 26, 2017 at 11:29 AM, Morrice Ben 
>> >> wrote:
>> >>>
>> >>> Hello Radek,
>> >>>
>> >>> Please find attached the failed request for both the admin user and a
>> >>> standard user (backed by keystone).
>> >>>
>> >>> Kind regards,
>> >>>
>> >>> Ben Morrice
>> >>>
>> >>> __
>> >>> Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670
>> >>> EPFL BBP
>> >>> Biotech Campus
>> >>> Chemin des Mines 9
>> >>> 1202 Geneva
>> >>> Switzerland
>> >>>
>> >>> 
>> >>> From: Radoslaw 

Re: [ceph-users] Intel power tuning - 30% throughput performance increase

2017-05-03 Thread Blair Bethwaite
On 3 May 2017 at 19:07, Dan van der Ster  wrote:
> Whether cpu_dma_latency should be 0 or 1, I'm not sure yet. I assume
> your 30% boost was when going from throughput-performance to
> dma_latency=0, right? I'm trying to understand what is the incremental
> improvement from 1 to 0.

Probably minimal, given that represents a state-transition latency of
only 1us. Presumably the main issue is when the CPU can drop
into the lower states and the compounding impact of that over time. I
will do some simple characterisation of that over the next couple of
weeks and report back...
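In the meantime, for anyone who wants to check what their machines are
doing, something like this shows the currently requested latency and pins
it via tuned (profile name as shipped with RHEL/CentOS):

  # the file holds a little-endian 32-bit integer
  hexdump -C /dev/cpu_dma_latency
  # the latency-performance profile keeps /dev/cpu_dma_latency open
  # with a low value on your behalf
  tuned-adm profile latency-performance
  tuned-adm active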

-- 
Cheers,
~Blairo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help! how to create multiple zonegroups in single realm?

2017-05-03 Thread Orit Wasserman
On Wed, May 3, 2017 at 12:05 PM, yiming xie  wrote:

>  Cluster c2 have not *zone:us-1*
>
> ./bin/radosgw-admin -c ./run/c2/ceph.conf period update --commit
> --rgw-zonegroup=us --rgw-zone=us-1
>

try --rgw-zonegroup==default --rgw-zone=default.

Could you open a tracker (tracker.ceph.com) issue about this ,include all
the configuration and all the commands you tried.
it should have worked without parameters or with just zone us-2.

Orit


> 2017-05-03 05:01:30.219721 7efcff2606c0  1 Cannot find zone id=
> (name=us-1), switching to local zonegroup configuration
> 2017-05-03 05:01:30.222956 7efcff2606c0 -1 Cannot find zone id= (name=us-1)
> couldn't init storage provider
>
> ./bin/radosgw-admin -c ./run/c2/ceph.conf zone get --rgw-zone=us-1
> unable to initialize zone: (2) No such file or directory
>
> ./bin/radosgw-admin -c ./run/c2/ceph.conf zone list
> {
> "default_info": "0cae32e6-82d5-489f-adf5-99e92c70f86f",
> "zones": [
> "us-2"
> ]
> }
>
> ./bin/radosgw-admin -c ./run/c2/ceph.conf zonegroup list
> {
> "default_info": "6cc7889a-3f00-4fcd-b4dd-0f5951fbd561",
> "zonegroups": [
> "us2",
> "us"
> ]
> }
>
>
> On May 3, 2017, at 4:57 PM, Orit Wasserman wrote:
>
>
>
> On Wed, May 3, 2017 at 11:51 AM, yiming xie  wrote:
>
>> I run
>> ./bin/radosgw-admin -c ./run/c2/ceph.conf period update --commit
>>
>>
> try adding --rgw-zonegroup=us1 --rgw-zone=us-1
>
>
>> the error:
>> 2017-05-03 04:46:10.298103 7fdb2e4226c0  1 Cannot find zone
>> id=0cae32e6-82d5-489f-adf5-99e92c70f86f (name=us-2), switching to local
>> zonegroup configuration
>> 2017-05-03 04:46:10.300145 7fdb2e4226c0 -1 Cannot find zone
>> id=0cae32e6-82d5-489f-adf5-99e92c70f86f (name=us-2)
>> couldn't init storage provider
>>
>>
>> 在 2017年5月3日,下午4:45,Orit Wasserman  写道:
>>
>>
>>
>> On Wed, May 3, 2017 at 11:36 AM, yiming xie  wrote:
>>
>>> Hi Orit:
>>>   Thanks for your reply.
>>>
>>>  when I recreate secondary zone group, there is still a error!
>>>
>>>   radosgw-admin realm pull --url=http://localhost:8001 
>>> --access-key=$SYSTEM_ACCESS_KEY
>>> --secret=$SYSTEM_SECRET_KEY --default
>>>   radosgw-admin period pull --url=http://localhost:8001 
>>> --access-key=$SYSTEM_ACCESS_KEY
>>> --secret=$SYSTEM_SECRET_KEY --default
>>>   radosge-admin zonegroup create --rgw-zonegroup=us2 --endpoints=
>>> http://localhost:8002 --rgw-realm=earth
>>>   radosgw-admin zone create --rgw-zonegroup=us2 --rgw-zone=us-2
>>> --endpoints=http://localhost:8002 --master
>>> --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY
>>>  radosgw-admin period update --commit --rgw-zonegroup=us2 --rgw-zone=us-2
>>>
>>
>> Remove  "--rgw-zonegroup=us2 --rgw-zone=us-2" - they don't exist yet
>>
>>
>>>
>> 2017-05-03 04:31:57.319796 7f87dab4e6c0  1 error read_lastest_epoch
>>> .rgw.root:periods.894eeaf6-4c1f-4478-88eb-413e58f1a4a4:stagi
>>> ng.latest_epoch
>>> Sending period to new master zone cb8fd49d-9789-4cb3-8010-2523bf46a650
>>> could not find connection for zone or zonegroup id:
>>> cb8fd49d-9789-4cb3-8010-2523bf46a650
>>> request failed: (2) No such file or directory
>>> failed to commit period: (2) No such file or directory
>>>
>>> ceph version 11.1.0-7421-gd25b355 (d25b3550dae243f6868a526632e97
>>> 405866e76d4)
>>>
>>>
>>>
>>>
>>> 在 2017年5月3日,下午4:07,Orit Wasserman  写道:
>>>
>>> Hi,
>>>
>>> On Wed, May 3, 2017 at 11:00 AM, yiming xie  wrote:
>>>
 Hi orit:
 I try to create multiple zonegroups in single realm, but failed. Pls
 tell me the correct way about creating multiple zonegroups
 Tks a lot!!

 1.create the firstr zone group on the c1 cluster
 ./bin/radosgw-admin -c ./run/c1/ceph.conf realm create
 --rgw-realm=earth --default
 ./bin/radosgw-admin -c ./run/c1/ceph.conf zonegroup create
 --rgw-zonegroup=us --endpoints=http://localhost:8001 --master --default

 ./bin/radosgw-admin -c ./run/c1/ceph.conf zone create
 --rgw-zonegroup=us --rgw-zone=us-1 --endpoints=http://localhost:8001 
 --master
 --default --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY
 ./bin/radosgw-admin -c ./run/c1/ceph.conf user create --uid=zone.user
 --display-name="Zone User" --access-key=$SYSTEM_ACCESS_KEY
 --secret=$SYSTEM_SECRET_KEY --system
 ./bin/radosgw-admin -c ./run/c1/ceph.conf period update --commit
 //start rgw
 ./bin/radosgw -c ./run/c1/ceph.conf --log-file=./run/c1/out/rgw.log
 --debug-rgw=20 --debug-ms=1 -i client.rgw.us-1 -rgw-zone=us-1

 2.create the scondary zone group on the c2 cluster
 ./bin/radosgw-admin -c ./run/c2/ceph.conf realm pull --url=
 http://localhost:8001 --access-key=$SYSTEM_ACCESS_KEY
 --secret=$SYSTEM_SECRET_KEY

 I recommend adding --default that set this realm as default otherwise
>>> you need to run:
>>> radosgw-admin realm default --rgw-realm=earth
>>>
>>> You 

Re: [ceph-users] Intel power tuning - 30% throughput performance increase

2017-05-03 Thread Nick Fisk
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Blair Bethwaite
> Sent: 03 May 2017 09:53
> To: Dan van der Ster 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Intel power tuning - 30% throughput performance 
> increase
> 
> On 3 May 2017 at 18:38, Dan van der Ster  wrote:
> > Seems to work for me, or?
> 
> Yeah now that I read the code more I see it is opening and manipulating 
> /dev/cpu_dma_latency in response to that option, so the
> TODO comment seems to be outdated. I verified tuned latency-performance _is_ 
> doing this properly on our RHEL7.3 nodes (maybe I
> first tested this on 7.2 or just missed something then). In any case, I think 
> we're agreed that Ceph should recommend that
profile.

I did some testing on this last year; by forcing the C-state to C1 I found that
small writes benefited the most. 4kb writes to a 3x
replica pool went from about 2ms write latency down to about 600us. I also
measured power usage of the server and the increase was
only a couple of percent.
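For anyone wanting to repeat that sort of test, one way to pin the C-state is
via the kernel command line rather than the pmqos interface (a sketch for a
RHEL7-style GRUB setup; treat the exact values as illustrative):

  # append to GRUB_CMDLINE_LINUX in /etc/default/grub:
  #   intel_idle.max_cstate=1 processor.max_cstate=1
  grub2-mkconfig -o /boot/grub2/grub.cfg              # regenerate grub.cfg, then reboot
  cat /sys/module/intel_idle/parameters/max_cstate    # confirm the limit after reboot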

Ideally, I'm hoping for the day when the Linux scheduler is power aware, so that it
always assigns threads to cores already running in
higher C-states; that way we will hopefully get the benefits of both worlds
without having to force all cores to run at max.

> 
> --
> Cheers,
> ~Blairo
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intel power tuning - 30% throughput performance increase

2017-05-03 Thread Dan van der Ster
On Wed, May 3, 2017 at 10:52 AM, Blair Bethwaite
 wrote:
> On 3 May 2017 at 18:38, Dan van der Ster  wrote:
>> Seems to work for me, or?
>
> Yeah now that I read the code more I see it is opening and
> manipulating /dev/cpu_dma_latency in response to that option, so the
> TODO comment seems to be outdated. I verified tuned
> latency-performance _is_ doing this properly on our RHEL7.3 nodes
> (maybe I first tested this on 7.2 or just missed something then). In
> any case, I think we're agreed that Ceph should recommend that
> profile.

latency-performance <-- yes

Whether cpu_dma_latency should be 0 or 1, I'm not sure yet. I assume
your 30% boost was when going from throughput-performance to
dma_latency=0, right? I'm trying to understand what is the incremental
improvement from 1 to 0.

-- Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help! how to create multiple zonegroups in single realm?

2017-05-03 Thread yiming xie
Cluster c2 does not have zone us-1

./bin/radosgw-admin -c ./run/c2/ceph.conf period update --commit 
--rgw-zonegroup=us --rgw-zone=us-1

2017-05-03 05:01:30.219721 7efcff2606c0  1 Cannot find zone id= (name=us-1), 
switching to local zonegroup configuration
2017-05-03 05:01:30.222956 7efcff2606c0 -1 Cannot find zone id= (name=us-1)
couldn't init storage provider

./bin/radosgw-admin -c ./run/c2/ceph.conf zone get --rgw-zone=us-1 
unable to initialize zone: (2) No such file or directory

./bin/radosgw-admin -c ./run/c2/ceph.conf zone list
{
"default_info": "0cae32e6-82d5-489f-adf5-99e92c70f86f",
"zones": [
"us-2"
]
}

./bin/radosgw-admin -c ./run/c2/ceph.conf zonegroup list
{
"default_info": "6cc7889a-3f00-4fcd-b4dd-0f5951fbd561",
"zonegroups": [
"us2",
"us"
]
}


> 在 2017年5月3日,下午4:57,Orit Wasserman  写道:
> 
> 
> 
> On Wed, May 3, 2017 at 11:51 AM, yiming xie  > wrote:
> I run
> ./bin/radosgw-admin -c ./run/c2/ceph.conf period update --commit
> 
> 
> try adding --rgw-zonegroup=us1 --rgw-zone=us-1
>  
> the error:
> 2017-05-03 04:46:10.298103 7fdb2e4226c0  1 Cannot find zone 
> id=0cae32e6-82d5-489f-adf5-99e92c70f86f (name=us-2), switching to local 
> zonegroup configuration
> 2017-05-03 04:46:10.300145 7fdb2e4226c0 -1 Cannot find zone 
> id=0cae32e6-82d5-489f-adf5-99e92c70f86f (name=us-2)
> couldn't init storage provider
> 
> 
>> 在 2017年5月3日,下午4:45,Orit Wasserman > > 写道:
>> 
>> 
>> 
>> On Wed, May 3, 2017 at 11:36 AM, yiming xie > > wrote:
>> Hi Orit:
>>   Thanks for your reply.
>> 
>>  when I recreate secondary zone group, there is still a error!
>> 
>>   radosgw-admin realm pull --url=http://localhost:8001 
>>  --access-key=$SYSTEM_ACCESS_KEY 
>> --secret=$SYSTEM_SECRET_KEY --default
>>   radosgw-admin period pull --url=http://localhost:8001 
>>  --access-key=$SYSTEM_ACCESS_KEY 
>> --secret=$SYSTEM_SECRET_KEY --default
>>   radosge-admin zonegroup create --rgw-zonegroup=us2 
>> --endpoints=http://localhost:8002  --rgw-realm=earth
>>   radosgw-admin zone create --rgw-zonegroup=us2 --rgw-zone=us-2 
>> --endpoints=http://localhost:8002  --master 
>> --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY
>>  radosgw-admin period update --commit --rgw-zonegroup=us2 --rgw-zone=us-2
>> 
>> Remove  "--rgw-zonegroup=us2 --rgw-zone=us-2" - they don't exist yet
>> 
>>  
>> 2017-05-03 04:31:57.319796 7f87dab4e6c0  1 error read_lastest_epoch 
>> .rgw.root:periods.894eeaf6-4c1f-4478-88eb-413e58f1a4a4:staging.latest_epoch
>> Sending period to new master zone cb8fd49d-9789-4cb3-8010-2523bf46a650
>> could not find connection for zone or zonegroup id: 
>> cb8fd49d-9789-4cb3-8010-2523bf46a650
>> request failed: (2) No such file or directory
>> failed to commit period: (2) No such file or directory
>> 
>> ceph version 11.1.0-7421-gd25b355 (d25b3550dae243f6868a526632e97405866e76d4)
>> 
>> 
>>   
>> 
>>> 在 2017年5月3日,下午4:07,Orit Wasserman >> > 写道:
>>> 
>>> Hi,
>>> 
>>> On Wed, May 3, 2017 at 11:00 AM, yiming xie >> > wrote:
>>> Hi orit:
>>> I try to create multiple zonegroups in single realm, but failed. Pls tell 
>>> me the correct way about creating multiple zonegroups
>>> Tks a lot!!
>>> 
> 1.create the firstr zone group on the c1 cluster
> ./bin/radosgw-admin -c ./run/c1/ceph.conf realm create --rgw-realm=earth 
> --default
> ./bin/radosgw-admin -c ./run/c1/ceph.conf zonegroup create 
> --rgw-zonegroup=us --endpoints=http://localhost:8001 
>  --master --default
> 
> ./bin/radosgw-admin -c ./run/c1/ceph.conf zone create --rgw-zonegroup=us 
> --rgw-zone=us-1 --endpoints=http://localhost:8001 
>  --master --default 
> --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY
> ./bin/radosgw-admin -c ./run/c1/ceph.conf user create --uid=zone.user 
> --display-name="Zone User" --access-key=$SYSTEM_ACCESS_KEY 
> --secret=$SYSTEM_SECRET_KEY --system
> ./bin/radosgw-admin -c ./run/c1/ceph.conf period update --commit
> //start rgw
> ./bin/radosgw -c ./run/c1/ceph.conf --log-file=./run/c1/out/rgw.log 
> --debug-rgw=20 --debug-ms=1 -i client.rgw.us -1 
> -rgw-zone=us-1
> 
> 2.create the scondary zone group on the c2 cluster
> ./bin/radosgw-admin -c ./run/c2/ceph.conf realm pull 
> --url=http://localhost:8001  
> --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY
>>> 
>>> I recommend adding --default that set this realm as default otherwise you 
>>> need to run:
>>> radosgw-admin realm default --rgw-realm=earth
>>> 
>>> You are 

Re: [ceph-users] Help! how to create multiple zonegroups in single realm?

2017-05-03 Thread Orit Wasserman
On Wed, May 3, 2017 at 11:51 AM, yiming xie  wrote:

> I run
> ./bin/radosgw-admin -c ./run/c2/ceph.conf period update --commit
>
>
try adding --rgw-zonegroup=us1 --rgw-zone=us-1


> the error:
> 2017-05-03 04:46:10.298103 7fdb2e4226c0  1 Cannot find zone
> id=0cae32e6-82d5-489f-adf5-99e92c70f86f (name=us-2), switching to local
> zonegroup configuration
> 2017-05-03 04:46:10.300145 7fdb2e4226c0 -1 Cannot find zone
> id=0cae32e6-82d5-489f-adf5-99e92c70f86f (name=us-2)
> couldn't init storage provider
>
>
> 在 2017年5月3日,下午4:45,Orit Wasserman  写道:
>
>
>
> On Wed, May 3, 2017 at 11:36 AM, yiming xie  wrote:
>
>> Hi Orit:
>>   Thanks for your reply.
>>
>>  when I recreate secondary zone group, there is still a error!
>>
>>   radosgw-admin realm pull --url=http://localhost:8001 
>> --access-key=$SYSTEM_ACCESS_KEY
>> --secret=$SYSTEM_SECRET_KEY --default
>>   radosgw-admin period pull --url=http://localhost:8001 
>> --access-key=$SYSTEM_ACCESS_KEY
>> --secret=$SYSTEM_SECRET_KEY --default
>>   radosge-admin zonegroup create --rgw-zonegroup=us2 --endpoints=
>> http://localhost:8002 --rgw-realm=earth
>>   radosgw-admin zone create --rgw-zonegroup=us2 --rgw-zone=us-2
>> --endpoints=http://localhost:8002 --master --access-key=$SYSTEM_ACCESS_KEY
>> --secret=$SYSTEM_SECRET_KEY
>>  radosgw-admin period update --commit --rgw-zonegroup=us2 --rgw-zone=us-2
>>
>
> Remove  "--rgw-zonegroup=us2 --rgw-zone=us-2" - they don't exist yet
>
>
>>
> 2017-05-03 04:31:57.319796 7f87dab4e6c0  1 error read_lastest_epoch
>> .rgw.root:periods.894eeaf6-4c1f-4478-88eb-413e58f1a4a4:stagi
>> ng.latest_epoch
>> Sending period to new master zone cb8fd49d-9789-4cb3-8010-2523bf46a650
>> could not find connection for zone or zonegroup id:
>> cb8fd49d-9789-4cb3-8010-2523bf46a650
>> request failed: (2) No such file or directory
>> failed to commit period: (2) No such file or directory
>>
>> ceph version 11.1.0-7421-gd25b355 (d25b3550dae243f6868a526632e97
>> 405866e76d4)
>>
>>
>>
>>
>> 在 2017年5月3日,下午4:07,Orit Wasserman  写道:
>>
>> Hi,
>>
>> On Wed, May 3, 2017 at 11:00 AM, yiming xie  wrote:
>>
>>> Hi orit:
>>> I try to create multiple zonegroups in single realm, but failed. Pls
>>> tell me the correct way about creating multiple zonegroups
>>> Tks a lot!!
>>>
>>> 1.create the firstr zone group on the c1 cluster
>>> ./bin/radosgw-admin -c ./run/c1/ceph.conf realm create --rgw-realm=earth
>>> --default
>>> ./bin/radosgw-admin -c ./run/c1/ceph.conf zonegroup create
>>> --rgw-zonegroup=us --endpoints=http://localhost:8001 --master --default
>>>
>>> ./bin/radosgw-admin -c ./run/c1/ceph.conf zone create --rgw-zonegroup=us
>>> --rgw-zone=us-1 --endpoints=http://localhost:8001 --master --default
>>> --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY
>>> ./bin/radosgw-admin -c ./run/c1/ceph.conf user create --uid=zone.user
>>> --display-name="Zone User" --access-key=$SYSTEM_ACCESS_KEY
>>> --secret=$SYSTEM_SECRET_KEY --system
>>> ./bin/radosgw-admin -c ./run/c1/ceph.conf period update --commit
>>> //start rgw
>>> ./bin/radosgw -c ./run/c1/ceph.conf --log-file=./run/c1/out/rgw.log
>>> --debug-rgw=20 --debug-ms=1 -i client.rgw.us-1 -rgw-zone=us-1
>>>
>>> 2.create the scondary zone group on the c2 cluster
>>> ./bin/radosgw-admin -c ./run/c2/ceph.conf realm pull --url=
>>> http://localhost:8001 --access-key=$SYSTEM_ACCESS_KEY
>>> --secret=$SYSTEM_SECRET_KEY
>>>
>>> I recommend adding --default that set this realm as default otherwise
>> you need to run:
>> radosgw-admin realm default --rgw-realm=earth
>>
>> You are missing the period pull command:
>>  radosgw-admin period pull --url=http://localhost:8001 
>> --access-key=$SYSTEM_ACCESS_KEY
>> --secret=$SYSTEM_SECRET_KEY --default
>>
>> Orit
>>
>>> ./bin/radosgw-admin -c ./run/c2/ceph.conf zonegroup create
>>> --rgw-zonegroup=us2 --endpoints=http://localhost:8002 --rgw-realm=earth
>>> ./bin/radosgw-admin -c ./run/c2/ceph.conf period update --commit
>>> --rgw-zonegroup=us2 --rgw-realm=earth
>>>
>>> 2017-05-03 00:51:20.190417 7f538dbbb6c0  1 Cannot find zone id= (name=),
>>> switching to local zonegroup configuration
>>> 2017-05-03 00:51:20.192342 7f538dbbb6c0 -1 Cannot find zone id= (name=)
>>> couldn't init storage provider
>>> .
>>>
>>> ./bin/radosgw-admin -c ./run/c2/ceph.conf zone create ..
>>>
>>>
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intel power tuning - 30% throughput performance increase

2017-05-03 Thread Blair Bethwaite
On 3 May 2017 at 18:38, Dan van der Ster  wrote:
> Seems to work for me, or?

Yeah now that I read the code more I see it is opening and
manipulating /dev/cpu_dma_latency in response to that option, so the
TODO comment seems to be outdated. I verified tuned
latency-performance _is_ doing this properly on our RHEL7.3 nodes
(maybe I first tested this on 7.2 or just missed something then). In
any case, I think we're agreed that Ceph should recommend that
profile.

-- 
Cheers,
~Blairo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help! how to create multiple zonegroups in single realm?

2017-05-03 Thread yiming xie
I run
./bin/radosgw-admin -c ./run/c2/ceph.conf period update --commit

the error:
2017-05-03 04:46:10.298103 7fdb2e4226c0  1 Cannot find zone 
id=0cae32e6-82d5-489f-adf5-99e92c70f86f (name=us-2), switching to local 
zonegroup configuration
2017-05-03 04:46:10.300145 7fdb2e4226c0 -1 Cannot find zone 
id=0cae32e6-82d5-489f-adf5-99e92c70f86f (name=us-2)
couldn't init storage provider


> 在 2017年5月3日,下午4:45,Orit Wasserman  写道:
> 
> 
> 
> On Wed, May 3, 2017 at 11:36 AM, yiming xie  > wrote:
> Hi Orit:
>   Thanks for your reply.
> 
>  when I recreate secondary zone group, there is still a error!
> 
>   radosgw-admin realm pull --url=http://localhost:8001 
>  --access-key=$SYSTEM_ACCESS_KEY 
> --secret=$SYSTEM_SECRET_KEY --default
>   radosgw-admin period pull --url=http://localhost:8001 
>  --access-key=$SYSTEM_ACCESS_KEY 
> --secret=$SYSTEM_SECRET_KEY --default
>   radosge-admin zonegroup create --rgw-zonegroup=us2 
> --endpoints=http://localhost:8002  --rgw-realm=earth
>   radosgw-admin zone create --rgw-zonegroup=us2 --rgw-zone=us-2 
> --endpoints=http://localhost:8002  --master 
> --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY
>  radosgw-admin period update --commit --rgw-zonegroup=us2 --rgw-zone=us-2
> 
> Remove  "--rgw-zonegroup=us2 --rgw-zone=us-2" - they don't exist yet
> 
>  
> 2017-05-03 04:31:57.319796 7f87dab4e6c0  1 error read_lastest_epoch 
> .rgw.root:periods.894eeaf6-4c1f-4478-88eb-413e58f1a4a4:staging.latest_epoch
> Sending period to new master zone cb8fd49d-9789-4cb3-8010-2523bf46a650
> could not find connection for zone or zonegroup id: 
> cb8fd49d-9789-4cb3-8010-2523bf46a650
> request failed: (2) No such file or directory
> failed to commit period: (2) No such file or directory
> 
> ceph version 11.1.0-7421-gd25b355 (d25b3550dae243f6868a526632e97405866e76d4)
> 
> 
>   
> 
>> 在 2017年5月3日,下午4:07,Orit Wasserman > > 写道:
>> 
>> Hi,
>> 
>> On Wed, May 3, 2017 at 11:00 AM, yiming xie > > wrote:
>> Hi orit:
>> I try to create multiple zonegroups in single realm, but failed. Pls tell me 
>> the correct way about creating multiple zonegroups
>> Tks a lot!!
>> 
 1.create the firstr zone group on the c1 cluster
 ./bin/radosgw-admin -c ./run/c1/ceph.conf realm create --rgw-realm=earth 
 --default
 ./bin/radosgw-admin -c ./run/c1/ceph.conf zonegroup create 
 --rgw-zonegroup=us --endpoints=http://localhost:8001 
  --master --default
 
 ./bin/radosgw-admin -c ./run/c1/ceph.conf zone create --rgw-zonegroup=us 
 --rgw-zone=us-1 --endpoints=http://localhost:8001  
 --master --default --access-key=$SYSTEM_ACCESS_KEY 
 --secret=$SYSTEM_SECRET_KEY
 ./bin/radosgw-admin -c ./run/c1/ceph.conf user create --uid=zone.user 
 --display-name="Zone User" --access-key=$SYSTEM_ACCESS_KEY 
 --secret=$SYSTEM_SECRET_KEY --system
 ./bin/radosgw-admin -c ./run/c1/ceph.conf period update --commit
 //start rgw
 ./bin/radosgw -c ./run/c1/ceph.conf --log-file=./run/c1/out/rgw.log 
 --debug-rgw=20 --debug-ms=1 -i client.rgw.us -1 
 -rgw-zone=us-1
 
 2.create the scondary zone group on the c2 cluster
 ./bin/radosgw-admin -c ./run/c2/ceph.conf realm pull 
 --url=http://localhost:8001  
 --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY
>> 
>> I recommend adding --default that set this realm as default otherwise you 
>> need to run:
>> radosgw-admin realm default --rgw-realm=earth
>> 
>> You are missing the period pull command:
>>  radosgw-admin period pull --url=http://localhost:8001 
>>  --access-key=$SYSTEM_ACCESS_KEY 
>> --secret=$SYSTEM_SECRET_KEY --default
>> 
>> Orit
 ./bin/radosgw-admin -c ./run/c2/ceph.conf zonegroup create 
 --rgw-zonegroup=us2 --endpoints=http://localhost:8002 
  --rgw-realm=earth
 ./bin/radosgw-admin -c ./run/c2/ceph.conf period update --commit 
 --rgw-zonegroup=us2 --rgw-realm=earth
 
 2017-05-03 00:51:20.190417 7f538dbbb6c0  1 Cannot find zone id= (name=), 
 switching to local zonegroup configuration
 2017-05-03 00:51:20.192342 7f538dbbb6c0 -1 Cannot find zone id= (name=)
 couldn't init storage provider
 .
 
 ./bin/radosgw-admin -c ./run/c2/ceph.conf zone create ..
> 
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help! how to create multiple zonegroups in single realm?

2017-05-03 Thread Orit Wasserman
On Wed, May 3, 2017 at 11:36 AM, yiming xie  wrote:

> Hi Orit:
>   Thanks for your reply.
>
>  when I recreate secondary zone group, there is still a error!
>
>   radosgw-admin realm pull --url=http://localhost:8001 
> --access-key=$SYSTEM_ACCESS_KEY
> --secret=$SYSTEM_SECRET_KEY --default
>   radosgw-admin period pull --url=http://localhost:8001 
> --access-key=$SYSTEM_ACCESS_KEY
> --secret=$SYSTEM_SECRET_KEY --default
>   radosge-admin zonegroup create --rgw-zonegroup=us2 --endpoints=
> http://localhost:8002 --rgw-realm=earth
>   radosgw-admin zone create --rgw-zonegroup=us2 --rgw-zone=us-2
> --endpoints=http://localhost:8002 --master --access-key=$SYSTEM_ACCESS_KEY
> --secret=$SYSTEM_SECRET_KEY
>  radosgw-admin period update --commit --rgw-zonegroup=us2 --rgw-zone=us-2
>

Remove  "--rgw-zonegroup=us2 --rgw-zone=us-2" - they don't exist yet


>
2017-05-03 04:31:57.319796 7f87dab4e6c0  1 error read_lastest_epoch
> .rgw.root:periods.894eeaf6-4c1f-4478-88eb-413e58f1a4a4:
> staging.latest_epoch
> Sending period to new master zone cb8fd49d-9789-4cb3-8010-2523bf46a650
> could not find connection for zone or zonegroup id:
> cb8fd49d-9789-4cb3-8010-2523bf46a650
> request failed: (2) No such file or directory
> failed to commit period: (2) No such file or directory
>
> ceph version 11.1.0-7421-gd25b355 (d25b3550dae243f6868a526632e974
> 05866e76d4)
>
>
>
>
> 在 2017年5月3日,下午4:07,Orit Wasserman  写道:
>
> Hi,
>
> On Wed, May 3, 2017 at 11:00 AM, yiming xie  wrote:
>
>> Hi orit:
>> I try to create multiple zonegroups in single realm, but failed. Pls tell
>> me the correct way about creating multiple zonegroups
>> Tks a lot!!
>>
>> 1.create the firstr zone group on the c1 cluster
>> ./bin/radosgw-admin -c ./run/c1/ceph.conf realm create --rgw-realm=earth
>> --default
>> ./bin/radosgw-admin -c ./run/c1/ceph.conf zonegroup create
>> --rgw-zonegroup=us --endpoints=http://localhost:8001 --master --default
>>
>> ./bin/radosgw-admin -c ./run/c1/ceph.conf zone create --rgw-zonegroup=us
>> --rgw-zone=us-1 --endpoints=http://localhost:8001 --master --default
>> --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY
>> ./bin/radosgw-admin -c ./run/c1/ceph.conf user create --uid=zone.user
>> --display-name="Zone User" --access-key=$SYSTEM_ACCESS_KEY
>> --secret=$SYSTEM_SECRET_KEY --system
>> ./bin/radosgw-admin -c ./run/c1/ceph.conf period update --commit
>> //start rgw
>> ./bin/radosgw -c ./run/c1/ceph.conf --log-file=./run/c1/out/rgw.log
>> --debug-rgw=20 --debug-ms=1 -i client.rgw.us-1 -rgw-zone=us-1
>>
>> 2.create the scondary zone group on the c2 cluster
>> ./bin/radosgw-admin -c ./run/c2/ceph.conf realm pull --url=
>> http://localhost:8001 --access-key=$SYSTEM_ACCESS_KEY
>> --secret=$SYSTEM_SECRET_KEY
>>
>> I recommend adding --default that set this realm as default otherwise you
> need to run:
> radosgw-admin realm default --rgw-realm=earth
>
> You are missing the period pull command:
>  radosgw-admin period pull --url=http://localhost:8001 
> --access-key=$SYSTEM_ACCESS_KEY
> --secret=$SYSTEM_SECRET_KEY --default
>
> Orit
>
>> ./bin/radosgw-admin -c ./run/c2/ceph.conf zonegroup create
>> --rgw-zonegroup=us2 --endpoints=http://localhost:8002 --rgw-realm=earth
>> ./bin/radosgw-admin -c ./run/c2/ceph.conf period update --commit
>> --rgw-zonegroup=us2 --rgw-realm=earth
>>
>> 2017-05-03 00:51:20.190417 7f538dbbb6c0  1 Cannot find zone id= (name=),
>> switching to local zonegroup configuration
>> 2017-05-03 00:51:20.192342 7f538dbbb6c0 -1 Cannot find zone id= (name=)
>> couldn't init storage provider
>> .
>>
>> ./bin/radosgw-admin -c ./run/c2/ceph.conf zone create ..
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intel power tuning - 30% throughput performance increase

2017-05-03 Thread Dan van der Ster
On Wed, May 3, 2017 at 10:32 AM, Blair Bethwaite
 wrote:
> On 3 May 2017 at 18:15, Dan van der Ster  wrote:
>> It looks like el7's tuned natively supports the pmqos interface in
>> plugins/plugin_cpu.py.
>
> Ahha, you are right, but I'm sure I tested tuned and it did not help.
> Thanks for pointing out this script, I had not noticed it before and I
> can see now why it isn't helping (on a RHEL7.3 host):
>
> /usr/lib/python2.7/site-packages/tuned/plugins/plugin_cpu.py:L13
> # TODO: force_latency -> command
> #   intel_pstate
>

Seems to work for me, or?

# grep force_latency /usr/lib/tuned/latency-performance/tuned.conf
force_latency=1
# tuned-adm profile latency-performance
# cpupower monitor -m Idle_Stats | head
  |Idle_Stats
PKG |CORE|CPU | POLL | C1-H | C1E- | C3-H | C6-H
   0|   0|   0|  0.08| 95.12|  0.00|  0.00|  0.00
   0|   0|  16|  0.00| 97.22|  0.00|  0.00|  0.00
   0|   1|   1|  0.00| 95.78|  0.00|  0.00|  0.00
   0|   1|  17|  0.00| 60.36|  0.00|  0.00|  0.00
   0|   2|   2|  0.00| 91.60|  0.00|  0.00|  0.00
   0|   2|  18|  0.00| 95.17|  0.00|  0.00|  0.00
   0|   3|   3|  0.28| 87.05|  0.00|  0.00|  0.00
   0|   3|  19|  0.00| 98.33|  0.00|  0.00|  0.00
# vi /usr/lib/tuned/latency-performance/tuned.conf
# grep force_latency /usr/lib/tuned/latency-performance/tuned.conf
force_latency=0
# tuned-adm profile latency-performance
# cpupower monitor -m Idle_Stats | head
  |Idle_Stats
PKG |CORE|CPU | POLL | C1-H | C1E- | C3-H | C6-H
   0|   0|   0| 96.76|  0.00|  0.00|  0.00|  0.00
   0|   0|  16| 99.04|  0.00|  0.00|  0.00|  0.00
   0|   1|   1| 97.65|  0.00|  0.00|  0.00|  0.00
   0|   1|  17| 98.77|  0.00|  0.00|  0.00|  0.00
   0|   2|   2| 97.97|  0.00|  0.00|  0.00|  0.00
   0|   2|  18| 99.10|  0.00|  0.00|  0.00|  0.00
   0|   3|   3| 92.27|  0.00|  0.00|  0.00|  0.00
   0|   3|  19| 99.00|  0.00|  0.00|  0.00|  0.00
# rpm -q tuned
tuned-2.7.1-3.el7_3.1.noarch

-- Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help! how to create multiple zonegroups in single realm?

2017-05-03 Thread yiming xie
Hi Orit:
  Thanks for your reply.

 when I recreate the secondary zone group, there is still an error!

  radosgw-admin realm pull --url=http://localhost:8001 --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY --default
  radosgw-admin period pull --url=http://localhost:8001 --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY --default
  radosgw-admin zonegroup create --rgw-zonegroup=us2 --endpoints=http://localhost:8002 --rgw-realm=earth
  radosgw-admin zone create --rgw-zonegroup=us2 --rgw-zone=us-2 --endpoints=http://localhost:8002 --master --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY
  radosgw-admin period update --commit --rgw-zonegroup=us2 --rgw-zone=us-2

2017-05-03 04:31:57.319796 7f87dab4e6c0  1 error read_lastest_epoch 
.rgw.root:periods.894eeaf6-4c1f-4478-88eb-413e58f1a4a4:staging.latest_epoch
Sending period to new master zone cb8fd49d-9789-4cb3-8010-2523bf46a650
could not find connection for zone or zonegroup id: 
cb8fd49d-9789-4cb3-8010-2523bf46a650
request failed: (2) No such file or directory
failed to commit period: (2) No such file or directory

ceph version 11.1.0-7421-gd25b355 (d25b3550dae243f6868a526632e97405866e76d4)


  

> 在 2017年5月3日,下午4:07,Orit Wasserman  写道:
> 
> Hi,
> 
> On Wed, May 3, 2017 at 11:00 AM, yiming xie  > wrote:
> Hi orit:
> I try to create multiple zonegroups in single realm, but failed. Pls tell me 
> the correct way about creating multiple zonegroups
> Tks a lot!!
> 
>>> 1.create the firstr zone group on the c1 cluster
>>> ./bin/radosgw-admin -c ./run/c1/ceph.conf realm create --rgw-realm=earth 
>>> --default
>>> ./bin/radosgw-admin -c ./run/c1/ceph.conf zonegroup create 
>>> --rgw-zonegroup=us --endpoints=http://localhost:8001 
>>>  --master --default
>>> 
>>> ./bin/radosgw-admin -c ./run/c1/ceph.conf zone create --rgw-zonegroup=us 
>>> --rgw-zone=us-1 --endpoints=http://localhost:8001  
>>> --master --default --access-key=$SYSTEM_ACCESS_KEY 
>>> --secret=$SYSTEM_SECRET_KEY
>>> ./bin/radosgw-admin -c ./run/c1/ceph.conf user create --uid=zone.user 
>>> --display-name="Zone User" --access-key=$SYSTEM_ACCESS_KEY 
>>> --secret=$SYSTEM_SECRET_KEY --system
>>> ./bin/radosgw-admin -c ./run/c1/ceph.conf period update --commit
>>> //start rgw
>>> ./bin/radosgw -c ./run/c1/ceph.conf --log-file=./run/c1/out/rgw.log 
>>> --debug-rgw=20 --debug-ms=1 -i client.rgw.us -1 
>>> -rgw-zone=us-1
>>> 
>>> 2.create the scondary zone group on the c2 cluster
>>> ./bin/radosgw-admin -c ./run/c2/ceph.conf realm pull 
>>> --url=http://localhost:8001  
>>> --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY
> 
> I recommend adding --default that set this realm as default otherwise you 
> need to run:
> radosgw-admin realm default --rgw-realm=earth
> 
> You are missing the period pull command:
>  radosgw-admin period pull --url=http://localhost:8001 
>  --access-key=$SYSTEM_ACCESS_KEY 
> --secret=$SYSTEM_SECRET_KEY --default
> 
> Orit
>>> ./bin/radosgw-admin -c ./run/c2/ceph.conf zonegroup create 
>>> --rgw-zonegroup=us2 --endpoints=http://localhost:8002 
>>>  --rgw-realm=earth
>>> ./bin/radosgw-admin -c ./run/c2/ceph.conf period update --commit 
>>> --rgw-zonegroup=us2 --rgw-realm=earth
>>> 
>>> 2017-05-03 00:51:20.190417 7f538dbbb6c0  1 Cannot find zone id= (name=), 
>>> switching to local zonegroup configuration
>>> 2017-05-03 00:51:20.192342 7f538dbbb6c0 -1 Cannot find zone id= (name=)
>>> couldn't init storage provider
>>> .
>>> 
>>> ./bin/radosgw-admin -c ./run/c2/ceph.conf zone create ..

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intel power tuning - 30% throughput performance increase

2017-05-03 Thread Blair Bethwaite
On 3 May 2017 at 18:15, Dan van der Ster  wrote:
> It looks like el7's tuned natively supports the pmqos interface in
> plugins/plugin_cpu.py.

Ahha, you are right, but I'm sure I tested tuned and it did not help.
Thanks for pointing out this script, I had not noticed it before and I
can see now why it isn't helping (on a RHEL7.3 host):

/usr/lib/python2.7/site-packages/tuned/plugins/plugin_cpu.py:L13
# TODO: force_latency -> command
#   intel_pstate

-- 
Cheers,
~Blairo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intel power tuning - 30% throughput performance increase

2017-05-03 Thread Blair Bethwaite
Hi Dan,

On 3 May 2017 at 17:43, Dan van der Ster  wrote:
> We use cpu_dma_latency=1, because it was in the latency-performance profile.
> And indeed by setting cpu_dma_latency=0 on one of our OSD servers,
> powertop now shows the package as 100% in turbo mode.

I tried both 0 and 1 and didn't notice a difference in the effective
frequency, though I think on most contemporary Intel systems this
would/should allow the CPU to transition to C1/C1E. You can check the
values that the pmqos interface uses to determine the transition
latencies from various C-states via:
`sudo find /sys/devices/system/cpu/cpu0/cpuidle -name latency -o -name
name | xargs cat`

`cpupower monitor` is another good option for inspecting this.

> So I suppose we'll pay for this performance boost in energy.
> But more importantly, can the CPU survive being in turbo 100% of the time?

If it does not then Intel has messed up - the CPUs will still thermally
throttle. You can see the natural production variation in any
burn-in/acceptance testing of new homogeneous cluster gear - anywhere
from 5-8% CPU performance variation is expected. It's certainly not
uncommon for HPC installations to run continuously with these options
statically tuned (via the kernel command line); in fact Mellanox
recommends this in their tuning documentation, and we have certainly
found it helps in a virtualised HPC environment. I've read anecdotal
reports that the pstate driver is not as effective for such workloads,
but I do not know if there is any hard evidence for that in the code.

The real question is why the performance loss is so bad in a default
configuration.

-- 
Cheers,
~Blairo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intel power tuning - 30% throughput performance increase

2017-05-03 Thread Dan van der Ster
On Wed, May 3, 2017 at 9:13 AM, Blair Bethwaite
 wrote:
> We did the latter using the pmqos_static.py, which was previously part of
> the RHEL6 tuned latency-performance profile, but seems to have been dropped
> in RHEL7 (don't yet know why),

It looks like el7's tuned natively supports the pmqos interface in
plugins/plugin_cpu.py.
So the equiv in el7 is:

latency-performance/tuned.conf:force_latency=1

For your optimization, set force_latency=0.
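Rather than editing the stock profile under /usr/lib/tuned, a small custom
profile that inherits from latency-performance should also do it (an untested
sketch; the profile name is arbitrary):

  # /etc/tuned/ceph-low-latency/tuned.conf
  [main]
  include=latency-performance

  [cpu]
  force_latency=0

  # then activate it:
  tuned-adm profile ceph-low-latency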

-- Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help! how to create multiple zonegroups in single realm?

2017-05-03 Thread Orit Wasserman
Hi,

On Wed, May 3, 2017 at 11:00 AM, yiming xie  wrote:

> Hi orit:
> I try to create multiple zonegroups in single realm, but failed. Pls tell
> me the correct way about creating multiple zonegroups
> Tks a lot!!
>
> 1.create the firstr zone group on the c1 cluster
> ./bin/radosgw-admin -c ./run/c1/ceph.conf realm create --rgw-realm=earth
> --default
> ./bin/radosgw-admin -c ./run/c1/ceph.conf zonegroup create
> --rgw-zonegroup=us --endpoints=http://localhost:8001 --master --default
>
> ./bin/radosgw-admin -c ./run/c1/ceph.conf zone create --rgw-zonegroup=us
> --rgw-zone=us-1 --endpoints=http://localhost:8001 --master --default
> --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY
> ./bin/radosgw-admin -c ./run/c1/ceph.conf user create --uid=zone.user
> --display-name="Zone User" --access-key=$SYSTEM_ACCESS_KEY
> --secret=$SYSTEM_SECRET_KEY --system
> ./bin/radosgw-admin -c ./run/c1/ceph.conf period update --commit
> //start rgw
> ./bin/radosgw -c ./run/c1/ceph.conf --log-file=./run/c1/out/rgw.log
> --debug-rgw=20 --debug-ms=1 -i client.rgw.us-1 -rgw-zone=us-1
>
> 2.create the scondary zone group on the c2 cluster
> ./bin/radosgw-admin -c ./run/c2/ceph.conf realm pull --url=
> http://localhost:8001 --access-key=$SYSTEM_ACCESS_KEY
> --secret=$SYSTEM_SECRET_KEY
>
I recommend adding --default, which sets this realm as the default; otherwise you
need to run:
radosgw-admin realm default --rgw-realm=earth

You are missing the period pull command:
 radosgw-admin period pull --url=http://localhost:8001
--access-key=$SYSTEM_ACCESS_KEY
--secret=$SYSTEM_SECRET_KEY --default
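Putting these corrections together, the whole sequence on the second cluster
would look roughly like the following (same endpoints and keys as in your
commands; a sketch, I have not re-run it here):

  ./bin/radosgw-admin -c ./run/c2/ceph.conf realm pull --url=http://localhost:8001 --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY --default
  ./bin/radosgw-admin -c ./run/c2/ceph.conf period pull --url=http://localhost:8001 --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY --default
  ./bin/radosgw-admin -c ./run/c2/ceph.conf zonegroup create --rgw-zonegroup=us2 --endpoints=http://localhost:8002 --rgw-realm=earth
  ./bin/radosgw-admin -c ./run/c2/ceph.conf zone create --rgw-zonegroup=us2 --rgw-zone=us-2 --endpoints=http://localhost:8002 --master --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY
  ./bin/radosgw-admin -c ./run/c2/ceph.conf period update --commit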

Orit

> ./bin/radosgw-admin -c ./run/c2/ceph.conf zonegroup create
> --rgw-zonegroup=us2 --endpoints=http://localhost:8002 --rgw-realm=earth
> ./bin/radosgw-admin -c ./run/c2/ceph.conf period update --commit
> --rgw-zonegroup=us2 --rgw-realm=earth
>
> 2017-05-03 00:51:20.190417 7f538dbbb6c0  1 Cannot find zone id= (name=),
> switching to local zonegroup configuration
> 2017-05-03 00:51:20.192342 7f538dbbb6c0 -1 Cannot find zone id= (name=)
> couldn't init storage provider
> .
>
> ./bin/radosgw-admin -c ./run/c2/ceph.conf zone create ..
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Sharing SSD journals and SSD drive choice

2017-05-03 Thread Willem Jan Withagen
On 02-05-17 23:53, David Turner wrote:
> I was only interjecting on the comment "So that is 5Mbyte/sec. Which is real
> easy to obtain" and commenting on what the sustained writes into a
> cluster of 2,000 OSDs would require to actually sustain that 5 MBps on
> each SSD journal.

Reading your calculation below I understand where the 2000 comes from.
I meant that hardware of the previous millennium could easily write 5
Mbyte/sec sustained. :)

This does NOT otherwise invalidate your interesting math below.
And your conclusion is an important one.

Go back to 200 OSDs and 25 SSDs and you end up with 40MB/s sustained
writes to wear out your SSDs exactly at the end of the warranty.
Higher sustained writes will linearly shorten your SSD lifetime.

--WjW
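For anyone who wants to plug in their own numbers, the arithmetic quoted
below boils down to something like this throwaway snippet:

  # OSDs, OSDs per journal SSD, per-journal write budget (MB/s), replica count
  osds=2000; per_journal=8; budget=5; replicas=3
  awk -v o=$osds -v j=$per_journal -v b=$budget -v r=$replicas \
      'BEGIN { printf "journal SSDs: %d, sustained cluster writes for that budget: %.1f MB/s\n", o/j, o/j*b/r }'

With 200 OSDs and 25 journal SSDs the same formula gives roughly the 40MB/s
mentioned above.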

> My calculation was off because I forgot replica size, but my corrected
> math is this...
> 
> 5 MBps per journal device
> 8 OSDs per journal (overestimated number as most do 4)
> 2,000 OSDs based on what you said "Which is real easy to obtain, even
> with hardware of 2000."
> 3 replicas
> 
> 2,000 OSDs / 8 OSDs per journal = 250 journal SSDs
> 250 SSDs * 5 MBps = 1,250 MBps / 3 replicas = 416.67 MBps required
> sustained cluster write speed to cause each SSD to average 5 MBps on
> each journal device.
> 
> Swap out any variable you want to match your environment.  For example,
> if you only have 4 OSDs per journal device, that number would be double
> for a cluster this size to require a cluster write speed of
> 833.33 MBps to average 5 MBps on each journal.  Also if you have less
> than 2,000 OSDs, then everything shrinks fast.
> 
> 
> On Tue, May 2, 2017 at 5:39 PM Willem Jan Withagen  > wrote:
> 
> On 02-05-17 19:54, David Turner wrote:
> > Are you guys talking about 5Mbytes/sec to each journal device? 
> Even if
> > you had 8 OSDs per journal and had 2000 osds... you would need a
> > sustained 1.25 Gbytes/sec to average 5Mbytes/sec per journal device.
> 
> I'm not sure I'm following this...
> But I'm rather curious.
> Are you saying that the required journal bandwidth versus OSD write
> bandwidth has an approx 1:200 ratio??
> 
> Note that I took it the other way.
> Given the Intel specs
>  - What sustained bandwidth is allowed to have the device last its
> lifetime.
>  - How much more usage would a 3710 give in regards to a 3520 SSD per
>dollar spent.
> 
> --WjW
> 
> > On Tue, May 2, 2017 at 1:47 PM Willem Jan Withagen
> 
> > >> wrote:
> >
> > On 02-05-17 19:16, Дробышевский, Владимир wrote:
> > > Willem,
> > >
> > >   please note that you use 1.6TB Intel S3520 endurance
> rating in your
> > > calculations but then compare prices with 480GB model, which
> has only
> > > 945TBW or 1.1DWPD (
> > >
> >   
>  
> https://ark.intel.com/products/93026/Intel-SSD-DC-S3520-Series-480GB-2_5in-SATA-6Gbs-3D1-MLC
> > > ). It also worth to notice that S3710 has tremendously
> higher write
> > > speed\IOPS and especially SYNC writes. Haven't seen S3520
> real sync
> > > write tests yet but don't think they differ much from S3510
> ones.
> >
> > Arrgh, you are right. I guess I had too many pages open, and
> copied the
> > wrong one.
> >
> > But the good news is that the stats were already in favour of
> the 3710
> > so this only increases that conclusion.
> >
> > The bad news is that the sustained write speed goes down with a
> > factor 4.
> > So that is 5Mbyte/sec. Which is real easy to obtain, even with
> > hardware of 2000.
> >
> > --WjW
> >
> >
> > > Best regards,
> > > Vladimir
> > >
> > > 2017-05-02 21:05 GMT+05:00 Willem Jan Withagen
> 
> > >
> > > 
>  > >
> > > On 27-4-2017 20:46, Alexandre DERUMIER wrote:
> > > > Hi,
> > > >
> > > >>> What I'm trying to get from the list is /why/ the
> > "enterprise" drives
> > > >>> are important. Performance? Reliability? Something else?
> > > >
> > > > performance, for sure (for SYNC write,
> >   
>  
> https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
> > >
> > 
> 
> )
> > > >
> > > > Reliabity : yes, enteprise drive have supercapacitor
>   

[ceph-users] Help! how to create multiple zonegroups in single realm?

2017-05-03 Thread yiming xie
Hi Orit:
I tried to create multiple zonegroups in a single realm, but failed. Please tell me
the correct way to create multiple zonegroups.
Thanks a lot!!

>> 1.create the firstr zone group on the c1 cluster
>> ./bin/radosgw-admin -c ./run/c1/ceph.conf realm create --rgw-realm=earth 
>> --default
>> ./bin/radosgw-admin -c ./run/c1/ceph.conf zonegroup create 
>> --rgw-zonegroup=us --endpoints=http://localhost:8001 
>>  --master --default
>> 
>> ./bin/radosgw-admin -c ./run/c1/ceph.conf zone create --rgw-zonegroup=us 
>> --rgw-zone=us-1 --endpoints=http://localhost:8001  
>> --master --default --access-key=$SYSTEM_ACCESS_KEY 
>> --secret=$SYSTEM_SECRET_KEY
>> ./bin/radosgw-admin -c ./run/c1/ceph.conf user create --uid=zone.user 
>> --display-name="Zone User" --access-key=$SYSTEM_ACCESS_KEY 
>> --secret=$SYSTEM_SECRET_KEY --system
>> ./bin/radosgw-admin -c ./run/c1/ceph.conf period update --commit
>> //start rgw
>> ./bin/radosgw -c ./run/c1/ceph.conf --log-file=./run/c1/out/rgw.log 
>> --debug-rgw=20 --debug-ms=1 -i client.rgw.us -1 
>> -rgw-zone=us-1
>> 
>> 2.create the scondary zone group on the c2 cluster
>> ./bin/radosgw-admin -c ./run/c2/ceph.conf realm pull 
>> --url=http://localhost:8001  
>> --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY
>> ./bin/radosgw-admin -c ./run/c2/ceph.conf zonegroup create 
>> --rgw-zonegroup=us2 --endpoints=http://localhost:8002 
>>  --rgw-realm=earth
>> ./bin/radosgw-admin -c ./run/c2/ceph.conf period update --commit 
>> --rgw-zonegroup=us2 --rgw-realm=earth
>> 
>> 2017-05-03 00:51:20.190417 7f538dbbb6c0  1 Cannot find zone id= (name=), 
>> switching to local zonegroup configuration
>> 2017-05-03 00:51:20.192342 7f538dbbb6c0 -1 Cannot find zone id= (name=)
>> couldn't init storage provider
>> .
>> 
>> ./bin/radosgw-admin -c ./run/c2/ceph.conf zone create ..
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intel power tuning - 30% throughput performance increase

2017-05-03 Thread Luis Periquito
One of the things I've noticed in the latest (3+ years) batch of CPUs
is that they increasingly ignore the CPU frequency scaling drivers and do
what they want. More than that, interfaces like /proc/cpuinfo are completely
incorrect.

I keep checking the real frequencies using applications like "i7z",
which shows the real per-core frequency.

On the flip side, as more of this is controlled directly by the CPU, it
also means it should be safer to run over a longer period of time.

My testing was done on Trusty with the default 3.13 kernel; changing the
governor to performance and disabling all powersave options in the BIOS
meant circa 50% better latency (for both the SSD and HDD clusters), with
around a 10% increase in power usage.
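For reference, the kind of check/change described above can be done along
these lines (a sketch; cpupower ships with the kernel tools packages, i7z is
a separate utility):

  cpupower frequency-set -g performance   # switch the cpufreq governor
  cpupower monitor -m Mperf               # effective per-core frequency, unlike /proc/cpuinfo
  i7z                                     # interactive per-core frequency / C-state view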

On Wed, May 3, 2017 at 8:43 AM, Dan van der Ster  wrote:
> Hi Blair,
>
> We use cpu_dma_latency=1, because it was in the latency-performance profile.
> And indeed by setting cpu_dma_latency=0 on one of our OSD servers,
> powertop now shows the package as 100% in turbo mode.
>
> So I suppose we'll pay for this performance boost in energy.
> But more importantly, can the CPU survive being in turbo 100% of the time?
>
> -- Dan
>
>
>
> On Wed, May 3, 2017 at 9:13 AM, Blair Bethwaite
>  wrote:
>> Hi all,
>>
>> We recently noticed that despite having BIOS power profiles set to
>> performance on our RHEL7 Dell R720 Ceph OSD nodes, that CPU frequencies
>> never seemed to be getting into the top of the range, and in fact spent a
>> lot of time in low C-states despite that BIOS option supposedly disabling
>> C-states.
>>
>> After some investigation this C-state issue seems to be relatively common,
>> apparently the BIOS setting is more of a config option that the OS can
>> choose to ignore. You can check this by examining
>> /sys/module/intel_idle/parameters/max_cstate - if this is >1 and you *think*
>> C-states are disabled then your system is messing with you.
>>
>> Because the contemporary Intel power management driver
>> (https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt) now
>> limits the proliferation of OS level CPU power profiles/governors, the only
>> way to force top frequencies is to either set kernel boot command line
>> options or use the /dev/cpu_dma_latency, aka pmqos, interface.
>>
>> We did the latter using the pmqos_static.py, which was previously part of
>> the RHEL6 tuned latency-performance profile, but seems to have been dropped
>> in RHEL7 (don't yet know why), and in any case the default tuned profile is
>> throughput-performance (which does not change cpu_dma_latency). You can find
>> the pmqos-static.py script here
>> https://github.com/NetSys/NetBricks/blob/master/scripts/tuning/pmqos-static.py.
>>
>> After setting `./pmqos-static.py cpu_dma_latency=0` across our OSD nodes we
>> saw a conservative 30% increase in backfill and recovery throughput - now
>> when our main RBD pool of 900+ OSDs is backfilling we expect to see ~22GB/s,
>> previously that was ~15GB/s.
>>
>> We have just got around to opening a case with Red Hat regarding this as at
>> minimum Ceph should probably be actively using the pmqos interface and tuned
>> should be setting this with recommendations for the latency-performance
>> profile in the RHCS install guide. We have done no characterisation of it on
>> Ubuntu yet, however anecdotally it looks like it has similar issues on the
>> same hardware.
>>
>> Merry xmas.
>>
>> Cheers,
>> Blair
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Intel power tuning - 30% throughput performance increase

2017-05-03 Thread Blair Bethwaite
Hi all,

We recently noticed that despite having BIOS power profiles set to
performance on our RHEL7 Dell R720 Ceph OSD nodes, CPU frequencies
never seemed to be getting into the top of the range, and the cores in fact
spent a lot of time in low C-states despite that BIOS option supposedly
disabling C-states.

After some investigation this C-state issue seems to be relatively common;
apparently the BIOS setting is more of a config option that the OS can
choose to ignore. You can check this by examining
/sys/module/intel_idle/parameters/max_cstate
- if this is >1 and you *think* C-states are disabled then your system is
messing with you.

Because the contemporary Intel power management driver (
https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt) now
limits the proliferation of OS level CPU power profiles/governors, the only
way to force top frequencies is to either set kernel boot command line
options or use the /dev/cpu_dma_latency, aka pmqos, interface.

We did the latter using the pmqos-static.py script, which was previously part
of the RHEL6 tuned latency-performance profile but seems to have been dropped
in RHEL7 (we don't yet know why); in any case the default tuned profile is
throughput-performance (which does not change cpu_dma_latency). You can
find the pmqos-static.py script here:
https://github.com/NetSys/NetBricks/blob/master/scripts/tuning/pmqos-static.py
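Under the hood that script essentially just holds /dev/cpu_dma_latency open
with the requested value written into it; a rough shell equivalent (the
constraint only lasts while the file descriptor stays open) would be:

  exec 3> /dev/cpu_dma_latency      # opening the device registers a pm_qos request
  printf '\x00\x00\x00\x00' >&3     # request 0us as a raw 32-bit value
  sleep infinity                    # keep this shell, and fd 3, alive; closing drops the request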

After setting `./pmqos-static.py cpu_dma_latency=0` across our OSD nodes we
saw a conservative 30% increase in backfill and recovery throughput - now
when our main RBD pool of 900+ OSDs is backfilling we expect to see
~22GB/s, previously that was ~15GB/s.

We have just got around to opening a case with Red Hat regarding this, as at
minimum Ceph should probably be actively using the pmqos interface, and
tuned should be setting this, with a recommendation for the
latency-performance profile in the RHCS install guide. We have done no
characterisation of this on Ubuntu yet; however, anecdotally it looks like it
has similar issues on the same hardware.

Merry xmas.

Cheers,
Blair
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Help! create the secondary zone group failed!

2017-05-03 Thread yiming xie

>I tried to create two zone groups, but I have a problem.
>I do not know where the mistake in this process is.
> 
> 1. Create the first zone group on the c1 cluster:
> ./bin/radosgw-admin -c ./run/c1/ceph.conf realm create --rgw-realm=earth --default
> ./bin/radosgw-admin -c ./run/c1/ceph.conf zonegroup create --rgw-zonegroup=us --endpoints=http://localhost:8001 --master --default
> ./bin/radosgw-admin -c ./run/c1/ceph.conf zone create --rgw-zonegroup=us --rgw-zone=us-1 --endpoints=http://localhost:8001 --master --default --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY
> ./bin/radosgw-admin -c ./run/c1/ceph.conf user create --uid=zone.user --display-name="Zone User" --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY --system
> ./bin/radosgw-admin -c ./run/c1/ceph.conf period update --commit
> //start rgw
> ./bin/radosgw -c ./run/c1/ceph.conf --log-file=./run/c1/out/rgw.log --debug-rgw=20 --debug-ms=1 -i client.rgw.us-1 -rgw-zone=us-1
> 
> 2. Create the secondary zone group on the c2 cluster:
> ./bin/radosgw-admin -c ./run/c2/ceph.conf realm pull --url=http://localhost:8001 --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY
> ./bin/radosgw-admin -c ./run/c2/ceph.conf zonegroup create --rgw-zonegroup=us2 --endpoints=http://localhost:8002 --rgw-realm=earth
> ./bin/radosgw-admin -c ./run/c2/ceph.conf period update --commit --rgw-zonegroup=us2 --rgw-realm=earth
> 
> 2017-05-03 00:51:20.190417 7f538dbbb6c0  1 Cannot find zone id= (name=), 
> switching to local zonegroup configuration
> 2017-05-03 00:51:20.192342 7f538dbbb6c0 -1 Cannot find zone id= (name=)
> couldn't init storage provider
> 
> Tks!
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com