Re: [ceph-users] The performance of ceph with RDMA

2017-03-23 Thread 邱宏瑋
Thanks to both of you for your replies.

Hi Haomai,

If we compare RDMA with the TCP/IP stack, as I understand it, we can use RDMA to
offload the traffic and reduce CPU usage, which means the other
components can use more CPU to improve some performance metrics, such as
IOPS?


Hi Deepak,

I will describe my environment in more detail and hope you can give me
more advice about it.

[Ceph Cluster]

   - 1 pool
   - 1 rbd

[Host Daemon]

   - 1 ceph-mon
   - 8 ceph-hosts
   - 1 fio server (compiled with librbd, and librbd is compiled to support
   RDMA)

[fio config]

 [global]

 ioengine=rbd

 clientname=admin

 pool=rbd

 rbdname=rbd

 clustername=ceph

 runtime=120

 iodepth=128

 numjobs=6

 group_reporting

 size=256G

 direct=1

 ramp_time=5

 [r75w25]

 bs=4k

 rw=randrw

 rwmixread=75



In my RDMA experiment, I start the fio client on host 1 and it triggers
3 fio servers (one on each host) to start the random read/write workload on the specified rbd.
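For reference, the client/server invocation looks roughly like this (a sketch; host names and the job-file name are placeholders, not the exact commands used here):

  # on each of the three fio server hosts
  fio --server

  # on host 1: hosts.list contains the three server hostnames, one per line
  fio --client=hosts.list r75w25.fio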
Although I don't specify the public/cluster network addresses in
ceph.conf, I guess all traffic within the cluster will go over the 10G network, since
I only entered the 10G IP addresses in my manual deployment.
Since ceph.conf indicates RDMA as the ms_type, I think the
connection between fio and the rbd is based on RDMA.
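If the intent is to pin client and replication traffic to specific NICs explicitly, that is usually spelled out in ceph.conf rather than inferred from the monitor addresses; a minimal sketch (the subnets below are examples, not the real ones from this setup):

  [global]
  public network  = 10.0.0.0/24    # e.g. the 1G NIC (client traffic)
  cluster network = 10.0.1.0/24    # e.g. the 10G/RDMA NIC (replication)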

During the fio run, I observe the following system metrics.

1. System CPU usage
2. NIC (1G) throughput
3. NIC (10G) throughput
4. SSD I/O stats


Only the CPU is saturated (100%), consumed by the fio server and the ceph-osd
processes; the other three metrics still have headroom, so I think the bottleneck
in my environment is CPU.
So, according to those observations and the concept of RDMA, I assume that
RDMA can offload the network traffic to reduce CPU usage and give the other
components more room.

I think if we can use RDMA for the cluster/private network, it can
offload the network traffic within the cluster, reducing CPU usage and
releasing more CPU for the other components.
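A quick way to confirm which processes are consuming the CPU during the run (assuming the standard sysstat tools are installed):

  pidstat -u 1 -C ceph-osd   # per-process CPU of the OSD daemons, every second
  pidstat -u 1 -C fio        # same for the fio server processes
  sar -u 1                   # overall CPU, including %user/%system vs %iowait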

If I have any misunderstanding, please correct me.

Thanks for your help!







Best Regards,

Hung-Wei Chiu(邱宏瑋)
--
Computer Center, Department of Computer Science
National Chiao Tung University

2017-03-24 2:22 GMT+08:00 Deepak Naidu :

> RDMA is of interest to me. So my below comment.
>
>
>
> >> What surprised me is that the result of RDMA mode is almost the same
> as the basic mode, the iops, latency, throughput, etc.
>
>
>
> Pardon my limited knowledge here, but if I read your ceph.conf and your notes
> correctly, it seems that you are using RDMA only for the “cluster/private network”? So how
> do you expect RDMA to improve client IOPS/latency/throughput?
>
>
>
>
>
> --
>
> Deepak
>
>
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Haomai Wang
> *Sent:* Thursday, March 23, 2017 4:34 AM
> *To:* Hung-Wei Chiu (邱宏瑋)
> *Cc:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] The performance of ceph with RDMA
>
>
>
>
>
>
>
> On Thu, Mar 23, 2017 at 5:49 AM, Hung-Wei Chiu (邱宏瑋) <
> hwc...@cs.nctu.edu.tw> wrote:
>
> Hi,
>
>
>
> I use the latest (master branch, upgrade at 2017/03/22) to build ceph with
> RDMA and use the fio to test its iops/latency/throughput.
>
>
>
> In my environment, I setup 3 hosts and list the detail of each host below.
>
>
>
> OS: ubuntu 16.04
>
> Storage: SSD * 4 (256G * 4)
>
> Memory: 64GB.
>
> NICs: two NICs, one (intel 1G) for public network and the other (mellanox
> 10G) for private network.
>
>
>
> There are 3 monitors and 24 OSDs equally distributed across the 3 hosts, which
> means each host contains 1 mon and 8 OSDs.
>
>
>
> For my experiment, I use two configs, basic and RDMA.
>
>
>
> Basic
>
> [global]
>
> fsid = 0612cc7e-6239-456c-978b-b4df781fe831
>
> mon initial members = ceph-1,ceph-2,ceph-3
>
> mon host = 10.0.0.15,10.0.0.16,10.0.0.17
>
> osd pool default size = 2
>
> osd pool default pg num = 1024
>
> osd pool default pgp num = 1024
>
>
>
>
>
> RDMA
>
> [global]
>
> fsid = 0612cc7e-6239-456c-978b-b4df781fe831
>
> mon initial members = ceph-1,ceph-2,ceph-3
>
> mon host = 10.0.0.15,10.0.0.16,10.0.0.17
>
> osd pool default size = 2
>
> osd pool default pg num = 1024
>
> osd pool default pgp num = 1024
>
> ms_type=async+rdma
>
> ms_async_rdma_device_name = mlx4_0
>
>
>
>
>
> What surprised me is that the result of RDMA mode is almost the same as
> the basic mode, the iops, latency, throughput, etc.
>
> I also try to use different pattern of the fio parameter, such as read and
> write ratio, random operations or sequence operations.
>
> All results are the same.
>
>
>
> Yes, most of the latency comes from other components now, although we still
> want to avoid the extra copy on the RDMA side.
>
>
>
> So the current RDMA backend only means it can be a choice comparable to the
> tcp/ip network; more of the benefit needs to come from the other components.
>
>
>
>
>
> In order to figure out what's going on, I did the following steps.
>
>
>
> 1. Follow this article (https://community.mellanox.com/docs/DOC-2086) to
> make sure my RDMA environment.
>
> 2. To make sure the network traffic is transmitted by RDMA, I dump the traffic within the private network and the answer is yes: it uses RDMA.

Re: [ceph-users] Recompiling source code - to find exact RPM

2017-03-23 Thread Brad Hubbard
Oh wow, I completely misunderstood your question.

Yes, src/osd/PG.cc and src/osd/PG.h are compiled into the ceph-osd binary which
is included in the ceph-osd rpm as you said in your OP.
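If the goal is simply to confirm which package ships a given binary or file, rpm can answer that directly; a quick sketch (the package file name is an example for a kraken build):

  # which installed package owns the binary
  rpm -qf /usr/bin/ceph-osd

  # or inspect a freshly built package without installing it
  rpm -qlp ceph-osd-11.2.0-0.el7.x86_64.rpm | grep bin/ceph-osd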

On Fri, Mar 24, 2017 at 3:10 AM, nokia ceph  wrote:
> Hello Piotr,
>
> I didn't understand, could you please elaborate about this procedure as
> mentioned in the last update.  It would be really helpful if you share any
> useful link/doc to understand what you actually meant. Yea correct, normally
> we do this procedure but it takes more time. But here my intention is to how
> to find out the rpm which caused the change. I think we are in opposite
> direction.
>
>>> But wouldn't be faster and/or more convenient if you would just recompile
>>> binaries in-place (or use network symlinks) 
>
> Thanks
>
>
>
> On Thu, Mar 23, 2017 at 6:47 PM, Piotr Dałek 
> wrote:
>>
>> On 03/23/2017 02:02 PM, nokia ceph wrote:
>>
>>> Hello Piotr,
>>>
>>> We customize the Ceph code for our testing purposes. It's a part of our
>>> R&D :)
>>>
>>> Recompiling source code will create 38 rpm's out of these I need to find
>>> which one is the correct rpm which I made change in the source code.
>>> That's
>>> what I'm try to figure out.
>>
>>
>> Yes, I understand that. But wouldn't be faster and/or more convenient if
>> you would just recompile binaries in-place (or use network symlinks) instead
>> of packaging entire Ceph and (re)installing its packages each time you do
>> the change? Generating RPMs takes a while.
>>
>>
>> --
>> Piotr Dałek
>> piotr.da...@corp.ovh.com
>> https://www.ovh.com/us/
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Preconditioning an RBD image

2017-03-23 Thread Peter Maloney
Hi Nick,

I didn't test with a colocated journal. I figure ceph knows what it's
doing with the journal device, and it has no filesystem, so there's no
xfs journal, file metadata, etc. to cache due to small random sync writes.

I tested the bcache and journals on some SAS SSDs (rados bench was ok
but real clients were really low bandwidth), and journals on NVMe
(P3700) and bcache on some SAS SSDs, and also tested both on the NVMe. I
think the performance is slightly better with it all on the NVMe (hdds
being the bottleneck... tests in VMs show the same, but rados bench
looks a tiny bit better). The bcache partition is shared by the osds,
and the journals are separate partitions.

I'm not sure it's really triple overhead. bcache doesn't write all your
data to the writeback cache... just the small sync writes, as long as
the cache doesn't fill up or get too busy (based on await). And the
bcache device flushes very slowly to the hdd, not overloading it (unless
the cache is full). And when I make it flush faster, it seems to do it more
quickly than without bcache (as if it does it more sequentially, or
without sync; but I didn't really measure... just looked at, e.g., 400MB of
dirty data, and then it flushes in 20 seconds). And if you overwrite the
same data a few times (like a filesystem journal, or some fs metadata),
you'd think it wouldn't have to write it more than once to the hdd in
the end. Maybe that means something small like leveldb isn't written
often to the hdd.

And it's not just a write cache. The default is 10% writeback, which
means the rest is read cache. And it keeps read stats so it knows which
data is the most popular. My nodes right now show 33-44% cache hits
(cache is too small I think). And bcache reorders writes on the cache
device so they are sequential, and can write to both at the same time so
it can actually go faster than a pure ssd in specific situations (mixed
sequential and random, only until the cache fills).
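For anyone who wants to check the same numbers on their own nodes, bcache exposes them in sysfs; a sketch (the bcache device name will differ per system):

  cat /sys/block/bcache0/bcache/stats_total/cache_hit_ratio   # lifetime hit ratio
  cat /sys/block/bcache0/bcache/dirty_data                    # data not yet flushed to the hdd
  cat /sys/block/bcache0/bcache/writeback_percent             # dirty-data target, 10 by default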

I think I owe you another graph later when I put all my VMs on there
(probably finally fixed my rbd snapshot hanging VM issue ...worked
around it by disabling exclusive-lock,object-map,fast-diff). The
bandwidth hungry ones (which hung the most often) were moved shortly
after the bcache change, and it's hard to explain how it affects the
graphs... easier to see with iostat while changing it and having a mix
of cache and not than ganglia afterwards.

Peter

On 03/23/17 21:18, Nick Fisk wrote:
>
> Hi Peter,
>
>  
>
> Interesting graph. Out of interest, when you use bcache, do you then
> just leave the journal collocated on the combined bcache device and
> rely on the writeback to provide journal performance, or do you still
> create a separate partition on whatever SSD/NVME you use, effectively
> giving triple write overhead?
>
>  
>
> Nick
>
>  
>
> *From:*ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On
> Behalf Of *Peter Maloney
> *Sent:* 22 March 2017 10:06
> *To:* Alex Gorbachev ; ceph-users
> 
> *Subject:* Re: [ceph-users] Preconditioning an RBD image
>
>  
>
> Does iostat (eg.  iostat -xmy 1 /dev/sd[a-z]) show high util% or await
> during these problems?
>
> Ceph filestore requires lots of metadata writing (directory splitting
> for example), xattrs, leveldb, etc. which are small sync writes that
> HDDs are bad at (100-300 iops), and SSDs are good at (cheapo would be
> 6k iops, and not so crazy DC/NVMe would be 20-200k iops and more). So
> in theory, these things are mitigated by using an SSD, like bcache on
> your osd device. You could also try something like that, at least to test.
>
> I have tested with bcache in writeback mode and found hugely obvious
> differences seen by iostat, for example here's my before and after
> (heavier load due to converting week 49-50 or so, and the highest
> spikes being the scrub infinite loop bug in 10.2.3):
>
> http://www.brockmann-consult.de/ganglia/graph.php?cs=10%2F25%2F2016+10%3A27=03%2F09%2F2017+17%3A26=xlarge[]=ceph.*[]=sd[c-z]_await=show=1=100
>
> But when you share a cache device, you get a single point of failure
> (and bcache, like all software, can be assumed to have bugs too). And
> I recommend vanilla kernel 4.9 or later which has many bcache fixes,
> or Ubuntu's 4.4 kernel which has the specific fixes I checked for.
>
> On 03/21/17 23:22, Alex Gorbachev wrote:
>
> I wanted to share the recent experience, in which a few RBD
> volumes, formatted as XFS and exported via Ubuntu
> NFS-kernel-server performed poorly, even generating "out of
> space" warnings on a nearly empty filesystem.  I tried a variety
> of hacks and fixes to no effect, until things started magically working just after some dd write testing.

[ceph-users] CentOS7 Mounting Problem

2017-03-23 Thread Georgios Dimitrakakis

Hello Ceph community!

I would like some help with a new CEPH installation.

I have installed Jewel on CentOS 7 and after a reboot my OSDs are not 
mounted automatically, and as a consequence Ceph is not operating 
normally...


What can I do?

Could you please help me solve the problem?


Regards,

G.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs cache tiering - hitset

2017-03-23 Thread Nick Fisk
 

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mike 
Lovell
Sent: 20 March 2017 22:31
To: n...@fisk.me.uk
Cc: Webert de Souza Lima ; ceph-users 

Subject: Re: [ceph-users] cephfs cache tiering - hitset

 

 

 

On Mon, Mar 20, 2017 at 4:20 PM, Nick Fisk  > wrote:

Just a few corrections, hope you don't mind

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Mike Lovell
> Sent: 20 March 2017 20:30
> To: Webert de Souza Lima   >
> Cc: ceph-users  >
> Subject: Re: [ceph-users] cephfs cache tiering - hitset
>
> i'm not an expert but here is my understanding of it. a hit_set keeps track of
> whether or not an object was accessed during the timespan of the hit_set.
> for example, if you have a hit_set_period of 600, then the hit_set covers a
> period of 10 minutes. the hit_set_count defines how many of the hit_sets to
> keep a record of. setting this to a value of 12 with the 10 minute
> hit_set_period would mean that there is a record of objects accessed over a
> 2 hour period. the min_read_recency_for_promote, and its newer
> min_write_recency_for_promote sibling, define how many of these hit_sets
> and object must be in before and object is promoted from the storage pool
> into the cache pool. if this were set to 6 with the previous examples, it 
> means
> that the cache tier will promote an object if that object has been accessed at
> least once in 6 of the 12 10-minute periods. it doesn't matter how many
> times the object was used in each period and so 6 requests in one 10-minute
> hit_set will not cause a promotion. it would be any number of access in 6
> separate 10-minute periods over the 2 hours.

Sort of, the recency looks at the last N most recent hitsets. So if set to 6, 
then the object would have to be in all of the last 6 hitsets. Because of this, during 
testing I found setting recency above 2 or 3 made the behavior quite binary. If 
an object was hot enough, it would probably be in every hitset; if it was only 
warm it would never be in enough hitsets in a row. I did experiment with X-out-of-N 
promotion logic, i.e. must be in 3 hitsets out of any 10, not sequential. If you 
could find the right number to configure, you could get improved cache 
behavior, but if not, then there was a large chance it would be worse.

For promotion I think having more hitsets probably doesn't add much value, but 
they may help when it comes to determining what to flush.
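For reference, these are the knobs being discussed, set per cache pool; a sketch with example values (the pool name and numbers are illustrative, not a recommendation):

  ceph osd pool set cache-pool hit_set_type bloom
  ceph osd pool set cache-pool hit_set_period 600
  ceph osd pool set cache-pool hit_set_count 12
  ceph osd pool set cache-pool min_read_recency_for_promote 2
  ceph osd pool set cache-pool min_write_recency_for_promote 2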

 

that's good to know. i just made an assumption without actually digging in to 
the code. do you recommend keeping the number of hitsets equal to the max of 
either min_read_recency_for_promote and min_write_recency_for_promote? how are 
the hitsets checked during flush and/or eviction?

 

 

Possibly, I’ve not really looked into how effective the hitsets are for 
determining what to flush. But hitset overhead is minimal, so I normally just 
stick with 10 hitsets and don’t even think about it anymore.

 

 

mike

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to mount different ceph FS using ceph-fuse or kernel cephfs mount

2017-03-23 Thread Deepak Naidu
Fixing typo

 What version of ceph-fuse?
ceph-fuse-10.2.6-0.el7.x86_64

--
Deepak

-Original Message-
From: Deepak Naidu 
Sent: Thursday, March 23, 2017 9:49 AM
To: John Spray
Cc: ceph-users
Subject: Re: [ceph-users] How to mount different ceph FS using ceph-fuse or 
kernel cephfs mount

>> What version of ceph-fuse?

I have ceph-common-10.2.6-0 ( on CentOS 7.3.1611)
 
--
Deepak

>> On Mar 23, 2017, at 6:28 AM, John Spray  wrote:
>> 
>> On Wed, Mar 22, 2017 at 3:30 PM, Deepak Naidu  wrote:
>> Hi John,
>> 
>> 
>> 
>> I tried the below option for ceph-fuse & kernel mount. Below is what 
>> I see/error.
>> 
>> 
>> 
>> 1)  When trying using ceph-fuse, the mount command succeeds but I see
>> parse error setting 'client_mds_namespace' to 'dataX' .  Not sure if 
>> this is normal message or some error
> 
> What version of ceph-fuse?
> 
> John
> 
>> 
>> 2)  When trying the kernel mount, the mount command just hangs & after
>> few seconds I see mount error 5 = Input/output error. I am using 
>> 4.9.15-040915-generic kernel on Ubuntu 16.x
>> 
>> 
>> 
>> --
>> 
>> Deepak
>> 
>> 
>> 
>> -Original Message-
>> From: John Spray [mailto:jsp...@redhat.com]
>> Sent: Wednesday, March 22, 2017 6:16 AM
>> To: Deepak Naidu
>> Cc: ceph-users
>> Subject: Re: [ceph-users] How to mount different ceph FS using 
>> ceph-fuse or kernel cephfs mount
>> 
>> 
>> 
>>> On Tue, Mar 21, 2017 at 5:31 PM, Deepak Naidu  wrote:
>>> 
>>> Greetings,
>> 
>> 
>> 
>> 
>>> I have below two cephFS "volumes/filesystem" created on my ceph
>> 
>>> cluster. Yes I used the "enable_multiple" flag to enable the 
>>> multiple
>> 
>>> cephFS feature. My question
>> 
>> 
>> 
>> 
>>> 1)  How do I mention the fs name ie dataX or data1 during cephFS mount
>> 
>>> either using kernel mount of ceph-fuse mount.
>> 
>> 
>> 
>> The option for ceph_fuse is --client_mds_namespace=dataX (you can do 
>> this on the command line or in your ceph.conf)
>> 
>> 
>> 
>> With the kernel client use "-o mds_namespace=DataX" (assuming you 
>> have a sufficiently recent kernel)
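Spelled out as full commands, that would look something like this (a sketch; the mount points, monitor address and credentials are placeholders):

  # ceph-fuse
  ceph-fuse --client_mds_namespace=dataX /mnt/dataX

  # kernel client
  mount -t ceph mon-host:6789:/ /mnt/dataX -o name=admin,secretfile=/etc/ceph/admin.secret,mds_namespace=dataX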
>> 
>> 
>> 
>> Cheers,
>> 
>> John
>> 
>> 
>> 
>> 
>>> 2)  When using kernel / ceph-fuse how do I mention dataX or data1
>>> during
>> 
>>> the fuse mount or kernel mount
>> 
>> 
>> 
>> 
>> 
>> 
>>> [root@Admin ~]# ceph fs ls
>> 
>> 
>>> name: dataX, metadata pool: rcpool_cepfsMeta, data pools:
>> 
>>> [rcpool_cepfsData ]
>> 
>> 
>>> name: data1, metadata pool: rcpool_cepfsMeta, data pools:
>> 
>>> [rcpool_cepfsData ]
>> 
>> 
>> 
>> 
>> 
>> 
>>> --
>> 
>> 
>>> Deepak
>> 
>> 
>>> 
>> 
>>> This email message is for the sole use of the intended recipient(s)
>> 
>>> and may contain confidential information.  Any unauthorized review,
>> 
>>> use, disclosure or distribution is prohibited.  If you are not the
>> 
>>> intended recipient, please contact the sender by reply email and
>> 
>>> destroy all copies of the original message.
>> 
>>> 
>> 
>> 
>>> ___
>> 
>>> ceph-users mailing list
>> 
>>> ceph-users@lists.ceph.com
>> 
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>>> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Preconditioning an RBD image

2017-03-23 Thread Nick Fisk
Hi Peter,

 

Interesting graph. Out of interest, when you use bcache, do you then just
leave the journal collocated on the combined bcache device and rely on the
writeback to provide journal performance, or do you still create a separate
partition on whatever SSD/NVME you use, effectively giving triple write
overhead?

 

Nick

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Peter Maloney
Sent: 22 March 2017 10:06
To: Alex Gorbachev ; ceph-users

Subject: Re: [ceph-users] Preconditioning an RBD image

 

Does iostat (eg.  iostat -xmy 1 /dev/sd[a-z]) show high util% or await
during these problems?

Ceph filestore requires lots of metadata writing (directory splitting for
example), xattrs, leveldb, etc. which are small sync writes that HDDs are
bad at (100-300 iops), and SSDs are good at (cheapo would be 6k iops, and
not so crazy DC/NVMe would be 20-200k iops and more). So in theory, these
things are mitigated by using an SSD, like bcache on your osd device. You
could also try something like that, at least to test.

I have tested with bcache in writeback mode and found hugely obvious
differences seen by iostat, for example here's my before and after (heavier
load due to converting week 49-50 or so, and the highest spikes being the
scrub infinite loop bug in 10.2.3): 

http://www.brockmann-consult.de/ganglia/graph.php?cs=10%2F25%2F2016+10%3A27=03%2F09%2F2017+17%3A26=xlarge[]=ceph.*[]=sd[c-z]_await=show=1=100

But when you share a cache device, you get a single point of failure (and
bcache, like all software, can be assumed to have bugs too). And I recommend
vanilla kernel 4.9 or later which has many bcache fixes, or Ubuntu's 4.4
kernel which has the specific fixes I checked for.

On 03/21/17 23:22, Alex Gorbachev wrote:

I wanted to share the recent experience, in which a few RBD volumes,
formatted as XFS and exported via Ubuntu NFS-kernel-server performed poorly,
even generated an "out of space" warnings on a nearly empty filesystem.  I
tried a variety of hacks and fixes to no effect, until things started
magically working just after some dd write testing. 

 

The only explanation I can come up with is that preconditioning, or
thickening, the images with this benchmarking is what caused the
improvement.

 

Ceph is Hammer 0.94.7 running on Ubuntu 14.04, kernel 4.10 on OSD nodes and
4.4 on NFS nodes.

 

Regards,

Alex

Storcium

-- 

-- 

Alex Gorbachev

Storcium






___
ceph-users mailing list
ceph-users@lists.ceph.com  
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 

 

-- 
 

Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.malo...@brockmann-consult.de
 
Internet: http://www.brockmann-consult.de

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to think a two different disk's technologies architecture

2017-03-23 Thread Maxime Guyot
Hi Alexandro,

As I understand it, you are planning NVMe journals for the SATA HDDs and collocated 
journals for the SATA SSDs?

Option 1:
- 24x SATA SSDs per server will hit a bottleneck at the storage 
bus/controller.  Also, consider the network capacity: 24x SSDs will 
deliver more performance than 24x HDDs with journals, but you have the same 
network capacity on both types of nodes.
- This option is a little easier to implement: just move the nodes into different 
CRUSHmap roots.
- Failure of a server (assuming size = 3) will impact all PGs.
Option 2:
- You may have a noisy-neighbor effect between HDDs and SSDs if the HDDs are able 
to saturate your NICs or storage controller, so be mindful of this in the 
hardware design.
- To configure the CRUSHmap for this you need to split each server in two; I 
usually use “server1-hdd” and “server1-ssd” and map each OSD into the right 
bucket. It is a little extra work, but you can easily write a “crush location 
hook” script for it (see the example at 
http://www.root314.com/2017/01/15/Ceph-storage-tiers/ and the sketch just below).
- In case of a server failure, recovery will be faster than option 1 and will 
impact fewer PGs.
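A minimal sketch of that CRUSH split (bucket names, the OSD id and the weight are just examples; a “crush location hook” script can automate the same placement when OSDs start):

  # separate roots per media type
  ceph osd crush add-bucket ssd root
  ceph osd crush add-bucket hdd root

  # per-server, per-media host buckets
  ceph osd crush add-bucket server1-ssd host
  ceph osd crush add-bucket server1-hdd host
  ceph osd crush move server1-ssd root=ssd
  ceph osd crush move server1-hdd root=hdd

  # place each OSD (with its weight) in the matching bucket
  ceph osd crush set osd.12 0.873 host=server1-ssd root=ssd

The SSD and SATA pools then use CRUSH rules that take from the ssd and hdd roots respectively.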

Some general notes:
- SSD pools perform better with higher frequency CPUs
- the 1GB of RAM per TB is a little outdated, the current consensus for HDD 
OSDs is around 2GB/OSD (see 
https://www.redhat.com/cms/managed-files/st-rhcs-config-guide-technology-detail-inc0387897-201604-en.pdf)
- Network-wise, if the SSD OSDs are rated for 500MB/s and use collocated 
journals, you could generate up to 250MB/s of traffic per SSD OSD (24Gbps for 12x 
or 48Gbps for 24x); therefore I would consider doing 4x10G and consolidating both 
client and cluster networks on that.

Cheers,
Maxime

On 23/03/17 18:55, "ceph-users on behalf of Alejandro Comisario" 
 wrote:

Hi everyone!
I have to install a ceph cluster (6 nodes) with two "flavors" of
disks, 3 servers with SSD and 3 servers with SATA.

I will purchase 24-disk servers (the SATA ones with NVMe SSDs for
the SATA journals)
Processors will be 2 x E5-2620v4 with HT, and RAM will be 20GB for the
OS, and 1.3GB of RAM per storage TB.

The servers will have 2 x 10Gb bonding for the public network and 2 x 10Gb
for the cluster network.
My doubts remain, and I want to ask the community about experiences and
pains and gains of choosing between:

Option 1
3 x servers just for SSD
3 x servers just for SATA

Option 2
6 x servers with 12 SSD and 12 SATA each

Regarding crushmap configuration and rules everything is clear to make
sure that two pools (poolSSD and poolSATA) uses the right disks.

But, what about performance, maintenance, architecture scalability, etc ?

thank you very much !

-- 
Alejandrito
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Setting a different number of minimum replicas for reading and writing operations

2017-03-23 Thread Gregory Farnum
Nope. This is a theoretical possibility but would take a lot of code change
that nobody has embarked upon yet.
-Greg
On Wed, Mar 22, 2017 at 2:16 PM Sergio A. de Carvalho Jr. <
scarvalh...@gmail.com> wrote:

> Hi all,
>
> Is it possible to create a pool where the minimum number of replicas for
> the write operation to be confirmed is 2 but the minimum number of replicas
> to allow the object to be read is 1?
>
> This would be useful when a pool consists of immutable objects, so we'd
> have:
> * size 3 (we always keep 3 replicas of all objects)
> * min size for write 2 (write is complete once 2 replicas are created)
> * min size for read 1 (read is allowed even if only 1 copy of the object
> is available)
>
> Thanks,
>
> Sergio
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to think a two different disk's technologies architecture

2017-03-23 Thread Udo Lembke
Hi,
ceph speeds up with more nodes and more OSDs - so go for 6 nodes with
mixed SSD+SATA.

Udo

On 23.03.2017 18:55, Alejandro Comisario wrote:
> Hi everyone!
> I have to install a ceph cluster (6 nodes) with two "flavors" of
> disks, 3 servers with SSD and 3 servers with SATA.
>
> I will purchase 24-disk servers (the SATA ones with NVMe SSDs for
> the SATA journals)
> Processors will be 2 x E5-2620v4 with HT, and RAM will be 20GB for the
> OS, and 1.3GB of RAM per storage TB.
>
> The servers will have 2 x 10Gb bonding for the public network and 2 x 10Gb
> for the cluster network.
> My doubts remain, and I want to ask the community about experiences and
> pains and gains of choosing between:
>
> Option 1
> 3 x servers just for SSD
> 3 x servers just for SATA
>
> Option 2
> 6 x servers with 12 SSD and 12 SATA each
>
> Regarding crushmap configuration and rules everything is clear to make
> sure that two pools (poolSSD and poolSATA) uses the right disks.
>
> But, what about performance, maintenance, architecture scalability, etc ?
>
> thank you very much !
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] The performance of ceph with RDMA

2017-03-23 Thread Deepak Naidu
RDMA is of interest to me. So my below comment.

>> What surprised me is that the result of RDMA mode is almost the same as the 
>> basic mode, the iops, latency, throughput, etc.

Pardon my limited knowledge here, but if I read your ceph.conf and your notes correctly, it seems 
that you are using RDMA only for the “cluster/private network”? So how do you 
expect RDMA to improve client IOPS/latency/throughput?


--
Deepak


From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Haomai 
Wang
Sent: Thursday, March 23, 2017 4:34 AM
To: Hung-Wei Chiu (邱宏瑋)
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] The performance of ceph with RDMA



On Thu, Mar 23, 2017 at 5:49 AM, Hung-Wei Chiu (邱宏瑋) 
> wrote:
Hi,

I use the latest code (master branch, updated 2017/03/22) to build Ceph with RDMA 
and use fio to test its IOPS/latency/throughput.

In my environment, I setup 3 hosts and list the detail of each host below.

OS: ubuntu 16.04
Storage: SSD * 4 (256G * 4)
Memory: 64GB.
NICs: two NICs, one (intel 1G) for public network and the other (mellanox 10G) 
for private network.

There are 3 monitors and 24 OSDs equally distributed across the 3 hosts, which means 
each host contains 1 mon and 8 OSDs.

For my experiment, I use two configs, basic and RDMA.

Basic
[global]
fsid = 0612cc7e-6239-456c-978b-b4df781fe831
mon initial members = ceph-1,ceph-2,ceph-3
mon host = 10.0.0.15,10.0.0.16,10.0.0.17
osd pool default size = 2
osd pool default pg num = 1024
osd pool default pgp num = 1024


RDMA
[global]
fsid = 0612cc7e-6239-456c-978b-b4df781fe831
mon initial members = ceph-1,ceph-2,ceph-3
mon host = 10.0.0.15,10.0.0.16,10.0.0.17
osd pool default size = 2
osd pool default pg num = 1024
osd pool default pgp num = 1024
ms_type=async+rdma
ms_async_rdma_device_name = mlx4_0


What surprised me is that the result in RDMA mode is almost the same as in the 
basic mode: the IOPS, latency, throughput, etc.
I also tried different fio parameter patterns, such as read and 
write ratios and random or sequential operations.
All results are the same.

Yes, most of the latency comes from other components now, although we still want 
to avoid the extra copy on the RDMA side.

So the current RDMA backend only means it can be a choice comparable to the tcp/ip 
network; more of the benefit needs to come from the other components.


In order to figure out what's going on, I did the following steps.

1. Follow this article (https://community.mellanox.com/docs/DOC-2086) to make 
sure my RDMA environment is correct.
2. To make sure the network traffic is transmitted by RDMA, I dump the traffic 
within the private network and the answer is yes: it uses RDMA.
3. Modify ms_async_rdma_buffer_size to (256 << 10); no change.
4. Modify ms_async_rdma_send_buffers to 2048; no change.
5. Modify ms_async_rdma_receive_buffers to 2048; no change.

After the above operations, I guess maybe my Ceph environment is not well suited for 
RDMA to improve the performance.
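For reference, the options touched in steps 3-5 above all live in the [global] section next to the ms_type setting; a sketch of that part of the RDMA ceph.conf with the values tried (not a tuning recommendation):

  [global]
  ms_type = async+rdma
  ms_async_rdma_device_name = mlx4_0
  ms_async_rdma_buffer_size = 262144      # i.e. 256 << 10
  ms_async_rdma_send_buffers = 2048
  ms_async_rdma_receive_buffers = 2048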

Does anyone know what kind of Ceph environment (replica size, # of OSDs, # 
of mons, etc.) is good for RDMA?

Thanks in advance.



Best Regards,

Hung-Wei Chiu(邱宏瑋)
--
Computer Center, Department of Computer Science
National Chiao Tung University

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to think a two different disk's technologies architecture

2017-03-23 Thread Alejandro Comisario
Hi everyone!
I have to install a ceph cluster (6 nodes) with two "flavors" of
disks, 3 servers with SSD and 3 servers with SATA.

I will purchase 24-disk servers (the SATA ones with NVMe SSDs for
the SATA journals)
Processors will be 2 x E5-2620v4 with HT, and RAM will be 20GB for the
OS, and 1.3GB of RAM per storage TB.

The servers will have 2 x 10Gb bonding for the public network and 2 x 10Gb
for the cluster network.
My doubts remain, and I want to ask the community about experiences and
pains and gains of choosing between:

Option 1
3 x servers just for SSD
3 x servers just for SATA

Option 2
6 x servers with 12 SSD and 12 SATA each

Regarding crushmap configuration and rules everything is clear to make
sure that two pools (poolSSD and poolSATA) uses the right disks.

But, what about performance, maintenance, architecture scalability, etc ?

thank you very much !

-- 
Alejandrito
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Developer Monthly - APR

2017-03-23 Thread Patrick McGarry
Hey cephers,

Just a reminder that the next Ceph Developer Monthly meeting is coming up:

http://wiki.ceph.com/Planning

If you have work that you are doing that is feature work, significant
backports, or anything you would like to discuss with the core team,
please add it to the following page:

http://wiki.ceph.com/CDM_05-APR-2017

If you have questions or comments, please let me know. Thanks.


-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] can a OSD affect performance from pool X when blocking/slow requests PGs from pool Y ?

2017-03-23 Thread Alejandro Comisario
Definitely, in our case the OSDs were not the guilty ones, since all the OSDs that
were blocking requests (always from the same pool) worked flawlessly (and
still do) after we deleted the pool where we always saw the blocked PGs.

Since the pool was accessed by just one client, and had almost no ops to it,
I really don't know how to reproduce the issue, but it surely scares me that it could
happen again, most of all taking into consideration that blocked IOPS on
an OSD can cascade through the whole cluster and block all other pools.
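For what it's worth, next time the blocked requests can at least be pinned to specific OSDs and operations; a sketch (osd.12 is a placeholder, and the ceph daemon commands must run on the node hosting that OSD):

  ceph health detail                      # lists the OSDs with blocked/slow requests
  ceph daemon osd.12 dump_ops_in_flight   # the ops currently stuck on that OSD
  ceph daemon osd.12 dump_historic_ops    # recent slowest ops with their durations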

It was technically hard to explain to management that one 1.5GB pool
locked up almost 250 VMs from other, terabyte-sized pools, and above all, without
having a root cause (meaning, why only that pool generated blocked IOPS).

I hope to hear some more technical insights, or from someone else who went through
the same.
Best.

On Thu, Mar 23, 2017 at 5:47 AM, Peter Maloney <
peter.malo...@brockmann-consult.de> wrote:

> I think Greg (who appears to be a ceph committer) basically said he was
> interested in looking at it, if only you had the pool that failed this way.
>
> Why not try to reproduce it, and make a log of your procedure so he can
> reproduce it too? What caused the slow requests... copy on write from
> snapshots? A bad disk? exclusive-lock with 2 clients writing at the same
> time maybe?
>
> I'd be interested in a solution too... like why can't idle disks (non-full
> disk queue) mean that the osd op or whatever queue can still fill with
> requests not related to the blocked pg/objects? I would love for ceph to
> handle this better. I suspect some issues I have are related to this (slow
> requests on one VM can freeze others [likely blame the osd], even requiring
> kill -9 [likely blame client librbd]).
>
> On 03/22/17 16:18, Alejandro Comisario wrote:
>
> any thoughts ?
>
> On Tue, Mar 14, 2017 at 10:22 PM, Alejandro Comisario <
> alejan...@nubeliu.com> wrote:
>
>> Greg, thanks for the reply.
>> True that i cant provide enough information to know what happened since
>> the pool is gone.
>>
>> But based on your experience, can i please take some of your time, and
>> give me the TOP 5 fo what could happen / would be the reason to happen what
>> hapened to that pool (or any pool) that makes Ceph (maybe hapened
>> specifically in Hammer ) to behave like that ?
>>
>> Information that i think will be of value, is that the cluster was 5
>> nodes large, running "0.94.6-1trusty" i added two nodes running the latest
>> "0.94.9-1trusty" and replication into those new disks never ended, since i
>> saw WEIRD errors on the new OSDs, so i thought that packages needed to be
>> the same, so i "apt-get upgraded" the 5 old nodes without restrting
>> nothing, so rebalancing started to happen without errors (WEIRD).
>>
>> after these two nodes reached 100% of the disks weight, the cluster
>> worked perfectly for about two weeks, till this happened.
>> After the resolution from my first email, everything has been working
>> perfect.
>>
>> thanks for the responses.
>>
>>
>> On Fri, Mar 10, 2017 at 4:23 PM, Gregory Farnum 
>> wrote:
>>
>>>
>>>
>>> On Tue, Mar 7, 2017 at 10:18 AM Alejandro Comisario <
>>> alejan...@nubeliu.com> wrote:
>>>
 Gregory, thanks for the response, what you've said is by far, the most
 enlightneen thing i know about ceph in a long time.

 What brings even greater doubt, which is, this "non-functional" pool,
 was only 1.5GB large, vs 50-150GB on the other effected pools, the tiny
 pool was still being used, and just because that pool was blovking
 requests, the whole cluster was unresponsive.

 So , what do you mean by "non-functional" pool ? how a pool can become
 non-functional ? and what asures me that tomorrow (just becaue i deleted
 the 1.5GB pool to fix the whole problem) another pool doesnt becomes
 non-functional ?

>>>
>>> Well, you said there were a bunch of slow requests. That can happen any
>>> number of ways, if you're overloading the OSDs or something.
>>> When there are slow requests, those ops take up OSD memory and throttle,
>>> and so they don't let in new messages until the old ones are serviced. This
>>> can cascade across a cluster -- because everything is interconnected,
>>> clients and OSDs end up with all their requests targeted at the slow OSDs
>>> which aren't letting in new IO quickly enough. It's one of the weaknesses
>>> of the standard deployment patterns, but it usually doesn't come up unless
>>> something else has gone pretty wrong first.
>>> As for what actually went wrong here, you haven't provided near enough
>>> information and probably can't now that the pool has been deleted. *shrug*
>>> -Greg
>>>
>>
>


-- 
*Alejandro Comisario*
*CTO | NUBELIU*
E-mail: alejandro@nubeliu.com  Cell: +54 9 11 3770 1857
_
www.nubeliu.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recompiling source code - to find exact RPM

2017-03-23 Thread nokia ceph
Hello Piotr,

I didn't understand; could you please elaborate on the procedure you
mentioned in your last update?  It would be really helpful if you could share any
useful link/doc to understand what you actually meant. Yes, correct,
normally we do this procedure, but it takes more time. Here my intention
is to find out which RPM is affected by the change. I think we are talking in
opposite directions.

>> But wouldn't be faster and/or more convenient if you would just
recompile binaries in-place (or use network symlinks)

Thanks



On Thu, Mar 23, 2017 at 6:47 PM, Piotr Dałek 
wrote:

> On 03/23/2017 02:02 PM, nokia ceph wrote:
>
> Hello Piotr,
>>
>> We customize the Ceph code for our testing purposes. It's a part of our
>> R&D :)
>>
>> Recompiling source code will create 38 rpm's out of these I need to find
>> which one is the correct rpm which I made change in the source code.
>> That's
>> what I'm try to figure out.
>>
>
> Yes, I understand that. But wouldn't be faster and/or more convenient if
> you would just recompile binaries in-place (or use network symlinks)
> instead of packaging entire Ceph and (re)installing its packages each time
> you do the change? Generating RPMs takes a while.
>
>
> --
> Piotr Dałek
> piotr.da...@corp.ovh.com
> https://www.ovh.com/us/
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to mount different ceph FS using ceph-fuse or kernel cephfs mount

2017-03-23 Thread Deepak Naidu
>> What version of ceph-fuse?

I have ceph-common-10.2.6-0 ( on CentOS 7.3.1611)
 
--
Deepak

>> On Mar 23, 2017, at 6:28 AM, John Spray  wrote:
>> 
>> On Wed, Mar 22, 2017 at 3:30 PM, Deepak Naidu  wrote:
>> Hi John,
>> 
>> 
>> 
>> I tried the below option for ceph-fuse & kernel mount. Below is what I
>> see/error.
>> 
>> 
>> 
>> 1)  When trying using ceph-fuse, the mount command succeeds but I see
>> parse error setting 'client_mds_namespace' to 'dataX' .  Not sure if this is
>> normal message or some error
> 
> What version of ceph-fuse?
> 
> John
> 
>> 
>> 2)  When trying the kernel mount, the mount command just hangs & after
>> few seconds I see mount error 5 = Input/output error. I am using
>> 4.9.15-040915-generic kernel on Ubuntu 16.x
>> 
>> 
>> 
>> --
>> 
>> Deepak
>> 
>> 
>> 
>> -Original Message-
>> From: John Spray [mailto:jsp...@redhat.com]
>> Sent: Wednesday, March 22, 2017 6:16 AM
>> To: Deepak Naidu
>> Cc: ceph-users
>> Subject: Re: [ceph-users] How to mount different ceph FS using ceph-fuse or
>> kernel cephfs mount
>> 
>> 
>> 
>>> On Tue, Mar 21, 2017 at 5:31 PM, Deepak Naidu  wrote:
>>> 
>>> Greetings,
>> 
>> 
>> 
>> 
>>> I have below two cephFS “volumes/filesystem” created on my ceph
>> 
>>> cluster. Yes I used the “enable_multiple” flag to enable the multiple
>> 
>>> cephFS feature. My question
>> 
>> 
>> 
>> 
>>> 1)  How do I mention the fs name ie dataX or data1 during cephFS mount
>> 
>>> either using kernel mount of ceph-fuse mount.
>> 
>> 
>> 
>> The option for ceph_fuse is --client_mds_namespace=dataX (you can do this on
>> the command line or in your ceph.conf)
>> 
>> 
>> 
>> With the kernel client use "-o mds_namespace=DataX" (assuming you have a
>> sufficiently recent kernel)
>> 
>> 
>> 
>> Cheers,
>> 
>> John
>> 
>> 
>> 
>> 
>>> 2)  When using kernel / ceph-fuse how do I mention dataX or data1
>>> during
>> 
>>> the fuse mount or kernel mount
>> 
>> 
>> 
>> 
>> 
>> 
>>> [root@Admin ~]# ceph fs ls
>> 
>> 
>>> name: dataX, metadata pool: rcpool_cepfsMeta, data pools:
>> 
>>> [rcpool_cepfsData ]
>> 
>> 
>>> name: data1, metadata pool: rcpool_cepfsMeta, data pools:
>> 
>>> [rcpool_cepfsData ]
>> 
>> 
>> 
>> 
>> 
>> 
>>> --
>> 
>> 
>>> Deepak
>> 
>> 
>>> 
>> 
>>> This email message is for the sole use of the intended recipient(s)
>> 
>>> and may contain confidential information.  Any unauthorized review,
>> 
>>> use, disclosure or distribution is prohibited.  If you are not the
>> 
>>> intended recipient, please contact the sender by reply email and
>> 
>>> destroy all copies of the original message.
>> 
>>> 
>> 
>> 
>>> ___
>> 
>>> ceph-users mailing list
>> 
>>> ceph-users@lists.ceph.com
>> 
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>>> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Tech Talk in 20 mins

2017-03-23 Thread Patrick McGarry
Hey cephers,

Just a reminder that the next Ceph Tech Talk will begin in
approximately 20 minutes. I hope you can all join us:

http://ceph.com/ceph-tech-talks/


-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ImportError: No module named ceph_deploy.cli

2017-03-23 Thread c . monty
Hello!

I have installed 
ceph-deploy-1.5.36git.1479985814.c561890-6.6.noarch.rpm
on SLES11 SP4.

When I start ceph-deploy, I get an error:
ceph@ldcephadm:~/dlm-lve-cluster> ceph-deploy new ldcephmon1
Traceback (most recent call last):
  File "/usr/bin/ceph-deploy", line 18, in 
from ceph_deploy.cli import main
ImportError: No module named ceph_deploy.cli


What is causing this error?

THX
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to mount different ceph FS using ceph-fuse or kernel cephfs mount

2017-03-23 Thread John Spray
On Wed, Mar 22, 2017 at 3:30 PM, Deepak Naidu  wrote:
> Hi John,
>
>
>
> I tried the below option for ceph-fuse & kernel mount. Below is what I
> see/error.
>
>
>
> 1)  When trying using ceph-fuse, the mount command succeeds but I see
> parse error setting 'client_mds_namespace' to 'dataX' .  Not sure if this is
> normal message or some error

What version of ceph-fuse?

John

>
> 2)  When trying the kernel mount, the mount command just hangs & after
> few seconds I see mount error 5 = Input/output error. I am using
> 4.9.15-040915-generic kernel on Ubuntu 16.x
>
>
>
> --
>
> Deepak
>
>
>
> -Original Message-
> From: John Spray [mailto:jsp...@redhat.com]
> Sent: Wednesday, March 22, 2017 6:16 AM
> To: Deepak Naidu
> Cc: ceph-users
> Subject: Re: [ceph-users] How to mount different ceph FS using ceph-fuse or
> kernel cephfs mount
>
>
>
> On Tue, Mar 21, 2017 at 5:31 PM, Deepak Naidu  wrote:
>
>> Greetings,
>
>>
>
>>
>
>>
>
>> I have below two cephFS “volumes/filesystem” created on my ceph
>
>> cluster. Yes I used the “enable_multiple” flag to enable the multiple
>
>> cephFS feature. My question
>
>>
>
>>
>
>>
>
>> 1)  How do I mention the fs name ie dataX or data1 during cephFS mount
>
>> either using kernel mount of ceph-fuse mount.
>
>
>
> The option for ceph_fuse is --client_mds_namespace=dataX (you can do this on
> the command line or in your ceph.conf)
>
>
>
> With the kernel client use "-o mds_namespace=DataX" (assuming you have a
> sufficiently recent kernel)
>
>
>
> Cheers,
>
> John
>
>
>
>>
>
>> 2)  When using kernel / ceph-fuse how do I mention dataX or data1
>> during
>
>> the fuse mount or kernel mount
>
>>
>
>>
>
>>
>
>>
>
>>
>
>> [root@Admin ~]# ceph fs ls
>
>>
>
>> name: dataX, metadata pool: rcpool_cepfsMeta, data pools:
>
>> [rcpool_cepfsData ]
>
>>
>
>> name: data1, metadata pool: rcpool_cepfsMeta, data pools:
>
>> [rcpool_cepfsData ]
>
>>
>
>>
>
>>
>
>>
>
>>
>
>> --
>
>>
>
>> Deepak
>
>>
>
>> 
>
>> This email message is for the sole use of the intended recipient(s)
>
>> and may contain confidential information.  Any unauthorized review,
>
>> use, disclosure or distribution is prohibited.  If you are not the
>
>> intended recipient, please contact the sender by reply email and
>
>> destroy all copies of the original message.
>
>> 
>
>>
>
>> ___
>
>> ceph-users mailing list
>
>> ceph-users@lists.ceph.com
>
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recompiling source code - to find exact RPM

2017-03-23 Thread Piotr Dałek

On 03/23/2017 02:02 PM, nokia ceph wrote:


Hello Piotr,

We customize the Ceph code for our testing purposes. It's a part of our R&D :)

Recompiling the source code creates 38 RPMs; out of these I need to find
which one is the correct RPM for the change I made in the source code. That's
what I'm trying to figure out.


Yes, I understand that. But wouldn't it be faster and/or more convenient to 
just recompile the binaries in place (or use network symlinks) instead of 
packaging all of Ceph and (re)installing its packages each time you make a 
change? Generating RPMs takes a while.


--
Piotr Dałek
piotr.da...@corp.ovh.com
https://www.ovh.com/us/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recompiling source code - to find exact RPM

2017-03-23 Thread nokia ceph
Hello Piotr,

We customize the Ceph code for our testing purposes. It's a part of our R&D
:)

Recompiling the source code creates 38 RPMs; out of these I need to find
which one is the correct RPM for the change I made in the source code. That's
what I'm trying to figure out.

Thanks

On Thu, Mar 23, 2017 at 6:18 PM, Piotr Dałek 
wrote:

> On 03/23/2017 01:41 PM, nokia ceph wrote:
>
>> Hey brad,
>>
>> Thanks for the info.
>>
>> Yea we know that these are test rpm's.
>>
>> The idea behind my question is if I made any changes in the ceph source
>> code, then I recompile it. Then I need to find which is the appropriate
>> rpm
>> mapped to that changed file. If I find the exact RPM, then apply that RPM
>> in
>> our existing ceph cluster instead of applying/overwriting  all the
>> compiled
>> rpms.
>>
>> I hope this cleared your doubt.
>>
>
> And why exactly you want to rebuild rpms each time? If the machines are
> powerful enough, you could recompile binaries in place. Or symlink them via
> nfs (or whatever) to build machine and build once there.
>
> --
> Piotr Dałek
> piotr.da...@corp.ovh.com
> https://www.ovh.com/us/
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recompiling source code - to find exact RPM

2017-03-23 Thread Piotr Dałek

On 03/23/2017 01:41 PM, nokia ceph wrote:

Hey brad,

Thanks for the info.

Yea we know that these are test rpm's.

The idea behind my question is if I made any changes in the ceph source
code, then I recompile it. Then I need to find which is the appropriate rpm
mapped to that changed file. If I find the exact RPM, then apply that RPM in
our existing ceph cluster instead of applying/overwriting  all the compiled
rpms.

I hope this cleared your doubt.


And why exactly do you want to rebuild RPMs each time? If the machines are 
powerful enough, you could recompile the binaries in place, or symlink them via 
NFS (or whatever) to a build machine and build once there.


--
Piotr Dałek
piotr.da...@corp.ovh.com
https://www.ovh.com/us/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recompiling source code - to find exact RPM

2017-03-23 Thread nokia ceph
Hey Brad,

Thanks for the info.

Yes, we know that these are test RPMs.

The idea behind my question is: if I make any changes in the Ceph source
code, I then recompile it, and I need to find the RPM that maps to the
changed file. If I find the exact RPM, I can then apply just that RPM
to our existing Ceph cluster instead of applying/overwriting all the
compiled RPMs.

I hope this clears up your doubt.

Thanks




On Wed, Mar 22, 2017 at 5:47 AM, Brad Hubbard  wrote:

> Based solely on the information given the only rpms with this specific
> commit in
> them would be here
> https://shaman.ceph.com/builds/ceph/wip-prune-past-intervals-kraken/
> (specifically
> https://4.chacra.ceph.com/r/ceph/wip-prune-past-intervals-kraken/
> 8263140fe539f9c3241c1c0f6ee9cfadde9178c0/centos/7/flavors/default/x86_64/
> ).
> These are test rpms, not official releases.
>
> Note that the branch "wip-prune-past-intervals-kraken" exists only in the
> ceph-ci repo and *not* the main ceph repo and that the particular commit
> above
> does not seem to have made it into the "ceph" repo.
>
> $ git log -S _simplify_past_intervals
> $ git log --grep="_simplify_past_intervals"
> $
>
> Given this commit is not in the ceph repo I would suggest we have never
> shipped
> an official rpm that contains this commit.
>
> It's not totally clear to me exactly what you are trying to achieve, maybe
> you
> could have another go at describing your objective?
>
> On Wed, Mar 22, 2017 at 12:26 AM, nokia ceph 
> wrote:
> > Hello,
> >
> > I made some changes in the below file on ceph kraken v11.2.0 source code
> as
> > per this article
> >
> > https://github.com/ceph/ceph-ci/commit/wip-prune-past-intervals-kraken
> >
> > ..src/osd/PG.cc
> > ..src/osd/PG.h
> >
> > Is there any way to find which rpm got affected by these two files. I
> > believe it should be ceph-osd-11.2.0-0.el7.x86_64.rpm . Can you confirm
> > please ?
> >
> > I failed to find it from the ceph.spec file.
> >
> > Could anyone please guide me the right procedure to check this.
> >
> > The main intention is that if we find the exact rpm affected by these
> files,
> > we can simply overwrite it with the old rpm.
> >
> > Awaiting for comments.
> >
> > Thanks
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
>
> --
> Cheers,
> Brad
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Modification Time of RBD Images

2017-03-23 Thread Christoph Adomeit
Hi,

No, I did not enable the journaling feature since we do not use mirroring.


On Thu, Mar 23, 2017 at 08:10:05PM +0800, Dongsheng Yang wrote:
> Did you enable the journaling feature?
> 
> On 03/23/2017 07:44 PM, Christoph Adomeit wrote:
> >Hi Yang,
> >
> >I mean "any write" to this image.
> >
> >I am sure we have a lot of not-used-anymore rbd images in our pool and I am 
> >trying to identify them.
> >
> >The mtime would be a good hint to show which images might be unused.
> >
> >Christoph
> >
> >On Thu, Mar 23, 2017 at 07:32:49PM +0800, Dongsheng Yang wrote:
> >>Hi Christoph,
> >>
> >>On 03/23/2017 07:16 PM, Christoph Adomeit wrote:
> >>>Hello List,
> >>>
> >>>i am wondering if there is meanwhile an easy method in ceph to find more 
> >>>information about rbd-images.
> >>>
> >>>For example I am interested in the modification time of an rbd image.
> >>Do you mean some metadata changing? such as resize?
> >>
> >>Or any write to this image?
> >>
> >>Thanx
> >>Yang
> >>>I found some posts from 2015 that say we have to go over all the objects 
> >>>of an rbd image and find the newest mtime put this is not a preferred 
> >>>solution for me. It takes to much time and too many system resources.
> >>>
> >>>Any Ideas ?
> >>>
> >>>Thanks
> >>>   Christoph
> >>>
> >>>
> >>>___
> >>>ceph-users mailing list
> >>>ceph-users@lists.ceph.com
> >>>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>
> 

-- 
Es gibt keine  Cloud, es gibt nur die Computer anderer Leute
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Modification Time of RBD Images

2017-03-23 Thread Dongsheng Yang

Did you enable the journaling feature?

On 03/23/2017 07:44 PM, Christoph Adomeit wrote:

Hi Yang,

I mean "any write" to this image.

I am sure we have a lot of not-used-anymore rbd images in our pool and I am 
trying to identify them.

The mtime would be a good hint to show which images might be unused.

Christoph

On Thu, Mar 23, 2017 at 07:32:49PM +0800, Dongsheng Yang wrote:

Hi Christoph,

On 03/23/2017 07:16 PM, Christoph Adomeit wrote:

Hello List,

i am wondering if there is meanwhile an easy method in ceph to find more 
information about rbd-images.

For example I am interested in the modification time of an rbd image.

Do you mean some metadata changing? such as resize?

Or any write to this image?

Thanx
Yang

I found some posts from 2015 that say we have to go over all the objects of an 
rbd image and find the newest mtime, but this is not a preferred solution for 
me. It takes too much time and too many system resources.

Any Ideas ?

Thanks
   Christoph


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Modification Time of RBD Images

2017-03-23 Thread Christoph Adomeit
Hi Yang,

I mean "any write" to this image.

I am sure we have a lot of not-used-anymore rbd images in our pool and I am 
trying to identify them.

The mtime would be a good hint to show which images might be unused.

Christoph

On Thu, Mar 23, 2017 at 07:32:49PM +0800, Dongsheng Yang wrote:
> Hi Christoph,
> 
> On 03/23/2017 07:16 PM, Christoph Adomeit wrote:
> >Hello List,
> >
> >i am wondering if there is meanwhile an easy method in ceph to find more 
> >information about rbd-images.
> >
> >For example I am interested in the modification time of an rbd image.
> 
> Do you mean some metadata changing? such as resize?
> 
> Or any write to this image?
> 
> Thanx
> Yang
> >
> >I found some posts from 2015 that say we have to go over all the objects of 
> >an rbd image and find the newest mtime put this is not a preferred solution 
> >for me. It takes to much time and too many system resources.
> >
> >Any Ideas ?
> >
> >Thanks
> >   Christoph
> >
> >
> >___
> >ceph-users mailing list
> >ceph-users@lists.ceph.com
> >http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> 

-- 
Es gibt keine  Cloud, es gibt nur die Computer anderer Leute
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Modification Time of RBD Images

2017-03-23 Thread Dongsheng Yang



On 03/23/2017 07:32 PM, Dongsheng Yang wrote:

Hi Christoph,

On 03/23/2017 07:16 PM, Christoph Adomeit wrote:

Hello List,

I am wondering if there is by now an easy method in Ceph to find
more information about RBD images.


For example, I am interested in the modification time of an RBD image.


Do you mean some metadata change, such as a resize?


If you mean a metadata change, I think this command would be enough:
$ rados -p rbd stat rbd_header.11e3238e1f29
rbd/rbd_header.11e3238e1f29 mtime 2017-03-23 19:31:52.00, size 0


Or any write to this image?


But if you want that one, I am afraid it is not so handy currently; maybe
going through all the data blocks of this image and sorting their mtimes
would be workable.
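For example, a rough (untested) sketch of that approach, reusing the image id from the header object above and the default rbd_data object naming:

$ rados -p rbd ls | grep '^rbd_data.11e3238e1f29.' | while read obj; do
      rados -p rbd stat "$obj"
  done | awk '{ print $3, $4 }' | sort | tail -n 1

This stats every data object of the image and prints the newest mtime, so it is exactly the slow, resource-hungry scan mentioned below.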


Thanx
Yang


I found some posts from 2015 that say we have to go over all the
objects of an RBD image and find the newest mtime, but this is not a
preferred solution for me. It takes too much time and too many system
resources.


Any Ideas ?

Thanks
   Christoph






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] The performance of ceph with RDMA

2017-03-23 Thread Haomai Wang
On Thu, Mar 23, 2017 at 5:49 AM, Hung-Wei Chiu (邱宏瑋) 
wrote:

> Hi,
>
> I use the latest code (master branch, updated 2017/03/22) to build Ceph with
> RDMA, and use fio to test its IOPS/latency/throughput.
>
> In my environment, I set up 3 hosts; the details of each host are listed below.
>
> OS: ubuntu 16.04
> Storage: SSD * 4 (256G * 4)
> Memory: 64GB.
> NICs: two NICs, one (intel 1G) for public network and the other (mellanox
> 10G) for private network.
>
> There are 3 monitors and 24 OSDs equally distributed across the 3 hosts, which
> means each host contains 1 MON and 8 OSDs.
>
> For my experiment, I use two configs, basic and RDMA.
>
> Basic
> [global]
>
> fsid = 0612cc7e-6239-456c-978b-b4df781fe831
> mon initial members = ceph-1,ceph-2,ceph-3
> mon host = 10.0.0.15,10.0.0.16,10.0.0.17
> osd pool default size = 2
> osd pool default pg num = 1024
> osd pool default pgp num = 1024
>
>
> RDMA
> [global]
>
> fsid = 0612cc7e-6239-456c-978b-b4df781fe831
> mon initial members = ceph-1,ceph-2,ceph-3
> mon host = 10.0.0.15,10.0.0.16,10.0.0.17
> osd pool default size = 2
> osd pool default pg num = 1024
> osd pool default pgp num = 1024
> ms_type=async+rdma
> ms_async_rdma_device_name = mlx4_0
>
>
> What surprised me is that the results in RDMA mode are almost the same as in
> the basic mode: the IOPS, latency, throughput, etc.
> I also tried different fio patterns, such as different read/write
> ratios and random vs. sequential operations.
> All results are the same.
>

Yes, most of the latency comes from other components now, although we still
want to avoid the extra copy on the RDMA side.

So the current RDMA backend really just means it is a usable alternative to the
TCP/IP network; further benefits need to come from the other components.


>
> In order to figure out what's going on, I did the following:
>
> 1. Follow this article (https://community.mellanox.com/docs/DOC-2086) to
> verify my RDMA environment.
> 2. To make sure the network traffic is transmitted by RDMA, I dumped the
> traffic within the private network, and the answer is yes, it uses RDMA.
> 3. Modify the ms_async_rdma_buffer_size to (256 << 10), no change.
> 4. Modify the ms_async_rdma_send_buffers to 2048, no change.
> 5. Modify the ms_async_rdma_receive_buffers to 2048, no change.
>
> After the above operations, I guess my Ceph setup environment may simply not
> be one where RDMA improves performance.
>
> Does anyone know what kind of Ceph environment (replica size, number of
> OSDs, number of MONs, etc.) is a good fit for RDMA?
>
> Thanks in advance.
>
>
>
> Best Regards,
>
> Hung-Wei Chiu(邱宏瑋)
> --
> Computer Center, Department of Computer Science
> National Chiao Tung University
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Modification Time of RBD Images

2017-03-23 Thread Dongsheng Yang

Hi Christoph,

On 03/23/2017 07:16 PM, Christoph Adomeit wrote:

Hello List,

I am wondering if there is by now an easy method in Ceph to find more
information about RBD images.

For example, I am interested in the modification time of an RBD image.


Do you mean some metadata change, such as a resize?

Or any write to this image?

Thanx
Yang


I found some posts from 2015 that say we have to go over all the objects of an
RBD image and find the newest mtime, but this is not a preferred solution for
me. It takes too much time and too many system resources.

Any Ideas ?

Thanks
   Christoph






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Modification Time of RBD Images

2017-03-23 Thread Christoph Adomeit

Hello List,

I am wondering if there is by now an easy method in Ceph to find more
information about RBD images.

For example, I am interested in the modification time of an RBD image.

I found some posts from 2015 that say we have to go over all the objects of an
RBD image and find the newest mtime, but this is not a preferred solution for
me. It takes too much time and too many system resources.

Any Ideas ?

Thanks
  Christoph


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Install issue

2017-03-23 Thread JB Data
Hi,

I followed this link
https://www.howtoforge.com/tutorial/how-to-install-a-ceph-cluster-on-ubuntu-16-04/
to install my Ubuntu cluster.

I am stuck at the "Install Ceph on All Nodes" step. This command crashed my VM
because of a lack of resources:

ceph-deploy install ceph-admin ceph-osd1 ceph-osd2 ceph-osd3 mon1

I upgraded my VM config, but now I can't redo the command.

...

[2017-03-23 05:08:19,898][ceph-admin][DEBUG ] Setting up ceph-common (10.2.6-1xenial) ...
[2017-03-23 05:08:19,915][ceph-admin][DEBUG ] Setting system user ceph properties..usermod: user ceph is currently used by process 1610
[2017-03-23 05:08:19,918][ceph-admin][DEBUG ] dpkg: error processing package ceph-common (--configure):
[2017-03-23 05:08:19,919][ceph-admin][WARNING] No apport report written because the error message indicates its a followup error from a previous failure
[2017-03-23 05:08:20,301][ceph-admin][DEBUG ] Errors were encountered while processing:
[2017-03-23 05:08:20,302][ceph-admin][DEBUG ]  ceph-common
[2017-03-23 05:08:20,302][ceph-admin][DEBUG ]  ceph-base
[2017-03-23 05:08:20,303][ceph-admin][DEBUG ]  ceph-mds
[2017-03-23 05:08:20,303][ceph-admin][DEBUG ]  ceph-mon
[2017-03-23 05:08:20,303][ceph-admin][DEBUG ]  ceph-osd
[2017-03-23 05:08:20,303][ceph-admin][DEBUG ]  radosgw
[2017-03-23 05:08:22,220][ceph-admin][WARNING] E: Sub-process /usr/bin/dpkg returned an error code (1)
[2017-03-23 05:08:22,220][ceph-admin][ERROR ] RuntimeError: command returned non-zero exit status: 100
[2017-03-23 05:08:22,221][ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get --assume-yes -q --no-install-recommends install -o Dpkg::Options::=--force-confnew ceph-osd ceph-mds ceph-mon radosgw
I tried many ways to clean up my install:

sudo apt autoremove
sudo apt-get install -f
sudo apt-get update
sudo apt-get upgrade
ceph-deploy uninstall ceph-admin ceph-osd1 ceph-osd2 ceph-osd3 ceph-mon
ceph-deploy purge ceph-admin ceph-osd1 ceph-osd2 ceph-osd3 ceph-mon
ceph-deploy purgedata ceph-admin ceph-osd1 ceph-osd2 ceph-osd3 ceph-mon
...
sudo dpkg --configure -a
sudo dpkg --purge --force-depends ceph-osd ceph-mon radosgw ceph-mds ceph-base ceph-common
...
sudo vi /var/lib/dpkg/status
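For comparison, a typical "start over" sequence with ceph-deploy looks roughly like this (a sketch only; run from the admin working directory, using the node names from the install command above, and note the usermod error in the log suggests a ceph process, here PID 1610, was still running on the node and should be stopped first, e.g. with "sudo systemctl stop ceph.target" on systemd hosts):

$ ceph-deploy purge ceph-admin ceph-osd1 ceph-osd2 ceph-osd3 mon1
$ ceph-deploy purgedata ceph-admin ceph-osd1 ceph-osd2 ceph-osd3 mon1
$ ceph-deploy forgetkeys
$ rm ceph.*          # removes the generated conf and keyrings in the working dir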
I removed the cluster and redid the install, but ceph-deploy always
fails.

Can someone help me solve my problem?

ThU









@JBD 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] The performance of ceph with RDMA

2017-03-23 Thread 邱宏瑋
Hi,

I use the latest code (master branch, updated 2017/03/22) to build Ceph with
RDMA, and use fio to test its IOPS/latency/throughput.

In my environment, I set up 3 hosts; the details of each host are listed below.

OS: ubuntu 16.04
Storage: SSD * 4 (256G * 4)
Memory: 64GB.
NICs: two NICs, one (intel 1G) for public network and the other (mellanox
10G) for private network.

There are 3 monitors and 24 OSDs equally distributed across the 3 hosts, which
means each host contains 1 MON and 8 OSDs.

For my experiment, I use two configs, basic and RDMA.

Basic
[global]

fsid = 0612cc7e-6239-456c-978b-b4df781fe831
mon initial members = ceph-1,ceph-2,ceph-3
mon host = 10.0.0.15,10.0.0.16,10.0.0.17
osd pool default size = 2
osd pool default pg num = 1024
osd pool default pgp num = 1024


RDMA
[global]

fsid = 0612cc7e-6239-456c-978b-b4df781fe831
mon initial members = ceph-1,ceph-2,ceph-3
mon host = 10.0.0.15,10.0.0.16,10.0.0.17
osd pool default size = 2
osd pool default pg num = 1024
osd pool default pgp num = 1024
ms_type=async+rdma
ms_async_rdma_device_name = mlx4_0


What surprised me is that the results in RDMA mode are almost the same as in the
basic mode: the IOPS, latency, throughput, etc.
I also tried different fio patterns, such as different read/write
ratios and random vs. sequential operations.
All results are the same.

In order to figure out what's going on, I did the following (a combined
ceph.conf sketch for steps 3-5 follows the list):

1. Follow this article (https://community.mellanox.com/docs/DOC-2086) to
verify my RDMA environment.
2. To make sure the network traffic is transmitted by RDMA, I dumped the
traffic within the private network, and the answer is yes, it uses RDMA.
3. Modify the ms_async_rdma_buffer_size to (256 << 10), no change.
4. Modify the ms_async_rdma_send_buffers to 2048, no change.
5. Modify the ms_async_rdma_receive_buffers to 2048, no change.
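For reference, the combined [global] settings for those runs would look roughly like this (a sketch only; the option names are the ones tried in steps 3-5 above, everything else left at its default):

[global]
ms_type = async+rdma
ms_async_rdma_device_name = mlx4_0
# 256 << 10
ms_async_rdma_buffer_size = 262144
ms_async_rdma_send_buffers = 2048
ms_async_rdma_receive_buffers = 2048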

After the above operations, I guess my Ceph setup environment may simply not be
one where RDMA improves performance.

Does anyone know what kind of Ceph environment (replica size, number of
OSDs, number of MONs, etc.) is a good fit for RDMA?

Thanks in advance.



Best Regards,

Hung-Wei Chiu(邱宏瑋)
--
Computer Center, Department of Computer Science
National Chiao Tung University
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] can a OSD affect performance from pool X when blocking/slow requests PGs from pool Y ?

2017-03-23 Thread Peter Maloney
I think Greg (who appears to be a ceph committer) basically said he was
interested in looking at it, if only you had the pool that failed this way.

Why not try to reproduce it, and make a log of your procedure so he can
reproduce it too? What caused the slow requests... copy on write from
snapshots? A bad disk? exclusive-lock with 2 clients writing at the same
time maybe?

I'd be interested in a solution too... for example, why can't idle disks
(a non-full disk queue) allow the OSD op queue (or whatever queue it is) to keep
filling with requests not related to the blocked PGs/objects? I would love
for Ceph to handle this better. I suspect some issues I have are related
to this (slow requests on one VM can freeze others [likely the OSD's fault],
even requiring kill -9 [likely the client librbd's fault]).

On 03/22/17 16:18, Alejandro Comisario wrote:
> any thoughts ?
>
> On Tue, Mar 14, 2017 at 10:22 PM, Alejandro Comisario
> > wrote:
>
> Greg, thanks for the reply.
> It is true that I can't provide enough information to know what happened,
> since the pool is gone.
>
> But based on your experience, can I please take some of your time
> and ask for your top 5 things that could happen / reasons for what
> happened to that pool (or any pool) that make Ceph
> (maybe specifically Hammer) behave like that?
>
> Information that I think will be of value: the cluster was
> 5 nodes large, running "0.94.6-1trusty". I added two nodes running
> the latest "0.94.9-1trusty", and replication onto those new disks
> never finished, since I saw WEIRD errors on the new OSDs. I
> thought the packages needed to be the same, so I "apt-get
> upgraded" the 5 old nodes without restarting anything, and
> rebalancing then proceeded without errors (WEIRD).
>
> After these two nodes reached 100% of the disks' weight, the
> cluster worked perfectly for about two weeks, until this happened.
> Since the resolution from my first email, everything has been
> working perfectly.
>
> thanks for the responses.
>  
>
> On Fri, Mar 10, 2017 at 4:23 PM, Gregory Farnum
> > wrote:
>
>
>
> On Tue, Mar 7, 2017 at 10:18 AM Alejandro Comisario
> > wrote:
>
> Gregory, thanks for the response; what you've said is by
> far the most enlightening thing I have heard about Ceph in a
> long time.
>
> What raises even greater doubt is that this
> "non-functional" pool was only 1.5GB large, vs. 50-150GB
> for the other affected pools; the tiny pool was still being
> used, and just because that pool was blocking requests,
> the whole cluster was unresponsive.
>
> So, what do you mean by a "non-functional" pool? How can a
> pool become non-functional? And what assures me that
> tomorrow (given that I just deleted the 1.5GB pool to fix the
> whole problem) another pool won't become non-functional?
>
>
> Well, you said there were a bunch of slow requests. That can
> happen any number of ways, if you're overloading the OSDs or
> something.
> When there are slow requests, those ops take up OSD memory and
> throttle, and so they don't let in new messages until the old
> ones are serviced. This can cascade across a cluster --
> because everything is interconnected, clients and OSDs end up
> with all their requests targeted at the slow OSDs which aren't
> letting in new IO quickly enough. It's one of the weaknesses
> of the standard deployment patterns, but it usually doesn't
> come up unless something else has gone pretty wrong first.
> As for what actually went wrong here, you haven't provided
> near enough information and probably can't now that the pool
> has been deleted. *shrug*
> -Greg
>
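(Not from the thread, but for anyone hitting this: a sketch of the usual commands for seeing which OSDs and ops are blocked; osd.12 is a hypothetical example, and the ceph daemon calls must be run on the host where that OSD lives.)

$ ceph health detail                       # lists "requests are blocked > 32 sec" per OSD
$ ceph osd perf                            # per-OSD commit/apply latency; slow disks stand out
$ ceph daemon osd.12 dump_ops_in_flight    # what osd.12 is currently stuck on
$ ceph daemon osd.12 dump_historic_ops     # recent slowest ops with per-stage timings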

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New metrics.ceph.com!

2017-03-23 Thread Wido den Hollander

> On 22 March 2017 at 18:05, Patrick McGarry wrote:
> 
> 
> Hey cephers,
> 
> Just wanted to share that the new interactive metrics dashboard is now
> available for tire-kicking.
> 
> https://metrics.ceph.com
> 

Very nice!

> There are still a few data pipeline issues and other misc cleanup that
> probably needs to happen. We have removed some of the repo tracking to
> be more streamlined, but the history is still there, so I promise it’s
> not quite as dire as it may seem at first.
> 
> The dashboard itself is still built by our friends over at Bitergia
> using their Grimoire Lab tool: http://grimoirelab.github.io which
> includes the ability to either click on the fields, or write your own
> queries against the data. Please feel free to play around with it and
> let me know if you have any questions. Thanks!

Can we add an AAAA record so that you can reach it over IPv6?
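(Once such a record exists, a quick check would be, e.g.:

$ dig AAAA metrics.ceph.com +short
$ curl -6 -I https://metrics.ceph.com

just a sketch, assuming dig and an IPv6-capable client are available.)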

Wido

> 
> 
> -- 
> 
> Best Regards,
> 
> Patrick McGarry
> Director Ceph Community || Red Hat
> http://ceph.com  ||  http://community.redhat.com
> @scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com