Re: [ceph-users] Ceph inside Docker containers inside VirtualBox

2019-04-21 Thread Varun Singh
On Fri, Apr 19, 2019 at 6:53 PM Varun Singh  wrote:
>
> On Fri, Apr 19, 2019 at 10:44 AM Varun Singh  wrote:
> >
> > On Thu, Apr 18, 2019 at 9:53 PM Siegfried Höllrigl
> >  wrote:
> > >
> > > Hi !
> > >
> > > I am not 100% sure, but I think --net=host does not propagate /dev/
> > > inside the container.
> > >
> > >  From the Error Message :
> > >
> > > 2019-04-18 07:30:06  /opt/ceph-container/bin/entrypoint.sh: ERROR- The
> > > device pointed by OSD_DEVICE (/dev/vdd) doesn't exist !
> > >
> > >
> > > I would say you should add something like --device=/dev/vdd to the
> > > docker run command for the OSD.
> > >
> > > Br
> > >
> > >
> > > On 18.04.2019 at 14:46, Varun Singh wrote:
> > > > Hi,
> > > > I am trying to set up Ceph through Docker inside a VM. My host machine
> > > > is a Mac. My VM is Ubuntu 18.04. Docker version is 18.09.5, build
> > > > e8ff056.
> > > > I am following the documentation present on ceph/daemon Docker Hub
> > > > page. The idea is, if I spawn docker containers as mentioned on the
> > > > page, I should get a ceph setup without KV store. I am not worried
> > > > about KV store as I just want to try it out. Following are the
> > > > commands I am firing to bring the containers up:
> > > >
> > > > Monitor:
> > > > docker run -d --net=host -v /etc/ceph:/etc/ceph -v
> > > > /var/lib/ceph/:/var/lib/ceph/ -e MON_IP=10.0.2.15 -e
> > > > CEPH_PUBLIC_NETWORK=10.0.2.0/24 ceph/daemon mon
> > > >
> > > > Manager:
> > > > docker run -d --net=host -v /etc/ceph:/etc/ceph -v
> > > > /var/lib/ceph/:/var/lib/ceph/ ceph/daemon mgr
> > > >
> > > > OSD:
> > > > docker run -d --net=host --pid=host --privileged=true -v
> > > > /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ -v /dev/:/dev/ -e
> > > > OSD_DEVICE=/dev/vdd ceph/daemon osd
> > > >
> > > >  From the above commands I am able to spawn monitor and manager
> > > > properly. I verified this by firing this command on both monitor and
> > > > manager containers:
> > > > sudo docker exec d1ab985 ceph -s
> > > >
> > > > I get following outputs for both:
> > > >
> > > >cluster:
> > > >  id: 14a6e40a-8e54-4851-a881-661a84b3441c
> > > >  health: HEALTH_OK
> > > >
> > > >services:
> > > >  mon: 1 daemons, quorum serverceph-VirtualBox (age 62m)
> > > >  mgr: serverceph-VirtualBox(active, since 56m)
> > > >  osd: 0 osds: 0 up, 0 in
> > > >
> > > >data:
> > > >  pools:   0 pools, 0 pgs
> > > >  objects: 0 objects, 0 B
> > > >  usage:   0 B used, 0 B / 0 B avail
> > > >  pgs:
> > > >
> > > > However, when I try to bring up the OSD using the above command, it
> > > > doesn't work. Docker logs show this output:
> > > > 2019-04-18 07:30:06  /opt/ceph-container/bin/entrypoint.sh: static:
> > > > does not generate config
> > > > 2019-04-18 07:30:06  /opt/ceph-container/bin/entrypoint.sh: ERROR- The
> > > > device pointed by OSD_DEVICE (/dev/vdd) doesn't exist !
> > > >
> > > > I am not sure why the doc asks to pass /dev/vdd to the OSD_DEVICE env var.
> > > > I know there are five different ways of spawning the OSD, but I am not
> > > > able to figure out which one would be suitable for a simple
> > > > deployment. If you could please let me know how to spawn OSDs using
> > > > Docker, it would help a lot.
> > > >
> > > >
> >
> > Thanks Br, I will try this out today.
> >
> > --
> > Regards,
> > Varun Singh
>
> Hi,
> So, following your suggestion, I tried the following two commands:
> 1. I added the --device=/dev/vdd switch without removing the OSD_DEVICE env
> var. This resulted in the same error as before:
> docker run -d --net=host --pid=host --privileged=true
> --device=/dev/vdd -v /etc/ceph:/etc/ceph -v
> /var/lib/ceph/:/var/lib/ceph/ -v /dev/:/dev/ -e OSD_DEVICE=/dev/vdd
> ceph/daemon osd
>
>
> 2. Then I removed the OSD_DEVICE env var and just added the --device=/dev/vdd switch:
> docker run -d --net=host --pid=host --privileged=true
> --device=/dev/vdd -v /etc/ceph:/etc/ceph -v
> /var/lib/ceph/:/var/lib/ceph/ -v /dev/:/dev/  ceph/daemon osd
>
> The OSD_DEVICE-related error went away and I think Ceph created an OSD
> successfully. But it wasn't able to connect to the cluster. Is it because
> I did not give any network-related information? I get the following
> error now:
>
> 2019-04-18 08:30:47  /opt/ceph-container/bin/entrypoint.sh: static:
> does not generate config
> 2019-04-18 08:30:47  /opt/ceph-container/bin/entrypoint.sh:
> Bootstrapped OSD(s) found; using OSD directory
> 2019-04-18 08:30:47  /opt/ceph-container/bin/entrypoint.sh: Creating osd
> 2019-04-18 08:30:52.944 7f897ca6d700 -1 auth: unable to find a keyring
> on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or
> directory
> 2019-04-18 08:30:52.944 7f897ca6d700 -1 AuthRegistry(0x7f8978063e78)
> no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring,
> disabling cephx
> 2019-04-18 08:30:52.964 7f897ca6d700 -1 auth: unable to find a keyring
> on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or
> directory
> 2019-04-18 08:30:52.964 7f897ca6d700 -1 
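>
> For reference, that "no keyring found" error usually means the bootstrap-osd
> key was never exported to the host path that is bind-mounted into the OSD
> container. A minimal sketch, assuming the mon container already holds the
> client.bootstrap-osd key and /var/lib/ceph/ is bind-mounted as in the
> commands above (the container name is a placeholder):
>
> sudo docker exec <mon-container> sh -c \
>   'mkdir -p /var/lib/ceph/bootstrap-osd && \
>    ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring'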

[ceph-users] Bluestore with so many small files

2019-04-21 Thread 刘 俊
Hi All,

I still see this issue with the latest Ceph Luminous releases, 12.2.11 and 12.2.12.

I have set bluestore_min_alloc_size = 4096 before the test.

When I write 10 small objects smaller than 64KB through RGW, the RAW USED
shown in "ceph df" looks incorrect.

For example, I tested three times and cleaned up the RGW data pool each time;
the object size for the first run was 4KB, for the second 32KB, and for the
third 64KB.

The RAW USED shown in "ceph df" is the same each time (18GB), which looks like
it always equals 64KB*10/1024*3 (the replication factor is 3 here).

Any thoughts?

Jamie

___

Hi Behnam,

On 2/12/2018 4:06 PM, Behnam Loghmani wrote:
> Hi there,
>
> I am using ceph Luminous 12.2.2 with:
>
> 3 osds (each osd is 100G) - no WAL/DB separation.
> 3 mons
> 1 rgw
> cluster size 3
>
> I stored lots of thumbnails with very small size on ceph with radosgw.
>
> Actual size of files is something about 32G but it filled 70G of each osd.
>
> what's the reason of this high disk usage?
Most probably the major reason is BlueStore allocation granularity: e.g.
an object of 1K bytes needs 64K of disk space if the default
bluestore_min_alloc_size_hdd (=64K) is applied.
An additional inconsistency in space reporting might also appear since
BlueStore adds the DB volume space when accounting for total store space,
while free space is taken from the block device only. As a result, the
reported "Used" space always contains that total DB space (i.e.
Used = Total(Block+DB) - Free(Block)). That correlates with other
comments in this thread about RocksDB space usage.
There is a pending PR to fix that:
https://github.com/ceph/ceph/pull/19454/commits/144fb9663778f833782bdcb16acd707c3ed62a86
You may look for "Bluestore: inaccurate disk usage statistics problem"
in this mail list for previous discussion as well.
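
As a rough worked example of the granularity effect (illustrative numbers,
not taken from the report above): 1,000,000 objects of 1K each, with the
default 64K allocation unit and 3x replication, consume about

echo "$(( 1000000 * 64 * 3 / 1024 / 1024 )) GiB"
# prints "183 GiB" of raw space, versus the ~3 GiB of logical data times 3.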

> should I change "bluestore_min_alloc_size_hdd"? and If I change it and
> set it to smaller size, does it impact on performance?
Unfortunately I haven't benchmarked "small writes over hdd" cases much,
hence I don't have an exact answer here. Indeed, the 'min_alloc_size'
family of parameters might impact performance quite significantly.
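
For reference, a minimal sketch of how that parameter would be set; to my
knowledge the min_alloc_size values are baked in when an OSD is created, so
changing ceph.conf only affects OSDs that are (re)deployed afterwards:

[osd]
bluestore_min_alloc_size_hdd = 4096
bluestore_min_alloc_size_ssd = 4096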
>
> what is the best practice for storing small files on bluestore?
>
> Best regards,
> Behnam Loghmani


>
> On Mon, Feb 12, 2018 at 5:06 PM, David Turner wrote:
>
> Some of your overhead is the Wal and rocksdb that are on the OSDs.
> The Wal is pretty static in size, but rocksdb grows with the amount
> of objects you have. You also have copies of the osdmap on each osd.
> There's just overhead that adds up. The biggest is going to be
> rocksdb with how many objects you have.
>
>
> On Mon, Feb 12, 2018, 8:06 AM Behnam Loghmani wrote:
>
> Hi there,
>
> I am using ceph Luminous 12.2.2 with:
>
> 3 osds (each osd is 100G) - no WAL/DB separation.
> 3 mons
> 1 rgw
> cluster size 3
>
> I stored lots of thumbnails with very small size on ceph with
> radosgw.
>
> Actual size of files is something about 32G but it filled 70G of
> each osd.
>
> what's the reason of this high disk usage?
> should I change "bluestore_min_alloc_size_hdd"? and If I change
> it and set it to smaller size, does it impact on performance?
>
> what is the best practice for storing small files on bluestore?
>
> Best regards,
> Behnam Loghmani



Re: [ceph-users] Unexpected IOPS Ceph Benchmark Result

2019-04-21 Thread Christian Balzer


Hello,

firstly, this has been discussed here in many incarnations, which is likely
the reason for the silence; a little research goes a long way.

For starters, do yourself a favor and monitor your Ceph nodes with atop,
or collect/graph everything at a fine resolution (5s intervals or better),
to get an idea of what is how busy.
This will also show you whether you're actually dealing with the correct
devices when choosing SSD or HDD pools, as well as caching effects; see
below.
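
For example (assuming atop and sysstat are installed on the OSD nodes):

atop 5         # system-wide view, 5-second samples
iostat -xk 5   # per-device utilization and latency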

Small IOPS stress the CPU side of things significantly, and this is where
I'd expect you to potentially hit limits.
The fact that 2 parallel tests don't improve things also suggests this.

Network-attached storage in general, and Ceph in particular, will suffer on
single-thread IOPS due to latency.
Locally attached storage is always going to be significantly faster; you
can't compare the two.

Use consistent settings for your fio runs, i.e. direct=1 and libaio for all.
1K IOPS for a single SAS HDD feels very high; you're likely looking at
caching in the OS (20GB < 32GB) and/or the controller.
The size of your test will _also_ fit into the combined caches of the
OSDs, explaining your HDD pool speeds as well, provided that pool was
correctly set up to begin with.
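
A minimal sketch of a consistent invocation (the target path is a
placeholder; size it well beyond the combined caches to avoid the effect
above):

fio --name=randwrite-4k --filename=/path/to/testfile --size=100G \
    --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 --direct=1 \
    --ioengine=libaio --runtime=60 --time_based --group_reporting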

You're saying "array" multiple times, this is not how Ceph works.
Reads come from the acting OSD, not in a RAID0 fashion from all 3 OSDs
that hold the respective object.

Speaking of objects, with 4k IOPS, you're writing to the same OSDs 1000
times, again no gain from distribution here.

This should get you hopefully on the right track.

Christian

On Sun, 21 Apr 2019 13:55:37 +0700 Muhammad Fakhri Abdillah wrote:

> Hey everyone,
> Currently running a 4-node Proxmox cluster with an external Ceph cluster
> (Ceph using CentOS 7). 4 Ceph OSD nodes are installed; each node has a
> specification like this:
> - 8 Core intel Xeon processor
> - 32GB RAM
> - 2 x 600GB HDD SAS for CentOS (RAID1 as a System)
> - 9 x 1200GB HDD SAS for Data (RAID0 each, bluestore), with 2 x 480GB SSD
> for block.db & block.wal
> - 3 x 960GB SSD for faster pool (RAID0 each, bluestore without separate
> block.db & block.wal)
> - 10Gb eth network
> 
> So, in total we have 36 HDD OSDs and 12 SSD OSDs.
> 
> And Here is our network topology :
> 
> https://imgur.com/eAHb18I
> 
> 
> On this cluster, I make 4 pools with 3x replication:
> 1. rbd-data (mounted at Proxmox to store VM block data; this pool I set
> on HDD OSDs)
> 2. rbd-os (mounted at Proxmox to store VM OS blocks for better
> performance; this pool I set on SSD OSDs)
> 3. cephfs-data (using the same devices and ruleset as rbd-data, mounted at
> Proxmox as cephfs-data)
> 4. cephfs-metadata
> 
> Here is our crushmap config (to make sure that we already separate SSD
> disks and HDD disks into different pools and rulesets):
> 
> # begin crush map
> .
> ...
> 
> # buckets
> host z1 {
> id -3   # do not change unnecessarily
> id -16 class hdd# do not change unnecessarily
> id -22 class ssd# do not change unnecessarily
> # weight 10.251
> alg straw2
> hash 0  # rjenkins1
> item osd.0 weight 1.139
> item osd.1 weight 1.139
> item osd.2 weight 1.139
> item osd.3 weight 1.139
> item osd.4 weight 1.139
> item osd.5 weight 1.139
> item osd.6 weight 1.139
> item osd.7 weight 1.139
> item osd.8 weight 1.139
> }
> host z2 {
> id -5   # do not change unnecessarily
> id -17 class hdd# do not change unnecessarily
> id -23 class ssd# do not change unnecessarily
> # weight 10.251
> alg straw2
> hash 0  # rjenkins1
> item osd.9 weight 1.139
> item osd.10 weight 1.139
> item osd.11 weight 1.139
> item osd.12 weight 1.139
> item osd.13 weight 1.139
> item osd.14 weight 1.139
> item osd.15 weight 1.139
> item osd.16 weight 1.139
> item osd.17 weight 1.139
> }
> host z3 {
> id -7   # do not change unnecessarily
> id -18 class hdd# do not change unnecessarily
> id -24 class ssd# do not change unnecessarily
> # weight 10.251
> alg straw2
> hash 0  # rjenkins1
> item osd.18 weight 1.139
> item osd.19 weight 1.139
> item osd.20 weight 1.139
> item osd.21 weight 1.139
> item osd.22 weight 1.139
> item osd.23 weight 1.139
> item osd.24 weight 1.139
> item osd.25 weight 1.139
> item osd.26 weight 1.139
> }
> host s1 {
> id -9   # do not change unnecessarily
> id -19 class hdd# do not change unnecessarily
> id -25 class ssd# do not change unnecessarily
> # weight 10.251
> alg straw2
> hash 0  # rjenkins1
> item osd.27 weight 1.139
> item osd.28 weight 

Re: [ceph-users] Is it possible to run a standalone Bluestore instance?

2019-04-21 Thread Brad Hubbard
Glad it worked.

On Mon, Apr 22, 2019 at 11:01 AM Can Zhang  wrote:
>
> Thanks for your detailed response.
>
> I freshly installed CentOS 7.6 and ran install-deps.sh and
> do_cmake.sh this time, and it works now. Maybe the problem was
> caused by a dirty environment.
>
>
> Best,
> Can Zhang
>
>
> On Fri, Apr 19, 2019 at 6:28 PM Brad Hubbard  wrote:
> >
> > OK. So this works for me with master commit
> > bdaac2d619d603f53a16c07f9d7bd47751137c4c on Centos 7.5.1804.
> >
> > I cloned the repo and ran './install-deps.sh' and './do_cmake.sh
> > -DWITH_FIO=ON' then 'make all'.
> >
> > # find ./lib  -iname '*.so*' | xargs nm -AD 2>&1 | grep
> > _ZTIN13PriorityCache8PriCacheE
> > ./lib/libfio_ceph_objectstore.so:018f72d0 V
> > _ZTIN13PriorityCache8PriCacheE
> >
> > # LD_LIBRARY_PATH=./lib ./bin/fio --enghelp=libfio_ceph_objectstore.so
> > conf: Path to a ceph configuration file
> > oi_attr_len : Set OI(aka '_') attribute to specified length
> > snapset_attr_len: Set 'snapset' attribute to specified length
> > _fastinfo_omap_len  : Set '_fastinfo' OMAP attribute to specified length
> > pglog_simulation: Enables PG Log simulation behavior
> > pglog_omap_len  : Set pglog omap entry to specified length
> > pglog_dup_omap_len  : Set duplicate pglog omap entry to specified length
> > single_pool_mode: Enables the mode when all jobs run against
> > the same pool
> > preallocate_files   : Enables/disables file preallocation (touch
> > and resize) on init
> >
> > So my result above matches your result on Ubuntu but not on CentOS. It
> > looks to me like we used to define it in libceph-common, but currently
> > it's defined in libfio_ceph_objectstore.so. For reasons that are
> > unclear you are seeing the old behaviour. Why this is and why it isn't
> > working as designed is not clear to me but I suspect if you clone the
> > repo again and build from scratch (maybe in a different directory if
> > you wish to keep debugging, see below) you should get a working
> > result. Could you try that as a test?
> >
> > If, on the other hand, you wish to keep debugging your current
> > environment I'd suggest looking at the output of the following command
> > as it may shed further light on the issue.
> >
> > # LD_DEBUG=all LD_LIBRARY_PATH=./lib ./bin/fio
> > --enghelp=libfio_ceph_objectstore.so
> >
> > 'LD_DEBUG=lib' may suffice but that's difficult to judge without
> > knowing what the problem is. I still suspect somehow you have
> > mis-matched libraries and, if that's the case, it's probably not worth
> > pursuing. If you can give me specific steps so I can reproduce this
> > from a freshly cloned tree I'd be happy to look further into it.
> >
> > Good luck.
> >
> > On Thu, Apr 18, 2019 at 7:00 PM Brad Hubbard  wrote:
> > >
> > > Let me try to reproduce this on centos 7.5 with master and I'll let
> > > you know how I go.
> > >
> > > On Thu, Apr 18, 2019 at 3:59 PM Can Zhang  wrote:
> > > >
> > > > Using the commands you provided, I actually find some differences:
> > > >
> > > > On my CentOS VM:
> > > > ```
> > > > # sudo find ./lib*  -iname '*.so*' | xargs nm -AD 2>&1 | grep
> > > > _ZTIN13PriorityCache8PriCacheE
> > > > ./libceph-common.so:0221cc08 V _ZTIN13PriorityCache8PriCacheE
> > > > ./libceph-common.so.0:0221cc08 V _ZTIN13PriorityCache8PriCacheE
> > > > ./libfio_ceph_objectstore.so: U 
> > > > _ZTIN13PriorityCache8PriCacheE
> > > > ```
> > > > ```
> > > > # ldd libfio_ceph_objectstore.so |grep common
> > > > libceph-common.so.0 => /root/ceph/build/lib/libceph-common.so.0
> > > > (0x7fd13f3e7000)
> > > > ```
> > > > On my Ubuntu VM:
> > > > ```
> > > > $ sudo find ./lib*  -iname '*.so*' | xargs nm -AD 2>&1 | grep
> > > > _ZTIN13PriorityCache8PriCacheE
> > > > ./libfio_ceph_objectstore.so:019d13e0 V 
> > > > _ZTIN13PriorityCache8PriCacheE
> > > > ```
> > > > ```
> > > > $ ldd libfio_ceph_objectstore.so |grep common
> > > > libceph-common.so.0 =>
> > > > /home/can/work/ceph/build/lib/libceph-common.so.0 (0x7f024a89e000)
> > > > ```
> > > >
> > > > Notice the "U" and "V" from nm results.
> > > >
> > > >
> > > >
> > > >
> > > > Best,
> > > > Can Zhang
> > > >
> > > > On Thu, Apr 18, 2019 at 9:36 AM Brad Hubbard  
> > > > wrote:
> > > > >
> > > > > Does it define _ZTIN13PriorityCache8PriCacheE ? If it does, and all is
> > > > > as you say, then it should not say that _ZTIN13PriorityCache8PriCacheE
> > > > > is undefined. Does ldd show that it is finding the libraries you think
> > > > > it is? Either it is finding a different version of that library
> > > > > somewhere else or the version you have may not define that symbol.
> > > > >
> > > > > On Thu, Apr 18, 2019 at 11:12 AM Can Zhang  wrote:
> > > > > >
> > > > > > It's already in LD_LIBRARY_PATH, under the same directory of
> > > > > > libfio_ceph_objectstore.so
> > > > > >
> > > > > >
> > > > > > $ ll lib/|grep libceph-common
> > > > > 

Re: [ceph-users] Is it possible to run a standalone Bluestore instance?

2019-04-21 Thread Can Zhang
Thanks for your detailed response.

I freshly installed CentOS 7.6 and ran install-deps.sh and
do_cmake.sh this time, and it works now. Maybe the problem was
caused by a dirty environment.


Best,
Can Zhang


On Fri, Apr 19, 2019 at 6:28 PM Brad Hubbard  wrote:
>
> OK. So this works for me with master commit
> bdaac2d619d603f53a16c07f9d7bd47751137c4c on Centos 7.5.1804.
>
> I cloned the repo and ran './install-deps.sh' and './do_cmake.sh
> -DWITH_FIO=ON' then 'make all'.
>
> # find ./lib  -iname '*.so*' | xargs nm -AD 2>&1 | grep
> _ZTIN13PriorityCache8PriCacheE
> ./lib/libfio_ceph_objectstore.so:018f72d0 V
> _ZTIN13PriorityCache8PriCacheE
>
> # LD_LIBRARY_PATH=./lib ./bin/fio --enghelp=libfio_ceph_objectstore.so
> conf: Path to a ceph configuration file
> oi_attr_len : Set OI(aka '_') attribute to specified length
> snapset_attr_len: Set 'snapset' attribute to specified length
> _fastinfo_omap_len  : Set '_fastinfo' OMAP attribute to specified length
> pglog_simulation: Enables PG Log simulation behavior
> pglog_omap_len  : Set pglog omap entry to specified length
> pglog_dup_omap_len  : Set duplicate pglog omap entry to specified length
> single_pool_mode: Enables the mode when all jobs run against
> the same pool
> preallocate_files   : Enables/disables file preallocation (touch
> and resize) on init
>
> So my result above matches your result on Ubuntu but not on CentOS. It
> looks to me like we used to define it in libceph-common, but currently
> it's defined in libfio_ceph_objectstore.so. For reasons that are
> unclear you are seeing the old behaviour. Why this is and why it isn't
> working as designed is not clear to me but I suspect if you clone the
> repo again and build from scratch (maybe in a different directory if
> you wish to keep debugging, see below) you should get a working
> result. Could you try that as a test?
>
> If, on the other hand, you wish to keep debugging your current
> environment I'd suggest looking at the output of the following command
> as it may shed further light on the issue.
>
> # LD_DEBUG=all LD_LIBRARY_PATH=./lib ./bin/fio
> --enghelp=libfio_ceph_objectstore.so
>
> 'LD_DEBUG=lib' may suffice but that's difficult to judge without
> knowing what the problem is. I still suspect somehow you have
> mis-matched libraries and, if that's the case, it's probably not worth
> pursuing. If you can give me specific steps so I can reproduce this
> from a freshly cloned tree I'd be happy to look further into it.
>
> Good luck.
>
> On Thu, Apr 18, 2019 at 7:00 PM Brad Hubbard  wrote:
> >
> > Let me try to reproduce this on centos 7.5 with master and I'll let
> > you know how I go.
> >
> > On Thu, Apr 18, 2019 at 3:59 PM Can Zhang  wrote:
> > >
> > > Using the commands you provided, I actually find some differences:
> > >
> > > On my CentOS VM:
> > > ```
> > > # sudo find ./lib*  -iname '*.so*' | xargs nm -AD 2>&1 | grep
> > > _ZTIN13PriorityCache8PriCacheE
> > > ./libceph-common.so:0221cc08 V _ZTIN13PriorityCache8PriCacheE
> > > ./libceph-common.so.0:0221cc08 V _ZTIN13PriorityCache8PriCacheE
> > > ./libfio_ceph_objectstore.so: U 
> > > _ZTIN13PriorityCache8PriCacheE
> > > ```
> > > ```
> > > # ldd libfio_ceph_objectstore.so |grep common
> > > libceph-common.so.0 => /root/ceph/build/lib/libceph-common.so.0
> > > (0x7fd13f3e7000)
> > > ```
> > > On my Ubuntu VM:
> > > ```
> > > $ sudo find ./lib*  -iname '*.so*' | xargs nm -AD 2>&1 | grep
> > > _ZTIN13PriorityCache8PriCacheE
> > > ./libfio_ceph_objectstore.so:019d13e0 V 
> > > _ZTIN13PriorityCache8PriCacheE
> > > ```
> > > ```
> > > $ ldd libfio_ceph_objectstore.so |grep common
> > > libceph-common.so.0 =>
> > > /home/can/work/ceph/build/lib/libceph-common.so.0 (0x7f024a89e000)
> > > ```
> > >
> > > Notice the "U" and "V" from nm results.
> > >
> > >
> > >
> > >
> > > Best,
> > > Can Zhang
> > >
> > > On Thu, Apr 18, 2019 at 9:36 AM Brad Hubbard  wrote:
> > > >
> > > > Does it define _ZTIN13PriorityCache8PriCacheE ? If it does, and all is
> > > > as you say, then it should not say that _ZTIN13PriorityCache8PriCacheE
> > > > is undefined. Does ldd show that it is finding the libraries you think
> > > > it is? Either it is finding a different version of that library
> > > > somewhere else or the version you have may not define that symbol.
> > > >
> > > > On Thu, Apr 18, 2019 at 11:12 AM Can Zhang  wrote:
> > > > >
> > > > > It's already in LD_LIBRARY_PATH, under the same directory of
> > > > > libfio_ceph_objectstore.so
> > > > >
> > > > >
> > > > > $ ll lib/|grep libceph-common
> > > > > lrwxrwxrwx. 1 root root19 Apr 17 11:15 libceph-common.so ->
> > > > > libceph-common.so.0
> > > > > -rwxr-xr-x. 1 root root 211853400 Apr 17 11:15 libceph-common.so.0
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Best,
> > > > > Can Zhang
> > > > >
> > > > > On Thu, Apr 18, 2019 at 7:00 AM Brad 

Re: [ceph-users] Are there any statistics available on how most production ceph clusters are being used?

2019-04-21 Thread Robin H. Johnson
On Sun, Apr 21, 2019 at 03:11:44PM +0200, Marc Roos wrote:
> Double thanks for the on-topic reply. The other two responses were
> making me doubt whether my Chinese (which I didn't study) is better than my
> English.
They were almost on topic, but not that useful. Please don't imply
language failings on this list. English may be the lingua franca, but it
is by far not the first language for most list members. Not being useful
to you didn't mean they weren't useful overall.

>  >> I am a bit curious on how production ceph clusters are being used. I 
> am 
>  >> reading here that the block storage is used a lot with openstack and 
> 
>  >> proxmox, and via iscsi with vmare. 
>  >Have you looked at the Ceph User Surveys/Census?
>  >https://ceph.com/ceph-blog/ceph-user-survey-2018-results/
>  >https://ceph.com/geen-categorie/results-from-the-ceph-census/
> 
> Sort of what I was looking for, so 42% use RGW, of which 74% S3.
> I guess this is mainly archive usage, mostly done by providers.
Not just archive, but also API-driven for web services, usually hidden
behind hostnames/CDNs. Image/video upload sites are a big part of this,
esp. things like Instagram clones in emerging markets.

>  >As the quantity of data by a single user increases, the odds that GUI
>  >tools are used for it decreases, as it's MUCH more likely to be driven
>  >by automation & tooling around the API.
> Hmm, interesting. I have more SOHO clients, and was thinking of
> getting them such a GUI client.
That's great, but orthogonal to the overall issue. Some of the cloud
providers DO offer setup docs for GUI clients as well, off the top of my
head I know Dreamhost & DigitalOcean's ones, because I contributed to
their docs:
https://help.dreamhost.com/hc/en-us/sections/11559232-DreamObjects-clients
https://www.digitalocean.com/docs/spaces/resources/

> I think if you take the perspective of some end user that associates S3
> with something like an Audi and nothing else, it is quite necessary
> to have a client that is easy and secure to use, where you preferably
> enter only two things: your access key and your secret.
There's a bare minimum of three things you'd need in a generic client:
- endpoint(s)
- access key
- secret

The endpoint could be partially pre-provisioned (think of an INI file you'd
give your clients that points them to your private Ceph RGW deployment). If
it's a deployment with multiple regions, endpoints & region specifics become
more important (e.g. AWS S3 has differing signature requirements in
different regions).
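
As an illustration (hypothetical endpoint, not from this thread), a
pre-provisioned ~/.s3cfg for s3cmd only needs those three items plus the
bucket-style hostname:

host_base = rgw.example.com
host_bucket = %(bucket)s.rgw.example.com
use_https = True
access_key = <ACCESS_KEY>
secret_key = <SECRET_KEY>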

> The advantage of having a more rgw specific gui client, is that you
> - do not have the default amazon 'advertisements' (think of storage 
> classes etc.)
> - less configuration options, everything ceph does not support we do not
>   need to configure. 
> - no ftp, no what ever else, just this s3
> - you do not have configuration options that ceph doesn't offer 
>   (eg. this life cycle, bucket access logging?)
- Storage Classes: supported
- Bucket Lifecycle: supported
- Bucket Access Logging: not quite supported, PR exists, some debate
  about better designs. https://github.com/ceph/ceph/pull/14841

>   I can imagine if you have quite a few clients, you could get quite 
> some questions to answer, about things not working.
> - you have better support for specific things like multi tenant account, 
> etc.
Tenancy in RGW is effectively parallel S3 scopes, with different
endpoints.
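
For example (hypothetical names), a tenant-scoped user is created with:

radosgw-admin user create --tenant=acme --uid=alice \
    --display-name="Alice (acme)"

and that user's buckets then live in the tenant's namespace rather than the
global one.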

> - for once the https urls are correctly advertised
What issue do you have with HTTPS URLs? The main gotcha that most people
hit is that S3's ssl hostname validation rule is NOT the same as the
general SSL hostname validation rule, and trips up browser access.
Specifically in a wildcard SSL cert, '*.myrgwendpoint.com', the general
rule is that '*' should only match one DNS fragment [e.g. no '.'], while
S3's validation says it can match one or more DNS fragments.
The AWS S3 docs are even horrible about this, with the text:
"To work around this, use HTTP or write your own certificate
verification logic."
https://github.com/awsdocs/amazon-s3-developer-guide/blame/f498926b68f4f1b11c7f708ac0fbd52ee2a0aa19/doc_source/BucketRestrictions.md#L35

> Whether one likes it or not ceph is afaik not fully s3 compatible
No, Ceph isn't fully AWS-S3 compatible, and I did specifically include in my
talk at Cephalocon last year that we should explicitly be returning 501
NotImplemented in more cases. AWS-S3 in itself is a moving target, and
some of the operations ARE best offloaded to something other than Ceph.

Even if Ceph/RGW does support a given set of operations, does the
deployment want to consider those operations supported? This thinking
led to the torrent ops being behind a configuration option in Ceph, and
other ops can be & are blocked by providers in the reverse proxy.

There ARE RGW-specific features that would be valuable to have in more
clients:
- RGW Admin operations [the list of them is much longer than the docs
  suggest]
- 

Re: [ceph-users] Are there any statistics available on how most production ceph clusters are being used?

2019-04-21 Thread Marc Roos


Double thanks for the on-topic reply. The other two responses were
making me doubt whether my Chinese (which I didn't study) is better than my
English.


 >> I am a bit curious on how production ceph clusters are being used. I 
am 
 >> reading here that the block storage is used a lot with openstack and 

 >> proxmox, and via iscsi with vmare. 
 >Have you looked at the Ceph User Surveys/Census?
 >https://ceph.com/ceph-blog/ceph-user-survey-2018-results/
 >https://ceph.com/geen-categorie/results-from-the-ceph-census/

Sort of what I was looking for, so 42% use RGW, of which 74% S3.
I guess this is mainly archive usage, mostly done by providers.

 >> But I since nobody here is interested in a better rgw client for end 

 >> users. I am wondering if the rgw is even being used like this, and 
what 
 >> most production environments look like. 
 >Your end-user client thread was specifically asking targeting GUI
 >clients on OSX & Windows. I feel that the GUI client usage of S3
 >protocol has a much higher visibility to data size ratio than
 >automation/tooling usage.
 >
 >As the quantity of data by a single user increases, the odds that GUI
 >tools are used for it decreases, as it's MUCH more likely to be driven
 >by automation & tooling around the API.

Hmm, interesting. I have more SOHO clients, and was thinking of
getting them such a GUI client.

 >My earliest Ceph production deployment was mostly RGW (~16TB raw), 
with
 >a little bit of RBD/iSCSI usage (~1TB of floating disk between VMs).
 >Very little of the RGW usage was GUI driven (there certainly was some,
 >because it made business sense to offer it rather than FTP sites; but 
it
 >tiny compared to the automation flows).
 >
 >My second production deployment I worked was Dreamhost's DreamObjects,
 >which was over 3PB then: and MOST of the usage was still not 
GUI-driven.
 >
 >I'm working at DigitalOcean's Spaces offering now; again, mostly 
non-GUI
 >access.
 >
 >For the second part of your original-query, I feel that any new 
clients
 >SHOULD not be RGW-specific; they should be able to work on a wide 
range
 >of services that expose the S3 API, and have a good test-suite around
 >that (s3-tests, but for testing the client implementation; even Boto 
is
 >not bug-free).
 >

I think if you take the perspective of some end user that associates S3
with something like an Audi and nothing else, it is quite necessary
to have a client that is easy and secure to use, where you preferably
enter only two things: your access key and your secret.

The advantage of having a more rgw specific gui client, is that you
- do not have the default amazon 'advertisements' (think of storage 
classes etc.)
- less configuration options, everything ceph does not support we do not
  need to configure. 
- no ftp, no what ever else, just this s3
- you do not have configuration options that ceph doesn't offer 
  (eg. this life cycle, bucket access logging?)
  I can imagine if you have quite a few clients, you could get quite
  some questions to answer about things not working.
- you have better support for specific things like multi tenant account, 
etc.
- for once the https urls are correctly advertised

Whether one likes it or not ceph is afaik not fully s3 compatible




 


[ceph-users] Osd update from 12.2.11 to 12.2.12

2019-04-21 Thread Marc Roos



Just updated Luminous and set the max_scrubs value back. Why do I get
OSDs reporting differently?
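
(Presumably the values were pushed with something like the following; the
"(not observed, change may require restart)" note is what injectargs prints
for options that have no runtime observer.)

ceph tell osd.* injectargs '--osd_max_scrubs 1'
# the value actually in effect can be checked on the OSD's host via the
# admin socket, e.g.:
ceph daemon osd.18 config get osd_max_scrubs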


I get these:
osd.18: osd_max_scrubs = '1' (not observed, change may require restart) 
osd_objectstore = 'bluestore' (not observed, change may require restart) 
rocksdb_separate_wal_dir = 'false' (not observed, change may require 
restart)
osd.19: osd_max_scrubs = '1' (not observed, change may require restart) 
osd_objectstore = 'bluestore' (not observed, change may require restart) 
rocksdb_separate_wal_dir = 'false' (not observed, change may require 
restart)
osd.20: osd_max_scrubs = '1' (not observed, change may require restart) 
osd_objectstore = 'bluestore' (not observed, change may require restart) 
rocksdb_separate_wal_dir = 'false' (not observed, change may require 
restart)
osd.21: osd_max_scrubs = '1' (not observed, change may require restart) 
osd_objectstore = 'bluestore' (not observed, change may require restart) 
rocksdb_separate_wal_dir = 'false' (not observed, change may require 
restart)
osd.22: osd_max_scrubs = '1' (not observed, change may require restart) 
osd_objectstore = 'bluestore' (not observed, change may require restart) 
rocksdb_separate_wal_dir = 'false' (not observed, change may require 
restart)


And I get OSDs reporting like this:
osd.23: osd_max_scrubs = '1' (not observed, change may require restart) 
rocksdb_separate_wal_dir = 'false' (not observed, change may require 
restart)
osd.24: osd_max_scrubs = '1' (not observed, change may require restart) 
rocksdb_separate_wal_dir = 'false' (not observed, change may require 
restart)
osd.25: osd_max_scrubs = '1' (not observed, change may require restart) 
rocksdb_separate_wal_dir = 'false' (not observed, change may require 
restart)
osd.26: osd_max_scrubs = '1' (not observed, change may require restart) 
rocksdb_separate_wal_dir = 'false' (not observed, change may require 
restart)
osd.27: osd_max_scrubs = '1' (not observed, change may require restart) 
rocksdb_separate_wal_dir = 'false' (not observed, change may require 
restart)
osd.28: osd_max_scrubs = '1' (not observed, change may require restart) 
rocksdb_separate_wal_dir = 'false' (not observed, change may require 
restart)









[ceph-users] Unexpected IOPS Ceph Benchmark Result

2019-04-21 Thread Muhammad Fakhri Abdillah
Hey everyone,
Currently running a 4-node Proxmox cluster with an external Ceph cluster
(Ceph using CentOS 7). 4 Ceph OSD nodes are installed; each node has a
specification like this:
- 8 Core intel Xeon processor
- 32GB RAM
- 2 x 600GB HDD SAS for CentOS (RAID1 as a System)
- 9 x 1200GB HDD SAS for Data (RAID0 each, bluestore), with 2 x 480GB SSD
for block.db & block.wal
- 3 x 960GB SSD for faster pool (RAID0 each, bluestore without separate
block.db & block.wal)
- 10Gb eth network

So, in total we have 36 HDD OSDs and 12 SSD OSDs.

And Here is our network topology :

https://imgur.com/eAHb18I


On this cluster, I make 4 pools with 3x replication:
1. rbd-data (mounted at Proxmox to store VM block data; this pool I set
on HDD OSDs)
2. rbd-os (mounted at Proxmox to store VM OS blocks for better
performance; this pool I set on SSD OSDs)
3. cephfs-data (using the same devices and ruleset as rbd-data, mounted at
Proxmox as cephfs-data)
4. cephfs-metadata

Here is our crushmap config (to make sure that we already separate SSD
disks and HDD disks into different pools and rulesets):

# begin crush map
.
...

# buckets
host z1 {
id -3   # do not change unnecessarily
id -16 class hdd# do not change unnecessarily
id -22 class ssd# do not change unnecessarily
# weight 10.251
alg straw2
hash 0  # rjenkins1
item osd.0 weight 1.139
item osd.1 weight 1.139
item osd.2 weight 1.139
item osd.3 weight 1.139
item osd.4 weight 1.139
item osd.5 weight 1.139
item osd.6 weight 1.139
item osd.7 weight 1.139
item osd.8 weight 1.139
}
host z2 {
id -5   # do not change unnecessarily
id -17 class hdd# do not change unnecessarily
id -23 class ssd# do not change unnecessarily
# weight 10.251
alg straw2
hash 0  # rjenkins1
item osd.9 weight 1.139
item osd.10 weight 1.139
item osd.11 weight 1.139
item osd.12 weight 1.139
item osd.13 weight 1.139
item osd.14 weight 1.139
item osd.15 weight 1.139
item osd.16 weight 1.139
item osd.17 weight 1.139
}
host z3 {
id -7   # do not change unnecessarily
id -18 class hdd# do not change unnecessarily
id -24 class ssd# do not change unnecessarily
# weight 10.251
alg straw2
hash 0  # rjenkins1
item osd.18 weight 1.139
item osd.19 weight 1.139
item osd.20 weight 1.139
item osd.21 weight 1.139
item osd.22 weight 1.139
item osd.23 weight 1.139
item osd.24 weight 1.139
item osd.25 weight 1.139
item osd.26 weight 1.139
}
host s1 {
id -9   # do not change unnecessarily
id -19 class hdd# do not change unnecessarily
id -25 class ssd# do not change unnecessarily
# weight 10.251
alg straw2
hash 0  # rjenkins1
item osd.27 weight 1.139
item osd.28 weight 1.139
item osd.29 weight 1.139
item osd.30 weight 1.139
item osd.31 weight 1.139
item osd.32 weight 1.139
item osd.33 weight 1.139
item osd.34 weight 1.139
item osd.35 weight 1.139
}
root sas {
id -1   # do not change unnecessarily
id -21 class hdd# do not change unnecessarily
id -26 class ssd# do not change unnecessarily
# weight 51.496
alg straw2
hash 0  # rjenkins1
item z1 weight 12.874
item z2 weight 12.874
item z3 weight 12.874
item s1 weight 12.874
}
host z1-ssd {
id -101 # do not change unnecessarily
id -2 class hdd # do not change unnecessarily
id -11 class ssd# do not change unnecessarily
# weight 2.619
alg straw2
hash 0  # rjenkins1
item osd.36 weight 0.873
item osd.37 weight 0.873
item osd.38 weight 0.873
}
host z2-ssd {
id -104 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
id -12 class ssd# do not change unnecessarily
# weight 2.619
alg straw2
hash 0  # rjenkins1
item osd.39 weight 0.873
item osd.40 weight 0.873
item osd.41 weight 0.873
}
host z3-ssd {
id -107 # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
id -13 class ssd# do not change unnecessarily
# weight 2.619
alg straw2
hash 0  # rjenkins1
item osd.42 weight 0.873
item osd.43 weight 0.873
item osd.44 weight 0.873
}
host s1-ssd {
id -110 # do not change unnecessarily
id -8 class hdd