[ceph-users] rgw/civetweb log verbosity level

2018-11-27 Thread zyn赵亚楠
Hi there,

I have a question about rgw/civetweb log settings.

Currently, rgw/civetweb prints three log lines at log level 1 (high priority) 
for each HTTP request, like the following:

$ tail /var/log/ceph/ceph-client.rgw.node-1.log
2018-11-28 11:52:45.339229 7fbf2d693700  1 == starting new request 
req=0x7fbf2d68d190 =
2018-11-28 11:52:45.341961 7fbf2d693700  1 == req done req=0x7fbf2d68d190 
op status=0 http_status=200 ==
2018-11-28 11:52:45.341993 7fbf2d693700  1 civetweb: 0x558f0433: 127.0.0.1 
- - [28/Nov/2018:11:48:10 +0800] "HEAD 
/swift/v1/images.xxx.com/8801234/BFAB307D-F5FE-4BC6-9449-E854944A460F_160_180.jpg
 HTTP/1.1" 1 0 - goswift/1.0

These three lines occupy roughly 0.5 KB on average, varying a little 
with the lengths of bucket names and object names.

Now the problem is that when request traffic is heavy, this consumes a huge amount 
of space. For example, 4 million requests (on a single RGW node) result in about 
2 GB of logs, which takes only ~6 hours to accumulate on one of our cluster nodes 
during busy periods (a large share of the requests may be HEADs).

When troubleshooting, I usually need to raise the log level to 5, 10, or even 
higher to check the detailed logs, but most of the log space is occupied by the 
above access logs (level 1), which do not provide much information.

My question is: is there a way to configure Ceph to skip those logs, e.g. to only 
print logs whose verbosity falls within a specified range (not supported, 
according to my investigation)?
Or are there any suggested ways for turning on more logs for debugging?
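
(For reference, one hedged way to tame this - assuming Luminous-era option names, with the port and log path as placeholders - is to lower the rgw/civetweb debug subsystems and/or divert the civetweb access log into its own file via the frontend options in ceph.conf:)

  [client.rgw.node-1]
      debug rgw = 0/5
      debug civetweb = 0/0
      rgw frontends = civetweb port=7480 access_log_file=/var/log/ceph/civetweb.access.log

(A separate access_log_file can then be rotated or discarded independently of the debug log; whether these levels fully silence the three per-request lines above would need to be verified on your version.)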

Best Regards
Arthur Chiao



Re: [ceph-users] OSD won't start after moving to a new node with ceph 12.2.10

2018-11-27 Thread Paul Emmerich
This is *probably* unrelated to the upgrade, as it's complaining about data
corruption at a very early stage (earlier than the point where the bug related
to the 12.2.9 issues would trigger).
So this might just be a coincidence with a bad disk.

That being said: you are running a 12.2.9 OSD, and you probably should
not upgrade to 12.2.10, especially while a backfill is running.

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Tue, 27 Nov 2018 at 23:04, Cassiano Pilipavicius wrote:
>
> Hi, I am facing a problem where an OSD won't start after moving to a new
> node with 12.2.10 (the old one had 12.2.8).
>
> One node of my cluster failed and I tried to move its 3 OSDs to a new
> node. 2 of the 3 OSDs started and are running fine at the moment
> (backfilling is still in progress), but one of the OSDs just doesn't start,
> with the following error in the logs (I'm writing mostly to find out whether
> this is a bug or whether I have done something wrong):
>
> 2018-11-27 19:44:38.013454 7fba0d35fd80 -1
> bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000
> checksum at blob offset 0x0, got 0xb1a184d1, expected 0xb682fc52, device
> location [0x1~1000], logical extent 0x0~1000, object
> #-1:7b3f43c4:::osd_superblock:0#
> 2018-11-27 19:44:38.013501 7fba0d35fd80 -1 osd.1 0 OSD::init() : unable
> to read osd superblock
> 2018-11-27 19:44:38.013511 7fba0d35fd80  1
> bluestore(/var/lib/ceph/osd/ceph-1) umount
> 2018-11-27 19:44:38.065478 7fba0d35fd80  1 stupidalloc 0x0x55ebb04c3f80
> shutdown
> 2018-11-27 19:44:38.077261 7fba0d35fd80  1 freelist shutdown
> 2018-11-27 19:44:38.077316 7fba0d35fd80  4 rocksdb:
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.10/rpm/el7/BUILD/ceph-12.2.10/src/rocksdb/db/db_impl.cc:217]
> Shutdown: canceling all background work
> 2018-11-27 19:44:38.077982 7fba0d35fd80  4 rocksdb:
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.10/rpm/el7/BUILD/ceph-12.2.10/src/rocksdb/db/db_impl.cc:343]
> Shutdown complete
> 2018-11-27 19:44:38.107923 7fba0d35fd80  1 bluefs umount
> 2018-11-27 19:44:38.108248 7fba0d35fd80  1 stupidalloc 0x0x55ebb01cddc0
> shutdown
> 2018-11-27 19:44:38.108302 7fba0d35fd80  1 bdev(0x55ebb01cf800
> /var/lib/ceph/osd/ceph-1/block) close
> 2018-11-27 19:44:38.362984 7fba0d35fd80  1 bdev(0x55ebb01cf600
> /var/lib/ceph/osd/ceph-1/block) close
> 2018-11-27 19:44:38.470791 7fba0d35fd80 -1  ** ERROR: osd init failed:
> (22) Invalid argument
>
> My cluster has too many mixed versions; I hadn't realized that the
> version changes when running a yum update, and right now I have the
> following situation (ceph versions):
> {
>  "mon": {
>  "ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5)
> luminous (stable)": 1,
>  "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0)
> luminous (stable)": 2
>  },
>  "mgr": {
>  "ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5)
> luminous (stable)": 1
>  },
>  "osd": {
>  "ceph version 12.2.10
> (177915764b752804194937482a39e95e0ca3de94) luminous (stable)": 2,
>  "ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5)
> luminous (stable)": 18,
>  "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0)
> luminous (stable)": 27,
>  "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217)
> luminous (stable)": 1
>  },
>  "mds": {},
>  "overall": {
>  "ceph version 12.2.10
> (177915764b752804194937482a39e95e0ca3de94) luminous (stable)": 2,
>  "ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5)
> luminous (stable)": 20,
>  "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0)
> luminous (stable)": 29,
>  "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217)
> luminous (stable)": 1
>  }
> }
>
> Is there an easy way to get the OSD working again? I am thinking about
> waiting for the backfill/recovery to finish, then upgrading all nodes to
> 12.2.10 and, if the OSD doesn't come up, recreating it.
>
> Regards,
> Cassiano Pilipavicius.
>


[ceph-users] OSD won't start after moving to a new node with ceph 12.2.10

2018-11-27 Thread Cassiano Pilipavicius
Hi, I am facing a problem where an OSD won't start after moving to a new 
node with 12.2.10 (the old one had 12.2.8).


One node of my cluster failed and I tried to move its 3 OSDs to a new 
node. 2 of the 3 OSDs started and are running fine at the moment 
(backfilling is still in progress), but one of the OSDs just doesn't start, 
with the following error in the logs (I'm writing mostly to find out whether 
this is a bug or whether I have done something wrong):


2018-11-27 19:44:38.013454 7fba0d35fd80 -1 
bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 
checksum at blob offset 0x0, got 0xb1a184d1, expected 0xb682fc52, device 
location [0x1~1000], logical extent 0x0~1000, object 
#-1:7b3f43c4:::osd_superblock:0#
2018-11-27 19:44:38.013501 7fba0d35fd80 -1 osd.1 0 OSD::init() : unable 
to read osd superblock
2018-11-27 19:44:38.013511 7fba0d35fd80  1 
bluestore(/var/lib/ceph/osd/ceph-1) umount
2018-11-27 19:44:38.065478 7fba0d35fd80  1 stupidalloc 0x0x55ebb04c3f80 
shutdown

2018-11-27 19:44:38.077261 7fba0d35fd80  1 freelist shutdown
2018-11-27 19:44:38.077316 7fba0d35fd80  4 rocksdb: 
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.10/rpm/el7/BUILD/ceph-12.2.10/src/rocksdb/db/db_impl.cc:217] 
Shutdown: canceling all background work
2018-11-27 19:44:38.077982 7fba0d35fd80  4 rocksdb: 
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.10/rpm/el7/BUILD/ceph-12.2.10/src/rocksdb/db/db_impl.cc:343] 
Shutdown complete

2018-11-27 19:44:38.107923 7fba0d35fd80  1 bluefs umount
2018-11-27 19:44:38.108248 7fba0d35fd80  1 stupidalloc 0x0x55ebb01cddc0 
shutdown
2018-11-27 19:44:38.108302 7fba0d35fd80  1 bdev(0x55ebb01cf800 
/var/lib/ceph/osd/ceph-1/block) close
2018-11-27 19:44:38.362984 7fba0d35fd80  1 bdev(0x55ebb01cf600 
/var/lib/ceph/osd/ceph-1/block) close
2018-11-27 19:44:38.470791 7fba0d35fd80 -1  ** ERROR: osd init failed: 
(22) Invalid argument


My cluster has too many mixed versions; I hadn't realized that the 
version changes when running a yum update, and right now I have the 
following situation (ceph versions):

{
    "mon": {
    "ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) 
luminous (stable)": 1,
    "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) 
luminous (stable)": 2

    },
    "mgr": {
    "ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) 
luminous (stable)": 1

    },
    "osd": {
    "ceph version 12.2.10 
(177915764b752804194937482a39e95e0ca3de94) luminous (stable)": 2,
    "ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) 
luminous (stable)": 18,
    "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) 
luminous (stable)": 27,
    "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) 
luminous (stable)": 1

    },
    "mds": {},
    "overall": {
    "ceph version 12.2.10 
(177915764b752804194937482a39e95e0ca3de94) luminous (stable)": 2,
    "ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) 
luminous (stable)": 20,
    "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) 
luminous (stable)": 29,
    "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) 
luminous (stable)": 1

    }
}

Is there an easy way to get the OSD working again? I am thinking about 
waiting for the backfill/recovery to finish, then upgrading all nodes to 
12.2.10 and, if the OSD doesn't come up, recreating it.
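
(For what it's worth, a possible way to confirm the corruption before recreating - a sketch only, using the OSD path from the log above - is to run an offline bluestore fsck and, if it is indeed unrecoverable, purge and recreate the OSD once backfill has finished:)

# ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-1
# ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-1   # only if fsck suggests it is repairable
# ceph osd purge 1 --yes-i-really-mean-it                      # then redeploy osd.1 with ceph-volume/ceph-disk as usual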


Regards,
Cassiano Pilipavicius.



Re: [ceph-users] Luminous v12.2.10 released

2018-11-27 Thread Josh Durgin

On 11/27/18 12:11 PM, Josh Durgin wrote:

13.2.3 will have a similar revert, so if you are running anything other
than 12.2.9 or 13.2.2 you can go directly to 13.2.3.


Correction: I misremembered here, we're not reverting these patches for
13.2.3, so 12.2.9 users can upgrade to 13.2.2 or later, but other
luminous users should avoid 13.2.2 or later for the time being, unless
they can accept some downtime during the upgrade.

See http://tracker.ceph.com/issues/36686#note-6 for more detail.

Josh


Re: [ceph-users] Luminous v12.2.10 released

2018-11-27 Thread Josh Durgin

On 11/27/18 12:00 PM, Robert Sander wrote:

On 27.11.18 at 15:50, Abhishek Lekshmanan wrote:


   As mentioned above if you've successfully upgraded to v12.2.9 DO NOT
   upgrade to v12.2.10 until the linked tracker issue has been fixed.


What about clusters currently running 12.2.9 (because that was the
version in the repos when they were installed / last upgraded) where new
nodes are scheduled to be set up?
Can the new nodes be installed with 12.2.10 and run alongside the other
12.2.9 nodes?
Should the new nodes be pinned to 12.2.9?


To be safe, pin them to 12.2.9 until we have a safe upgrade path in a
future luminous release. Alternatively, you can restart them all at once on
12.2.10 if you don't mind a short loss of availability.
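
(As a hedged example of pinning - package names and tooling depend on your distro, so treat these as placeholders: on RPM-based nodes the versionlock plugin locks whatever is currently installed, and on Debian/Ubuntu apt-mark hold does the same:)

# yum install yum-plugin-versionlock && yum versionlock add 'ceph*'
# apt-mark hold ceph ceph-osd ceph-mon ceph-mgr ceph-common radosgw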

Josh


Re: [ceph-users] Luminous v12.2.10 released

2018-11-27 Thread Josh Durgin

On 11/27/18 9:40 AM, Graham Allan wrote:



On 11/27/2018 08:50 AM, Abhishek Lekshmanan wrote:


We're happy to announce the tenth bug fix release of the Luminous
v12.2.x long term stable release series. The previous release, v12.2.9,
introduced the PG hard-limit patches which were found to cause an issue
in certain upgrade scenarios, and this release was expedited to revert
those patches. If you already successfully upgraded to v12.2.9, you
should **not** upgrade to v12.2.10, but rather **wait** for a release in
which http://tracker.ceph.com/issues/36686 is addressed. All other users
are encouraged to upgrade to this release.


I wonder if you can comment on the upgrade policy for a mixed cluster - e.g. 
where the majority is running 12.2.8 but a handful of newly-added OSD 
nodes were installed with 12.2.9. Should the 12.2.8 nodes be upgraded to 
12.2.10 (this does sound like it should have no negative effects) and 
just the 12.2.9 nodes kept waiting for a future release - or should all of them wait?


I'd suggest upgrading everything to 12.2.10. If you aren't hitting
crashes already with this mixed 12.2.9 + 12.2.8 cluster, a further
upgrade shouldn't cause any issues.

Josh


Re: [ceph-users] Luminous v12.2.10 released

2018-11-27 Thread Josh Durgin

On 11/27/18 8:26 AM, Simon Ironside wrote:

On 27/11/2018 14:50, Abhishek Lekshmanan wrote:


We're happy to announce the tenth bug fix release of the Luminous
v12.2.x long term stable release series. The previous release, v12.2.9,
introduced the PG hard-limit patches which were found to cause an issue
in certain upgrade scenarios, and this release was expedited to revert
those patches. If you already successfully upgraded to v12.2.9, you
should **not** upgrade to v12.2.10, but rather **wait** for a release in
which http://tracker.ceph.com/issues/36686 is addressed. All other users
are encouraged to upgrade to this release.


Is it safe for v12.2.9 users to upgrade to v13.2.2 Mimic?

http://tracker.ceph.com/issues/36686 suggests a similar revert might be 
on the cards for v13.2.3, so I'm not sure.


Yes, 13.2.2 has the same pg hard limit code as 12.2.9, so that upgrade
is safe. The only danger is running a mixed-version cluster where some
of the osds have the pg hard limit code, and others do not.

13.2.3 will have a similar revert, so if you are running anything other
than 12.2.9 or 13.2.2 you can go directly to 13.2.3.

Josh


Re: [ceph-users] Luminous v12.2.10 released

2018-11-27 Thread Robert Sander
On 27.11.18 at 15:50, Abhishek Lekshmanan wrote:

>   As mentioned above if you've successfully upgraded to v12.2.9 DO NOT
>   upgrade to v12.2.10 until the linked tracker issue has been fixed.

What about clusters currently running 12.2.9 (because that was the
version in the repos when they were installed / last upgraded) where new
nodes are scheduled to be set up?
Can the new nodes be installed with 12.2.10 and run alongside the other
12.2.9 nodes?
Should the new nodes be pinned to 12.2.9?

Regards
-- 
Robert Sander
Heinlein Support GmbH
Linux: Akademie - Support - Hosting
http://www.heinlein-support.de

Tel: 030-405051-43
Fax: 030-405051-19

Mandatory disclosures per §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Managing Director: Peer Heinlein -- Registered office: Berlin





[ceph-users] RGW Swift metadata dropped when S3 bucket versioning enabled

2018-11-27 Thread Maxime Guyot
Hi,

I'm running into an issue with the RadosGW Swift API when the S3 bucket
versioning is enabled. It looks like it silently drops any metadata sent
with the "X-Object-Meta-foo" header (see example below).
This is observed on a Luminous 12.2.8 cluster. Is that a normal thing? Am I
misconfiguring something here?


With S3 bucket versioning OFF:
$ openstack object set --property foo=bar test test.dat
$ os object show test test.dat
++--+
| Field  | Value|
++--+
| account| v1   |
| container  | test |
| content-length | 507904   |
| content-type   | binary/octet-stream  |
| etag   | 03e8a398f343ade4e1e1d7c81a66e400 |
| last-modified  | Tue, 27 Nov 2018 13:53:54 GMT|
| object | test.dat |
| properties | Foo='bar'|  <= Metadata is here
++--+

With S3 bucket versioning ON:
$ openstack object set --property foo=bar test test2.dat
$ openstack object show test test2.dat
++--+
| Field  | Value|
++--+
| account| v1   |
| container  | test |
| content-length | 507904   |
| content-type   | binary/octet-stream  |
| etag   | 03e8a398f343ade4e1e1d7c81a66e400 |
| last-modified  | Tue, 27 Nov 2018 13:56:50 GMT|
| object | test2.dat| <= Metadata is absent
++--+
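
(To rule out the OpenStack client, it may be worth exercising the Swift API directly; a sketch with curl, where the endpoint and $TOKEN are placeholders and the container/object names are the ones above:)

$ curl -i -X POST -H "X-Auth-Token: $TOKEN" \
       -H "X-Object-Meta-Foo: bar" \
       http://rgw.example.com/swift/v1/test/test2.dat
$ curl -I -H "X-Auth-Token: $TOKEN" \
       http://rgw.example.com/swift/v1/test/test2.dat   # look for an X-Object-Meta-Foo header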

Cheers,

/ Maxime


Re: [ceph-users] Poor ceph cluster performance

2018-11-27 Thread Paul Emmerich
And this exact problem was one of the reasons why we migrated
everything to PXE boot where the OS runs from RAM.
That kind of failure is just the worst to debug...
Also, 1 GB of RAM is cheaper than a separate OS disk.

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Tue, 27 Nov 2018 at 19:22, Cody wrote:
>
> Hi everyone,
>
> Many, many thanks to all of you!
>
> The root cause was a failed OS drive on one storage node. The
> server was responsive to ping, but we were unable to log in. After a reboot via
> IPMI, the docker daemon failed to start due to I/O errors and dmesg
> complained about the failing OS disk. I failed to catch the problem
> initially since 'ceph -s' kept showing HEALTH_OK and the cluster was
> "functional" despite the slow performance.
>
> I really appreciate all the tips and advice received from you all and
> have learned a lot. I will carry your advice (e.g. using BlueStore,
> enterprise SSDs/HDDs, separating public and cluster traffic, etc.) into
> my next round of PoC.
>
> Thank you very much!
>
> Best regards,
> Cody
>
> On Tue, Nov 27, 2018 at 6:31 AM Vitaliy Filippov  wrote:
> >
> > > CPU: 2 x E5-2603 @1.8GHz
> > > RAM: 16GB
> > > Network: 1G port shared for Ceph public and cluster traffics
> > > Journaling device: 1 x 120GB SSD (SATA3, consumer grade)
> > > OSD device: 2 x 2TB 7200rpm spindle (SATA3, consumer grade)
> >
> > 0.84 MB/s sequential write is impossibly bad; it's not normal with any
> > kind of device, even with a 1G network. You probably have some kind of
> > problem in your setup - maybe the network RTT is very high, maybe the OSD or
> > MON nodes are shared with other running tasks and overloaded, or maybe your
> > disks are already dead... :))
> >
> > > As I moved on to test block devices, I got a following error message:
> > >
> > > # rbd map image01 --pool testbench --name client.admin
> >
> > You don't need to map it to run benchmarks, use `fio --ioengine=rbd`
> > (however you'll still need /etc/ceph/ceph.client.admin.keyring)
> >
> > --
> > With best regards,
> >Vitaliy Filippov


Re: [ceph-users] RGW performance with lots of objects

2018-11-27 Thread Mark Nelson

Hi Robert,


"Solved" is probably a strong word. I'd say that things have improved. 
Bluestore in general tends to handle large numbers of objects better 
than filestore does, for several reasons, including that it doesn't suffer 
from PG directory splitting (though RocksDB compaction can become a 
bottleneck with very large DBs and heavy metadata traffic). Bluestore 
also has less overhead for OMAP operations, and so far we've generally 
seen higher OMAP performance (i.e. how bucket indexes are currently 
stored). The bucket index sharding of course helps too.

One counter-argument is that bluestore uses the key-value DB a lot more 
aggressively than filestore does, and that could have an impact on bucket 
indexes hosted on the same OSDs as user objects. This gets sort of 
complicated, though, and may primarily be an issue if all of your OSDs are 
backed by NVMe and sustaining very high write traffic.

Ultimately I suspect that if you ran the same 500+ million object 
single-bucket test, a modern bluestore deployment would probably be faster 
than what you saw pre-Luminous with filestore. Whether or not it's 
acceptable is a different question. For example, I've noticed in past tests 
that delete performance improved dramatically when objects were spread 
across a higher number of buckets. Probably the best course of action will be to 
run tests and diagnose the behavior to see if it's going to meet your needs.
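
(For what it's worth, a hedged sketch of inspecting and resharding a large bucket index on Luminous - the bucket name and shard count below are placeholders:)

$ radosgw-admin bucket limit check                       # shows objects per shard and fill status
$ radosgw-admin bucket stats --bucket=mybucket
$ radosgw-admin reshard add --bucket=mybucket --num-shards=128
$ radosgw-admin reshard process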



Thanks,

Mark


On 11/27/18 12:10 PM, Robert Stanford wrote:


In the old days when I first installed Ceph with RGW the performance 
would be very slow after storing 500+ million objects in my buckets. 
With Luminous and index sharding is this still a problem or is this an 
old problem that has been solved?


Regards
R



Re: [ceph-users] Poor ceph cluster performance

2018-11-27 Thread Cody
Hi everyone,

Many, many thanks to all of you!

The root cause was a failed OS drive on one storage node. The
server was responsive to ping, but we were unable to log in. After a reboot via
IPMI, the docker daemon failed to start due to I/O errors and dmesg
complained about the failing OS disk. I failed to catch the problem
initially since 'ceph -s' kept showing HEALTH_OK and the cluster was
"functional" despite the slow performance.

I really appreciate all the tips and advice received from you all and
have learned a lot. I will carry your advice (e.g. using BlueStore,
enterprise SSDs/HDDs, separating public and cluster traffic, etc.) into
my next round of PoC.

Thank you very much!

Best regards,
Cody

On Tue, Nov 27, 2018 at 6:31 AM Vitaliy Filippov  wrote:
>
> > CPU: 2 x E5-2603 @1.8GHz
> > RAM: 16GB
> > Network: 1G port shared for Ceph public and cluster traffics
> > Journaling device: 1 x 120GB SSD (SATA3, consumer grade)
> > OSD device: 2 x 2TB 7200rpm spindle (SATA3, consumer grade)
>
> 0.84 MB/s sequential write is impossibly bad; it's not normal with any
> kind of device, even with a 1G network. You probably have some kind of
> problem in your setup - maybe the network RTT is very high, maybe the OSD or
> MON nodes are shared with other running tasks and overloaded, or maybe your
> disks are already dead... :))
>
> > As I moved on to test block devices, I got a following error message:
> >
> > # rbd map image01 --pool testbench --name client.admin
>
> You don't need to map it to run benchmarks, use `fio --ioengine=rbd`
> (however you'll still need /etc/ceph/ceph.client.admin.keyring)
>
> --
> With best regards,
>Vitaliy Filippov


[ceph-users] RGW performance with lots of objects

2018-11-27 Thread Robert Stanford
In the old days when I first installed Ceph with RGW the performance would
be very slow after storing 500+ million objects in my buckets. With
Luminous and index sharding is this still a problem or is this an old
problem that has been solved?

Regards
R


Re: [ceph-users] Luminous v12.2.10 released

2018-11-27 Thread Graham Allan




On 11/27/2018 08:50 AM, Abhishek Lekshmanan wrote:


We're happy to announce the tenth bug fix release of the Luminous
v12.2.x long term stable release series. The previous release, v12.2.9,
introduced the PG hard-limit patches which were found to cause an issue
in certain upgrade scenarios, and this release was expedited to revert
those patches. If you already successfully upgraded to v12.2.9, you
should **not** upgrade to v12.2.10, but rather **wait** for a release in
which http://tracker.ceph.com/issues/36686 is addressed. All other users
are encouraged to upgrade to this release.


I wonder if you can comment on the upgrade policy for a mixed cluster - e.g. 
where the majority is running 12.2.8 but a handful of newly-added OSD 
nodes were installed with 12.2.9. Should the 12.2.8 nodes be upgraded to 
12.2.10 (this does sound like it should have no negative effects) and 
just the 12.2.9 nodes kept waiting for a future release - or should all of them wait?


Thanks, Graham
--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu


[ceph-users] Ceph IO stability issues

2018-11-27 Thread Jean-Philippe Méthot
Hi,

We’re currently progressively pushing into production a CEPH Mimic cluster and 
we’ve noticed a fairly strange behaviour. We use Ceph as a storage backend for 
Openstack block device. Now, we’ve deployed a few VMs on this backend to test 
the waters. These VMs are practically empty, with only the regular cpanel 
services running on them and no actual website set. We notice that about twice 
in a span of about 5 minutes, the iowait will jump to ~10% without any VM-side 
explanation, no specific service taking any more io bandwidth than usual. 

I must also add that the speed of the cluster is excellent. It’s really more of 
a stability issue that bothers me here. I see the jump in iowait as the VM 
being unable to read or write on the ceph cluster for a second or so. I've 
considered that it could be the deep scrub operations, but those seem to 
complete in 0.1 second, as there’s practically no data to scrub.

The cluster pool configuration is as such:
-RBD on erasure-coded pool (a replicated metadata pool and an erasure coded 
data pool) with overwrites enabled
-The data pool size is k=6 m=2, so 8, with 1024 PGs
-The metadata pool size is 3, with 64 PGs


Of course, this is running on bluestore.
As for the hardware, the config is as follows:
-10 hosts
-9 OSDs per host
-Each OSD is an Intel DC S3510
-CPUs are dual E5-2680v2 (40 threads total @2.8GHz)
-Each host has 128 GB of RAM
-Network is 2x bonded 10 Gbps, 1 for storage, 1 for replication

I understand that I will eventually hit a bottleneck on either the 
CPUs or the network, but maximum speed is not my current concern here and the 
hardware can be upgraded when needed. I've been wondering: could these hiccups be caused by 
data caching at the client level? If so, what could I do to fix this?
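
(If you want to look at the librbd cache specifically, these are the client-side knobs that usually matter - shown here with their stock defaults as a reference point rather than a recommendation; whether they explain the hiccups is another matter:)

  [client]
      rbd cache = true
      rbd cache writethrough until flush = true
      rbd cache size = 33554432        # 32 MiB default
      rbd cache max dirty = 25165824   # 24 MiB default

(On the OpenStack side, the libvirt disk_cachemodes setting in nova.conf also influences how guest writes are cached and flushed.)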

Jean-Philippe Méthot
Openstack system administrator
Administrateur système Openstack
PlanetHoster inc.






Re: [ceph-users] Luminous v12.2.10 released

2018-11-27 Thread Simon Ironside

On 27/11/2018 14:50, Abhishek Lekshmanan wrote:


We're happy to announce the tenth bug fix release of the Luminous
v12.2.x long term stable release series. The previous release, v12.2.9,
introduced the PG hard-limit patches which were found to cause an issue
in certain upgrade scenarios, and this release was expedited to revert
those patches. If you already successfully upgraded to v12.2.9, you
should **not** upgrade to v12.2.10, but rather **wait** for a release in
which http://tracker.ceph.com/issues/36686 is addressed. All other users
are encouraged to upgrade to this release.


Is it safe for v12.2.9 users to upgrade to v13.2.2 Mimic?

http://tracker.ceph.com/issues/36686 suggests a similar revert might be 
on the cards for v13.2.3, so I'm not sure.


Thanks,
Simon


[ceph-users] Luminous v12.2.10 released

2018-11-27 Thread Abhishek Lekshmanan

We're happy to announce the tenth bug fix release of the Luminous
v12.2.x long term stable release series. The previous release, v12.2.9,
introduced the PG hard-limit patches which were found to cause an issue
in certain upgrade scenarios, and this release was expedited to revert
those patches. If you already successfully upgraded to v12.2.9, you
should **not** upgrade to v12.2.10, but rather **wait** for a release in
which http://tracker.ceph.com/issues/36686 is addressed. All other users
are encouraged to upgrade to this release.

Notable Changes
---

* This release reverts the PG hard-limit patches added in v12.2.9, in which
  a partial upgrade during a recovery/backfill can cause the OSDs on the
  previous version to fail with assert(trim_to <= info.last_complete). The
  workaround for users is to upgrade and restart all OSDs to a version with the
  pg hard limit, or to only upgrade when all PGs are active+clean.

  See also: http://tracker.ceph.com/issues/36686

  As mentioned above if you've successfully upgraded to v12.2.9 DO NOT
  upgrade to v12.2.10 until the linked tracker issue has been fixed.

* The bluestore_cache_* options are no longer needed. They are replaced
  by osd_memory_target, defaulting to 4GB. BlueStore will expand
  and contract its cache to attempt to stay within this
  limit. Users upgrading should note this is a higher default
  than the previous bluestore_cache_size of 1GB, so OSDs using
  BlueStore will use more memory by default.

  For more details, see BlueStore docs[1]
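
  (As an illustration only - the value below is an example, not a recommendation - the
  new limit can be bounded per OSD in ceph.conf:)

      [osd]
          osd memory target = 2147483648   # ~2 GiB per OSD instead of the 4 GiB default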


For the complete release notes with changelog, please check out the
release blog entry at:
http://ceph.com/releases/v12-2-10-luminous-released

Getting ceph:

* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-12.2.10.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: 177915764b752804194937482a39e95e0ca3de94


[1]: 
http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#cache-size

--
Abhishek Lekshmanan
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)


[ceph-users] CEPH DR RBD Mount

2018-11-27 Thread Vikas Rana
Hi There,

We are replicating a 100 TB RBD image to a DR site. Replication works fine.

rbd --cluster cephdr mirror pool status nfs --verbose

health: OK

images: 1 total

1 replaying



dir_research:

  global_id:   11e9cbb9-ce83-4e5e-a7fb-472af866ca2d

  state:   up+replaying

  description: replaying, master_position=[object_number=591701, tag_tid=1,
entry_tid=902879873], mirror_position=[object_number=446354, tag_tid=1,
entry_tid=727653146], entries_behind_master=175226727

  last_update: 2018-11-14 16:17:23




We then use nbd to map the RBD image at the DR site, but when we try to
mount it, we get:


# mount /dev/nbd2 /mnt

mount: block device /dev/nbd2 is write-protected, mounting read-only

*mount: /dev/nbd2: can't read superblock*



We are using 12.2.8.


Any help will be greatly appreciated.
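
(A non-primary image that is still replaying is read-only at the RBD level, and with the journal that far behind - see entries_behind_master above - the filesystem on it may simply not be consistent yet, which would explain the superblock error. For an actual failover test, one hedged approach, using the pool/image names from the status output, is to promote the DR copy first and demote/resync it afterwards:)

# rbd --cluster cephdr mirror image promote nfs/dir_research --force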


Thanks,

-Vikas


Re: [ceph-users] Poor ceph cluster performance

2018-11-27 Thread Vitaliy Filippov

CPU: 2 x E5-2603 @1.8GHz
RAM: 16GB
Network: 1G port shared for Ceph public and cluster traffics
Journaling device: 1 x 120GB SSD (SATA3, consumer grade)
OSD device: 2 x 2TB 7200rpm spindle (SATA3, consumer grade)


0.84 MB/s sequential write is impossibly bad; it's not normal with any  
kind of device, even with a 1G network. You probably have some kind of  
problem in your setup - maybe the network RTT is very high, maybe the OSD or  
MON nodes are shared with other running tasks and overloaded, or maybe your  
disks are already dead... :))



As I moved on to test block devices, I got a following error message:

# rbd map image01 --pool testbench --name client.admin


You don't need to map it to run benchmarks, use `fio --ioengine=rbd`  
(however you'll still need /etc/ceph/ceph.client.admin.keyring)
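
(A hedged sketch of such a run, reusing the pool/image names from the earlier rbd map attempt:)

$ fio --ioengine=rbd --clientname=admin --pool=testbench --rbdname=image01 \
      --rw=write --bs=4M --iodepth=16 --numjobs=1 --direct=1 \
      --runtime=60 --time_based --name=rbd-seq-write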


--
With best regards,
  Vitaliy Filippov


[ceph-users] Libvirt snapshot rollback still has 'new' data

2018-11-27 Thread Marc Roos


I just rolled back a snapshot, and when I started the (Windows) VM, I 
noticed that a software update I had installed after the snapshot was still there. 

What am I doing wrong, such that libvirt is not reading the rolled-back 
snapshot (but instead uses something from a cache)?
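
(One thing worth checking: a rollback is only reliable with the guest fully shut down and any qemu/librbd write-back cache out of the way. A hedged sequence, with the domain and image names as placeholders:)

$ virsh shutdown myvm           # wait until the domain is really off
$ rbd snap rollback rbd/myvm-disk@pre-update
$ virsh start myvm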




Re: [ceph-users] Poor ceph cluster performance

2018-11-27 Thread Darius Kasparavičius
Hi,


Most likely the issue is with your consumer-grade journal SSD. Run
this against your SSD to check whether it performs: fio --filename=
--direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1
--runtime=60 --time_based --group_reporting --name=journal-test
On Tue, Nov 27, 2018 at 2:06 AM Cody  wrote:
>
> Hello,
>
> I have a Ceph cluster deployed together with OpenStack using TripleO.
> While the Ceph cluster shows a healthy status, its performance is
> painfully slow. After eliminating the possibility of network issues, I
> have zeroed in on the Ceph cluster itself, but have no experience in
> further debugging and tuning.
>
> The Ceph OSD part of the cluster uses 3 identical servers with the
> following specifications:
>
> CPU: 2 x E5-2603 @1.8GHz
> RAM: 16GB
> Network: 1G port shared for Ceph public and cluster traffics
> Journaling device: 1 x 120GB SSD (SATA3, consumer grade)
> OSD device: 2 x 2TB 7200rpm spindle (SATA3, consumer grade)
>
> This is not beefy enough in any way, but I am running it for PoC only,
> with minimal utilization.
>
> Ceph-mon and ceph-mgr daemons are hosted on the OpenStack Controller
> nodes. Ceph-ansible version is 3.1 and is using Filestore with
> non-colocated scenario (1 SSD for every 2 OSDs). Connection speed
> among Controllers, Computes, and OSD nodes can reach ~900Mbps tested
> using iperf.
>
> I followed the Red Hat Ceph 3 benchmarking procedure [1] and received
> following results:
>
> Write Test:
>
> Total time run: 80.313004
> Total writes made:  17
> Write size: 4194304
> Object size:4194304
> Bandwidth (MB/sec): 0.846687
> Stddev Bandwidth:   0.320051
> Max bandwidth (MB/sec): 2
> Min bandwidth (MB/sec): 0
> Average IOPS:   0
> Stddev IOPS:0
> Max IOPS:   0
> Min IOPS:   0
> Average Latency(s): 66.6582
> Stddev Latency(s):  15.5529
> Max latency(s): 80.3122
> Min latency(s): 29.7059
>
> Sequencial Read Test:
>
> Total time run:   25.951049
> Total reads made: 17
> Read size:4194304
> Object size:  4194304
> Bandwidth (MB/sec):   2.62032
> Average IOPS: 0
> Stddev IOPS:  0
> Max IOPS: 1
> Min IOPS: 0
> Average Latency(s):   24.4129
> Max latency(s):   25.9492
> Min latency(s):   0.117732
>
> Random Read Test:
>
> Total time run:   66.355433
> Total reads made: 46
> Read size:4194304
> Object size:  4194304
> Bandwidth (MB/sec):   2.77295
> Average IOPS: 0
> Stddev IOPS:  3
> Max IOPS: 27
> Min IOPS: 0
> Average Latency(s):   21.4531
> Max latency(s):   66.1885
> Min latency(s):   0.0395266
>
> Apparently, the results are pathetic...
>
> As I moved on to test block devices, I got the following error message:
>
> # rbd map image01 --pool testbench --name client.admin
> rbd: failed to add secret 'client.admin' to kernel
>
> Any suggestions on the above error and/or debugging would be greatly
> appreciated!
>
> Thank you very much to all.
>
> Cody
>
> [1] 
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/administration_guide/#benchmarking_performance


Re: [ceph-users] pre-split causing slow requests when rebuild osd ?

2018-11-27 Thread Paul Emmerich
If you are re-creating or adding the OSDs anyway: consider using
Bluestore for the new ones; it performs *so much* better, especially
in scenarios like these.
Running a mixed configuration is no problem in our experience.
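
(For the record, a minimal sketch of creating such a replacement OSD as Bluestore with ceph-volume - /dev/sdX being whichever device you are rebuilding onto:)

$ ceph-volume lvm create --bluestore --data /dev/sdX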

Paul


--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Tue, 27 Nov 2018 at 08:50, hnuzhoulin2 wrote:
>
> Hi guys,
>
>
> I have a 42-node cluster, and I created the pool using expected_num_objects to
> pre-split the filestore dirs.
>
> Today I rebuilt an OSD because of a disk error; it caused a lot of slow
> requests. The filestore logs look like the following:
>
> 2018-11-26 16:49:41.003336 7f2dad075700 10 
> filestore(/home/ceph/var/lib/osd/ceph-4) create_collection 
> /home/ceph/var/lib/osd/ceph-4/current/388.433_head = 0
> 2018-11-26 16:49:41.003479 7f2dad075700 10 
> filestore(/home/ceph/var/lib/osd/ceph-4) create_collection 
> /home/ceph/var/lib/osd/ceph-4/current/388.433_TEMP = 0
> 2018-11-26 16:49:41.003570 7f2dad075700 10 
> filestore(/home/ceph/var/lib/osd/ceph-4) _set_replay_guard 33.0.0
> 2018-11-26 16:49:41.003591 7f2dad876700  5 
> filestore(/home/ceph/var/lib/osd/ceph-4) _journaled_ahead 0x55e054382300 seq 
> 81 osr(388.2bd 0x55e053ed9280) [Transaction(0x55e06d3046
> 80)]
> 2018-11-26 16:49:41.003603 7f2dad876700  5 
> filestore(/home/ceph/var/lib/osd/ceph-4) queue_op 0x55e054382300 seq 81 
> osr(388.2bd 0x55e053ed9280) 1079089 bytes   (queue has 50 ops
>  and 15513428 bytes)
> 2018-11-26 16:49:41.003608 7f2dad876700 10 
> filestore(/home/ceph/var/lib/osd/ceph-4)  queueing ondisk 0x55e06cc83f80
> 2018-11-26 16:49:41.024714 7f2d9d055700  5 
> filestore(/home/ceph/var/lib/osd/ceph-4) queue_transactions existing 
> 0x55e053a5d1e0 osr(388.f2a 0x55e053ed92e0)
> 2018-11-26 16:49:41.166512 7f2dac874700 10 filestore oid: 
> #388:c940head# not skipping op, *spos 32.0.1
> 2018-11-26 16:49:41.166522 7f2dac874700 10 filestore  > header.spos 0.0.0
> 2018-11-26 16:49:41.170670 7f2dac874700 10 filestore oid: 
> #388:c940head# not skipping op, *spos 32.0.2
> 2018-11-26 16:49:41.170680 7f2dac874700 10 filestore  > header.spos 0.0.0
> 2018-11-26 16:49:41.183259 7f2dac874700 10 
> filestore(/home/ceph/var/lib/osd/ceph-4) _do_op 0x55e05ddb3480 seq 32 r = 0, 
> finisher 0x55e051d122e0 0
> 2018-11-26 16:49:41.187211 7f2dac874700 10 
> filestore(/home/ceph/var/lib/osd/ceph-4) _finish_op 0x55e05ddb3480 seq 32 
> osr(388.293 0x55e053ed84b0)/0x55e053ed84b0 lat 47.804533
> 2018-11-26 16:49:41.187232 7f2dac874700  5 
> filestore(/home/ceph/var/lib/osd/ceph-4) _do_op 0x55e052113e60 seq 34 
> osr(388.2d94 0x55e053ed91c0)/0x55e053ed91c0 start
> 2018-11-26 16:49:41.187236 7f2dac874700 10 
> filestore(/home/ceph/var/lib/osd/ceph-4) _do_transaction on 0x55e05e022140
> 2018-11-26 16:49:41.187239 7f2da4864700  5 
> filestore(/home/ceph/var/lib/osd/ceph-4) queue_transactions (writeahead) 82 
> [Transaction(0x55e0559e6d80)]
>
> It looks like it is very slow when creating PG dirs like:
> /home/ceph/var/lib/osd/ceph-4/current/388.433
>
> At the start of the service, while the OSD's status is not yet up, it works
> well: there are no slow requests, and the PG dirs are being created.
> But once the OSD state is up, slow requests appear while the PG dirs are still being created.
>
> When I remove the setting filestore merge threshold = -10 from ceph.conf,
> the rebuild process works well and the PG dirs are created very fast; then I
> see dir splits in the log:
>
> 2018-11-26 19:16:56.406276 7f768b189700  1 _created [8,F,8] has 593 objects, 
> starting split.
> 2018-11-26 19:16:56.977392 7f768b189700  1 _created [8,F,8] split completed.
> 2018-11-26 19:16:57.032567 7f768b189700  1 _created [8,F,8,6] has 594 
> objects, starting split.
> 2018-11-26 19:16:57.814694 7f768b189700  1 _created [8,F,8,6] split completed.
>
>
> So, how can I configure things so that all the PG dirs are created before the
> OSD state is up? Or is there another solution?
>
> Thanks.
>