[ceph-users] Rook Deployments

2018-07-12 Thread Travis Nielsen
Any Rook users out there running Ceph in Kubernetes? We would love to hear
about your experiences. Rook is currently hosted by the CNCF in the sandbox
stage, and we are proposing that Rook graduate to the incubating stage. Part of
graduating is growing the user base and showing a number of users running Rook
in production scenarios. Your support would be appreciated.

If you haven’t heard of Rook before, it’s a great time to get started. In the
next few days we will be releasing v0.8 and are going through a final test
pass. This would be a great time to help us find any remaining issues. After
the 0.8 release, it will also be a great time to start your Rook clusters and
see what it will take to run your production workloads. If you have any
questions, the Rook Slack is a great place to get them answered.

Thanks!
Travis
https://rook.io 
https://rook.io/docs/rook/master/ 
https://github.com/rook/rook 





Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-29 Thread Travis Nielsen
Since bluestore was declared stable in Luminous, is there any remaining
scenario to use filestore in new deployments? Or is it safe to assume that
bluestore is always better to use in Luminous? All documentation I can
find points to bluestore being superior in all cases.
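
For reference, here is a quick way to check which backend existing OSDs are
using, via "ceph osd metadata" (just a rough sketch, assuming the ceph CLI can
reach the cluster):

# Sketch: report the object store backend of each OSD in a running cluster.
# The "osd_objectstore" field is "bluestore" or "filestore".
import json
import subprocess

out = subprocess.check_output(["ceph", "osd", "metadata", "--format", "json"])
for osd in json.loads(out):
    print("osd.{}: {}".format(osd["id"], osd.get("osd_objectstore", "unknown")))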

Thanks,
Travis



Re: [ceph-users] RGW flush_read_list error

2017-10-11 Thread Travis Nielsen
To the client they were showing up as a 500 error. Ty, do you know of any
client-side issues that could have come up during the test run? And there
was only a single GET happening at a time, right?




On 10/11/17, 9:27 AM, "ceph-users on behalf of Casey Bodley"
 wrote:

>Hi Travis,
>
>This is reporting an error when sending data back to the client.
>Generally it means that the client timed out and closed the connection.
>Are you also seeing failures on the client side?
>
>Casey
>
>
>On 10/10/2017 06:45 PM, Travis Nielsen wrote:
>> In Luminous 12.2.1, when running a GET on a large (1GB file) repeatedly
>> for an hour from RGW, the following error was hit intermittently a
>>number
>> of times. The first error was hit after 45 minutes and then the error
>> happened frequently for the remainder of the test.
>>
>> ERROR: flush_read_list(): d->client_cb->handle_data() returned -5
>>
>> Here is some more context from the rgw log around one of the failures.
>>
>> 2017-10-10 18:20:32.321681 I | rgw: 2017-10-10 18:20:32.321643
>> 7f8929f41700 1 civetweb: 0x55bd25899000: 10.32.0.1 - -
>> [10/Oct/2017:18:19:07 +] "GET /bucket100/testfile.tst HTTP/1.1" 1 0 -
>> aws-sdk-java/1.9.0 Linux/4.4.0-93-generic
>> OpenJDK_64-Bit_Server_VM/25.131-b11/1.8.0_131
>> 2017-10-10 18:20:32.383855 I | rgw: 2017-10-10 18:20:32.383786
>> 7f8924736700 1 == starting new request req=0x7f892472f140 =
>> 2017-10-10 18:20:46.605668 I | rgw: 2017-10-10 18:20:46.605576
>> 7f894af83700 0 ERROR: flush_read_list(): d->client_cb->handle_data()
>> returned -5
>> 2017-10-10 18:20:46.605934 I | rgw: 2017-10-10 18:20:46.605914
>> 7f894af83700 1 == req done req=0x7f894af7c140 op status=-5
>> http_status=200 ==
>> 2017-10-10 18:20:46.606249 I | rgw: 2017-10-10 18:20:46.606225
>> 7f8924736700 0 ERROR: flush_read_list(): d->client_cb->handle_data()
>> returned -5
>>
>> I don't see anything else standing out in the log. The object store was
>> configured with an erasure-coded data pool with k=2 and m=1.
>>
>> There are a number of threads around this, but I don't see a resolution.
>> Is there a tracking issue for this?
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007756.html
>> https://www.spinics.net/lists/ceph-users/msg16117.html
>> https://www.spinics.net/lists/ceph-devel/msg37657.html
>>
>>
>> Here's our tracking Rook issue.
>> https://github.com/rook/rook/issues/1067
>>
>>
>> Thanks,
>> Travis
>>



[ceph-users] RGW flush_read_list error

2017-10-10 Thread Travis Nielsen
In Luminous 12.2.1, when running a GET on a large (1GB file) repeatedly
for an hour from RGW, the following error was hit intermittently a number
of times. The first error was hit after 45 minutes and then the error
happened frequently for the remainder of the test.

ERROR: flush_read_list(): d->client_cb->handle_data() returned -5

Here is some more context from the rgw log around one of the failures.

2017-10-10 18:20:32.321681 I | rgw: 2017-10-10 18:20:32.321643
7f8929f41700 1 civetweb: 0x55bd25899000: 10.32.0.1 - -
[10/Oct/2017:18:19:07 +] "GET /bucket100/testfile.tst HTTP/1.1" 1 0 -
aws-sdk-java/1.9.0 Linux/4.4.0-93-generic
OpenJDK_64-Bit_Server_VM/25.131-b11/1.8.0_131
2017-10-10 18:20:32.383855 I | rgw: 2017-10-10 18:20:32.383786
7f8924736700 1 == starting new request req=0x7f892472f140 =
2017-10-10 18:20:46.605668 I | rgw: 2017-10-10 18:20:46.605576
7f894af83700 0 ERROR: flush_read_list(): d->client_cb->handle_data()
returned -5
2017-10-10 18:20:46.605934 I | rgw: 2017-10-10 18:20:46.605914
7f894af83700 1 == req done req=0x7f894af7c140 op status=-5
http_status=200 ==
2017-10-10 18:20:46.606249 I | rgw: 2017-10-10 18:20:46.606225
7f8924736700 0 ERROR: flush_read_list(): d->client_cb->handle_data()
returned -5

I don't see anything else standing out in the log. The object store was
configured with an erasure-coded data pool with k=2 and m=1.
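
For what it's worth, the test can be reproduced with a loop like the following.
The actual client was aws-sdk-java 1.9.0; this is just an equivalent sketch in
Python with boto3, and the endpoint and credentials are placeholders:

# Repro sketch: repeatedly GET a large object from RGW for an hour and log
# any client-side failures. Endpoint, credentials, bucket, and key are
# placeholders for this cluster.
import time
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw-endpoint:80",   # placeholder RGW endpoint
    aws_access_key_id="ACCESS_KEY",          # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
)

deadline = time.time() + 3600
while time.time() < deadline:
    try:
        obj = s3.get_object(Bucket="bucket100", Key="testfile.tst")
        data = obj["Body"].read()  # stream the full ~1GB object
        print("GET ok, {} bytes".format(len(data)))
    except ClientError as e:
        # 500s from RGW surface here on the client side
        print("GET failed: {}".format(e))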

There are a number of threads around this, but I don't see a resolution.
Is there a tracking issue for this?
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007756.ht
ml
https://www.spinics.net/lists/ceph-users/msg16117.html
https://www.spinics.net/lists/ceph-devel/msg37657.html


Here's our tracking Rook issue.
https://github.com/rook/rook/issues/1067


Thanks,
Travis



On 10/10/17, 3:05 PM, "ceph-users on behalf of Jack"
 wrote:

>Hi,
>
>I would like some information about the following
>
>Let's say I have a running cluster with 4 OSDs: 2 SSDs and 2 HDDs.
>My single pool has size=3, min_size=2.
>
>For a write-only pattern, I thought I would get SSD-level performance,
>because the write would be acked as soon as min_size OSDs acked.
>
>Am I right?
>
>(the same setup could involve some high-latency OSDs, in the case of a
>country-level cluster)



[ceph-users] Mon time to form quorum

2017-08-08 Thread Travis Nielsen
At cluster creation I'm seeing that the mons are taking a long time to form
quorum. It seems like I'm hitting a timeout of 60s somewhere. Am I missing a
config setting that would help paxos establish quorum sooner? When initializing
with the monmap I would have expected the mons to form quorum very quickly.

The scenario is:

  *   Luminous RC 2
  *   The mons are initialized with a monmap
  *   Running in Kubernetes (Rook)

The symptoms are:

  *   When all three mons start in parallel, they appear to determine their
rank immediately. I assume this means they establish communication. A log
message such as this is seen in each of the mon logs:
 *   2017-08-08 17:03:16.383599 7f8da7c85f40  0
mon.rook-ceph-mon1@-1(probing) e0  my rank is now 0 (was -1)
  *   Now paxos enters a loop that times out every two seconds and lasts about 
60s, trying to probe the other monitors. During this wait, I am able to curl 
the mon endpoints successfully.
 *   2017-08-08 17:03:17.345877 7f02b779af40 10 
mon.rook-ceph-mon0@1(probing) e0 probing other monitors
 *   2017-08-08 17:03:19.346032 7f02ae568700  4 
mon.rook-ceph-mon0@1(probing) e0 probe_timeout 0x55c93678bb00
  *   After about 60 seconds the probe succeeds and the mons start responding
 *   2017-08-08 17:04:17.356928 7f02ae568700 10 
mon.rook-ceph-mon0@1(probing) e0 probing other monitors
 *   2017-08-08 17:04:17.366587 7f02a855c700 10 
mon.rook-ceph-mon0@1(probing) e0 ms_verify_authorizer 10.0.0.254:6790/0 mon 
protocol 2

The relevant settings in the config are:
mon initial members  = rook-ceph-mon0 rook-ceph-mon1 rook-ceph-mon2
mon host  = 10.0.0.24:6790,10.0.0.163:6790,10.0.0.139:6790
public addr   = 10.0.0.24
cluster addr  = 172.17.0.5
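
For reference, here is a rough sketch of how the probing state can be watched
from inside a mon container while it starts up (assuming the ceph CLI is
available there and the admin socket is at its default path):

# Sketch: poll a mon's state via its admin socket until it joins quorum.
import json
import subprocess
import time

MON = "rook-ceph-mon0"
while True:
    out = subprocess.check_output(["ceph", "daemon", "mon." + MON, "mon_status"])
    status = json.loads(out)
    print("{}: state={} quorum={}".format(MON, status["state"], status["quorum"]))
    if status["state"] in ("leader", "peon"):
        break
    time.sleep(2)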

The full log for this mon at debug log level 20 can be found here:
https://gist.github.com/travisn/2c2641a6b80a7479b3b22accb41a5193

Any ideas?

Thanks,
Travis