int err = get_version(summary.version+1, bl);
assert(err == 0);
assert(bl.length());
Has anyone seen anything similar or have any ideas?
ceph 13.2.8
Thanks!
Kevin
The first crash/restart
Jan 14 20:47:11 sephmon5 ceph-mon:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_A
--
Kevin Hrpcek
NASA VIIRS Atmosphere SIPS
Space Science & Engineerin
OK, looks like clock skew is the problem. I thought this was caused by the
reboot, but it did not fix itself after a few minutes (mon3 was 6 seconds
ahead).
After forcing a time sync against the same server, it seems to be solved now.
Kevin
Am Fr., 20. Sept. 2019 um 07:33 Uhr schrieb Kevin Olbrich
"time": "2019-09-20 05:31:52.315083",
"event": "psvc:dispatch"
},
{
"time": "2019-09-20 05:31:52.315161",
"event": "auth:wait_for_readable"
},
{
"time": "2019-09-20 05:31:52.315167",
"event": "auth:wait_for_readable/paxos"
},
{
"time": "2019-09-20 05:31:52.315230",
"event": "paxos:wait_for_readable"
}
],
"info": {
"seq": 1709,
"src_is_mon": false,
"source": "client.?
[fd91:462b:4243:47e::1:3]:0/997594187",
"forwarded_to_leader": false
}
}
}
This is a new situation for me. What am I supposed to do in this case?
Thank you!
Kind regards
Kevin
Kevin
On 8/2/19 12:08 PM, Mike Perez wrote:
We have s
ile to get to a weight of 11 if
I did anything smaller.
for i in {264..311}; do ceph osd crush reweight osd.${i} 11.0;sleep 4;done
Kevin
On 7/24/19 12:33 PM, Xavier Trilla wrote:
Hi Kevin,
Yeah, that makes a lot of sense, and looks even safer than adding OSDs one by
one. What do you change, the c
ach for peering.
Let the cluster balance and get healthy or close to healthy.
Then repeat the previous 2 steps increasing weight by +0.5 or +1.0 until I am
at the desired weight.
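For illustration, a rough sketch of how one such stepped loop could look (OSD id range, step size, and the health polling are assumptions):

for target in $(seq 1 1 11); do
  for i in {264..311}; do
    ceph osd crush reweight osd.${i} ${target}
    sleep 4
  done
  # wait until the cluster settles before the next +1.0 pass
  until ceph health | grep -Eq 'HEALTH_OK|HEALTH_WARN'; do sleep 60; done
done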
Kevin
On 7/24/19 11:44 AM, Xavier Trilla wrote:
Hi,
What would be the proper way to add 100 new OSDs to a cluster?
Update
We're going to hold off until August for this so we can promote it on the Ceph
twitter with more notice. Sorry for the inconvenience if you were planning on
the meeting tomorrow. Keep a watch on the list, twitter, or ceph calendar for
updates.
Kevin
On 7/5/19 11:15 PM, Kevin H
olunteer a
topic for meetings. I will be brainstorming some conversation starters but it
would also be interesting to have people give a deep dive into their use of
ceph and what they have built around it to support the science being done at
their facility.
Kevin
On 6/17/19 10:43 AM, Ke
ll be impossible to pick a time
that works well for everyone but initially we considered something later in the
work day for EU countries.
Reply to me if you're interested and please include your timezone.
Kevin
Am Di., 28. Mai 2019 um 10:20 Uhr schrieb Wido den Hollander :
>
>
> On 5/28/19 10:04 AM, Kevin Olbrich wrote:
> > Hi Wido,
> >
> > thanks for your reply!
> >
> > For CentOS 7, this means I can switch over to the "rpm-nautilus/el7"
> > repos
Hi Wido,
thanks for your reply!
For CentOS 7, this means I can switch over to the "rpm-nautilus/el7"
repository and Qemu will use a nautilus-compatible client?
I just want to make sure I understand correctly.
Thank you very much!
Kevin
Am Di., 28. Mai 2019 um 09:46 Uhr schrieb Wido den
much!
Kind regards
Kevin
evice, please excuse any typos.
On Fri, May 24, 2019, 4:42 AM Kevin Flöh <mailto:kevin.fl...@kit.edu>> wrote:
Hi,
we already tried "rados -p ec31 getxattr 10004dfce92.003d
parent", but this just hangs forever when we run it on
unfound objects. It wo
and found nothing. This also works for non-unfound
objects.
Is there another way to find the corresponding file?
On 24.05.19 11:12 vorm., Burkhard Linke wrote:
Hi,
On 5/24/19 9:48 AM, Kevin Flöh wrote:
We got the object ids of the missing objects with "ceph pg 1.24c
li
up those objects with:
ceph pg 1.24c mark_unfound_lost revert
But first we would like to know which file(s) are affected. Is
there a way to map the object id to the corresponding file?
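For reference, one hedged way to map such an object back to a path: the hex prefix of the object name is the file's inode number, so assuming the CephFS mount point is /cephfs:

ino_hex=10004dfce92                      # the part of the object name before the dot
find /cephfs -inum $((16#${ino_hex}))    # walks the whole tree, can be slow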
On 23.05.19 3:52 nachm., Alexandre Marangone wrote:
The PGs will stay active+recovery_wait+degraded until
anything else happens, you should stop and let us know.
-- dan
On Thu, May 23, 2019 at 10:59 AM Kevin Flöh wrote:
This is the current status of ceph:
cluster:
id: 23e72372-0d44-4cad-b24f-3641b14b86f4
health: HEALTH_ERR
9/125481144 objects unfound (0.000
g/backfilling another PG.
On Thu, May 23, 2019 at 10:53 AM Kevin Flöh wrote:
Hi,
we have set the PGs to recover, and now they are stuck in
active+recovery_wait+degraded; instructing them to deep-scrub does not
change anything. Hence, the rados report is empty. Is there a way to stop the
rec
recovery_wait might be caused by missing objects. Do we need
to delete them first to get the recovery going?
Kevin
On 22.05.19 6:03 nachm., Robert LeBlanc wrote:
On Wed, May 22, 2019 at 4:31 AM Kevin Flöh <mailto:kevin.fl...@kit.edu>> wrote:
Hi,
thank you, it worked. The PGs are not i
to repair?
Regards,
Kevin
On 21.05.19 4:52 nachm., Wido den Hollander wrote:
On 5/21/19 4:48 PM, Kevin Flöh wrote:
Hi,
we gave up on the incomplete pgs since we do not have enough complete
shards to restore them. What is the procedure to get rid of these pgs?
You need to start with markin
Hi,
we gave up on the incomplete pgs since we do not have enough complete
shards to restore them. What is the procedure to get rid of these pgs?
regards,
Kevin
On 20.05.19 9:22 vorm., Kevin Flöh wrote:
Hi Frederic,
we do not have access to the original OSDs. We exported the remaining
then.
Best,
Kevin
On 17.05.19 2:36 nachm., Frédéric Nass wrote:
Le 14/05/2019 à 10:04, Kevin Flöh a écrit :
On 13.05.19 11:21 nachm., Dan van der Ster wrote:
Presumably the 2 OSDs you marked as lost were hosting those
incomplete PGs?
It would be useful to double confirm that: check with
-id} mark_unfound_lost revert|delete
Cheers,
Kevin
On 15.05.19 8:55 vorm., Kevin Flöh wrote:
The hdds of OSDs 4 and 23 are completely lost, we cannot access them
in any way. Is it possible to use the shards which are maybe stored on
working OSDs as shown in the all_participants list?
On 14.05.19
ceph osd pool get ec31 min_size
min_size: 3
On 15.05.19 9:09 vorm., Konstantin Shalygin wrote:
ceph osd pool get ec31 min_size
The hdds of OSDs 4 and 23 are completely lost, we cannot access them in
any way. Is it possible to use the shards which are maybe stored on
working OSDs as shown in the all_participants list?
On 14.05.19 5:24 nachm., Dan van der Ster wrote:
On Tue, May 14, 2019 at 5:13 PM Kevin Flöh wrote
Hi,
since we have 3+1 EC, I didn't try it before. But when I run the command you
suggested, I get the following error:
ceph osd pool set ec31 min_size 2
Error EINVAL: pool min_size must be between 3 and 4
On 14.05.19 6:18 nachm., Konstantin Shalygin wrote:
peering does not seem to be blocked
"4(1),23(2),24(0)"
}
]
}
],
"probing_osds": [
"2(0)",
"4(1)",
"23(2)",
"24(0)",
On 14.05.19 10:08 vorm., Dan van der Ster wrote:
On Tue, May 14, 2019 at 10:02 AM Kevin Flöh wrote:
On 13.05.19 10:51 nachm., Lionel Bouton wrote:
Le 13/05/2019 à 16:20, Kevin Flöh a écrit :
Dear ceph experts,
[...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...]
Here
ut fixing the old one and
copy whatever is left.
Best regards,
Kevin
On Mon, May 13, 2019 at 4:20 PM Kevin Flöh wrote:
Dear ceph experts,
we have several (maybe related) problems with our ceph cluster, let me
first show you the current ceph status:
cluster:
id: 23e72372-0d
On 13.05.19 10:51 nachm., Lionel Bouton wrote:
Le 13/05/2019 à 16:20, Kevin Flöh a écrit :
Dear ceph experts,
[...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...]
Here is what happened: One osd daemon could not be started and
therefore we decided to mark the osd as lost
t;: "true"
for the affected osds, which had no effect. Furthermore, the cluster is
behind on trimming by more than 40,000 segments and we have folders and
files which cannot be deleted or moved. (which are not on the 2
incomplete pgs). Is there any way to solve these problems?
Best regards,
Kevin
Are you sure that firewalld is stopped and disabled?
It looked exactly like that when I missed one host in a test cluster.
Kevin
Am Di., 12. März 2019 um 09:31 Uhr schrieb Zhenshi Zhou :
> Hi,
>
> I deployed a ceph cluster with good performance. But the logs
> indicate that the clust
3660 1.0 447GiB 142GiB 305GiB 31.84 0.70 43
40 ssd 0.87329 1.0 894GiB 407GiB 487GiB 45.53 1.00 98
41 ssd 0.87329 1.0 894GiB 353GiB 541GiB 39.51 0.87 102
TOTAL 29.9TiB 13.7TiB 16.3TiB 45.66
MIN/MAX VAR: 0.63/1.72 STDDEV: 13.59
Kevin
Am So., 6. Jan.
Am Sa., 26. Jan. 2019 um 13:43 Uhr schrieb Götz Reinicke
:
>
> Hi,
>
> I have a fileserver which mounted a 4TB rbd, which is ext4 formatted.
>
> I grew that rbd and ext4, starting with a 2TB rbd, this way:
>
> rbd resize testpool/disk01 --size 4194304
>
> resize2fs /dev/rbd0
>
> Today I wanted to ext
On 1/18/19 7:26 AM, Igor Fedotov wrote:
Hi Kevin,
On 1/17/2019 10:50 PM, KEVIN MICHAEL HRPCEK wrote:
Hey,
I recall reading about this somewhere, but I can't find it in the docs or list
archive, and confirmation from a dev or someone who knows for sure would be
nice. What I recall is
s://github.com/ceph/ceph/blob/master/src/os/bluestore/BlueStore.cc#L12331
if (offset + length >= OBJECT_MAX_SIZE) {
  r = -E2BIG;
} else {
  _assign_nid(txc, o);
  r = _do_write(txc, c, o, offset, length, bl, fadvise_flags);
  txc->write_o
seconds.
Kevin
Am Do., 17. Jan. 2019 um 11:57 Uhr schrieb Johan Thomsen :
>
> Hi,
>
> I have a sad ceph cluster.
> All my osds complain about failed reply on heartbeat, like so:
>
> osd.10 635 heartbeat_check: no reply from 192.168.160.237:6810 osd.42
> ever on either front
It would but you should not:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014846.html
Kevin
Am Di., 8. Jan. 2019 um 15:35 Uhr schrieb Rodrigo Embeita
:
>
> Thanks again Kevin.
> If I reduce the size flag to a value of 2, that should fix the problem?
>
> Reg
You use replication 3 with failure-domain host.
OSDs 2 and 4 are full, and that's why your pool is also full.
You need to add two disks to pf-us1-dfs3 or swap one from the larger
nodes to this one.
Kevin
Am Di., 8. Jan. 2019 um 15:20 Uhr schrieb Rodrigo Embeita
:
>
> Hi Yoann, thanks for your re
Looks like the same problem as mine:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-January/032054.html
The reported free space is a total, while Ceph is limited by the OSD with the
least free space (the worst OSD). Please check your (re-)weights.
Kevin
Am Di., 8. Jan. 2019 um 14:32 Uhr schrieb Rodrigo Embeita
If I understand the balancer correctly, it balances PGs, not data.
That worked perfectly fine in your case.
I prefer a PG count of ~100 per OSD; you are at about 30. Maybe it would
help to bump the PGs.
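For illustration, that would look roughly like this (pool name and target count are assumptions; pg_num can only be increased on this release):

ceph osd pool set rbd_vms_ssd_01 pg_num 512
ceph osd pool set rbd_vms_ssd_01 pgp_num 512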
Kevin
Am Sa., 5. Jan. 2019 um 14:39 Uhr schrieb Marc Roos :
>
>
> I have straw2, balancer=
48.94 3.92TiB 992255
rbd_vms_ssd_01 4 372KiB 0 662GiB 148
rbd_vms_ssd_01_ec 6 2.85TiB 68.81 1.29TiB 770506
ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
Kevin
Am Sa., 5. Jan. 2019 um 0
sure code pools
where too many disks failed at the same time, you will then see
negative values as OSD IDs.
Maybe this helps a little bit.
Kevin
Am Sa., 5. Jan. 2019 um 00:20 Uhr schrieb Arun POONIA
:
>
> Hi Kevin,
>
> I tried deleting newly added server from Ceph Cluster and looks li
.5 mark_unfound_lost revert|delete
Src: http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg/
Kevin
Am Fr., 4. Jan. 2019 um 20:47 Uhr schrieb Arun POONIA
:
>
> Hi Kevin,
>
> Can I remove newly added server from Cluster and see if it heals cluster ?
>
> Whe
start a new one and bring back
the backups (using a better PG count).
Kevin
Am Fr., 4. Jan. 2019 um 20:25 Uhr schrieb Arun POONIA
:
>
> Can anyone comment on this issue please, I can't seem to bring my cluster
> healthy.
>
> On Fri, Jan 4, 2019 at 6:26 AM Arun POONIA
>
PS: Could be http://tracker.ceph.com/issues/36361
There is one HDD OSD that is out (which will not be replaced because
the SSD pool will get the images and the hdd pool will be deleted).
Kevin
Am Fr., 4. Jan. 2019 um 19:46 Uhr schrieb Kevin Olbrich :
>
> Hi!
>
> I did what you wrote
ld)
-1/-1 (stderr threshold)
max_recent 1
max_new 1000
log_file /var/log/ceph/ceph-mgr.mon01.ceph01.srvfarm.net.log
--- end dump of recent events ---
Kevin
Am Mi., 2. Jan. 2019 um 17:35 Uhr schrieb Konstantin Shalygin :
>
> On a medium sized cluster with device-classes, I am ex
of VMs with BBR but the hypervisors run fq_codel +
cubic (OSDs run Ubuntu defaults).
Has anyone tested qdisc and congestion control settings?
Kevin
to freeze (because the smallest OSD is taken into account
for free space calculation).
This would be the worst case, as over 100 VMs would freeze, causing a lot
of trouble. This is also the reason I did not try to enable the
balancer again.
Kind regards
Ke
keep this in mind as this is still better than shutting down
the whole VM.
@all
Thank you very much for your inputs. I will try some less important
VMs and then start migration of the big one.
Kind regards
Kevin
which causes de-duplication of RAM, and this host runs about 10 Windows
VMs.
During reboots or updates, RAM can get full again.
Maybe I am too cautious about live storage migration, maybe I am not.
What are your experiences or advices?
Thank you very much!
Kin
I now had the time to test, and after installing this package, uploads to
rbd are working perfectly.
Thank you very much for sharing this!
Kevin
Am Mi., 7. Nov. 2018 um 15:36 Uhr schrieb Kevin Olbrich :
> Am Mi., 7. Nov. 2018 um 07:40 Uhr schrieb Nicolas Huillard <
> nhuill...@do
I read the whole thread, and it looks like the write cache should always be
disabled, as in the worst case the performance is the same(?).
This is based on this discussion.
I will test some WD4002FYYZ which don't mention "media cache".
Kevin
Am Di., 13. Nov. 2018 um 09:27 Uhr
scheduler set to noop as it is
optimized to consume whole, non-shared devices.
Just my 2 cents ;-)
Kevin
Am Mo., 12. Nov. 2018 um 15:08 Uhr schrieb Dan van der Ster <
d...@vanderster.com>:
> We've done ZFS on RBD in a VM, exported via NFS, for a couple years.
> It's very stabl
uot;) mount.
I had such a setup with nfs and switched to mount CephFS directly. If using
NFS with the same data, you must make sure your HA works well to avoid data
corruption.
With ceph-fuse you connect directly to the cluster, so there is one component
less that can break.
Kevin
Am Mo., 12. Nov. 2018 um 12
l
mirrors, as apt is unable to use older versions (which does work on yum/dnf).
That's why we are implementing "mirror-sync" / rsync with a copy of the repo
and the desired packages until such a solution is available.
Kevin
>> Simon
>> _
Am Mi., 7. Nov. 2018 um 07:40 Uhr schrieb Nicolas Huillard <
nhuill...@dolomede.fr>:
>
> > It lists rbd but still fails with the exact same error.
>
> I stumbled upon the exact same error, and since there was no answer
> anywhere, I figured it was a very simple problem: don't forget to
> install t
I met the same problem. I had to create a GPT table on each disk, create a
first partition spanning the full space, and then feed these to ceph-volume
(should be similar for ceph-deploy).
Also, I am not sure you can combine fs-type btrfs with bluestore (afaik
that is for filestore).
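Roughly what I did, as a sketch (the device name is an assumption):

sgdisk --zap-all /dev/sdx
sgdisk --new=1:0:0 /dev/sdx                          # one partition over the full disk
ceph-volume lvm create --bluestore --data /dev/sdx1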
Kevin
Am Di., 6. Nov
blkdebug blkreplay blkverify bochs cloop dmg file ftp
> ftps gluster host_cdrom host_device http https iscsi iser luks nbd nfs
> null-aio null-co parallels qcow qcow2 qed quorum raw rbd replication
> sheepdog ssh vdi vhdx vmdk vpc vvfat
It lists rbd but still fails with the exact same err
Is it possible to use qemu-img with rbd support on Debian Stretch?
I am on Luminous and am trying to connect my image build server to load images
into a ceph pool.
root@buildserver:~# qemu-img convert -p -O raw /target/test-vm.qcow2
> rbd:rbd_vms_ssd_01/test_vm
> qemu-img: Unknown protocol 'r
Hi!
Is there an easy way to check when an image was last modified?
I want to make sure that the images I want to clean up have not been used for
a long time.
Kind regards
Kevin
-ganesha as standalone VM.
Kevin
Am Di., 9. Okt. 2018 um 19:39 Uhr schrieb Erik McCormick <
emccorm...@cirrusseven.com>:
> On Tue, Oct 9, 2018 at 1:27 PM Erik McCormick
> wrote:
> >
> > Hello,
> >
> > I'm trying to set up an nfs-ganesha server with the Cep
Hi Jakub,
"ceph osd metadata X" this is perfect! This also lists multipath devices
which I was looking for!
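For example (the OSD id is arbitrary, and the exact field names may differ by version):

ceph osd metadata 46 | jq -r '.devices, .bluestore_bdev_dev_node'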
Kevin
Am Mo., 8. Okt. 2018 um 21:16 Uhr schrieb Jakub Jaszewski <
jaszewski.ja...@gmail.com>:
> Hi Kevin,
> Have you tried ceph osd metadata OSDid ?
>
> Ja
Hi!
Yes, thank you. At least on one node this works; the other node just
freezes, but this might be caused by a bad disk that I am trying to find.
Kevin
Am Mo., 8. Okt. 2018 um 12:07 Uhr schrieb Wido den Hollander :
> Hi,
>
> $ ceph-volume lvm list
>
> Does that work for you?
>
&
Hi!
Is there an easy way to find the raw disks (e.g. sdd/sdd1) by OSD id?
Before I migrated from filestore with simple-mode to bluestore with LVM, I
was able to find the raw disk with "df".
Now I need to go from LVM LV to PV to disk every time I need to
check/smartctl a di
topped
2018-10-08 10:32:17.823 7f6af518e1c0 -1 osd.46 0 OSD:init: unable to mount
object store
2018-10-08 10:32:17.823 7f6af518e1c0 -1 ** ERROR: osd init failed: (5)
Input/output error
Anything interesting here?
I will try to export the down PGs from the disks. I got a bunch of new
disks to re
of three disks back. Object corruption would not be a problem
(regarding drop of a journal), as this cluster hosts backups which will
fail validation and regenerate. Just marking the OSD lost does not seem to
be an option.
Is there some sort of fsck for BlueFS?
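So far the closest thing I found is ceph-bluestore-tool, which can check/repair an offline OSD; a hedged sketch of what I would try, path assumed:

systemctl stop ceph-osd@46                                      # the OSD must not be running
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-46
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-46    # only if fsck reports fixable errors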
Kevin
Igor Fedotov schrieb am Mi
Small addition: the failing disks are in the same host.
This is a two-host, failure-domain OSD cluster.
Am Mi., 3. Okt. 2018 um 10:13 Uhr schrieb Kevin Olbrich :
> Hi!
>
> Yesterday one of our (non-priority) clusters failed when 3 OSDs went down
> (EC 8+2) together.
> *This is st
I have 8 PGs down, the remaining are active and recovering/rebalancing.
Kind regards
Kevin
t priority. It seems like you may also benefit from
setting mon_osd_cache_size to a very large number if you have enough memory on
your mon servers.
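For example, something like this in ceph.conf on the mons (the value is only an assumption; size it to the available RAM):

[mon]
mon_osd_cache_size = 100000    # default is 10; number of cached OSDMaps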
I'll hop on the irc today.
Kevin
On 09/25/2018 05:53 PM, by morphin wrote:
After I tried so many things with so much help on IRC. My pool
hea
the ipmi resets that needed to be dealt with to get all PGs active+clean,
and the cephx change was rolled back to operate normally.
Sage, thanks again for your assistance with this.
Kevin
tl;dr Cache as much as you can.
On 09/24/2018 09:24 AM, Sage Weil wrote:
Hi Kevin,
Do you have an u
there a better way?
Kevin
Am So., 23. Sep. 2018 um 18:08 Uhr schrieb Paul Emmerich
:
>
> The usual trick for clients not supporting this natively is the option
> "rbd_default_data_pool" in ceph.conf which should also work here.
>
>
> Paul
> Am So., 23. Sep. 2018 um
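For reference, the client-side setting Paul mentions would look roughly like this in ceph.conf (the pool name is an assumption):

[client]
rbd default data pool = rbd_vms_ssd_01_ec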
would now take at least twice
the time).
Am I missing a parameter for qemu-kvm?
Kind regards
Kevin
ect1st > > >, std::less' to see where all of the encoding
activity is coming from? I see two possibilities (the mon attempts to
cache encoded maps, and the MOSDMap message itself will also reencode
if/when that fails).
Also: mon_osd_cache_size = 10 by default... try making that
oves this issue until the cluster is stable again.
Kevin
On 09/20/2018 08:13 AM, David Turner wrote:
Out of curiosity, what disks do you have your mons on and how does the disk
usage, both utilization% and full%, look while this is going on?
On Wed, Sep 19, 2018, 1:57 PM Kevin Hrpcek
m
Thank you very much Paul.
Kevin
Am Do., 20. Sep. 2018 um 15:19 Uhr schrieb Paul Emmerich <
paul.emmer...@croit.io>:
> Hi,
>
> device classes are internally represented as completely independent
> trees/roots; showing them in one tree is just syntactic sugar.
>
> Fo
To answer my own question:
ceph osd crush tree --show-shadow
Sorry for the noise...
Am Do., 20. Sep. 2018 um 14:54 Uhr schrieb Kevin Olbrich :
> Hi!
>
> Currently I have a cluster with four hosts and 4x HDDs + 4 SSDs per host.
> I also have replication rules to distinguish between
device-class based rule)?
Will the crush weight be calculated from the OSDs up to the failure-domain
based on the crush rule?
The only crush-weights I know and see are those shown by "ceph osd tree".
Kind regards
Kevin
___
ceph-users mailing
,osd all report luminous features yet have 13.2.1 installed,
so maybe that is normal.
Kevin
On 09/19/2018 09:35 AM, Sage Weil wrote:
It's hard to tell exactly from the below, but it looks to me like there is
still a lot of OSDMap reencoding going on. Take a look at 'ceph features
se(429) = 0
munmap(0x7f2ea8c97000, 2468005) = 0
open("/var/lib/ceph/mon/ceph-sephmon1/store.db/26299338.sst", O_RDONLY) = 429
stat("/var/lib/ceph/mon/ceph-sephmon1/store.db/26299338.sst",
{st_mode=S_IFREG|0644, st_size=2484001, ...}) = 0
65..133383355) lease_timeout -- calling new election
Thanks
Kevin
On 09/10/2018 07:06 AM, Sage Weil wrote:
I took a look at the mon log you sent. A few things I noticed:
- The frequent mon elections seem to get only 2/3 mons about half of the
time.
- The messages coming in are mostly osd_failure, and
Hi!
is the compressible hint / incompressible hint supported on qemu+kvm?
http://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/
If not, only aggressive would work in this case for rbd, right?
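i.e. presumably something like this on the pool (pool name and algorithm are assumptions):

ceph osd pool set rbd_vms_ssd_01 compression_mode aggressive
ceph osd pool set rbd_vms_ssd_01 compression_algorithm snappy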
Kind regards
Kevin
ect_msg: got bad authorizer
2018-09-10 03:30:17.324 7ff0ab678700 -1 osd.84 992286 heartbeat_check:
no reply from 10.1.9.28:6843 osd.578 since back 2018-09-10
03:15:35.358240 front 2018-09-10 03:15:47.879015 (cutoff 2018-09-10
03:29:17.326329)
Kevin
On 09/10/2018 07:06 AM, Sage Weil wrote:
I
this problem in 2.6.3:
https://github.com/nfs-ganesha/nfs-ganesha/issues/339
Can the build in the repos be compiled against this bugfix release?
Thank you very much.
Kind regards
Kevin
seems like the mix of luminous and
mimic did not play well together for some reason. Maybe it has to do
with the scale of my cluster, 871 OSDs, or maybe I've missed some
tuning as my cluster has scaled to this size.
Kevin
On 09/09/2018 12:49 PM, Kevin Hrpcek wrote:
Nothing too crazy
back up in mimic. Depending on how bad things are, setting pause on
the cluster to just finish the upgrade faster might not be a bad idea
either.
This should be a simple question: have you confirmed that there are no
networking problems between the MONs while the elections are happening?
O
OSDs are trying to fail each other. I'll put in
the rocksdb_cache_size setting.
Thanks for taking a look.
Kevin
On 09/08/2018 06:04 PM, Sage Weil wrote:
Hi Kevin,
I can't think of any major luminous->mimic changes off the top of my head
that would impact CPU usage, but it's a
t that stable and the PGs were 90% good with
the finish line in sight, and then the mons started their issue of
re-electing every minute. Now I can't keep any decent amount of PGs up
for more than a few hours. This started on Wednesday.
Any help would be greatly appreciated.
Thanks
-usage
I would like to re-use these cards for high-end (max IO) database VMs.
Some notes or feedback about the setup (ceph-volume etc.) would be
appreciated.
Thank you.
Kind regards
Kevin
I did not test this with ceph yet.
Is anyone using CephFS + bluestore + EC 3/2 without a separate WAL/DB device,
and is it working well?
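The setup I have in mind would be roughly this (pool names, PG counts, and the directory are assumptions):

ceph osd erasure-code-profile set ec32 k=3 m=2 crush-failure-domain=host
ceph osd pool create cephfs_ec_data 128 128 erasure ec32
ceph osd pool set cephfs_ec_data allow_ec_overwrites true    # needs bluestore OSDs
ceph fs add_data_pool cephfs cephfs_ec_data
setfattr -n ceph.dir.layout.pool -v cephfs_ec_data /mnt/cephfs/backups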
Thank you.
Kevin
re not planning any upgrade from 12.2.5 atm. Please correct me, if I am
wrong.
Kevin
> Quote:
> The v12.2.5 release has a potential data corruption issue with erasure
> coded pools. If you ran v12.2.5 with erasure coding, please see below.
>
> See: https://ceph.com/releases/12-
Hi,
on upgrading from 12.2.4 to 12.2.5, the balancer module broke (the mgr crashes
minutes after the service starts).
The only solution was to disable the balancer (the service has been running fine since).
Is this fixed in 12.2.7?
I was unable to locate the bug in the bug tracker.
Kevin
2018-07-17 18:28 GMT+02:00
PS: It's luminous 12.2.5!
Mit freundlichen Grüßen / best regards,
Kevin Olbrich.
2018-07-14 15:19 GMT+02:00 Kevin Olbrich :
> Hi,
>
> why do I see activating followed by peering during OSD add (refill)?
> I did not change pg(p)_num.
>
> Is this normal? From my other
Hi,
why do I see activating followed by peering during OSD add (refill)?
I did not change pg(p)_num.
Is this normal? From my other clusters, I don't think that happened...
Kevin
You can keep the same layout as before. Most people place the DB and WAL
combined in one partition (similar to the journal on filestore).
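For example, roughly (device names are assumptions; the WAL lives in the DB partition when no separate --block.wal is given):

ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1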
Kevin
2018-07-13 12:37 GMT+02:00 Robert Stanford :
>
> I'm using filestore now, with 4 data devices per journal device.
>
> I'm confused by th
Sorry for the long posting, but I am trying to cover everything.
I woke up to find my cephfs filesystem down. This was in the logs
2018-07-11 05:54:10.398171 osd.1 [ERR] 2.4 full-object read crc
0x6fc2f65a != expected 0x1c08241c on 2:292cf221:::200.:head
I had one standby MDS, but as far as
Sounds a little bit like the problem I had on OSDs:
[ceph-users] Blocked requests activating+remapped after extending pg(p)_num
<http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026680.html>
2018-07-10 14:37 GMT+02:00 Jason Dillaman :
> On Tue, Jul 10, 2018 at 2:37 AM Kevin Olbrich wrote:
>
>> 2018-07-10 0:35 GMT+02:00 Jason Dillaman :
>>
>>> Is the link-local address of "fe80::219:99ff:fe9e:3a86%eth0" at least
>>> present on the clien
ink local when there is an ULA-prefix available.
The address is available on brX on this client node.
- Kevin
> On Mon, Jul 9, 2018 at 3:43 PM Kevin Olbrich wrote:
>
>> 2018-07-09 21:25 GMT+02:00 Jason Dillaman :
>>
>>> BTW -- are you running Ceph on a one-node computer